DTaaS Overview

What is a Data Trust

DTaaS is a platform for building Data Trusts. A Data Trust contains Data Assets, along with the Services and Applications built to consume and produce them. A Data Asset in DTaaS is any data that is exposed to some set of consumers through an API. In practice, a Data Asset is code that knows how to store and access data, and the data itself may live somewhere else.

What distinguishes a Data Trust from any other distributed system is granular visibility into, and control over, data assets and derived data assets, independent of the Application that generated them. Because DTaaS is a platform for building Data Trusts, it does not define what Data Assets are; instead, it provides the architecture needed to define what they are and how they can be used. Applications written for DTaaS can expose arbitrary functionality to users while still granting users visibility into their data assets independently of the Application that created and manages them. DTaaS decouples the problem of cataloging, managing, and sharing data assets from the problem of generating, viewing, and interacting with them. The former is managed by DTaaS, and the latter is left to Applications to implement.

Object Model

DTaaS has a graph-based object model that can be used to model arbitrary application domains and to apply access control to the objects in those domains. This abstract model is reified into a few pre-defined object types. We provide an SDK for these types, which allows further domain-specific implementation on top of them.

The basic building blocks revolve around Kubernetes concepts.

DTaaS Architecture

Prototypes are Docker Images that are pushed into a registry provided by DTaaS and then registered with DTaaS. Once a Prototype exists, Servables can be created from it. A Servable is an instantiation of the Prototype, usually with some additional context that the Prototype needs in order to run. For example, an ML Model Prototype might contain an implementation of an algorithm, plus an HTTP interface that accepts an input example and returns a predicted value, but it is probably missing the model weights. A Servable of that same Model would contain the trained weights as its context. Once a Servable exists, it can be run, which creates an Instance of that Servable and an actual Pod in Kubernetes, along with a Service to expose it.
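The Prototype, Servable, and Instance relationship described above can be sketched as a small data model. This is a minimal illustration only, not the DTaaS SDK: the class fields, the run function, and the Pod and Service naming scheme are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Prototype:
    """A registered Docker image (field names are hypothetical)."""
    name: str
    image: str  # image reference in the registry provided by DTaaS


@dataclass
class Servable:
    """A Prototype plus the context it needs in order to run."""
    prototype: Prototype
    context: Dict[str, str] = field(default_factory=dict)


@dataclass
class Instance:
    """A running Servable, backed by a Pod and exposed by a Service."""
    servable: Servable
    pod_name: str
    service_name: str


def run(servable: Servable, instance_id: int) -> Instance:
    """Create an Instance record for a Servable.

    A real platform would ask Kubernetes to create the Pod and Service;
    this sketch only derives their (assumed) names.
    """
    base = f"{servable.prototype.name}-{instance_id}"
    return Instance(
        servable=servable,
        pod_name=f"{base}-pod",
        service_name=f"{base}-svc",
    )


# Usage: one Prototype can back many Servables, each with its own context.
model = Prototype(name="ml-model", image="registry.example/ml-model:1.0")
trained = Servable(prototype=model, context={"weights": "weights-v1.bin"})
instance = run(trained, instance_id=1)
```

The split mirrors the text: the Prototype carries the code, the Servable adds the context (here, a reference to trained weights), and the Instance is the running thing.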

Servables can be used to model data assets and computation. Data assets are Services that expose an HTTP interface and some API to access the resource, and computation is a Pod that runs some task to completion and then exits.

Authentication and Authorization happen at the HTTP layer, both in communication with Servables and between them. OAuth2 is used to obtain authorization to resources, and OpenID Connect is used to authenticate the identity behind that access. DTaaS provides an implementation of these protocols and can federate across DTaaS nodes.

Access Control

Access control is represented as abstract Capabilities on objects that are enforced at the HTTP layer. Whenever an HTTP request is made to a DTaaS API or to a Servable, we intercept the request and evaluate whether or not the request is allowed based on the Capabilities that the entity making the request has.
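As a rough sketch of the interception step described above, the check can be thought of as a lookup of (entity, object, Capability) grants before the request is forwarded. The in-memory store, the function names, and the choice of 401 and 403 status codes are assumptions made for illustration; they are not taken from DTaaS itself.

```python
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple

# (entity, object_id, capability) triples held in a hypothetical in-memory store.
Grant = Tuple[str, str, str]


@dataclass
class CapabilityStore:
    grants: Set[Grant] = field(default_factory=set)

    def allows(self, entity: str, object_id: str, capability: str) -> bool:
        """True if the entity holds the capability on the object."""
        return (entity, object_id, capability) in self.grants


def intercept(
    store: CapabilityStore,
    entity: Optional[str],
    object_id: str,
    capability: str,
) -> int:
    """Return the HTTP status an intercepting layer might produce."""
    if entity is None:
        return 401  # no authenticated entity behind the request
    if not store.allows(entity, object_id, capability):
        return 403  # authenticated, but lacks the required Capability
    return 200  # the request would be forwarded to the API or Servable


# Usage: alice holds the Request Capability on one Servable.
store = CapabilityStore({("alice", "servable-1", "request")})
status = intercept(store, "alice", "servable-1", "request")
```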

Granular Capabilities are available for DTaaS APIs. For Servables, there is a generic Request Capability that determines whether an entity is allowed to make HTTP requests to a Servable. DTaaS does not provide granular access control for Servables, but it can be implemented manually within a Servable. DTaaS provides an OpenID Connect compliant Identity Provider that Servables can use to authenticate the Identity behind an HTTP request, and from there arbitrary access control semantics can be implemented.
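One way a Servable might layer its own granular rules on top of the Request Capability is sketched below. It assumes the ID token has already been validated against the Identity Provider and that its claims are available as a mapping; "sub" is the standard OpenID Connect subject claim. The policy table, the paths, and the is_allowed function are hypothetical.

```python
from typing import Dict, Mapping, Optional, Set, Tuple

# Hypothetical per-route policy: (HTTP method, path) -> subjects allowed.
Policy = Dict[Tuple[str, str], Set[str]]

POLICY: Policy = {
    ("GET", "/records"): {"alice", "bob"},
    ("POST", "/records"): {"alice"},
}


def is_allowed(
    claims: Mapping[str, str],
    method: str,
    path: str,
    policy: Optional[Policy] = None,
) -> bool:
    """Decide whether the identity in already-verified claims may call a route.

    Token validation (signature, issuer, audience, expiry) is assumed to
    have happened before this function is called.
    """
    rules = POLICY if policy is None else policy
    subject = claims.get("sub")
    if subject is None:
        return False  # no identity, so deny by default
    return subject in rules.get((method.upper(), path), set())


# Usage: bob may read records but not create them.
can_read = is_allowed({"sub": "bob"}, "GET", "/records")
can_write = is_allowed({"sub": "bob"}, "POST", "/records")
```

Routes that are missing from the policy are denied, which keeps the default closed.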

Users can delegate Capabilities in a revocable manner to other Users using a Trust.
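Revocable delegation could be modeled along the following lines. This is a sketch under stated assumptions, not the DTaaS implementation: the Trust fields and the CapabilityRegistry class are hypothetical, and for simplicity only users who hold a Capability directly may delegate it here.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Trust:
    """A revocable delegation of one Capability from grantor to grantee."""
    grantor: str
    grantee: str
    object_id: str
    capability: str
    revoked: bool = False

    def revoke(self) -> None:
        self.revoked = True


class CapabilityRegistry:
    def __init__(self) -> None:
        self._direct: Set[Tuple[str, str, str]] = set()
        self._trusts: List[Trust] = []

    def grant(self, user: str, object_id: str, capability: str) -> None:
        """Give a user a Capability directly."""
        self._direct.add((user, object_id, capability))

    def delegate(self, grantor: str, grantee: str,
                 object_id: str, capability: str) -> Trust:
        """Delegate a directly held Capability; returns the Trust handle."""
        if (grantor, object_id, capability) not in self._direct:
            raise PermissionError(f"{grantor} does not hold {capability}")
        trust = Trust(grantor, grantee, object_id, capability)
        self._trusts.append(trust)
        return trust

    def has(self, user: str, object_id: str, capability: str) -> bool:
        """True for a direct grant or any Trust that has not been revoked."""
        if (user, object_id, capability) in self._direct:
            return True
        return any(
            not t.revoked
            and (t.grantee, t.object_id, t.capability)
            == (user, object_id, capability)
            for t in self._trusts
        )


# Usage: alice delegates read access to bob, then takes it back.
registry = CapabilityRegistry()
registry.grant("alice", "dataset-1", "read")
trust = registry.delegate("alice", "bob", "dataset-1", "read")
trust.revoke()
```

Keeping the Trust as its own object means revocation removes only the delegated access and leaves the grantor's own Capability untouched.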

Federation

The core utility of a Data Trust relies on the ability to securely Federate access to data assets across multiple nodes in a network. These nodes are co-located with data that must stay in place, and Servables offer access-controlled, proxied access to that data to other Servables on the network. This can be accomplished either by using a Servable to expose the data via some API, or by pushing computation to the data and creating a derived Servable from the data.

DTaaS Domain Architecture

Each DTaaS node is a gateway into a Data Trust Domain (dtDomain). Users of a particular node can have Capabilities on objects that exist on other nodes, and users can grant Capabilities to other users that exist on other nodes. Objects that exist on other DTaaS nodes are available via local APIs and proxies, and interaction with those objects works the same way regardless of which node actually runs the object.

Any DTaaS node can Federate with other nodes to create a dtDomain, but a node can only be a member of one dtDomain at a time.
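The membership rule above, that a node belongs to at most one dtDomain at a time, can be expressed as a simple invariant. The Node and DtDomain classes and their method names below are hypothetical and exist only to illustrate the constraint.

```python
from typing import Optional, Set


class Node:
    """A DTaaS node; tracks the single dtDomain it currently belongs to."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.domain: Optional["DtDomain"] = None


class DtDomain:
    """A set of federated nodes."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.members: Set[str] = set()

    def join(self, node: Node) -> None:
        """Add a node, refusing if it already belongs to another dtDomain."""
        if node.domain is not None and node.domain is not self:
            raise ValueError(
                f"{node.name} is already a member of {node.domain.name}"
            )
        node.domain = self
        self.members.add(node.name)

    def leave(self, node: Node) -> None:
        """Remove a node so that it is free to join a different dtDomain."""
        if node.domain is self:
            self.members.discard(node.name)
            node.domain = None


# Usage: a node must leave its current dtDomain before joining another.
hospital = Node("hospital-node")
research = DtDomain("research")
clinical = DtDomain("clinical")
research.join(hospital)
research.leave(hospital)
clinical.join(hospital)
```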