What is a data virtualization platform?

DataVera EKG Provider is a data virtualization system designed to handle large volumes of information represented as an Enterprise Knowledge Graph according to an ontology model. See the User Guide and OpenAPI (Swagger) documentation.

Ontological models make it possible to efficiently process complexly structured information containing thousands of entity and property types. Ontological models are stored in a special class of graph DBMSs: RDF triple stores. Such DBMSs support the SPARQL query language, and inference engines can be connected to them to execute rules (expressed, for example, according to the SHACL specification). They work well with data of complex structure, but poorly with large volumes of data.

This is what drives the need for a data virtualization platform. Virtualization allows data to be physically stored in multiple relational or document-oriented DBMSs, but manipulated as if it were in a single graph. To support work with ontologies in an industrial setting, an ontology platform must provide the following core capabilities:

  • Support the SPARQL query language and/or other types of APIs for reading and writing data
  • Support the execution of SHACL inference rules for quality control and data transformation (format and logic checks, cleaning, normalization)
  • Consolidate data from different sources
  • Provide a model and data editor so that the ontology analyst can build models
  • Provide search and data import/export tools for the analyst
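
The idea of keeping data in several DBMSs while handling it as a single graph can be illustrated with a minimal sketch. It is not the DataVera implementation: the two in-memory "sources", the `ex:` identifiers, and the functions `match` and `department_title_of` are hypothetical names chosen for this example only.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical "relational" source: flat rows with a foreign key.
EMPLOYEE_ROWS = [
    {"id": "e1", "name": "Ann", "dept_id": "d1"},
    {"id": "e2", "name": "Bob", "dept_id": "d2"},
]

# Hypothetical "document" source: JSON-like documents.
DEPARTMENT_DOCS = [
    {"_id": "d1", "title": "Research"},
    {"_id": "d2", "title": "Sales"},
]


def relational_triples() -> Iterator[Triple]:
    """Expose each row as triples; the foreign key becomes a graph edge."""
    for row in EMPLOYEE_ROWS:
        subject = f"ex:employee/{row['id']}"
        yield (subject, "rdf:type", "ex:Employee")
        yield (subject, "ex:name", row["name"])
        yield (subject, "ex:worksIn", f"ex:department/{row['dept_id']}")


def document_triples() -> Iterator[Triple]:
    """Expose each document as triples."""
    for doc in DEPARTMENT_DOCS:
        subject = f"ex:department/{doc['_id']}"
        yield (subject, "rdf:type", "ex:Department")
        yield (subject, "ex:title", doc["title"])


def match(s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return triples from all sources that fit the pattern (None = wildcard)."""
    pattern = (s, p, o)
    found = []
    for source in (relational_triples, document_triples):
        for triple in source():
            if all(want is None or want == got
                   for want, got in zip(pattern, triple)):
                found.append(triple)
    return found


def department_title_of(employee: str) -> Optional[str]:
    """Follow ex:worksIn from one source into the other, as a graph query would."""
    for _, _, dept in match(s=employee, p="ex:worksIn"):
        for _, _, title in match(s=dept, p="ex:title"):
            return title
    return None
```

The caller of `department_title_of` never needs to know that employees and departments live in different stores; a real platform would translate graph queries into native queries against each DBMS rather than scanning every record.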

The DataVera EKG Platform can be used as an MDM system. It provides all the functions needed to extract data from source systems, cleanse, validate and deduplicate it, generate reference records, and distribute them to business applications and analytical data marts. See a detailed description of MDM features and the scope of our MDM implementation services.

A detailed description of the platform implementation process is given in our EKG Platform Implementation Guide.

DataVera EKG Platform functionality

DataVera EKG Provider provides the following ways to receive and process requests:

  • REST interface
  • JSON messages via Kafka topics and RabbitMQ queues
  • SPARQL queries (with some limitations)

The REST API provides the following basic functions:

  • Retrieving an object by identifier
  • Obtaining a group of objects by a set of conditions (filters)
  • Obtaining a data model (TBox)
  • Creating/editing/deleting an object
  • Bulk import and export of objects
  • Validating an object by applying format and logic check rules to it
  • Normalizing and enriching data with customizable cleaning and inference rules
  • Obtaining quality metrics for datasets and the list of objects that violate quality check rules
  • Using calculated expressions and aggregate functions in queries
  • Subscribing to or unsubscribing from notifications about changes in data objects of certain classes
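
To make the validation and quality-metric functions more concrete, here is a minimal sketch of rule-based checking. It does not use the DataVera REST API or SHACL syntax: the rule names, the `RULES` table, and the functions `violations` and `quality_report` are assumptions made for illustration.

```python
import re
from typing import Callable, Dict, List

DataObject = Dict[str, object]
Rule = Callable[[DataObject], bool]

# Hypothetical format and logic check rules: each returns True when the object passes.
RULES: Dict[str, Rule] = {
    "name_present": lambda obj: bool(str(obj.get("name", "")).strip()),
    "email_format": lambda obj: re.fullmatch(
        r"[^@\s]+@[^@\s]+\.[^@\s]+", str(obj.get("email", ""))
    ) is not None,
}


def violations(obj: DataObject) -> List[str]:
    """Return the names of all rules the object fails."""
    return [name for name, rule in RULES.items() if not rule(obj)]


def quality_report(objects: List[DataObject]) -> Dict[str, object]:
    """Compute the share of valid objects and list the violating ones."""
    failing: Dict[str, List[str]] = {}
    for obj in objects:
        broken = violations(obj)
        if broken:
            failing[str(obj["id"])] = broken
    total = len(objects)
    share_valid = (total - len(failing)) / total if total else 1.0
    return {"share_valid": share_valid, "violating": failing}
```

Calling `quality_report` on a dataset yields both a metric (the share of valid objects) and the list of violating objects with the rules each one fails, which mirrors the two quality-related functions listed above.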

One of the most important features of DataVera EKG Provider is support for data temporality. Temporality features allow you to work with datasets as of any point in the past or future, rather than only with the single, current state of the data. Temporality functions:

  • Retrieving an object as of any moment
  • Retrieving the entire history of object changes
  • Selecting objects with filter conditions applied as of a specified time
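
The three temporality functions can be sketched with a simple versioned store. This is an illustration under stated assumptions, not the way DataVera EKG Provider stores history: the class `VersionedObject` and the function `select_as_of` are hypothetical, and each state is assumed to be valid from its timestamp until the next one.

```python
from bisect import bisect_right
from datetime import datetime
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, object]


class VersionedObject:
    """Keeps every state of an object together with the time it becomes valid."""

    def __init__(self) -> None:
        self._times: List[datetime] = []
        self._states: List[State] = []

    def set_state(self, valid_from: datetime, state: State) -> None:
        """Record a state valid from the given moment (past or future)."""
        idx = bisect_right(self._times, valid_from)
        self._times.insert(idx, valid_from)
        self._states.insert(idx, state)

    def as_of(self, moment: datetime) -> Optional[State]:
        """Return the state in effect at the moment, or None if none existed yet."""
        idx = bisect_right(self._times, moment)
        return self._states[idx - 1] if idx else None

    def history(self) -> List[Tuple[datetime, State]]:
        """Return all recorded states in chronological order."""
        return list(zip(self._times, self._states))


def select_as_of(objects: Dict[str, VersionedObject], moment: datetime,
                 predicate: Callable[[State], bool]) -> List[str]:
    """Return identifiers of objects whose state at the moment passes the filter."""
    selected = []
    for identifier, obj in objects.items():
        state = obj.as_of(moment)
        if state is not None and predicate(state):
            selected.append(identifier)
    return selected
```

Because states are kept sorted by their start time, a binary search finds the state in effect at any moment, and a state scheduled for the future is handled in the same way as a past one.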

DataVera EKG Provider as a data virtualization platform provides:

  • All the main advantages of ontologies (multiple classification, multi-valued properties, multiple inheritance)
  • String values in multiple languages (any language)
  • Control of access rights at the class level
  • Execution of logical rules

DataVera EKG Provider has the following infrastructure capabilities:

  • Deployment and scaling in Kubernetes
  • Internal multi-threading
  • Logging to ELK
  • Metrics exported to Prometheus
  • Swagger API documentation