What is a Data Catalog?

A Data Catalog is a tool for describing the structure of metadata, which lets you quickly find the data stores you need. In which tables of which databases is a customer's e-mail stored? How are the contract and client tables linked? Who owns a given table? Data Catalog solutions answer these and similar questions.

OpenMetadata and Amundsen.io are among the most mature freely available tools of this class. They automatically extract metadata from various DBMSs and other sources of structured information, store it in a dedicated metadata store (a graph DBMS, in Amundsen's default setup), and make it available for searching and browsing. Building a Data Catalog seems like an uncomplicated task. But what is the next step?

Data consolidation

Business users and data analysts need to do more than search and browse metadata. They also need to extract, cleanse, validate, and consolidate information from sources, and to create new data sets for processing with ML tools and for use in business processes. The DataVera EKG Platform provides all of these functions, allowing you to extract and consolidate data from multiple sources into a single reference set.

Our product integrates with OpenMetadata and Amundsen.io to automate the creation of mapping rules between source data structure elements and the reference information model. This simplifies the analyst's job of creating rules.

How do Amundsen.io and EKG Platform work together?

Here is an example. Suppose you need to merge information about the company's customers that is held in the databases of two business applications.

The first step in the process is to create a data catalog using OpenMetadata or Amundsen.io. The analyst annotates the data structure that was automatically extracted from the sources. After that, it is easy to identify which tables and columns hold which pieces of customer information.
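As a rough illustration of what an annotated catalog makes possible, the sketch below models catalog entries as plain Python records and looks up every column that an analyst has tagged as a customer e-mail. The record layout, the database, table and tag names, and the find_columns helper are all hypothetical; this is not the OpenMetadata or Amundsen.io API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ColumnEntry:
    """One annotated column in a (hypothetical) in-memory catalog."""
    database: str
    table: str
    column: str
    tags: frozenset = field(default_factory=frozenset)


# Hypothetical catalog content for two business applications.
CATALOG = [
    ColumnEntry("crm", "clients", "email_addr", frozenset({"customer", "email"})),
    ColumnEntry("crm", "clients", "full_name", frozenset({"customer", "name"})),
    ColumnEntry("billing", "payers", "contact_mail", frozenset({"customer", "email"})),
    ColumnEntry("billing", "invoices", "amount", frozenset({"finance"})),
]


def find_columns(catalog, *required_tags):
    """Return fully qualified names of columns carrying all required tags."""
    wanted = set(required_tags)
    return [
        f"{e.database}.{e.table}.{e.column}"
        for e in catalog
        if wanted <= e.tags
    ]
```

Calling find_columns(CATALOG, "customer", "email") on this sample data returns the two e-mail columns, one from each application, which are the candidates for the merge.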

The next step is to import the structure of the selected tables into the DataVera EKG Platform as a template for mapping rules. Then you create the structure of the reference customer data model in EKG Platform and fill in the missing elements of the rule templates, that is, specify which property of the reference model corresponds to which column of the data source. After that, you can load the data into EKG Platform and go on to configure the normalization, validation and consolidation rules.
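The idea of a rule template can be sketched in a few lines of Python. This is only a conceptual model under stated assumptions: the dictionary layout and the functions make_rule_template, fill_rule and apply_rule are hypothetical and do not reflect the actual EKG Platform rule format or API.

```python
def make_rule_template(source, columns):
    """Build a template: one entry per source column, target property unset."""
    return {"source": source, "mappings": {col: None for col in columns}}


def fill_rule(template, assignments):
    """Return a new rule with reference-model properties assigned to columns."""
    unknown = set(assignments) - set(template["mappings"])
    if unknown:
        raise KeyError(f"columns not in template: {sorted(unknown)}")
    return {
        "source": template["source"],
        "mappings": {**template["mappings"], **assignments},
    }


def apply_rule(rule, row):
    """Translate one source row into reference-model properties.

    Columns without an assigned property are skipped.
    """
    return {
        prop: row[col]
        for col, prop in rule["mappings"].items()
        if prop is not None and col in row
    }


# Hypothetical example: template imported from the catalog, then completed
# by the analyst with properties of the reference customer model.
template = make_rule_template("crm.clients", ["email_addr", "full_name", "internal_flag"])
crm_rule = fill_rule(template, {"email_addr": "email", "full_name": "name"})
record = apply_rule(
    crm_rule,
    {"email_addr": "a@example.com", "full_name": "Ann Lee", "internal_flag": 1},
)
```

Here the template comes from the catalog with every target left empty, the analyst supplies only the column-to-property assignments, and unmapped columns such as internal_flag are simply not carried into the reference record.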

Using OpenMetadata or Amundsen.io together with the DataVera EKG Platform lets you go much further than just creating a data catalog. Structured information can be extracted from sources, transformed and monetized. Data Catalog tools speed up the analyst's work and simplify the creation of rules for extracting data from business applications into a data governance platform.