K3 is a prudential norm established by the National Bank of the Republic of Kazakhstan, which determines the maximum risk per borrower. The methodology for calculating this parameter is defined by Resolution of the Board of the National Bank No. 170 dated September 13, 2017. When calculating this norm, a group of borrowers related to each other according to several criteria is treated as a single borrower. To determine the K3 value, the bank therefore needs to identify groups of related borrowers among its customers.

The Resolution sets out several criteria by which borrowers are recognized as affiliated. Affiliation may arise from relationships between major participants or heads of legal entities, from a trust management agreement between borrowers, from guarantees provided by one borrower for the obligations of another, and so on. The chains of connections that form groups of affiliated borrowers can be quite long. Banks therefore face the difficult task of implementing an algorithm that groups borrowers automatically and revises the composition of the groups as new information appears.

Let's consider a simple example. Suppose there are two groups of borrowers connected as shown in the figure below. Then the bank receives new information that Jane Doe is now a shareholder of "Minimarkets LLC". As a result of this new relationship, both groups of borrowers should be combined into one.

Corporate knowledge graph example

Such a problem is ideally suited to graph-based inference rules. It is convenient to represent the input information as a graph: borrowers, trust agreements, guarantees and other objects become graph vertices, and the relations between them become edges. The types of vertices and edges form a domain "dictionary", which can be expressed as a machine-readable ontological model. The rules that establish links between the elements of the model can then be formulated in terms of this ontology.

Let's create a simple ontological model containing the "Person" and "Organization" classes (entity types), as well as properties representing relations between objects of these classes: "is a founder", "is a relative", "is a CEO", etc. In terms of such a model, it is easy to formulate rules, for example:

IF A is an Organization, B is a Person, and B is a founder of A, THEN A is affiliated with B and B is affiliated with A
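The classes and properties this rule refers to can be sketched in Turtle as follows. The namespace and the exact identifiers are assumptions made for illustration; the actual model used in the demo may name things differently.

```turtle
@prefix ex:   <http://example.org/k3#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Entity types (classes)
ex:Person        a owl:Class ; rdfs:label "Person" .
ex:Organization  a owl:Class ; rdfs:label "Organization" .

# Relations between objects of these classes
ex:isFounderOf  a owl:ObjectProperty ;
    rdfs:label  "is a founder" ;
    rdfs:domain ex:Person ;
    rdfs:range  ex:Organization .

ex:isRelativeOf a owl:ObjectProperty ;
    rdfs:label  "is a relative" ;
    rdfs:domain ex:Person ;
    rdfs:range  ex:Person .

ex:isCeoOf      a owl:ObjectProperty ;
    rdfs:label  "is a CEO" ;
    rdfs:domain ex:Person ;
    rdfs:range  ex:Organization .
```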

The "is affiliated" property is transitive. The inference engine will conclude that if A is affiliated with B, B is affiliated with C, then A is affiliated with C. This is enough to start building long chains of connections.

For example, if there are organizations A and C, whose founder is the same person B, then due to the above rule and the transitivity of the "is affiliated" property, it will automatically be inferred that organizations A and C are affiliated.
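The transitivity itself can be captured in the model by declaring the affiliation property (referred to as hasK3GroupRelative later in this article) an owl:TransitiveProperty. A minimal sketch with the same assumed namespace:

```turtle
@prefix ex:   <http://example.org/k3#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# "is affiliated" is declared transitive, so the reasoner concludes:
# A affiliated with B, B affiliated with C  =>  A affiliated with C.
# (Both directions of each link are asserted by the rules themselves.)
ex:hasK3GroupRelative
    a owl:ObjectProperty , owl:TransitiveProperty ;
    rdfs:label "is affiliated" .
```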

By adding the other rules formulated in the Resolution of the Board of the National Bank, we get a model in which affiliation relations between group members are computed automatically. Each member of such a group will be linked by the "is affiliated" property to every other member of the group.

How can the calculation of borrower groups be implemented in this way at the software level? The software system must be able to apply inference rules to a large amount of frequently changing data structured according to the ontological model. An implementation on a graph without a conceptual schema and rules, with the logic expressed in Gremlin or a similar language, loses in ease of adjusting the logic when business requirements change and in the ability to control data integrity. Traditional graph DBMSs designed for processing ontologies, RDF triple stores, are not optimal for handling data in the mode required by this task. Our solution, the DataVera EKG Platform, bypasses the limitations caused by the volume and frequency of data updates and also provides functionality that is not available in most graph DBMS implementations.

The platform consists of two components. The data virtualization component, DataVera EKG Provider, performs data processing. It provides a programming interface (API) through which you can work with data structured according to the ontological model. Most of the data is physically stored in a relational DBMS. DataVera EKG Provider offers:

  • access rights control,
  • keeping the entire history of changes to each object,
  • execution of constraints and inference rules expressed according to the SHACL specification (a constraint sketch is shown below).
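As a sketch of the last point, a SHACL constraint for the Organization class might look like this. The ex:name property and the namespace are assumptions made for illustration, not the platform's actual identifiers.

```turtle
@prefix ex:  <http://example.org/k3#> .
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# Constraint: every Organization must have exactly one string-valued name.
ex:OrganizationShape
    a sh:NodeShape ;
    sh:targetClass ex:Organization ;
    sh:property [
        sh:path     ex:name ;
        sh:datatype xsd:string ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
    ] .
```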

You can interact with the data using the SPARQL query language, which is part of the standard technology stack for processing ontological models, via the REST API, or through a subscription mechanism based on Kafka. The other platform component, DataVera EKG Explorer, provides a user interface for working with data.

To demonstrate the solution to the problem of recognizing groups of affiliated borrowers, we have created a demo stand whose component architecture is shown in the following figure. The graph DBMS, an RDF triple store (Apache Fuseki), contains the conceptual level of the ontological model in machine-readable form. The Postgres relational DBMS contains the actual data about organizations and persons. This data can be loaded into the platform from the bank's corporate systems via the REST API or Kafka queues.

Solution architecture for k3 norm calculation

The structure of a simple ontological model fragment for our case is shown in the following figure.

Ontology structure for k3 norm calculation

Here is an example of a SHACL SPARQL rule that concludes that two agents are affiliated based on the existence of a trust management agreement between them:

SHACL SPARQL Rule example

Each SHACL rule applies to one or more specific ontology classes. The rule shown above applies to objects of the Trust_agreement class. The rules are applied whenever any data object of this class is changed. The ID of the changed object is substituted into the SPARQL query representing the "body" of the rule in place of the $this variable. Thus, when any data of the agreement is created or changed, an affiliation relationship is established between the organizations participating in it: the principal and the manager. All facts obtained by inference are marked in the database as the result of the rule. If the object is later deleted or changed so that the condition of the rule no longer holds, the inferred facts will be automatically removed from the database. The EKG Provider API also provides a special method that re-computes the entire inference for a specific object, for example, when the data was changed directly in the DBMS, bypassing the platform API.
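The rule in the figure is the platform's actual one; the following is only a rough sketch of how such a SHACL SPARQL rule can be written. The namespace and the properties linking the agreement to its parties (ex:hasPrincipal, ex:hasManager) are assumptions.

```turtle
@prefix ex: <http://example.org/k3#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .

# When a Trust_agreement object is created or changed, assert the affiliation
# between its principal and its manager in both directions.
ex:TrustAgreementAffiliationRule
    a sh:NodeShape ;
    sh:targetClass ex:Trust_agreement ;
    sh:rule [
        a sh:SPARQLRule ;
        sh:construct """
            PREFIX ex: <http://example.org/k3#>
            CONSTRUCT {
                ?principal ex:hasK3GroupRelative ?manager .
                ?manager   ex:hasK3GroupRelative ?principal .
            }
            WHERE {
                $this ex:hasPrincipal ?principal .
                $this ex:hasManager   ?manager .
            }
        """ ;
    ] .
```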

The inference of new facts can, in turn, trigger the inference of further facts. Let's return to the situation shown in the first figure of this article, where two groups of affiliated counterparties were shown. Suppose information appears in the database that Jane Doe is a shareholder of "Minimarkets LLC". A rule will infer the affiliation relationship between these two counterparties. Since the "is affiliated" property (hasK3GroupRelative) is transitive, establishing a new relation of this type will lead to the inference of new facts along the chain, which will eventually link all counterparties of both groups.

This is the main value of inference rules for the task we are describing: the composition of groups of affiliated counterparties changes automatically as the facts recorded in the database change. The value of ontologies lies in the fact that an analyst can change the description of the data structure itself and the logic of the rules at any time without touching the program code or the physical structure of the database, and without stopping the system: it is enough to make the appropriate changes to the model. If a new version of the Resolution establishing the criteria for affiliation is released, the logic of the software system can be updated in a matter of minutes.

Let's show how a user and a developer can interact with the data presented according to the ontology. The user works with data through the DataVera EKG Explorer interface, where they can filter, sort, and export data to Excel, including the data produced by inference.

Interacting with data in the DataVera EKG Explorer interface

The developer and application code can interact with the platform by exchanging messages through Kafka topics or through the REST API. Kafka is convenient for exporting up-to-date data on business objects from the bank's corporate systems to the EKG Provider. The REST API allows receiving information about the inference results. Let us show an example of a REST API query that finds out which borrowers the "Premium Estate" company is affiliated with:

Querying DataVera EKG Provider REST service

The service response contains the values of all the object's properties. For properties whose values were obtained using inference rules, the "Inferred" element of the JSON collection contains the identifier of the rule that produced the new fact in the database.

You can also use the SPARQL interface to work with the data. For example, let us query all the owners of organizations affiliated with the "Fast food delivery" company. To execute the query, we will use the Apache Fuseki panel, changing the service address to the DataVera EKG Provider SPARQL endpoint:

Querying DataVera EKG Provider SPARQL endpoint
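A sketch of such a query, reusing the property names assumed earlier in this article (the IRIs in the real model may differ):

```sparql
PREFIX ex: <http://example.org/k3#>

# Owners (founders) of organizations affiliated with "Fast food delivery"
SELECT DISTINCT ?owner ?organizationName
WHERE {
    ?company      ex:name "Fast food delivery" .
    ?company      ex:hasK3GroupRelative ?organization .
    ?organization a ex:Organization ;
                  ex:name ?organizationName .
    ?owner        ex:isFounderOf ?organization .
}
```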

We emphasize that the answer is generated by the DataVera EKG Provider based on the data actually stored in the Postgres relational database; this is the essence of data virtualization technology. A developer or a software component can work with data stored in a relational DBMS as if it were a graph in an RDF triple store.