K3 is a prudential norm established by the National Bank of the Republic of Kazakhstan that limits the maximum amount of risk per borrower. The methodology for calculating this norm is set out in Resolution of the Board of the National Bank of the Republic of Kazakhstan No. 170 dated 13.09.2017. For the purposes of this calculation, a group of borrowers related to each other by certain criteria is treated as a single borrower, so in order to calculate the K3 ratio, a bank must identify groups of related borrowers among its customers.

The Resolution sets out several criteria by which borrowers are recognized as related (affiliated). Affiliation may arise from kinship between major participants or managers of legal entities, from a trust management agreement between borrowers, from one borrower providing guarantees for the obligations of another, and so on. The chains of relations that form groups of affiliated borrowers may be quite long, so banks face the difficult task of implementing an algorithm that groups borrowers automatically and revises the composition of the groups as new information becomes available.

Let us consider a simple example. Suppose there are two groups of borrowers related as shown in the figure below, and then information appears that Jane Doe is a shareholder of "Minimarkets LLC". Once this connection appears, both groups of borrowers should be merged into one.

Such a problem is ideally suited to being solved with logical inference rules over a graph. The initial information can be conveniently represented as a graph: borrowers, trust management agreements, pledged items and other objects become graph vertices, and the relations between them become edges. The types of vertices and edges form a "vocabulary" of the subject area, which can be expressed as a machine-readable ontological model. In terms of this ontology, it is then possible to formulate the rules by which relations between the elements of the model are established.

Let us create a simple ontological model containing the classes (entity types) "Person" and "Organization", as well as the properties by which objects of these classes can be related to each other: "is a founder", "is a relative", "is a manager", etc. In terms of such a model it is easy to formulate rules, for example:

IF
A is an Organization, B is a Person,
B is a founder of A,
THEN
A is affiliated with B,
B is affiliated with A
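
In machine-readable form, such a vocabulary fragment could look as follows. This is a sketch in Turtle notation; the namespace and identifiers are illustrative assumptions, except for hasK3GroupRelative, the affiliation property named later in this article:

@prefix : <http://example.org/k3#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# entity types from the model
:Person a owl:Class .
:Organization a owl:Class .

# properties relating objects of these classes
:isFounderOf a owl:ObjectProperty ;
    rdfs:domain :Person ;
    rdfs:range :Organization .

:isRelativeOf a owl:ObjectProperty ;
    rdfs:domain :Person ;
    rdfs:range :Person .

:isManagerOf a owl:ObjectProperty ;
    rdfs:domain :Person ;
    rdfs:range :Organization .

# the affiliation property that the rules will establish
:hasK3GroupRelative a owl:ObjectProperty ;
    rdfs:label "is affiliated" .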

The property "is affiliated" must be made transitive. Then the logical inference machine will infer that if A is affiliated with B, B is affiliated with C, then A is affiliated with C. This is enough to start building long chains of affiliations.

For example, if there are organizations A and C whose founder is the same person B, then the rule above and the transitivity of the "is affiliated" property will automatically establish that organizations A and C are affiliated.

By adding other rules, whose wording follows from the resolution of the Board of the National Bank, we obtain a model in which affiliation relations between group members are calculated automatically. Each member of such a group will be linked by the "is affiliated" property with all other members of the group.

How can the calculation of borrower groups be implemented in software? What is needed is a system that can apply logical inference rules to a large amount of frequently changing data represented according to an ontology model. A graph-based implementation without a conceptual schema and rules, expressing the logic in Gremlin or a similar language, loses out in the ease of maintaining the logic as business requirements change and in the ability to control data integrity. Traditional graph DBMSs designed to store ontologies - RDF triple stores - are not optimal for processing data in the mode required by this task. Our solution, DataVera EKG Platform, bypasses the limitations associated with the volume and frequency of data updates and provides functionality that is not available in most graph DBMS implementations.

The platform consists of two components. Information processing is handled by the data virtualization component, DataVera EKG Provider. It exposes a programming interface (API) for working with data whose structure corresponds to the ontology model, while the data itself is physically stored in a relational DBMS. DataVera EKG Provider provides:

  • control of access rights to information,
  • saving information about the entire change history of each object,
  • application of integrity control rules and logical inference rules in accordance with the SHACL specification.

Data can be accessed using the SPARQL query language, which is part of the ontology modeling technology stack, as well as via a REST API or a subscription mechanism based on Kafka. The other component of the platform, DataVera EKG Explorer, provides a user interface for working with the data.

To demonstrate the solution to the problem of determining groups of affiliated counterparties, we have built a demonstration stand whose structure is shown in the following figure. An RDF triple store (Apache Fuseki) holds the conceptual level of the ontology model in machine-readable form, while the Postgres relational DBMS holds the actual data about counterparties - organizations and persons. Actual data is imported into the platform from the bank's corporate systems via the REST API or Kafka queues.

The structure of a fragment of a simple ontology model for our scenario is shown in the following figure.

Here is an example of a SHACL rule that concludes that two agents are affiliated based on the existence of a trust management agreement between them. The listing below is a sketch: the Trust_agreement class and the hasK3GroupRelative property are named in this article, while hasPrincipal, hasTrustManager and the namespace are assumptions:
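
@prefix : <http://example.org/k3#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .

:TrustAgreementAffiliationShape
    a sh:NodeShape ;
    # the rule targets every object of the Trust_agreement class
    sh:targetClass :Trust_agreement ;
    sh:rule [
        a sh:SPARQLRule ;
        sh:construct """
            PREFIX : <http://example.org/k3#>
            CONSTRUCT {
                ?principal :hasK3GroupRelative ?manager .
                ?manager   :hasK3GroupRelative ?principal .
            }
            WHERE {
                # $this is bound to the changed Trust_agreement object
                $this :hasPrincipal    ?principal .
                $this :hasTrustManager ?manager .
            }
        """
    ] .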

Each SHACL rule applies to one or more specific ontology classes. The rule shown here applies to objects of the Trust_agreement class, and rules are applied whenever the data of any object of this class changes.

The identifier of the changed object is substituted into the SPARQL query representing the "body" of the rule in place of the $this variable. Thus, when contract data is created or modified, an affiliation relation is established between the participating parties - the principal and the trust manager. All facts obtained through logical inference are marked in the database as results of the rule that produced them. If an entity is later deleted or changed so that the rule condition is no longer met, the inferred facts are automatically deleted. The EKG Provider API also offers a special method for recomputing the entire set of inferred facts for a given object - for example, when changes were made directly in the DBMS, bypassing the platform API.

Inferring one new fact may in turn give rise to further new facts. Let us return to the situation in the first figure of this article, where two groups of affiliated counterparties were shown. Suppose information appears in the database that Jane Doe is a shareholder of "Minimarkets LLC". A rule will be triggered that establishes an affiliation relationship between these two counterparties. Since the "is affiliated" property (hasK3GroupRelative) is transitive, establishing a new relationship of this type will produce new facts along the chain that will eventually link all counterparties of both groups.
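
Schematically, in terms of the illustrative identifiers used above (the shareholder property and the resource names are assumptions), the cascade looks like this:

# the new fact arrives
:JaneDoe :isShareholderOf :MinimarketsLLC .

# the shareholder rule fires and establishes affiliation
:JaneDoe :hasK3GroupRelative :MinimarketsLLC .
:MinimarketsLLC :hasK3GroupRelative :JaneDoe .

# transitivity then links every counterparty of the first
# group with every counterparty of the second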

This is the main value of logical inference rules for the problem we are describing: the composition of the groups of affiliated counterparties changes automatically as the facts recorded in the database change. The value of ontologies is that the description of the data structure itself and the logic of the rules can be changed by an analyst at any time without touching the program code or the physical structure of the DBMS, and without stopping the system - it is enough to make the corresponding changes in the model. If a new version of the regulation establishing the affiliation criteria is issued at some point, the logic of the system can be adjusted in a matter of minutes.

Let us show how a user and a developer can interact with data represented according to the model. The user works with the data through the DataVera EKG Explorer interface, where they can filter, sort, and export to Excel the data, including facts obtained through logical inference.

Developers and the program code of application components interact with the platform by exchanging messages through Kafka queues or via the REST API. Kafka is convenient for exporting actual data about business objects from the bank's corporate systems to EKG Provider, while the REST API is convenient for retrieving the results of logical inference. For example, to find out which borrowers the company "Premium Estate" is affiliated with, you need to make a request to the REST service. The exact URL scheme depends on the installation, so the request below is a hypothetical sketch:
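
GET /api/objects/PremiumEstate HTTP/1.1
Host: ekg-provider.bank.example
Accept: application/json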

The service response lists the values of all of the object's properties. For properties whose values were obtained through logical inference, the Inferred attribute contains the identifier of the rule that produced the new fact in the database.
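
A response might look roughly as follows; the overall JSON structure is an assumption, and only the Inferred attribute and its meaning come from the description above:

{
  "id": "PremiumEstate",
  "properties": [
    {
      "name": "hasK3GroupRelative",
      "value": "MinimarketsLLC",
      "Inferred": "TrustAgreementAffiliationShape"
    }
  ]
}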

You can also work with the data through the SPARQL interface. For example, suppose we are interested in all owners of companies affiliated with "Fast food delivery". To execute the query, we use the Apache Fuseki panel, changing the service address to the SPARQL endpoint of DataVera EKG Provider. A sketch of such a query, using the illustrative vocabulary introduced above (the ownership property and resource names are assumptions):
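
PREFIX : <http://example.org/k3#>

SELECT DISTINCT ?owner ?company
WHERE {
  # companies affiliated with "Fast food delivery"
  :FastFoodDelivery :hasK3GroupRelative ?company .
  ?company a :Organization .
  # their owners
  ?owner :isFounderOf ?company .
}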

We emphasize that the response is generated by DataVera EKG Provider based on data that actually resides in the Postgres relational database - this is the essence of data virtualization technology. Developers and the software components they create can work with data residing in a relational DBMS as if it were stored as a graph in an RDF triple store.