Expected business effects:
Strategically, implementing MDM allows you to stop losing money on multiple integrations and start monetizing your data. If you put your master data in order once and build reliable processes to maintain its quality, you will save on future operating costs for both IT and business units. Even more importantly, implementing MDM creates a foundation for developing the IT infrastructure, i.e. faster and more efficient implementation of new technologies and applications for business process automation and analytics. The time and cost of implementing such innovations will be significantly reduced, and their results will be of higher quality. In particular, it makes sense to implement any artificial intelligence (AI) tools only when the data they use is in order, that is, when the data is provided with context and explicit semantics.
Project objectives and implementation sequence:
Let's take a closer look at how client data is processed after an information object is copied from the source system to the platform. The adapter converts information about a business object received from any Bank application into the universal structure in which data is stored in the platform, and then sends the platform a request to write the object.
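To make this step concrete, below is a minimal sketch of such an adapter in Python. The universal structure, the field mapping, the endpoint URL and the class name PersonFromDataSource are illustrative assumptions, not the platform's actual API.

```python
# Minimal sketch of a source-system adapter (all names are illustrative;
# the real platform API, endpoint and payload layout may differ).
import requests

PLATFORM_URL = "https://mdm.example.bank/api/objects"  # hypothetical endpoint

def to_universal_structure(crm_record: dict) -> dict:
    """Map a CRM-specific client record to a source-agnostic structure."""
    return {
        "class": "PersonFromDataSource",
        "source_system": "CRM",
        "source_id": crm_record["id"],
        "properties": {
            "tax_id": crm_record.get("inn"),
            "full_name": crm_record.get("fio"),
            "phone": crm_record.get("mobile_phone"),
            "birth_date": crm_record.get("birth_date"),
        },
    }

def push_to_platform(crm_record: dict) -> None:
    payload = to_universal_structure(crm_record)
    # Ask the platform to create or update the source object.
    response = requests.post(PLATFORM_URL, json=payload, timeout=10)
    response.raise_for_status()
```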
Data in source systems can be "dirty". Typical problems include duplicated spaces and hyphens, inconsistent letter case across applications, and substitution of visually identical letters from different alphabets - for example, the Latin letters "a", "o", "e", etc. appearing in names written in Cyrillic. To eliminate such problems, normalization is performed at the moment the data is written to the platform, using rules configured by the analyst in the platform's visual interface. Rules can remove unnecessary special characters, convert strings to a certain case, format phone numbers, and so on. For each property of an object from a data source, the platform stores both the original value received from the source and the result of normalization, which makes it possible to verify that the cleaning was performed correctly.
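A minimal sketch of what such normalization rules might look like in code is shown below. The specific regular expressions, the look-alike letter table and the dual original/normalized layout are assumptions for illustration; in the platform itself these rules are configured declaratively, not programmed.

```python
# Sketch of normalization applied when a source object is written to the platform.
import re

# Latin letters that look identical to Cyrillic ones (illustrative subset).
LATIN_TO_CYRILLIC = str.maketrans({"a": "а", "o": "о", "e": "е", "c": "с", "p": "р"})

def normalize_name(value: str) -> str:
    value = re.sub(r"\s+", " ", value).strip()        # collapse repeated spaces
    value = re.sub(r"-{2,}", "-", value)              # collapse repeated hyphens
    if re.search(r"[А-Яа-яЁё]", value):               # only touch predominantly Cyrillic values
        value = value.translate(LATIN_TO_CYRILLIC)    # replace Latin look-alike letters
    return value.title()                              # unify letter case

def normalize_phone(value: str) -> str:
    digits = re.sub(r"\D", "", value)                 # keep digits only
    return "+7" + digits[-10:] if len(digits) >= 10 else value

def apply_normalization(raw: dict) -> dict:
    """Store both the original and the normalized value for traceability."""
    return {
        "full_name": {"original": raw["full_name"], "normalized": normalize_name(raw["full_name"])},
        "phone": {"original": raw["phone"], "normalized": normalize_phone(raw["phone"])},
    }
```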
Normalization rules eliminate problems that can be corrected automatically. Most problems, however, require human intervention to make the right correction decision. For this, erroneous or suspicious property values must be marked so that data stewards can correct them or contact business units to clarify the information. This is the job of format-logical control rules, which can:
In real projects, hundreds of such rules are created. It is important that the platform provides a single point for reviewing and managing all of them. Each Bank application may implement some of these rules, and when business logic changes, each of those applications has to be changed as well; yet it is not always known exactly which rules are implemented in which system. Centralized rule management through the platform removes this uncertainty. Moreover, if the Bank develops certain applications in-house, for example a single front-end system for front-office employees, such a system can receive rule definitions from the platform and apply them on its side, or call the platform to validate specific objects. This eliminates the risk that, when business logic changes, the corresponding updates are forgotten in one of the applications or applied to different systems at different times, which would lead to inconsistent data.
What should happen when a format-logical control error is detected? The business must decide which types of violations are critical and which can remain in the reference data until they are corrected. Property values with critical violations must not be included in the reference object: reference objects with Tax ID = 000000000000, phone +7700000000, and the like are unacceptable. When such a violation is detected, the platform records a critical validation error in a special log. This error should be passed to the employees of the business unit who interact with the client so that they can clarify the data. Once the corrected data is entered, processing of the object continues and a reference object is created.
Examples of non-critical violations include situations where the phone number does not begin with an operator code (it may be a foreign number), the document's expiration date is earlier than the current date, and so on.
A similar principle is applied to the control of the mandatory presence of property values. Some properties are critically important, and without them, the reference object cannot be created: Tax ID, phone number, date of birth, etc. Other properties are needed in principle, but the reference object can be created without them: code word, place of birth, customer segment, etc.
It is important that in any case, validation violations only apply to certain fields of the client card. Violations do not block the creation of the reference record as a whole, unless a critical violation has occurred in the value of a mandatory field - for example, the Tax ID is incorrect.
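Below is a minimal sketch of how such field-level rules with critical and non-critical severities could be expressed. The specific checks, placeholder values and the rule structure are illustrative assumptions; in the platform they are configured by an analyst, not hard-coded.

```python
# Sketch of format-logical control rules with severity levels.
import re
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class ValidationRule:
    field: str
    check: Callable[[Optional[str]], bool]    # True means the value passes
    severity: str                             # "critical" or "non_critical"
    message: str

RULES = [
    ValidationRule("tax_id",
                   lambda v: bool(v) and re.fullmatch(r"\d{12}", v) is not None and set(v) != {"0"},
                   "critical", "Tax ID is missing, malformed or a placeholder"),
    ValidationRule("phone",
                   lambda v: bool(v) and re.fullmatch(r"\+7\d{10}", v) is not None and set(v[2:]) != {"0"},
                   "critical", "Phone is missing, malformed or a placeholder"),
    ValidationRule("code_word",
                   lambda v: bool(v),
                   "non_critical", "Code word is not filled in"),
]

def validate(card: dict) -> list[dict]:
    """Return violations per field; only critical ones block the reference value."""
    violations = []
    for rule in RULES:
        if not rule.check(card.get(rule.field)):
            violations.append({"field": rule.field, "severity": rule.severity, "message": rule.message})
    return violations
```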
As we indicated above, each real object - for example, a client - can have several cards in different sources. These must be combined into groups so that one reference object is formed for each business object. To do this, you need to define the set of key fields that uniquely identify objects of each type. To identify a client, the Tax ID is sufficient (provided it is a mandatory field); to identify a document, the number and date of issue are required.
In DataVera EKG Platform, the rules for grouping objects are implemented using the mechanism of logical inference rules, which can supplement the values of object properties. An example of such a rule: if objects of the "Person from data source" class have the same Tax ID, then they must be linked with a special link.
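A minimal sketch of the grouping step is shown below; the Python grouping by Tax ID stands in for the platform's inference rule, and the data layout is an assumption.

```python
# Sketch of grouping source cards into reference groups by a key field.
from collections import defaultdict

def group_by_tax_id(source_cards: list[dict]) -> dict[str, list[dict]]:
    """All "Person from data source" cards with the same Tax ID form one group."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for card in source_cards:
        tax_id = card["properties"].get("tax_id")
        if tax_id:                      # cards without a key value are handled separately
            groups[tax_id].append(card)
    return groups

# Each resulting group becomes the basis for one reference object; in the platform
# this is expressed as an inference rule that links the matching source objects.
```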
When creating data processing rules in the platform, you need to pay attention to the following:
1. Carefully select the tolerance level in the validation rules. In practice, you have to balance between the desire to have ideal data and the limitations of real business processes, in which it is not always possible to obtain a complete set of client data right away. For example, in the onboarding process, the task of the business is to get the client interested in a banking product, such as a loan, without requiring him to enter more information than is needed for a preliminary decision on whether the loan can be issued. Rules that are too strict can interfere with customer service, while rules that are too lax will degrade the quality of analytics.
2. Historical data loaded into the platform at launch will inevitably contain a large number of validation errors. It may be necessary to relax the rules when loading legacy data in order to obtain a reference record for each client, and then, once the platform is in operation, organize a process for verifying and supplementing the accumulated information.
3. A significant number of validation violations will occur in any case. Mechanisms are needed to organize processes for a) immediate response to new critical errors and b) continuous data improvement by eliminating accumulated non-critical errors. Managing these processes requires monitoring tools that visualize data quality metrics tied to the number of validation violations and allow tracking how they change over time.
4. It is important to divide the work on eliminating validation violations between the data management service and the business units interacting with clients. The data management service cannot make data correction decisions on its own, unless it is a matter of fixing technical errors or loading information from a trusted source, such as government services. Changing most customer data requires verification at its source, which only the departments interacting with the customer can provide. The tools and processes described in points 3 and 4 help achieve one of the most important goals of the MDM implementation project: ensuring continuous improvement in the quality of customer data.
5. AI must not be used to normalize the Bank's customer data. Decisions on automatic data correction may only be made by deterministic algorithms approved by the responsible business employees and system analysts.
6. The set of rules and their logic can change as business requirements change. There must be a process for modifying the rules that analysts can perform without touching program code, and it must be possible to apply a new version of the rules to the data already accumulated, as sketched below.
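As an illustration of rules that analysts can edit without code changes, here is a minimal sketch using a versioned declarative rule file. The YAML layout, field names and the pyyaml loader are assumptions, not the platform's actual rule format.

```python
# Sketch of versioned, declarative validation rules editable without code changes.
import yaml  # pip install pyyaml

RULES_V2 = r"""
version: 2
rules:
  - field: phone
    type: regex
    pattern: '^\+7\d{10}$'
    severity: non_critical   # relaxed compared to version 1
  - field: tax_id
    type: regex
    pattern: '^\d{12}$'
    severity: critical
"""

def load_rules(document: str) -> dict:
    """Parse a rule set; the same loader can re-apply a newer version to accumulated data."""
    return yaml.safe_load(document)

rules = load_rules(RULES_V2)
print(rules["version"], len(rules["rules"]))   # 2 2
```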
After all stages of processing information from sources are completed, reference data must be generated. Since each reference object can be formed from several source objects from different systems, the question arises of how to select property values. For example, in one source system the client's phone number is +77012345678, and in another +77059876543. How to decide which phone number should be in the reference card?
The solution depends on how the Bank's business processes are organized and how they are interconnected with automated systems. In practice, two options are most often encountered:
To determine which property value is the most recent, DataVera EKG Platform stores a full history of changes to the values of all properties of each object. This allows you to assemble a reference object from property values stored in different systems. For example, if a client reported a change in residency during a visit to the office, this information will be reflected in the Core system, and if he changed the phone number for notifications, this information will come from the mobile application. In any case, the reference object will contain the latest version of the residency status and the latest, current phone number.
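A minimal sketch of this "latest value wins" assembly from the per-property change history is shown below. The history layout and field names are illustrative assumptions.

```python
# Sketch of assembling a reference object from per-property change history,
# taking the most recent value of each property across all sources.
from datetime import datetime

history = [
    # (property, value, source system, timestamp of the change)
    ("residency", "resident",     "Core",      datetime(2024, 3, 1)),
    ("phone",     "+77012345678", "Core",      datetime(2023, 11, 5)),
    ("phone",     "+77059876543", "MobileApp", datetime(2024, 6, 20)),
]

def build_reference(history_rows):
    reference = {}
    # Processing in chronological order leaves the newest value of each property.
    for prop, value, source, changed_at in sorted(history_rows, key=lambda r: r[3]):
        reference[prop] = {"value": value, "source": source, "changed_at": changed_at}
    return reference

ref = build_reference(history)
print(ref["phone"]["value"])   # +77059876543 - the most recently changed number
```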
It may turn out that for some reason a reference object cannot be formed. This can happen when there are no values for properties that are mandatory for consolidation: for example, a phone number is mandatory for a reference object but is missing from every card in the source systems. Another case is the presence of duplicate objects with the same key property values within one source system, for example, two client cards with the same Tax ID in a single system. The rules defining such situations are configured by an analyst during platform implementation. During operation, it is important to track the occurrence of such errors and promptly eliminate them by supplementing or correcting client data. The platform provides several options for this: an error log, subscriptions to notifications or error summaries, and export of error count metrics for display in monitoring dashboards.
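The two error situations described above could be detected by checks along the lines of the following sketch; error codes and the data layout are illustrative assumptions.

```python
# Sketch of consolidation error checks: a mandatory property missing in every
# source card, or duplicate key values within one source system.
from collections import Counter

def consolidation_errors(group: list[dict], mandatory: list[str]) -> list[dict]:
    errors = []
    for prop in mandatory:
        if not any(card["properties"].get(prop) for card in group):
            errors.append({"code": "MISSING_MANDATORY", "property": prop})
    keys = Counter((card["source_system"], card["properties"].get("tax_id")) for card in group)
    for (source, tax_id), count in keys.items():
        if tax_id and count > 1:
            errors.append({"code": "DUPLICATE_KEY_IN_SOURCE", "source": source, "tax_id": tax_id})
    return errors
```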
What you need to pay attention to when working with reference objects:
1. Do not edit reference objects directly when a data problem is detected. In this article, we describe a consolidating MDM design pattern, in which business applications are the primary source of any changes. This means that detected errors should be corrected at the level of those applications. Reference objects should be read-only, and only the standard consolidation processes may change them.
2. Reference object identifiers must be unique and carry no semantics; GUIDs are best suited for this.
3. In most cases, reference objects cannot be deleted, since they may contain references from other systems. If a client's card is deleted from business applications, it is better to mark the corresponding reference object with a special flag at the MDM level, but not delete it. An exception is the deletion of personal data according to legal requirements.
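The points above could translate into a reference record shaped roughly like the sketch below; the field names and the deletion flag are illustrative assumptions.

```python
# Sketch of a reference object: a semantics-free GUID identifier and a
# soft-deletion flag instead of physical deletion.
import uuid
from dataclasses import dataclass, field

@dataclass
class ReferenceObject:
    object_id: str = field(default_factory=lambda: str(uuid.uuid4()))  # no business meaning
    properties: dict = field(default_factory=dict)
    is_deleted: bool = False   # set when the client card disappears from business applications

card = ReferenceObject(properties={"tax_id": "123456789012"})
card.is_deleted = True   # marked, not physically removed, so external references stay valid
```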
The data management department and, in some cases, business users need visual tools for quickly searching data, building analytical queries, and monitoring the processes of data transfer and transformation.
We have already mentioned some of the functional capabilities that provide monitoring and tracking of changes in data. Let's list all the tools that provide the data management department employees with a complete picture of the current state of data and their update processes:
The EKG Platform provides a user interface for working with all these functional blocks. It allows viewing both the data itself and technological information about its processing, as well as analytical views; working with the data structure and all types of rules; creating analytical samples; importing and exporting information to and from Excel; and navigating between related data objects.
Important aspects of interfaces and data observability:
1. No processing step should remain a "black box". For business users to trust the data and the analytics built on it, they need a complete understanding of the data transformation path, both at the level of the overall scheme expressed in the rules and at the level of tracing the origin of specific objects.
2. Data department employees need to build samples both of objects loaded from sources and of reference data. Both must be available in the interface and clearly separated, so that the user always understands which data set they are working with.
3. For each type of error or violation registered by the platform, responsible employees and departments must be identified who will respond to occurrences and eliminate the problems. Errors must not simply accumulate in logs.
Once the reference data has been created and the process of updating it has been established, the collected data must be put to use. To do this, the reference data must be delivered to the applications in which Bank employees perform business processes. Such applications can be:
The exchange scenario diagram for the first two types of consumers looks like this:
This scenario ensures the achievement of one of the main goals of the MDM implementation project: to ensure that data is up-to-date in all systems, regardless of which of them updated the client information. For example, if a client's contact information is updated in the mobile application, it should be transferred to other bank systems, such as CRM and the credit system. The platform allows you to customize data distribution paths depending on what information was changed and during what process. This method of distributing reference data also reduces the number of errors in business processes associated with the use of outdated client data.
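The routing of updates to consumer systems could be described by a sketch like the one below, in which the set of target systems depends on which properties changed. The route table and system names are illustrative assumptions.

```python
# Sketch of routing reference data updates to consumers depending on which
# properties changed.
ROUTES = {
    "phone":     ["CRM", "CreditSystem", "NotificationService"],
    "residency": ["CreditSystem", "Compliance"],
    "segment":   ["CRM", "MarketingPlatform"],
}

def consumers_for_change(changed_properties: set[str]) -> set[str]:
    """Collect every system that must receive the updated reference object."""
    targets: set[str] = set()
    for prop in changed_properties:
        targets.update(ROUTES.get(prop, []))
    return targets

# A phone number updated in the mobile application is pushed to CRM,
# the credit system and the notification service:
print(consumers_for_change({"phone"}))
```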
This scenario also helps reduce the labor costs of front-office personnel in some customer service scenarios. For example, when onboarding a new client, information about them usually has to be entered into several Bank applications at once. Implementing MDM allows you to avoid creating multiple point-to-point integrations between applications and designing for every possible variant of the business process flow. With MDM, it does not matter for what purpose the client initially applied: to take out a loan or to issue a payment card. In any case, the information entered into one of the front-end systems will be distributed to the other applications in near real time, and there will be no need to re-enter it.
BI and other analytical tools are information consumers, receiving timely updates of reference data. This helps achieve another goal of the MDM project - the formation of a reliable data array for analysis and use in ML models.
The emergence of a central data management platform allows us to begin implementing a new type of applications - data-centric applications. They are user interfaces designed to perform specific tasks within certain business processes. These applications do not have their own DBMS: instead, they use the data management platform and the reference data stored in it as the only source of information. This allows us to significantly reduce the time for developing and implementing such applications, and avoid complex integration.
Data-centric applications must be ready for changes in data, their structure and processing rules from the platform. This is easy to achieve, since all these types of information are available through the platform's application programming interface (API), including in subscription mode, when the platform notifies applications of any changes. Using this mechanism, you can make applications adaptive. For example, when a new client property appears, the program can automatically display the input field for the value of this property in the client card and apply the validation rules specified in the platform to it. Creating applications in this way allows you to significantly save on their support and revision to implement new business requirements, and many requirements can be implemented almost instantly.
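A minimal sketch of such adaptive behavior is shown below: the application subscribes to schema changes and renders an input field for every new client property. The subscription payload and field names are assumptions; the actual platform API may differ.

```python
# Sketch of a data-centric front end reacting to schema changes from the platform.
def on_schema_changed(schema_event: dict, form_fields: dict) -> None:
    """Add an input field for every new client property announced by the platform."""
    for prop in schema_event["added_properties"]:
        form_fields[prop["name"]] = {
            "label": prop["label"],
            "input_type": prop.get("datatype", "string"),
            "validation": prop.get("rules", []),   # validation rules also come from the platform
        }

form_fields: dict = {}
on_schema_changed(
    {"added_properties": [{"name": "preferred_channel",
                           "label": "Preferred contact channel",
                           "rules": [{"type": "enum", "values": ["sms", "email", "push"]}]}]},
    form_fields,
)
print(list(form_fields))   # the client card now renders the new field automatically
```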
For example, an application for client onboarding or a tool for creating marketing campaigns can be made data-centric. In the first case, the operator enters, through the application's front end, the information necessary to create a new client in all business applications. This information is recorded directly in the data management platform, and the generated reference object is transferred almost instantly to all applications responsible for executing specific business processes. In the second case, the marketer can use complete information about the client, collected from all available sources and including both socio-demographic and behavioral data, to select customer segments for targeted marketing actions.
When implementing the exchange of client data between the data management platform (MDM) and business applications, you need to pay attention to the following.
1. After MDM implementation, some previously implemented point-to-point exchange flows between business applications may become unnecessary. They should be disabled to avoid distributing information along parallel paths.
2. In the data replication mode described above, echo suppression is important. Suppose that a data change in system A updated a reference object, which was then replicated to system B. The resulting data change in system B will be noticed by the platform's incoming adapter. A mechanism is therefore needed to distinguish changes caused by an MDM-initiated update from changes made by a user in the application itself. Without going into technical details, we note that DataVera EKG Platform implements several options for solving this problem; a minimal sketch of one possible approach is shown after this list.
3. When data for one client is updated from several systems at the same time, collisions may occur, which the platform must be able to resolve. Data origin tracking tools in the platform help answer questions about why information entered in one of the systems did not make it into the reference object.
4. To receive data on the business application side, it is usually necessary to create adapters that update data in the application database using its own SDK or API. Such adapters are created by the specialists who support the respective applications. When developing such an adapter, it is important to take into account many aspects: the transactionality of update operations, the possibility (or impossibility) of parallel processing of multiple update requests, the adapter's ability to withstand the predicted load, maintaining data integrity, and correctly informing the platform about errors that occur.
5. When designing data exchange with MDM, it is important to take into account the possibility of temporary unavailability of source and receiver systems. The platform and adapters must automatically bring the data to a consistent state after the availability of the systems is restored.
6. The platform must be deployed in an environment that provides fault tolerance and horizontal scaling, such as Kubernetes. Data warehouses used by the MDM system must also be clustered and provide fault tolerance.
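As promised in point 2, here is a minimal sketch of one possible echo-suppression approach: the outgoing adapter tags every replication with a correlation identifier, and the incoming adapter ignores changes carrying a tag it has recently issued. The tagging mechanism is an illustrative assumption, not a description of the platform's actual implementation.

```python
# Sketch of echo suppression between MDM and a business application.
import uuid

issued_replication_ids: set[str] = set()

def replicate_to_system_b(reference_object: dict) -> dict:
    """Outgoing update: attach a correlation id so the echo can be recognized later."""
    replication_id = str(uuid.uuid4())
    issued_replication_ids.add(replication_id)
    return {"payload": reference_object, "mdm_replication_id": replication_id}

def on_change_detected_in_system_b(change: dict) -> bool:
    """Incoming adapter: pass on only changes that did not originate from MDM itself."""
    replication_id = change.get("mdm_replication_id")
    if replication_id in issued_replication_ids:
        issued_replication_ids.discard(replication_id)   # echo of our own update - skip it
        return False
    return True   # a genuine user-made change - send it to the platform
```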
MDM implementation is a complex organizational task that affects and transforms the Bank's key business processes. Successful implementation requires strict adherence to project management practices: task decomposition, continuous progress monitoring, resource allocation, and so on. Let us describe the key points that are important for MDM implementation projects.
1. The project should be divided into sprints, each of which produces a useful and usable result. The first step should be the creation of a corporate reference data model that is valuable to analysts. The second step, after the procedures for loading data from sources are set up, is the formation of data quality metrics for each source and a list of specific violations, which allows work on correcting data in the sources to begin. In the third step, once the initial set of reference records has been formed, it becomes possible to build analytics on the client base that matter to business units, and so on. Delivering such results maintains constant interest in the project and management support for it.
2. The implementation of most tasks within the project requires close interaction between employees of business units, business analysts, system analysts and the integrator implementing the platform. The need to allocate significant resources of the Bank's employees in a timely manner and manage them requires the appointment of a project manager from the customer who has the authority to access such resources. The lack of these resources at any stage of the project can cause a significant decrease in the quality of the project results.
3. MDM implementation is a very complex project consisting of a mosaic of technologies, processes, artifacts and software solutions. Effective project management, understanding its results and ensuring the ability to use them require project visibility and a high degree of awareness among all participants, not only of the areas of work in which each of them is involved, but of the project as a whole. The presentations and general meetings this requires increase the share of "overhead" in participants' labor costs, but they cannot be skipped without compromising quality.
4. Designing and implementing MDM are complex intellectual tasks that require a systematic approach and teamwork among different specialists. Avoid the temptation to automate the creation of the reference data model or the rules: all design decisions should be made only by experts, and only after discussion and approval. Every design decision must be justified and recorded. Otherwise, the results of the system's operation will be inexplicable to business users, trust in the data will decrease, and the opportunities for putting it to good use will shrink.
5. Documentation and/or a knowledge base should be built up during the project. There will most likely be no time to create such resources at the end of the project. Besides, many pieces of knowledge need to be recorded "on the fly", before the focus of attention moves on to other issues.
6. Functional, load and integration testing play a huge role in project implementation. Testing tasks usually account for more than half of the project's labor intensity. This is because data exchange scenarios with MDM are highly variable, and regression tests are needed after each change to rules and settings. Testing processes should be automated, but the share of manual work in them remains high. Employees of business units and analysts should participate in testing alongside the IT specialists of the customer and the integrator.