Dave McComb (President and Co-founder, Semantic Arts, US) argues that for the last several decades the digital industry has treated the application-centric approach as the mainstream. In practice, any new business idea or requirement usually leads to the implementation of a new application, most often with its own new database and an object-oriented MVC architecture. This increases the complexity of the enterprise IT architecture, because the many applications use different data models and carry hard-coded business logic. Handling this complexity is one of the main challenges of digitalization. Dave McComb proposes an alternative way of designing and implementing business requirements in the software industry: the data-centric approach. The data-centric approach assumes that applications are less important than the data they process. As a result, domain data models and (in many cases) the business logic that operates on them can be moved from the applications to a central corporate repository. The data-centric approach is based on data modeling, which requires a formal, expressive, general-purpose and reliable technology. Therefore graph-based technologies, and Ontologies in particular, become very attractive model-driven solutions for representing domain knowledge, projected onto data structures and processing logic, in different business areas and industries. Ontologies make it possible to maintain expressive data models and their processing logic in a centralized and manageable way. These models become accessible not only to developers, but also to business and system analysts, architects and other design decision-makers. This work presents a high-level review of selected scientific papers, research and other sources concerning the use of Ontologies in the domains of Data Modeling, Robotics, Biomedicine, Industry 4.0, the Financial sector and LLMs.
The selected list of sources and their interpretation does not claim scientific validity; instead, the author believes it illustrates the diverse applications and perspectives of ontologies in the modern world.
For decades, relational data models and relational databases were considered the default means of modeling subject-area entities, relationships and attributes. Relational databases offer well-known capabilities for creating and storing data in a table-oriented way. On the other hand, Ontologies, as an alternative approach to domain modeling, deserve close attention. The important aspects of comparison between relational databases and Ontologies are listed below:
Researchers highlight in [1] that ontologies and databases are not mutually exclusive alternatives, but can complement each other. The following criteria are proposed for choosing between the two technologies:
Object-oriented modeling has become the standard approach in software development for expressing and controlling complexity in many fields. Although Object-oriented modeling starts by representing the model of the target domain, at later stages it shifts mostly to details of the code structure (or the runtime object structure) rather than the domain model. This shift leads to a situation in which the domain model is largely hard-coded at the code level, so that understanding how the application works becomes the prerogative of developers. Today both the advantages and the drawbacks of this approach can be observed. The key points of comparison between the Object-oriented approach to application data model design and Ontologies are highlighted below:
The following aspects of the Robotics knowledge domain can be modeled using Ontologies, as has been demonstrated in practice:
Robot structure and fundamental concepts. Several domain ontologies have been created for the Robotics field, especially for its core structure and core entities:
Processes, tasks and mission planning in robotics. In Robotics, the implementation of mission or task planning is considered a crucial issue that calls for a unified approach. Pre-programmed logic for deciding which task should be performed next is not flexible. The well-known Core Ontology for Robotics and Automation (CORA) includes concepts for task planning and processes, but describes them only at a high level. Several promising approaches to task planning in the Robotics field are:
The ORPP application ontology [5] is based on the SUMO and CORA ontologies. ORPP includes additional concepts related to task planning and process execution. Figure 3 shows the basic concepts of ORPP and their relationships with the upper and domain ontologies.
RTPO [6] is a less popular ontology for modeling task planning and execution in Robotics. Nevertheless, RTPO supports dynamic adaptation to changes in the environment, adjusting behavior and task planning according to those changes. RTPO also supports interpretation of complex relationships between actions, available resources and environment states. Figure 4 provides a snapshot of the concept hierarchy in RTPO [6].
CRAM (Cognitive Robot Abstract Machine) [7] is a software toolbox for the design, implementation and control of autonomous robots, including planning and performing daily service activities. Figure 5 displays the architecture of CRAM. KnowRob, as part of the CRAM architecture, can be extended with specific domain and application ontologies. CRAM is a widely used tool in the Robotics industry across different use cases, including task planning.
Knowledge representation. Representing knowledge is an essential challenge in robotics, especially when a robot faces an uncertain environment, unknown objects or an unknown situation. In such cases a human applies commonsense knowledge: the ability to identify a situation or an uncertain object by finding similar behavior patterns, or to classify and categorize an object guided by fundamental knowledge and reasoning capabilities. This challenge is at the core of the research field of cognitive robotics, where knowledge representation and reasoning techniques are employed to support “autonomous robot[s] in a dynamic and incompletely known world” [https://www.frontiersin.org/journals/robotics-and-ai/articles/10.3389/frobt.2024.1328934/full]. Moreover, this challenge is considered one of the main factors slowing down the wide adoption of robots, for example in households and other service robotics scenarios. Ontologies can partially capture this complexity in use cases that need to represent knowledge as a set of related concepts, particularly in a graph-based model that represents objects and relations as vertices and edges. Nowadays KnowRob is one of the most popular knowledge processing systems; it combines knowledge representation and reasoning methods with techniques for acquiring knowledge and for grounding it in a physical system. It can serve as a common semantic framework for integrating information from different sources. KnowRob combines static encyclopedic knowledge, common-sense knowledge, task descriptions, environment models, object information and information about observed actions that has been acquired from various sources (manually axiomatized, derived from observations, or imported from the web) [https://www.knowrob.org/knowrob]. Figure 6 provides a conceptual model of the KnowRob system.
Surrounding environment interactions. Navigation is one of the main aspects of a robot's interaction with its environment. Lidar Odometry and Mapping (LOAM) and Simultaneous Localization and Mapping (SLAM) methods allow robots to build an internal model of the environment in the form of a map. LOAM has made limited use of ontologies so far. It primarily focuses on geometric and topological methods for odometry, mapping and building 3D models of the environment, which is accomplished using accurate point cloud processing algorithms. In SLAM, ontologies are used more widely. SLAM covers a wide sub-domain that characterizes map information, the semantic evaluation of surrounding objects, robot locations, and robot characteristics and states related to location. SLAM algorithms help robots to optimize their localization and orientation in uncertain environments. For instance, in [9] the authors present the advantages of the OntoSLAM ontology, which covers the following knowledge subcategories: Robot Information (physical and structural capabilities), Environment Mapping (distinguishing objects from their environment and identifying the shape of objects), Time Information (processing the robot's own movement or the movement of other objects over a certain period of time), and Workspace Information (the general properties and category of the space in which the robot acts). OntoSLAM combines knowledge about robot structure, capabilities, locations and mapping; beyond the shapes and sizes of identified objects, it adds semantic information about the surrounding objects and environment. This helps robots to understand the action context, particular roles, functional purpose and many other conceptual properties of observed objects and environments. Such knowledge significantly extends the complexity and range of use cases of robot interaction with the real-world environment. OntoSLAM is based on the following upper-level ontologies:
Figure 1 – Taxonomy of ROA and relation with CORA, SUMO. Blue boxes are concepts from SUMO. Orange boxes are concepts from CORA. Yellow boxes are concepts from ROA Ontology. Almost all relations are imported from SUMO/CORA, with exception of structure and association.
Figure 2 – Robot Motion taxonomy
Figure 3 – ORPP concepts and relations with SUMO, CORA
Figure 4 – Snapshot of the RTPO hierarchy
Figure 5 – CRAM architecture
Figure 6 – KnowRob conceptual model
Due to the complex nature of biology and biomedicine, Ontologies have become a leading choice for knowledge modeling in these fields. Table 1, taken from [12], presents the main features of ontologies in biology and biomedicine.
Table 1. Ontologies features relevant for biomedicine and biology
Ontology feature | Utility in research |
---|---|
Classes and relations | The use of standard identifiers for classes and relations in ontologies enables data integration across multiple databases because the same identifiers can be used across multiple, disconnected databases, files, or web sites. |
Domain vocabulary | Through labels associated with classes and relations, ontologies provide a domain vocabulary that can be used for applications such as natural language processing, creation of user interfaces, etc. |
Metadata and descriptions | Textual definitions, descriptions, examples and other metadata associated with classes in ontologies enable domain experts to understand the precise meaning of a class in the ontology. The definitions and related metadata should allow a consistent understanding of the meaning of classes in ontologies. |
Axioms and formal definitions | Formal definitions and axioms enable automated and computational access to (some parts of) the meaning of a class or relation. |
Open Biological and Biomedical Ontologies (OBO) is a unified set of ontologies built on shared design principles and functional interoperability. Several popular ontologies from OBO are listed below:
Gene Ontology. Molecular biology operates with a huge amount of data describing gene products, their biological roles and the cellular locations in which they act [13]. The Gene Ontology represents this complex knowledge about genes in a highly structured and organized way, in machine-readable and machine-interpretable form. From another perspective, the Gene Ontology is also a controlled vocabulary used to annotate genes. Annotations are prepared by trained domain specialists who analyze a large number of publications and their experimental findings and translate those findings into Gene Ontology terms [13]. The Gene Ontology covers three areas:
Cell ontology. The Cell Ontology is a structured controlled vocabulary for cell types in multicellular organisms. It was constructed for use in model organism databases and other bioinformatics databases, where there is a need for a controlled vocabulary of cell types. This ontology is not organism-specific. It covers cell types from prokaryotes to mammals. However, it excludes plant cell types, which are covered by the Plant Ontology (PO) [14].
Disease Ontology. The Disease Ontology (DO) provides standardized descriptions of human diseases. As presented in Figure 7, diseases are classified by agent, target, origin and syndrome [15]. The Disease Ontology semantically integrates disease and medical vocabularies through extensive cross-mapping of DO terms to MeSH, ICD, the NCI Thesaurus, SNOMED and OMIM [15].
Foundational Model of Anatomy Ontology (FMA). FMA describes the structure of the human body. Figure 8 displays the hierarchical structure of FMA [16]. In [17] the authors illustrate the complexity of this ontology. One application of FMA, after its enrichment with other ontologies or user-defined entities, is reasoning over transitive relations. For instance, the structure of the heart is partially represented in FMA by the following hierarchy:
Heart
-Left side of heart
--Left atrium
---Wall of left atrium
----Wall proper of left atrium proper
-----Wall of outflow part of left atrium
------Pectinate muscle of left atrium
All the listed components are related to each other through the partOf relation, directed from bottom to top. Hence Pectinate muscle of left atrium is part of Wall of outflow part of left atrium, which is part of Wall proper of left atrium proper, and so on. The partOf relation is transitive, which is expressed by the assertion: “if A is related to B and B is related to C, then A is related to C”. A familiar analogue is ordering: “if a≤b and b≤c, then a≤c”. Using the transitivity of partOf, automated reasoning can answer questions like "Is Pectinate muscle of left atrium part of Wall proper of left atrium proper?" The answer is obvious to a domain expert, but it is implicit for a computer program and, without ontologies, would require additional code. In contrast, Ontologies provide powerful automated inference mechanisms without adding new “if-then” statements to the program code. An ontology reasoner can infer that Pectinate muscle of left atrium is part of Wall proper of left atrium proper automatically by processing the partOf relations.
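As a rough illustration of the inference described above, the following Python sketch follows partOf assertions transitively over the FMA fragment listed earlier. It is a hand-written stand-in for illustration only, not an OWL reasoner: a real reasoner would derive the same facts from a declaration that partOf is transitive. The names `PART_OF_ASSERTIONS`, `build_index` and `is_part_of` are hypothetical and chosen for this example.

```python
from collections import defaultdict

# Direct partOf assertions (part, whole) copied from the hierarchy above.
PART_OF_ASSERTIONS = [
    ("Left side of heart", "Heart"),
    ("Left atrium", "Left side of heart"),
    ("Wall of left atrium", "Left atrium"),
    ("Wall proper of left atrium proper", "Wall of left atrium"),
    ("Wall of outflow part of left atrium", "Wall proper of left atrium proper"),
    ("Pectinate muscle of left atrium", "Wall of outflow part of left atrium"),
]


def build_index(assertions):
    """Map each part to the set of wholes it is directly asserted to be part of."""
    index = defaultdict(set)
    for part, whole in assertions:
        index[part].add(whole)
    return index


def is_part_of(part, whole, index):
    """Return True if `part` is a direct or transitively inferred part of `whole`."""
    stack = list(index.get(part, ()))
    seen = set()
    while stack:
        current = stack.pop()
        if current == whole:
            return True
        if current in seen:
            continue
        seen.add(current)
        # Follow the next partOf step upward; this is the transitive inference.
        stack.extend(index.get(current, ()))
    return False


INDEX = build_index(PART_OF_ASSERTIONS)

if __name__ == "__main__":
    print(is_part_of("Pectinate muscle of left atrium",
                     "Wall proper of left atrium proper", INDEX))
    print(is_part_of("Heart", "Left atrium", INDEX))
```

The point of the sketch is that the traversal is generic: adding a new anatomical part requires only a new assertion in the data, not a new rule in the code.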
Figure 7 – DO base structure
Figure 8 – Structure of FMA ontology
Due to the rapid development of digital technologies during the Industry 4.0 transition, manufacturing industries, information and communication technologies (ICT) and the Internet of Things are becoming more interconnected [18]. The advantages of digitalization lead to growth in efficiency and productivity in many production areas. However, the increased volume, variety and availability of data resulting from the growing number of interconnected physical objects and information systems becomes a challenge when developing reliable and certifiable methodologies for the transition from a traditional manufacturing system to a smart system [19]. Data in increasingly complex manufacturing systems becomes much more fragmented, siloed and disparate. Because ERP, PLM (Product Life Cycle Management) and MES (Manufacturing Execution Systems) systems have their own data models and different interfaces or protocols, common manufacturing data cannot be transmitted easily among the various systems. This challenge needs to be tackled with a strong common methodology [18]. Ontology modeling can partially play this role in redesigning and centralizing data models. The idea proposed in [18], shown in Figure 9, is to use the InPro ontology to simplify the processing and storage of all manufacturing data related to several aspects of the production workflow in a single graph database. Figure 10 illustrates the core ontological model of InPro [18]. A few other ontologies and tools applied in practice are:
SSN and SOSA ontologies. Sensing, sampling and actuation are considered fundamental functions in IoT and Industry 4.0. To perform more complex actions based on sensor data, a semantic interpretation of the raw observed values is required. It helps to search, reuse, integrate and interpret observed data in a large number of different contexts and use cases. The Semantic Sensor Network (SSN) and Sensor, Observation, Sample, and Actuator (SOSA) ontologies provide flexibility and coherence in representing the entities, relations and activities involved in sensing, sampling and actuation. SOSA provides a lightweight core for SSN and a minimal level of interoperability. Figure 11 presents an overview of the conceptual modules of the SOSA/SSN ontologies. Figures 12-17 show the ontology structures of the observation, sampling and actuation activities [20].
MASON. Product Life Cycle management requires a unified way of describing manufacturing processes. Versatile manufacturers need to dynamically adjust their production capacities in coordination with suppliers, customers and other units inside the company [22]. In [21] the authors state that the manufacturing domain is a sum of product, process and resource concepts. The manufacturing domain determines the relationships between these concepts, driven by the following main elements: information systems, rules and a common dictionary [22]. The MAnufacturing's Semantics ONtology (MASON) is proposed as an upper-level ontology for the manufacturing domain [22]. Figure 18 displays the main classes and object properties of the MASON ontology [22]. MASON has been applied in practice in manufacturing use cases such as cost estimation and multiagent systems [22]. For cost estimation, all the needed information (resources and details of the assembly process) and rules can be collected from an instantiated MASON ontology [22]. For multiagent systems, where different agents need to interact with each other, the MASON ontology provides a cognitive model of agents that supplies a knowledge base and reasoning capabilities [22].
PSL. A manufacturing process is a sequence of activities that assemble and transform raw materials into a final product. Process Specification Language (PSL) is a popular formal (partially ontological) representation of manufacturing processes [23]. PSL covers areas of manufacturing process modeling such as scheduling, process planning, simulation, workflow and project management. The PSL ontology was developed by the National Institute of Standards and Technology (NIST) [23]. One of the reasons for creating PSL was that existing approaches to process modeling lacked a unified specification of the semantics of process terminology [24]. When process terminology and specifications are not aligned across projects, the analysis, reuse and sharing of knowledge are hindered. Figure 19 illustrates differences in process terminology between two processes [24]. PSL essentially uses and extends the Knowledge Interchange Format (KIF) as a standard for exchanging manufacturing process information [20]. Figure 20 illustrates that PSL Core is based on Foundational Theories, namely set theory and situation calculus [24]. A definitional extension is an extension whose new linguistic items can be completely defined in terms of the foundational theory and PSL Core [24]. Nondefinitional extensions are extensions that involve at least one notion that cannot be defined in terms of PSL Core and the chosen foundational theory [24]. Figure 21 provides another perspective on the PSL framework, where ovals represent extensions beyond PSL Core and rounded rectangles represent parts of PSL Core, including its primary concepts: Activities, Activity Occurrences, Time Points and Objects [23].
RAMI 4.0. The Reference Architecture Model Industry 4.0 (RAMI 4.0) is a tool that provides an approach to classifying manufacturing assets in the virtual world [25]. Within RAMI 4.0, an asset can be contextualized through its entire life cycle by specifying its data, information, communication protocols, and business functions [25]. RAMI 4.0 was developed as an integration of [26]:
Figure 22 presents the layers of RAMI 4.0 [26]. The layers of RAMI 4.0 are a model for designing the implementation of Industry 4.0 systems, but they are not part of a knowledge graph [26]. The RAMI 4.0 layers provide an abstract reference model that separates functions and responsibilities into groups presented as layers. However, the data exchange between the layers, including storage and reasoning, can be implemented in an ontology-oriented way.
Figure 9 – Conceptual architecture of InPro ontology
Figure 10 – The core ontological model of InPro
Figure 11 - Overview of the SOSA/SSN ontology modules
Figure 12 - Overview of the SOSA classes and properties (observation perspective)
Figure 13 - Overview of the SSN classes and properties (observation perspective)
Figure 14 - Overview of the SOSA classes and properties (actuation perspective)
Figure 15 - Overview of the SSN classes and properties (actuation perspective)
Figure 16 - Overview of the SOSA classes and properties (sampling perspective)
Figure 17 - Overview of the SSN classes and properties (sampling perspective)
Figure 18 – MASON ontology, core classes and object properties
Figure 19 – Sample of terminology diversity in two processes
Figure 20 - PSL Semantic Architecture
Figure 21 – PSL Semantic Architecture extended perspective
Figure 22 – Layers of RAMI 4.0
The best-known ontology in the finance domain is the Financial Industry Business Ontology (FIBO). FIBO is standardized and sponsored by the Object Management Group (OMG) and the Enterprise Data Management Council (EDMC) [27]. Figure 23 shows a fragment of the FIBO class hierarchy with common domain entities: accounts and customer types. The main goals of the FIBO ontology are to share and reuse a basic data model of the financial domain in a unified, reusable, machine-readable, application-independent form. A standard FIBO data model has the potential to reduce the excess number of data models (of the same domain entities) tied to particular application-specific architectures. Using a unified data model in this field makes it possible to reduce the costs of developing and maintaining common services, and to simplify the exchange of financial data between financial institutions all over the world. The FIB DM (Financial Industry Business Data Model) described in [28] provides tools for transforming the FIBO ontology into a traditional data model representation, available in well-known software and formats. Another well-known ontology in the finance domain is the Financial Industry Regulatory Ontology (FIRO). FIRO was developed in response to the growing challenges that the financial industry faces in terms of Governance, Risk and Compliance. FIRO offers a methodology and framework to partially automate and optimize processes associated with anti-money laundering (AML) and other regulatory activities [29]. Table 2 lists the modules of the FIRO ontology [29]. Combining ontologies (such as FIBO, FIRO and others) with established standards like ISO 20022 and XBRL creates significant opportunities. This synergy can help to tackle one of the most challenging problems in the finance domain, related to anti-fraud and AML: the detection of suspicious transactions or operations by computing complex chains of relations between transaction assets, recipients and senders, or by discovering affiliated agents.
The paper [31] presents a Financial Fraud Detection and Deterrence (FFD) ontology that enables the identification of suspicious transactions based on bank customers' activities. The work [32] presents an example of using graph-based machine learning for timely identification of money laundering operations.
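To make the idea of "computing a chain of relations" between senders and recipients more concrete, here is a minimal Python sketch that searches a small transfer graph for the shortest chain linking two accounts. It is only an illustration under stated assumptions: the account names, the `TRANSFERS` data and the `find_chain` function are invented for this example, and the code does not reproduce the methods of [31] or [32], nor does it use FIBO or FIRO terms.

```python
from collections import deque

# Hypothetical transfers as (sender, recipient) pairs; all names are invented.
TRANSFERS = [
    ("acct_A", "acct_B"),
    ("acct_B", "acct_C"),
    ("acct_C", "acct_D"),
    ("acct_A", "acct_E"),
]


def find_chain(transfers, source, target, max_hops=4):
    """Breadth-first search for the shortest transfer chain from source to target.

    Returns the list of accounts on the chain, or None if no chain of at most
    `max_hops` transfers exists.
    """
    graph = {}
    for sender, recipient in transfers:
        graph.setdefault(sender, []).append(recipient)

    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        if len(path) - 1 >= max_hops:
            continue  # do not extend chains beyond the hop limit
        for nxt in graph.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


if __name__ == "__main__":
    print(find_chain(TRANSFERS, "acct_A", "acct_D"))
    print(find_chain(TRANSFERS, "acct_A", "acct_D", max_hops=2))
```

In a graph database populated from an ontology, the same question would typically be expressed as a path query rather than hand-written traversal code; the sketch only shows the kind of relationship chain involved.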
Figure 23 – Fragment of FIBO ontology structure
Table 2 – The FIRO modules
FIRO Module Name | FIRO Module Description |
---|---|
FIRO-H | This ontology describes the high-level concepts and their relationships, based on the financial industry regulatory initiatives. This includes concepts, such as Obligation, Prohibition, Exemption or Sanction. |
FIRO-S | This ontology models the general structure of a parliamentary, legislative and judiciary document. For this purpose, the Akoma Ntoso Standard [30] is being used as the main source for defining this ontology. |
FIRO-[Domain] | This ontology describes the concepts and their relationships for domain-dependent regulations. Currently, FIROAML for the Anti-Money Laundering regulation is under development. |
FIROOp[Purpose] | This ontology merges all the three previous ones, in order to support a particular purpose and task in the regulatory change management process. |
Recently, large language models (LLMs) have become a state-of-the-art technology in AI. The large-scale use of LLMs in many fields generates new challenges. Training LLMs on up-to-date or local corporate data is one of the most complex issues faced by the AI industry. As a solution to this problem, the Retrieval Augmented Generation (RAG) approach has been applied successfully in many cases. In RAG, before the user prompt is sent to the LLM, a context is created by searching a specific knowledge base or database for relevant information. If the search succeeds, its result is added to the user prompt, providing additional contextual information from the user-defined knowledge base or database. The knowledge base or database is divided into a set of chunks, each representing a small set of facts. Ingesting these facts into the prompt provides the LLM with necessary information that was not known at training time. The RAG approach often operates on private data extracted from databases, such as an account balance. The work [33] illustrates substantial benefits of GraphRAG over baseline RAG. GraphRAG uses the LLM to create a knowledge graph based on a private database; this graph is then used alongside graph machine learning to perform prompt augmentation at query time [33]. Although the graph created by the LLM is not normalized (and probably not optimal), the very fact of using ontological modeling suggests a very promising future for graph technologies. In another paper [34], the researchers show that GraphRAG is a strong method for Query-Focused Summarization. The work [36] shows an example of using GraphRAG to simplify code manipulation tasks for LLMs.
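The baseline RAG flow described above (split the knowledge base into chunks, search for relevant chunks, add them to the prompt) can be sketched in a few lines of Python. This is a simplified illustration, not the GraphRAG method of [33]: retrieval here is a plain word-overlap ranking standing in for the embedding or graph search that real systems use, no LLM is actually called, and the chunk texts and the names `CHUNKS`, `retrieve` and `build_prompt` are invented for the example.

```python
import re

# Hypothetical knowledge-base chunks; each holds a small set of facts.
CHUNKS = [
    "Account 1001 belongs to Alice and has a balance of 250 EUR.",
    "Account 1002 belongs to Bob and has a balance of 90 EUR.",
    "The bank's support line is open on weekdays from 9 to 17.",
]


def tokenize(text):
    """Lower-case the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, chunks, top_k=1):
    """Rank chunks by the number of words they share with the question."""
    question_words = tokenize(question)
    scored = [(len(question_words & tokenize(chunk)), chunk) for chunk in chunks]
    scored = [item for item in scored if item[0] > 0]  # drop unrelated chunks
    scored.sort(key=lambda item: item[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]


def build_prompt(question, chunks, top_k=1):
    """Prepend the retrieved context to the user question, if any was found."""
    context = retrieve(question, chunks, top_k)
    if not context:
        return question  # search failed: send the prompt unchanged
    return "Context:\n" + "\n".join(context) + "\n\nQuestion: " + question


if __name__ == "__main__":
    print(build_prompt("What is the balance of account 1001?", CHUNKS))
```

The augmented prompt returned by `build_prompt` is what would be sent to the LLM in place of the bare question.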
Despite the promising potential of ontologies, there are several difficulties and limitations to scaling them across all the domains mentioned in this work:
As shown above, many use cases in advanced industries (those that contribute to technological progress) already use ontologies as a preferred form of digital representation of real-world domain knowledge. Ontologies, as a knowledge representation formalism, are based on scientific theories, in particular on mathematical logic. This makes it possible to design and develop unified, data-centric, application-independent, reusable, scalable, compatible, logically verifiable and standardized solutions. Ontology-based solutions in their current form can optimize the costs of development and maintenance by reducing the number of applications and integration interactions, and by simplifying change management. Gartner's conclusion regarding knowledge graphs and ontologies states [35]:
“...But one standout in this year's Hype Cycle is the pivotal position of knowledge graphs. Positioned on the "Slope of Enlightenment," knowledge graphs are increasingly understood for their benefits to enterprises, leading to more pilot projects. They are recognized as critical enablers for effectively applying generative AI in enterprise environments.”