Abstract: The present disclosure provides a system and a method for efficient retrieval of information from large knowledge graphs. The system provides enrichment as well as faster retrieval of knowledge from large knowledge graphs. A data dump of one such huge knowledge graph database may be used as a reference for consumption. An input user query helps the system to predict the user intention. Further, the system extracts related entities ordered by the similarity between the intention and the prediction.
DESC:RESERVATION OF RIGHTS
[0001] A portion of the disclosure of this patent document contains material which is subject to intellectual property rights such as, but not limited to, copyright, design, trademark, integrated circuit (IC) layout design, and/or trade dress protection, belonging to Jio Platforms Limited (JPL) or its affiliates (hereinafter referred to as owner). The owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all rights whatsoever. All rights to such intellectual property are fully reserved by the owner.
FIELD OF INVENTION
[0002] The embodiments of the present disclosure generally relate to systems and methods for information retrieval from knowledge graphs. More particularly, the present disclosure relates to a system and a method for retrieval of information from knowledge graphs (KGs) that is efficient and provides faster retrieval of data.
BACKGROUND OF INVENTION
[0003] The following description of the related art is intended to provide background information pertaining to the field of the disclosure. This section may include certain aspects of the art that may be related to various features of the present disclosure. However, it should be appreciated that this section is used only to enhance the understanding of the reader with respect to the present disclosure, and not as admissions of the prior art.
[0004] Generally, information retrieval (IR) from knowledge graphs (KGs), and enriching KGs may help in building language technology that can excel at entity recognition, relationship prediction, and other similar natural language processing (NLP) tasks. However, difficulties may be encountered while retrieving entities of interest based on the input labels/description and/or some of the properties related to the entities of interest.
[0005] Currently, a conventional method may include a protocol and resource description framework (RDF) query language for retrieving entities of interest. However, the conventional method may lack user-friendly interfaces, and users may be required to write complex technical queries. Further, the users may lack options for observing results in real-time and for changing the algorithm parameters or using a new algorithm altogether. Other existing solutions may not perform dynamic and nested search-result retrieval or graph-like retrieval.
[0006] Another conventional system and/or method may have a data query service that provides its own platform to query the database. Said database may be a central storage repository consisting mainly of discrete and unique items, where an item can be an entity or a concept. Entities represent objects, persons, events, places, etc. Herein, entity retrieval may be seamless and fast, as the service internally uses a blaze graph (graph database). However, due to the integration of data in the server, the data may take time to return query results. Further, for a complex query where the result size is huge, a timeout for the query request may be initiated, and therefore integration with any application programming interface (API) might be a challenge. Further, while the data may utilize its own setup, there may be different results for the same query on changing the result size, and sometimes the results may be unrelated to the query. A few of the queries, e.g., those for which the result is not known beforehand (these involve optimizers which provide more weightage to query time than to the result), may generate different results.
[0007] A data search may include multiple docker images for the data. The multiple docker images may not provide indexing of the complete data. Further, the mapping of the index cannot be changed, and a database file with claims having a nested structure cannot be queried in an effective way. Also, the data search may include the labels and descriptions of an entity but may not be able to return other nested property-value pairs. Also, the data search may be unable to update the entity values in the indices after the data dump is updated. Additionally, a user cannot change the mapping of the index, and a "tool" required for querying may be unavailable, as the data search uses a paid plugin on the tool. The data search for the tool may be integrated with a paid plugin, or without an x-pack, for query analysis. Further, the mapping of the present disclosure caters to the nested structure of claims and references, unlike said data search, in which querying over nested properties is impossible.
[0008] An entities search engine may include filtering claims by specifying a property, for instance, a value pair (P31:Q5), alongside the import statement. The filtering may be performed on the data file dump directly inside a script, and an option of exporting the same to the data search (without claims) may be provided. However, lesser flexibility may be extended to the user, and performance issues related to extracting/querying much more complex data may be encountered.
[0009] Another approach may include importing a subset of the data through a query API via a filter module. The next step is to export the same to the data search (only labels and aliases). Time estimates may include loading all 8 million humans into a web services data search index (with no mention of the number and type of machines used), which may take about 20 minutes. However, lesser flexibility may be provided to the user, and performance issues related to extracting/querying much more complex data may be observed.
[0010] In another approach, a data search may include importing the data file dump using a script and then loading it into the data search (without indexing claims). Additionally, no transformation of the data may exist, and lesser flexibility may be extended to the user. Further, performance issues while extracting/querying much more complex data may be observed.
[0011] Another conventional method includes creating and querying personalized versions of the data on a laptop. KGs in tab-separated files may include an edge-identifier, a head, an edge-label, and a tail to be inserted into the system. Graph analytics commands support scalable computation of centrality metrics such as page rank, degrees, connected components, and shortest paths, while advanced commands support lexicalization of graph nodes and computation of multiple variants of text and graph embeddings over the whole graph. Queries may be translated into a query language and executed on a database. However, this method uses the data search for querying data and is much faster compared to other systems as per the response time specified. Also, an auto-cache mechanism may be provided by the data search that may be readily available for utilization.
[0012] There is, therefore, a need in the art to provide a system and a method that can mitigate the problems associated with the prior arts.
OBJECTS OF THE INVENTION
[0013] Some of the objects of the present disclosure, which at least one embodiment herein satisfies are listed herein below.
[0014] It is an object of the present disclosure to provide a system and a method that provides enrichment as well as a faster retrieval of knowledge from large knowledge graphs (KGs).
[0015] It is an object of the present disclosure to provide a system and a method that maps data and uses data search to query and index data for faster retrieval of data.
[0016] It is an object of the present disclosure to provide a system and a method that uses a service that updates the index for efficient retrieval of data from KGs.
[0017] It is an object of the present disclosure to provide a system and a method that provides efficient retrieval of entities to a user in an offline setting.
[0018] It is an object of the present disclosure to provide a system and a method that utilizes a data search based user interface (UI) for accessing similar data entities in a graph-like fashion.
[0019] It is an object of the present disclosure to provide a system and a method that utilizes an application programming interface (API) to load data into KGs based on the type of entities.
[0020] It is an object of the present disclosure to provide a system and a method that utilizes dynamic attribute mapping on data search indexing.
SUMMARY
[0021] This section is provided to introduce certain objects and aspects of the present disclosure in a simplified form that are further described below in the detailed description. This summary is not intended to identify the key features or the scope of the claimed subject matter.
[0022] In an aspect, the present disclosure relates to a system for an efficient retrieval of data from knowledge graphs. The system may include one or more processors that may be coupled with a memory, where the memory stores instructions to be executed by the one or more processors. The one or more processors may receive one or more queries from one or more users via one or more computing devices. The one or more computing devices may be connected to the system via a network. The one or more queries may be based on one or more entities. The one or more processors may index one or more knowledge graphs with the received one or more queries to generate one or more transformed knowledge graphs. The one or more processors may predict, via an artificial intelligence (AI) engine, one or more related entities associated with the one or more entities based on the generated one or more transformed knowledge graphs for the efficient retrieval of data.
[0023] In an embodiment, the one or more entities may include at least one of a company, an organisation, a university, a lab facility, a business enterprise, a defence facility, a website, a platform, and a secured facility.
[0024] In an embodiment, the one or more processors may utilize at least one of a nested field type of mapping and a dynamic key mapping to generate the one or more transformed knowledge graphs.
[0025] In an embodiment, the dynamic key mapping may include mapping of at least one of labels, descriptions, aliases, and sitelinks.
[0026] In an embodiment, the one or more processors may generate one or more vectors associated with the generated one or more transformed knowledge graphs.
[0027] In an embodiment, the one or more processors may utilize one or more property subclasses associated with the one or more entities to generate the one or more vectors.
[0028] In an embodiment, the AI engine may utilize the generated one or more vectors to predict the one or more related entities via a cosine similarity technique.
[0029] In an embodiment, the one or more processors may utilize an application programming interface (API) to access one or more data and one or more attributes of the one or more entities.
[0030] In an embodiment, the one or more processors may utilize the API to access one or more data and one or more attributes of the predicted one or more related entities.
[0031] In an embodiment, the one or more processors may be coupled with a user interface (UI) to receive the one or more queries from the one or more users.
[0032] In an embodiment, the UI may be configured with a tool to enable the one or more processors to process the one or more queries.
[0033] In an aspect, the present disclosure relates to a method for an efficient retrieval of data from knowledge graphs. The method may include receiving, by one or more processors, one or more queries from one or more users via one or more computing devices. The one or more queries may be based on one or more entities. The method may include indexing, by the one or more processors, one or more knowledge graphs with the received one or more queries to generate one or more transformed knowledge graphs. The method may include predicting, by the one or more processors via an AI engine, one or more related entities associated with the one or more entities based on the generated one or more transformed knowledge graphs for the efficient retrieval of data.
[0034] In an embodiment, the method may include utilizing, by the one or more processors, at least one of a nested field type of mapping and a dynamic key mapping for generating the one or more transformed knowledge graphs.
[0035] In an embodiment, the method may include generating, by the one or more processors, one or more vectors associated with the generated one or more transformed knowledge graphs.
[0036] In an embodiment, the method may include utilizing, by the one or more processors, an API to access one or more data and one or more attributes of the one or more entities.
[0037] In an embodiment, the method may include utilizing, by the one or more processors, the API to access one or more data and one or more attributes of the predicted one or more related entities.
[0038] In an embodiment, the method may include utilizing, by the one or more processors, a tool to process the one or more queries.
[0039] In an aspect, the present disclosure relates to a user equipment (UE) for an efficient retrieval of data from knowledge graphs. One or more processors of the UE may be communicatively coupled to one or more processors in a system. The one or more processors of the UE may be coupled with a memory that stores instructions to be executed by the one or more processors that may cause the UE to transmit one or more queries to the one or more processors in the system via a network. Further, the one or more processors may receive the one or more queries from the UE. The one or more queries may be based on one or more entities. The one or more processors may index one or more knowledge graphs with the received one or more queries to generate one or more transformed knowledge graphs. The one or more processors may predict, via an AI engine, one or more related entities associated with the one or more entities based on the generated one or more transformed knowledge graphs for the efficient retrieval of data.
BRIEF DESCRIPTION OF DRAWINGS
[0040] The accompanying drawings, which are incorporated herein, and constitute a part of this disclosure, illustrate exemplary embodiments of the disclosed methods and systems, in which like reference numerals refer to the same parts throughout the different drawings. Components in the drawings are not necessarily to scale, emphasis instead being placed upon clearly illustrating the principles of the present disclosure. Some drawings may indicate the components using block diagrams and may not represent the internal circuitry of each component. It will be appreciated by those skilled in the art that disclosure of such drawings includes the disclosure of electrical components, electronic components, or circuitry commonly used to implement such components.
[0041] FIG. 1 illustrates an exemplary network architecture (100) of a proposed system (110), in accordance with an embodiment of the present disclosure.
[0042] FIG. 2A illustrates an exemplary block diagram (200) of a proposed system (110), in accordance with an embodiment of the present disclosure.
[0043] FIGs. 2B and 2C illustrate exemplary schematic diagrams (220, 230) of a user interface (UI) with entity identification (ID), language, and description fields for two different entities with similar labels, in accordance with embodiments of the present disclosure.
[0044] FIG. 2D illustrates an exemplary schematic diagram (240) of a UI with additional fields such as property, rank, and statement group, in accordance with an embodiment of the present disclosure.
[0045] FIGs. 2E and 2F illustrate exemplary schematic diagrams (250, 260) of a UI for indexing data documents with a flattened field type mapping and a nested field type mapping respectively, in accordance with embodiments of the present disclosure.
[0046] FIG. 3A illustrates an exemplary flow diagram representation (300) of a process for transformation on each nested claim to avoid mapping exception in data search that includes dynamic keys for nested claims, in accordance with an embodiment of the present disclosure.
[0047] FIG. 3B illustrates an exemplary flow diagram representation (310) of transformation on labels, descriptions, aliases, and sitelinks to avoid mapping exception in data search which includes dynamic keys for labels, descriptions, aliases, and site links, in accordance with an embodiment of the present disclosure.
[0048] FIG. 4A illustrates an exemplary flow diagram representation (400) of a process for accessing data with various options and filters to retrieve data via a UI, in accordance with an embodiment of the present disclosure.
[0049] FIG. 4B illustrates an exemplary schematic diagram representation (410) of a UI with a section describing “Add Query” or “Execute Query” based on static variables, in accordance with an embodiment of the present disclosure.
[0050] FIG. 4C illustrates an exemplary schematic diagram representation (420) of a UI with a query section that provides an option to define queries and add multiple queries, in accordance with an embodiment of the present disclosure.
[0051] FIGs. 4D and 4E illustrate exemplary schematic diagram representations (430) of a UI illustrating deletion of an already added query, in accordance with an embodiment of the present disclosure.
[0052] FIG. 5A illustrates an exemplary flow diagram representation (500) of a process for accessing the data via a UI, an application programming interface (API), and a tool, in accordance with an embodiment of the present disclosure.
[0053] FIG. 5B illustrates an exemplary flow diagram representation (510) of a process for adding entity to a knowledge database (DB), in accordance with an embodiment of the present disclosure.
[0054] FIG. 6 illustrates an exemplary computer system (600) in which or with which embodiments of the present disclosure can be utilized, in accordance with embodiments of the present disclosure.
[0055] The foregoing shall be more apparent from the following more detailed description of the disclosure.
DETAILED DESCRIPTION OF THE INVENTION
[0056] In the following description, for the purposes of explanation, various specific details are set forth in order to provide a thorough understanding of embodiments of the present disclosure. It will be apparent, however, that embodiments of the present disclosure may be practiced without these specific details. Several features described hereafter can each be used independently of one another or with any combination of other features. An individual feature may not address all of the problems discussed above or might address only some of the problems discussed above. Some of the problems discussed above might not be fully addressed by any of the features described herein.
[0057] The ensuing description provides exemplary embodiments only and is not intended to limit the scope, applicability, or configuration of the disclosure. Rather, the ensuing description of the exemplary embodiments will provide those skilled in the art with an enabling description for implementing an exemplary embodiment. It should be understood that various changes may be made in the function and arrangement of elements without departing from the spirit and scope of the disclosure as set forth.
[0058] Specific details are given in the following description to provide a thorough understanding of the embodiments. However, it will be understood by one of ordinary skill in the art that the embodiments may be practiced without these specific details. For example, circuits, systems, networks, processes, and other components may be shown as components in block diagram form in order not to obscure the embodiments in unnecessary detail. In other instances, well-known circuits, processes, algorithms, structures, and techniques may be shown without unnecessary detail to avoid obscuring the embodiments.
[0059] Also, it is noted that individual embodiments may be described as a process that is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged. A process is terminated when its operations are completed but could have additional steps not included in a figure. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, its termination can correspond to a return of the function to the calling function or the main function.
[0060] The word “exemplary” and/or “demonstrative” is used herein to mean serving as an example, instance, or illustration. For the avoidance of doubt, the subject matter disclosed herein is not limited by such examples. In addition, any aspect or design described herein as “exemplary” and/or “demonstrative” is not necessarily to be construed as preferred or advantageous over other aspects or designs, nor is it meant to preclude equivalent exemplary structures and techniques known to those of ordinary skill in the art. Furthermore, to the extent that the terms “includes,” “has,” “contains,” and other similar words are used in either the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term “comprising” as an open transition word without precluding any additional or other elements.
[0061] Reference throughout this specification to “one embodiment” or “an embodiment” or “an instance” or “one instance” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Thus, the appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
[0062] The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms “a”, “an”, and “the” are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items.
[0063] The various embodiments throughout the disclosure will be explained in more detail with reference to FIGs. 1-6.
[0064] As illustrated in FIG. 1, the network architecture (100) may include a system (110) for efficient retrieval of data from knowledge graphs. The system (110) may be connected to one or more computing devices (104-1, 104-2…104-N) through a network (106). A person skilled in the art will understand that the one or more computing devices (104-1, 104-2…104-N) may be collectively referred as computing devices (104) and individually referred as a computing device (104). Further, the computing devices (104) may also be known as user equipment (UE) (104) that may include, but not be limited to, a mobile, a laptop, etc. A person of ordinary skill in the art will appreciate that the computing devices or UEs (104) may not be restricted to the mentioned devices and various other devices may be used. Further, the computing devices (104) may include one or more in-built or externally coupled accessories including, but not limited to, a visual aid device such as a camera, audio aid, microphone, or keyboard. The computing devices (104) may include a mobile phone, smartphone, virtual reality (VR) devices, augmented reality (AR) devices, a laptop, a general-purpose computer, a desktop, personal digital assistants, a tablet computer, and a mainframe computer. Additionally, input devices for receiving input from a user such as a touchpad, touch-enabled screen, electronic pen, and the like may be used. One or more users (102-1, 102-2…102-N) may operate the computing devices (104). A person skilled in the art will understand that the one or more users (102-1, 102-2…102-N) may be collectively referred as users (102) and individually referred as a user (102). One or more entities (108) collectively referred as entities (108) may be connected to the system (110). The system (110) may include an artificial intelligence (AI) engine (112) for predicting one or more related entities associated with the entities (108) based on the generated one or more transformed knowledge graphs.
[0065] In an embodiment, the network (106) may include, by way of example but not limitation, at least a portion of one or more networks having one or more nodes that transmit, receive, forward, generate, buffer, store, route, switch, process, or a combination thereof, etc. one or more messages, packets, signals, waves, voltage or current levels, some combination thereof, or so forth. The network (106) may also include, by way of example but not limitation, one or more of a wireless network, a wired network, an internet, an intranet, a public network, a private network, a packet-switched network, a circuit-switched network, an ad hoc network, an infrastructure network, a Public-Switched Telephone Network (PSTN), a cable network, a cellular network, a satellite network, a fiber optic network, or some combination thereof.
[0066] In an embodiment, the system (110) may receive one or more queries from one or more users (102) via one or more computing devices (104). Further, the system (110) may index one or more knowledge graphs with the received one or more queries to generate one or more transformed knowledge graphs.
[0067] In an embodiment, the system (110) may predict, via the AI engine (112), one or more related entities associated with the one or more entities (108) based on the generated one or more transformed knowledge graphs for the efficient retrieval of data.
[0068] Referring to FIG. 1, the system (110) may receive one or more queries from the users (102) via the computing devices (104). The system (110) may index one or more knowledge graphs with the received one or more queries to generate one or more transformed knowledge graphs.
[0069] Although FIG. 1 shows exemplary components of the network architecture (100), in other embodiments, the network architecture (100) may include fewer components, different components, differently arranged components, or additional functional components than depicted in FIG. 1. Additionally, or alternatively, one or more components of the network architecture (100) may perform functions described as being performed by one or more other components of the network architecture (100).
[0070] FIG. 2A illustrates an exemplary block diagram (200) of a proposed system (110), in accordance with an embodiment of the present disclosure.
[0071] Referring to FIG. 2A, the system (110) may comprise one or more processor(s) (202) that may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, logic circuitries, and/or any devices that process data based on operational instructions. Among other capabilities, the one or more processor(s) (202) may be configured to fetch and execute computer-readable instructions stored in a memory (204). The memory (204) may be configured to store one or more computer-readable instructions or routines in a non-transitory computer readable storage medium, which may be fetched and executed to create or share data packets over a network service. The memory (204) may comprise any non-transitory storage device including, for example, volatile memory such as random-access memory (RAM), or non-volatile memory such as erasable programmable read only memory (EPROM), flash memory, and the like.
[0072] In an embodiment, the system (110) may include an interface(s) (206). The interface(s) (206) may comprise a variety of interfaces, for example, interfaces for data input and output (I/O) devices, storage devices, and the like. The interface(s) (206) may also provide a communication pathway for one or more components of the system (110). Examples of such components include, but are not limited to, processing engine(s) (208), an AI engine (210), and a database (212).
[0073] The processing engine(s) (208) may be implemented as a combination of hardware and programming (for example, programmable instructions) to implement one or more functionalities of the processing engine(s) (208). In examples described herein, such combinations of hardware and programming may be implemented in several different ways. For example, the programming for the processing engine(s) (208) may be processor-executable instructions stored on a non-transitory machine-readable storage medium and the hardware for the processing engine(s) (208) may comprise a processing resource (for example, one or more processors), to execute such instructions. In the present examples, the machine-readable storage medium may store instructions that, when executed by the processing resource, implement the processing engine(s) (208). In such examples, the system (110) may comprise the machine-readable storage medium storing the instructions and the processing resource to execute the instructions, or the machine-readable storage medium may be separate but accessible to the system (110) and the processing resource. In other examples, the processing engine(s) (208), the AI engine (210), and the database (212) may be implemented by electronic circuitry. A person of ordinary skill in the art will understand that the AI engine (210) of FIG. 2A may be similar to the AI engine (112) of FIG. 1 in its functionality.
[0074] In an embodiment, the one or more processors (202) may receive one or more queries from the users (102) via the computing devices (104). The one or more queries may be based on one or more entities (108). The one or more entities (108) may include, but not be limited to, a company, an organisation, a university, a lab facility, a business enterprise, a defence facility, a website, a platform, and a secured facility.
[0075] In an embodiment, the one or more processors (202) may index one or more knowledge graphs with the received one or more queries to generate one or more transformed knowledge graphs. Further, the one or more processors may predict, via the AI engine (210), one or more related entities associated with the one or more entities (108) based on the generated one or more transformed knowledge graphs for the efficient retrieval of data.
[0076] In an embodiment, the one or more processors (202) may utilize a nested field type of mapping and one or more dynamic key mappings to generate the one or more transformed knowledge graphs. The one or more dynamic key mappings may include mapping of, but not limited to, labels, descriptions, aliases, and sitelinks.
[0077] In an embodiment, the one or more processors (202) may generate one or more vectors associated with the generated one or more transformed knowledge graphs. Further, the one or more processors (202) may utilize one or more property subclasses associated with the one or more entities (108) to generate the one or more vectors.
[0078] In an embodiment, the AI engine (210) may be configured to utilize the generated one or more vectors to predict the one or more related entities via a cosine similarity technique.
[0079] In an embodiment, the one or more processors (202) may utilize an application programming interface (API) to access one or more data and one or more attributes of the one or more entities.
[0080] In an embodiment, the one or more processors (202) may utilize an API to access one or more data and one or more attributes of the predicted one or more related entities.
[0081] In an embodiment, the one or more processors (202) may be coupled with a user interface (UI) to receive the one or more queries from the one or more users (102). Further, the UI may be configured with a tool to enable the one or more processors (202) to process one or more complex queries.
[0082] FIGs. 2B and 2C illustrate exemplary schematic diagrams (220, 230) of a UI with entity identification (ID), language, and description fields for two different entities with similar labels, in accordance with embodiments of the present disclosure.
[0083] As illustrated in FIG. 2B, items in the data may be uniquely identified by a 'Q' followed by a number. As an example, entity identification (ID), language, and description fields are shown in FIG. 2B. For example, item Q17738 may represent the 1977 film "Star Wars". A text field called 'Label' may represent the names of an entity in English as well as in multiple other languages. Yet another text field, 'Description', may represent detailed information about an entity of interest.
[0084] As illustrated in FIG. 2C, another entity with similar labels but with a different description is shown as an example. Two different entities may have similar labels, but the description, along with the label, is generally different for different entities. The label and the description together may help to distinguish between different entities. As illustrated in FIG. 2B and FIG. 2C, both refer to the same named entity 'Delhi', but the difference is visible through their descriptions.
[0085] FIG. 2D illustrates an exemplary schematic diagram (240) of a UI with additional fields such as property, rank, and statement group, in accordance with an embodiment of the present disclosure.
[0086] As illustrated in FIG. 2D, the field name 'Property' may also play an important role, in addition to the fields 'Label' and 'Description', in finding relevant entities. Further, a rank and a statement group may also be associated with the field 'Property'. Additionally, more data constructs, namely 'qualifiers' and 'references' (including collapsed and opened references), may also be associated with the field 'Property'.
[0087] The data file dump attributes may be described as below (a minimal illustrative document following this structure is sketched after the list):
ID: The canonical ID of the entity
Labels: Contains labels in different languages
    Language: language code
    Value: label as per language code
Aliases: Contains "also known as" (a.k.a.) names in different languages
    Language: language code
    Value: a.k.a. as per language code
Descriptions: Contains descriptions in different languages
    Language: language code
    Value: description as per language code
Site links: Contains site links to pages on different sites describing the item
    Badges: Any "badges" associated with the page (such as "featured article"). Badges are given as a list of item IDs.
    Site: site global ID
    Title: page title
Claims: Contains any number of statements, grouped by property
    Structure value: The value to be associated with the property. The property specified in the structure value is the same as the property the statement is associated with.
        Property: The ID of the property this value is about
        Data value: If the value type is "value", there is a datavalue field that contains the actual value associated with the property
            value_globecoordinate
            value_monolingualtext
            value_quantity
            value_string
            value_time
            value_base-entityid
    Qualifiers: Qualifiers provide a context for the primary value, such as the point in time of measurement. Qualifiers are given as lists of values, each associated with one property.
        Property: The ID of the property this value is about
        Data value: If the value type is "value", there is a datavalue field that contains the actual value associated with the property
            value_globecoordinate
            value_monolingualtext
            value_quantity
            value_string
            value_time
            value_base-entityid
    References: References record provenance information for the data in the structure value and qualifiers. They are given as a list of reference records.
        Values: The value to be associated with the property. The property specified in the value is the same as the property the reference of a statement is associated with.
            Property: The ID of the property this value is about
            Data value: If the value type is "value", there is a datavalue field that contains the actual value associated with the property
                value_globecoordinate
                value_monolingualtext
                value_quantity
                value_string
                value_time
                value_base-entityid
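By way of a non-limiting illustration, a minimal dump document following the above attribute structure is sketched below as a Python dictionary. The key names mirror the attribute names listed above and, together with the identifiers and values other than Q17738 discussed earlier, are illustrative assumptions rather than the exact dump format.

# Hypothetical entity document following the attribute structure above; key names
# and values are illustrative placeholders, not the exact dump format.
entity_document = {
    "id": "Q17738",
    "labels": {"en": {"language": "en", "value": "Star Wars"}},
    "descriptions": {"en": {"language": "en", "value": "1977 film"}},
    "aliases": {"en": [{"language": "en", "value": "Star Wars: Episode IV"}]},
    "sitelinks": {"enwiki": {"site": "enwiki", "title": "Star Wars (film)", "badges": []}},
    "claims": {
        "P31": [  # dynamic property key; statements are grouped by property
            {
                "structure_value": {
                    "property": "P31",
                    "datavalue": {"value_base-entityid": {"entity-type": "item", "id": "Q11424"}},
                },
                "qualifiers": {},
                "references": [],
            }
        ]
    },
}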
[0088] In another instance, the data search may be a distributed, open-source search and analytics engine. The search allows users to store, search, and analyse huge volumes of data quickly and in near real-time, giving back answers in milliseconds. The search may achieve fast search responses because it searches an index instead of searching the text directly. Further, the data search may use a structure based on documents instead of tables and schemas, and comes with extensive representational state transfer (REST) application programming interfaces (APIs) for storing and searching data. At its core, the data search may act as a server that can process file requests and return file data. The entities indexed in the data search may also be used by a machine learning (ML) API to convert text into vectors and further use cosine similarity to find related entities beyond text matching.
[0089] In an exemplary embodiment, an indexing, a nested field type, and a dynamic keys approach may be utilized, in which an index may be a collection of documents that have similar characteristics. An index may be the highest-level entity that can be queried against in the search. The index may be similar to a database in a relational database schema. The documents in an index are typically logically related. Further, an index may be identified by a name that is used to refer to the index while performing indexing, search, update, and delete operations against its documents. An index in the search may be called an inverted index, which may be the mechanism by which all search engines work. Further, an index may be a data structure that stores a mapping from content, such as words or numbers, to its locations in a document or a set of documents. Basically, an index may be a hash map-like data structure that directs a user from a word to a document. Given an object, the flattened field type mapping used by the search may parse out its leaf values and index them into one field as keywords. The object's contents may then be searched through simple queries and aggregations. If the names or types of the subfields are not known in advance, then they may be mapped dynamically, allowing the user to explore the data quickly.
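By way of a non-limiting illustration, the difference between the default flattened field type and the nested field type may be expressed in the index mapping as sketched below, assuming an Elasticsearch-style data search backend; the sub-field names under claims are hypothetical.

# Two alternative mappings for the claims of an entity document (illustrative only).
flattened_claims_mapping = {
    "mappings": {
        "properties": {
            # leaf values of the whole claims object are indexed together as keywords
            "claims": {"type": "flattened"}
        }
    }
}

nested_claims_mapping = {
    "mappings": {
        "properties": {
            "claims": {
                # each claim is indexed as its own hidden document, so a property and
                # its value are matched together rather than independently
                "type": "nested",
                "properties": {
                    "property": {"type": "keyword"},
                    "value": {"type": "keyword"},
                },
            }
        }
    }
}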
[0090] In an embodiment, the present disclosure recommends indexing the data documents by explicitly stating the nested field type in the mapping and setting the number of shards to more than 10, which may decrease the load on a single machine while keeping the number of replicas initially at 0, and may overall help to decrease the indexing time. Further, the nested field type may have an advantage over the default flattened field type: with the flattened field type, the query (P551:Q668) returns Q97046594 even though its P551 (residence) does not point towards Q668 (India), as shown in FIG. 2E, 250; whereas in case of the nested field type, the user (102) receives all the documents where P551 (residence) points towards Q668 (India), as expected, as shown in FIG. 2F, 260.
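A non-limiting sketch of the recommended indexing and of the corresponding nested query is given below, assuming Python with the requests library, an Elasticsearch-style data search reachable at a local endpoint, and the hypothetical index and field names used above; the shard and replica settings follow the recommendation in the preceding paragraph.

import requests  # assumes an Elasticsearch-style data search reachable at the URL below

INDEX_URL = "http://localhost:9200/kg_entities"  # hypothetical index name

index_body = {
    # more than 10 shards, no replicas initially, as recommended above
    "settings": {"number_of_shards": 12, "number_of_replicas": 0},
    "mappings": {
        "properties": {
            "claims": {
                "type": "nested",
                "properties": {
                    "property": {"type": "keyword"},
                    "value": {"type": "keyword"},
                },
            }
        }
    },
}
print(requests.put(INDEX_URL, json=index_body).json())

# With the nested type, only documents in which a single claim has BOTH property
# P551 (residence) AND value Q668 (India) are returned, unlike the flattened type,
# which matches the property and the value independently across claims.
nested_query = {
    "query": {
        "nested": {
            "path": "claims",
            "query": {
                "bool": {
                    "must": [
                        {"term": {"claims.property": "P551"}},
                        {"term": {"claims.value": "Q668"}},
                    ]
                }
            },
        }
    }
}
print(requests.post(f"{INDEX_URL}/_search", json=nested_query).json())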
[0091] FIG. 3A illustrates an exemplary flow diagram representation (300) of a process for transformation on each nested claim to avoid mapping exception in data search that includes dynamic keys for claims, in accordance with an embodiment of the present disclosure.
[0092] In an instance, a data dump may have dynamic keys which can explode the data type mapping, because these keys can be thousands in number and can be present or absent in some or other items or documents, and are therefore inconsistent and unstructured. For this reason, the system (110) may iterate through all the documents in the data file dump one by one and perform a transformation on each nested claim to avoid a mapping exception in the search. FIG. 3A is one such illustration of the transformation performed. Here, P31 can be removed from the structure because the property field inside the structure value already has the value P31.
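A non-limiting sketch of such a transformation is given below, assuming Python and the illustrative key names used earlier; the dynamic property keys are dropped and the statements are kept as a plain list, which the index mapping can accommodate without growing with every new property.

def transform_claims(raw_claims):
    """Fold dynamic property keys (e.g. "P31") into a plain list of statements.

    Nothing is lost: the property identifier is already present in the property
    field inside each statement's structure value, as noted above.
    """
    statements = []
    for statement_group in raw_claims.values():
        statements.extend(statement_group)
    return statements

# Example with a hypothetical statement: the dynamic key "P31" disappears.
raw_claims = {"P31": [{"structure_value": {"property": "P31", "id": "Q62078547"}}]}
print(transform_claims(raw_claims))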
[0093] FIG. 3B illustrates an exemplary flow diagram representation (310) of transformation on labels, descriptions, aliases, and sitelinks to avoid mapping exception in data search which includes dynamic keys for labels, descriptions, aliases, and site links, in accordance with an embodiment of the present disclosure.
[0094] In an exemplary embodiment, the data dump may have dynamic keys which can explode the data type mapping, because these keys can be thousands in number and can be present or absent in some or other items or documents, and are therefore inconsistent and unstructured. For this reason, the system (110) may iterate through all the documents in the data file dump one by one and perform a transformation on labels, descriptions, aliases, and sitelinks to avoid a mapping exception in the search.
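A corresponding non-limiting sketch for labels, descriptions, aliases, and sitelinks is given below, again assuming Python and the illustrative key names used earlier; the dynamic language and site keys are replaced by a fixed-schema list of entries.

def transform_language_keyed_field(field_value):
    """Replace dynamic language/site keys with a fixed-schema list of entries."""
    flattened = []
    for entry in field_value.values():
        if isinstance(entry, list):   # aliases hold a list of entries per language
            flattened.extend(entry)
        else:                         # labels, descriptions, sitelinks hold one entry
            flattened.append(entry)
    return flattened

def transform_entity(document):
    """Apply the transformation to every dynamically keyed top-level field."""
    for field in ("labels", "descriptions", "aliases", "sitelinks"):
        if field in document:
            document[field] = transform_language_keyed_field(document[field])
    return document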
[0095] In an exemplary embodiment, predicting user intention from a user-provided query may also be incorporated in the system (110). For a user-provided query such as 'instance of (P31): public research university (Q62078547)', the query provides all data entities where P31: Q62078547 is present (one such document being Q1194650). Here, the user intention appears to be obtaining a list of all the documents/entities which are instances of approximately a kind of university. Hence, the prediction of intention could be right or wrong, because a user (102) might be interested in that specific instance only and might not be interested in letting the system (110) predict user intention. In such an instance, a toggle may be provided to the user (102) to switch between the strict behaviour and the predictive behaviour of the system (110). When the system (110) is predicting the instance of a particular kind of university, the system (110) may need to create a list of similar values, i.e., a list of all types of universities (one of which is public research university). When the user (102) looks at the data page of a public research university, the user (102) may observe that this entity and other related entities, such as academic institution, have been ranked or put into a hierarchy (the expected behaviour in case of a knowledge graph) with the help of the property 'subclass of'. Therefore, if the user (102) traverses through one entity such as public research university and explores the values in the property 'subclass of', the user (102) may receive public university, research university, and academic institution. If the user (102) then explores each of these entities and the values of their property 'subclass of', the user (102) may receive more such entities, with the help of which the user (102) can reach the bottom of this hierarchical tree and obtain all types of universities from such a traversal.
[0096] Before trying to predict directly using such a method, the system (110) may be built with an entity-to-entity hop (E2EH) table, where one such table helps deduce the number of hops required to go from one entity, such as public research university, to another entity, such as academic institution. In this example, the value is 2 hops (a default of 1, plus another 1 to get to academic institution; the value is 0 if no hop is possible).
[0097] Here, instead of adding +1 for every parent-to-child or child-to-parent traversal, the user (102) may also opt for:
I = I + ((-1)^n) * k,
where I is the initial value, k > 0, n = 2 for a parent-to-child traversal, and n = 1 for a child-to-parent traversal. Here, n and k can be fine-tuned as per the needs of the user (102), and some initial processes may be run to come up with the best possible values for n and k.
[0098] The system (110) may create one such table only for the property 'subclass of', which helps rank or put into a hierarchy the values that come under the property 'instance of'. Table 1 below shows such a table for a limited set of entities (here k = 1 and n = 2 for both parent-to-child and child-to-parent traversal); the same can be performed for all possible entities.
                             | public research university | public university | research university | academic institution | ...
public research university   | 1                          | 2                 | 2                   | 2                    | ...
public university            | 2                          | 1                 | 0                   | 0                    | ...
research university          | 2                          | 0                 | 1                   | 0                    | ...
academic institution         | 2                          | 0                 | 0                   | 1                    | ...
...                          | ...                        | ...               | ...                 | ...                  | ...
Table 1
[0099] Table 1 might look as shown above; however, after going through the entities in the knowledge graph (KG) file, the final table may help resolve intentions quite easily and quickly. This is a one-time traversal which can be updated if there are any updates in the hierarchy of any of the required entities or any updates in the KG file. To make this concept more general, a user-provided query 'must – Property 1: Value 1' provides all data entities where Property 1: Value 1 must be present.
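A non-limiting sketch of such an entity-to-entity hop computation is given below, assuming Python and a small in-memory set of hypothetical 'subclass of' edges; it uses the simple +1-per-traversal convention with a default of 1 for the entity itself and 0 for no possible hop. The exact hop values depend on the edge list supplied and on the chosen traversal convention (including the n and k parameters above), so the output is not intended to reproduce Table 1 exactly.

from collections import deque

def build_e2eh_table(edges, entities):
    """Build an entity-to-entity hop (E2EH) table by breadth-first traversal over
    'subclass of' links, treated bidirectionally here for simplicity. A value of 1
    denotes the entity itself and 0 denotes that no hop is possible."""
    adjacency = {entity: set() for entity in entities}
    for child, parent in edges:
        adjacency.setdefault(child, set()).add(parent)
        adjacency.setdefault(parent, set()).add(child)

    table = {}
    for source in entities:
        hops = {entity: 0 for entity in entities}  # 0 => not reachable
        hops[source] = 1                           # default value for the entity itself
        queue = deque([source])
        while queue:
            current = queue.popleft()
            for neighbour in adjacency.get(current, ()):
                if neighbour in hops and hops[neighbour] == 0:
                    hops[neighbour] = hops[current] + 1
                    queue.append(neighbour)
        table[source] = hops
    return table

# Hypothetical (child, parent) 'subclass of' edges for the entities of Table 1.
edges = [
    ("public research university", "public university"),
    ("public research university", "research university"),
    ("public university", "academic institution"),
    ("research university", "academic institution"),
]
entities = ["public research university", "public university",
            "research university", "academic institution"]
print(build_e2eh_table(edges, entities))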
[00100] Here, the user intention appears to be obtaining a list of all the documents/entities that belong to Property 1 ~ Value 1. When predicting Property 1 ~ Value 1, the system (110) may need to create a list of similar values, i.e., a list of values (one of which is Value 1). When the user (102) looks at the data page of Value 1, this entity and other related entities may have been ranked or put into a hierarchy with the help of some property, Property 2/Property 1. Therefore, if the user (102) traverses through one entity such as Value 1 and explores the values in the property Property 2/Property 1, the system (110) provides Value 2, Value 3, Value 4, and so on. By exploring each of these entities and the values of their property Property 2/Property 1, the user (102) may receive more such entities, with the help of which the user (102) can reach the bottom of this hierarchical tree and obtain all related entities of Value 1 from such a traversal. The number of tables would be equal to the number of properties to which this concept is applied. Therefore, there may be a separate table in case of 'instance of'/'subclass of', a separate table for Property 1/Property 2, and so on. The same concept may further be applied to properties in the same manner to predict properties close to the user-mentioned property; for example, a property 'has parts' may be related to facility, the property 'field of work' may be related to the property 'occupation', and so on. Hence, the E2EH table illustrated as Table 1 may be utilised in two ways: first, to select the column Value 1, filter that column up to a threshold, and pick index values which are within the threshold number of hops to the left and to the right of Value 1 in a KG. This list of values may then be queried against the user-specified property 'Property 1' one by one to get all the predicted documents. The next way may include creation of a vector for each column and inserting/updating the vector into the indexed data dump, nested along the column value (here Q62078547) inside a KG, as shown below.
"value_base-entityid": {
    "entity-type": "item",
    "id": "Q62078547",
    "E2EH": {
        "table": "subclass_of",
        "vector": [1, 2, 2, 2]
    }
}
[00101] This way, the system (110) may perform cosine similarity between the vector corresponding to the user-mentioned query (P31:Q62078547) and the other vectors under the E2EH object for the subclass_of table. By ranking the result in descending order (that is, by how related each document is to the mentioned user query), the user (102) may obtain all the predicted documents. However, the system (110) may need to decide upon a threshold/cut-off value for the result, since entities which are 15 or 20 hops away from the user-mentioned query may not be allowed.
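A non-limiting sketch of this ranking step is given below, assuming Python; the query vector corresponds to the E2EH column of the user-mentioned value (here Q62078547), the document identifiers other than Q1194650 are hypothetical placeholders, and documents whose similarity falls below a chosen cut-off are discarded.

import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length hop vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank_predicted_documents(query_vector, document_vectors, cutoff=0.5):
    """Rank documents by similarity of their E2EH vectors to the query vector and
    drop anything below the cut-off (the threshold discussed above)."""
    scored = [(doc_id, cosine_similarity(query_vector, vector))
              for doc_id, vector in document_vectors.items()]
    scored = [(doc_id, score) for doc_id, score in scored if score >= cutoff]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical E2EH vectors nested under the indexed documents.
query_vector = [1, 2, 2, 2]
document_vectors = {"Q1194650": [1, 2, 2, 2], "Q_doc_2": [2, 1, 0, 0], "Q_doc_3": [2, 0, 1, 0]}
print(rank_predicted_documents(query_vector, document_vectors))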
[00102] FIG. 4A illustrates an exemplary flow diagram representation (400) of a process for accessing data with various options and filters to retrieve data via a UI, in accordance with an embodiment of the present disclosure.
[00103] The frontend of the UI may be divided into three sections, in which a first section may be an execution section where the user (102) gets the ability to 'Add Query' or 'Execute Query' based on the static variables given below and further illustrated in FIG. 4B, 410:
Lang code: language code for labels, descriptions, aliases
Size: number of documents
Fields: attributes for the resulting documents
Order by: lets the user order the resulting documents based on sitelinks or social media followers.
[00104] The second section may be a query section where the user (102) may define queries as depicted in FIG. 4C, 420.
[00105] FIGs. 4D and 4E illustrate exemplary schematic diagram representations (430) of a UI illustrating deletion of an already added query, in accordance with an embodiment of the present disclosure.
[00106] Further, the user (102) may add multiple queries with desired operations (such as must, must not, should, and filter) and additionally define a subquery specifying an operation on the qualifiers. Here, the user (102) may also delete an already added query as depicted in FIG. 4D, 430. Further, the user (102) may use autocomplete and auto-suggest features for the input fields as depicted in FIG. 4E, 440.
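By way of a non-limiting illustration, the kind of query body that such a query section might assemble is sketched below, assuming an Elasticsearch-style data search and the hypothetical index and field names used earlier; the must, must not, should, and filter operations map onto a boolean query, and the sub-query on the qualifiers uses a further nested clause (assuming claims.qualifiers is itself mapped as a nested field).

# Hypothetical query body assembled from the UI selections; the field names,
# the qualifier property, and the sort key are illustrative assumptions.
query_body = {
    "size": 20,                              # 'Size' static variable
    "_source": ["id", "labels", "claims"],   # 'Fields' static variable
    "query": {
        "bool": {
            "must": [
                {"nested": {
                    "path": "claims",
                    "query": {"bool": {"must": [
                        {"term": {"claims.property": "P31"}},
                        {"term": {"claims.value": "Q62078547"}},
                        # sub-query defining an operation on the qualifiers of the claim
                        {"nested": {
                            "path": "claims.qualifiers",
                            "query": {"term": {"claims.qualifiers.property": "P580"}},
                        }},
                    ]}},
                }},
            ],
            "must_not": [],
            "should": [],
            "filter": [],
        }
    },
    "sort": [{"sitelinks_count": {"order": "desc"}}],  # 'Order by' static variable
}
# The body could then be posted to the search endpoint, e.g.
# requests.post("http://localhost:9200/kg_entities/_search", json=query_body)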
[00107] The third section may be a 'display section' where all the documents may be displayed based on the queries added in the query section. Here, the user (102) may download all the resulting documents as a file to use in the user's personal space.
[00108] Further, the data may be accessed via a tool interface. The tool interface may provide support for testing queries in case further modifications are required. The tool interface may additionally handle complex queries related to entity retrieval.
[00109] In addition, a score attribute may be provided to rank the results as per the query parameters and text passed from the frontend. These results can be retrieved using a variety of natural language processing (NLP) techniques to keep the tool updated with state-of-the-art techniques. The user (102) can view the results in real time, with an option to change the algorithm used for the matching.
[00110] FIG. 5A illustrates an exemplary flow diagram representation (500) of a process for accessing the data via a UI, an API, and a tool, in accordance with an embodiment of the present disclosure.
[00111] In an exemplary embodiment, the data dump extraction, mapping, and indexing in the search may be performed using a mapping server, which maps the dynamic attributes and indexes them. Further, a service may be provided which periodically checks whether the latest dump is available and downloads the latest dump from the server. The service may also update the latest entities in the search index. An interface may be provided to access the data via a UI, an API, or a tool. The API may be accessed to get the file data of an entity, kinds of entities, or attributes of entities. Finally, the system (110) may add an entity to a knowledge database (DB) for either a full load or a partial load (enrichment of an existing entity).
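A non-limiting sketch of such a periodic update service is given below, assuming Python with the requests library; the dump metadata URL, index endpoint, and check interval are hypothetical placeholders, and the download_and_transform callable stands in for the dump download, mapping, and transformation steps described above.

import time
import requests

DUMP_METADATA_URL = "https://dumps.example.org/latest.json"     # hypothetical endpoint
SEARCH_DOC_ENDPOINT = "http://localhost:9200/kg_entities/_doc"  # hypothetical index
CHECK_INTERVAL_SECONDS = 24 * 60 * 60

def latest_dump_version(last_seen_version):
    """Ask the (hypothetical) dump server whether a newer dump is available."""
    metadata = requests.get(DUMP_METADATA_URL, timeout=30).json()
    version = metadata.get("version")
    return version if version != last_seen_version else None

def index_entities(entity_documents):
    """Push transformed entity documents into the search index."""
    for document in entity_documents:
        requests.put(f"{SEARCH_DOC_ENDPOINT}/{document['id']}", json=document, timeout=30)

def run_update_service(download_and_transform):
    """Periodically check for a new dump and refresh the index with its entities."""
    last_version = None
    while True:
        new_version = latest_dump_version(last_version)
        if new_version is not None:
            index_entities(download_and_transform(new_version))
            last_version = new_version
        time.sleep(CHECK_INTERVAL_SECONDS)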
[00112] As illustrated in FIG. 5A, the API layer may query on a search cluster. A tool and a UI layer may interact with the search cluster and provide a KG update.
[00113] FIG. 5B illustrates an exemplary flow diagram representation (510) of a process for adding an entity to a knowledge database, in accordance with an embodiment of the present disclosure.
[00114] As illustrated in FIG. 5B, a schema convert for KG input may be updated with data and an input for full load or partial load. The schema convert may further add an entity to the knowledge database.
[00115] FIG. 6 illustrates an exemplary computer system (600) in which or with which the proposed system may be implemented, in accordance with an embodiment of the present disclosure.
[00116] As shown in FIG. 6, the computer system (600) may include an external storage device (610), a bus (620), a main memory (630), a read-only memory (640), a mass storage device (650), a communication port(s) (660), and a processor (670). A person skilled in the art will appreciate that the computer system (600) may include more than one processor and communication port(s). The communication port (660) may be chosen depending on a network, such as a Local Area Network (LAN), Wide Area Network (WAN), or any network to which the computer system (600) connects. The main memory (630) may be Random Access Memory (RAM), or any other dynamic storage device commonly known in the art. The read-only memory (640) may be any static storage device(s), e.g., but not limited to, a Programmable Read Only Memory (PROM) chip for storing static information, e.g., start-up or basic input/output system (BIOS) instructions for the processor (670). The mass storage device (650) may be any current or future mass storage solution, which can be used to store information and/or instructions.
[00117] The bus (620) may communicatively couple the processor(s) (670) with the other memory, storage, and communication blocks. Optionally, operator and administrative interfaces, e.g., a display, keyboard, and cursor control device may also be coupled to the bus (620) to support direct operator interaction with the computer system (600). Other operator and administrative interfaces can be provided through network connections connected through the communication port (660). In no way should the aforementioned exemplary computer system (600) limit the scope of the present disclosure.
[00118] While considerable emphasis has been placed herein on the preferred embodiments, it will be appreciated that many embodiments can be made and that many changes can be made in the preferred embodiments without departing from the principles of the disclosure. These and other changes in the preferred embodiments of the disclosure will be apparent to those skilled in the art from the disclosure herein, whereby it is to be distinctly understood that the foregoing descriptive matter is to be implemented merely as illustrative of the disclosure and not as a limitation.
ADVANTAGES OF THE INVENTION
[00119] The present disclosure provides a system and a method that provides enrichment as well as a faster retrieval of knowledge from large knowledge graphs (KGs).
[00120] The present disclosure provides a system and a method that maps data and uses search to query and index data for faster retrieval of data.
[00121] The present disclosure provides a system and a method that uses a service that updates the index for efficient retrieval of data from knowledge graphs.
[00122] The present disclosure provides a system and a method that provides efficient retrieval of entities to a user in an offline setting.
[00123] The present disclosure provides a system and a method that utilizes a search-based user interface (UI) for accessing similar data entities in a graph-like fashion.
[00124] The present disclosure provides a system and a method that utilizes an application programming interface (API) to load data into knowledge graphs based on the type of entities. The present disclosure also provides a system and a method that utilizes dynamic attribute mapping on search indexing.
CLAIMS:
1. A system (110) for an efficient retrieval of data, the system (110) comprising:
one or more processors (202) coupled with a memory (204), wherein said memory (204) stores instructions which when executed by the one or more processors (202) causes the one or more processors (202) to:
receive one or more queries from one or more users (102) via one or more computing devices (104), wherein the one or more queries are based on one or more entities (108);
index one or more knowledge graphs with the received one or more queries to generate one or more transformed knowledge graphs; and
predict, via an artificial intelligence (AI) engine (114), one or more related entities associated with the one or more entities (108) based on the generated one or more transformed knowledge graphs for the efficient retrieval of data.
2. The system (110) as claimed in claim 1, wherein the one or more entities (108) comprise at least one of: a company, an organisation, a university, a lab facility, a business enterprise, a defence facility, a website, a platform, and a secured facility.
3. The system (110) as claimed in claim 1, wherein the one or more processors (202) are configured to utilize at least one of: a nested field type of mapping and a dynamic key mapping to generate the one or more transformed knowledge graphs.
4. The system (110) as claimed in claim 3, wherein the dynamic key mapping comprises mapping of at least one of: labels, descriptions, aliases, and sitelinks.
5. The system (110) as claimed in claim 1, wherein the one or more processors (202) are configured to generate one or more vectors associated with the generated one or more transformed knowledge graphs.
6. The system (110) as claimed in claim 5, wherein the one or more processors (202) are configured to utilize one or more property subclasses associated with the one or more entities (108) to generate the one or more vectors.
7. The system (110) as claimed in claim 5, wherein the AI engine (114) is configured to utilize the generated one or more vectors to predict the one or more related entities via a cosine similarity technique.
8. The system (110) as claimed in claim 1, wherein the one or more processors (202) are configured to utilize an application programming interface (API) to access one or more data and one or more attributes of the one or more entities (108).
9. The system (110) as claimed in claim 8, wherein the one or more processors (202) are configured to utilize the API to access one or more data and one or more attributes of the predicted one or more related entities.
10. The system (110) as claimed in claim 1, wherein the one or more processors (202) are coupled with a user interface (UI) to receive the one or more queries from the one or more users (102).
11. The system (110) as claimed in claim 10, wherein the UI is configured with a tool to enable the one or more processors (202) to process the one or more queries.
12. A method for an efficient retrieval of data, the method comprising:
receiving, by one or more processors (202), one or more queries from one or more users (102) via one or more computing devices (104), wherein the one or more queries are based on one or more entities (108);
indexing, by the one or more processors (202), one or more knowledge graphs with the received one or more queries to generate one or more transformed knowledge graphs; and
predicting, by the one or more processors (202), via an artificial intelligence (AI) engine (114), one or more related entities associated with the one or more entities (108) based on the generated one or more transformed knowledge graphs for the efficient retrieval of data.
13. The method as claimed in claim 12, comprising utilizing, by the one or more processors (202), at least one of: a nested field type of mapping and a dynamic key mapping for generating the one or more transformed knowledge graphs.
14. The method as claimed in claim 12, comprising generating, by the one or more processors (202), one or more vectors associated with the generated one or more transformed knowledge graphs.
15. The method as claimed in claim 12, comprising utilizing, by the one or more processors (202), an application programming interface (API) to access one or more data and one or more attributes of the one or more entities (108).
16. The method as claimed in claim 15, comprising utilizing, by the one or more processors (202), the API to access one or more data and one or more attributes of the predicted one or more related entities.
17. The method as claimed in claim 12, comprising utilizing, by the one or more processors (202), a tool to process the one or more queries.
18. A user equipment (UE) (104) for an efficient data retrieval, the UE (104) comprising:
one or more processors communicatively coupled to one or more processors (202) in a system (110), wherein the one or more processors are coupled with a memory, and wherein said memory stores instructions which when executed by the one or more processors causes the UE (104) to:
transmit one or more queries to the one or more processors (202) via a network (116),
wherein the one or more processors (202) are configured to:
receive the one or more queries from the UE (104), wherein the one or more queries are based on one or more entities (108);
index one or more knowledge graphs with the received one or more queries to generate one or more transformed knowledge graphs; and
predict, via an artificial intelligence (AI) engine (114), one or more related entities associated with the one or more entities (108) based on the generated one or more transformed knowledge graphs for the efficient data retrieval.