An Automated System And Method For Collating Data In An Ontology

Abstract: The present invention discloses an automated system (100) and method for collating data in an ontology. A receiver (108) receives data from a plurality of sources. A parser (112) parses the uncollated data and generates parsed text data. An extraction module (114) extracts at least one tag and at least one category for the parsed text data. The extraction module (114) includes a tokenizer (116), and a normalization module (118). The tokenizer (116) formats the parsed text data by breaking into a number of tokens based on regular expressions. The normalization module (118) then normalizes the tokens. A comparator module (122) compares the normalized tokens with the tags and generates a flag.

Patent Information

Application #

Filing Date

29 September 2018

Publication Number

14/2020

Publication Type

INA

Invention Field

COMPUTER SCIENCE

Status

Email

info@krishnaandsaurastri.com

Parent Application

Patent Number

Legal Status

Grant Date

2024-06-18

Renewal Date

Applicants

Bharat Electronics Limited

Corporate Office, Outer Ring Road, Nagavara, Bangalore – 560045

Inventors

1. Dipak Rai

BSTC/Central D&E Bharat Electronics Limited, Jalahalli PO, Bangalore – 560013, Karnataka,

Specification

Claims:We Claim:

1. A method for collating data in an ontology, said method comprising steps of:
receiving, by a receiver (108), data from a plurality of sources;
storing, in a database (106), said received data, pre-defined categories and a set of tags for each pre-defined category, wherein the received data includes uncollated data;
parsing, by a parser (112), said uncollated data, and generating parsed text data;
extracting, by an extraction module (114), at least one tag and at least one category for said parsed text data;
formatting, by a tokenizer (116), said parsed text data by breaking into a number of tokens based on regular expressions;
normalizing, by a normalization module (118), said tokens; and
comparing, by a comparator module (122), said normalized tokens with said tags, and generating a flag.

2. The method as claimed in claim 1, further includes a step of updating, by an updater (124), said created category and tag in said database (106).

3. The method as claimed in claim 1, wherein said stored tags are synonymous to corresponding category or nearest attributes, for identifying the corresponding category in an operational context.

4. The method as claimed in claim 1, wherein said step of receiving said data includes a step of categorizing, by a categorization module (110), said received data to parent and child categories, if the parent and child categories have common tags between them.

5. The method as claimed in claim 4, wherein said step of categorizing includes a step of categorizing said received data to a child category, if the child category contains tags which are unique and are not present in other categories.

6. The method as claimed in claim 1, wherein each of said tags represents one or more categories.

7. An automated system (100) for collating data in an ontology, said system (100) comprising:
a memory (102) configured to store pre-determined rules;
a processor (104) configured to cooperate with said memory (102) to receive said pre-determined rules, said processor (104) configured to generate system processing commands based on said pre-determined rules;
a receiver (108) configured to receive data from a plurality of sources;
a database (106) configured to cooperate with said receiver (108) to receive and store said data, wherein said data includes uncollated data, said database (106) further configured to store pre-defined categories and a set of tags for each pre-defined category;
a parser (112) configured to cooperate with said database (106) to receive said uncollated data, said parser (112) configured to parse said uncollated data, and generate parsed text data;
an extraction module (114) configured to cooperate with said parser (112) and said database (106), said extraction module (114) configured to extract at least one tag and at least one category for said parsed text data, said extraction module (114) comprises:
a tokenizer (116) configured to format said parsed text data by breaking into a number of tokens based on regular expressions; and
a normalization module (118) configured to cooperate with said tokenizer to receive said tokens, said normalization module (118) configured to normalize said tokens; and
a comparator module (122) configured to cooperate with said extraction module (114) and said database (106) to receive said normalized tokens and said tags, said comparator module (122) configured to compare said normalized tokens with said tags, and generate a flag.

8. The system (100) as claimed in claim 7, further includes an updater (124) configured to update said created category and tag in said database (106).

9. The system (100) as claimed in claim 7, wherein said stored tags are synonymous to corresponding category or nearest attributes, for identifying the corresponding category in an operational context.

10. The system (100) as claimed in claim 7, wherein said receiver (108) comprises a categorization module (110) configured to categorize said received data to parent and child categories, if the parent and child categories have common tags between them.

11. The system (100) as claimed in claim 10, wherein said categorization module (110) is configured to categorize said received data to a child category, if the child category contains tags which are unique and are not present in other categories.

12. The system (100) as claimed in claim 7, wherein each of said tags represents one or more categories.
Dated this 29th day of September, 2018
FOR BHARAT ELECTRONICS LIMITED
By their Agent

(GIRISH VIJAYANAND SHETH) (IN/PA 1022)
KRISHNA & SAURASTRI ASSOCIATES LLP
Description: FORM 2
THE PATENTS ACT, 1970
(39 OF 1970)
&
THE PATENTS RULES, 2003

COMPLETE SPECIFICATION
[SEE SECTION 10, RULE 13]

AN AUTOMATED SYSTEM AND METHOD FOR COLLATING DATA IN AN ONTOLOGY

BHARAT ELECTRONICS LIMITED,
CORPORATE OFFICE, OUTER RING ROAD, NAGAVARA, BANGALORE – 560045, KARNATAKA, INDIA

THE FOLLOWING SPECIFICATION PARTICULARLY DESCRIBES THE INVENTION AND THE MANNER IN WHICH IT IS TO BE PERFORMED.

TECHNICAL FIELD
[001] The present invention relates generally to collating data received from various data sources, and, particularly but not exclusively, to an automated system for collating data in an ontology.
BACKGROUND
[002] Typically, an ontology is a rigorous and exhaustive organization of knowledge domains that is usually hierarchical and contains all the relevant entities and their relations. In the field of unstructured information management, classifying data into relevant categories is achieved by a qualitative comparison of the incoming data against a pre-existing set of data, categories, or labels. In many cases, the elements of the pre-existing set are related to one another hierarchically. Such a hierarchical arrangement is known as a taxonomy. If the pre-existing set of categories is not sufficient, new categories are added in a hierarchical relationship to the pre-existing set of data. However, a problem arises from the hierarchical design of the taxonomy: it brings a lot of rigidity, resulting in a missed collation opportunity when an incoming text or data item has attributes that are synonymous with, but not exactly equal to, the taxonomical categories.
[003] US20160224645 discloses a system and method for ontology-based data integration that relies on a third-party dependent tagging system to annotate and integrate distributed data sources.
[004] US9406018 discloses a system and method for data integration and information retrieval by bringing semantically related data together for a given context. In US9406018, a concept of multilayered ontology is envisaged where the development of an ontology is left to collection and collation of knowledge from a variety of sources including at least, for example, existing data or concept dictionaries with the organization, expert input, etc.
[005] However, none of the prior art documents provides an ontology model that can be incorporated by any system, whether enterprise or application-based, to collate unstructured data, and that is not based on third-party systems.
[006] Therefore, there is a need for an automated system that provides collation of data in an ontology which automatically collates incoming data.

SUMMARY
[007] This summary is provided to introduce concepts related to collating data in an ontology. This summary is neither intended to identify essential features of the present invention nor is it intended for use in determining or limiting the scope of the present invention.
[008] For example, various embodiments herein may include one or more automated systems and methods for collating data in an ontology.
[009] In one of the embodiments, the present invention discloses a method for collating data in an ontology. The method includes a step of receiving, by a receiver, data from a plurality of sources. Further, the method includes a step of storing, in a database, the received data, pre-defined categories and a set of tags for each pre-defined category, wherein the received data includes uncollated data. The method includes a step of parsing, by a parser, the uncollated data, and generating parsed text data. The method includes a step of extracting, by an extraction module, at least one tag and at least one category for the parsed text data. Furthermore, the method includes a step of formatting, by a tokenizer, the parsed data by breaking into a number of tokens based on regular expressions. The method includes a step of normalizing, by a normalization module, the tokens. Subsequently, the method includes a step of comparing, by a comparator module, the normalized tokens with the tags, and generating a flag.
[0010] In another implementation, an automated system is configured to collate data in an ontology. The system includes a memory, a processor, a receiver, a database, a parser, an extraction module, a comparator module, and a creation module. The memory is configured to store pre-determined rules. The processor is configured to generate system processing commands based on the pre-determined rules. The receiver is configured to receive data from a plurality of sources. The database is configured to store the received data, wherein the data includes uncollated data. The database further stores pre-defined categories and a set of tags for each pre-defined category. The parser is configured to parse the uncollated data, and generate parsed text data. The extraction module is configured to extract at least one tag and at least one category for the parsed text data. The extraction module includes a tokenizer and a normalization module. The tokenizer is configured to format the parsed text data by breaking it into a number of tokens based on regular expressions. The normalization module is configured to normalize the tokens. The comparator module is configured to compare the normalized tokens with the tags.
BRIEF DESCRIPTION OF ACCOMPANYING DRAWINGS
[0011] The detailed description is described with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same numbers are used throughout the drawings to reference like features and modules.
[0012] Figure 1 illustrates a block diagram depicting an automated system for collating data in an ontology, according to an exemplary implementation of the present invention.
[0013] Figure 2 illustrates a schematic diagram depicting a schema of tags, according to an exemplary implementation of the present invention.
[0014] Figure 3 illustrates a network graph depicting a data ontology, according to an exemplary implementation of the present invention.
[0015] Figure 4 illustrates a schematic diagram depicting representation of the collated data against an ontology of a relational database system, according to an exemplary implementation of the present invention.
[0016] Figure 5 illustrates a flowchart depicting a workflow of the text processing collation, according to an exemplary implementation of the present invention.
[0017] Figure 6 illustrates a flowchart depicting a workflow of the decomposed steps of Figure 5, according to an exemplary implementation of the present invention.
[0018] Figure 7 illustrates a flowchart depicting a method for collating data in an ontology, according to an exemplary implementation of the present invention.
[0019] It should be appreciated by those skilled in the art that any block diagrams herein represent conceptual views of illustrative systems embodying the principles of the present invention. Similarly, it will be appreciated that any flowcharts, flow diagrams, and the like represent various processes which may be substantially represented in computer readable medium and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.
DETAILED DESCRIPTION
[0020] In the following description, for the purpose of explanation, specific details are set forth in order to provide an understanding of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced without these details. One skilled in the art will recognize that embodiments of the present invention, some of which are described below, may be incorporated into a number of systems.
[0021] The various embodiments of the present invention provide an automated system and method for data collation in an ontology.
[0022] Furthermore, connections between components and/or modules within the figures are not intended to be limited to direct connections. Rather, these components and modules may be modified, re-formatted or otherwise changed by intermediary components and modules.
[0023] References in the present invention to “one embodiment” or “an embodiment” mean that a particular feature, structure, characteristic, or function described in connection with the embodiment is included in at least one embodiment of the invention. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment.
[0024] In one of the embodiments, the present invention discloses a method for collating data in an ontology. The method includes a step of receiving, by a receiver, data from a plurality of sources. Further, the method includes a step of storing, in a database, the received data, pre-defined categories and a set of tags for each pre-defined category, wherein the received data includes uncollated data. The method includes a step of parsing, by a parser, the uncollated data, and generating parsed text data. The method includes a step of extracting, by an extraction module, at least one tag and at least one category for the parsed text data. Furthermore, the method includes a step of formatting, by a tokenizer, the parsed data by breaking into a number of tokens based on regular expressions. The method includes a step of normalizing, by a normalization module, the tokens. Subsequently, the method includes a step of comparing, by a comparator module, the normalized tokens with the tags.
[0025] In another implementation, the method includes a step of updating, by an updater, the created category and tags in a database, for which the system provides a graphical user interface.
[0026] In another implementation, the stored tags are synonymous to corresponding category or nearest attributes, for identifying the corresponding category in an operational context.
[0027] In another implementation, the step of receiving the data includes a step of categorizing, by a categorization module, the received data to parent and child categories, if the parent and child categories have common tags between them.
[0028] In another implementation, the step of categorizing includes a step of categorizing the received data to a child category, if the child category contains tags which are unique and are not present in other categories.
[0029] In another implementation, each of the tags represents one or more categories.
[0030] In another embodiment, an automated system is configured to collate data in an ontology. The system includes a memory, a processor, a receiver, a database, a parser, an extraction module, a comparator module, and a creation module. The memory is configured to store pre-determined rules. The processor is configured to generate system processing commands based on the pre-determined rules. The receiver is configured to receive data from a plurality of sources. The database is configured to store the received data, wherein the data includes uncollated data. The database further stores pre-defined categories and a set of tags for each pre-defined category. The parser is configured to parse the uncollated data, and generate parsed text data. The extraction module is configured to extract at least one tag and at least one category for the parsed text data. The extraction module includes a tokenizer and a normalization module. The tokenizer is configured to format the parsed text data by breaking into a number of tokens based on regular expressions. The normalization module is configured to normalize the tokens. The comparator module is configured to compare the normalized tokens with the tags, and generate a flag. Further, the comparator module is configured to collate the data belonging to the token or set of tokens against the set of categories and tags stored as a knowledge domain in the database of the system.
[0031] In another implementation, the system includes an updater configured to update the created category and tag in the database.
[0032] In another implementation, the stored tags are synonymous to corresponding category or nearest attributes, for identifying the corresponding category in an operational context.

[0033] In another implementation, the categorization module is configured to categorize the received data to a child category, if the child category contains tags which are unique and are not present in other categories.

[0034] In another implementation, the categorization module is configured to categorize the received data into multiple categories, if the child category contains tags which are also present in other categories.

[0035] Figure 1 illustrates a block diagram of an automated system (100) for collating data in an ontology (hereinafter referred to as the “system”), according to an exemplary implementation of the present invention. The system (100) is configured to gather raw data from heterogeneous sources. The system (100) is further configured to translate the raw data into information, and collate it into contextually relevant groups or domains. The system (100) is configured to generate a semantic arrangement of knowledge domains through a set of categories and tags, wherein each knowledge domain is itself represented as a category and each category is represented by a set of tags or labels. These tags/labels are themselves either synonymous to the category or are the nearest attributes uniquely identifying the category in a standard or operational context. In an embodiment, the system (100) is configured to provide the information derived out of the collated data over a graphical user interface. If the received data fails to be collated against the existing ontology, the system (100) automatically prompts the user to update the existing ontology. In one embodiment, the system (100) can be used in any domain, such as space and medicine, to auto-collate data in its most granular form. Further, the system (100) is configured to generate an ontology that simulates a semantic network of an information processing paradigm, where the tags behave as the neurons connecting the data to a meaningful category or knowledge domain.
[0036] The system (100) includes a memory (102), a processor (104), a database (106), a receiver (108), a parser (112), an extraction module (114), a comparator module (122), and an updater (124).
[0037] The memory (102) is configured to store pre-determined rules related to parsing, extracting, and normalizing data. In an embodiment, the memory (102) can include any computer-readable medium known in the art including, for example, volatile memory, such as static random-access memory (SRAM) and dynamic random-access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. The memory (102) also includes a cache memory to work with the system (100) more effectively.
[0038] The processor (104) is configured to cooperate with the memory (102) to receive the pre-determined rules. The processor (104) is further configured to generate system processing commands. In an embodiment, the processor (104) may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the at least one processor (104) is configured to fetch the pre-determined rules from the memory (102) and execute different modules of the system (100).
[0039] The receiver (108) is configured to receive data from a plurality of sources. In an embodiment, the receiver (108) is configured to receive data from web platforms, questionnaires, and various other distributed data sources. In one embodiment, the receiver (108) includes a categorization module (110). The categorization module (110) is configured to categorize the received data to parent and child categories, if the parent and child categories have common tags between them. The categorization module (110) is further configured to categorize the received data to a child category, if the child category contains tags which are unique and are not present in other categories.
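The categorization rule of the categorization module (110) may be illustrated by the following minimal Python sketch. The function name `categorize`, the data layout (a mapping from category names to tag sets, together with a child-to-parent mapping), and the sample categories are assumptions made purely for illustration and are not prescribed by the specification.

```python
from typing import Dict, Set

# Hypothetical sample ontology: category name -> set of tags.
TAGS_BY_CATEGORY: Dict[str, Set[str]] = {
    "medicine": {"health", "treatment"},
    "disease": {"health", "infection"},
    "allergy": {"treatment", "pollen"},
}
# Hypothetical hierarchy: child category -> parent category.
PARENT_OF: Dict[str, str] = {"disease": "medicine", "allergy": "medicine"}


def categorize(data_tags: Set[str],
               tags_by_category: Dict[str, Set[str]],
               parent_of: Dict[str, str]) -> Set[str]:
    """Return the categories to which the received data is assigned."""
    assigned: Set[str] = set()
    for child, parent in parent_of.items():
        child_tags = tags_by_category[child]
        # Tags shared by the parent and the child category.
        common = child_tags & tags_by_category[parent]
        # Tags of the child that occur in no other category.
        others = set().union(
            *(tags for cat, tags in tags_by_category.items() if cat != child)
        )
        unique = child_tags - others
        if data_tags & common:
            # Common tag matched: assign to both parent and child.
            assigned.update({parent, child})
        if data_tags & unique:
            # Unique tag matched: assign to the child category.
            assigned.add(child)
    return assigned
```

Under these assumptions, data carrying the tag "health" is assigned to both "medicine" and "disease", whereas data carrying only "infection" is assigned to "disease" alone.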
[0040] The database (106) is configured to cooperate with the receiver (108) to receive and store the data. In an embodiment, the received data is uncollated data. The database (106) is further configured to store pre-defined categories, and a set of tags for each pre-defined category. In one embodiment, the database (106) includes a look-up table configured to store the received data, pre-defined categories, and a set of tags for each pre-defined category. In another embodiment, the stored tags are synonymous to corresponding category or nearest attributes, for identifying the corresponding category in an operational context. In yet another embodiment, each of the tags represents one or more categories. In one embodiment, the database (106) is configured to store a set of tags which identifies the category synonymously, but not exclusively because data itself may not be exclusive contextually.
[0041] The parser (112) is configured to cooperate with the database (106) to receive the uncollated data. The parser (112) is further configured to parse the uncollated data, and generate parsed text data.
[0042] The extraction module (114) is configured to cooperate with the parser (112) to receive the parsed text data. The extraction module (114) is further configured to extract at least one tag and at least one category for the parsed text data.
[0043] In an embodiment, the extraction module (114) further includes a tokenizer (116), and a normalization module (118).
[0044] The tokenizer (116) is configured to format the parsed text data by breaking it into a number of tokens based on regular expressions. In an exemplary embodiment, as a text corpus is received, the text corpus is broken down into tokens by the tokenizer (116) based on regular expressions, i.e. a standard keyword search in the text corpus, and on sentence boundary disambiguation for extracting tokens at sentence boundaries. The resulting tokens are pushed to the normalization module (118).
[0045] The normalization module (118) is configured to cooperate with the tokenizer (116) to receive the tokens. The normalization module (118) is further configured to normalize the tokens. In an embodiment, the tokens are normalized by conversion to lower case and removal of stopwords (for example a, an, the, her, their, etc.). The normalized tokens then move on to the text collation.
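One possible regular-expression-based tokenization is sketched below in Python. The two patterns `SENTENCE_BOUNDARY` and `WORD`, and the function name `tokenize`, are assumptions for illustration; the specification only states that regular expressions and sentence boundary disambiguation are used and does not fix the patterns.

```python
import re
from typing import List

# Assumed patterns, not taken from the specification.
# A sentence boundary is whitespace that follows '.', '!' or '?'.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")
# A token is a run of letters/digits, optionally followed by an apostrophe suffix.
WORD = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?")


def tokenize(text: str) -> List[str]:
    """Split text into sentences, then extract word tokens from each sentence."""
    tokens: List[str] = []
    for sentence in SENTENCE_BOUNDARY.split(text.strip()):
        tokens.extend(WORD.findall(sentence))
    return tokens
```

This naive boundary pattern also splits after abbreviations such as "Dr."; a fuller sentence boundary disambiguation step would need additional rules for such cases.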
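The normalization step may be sketched in Python as follows. The name `normalize` and the constant `STOPWORDS` are illustrative assumptions; the stopword set contains only the examples named in the paragraph above, and a practical list would be considerably longer.

```python
from typing import Iterable, List

# Only the example stopwords from the description; a real list would be longer.
STOPWORDS = frozenset({"a", "an", "the", "her", "their"})


def normalize(tokens: Iterable[str]) -> List[str]:
    """Convert tokens to lower case and drop stopwords, preserving order."""
    lowered = (token.lower() for token in tokens)
    return [token for token in lowered if token not in STOPWORDS]
```

For instance, the tokens "The", "Patient", "has", "an", "Infection" would be reduced to "patient", "has", "infection" before being passed on to collation.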
[0046] The comparator module (122) is configured to cooperate with the extraction module (114) and the database (106) to receive the normalized tokens and the tags. The comparator module (122) is further configured to compare the normalized tokens with the tags, and generate a flag, thereby collating the incoming data belonging to the matched token against the category or categories of the tag.
[0047] In an embodiment, the system (100) further includes an updater (124). The updater (124) is configured to update the created category and tags in the database (106), for which the system (100) provides a graphical user interface.
[0048] Figure 2 illustrates a schematic diagram depicting a schema of tags (200), according to an exemplary implementation of the present invention.
[0049] Specifically, Figure 2 illustrates a schema of tags (200), which represents a three-table structure toxicology. In an exemplary embodiment, the three-table structure toxicology includes three tables, i.e. a BOOKMARK table (202), a TAGMAP table (204), and a TAG table (206), which are in a many-to-many relationship, as depicted in Figure 2. Each of the tables has sub-fields for defining a category. For example, the table (202) represents a category “BOOKMARK” having ID, name, URL, description, and Time_Created. The ID includes a unique identification number of the category. In an embodiment, the system (100) automatically generates the identification number for each category. In one embodiment, a user can manually provide the identification number for each category. In one embodiment, the Time_Created refers to the time when the category is created. The table (204) represents TAGMAP having ID, BOOKMARK_ID, and TAG_ID. The table (204) maps the category to its corresponding tags. For example, the table (204) includes an ID, which is a unique identification number for the mapping between a specific category and a tag. The table (204) also includes the IDs of the category and the tag, for example BOOKMARK_ID and TAG_ID. Further, the table (206) represents “TAG” having TAG_ID and name. In an embodiment, the TAG_ID includes a unique identification number for a tag, and the name includes a tag name.
[0050] In another exemplary embodiment, the toxicology is implemented by existing content management systems, for example “WordPress”, a free and open-source content management system (CMS) based on PHP and MySQL, for managing posts/blogs. Each tag can be used together with different bookmarks. In an embodiment, each bookmark can be used together with different tags. In Figure 2, the schema (200) identifies each URL or bookmark through the set of tags by using the following expression:
{Tagmap} = {bookmarks} U {tags}
where, bookmarks and tags are separate entities.
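The three-table schema of Figure 2 may be sketched with Python's built-in `sqlite3` module as shown below. The column types, constraints, sample rows, `example.org` URLs, and the helper names `build_demo_db` and `bookmarks_for_tag` are assumptions for illustration only; the specification names the tables and fields but does not define their types.

```python
import sqlite3
from typing import List

# Table and column names follow Figure 2; types and constraints are assumed.
SCHEMA = """
CREATE TABLE BOOKMARK (
    ID           INTEGER PRIMARY KEY,
    NAME         TEXT NOT NULL,
    URL          TEXT,
    DESCRIPTION  TEXT,
    TIME_CREATED TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE TAG (
    TAG_ID INTEGER PRIMARY KEY,
    NAME   TEXT NOT NULL UNIQUE
);
CREATE TABLE TAGMAP (
    ID          INTEGER PRIMARY KEY,
    BOOKMARK_ID INTEGER NOT NULL REFERENCES BOOKMARK(ID),
    TAG_ID      INTEGER NOT NULL REFERENCES TAG(TAG_ID),
    UNIQUE (BOOKMARK_ID, TAG_ID)
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO BOOKMARK (ID, NAME, URL) VALUES (?, ?, ?)",
        [(1, "disease", "https://example.org/disease"),
         (2, "allergy", "https://example.org/allergy")],
    )
    conn.executemany(
        "INSERT INTO TAG (TAG_ID, NAME) VALUES (?, ?)",
        [(1, "infection"), (2, "pollen"), (3, "health")],
    )
    # Many-to-many mapping: the tag "health" is attached to both bookmarks.
    conn.executemany(
        "INSERT INTO TAGMAP (BOOKMARK_ID, TAG_ID) VALUES (?, ?)",
        [(1, 1), (1, 3), (2, 2), (2, 3)],
    )
    return conn


def bookmarks_for_tag(conn: sqlite3.Connection, tag_name: str) -> List[str]:
    """Return the names of all bookmarks carrying the given tag."""
    rows = conn.execute(
        """
        SELECT b.NAME
        FROM BOOKMARK b
        JOIN TAGMAP m ON m.BOOKMARK_ID = b.ID
        JOIN TAG t ON t.TAG_ID = m.TAG_ID
        WHERE t.NAME = ?
        ORDER BY b.NAME
        """,
        (tag_name,),
    ).fetchall()
    return [row[0] for row in rows]
```

In this sketch the TAGMAP table carries the many-to-many relationship, so a single tag may identify several bookmarks and a single bookmark may carry several tags.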
[0051] Figure 3 illustrates a network graph (300) depicting a data ontology, according to an exemplary implementation of the present invention. Specifically, Figure 3 represents a semantic representation of a knowledge domain coverage. In an exemplary embodiment, the knowledge domain coverage includes multiple knowledge domains, for example space, medicine, education, entertainment, artists, and the like. In Figure 3, knowledge domains are depicted as KnowledgeDomain1 (301), KnowledgeDomain2 (302), KnowledgeDomain3 (303), KnowledgeDomain4 (304), and KnowledgeDomain9 (305). Each of the knowledge domains (301, 302, 303, 304, 305) has one or more tags. More specifically, KnowledgeDomain1 (301) has three tags (T1a, T1b, T1c), KnowledgeDomain2 (302) has four tags (T2a, T2b, T2c, T2d), KnowledgeDomain3 (303) has one tag (T3), KnowledgeDomain4 (304) has four tags (T4a, T4b, T4c, T4d), and KnowledgeDomain9 (305) has three tags (T5a, T5b, T5c).
[0052] For example, KnowledgeDomain1 (301), which may refer to a medicine category, is connected to four categories, i.e. KnowledgeDomain2 (302), KnowledgeDomain3 (303), KnowledgeDomain4 (304), and KnowledgeDomain9 (305). The KnowledgeDomain2 (302) may refer to a disease, the KnowledgeDomain3 (303) may refer to a health surveillance, the KnowledgeDomain4 (304) may refer to surgeries, and the KnowledgeDomain9 (305) may refer to an allergy. Each category has one or more tags, for example KnowledgeDomain2 (302) includes four tags, i.e. physical disorder, infection, mental health, and genetic disease. The same tag can appear in multiple categories. In an embodiment, these categories can be interconnected with each other. The medicine category can refer to the disease category, the health surveillance category, and the allergy category.
[0053] In an embodiment, knowledge domains (301, 302, 303, 304, 305) refer to categories. The fluidity and porousness provided by the tags is depicted, as the tags show interconnected categories giving a semantic nature to a hierarchical arrangement of categories. In an embodiment, one or more nodes can be added to the network graph (300) without disturbing the existing ontology.
[0054] In an exemplary embodiment, the system (100) is configured to allow the categories, i.e. knowledge domains (301, 302, 303, 304, 305), and their corresponding synonymous or nearest-attribute tags to co-exist as a single entity as shown below:
{categories, tags}
where, categories and tags are a pair of elements as a single entity in a single set.
[0055] In an embodiment, a separate entity for tags allows an ontology to cover a broad knowledge domain. A tag can represent multiple categories at the same time if the system (100) is so configured, or can be unique to a single category. In one embodiment, a tag can never exist independently.
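The relationship between categories and tags described above may be sketched in Python as a mapping from categories to tag sets, with a derived index from each tag to the categories it represents. The names `ONTOLOGY`, `build_tag_index`, and `add_category`, and all sample tags other than those listed for the disease category, are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set

# Hypothetical ontology: each category is stored together with its tags.
ONTOLOGY: Dict[str, Set[str]] = {
    "disease": {"physical disorder", "infection", "mental health", "genetic disease"},
    "health surveillance": {"screening"},
    "allergy": {"pollen", "infection"},
}


def build_tag_index(ontology: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map every tag to the set of categories it represents.

    Because the index is derived from the categories, no tag can exist
    without at least one category.
    """
    index: Dict[str, Set[str]] = defaultdict(set)
    for category, tags in ontology.items():
        for tag in tags:
            index[tag].add(category)
    return dict(index)


def add_category(ontology: Dict[str, Set[str]],
                 category: str, tags: Iterable[str]) -> None:
    """Add a new node; existing categories and their tags are left untouched."""
    tag_set = set(tags)
    if not tag_set:
        raise ValueError("a category needs at least one tag")
    ontology[category] = tag_set
```

In this sketch the tag "infection" represents both the "disease" and the "allergy" categories, while "pollen" is unique to "allergy".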
[0056] Figure 4 illustrates a schematic diagram (400) depicting representation of the collated data against an ontology of a relational database system, according to an exemplary implementation of the present invention. The collated data against the ontology is depicted in a relational database (402) having data ontology. This is extendable to other databases such as a graph or a non-relational database. In Figure 4, the relational database (402), and a database having incoming persisted data (404) are extendible to a database having data tagged through a relational join (406).
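The relational join of Figure 4 may be sketched with Python's built-in `sqlite3` module as follows. The table names `ONTOLOGY` and `INCOMING`, their columns, the sample rows, and the function names are assumptions for illustration; the specification does not define the layout of the databases (402), (404), and (406).

```python
import sqlite3
from typing import List, Optional, Tuple


def build_demo() -> sqlite3.Connection:
    """Create hypothetical ontology (402) and incoming-data (404) tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE ONTOLOGY (CATEGORY TEXT NOT NULL, TAG TEXT NOT NULL);
        CREATE TABLE INCOMING (ID INTEGER PRIMARY KEY, TOKEN TEXT NOT NULL);
        """
    )
    conn.executemany(
        "INSERT INTO ONTOLOGY VALUES (?, ?)",
        [("disease", "infection"), ("allergy", "infection"), ("allergy", "pollen")],
    )
    conn.executemany(
        "INSERT INTO INCOMING VALUES (?, ?)",
        [(1, "infection"), (2, "rocket")],
    )
    return conn


def tag_incoming(conn: sqlite3.Connection) -> List[Tuple[int, str, Optional[str]]]:
    """Join incoming tokens to ontology tags, yielding tagged data (406).

    A LEFT JOIN keeps unmatched tokens, with CATEGORY set to NULL (None).
    """
    return conn.execute(
        """
        SELECT i.ID, i.TOKEN, o.CATEGORY
        FROM INCOMING i
        LEFT JOIN ONTOLOGY o ON o.TAG = i.TOKEN
        ORDER BY i.ID, o.CATEGORY
        """
    ).fetchall()
```

Rows whose category is None correspond to incoming data that could not be collated against the existing ontology.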
[0057] Figure 5 illustrates a flowchart (500) depicting a workflow of the text processing collation, according to an exemplary implementation of the present invention.
[0058] At step (508), an incoming text corpus is received. In an implementation, the receiver (108) is configured to receive the incoming text corpus from a plurality of sources. The received incoming text is then tokenized, as shown in step (510), using the techniques of checking standard boundaries, eliminating sentence boundaries, and splitting text on regex, as shown in step (502). In another implementation, a tokenizer (116) is configured to check standard boundaries, eliminate sentence boundaries, and split text on regex while tokenizing the incoming text corpus (510), and to further break the text down into a number of tokens (512). Subsequently, the tokens of the incoming text corpus are normalized, as shown in a step (504). In one embodiment, the normalization module (118) is configured to normalize the tokens. For normalization, the normalization module (118) converts the text to lower case and eliminates stop words from the text, as shown in a step (506). Further, storing data in a database (106) is shown in a step (516). At step (516), objects are found by fetching locations from the database (106), wherein the objects can include people and places. The tokens and objects are then collated at a step (514).
[0059] Figure 6 illustrates a flowchart (600) depicting a workflow of the decomposed steps of Figure 5, according to an exemplary implementation of the present invention.
[0060] Step (602) denotes the collation of data in an ontology. At step (604), the process for collating the data starts. A list of categories along with their tags is fetched from the database (106) by using the following expression:
FETCH [CATEGORIES, TAGS] ---- (606)
For each category, the following steps (608) are performed:
i. For each category:
   a. Iterate over tags:
ii. For each tag:
   a. Compare token and tag contextually
iii. IF MATCH found:
   a. Update COLLATED_FLAG=YES
   b. exit loop
iv. ELSE
   a. iterate to next category
The aforementioned steps check whether the data has been successfully collated, which is shown in step (610). In an embodiment, the comparator module (122) is configured to generate a collated flag, herein referred to as COLLATED_FLAG. A collation category can also be updated for existing stored data. If the COLLATED_FLAG is FALSE, then the following steps are performed:
Storing (614) and (616) in the database (106).
i. Save incoming text as uncollated.
ii. Alert user for uncollated data.
iii. Suggest user to create a new category in the ontology out of the existing set of un-collated text tokens.
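The loop of steps (608) and (610) and the fallback for uncollated data can be sketched as below. The function names, the dictionary layout of categories and tags, and the returned messages are hypothetical. The specification compares token and tag "contextually" without defining that comparison, so this sketch substitutes a plain case-insensitive exact match as a stand-in.

```python
def collate(tokens, tags_by_category):
    """Return (COLLATED_FLAG, matching category) for normalized tokens.

    tags_by_category maps a category name to its tags. Exact matching
    stands in for the unspecified contextual comparison.
    """
    token_set = set(tokens)
    for category, tags in tags_by_category.items():
        for tag in tags:
            if tag.lower() in token_set:
                return True, category  # match found: set flag and exit
    return False, None


def handle_incoming(tokens, tags_by_category, uncollated_store):
    """Collate tokens, or save them as uncollated and suggest a category."""
    collated_flag, category = collate(tokens, tags_by_category)
    if collated_flag:
        return f"collated under '{category}'"
    # Flag is False: save the text as uncollated and alert the user.
    uncollated_store.append(list(tokens))
    return ("uncollated: consider creating a new category from "
            + ", ".join(tokens))
```

Here uncollated_store is an ordinary list standing in for the database (106), and the returned string stands in for the user alert.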
[0061] Figure 7 illustrates a flowchart (700) depicting a method for collating data in an ontology, according to an exemplary implementation of the present invention.
[0062] The flowchart (700) starts at step (702), receiving data from a plurality of sources. In an implementation, a receiver (108) is configured to receive data from a plurality of sources. At step (704), storing, in a database (106), the received data, pre-defined categories and a set of tags for each pre-defined category. At step (706), parsing the uncollated data and generating parsed text data. In an implementation, a parser (112) is configured to parse the uncollated data received from the database (106) and generate parsed text data. At step (708), extracting at least one tag and at least one category for the parsed text data. In an implementation, the extraction module (114) is configured to extract at least one tag and at least one category for the parsed text data. At step (710), formatting the parsed text data by breaking it into a number of tokens based on regular expressions. In an implementation, a tokenizer (116) is configured to format the parsed text data by breaking it into a number of tokens based on regular expressions. At block (712), normalizing the tokens. In another implementation, the normalization module (118) is configured to receive the tokens from the tokenizer (116), and normalize the tokens by converting the text to lower case and eliminating stop words from the text. At block (714), comparing the normalized tokens with the tags, and generating a flag. In another implementation, a comparator module (122) is configured to receive the normalized tokens and the tags, compare the normalized tokens with the tags, and generate a flag.
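The ordering of steps (702) to (714) can be illustrated with a compact sketch in which each method loosely mirrors one step of the flowchart (700). The class name CollationPipeline, its fields, the default stop words, and the use of a list in place of the database (106) are all assumptions made for illustration. The comparison is a plain membership test, not the comparator module (122) as actually implemented.

```python
import re
from dataclasses import dataclass, field


@dataclass
class CollationPipeline:
    tags_by_category: dict
    stop_words: frozenset = frozenset({"the", "a", "is", "at"})
    database: list = field(default_factory=list)  # stand-in for (106)

    def receive(self, sources):
        """Step (702): gather records from a plurality of sources."""
        return [record for source in sources for record in source]

    def store(self, records):
        """Step (704): persist the received data."""
        self.database.extend(records)

    def parse(self, record):
        """Step (706): produce parsed text (here, only whitespace trimming)."""
        return record.strip()

    def extract(self, text):
        """Steps (708) to (712): tokenize on a regex, then normalize."""
        tokens = re.findall(r"\w+", text)
        return [t.lower() for t in tokens if t.lower() not in self.stop_words]

    def compare(self, tokens):
        """Step (714): flag is True if any token equals a known tag."""
        known = {tag for tags in self.tags_by_category.values() for tag in tags}
        return any(token in known for token in tokens)

    def run(self, sources):
        """Run all steps in order and return (record, flag) pairs."""
        records = self.receive(sources)
        self.store(records)
        return [(r, self.compare(self.extract(self.parse(r)))) for r in records]
```

Calling run on two sources, for instance CollationPipeline({"weather": ["storm"]}).run([["The Storm is here"], ["Nothing relevant"]]), flags the first record as collated and the second as uncollated.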
[0063] It should be noted that the description merely illustrates the principles of the present invention. It will thus be appreciated that those skilled in the art will be able to devise various arrangements that, although not explicitly described herein, embody the principles of the present invention. Furthermore, all examples recited herein are principally intended expressly to be only for explanatory purposes to help the reader in understanding the principles of the invention and the concepts contributed by the inventor(s) to furthering the art and are to be construed as being without limitation to such specifically recited examples and conditions. Moreover, all statements herein reciting principles, aspects, and embodiments of the invention, as well as specific examples thereof, are intended to encompass equivalents thereof.

Documents

Application Documents

#	Name	Date
1	201841036914-STATEMENT OF UNDERTAKING (FORM 3) [29-09-2018(online)].pdf	2018-09-29
2	201841036914-FORM 1 [29-09-2018(online)].pdf	2018-09-29
3	201841036914-FIGURE OF ABSTRACT [29-09-2018(online)].pdf	2018-09-29
4	201841036914-DRAWINGS [29-09-2018(online)].pdf	2018-09-29
5	201841036914-DECLARATION OF INVENTORSHIP (FORM 5) [29-09-2018(online)].pdf	2018-09-29
6	201841036914-COMPLETE SPECIFICATION [29-09-2018(online)].pdf	2018-09-29
7	201841036914-FORM-26 [27-12-2018(online)].pdf	2018-12-27
8	Correspondence by Agent_Power of Attorney_07-01-2019.pdf	2019-01-07
9	201841036914-Proof of Right (MANDATORY) [20-02-2019(online)].pdf	2019-02-20
10	Correspondence By Agent_Form1_25-02-2019.pdf	2019-02-25
11	201841036914-Proof of Right (MANDATORY) [27-03-2019(online)].pdf	2019-03-27
12	Correspondence By Agent_Form1_01-04-2019.pdf	2019-04-01
13	201841036914-FORM 18 [24-12-2020(online)].pdf	2020-12-24
14	201841036914-FER.pdf	2022-01-03
15	201841036914-FER_SER_REPLY [16-06-2022(online)].pdf	2022-06-16
16	201841036914-DRAWING [16-06-2022(online)].pdf	2022-06-16
17	201841036914-COMPLETE SPECIFICATION [16-06-2022(online)].pdf	2022-06-16
18	201841036914-CLAIMS [16-06-2022(online)].pdf	2022-06-16
19	201841036914-ABSTRACT [16-06-2022(online)].pdf	2022-06-16
20	201841036914-PatentCertificate18-06-2024.pdf	2024-06-18
21	201841036914-IntimationOfGrant18-06-2024.pdf	2024-06-18

Search Strategy

1	SearchHistory201841036914E_06-10-2021.pdf