Abstract: In current approaches for feeding use cases in data analytics to take business related decision on urgent basis becomes challenge. This disclosure relates to a method of generation of a cognitive data dictionary. A request to prepare a final dataset report is received based on data obtained from source systems. A complex structured query is created based on an entity relationship among database components. A column structure of the database components is processed to register naming convention associated with a key column from one or more tables. A name associated with the key column of table structure of a data dictionary is determined. A report linking the table structure with the name of the key column is generated. The final dataset report is generated based on the complex structured query. A learned model is generated based on the final dataset report. The learned model includes use cases for the user. [To be published with FIG. 2]
Claims:We Claim:
1. A processor implemented method (300), comprising:
receiving, via one or more hardware processors, at least one request to prepare a final dataset report based on at least one data obtained from a plurality of source systems (202A-N) (302);
creating in response to the received at least one request, via the one or more hardware processors, at least one complex structured query based on an entity relationship built among a plurality of database components (304), wherein building the entity relationship among the plurality of database components comprises:
processing, via the one or more hardware processors, at least one column structure of the plurality of database components to register at least one naming convention associated with at least one key column from a plurality of tables, wherein the at least one key column from the plurality of tables is a unique identifier comprising a unique value (304A);
determining, via the one or more hardware processors, at least one name associated with the at least one key column present in at least one table structure of a data dictionary (204) (304B); and
generating, via the one or more hardware processors, a report linking the at least one table structure with the at least one name associated with the at least one key column (304C);
generating, via the one or more hardware processors, the final dataset report based on the at least one complex structured query (306); and
generating, via the one or more hardware processors, a learned model based on the final dataset report, wherein the learned model is configured to generate at least one use case for a user (308).
2. The processor implemented method (300) as claimed in claim 1, wherein at least one attribute is identified which defines at least one parameter of the final report, and wherein the at least one attribute corresponds to at least one of (a) a column, or (b) a feature from at least one database table.
3. The processor implemented method (300) as claimed in claim 1, wherein the data dictionary (204) is created by fetching at least one of: (a) the at least one table structure, or (b) a file structure, wherein the data dictionary (204) is updated with the latest information, when a new feature is added or deleted, and wherein the data dictionary (204) comprises at least one metadata that corresponds to at least one of: (i) the table names, (ii) defined columns of the at least one database table, (iii) datatype and length, and (iv) feature.
4. The processor implemented method (300) as claimed in claim 1, wherein the plurality of database components corresponds to at least one of: (a) a table, (b) a procedure, (c) a cursor, and (d) a trigger, or a combination thereof.
5. The processor implemented method (300) as claimed in claim 1, wherein the at least one column structure comprises at least one of: (a) datatype, (b) length, and (c) plurality of properties associated with at least one column satisfying a criteria of a stored data, and wherein the at least one table structure corresponds to a name defined for the at least one column.
6. A system (100), comprising:
a memory (102) storing instructions;
one or more communication interfaces (106); and
one or more hardware processors (104) coupled to the memory (102) via the one or more communication interfaces (106), wherein the one or more hardware processors (104) are configured by the instructions to:
receive, at least one request to prepare a final dataset report based on at least one data obtained from a plurality of source systems;
create in response to the received at least one request, at least one complex structured query at least one complex structured query based on an entity relationship built among a plurality of database components, wherein building the entity relationship among the plurality of database components comprises:
process, at least one column structure of the plurality of database components to register at least one naming convention associated with at least one key column from a plurality of tables, wherein the at least one key column from the plurality of tables is a unique identifier comprising an unique value;
determine, at least one name associated with the at least one key column present in at least one table structure of a data dictionary; and
generate, a report linking the at least one table structure with the at least one name associated with the at least one key column;
generate, the final dataset report based on the at least one complex structured query; and
generate, a learned model based on the final dataset report, wherein the learned model is configured to generate at least one use case for a user.
7. The system (100) as claimed in claim 6, wherein at least one attribute is identified which defines at least one parameter of the final report, and wherein the at least one attribute corresponds to at least one of: (a) a column, or (b) a feature from at least one database table.
8. The system (100) as claimed in claim 6, wherein the data dictionary (is created by fetching at least one of: (a) the at least one table structure, or (b) a file structure, wherein the data dictionary is updated with the latest information, when a new feature is added or deleted, and wherein the data dictionary comprises at least one metadata that corresponds to at least one of: (i) the table names, (ii) defined columns of the at least one database table, (iii) datatype and length, and (iv) feature.
9. The system (100) as claimed in claim 6, wherein the plurality of database components corresponds to at least one of: (a) a table, (b) a procedure, (c) a cursor, and (d) a trigger, or a combination thereof.
10. The system (100) as claimed in claim 6, wherein the at least one column structure comprises at least one of: (a) datatype, (b) length, and (c) plurality of properties associated with at least one column satisfying a criteria of a stored data, and wherein the at least one table structure corresponds to a name defined for the at least one column.
Dated this 28th day of December 2021
Tata Consultancy Services Limited
By their Agent & Attorney
(Adheesh Nargolkar)
of Khaitan & Co
Reg No IN-PA-1086
, Description:FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003
COMPLETE SPECIFICATION
(See Section 10 and Rule 13)
Title of invention:
METHOD AND SYSTEM FOR GENERATION OF A COGNITIVE DATA DICTIONARY
Applicant
Tata Consultancy Services Limited
A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th floor,
Nariman point, Mumbai 400021,
Maharashtra, India
Preamble to the description:
The following specification particularly describes the invention and the manner in which it is to be performed.
TECHNICAL FIELD
[001] The disclosure herein generally relates to data management, and, more particularly, to a method and system for generation of a cognitive data dictionary.
BACKGROUND
[002] In an enterprise data warehouse, due to presence of a huge number of database components, an exhaustive and huge manual eyeballing or a detail data analysis is involved for finding a primary key and a foreign key relationship between database components. Currently, a physical structure analysis of the huge number of database components includes a manual mode of finding an entity relationship between the database components and also requires establishing a proper join condition to build use case related queries. This process is time consuming and feeding one or more use cases in data analytics on an urgent basis becomes a challenge. Further, manual update on a data dictionary carries manual errors. Existing approaches are still not competent for the repetitive work to produce error free report in minimal time and with minimal effort.
SUMMARY
[003] Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one embodiment, a processor implemented method of generating a cognitive data dictionary is provided. The processor implemented method includes at least one of: receiving, via one or more hardware processors, at least one request to prepare a final dataset report based on at least one data obtained from one or more source systems; creating in response to the received at least one request, via the one or more hardware processors, at least one complex structured query based on an entity relationship built among one or more database components; generating, via the one or more hardware processors, the final dataset report based on the at least one complex structured query; and generating, via the one or more hardware processors, a learned model based on the final dataset report. The entity relationship built among the one or more database components include: (a) processing, via the one or more hardware processors, at least one column structure of the one or more database components to register at least one naming convention associated with at least one key column from one or more tables; (b) determining, via the one or more hardware processors, at least one name associated with the at least one key column present in at least one table structure of a data dictionary, and (c) generating, via the one or more hardware processors, a report linking the at least one table structure with at least one name associated with the at least one key column. The at least one key column from the one or more tables is a unique identifier includes a unique value. The learned model is configured to generate at least one use case for a user.
[004] In an embodiment, at least one attribute is identified which defines at least one parameter of the final report. In an embodiment, the at least one attribute corresponds to at least one of (a) a column, or (b) a feature from at least one database table. In an embodiment, the data dictionary is created by fetching at least one of: (a) the at least one table structure, or (b) a file structure. In an embodiment, the data dictionary is updated with the latest information, when a new feature is added or deleted. In an embodiment, the data dictionary includes at least one metadata that corresponds to at least one of: (i) the table names, (ii) defined columns of the at least one database table, (iii) datatype and length, and (iv) feature. In an embodiment, the one or more database components correspond to at least one of: (a) a table, (b) a procedure, (c) a cursor, and (d) a trigger, or a combination thereof. In an embodiment, the at least one column structure include at least one of (a) datatype, (b) length, and (c) one or more properties associated with at least one column satisfying a criteria of a stored data. In an embodiment, the at least one table structure corresponds to a name defined for the at least one column.
[005] In another aspect, there is provided a system for generation of a cognitive data dictionary. The system includes a memory storing instructions; one or more communication interfaces; and one or more hardware processors coupled to the memory via the one or more communication interfaces, wherein the one or more hardware processors are configured by the instructions to: receive, at least one request to prepare a final dataset report based on at least one data obtained from one or more source systems; create in response to the received at least one request, at least one complex structured query based on an entity relationship built among one or more database components; generate, the final dataset report based on the at least one complex structured query; and generate, a learned model based on the final dataset report. The entity relationship built among the one or more database components include: (a) process, at least one column structure of the one or more database components to register at least one naming convention associated with at least one key column from one or more tables; (b) determine, at least one name associated with the at least one key column present in at least one table structure of a data dictionary, and (c) generating, via the one or more hardware processors, a report linking the at least one table structure with at least one name associated with the at least one key column. The at least one key column from the one or more tables is a unique identifier includes a unique value. The learned model is configured to generate at least one use case for a user.
[006] In an embodiment, at least one attribute is identified which defines at least one parameter of the final report. In an embodiment, the at least one attribute corresponds to at least one of (a) a column, or (b) a feature from at least one database table. In an embodiment, the data dictionary is created by fetching at least one of: (a) the at least one table structure, or (b) a file structure. In an embodiment, the data dictionary is updated with the latest information, when a new feature is added or deleted. In an embodiment, the data dictionary includes at least one metadata that corresponds to at least one of: (i) the table names, (ii) defined columns of the at least one database table, (iii) datatype and length, and (iv) feature. In an embodiment, the one or more database components correspond to at least one of: (a) a table, (b) a procedure, (c) a cursor, and (d) a trigger, or a combination thereof. In an embodiment, the at least one column structure include at least one of (a) datatype, (b) length, and (c) one or more properties associated with at least one column satisfying a criteria of a stored data. In an embodiment, the at least one table structure corresponds to a name defined for the at least one column.
[007] In yet another aspect, there are provided one or more non-transitory machine readable information storage mediums comprising one or more instructions which when executed by one or more hardware processors causes at least one of: receiving, at least one request to prepare a final dataset report based on at least one data obtained from one or more source systems; creating in response to the received at least one request, at least one complex structured query based on an entity relationship built among one or more database components; generating, the final dataset report based on the at least one complex structured query; and generating, a learned model based on the final dataset report. The entity relationship built among the one or more database components include: (a) processing, at least one column structure of the one or more database components to register at least one naming convention associated with at least one key column from one or more tables; (b) determining, at least one name associated with the at least one key column present in at least one table structure of a data dictionary, and (c) generating, via the one or more hardware processors, a report linking the at least one table structure with at least one name associated with the at least one key column. The at least one key column from the one or more tables is a unique identifier includes a unique value. The learned model is configured to generate at least one use case for a user.
[008] In an embodiment, at least one attribute is identified which defines at least one parameter of the final report. In an embodiment, the at least one attribute corresponds to at least one of (a) a column, or (b) a feature from at least one database table. In an embodiment, the data dictionary is created by fetching at least one of: (a) the at least one table structure, or (b) a file structure. In an embodiment, the data dictionary is updated with the latest information, when a new feature is added or deleted. In an embodiment, the data dictionary includes at least one metadata that corresponds to at least one of: (i) the table names, (ii) defined columns of the at least one database table, (iii) datatype and length, and (iv) feature. In an embodiment, the one or more database components correspond to at least one of: (a) a table, (b) a procedure, (c) a cursor, and (d) a trigger, or a combination thereof. In an embodiment, the at least one column structure include at least one of (a) datatype, (b) length, and (c) one or more properties associated with at least one column satisfying a criteria of a stored data. In an embodiment, the at least one table structure corresponds to a name defined for the at least one column.
[009] It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
[010] The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:
[011] FIG. 1 illustrates a block diagram of an exemplary system for generation of a cognitive data dictionary, according to an embodiment of the present disclosure.
[012] FIG. 2 illustrates an exemplary functional block diagram of the system of FIG.1, according to some embodiments of the present disclosure.
[013] FIG. 3 is an exemplary flow diagram illustrating method of generating the cognitive data dictionary, according to an embodiment of the present disclosure.
DETAILED DESCRIPTION OF EMBODIMENTS
[014] Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments.
[015] There is a need to create error free queries for generating one or more use cases in data analytics and to obtain proper business-related decisions. In an embodiment of the present disclosure, a data dictionary is generated for creation of one or more complex and error free queries which helps in generation of a final dataset report. The final dataset report is utilized to generate a model with one or more use cases in a data analytics to obtain business-related decisions. The data dictionary can automatically establish a primary key (PK) and a foreign key (FK) relationship among one or more database (DB) tables from a physical structure information defined using a Node JS and a machine learning (ML).Net technique. A structured metadata table includes the physical structure information of one or more components associated with one or more database (DB) tables in an integrated infrastructure or an enterprise data warehouse platform. In an embodiment, the data dictionary is alternatively referred as a cognitive data dictionary or a smart data dictionary (SDD). The embodiment of the present disclosure provides an automation for a pattern matching i.e., a text pattern recognition to bring in intelligent query building capabilities. The SDD is powered by a cognitive robotic process automation (RPA).
[016] Referring now to the drawings, and more particularly to FIG. 1 through 3, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.
[017] FIG. 1 illustrates a block diagram of an exemplary system for generation of the cognitive data dictionary, according to an embodiment of the present disclosure. In an embodiment, the system 100 includes one or more processor(s) 102, communication interface device(s) or input/output (I/O) interface(s) 106, and one or more data storage devices or memory 104 operatively coupled to the one or more processors 102. The memory 104 includes a database. The one or more processor(s) processor 102, the memory 104, and the I/O interface(s) 106 may be coupled by a system bus such as a system bus 108 or a similar mechanism. The one or more processor(s) 102 that are hardware processors can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the one or more processor(s) 102 is configured to fetch and execute computer-readable instructions stored in the memory 104. In an embodiment, the system 100 can be implemented in a variety of computing systems, such as laptop computers, notebooks, hand-held devices, workstations, mainframe computers, servers, a network cloud, and the like.
[018] The I/O interface device(s) 106 can include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like. The I/O interface device(s) 106 may include a variety of software and hardware interfaces, for example, interfaces for peripheral device(s), such as a keyboard, a mouse, an external memory, a camera device, and a printer. Further, the I/O interface device(s) 106 may enable the system 100 to communicate with other devices, such as web servers and external databases. The I/O interface device(s) 106 can facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example, local area network (LAN), cable, etc., and wireless networks, such as Wireless LAN (WLAN), cellular, or satellite. In an embodiment, the I/O interface device(s) 106 can include one or more ports for connecting number of devices to one another or to another server.
[019] The memory 104 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random-access memory (SRAM) and dynamic random-access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. In an embodiment, the memory 104 includes a plurality of modules 110 and a repository 112 for storing data processed, received, and generated by the plurality of modules 110. The plurality of modules 110 may include routines, programs, objects, components, data structures, and so on, which perform particular tasks or implement particular abstract data types.
[020] Further, a database in the repository 112 stores information pertaining to inputs fed to the system 100 and/or outputs generated by the system (e.g., data/output generated at each stage of the data processing) 100, specific to the methodology described herein. More specifically, the database stores information being processed at each step of the proposed methodology.
[021] Additionally, the plurality of modules 110 may include programs or coded instructions that supplement applications and functions of the system 100. The repository 112, amongst other things, includes a system database 114 and other data 116. The other data 116 may include data generated as a result of the execution of one or more modules in the plurality of modules 110. Further, the database stores information pertaining to inputs fed to the system 100 and/or outputs generated by the system (e.g., at each stage), specific to the methodology described herein. Herein, the memory for example the memory 104 and the computer program code configured to, with the hardware processor for example the processor 102, causes the system 100 to perform various functions described herein under.
[022] FIG. 2 illustrates an exemplary functional block diagram of the system 100 of FIG.1, according to some embodiments of the present disclosure. The system 100 includes a data dictionary 204, a link report generator 206, an analyzing unit 208, and a delivery unit 210. The system 100 is configured to receive at least one request to prepare a final dataset report based on at least one data obtained from one or more source systems 202A-N. In an embodiment, the one or more source systems 202A-N corresponds to but not limited to (a) a member/user account, (b) client/user, (c) user survey, (d) insurance claims, (e) user call, and (f) social media. In an embodiment, at least one attribute is identified which defines at least one parameter of the final report. In an embodiment, the at least one attribute corresponds to at least one of (a) a column, or (b) a feature from at least one database table. The data dictionary 204 is powered by a cognitive robotic process automation (RPA). The data dictionary 204 is created by fetching at least one of: (a) the at least one table structure by connecting the database (for tables), or (b) a file structure by connecting a Unix platfo
rm (for files). In an embodiment, the data dictionary 204 is updated with the latest information, when a new feature is added or deleted. In an embodiment, the data dictionary 204 includes at least one metadata from a structured metadata table that corresponds to at least one of: (i) the table names, (ii) defined columns of the at least one database table, (iii) datatype and length, and (iv) feature. In an embodiment, the one or more database components correspond to at least one of: (a) a table, (b) a procedure, (c) a cursor, and (d) a trigger, or a combination thereof. The structured metadata table includes a physical structure information of one or more database (DB) components associated with one or more database (DB) tables in an integrated infrastructure or an enterprise data warehouse platform.
[023] The data dictionary 204 creates one or more complex and error free queries to feed one or more use cases in a data analytics to obtain proper business-related decisions. The system 100 provides an automation for the pattern matching i.e., a text pattern recognition to build at least one complex structured query. The at least one complex structured query is created based on an entity relationship among the one or more database (DB) components. The data dictionary 204 of the system 100 automatically build one or more entities relationship among the one or more database components which includes: (a) at least one column structure of the one or more database components is processed to register at least one naming convention associated with at least one key column from one or more tables; (b) at least one name associated with the at least one key column which is present in at least one table structure of the data dictionary 204 is determined, and a report linking the at least one table structure with the at least one name associated with the at least one key column is generated. In an embodiment, the at least one key column from the one or more tables is a unique identifier includes a unique value which is not repeated. In an embodiment, the link report generator 206 is configured to generate the report linking the at least one table structure with the at least one key column name. In an embodiment, the final report is created linking the at least one table structure in a graphical user interface (GUI) manner. In an embodiment, the at least one column structure include at least one of (a) datatype, (b) length, and (c) one or more properties associated with at least one column which satisfies a criteria of a stored data. In an embodiment, the at least one table structure corresponds to a name defined for the at least one column. The data dictionary 204 can automatically establish the primary key (PK) and the foreign key (FK) relationship among one or more database (DB) tables from a physical structure information defined using a Node JS and a machine learning (ML).Net technique. The PK-FK relationship is established by the text pattern recognition in one or more naming conventions of a key attributes between the one or more database (DB) components in entire data warehouse. The data dictionary 204 establishes automatic validation such as an exception handling and the text pattern recognition.
[024] For example, an error handling is performed by at least one of: (a) If the system 100 picks up an exception, such as a discrepancy between name of user on a form and on an ID document, can pass to a human employee for further processing. The system 100 utilizes the machine learning (ML) to monitor and learn how the human employee validates the user’s identity and improvise an ability to process a same scenario in a future attempt without an input from the user; (b) the text pattern recognition is utilized for handling incorrect naming convention and spelling. The ML.Net technique is utilized to automate one or more search engine within the data dictionary 204 containing the metadata of the one or more DB components. The cognitive RPA determine that one or more key columns are streamlined with the correct naming convention by the text pattern matching. Upon identifying or rectifying (if any incorrect spellings) the ML.Net trains the data dictionary 204 for an automatic search to establish the PK-FK relationship between the one or more DB components.
[025] The final dataset report based on the at least one complex structured query. The analyzing unit 208 further includes a configurable model generator 212 which is configured to generate a learned model based on the final dataset report. The delivery unit 210 further includes a use cases generator 214 which is configured to generate at least one use case for a user based on the learned model, In an embodiment, one or more use cases from the use cases generator 214 corresponds but not limited to at least one of: (a) one or more insights, (b) prediction, (c) alerts, and (d) trends.
[026] In an exemplary embodiment, considering a business query or a challenge for which the use case corresponds to why there are high number of repeat caller? Expectation is to receive a detailed repeat call report from a team. The caller can call repeatedly for various reasons: (a) Each time the caller reaches out with new queries, (b) at a first time the caller does not get a proper clarification hence reaching out again, (c) an assigned agent is not capable enough to answer a question due to lack of information or lack of understanding, and (d) getting an updated status of a ticket. For example, the use case involves three stakeholders such as an entity (i.e., business), a data scientist, and a data architect. The steps involved in generation of the cognitive data dictionary to generate a real-time query are: (a) a data exploration, (b) a data preparation, (c) a predictive model, and (d) a business handover.
[027] The one or more steps of the data exploration process are as:
a. The data scientist understands an exposure of a requirement i.e., understands an exact scenario defined in a business use case and by browsing a contextual data present in the one or more database components such as tables, views etc.
b. The data scientist identifies one or more attributes which defines one or more parameters of a final report. The one or more attributes corresponds to columns or features in the one or more database tables. In an embodiment, the one or more parameters corresponds to one or more key descriptive factors e.g., an agent details, member details, call details, call recording (e.g., if not encrypted), call reason, call status, request status.
c. The data scientist prepares a requirement of dataset to satisfy the end report for the use case. The requirement of dataset can be defined as a model report including data which satisfies a need for the business in the form of the one or more-use cases.
[028] The one or more steps of the data preparation process are as:
a. The data architect receives the requirement from the data scientist to generate a final report.
b. The data architect needs to build a structured query language (SQL) query to retrieve a dataset from a data warehouse. The query is built based on the entity relationship among the one or more database components. The data dictionary 204 is maintained to keep entire structure of each and the one or more database components in the data warehouse. In an embodiment, the one or more database components also corresponds to structured containers containing huge amount of business data. The data dictionary 204 supports to create the complex structured query which results a large dataset to be used for building a dashboard/report that fulfills the actual business need. The data dictionary 204 include an additional feature to update the metadata automatically upon any changes. The metadata corresponds to data i.e., the table names, defined columns of the database tables/Views, datatype & length, features etc. The metadata are stored in the data dictionary 204. Initially all the metadata were fetched from the database directly. If any of the information gets changed, new features/components get added/deleted, the data dictionary 204 is updated with the latest information. For example, a script runs in a background with a checks applied on top of a previous day’s record: <>.
c. A relationship among the one or more entities are automatically built by:
i. analyzing through one or more column structures of the one or more database components. For example, sample- .
ii. registering one or more naming conventions of one or more key columns of the one or more tables. The naming conventions are registered based on the business need and database standard-names are defined for every column which is maintained across the domain referring to the industry standard; For example, ) of the key columns of all the tables. The one or more key columns are the unique identifier containing an unique value and are not repeated. Example: Employee Id is an unique identifier for an employee. Two different employee cannot have same employee Id which is referred as the primary key. If this employee reference is needed in department table to know in which department the employee works, then Employee_Id column can be present in the department table. Hence, in the department table this Employee Id column are referred as a foreign key. In the data dictionary 204 a searching feature is incorporated where the primary key and the foreign key relationship can be determined automatically among one or more database components.
iii. determining one or more same key column names present in the one or more table structures in the data dictionary 204. Column is a part of a table. The table structure includes one or more column names are defined. The column structure includes datatype, length and one or more properties of the column satisfying one or more criteria of the stored data i.e., nullable or not, unique value or not etc. The table structure is a superset, and the column structure is corresponding subset.
iv. generating the final report linking the table names/structures having the one or more same key column names. For example, the final report shows the entity relationship among the one or more tables i.e., if the unique identifier or the primary key of the A table is present in 100 other tables as a referential key or the foreign key, then the automatic feature identifies all 100 tables and produce a report showing the link between table A and 100 child tables. This process can be implemented using Node JS and .Net technology
d. the report is built and handed over to the data scientist. The data scientist builds the machine learning model to predict the repeat callers. The data scientist shares the repeat caller report as a final output to the business/entity.
[029] FIG. 3 is an exemplary flow diagram illustrating method of generating the data dictionary by the cognitive robotic process automation (RPA), according to an embodiment of the present disclosure. In an embodiment, the system 100 comprises one or more data storage devices or the memory 104 operatively coupled to the one or more hardware processors 102 and is configured to store instructions for execution of steps of the method by the one or more processors 102. The flow diagram depicted is better understood by way of following explanation/description. The steps of the method of the present disclosure will now be explained with reference to the components of the system as depicted in FIGS. 1 and 2.
[030] At step 302, at least one request to prepare a final dataset report is received based on at least one data obtained from one or more source systems. In an embodiment, at least one attribute is identified which defines at least one parameter of the final report. In an embodiment, the at least one attribute corresponds to at least one of (a) a column, or (b) a feature from at least one database table. At step 304, at least one complex structured query is created based on an entity relationship among one or more database components. The entity relationship built among the one or more database components include: (a) at step 304A, at least one column structure of the one or more database components is processed to register at least one naming convention associated with at least one key column from one or more tables, (b) at step 304B, at least one name associated with the at least one key column present in at least one table structure of a data dictionary is determined, and (c) at step 304C, a report linking the at least one table structure with the at least one name associated with the at least one key column is generated. In an embodiment, the data dictionary is created by fetching at least one of: (a) the at least one table structure, or (b) a file structure. In an embodiment, the data dictionary is updated with the latest information, when a new feature is added or deleted. In an embodiment, the data dictionary includes at least one metadata that corresponds to at least one of: (i) the table names, (ii) defined columns of the at least one database table, (iii) datatype and length, and (iv) feature. In an embodiment, the one or more database components correspond to at least one of: (a) a table, (b) a procedure, (c) a cursor, and (d) a trigger, or a combination thereof. In an embodiment, the at least one column structure include at least one of (a) datatype, (b) length, and (c) one or more properties associated with at least one column satisfying a criteria of a stored data. In an embodiment, the at least one table structure corresponds to a name defined for the at least one column. At step 306, the final dataset report is generated based on the at least one complex structured query. The at least one key column from the one or more tables is a unique identifier includes a unique value. At step 308, a learned model is generated based on the final dataset report. The learned model is configured to generate at least one use case for the user.
[031] In an exemplary embodiment, below mentioned is an exemplary pseudo code for the above-mentioned method steps involved in building of the at least one complex structured query is described herein.
[032] A GUI is created which include: (a) Select objects from an object selection pane; (b) Add objects to a design pane and select columns; (c) Execute the query. The query builder takes input parameters, interpret, and then generate a query. A “drag-and-drop” feature are added and without typing any SQL syntax, the SQL queries can be created by using only mouse. If the query builder supports a Windows only, then MS-SQL servers is required, but as there are different kind of servers and database systems should support. This supports JDBC drivers but not limited to Oracle, MySQL. For writing, compiling, and executing code, Eclipse is used. The GUI is created for ease of the user access by creating and editing SQL statements without typing SQL syntax. The functionalities found as best suited for different databases and environments. The query builder interprets the input, doing necessary calculations and create a string that represent a SQL-query. The inputs given to the query builder makes easy to understand that how to create functions for processing the input. Another function is used to create a number identifying each table in the database.
Clause creation:
Select - The first step is to create a SELECT-statement. This statement specify from what tables data is to be retrieved from. All input parameters are put together to create a unique table name. The input to the SELECT function is a list of counter names and with this information all the output from the first processing functions are kept in a string and a complete SELECT-statement are created.
From - The second statement are the FROM-statement. A table are created containing information about necessary parameters (e.g., time) in the queries. This table are always a part of the output, and this is a fixed part in the FROM statement. There is no need for output from the earlier functions to make this work and put the table in the FROM-statement.
Join- A table is then joined together with differently identified table components in the JOIN statement. In this section, all related tables are joined together. The JOIN-function uses the same parameters as SELECT, creating complete table names to specify which table’s data should be retrieved from.
On/Where -To specify certain conditions for the JOIN-statement, the ON- and WHERE statements are used. The ON-function specifies what key attributes (columns) that all the tables should contain and join only those with matching attributes (columns).
GROUP BY and ORDER BY - These two clauses are specifying how the data should be grouped and how the data should be ordered when returned from the queries. The users always want the output to be sorted default in ascending order. After all these calculations and creation of string in all the different functions, each function returns a string to the main function where they are assembled to the main SQL query. The entire code is written by calling the different functions and create strings and then put together and returned as a query.
[033] The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
[034] The embodiment of present disclosure herein addresses unresolved problem of creating complex and error free queries to extract the required dataset. The embodiments of the present disclosure, the data dictionary can automatically establish relationship of the primary key and the foreign key among the DB tables from the physical structure information defined in the data dictionary. The data dictionary is powered by the cognitive RPA and establishes the automatic validation such as a exception handling and the text pattern recognition. An automatic process significantly faster than manual process which consumes more time and eliminates error from the user when running the same operations every time. The embodiment of the present disclosure provides an automatic updating of one or more contents of the smart data dictionary. Moreover, the embodiments herein further provide an efficient approach to create complex and error free queries to extract required dataset for feeding the use cases in data analytics which helps the organization to take proper business-related decisions. The automatic establishment of entity relationship among the one or more database components and producing a linking report. The embodiments herein further provide an automatic error handling using the text pattern recognition technique. The embodiments herein further provide a reduced manual effort in building the one or more queries. As the output is focused on a framework is all about streamlining one or more metadata associated with one or more structural parameters for reducing manual effort, human error, and execution time.
[035] It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g., any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g., hardware means like e.g., an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g., an ASIC and an FPGA, or at least one microprocessor and at least one memory with software processing components located therein. Thus, the means can include both hardware means and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g., using a plurality of CPUs.
[036] The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various components described herein may be implemented in other components or combinations of other components. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
[037] The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
[038] Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
[039] It is intended that the disclosure and examples be considered as exemplary only, with a true scope of disclosed embodiments being indicated by the following claims.
| # | Name | Date |
|---|---|---|
| 1 | 202121061340-FER.pdf | 2025-02-10 |
| 1 | 202121061340-STATEMENT OF UNDERTAKING (FORM 3) [28-12-2021(online)].pdf | 2021-12-28 |
| 2 | 202121061340-REQUEST FOR EXAMINATION (FORM-18) [28-12-2021(online)].pdf | 2021-12-28 |
| 2 | 202121061340-FORM-26 [20-04-2022(online)].pdf | 2022-04-20 |
| 3 | Abstract1.jpg | 2022-03-23 |
| 3 | 202121061340-PROOF OF RIGHT [28-12-2021(online)].pdf | 2021-12-28 |
| 4 | 202121061340-COMPLETE SPECIFICATION [28-12-2021(online)].pdf | 2021-12-28 |
| 4 | 202121061340-FORM 18 [28-12-2021(online)].pdf | 2021-12-28 |
| 5 | 202121061340-FORM 1 [28-12-2021(online)].pdf | 2021-12-28 |
| 5 | 202121061340-DECLARATION OF INVENTORSHIP (FORM 5) [28-12-2021(online)].pdf | 2021-12-28 |
| 6 | 202121061340-FIGURE OF ABSTRACT [28-12-2021(online)].jpg | 2021-12-28 |
| 6 | 202121061340-DRAWINGS [28-12-2021(online)].pdf | 2021-12-28 |
| 7 | 202121061340-FIGURE OF ABSTRACT [28-12-2021(online)].jpg | 2021-12-28 |
| 7 | 202121061340-DRAWINGS [28-12-2021(online)].pdf | 2021-12-28 |
| 8 | 202121061340-FORM 1 [28-12-2021(online)].pdf | 2021-12-28 |
| 8 | 202121061340-DECLARATION OF INVENTORSHIP (FORM 5) [28-12-2021(online)].pdf | 2021-12-28 |
| 9 | 202121061340-FORM 18 [28-12-2021(online)].pdf | 2021-12-28 |
| 9 | 202121061340-COMPLETE SPECIFICATION [28-12-2021(online)].pdf | 2021-12-28 |
| 10 | 202121061340-PROOF OF RIGHT [28-12-2021(online)].pdf | 2021-12-28 |
| 10 | Abstract1.jpg | 2022-03-23 |
| 11 | 202121061340-FORM-26 [20-04-2022(online)].pdf | 2022-04-20 |
| 11 | 202121061340-REQUEST FOR EXAMINATION (FORM-18) [28-12-2021(online)].pdf | 2021-12-28 |
| 12 | 202121061340-STATEMENT OF UNDERTAKING (FORM 3) [28-12-2021(online)].pdf | 2021-12-28 |
| 12 | 202121061340-FER.pdf | 2025-02-10 |
| 13 | 202121061340-OTHERS [04-07-2025(online)].pdf | 2025-07-04 |
| 14 | 202121061340-FER_SER_REPLY [04-07-2025(online)].pdf | 2025-07-04 |
| 15 | 202121061340-CLAIMS [04-07-2025(online)].pdf | 2025-07-04 |
| 16 | 202121061340-ORIGINAL UR 6(1A) FORM 26-160725.pdf | 2025-07-18 |
| 1 | 202121061340E_04-01-2024.pdf |