Abstract: A system (100) for master data creation using template-based data formatting is disclosed. The system (100) includes a processing subsystem (105), which includes a receiving module (120) to receive a plurality of files including unstructured data from a user for master data creation. A processing module (130) pre-processes the unstructured data for ingestion, extracts a plurality of components from the unstructured data, maps one or more relationships between each of the plurality of components, identifies a template from a plurality of predetermined templates based on the plurality of components, and maps the plurality of components with the identified template. A notification generation module (140) notifies the user of an occurrence of errors upon creating the master data. A verification module (150) enables the user to verify the created master data before uploading into a laboratory information management system and a manufacturing execution system. A pre-processing module (160) converts the master data into a required format acceptable for the laboratory information management system and the manufacturing execution system. FIG. 1
Description:
FIELD OF INVENTION
[0001] Embodiments of the present disclosure relate to the field of laboratory information management systems and manufacturing execution systems, and more particularly, to a system and a method for master data creation using template-based data formatting.
BACKGROUND
[0002] Laboratory Information Management Systems (LIMS) and Manufacturing Execution Systems (MES) are crucial in the pharmaceutical industry for efficient operations and quality standards. LIMS manages laboratory workflows, sample tracking, and data management, thereby enhancing efficiency and regulatory compliance in laboratories, quality control facilities, and testing centres. Further, MES optimizes manufacturing processes by coordinating activities, monitoring real-time data, and ensuring quality control throughout the production lifecycle. Together, LIMS and MES provide a comprehensive solution for managing laboratory and manufacturing operations, enhancing overall efficiency, quality, and compliance.
[0003] Currently, the pharmaceutical industry is undergoing a significant transformation towards digitization for quality control improvement. Further, converting unstructured data into structured data is crucial for analysis, decision-making, and regulatory submissions.
[0004] However, the process of converting unstructured documents for use in electronic systems is challenging due to the complexity of the data. The data present in the pharmaceutical industry is a vast and diverse collection of information, including product specifications, manufacturing procedures, and quality control parameters, that requires specialized knowledge and expertise for standardization. Typically, product specifications may include composition, formulation, and dosage, with variations based on drug type.
[0005] Moreover, this process is prone to errors, inconsistencies, and inaccuracies, which affect the quality and reliability of the data. Converting the unstructured documents often requires significant manual effort, which may be time-consuming, labour-intensive, and error-prone, leading to delays and inefficiencies.
[0006] Further, errors in manual data entry may have severe consequences, especially in fields like pharmaceuticals where precision is paramount. For example, if a medication dosage is incorrectly entered as "50mg" instead of "5mg" when converting the unstructured document to a structured document, it may lead to adverse effects, toxicity, or even fatalities. Users who receive an incorrect dosage may experience severe side effects or adverse reactions, including organ damage, allergic reactions, or other harmful effects that compromise health and well-being. Incorrect dosages may also impact the efficacy of the medication, as the users may not receive the intended therapeutic benefits, leading to treatment failure or suboptimal outcomes.
[0008] Hence, there is a need for an improved system and method for master data creation using template-based data formatting which addresses the aforementioned issue(s).
OBJECTIVE OF THE INVENTION
[0009] An objective of the invention is to automate the process of converting unstructured data into a custom format (structured data or master data) compatible with a laboratory information management system (LIMS) and a manufacturing execution system (MES).
[0010] Another objective of the invention is to notify the user when errors are detected in the master data.
BRIEF DESCRIPTION
[0011] In accordance with an embodiment of the present disclosure, a system for master data creation using template-based data formatting is provided. The system includes a processing subsystem hosted on a server. The processing subsystem is configured to execute on a network to control bidirectional communications among a plurality of modules. The processing subsystem includes a receiving module configured to receive a plurality of files from a user for master data creation wherein the plurality of files includes unstructured data. Further, the processing subsystem includes a processing module operatively coupled to the receiving module wherein the processing module is configured to pre-process the unstructured data for ingestion by a large language model. Further, the processing module is also configured to extract a plurality of components from the unstructured data using natural language processing techniques. Furthermore, the processing module is configured to map one or more relationships between each of the plurality of components upon identifying connections, interactions, and dependencies between the plurality of components. Moreover, the processing module is configured to identify a template from a plurality of predetermined templates based on the plurality of components. Moreover, the processing module is also configured to map the plurality of components with the identified template using a contextual analysis technique, thereby creating the master data wherein the master data is in a custom format. Further, the processing subsystem includes a notification generation module operatively coupled to the processing module wherein the notification generation module is configured to notify the user of an occurrence of errors upon creating the master data.
Furthermore, the processing subsystem includes a verification module operatively coupled to the notification generation module wherein the verification module is configured to enable the user to verify the master data created before uploading into the laboratory information management system and manufacturing execution system. Moreover, the processing subsystem includes a pre-processing module operatively coupled to the verification module wherein the pre-processing module is configured to convert the master data into a required format acceptable for the laboratory information management system and manufacturing execution system.
[0012] In accordance with another embodiment of the present disclosure, a method for master data creation using template-based data formatting is provided. The method includes receiving, by a receiving module, a plurality of files from a user for master data creation wherein the plurality of files includes unstructured data. The method also includes preprocessing, by a processing module, the unstructured data for ingestion by a large language model. Further, the method includes extracting, by the processing module, a plurality of components from the unstructured data using natural language processing techniques. Further, the method also includes mapping, by the processing module, one or more relationships between each of the plurality of components upon identifying connections, interactions, and dependencies between the plurality of components. Furthermore, the method includes identifying, by the processing module, a template from a plurality of predetermined templates based on the plurality of components. Furthermore, the method includes mapping, by the processing module, the plurality of components with the identified template using a contextual analysis technique, thereby creating the master data wherein the master data is in a custom format. Further, the method includes notifying, by a notification generation module, the user of an occurrence of errors upon creating the master data. Further, the method also includes enabling, by a verification module, the user to verify the master data created before uploading into the laboratory information management system and manufacturing execution system. Furthermore, the method includes converting, by a pre-processing module, the master data into a required format acceptable for the laboratory information management system and manufacturing execution system.
[0013] To further clarify the advantages and features of the present disclosure, a more particular description of the disclosure will follow by reference to specific embodiments thereof, which are illustrated in the appended figures. It is to be appreciated that these figures depict only typical embodiments of the disclosure and are therefore not to be considered limiting in scope. The disclosure will be described and explained with additional specificity and detail with the appended figures.
BRIEF DESCRIPTION OF THE DRAWINGS
[0014] The disclosure will be described and explained with additional specificity and detail with the accompanying figures in which:
[0015] FIG. 1 is a block diagram representation of a system for master data creation for template-based data formatting in accordance with an embodiment of the present disclosure;
[0016] FIG. 2 is a block diagram representation of an exemplary embodiment of the system for master data creation for template-based data formatting of FIG. 1 in accordance with an embodiment of the present disclosure;
[0017] FIG. 3 is a block diagram of a computer or a server in accordance with an embodiment of the present disclosure;
[0018] FIG. 4(a) illustrates a flow chart representing the steps involved in a method for master data creation for template-based data formatting in accordance with an embodiment of the present disclosure; and
[0019] FIG. 4(b) illustrates continued steps of the method of FIG. 4(a) in accordance with an embodiment of the present disclosure.
[0020] Further, those skilled in the art will appreciate that elements in the figures are illustrated for simplicity and may not have necessarily been drawn to scale. Furthermore, in terms of the construction of the device, one or more components of the device may have been represented in the figures by conventional symbols, and the figures may show only those specific details that are pertinent to understanding the embodiments of the present disclosure so as not to obscure the figures with details that will be readily apparent to those skilled in the art having the benefit of the description herein.
DETAILED DESCRIPTION
[0021] For the purpose of promoting an understanding of the principles of the disclosure, reference will now be made to the embodiment illustrated in the figures and specific language will be used to describe them. It will nevertheless be understood that no limitation of the scope of the disclosure is thereby intended. Such alterations and further modifications in the illustrated system, and such further applications of the principles of the disclosure as would normally occur to those skilled in the art are to be construed as being within the scope of the present disclosure.
[0022] The terms “comprises”, “comprising”, or any other variations thereof, are intended to cover a non-exclusive inclusion, such that a process or method that comprises a list of steps does not include only those steps but may include other steps not expressly listed or inherent to such a process or method. Similarly, one or more devices or subsystems or elements or structures or components preceded by "comprises... a" does not, without more constraints, preclude the existence of other devices, sub-systems, elements, structures, components, additional devices, additional sub-systems, additional elements, additional structures or additional components. Appearances of the phrase "in an embodiment", "in another embodiment" and similar language throughout this specification may, but not necessarily do, all refer to the same embodiment.
[0023] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the art to which this disclosure belongs. The system, methods, and examples provided herein are only illustrative and not intended to be limiting.
[0024] In the following specification and the claims, reference will be made to a number of terms, which shall be defined to have the following meanings. The singular forms “a”, “an”, and “the” include plural references unless the context clearly dictates otherwise.
[0025] Embodiments of the present disclosure relate to a system for template-based data formatting. The system includes a processing subsystem configured to execute on a network to control bidirectional communications among a plurality of modules. The processing subsystem includes a receiving module configured to receive a plurality of files from a user for master data creation wherein the plurality of files includes unstructured data. Further, the processing subsystem includes a processing module operatively coupled to the receiving module wherein the processing module is configured to pre-process the unstructured data for ingestion by a large language model. Further, the processing module is also configured to extract a plurality of components from the unstructured data using natural language processing techniques. Furthermore, the processing module is configured to map one or more relationships between each of the plurality of components upon identifying connections, interactions, and dependencies between the plurality of components. Moreover, the processing module is configured to identify a template from a plurality of predetermined templates based on the plurality of components. Moreover, the processing module is also configured to map the plurality of components with the identified template using a contextual analysis technique, thereby creating the master data wherein the master data is in a custom format. Further, the processing subsystem includes a notification generation module operatively coupled to the processing module wherein the notification generation module is configured to notify the user of an occurrence of errors upon creating the master data.
Furthermore, the processing subsystem includes a verification module operatively coupled to the notification generation module wherein the verification module is configured to enable the user to verify the master data created before uploading into the laboratory information management system and manufacturing execution system. Moreover, the processing subsystem includes a pre-processing module operatively coupled to the verification module wherein the pre-processing module is configured to convert the master data into a required format acceptable for the laboratory information management system and manufacturing execution system.
[0026] FIG. 1 is a block diagram of a system (100) for template-based data formatting in accordance with an embodiment of the present disclosure. The system (100) includes a processing subsystem (105) hosted on a server (108). In one embodiment, the server (108) may include a cloud-based server. In another embodiment, parts of the server (108) may be a local server coupled to a user device (not shown in FIG. 1). The processing subsystem (105) is configured to execute on a network (115) to control bidirectional communications among a plurality of modules. In one example, the network (115) may be a private or public local area network (LAN) or Wide Area Network (WAN), such as the Internet. In another embodiment, the network (115) may include both wired and wireless communications according to one or more standards and/or via one or more transport mediums. In one example, the network (115) may include wireless communications according to one of the 802.11 or Bluetooth specification sets, or another standard or proprietary wireless communication protocol. In yet another embodiment, the network (115) may also include communications over a terrestrial cellular network, including a global system for mobile communications (GSM), code division multiple access (CDMA), and/or enhanced data rates for GSM evolution (EDGE) network.
[0027] The system (100) includes a receiving module (120) configured to receive a plurality of files from a user for master data creation. As used herein, the plurality of files may include comma-separated values files (.csv), extensible markup language files (.xml), text files (.txt), image files (joint photographic experts group, portable network graphics), document files (portable document format, word document), and audio files. The plurality of files includes unstructured data. Typically, examples of the unstructured data include, but are not limited to, information about prescription drugs, over-the-counter medications, and medical devices found on drug labels, package inserts, and patient information leaflets, including warnings, precautions, dosage instructions, side effects, contraindications, chemical reaction conditions, reagents used, reaction yields, and any unexpected observations or challenges encountered during a synthesis process. For example, to find the efficacy of a drug: take a round bottom flask, add 10 milliliters of solvent A, add 10 milliliters of solvent B, and monitor the reaction between the solvent A and the solvent B. This unstructured data is used for master data creation. Typically, the master data is structured and stored in a custom format, ensuring the master data is organized according to requirements or conventions specifically tailored to the needs of the organization or the electronic systems used in the pharmaceutical industry. As used herein, the master data is created for at least one of the Laboratory Information Management System (LIMS) and the Manufacturing Execution System (MES). Typically, the LIMS and the MES are software-based systems that streamline laboratory workflows, data management, and compliance in research, development, and quality control laboratories, and manage manufacturing processes in pharmaceutical production facilities.
[0028] Further, the system (100) includes a processing module (130) operatively coupled to the receiving module (120). The processing module (130) is configured to pre-process the unstructured data for ingestion by a large language model to improve the quality of the unstructured data. Typically, the processing module (130) performs various techniques to pre-process the unstructured data. Examples of the techniques include, but are not limited to, text cleaning, tokenization, chunking and parsing, stop word removal, spell check, entity recognition, part of speech tagging, vectorization, data normalization, and feature engineering. Further, the processing module (130) utilizes an identifier assigned to each of the plurality of files for the master data creation to ensure each of the plurality of files is uniquely identifiable and easily referenced or retrieved when needed. Typically, the identifiers facilitate traceback of the master data to the source file, establishing its origin, lineage, quality, and integrity. As used herein, the identifiers are unique labels or codes assigned to the plurality of files, serving as a key for easy identification and retrieval.
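As a non-limiting illustration, the text cleaning, tokenization, stop word removal, and identifier assignment described above may be sketched as follows. The function names, the stop word list, and the hash-based identifier scheme are assumptions made for this sketch only and are not prescribed by the disclosure.

```python
import hashlib
import re

# Hypothetical, deliberately small stop word list used only for illustration.
STOP_WORDS = {"a", "an", "the", "of", "and", "to"}


def assign_identifier(file_name: str, content: bytes) -> str:
    """Derive a repeatable identifier from the file name and its content."""
    digest = hashlib.sha256(file_name.encode("utf-8") + b"\0" + content)
    return digest.hexdigest()[:12]


def preprocess(text: str) -> list[str]:
    """Clean the text, split it into tokens, and drop stop words."""
    cleaned = re.sub(r"\s+", " ", text).strip().lower()  # text cleaning
    tokens = re.findall(r"[a-z0-9]+", cleaned)           # tokenization
    return [t for t in tokens if t not in STOP_WORDS]    # stop word removal


raw = "Add 10 milliliters of solvent B  to the flask."
file_id = assign_identifier("notebook_page_1.txt", raw.encode("utf-8"))
tokens = preprocess(raw)
```

Because the identifier in this sketch depends only on the file name and content, the same source file yields the same identifier each time, which is one possible way to support traceback of the master data to its source file.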
[0029] Furthermore, the processing module (130) is also configured to extract a plurality of components from the unstructured data using natural language processing (NLP) techniques. Typically, the plurality of components is extracted using named entity extraction, key phrase extraction, sentiment analysis, parts of speech tagging, dependency parsing, entity relations, and temporal information extraction. Examples of the plurality of components may include, but are not limited to, drug name, dosage, and expiry date. Typically, the named entity extraction is used to identify a drug name and a disease. For example, given the sentence “X reduces symptoms of Y”, the named entity extraction identifies X as the drug name and Y as the disease.
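As a non-limiting illustration, the extraction of components such as drug name, dosage, and expiry date may be sketched with simple pattern matching. Regular expressions are used here only as a stand-in for the natural language processing techniques named above; the patterns, the field labels, and the sample text are assumptions made for this sketch.

```python
import re

# Hypothetical patterns; an actual embodiment may rely on trained NLP models.
PATTERNS = {
    "drug_name": re.compile(r"Drug name:\s*([A-Za-z][\w-]*)"),
    "dosage": re.compile(r"(\d+(?:\.\d+)?)\s*(mg|g|ml)\b", re.IGNORECASE),
    "expiry_date": re.compile(r"Expiry:\s*(\d{4}-\d{2}-\d{2})"),
}


def extract_components(text: str) -> dict[str, str]:
    """Return the first match of each pattern, keyed by component name."""
    components = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            components[name] = " ".join(match.groups())
    return components


sample = "Drug name: DrugX. Take 5 mg twice daily. Expiry: 2026-01-31."
components = extract_components(sample)
```

A component that is absent from the text is simply omitted from the result, so later stages can detect it as missing data.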
[0030] Moreover, the processing module (130) is configured to map one or more relationships between each of the plurality of components upon identifying connections, interactions, and dependencies between the plurality of components. Typically, mapping of the one or more relationships between each of the plurality of components utilizes a plurality of techniques. The plurality of techniques includes, but is not limited to, named entity relationships, key phrase co-occurrence, sentiment-entity associations, dependency parsing and semantic role labelling, temporal associations, entity relations and knowledge bases, and network analysis. Typically, the named entity relationships technique identifies a relationship between named entities such as drugs, genes, and diseases.
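As a non-limiting illustration, one of the listed techniques, co-occurrence, may be sketched by relating components whose values appear in the same sentence. The sentence splitting rule, the component labels, and the sample text are assumptions made for this sketch.

```python
import re
from itertools import combinations


def cooccurrence_relationships(text: str, components: dict[str, str]) -> set:
    """Return pairs of component labels whose values share a sentence."""
    relationships = set()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        present = sorted(
            label for label, value in components.items() if value in sentence
        )
        relationships.update(combinations(present, 2))
    return relationships


text = "DrugX was given at 5 mg. Nausea was reported."
components = {"drug_name": "DrugX", "dosage": "5 mg", "reaction": "Nausea"}
relationships = cooccurrence_relationships(text, components)
```

In this sample only the drug name and the dosage share a sentence, so only that pair is related; a fuller approach such as dependency parsing could also link the reaction to the drug across sentences.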
[0031] Additionally, the processing module (130) is configured to identify a template from a plurality of predetermined templates based on the plurality of components. Typically, the predetermined templates are created for common document structures, including clinical trial reports and drug labels.
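As a non-limiting illustration, identifying a template based on the extracted components may be sketched as selecting the predetermined template whose fields are best covered by those components. The template names, the field names, and the coverage score are assumptions made for this sketch.

```python
# Hypothetical predetermined templates, each described by its field names.
TEMPLATES = {
    "drug_label": {"drug_name", "dosage", "expiry_date"},
    "adverse_event_report": {
        "drug_name",
        "reaction",
        "patient_demographics",
        "event_severity",
    },
}


def identify_template(component_names: set[str]) -> str:
    """Pick the template with the highest share of fields that were extracted."""

    def coverage(item):
        _, fields = item
        return len(fields & component_names) / len(fields)

    best_name, _ = max(TEMPLATES.items(), key=coverage)
    return best_name


label_choice = identify_template({"drug_name", "dosage"})
event_choice = identify_template({"drug_name", "reaction", "patient_demographics"})
```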
[0032] Further, the processing module (130) is configured to map the plurality of components with the identified template using a contextual analysis technique, thereby creating the master data. The master data is in a custom format. For example, the identified template includes a drug name, reaction to the drug, patient demographics, and event severity. Mapping the plurality of components then yields drug name: A, reaction to the drug: gastrointestinal problem, patient demographics: female, 42 years. Further, the contextual analysis technique ensures that the plurality of components is mapped and interpreted correctly within the data being processed. Examples of the contextual analysis techniques include, but are not limited to, semantic understanding, contextual disambiguation, entity resolution, domain knowledge integration, relationship inference, and error correction and detection.
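As a non-limiting illustration, mapping the extracted components onto the fields of the identified template may be sketched as follows. The alias table is a simplified stand-in for contextual disambiguation, and all names in it are assumptions made for this sketch.

```python
# Hypothetical alias table standing in for contextual disambiguation.
FIELD_ALIASES = {
    "drug": "drug_name",
    "medication": "drug_name",
    "adverse reaction": "reaction",
    "patient": "patient_demographics",
    "severity": "event_severity",
}
TEMPLATE_FIELDS = ["drug_name", "reaction", "patient_demographics", "event_severity"]


def map_to_template(components: dict, fields: list) -> tuple[dict, dict]:
    """Fill template fields from components; return (record, unmapped)."""
    record = {field: None for field in fields}
    unmapped = {}
    for label, value in components.items():
        key = label.strip().lower()
        field = FIELD_ALIASES.get(key, key.replace(" ", "_"))
        if field in record:
            record[field] = value
        else:
            unmapped[label] = value
    return record, unmapped


extracted = {
    "Medication": "A",
    "Adverse reaction": "gastrointestinal problem",
    "Patient": "female, 42 years",
    "Batch": "not in template",
}
record, unmapped = map_to_template(extracted, TEMPLATE_FIELDS)
```

Fields that receive no value remain empty, and components that fit no field are returned separately, so both cases stay visible for later error notification and user verification.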
[0033] Furthermore, the processing subsystem (105) includes a notification generation module (140) operatively coupled to the processing module (130). The notification generation module (140) is configured to notify the user of an occurrence of errors upon creating the master data. Further, the notification generation module (140) is responsible for monitoring the creation of the master data and detecting any errors, inconsistencies, or unexpected issues. Furthermore, the processing module (130) identifies the nature and specifics of the errors, such as missing data, invalid entries, or formatting issues. The notification generation module (140) generates notifications to alert the user about the error. In one embodiment, the notifications are sent via email, SMS, pop-up messages, or dashboard alerts, providing details about the error and potential solutions. In another embodiment, the notification may also offer guidance on how to resolve the error, such as providing troubleshooting steps or directing the user to relevant resources. Further, the user acknowledges the notification and takes appropriate action to address the error.
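As a non-limiting illustration, detecting missing data and invalid entries in a master data record and composing a notification may be sketched as follows. The required fields, the dosage pattern, the record identifier, and the message wording are assumptions made for this sketch; delivery via email, SMS, or dashboard is outside its scope.

```python
import re

# Hypothetical validation rules for a drug label record.
REQUIRED_FIELDS = ("drug_name", "dosage", "expiry_date")
DOSAGE_PATTERN = re.compile(r"\d+(?:\.\d+)?\s*(?:mg|g|ml)")


def detect_errors(record: dict) -> list[str]:
    """List missing required fields and malformed dosage entries."""
    errors = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            errors.append(f"missing data: {field}")
    dosage = record.get("dosage")
    if dosage and not DOSAGE_PATTERN.fullmatch(dosage):
        errors.append(f"invalid entry: dosage={dosage!r}")
    return errors


def build_notification(record_id: str, errors: list[str]):
    """Compose a plain-text notification, or return None when there are no errors."""
    if not errors:
        return None
    lines = [f"Master data record {record_id} has {len(errors)} issue(s):"]
    lines += [f"- {error}" for error in errors]
    return "\n".join(lines)


errors = detect_errors({"drug_name": "A", "dosage": "five"})
notification = build_notification("REC-001", errors)
```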
[0034] Moreover, the processing subsystem (105) includes a verification module (150) operatively coupled to the notification generation module (140). The verification module (150) is configured to enable the user to verify the master data created before uploading into the laboratory information management system and manufacturing execution system. Typically, the verification module (150) offers a user-friendly interface for data managers, analysts, or subject matter experts to access and review the master data. Further, the users compare the master data with the original source documents, reference data, or unstructured documents to ensure accuracy and consistency and to identify discrepancies or errors.
[0035] Additionally, the processing subsystem (105) includes a pre-processing module (160) operatively coupled to the verification module (150). The pre-processing module (160) is configured to convert the master data into a required format acceptable for the laboratory information management system and manufacturing execution system. Typically, the pre-processing module (160) analyses LIMS and MES system format specifications, including data structure, field requirements, data types, and formatting rules. The pre-processing module (160) then maps the master data to the corresponding LIMS and MES formats. This ensures that the data is correctly aligned and integrated into the target systems (LIMS and MES) without any loss of information.
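As a non-limiting illustration, converting a verified master data record into formats for the target systems may be sketched as follows. The choice of CSV for the LIMS and JSON for the MES, the column names, and the wrapper key are assumptions made for this sketch; actual format specifications depend on the target systems.

```python
import csv
import io
import json

# Hypothetical mapping from master data fields to LIMS column names.
LIMS_COLUMNS = {"drug_name": "PRODUCT_NAME", "dosage": "DOSE"}


def to_lims_csv(record: dict) -> str:
    """Render the record as a one-row CSV with LIMS-style column headers."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(LIMS_COLUMNS.values()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerow(
        {target: record.get(source, "") for source, target in LIMS_COLUMNS.items()}
    )
    return buffer.getvalue()


def to_mes_json(record: dict) -> str:
    """Render the record as JSON under a hypothetical wrapper key."""
    return json.dumps({"materialMaster": record}, sort_keys=True)


record = {"drug_name": "A", "dosage": "5 mg"}
lims_payload = to_lims_csv(record)
mes_payload = to_mes_json(record)
```

Keeping the field mapping in a single table makes it possible to check that every master data field has a destination, which is one way to guard against loss of information during conversion.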
[0036] It is to be noted that the system may comprise, but is not limited to, a mobile phone, desktop computer, portable digital assistant (PDA), smart phone, tablet, ultra-book, netbook, laptop, multi-processor system, microprocessor-based or programmable consumer electronic system, or any other communication device that a user may use. In some embodiments, the system may comprise a display module (not shown) to display information (for example, in the form of user interfaces). In further embodiments, the system may comprise one or more of touch screens, accelerometers, gyroscopes, cameras, microphones, global positioning system (GPS) devices, and so forth.
[0037] In one embodiment, the various functional components of the system may reside on a single computer, or they may be distributed across several computers in various arrangements. The various components of the system may, furthermore, access one or more databases, and each of the various components of the system may be in communication with one another. Further, while the components of FIG. 1 are discussed in the singular sense, it will be appreciated that in other embodiments multiple instances of the components may be employed.
[0038] FIG. 2 is a block diagram of an exemplary embodiment of the system for template-based data formatting of FIG. 1 in accordance with an embodiment of the present disclosure. The system (100) includes a feedback module (170) operatively coupled to the verification module (150). The feedback module (170) is configured to enable the user to provide feedback with respect to the master data. Typically, the feedback module (170) offers a user-friendly interface for the users to provide the feedback on the master data. Users may input comments, suggestions or corrections based on the occurrence of the errors. Further, the feedback from the users is fed into the processing module (130). Typically, integration of the user feedback into the processing module (130) enhances the system's ability to continuously improve quality and accuracy of the master data creation.
[0039] In an example, consider a scenario where user ‘X’ (180) is conducting research to develop a new drug. The user ‘X’ (180) records experimental procedures, observations, and results in a paragraph format (plurality of files). Now, the user utilizes template-based data processing to convert the unstructured data to structured data or master data. The receiving module (120) accepts a plurality of files from the user ‘X’. The processing module (130) preprocesses the unstructured data to clean and standardize it, making it suitable for analysis. Further, the processing module (130) uses natural language processing techniques to extract a relevant plurality of components from the unstructured data. The plurality of components includes, but is not limited to, name of a drug, dosage, outcome, reactions, and patient name. Further, the processing module (130) identifies relationships between the plurality of components, say the drug and its effects. Upon extraction, the user ‘X’ (180) selects a template based on the plurality of components extracted. Further, the processing module (130) maps the plurality of components extracted to the template identified. For example, drug names are assigned to the drug name field, and patient information is assigned to the patient details field. Furthermore, the notification generation module (140) alerts the user upon detecting errors. Moreover, the verification module (150) enables the user to verify the master data created. Additionally, the pre-processing module (160) converts the master data into a format acceptable for the LIMS and the MES.
[0040] FIG. 3 is a block diagram of a computer or a server in accordance with an embodiment of the present disclosure. The server (200) includes processor(s) (230), and memory (210) operatively coupled to a bus (220). The processor(s) (230), as used herein, means any type of computational circuit, such as, but not limited to, a microprocessor, a microcontroller, a complex instruction set computing microprocessor, a reduced instruction set computing microprocessor, a very long instruction word microprocessor, an explicitly parallel instruction computing microprocessor, a digital signal processor, or any other type of processing circuit, or a combination thereof.
[0041] The memory (210) includes several subsystems stored in the form of an executable program which instructs the processor (230) to perform the method steps illustrated in FIG. 4(a) and FIG. 4(b). The memory (210) includes the processing subsystem (105) of FIG. 1. The processing subsystem (105) further has the following modules: a receiving module (120), a processing module (130), a notification generation module (140), a verification module (150), and a pre-processing module (160).
[0042] The receiving module (120) is configured to receive a plurality of files from a user for master data creation wherein the plurality of files includes unstructured data. Further, the processing subsystem (105) includes a processing module (130) operatively coupled to the receiving module (120) wherein the processing module (130) is configured to pre-process the unstructured data for ingestion by a large language model to improve the quality of the unstructured data. Further, the processing module (130) is also configured to extract a plurality of components from the unstructured data using natural language processing techniques. Furthermore, the processing module (130) is configured to map one or more relationships between each of the plurality of components upon identifying connections, interactions, and dependencies between the plurality of components. Moreover, the processing module (130) is configured to identify a template from a plurality of predetermined templates based on the plurality of components. Moreover, the processing module (130) is also configured to map the plurality of components with the identified template using a contextual analysis technique, thereby creating the master data wherein the master data is in a custom format. Further, the processing subsystem (105) includes a notification generation module (140) operatively coupled to the processing module (130) wherein the notification generation module (140) is configured to notify the user of an occurrence of errors upon creating the master data. Furthermore, the processing subsystem (105) includes a verification module (150) operatively coupled to the notification generation module (140) wherein the verification module (150) is configured to enable the user to verify the master data created before uploading into the laboratory information management system and manufacturing execution system.
Moreover, the processing subsystem (105) includes a pre-processing module (160) operatively coupled to the verification module (150) wherein the pre-processing module (160) is configured to convert the master data into a required format acceptable for the laboratory information management system and manufacturing execution system.
[0043] The bus (220) as used herein refers to internal memory channels or computer network that is used to connect computer components and transfer data between them. The bus (220) includes a serial bus or a parallel bus, wherein the serial bus transmits data in bit-serial format and the parallel bus transmits data across multiple wires. The bus (220) as used herein, may include but not limited to, a system bus, an internal bus, an external bus, an expansion bus, a frontside bus, a backside bus, and the like.
[0044] FIG. 4(a) illustrates a flow chart representing the steps involved in a method (300) for template-based data formatting in accordance with an embodiment of the present disclosure. FIG. 4(b) illustrates continued steps of the method of FIG. 4(a) in accordance with an embodiment of the present disclosure. The method (300) includes receiving, by a receiving module, a plurality of files from a user for master data creation, wherein the plurality of files includes unstructured data, in step 305. Typically, the unstructured data does not have a predefined model or organization, making it more challenging to analyse and process than structured data. The unstructured data includes text documents, images, videos, audio files, social media posts, emails, and other forms of data that do not have a clear schema, making it more difficult to fit into traditional database or spreadsheet structures. Further, the master data is structured data. Typically, the structured data is organized and formatted according to a predefined schema and is stored in databases or structured file formats such as tables in relational databases or spreadsheets.
[0045] In one embodiment, the user is allowed to input a path of the plurality of files to the receiving module. As used herein, the path is a location or address of the plurality of files in a computer.
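By way of a non-limiting illustration, the path input of this embodiment could be resolved into a list of files as sketched below. The function name `collect_files` and the choice to accept either a single file or a directory are assumptions made for illustration only and are not prescribed by the present disclosure.

```python
from pathlib import Path


def collect_files(path_str: str) -> list[Path]:
    """Resolve a user-supplied path into a sorted list of files.

    Hypothetical helper: accepts either a single file or a directory.
    """
    path = Path(path_str)
    if path.is_file():
        return [path]
    if path.is_dir():
        # Walk the directory tree and keep regular files only.
        return sorted(p for p in path.rglob("*") if p.is_file())
    raise FileNotFoundError(f"No file or directory at: {path_str}")
```

A missing path raises an error rather than returning an empty list, so that a mistyped location is reported to the user instead of silently producing no master data.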
[0046] The method (300) also includes preprocessing, by a processing module, the unstructured data for ingestion by a large language model in step 310. As used herein, the large language model includes, but is not limited to, Mixtral, generative pre-trained transformer (GPT 4), and Gemini Pro.
[0047] In one embodiment, the processing module (130) utilizes an identifier assigned to each of the plurality of files for the master data creation to ensure each of the plurality of files is uniquely identifiable and easily referenced or retrieved when required.
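As a minimal sketch of this embodiment, a unique identifier could be generated for each received file and kept in a lookup table. The use of UUID4 values and the name `assign_identifiers` are assumptions; the disclosure does not specify an identifier scheme.

```python
import uuid
from pathlib import Path


def assign_identifiers(files: list[Path]) -> dict[str, Path]:
    """Map a freshly generated unique identifier to each file.

    Assumed scheme: random UUID4 strings, one per file.
    """
    return {str(uuid.uuid4()): f for f in files}
```

The returned mapping allows a file to be referenced or retrieved later by its identifier alone.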
[0048] Further, the method (300) includes extracting, by the processing module, a plurality of components from the unstructured data using natural language processing techniques in step 315. Examples of the plurality of components may include, but are not limited to, drug name, dosage, and expiry date.
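For illustration only, the extraction of step 315 can be approximated with a simple rule-based pattern matcher. This stands in for the natural language processing techniques, which the disclosure does not specify in detail. The patterns, the label wording ("Drug name:", "Expiry date:"), and the assumed date format are hypothetical.

```python
import re

# Hypothetical patterns for the example components named in the description.
PATTERNS = {
    "drug_name": re.compile(r"Drug(?:\s+name)?\s*:\s*([A-Za-z][A-Za-z \-]*)", re.IGNORECASE),
    "dosage": re.compile(r"(\d+(?:\.\d+)?\s*(?:mg|g|ml|mcg))\b", re.IGNORECASE),
    "expiry_date": re.compile(r"Expiry(?:\s+date)?\s*:\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE),
}


def extract_components(text: str) -> dict[str, str]:
    """Return the first match for each known component found in the text."""
    components = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            components[name] = match.group(1).strip()
    return components
```

This sketch assumes each labelled value sits on its own line; free-running prose would need a trained entity recognizer rather than fixed patterns.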
[0049] In one embodiment, the processing module (130) is configured to use a reinforcement learning technique to enhance accuracy in the master data by incorporating the feedback from the user.
[0050] In one embodiment, the master data is created for at least one of the laboratory information management system (LIMS) and the manufacturing execution system (MES).
[0051] Further, the method (300) includes mapping, by the processing module, one or more relationships between each of the plurality of components upon identifying connections, interactions, and dependencies between the plurality of components in step 320. Typically, named entity relationship extraction is a technique that identifies a relationship between the plurality of components.
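One possible, simplified representation of the relationships of step 320 is a list of (source, relation, target) triples that link an anchor component to the other components found alongside it. The `Relationship` type, the `has_` relation naming, and the choice of the drug name as anchor are assumptions for illustration.

```python
from typing import NamedTuple


class Relationship(NamedTuple):
    source: str
    relation: str
    target: str


def map_relationships(components: dict[str, str], anchor: str = "drug_name") -> list[Relationship]:
    """Link the anchor component to every other component in the same record."""
    if anchor not in components:
        return []
    return [
        Relationship(components[anchor], f"has_{name}", value)
        for name, value in components.items()
        if name != anchor
    ]
```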
[0052] Furthermore, the method (300) includes identifying, by the processing module, a template from a plurality of predetermined templates based on the plurality of components in step 325. Typically, templates are predefined structures or formats that standardize and organize representation of data extracted from unstructured data.
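As a hedged sketch of step 325, a template could be selected by scoring how well the fields of each predetermined template overlap with the names of the extracted components. The Jaccard overlap score, the template names, and their field sets are assumptions; the disclosure does not state the selection criterion.

```python
from typing import Optional

# Hypothetical predetermined templates, each described by its set of fields.
TEMPLATES = {
    "product_spec": {"drug_name", "dosage", "expiry_date"},
    "test_method": {"test_name", "instrument", "limit"},
}


def identify_template(component_names: set[str], templates: dict[str, set[str]]) -> Optional[str]:
    """Return the template whose fields overlap most with the components."""
    best_name, best_score = None, 0.0
    for name, fields in templates.items():
        union = fields | component_names
        # Jaccard similarity: shared fields divided by all distinct fields.
        score = len(fields & component_names) / len(union) if union else 0.0
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```

When no template shares any field with the components, the function returns `None` so that the caller can report the failure instead of forcing a poor match.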
[0053] Moreover, the method (300) includes mapping, by the processing module, the plurality of components with the identified template using a contextual analysis technique, thereby creating the master data, wherein the master data is in a custom format, in step 330. Typically, the contextual analysis technique aids in interpreting the semantic context of the plurality of components from the unstructured data.
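The mapping of step 330 may be pictured, in a deliberately simplified form, as resolving the labels found in the source documents to the fields of the identified template. The alias table below is a hypothetical stand-in for the contextual analysis technique, whose details are not given in the disclosure.

```python
from typing import Optional

# Hypothetical aliases: source labels that should land in each template field.
FIELD_ALIASES = {
    "drug_name": {"drug name", "product", "product name"},
    "dosage": {"dosage", "strength", "dose"},
    "expiry_date": {"expiry date", "expiry", "exp date"},
}


def map_to_template(raw: dict[str, str], aliases: dict[str, set[str]]) -> dict[str, Optional[str]]:
    """Fill each template field from the raw label that matches one of its aliases."""
    record: dict[str, Optional[str]] = {field: None for field in aliases}
    for label, value in raw.items():
        key = label.strip().lower()
        for field, names in aliases.items():
            if key in names:
                record[field] = value
                break
    return record
```

Fields with no matching label are left as `None`, which makes incomplete entries visible to a later error check.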
[0054] Additionally, the method (300) includes notifying, by a notification generation module, an occurrence of errors to the user upon creating the master data in step 335. As used herein, the occurrence of errors refers to detection of mistakes, inaccuracies, or inconsistencies in master data.
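The error detection that precedes the notification of step 335 could, for example, check for the error categories named in the claims of this specification: duplicate records, invalid data format, and incomplete entries. The following is a minimal sketch; the function name, the message wording, and the assumed `YYYY-MM-DD` expiry date format are illustrative assumptions.

```python
from datetime import datetime


def find_errors(records: list[dict], required: list[str]) -> list[str]:
    """Return human-readable error messages for a list of master data records."""
    errors = []
    seen = set()
    for index, record in enumerate(records):
        # Incomplete entry: a required field is missing or empty.
        for field in required:
            if not record.get(field):
                errors.append(f"record {index}: incomplete entry, missing '{field}'")
        # Invalid data format: expiry date must parse as YYYY-MM-DD (assumed format).
        expiry = record.get("expiry_date")
        if expiry:
            try:
                datetime.strptime(expiry, "%Y-%m-%d")
            except ValueError:
                errors.append(f"record {index}: invalid data format for 'expiry_date'")
        # Duplicate record: same required-field values as an earlier record.
        key = tuple(record.get(field) for field in required)
        if key in seen:
            errors.append(f"record {index}: duplicate record")
        seen.add(key)
    return errors
```

The resulting list of messages is what a notification generation module could present to the user.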
[0055] In one embodiment, the notification generation module (140) enables the user to rectify errors in the master data.
[0056] Further, the method (300) includes enabling, by a verification module, the user to verify the master data created before uploading into the laboratory information management system and manufacturing execution system in step 340. Typically, the system provides a user-friendly interface or dashboard where the users access and review the master data created from the unstructured data. This interface displays the master data in a readable and organized format, making it easy for users to navigate and analyze.
[0057] In one embodiment, the verification module (150) is configured to upload the master data into the laboratory information management system and manufacturing execution system upon verifying.
[0058] Furthermore, the method (300) includes converting, by a pre-processing module, the master data into a required format acceptable for the laboratory information management system and manufacturing execution system in step 345.
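As one hedged example of step 345, verified master data records could be serialized to comma-separated values for import. The CSV target and the function name `to_csv` are assumptions; the format actually required depends on the particular LIMS or MES and is not specified in the disclosure.

```python
import csv
import io


def to_csv(records: list[dict], columns: list[str]) -> str:
    """Serialize records to CSV text with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for record in records:
        # Missing fields are written as empty cells.
        writer.writerow({c: record.get(c, "") for c in columns})
    return buffer.getvalue()
```

Fixing the column order in one place keeps the output stable regardless of the order in which components were extracted.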
[0059] Various embodiments of the system and method for template-based data processing as described above provide an understanding of unstructured data upon processing, thereby addressing the challenges of unstructured data by utilizing the template-based data formatting. The template-based data formatting for creation of master data from the unstructured data enhances data quality and accuracy. The system (100) also uses natural language processing to extract the plurality of components and map relationships between the plurality of components, ensuring consistency and compatibility with existing systems (LIMS and MES). The system (100) also features error detection and user validation, promoting data integrity and compliance. Further, the system (100) streamlines data processing, improves efficiency, and enhances the reliability of the master data.
[0060] The techniques described in this disclosure may be implemented, at least in part, in hardware, software, firmware, or any combination thereof. For example, various aspects of the described techniques may be implemented within one or more processors, including one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or any other equivalent integrated or discrete logic circuitry, as well as any combinations of such components. The term “processor” or “processing subsystem” may generally refer to any of the foregoing logic circuitry, alone or in combination with other logic circuitry, or any other equivalent circuitry. A control unit including hardware may also perform one or more of the techniques of this disclosure.
[0061] Such hardware, software, and firmware may be implemented within the same device or within separate devices to support the various techniques described in this disclosure. In addition, any of the described units, modules, or components may be implemented together or separately as discrete but interoperable logic devices. Depiction of different features as modules or units is intended to highlight different functional aspects and does not necessarily imply that such modules or units must be realized by separate hardware, firmware, or software components. Rather, functionality associated with one or more modules or units may be performed by separate hardware, firmware, or software components, or integrated within common or separate hardware, firmware, or software components.
[0062] It will be understood by those skilled in the art that the foregoing general description and the following detailed description are exemplary and explanatory of the disclosure and are not intended to be restrictive thereof.
[0063] While specific language has been used to describe the disclosure, any limitations arising on account of the same are not intended. As would be apparent to a person skilled in the art, various working modifications may be made to the method in order to implement the inventive concept as taught herein.
[0064] The figures and the foregoing description give examples of embodiments. Those skilled in the art will appreciate that one or more of the described elements may well be combined into a single functional element. Alternatively, certain elements may be split into multiple functional elements. Elements from one embodiment may be added to another embodiment. For example, the order of processes described herein may be changed and are not limited to the manner described herein. Moreover, the actions of any flow diagram need not be implemented in the order shown; nor do all of the acts need to be necessarily performed. Also, those acts that are not dependent on other acts may be performed in parallel with the other acts. The scope of embodiments is by no means limited by these specific examples.
Claims:
1. A system (100) for template-based data formatting comprising:
characterized in that,
a processing subsystem (105) hosted on a server, wherein the processing subsystem (105) is configured to execute on a network (115) to control bidirectional communications among a plurality of modules comprising:
a receiving module (120) is configured to receive a plurality of files from a user for master data creation wherein the plurality of files comprises unstructured data;
a processing module (130) operatively coupled to the receiving module (120) wherein the processing module (130) is configured to:
pre-process the unstructured data for ingestion by a large language model;
extract a plurality of components from the unstructured data using natural language processing techniques;
map one or more relationships between each of the plurality of components upon identifying, connections, interactions, and dependencies between the plurality of components;
identify a template from a plurality of predetermined templates based on the plurality of components; and
map the plurality of components with the identified template using contextual analysis technique thereby creating the master data wherein the master data is in a custom format;
a notification generation module (140) operatively coupled to the processing module (130) wherein the notification generation module (140) is configured to notify an occurrence of errors to the user upon creating the master data;
a verification module (150) operatively coupled to the notification generation module (140) wherein the verification module (150) is configured to enable the user to verify the master data created before uploading into the laboratory information management system and manufacturing execution system; and
a pre-processing module (160) operatively coupled to the verification module (150) wherein the pre-processing module (160) is configured to convert the master data into a required format acceptable for the laboratory information management system and manufacturing execution system.
2. The system (100) as claimed in claim 1, wherein the processing module (130) utilizes an identifier assigned to each of the plurality of files for the master data creation to ensure each of the plurality of files is uniquely identifiable and easily referenced or retrieved when required.
3. The system (100) as claimed in claim 1, wherein the notification generation module (140) enables the user to rectify errors in the master data.
4. The system (100) as claimed in claim 1, wherein the verification module (150) is configured to upload the master data into the laboratory information management system and manufacturing execution system upon verifying.
5. The system (100) as claimed in claim 1, comprising a feedback module (170) operatively coupled to the verification module (150) wherein the feedback module (170) is configured to enable the user to provide a feedback with respect to the master data.
6. The system (100) as claimed in claim 5, wherein the feedback from the users is fed into the processing module.
7. The system (100) as claimed in claim 1, wherein the processing module (130) is configured to use a reinforcement learning technique to enhance accuracy in the master data by incorporating the feedback from the user.
8. The system (100) as claimed in claim 1, wherein the master data is created for at least one of the laboratory information management system (LIMS) and the manufacturing execution system (MES).
9. The system (100) as claimed in claim 1, wherein the errors comprise at least one of duplicate records, invalid data format, and incomplete entries in the master data.
10. A method (300) for template-based data formatting comprising:
characterized in that,
receiving, by a receiving module, a plurality of files from a user for master data creation wherein the plurality of files comprises unstructured data; (305)
preprocessing, by a processing module, the unstructured data for ingestion by a large language model; (310)
extracting, by the processing module, a plurality of components from the unstructured data using natural language processing techniques; (315)
mapping, by the processing module, one or more relationships between each of the plurality of components upon identifying, connections, interactions, and dependencies between the plurality of components; (320)
identifying, by the processing module, a template from a plurality of predetermined templates based on the plurality of components; (325)
mapping, by the processing module, the plurality of components with the identified template using contextual analysis technique thereby creating the master data wherein the master data is in a custom format; (330)
notifying, by a notification generation module, an occurrence of errors to the user upon creating the master data; (335)
enabling, by a verification module, the user to verify the master data created before uploading into the laboratory information management system and manufacturing execution system; (340) and
converting, by a pre-processing module, the master data into a required format acceptable for the laboratory information management system and manufacturing execution system. (345)
Dated this 25th day of April 2024
Signature
Jinsu Abraham
Patent Agent (IN/PA-3267)
Agent for the Applicant
| # | Name | Date |
|---|---|---|
| 1 | 202441032944-STATEMENT OF UNDERTAKING (FORM 3) [25-04-2024(online)].pdf | 2024-04-25 |
| 2 | 202441032944-REQUEST FOR EARLY PUBLICATION(FORM-9) [25-04-2024(online)].pdf | 2024-04-25 |
| 3 | 202441032944-PROOF OF RIGHT [25-04-2024(online)].pdf | 2024-04-25 |
| 4 | 202441032944-POWER OF AUTHORITY [25-04-2024(online)].pdf | 2024-04-25 |
| 5 | 202441032944-FORM-9 [25-04-2024(online)].pdf | 2024-04-25 |
| 6 | 202441032944-FORM FOR STARTUP [25-04-2024(online)].pdf | 2024-04-25 |
| 7 | 202441032944-FORM FOR SMALL ENTITY(FORM-28) [25-04-2024(online)].pdf | 2024-04-25 |
| 8 | 202441032944-FORM 1 [25-04-2024(online)].pdf | 2024-04-25 |
| 9 | 202441032944-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [25-04-2024(online)].pdf | 2024-04-25 |
| 10 | 202441032944-EVIDENCE FOR REGISTRATION UNDER SSI [25-04-2024(online)].pdf | 2024-04-25 |
| 11 | 202441032944-DRAWINGS [25-04-2024(online)].pdf | 2024-04-25 |
| 12 | 202441032944-DECLARATION OF INVENTORSHIP (FORM 5) [25-04-2024(online)].pdf | 2024-04-25 |
| 13 | 202441032944-COMPLETE SPECIFICATION [25-04-2024(online)].pdf | 2024-04-25 |
| 14 | 202441032944-STARTUP [26-04-2024(online)].pdf | 2024-04-26 |
| 15 | 202441032944-FORM28 [26-04-2024(online)].pdf | 2024-04-26 |
| 16 | 202441032944-FORM 18A [26-04-2024(online)].pdf | 2024-04-26 |
| 17 | 202441032944-FORM-26 [28-05-2024(online)].pdf | 2024-05-28 |
| 18 | 202441032944-FER.pdf | 2024-07-29 |
| 19 | 202441032944-FORM 3 [08-08-2024(online)].pdf | 2024-08-08 |
| 20 | 202441032944-FER_SER_REPLY [04-12-2024(online)].pdf | 2024-12-04 |
| 21 | 202441032944-US(14)-HearingNotice-(HearingDate-29-04-2025).pdf | 2025-03-12 |
| 22 | 202441032944-FORM-8 [02-04-2025(online)].pdf | 2025-04-02 |
| 23 | 202441032944-FORM-26 [25-04-2025(online)].pdf | 2025-04-25 |
| 24 | 202441032944-Correspondence to notify the Controller [25-04-2025(online)].pdf | 2025-04-25 |
| 25 | 202441032944-Written submissions and relevant documents [13-05-2025(online)].pdf | 2025-05-13 |
| 1 | Document1E_26-07-2024.pdf |