
A System And Method Of Direct Context Based Structured Extraction And Refinement Of Target Information From A Document In A Computing System

Abstract: A system and a method for structured extraction and refinement of target information from a new document (204(1-M)) in a computing system (100) is disclosed. In accordance therewith, a server (102(1-N)) of the computing system (100) executing a computing platform engine (170) trains the computing platform engine (170) with a number of documents (210(1-L)) including a first number of components of information (212(1-L)) analogous to a second number of components of target information (224) to identify the first number of components in accordance with an ML model (220). In accordance with the training (506), the server identifies, in the new document, the second number of components of the target information distributed thereacross, contextually generates the target information with the identified second number of components structured in a specific format, and refines the target information in the specific format in accordance with a downstream processing requirement of a client device (106(1-M)). Figure to be published along with the Abstract: Fig. 5


Patent Information

Application #:
Filing Date: 20 September 2023
Publication Number: 12/2025
Publication Type: INA
Invention Field: PHYSICS
Status:
Parent Application:

Applicants

Intellect Design Arena Limited
No. 244, Anna Salai, Chennai- 600 006 Tamil Nadu, India Email id: ip@eshwars.com

Inventors

1. Kishore Kumar Uthirapathy
Intellect Design Arena Ltd. Plot No. 3/G3, SIPCOT IT Park, Siruseri Chennai, Tamil Nadu 600130
2. Mandar Buddhikot
Intellect Design Arena Ltd. Plot No. 3/G3, SIPCOT IT Park, Siruseri Chennai, Tamil Nadu 600130
3. Peethambaram. M
Intellect Design Arena Ltd. Plot No. 3/G3, SIPCOT IT Park, Siruseri Chennai, Tamil Nadu 600130
4. Raman Jatkar
Intellect Design Arena Ltd. Plot No. 3/G3, SIPCOT IT Park, Siruseri Chennai, Tamil Nadu 600130

Specification

Description:
FIELD OF INVENTION
The present invention relates generally to the field of document information extraction and, more particularly, to direct context-based structured extraction and refinement of target information from a document in a computing system.

BACKGROUND OF THE INVENTION
With an increasing number of businesses transitioning to online operations, data has become the lifeblood of the economy. As a result, the value of data has grown immensely. Effective analysis and utilization of this data requires collection, extraction, and/or transformation thereof into specific formats required by business and/or industry.
Extraction and/or transformation of data may play a vital role in enabling businesses and/or industries to analyze and/or evaluate essential information for providing services to customers and/or clients thereof. Data extraction may involve the collection and/or retrieval of various types of data from diverse sources, often characterized by poor organization and/or complete lack of structure. The process of data extraction and/or transformation may simplify data consolidation, processing and/or refinement, allowing end users to tailor the data to specific business needs thereof.
Data extraction may be integral to data cleaning, organization and/or storage preparation, and/or for analytical purposes. Data extraction may involve consolidating and/or unifying disparate data for subsequent transformation.
Traditional data extraction methods may encompass manual extraction, which relies solely on human effort without the aid of software or tools. However, these methods may be time-consuming and/or prone to errors. Optical Character Recognition (OCR)-based data extraction involves extracting data from printed and/or scanned documents, and template-based data extraction utilizes predefined patterns to extract text from unstructured documents. In such data extraction processes, a lot of time-consuming human interference and effort may be required.
In a manual data extraction process, the user may first receive a file and read the contents thereof. Then, the user may enter the data into a system, which may apply rules to the data. Subsequently, the data is pushed to an end device or a system. To make this more efficient, a recent and evolving technique involving Artificial Intelligence (AI)-enabled data extraction may be applied; this AI-based extraction may leverage AI and Machine Learning (ML)-based techniques and/or models to extract and/or transform diverse types of data.
The present invention solves the abovementioned efficiency-related issue with a human-like, and even better, level of accuracy through an AI-enabled system. The present invention, in the form of a system, a device, and/or a method, relates to direct context-based structured extraction and refinement of target information from a document in a computing system.

Various prior arts have disclosed data extraction and transformation techniques.
US Patent Application US20180165416A1 relates to a method of providing healthcare-related, blockchain-associated cognitive insights, where the focus is on obtaining insights from healthcare data using ML. Insights are gained based on the context of the appearance of the data in input files. The data is extracted, normalized, and transformed. This method uses a combination of data collection, processing, and ML algorithms (supervised, unsupervised, and probabilistic reasoning) to provide the healthcare-related cognitive insights.
In this prior art, the context of generating cognitive insights and word disambiguation is discussed generally; in contrast, the present invention discloses a targeted approach to the training of an ML model based on a direct context supplied thereto for the training. Further, this prior art pertains to cognitive insights that are based on semantic analyses, common-sense reasoning, Natural Language Processing (NLP), temporal/spatial reasoning, etc. Here, classification models are discussed generically, whereas, in the proposed invention, NLP and/or deep learning technologies, along with business validation and direct context, are used to achieve classification in a targeted manner.
Furthermore, this prior art uses cognitive graphs to generate cognitive insight streams. Structured and unstructured data are processed to provide content descriptions and summaries. In this prior art, blockchain data is used by cognitive platforms for the delivery of digital assets, whereas, in the proposed invention, data is dispatched and used for making business decisions such as insurance-related clearance, knockout, etc.
This prior art discloses providing cognitive insight streams, whereas the present invention provides custom transformation of data. In this prior art, ingested data is fed to an enrichment component and the data enrichment is in the form of sentiment analysis, geotagging, entity detection, familiar filtering, etc., whereas, on the other hand, the proposed invention provides data enrichment in the form of filling in missing data and/or editing erroneous data.
Another US patent, US11379506B2, purports to perform similarity analysis, for which synonymous data is found and contextual understanding of data is required. The data is contextually extracted, classified, and transformed once validated. The system implemented in this prior art performs similarity metric analysis and data enrichment using knowledge sources. Not much is revealed about the context in which the system envisioned in the prior art is employed. Further, the transformation methods are different from those of the present invention. The data enrichment process disclosed in this prior art is not related to patching missing data from the document.
The present invention is set apart from all the prior arts by virtue of critical features thereof with regard to training an ML model with documents including information sets similar or analogous to components of target information based on providing direct or active context to the training through a document, email content and/or content accessible via a File Transfer Protocol (FTP) link, a file management/document management system and the like, decision making based on learning (e.g., deep learning) by self or external judgment, data extraction also based on direct or active context provided thereto, and data enrichment.
Specifically, the present invention employs a direct or active context document and/or content to train an ML model to identify information similar or analogous to components of target information, which has not been done in any of the prior arts disclosed above. Even if one or more of the prior arts discussed above discloses contexts, these contexts are indirect contexts, implying that the system or the model involved therein figures out the contexts based on user/content interaction and/or characteristics. It is especially important to note that direct or active context documents have not only been used for training purposes but also have been used for data extraction. The ML model or the setup may even modify the context document based on feedback from outputs of the ML model and/or learnings therefrom.
The present invention enables the extraction of data from various documents such as spreadsheets (e.g., Microsoft® Excel® spreadsheets). For instance, in the insurance industry, a Statement of Values (SOV) document may be obtained on behalf of clients/prospective clients seeking insurance usually through brokers. The SOV document may be used by insurance companies to assess the value of the insured entity (e.g., property) and determine the appropriate coverage and/or premiums. The SOV document may typically be prone to data extraction errors as the SOV may be associated with forms or Microsoft® Excel® templates. These forms or Microsoft® Excel® templates to be used by the brokers are not standardized and may vary vastly across one another. Insurance companies use the information in the SOV document to calculate potential risk exposure and set appropriate coverage limits. Specific values listed in the document may help determine the insurance premium a policyholder or a prospective policyholder needs to pay.
The system implementation of the present invention may be customized according to use cases and may be implementation agnostic. The system implementation may execute on a standalone server, a data processing device, a number of data processing devices or servers, and/or in a cloud and/or a distributed network environment including servers and/or data processing devices.

OBJECTS OF THE INVENTION
The main objective of the present invention is to provide a method, a device and/or a system that provides for direct or active context-based structured extraction and refinement of target information from a document in a computing system.
A primary objective of the present invention is to provide an ML model with documents that include information sets that are similar or analogous to components of target information.
Yet another objective of the present invention is to provide the direct or the active context to the training of the ML model in the form of a document, general content, and/or email content accessible by the ML model.
Further, it is an object of the present invention to provide direct or active context for data extraction.
It is another object of the present invention to provide data enrichment by replacing and/or editing existing information and/or filling in missing information.
It is another object of the present invention to provide an intelligent data extraction, classification, processing, and transformation system to enable the input of data in various formats.
It is another object of the present invention to employ a learning model to conduct learning of data sets to enable application thereof during data capture and/or updates to data.
It is another object of the present invention to perform contextual data understanding and extraction to facilitate data classification.
It is another object of the present invention to provide an implementation-agnostic system to facilitate deployment of one or more elements thereof on any type of server.
It is another object of the present invention to ingest files from different sources and in different formats, extract data therefrom in a contextualized manner, and normalize the extracted data.
It is another object of the present invention to take feedback from an output of the ML model and process the feedback as learnings through the ML model.
It is another object of the present invention to perform data enrichment in various sheets and tabs of spreadsheet documents in a contextualized manner and implement learnings from outputs of the ML model.
It is another object of the present invention to provide a system implementing the above objectives in an insurance-processing context that requires significant data transformation and/or manipulation.
It is another object of the present invention to provide a system to read, classify, transform, normalize, and present actuarial data.
It is another object of the present invention to extract, process, and present actuarial data based on the context of the training data set provided and the format specified therein.
It is another object of the present invention to reduce the time required for processing actuarial data and increase the number of insurance applications processed through a system implementing the present invention.
It is another object of the proposed invention to take in unstructured data and provide a user-requested and/or a required format of processed actuarial data, corresponding to data in an input document.
It is another object of the present invention to prevent underinsurance due to errors in interpreting and/or processing input data during insurance processing.
SUMMARY OF THE INVENTION

The present invention discloses a system and a method for direct context-based structured extraction and refinement of target information from a document in a computing system.
In one aspect, a method includes executing, through one or more server(s), instructions associated with a computing platform engine, and training the computing platform engine with a number of documents including a first number of components of information analogous to a second number of components of target information to identify the first number of components in accordance with an ML model. The first number of components is distributed across the number of documents in varied layouts and/or in non-standardized formats.
In accordance with the training and through the computing platform engine, the method also includes extracting interpretational context with respect to a new document from content distinct from the new document, identifying, in the new document, the second number of components of the target information distributed across the new document based on the extracted interpretational context, generating the target information with the identified second number of components structured in a specific format, and refining the target information in the specific format in accordance with a downstream processing requirement of a client device communicatively coupled to the one or more server(s) through a computer network.
In another aspect, a server includes a memory including instructions associated with a computing platform engine stored therein, and a processor executing the instructions associated with the computing platform engine to train the computing platform engine with a number of documents including a first number of components of information analogous to a second number of components of target information to identify the first number of components in accordance with an ML model. The first number of components is distributed across the number of documents in varied layouts and/or in non-standardized formats.
In accordance with the training, the processor further executes the instructions associated with the computing platform engine to extract interpretational context with respect to a new document from content distinct from the new document, identify, in the new document, the second number of components of the target information distributed across the new document based on the extracted interpretational context, generate the target information with the identified second number of components structured in a specific format, and refine the target information in the specific format in accordance with a downstream processing requirement of a client device communicatively coupled to the server through a computer network.
In yet another aspect, a computing system includes a server executing a computing platform engine thereon, and a client device communicatively coupled to the server through a computer network. The server executes the computing platform engine to train the computing platform engine with a number of documents including a first number of components of information analogous to a second number of components of target information to identify the first number of components in accordance with an ML model. The first number of components is distributed across the number of documents in varied layouts and/or in non-standardized formats.
In accordance with the training, the server also executes the computing platform engine to extract interpretational context with respect to a new document from content distinct from the new document, identify, in the new document, the second number of components of the target information distributed across the new document based on the extracted interpretational context, generate the target information with the identified second number of components structured in a specific format, and refine the target information in the specific format in accordance with a downstream processing requirement of the client device.
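The claimed sequence of training, context extraction, identification, generation, and refinement can be sketched in simplified form as follows. This is a toy illustration only: every class and function name below is an assumption for exposition and does not appear in the specification, and the "model" merely memorizes component labels rather than performing actual ML.

```python
from dataclasses import dataclass, field

@dataclass
class ToyModel:
    # Stand-in for ML model 220: remembers which component labels it has seen.
    known_labels: set = field(default_factory=set)

    def fit(self, document: dict) -> None:
        # Training documents carry components analogous to the target information.
        self.known_labels.update(document.keys())

def extract_context(content: str) -> set:
    # Interpretational context from content distinct from the new document:
    # here, simply the labels named in that content.
    return {word.strip(",.").lower() for word in content.split()}

def identify_components(model: ToyModel, document: dict, context: set) -> dict:
    # Keep only components the model was trained on AND the context mentions.
    return {k: v for k, v in document.items()
            if k in model.known_labels and k.lower() in context}

def refine(components: dict, required_keys: list) -> dict:
    # Refinement per a downstream requirement: fixed key order, gaps as None.
    return {k: components.get(k) for k in required_keys}

# Train on documents whose components vary in layout but are analogous.
model = ToyModel()
for doc in [{"address": "1 Main St", "value": 100},
            {"address": "2 Oak Ave", "value": 250, "notes": "n/a"}]:
    model.fit(doc)

new_doc = {"address": "5 Elm Rd", "value": 300, "disclaimer": "..."}
context = extract_context("The attached sheet lists address and value fields.")
target = refine(identify_components(model, new_doc, context),
                ["address", "value", "county"])
print(target)  # {'address': '5 Elm Rd', 'value': 300, 'county': None}
```

The sketch shows only the data flow between the claimed steps; the specification's actual engine applies trained ML models, business validation, and direct context at each stage.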
Other features will be apparent from the accompanying drawings and from the detailed description that follows.

BRIEF DESCRIPTION OF THE DRAWINGS
The embodiments of this invention are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements and in which:
Figure 1 is a schematic view of a computing system, according to one or more embodiments.
Figure 2 is a schematic view of training a computing platform engine executing on a server in a ML environment of the computing system of Figure 1, according to one or more embodiments.
Figure 3 is a schematic view of a normalization module of the computing platform engine of Figures 1-2 performing a normalization process following a classification process, according to one or more embodiments.
Figure 4 is a schematic view of a transformation module of the computing platform engine of Figures 1-3 effecting a transformation of a common structure of data, according to one or more embodiments.
Figure 5 is a process flow diagram detailing the operations involved in contextual and structured extraction, normalization, and refinement of target information from a document in the computing system of Figure 1, according to one or more embodiments.
Other features of the present embodiments will be apparent from the accompanying drawings and from the detailed description that follows.

DETAILED DESCRIPTION OF THE INVENTION WITH RESPECT TO THE DRAWINGS
Example embodiments, as described below, may be used to provide a system and a method for direct context-based structured extraction and refinement of target information from a document in a computing system. Although the present embodiments have been described with reference to specific example embodiments, it will be evident that various modifications and changes may be made to these embodiments without departing from the broader spirit and scope of the various embodiments.

Figure 1 shows a computing system 100 (e.g., a distributed computing system), according to one or more embodiments. In one or more embodiments, computing system 100 may include a number of servers 102(1-N) communicatively coupled to one another through a computer network 104. In one or more embodiments, computing system 100 may also include a number of client devices 106(1-M) communicatively coupled to the number of servers 102(1-N) through computer network 104. In one or more embodiments, computer network 104 may include but is not limited to a Wide Area Network (WAN), a Local Area Network (LAN), and/or a short-range network such as WiFi® and Bluetooth®. Other forms of computer network 104 are within the scope of the exemplary embodiments discussed herein.

In one or more embodiments, computing system 100 may be a traditional computing system with data centers, a cloud-based computing system, or a hybrid computing system. In one or more embodiments, computing system 100 may provide one or more computing environment(s) through a computing platform 150 thereof. Computing platform 150, as discussed herein, may encompass hardware, software, operating systems, applications including browsers, and/or virtualized environments including virtual machines representing operating systems, virtual storage etc.
Figure 1 shows a computing platform engine 170 (e.g., a comprehensive engine enabling capabilities and functionalities provided through computing platform 150) executing on a server 102(1) and components (e.g., computing platform engine components 170(2-N); computing platform engine components 176(1-M)) executing on other servers 102(2-N) and client devices 106(1-M). Examples of client devices 106(1-M) may include but are not limited to desktop computers, laptops, notebooks, smart devices, and mobile phones. Other client devices 106(1-M), including thin clients, are within the scope of the exemplary embodiments discussed herein. Each client device 106(1-M) may represent a type of channel for accessing capabilities provided through computing platform engine 170.
Each server 102(1-N) discussed above may be a standalone server or a network of servers configured to operate in conjunction with one another and capable of handling large sets of data. "Processor," as used herein, may also refer to a number of processors configured to operate in conjunction with one another. In one or more embodiments, server 102(1) may include a processor 172(1) communicatively coupled to a memory 174(1).
Figure 1 shows computing platform engine 170 stored in memory 174(1); computing platform engine 170 is executable through processor 172(1). Similarly, computing platform engine components 170(2-N) and computing platform engine components 176(1-M) may be stored in corresponding memories to be executed on processors associated therewith. All processes and operations discussed herein may be performed through computing platform engine 170 and/or components thereof.
In one example implementation, computing platform 150 (or computing platform engine 170) may implement an insurance processing system. An insured party (e.g., a company, an individual) may submit to an insurance company a Statement of Values (SOV) in the form of a document (e.g., a Microsoft® Excel® spreadsheet). Said document may include, for example, one or more addresses, monetary value of the insurables (e.g., in a property) and Construction Occupancy Protection Exposure (COPE) information of the insurables (e.g., properties). In a typical setup, insurance brokers may create an SOV report for the insurables (e.g., commercial properties) and submit it to insurance companies.
Each broker may maintain their own standards with regard to the SOV report, with there being no standardized template or format for the SOV report. A typical SOV report may predominantly include spreadsheets. Interpretation of the SOV report by an underwriter (e.g., at a client device 106(1-M) executing computing platform engine component 176(1-M); the underwriter may be an individual, a banking institution, a financial institution, and so on) on behalf of one or more insurance companies may, thus, result in inaccuracies and errors. Further, the same SOV report may be interpreted differently by different underwriters who evaluate risks associated with the insured in terms of, say, the business/employment associated therewith and/or the properties thereof. Because of the inaccuracies and the ambiguities that compromise the accuracy of the values, the underwriter may accept the risk in exchange for a premium, but the insured may be "underinsured."

To solve the issues discussed above, an insurance company may itself manually extract information from the spreadsheets constituting the SOV report and format the spreadsheets in a way acceptable to a downstream process in computing system 100. This may prove to be tedious as the extraction, for example, may involve formulating tabular data, modifying columns, and applying business rules; the manual nature of a lot of these may also prove to be time consuming. This set of disadvantages may limit the number of insurances provided through the insurance companies.
Also, the information may be extracted from insurance-related documents by entities such as Knowledge Process Outsourcing (KPO)/Business Process Outsourcing (BPO) companies. Examples of these insurance-related documents may include but are not limited to property documents, liability accounts documents, and commercial verification related documents, and examples of the information extracted may include but are not limited to (user) name, location, and risks. It should be noted that the exemplary embodiments discussed herein relate to SOV or spreadsheets merely for the sake of illustration and that the concepts embedded herein are extensible to all forms of information extractable through documents, including spreadsheets.
Figure 2 shows training computing platform engine 170 on server 102(1) (e.g., a network of servers, or servers 102(1-N) in general) in an ML environment of computing system 100, according to one or more embodiments. Here, in one or more embodiments, a user 202(1-M) (e.g., an insurance broker, an underwriter) at a client device 106(1-M) may transmit one or more spreadsheet documents (e.g., spreadsheet document 204(1-M)) to another user 202(1-M) (e.g., an entity associated with computing platform engine 170) at another client device 106(1-M); alternatively, spreadsheet document 204(1-M) may be transferred to the other client device 106(1-M) through storage means.
In one or more embodiments, the document (e.g., spreadsheet document 204(1-M)) may be sent by e-mail by a first user 202(1-M) at a first client device 106(1-M) to an e-mail account of a second user 202(1-M) at a second client device 106(1-M). In one or more embodiments, the e-mail (e.g., e-mail 240) itself may include text content that provides context (e.g., context 242) relevant to the document attached with the e-mail. For example, context 242 may include instructions and statements relevant to the interpretation of spreadsheet document 204(1-M). In another implementation, spreadsheet document 204(1-M) and context 242 may be available as separate documents downloadable from a web source/the cloud or from an access-control-based setting. In yet another implementation, Application Programming Interfaces (APIs) serving as connections to download spreadsheet document 204(1-M) and/or context 242 may be available. Here, the title of the document relevant to context 242 and/or content thereof may be utilized in interpreting spreadsheet document 204(1-M).
In one or more other implementations, even intelligent agents such as chatbots or Robotic Process Automation ("RPA") bots may supply context 242 relevant to interpreting the document discussed above. In one or more embodiments, e-mail 240 discussed above may have replies (e.g., including further attached documents) thereto in a chain; a reply may further clarify context 242 (and may be interpreted as context 242) and may correct it, leading to a revised context 242. Here, in one or more embodiments, the replies may be used by computing platform engine 170 to clarify or modify context 242. Further, in one or more other embodiments, the document and/or context 242 may be scanned or captured on one or more image(s) through a client device 106(1-M) such as a mobile phone. Still further, context 242 may be accessible via, say, a File Transfer Protocol (FTP) link, a file management system, and/or a document management system. All reasonable variations are within the scope of the exemplary embodiments discussed herein.
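The reply-driven revision of context 242 described above can be illustrated with a minimal sketch, in which later replies in an e-mail chain override earlier instructions. The "key: value" parsing convention and the function names here are assumptions for illustration only, not the invention's actual context representation.

```python
def parse_context(body: str) -> dict:
    # Read "key: value" instruction lines out of an e-mail body.
    context = {}
    for line in body.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            context[key.strip().lower()] = value.strip()
    return context

def merge_chain(messages: list) -> dict:
    # Messages are ordered oldest-first; each reply clarifies or corrects
    # the context, so later values win.
    revised = {}
    for body in messages:
        revised.update(parse_context(body))
    return revised

chain = [
    "currency: USD\nsheet of interest: Locations",
    "Correction -\ncurrency: EUR",  # a reply revising the earlier context
]
print(merge_chain(chain))
# {'currency': 'EUR', 'sheet of interest': 'Locations'}
```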
Figure 2 shows memory 174(1) of server 102(1) as including a number of training spreadsheet documents 210(1-L), according to one or more embodiments. In one or more embodiments, training spreadsheet documents 210(1-L) may have SOV 212(1-L) distributed across different layouts (and/or non-standardized formats) thereof such that SOV 212(1-L) in one training spreadsheet document 210(1-L) may be in a different layout (and/or in a different non-standardized format) compared to SOV 212(1-L) of another training spreadsheet document 210(1-L).
In one or more embodiments, training spreadsheet documents 210(1-L) may, thus, feed an ML model 220 of computing platform engine 170 with enough training data to learn from. Thus, in one or more embodiments, even if spreadsheet document 204(1-M) is in a non-standardized form, computing platform engine 170 may be trained enough to extract only the relevant information (e.g., desired SOV 224) therefrom out of all the other information present therein. As will be discussed below, in one or more embodiments, following the identification of desired SOV 224, business rules may be applied "on-the-fly."
In one or more embodiments, a human expert or supervisor 226 of ML model 220 may be present at yet another client device 106(1-M) of computing system 100. In one or more embodiments, this human expert or supervisor 226 may optionally supervise the results of ML model 220 by way of desired SOV 224 through a corresponding computing platform engine component 176(1-M) (e.g., using a Graphical User Interface ("GUI")) thereof. In one or more embodiments, the optionality of the human supervision and the utilization thereof by computing platform engine 170 only when required may significantly reduce the human resources and the time invested in the extraction of desired SOV 224 from spreadsheet document 204(1-M). In one or more embodiments, this may significantly increase the number of insurance applications that are processed by insurance companies/brokers/underwriters (or other personnel).
In one or more embodiments, the extraction processes relevant to SOV 224 and other information may involve both text extraction (e.g., scanning, Optical Character Recognition (“OCR”) based extraction, Portable Document Format (“PDF”) extraction) and trained ML models for contextual extraction (e.g., in conjunction with context 242).
In one or more embodiments, as part of identifying desired SOV 224, computing platform engine 170 may, in conjunction with ML model 220, subject spreadsheet document 204(1-M) to a classification process 250. For example, an uploaded (or e-mailed) spreadsheet document 204(1-M) may have a number of worksheets or tabs therein, out of which a subset of the worksheets or tabs includes desired SOV 224 in varied layouts and/or non-standardized formats. In one or more embodiments, desired SOV 224 may be duplicated across worksheets or tabs of the subset, or separate elements thereof may be spread across the worksheets or tabs of the subset.
In one or more embodiments, the worksheets/tabs of the subset may also include information such as but not limited to images, headers, footers, descriptions, and disclaimers. In one or more embodiments, as the first operation of the classification process, tabular information may be identified, with headers and data rows identified separately. In one or more embodiments, once tabular information across the worksheets of the subset discussed above is identified, a classification module 260 of computing platform engine 170 may apply a classification model 270 on the headers and the data rows. In one or more embodiments, all the worksheets of the subset may go through classification model 270 as a repetitive process. Figure 2 shows spreadsheet document 204(1-M) stored in memory 174(1) along with training spreadsheet documents 210(1-L). Figure 2 also shows classified data 280 (e.g., classified tabular data from the subset) as an output of classification module 260 and as an input to a normalization module 262 communicatively coupled to classification module 260.
In one or more embodiments, as part of classification process 250, each parameter of information (e.g., SOV 224) to be extracted may be baselined at a highly granular level; therefore, the result of classification process 250, i.e., classified data 280, may have a high likelihood of being accurate.
Figure 3 shows normalization module 262 performing a normalization process following the classification process, according to one or more embodiments. In one or more embodiments, the normalization process may be applied to each column header and column data of classified data 280 corresponding thereto. In one or more embodiments, once specific columns of classified data 280 are identified in accordance with ML model 220, the normalization of column headers may be accomplished. In one example scenario, spreadsheet document 2041-M may include a specific address in various column titles across worksheets thereof. One worksheet 3101 of classified data 280 may include the address under a column title "Full Address," another worksheet 3102 may distribute the address under multiple column titles such as "Street Address," "City," "State," and "Zip," yet another worksheet 3103 may distribute the address across column titles such as "Address Line 1," "Address Line 2," "City," "State," "Zip Code," and "Country," and so on.
In one or more embodiments, worksheets of classified data 280 may be processed through normalization module 262 to normalize the address (example desired SOV 224 or a part thereof) into a common structure 350 such as: ["Address Line 1," "Address Line 2," "Address Line 3," "City," "State," "Zip Code," "County," "Country"]. All reasonable variations are within the scope of the exemplary embodiments discussed herein.
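The address normalization described above can be sketched as a mapping from varied column titles into common structure 350. The synonym table is illustrative only (in the embodiments, ML model 220 drives this mapping rather than a fixed lookup):

```python
# Sketch of normalization module 262: fold varied column titles from
# classified worksheets into common structure 350.
COMMON_STRUCTURE = ["Address Line 1", "Address Line 2", "Address Line 3",
                    "City", "State", "Zip Code", "County", "Country"]

SYNONYMS = {                       # hypothetical synonym table
    "full address": "Address Line 1",
    "street address": "Address Line 1",
    "address line 1": "Address Line 1",
    "address line 2": "Address Line 2",
    "city": "City",
    "state": "State",
    "zip": "Zip Code",
    "zip code": "Zip Code",
    "county": "County",
    "country": "Country",
}

def normalize_headers(headers):
    """Return a dict keyed by the common-structure field names, each
    holding the source column title it was normalized from (or None)."""
    normalized = dict.fromkeys(COMMON_STRUCTURE)
    for title in headers:
        target = SYNONYMS.get(title.strip().lower())
        if target is not None:
            normalized[target] = title
    return normalized

# Worksheet 310-2 style headers collapse into the common structure:
result = normalize_headers(["Street Address", "City", "State", "Zip"])
print([k for k, v in result.items() if v])
# ['Address Line 1', 'City', 'State', 'Zip Code']
```

The same call handles worksheet 310-1 ("Full Address") and worksheet 310-3 ("Address Line 1", "Zip Code", "Country", etc.) without special-casing each layout.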
Referring back to Figure 2, the output of normalization, i.e., common structure 350, may be input to a transformation module 264 of computing platform engine 170, according to one or more embodiments. Figure 4 shows transformation module 264 of computing platform engine 170 effecting a transformation of common structure 350, according to one or more embodiments.
In one or more embodiments, a downstream requirement (e.g., that of an underwriter at a client device 1061-M) of computing system 100 may relate to a specific format of common structure 350. For example, an underwriter may require the SOV of common structure 350 to be in a specific format. In one or more embodiments, the transformation process transforms common structure 350 into desired SOV 224 by dynamically applying the downstream requirement thereto. In some embodiments, the downstream requirement may be predetermined prior to the dynamic application thereof, or may itself be determined dynamically.
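The dynamic application of a downstream requirement can be sketched as a field selection/renaming pass over a record in common structure 350. The requirement format shown here is a hypothetical convention, not the patented one:

```python
# Sketch of transformation module 264: reshape a record in common
# structure 350 into a downstream-specific format supplied at run time.
def transform(record, requirement):
    """requirement maps output field names to common-structure field
    names; the record is reshaped into the requested specific format."""
    return {out_name: record.get(src_name, "")
            for out_name, src_name in requirement.items()}

common_record = {"Address Line 1": "1 Main St", "City": "Chennai",
                 "State": "TN", "Zip Code": "600130", "Country": "India"}

# A hypothetical underwriter's downstream requirement, applied dynamically:
underwriter_format = {"Street": "Address Line 1", "City": "City",
                      "Postal Code": "Zip Code"}

print(transform(common_record, underwriter_format))
# {'Street': '1 Main St', 'City': 'Chennai', 'Postal Code': '600130'}
```

Because the requirement is just data passed to `transform`, it can be predetermined or determined dynamically per request, matching both variants described above.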
In one or more embodiments, subsequent to the transformation, a validation process and an enrichment process may be applied on desired SOV 224. In one or more embodiments, desired SOV 224 may be validated by a human expert or supervisor 226 at a client device 1061-M or another entity in the downstream. For example, in the case of the address, the downstream requirement may involve the Zip Code being a mandatory field. Here, if the Zip Code is missing, the address may be validated during the transformation and a missing field error may be raised; alternatively, the missing field may be proactively searched for (e.g., through Google® Application Programming Interfaces (APIs)) and desired SOV 224 enriched accordingly.
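The mandatory-field validation and proactive enrichment just described can be sketched as follows. `lookup_zip` is a hypothetical placeholder for an external lookup (e.g., a geocoding API as in the Google® API example above), not a real service call:

```python
# Sketch of validation and enrichment: check mandatory downstream fields
# and either raise a missing-field error or fill the gap via a lookup.
MANDATORY = ("Zip Code",)

def lookup_zip(record):
    # Placeholder for an external API call (e.g., a geocoding service).
    return {"Siruseri": "600130"}.get(record.get("City", ""), "")

def validate_and_enrich(record, enrich=True):
    """Return (record, errors); enrichment is attempted before an error
    is recorded, mirroring the proactive-search alternative above."""
    errors = []
    for field in MANDATORY:
        if not record.get(field):
            if enrich and field == "Zip Code":
                record[field] = lookup_zip(record)
            if not record.get(field):
                errors.append(f"missing mandatory field: {field}")
    return record, errors

rec, errs = validate_and_enrich({"Address Line 1": "Plot 3/G3",
                                 "City": "Siruseri", "Zip Code": ""})
print(rec["Zip Code"], errs)   # 600130 []
```

With `enrich=False` the same routine degenerates to pure validation, surfacing the missing-field error for human expert/supervisor 226 to resolve.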
Additionally, in one or more embodiments, the enrichment may be percolated to spreadsheet document 2041-M itself. In one or more embodiments, this implies that the source entity (e.g., at a client device 1061-M) of spreadsheet document 2041-M itself may benefit through the processes discussed above. In one or more embodiments, the source entity may be an insurance broker, an insurance customer, or even an insurance company/underwriter. In one or more embodiments, any validation or a modification done through human expert/supervisor 226 or a downstream element may be fed back into ML model 220 to retrain said ML model 220. Additionally, in one or more embodiments, spreadsheet document 2041-M itself may be utilized by ML model 220 for refinement thereof. All reasonable variations are within the scope of the exemplary embodiments discussed herein.
It should be noted that intelligent agents such as chatbots and Robotic Process Automation (RPA) bots may be employed to further enrich desired SOV 224 (information in general) and/or spreadsheet document 2041-M. Expert responses or customer service request responses may also be incorporated therein. While exemplary embodiments have been discussed in the context of insurance and spreadsheet documents, the concepts discussed herein may be portable across other contexts involving document identification and the information extraction, classification, normalization, and transformation discussed above. Further, non-spreadsheet-based documents are also within the scope of the exemplary embodiments discussed herein.
Figure 5 shows a process flow diagram detailing the operations involved in contextual and structured extraction, normalization, and refinement of target information from a document in a computing system (e.g., computing system 100), according to one or more embodiments. In one or more embodiments, operation 502 may involve executing, through one or more server(s) (e.g., server 1021), instructions associated with a computing platform engine (e.g., computing platform engine 170). In one or more embodiments, operation 504 may involve training the computing platform engine with a number of documents (e.g., training spreadsheet documents 2101-L) including a first number of components of information (e.g., SOV 2121-L) analogous to a second number of components of target information (e.g., SOV 224) to identify the first number of components in accordance with an ML model (e.g., ML model 220).
In one or more embodiments, the first number of components may be distributed across the number of documents in varied layouts and/or in non-standardized formats. In one or more embodiments, operation 506 may involve, in accordance with the training and through the computing platform engine: extracting interpretational context (e.g., context 242) with respect to a new document (e.g., spreadsheet document 2041-M) from content distinct from the new document; identifying, in the new document, the second number of components of the target information distributed across the new document based on the extracted interpretational context; generating the target information with the identified second number of components structured (e.g., common structure 350) in a specific format; and refining the target information in the specific format in accordance with a downstream processing requirement of a client device (e.g., client device 1061-M) communicatively coupled to the one or more server(s) through a computer network (e.g., computer network 104).
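The overall flow of operations 502-506 can be sketched as a chained pipeline. Every stage below is a trivial stand-in callable used only to show the wiring; in the embodiments, ML model 220 and the modules of computing platform engine 170 implement each stage:

```python
# Sketch of the process flow of Figure 5: classify, normalize, transform,
# and refine, chained in order with the interpretational context and the
# downstream requirement threaded through.
def run_pipeline(document, context, requirement, stages):
    """Pass a document through the stages in order."""
    data = stages["classify"](document, context)
    data = stages["normalize"](data)
    data = stages["transform"](data, requirement)
    return stages["refine"](data, requirement)

# Trivial stand-in stages (hypothetical) to show the wiring:
stages = {
    "classify": lambda doc, ctx: {"rows": doc, "context": ctx},
    "normalize": lambda d: {**d, "normalized": True},
    "transform": lambda d, req: {**d, "format": req},
    "refine": lambda d, req: {**d, "validated": True},
}
out = run_pipeline([["1 Main St"]], "email body", "underwriter-v1", stages)
print(out["normalized"], out["validated"])   # True True
```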
The structures and modules in the figures may be shown as distinct and communicating with only a few specific structures and not others. The structures may be merged with each other, may perform overlapping functions, and may communicate with other structures not shown to be connected in the figures.
Although the present embodiments have been described with reference to specific example embodiments, it will be evident that various modifications and changes may be made to these embodiments without departing from the broader spirit and scope of the various embodiments. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.

Claims:
We Claim,
1. A method for structured extraction and refinement of target information from a new document (2041-M) in a computing system (100), comprising:
executing (502), through at least one server (1021-N) in the computing system (100), instructions associated with a computing platform engine (170);
training (504) the computing platform engine (170) with a plurality of documents (2101-L) comprising a first number of components of information (2121-L) analogous to a second number of components of target information (224) to identify the first number of components in accordance with a Machine Learning (ML) model (220), the first number of components distributed across the plurality of documents at least one of: in varied layouts and in non-standardized formats; and
in accordance with the training and through the computing platform engine (170):
extracting (506) interpretational context (242) with respect to the new document (2041-M) from content distinct from the new document;
identifying (506), in the new document, the second number of components of the target information distributed across the new document based on the extracted interpretational context (242);
generating (506) the target information with the identified second number of components structured in a specific format; and
refining (506) the target information in the specific format in accordance with a downstream processing requirement of a client device (1061-M) communicatively coupled to the at least one server through a computer network (104).
2. The method as claimed in claim 1, comprising extracting, through the computing platform engine (170), the interpretational context from at least one of: an electronic mail, a separate document, and an intelligent agent as the content distinct from the new document.
3. The method as claimed in claim 1, further comprising supervising a result of the identification of the second number of components of the target information through another client device communicatively coupled to the at least one server (1021-N) through the computer network.
4. The method as claimed in claim 1, wherein refining the target information in the specific format comprises supplying at least one piece of missing information in the target information through the computing platform engine (170).
5. The method as claimed in claim 1, further comprising feeding a modification to the target information in the specific format done at the client device (1061-M) back into the ML model (220) to retrain the ML model (220).
6. The method as claimed in claim 3, further comprising feeding a modification done at the another client device pertaining to the target information back into the ML model (220) to retrain the ML model (220).
7. The method as claimed in claim 1, further comprising the ML model (220) utilizing the new document for refinement thereof.
8. A computing system (100) for structured extraction and refinement of target information from a new document (2041-M) comprising:
a server (1021-N) executing a computing platform engine (170) thereon; and
a client device (1061-M) communicatively coupled to the server (1021-N) through a computer network (104),
wherein the server (1021-N) executing the computing platform engine (170):
trains the computing platform engine (170) with a plurality of documents (2101-L) comprising a first number of components of information (2121-L) analogous to a second number of components of target information (224) to identify the first number of components in accordance with an ML model (220), the first number of components distributed across the plurality of documents at least one of: in varied layouts and in non-standardized formats, and
in accordance with the training:
extract interpretational context (242) with respect to the new document from content distinct from the new document,
identify, in the new document, the second number of components of the target information distributed across the new document based on the extracted interpretational context,
generate the target information with the identified second number of components structured in a specific format, and
refine the target information in the specific format in accordance with a downstream processing requirement of the client device.
9. The computing system (100) as claimed in claim 8, wherein the interpretational context (242) is extracted from at least one of: an electronic mail, a separate document and an intelligent agent as the content distinct from the new document.
10. The computing system (100) as claimed in claim 8, wherein the server (1021-N) executing the computing platform engine (170) supplies at least one piece of missing information in the target information as part of refining the target information in the specific format.

Documents

Application Documents

# Name Date
1 202341063248-STATEMENT OF UNDERTAKING (FORM 3) [20-09-2023(online)].pdf 2023-09-20
2 202341063248-POWER OF AUTHORITY [20-09-2023(online)].pdf 2023-09-20
3 202341063248-FORM 1 [20-09-2023(online)].pdf 2023-09-20
4 202341063248-DRAWINGS [20-09-2023(online)].pdf 2023-09-20
5 202341063248-DECLARATION OF INVENTORSHIP (FORM 5) [20-09-2023(online)].pdf 2023-09-20
6 202341063248-COMPLETE SPECIFICATION [20-09-2023(online)].pdf 2023-09-20
7 202341063248-FORM 18 [21-09-2023(online)].pdf 2023-09-21
8 202341063248-Proof of Right [11-10-2023(online)].pdf 2023-10-11
9 202341063248-RELEVANT DOCUMENTS [09-09-2024(online)].pdf 2024-09-09
10 202341063248-FORM 13 [09-09-2024(online)].pdf 2024-09-09
11 202341063248-Covering Letter [10-10-2024(online)].pdf 2024-10-10
12 202341063248-CERTIFIED COPIES TRANSMISSION TO IB [10-10-2024(online)].pdf 2024-10-10
13 202341063248-POA [22-07-2025(online)].pdf 2025-07-22
14 202341063248-FORM 13 [22-07-2025(online)].pdf 2025-07-22