Data Parsing System And Method Thereof


Abstract: The present subject matter discloses an effective and efficient system and method for parsing data, from disparate data sources over a computer based platform. The system receives structured or unstructured data associated with the domain specific entry from disparate data sources. Further performs parsing of the structured or unstructured data. The system is further configured to interpret the parsed data, perform combined predictive and descriptive analysis of this data and correlating the previously stored data with interpreted parsed data and domain terms and generating a summarized report of the correlated data in the form of statistic and probabilistic analysis.


Patent Information

Application #

Filing Date

21 June 2017

Publication Number

27/2017

Publication Type

INA

Invention Field

COMPUTER SCIENCE

Status

ip@stratjuis.com

Parent Application

Applicants

Innov4Sight Health and Biomedical Systems Private Limited

B-107, Aban Essence, Kudlu Village, Madiwala Post, Bangalore - 560068.

Inventors

1. Vijayagopal Rangarajan

B-107, Aban Essence, Kudlu Village, madiwala post, bangalore – 560068.

2. Geetha Sanjay

# 314, 1st Cross, 10th Main, 7th Block, IV phase, BSK III stage Bangalore 560078

3. Balachander Agoramurthy

#202 Gaurav Arcade 4, 14 1A Cross Road, Manorayana Palaya, New Ext. RT Nagar, Bangalore – 560032

Specification

Claims:WE CLAIM: -

1. A method for parsing data from disparate data sources over a computer based platform, the method comprising:
uploading, via a processor, plurality of master data on the computer-based platform, wherein the master data comprises:
variety of entries associated with a specific domain and metadata associated with each entry;
category list having predefined ID’s associated with domain terms;
storing, via the processor 201, a plurality of data and domain terms in a data repository 210 coupled with the processor 201;
receiving, via the processor 201, plurality of structured or unstructured data associated with the domain specific entry from disparate data sources;
preparing, via the processor 201, the data for further processing;
performing, via the processor, parsing of the data,
interpreting, via the processor, the parsed data;
correlating, via the processor, the metadata with interpreted parsed data and domain terms;
enabling, the processor 201, to summarize the correlated data in the form of statistic and probabilistic analysis.
2. The method claimed in claim 1, wherein the domain is healthcare, natural science, pharmaceutical and like.

3. The method claimed in claim 1, wherein the domain terms are captured from domain specific standard database, dictionaries and like.

4. The method claimed in claim 1, wherein the domain term comprising a group of terms related to specific domain, a group of terms that are spelled differently or terms that have multiple names that are the same actual terms and like.

5. The method claimed in claim 1, wherein preparation of data comprises identification, packaging and normalisation of the structured and unstructured data.

6. The method claimed in claim 1, wherein the parsing of plurality of data comprises a combined predictive and descriptive analysis of data and correlating and comparing, analysed data with the master data.

7. The method claimed in claim 1, wherein the master data is updated based on the statistic and probabilistic analysis.

8. A system for parsing data from disparate data sources over a computer based platform, the system comprising:
a processor; and
a memory coupled with the processor, wherein the processor executes programmed instructions stored in the memory for: -
uploading, plurality of master data on the computer-based platform, wherein the master data comprises:-
variety of entries associated with a specific domain and metadata associated with each entry
category list having predefined ID’s associated with domain terms;
storing, a plurality of data and domain terms in a data repository 210 coupled with the processor 201;
receiving, plurality of structured or unstructured data associated with the domain specific entry from disparate data sources;
preparing, the data for further processing;
performing, parsing of the data,
interpreting, the parsed data;
correlating, the metadata with interpreted parsed data and domain terms;
enabling, to summarize the correlated data in the form of statistic and probabilistic analysis.
9. The system claimed in claim 8, wherein the domain is healthcare, natural science, pharmaceutical and like.

10. The system claimed in claim 8, wherein the domain terms are captured from domain specific standard database, dictionaries and like.

11. The system claimed in claim 8, wherein the domain term comprising a group of terms related to specific domain, a group of terms that are spelled differently or terms that have multiple names that are the same actual terms and like.

12. The system claimed in claim 8, wherein preparation of data comprises identification, packaging and normalisation of the structured and unstructured data.

13. The system claimed in claim 8, wherein the parsing of plurality of data comprises a combined predictive and descriptive analysis of data and correlating and comparing, analysed data with the master data.
14. The system claimed in claim 8, wherein the master data is updated based on the statistic and probabilistic analysis.

15. A non-transitory computer readable medium storing a program of parsing data from disparate data sources, the program comprising:
a program code for uploading, plurality of master data on the computer-based platform, wherein the master data comprises :
variety of entries associated with a specific domain and metadata associated with each entry; and
category list having predefined ID’s associated with domain terms;
a program code for storing, a plurality of data and domain terms in a data repository 210 coupled with the processor 201;
a program code for receiving, plurality of structured or unstructured data associated with the domain specific entry from disparate data sources;
a program code for preparing, the data for further processing;
a program code for performing, parsing of the data,
a program code for interpreting, the parsed data;
a program code for correlating, the metadata with interpreted parsed data and domain terms;
a program code for enabling summarization of the correlated data in the form of statistic and probabilistic analysis.
Dated this 21st Day of June 2017
Priyank Gupta
Agent for Applicant
IN-PA-1454
Description: FORM 2

THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003

COMPLETE SPECIFICATION
(See Section 10 and Rule 13)

Title of invention:
DATA PARSING SYSTEM AND METHOD THEREOF

Applicant:
Innov4Sight Health and Biomedical Systems Private Limited
a company incorporated in India
Having address:
B-107, Aban Essence, Kudlu Village, Madiwala Post, Bangalore - 560068

The following specification describes the invention and the manner in which it is to be performed.
TECHNICAL FIELD
The present disclosure relates generally to a method and a system for parsing data, and more particularly to parsing of binary data.
BACKGROUND
These days, the biggest challenge in healthcare is the explosion of scientific knowledge. Doctors and physicians have to assimilate all the information that is available and quickly deploy new advances while also delivering efficient, high-quality care to a growing patient population, especially in the case of complex diseases like cancer.
Among present technologies, a medical research search engine has been developed that helps retrieve relevant documents containing medical research evidence. This search engine can be beneficial to the user for research and study purposes, but it cannot guide the user as far as treatment and palliative care protocol recommendation is concerned. Another recently developed technology involves a computer-accessed system in which a report is generated to help decide among a plurality of treatment options for a patient with a given medical condition. The system receives patient information relating to the medical condition and queries a treatment option for the medical condition. Along with it, the system also receives information on the patient's preferences for potential treatment outcomes and the treatment options. In the end, the generated report contains only the treatment options available, the treatment scores and the information derived from the scores. The former technology helps the doctor make a diagnosis and plan a treatment protocol only by investing a lot of time in research, while the latter helps the user with the treatment protocol only after the diagnosis is done. Neither of the two systems provides well-rounded support to the user (doctors or physicians).
Thus, there is a long-standing need for computer-implemented systems and methods that help the user (doctors or physicians) not only with the diagnosis but also with a personalized treatment protocol recommendation. This makes the work of the user much easier and speeds up the administering of treatment to a patient.

SUMMARY
This summary is provided to introduce concepts related to parsing data from disparate data sources over a computer based platform; the concepts are further described below in the detailed description. This summary is not intended to identify essential features of the claimed subject matter, nor is it intended for use in determining or limiting the scope of the claimed subject matter.
In one implementation, a method for parsing data from disparate data sources over a computer based platform is disclosed. The method may comprise, uploading, via a processor, plurality of master data on the computer-based platform, wherein the master data further comprises variety of entries associated with a specific domain and metadata associated with each entry and metadata associated with each parameter; and category list having predefined ID’s associated with domain terms. The method may further comprise storing, via the processor, a plurality of data and domain terms in a data repository coupled with the processor. Furthermore, the method may comprise receiving, plurality of structured or unstructured data associated with the domain specific entry from disparate data sources. The method may further comprise preparing, via the processor 201, the data for further processing. The method may further comprise performing, via the processor, parsing of the data. Furthermore, the method may comprise interpreting, via the processor, the parsed data. Furthermore, the method may comprise correlating, via the processor, the metadata with interpreted parsed data and domain terms. Furthermore, the method may comprise enabling, the processor 201, to summarize the correlated data in the form of statistic and probabilistic analysis.

In another implementation, a system for parsing data from disparate data sources over a computer based platform is disclosed. The system may comprise a processor and a memory coupled with the processor. The memory may store programmed instructions capable of being executed by the processor. The programmed instructions may comprise instructions for uploading a plurality of master data on the computer-based platform, wherein the master data comprises a variety of entries associated with a specific domain, metadata associated with each entry, metadata associated with each parameter, and a category list having predefined ID’s associated with domain terms. Further, the programmed instructions may comprise instructions for storing a plurality of data and domain terms in a data repository coupled with the processor. Further, the programmed instructions may comprise instructions for receiving a plurality of structured or unstructured data associated with the domain specific entry from disparate data sources. Further, the programmed instructions may comprise instructions for preparing the data for further processing. Further, the programmed instructions may comprise instructions for performing parsing of the data. Further, the programmed instructions may comprise instructions for interpreting the parsed data. Further, the programmed instructions may comprise instructions for correlating the metadata with interpreted parsed data and domain terms. Furthermore, the programmed instructions may comprise instructions for enabling summarization of the correlated data in the form of statistic and probabilistic analysis.
In yet another implementation, a non-transitory computer readable medium storing a program for parsing data from disparate data sources is disclosed. The program may comprise a program code for uploading a plurality of master data on the computer-based platform, wherein the master data comprises a variety of entries associated with a specific domain, metadata associated with each entry, metadata associated with each parameter, and a category list having predefined ID’s associated with domain terms. The metadata associated with each entry may comprise all medical records of the patient and patient data such as EHRs, radiological images and reports, and omics data such as genomics, proteomics and metabolomics. The metadata associated with each parameter may comprise research data (from research papers, clinical trials, and journals). The program may further comprise a program code for receiving a plurality of structured or unstructured data associated with the domain specific entry from disparate data sources. The program may further comprise a program code for preparing the data for further processing. The program may further comprise a program code for performing parsing of the data. The program may further comprise a program code for interpreting the parsed data. The program may further comprise a program code for correlating the metadata with interpreted parsed data and domain terms. Furthermore, the program may comprise a program code for enabling summarization of the correlated data in the form of statistic and probabilistic analysis.
BRIEF DESCRIPTION OF THE DRAWINGS

The detailed description is described with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same numbers are used throughout the drawings to refer to like features and components.
Figure 1 illustrates a network implementation 100 of a system 101 for parsing data from disparate data sources over a computer based platform, in accordance with an embodiment of the present disclosure.
Figure 2 illustrates the system 200 in accordance with an embodiment of the present disclosure.
Figure 3 illustrates a method 300 for parsing data from disparate data sources over a computer based platform, in accordance with an embodiment of the present disclosure.
Figure 4 illustrates the method of data preparation for further processing.
Figure 5 illustrates the method of initial formatting of the medical records input to the system.
Figure 6 illustrates the method of analyzing the text data and images lexically.
Figure 7 illustrates the method of lexical data analysis, tokenization, tagging and categorization for uniform recognition.
Figure 8 illustrates the method of input token interpretation and parse tree creation for each sentence.
Figure 9 illustrates the method of re-interpreting the parse trees and merging them into a single tree for report formation.
Figure 10 illustrates the method of analyzing the plurality of data and relating it to the master data.
Figure 11 illustrates the method of detailed background analysis.
Figure 12 illustrates the method of processing the text input and identifying the words/phrases that are essential for the analysis.
Figure 13 illustrates the method of preparing the unstructured data (typed and hand-written) for processing by the system.

DETAILED DESCRIPTION

System(s) and method(s) for parsing data from disparate data sources over a computer based platform are described.
The present subject matter discloses an effective and efficient mechanism for parsing patient data, and provides an integrated analysis of metadata correlated with the patient’s panomics data and knowledge or research data crawled from the various data sources. This helps the doctor with various types of multi-level analytics in addition to providing treatment recommendations.
The system parses both structured and unstructured data received from sources including HIS, EHR, PACS, RIS, LIMS, discharge summary reports, reports from molecular / genomic / proteomic diagnostics, etc. into a unified binary representation. The data is then interpreted, and a combined predictive and descriptive analysis of this data is carried out by the system. The system further correlates the analyzed data with research or knowledge data from the various data sources. A comparative study of similar patient records is carried out. The system updates standard lexicons automatically and provides ICD codes to medical reports. The standard lexicon may include a medical lexicon, which is an integration of many dictionaries such as ICD-10, ICD-O, SNOMED CT, WHO DDE and so on. The system is language agnostic and is deployed in the cloud. It predicts a patient’s medical condition, treatment-treatment interaction and treatment success based on various factors. It recommends a treatment plan and suggests alterations to the existing treatment plan. Every treatment protocol applicable to a patient’s condition is ranked and elucidated.
While aspects of described system and method for parsing data from disparate data sources over a computer based platform may be implemented in any number of different computing systems, environments, and/or configurations, the embodiments are described in the context of the following exemplary system.
Referring now to Figure 1, a network implementation 100 of the system and method for parsing data from disparate data sources over a computer based platform is illustrated, in accordance with an embodiment of the present subject matter. In one embodiment, the system 101 may comprise a repository that has a plurality of master data and a category list having predefined ID’s associated with domain terms. The plurality of structured and unstructured data is received from various data sources through the network. The plurality of structured and unstructured data may comprise recent or updated patient data from varied health care data sources and medical record sources, and recent or updated research data. The system may parse the data and interpret the parsed data. The system may correlate the metadata with the interpreted parsed data and domain terms. The system may further summarize the correlated data in the form of statistic and probabilistic analysis. The system 101 may establish an authorized connection between the server and the user device to access, view and compare analyzed data.
Although the present subject matter is explained considering that the system 101 is implemented as a server, it may be understood that the system 101 may also be implemented in a variety of computing systems, such as a laptop computer, a desktop computer, a notebook, a workstation, a mainframe computer, a server, a network server, a user device, and the like. It will be understood that the system 101 may be accessed by multiple users through one or more user devices 104-1, 104-2…104-N, collectively referred to as user devices 104 hereinafter, or applications residing on the user devices 104. Examples of the user devices 104 may include, but are not limited to, a portable computer, a personal digital assistant, a handheld device, and a workstation. The user devices 104 are communicatively coupled to the system 101 through a network 102.
In one implementation, the network 102 may be a wireless network, a wired network or a combination thereof. It will be understood that the system 101 may be accessed through one or more networks 102-1, 102-2…102-N, collectively referred to as network 102 hereinafter. The network 102 can be implemented as one of the different types of networks, such as an intranet, a local area network (LAN), a wide area network (WAN), the internet, and the like. The network 102 may either be a dedicated network or a shared network. The shared network represents an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), Wireless Application Protocol (WAP), and the like, to communicate with one another. Further, the network 102 may include a variety of network devices, including routers, bridges, servers, computing devices, storage devices, and the like.
Referring now to Figure 2, the system 101 is illustrated in accordance with an embodiment of the present subject matter. In one embodiment, the system 101 may include at least one processor 201, an input/output (I/O) interface 202, and a memory 203. The at least one processor 201 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the at least one processor 201 is configured to fetch and execute computer-readable instructions stored in the memory 203.
The I/O interface 202 may include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like. The I/O interface 202 may allow the system 101 to interact with a user directly or through the user devices 104. Further, the I/O interface 202 may enable the system 101 to communicate with other computing devices, such as web servers and external data servers (not shown). The I/O interface 202 can facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. The I/O interface 202 may include one or more ports for connecting a number of devices to one another or to another server.
The memory 203 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. The memory 203 may include modules 204 and data 217.
The modules 204 include routines, programs, objects, components, data structures, etc., which perform particular tasks or implement particular abstract data types. In one implementation, the modules 204 may include a data sources module 205, an input management subsystem module 206, a lexical analyzer module 207, a tagger subsystem module 208, a syntax analyzer module 209, a semantic analyzer subsystem module 210, a patient record summarizer module 211, a background analyzer module 212, an indexing subsystem module 213, a search analyzer module 214, a reinforcement engine module 215 and other modules 216. The other modules 216 may include programs or coded instructions that supplement applications and functions of the system 101.
The data 217, amongst other things, serves as a repository for storing data processed, received, and generated by one or more of the modules 204. The data 217 may also include a data repository 218 and other data 219. The other data 219 may include data generated as a result of the execution of one or more modules in the other modules 216.
In one implementation, at first, a user may use the user device 104 to access the system 101 via the I/O interface 202.
DATA SOURCES MODULE
Referring to Figure 4, a detailed working of the data sources module 205 along with the working of other components of the system 101 is illustrated. In one embodiment, the data sources module may receive structured and unstructured data from varied medical record sources and anonymize the structured data. Further, the data sources module 205 may classify the data into image and text data, in order to obtain unified data. The classified data is packaged into a specific format. The data sources module may further push the packaged data into the input management subsystem module.
In one preferred embodiment, the data sources module may classify the data into image data or text data. The text data is packaged into a C-CDA format. The image data is copied to a cloud blob, and the URL of the blob is copied into the C-CDA wrapper. In another embodiment, the data sources module may be installed in the user device.
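The classify-and-package step described above can be illustrated with a minimal sketch. It rests on several assumptions that are not part of the specification: records are classified by file extension, the `Package` dataclass is a simplified stand-in for the C-CDA wrapper (it is not C-CDA XML), and `upload` is a hypothetical caller-supplied function that stores image bytes in blob storage and returns a URL.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Callable, Optional

# Assumed set of extensions treated as image data (illustrative only).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".dcm"}


@dataclass
class Package:
    """Simplified stand-in for the wrapper; not real C-CDA."""
    kind: str                 # "image" or "text"
    text: Optional[str]       # inline text for text records
    image_url: Optional[str]  # blob URL for image records


def classify(filename: str) -> str:
    """Classify a record as image or text from its file extension."""
    suffix = PurePosixPath(filename).suffix.lower()
    return "image" if suffix in IMAGE_SUFFIXES else "text"


def package_record(filename: str, payload: bytes,
                   upload: Callable[[str, bytes], str]) -> Package:
    """Wrap a record: text goes inline, images are uploaded and referenced by URL."""
    kind = classify(filename)
    if kind == "image":
        return Package(kind, None, upload(filename, payload))
    return Package(kind, payload.decode("utf-8"), None)
```

Passing the upload function in as a parameter keeps the sketch independent of any particular cloud storage service.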

INPUT MANAGEMENT SUBSYSTEM MODULE
Referring to Figure 5, a detailed working of the input management subsystem module 206 along with the working of other components of the system 101 is illustrated. In one embodiment, the input management subsystem module may unpack the data from the wrapper, identify the type of medical record and prepare it for further processing. In one preferred embodiment, the type of medical record may comprise EHRs, radiological images and reports, omics data such as genomics, proteomics and metabolomics, research papers, clinical trials, and journals.
LEXICAL ANALYZER MODULE
Referring to Figure 6, a detailed working of the lexical analyzer module 207 along with the working of other components of the system 101 is illustrated. In one embodiment, the lexical analyzer module may analyze the text data and images lexically. In one preferred embodiment, if the input data is an image, the image is normalized and the noise is removed using the OpenCV library and machine learning algorithms. In yet another preferred embodiment, if the input data is text data, sentences are detected, recorded and anonymized using NLP/ML. In yet another preferred embodiment, custom algorithms are used for spell correction, sentence splitting and stemming of phrases. Further, the data may be represented in an identical format and tagged with Parts of Speech (POS) using NLP. The lexical analyzer module 207 may further forward the data to the tagger subsystem module for further processing. The lexical analyzer module may use image and text lexicons obtained from standard healthcare domain sources, which are an integration of many dictionaries such as ICD-10, ICD-O, SNOMED CT, WHO DDE and so on.
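As a rough illustration of the text branch of this module, the following sketch splits a report into sentences and tokens. The specification refers to custom algorithms and NLP/ML without giving details, so the regular expressions below are assumptions chosen for brevity; spell correction, stemming, anonymization and POS tagging are not shown.

```python
import re


def split_sentences(text: str) -> list:
    """Split on whitespace that follows '.', '!' or '?'.

    A decimal such as 12.5 is not split, because its '.' is not
    followed by whitespace.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence: str) -> list:
    """Lower-case the sentence and extract words (optionally hyphenated) and numbers."""
    pattern = r"[a-z]+(?:-[a-z]+)*|\d+(?:\.\d+)?"
    return re.findall(pattern, sentence.lower())


report = "Liver is normal. Spleen measures 12.5 cm."
sentences = split_sentences(report)
tokens = [tokenize(s) for s in sentences]
```

Real clinical text contains abbreviations such as "Dr." or "approx." that defeat this simple splitter, which is presumably one reason the specification calls for custom algorithms.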
TAGGER SUBSYSTEM MODULE
Referring to Figure 7, a detailed working of the tagger subsystem module 208 along with the working of other components of the system 101 is illustrated. The Tagger Subsystem module may assign unique, pre-defined IDs to each data relevant for analysis, based on the category list having predefined ID’s associated with domain terms. In one preferred embodiment the tagger subsystem module may further categorize the data using ML classification algorithms. In another embodiment, the categorization technique used is custom defined and does not follow UMLS Standards.
In one preferred embodiment, the IDs associated with categories comprise:
ID | Name | Description
1 | SCAN REGION | Part of the body that is scanned (Thorax, Head and Neck, Abdomino-pelvic)
2 | ORGAN SYSTEMS | Nervous system, gastro-intestinal system, etc.
3 | ORGAN | NA
4 | SUBORGAN |
5 | PART OF AN ORGAN | Atrium, Ventricle, Tricuspid Valve, Mitral Valve are parts of the heart
6 | ANATOMIC SITE | Thigh is an anatomic site in the lower limb; the leg itself is an anatomic site
7 | PART OF ANATOMIC SITE | Base of tongue is a part of the tongue (an anatomic site)
8 | PART OF THE BODY | NA
9 | TISSUES AND CELLS | Squamous cells, parenchymal cells, small round cells, etc.
10 | MEDICAL_ENTITY | Lobe, region, sputum, fluid, mucus, etc. that cannot be mapped to a specific part of the body. Tumor and lesion fall into this category as well.
11 | ANATOMICAL DESCRIPTION | Doctor's description of an anatomical structure at the time of observation
12 | ANATOMICAL FUNCTIONS | Biological function of the specified anatomical structure
13 | CONDITION | Fever, pain, conditions
14 | DISEASE | Infectious diseases, Neoplasm (will be moved to a different category), and such ICD-defined diseases
15 | ICD-O | ICD-O code and short description
16 | ICD | ICD-10 code and short description
17 | FINDING | Positive, uncertain, or negative
18 | SEVERITY | Mild, benign, malignant, acute, chronic, slight, etc.
19 | STAGING | Stage 1, 2, 3, 4, T, N, M (Tumor, Node, and Metastatic)
20 | REPORTSTATUS | Clear or unclear
21 | NORMALITY | Normal, abnormal, regular, irregular, smooth, rough, etc.
22 | ICD-PCS | A class of procedures
23 | CPT | A class of procedures
24 | HCPCS | A class of procedures
25 | MODALITY | Scanning devices
26 | PROCEDURE | Scan, types of scans, any surgery methodology, etc.
27 | MEDICAL TREATMENT | Any treatment or therapy
28 | MEDICAL TEST | Biochemical, hematology, and histo-pathology tests
29 | MODALITY TECHNIQUES | Protocols followed during scanning
30 | POSITION | Position of the anatomical structure in the body, for example left, right, etc.
31 | PLANES | Longitudinal planes, transverse planes, axial planes, etc. (anatomical planes)
32 | GENERAL TERM | Any general English phrase that carries some relevant meaning in that context
33 | MEDICAL TERM | Injection, IV, etc.
34 | ANATOMICAL ATTRIBUTES | Characteristic of an anatomical structure that may be measurable, for example width, echo density, ejection fraction, standard uptake value, etc.
35 | MEASUREMENT | Just the words 'measures' and 'measurement'. These words trigger a special operation.
36 | UNITS | All units with their values
37 | COMMENTS | Doctor's comments
38 | RECOMMENDATION | Recommendation for further evaluation or prescription
39 | RELOOKUP | Not used as of now
40 | SPECIAL_HANDLING | All terms that require context-sensitive handling
41 | DATE |
42 | TIMELINE | Hours, minutes, pre, post, all months, year, etc.
45 | REPORT_TYPE | Pre-treatment, treatment, follow-up, basic, etc.
46 | SCAN_TYPE | Secondary modality device that may appear in a report
47 | REPORT SECTION |
48 | MEDICAL ATTRIBUTE | Pulse, blood pressure (vitals), etc.
49 | VESSELS | Blood vessels
50 | NERVES | NA
51 | BONE | NA
52 | LIGAMENT | NA
53 | TENDON | NA
54 | CARTILAGE | NA
55 | JOINTS | NA
56 | MUSCLES | NA
101 | DRUG NAME | NA
102 | DRUG COMPOSITION | Chemical composition of a medication
103 | DRUG ATTRIBUTES | Dosage details
104 | CHEMICAL ELEMENT | NA
105 | CHEMICAL FORMULA | NA
106 | ISOTOPE | NA
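To make the ID assignment concrete, here is a minimal sketch of a dictionary-based tagger. It is an assumption-laden illustration, not the disclosed implementation: the tiny `LEXICON` is hypothetical (its entries borrow example terms and category IDs from the list above), matching is a greedy longest-phrase lookup, and the ML classification step mentioned in the specification is not shown.

```python
# Hypothetical mini-lexicon mapping token phrases to category IDs
# (1 SCAN REGION, 7 PART OF ANATOMIC SITE, 13 CONDITION,
#  18 SEVERITY, 30 POSITION).
LEXICON = {
    ("thorax",): 1,
    ("base", "of", "tongue"): 7,
    ("fever",): 13,
    ("pain",): 13,
    ("mild",): 18,
    ("left",): 30,
    ("right",): 30,
}
MAX_PHRASE_LEN = max(len(phrase) for phrase in LEXICON)


def tag(tokens):
    """Greedy longest-match tagging; unknown tokens get category None."""
    tagged = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in LEXICON:
                tagged.append((" ".join(phrase), LEXICON[phrase]))
                i += n
                break
        else:
            # No phrase starting at this token is in the lexicon.
            tagged.append((tokens[i], None))
            i += 1
    return tagged


result = tag(["mild", "pain", "at", "base", "of", "tongue"])
```

Trying longer phrases first ensures that "base of tongue" is tagged as one unit rather than as three unrelated words.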

SYNTAX ANALYZER MODULE
Referring to Figure 8, a detailed working of the syntax analyser module 209 along with the working of other components of the system 101 is illustrated. In one embodiment, the syntax analyser subsystem module may create a parse tree for each sentence in the data. The syntax analyser module may use the category list provided in the tagger analyser module. The module 209, further may identify top-level anatomic sites in the line and creates a branch for each identified top-level anatomic site. Further, module 209 may relate other anatomic sites to the top level anatomic structure identified above, using parent-child relationship rules (or ML) as defined by the lexicon, may be the medical lexicon. If more than one anatomic entity is of the same level, the module 209 may derive a separate branch. Further, it may consider the position and part of anatomic site as children to the appropriate anatomic entity. The module 209, may further associate diseases, disorders, conditions, signs and symptoms to the lowest level anatomic entity (one per branch) using NLP and POS tags to identify the relationships. Further, it may relate the treatments provided to each disease entity, as treatment / drug administration branch.
The module 209 may add :-
the treatment start and end-dates, outcome, and response as leaves to the treatment / drug administration branch;
each cycle (with start, end date, outcome, and response) and days of administration (with date, response) as individual branches;
the drug brand name, composition, isotope, dosage, taken/not taken, and associated attributes and values as branches to the therapy and cycle details.
In one preferred embodiment, if the data is related to radiotherapy, the accelerator configuration details are added by the module 209. Further, the module 209 adds the anatomic attributes with their values, the descriptions and anatomic functions with their attributes, and any procedure or scan details along with their attributes and anatomic planes, as branches to the appropriate anatomic entity. Recommendations, follow-up status, requests, visit type, etc. are added as children of the main node (root). Further, the module 209 may derive trees for medical tests and associate them with the appropriate anatomic entity along similar lines. Further, the module 209 may derive trees for genomic and other omics data and associate them with the main root.
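By way of a non-limiting illustration, the branch construction described for the module 209 may be sketched as follows. The sketch assumes a hypothetical tagger output of (term, category) pairs and a hypothetical parent_of mapping taken from the lexicon. The category names ANATOMIC_SITE, FINDING and TREATMENT are illustrative only, and the rule of attaching a finding to the nearest preceding anatomic site is a simplifying assumption standing in for the NLP and POS based relationship detection.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass
class Node:
    """One node of the per-sentence parse tree."""
    label: str
    category: str
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        self.children.append(child)
        return child

    def paths(self, prefix: Tuple[str, ...] = ()) -> Iterator[Tuple[str, ...]]:
        """Yield every root-to-leaf path as a tuple of labels."""
        here = prefix + (self.label,)
        if not self.children:
            yield here
        for child in self.children:
            yield from child.paths(here)


def build_sentence_tree(tagged: List[Tuple[str, str]],
                        parent_of: Dict[str, str]) -> Node:
    """Build one tree per sentence from tagged terms (assumed input shape)."""
    root = Node("sentence", "ROOT")
    sites: Dict[str, Node] = {}
    last_site: Optional[Node] = None
    last_finding: Optional[Node] = None
    for term, category in tagged:
        if category == "ANATOMIC_SITE":
            # A site with no known (already seen) parent starts a new top-level branch.
            parent = sites.get(parent_of.get(term), root)
            last_site = sites[term] = parent.add(Node(term, category))
            last_finding = None
        elif category == "FINDING" and last_site is not None:
            # Assumption: a finding belongs to the nearest preceding anatomic site.
            last_finding = last_site.add(Node(term, category))
        elif category == "TREATMENT" and last_finding is not None:
            # Treatments hang off the disease / finding they were given for.
            last_finding.add(Node(term, category))
    return root


tagged_terms = [
    ("salivary glands", "ANATOMIC_SITE"),
    ("parotid glands", "ANATOMIC_SITE"),
    ("swelling", "FINDING"),
    ("radiotherapy", "TREATMENT"),
    ("thyroid", "ANATOMIC_SITE"),
    ("nodule", "FINDING"),
]
tree = build_sentence_tree(tagged_terms, {"parotid glands": "salivary glands"})
for path in tree.paths():
    print(" -> ".join(path))
```

In this sketch a parent site must appear before its children in the sentence; a child seen before its parent simply becomes a top-level branch.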
SEMANTIC ANALYZER SUBSYSTEM
Referring to Figure 9, a detailed working of the semantic analyser subsystem module 210 along with the working of other components of the system 101 is illustrated. In one embodiment, the semantic analyser subsystem module may interpret the semantics of each detected relevant phrase related to the report itself. First, the module 210 may create a single root node for one patient record and add all top-level sentence branches from the module 209 to this main root. Further, the module 210 may group similar hierarchies / sub-hierarchies (navigating down) and merge them into a single branch of the main tree. In one example, SalivaryGlands -> {(ParotidGlands -> secretion -> reduced) (SubMandibularGlands -> secretion -> normal) (SubLingualGlands -> secretion -> reduced)} form the main branch of the root. The module 210 may further detect the sentiment in each branch and associate the finding with the appropriate parent node, using a combination of the Naïve Bayes algorithm and random forest. Further, it may eliminate contradicting findings (sentiments) using ML classification algorithms. The module 210 may further identify the International Classification of Diseases (ICD) disease class by traversing all branches of the main tree and using a random forest algorithm. Using the Naïve Bayes ML algorithm, the module 210 may identify the primary and secondary complications. Using a combination of classification and regression ML algorithms, the module 210 may further interpret the meanings of lab attribute values based on their ranges, and likewise the meanings of the values of anatomic attributes, drug dosage, modality attributes, anatomic functions, etc., using the pre-defined ranges.
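As an illustrative sketch only, the grouping of similar hierarchies into a single branch may be pictured as merging root-to-leaf paths into one nested structure. The function name merge_branches and the use of nested dictionaries are assumptions made for this example; the sentiment detection and ML classification steps are not shown.

```python
from typing import Dict, Iterable, Tuple

Tree = Dict[str, "Tree"]


def merge_branches(paths: Iterable[Tuple[str, ...]]) -> Tree:
    """Merge root-to-leaf paths so that shared prefixes become one branch."""
    root: Tree = {}
    for path in paths:
        node = root
        for label in path:
            # setdefault reuses an existing child, which is what merges the hierarchies.
            node = node.setdefault(label, {})
    return root


sentence_branches = [
    ("SalivaryGlands", "ParotidGlands", "secretion", "reduced"),
    ("SalivaryGlands", "SubMandibularGlands", "secretion", "normal"),
    ("SalivaryGlands", "SubLingualGlands", "secretion", "reduced"),
]
patient_tree = merge_branches(sentence_branches)
print(list(patient_tree))                    # ['SalivaryGlands']
print(list(patient_tree["SalivaryGlands"]))  # the three gland sub-branches
```

The three sentence-level branches share the SalivaryGlands prefix, so they collapse into the single main branch given in the example above.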
THE PATIENT RECORD SUMMARIZER MODULE
Referring to Figure 10, a detailed working of the patient record summarizer module 211 along with the working of other components of the system 101 is illustrated. In one embodiment, the module may generate a summary of various interpretations from the current report relative to the master data or the previous visit. In one embodiment, the module 211 tracks each complication, condition or symptom the patient has or has had in the past (possibly from birth) till date. Further, the module 211 may track the state of every anatomic entity in the patient’s body, if reported. Further, the module 211 is configured to predict any drug-drug interaction, or any possible reaction to a specified treatment. Further, the module 211 identifies the chemical composition (up to the chemical element level) of the medication along with its dosage details.
The patient record summarizer module creates a summary of each interpreted data point from the report along with its attributes, values, and dates. Further, the module 211 may merge this summary with the existing summary of the patient and correlate the summary with the patient’s omics data.
In one example, the module 211 may track all relevant data points from the report, such as the treatment provided to the patient and its outcome and response, medication details, the state of anatomic entities, the state of diseases, conditions, disorders, signs and symptoms, scan details, genomic data and other omics information. With this summary, the system may predict whether a treatment plan is likely to work for the patient or whether it may cause side effects. The module 211 further maps the medication administered to the patient to the underlying chemical composition (to the chemical element level), which may enable the system to predict whether a reaction would occur. In one embodiment, the system may track all genetic mutations and proteomic alterations that have occurred in the patient.
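As a hedged illustration of mapping a medication down to the chemical element level, the following sketch resolves a drug name to a chemical formula and then to element counts. The COMPOSITION table, the function names and the single paracetamol entry are assumptions made for the example; a deployed system would presumably draw these values from the master data (categories 102 to 105). The parser handles only flat formulas without parentheses.

```python
import re
from collections import Counter
from typing import Dict, Optional

# Hypothetical composition table; in practice this would come from master data.
COMPOSITION = {"paracetamol": "C8H9NO2"}


def elements_in(formula: str) -> Dict[str, int]:
    """Count atoms per element in a flat formula such as 'C8H9NO2'."""
    counts: Counter = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return dict(counts)


def element_profile(drug_name: str) -> Optional[Dict[str, int]]:
    """Map a drug name to its element counts, or None if the drug is unknown."""
    formula = COMPOSITION.get(drug_name.lower())
    return elements_in(formula) if formula else None


print(element_profile("Paracetamol"))  # {'C': 8, 'H': 9, 'N': 1, 'O': 2}
```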
Further, in addition to the summarization process, the patient record summarizer module may prepare the inserted patient summary for ranking. It uses a forward matrix to list all the data points (search terms) of interest, and an inverted matrix approach to assign weights to each identified term.
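The forward and inverted structures may be sketched, purely for illustration, as below. The specification does not state the weighting scheme, so the TF-IDF style weight used here is an assumption, as are the function name build_indexes and the patient identifiers p1 and p2.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def build_indexes(summaries: Dict[str, List[str]]) -> Tuple[dict, dict]:
    """Return (forward, inverted) structures for patient summary terms.

    forward:  patient id -> Counter of the terms found in that summary
    inverted: term -> {patient id: weight}
    """
    forward = {pid: Counter(terms) for pid, terms in summaries.items()}
    total_patients = len(forward)
    # Number of summaries in which each term occurs at least once.
    doc_freq = Counter(term for counts in forward.values() for term in counts)

    inverted: Dict[str, Dict[str, float]] = defaultdict(dict)
    for pid, counts in forward.items():
        total_terms = sum(counts.values())
        for term, count in counts.items():
            # Assumed weighting: term frequency scaled by a smoothed inverse document frequency.
            idf = math.log(total_patients / doc_freq[term]) + 1.0
            inverted[term][pid] = (count / total_terms) * idf
    return forward, dict(inverted)


forward, inverted = build_indexes({
    "p1": ["mucositis", "cisplatin", "mucositis"],
    "p2": ["cisplatin"],
})
print(sorted(inverted["cisplatin"]))  # ['p1', 'p2']
```

With this weighting, a term that is frequent in one summary but rare across patients receives a higher weight for ranking than a term shared by every summary.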

THE BACKGROUND ANALYZER MODULE
Referring to Figure 11, a detailed working of the background analyzer module 212 along with the working of other components of the system 101 is illustrated. In one embodiment, the background analyzer module may use the summaries generated by the patient record summarizer module 211 to perform various analyses, including recommendation of treatment plans, computation of the possibility of patient survival, the probability of advancement or reduction of a complication, and the probable response and outcome if a relevant treatment protocol is administered. Further, the module 212 may correlate the summary with research data crawled from the internet at various levels. The background analyzer module may use the summarized data points along with genomic, proteomic and research data to predict whether a treatment plan will reduce or aggravate the complication, or cause a reaction.
The Background Analyzer subsystem may further use ML-based classification and regression algorithms in combination with dimensionality reduction algorithms.

THE INDEXING SUBSYSTEM MODULE
In one embodiment, a detailed working of the indexing subsystem module 213 along with the working of other components of the system 101 is illustrated. In one example, the system may use Hadoop HBase for indexing patient record metadata, patient summary records and research matter crawled from the internet. Further, it may use Apache Solr to index searchable tags. The background analyzer module may run in the context of the MapReduce component.
SEARCH ANALYZER MODULE
In one embodiment, a detailed working of the search analyzer module 213 along with the working of other components of the system 101 is illustrated. In one embodiment, the search analyzer module may perform the ranking operations, generate real time statistics and cache search results. In one example, the search analyzer module may use a combination of Principal Component Analysis, unsupervised clustering, and classification algorithms to perform its tasks.

THE REINFORCEMENT ENGINE MODULE
In one embodiment, a detailed working of the Reinforcement Engine module 215 along with the working of other components of the system 101 is illustrated. In one embodiment, the reinforcement engine module may implement the Artificial Intelligence (AI) logic. Further, the module 215 may be configured for automatic updating of lexicons, error tracking and correction, voice recognition, and image recognition. Further, it may also be responsible for crawling research data from published papers, patents, journals, clinical trial data, etc.
In one embodiment, the system may analyse the quality of treatment provided to a patient by a hospital as well as analyse patient records for insurance claims, thereby reducing insurance fraud.
Referring now to figure 3, a method 300 for parsing data from disparate data sources over a computer based platform, is shown, in accordance with an embodiment of the present subject matter. The method 300 may be described in the general context of computer executable instructions. Generally, computer executable instructions can include routines, programs, objects, components, data structures, procedures, modules, functions, etc., that perform particular functions or implement particular abstract data types. The method 300 may also be practiced in a distributed computing environment where functions are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, computer executable instructions may be located in both local and remote computer storage media, including memory storage devices.
The order in which the method 300 is described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method 300 or alternate methods. Additionally, individual blocks may be deleted from the method 300 without departing from the spirit and scope of the subject matter described herein. Furthermore, the method can be implemented in any suitable hardware, software, firmware, or combination thereof. However, for ease of explanation, in the embodiments described below, the method 300 may be considered to be implemented in the above described system 101.
At block 301, a plurality of master data is uploaded to the computer-based platform through the data source module 205 and the input management module 206, wherein the master data comprises a variety of entries associated with a specific domain, metadata associated with each entry, and a category list having predefined IDs associated with domain terms.
At block 302, a plurality of data and domain terms are stored in a data repository 210 coupled with the processor 201.
At block 303, a plurality of structured or unstructured data associated with the domain specific entry is received through the Data Sources Module 205 from disparate data sources.
At block 304, the data is prepared for further processing through the input management module 206.
At block 305, parsing of the data is carried out through the Lexical Analyzer module 207, the Tagger Subsystem module 208 and the Syntax Analyzer module 209.
At block 306, the parsed data is interpreted through the Semantic Analyzer Subsystem module 210.
At block 307, the metadata or standard data is correlated with the interpreted parsed data and domain terms through the Semantic Analyzer Subsystem module 210.
At block 308, the correlated data is summarized in the form of statistical and probabilistic analysis through the Patient Record Summarizer module 211.
Referring to Figure 12, the system may use the Damerau-Levenshtein (DL) algorithm to process the text input and identify the words or phrases that are essential for the analysis.
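For illustration, a minimal sketch of edit-distance matching against a lexicon is given below. It implements the restricted (optimal string alignment) variant of the Damerau-Levenshtein distance, which counts insertions, deletions, substitutions and transpositions of adjacent characters. The helper closest_term, the max_distance threshold and the sample lexicon are assumptions for the example and are not taken from Figure 12.

```python
from typing import Iterable, Optional


def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[-1][-1]


def closest_term(word: str, lexicon: Iterable[str], max_distance: int = 2) -> Optional[str]:
    """Return the lexicon term nearest to word, or None if none is close enough."""
    best, best_dist = None, max_distance + 1
    for term in lexicon:
        dist = osa_distance(word.lower(), term.lower())
        if dist < best_dist:
            best, best_dist = term, dist
    return best


print(osa_distance("parotid", "partoid"))  # 1 (one adjacent transposition)
print(closest_term("mucosits", ["mucositis", "myositis", "sinusitis"]))  # mucositis
```

Counting a transposition as a single edit suits typed clinical text, where swapped adjacent letters are a common error.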
Referring to Figure 13, preparation of unstructured patient reports (typed and handwritten) for system processing is disclosed. The system may receive unstructured data (typed or handwritten). The typed data is scanned into the system. In one preferred embodiment, the system may use optical character recognition (OCR) for digitization of the document. The system is configured to receive handwritten data through an I/O device in digital format. Further, the system may load the unified structured data into the data repository for further processing.
In one preferred embodiment, the system may be developed as a cloud-based SaaS component that can plug in to any of the aforementioned sources and can be invoked from a healthcare application that requires such a service.
In yet another preferred embodiment, the system may input and output data in C-CDA format.
In one embodiment, the system may invoke the Custom Spell Corrector to:
Correct misspelt English words and semantically wrong grammatical constructs
Identify and correct the medical terms from input data
Interpret local / regional verbs as standard medical verbs as defined by the international medical dictionaries.

In one embodiment, the system may provide detailed analytics or reports, such as the causes or probabilities of a specific disease based on symptoms and cross-referential details.
In yet another embodiment, the system may provide statistical and probabilistic analysis of data based upon region, heredity (genetic coding), occupation, food habits, personal habits, pharmaceutical and prescription data and ad hoc data.
In yet another embodiment, the system may provide a voice-based editor to generate e-prescriptions or diagnostic reports based on voice commands from radiologists or physicians. The system may further help the doctor with differential diagnosis and store the data for future analysis.
In yet another embodiment, the system may reduce the duration required for diagnosis and planning the treatment protocol.
In one embodiment, the system may provide a personalized treatment protocol considering factors like patient history, drug allergies, etc., which makes the treatment more targeted and patient-centric.
In one embodiment, the system may be dynamically updated based on the statistical and probabilistic analysis.
In one embodiment, the system is configured for quick reading of documents and fast processing of images.
In one embodiment, parsing of data and its analysis may minimize work and increase efficiency.
In one preferred embodiment, the system may keep track of possible protocols and possible drug-drug (or procedure-drug) interactions.
In another preferred embodiment, the system may keep track of all existing medical conditions to minimize the effect of the treatment on existing conditions other than the targeted ones.
In yet another preferred embodiment, the system may keep track of genetic mutations and proteomic alterations.
In yet another preferred embodiment, the system may keep track of treatment outcomes and response for each patient, for each complication.
In yet another preferred embodiment, the system may keep track of the doctor's advice and suggestions and the patient’s adherence to them.
In yet another preferred embodiment, the system may provide various levels of sharing options to share the patient data among the users of the system.

Documents

Application Documents

#	Name	Date
1	OTHERS [21-06-2017(online)].pdf	2017-06-21
2	FORM28 [21-06-2017(online)].pdf	2017-06-21
3	Form 20 [21-06-2017(online)].pdf	2017-06-21
4	EVIDENCE FOR SSI [21-06-2017(online)].pdf	2017-06-21
5	Drawing [21-06-2017(online)].pdf	2017-06-21
6	Description(Complete) [21-06-2017(online)].pdf_69.pdf	2017-06-21
7	Form 26 [03-07-2017(online)].pdf	2017-07-03
8	PROOF OF RIGHT [03-07-2017(online)].pdf	2017-07-03
9	Form 18 [26-06-2017(online)].pdf	2017-06-26
10	Form 9 [26-06-2017(online)].pdf	2017-06-26
11	Form 3 [22-06-2017(online)].pdf	2017-06-22
12	Description(Complete) [21-06-2017(online)].pdf	2017-06-21
13	Correspondence by Agent_Form 26, Assignment Deed_06-07-2017.pdf	2017-07-06
14	201741021746-FER.pdf	2017-08-28
15	201741021746-OTHERS [26-02-2018(online)].pdf	2018-02-26
16	201741021746-FER_SER_REPLY [26-02-2018(online)].pdf	2018-02-26
17	201741021746-HearingNoticeLetter.pdf	2018-04-16
18	201741021746-Written submissions and relevant documents (MANDATORY) [15-06-2018(online)].pdf	2018-06-15

Search Strategy

1	search11_24-08-2017.pdf