Method And System For Generating Automated Coding Of A Medical Narrative Using Hierarchical Graphs

Abstract: This disclosure relates generally to medical coding and, more particularly, to generating automated coding of a medical narrative using hierarchical graphs. Medical coding is performed all over the world, with most countries using the International Classification of Diseases (ICD). The ICD has an evolving ontology with a hierarchical data structure and a vast vocabulary of the healthcare domain, which makes it challenging for medical coding professionals to establish stable representations of disease manually. Several existing state-of-the-art techniques for ICD medical coding rely on machine learning (ML) approaches, which are not very effective for patients with multiple inter-related diagnoses. The disclosure is a knowledge graph-based approach for generating automated coding of a medical narrative using a hierarchical graph, wherein the hierarchical graph is generated in several steps including creating a concept graph, enriching the concept graph, and then creating the hierarchical graph. [To be published with FIG.2]

Patent Information

Application #
Filing Date
19 January 2023
Publication Number
30/2024
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Email
Parent Application

Applicants

Tata Consultancy Services Limited
Nirmal Building, 9th floor, Nariman point, Mumbai 400021, Maharashtra, India

Inventors

1. CHITTIMALLI, Pavan Kumar
Tata Consultancy Services Limited, Tata Research Development & Design Centre, 54-B, Hadapsar Industrial Estate, Hadapsar, Pune 411013, Maharashtra, India
2. SHARMA, Sonam
Tata Consultancy Services Limited, Plot no. A-44 & A45, Ground, 1st to 05th floor & 10th floor, Block C&D, Sector 62, Noida 201309, Uttar Pradesh, India
3. KSHIRSAGAR, Mahesh
Tata Consultancy Services Limited, Unit 130/131, SDF 5, Seepz, Andheri (E), Mumbai 400096, Maharashtra, India
4. NAIK, Ravindra Dinkar
Tata Consultancy Services Limited, Tata Research Development & Design Centre, 54-B, Hadapsar Industrial Estate, Hadapsar, Pune 411013, Maharashtra, India
5. SIRMOKADAM, Sumukh Sudhakar
Tata Consultancy Services Limited, Unit 130/131, SDF 5, Seepz, Andheri (E), Mumbai 400096, Maharashtra, India

Specification

Description: FORM 2

THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003

COMPLETE SPECIFICATION
(See Section 10 and Rule 13)

Title of invention:

METHOD AND SYSTEM FOR GENERATING AUTOMATED CODING OF A MEDICAL NARRATIVE USING HIERARCHICAL GRAPHS

Applicant

Tata Consultancy Services Limited
A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th floor,
Nariman point, Mumbai 400021,
Maharashtra, India

Preamble to the description:

The following specification particularly describes the invention and the manner in which it is to be performed.
TECHNICAL FIELD

[001] The disclosure herein generally relates to medical coding and, more particularly, to generating automated coding of a medical narrative using hierarchical graphs.

BACKGROUND

[002] Medical coding is an integral part of the healthcare domain. Medical coding refers to the translation of healthcare diagnoses, procedures, medical services, and equipment used during treatment, as recorded in a patient’s medical record, into an alphanumeric universal standard medical code. Universal standard codes are used during the medical billing process for reimbursements/insurance and also provide consistency throughout the medical field in a global scenario. A patient's diagnosis, test results, and treatment must be documented, not just for reimbursement/insurance purposes, but to guarantee high quality care through subsequent complaints and treatments of the patient.
[003] Medical coding is performed all over the world, with most countries using the International Classification of Diseases (ICD). The ICD is a widely used diagnostic ontology, which is a standardized classification system of diagnosis codes representing conditions and diseases, health problems, abnormal findings, signs and symptoms, injuries, external causes of injuries and diseases, and social circumstances. The ICD is maintained by the World Health Organization and is periodically revised to keep up with the dynamically evolving health care domain, with inclusions/updates on new diseases, healthcare diagnoses, procedures, medical services, equipment, and treatments.
[004] The ICD has an evolving ontology with a hierarchical data structure and a vast vocabulary of the healthcare domain, which makes it challenging for medical coding professionals to establish stable representations of disease or other health conditions across the periodic ICD revisions, as the task is labor-intensive and error-prone. Several existing state-of-the-art techniques for ICD medical coding rely on machine learning (ML) approaches; however, the ML approaches do not address diagnosis sections that are scattered across a medical narrative and hence may not be very effective for patients with inter-related diagnoses/diseases.

SUMMARY

[005] Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one embodiment, a system for generating automated coding of a medical narrative using hierarchical graphs is provided.
[006] The system includes a memory storing instructions, one or more communication interfaces, and one or more hardware processors coupled to the memory via the one or more communication interfaces, wherein the one or more hardware processors are configured by the instructions to receive a plurality of inputs from a plurality of sources, via one or more hardware processors, wherein the plurality of inputs is associated with medical coding and the plurality of inputs comprises a plurality of medical narratives, and an International Classification of Disease (ICD) data, wherein the ICD data comprises an ICD clinical modification (CM) data. The system is further configured to create a concept graph, via the one or more hardware processors, using the plurality of medical narratives based on a document digitization technique. The system is further configured to enrich the concept graph to obtain an enhanced concept graph, via the one or more hardware processors, based on a plurality of enrichment techniques comprising a named entity recognition technique, a medical identification-laterality technique and a medical annotation technique. The system is further configured to create a hierarchical graph, via the one or more hardware processors, based on a mapping technique using the enhanced concept graph and the ICD data, wherein the hierarchical graph comprises a plurality of hierarchical labels and a plurality of hierarchical nodes, wherein, (a) the plurality of hierarchical labels is associated with a category and a sub-category, where the category describes one of an injury, a disease, and an abnormality, and the subcategory describes one of a cause, a manifestation, a location, a severity, and a type of injury or disease, and (b) the plurality of hierarchical nodes indicates an ICD CM level information and a string value of an ICD code. The system is further configured to generate an automated coding of a medical narrative, via the one or more hardware processors, using the hierarchical graph based on a medical narrative automated coding technique.
[007] In another aspect, a method for generating automated coding of a medical narrative using hierarchical graphs is provided. The method includes receiving a plurality of inputs from a plurality of sources, wherein the plurality of inputs is associated with medical coding and the plurality of inputs comprises a plurality of medical narratives, and an International Classification of Disease (ICD) data, wherein the ICD data comprises an ICD clinical modification (CM) data. The method further includes creating a concept graph, using the plurality of medical narratives based on a document digitization technique. The method further includes enriching the concept graph to obtain an enhanced concept graph, based on a plurality of enrichment techniques comprising a named entity recognition technique, a medical identification-laterality technique and a medical annotation technique. The method further includes creating a hierarchical graph, based on a mapping technique using the enhanced concept graph and the ICD data, wherein the hierarchical graph comprises a plurality of hierarchical labels and a plurality of hierarchical nodes, wherein, (a) the plurality of hierarchical labels is associated with a category and a sub-category, where the category describes one of an injury, a disease, and an abnormality, and the subcategory describes one of a cause, a manifestation, a location, a severity, and a type of injury or disease, and (b) the plurality of hierarchical nodes indicates an ICD CM level information and a string value of an ICD code. The method further includes generating an automated coding of a medical narrative, using the hierarchical graph based on a medical narrative automated coding technique.
[008] In yet another aspect, a non-transitory computer readable medium for generating automated coding of a medical narrative using hierarchical graphs is provided. The method includes receiving a plurality of inputs from a plurality of sources, wherein the plurality of inputs is associated with medical coding and the plurality of inputs comprises a plurality of medical narratives, and an International Classification of Disease (ICD) data, wherein the ICD data comprises an ICD clinical modification (CM) data. The method further includes creating a concept graph, using the plurality of medical narratives based on a document digitization technique. The method further includes enriching the concept graph to obtain an enhanced concept graph, based on a plurality of enrichment techniques comprising a named entity recognition technique, a medical identification-laterality technique and a medical annotation technique. The method further includes creating a hierarchical graph, based on a mapping technique using the enhanced concept graph and the ICD data, wherein the hierarchical graph comprises a plurality of hierarchical labels and a plurality of hierarchical nodes, wherein, (a) the plurality of hierarchical labels is associated with a category and a sub-category, where the category describes one of an injury, a disease, and an abnormality, and the subcategory describes one of a cause, a manifestation, a location, a severity, and a type of injury or disease, and (b) the plurality of hierarchical nodes indicates an ICD CM level information and a string value of an ICD code. The method further includes generating an automated coding of a medical narrative, using the hierarchical graph based on a medical narrative automated coding technique.
[009] It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.

BRIEF DESCRIPTION OF THE DRAWINGS

[010] The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:
[011] FIG. 1 illustrates an exemplary system for generating automated coding of a medical narrative using hierarchical graphs according to some embodiments of the present disclosure.
[012] FIG. 2 is a functional block diagram for generating automated coding of a medical narrative using hierarchical graphs according to some embodiments of the present disclosure.
[013] FIGS. 3A and 3B are a flow diagram illustrating a method (300) for generating automated coding of a medical narrative using hierarchical graphs in accordance with some embodiments of the present disclosure.
[014] FIG. 4 is a flow diagram illustrating a method (400) for the document digitization technique during generation of automated coding of a medical narrative using hierarchical graphs in accordance with some embodiments of the present disclosure.
[015] FIG. 5 illustrates a sample of international classification of disease (ICD) data in accordance with some embodiments of the present disclosure.
[016] FIG. 6A, FIG. 6B and FIG. 6C illustrate generation of dependency tree, identification of a concept and creation of the concept graph in accordance with some embodiments of the present disclosure.
[017] FIG.7 illustrates enrichment of the concept graph in accordance with some embodiments of the present disclosure.
[018] FIG. 8 illustrates the creation of the hierarchical graph using the international classification of disease (ICD) in accordance with some embodiments of the present disclosure.
[019] FIG. 9 is a flow diagram illustrating a method (900) for medical narrative automated coding technique during generation of automated coding of a medical narrative using hierarchical graphs in accordance with some embodiments of the present disclosure.

DETAILED DESCRIPTION OF EMBODIMENTS

[020] Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments.
[021] The International Classification of Disease (ICD) is a widely used diagnostic ontology for the classification of health disorders and a valuable resource for healthcare analytics. ICD serves a wide range of uses globally and provides critical knowledge on the extent, causes and consequences of an exhaustive range of human disease and death worldwide. Medical narratives or clinical terms coded with ICD are the main basis for health recording and statistics on disease, as well as for cause-of-death certificates. The ICD data and statistics support payment systems, service planning, administration of quality and safety, and health services research. Further, the diagnostic guidance linked to categories of ICD also standardizes data collection and enables large scale research. Hence the ICD is important because it provides a common language for recording, reporting and monitoring diseases. This allows the world to compare and share data in a consistent and standard way between hospitals, regions and countries, and over periods of time. Originating in the 19th century, the ICD ontology has been ever evolving, which makes it challenging for medical coding professionals to establish stable representations of disease or other health conditions across the periodic ICD revisions, as the task is labor-intensive and error-prone. Several existing state-of-the-art techniques for ICD medical coding rely on machine learning (ML) approaches; however, the ML approaches do not address diagnosis sections that are scattered across a medical narrative and hence may not be very effective for patients with inter-related diagnoses/diseases. The disclosure is a knowledge graph-based approach for generating automated coding of a medical narrative using a hierarchical graph, wherein the hierarchical graph is generated in several steps including creating a concept graph, enriching the concept graph, and then creating the hierarchical graph.
[022] Referring now to the drawings, and more particularly to FIG. 1 through FIG.9, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.
[023] FIG.1 is an exemplary block diagram of a system 100 for generating automated coding of a medical narrative using hierarchical graphs in accordance with some embodiments of the present disclosure.
[024] In an embodiment, the system 100 includes a processor(s) 104, communication interface device(s), alternatively referred as input/output (I/O) interface(s) 106, and one or more data storage devices or a memory 102 operatively coupled to the processor(s) 104. The system 100 with one or more hardware processors is configured to execute functions of one or more functional blocks of the system 100.
[025] Referring to the components of the system 100, in an embodiment, the processor(s) 104, can be one or more hardware processors 104. In an embodiment, the one or more hardware processors 104 can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the one or more hardware processors 104 is configured to fetch and execute computer-readable instructions stored in the memory 102. In an embodiment, the system 100 can be implemented in a variety of computing systems including laptop computers, notebooks, hand-held devices such as mobile phones, workstations, mainframe computers, servers, a network cloud and the like.
[026] The I/O interface(s) 106 can include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, a touch user interface (TUI) and the like and can facilitate multiple communications within a wide variety of networks N/W and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. In an embodiment, the I/O interface (s) 106 can include one or more ports for connecting a number of devices (nodes) of the system 100 to one another or to another server.
[027] The memory 102 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random-access memory (SRAM) and dynamic random-access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes.
[028] Further, the memory 102 may include a database 108 configured to include information regarding generation of automated coding of a medical narrative using hierarchical graphs. The memory 102 may comprise information pertaining to input(s)/output(s) of each step performed by the processor(s) 104 of the system 100 and methods of the present disclosure. In an embodiment, the database 108 may be external (not shown) to the system 100 and coupled to the system via the I/O interface 106.
[029] Functions of the components of system 100 are explained in conjunction with functional overview of the system 100 in FIG.2 and flow diagram of FIGS.3A and FIG.3B for generating automated coding of a medical narrative using hierarchical graphs.
[030] The system 100 supports various connectivity options such as BLUETOOTH®, USB, ZigBee and other cellular services. The network environment enables connection of various components of the system 100 using any communication link including Internet, WAN, MAN, and so on. In an exemplary embodiment, the system 100 is implemented to operate as a stand-alone device. In another embodiment, the system 100 may be implemented to work as a loosely coupled device to a smart computing environment. The components and functionalities of the system 100 are described further in detail.
[031] FIG.2 is an example functional block diagram of the various modules of the system of FIG.1, in accordance with some embodiments of the present disclosure. As depicted in the architecture, FIG.2 illustrates the functions of the modules of the system 100 for generating automated coding of a medical narrative using hierarchical graphs.
[032] As depicted in FIG.2, the functional system 200 of the system 100 is configured for generating automated coding of a medical narrative using hierarchical graphs. The modules as depicted in FIG. 2 may be implementation of the one or more hardware processors 104 of the system 100.
[033] The system 200 comprises an input module 202 configured for receiving a plurality of inputs, wherein the plurality of inputs is associated with medical coding. The plurality of inputs comprises a plurality of medical narratives, and an international classification of disease (ICD) data. The system 200 further comprises a concept graph creator 204 configured for creating a concept graph using the plurality of medical narratives based on a document digitization technique. The system 200 further comprises a concept graph enricher 206 configured for enriching the concept graph to obtain an enhanced concept graph based on a plurality of enrichment techniques. The system 200 further comprises a hierarchical graph unit 208 configured for creating a hierarchical graph based on a mapping technique using the enhanced concept graph and the ICD data. The system 200 is configured to generate an automated coding of a medical narrative using the hierarchical graph based on a medical narrative automated coding technique. Hence the modules 202 to 208 are configured to create the hierarchical graph in several steps, and the created hierarchical graph is then used to generate an automated coding of a medical narrative. The automated coding is generated based on the medical narrative automated coding technique, wherein a medical diagnosis of a patient is received in the input module 202. Further, a root cause is identified from the medical diagnosis in a root cause identifier 210. The identified root cause is used for creating a plurality of search strings based on traversing the hierarchical graph in the hierarchical graph unit 208. Further, the automated coding is generated for each of the plurality of search strings in the hierarchical graph unit 208.
[034] The various modules of the system 100 and the functional blocks in FIG.2, configured for generating automated coding of a medical narrative using hierarchical graphs, are implemented as at least one of a logically self-contained part of a software program, a self-contained hardware component, and/or a self-contained hardware component with a logically self-contained part of a software program embedded into each of the hardware components, that when executed perform the method described herein.
[035] Functions of the components of the system 200 are explained in conjunction with functional modules of the system 100 stored in the memory 102 and further explained in conjunction with flow diagram of FIGS.3A-3B. The FIGS.3A-3B with reference to FIG.1, is an exemplary flow diagram illustrating a method 300 for generating automated coding of a medical narrative using hierarchical graphs using the system 100 of FIG.1 according to an embodiment of the present disclosure.
[036] The steps of the method of the present disclosure will now be explained with reference to the components of the system 100 of FIG.1 for generating automated coding of a medical narrative using hierarchical graphs and the modules 202-210 as depicted in FIG.2 and the flow diagrams as depicted in FIGS.3A-3B. Although process steps, method steps, techniques or the like may be described in a sequential order, such processes, methods and techniques may be configured to work in alternate orders. In other words, any sequence or order of steps that may be described does not necessarily indicate a requirement that the steps be performed in that order. The steps of processes described herein may be performed in any order practical. Further, some steps may be performed simultaneously.
[037] At step 302 of the method 300, a plurality of inputs is received from a plurality of sources in the input module 202. The plurality of inputs is associated with medical coding and the plurality of inputs comprises a plurality of medical narratives, and an international classification of disease (ICD) data, wherein the ICD data comprises an ICD clinical modification (CM) data.
[038] In an embodiment, the plurality of medical narratives are in the form of free text (Natural Language) information generated by healthcare professionals when a patient visits a healthcare provider either for diagnosis or treatment. The plurality of medical narratives contains key details related to patient health, along with other findings, and the identified diagnosis of the case. Each medical record may contain history, examination, findings, case diagnosis, differential diagnosis, follow-up etc. The medical history is unified by connecting all health information of a patient from various sources such as hospital, pharmacy, lab etc.
[039] The health record, along with the history from the plurality of medical narratives, can be mined to get ICD codes. The International Classification of Disease (ICD) is a widely used diagnostic ontology for the classification of health disorders and a valuable resource for healthcare analytics. ICD-10-CM is a developing ontology and is subject to periodic revisions (e.g., ICD-9-CM to ICD-10-CM), so the versions differ considerably, which makes a complete crosswalk between them difficult. ICD-9-CM codes are very different from the ICD-10-CM/PCS code sets: there are nearly 19 times as many procedure codes in ICD-10-PCS as in ICD-9-CM, and nearly 5 times as many diagnosis codes in ICD-10-CM as in ICD-9-CM. It is therefore extremely difficult to evolve a system that identifies codes across these versions and remains consistent.
[040] In an example scenario, a section within a medical narrative is shared below:
• History
62 year old male with right hip injury.
• Exam
Pain in the right hip.
• Findings
For plain films, an anteroposterior radiograph of the pelvis shows the ilioischial and iliopectineal lines and the lateral margin of the anterior and posterior walls. The posterior oblique radiograph specific to the side in question, which is also called the iliac oblique view, shows the posterior part of the acetabulum and greater sciatic notch to advantage. The iliac wing is also well seen. The contralateral posterior oblique view, also called the obturator oblique view, shows the acetabulum in question well, in addition to the obturator foramen. The anterior portion of the iliopectineal line and posterior wall are also seen to advantage. Careful evaluation of all views is of the utmost importance in evaluating for fractures. Resorting to CT or MRI in times of question should not be hesitated.
• Differential Diagnosis
There is no differential in this case.
• Case Diagnosis
acetabular fracture
• Diagnosis By
The image findings are specific for injury. A CT and MRI was performed for further fracture definition and surgical planning.
• Discussion
This patient fractured both the anterior and posterior columns and the posterior wall. By analyzing the bony lines on the plain film, a posterior column fracture can be diagnosed by disruption of the ilioischial line. With the CT the complexity of the fracture pattern is revealed. This case demonstrates the importance in evaluating the pelvic lines carefully as the fracture was initially missed. The complexity and number of the fracture types can be daunting to those who do not often use the system. However, the referenced paper gives a detailed description on how to classify the fracture types, resulting in more comfort dealing with the complex anatomy and fracture pathology.
• Dx: S32.431A, S32.441A, S32.421A acetabular fracture, X58XXXA right hip injury
[041] At step 304 of the method 300, a concept graph is created in the concept graph creator 204. The concept graph is created using the plurality of medical narratives based on a document digitization technique.
[042] The document digitization technique comprises a narration parsing technique, a section identification technique, and a narrative concept graph creation technique. The document digitization technique is explained using method 400 of the FIG.4 as explained below:
[043] At step 402 of the method 400, a concept is identified from the medical narrative. The concept is a multi-word term which is domain independent and is identified based on a Natural Language Processing (NLP) technique.
[044] In an embodiment, a parsing technique first identifies the concept (multi-word) that forms the structure of the narration, from which the section and subsection template is identified. Considering the example medical narrative shared in step 302, the sections History, Exam, Findings, etc. are identified, and the subsequent text is captured under those section headings to preserve the context. The section template is captured as illustrated in FIG.5.
[045] In an example scenario, spaCy, a state-of-the-art Natural Language Processing (NLP) tool, is used for parsing the input medical narratives. A dependency parse tree is then generated, along with rules defined over the POS (Part Of Speech) tags and the relations among them. Considering the example sentence “The patient fractured both the anterior and posterior columns and the posterior wall”, the dependency tree generated is illustrated in FIG.6A. The nodes in the dependency tree are basic tokens with POS tag information. The edges show how these tokens are connected as per the grammar rules of the language. Using the dependency tree, the multi-word concepts and the relations among them are identified. The concepts and relations identified using FIG.6A to obtain FIG.6B are:
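The section identification described above can be sketched as follows. This is a minimal illustration only, assuming the section headings are known in advance and appear on their own lines; the names `SECTION_HEADINGS` and `split_sections` are hypothetical and do not come from the disclosure.

```python
# Hypothetical list of headings; a real system would configure or learn these.
SECTION_HEADINGS = [
    "History", "Exam", "Findings", "Differential Diagnosis",
    "Case Diagnosis", "Diagnosis By", "Discussion",
]

def split_sections(narrative: str) -> dict:
    """Group narrative lines under the most recent recognised section heading."""
    collected = {}
    current = None
    for raw in narrative.splitlines():
        # Drop surrounding whitespace and a leading bullet character, if any.
        line = raw.strip().lstrip("•").strip()
        if not line:
            continue
        if line in SECTION_HEADINGS:
            current = line
            collected[current] = []
        elif current is not None:
            collected[current].append(line)
    return {name: " ".join(body) for name, body in collected.items()}

sample = """• History
62 year old male with right hip injury.
• Exam
Pain in the right hip.
• Case Diagnosis
acetabular fracture
"""
sections = split_sections(sample)
print(sections["Case Diagnosis"])
```

Keeping the text grouped by heading preserves the context in which each concept was mentioned, which later steps can use to tell a diagnosis from a history note.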
Concepts: Patient, anterior column, posterior column, posterior wall
Relations: fractured.
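A minimal sketch of how rules over a dependency tree might yield these concepts and the relation is given below. To stay self-contained it does not call a parser: the token table is a hand-written, hypothetical dependency parse of the example sentence (not actual spaCy output), and the two rules shown (distributing conjoined modifiers over a noun, and linking a verb's subject to its objects) are assumptions made for illustration rather than the rules of the disclosure.

```python
from collections import namedtuple

Token = namedtuple("Token", "i text lemma pos dep head")

# Hand-written, hypothetical dependency parse (NOT real parser output).
TOKENS = [
    Token(0, "The", "the", "DET", "det", 1),
    Token(1, "patient", "patient", "NOUN", "nsubj", 2),
    Token(2, "fractured", "fracture", "VERB", "ROOT", 2),
    Token(3, "both", "both", "DET", "det", 8),
    Token(4, "the", "the", "DET", "det", 8),
    Token(5, "anterior", "anterior", "ADJ", "amod", 8),
    Token(6, "and", "and", "CCONJ", "cc", 5),
    Token(7, "posterior", "posterior", "ADJ", "conj", 5),
    Token(8, "columns", "column", "NOUN", "dobj", 2),
    Token(9, "and", "and", "CCONJ", "cc", 8),
    Token(10, "the", "the", "DET", "det", 12),
    Token(11, "posterior", "posterior", "ADJ", "amod", 12),
    Token(12, "wall", "wall", "NOUN", "conj", 8),
]

def children(tok, dep):
    """Tokens attached to `tok` with the given dependency label."""
    return [t for t in TOKENS if t.head == tok.i and t.i != tok.i and t.dep == dep]

def noun_concepts(noun):
    """One concept per modifier, so 'anterior and posterior columns' gives two."""
    mods = []
    for m in children(noun, "amod") + children(noun, "compound"):
        mods.append(m)
        mods.extend(children(m, "conj"))
    mods.sort(key=lambda t: t.i)
    if not mods:
        return [noun.lemma]
    return [f"{m.text} {noun.lemma}" for m in mods]

concepts, relations = [], []
for tok in TOKENS:
    if tok.pos == "NOUN":
        concepts.extend(noun_concepts(tok))
    elif tok.pos == "VERB":
        subjects = [c for s in children(tok, "nsubj") for c in noun_concepts(s)]
        object_nouns = [n for o in children(tok, "dobj")
                        for n in [o] + children(o, "conj")]
        objects = [c for n in object_nouns for c in noun_concepts(n)]
        relations.extend((s, tok.text, o) for s in subjects for o in objects)

print(concepts)
print(relations)
```

With a real parser the labels and attachments may differ from this table, so the rules would need to be adapted to the tag set actually produced.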
[046] At step 404 of the method 400, the concept graph is created based on the identified concepts. The concept graph comprises a plurality of concept graph nodes indicating the concepts and a plurality of concept graph edges describing a plurality of relations among the concepts.
[047] In an embodiment, the concept graph is built using the concepts and the relations identified in the parsing stage of step 402. The nodes in the graph denote the concepts and the edges represent the relations among them, as shown in the above figure. The direction of an edge denotes the relation from the source concept to the target concept.
The concept graph for the narration given below can be seen as a forest (a collection of graphs), as illustrated in FIG.6C:
• History: A 62-year-old male with right hip injury.
• Case diagnosis: acetabular fracture.
• Findings: The patient fractured both the anterior column, and the posterior column, and the posterior wall.
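The forest structure can be sketched with a small directed graph, as below. The triples are an assumed reading of the three sections above (the content of FIG.6C is not reproduced here), and `build_concept_graph` and `count_trees` are hypothetical helper names.

```python
from collections import defaultdict

def build_concept_graph(triples, isolated=()):
    """Return (adjacency, nodes); adjacency maps source -> [(relation, target)]."""
    graph = defaultdict(list)
    nodes = set(isolated)
    for source, relation, target in triples:
        graph[source].append((relation, target))
        nodes.update((source, target))
    return graph, nodes

def count_trees(graph, nodes):
    """Count weakly connected components, i.e. the separate graphs in the forest."""
    undirected = defaultdict(set)
    for source, edges in graph.items():
        for _, target in edges:
            undirected[source].add(target)
            undirected[target].add(source)
    seen, components = set(), 0
    for start in sorted(nodes):
        if start in seen:
            continue
        components += 1
        stack = [start]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            stack.extend(undirected[current] - seen)
    return components

# Assumed triples, one group per section of the narration.
triples = [
    ("62-year-old male", "with", "right hip injury"),   # History
    ("patient", "fractured", "anterior column"),        # Findings
    ("patient", "fractured", "posterior column"),
    ("patient", "fractured", "posterior wall"),
]
graph, nodes = build_concept_graph(triples, isolated=["acetabular fracture"])
tree_count = count_trees(graph, nodes)
print(tree_count)
```

Each section contributes its own disconnected piece at this stage, which is why a later enrichment step is needed to relate them.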
[048] At step 306 of the method 300, the concept graph is enriched to obtain an enhanced concept graph in the concept graph enricher 206. The concept graph is enriched based on a plurality of enrichment techniques.
[049] The plurality of enrichment techniques comprises a named entity recognition technique, a medical identification-laterality technique and a medical annotation technique.
[050] The enhanced concept graph comprises a plurality of enhanced nodes and a plurality of enhanced edges, wherein:
(a) The plurality of enhanced nodes comprises a string value of the concept, which is a contextual information,
(b) A plurality of labels identified from a plurality of medical ontologies, and
(c) The plurality of enhanced edges describes a plurality of relations among the string value of the concept.
[051] In an embodiment, during concept graph enrichment, a UMLS/MedCAT ontology is used to enrich/enhance the nodes. In the example scenarios shared above with reference to FIG.6B and FIG.6C, the concept graph is a forest, and some parts of the concept graph are disconnected as there is no explicit relation among them. Based on the named entity recognition technique, the named entity of the node “patient” is identified and connected to “62-year-old male” using coreference resolution. Further, based on the medical identification-laterality technique and the medical annotation technique, an implicit edge is created between those concepts. The concepts “Fracture” and “Injury” are resolved as similar using a bio-medical similarity measure created using “BioSentVec”. The concepts “right hip injury” and “acetabular fracture” are connected by a “disorder” to “body part” relation. Using UMLS, the concepts “right hip injury” and “acetabular fracture” are captured as similar. After the concept graph has been enhanced to obtain the enhanced concept graph (using NER, the medical identification-laterality technique and the medical annotation technique), all the medical narrative concepts are connected, ensuring that the medical narrative knowledge is captured well in the enhanced concept graph, as illustrated in FIG.7.
[052] At step 308 of the method 300, a hierarchical graph is created in the hierarchical graph unit 208. The hierarchical graph is created based on a mapping technique using the enhanced concept graph and the ICD data.
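The effect of enrichment on connectivity can be sketched as follows. This is an illustration under stated assumptions: the coreference pair and the similarity score are hard-coded stand-ins for the outputs of the enrichment techniques, the threshold value is invented, and no ontology or embedding library is actually called.

```python
# Explicit edges and isolated nodes of the un-enriched forest (assumed triples).
explicit_edges = [
    ("62-year-old male", "with", "right hip injury"),
    ("patient", "fractured", "anterior column"),
    ("patient", "fractured", "posterior column"),
    ("patient", "fractured", "posterior wall"),
]
isolated_nodes = ["acetabular fracture"]

# Hypothetical enrichment outputs; real values would come from the
# coreference and similarity components.
coreferences = [("patient", "62-year-old male")]
similarity_scores = {("right hip injury", "acetabular fracture"): 0.8}  # made-up score
SIMILARITY_THRESHOLD = 0.7  # assumed cut-off

def enrich(edges, coreferences, similarity_scores, threshold):
    """Add implicit edges for coreferent and sufficiently similar concepts."""
    enriched = list(edges)
    for a, b in coreferences:
        enriched.append((a, "same_as", b))
    for (a, b), score in similarity_scores.items():
        if score >= threshold:
            enriched.append((a, "similar_to", b))
    return enriched

def component_count(nodes, edges):
    """Number of connected components, computed with a small union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, _, b in edges:
        parent[find(a)] = find(b)
    return len({find(n) for n in nodes})

nodes = {n for a, _, b in explicit_edges for n in (a, b)} | set(isolated_nodes)
before = component_count(nodes, explicit_edges)
enriched_edges = enrich(explicit_edges, coreferences,
                        similarity_scores, SIMILARITY_THRESHOLD)
after = component_count(nodes, enriched_edges)
print(before, after)
```

Checking the component count before and after enrichment is one simple way to confirm that every narrative concept has become reachable from the others.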
[053] The hierarchical graph comprises a plurality of hierarchical labels and a plurality of hierarchical nodes, wherein:
(a) the plurality of hierarchical labels is associated with a category and a sub-category, where the category describes one of an injury, a disease, and an abnormality and the subcategory describes one of a cause, a manifestation, a location, a severity, and a type of injury/disease; and
(b) the plurality of hierarchical nodes indicates an ICD CM level information and a string value of an ICD code.
[054] In an embodiment, the ICD-10-CM is the current ontology mapping used for generating medical codes. A snapshot view of the ICD-10-CM is shared in FIG.8. The document is organized in a hierarchical view consisting of injury, body part, laterality, and specificity. The hierarchical information needs to be mapped to the concept graph entities to create a separate hierarchical graph consisting of the mapped ICD-CM Data and Concept Graph entries, at the following levels:
- Nature of injury/abnormality (Level 1)
- Body Part (Level 2)
- Location (Laterality) (Level 3)
- Specificity (Level 4).
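As a rough illustration of the four levels listed above, the following Python sketch models a hierarchical node that stores its level, its string value and, at the leaf, an ICD code. The class name `HierarchicalNode`, the helper `describe`, and the level-3 value "unspecified" are assumptions made for this example; only the code S32.443 for a posterior column acetabular fracture comes from the description.

```python
from dataclasses import dataclass, field
from typing import Optional

LEVEL_NAMES = {
    1: "Nature of injury/abnormality",
    2: "Body part",
    3: "Location (laterality)",
    4: "Specificity",
}


@dataclass
class HierarchicalNode:
    level: int                      # ICD-CM level information (1 to 4)
    value: str                      # string value mapped from the concept graph
    icd_code: Optional[str] = None  # only set where a code is mapped
    children: list = field(default_factory=list)

    def add_child(self, child):
        # Enforce the level ordering of the hierarchy.
        if child.level != self.level + 1:
            raise ValueError("child must sit exactly one level below its parent")
        self.children.append(child)
        return child


def describe(path):
    """Render a root-to-leaf path as one human-readable line per level."""
    return [f"Level {n.level} ({LEVEL_NAMES[n.level]}): {n.value}" for n in path]


root = HierarchicalNode(1, "fracture")
body = root.add_child(HierarchicalNode(2, "acetabulum"))
side = body.add_child(HierarchicalNode(3, "unspecified"))  # assumed value
leaf = side.add_child(HierarchicalNode(4, "posterior column", icd_code="S32.443"))

for line in describe([root, body, side, leaf]):
    print(line)
```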
[055] Further, a snapshot of the mapped ICD-CM Data, shown in FIG.9, presents the items of interest that have been identified using the mapping technique. The mapping technique first identifies the root node using a root cause identification technique. This technique looks for concepts whose context relates to diagnosis and identifies all such nodes in the enriched concept graph as root(s). One root (concept string value) is picked at a time and searched in the ICD-CM Data to get the corresponding information. The mapping is captured in . For example, the concept acetabulum is mapped to ICD-CM Data as . All the concepts in the input narration are mapped to such a structure. For the example shown here, the mapping structure can be seen as:
[Mapping structure diagram: node contents were not preserved in the source text.]

[056] At step 310 of the method 300, an automated coding of each of the medical narratives is generated in the hierarchical graph 208. The automated coding of a medical narrative is generated using the hierarchical graph based on a medical narrative automated coding technique.
[057] The medical narrative automated coding technique is explained using the method 900 of FIG.9.
[058] At step 902 of the method 900, a medical diagnosis of a patient is received in the input module 202. The medical diagnosis comprises at least one medical narrative of the patient for which the automated coding is to be generated.
[059] In an embodiment, the medical narratives are free-text (natural language) information generated by healthcare professionals when a patient visits a healthcare provider for diagnosis or treatment. These narratives contain key details related to patient health, along with other findings and the identified diagnosis of the case. Each medical record may contain history, examination, findings, case diagnosis, differential diagnosis, follow-up, etc. The medical history is unified by connecting all health information of a patient from various sources such as hospital, pharmacy, lab, etc.
[060] At step 904 of the method 900, a root cause is identified from the medical diagnosis in the root cause identifier 210, based on the patterns observed in the final diagnosis of the plurality of medical narratives.
[061] In an embodiment, in a typical medical narrative, the context of the root cause is mentioned in the diagnosis section of the narrative. If this pattern is not found, the concept label information can be used to identify the root cause: the root cause is the concept that is labelled as disease/abnormality/infection. In the example narration, the root cause identified is “acetabular fracture”.
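The two-stage root cause identification described above (diagnosis section first, concept labels as a fallback) might be sketched in Python as follows. This is a sketch under stated assumptions: the function name `identify_root_cause`, the "Diagnosis:" line pattern, the sample narrative wording and the label sets are all hypothetical, since the disclosure does not specify how the diagnosis section is detected.

```python
import re

# Labels that mark a concept as a root cause candidate (from the description).
ROOT_LABELS = {"disease", "abnormality", "infection"}

# Assumed pattern: the diagnosis section is a line starting with "Diagnosis:".
DIAGNOSIS_LINE = re.compile(
    r"^\s*(?:final\s+)?diagnosis\s*:\s*(.+)$", re.IGNORECASE | re.MULTILINE
)


def identify_root_cause(narrative, concepts):
    """Return the root cause concept, or None if nothing qualifies.

    `concepts` is a list of (concept string, set of labels) pairs taken
    from the enhanced concept graph, in narrative order.
    """
    # Stage 1: prefer a concept mentioned in the diagnosis section.
    match = DIAGNOSIS_LINE.search(narrative)
    if match:
        section = match.group(1).lower()
        for concept, _labels in concepts:
            if concept.lower() in section:
                return concept
    # Stage 2: fall back to the first concept carrying a root cause label.
    for concept, labels in concepts:
        if labels & ROOT_LABELS:
            return concept
    return None


narrative = (
    "History: 62-year-old male with right hip injury.\n"
    "Diagnosis: Posterior column acetabular fracture."
)
concepts = [
    ("right hip injury", {"abnormality"}),
    ("acetabular fracture", {"disease"}),
]
print(identify_root_cause(narrative, concepts))
```

With a diagnosis line present, the concept found in that line wins even though an earlier concept also carries a qualifying label; without it, the label-based fallback decides.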
[062] At step 906 of the method 900, a plurality of search strings is created for the root cause. The plurality of search strings is created based on traversing the hierarchical graph.
[063] The hierarchical graph is traversed starting from the root node till a leaf node, where the root node is identified using the root cause from a CM base, wherein the CM base is an exhaustive database that comprises a plurality of CM data related to clinical modification. The clinical modification is associated with a system used by physicians and other healthcare providers to classify and code all diagnoses, symptoms and procedures recorded in conjunction with hospital care.
[064] In an embodiment, for generating the ICD-10 code, the hierarchical graph is searched starting from the root cause identified using the earlier approach. The graph is traversed from root till leaf to generate the search string. The corresponding leaf holds the mapped ICD code. All the paths having a valid mapping generate the correct ICD codes. For example, for “posterior column acetabular fracture” the ICD code will be generated as S32.443. The path can be seen as:
[Path diagram: node contents were not preserved in the source text.]

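The root-to-leaf traversal that produces search strings can be illustrated with a short Python sketch. It assumes a simplified representation that is not taken from the disclosure: the hierarchical graph is an adjacency dictionary, leaf-to-code mappings sit in a separate dictionary, and a search string is the space-joined node values along a path. The node names "unspecified" and "unmapped detail" are hypothetical; only S32.443 comes from the description.

```python
def search_strings(graph, codes, root):
    """Walk depth-first from `root` to every leaf.

    Returns a list of (search string, ICD code) pairs, one for each
    root-to-leaf path whose leaf has a mapped code. Paths that end in a
    leaf without a mapping are skipped.
    """
    results = []
    stack = [(root, [root])]
    while stack:
        node, path = stack.pop()
        children = graph.get(node, [])
        if not children:                      # reached a leaf
            code = codes.get(node)
            if code is not None:
                results.append((" ".join(path), code))
            continue
        for child in reversed(children):      # keep left-to-right order
            stack.append((child, path + [child]))
    return results


# Toy hierarchical graph: injury -> body part -> laterality -> specificity.
graph = {
    "fracture": ["acetabulum"],
    "acetabulum": ["unspecified"],
    "unspecified": ["posterior column", "unmapped detail"],
}
codes = {"posterior column": "S32.443"}

for text, code in search_strings(graph, codes, "fracture"):
    print(text, "->", code)
```

Keeping the codes in a separate dictionary makes the "valid mapping" condition explicit: a path only contributes a code when its leaf appears in `codes`.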
[065] At step 908 of the method 900, an automated coding is identified for each of the plurality of search strings. The automated coding is an n-digit ICD code.
[066] In an embodiment, the automated coding generates the ICD-10 code, wherein the n-digit ICD code is a 7-digit ICD code. In the example for “posterior column acetabular fracture”, the 7-digit ICD code will be generated as S32.443.
[067] EXPERIMENTS:
[068] An experiment has been conducted using the disclosed techniques for generating automated coding of a medical narrative using hierarchical graphs. In an example scenario, the experiment is implemented in Python 3.9. Further, spaCy 3.4 is utilized for NLP parsing, MedCAT is utilized for annotating medical ontologies, and “BioSentVec” is utilized for finding the similarity of the concepts. The experiment is conducted on 167 medical narratives for which the expected medical codes have been created manually by experts. The disclosed techniques using hierarchical graphs identified medical codes with an accuracy of 91%.
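The description does not state how the accuracy figure is computed. One plausible reading, shown here purely as an assumption, is exact-match accuracy: a narrative counts as correct when the set of generated codes equals the set of expert-assigned codes. The function name and the three-narrative toy data below are hypothetical and do not reproduce the reported experiment.

```python
def exact_match_accuracy(predicted, expected):
    """Fraction of narratives whose predicted code set equals the expert set."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    hits = sum(set(p) == set(e) for p, e in zip(predicted, expected))
    return hits / len(expected)


# Toy data: one list of codes per narrative (not the experiment's data).
predicted = [["S32.443"], ["S32.443"], []]
expected = [["S32.443"], ["S32.443"], ["S32.443"]]

accuracy = exact_match_accuracy(predicted, expected)
print(f"{accuracy:.2%}")
```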
[069] The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
[070] This disclosure relates generally to medical coding and particularly to generating automated coding of a medical narrative using hierarchical graphs. Medical coding is performed all over the world, with most countries using the International Classification of Diseases (ICD). The ICD has an evolving ontology with a hierarchical data structure and a vast vocabulary of the healthcare domain, which makes it challenging for medical coding professionals to establish stable representations of disease manually. Several existing state-of-the-art techniques for ICD medical coding rely on machine learning (ML) approaches, which are not very effective for patients with multiple inter-related diagnoses. The disclosure is a knowledge graph-based approach for generating automated coding of a medical narrative using a hierarchical graph, wherein the hierarchical graph is generated using several steps including creation of a concept graph, enrichment of the concept graph, and further creation of a hierarchical graph.
[071] It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g., any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g., hardware means like e.g., an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g., an ASIC and an FPGA, or at least one microprocessor and at least one memory with software processing components located therein. Thus, the means can include both hardware means and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g., using a plurality of CPUs.
[072] The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various components described herein may be implemented in other components or combinations of other components. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
[073] The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
[074] Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
[075] It is intended that the disclosure and examples be considered as exemplary only, with a true scope of disclosed embodiments being indicated by the following claims.
Claims: We Claim:
1. A processor implemented method (300), comprising:
receiving a plurality of inputs from a plurality of sources, via one or more hardware processors, wherein the plurality of inputs is associated with medical coding and the plurality of inputs comprises a plurality of medical narratives, and an International Classification of Disease (ICD) data, wherein the ICD data comprises an ICD clinical modification (CM) data (302);
creating a concept graph, via the one or more hardware processors, using the plurality of medical narratives based on a document digitization technique (304);
enriching the concept graph to obtain an enhanced concept graph, via the one or more hardware processors, based on a plurality of enrichment techniques comprising a named entity recognition technique, a medical identification-laterality technique and a medical annotation technique (306);
creating a hierarchical graph (308), via the one or more hardware processors, based on a mapping technique using the enhanced concept graph and the ICD data, wherein the hierarchical graph comprises a plurality of hierarchical labels and a plurality of hierarchical nodes, wherein,
(a) the plurality of hierarchical labels is associated with a category and a sub-category, where the category describes one of an injury, a disease, and an abnormality, and the subcategory describes one of a cause, a manifestation, a location, a severity, and a type of injury or disease, and
(b) the plurality of hierarchical nodes indicates an ICD CM level information and a string value of an ICD code; and
generating an automated coding of a medical narrative, via the one or more hardware processors, using the hierarchical graph based on a medical narrative automated coding technique (310).

2. The processor implemented method of claim 1, wherein the medical narrative automated coding technique (900) comprises:
receiving a medical diagnosis of a patient, wherein the medical diagnosis comprises at least one medical narrative of the patient, wherein an automated coding is to be generated for the patient (902);
identifying a root cause from the medical diagnosis, based on one or more patterns observed in the plurality of medical narratives (904);
creating a plurality of search strings for the root cause based on traversing the hierarchical graph, wherein the hierarchical graph is traversed starting from a root node till a leaf node, where the root node is identified using the root cause from a CM base (906); and
identifying an automated coding for each of the plurality of search strings, wherein the automated coding is an n-digit ICD code (908).

3. The processor implemented method of claim 1, wherein the document digitization technique (400) comprises:
identifying a concept from the plurality of medical narratives based on a Natural Language Processing (NLP) technique, wherein the concept is a multi-word which is domain independent (402), and
creating a concept graph based on the identified concepts, wherein the concept graph comprises a plurality of concept graph nodes indicating the concepts and a plurality of concept graph edges describing a plurality of relations among the concepts (404).

4. The processor implemented method of claim 1, wherein the enhanced concept graph comprises a plurality of enhanced nodes and a plurality of enhanced edges, where the plurality of enhanced nodes comprises a string value of the concept, wherein the string value is a contextual information, and a plurality of labels identified from a plurality of medical ontologies, and the plurality of enhanced edges describes a plurality of relations among the string value of the concept.

5. A system (100), comprising:
a memory (102) storing instructions;
one or more communication interfaces (106); and
one or more hardware processors (104) coupled to the memory (102) via the one or more communication interfaces (106), wherein the one or more hardware processors (104) are configured by the instructions to:
receive a plurality of inputs from a plurality of sources, via one or more hardware processors, wherein the plurality of inputs is associated with medical coding and the plurality of inputs comprises a plurality of medical narratives, and an International Classification of Disease (ICD) data, wherein the ICD data comprises an ICD clinical modification (CM) data;
create a concept graph, via the one or more hardware processors, using the plurality of medical narratives based on a document digitization technique;
enrich the concept graph to obtain an enhanced concept graph, via the one or more hardware processors, based on a plurality of enrichment techniques comprising a named entity recognition technique, a medical identification-laterality technique and a medical annotation technique;
create a hierarchical graph, via the one or more hardware processors, based on a mapping technique using the enhanced concept graph and the ICD data, wherein the hierarchical graph comprises a plurality of hierarchical labels and a plurality of hierarchical nodes, wherein,
(a) the plurality of hierarchical labels is associated with a category and a sub-category, where the category describes one of an injury, a disease, and an abnormality, and the subcategory describes one of a cause, a manifestation, a location, a severity, and a type of injury or disease, and
(b) the plurality of hierarchical nodes indicates an ICD CM level information and a string value of an ICD code; and
generate an automated coding of a medical narrative, via the one or more hardware processors, using the hierarchical graph based on a medical narrative automated coding technique.

6. The system of claim 5, wherein the medical narrative automated coding technique comprises:
receiving a medical diagnosis of a patient, wherein the medical diagnosis comprises at least one medical narrative of the patient, wherein an automated coding is to be generated for the patient;
identifying a root cause from the medical diagnosis, based on one or more patterns observed in the plurality of medical narratives;
creating a plurality of search strings for the root cause based on traversing the hierarchical graph, wherein the hierarchical graph is traversed starting from a root node till a leaf node, where the root node is identified using the root cause from a CM base; and
identifying an automated coding for each of the plurality of search strings, wherein the automated coding is an n-digit ICD code.

7. The system of claim 5, wherein the document digitization technique comprises a narration parsing technique, a section identification technique, and a narrative concept graph creation technique, wherein:
identifying a concept from the plurality of medical narratives based on a Natural Language Processing (NLP) technique, wherein the concept is a multi-word which is domain independent, and
creating a concept graph based on the identified concepts, wherein the concept graph comprises a plurality of concept graph nodes indicating the concepts and a plurality of concept graph edges describing a plurality of relations among the concepts.

8. The system of claim 5, wherein the enhanced concept graph comprises a plurality of enhanced nodes and a plurality of enhanced edges, where the plurality of enhanced nodes comprises a string value of the concept, wherein the string value is a contextual information, and a plurality of labels identified from a plurality of medical ontologies, and the plurality of enhanced edges describes a plurality of relations among the string value of the concept.

Dated this 19th Day of January 2023

Tata Consultancy Services Limited
By their Agent & Attorney

(Adheesh Nargolkar)
of Khaitan & Co
Reg No IN-PA-1086

Documents

Application Documents

# Name Date
1 202321003858-STATEMENT OF UNDERTAKING (FORM 3) [19-01-2023(online)].pdf 2023-01-19
2 202321003858-REQUEST FOR EXAMINATION (FORM-18) [19-01-2023(online)].pdf 2023-01-19
3 202321003858-FORM 18 [19-01-2023(online)].pdf 2023-01-19
4 202321003858-FORM 1 [19-01-2023(online)].pdf 2023-01-19
5 202321003858-FIGURE OF ABSTRACT [19-01-2023(online)].pdf 2023-01-19
6 202321003858-DRAWINGS [19-01-2023(online)].pdf 2023-01-19
7 202321003858-DECLARATION OF INVENTORSHIP (FORM 5) [19-01-2023(online)].pdf 2023-01-19
8 202321003858-COMPLETE SPECIFICATION [19-01-2023(online)].pdf 2023-01-19
9 202321003858-FORM-26 [15-02-2023(online)].pdf 2023-02-15
10 202321003858-Proof of Right [28-02-2023(online)].pdf 2023-02-28
11 Abstract1.jpg 2023-03-16
12 202321003858-FER.pdf 2025-08-06
13 202321003858-FORM-26 [05-11-2025(online)].pdf 2025-11-05

Search Strategy

1 202321003858_SearchStrategyNew_E_202321003858E_28-03-2025.pdf