
System And Method For Extracting Literal Information From Scanned Images Of Patients’ Medical Reports/ Clinical Documents

Abstract: Disclosed is a system for extracting literal information from scanned images of medical reports. This system includes a document layout analysis module that segments the scanned medical report into predefined content regions and categorizes them, a structure recognition module recognizing the layout of segmented regions, an information extraction module that uses optical character recognition (OCR) to transform image content into textual data, and a storage module organizing and storing the extracted information for further processing. Some variations of this system offer added layers of detail such as classifying the segmented regions into additional categories, detecting specific structures within tabular data, enhancing the image clarity during pre-processing, and employing multiple OCR engines to ensure extraction accuracy. The system may also extract specific types of data, such as patient demographics or laboratory results, for use in downstream analytics and decision support. Reference is made to Fig. 3.


Patent Information

Application #
202541012130
Filing Date
13 February 2025
Publication Number
10/2025
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Parent Application

Applicants

Pondicherry University
East Coast Road, Kalapet, Puducherry – 605014, India

Inventors

1. S. Siva Sathya
Dept. of Computer Science, PU, Kalapet, Puducherry, Puducherry – 605014, India
2. Ravichandra Sriram
Dept. of Computer Science, PU, Kalapet, Puducherry, Puducherry – 605014, India

Specification

Description:
Technical Field:
The present disclosure relates to the field of computer engineering. More particularly, the present disclosure relates to a system and method for extracting literal information from scanned images of patients' medical reports/clinical documents.
BACKGROUND
The field of computer engineering encompasses a range of technologies and methodologies aimed at improving the efficiency and effectiveness of various computational and analytical tasks. Among these tasks are the extraction, processing, and interpretation of information from complex data sources, such as medical reports and clinical documents. A common challenge in this field is the accurate and reliable extraction of literal information from various types of documents, particularly those that have been scanned into digital formats. These documents often contain critical data that can be used for various applications such as clinical decision support, research, and patient management. However, due to the inherent complexity and variability of these documents, the extraction of this data often presents numerous technical challenges. Therefore, the field is constantly evolving, with a focus on developing and refining systems and methods for improving the extraction of literal information from scanned medical reports and clinical documents.
The prior art in the field of automated information extraction from medical documents faces several significant challenges and limitations. Many existing systems struggle with document layout analysis, especially for the diverse and complex formats found in medical reports. Traditional tools often fail to accurately identify and process intricate structures like bordered and borderless tables, multi-column text, or key-value pair arrangements. Additionally, these systems cannot differentiate between distinct regions of interest, such as headers, tables, and textual notes, leading to errors in data extraction.
Another major limitation lies in the inability of prior solutions to handle poor-quality scans effectively. Many medical reports are handwritten, scanned under suboptimal conditions, or contain noise, skewed text, and poor contrast. Existing systems fail to adapt to such irregularities, often misinterpreting the content or overlooking important details. Moreover, the inability to process diverse document layouts, such as single-column and two-column formats, or reports containing graphical elements like charts and diagrams, further hampers their usability.
Efficient handling of tabular data remains another area where prior art systems fall short. Borderless tables and irregular row or column alignments pose significant challenges, often leading to erroneous data extraction. Traditional OCR engines exacerbate these issues, as they struggle with handwritten text, stylized fonts, and multi-lingual content. These systems frequently fail to preserve the context or structure of extracted data, making it difficult to link numerical values to their corresponding parameters or interpret specialized medical terms accurately.
References:
[1] S. Ren, K. He, R. Girshick and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137-1149, June 2017, doi: 10.1109/TPAMI.2016.2577031.
[2] J. Redmon and A. Farhadi, "YOLOv3: An Incremental Improvement," arXiv preprint arXiv:1804.02767, 2018.
[3] C.-Y. Wang, I.-H. Yeh and H.-Y. M. Liao, "You Only Learn One Representation: Unified Network for Multiple Tasks," arXiv preprint arXiv:2105.04206, 2021.
[4] S. Raja, A. Mondal and C. V. Jawahar, "Visual Understanding of Complex Table Structures from Document Images," in 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 2543-2552, IEEE Computer Society, Jan. 2022.
[5] Z. Chi, H. Huang, H.-D. Xu, et al., "Complicated Table Structure Recognition," arXiv preprint arXiv:1908.04729, 2019.
[6] S. Schreiber, S. Agne, I. Wolf, et al., "DeepDeSRT: Deep Learning for Detection and Structure Recognition of Tables in Document Images," in International Conference on Document Analysis and Recognition, IEEE, pp. 1162-1167, 2017.
[7] B. Xiao, M. Simsek, B. Kantarci and A. A. Alkheir, "Table Structure Recognition with Conditional Attention," arXiv preprint arXiv:2203.03819, 2022.
[8] M. Umer, M. A. Mohsin, A. Ul-Hasan and F. Shafait, "PyramidTabNet: Transformer-Based Table Recognition in Image-Based Documents," in Document Analysis and Recognition - ICDAR 2023, Lecture Notes in Computer Science, vol. 14191, Springer, Cham, 2023.
Additionally, compliance and security concerns are often overlooked in prior solutions, which lack mechanisms to ensure the anonymization of sensitive patient data. This raises potential vulnerabilities in data handling, risking non-compliance with regulations like HIPAA or GDPR. The inefficiency, high cost, and manual dependency of these systems further limit their scalability, making them unsuitable for processing large volumes of medical reports. These limitations underscore the need for a robust, adaptable, and scalable solution that integrates advanced document layout analysis and OCR capabilities, while ensuring accuracy, compliance, and efficiency. There is a need to overcome these shortcomings comprehensively, offering a state-of-the-art solution for extracting structured information from scanned medical documents.
SUMMARY
One or more of the problems of the conventional prior art may be overcome by various embodiments of the present disclosure.

In one aspect of the present disclosure, a system is provided for extracting literal information from scanned images of medical reports, comprising a document layout analysis module, a structure recognition module, an information extraction module, and a storage module, all working in unison to extract and store useful text data from scanned medical reports for later analysis and processing.

In another aspect of the present disclosure, the document layout analysis module in the system is further developed to classify the segmented regions of the scanned images into predefined categories such as lines of text, one-column key-value pairs, two-column key-value pairs, and multi-column tabular data.
In another aspect of the present disclosure, the structure recognition module is designed to detect rows, columns, and cells within tabular data, distinguish borderless tables and multi-column layouts based on average word and row gaps, and identify delimiters between keys and values in key-value pairs.
In another aspect of the present disclosure, the information extraction module is further configured to map the extracted text into a structured format, such as JSON, and to associate each extracted text entry with its corresponding spatial location in the scanned image.
In another aspect of the present disclosure, the system includes a preprocessing module that is designed to clean up and enhance the scanned image for more accurate information extraction and layout analysis.
In another aspect of the present disclosure, a modified version of the YOLOv3 deep learning model is utilized by the document layout analysis module for accurate real-time object detection and region segmentation.
In another aspect of the present disclosure, the optical character recognition engine used in the information extraction module can support various OCR engines such as Tesseract OCR, Paddle OCR, and Amazon Textract to enhance extraction accuracy based on scanned document quality.
In another aspect of the present disclosure, the invention can extract different types of literal information including patient demographics, laboratory results, discharge summaries, and clinical notes, structuring the data to facilitate downstream analytics and decision support.
In another aspect of the present disclosure, a method for extracting literal information from scanned images of medical reports is disclosed. This method involves receiving a scanned image, analyzing its layout, identifying its structure, extracting information, and storing the information for future use.

DETAILED DESCRIPTION OF THE DRAWINGS
So that the manner in which the features, advantages and objects of the invention, as well as others which will become apparent, may be understood in more detail, a more particular description of the invention briefly summarized above may be had by reference to the embodiment thereof which is illustrated in the appended drawings, which form a part of this specification. It is to be noted, however, that the drawings illustrate only a preferred embodiment of the invention and are therefore not to be considered limiting of the invention's scope, as it may admit to other equally effective embodiments.
Fig. 1 illustrates the overall architecture of the system for extracting literal information from scanned medical reports, including its key components such as the document layout analysis module, structure recognition module, information extraction module, and storage module.
Fig. 2 illustrates the process flow for extracting literal information from scanned medical reports at various stages.
Fig. 3 illustrates the overall framework for literal information extraction, emphasizing the analysis of document layout and the identification of structure in various types of border and borderless tables.
Fig. 4 illustrates the architecture of YOLOv3-DLA, a deep learning model that is a modified version of YOLOv3, showing the Darknet-65 feature extractor and the FPN feature detector.
Fig. 5 presents the different types of tables with and without borders used for structure identification and literal information extraction.
Fig. 6 presents the sample outputs of document layout analysis on clinical documents using the YOLOv3-DLA model.
Fig. 7 illustrates the various stages of the structure identification process of a multi-column table without a border.

DETAILED DESCRIPTION OF THE PRESENT INVENTION
Various embodiments of the disclosure are discussed in detail below. While specific implementations are discussed, it should be understood that this is done for illustration purposes only. A person skilled in the relevant art will recognize that other components and configurations may be used without departing from the spirit and scope of the disclosure. Thus, the following description and drawings are illustrative and are not to be construed as limiting. Numerous specific details are described to provide a thorough understanding of the disclosure. However, in certain instances, known details are not described in order to avoid obscuring the description.
References to one or an embodiment in the present disclosure can be references to the same embodiment or any embodiment; and, such references mean at least one of the embodiments.
Reference to "one embodiment", "an embodiment", “one aspect”, “some aspects”, “an aspect” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the disclosure. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Moreover, various features are described which may be exhibited by some embodiments and not by others.
The terms used in this specification generally have their ordinary meanings in the art, within the context of the disclosure, and in the specific context where each term is used. Alternative language and synonyms may be used for any one or more of the terms discussed herein, and no special significance should be placed upon whether or not a term is elaborated or discussed herein. In some cases, synonyms for certain terms are provided.
A recital of one or more synonyms does not exclude the use of other synonyms.
The use of examples anywhere in this specification including examples of any terms discussed herein is illustrative only, and is not intended to further limit the scope and meaning of the disclosure or of any example term. Likewise, the disclosure is not limited to various embodiments given in this specification. Without intent to limit the scope of the disclosure, examples of instruments, apparatus, methods and their related results according to the embodiments of the present disclosure are given below. Note that titles or subtitles may be used in the examples for convenience of a reader, which in no way should limit the scope of the disclosure. Unless otherwise defined, technical and scientific terms used herein have the meaning as commonly understood by one of ordinary skill in the art to which this disclosure pertains. In the case of conflict, the present document, including definitions, will control.
Additional features and advantages of the disclosure will be set forth in the description which follows, and in part will be obvious from the description, or can be learned by practice of the herein disclosed principles. The features and advantages of the disclosure can be realized and obtained by means of the instruments and combinations particularly pointed out in the appended claims. These and other features of the disclosure will become more fully apparent from the following description and appended claims or can be learned by the practice of the principles set forth herein.
As mentioned before, there is a need for a robust, adaptable, and scalable solution that integrates advanced document layout analysis and OCR capabilities, while ensuring accuracy, compliance, and efficiency. There is a need to overcome these shortcomings comprehensively, offering a state-of-the-art solution for extracting structured information from scanned medical documents.
In one embodiment, the present invention pertains to a system for extracting literal information from scanned images of medical reports. This system comprises a document layout analysis module, a structure recognition module, an information extraction module, and a storage module. The document layout analysis module encompasses a trained deep learning model that is programmed to analyze the structure of a scanned medical report, further segmenting the image into predefined content regions. This module classifies these segmented regions into specific types, which include lines of text, one-column key-value pairs, two-column key-value pairs, and multi-column tabular data. The structure recognition module is equipped with the capability to determine the layout of these segmented regions using pattern recognition methods. It can identify the structure of the segmented regions, detect rows, columns, and cells within tabular data, recognize borderless tables and multi-column layouts based on average word and row gaps, and discern delimiters between keys and values in key-value pairs.
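By way of non-limiting illustration, the following sketch (in Python, not forming part of the claimed subject matter) shows one way a delimiter between a key and its value could be located in an OCR'd line, using either an explicit delimiter character or an unusually wide inter-word gap relative to the average word gap; the word format, the 2x threshold, and the helper names are assumptions made solely for the example.

```python
# Non-limiting sketch: split an OCR'd line of words into a key and a value,
# using an explicit delimiter character or an unusually wide horizontal gap.
from typing import List, Tuple

# Hypothetical word format: (text, x_left, x_right) in pixel coordinates.
Word = Tuple[str, int, int]

def split_key_value(words: List[Word], delimiters=":") -> Tuple[str, str]:
    # 1) Prefer an explicit delimiter such as ':' at the end of a word.
    for i, (text, _, _) in enumerate(words):
        if text and text[-1] in delimiters:
            key = " ".join(w[0] for w in words[: i + 1]).rstrip(delimiters + " ")
            value = " ".join(w[0] for w in words[i + 1:])
            return key, value
    # 2) Otherwise split at the widest inter-word gap, provided it clearly
    #    exceeds the average word gap (the 2x threshold is an assumption).
    gaps = [words[i + 1][1] - words[i][2] for i in range(len(words) - 1)]
    if gaps:
        avg_gap = sum(gaps) / len(gaps)
        widest = max(range(len(gaps)), key=lambda i: gaps[i])
        if gaps[widest] > 2 * avg_gap:
            key = " ".join(w[0] for w in words[: widest + 1])
            value = " ".join(w[0] for w in words[widest + 1:])
            return key, value
    return " ".join(w[0] for w in words), ""

# Hypothetical example: words of the line "Patient Name: John Doe".
print(split_key_value([("Patient", 10, 70), ("Name:", 75, 120),
                       ("John", 200, 240), ("Doe", 245, 280)]))
```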
The information extraction module is configured to retrieve literal information from the identified regions and transform the image content into textual data using an optical character recognition (OCR) engine. This module enhances extraction capabilities by supporting multiple OCR engines, such as Tesseract OCR, Paddle OCR, and Amazon Textract, depending on the quality of the scanned document. The retrieved textual information is then translated into a structured format, such as JSON, where each extracted text entry is associated with its spatial location in the scanned image. The final module, the storage module, organizes and safely stores the extracted textual information in a structured database format so that it can be processed and analyzed later.
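By way of non-limiting illustration, a minimal sketch of how extracted text could be associated with its spatial location using the Tesseract OCR engine (via the pytesseract binding) is given below; the file name is a hypothetical cropped region image, and the JSON field names are assumptions for the example only.

```python
# Non-limiting sketch: run Tesseract OCR on a cropped region image and keep
# each recognized word together with its bounding box, serialized as JSON.
import json

import pytesseract
from PIL import Image
from pytesseract import Output

image = Image.open("report_region.png")  # hypothetical cropped region

# image_to_data returns, per recognized word, the text and its bounding box
# (left, top, width, height) in image coordinates.
data = pytesseract.image_to_data(image, output_type=Output.DICT)

entries = []
for i, text in enumerate(data["text"]):
    if text.strip():  # skip empty tokens
        entries.append({
            "text": text,
            "bbox": [data["left"][i], data["top"][i],
                     data["width"][i], data["height"][i]],
            "confidence": data["conf"][i],
        })

print(json.dumps(entries, indent=2))
```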
Fig. 1 illustrates the overall architecture of the system for extracting literal information from scanned medical reports. The system comprises a document layout analysis module configured to segment scanned images into predefined regions, a structure recognition module for identifying tabular data and key-value pairs, an information extraction module to retrieve text from the segmented regions, and a storage module for saving extracted data in a structured format, such as JSON.
Fig. 2 illustrates the process flow for extracting literal information from clinical document images. The process begins with the application of a modified YOLOv3-DLA deep learning model to perform layout analysis on scanned medical reports. This analysis segments the documents into specific content types, including bordered tables, borderless tables, key-value pairs, and text regions. Subsequently, pattern recognition techniques are utilized to identify the structure of tables, whether bordered or borderless. The final step involves extracting textual information from each identified cell and storing it in a machine-readable JSON format for further use.

Fig. 3 demonstrates the framework for literal information extraction. The framework includes the creation of a labeled dataset and training of the YOLOv3-DLA deep learning model. It also outlines the structure recognition process for tabular data, where the system identifies rows, columns, and cells using advanced methods, such as morphological operations and thresholding. The figure further details the accurate segmentation of bordered and borderless tables for structured data extraction.
The present invention, referred to as System (100), is configured for automated layout analysis and data extraction from scanned medical reports. The system receives scanned reports (102) in JPEG or PDF formats, which are pre-processed through a pre-processing module (106) to enhance image quality. A labeling module (108) is employed to create labeled datasets by annotating regions of interest, such as tables, key-value pairs, and text blocks, with location and boundary information. These labeled datasets are then used to train the YOLOv3-DLA model (110), a deep learning-based object detection model specifically optimized for medical document analysis. The trained model produces weights (114) that are utilized during inference to identify and classify regions in unseen data.
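By way of non-limiting illustration, a minimal pre-processing sketch using OpenCV (denoising, Otsu binarization, and skew estimation) is given below; the file names, parameter values, and the angle-convention handling are assumptions for the example, not the exact settings of the pre-processing module (106).

```python
# Non-limiting pre-processing sketch: denoise, binarize, and estimate skew.
import cv2
import numpy as np

image = cv2.imread("scanned_report.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical file

# Reduce scanning noise while preserving text strokes.
denoised = cv2.fastNlMeansDenoising(image, h=10)

# Otsu binarization (text as foreground) and skew estimation from the
# minimum-area rectangle enclosing all foreground pixels.
_, binary = cv2.threshold(denoised, 0, 255,
                          cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
coords = np.column_stack(np.where(binary > 0)).astype(np.float32)
angle = cv2.minAreaRect(coords)[-1]
# The rectangle-angle convention differs between OpenCV versions; fold the
# reported angle into a small correction around zero.
if angle < -45:
    angle += 90
elif angle > 45:
    angle -= 90

# Rotate the page by the estimated correction (the sign may need adjusting
# for a given OpenCV version; this is a sketch, not the module's procedure).
h, w = denoised.shape
matrix = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
deskewed = cv2.warpAffine(denoised, matrix, (w, h),
                          flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE)
cv2.imwrite("preprocessed_report.png", deskewed)
```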
The system incorporates an overlapping region refinement module (116) to resolve ambiguities or overlapping regions detected by the YOLOv3-DLA model. Various processes are employed to handle distinct content structures within scanned documents, including a one-column key-value pair method (120), a two-column key-value pair method (122), a table detection method (124) for identifying rows and cells, and a multi-column text method (126). Free-text regions are directly converted into text through an image-to-text conversion module (128). Collectively, these modules ensure comprehensive data extraction from a variety of document layouts.
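By way of non-limiting illustration, one simple overlap-refinement rule, retaining the higher-confidence region when two detected regions overlap beyond an Intersection-over-Union (IoU) threshold, is sketched below; the refinement module (116) is not limited to this rule, and the data structure and threshold are assumptions for the example.

```python
# Non-limiting sketch: keep the higher-confidence region when two detected
# regions overlap beyond an IoU threshold.
from typing import Dict, List

def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def refine_regions(regions: List[Dict], iou_threshold: float = 0.5) -> List[Dict]:
    """regions: [{'type': 'TABLE', 'score': 0.97, 'box': (x1, y1, x2, y2)}, ...]"""
    kept: List[Dict] = []
    for region in sorted(regions, key=lambda r: r["score"], reverse=True):
        if all(iou(region["box"], k["box"]) < iou_threshold for k in kept):
            kept.append(region)
    return kept

# Hypothetical example: two overlapping detections of the same area.
detections = [
    {"type": "TABLE", "score": 0.97, "box": (40, 300, 980, 720)},
    {"type": "TWO_COL_KV", "score": 0.61, "box": (45, 310, 975, 700)},
]
print(refine_regions(detections))  # only the higher-scoring TABLE region remains
```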
The disclosed method (200) provides a systematic approach for automating the extraction and structuring of information from patient medical reports in image or PDF formats into a JSON structure. The process begins with a pre-processing module (202) to generate training and validation datasets by labeling key information fields for supervised learning. These labeled datasets are utilized to train the YOLOv3-DLA model (204), which is then applied to perform document layout analysis (206), segmenting critical regions such as headers, tables, and text blocks. Identified regions are refined (208) to eliminate overlaps, ensuring precise and non-redundant mapping of content.
Pattern recognition techniques (210) are subsequently applied to analyze and define the structure of each region, associating them with predefined formats commonly found in medical reports. Literal information, including textual and numeric data, is extracted (212) and converted into a structured JSON format. The final output (214) is a JSON file that facilitates integration with systems such as electronic health records. This method significantly reduces manual effort, enhances data processing accuracy, and provides a scalable solution for unstructured medical data.
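By way of non-limiting illustration, a purely hypothetical example of one structured record produced at step (214) is shown below; the field names and values are illustrative only and are not mandated by the method.

```python
# Purely hypothetical example of one structured output record; field names
# and values are illustrative only.
import json

record = {
    "document_id": "report_001",
    "regions": [
        {"type": "TWO_COL_KV", "bbox": [40, 120, 980, 260],
         "fields": {"Patient Name": "John Doe", "Age": "46"}},
        {"type": "TABLE", "bbox": [40, 300, 980, 720],
         "header": ["Test", "Result", "Unit", "Reference Range"],
         "rows": [["Hemoglobin", "13.2", "g/dL", "13.0-17.0"]]},
    ],
}
print(json.dumps(record, indent=2))
```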
Fig. 4 depicts the architecture of the YOLOv3-DLA deep learning model, which performs real-time object detection by evaluating the entire image in a single pass. YOLOv3-DLA comprises two primary components: a feature extractor (Darknet-65) and a feature detector. Darknet-65 contains 66 convolution layers organized into six residual units, with input images down-sampled five times for multi-level feature extraction. The feature detector divides the input image into a 13x13 grid, predicting bounding boxes, objectness scores, and class confidences for each cell, indicating the probability of finding objects within specified regions.
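By way of non-limiting illustration, the size of the detection head's output at the 13x13 scale can be sketched as follows, assuming the standard YOLOv3 setting of three anchor boxes per grid cell (an assumption for YOLOv3-DLA) and the five region types used for layout analysis (TWO_COL_KV, ONE_COL_KV, TABLE, MULTI_COL_TEXT, TEXT_LINES).

```python
# Non-limiting back-of-the-envelope sketch of the detection head's output
# size at the 13x13 scale.
grid_size = 13          # 13 x 13 grid described for the feature detector
anchors_per_cell = 3    # assumption (YOLOv3 default)
num_classes = 5         # region types used for layout analysis

# Each predicted box carries 4 coordinates, 1 objectness score and
# num_classes class confidences.
values_per_box = 4 + 1 + num_classes
outputs_at_scale = grid_size * grid_size * anchors_per_cell * values_per_box
print(outputs_at_scale)  # 13 * 13 * 3 * 10 = 5070 values at this scale
```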
Fig. 5 illustrates various table structures used for information extraction, including bordered, borderless, and partially bordered tables. Borderless tables present unique challenges, as they lack distinct divisions between rows and columns.
Fig. 6 shows examples of document layout analysis applied to clinical documents using the YOLOv3-DLA model. This process segments images into different region types, including two-column key-value pairs, multi-column data, and text regions.
Fig. 7 outlines the structure identification process for a multi-column, borderless table. The figure demonstrates the steps of removing partial borders, applying thresholding, eliminating overlapping text across columns, identifying columns and rows using average gaps, and determining the overall table structure.
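By way of non-limiting illustration, the gap-based structure analysis described above can be sketched as follows using Otsu binarization and projection profiles, placing column and row breaks where runs of empty pixels exceed a minimum gap; the file name and gap thresholds are assumptions for the example.

```python
# Non-limiting sketch: estimate columns and rows of a borderless table from
# projection profiles of the binarized image.
import cv2
import numpy as np

table = cv2.imread("borderless_table.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file
_, binary = cv2.threshold(table, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

def gap_splits(profile, min_gap):
    """Return (start, end) spans of consecutive empty positions of length >= min_gap."""
    splits, run_start = [], None
    for i, value in enumerate(profile):
        if value == 0 and run_start is None:
            run_start = i
        elif value > 0 and run_start is not None:
            if i - run_start >= min_gap:
                splits.append((run_start, i))
            run_start = None
    return splits

# Vertical projection (ink per pixel column) -> column separators;
# horizontal projection (ink per pixel row) -> row separators.
col_profile = binary.sum(axis=0)
row_profile = binary.sum(axis=1)
column_gaps = gap_splits(col_profile, min_gap=15)  # ~average word gap, assumed
row_gaps = gap_splits(row_profile, min_gap=8)      # ~average row gap, assumed
print(f"{len(column_gaps) + 1} columns, {len(row_gaps) + 1} rows (estimated)")
```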

The following Table 1 shows the performance of document layout analysis using YOLOv3-DLA on the CLINDOC dataset, reporting precision (P), recall (R), and F1-score (F1) at IoU thresholds of 0.5, 0.6, 0.7, and 0.8.
Table 1
IoU 0.5 0.6 0.7 0.8
Model P R F1 P R F1 P R F1 P R F1
Faster-RCNN-VGG16 [1] 82.4 81.3 81.6 78.3 77.2 77.5 70.9 69.9 70.2 67.6 66.7 66.9
Faster-RCNN-ResNet-101 [1] 88.9 90.1 89.6 84.5 85.6 85.1 76.5 77.5 77.1 72.9 73.9 73.5
YOLO-Tiny [2] 68.5 72.5 70.4 65.1 68.9 66.9 58.9 62.4 60.6 56.2 59.5 57.8
YOLOv3 [2] 94.2 92.5 93.4 89.5 87.9 88.8 81 79.6 80.4 77.2 75.9 76.6
YOLOR [3] 86.9 94.4 88.2 82.6 89.7 83.8 74.7 81.2 75.9 71.3 77.5 72.3
YOLOv3-DLA (Proposed) 97.2 95.0 96.1 92.4 90.3 91.4 83.65 81.8 82.7 79.8 78 78.9

The invention employs a YOLOv3-DLA deep learning model to perform document layout analysis on scanned patient medical reports. This model demonstrates exceptional accuracy in detecting and segmenting various regions within the reports, such as tables, key-value pairs, and text lines. The evaluation metrics, including Precision (P), Recall (R), and F1-Score (F1), are computed at various Intersection over Union (IoU) thresholds. Table 1 illustrates the comparison between YOLOv3-DLA and other state-of-the-art object detection models, where YOLOv3-DLA outperforms competing methods in all metrics, particularly at higher IoU thresholds.
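By way of non-limiting illustration, a minimal sketch of computing precision, recall, and F1-score at a given IoU threshold for a single image is shown below; it treats a prediction as a true positive when it matches an unused ground-truth region of the same type with IoU at or above the threshold, which is one common convention and not necessarily the exact evaluation protocol used for Table 1.

```python
# Non-limiting sketch: precision/recall/F1 at one IoU threshold for one image.
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def precision_recall_f1(predictions, ground_truth, iou_threshold=0.5):
    """Regions are dicts such as {'type': 'TABLE', 'box': (x1, y1, x2, y2)}."""
    matched, true_positives = set(), 0
    for pred in predictions:
        for j, gt in enumerate(ground_truth):
            if j in matched or gt["type"] != pred["type"]:
                continue
            if iou(pred["box"], gt["box"]) >= iou_threshold:
                matched.add(j)
                true_positives += 1
                break
    precision = true_positives / len(predictions) if predictions else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```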
The following Table 2 shows the performance of YOLOv3-DLA for different region types on the CLINDOC dataset.

Table 2
Region Type Precision Recall F1-Score
TWO_COL_KV 97.89 94.9 96.37
ONE_COL_KV 97.78 89.8 93.62
TABLE 98.96 95.96 97.44
MULTI_COL_TEXT 93.88 97.87 95.83
TEXT_LINES 97.94 96.94 97.44
Overall 97.27 95.07 96.16

The system demonstrates strong class-specific performance on the CLINDOC dataset, achieving high precision and recall values across all region types. Table 2 presents the detailed performance breakdown for each region type; text lines and tables exhibit the highest F1-scores, underscoring the system's capability to accurately detect both simple and complex structures.
Table 3 reports the performance of the proposed methods in detecting table layouts and extracting literal information, comparing their precision, recall, and F1-scores against other state-of-the-art approaches.
Table 3
IoU 0.5 0.6 0.7 0.8
Model P R F1 P R F1 P R F1 P R F1
TabStructNet [4] 92.7 91.3 92.0 84.4 83.1 83.7 78.8 77.6 78.2 76 74.9 75.4
GraphTSR[5] 81.9 85.5 83.7 74.5 77.8 76.2 69.6 72.7 71.1 67.2 70.1 68.6
DeepDeSRT [6] 90.6 88.7 89.0 82.4 80.7 81 77 75.4 75.7 74.3 72.7 73
CATT-Net [7] 94.1 90.0 92.3 85.6 82 84 80 76.6 78.5 77.2 73.9 75.7
PyramidTabNet [8] 93.2 88.6 90.8 84.8 80.6 82.6 79.2 75.3 77.2 76.4 72.7 74.5
Proposed methodology 95.5 97.9 96.6 86.9 89.1 88 81.2 83.2 82.2 78.3 80.3 79.3
The system achieves high precision and recall for various region types during table structure recognition. Table 4 summarizes the results, highlighting near-perfect scores for most region types, particularly text lines.

Table 4
Region Type Precision Recall F1-Score
TWO_COL_KV 92.78 96.77 94.74
ONE_COL_KV 93.81 96.81 95.29
TABLE 98.98 97.98 98.48
MULTI_COL_TEXT 91.84 97.83 94.74
TEXT_LINES 100 100 100
Overall 95.51 97.91 96.69

The following Table 5 shows the performance of literal information extraction on the CLINDOC dataset. The accuracy of various OCR engines during literal information extraction is evaluated using edit-distance metrics; as the table shows, Tesseract OCR achieves the highest accuracy across all edit distances. (A minimal illustrative sketch of such an edit-distance metric follows the table.)
Table 5
OCR Engine Edit-Distance: 0 1 2 3
Paddle OCR 0.54 0.56 0.61 0.65
Easy OCR 0.52 0.58 0.61 0.64
Amazon Textract 0.72 0.75 0.79 0.81
Microsoft Read API 0.75 0.78 0.80 0.82
Tesseract OCR 0.79 0.82 0.85 0.88
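By way of non-limiting illustration, the sketch below computes an accuracy-at-edit-distance score, read here as the fraction of extracted strings within Levenshtein distance k of the ground truth; this reading of the metric, and the example strings, are assumptions made for illustration.

```python
# Non-limiting sketch: fraction of extracted strings within Levenshtein
# distance k of the ground truth.
def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def accuracy_at(extracted, reference, k):
    hits = sum(levenshtein(e, r) <= k for e, r in zip(extracted, reference))
    return hits / len(reference)

# Hypothetical example strings:
print(accuracy_at(["Hemoglobin", "13.2 g/dl"], ["Hemoglobin", "13.2 g/dL"], k=0))  # 0.5
print(accuracy_at(["Hemoglobin", "13.2 g/dl"], ["Hemoglobin", "13.2 g/dL"], k=1))  # 1.0
```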

From Tables 1-5, the present invention addresses the challenges associated with extracting literal information from patient medical reports, which often exhibit diverse and complex layouts. The invention utilizes a YOLOv3-DLA deep learning model for document layout analysis, achieving superior performance in identifying and segmenting regions such as tables, key-value pairs, and text lines with high precision and recall. The invention further incorporates customized methods for structure recognition, enabling the accurate detection of various tabular layouts, including bordered and borderless tables, one-column and two-column key-value pairs, and multi-column text. These methods employ advanced techniques, including Otsu's thresholding, pixel-based gap analysis, and morphological operations, to define rows, columns, and cells. This allows the system to address challenges associated with irregular layouts, overlapping content, and multi-line cells. The extracted data is transformed into a machine-readable JSON format, ensuring seamless compatibility with downstream applications such as electronic health records (EHRs) and clinical decision support systems (CDSS).
The invention demonstrates exceptional technical performance, as evidenced by its evaluation metrics. The YOLOv3-DLA model achieves an F1-score of 96.16% for document layout analysis, significantly outperforming other state-of-the-art methods. Additionally, the structure identification methods achieve near-perfect precision and recall for region-specific tasks, with table recognition obtaining an F1-score of 98.48%. The inclusion of advanced OCR engines for literal information extraction further enhances the system's robustness, achieving high accuracy across varying edit-distance thresholds. These results illustrate the system's ability to handle complex medical document layouts effectively, making it a transformative solution for automating the extraction of critical information, improving efficiency, accuracy, and interoperability in healthcare data management systems.
The invention provides a system and method for extracting literal information from scanned medical reports. The process begins with the receipt of scanned medical reports, which serve as input to the system. As illustrated in Fig. 1 and 3, the system comprises a document layout analysis module, a structure recognition module, an information extraction module, and a storage module. These components operate in a sequential manner to convert unstructured data into structured formats for downstream applications.
The first step involves the document layout analysis module, as shown in Fig. 2, which segments the scanned medical reports into predefined regions such as tables, key-value pairs, and text blocks. The module employs a modified YOLOv3-DLA deep learning model trained on a dataset of annotated medical reports. This ensures high accuracy in identifying complex and diverse layouts, including bordered and borderless tables, as well as single-column and multi-column text regions. The segmented regions are then passed to the subsequent module for further processing.
Next, the structure recognition module, depicted in Fig. 3, processes the segmented regions, particularly tables. This module utilizes advanced methods, including morphological operations and thresholding, to identify rows, columns, and cells within tables. Both bordered and borderless tables are accurately recognized, ensuring that the hierarchical relationships within the data are preserved. For instance, headers are aligned with their corresponding rows and columns, enabling precise extraction of tabular information.
Following segmentation and structure recognition, the information extraction module extracts textual data from the segmented regions using an integrated optical character recognition (OCR) engine. The OCR engine converts image data into textual content while maintaining the logical structure of the original document. The module supports printed and handwritten text, enhancing the robustness of the system. The extracted text is mapped to its respective regions in the document, ensuring contextual accuracy.
Finally, the structured data is stored in a machine-readable format, such as JSON. This structured format preserves the logical relationships and hierarchical structure of the original document, allowing for seamless integration with external systems like electronic health records (EHRs). The system ensures compliance with data privacy regulations by anonymizing sensitive patient information during processing. Preprocessing steps, including noise removal and skew correction, enhance input quality, ensuring accurate results. The invention, by integrating advanced machine learning techniques and robust preprocessing, provides an efficient and scalable solution for extracting literal information from scanned medical reports.
The foregoing description provides illustrative details of the present invention, its structure, and its operation. It is to be understood that the above description is merely illustrative of the principles of the invention and is not intended to limit the scope of the invention to the specific embodiments disclosed. The invention is capable of various modifications and adaptations, as will be evident to those skilled in the art, without departing from the broader spirit and scope of the invention.
The scope of the invention is defined solely by the appended claims and their equivalents. All features disclosed in this specification, including drawings, may be combined in any manner consistent with the claims.

Claims:
1. A system (100) for extracting literal information from scanned images of medical reports, comprising:
a preprocessing module (108) configured to prepare scanned medical reports in image or PDF formats by enhancing image quality, removing noise, correcting skewness, and preparing the data for downstream analysis;
a training module (110) configured to train a YOLOv3-DLA deep learning model on validated datasets that include labeled medical reports with boundaries and location data;
a document layout analysis module (104) configured to analyze the structure of scanned medical reports by segmenting them into predefined content regions such as tables, key-value pairs, and text blocks;
a refinement module (116) configured to resolve overlapping or ambiguous regions identified during the document layout analysis to ensure accurate mapping and segmentation of content;
a structure recognition module (118) configured to identify the structural layout of the segmented regions using pattern recognition techniques, including rows, columns, and key-value pairs;
an information extraction module (112) configured to retrieve literal information from the identified regions, including textual and numerical data, and transform it into structured formats, such as JSON; and
a storage module (130) configured to organize and store the extracted textual information in a structured database format for subsequent processing and analysis.
2. The system (100), as claimed in claim 1, wherein the preprocessing module (104) is further configured to remove noise, adjust contrast, and correct for skewness to enhance image quality before analyzing and labeling (108) the contents based on its structure.
3. The system (100) as claimed in claim 1, wherein the training module (110) utilizes labeled datasets in formats like MS COCO to optimize the YOLOv3-DLA deep learning model for document layout analysis.
4. The system (100), as claimed in claim 1, wherein the document layout analysis module (104) is configured to classify segmented regions into predefined categories, including lines of text, one-column key-value pairs, two-column key-value pairs, table, and multi-column tabular data.
5. The system (100) as claimed in claim 1, wherein the refinement module (116) is configured to resolve overlapping or ambiguous boundaries in regions detected during the document layout analysis to ensure precision in segmentation.
6. The system (100) as claimed in claim 1, wherein the structure recognition module (118) is further configured to detect rows, columns, and cells within tabular data, distinguish borderless tables and multi-column layouts based on average word and row gaps, and identify delimiters between keys and values in key-value pairs.
7. The system (100) as claimed in claim 1, wherein the information extraction module (112) is further configured to map the extracted text into a structured format, such as JSON, and associate each extracted text entry with its corresponding spatial location in the scanned image.
8. The system (100) as claimed in claim 1, wherein the storage module (130) stores the extracted data in formats compatible with downstream analytics systems, such as electronic health records (EHRs).
9. A method (200) for extracting literal information from scanned images of medical reports, the method comprising:
receiving (202) scanned medical reports in image or PDF formats and pre-processing them by enhancing image quality, removing noise, correcting skewness, and preparing the data for downstream analysis;
training (204) a YOLOv3-DLA deep learning model on validated datasets that include labeled medical reports with boundaries and location data;
applying (206) document layout analysis on the trained dataset to identify and segment predefined regions, including tables, key-value pairs, and text regions, from the scanned medical reports;
refining (208) overlapping or ambiguous regions identified during the layout analysis to ensure accurate mapping and segmentation of content;
applying (210) pattern recognition techniques to identify the structure of each segmented region, including rows, columns, and key-value pairs, within the medical reports;
extracting (212) literal information from the identified regions, including textual and numerical data, and converting it into structured formats, such as JSON; and
outputting (214) the structured information in a machine-readable format for subsequent processing, storage, and analysis.
10. The method (200) as claimed in claim 9, wherein the step of pre-processing (202) further includes removing noise, adjusting for contrast, correcting image skew, and preparing labeled datasets in formats like MS COCO for model training.
11. The method (200) as claimed in claim 9, wherein the document layout analysis (206) utilizes a YOLOv3-DLA deep learning model to identify complex document structures, including bordered and borderless tables, multi-column text regions, and key-value pairs.
12. The method (200) as claimed in claim 9, wherein the step of refining (208) includes resolving ambiguities and overlaps in regions detected during the document layout analysis using additional processing methods.
13. The method (200) as claimed in claim 9, wherein the pattern recognition (210) step identifies delimiters, such as spacing or alignment, between keys and values in single-column or two-column key-value pair layouts.
14. The method (200) as claimed in claim 9, wherein the output (214) includes formatting the extracted information into JSON files, adhering to standards compatible with electronic health records (EHR) systems.

Documents

Application Documents

# Name Date
1 202541012130-STATEMENT OF UNDERTAKING (FORM 3) [13-02-2025(online)].pdf 2025-02-13
2 202541012130-PROOF OF RIGHT [13-02-2025(online)].pdf 2025-02-13
3 202541012130-FORM FOR SMALL ENTITY(FORM-28) [13-02-2025(online)].pdf 2025-02-13
4 202541012130-FORM 1 [13-02-2025(online)].pdf 2025-02-13
5 202541012130-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [13-02-2025(online)].pdf 2025-02-13
6 202541012130-EVIDENCE FOR REGISTRATION UNDER SSI [13-02-2025(online)].pdf 2025-02-13
7 202541012130-EDUCATIONAL INSTITUTION(S) [13-02-2025(online)].pdf 2025-02-13
8 202541012130-DRAWINGS [13-02-2025(online)].pdf 2025-02-13
9 202541012130-DECLARATION OF INVENTORSHIP (FORM 5) [13-02-2025(online)].pdf 2025-02-13
10 202541012130-COMPLETE SPECIFICATION [13-02-2025(online)].pdf 2025-02-13
11 202541012130-FORM-9 [27-02-2025(online)].pdf 2025-02-27
12 202541012130-FORM-8 [27-02-2025(online)].pdf 2025-02-27
13 202541012130-FORM-26 [28-02-2025(online)].pdf 2025-02-28
14 202541012130-FORM 18A [28-02-2025(online)].pdf 2025-02-28
15 202541012130-EVIDENCE OF ELIGIBILTY RULE 24C1f [28-02-2025(online)].pdf 2025-02-28
16 202541012130-FER.pdf 2025-04-22
17 202541012130-FORM 3 [10-07-2025(online)].pdf 2025-07-10
18 202541012130-OTHERS [07-10-2025(online)].pdf 2025-10-07
19 202541012130-FER_SER_REPLY [07-10-2025(online)].pdf 2025-10-07

Search Strategy

1 202541012130_SearchStrategyNew_E_SearchHistoryE_30-03-2025.pdf