Abstract: Disclosed herein is a text summarization system and method thereof, the system (100) comprising a pre-processing unit (102) configured to receive an input data stream comprising unstructured or structured text and to perform tokenization, noise removal, and content normalization on the input data stream; a multimodal input processor (104); a hybrid encoder-decoder unit (106) operatively connected to the pre-processing unit (102) and the multimodal input processor (104); a retrieval-augmented generation module (108) integrated within the hybrid encoder-decoder unit (106); a domain-specific fine-tuning module (110) integrated within the hybrid encoder-decoder unit (106); a communication network (112) operatively connected to the hybrid encoder-decoder unit (106) and configured to transmit generated summaries and contextual data; a user device (114) operatively connected to the communication network (112); a user interface (116) inside the user device (114); and a storage unit (118) operatively connected to the hybrid encoder-decoder unit (106).
Description: FIELD OF DISCLOSURE
[0001] The present disclosure relates generally to natural language processing and, more specifically, to a text summarization system and method thereof.
BACKGROUND OF THE DISCLOSURE
[0002] The invention helps people understand large amounts of written information quickly by giving short and meaningful summaries. It saves time for users who want to get the main idea without reading everything in detail. This is helpful for students, professionals, and researchers who deal with many documents. The summaries are easy to read and make the information clearer. It improves learning and decision-making by giving quick insights. People can stay informed without feeling overwhelmed.
[0003] The invention works well with different types of content, such as news articles, reports, or documents from different fields. It adjusts its summaries to match the topic, making it useful in education, business, healthcare, and more. It removes the need to search for summaries from multiple sources. Users can trust that the summaries are suitable for their needs. This makes the invention helpful in many real-world situations.
[0004] The invention also works with other formats like speech and images by turning them into useful summaries. This helps people who prefer listening or viewing over reading. It is especially useful for presentations, meetings, or videos. It allows people to understand information shared in different ways. This flexibility makes it easier for more people to use the system in daily life. It helps connect different types of content into one clear message.
[0005] Many existing systems fail to give meaningful summaries because they do not fully understand the context of the content. The result often leaves out important parts or focuses on the wrong sections. This makes the summaries confusing or incomplete. Readers may still have to go through the full document to understand the topic. This reduces the usefulness of the system. It also creates extra work instead of saving time.
[0006] Some existing solutions do not work well across different subjects or content types. A summary made for one topic might not be helpful for another. This lack of flexibility means people need to use different tools for different needs. It creates confusion and slows down the process. Users also have to spend extra effort adjusting the system to work in their area. This limits the system’s usefulness in real-world situations.
[0007] Many older systems cannot handle information that comes from pictures or speech. They are limited to only written content and ignore other helpful formats. As a result, people miss out on important parts of the message shared through images or voice. These systems do not help in situations like meetings, presentations, or videos. They are less useful for users who want summaries from different types of media. This makes them less practical for modern use.
[0008] Thus, in light of the above-stated discussion, there exists a need for a text summarization system and method thereof.
SUMMARY OF THE DISCLOSURE
[0009] The following is a summary description of illustrative embodiments of the invention. It is provided as a preface to assist those skilled in the art to more rapidly assimilate the detailed design discussion which ensues and is not intended in any way to limit the scope of the claims which are appended hereto in order to particularly point out the invention.
[0010] According to illustrative embodiments, the present disclosure focuses on a text summarization system and method thereof which overcome the above-mentioned disadvantages or provide users with a useful or commercial choice.
[0011] An objective of the present disclosure is to provide a text summarization solution that delivers concise and meaningful summaries while preserving the original intent and key ideas of the source material. The invention aims to improve the way users engage with long-form content without losing essential information.
[0012] An objective of the present disclosure is to reduce the time and effort required to understand lengthy documents by presenting users with a short, easy-to-read version of the content. This helps users process more information in less time with increased efficiency.
[0013] Another objective of the present disclosure is to create a solution that works equally well for different subjects, domains, and content types. The invention ensures broad usability across education, journalism, legal analysis, and content research.
[0014] Another objective of the present disclosure is to support the processing of text that comes from different languages and regions. This makes the system inclusive for diverse audiences and beneficial for global applications.
[0015] Another objective of the present disclosure is to help people make faster decisions by providing clear summaries of content from documents, news articles, reports, or discussions. The invention is especially useful in environments where rapid understanding is critical.
[0016] Another objective of the present disclosure is to include the ability to handle content from multiple formats such as written documents, images with embedded text, and spoken words. This supports the needs of users working across various media platforms and workflows.
[0017] Another objective of the present disclosure is to provide a system that can be used by both individual users and large organizations. The scalability of the system ensures its value across personal, academic, and corporate use cases.
[0018] Another objective of the present disclosure is to make the summarization process more reliable and consistent, especially for users who rely on summaries for quick reviews or decision-making. The invention intends to improve user confidence in automated summarization outputs.
[0019] Another objective of the present disclosure is to assist professionals who regularly deal with long or technical documents by providing focused summaries that highlight the most relevant sections. This increases productivity while maintaining content integrity.
[0020] Yet another objective of the present disclosure is to offer a helpful tool for students, educators, and researchers who need to review large volumes of reading material. The invention encourages better learning outcomes by enhancing the accessibility of complex content.
[0021] In light of the above, in one aspect of the present disclosure, a text summarization system is disclosed herein. The system comprises a pre-processing unit configured to receive an input data stream comprising unstructured or structured text and to perform tokenization, noise removal, and content normalization on the input data stream. The system includes a multimodal input processor operatively connected to the pre-processing unit and configured to extract textual information from different input formats including image data and speech data using embedded conversion mechanisms for image-to-text transformation and speech-to-text transcription. The system also includes a hybrid encoder-decoder unit operatively connected to the pre-processing unit and the multimodal input processor, the hybrid encoder-decoder unit comprising a transformer-based encoder and a transformer-based decoder configured to convert processed input into contextual embeddings and generate corresponding text summaries using sequential decoding and attention-based analysis. The system also includes a retrieval-augmented generation module integrated within the hybrid encoder-decoder unit, the retrieval-augmented generation module configured to dynamically retrieve relevant content from one or more external information sources and incorporate the retrieved content into contextual embeddings for enriched summary generation. The system also includes a domain-specific fine-tuning module integrated within the hybrid encoder-decoder unit, the domain-specific fine-tuning module configured to refine the summarization process based on linguistic structures, content patterns, and terminology associated with a particular application domain. The system also includes a communication network operatively connected to the hybrid encoder-decoder unit and configured to transmit generated summaries and contextual data. The system also includes a user device operatively connected to the communication network and configured to receive, display, and store generated summaries, configuration inputs, and system feedback. The system also includes a user interface inside the user device, the user interface configured to provide visualization of the generated summaries, accept user-defined summary preferences, and allow real-time configuration of summarization parameters. The system also includes a storage unit operatively connected to the hybrid encoder-decoder unit and configured to maintain historical content data, previously generated summaries, and feedback entries for adaptive learning and performance improvement.
[0022] In one embodiment, the pre-processing unit further comprises a language detection module configured to identify the input language and activate language-specific normalization pipelines.
[0023] In one embodiment, the multimodal input processor further comprises an optical character recognition module configured to extract text from scanned image content embedded within documents.
[0024] In one embodiment, the multimodal input processor further comprises an automatic speech recognition module configured to convert voice-based or recorded input into structured textual content for summarization.
[0025] In one embodiment, the hybrid encoder-decoder unit further comprises a cross-modal alignment unit configured to merge and align textual content derived from multiple input formats for unified context interpretation.
[0026] In one embodiment, the retrieval-augmented generation module further comprises an index-ranking sub-module configured to rank retrieved content from external databases based on contextual proximity to the input content.
[0027] In one embodiment, the domain-specific fine-tuning module further comprises a domain classifier configured to automatically identify the topical domain of the input text and apply corresponding adaptation strategies.
[0028] In one embodiment, the user interface further comprises a multilingual rendering module configured to display summaries and interface prompts in user-selected regional or international languages.
[0029] In one embodiment, the storage unit further comprises a vector embedding archive configured to retain semantic representations of input and summary texts for historical comparison and adaptive learning.
[0030] In light of the above, in another aspect of the present disclosure, a text summarization method is disclosed herein. The method comprises receiving an input data stream at a pre-processing unit, the input data stream comprising at least one of structured or unstructured text, image data, or speech data. The method includes processing the input data stream at the pre-processing unit by performing tokenization, noise removal, and content normalization, and transmitting the processed data to a multimodal input processor. The method also includes extracting text from the image data and speech data using the multimodal input processor operatively connected to the pre-processing unit, wherein the multimodal input processor transforms image-based content using optical character recognition and transcribes voice-based input using speech recognition. The method also includes generating contextual embeddings at a hybrid encoder-decoder unit operatively connected to the multimodal input processor and the pre-processing unit, the hybrid encoder-decoder unit applying a transformer-based encoding and decoding process to generate an intermediate representation of the content. The method also includes retrieving external knowledge through a retrieval-augmented generation module integrated within the hybrid encoder-decoder unit, the retrieval-augmented generation module dynamically sourcing relevant information and enriching the contextual embeddings with the retrieved content. The method also includes refining the contextual embeddings at a domain-specific fine-tuning module integrated within the hybrid encoder-decoder unit, the domain-specific fine-tuning module modifying summary content based on application-specific linguistic structure and domain terminology. The method also includes transmitting the generated summary and associated metadata through a communication network operatively connected to the hybrid encoder-decoder unit. The method also includes receiving and displaying the generated summary at a user device operatively connected to the communication network, wherein the user device comprises a user interface configured to display the summary, accept feedback, and modify system parameters in real time. The method also includes storing the input data, generated summaries, and user feedback into a storage unit operatively connected to the hybrid encoder-decoder unit for continual learning and iterative improvement.
[0031] These and other advantages will be apparent from the present application of the embodiments described herein.
[0032] The preceding is a simplified summary to provide an understanding of some embodiments of the present invention. This summary is neither an extensive nor exhaustive overview of the present invention and its various embodiments. The summary presents selected concepts of the embodiments of the present invention in a simplified form as an introduction to the more detailed description presented below. As will be appreciated, other embodiments of the present invention are possible utilizing, alone or in combination, one or more of the features set forth above or described in detail below.
[0033] These elements, together with the other aspects of the present disclosure and various features are pointed out with particularity in the claims annexed hereto and form a part of the present disclosure. For a better understanding of the present disclosure, its operating advantages, and the specified object attained by its uses, reference should be made to the accompanying drawings and descriptive matter in which there are illustrated exemplary embodiments of the present disclosure.
BRIEF DESCRIPTION OF THE DRAWINGS
[0034] To describe the technical solutions in the embodiments of the present disclosure or in the prior art more clearly, the following briefly describes the accompanying drawings required for describing the embodiments or the prior art. Apparently, the accompanying drawings in the following description merely show some embodiments of the present disclosure, and a person of ordinary skill in the art can derive other implementations from these accompanying drawings without creative efforts. All of the embodiments or the implementations shall fall within the protection scope of the present disclosure.
[0035] The advantages and features of the present disclosure will become better understood with reference to the following detailed description taken in conjunction with the accompanying drawing, in which:
[0036] FIG. 1 illustrates a block diagram of a text summarization system and method thereof, in accordance with an exemplary embodiment of the present disclosure;
[0037] FIG. 2 illustrates a flowchart of a text summarization system, in accordance with an exemplary embodiment of the present disclosure;
[0038] FIG. 3 illustrates a flowchart of a text summarization method, in accordance with an exemplary embodiment of the present disclosure;
[0039] FIG. 4 illustrates a working view of the proposed model, in accordance with an exemplary embodiment of the present disclosure;
[0040] FIG. 5 illustrates a 3D bar chart of the performance of the T5 model, in accordance with an exemplary embodiment of the present disclosure.
[0041] Like reference numerals refer to like parts throughout the description of the several views of the drawings.
[0042] The text summarization system and method thereof is illustrated in the accompanying drawings, in which like reference letters indicate corresponding parts in the various figures. It should be noted that the accompanying figures are intended to present illustrations of exemplary embodiments of the present disclosure. These figures are not intended to limit the scope of the present disclosure. It should also be noted that the accompanying figures are not necessarily drawn to scale.
DETAILED DESCRIPTION OF THE DISCLOSURE
[0043] The following is a detailed description of embodiments of the disclosure depicted in the accompanying drawings. The embodiments are in such detail as to communicate the disclosure. However, the amount of detail offered is not intended to limit the anticipated variations of embodiments; on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the present disclosure.
[0044] In the following description, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the present disclosure. It may be apparent to one skilled in the art that embodiments of the present disclosure may be practiced without some of these specific details.
[0045] Various terms as used herein are shown below. To the extent a term is used, it should be given the broadest definition persons in the pertinent art have given that term as reflected in printed publications and issued patents at the time of filing.
[0046] The terms “a” and “an” herein do not denote a limitation of quantity, but rather denote the presence of at least one of the referenced items.
[0047] The terms “having”, “comprising”, “including”, and variations thereof signify the presence of a component.
[0048] Reference is now made to FIG. 1 through FIG. 5 to describe various exemplary embodiments of the present disclosure. FIG. 1 illustrates a block diagram of a text summarization system and method thereof, in accordance with an exemplary embodiment of the present disclosure.
[0049] The system 100 may include a pre-processing unit 102 configured to receive an input data stream comprising unstructured or structured text and to perform tokenization, noise removal, and content normalization on the input data stream. The system 100 may also include a multimodal input processor 104 operatively connected to the pre-processing unit 102 and configured to extract textual information from different input formats including image data and speech data using embedded conversion mechanisms for image-to-text transformation and speech-to-text transcription. The system 100 may also include a hybrid encoder-decoder unit 106 operatively connected to the pre-processing unit 102 and the multimodal input processor 104, the hybrid encoder-decoder unit 106 comprising a transformer-based encoder and a transformer-based decoder configured to convert processed input into contextual embeddings and generate corresponding text summaries using sequential decoding and attention-based analysis. The system 100 may also include a retrieval-augmented generation module 108 integrated within the hybrid encoder-decoder unit 106, the retrieval-augmented generation module 108 configured to dynamically retrieve relevant content from one or more external information sources and incorporate the retrieved content into contextual embeddings for enriched summary generation. The system 100 may also include a domain-specific fine-tuning module 110 integrated within the hybrid encoder-decoder unit 106, the domain-specific fine-tuning module 110 configured to refine the summarization process based on linguistic structures, content patterns, and terminology associated with a particular application domain. The system 100 may also include a communication network 112 operatively connected to the hybrid encoder-decoder unit 106 and configured to transmit generated summaries and contextual data. The system 100 may also include a user device 114 operatively connected to the communication network 112 and configured to receive, display, and store generated summaries, configuration inputs, and system feedback. The system 100 may also include a user interface 116 inside the user device 114, the user interface 116 configured to provide visualization of the generated summaries, accept user-defined summary preferences, and allow real-time configuration of summarization parameters. The system 100 may also include a storage unit 118 operatively connected to the hybrid encoder-decoder unit 106 and configured to maintain historical content data, previously generated summaries, and feedback entries for adaptive learning and performance improvement.
[0050] The pre-processing unit 102 further comprises a language detection module configured to identify the input language and activate language-specific normalization pipelines.
[0051] The multimodal input processor 104 further comprises an optical character recognition module configured to extract text from scanned image content embedded within documents.
[0052] The multimodal input processor 104 further comprises an automatic speech recognition module configured to convert voice-based or recorded input into structured textual content for summarization.
[0053] The hybrid encoder-decoder unit 106 further comprises a cross-modal alignment unit configured to merge and align textual content derived from multiple input formats for unified context interpretation.
[0054] The retrieval-augmented generation module 108 further comprises an index-ranking sub-module configured to rank retrieved content from external databases based on contextual proximity to the input content.
[0055] The domain-specific fine-tuning module 110 further comprises a domain classifier configured to automatically identify the topical domain of the input text and apply corresponding adaptation strategies.
[0056] The user interface 116 further comprises a multilingual rendering module configured to display summaries and interface prompts in user-selected regional or international languages.
[0057] The storage unit 118 further comprises a vector embedding archive configured to retain semantic representations of input and summary texts for historical comparison and adaptive learning.
[0058] The method 100 may include receiving an input data stream at a pre-processing unit 102, the input data stream comprising at least one of structured or unstructured text, image data, or speech data. The method 100 may also include processing the input data stream at the pre-processing unit 102 by performing tokenization, noise removal, and content normalization, and transmitting the processed data to a multimodal input processor 104. The method 100 may also include extracting text from the image data and speech data using the multimodal input processor 104 operatively connected to the pre-processing unit 102, wherein the multimodal input processor 104 transforms image-based content using optical character recognition and transcribes voice-based input using speech recognition. The method 100 may also include generating contextual embeddings at a hybrid encoder-decoder unit 106 operatively connected to the multimodal input processor 104 and the pre-processing unit 102, the hybrid encoder-decoder unit 106 applying a transformer-based encoding and decoding process to generate an intermediate representation of the content. The method 100 may also include retrieving external knowledge through a retrieval-augmented generation module 108 integrated within the hybrid encoder-decoder unit 106, the retrieval-augmented generation module 108 dynamically sourcing relevant information and enriching the contextual embeddings with the retrieved content. The method 100 may also include refining the contextual embeddings at a domain-specific fine-tuning module 110 integrated within the hybrid encoder-decoder unit 106, the domain-specific fine-tuning module 110 modifying summary content based on application-specific linguistic structure and domain terminology. The method 100 may also include transmitting the generated summary and associated metadata through a communication network 112 operatively connected to the hybrid encoder-decoder unit 106. The method 100 may also include receiving and displaying the generated summary at a user device 114 operatively connected to the communication network 112, wherein the user device 114 comprises a user interface 116 configured to display the summary, accept feedback, and modify system parameters in real time. The method 100 may also include storing the input data, generated summaries, and user feedback into a storage unit 118 operatively connected to the hybrid encoder-decoder unit 106 for continual learning and iterative improvement.
[0059] The pre-processing unit 102 receives an input data stream comprising unstructured or structured text collected from various sources such as documents, articles, and transcribed speech content. The pre-processing unit 102 performs essential data refinement operations including tokenization, which segments the input into manageable subword units while preserving contextual integrity. The pre-processing unit 102 eliminates extraneous characters, punctuation artefacts, HTML tags, and other forms of noise that interfere with downstream processing. The pre-processing unit 102 further executes normalization procedures to standardize linguistic patterns, such as converting all characters to a uniform case, removing stop words, and applying stemming or lemmatization where necessary. The pre-processing unit 102 incorporates logic to assess and manage character encoding inconsistencies, language-specific symbols, and structural formatting variances. The pre-processing unit 102 ensures consistency in input representation across multilingual data streams by implementing a language identification subroutine that triggers dedicated normalization workflows. The pre-processing unit 102 supports transformation of multimodal data, passing cleaned text from image-based or speech-based sources onward for further interpretation. The pre-processing unit 102 structures the refined data stream into a format compliant with sequential processing expectations of the multimodal input processor 104 and the hybrid encoder-decoder unit 106. The pre-processing unit 102 operates in real-time synchronization with upstream data acquisition interfaces to ensure continuous input flow and seamless transition to subsequent processing components. The pre-processing unit 102 acts as a critical interface between raw user-generated input and context-aware interpretation engines, enforcing consistency, clarity, and computational efficiency at the initial stage of the summarization pipeline. The pre-processing unit 102 enables adaptive pre-processing configurations aligned with domain-specific input patterns, allowing optimized filtering based on subject matter, terminologies, and language structures. The pre-processing unit 102 retains no intermediate state, passing all output directly to downstream processors through defined communication pathways, ensuring clean and uniform data propagation across the summarization system.
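By way of non-limiting illustration, a minimal Python sketch of the cleaning and normalization operations described above is shown below. The regular expressions and the whitespace tokenizer are illustrative assumptions; the pre-processing unit 102 may employ any equivalent tokenization and normalization mechanism.

```python
import html
import re

def preprocess(raw_text: str) -> list[str]:
    """Illustrative cleaning pipeline for the pre-processing unit 102."""
    # Noise removal: strip HTML tags and decode character entities
    text = re.sub(r"<[^>]+>", " ", html.unescape(raw_text))
    # Remove extraneous symbols while keeping basic punctuation
    text = re.sub(r"[^\w\s.,;:!?'-]", " ", text)
    # Content normalization: uniform case and collapsed whitespace
    text = re.sub(r"\s+", " ", text.lower()).strip()
    # Simple whitespace tokenization; a deployed system would use the
    # subword tokenization described above
    return text.split()

print(preprocess("<p>Sample &amp; noisy   INPUT text!</p>"))
```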
[0060] The multimodal input processor 104 receives refined content from the pre-processing unit 102 and processes inputs across diverse formats including text, image data, and speech recordings. The multimodal input processor 104 extracts meaningful textual information from visual content by employing an integrated optical character recognition mechanism that identifies and converts printed or handwritten text present within images into editable and analysable digital text. The multimodal input processor 104 also includes an embedded transcription system that uses automatic speech recognition to transform spoken language into accurate, time-stamped textual content, preserving semantic integrity and speaker-dependent nuances. The multimodal input processor 104 manages the simultaneous processing of inputs originating from multiple modalities by routing them through parallel conversion pipelines, synchronizing the extracted outputs into a unified textual format. The multimodal input processor 104 eliminates inconsistencies resulting from poor image resolution, accented speech patterns, or background disturbances by implementing noise-tolerant recognition filters and confidence-based revalidation loops. The multimodal input processor 104 facilitates dynamic switching between input modalities based on user preference or source priority, thereby enhancing the versatility of the summarization system across different environments and use cases. The multimodal input processor 104 enables seamless integration of content originating from heterogeneous media sources, ensuring that each data type contributes equally to the contextual interpretation performed by the hybrid encoder-decoder unit 106. The multimodal input processor 104 applies language-specific interpretation rules post-conversion to preserve domain semantics, idiomatic expressions, and metadata associations for the input. The multimodal input processor 104 aligns the converted content into a structured token format that remains consistent across all downstream modules. The multimodal input processor 104 ensures that real-time inputs such as live voice feeds or scanned forms are promptly transformed into structured input for immediate processing. The multimodal input processor 104 serves as a content unification gateway that bridges multimodal inputs with the text-focused processing architecture of the summarization system.
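A hedged sketch of the image-to-text and speech-to-text conversion paths follows. The pytesseract and SpeechRecognition packages are assumed stand-ins for the embedded OCR and ASR mechanisms; any functionally equivalent recognition engine may be substituted.

```python
import pytesseract                # OCR binding (assumed available)
import speech_recognition as sr   # ASR wrapper (assumed available)
from PIL import Image

def image_to_text(image_path: str) -> str:
    # Optical character recognition over a scanned image (module of 104)
    return pytesseract.image_to_string(Image.open(image_path))

def speech_to_text(audio_path: str) -> str:
    # Automatic speech recognition over a recorded audio file (module of 104)
    recognizer = sr.Recognizer()
    with sr.AudioFile(audio_path) as source:
        audio = recognizer.record(source)
    return recognizer.recognize_google(audio)
```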
[0061] The hybrid encoder-decoder unit 106 receives structured and unified input from the multimodal input processor 104 and the pre-processing unit 102 to perform context-aware transformation and summary generation. The hybrid encoder-decoder unit 106 comprises a transformer-based encoder and a transformer-based decoder that function sequentially to extract contextual relationships and generate coherent textual summaries. The hybrid encoder-decoder unit 106 begins by processing the input sequence through the transformer-based encoder, which applies self-attention layers and positional encodings to convert the input into contextual embeddings that preserve syntactic structure and semantic depth. The hybrid encoder-decoder unit 106 then transfers these contextual embeddings to the transformer-based decoder, which utilizes cross-attention mechanisms to align and decode the embeddings into fluent, logically ordered summaries. The hybrid encoder-decoder unit 106 supports hierarchical token processing to capture both sentence-level and document-level dependencies. The hybrid encoder-decoder unit 106 integrates and fuses data from various sources while maintaining the flow and coherence of the input narrative. The hybrid encoder-decoder unit 106 dynamically adjusts attention weights based on keyword significance and information entropy, enabling selective emphasis on high-value tokens. The hybrid encoder-decoder unit 106 incorporates positional memory to ensure that sequence continuity and contextual flow are maintained throughout the decoding process. The hybrid encoder-decoder unit 106 maintains compatibility with multilingual embeddings and accommodates domain-specific jargon without semantic distortion. The hybrid encoder-decoder unit 106 is integrated with the retrieval-augmented generation module 108 and the domain-specific fine-tuning module 110, allowing it to enrich contextual embeddings with external knowledge and adapt to linguistic variations from specialized fields. The hybrid encoder-decoder unit 106 outputs grammatically and structurally optimized summaries that reflect the core message of the source content. The hybrid encoder-decoder unit 106 enables efficient processing of high-volume text streams while ensuring the generated summaries remain readable, accurate, and tailored to user-defined parameters and domain requirements.
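For illustration, the encoder-decoder behaviour can be sketched with a pre-trained T5 checkpoint from the Hugging Face transformers library, consistent with the T5 model discussed with reference to FIG. 4 and FIG. 5. The checkpoint name and generation parameters are illustrative assumptions.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")   # illustrative checkpoint
model = T5ForConditionalGeneration.from_pretrained("t5-small")

def summarize(text: str, max_len: int = 80) -> str:
    # Encoder: self-attention converts input tokens into contextual embeddings
    inputs = tokenizer("summarize: " + text, return_tensors="pt",
                       truncation=True, max_length=512)
    # Decoder: sequential decoding with cross-attention and beam search
    output_ids = model.generate(inputs["input_ids"], max_length=max_len,
                                num_beams=4, early_stopping=True)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```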
[0062] The retrieval-augmented generation module 108 is integrated within the hybrid encoder-decoder unit 106 and functions as a dynamic information retrieval and content enrichment component. The retrieval-augmented generation module 108 operates by sourcing relevant external data from indexed knowledge bases, online encyclopaedic datasets, internal repositories, or context-specific document collections to complement the contextual embeddings generated by the transformer-based encoder of the hybrid encoder-decoder unit 106. The retrieval-augmented generation module 108 evaluates the semantic proximity between input content and external data using vector similarity measures and ranked index scoring to identify the most contextually relevant information. The retrieval-augmented generation module 108 retrieves the top-ranked passages and fuses them with the intermediate embeddings produced by the encoder, thereby enhancing the factual accuracy, contextual completeness, and topical relevance of the output generated by the transformer-based decoder. The retrieval-augmented generation module 108 functions continuously during both training and inference phases to allow real-time integration of up-to-date knowledge. The retrieval-augmented generation module 108 dynamically adjusts retrieval strategies based on query structure, content density, and user-defined parameters such as domain constraints or preference filters. The retrieval-augmented generation module 108 utilizes a two-tier filtering mechanism to exclude irrelevant or redundant data before fusion, ensuring that only concise and high-utility information influences the summarization process. The retrieval-augmented generation module 108 is also capable of handling multilingual sources and performing language alignment before embedding the retrieved knowledge into the target context. The retrieval-augmented generation module 108 ensures that the generated summaries reflect not only the original input but also timely and authoritative external references. The retrieval-augmented generation module 108 enhances system adaptability in rapidly evolving knowledge domains and improves summarization robustness in situations where the input lacks clarity or completeness. The retrieval-augmented generation module 108 enables the hybrid encoder-decoder unit 106 to produce highly accurate, evidence-backed, and context-enriched summaries.
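A simplified sketch of the retrieval and ranking behaviour is given below, using TF-IDF vectors and cosine similarity as an assumed stand-in for the vector similarity measures and ranked index scoring described above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def retrieve_top_k(query: str, corpus: list[str], k: int = 3) -> list[str]:
    # Rank external passages by contextual proximity to the input
    matrix = TfidfVectorizer().fit_transform([query] + corpus)
    scores = cosine_similarity(matrix[0:1], matrix[1:]).ravel()
    return [corpus[i] for i in scores.argsort()[::-1][:k]]

def augment_input(text: str, corpus: list[str]) -> str:
    # Retrieved passages are fused with the source text so the encoder's
    # embeddings are enriched with external knowledge
    return " ".join(retrieve_top_k(text, corpus)) + " " + text
```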
[0063] The domain-specific fine-tuning module 110 is integrated within the hybrid encoder-decoder unit 106 and is responsible for customizing the summarization process to suit specific professional, academic, or industry contexts. The domain-specific fine-tuning module 110 receives contextual embeddings generated by the transformer-based encoder and applies semantic calibration based on domain-trained parameters. The domain-specific fine-tuning module 110 utilizes specialized training datasets derived from fields such as medicine, law, finance, education, and technology to guide the summary generation process toward using domain-appropriate terminology, tone, and structural formatting. The domain-specific fine-tuning module 110 includes a topic classifier that identifies the domain of the incoming input text and activates a corresponding adaptation pathway within the hybrid encoder-decoder unit 106. The domain-specific fine-tuning module 110 adjusts attention weightings, embedding prioritization, and decoder token preferences to align the generated summary with the linguistic patterns and syntactic expectations of the identified domain. The domain-specific fine-tuning module 110 incorporates a feedback-driven optimization routine that updates its internal calibration models based on previously generated summaries, expert validation scores, and user interaction data retrieved from the storage unit 118. The domain-specific fine-tuning module 110 ensures that technical jargon is preserved where necessary, abbreviations are correctly expanded or retained depending on context, and phraseology conforms to professional standards. The domain-specific fine-tuning module 110 enhances summarization accuracy in complex documents such as research papers, medical case studies, legal arguments, and business reports by ensuring terminological consistency and logical coherence. The domain-specific fine-tuning module 110 also applies context-aware re-ranking of sentence structures to improve flow and readability within the summary. The domain-specific fine-tuning module 110 functions as an intermediary between raw contextual embeddings and the final output, refining semantic fidelity to meet specific audience expectations. The domain-specific fine-tuning module 110 improves overall relevance, trustworthiness, and usability of the generated summary in sensitive and knowledge-intensive domains.
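The topic classifier of the domain-specific fine-tuning module 110 may be approximated, purely for illustration, by the keyword-overlap sketch below; the lexicons are hypothetical, and a deployed module would instead classify the contextual embeddings with a trained model.

```python
# Hypothetical domain lexicons for illustration only
DOMAIN_KEYWORDS = {
    "medicine": {"patient", "diagnosis", "clinical", "dosage"},
    "law":      {"plaintiff", "statute", "liability", "clause"},
    "finance":  {"equity", "portfolio", "dividend", "liquidity"},
}

def classify_domain(text: str) -> str:
    tokens = set(text.lower().split())
    # Select the domain whose lexicon overlaps most with the input tokens
    scores = {d: len(tokens & kw) for d, kw in DOMAIN_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "general"

print(classify_domain("the patient received a revised dosage after diagnosis"))
```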
[0064] The communication network 112 is operatively connected to the hybrid encoder-decoder unit 106 and functions as the primary transmission channel for transferring generated summaries, contextual data, and system instructions between various system components and external endpoints. The communication network 112 facilitates secure, real-time data exchange between the hybrid encoder-decoder unit 106 and the user device 114 by utilizing standardized protocols for digital communication. The communication network 112 is designed to maintain uninterrupted and bi-directional data flow, supporting the low-latency transfer of generated text summaries, user configuration preferences, and feedback information. The communication network 112 operates in coordination with the retrieval-augmented generation module 108 by enabling connectivity with one or more external content repositories and knowledge databases, thereby allowing dynamic retrieval of contextually relevant information. The communication network 112 also supports integration with remote cloud storage systems and application interfaces, permitting scalable deployment across distributed computing environments. The communication network 112 handles metadata propagation for maintaining system logs, access timestamps, and device-level activity histories, which are subsequently utilized by the storage unit 118 for adaptive learning and monitoring. The communication network 112 maintains redundancy protocols and fallback routing paths to ensure consistent connectivity even under variable bandwidth or infrastructure conditions. The communication network 112 is also responsible for routing user-defined control commands originating from the user interface 116 to the hybrid encoder-decoder unit 106 to execute summary parameter modifications or initiate domain reconfiguration within the domain-specific fine-tuning module 110. The communication network 112 manages session authentication and user verification procedures, ensuring that data exchange between the user device 114 and system components adheres to access control policies. The communication network 112 plays a critical role in preserving the functional integrity and operational synchronization of the overall system architecture by serving as the backbone for digital coordination, remote access, and decentralized system interaction.
[0065] The user device 114 is operatively connected to the communication network 112 and serves as the primary medium through which users interact with the text summarization system 100. The user device 114 is configured to receive generated summaries, system alerts, configuration inputs, and contextual metadata transmitted from the hybrid encoder-decoder unit 106. The user device 114 processes and displays the incoming content in a structured and interactive format, allowing the user to review generated summaries in real time. The user device 114 supports the storage of generated summaries and session-based data for offline access and historical review. The user device 114 also facilitates the transmission of user-defined configuration preferences, custom summary length parameters, and domain-specific constraints back to the hybrid encoder-decoder unit 106 through the communication network 112. The user device 114 operates as an interface point between the user interface 116 and backend system components, enabling real-time adjustments to model behaviour and content refinement. The user device 114 maintains synchronization with the cloud-based or locally hosted components of the storage unit 118 to ensure all user activity, system-generated feedback, and configuration logs are stored for adaptive learning purposes. The user device 114 also enables content export in various standardized formats, allowing integration with external platforms, digital workflows, or archiving systems. The user device 114 supports session-based authentication and validation protocols to restrict access and ensure user-specific customization of output. The user device 114 further enables multi-modal rendering of output data, including text-to-speech conversion, graphical visualizations, and dynamic interface adaptation based on user roles. The user device 114 operates continuously in coordination with the user interface 116 to provide an intuitive and personalized environment for monitoring summaries, receiving notifications, and managing summarization tasks. The user device 114 acts as the operational hub for delivering responsive and contextualized content interaction in the system.
[0066] The user interface 116 is integrated within the user device 114 and is configured to provide a dynamic, intuitive, and interactive layer for visualizing and managing the output of the text summarization system 100. The user interface 116 is designed to present the generated summaries, processing status, domain classification, and contextual metadata in a clear and comprehensible format for user review. The user interface 116 enables users to input real-time configuration parameters such as preferred summary length, summary format, target domain selection, and relevance weighting factors which are transmitted to the hybrid encoder-decoder unit 106 through the communication network 112. The user interface 116 supports multilingual rendering by allowing users to select a preferred language, and accordingly adapt summary content, interface labels, and control panels to match regional or international language preferences. The user interface 116 displays session-based feedback mechanisms enabling users to rate summaries, report inaccuracies, and suggest refinements, which are collected and transferred to the storage unit 118 for continuous learning. The user interface 116 includes interactive control widgets that allow users to switch between input formats, toggle processing stages, visualize retrieved external knowledge, and monitor how the retrieval-augmented generation module 108 contributes to the final summary output. The user interface 116 provides layered visualization panels that include original content, extracted key points and final summaries with annotations representing the contribution of each processing module. The user interface 116 also includes security and authentication elements to validate user access and maintain role-specific visibility, ensuring personalized experiences based on user profiles. The user interface 116 offers customizable dashboards for organizing summary projects, exporting content, and synchronizing data with the storage unit 118. The user interface 116 functions in continuous coordination with the user device 114, ensuring adaptive responsiveness, real-time interactivity, and smooth execution of all summarization-related operations within the system.
[0067] The storage unit 118 is operatively connected to the hybrid encoder-decoder unit 106 and functions as the centralized repository for all intermediate and final outputs generated by the text summarization system 100. The storage unit 118 is configured to store incoming input data streams, including unstructured text, structured documents, image-derived text, and transcribed speech content, ensuring data persistence and traceability across multiple processing sessions. The storage unit 118 maintains historical summaries previously generated by the hybrid encoder-decoder unit 106 along with their associated contextual embedding representations, thereby enabling comparative analytics and version-controlled summary outputs. The storage unit 118 is further configured to retain feedback entries captured through the user interface 116 of the user device 114, which include user ratings, correction prompts, and summary preferences that are used to refine future operations of the domain-specific fine-tuning module 110. The storage unit 118 comprises a vector embedding archive designed to preserve the semantic relationships between original content and generated summaries by storing multidimensional vector encodings computed during contextual embedding generation. The storage unit 118 facilitates retrieval-augmented learning by cataloguing previously retrieved external content indexed by the retrieval-augmented generation module 108, allowing selective reuse of reliable sources. The storage unit 118 is structured into modular segments categorized by document type, processing domain, user profiles, and summary characteristics, enabling targeted data access by the processing units and user device 114. The storage unit 118 operates in real-time synchronization with the communication network 112 to allow consistent data flow between processing modules and user-facing elements, ensuring that updates, corrections, and new inputs are reflected across the system architecture. The storage unit 118 also supports encrypted storage protocols and access-level permissions to maintain confidentiality and integrity of sensitive information processed by the text summarization system 100. The storage unit 118 enables continuous learning, traceability, and reliability of operations within the system framework.
[0068] FIG. 2 illustrates a flowchart of a text summarization system, in accordance with an exemplary embodiment of the present disclosure.
[0069] At 202, initiate content intake by capturing structured or unstructured text, images, and audio through the input channels.
[0070] At 204, sanitize and segment the incoming data using the pre-processing unit to enable uniform downstream interpretation.
[0071] At 206, convert visual and audio signals into textual form using embedded recognition modules within the multimodal input processor.
[0072] At 208, encode context-rich representations and initiate summary generation using the hybrid encoder-decoder unit.
[0073] At 210, augment summarization with up-to-date content dynamically retrieved from external repositories via the retrieval-augmented generation module.
[0074] At 212, customize the output by applying field-specific language adaptations through the domain-specific fine-tuning module.
[0075] At 214, render finalized summaries on the user interface and log session outputs, inputs, and interactions into the storage unit for future optimization.
[0076] FIG. 3 illustrates a flowchart of a text summarization method, in accordance with an exemplary embodiment of the present disclosure.
[0077] At 302, receiving an input data stream at a pre-processing unit, the input data stream comprising at least one of structured or unstructured text, image data, or speech data.
[0078] At 304, processing the input data stream at the pre-processing unit by performing tokenization, noise removal, and content normalization, and transmitting the processed data to a multimodal input processor.
[0079] At 306, extracting text from the image data and speech data using the multimodal input processor operatively connected to the pre-processing unit, wherein the multimodal input processor transforms image-based content using optical character recognition and transcribes voice-based input using speech recognition.
[0080] At 308, generating contextual embeddings at a hybrid encoder-decoder unit operatively connected to the multimodal input processor and the pre-processing unit, the hybrid encoder-decoder unit applying a transformer-based encoding and decoding process to generate an intermediate representation of the content.
[0081] At 310, retrieving external knowledge through a retrieval-augmented generation module integrated within the hybrid encoder-decoder unit, the retrieval-augmented generation module dynamically sourcing relevant information and enriching the contextual embeddings with the retrieved content.
[0082] At 312, refining the contextual embeddings at a domain-specific fine-tuning module integrated within the hybrid encoder-decoder unit, the domain-specific fine-tuning module modifying summary content based on application-specific linguistic structure and domain terminology.
[0083] At 314, transmitting the generated summary and associated metadata through a communication network operatively connected to the hybrid encoder-decoder unit.
[0084] At 316, receiving and displaying the generated summary at a user device operatively connected to the communication network, wherein the user device comprises a user interface configured to display the summary, accept feedback, and modify system parameters in real time.
[0085] At 318, storing the input data, generated summaries, and user feedback into a storage unit operatively connected to the hybrid encoder-decoder unit for continual learning and iterative improvement.
[0086] FIG. 4 illustrates a working view of the proposed model, in accordance with an exemplary embodiment of the present disclosure.
[0087] Data collection and loading the data 402 involves sourcing relevant structured and unstructured datasets from repositories, APIs, or internal content archives. The datasets include diverse document types, such as articles, reports, and research summaries. Data collection and loading the data 402 prepares the input for further stages in the model pipeline by organizing the content and initiating storage into the data architecture.
[0088] Data pre-processing 404 prepares the raw data for training by applying text cleaning techniques, removing special characters, correcting inconsistent formatting, and normalizing case structures. Data pre-processing 404 ensures that the cleaned data is compatible with tokenizers and prevents irrelevant noise from affecting model accuracy during training.
[0089] Train-Test split 406 separates the pre-processed data into two sets to evaluate generalization and prevent overfitting. Train-Test split 406 allocates a majority of data for training and a smaller portion for validation and evaluation, establishing reproducibility and balanced assessment.
[0090] X_train, Y_train 408 forms the input and target sequences for model training. X_train, Y_train 408 organizes the processed text pairs in a format expected by the encoder-decoder, representing input texts and corresponding summaries required for supervised learning.
[0091] X_test, Y_test 410 comprises unseen data samples used for testing the model's generalization ability after training. X_test, Y_test 410 allows performance validation and ensures that the model generates accurate summaries for previously unknown inputs.
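The split of 406-410 may be realized, for example, with scikit-learn; the placeholder corpora and the 75/25 ratio below are illustrative assumptions.

```python
from sklearn.model_selection import train_test_split

# Placeholder parallel corpora standing in for the data loaded at 402
texts = ["document one ...", "document two ...",
         "document three ...", "document four ..."]
summaries = ["summary one", "summary two", "summary three", "summary four"]

# Majority of data for training, smaller held-out portion for evaluation;
# a fixed seed establishes reproducibility
X_train, X_test, Y_train, Y_test = train_test_split(
    texts, summaries, test_size=0.25, random_state=42)
```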
[0092] Model training 412 executes the backpropagation algorithm across the encoder-decoder network using X_train, Y_train 408 and optimizes weights using a loss function. Model training 412 applies multiple epochs and batch cycles, ensuring convergence and stability.
[0093] Tokenization and Encoding 414 uses a tokenizer compatible with the T5 model to convert text into token IDs and encode them as input embeddings. Tokenization and Encoding 414 standardizes the data structure and embeds syntactic relationships between words and phrases.
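Continuing the sketch, tokenization and encoding 414 may be performed with the T5 tokenizer as shown below; the sequence-length limits are illustrative.

```python
from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")

# Encode inputs with the task prefix used by T5; attention_mask marks padding
batch = tokenizer(["summarize: " + t for t in X_train],
                  padding=True, truncation=True, max_length=512,
                  return_tensors="pt")
# Encode reference summaries as decoder targets
labels = tokenizer(Y_train, padding=True, truncation=True,
                   max_length=80, return_tensors="pt")["input_ids"]
```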
[0094] T5 model fine-tuning 416 adjusts a pre-trained T5 model using domain-specific data. T5 model fine-tuning 416 leverages transfer learning to align model weights with the target summarization task and adapt to specific language styles or structures.
[0095] Saving the model for future evaluation 418 persists the fine-tuned model's state, parameters, and configuration to local or cloud storage. Saving the model for future evaluation 418 ensures that the trained architecture remains reusable for inference and re-evaluation.
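Model training 412, T5 fine-tuning 416, and saving 418 may then be sketched as the simplified loop below; the learning rate and epoch count are illustrative assumptions, and padding tokens in the labels would ordinarily be masked before loss computation.

```python
import torch
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("t5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

model.train()
for epoch in range(3):                       # illustrative epoch count
    optimizer.zero_grad()
    # The model computes cross-entropy loss internally when labels are
    # supplied (padding IDs would normally be replaced with -100 first)
    loss = model(input_ids=batch["input_ids"],
                 attention_mask=batch["attention_mask"],
                 labels=labels).loss
    loss.backward()                          # backpropagation, step 412
    optimizer.step()

# Persist weights and configuration for future evaluation, step 418
model.save_pretrained("t5-summarizer-finetuned")
```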
[0096] Visualizing the training process 420 generates real-time plots of training loss, validation accuracy, and convergence trends. Visualizing the training process 420 provides insights into overfitting, training stability, and the effect of hyperparameters.
[0097] Generating summaries for the test data for evaluation 422 uses the saved model to create summaries from X_test 410. Generating summaries for the test data for evaluation 422 benchmarks the model output against reference summaries.
[0098] Evaluation and results 424 collects the generated and reference summaries and calculates performance using quantitative metrics. Evaluation and results 424 highlights areas for improvement and certifies the model's effectiveness.
[0099] Similarity scores 426 contains numerical indicators that assess semantic closeness between model-generated summaries and reference content. Similarity scores 426 incorporates cosine similarity and BERT-based semantic matching.
[0100] Cosine 428 calculates vector similarity by measuring the cosine of the angle between generated summary vectors and reference vectors. Cosine 428 evaluates alignment in terms of direction rather than magnitude.
[0101] BERT 430 performs contextual similarity comparison using pre-trained language embeddings to evaluate token-level and sentence-level meaning. BERT 430 enhances semantic evaluation for nuanced understanding.
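Cosine 428 and BERT 430 can be illustrated together as below; the sentence-transformers MiniLM encoder is an assumed stand-in for the BERT-based semantic matching described above.

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # assumed available

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative BERT-style encoder

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    # Direction-based similarity, independent of vector magnitude (item 428)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

generated = "the model produced this summary"
reference = "this is the reference summary"
g, r = encoder.encode([generated, reference])
print(f"semantic similarity: {cosine(g, r):.3f}")
```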
[0102] ROUGE metrics 432 assess textual overlap between generated summaries and references using n-gram, word sequence, and sentence-level matching. ROUGE metrics 432 include ROUGE-1, ROUGE-2, and ROUGE-L for comprehensive evaluation.
[0103] Recall 434 determines how many relevant units in the reference summary are present in the generated summary. Recall 434 reflects the model's ability to cover the key content.
[0104] Precision 436 calculates the proportion of relevant units in the generated summary that appear in the reference. Precision 436 indicates the focus and relevance of the model output.
[0105] F1-score 438 harmonizes precision and recall by computing their harmonic mean. F1-score 438 presents an overall effectiveness indicator that balances coverage and conciseness.
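The ROUGE computation of 432-438 may be sketched with the rouge-score package, an assumed implementation choice; each metric reports precision 436, recall 434, and their harmonic-mean F1 438.

```python
from rouge_score import rouge_scorer  # `rouge-score` package (assumed)

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"],
                                  use_stemmer=True)
scores = scorer.score(
    "the reference summary of the document",  # reference (target)
    "a generated summary of the document",    # candidate (prediction)
)
for metric, result in scores.items():
    # F1 = 2 * precision * recall / (precision + recall)
    print(f"{metric}: precision={result.precision:.3f} "
          f"recall={result.recall:.3f} f1={result.fmeasure:.3f}")
```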
[0106] Visualizing evaluation 440 presents performance metrics through bar graphs, heat maps, or scatter plots to assist in comparative analysis. Visualizing evaluation 440 enhances interpretability and supports fine-tuning decisions.
[0107] User input summarization 442 accepts live textual input from the user and applies the trained model to generate and display real-time summaries. User input summarization 442 validates usability and deploys the model for practical application.
[0108] FIG. 5 illustrates a 3D bar chart of the performance of the T5 model, in accordance with an exemplary embodiment of the present disclosure.
[0109] Figure 5 illustrates a three-dimensional bar chart representing the ROUGE scores analysis of the proposed T5 model. The chart displays metrics including ROUGE-1, ROUGE-2, ROUGE-N, and F1 scores across recall, precision, and overall values. The performance of the T5 model remains consistent across evaluation criteria, indicating strong summarization capabilities and reliable metric-based outcomes.
[0110] While the invention has been described in connection with what is presently considered to be the most practical and various embodiments, it will be understood that the invention is not to be limited to the disclosed embodiments, but on the contrary, is intended to cover various modifications and equivalent arrangements included within the scope of the appended claims.
[0111] A person of ordinary skill in the art may be aware that, in combination with the examples described in the embodiments disclosed in this specification, units and algorithm steps may be implemented by electronic hardware, computer software, or a combination thereof.
[0112] The foregoing descriptions of specific embodiments of the present disclosure have been presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the present disclosure to the precise forms disclosed, and many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described to best explain the principles of the present disclosure and its practical application, and to thereby enable others skilled in the art to best utilize the present disclosure and various embodiments with various modifications as are suited to the particular use contemplated. It is understood that various omissions and substitutions of equivalents are contemplated as circumstances may suggest or render expedient, but such omissions and substitutions are intended to cover the application or implementation without departing from the scope of the present disclosure.
[0113] Disjunctive language such as the phrase “at least one of X, Y, Z,” unless specifically stated otherwise, is otherwise understood with the context as used in general to present that an item, term, etc., may be either X, Y, or Z, or any combination thereof (e.g., X, Y, and/or Z). Thus, such disjunctive language is not generally intended to, and should not, imply that certain embodiments require at least one of X, at least one of Y, or at least one of Z to each be present.
[0114] In a case that no conflict occurs, the embodiments in the present disclosure and the features in the embodiments may be mutually combined. The foregoing descriptions are merely specific implementations of the present disclosure, but are not intended to limit the protection scope of the present disclosure. Any variation or replacement readily figured out by a person skilled in the art within the technical scope disclosed in the present disclosure shall fall within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.
Claims:
I/We Claim:
1. A text summarization system (100) comprising:
a pre-processing unit (102), configured to receive an input data stream comprising unstructured or structured text and to perform tokenization, noise removal, and content normalization on the input data stream;
a multimodal input processor (104), operatively connected to the pre-processing unit (102), and configured to extract textual information from different input formats including image data and speech data using embedded conversion mechanisms for image-to-text transformation and speech-to-text transcription;
a hybrid encoder-decoder unit (106), operatively connected to the pre-processing unit (102), and the multimodal input processor (104), the hybrid encoder-decoder unit (106), comprising a transformer-based encoder and a transformer-based decoder configured to convert processed input into contextual embeddings and generate corresponding text summaries using sequential decoding and attention-based analysis;
a retrieval-augmented generation module (108), integrated within the hybrid encoder-decoder unit (106), the retrieval-augmented generation module (108), configured to dynamically retrieve relevant content from one or more external information sources and incorporate the retrieved content into contextual embeddings for enriched summary generation;
a domain-specific fine-tuning module (110), integrated within the hybrid encoder-decoder unit (106), the domain-specific fine-tuning module (110), configured to refine the summarization process based on linguistic structures, content patterns, and terminology associated with a particular application domain;
a communication network (112), operatively connected to the hybrid encoder-decoder unit (106), and configured to transmit generated summaries and contextual data;
a user device (114), operatively connected to the communication network (112), and configured to receive, display, and store generated summaries, configuration inputs, and system feedback;
a user interface (116) associated with the user device (114), the user interface (116), configured to provide visualization of the generated summaries, accept user-defined summary preferences, and allow real-time configuration of summarization parameters;
a storage unit (118), operatively connected to the hybrid encoder-decoder unit (106), and configured to maintain historical content data, previously generated summaries, and feedback entries for adaptive learning and performance improvement.
2. The system (100) as claimed in claim 1, wherein the pre-processing unit (102), further comprises a language detection module configured to identify the input language and activate language-specific normalization pipelines.
3. The system (100) as claimed in claim 1, wherein the multimodal input processor (104), further comprises an optical character recognition module configured to extract text from scanned image content embedded within documents.
4. The system (100) as claimed in claim 1, wherein the multimodal input processor (104), further comprises an automatic speech recognition module configured to convert voice-based or recorded input into structured textual content for summarization.
5. The system (100) as claimed in claim 1, wherein the hybrid encoder-decoder unit (106), further comprises a cross-modal alignment unit configured to merge and align textual content derived from multiple input formats for unified context interpretation.
6. The system (100) as claimed in claim 1, wherein the retrieval-augmented generation module (108), further comprises an index-ranking sub module configured to rank retrieved content from external databases based on contextual proximity to the input content.
7. The system (100) as claimed in claim 1, wherein the domain-specific fine-tuning module (110), further comprises a domain classifier configured to automatically identify the topical domain of the input text and apply corresponding adaptation strategies.
8. The system (100) as claimed in claim 1, wherein the user interface (116), further comprises a multilingual rendering module configured to display summaries and interface prompts in user-selected regional or international languages.
9. The system (100) as claimed in claim 1, wherein the storage unit (118), further comprises a vector embedding archive configured to retain semantic representations of input and summary texts for historical comparison and adaptive learning.
10. A text summarization method (100) comprising:
receiving an input data stream at a pre-processing unit (102), the input data stream comprising at least one of structured or unstructured text, image data, or speech data;
processing the input data stream at the pre-processing unit (102), by performing tokenization, noise removal, and content normalization, and transmitting the processed data to a multimodal input processor (104);
extracting text from the image data and speech data using the multimodal input processor (104), operatively connected to the pre-processing unit (102), wherein the multimodal input processor (104), transforms image-based content using optical character recognition and transcribes voice-based input using speech recognition;
generating contextual embeddings at a hybrid encoder-decoder unit (106), operatively connected to the multimodal input processor (104), and the pre-processing unit (102), the hybrid encoder-decoder unit (106), applying a transformer-based encoding and decoding process to generate an intermediate representation of the content;
retrieving external knowledge through a retrieval-augmented generation module (108), integrated within the hybrid encoder-decoder unit (106), the retrieval-augmented generation module (108), dynamically sourcing relevant information and enriching the contextual embeddings with the retrieved content;
refining the contextual embeddings at a domain-specific fine-tuning module (110), integrated within the hybrid encoder-decoder unit (106), the domain-specific fine-tuning module (110), modifying summary content based on application-specific linguistic structure and domain terminology;
transmitting the generated summary and associated metadata through a communication network (112), operatively connected to the hybrid encoder-decoder unit (106);
receiving and displaying the generated summary at a user device (114), operatively connected to the communication network (112), wherein the user device (114), comprises a user interface (116), configured to display the summary, accept feedback, and modify system parameters in real time;
storing the input data, generated summaries, and user feedback into a storage unit (118), operatively connected to the hybrid encoder-decoder unit (106), for continual learning and iterative improvement.