Abstract: The present invention provides a system and method for compressing contextually significant data into predefined memory blocks without losing semantic integrity or structural relationships. The system (100) comprises a Data Analysis Module (120), an Entity Extraction Module (130), an Encoding Module (140), a Serialized Storage Module (150), a Graph Representation Module (160), a Context Window Management Module (170), and a Query Execution Module (180). The system identifies, extracts, and encodes data entities and their interrelationships into compressed representations enriched with metadata. This metadata ensures accurate reconstruction and seamless retrieval. The system employs a serialized storage format for efficient organization, along with a graph-based structuring methodology to maintain contextual fidelity and enable advanced querying. A dynamic adjustment mechanism optimizes compression to adapt to varying memory capacities, ensuring efficient processing and storage of large datasets. Applicable across diverse industries, the system preserves data relevance, scalability, and accessibility, offering a robust, context-preserving solution for modern data management challenges.
Description: FIELD OF THE INVENTION
The present invention relates to a system and method for advanced data compression and contextual management. The present invention further relates to the system and method for compressing contextually significant data into predefined memory blocks while preserving semantic integrity and structural relationships.
BACKGROUND
The increasing complexity of modern computing systems and the rapid growth of data have led to significant challenges in effectively storing, processing, and retrieving large-scale datasets. In particular, the limitations of memory capacity in computational environments, such as large language models (LLMs) and other advanced processing systems, create substantial barriers to efficient data management. These systems often operate within constrained memory spaces, which limits their ability to analyse and extract meaningful information from voluminous datasets. As such, preserving the integrity and relevance of the data, while ensuring it remains usable and accessible, has become imperative.
Traditionally, various techniques for compressing and storing data have been employed, primarily focusing on reducing the physical size of datasets. However, such methods often sacrifice critical aspects of data, such as its semantic meaning and the relationships between different data points. Common methods, including basic encoding and static mapping, are designed to minimize data size but fail to account for the intricate interdependencies and contextual relevance of the information. These conventional approaches are typically static, inflexible, and not equipped to handle the complex nature of modern data, particularly when data sources are heterogeneous or highly interrelated.
Additionally, many of these methods fail to perform optimally when memory resources are limited, leading to inefficiencies and difficulties in accessing and processing the data. These traditional systems lack the ability to adapt dynamically to fluctuating memory conditions. In cases where large datasets exceed the available memory space, they struggle to prioritize or represent the most important aspects of the data, often resulting in diminished accuracy, difficulty in retrieval, and a general inability to retain the essential structure and meaning of the original dataset.
Compared to existing methods that manage context compression primarily under GPU constraints or that rely on multiple scoring models, a solution is needed that enhances efficiency through dynamic memory management, graph structuring, and serialized data encoding. These improvements address the existing problems by providing more accurate compression while preserving contextual fidelity, making the approach adaptable to a broader range of applications.
As such, there remains a significant gap in the available technologies for efficiently managing and compressing data without losing essential context and relationships between data entities. The need for an innovative solution that addresses the inherent limitations of traditional methods is evident. Modern applications, especially those in the realm of advanced data processing, require a system that can compress data while maintaining its semantic integrity and adaptability to memory constraints.
PRIOR ART
CN117271780A discloses a method and system for compressing context by acquiring a text to be compressed, adding task prompts, and optimizing GPU resource utilization. The method comprises: acquiring the text to be compressed and adding a task description, a separator, and a compression slot; when GPU resources are scarce, compressing the text using an existing large language model with an additionally trained projection layer, and when GPU resources are abundant, compressing the text with a pre-trained large language model; and performing inference with the trained large language model to generate text replies.
CN118261254A details a processing method and a device for compressing a long text, wherein the method comprises: constructing and training a first scoring model, a second scoring model, and a first decision model; after training, receiving a text of arbitrary length and a corresponding question text; performing text noise reduction, text standardization, and sentence conversion on the long text and the question text to obtain a sentence sequence and question sentences; inputting the sentence sequence and the question sentences into the first scoring model to obtain a relevance score sequence; scoring the semantic consistency of each sentence in the sequence with the second scoring model; inputting the sentence feature vector sequence corresponding to the sentence sequence into the first decision model to obtain a decision-type sequence; and deleting from the long text the sentences whose decision type indicates deletion, outputting the result as the compressed text.
OBJECTS OF THE INVENTION
The primary object of the present invention is to provide a system and method for efficiently compressing context data into predefined memory blocks while preserving the semantic meaning and structural relationships inherent within the data.
Another object of the present invention is to provide a system and method for compressing context data that ensures that data remains contextually relevant and accessible, even within constrained memory environments.
Another object of the invention is to provide a system and method for compressing context data that uses a dynamic data compression approach that adapts to varying memory capacities, allowing for optimal storage, retrieval, and querying of large datasets without compromising data integrity.
A further object of the invention is to provide a system and method for compressing context data that facilitates the efficient categorization, encoding, and storage of data entities and their interrelationships, ensuring that the data retains its contextual fidelity and can be effectively analysed and processed within limited memory constraints.
Yet another object of the invention is to provide a system and method for compressing context data that overcomes the limitations of traditional data compression techniques, thereby providing a scalable, adaptable solution for modern data processing systems, particularly in environments where memory resources are limited.
SUMMARY
Before the present invention is described, it is to be understood that the present invention is not limited to specific methodologies and materials described, as these may vary as per the person skilled in the art. It is also to be understood that the terminology used in the description is for the purpose of describing the particular embodiments only and is not intended to limit the scope of the present invention.
The present invention provides a system and method for compressing contextually significant data into predefined memory blocks while maintaining the semantic integrity and structural relationships of the data. Leveraging advanced computational techniques, including Large Language Models (LLMs), the invention identifies, extracts, and encodes data entities and their interrelationships into compressed formats stored in a serialized structure enriched with metadata. A graph-based organization is employed to enable efficient querying and retrieval, while a dynamic adjustment mechanism ensures adaptability to constrained memory environments, such as context windows in LLMs. The invention addresses limitations of traditional compression methods by offering scalability, preserving data relevance, and ensuring efficient processing without compromising usability or contextual coherence. This description serves as a general overview and is not intended to restrict the scope of the invention, which is adaptable to various methodologies and materials as recognized by those skilled in the art.
BRIEF DESCRIPTION OF DRAWINGS
The present invention, together with further objects and advantages thereof, is more particularly described in conjunction with the accompanying drawings in which:
FIG. 1 is a flow chart illustrating a system according to an illustrative embodiment of the invention;
FIG. 2 is a block diagram illustrating the system and method according to an illustrative embodiment of the invention.
DETAILED DESCRIPTION OF THE INVENTION
Before the present invention is described, it is to be understood that this invention is not limited to methodologies described, as these may vary as per the person skilled in the art. It is also to be understood that the terminology used in the description is for the purpose of describing the particular embodiments only and is not intended to limit the scope of the present invention. Throughout this specification, the word “comprise”, or variations such as “comprises” or “comprising”, will be understood to imply the inclusion of a stated element, integer or step, or group of elements, integers or steps, but not the exclusion of any other element, integer or step, or group of elements, integers or steps. The use of the expression “at least” or “at least one” suggests the use of one or more elements or ingredients or quantities, as the use may be in the embodiment of the invention to achieve one or more of the desired objects or results. Various embodiments of the present invention are described below. It is, however, noted that the present invention is not limited to these embodiments, but rather the intention is that modifications that are apparent are also included.
The present invention provides a system and method for compressing contextually significant data into predefined memory blocks while maintaining the semantic integrity and structural relationships of the data. This approach integrates advanced data processing techniques, optimized resource management, and enhanced metadata handling, distinguishing it from existing methods. The present invention is directed to operate through a systematic series of procedural steps to achieve the objective of effective data compression while maintaining the semantic integrity and contextual relationships of the data. The invention introduces a novel system and method designed to overcome the limitations of traditional compression techniques by employing advanced computational strategies and dynamic adaptability to constrained memory environments.
| Sr. No. | Component |
|---|---|
| 100 | System |
| 120 | Data Analysis Module |
| 130 | Entity Extraction Module |
| 140 | Encoding Module |
| 150 | Serialized Storage Module |
| 160 | Graph Representation Module |
| 170 | Context Window Management Module |
| 180 | Query Execution Module |
According to an embodiment of the present invention, the system (100) comprises a processing means that consists of the following core components: a Data Analysis Module (120), an Entity Extraction Module (130), an Encoding Module (140), a Serialized Storage Module (150), a Graph Representation Module (160), a Context Window Management Module (170), and a Query Execution Module (180).
According to FIG. 1 of the present invention, the system (100) initiates with a Data Analysis Module (120) that performs comprehensive analysis of input data. It identifies data types and categorizes attributes to ensure structured inputs for subsequent processing stages. Following data analysis, the system advances to the Entity Extraction Module (130). This module focuses on extracting specific entities from the data and identifying relationships among them. This step is crucial for preparing the data for encoding and ensures that meaningful associations are captured. The Encoding Module (140) encodes both entities and their relationships into a machine-readable format. This step ensures that the data is structured, consistent, and efficiently retrievable, acting as a bridge between data extraction and storage. The system stores encoded data in a serialized format and manages it through the Serialized Storage Module (150). The Graph Representation Module (160) then organizes the data into a graph structure for efficient query execution. The Context Window Management Module (170) dynamically adjusts the compression process to fit within constrained memory environments (e.g., LLM context windows) and enforces memory limits. Finally, the Query Execution Module (180) allows for the retrieval of data and generation of responses, enabling the system to deliver actionable insights.
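The data flow through modules (120), (130), and (140) described above may be illustrated by the following minimal Python sketch. The functions, the whitespace-based tokenization, and the integer coding scheme are illustrative assumptions only; they stand in for the LLM-driven analysis, extraction, and encoding performed by the actual modules.

```python
# Illustrative sketch of the pipeline of modules (120) -> (130) -> (140).
# The simple heuristics below are stand-ins for the LLM-based processing.

def analyze(records):
    """Data Analysis Module (120): tag each record with a simple type category."""
    return [{"text": r, "type": "numeric" if r.isdigit() else "text"}
            for r in records]

def extract_entities(analyzed):
    """Entity Extraction Module (130): treat whitespace tokens as entities and
    link adjacent tokens as relationships."""
    entities, relations = set(), []
    for rec in analyzed:
        tokens = rec["text"].split()
        entities.update(tokens)
        relations.extend(zip(tokens, tokens[1:]))
    return entities, relations

def encode(entities, relations):
    """Encoding Module (140): assign compact integer codes to entities and
    express each relationship as a pair of codes."""
    codes = {e: i for i, e in enumerate(sorted(entities))}
    return codes, [(codes[a], codes[b]) for a, b in relations]

codes, rel_codes = encode(*extract_entities(analyze(["alpha beta", "beta gamma"])))
```

In the claimed system the categorization, extraction, and encoding are performed by an LLM; the sketch merely fixes the data flow between the modules.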
According to the embodiment of the present invention, the method operates through a systematic series of procedural steps to achieve effective data compression while maintaining the semantic integrity and contextual relationships of the data.
According to FIG. 2 of the present invention, the method of compressing context data of the present invention is described. Initially, the data undergoes a comprehensive analysis to ascertain its structural components, functional attributes, and overall composition in the Data Analysis Module (120). This detailed analysis facilitates the categorization of the data into distinct groups based on its type, format, or inherent attributes. The categorization serves as the foundation for targeted and precise processing. Data entities, representing the core components or elements of the dataset, are identified and extracted by the Entity Extraction Module (130). The extraction of these entities is critical, as it forms the basis for defining and preserving the relationships inherent within the data. Once the entities are identified, the invention employs algorithms to encode the relationships between these entities into a compressed representation. The encoding process in the Encoding Module (140) transforms the data into a compact yet contextually rich format, ensuring no loss of meaning or structural coherence. This step is performed using advanced computational techniques, such as contextual embeddings and dynamic analysis, to retain the intricate interdependencies within the data.
To further enhance data utility, the compressed representation is enriched with metadata. The metadata provides detailed descriptions of the attributes and interrelations of the entities, enabling accurate reconstruction and seamless analysis during data retrieval operations. The metadata enrichment ensures that the contextual relevance of the data is preserved throughout the compression process, facilitating efficient access and usability.
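The metadata enrichment step may be sketched as follows. The record layout, the field names, and the toy code derivation are assumptions introduced for illustration, not a prescribed format; the point is that the stored metadata alone suffices to reconstruct the entity view.

```python
def compress_with_metadata(entity, attributes):
    """Compress an entity to a short code while keeping the metadata
    needed for accurate reconstruction (field names are illustrative)."""
    code = hash(entity) & 0xFFFF  # toy code; a real system would use learned encodings
    return {"code": code,
            "metadata": {"name": entity, "attributes": attributes}}

def reconstruct(record):
    """Rebuild the original entity view purely from the stored metadata."""
    meta = record["metadata"]
    return meta["name"], meta["attributes"]

rec = compress_with_metadata("patient_record", {"format": "HL7", "fields": 42})
```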
According to the embodiment of the present invention, in the next step, a serialized storage format is employed to systematically organize and store the compressed data in the Serialized Storage Module (150). This format is pivotal in maintaining data accessibility and integrity. The serialized approach includes mechanisms for indexing and linking data entities, thereby enhancing retrieval efficiency and supporting the execution of complex queries across the dataset. This ensures that the data remains contextually coherent and ready for a variety of applications, even under memory-constrained conditions.
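A minimal sketch of serialized storage with indexing and linking follows. The JSON-lines layout and the in-memory index are illustrative choices only; the described module may employ any serialization format with equivalent indexing.

```python
import json

class SerializedStore:
    """Toy serialized store: records are kept as JSON lines with an
    index from entity id to record position (names are illustrative)."""

    def __init__(self):
        self._lines = []   # serialized records, in insertion order
        self._index = {}   # entity id -> position in self._lines

    def put(self, entity_id, payload):
        # Index the record before appending its serialized form.
        self._index[entity_id] = len(self._lines)
        self._lines.append(json.dumps({"id": entity_id, "payload": payload}))

    def get(self, entity_id):
        # Indexed lookup avoids scanning the whole store.
        return json.loads(self._lines[self._index[entity_id]])["payload"]

store = SerializedStore()
store.put("e1", {"rel": ["e2"]})   # e1 links to e2
store.put("e2", {"rel": []})
```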
According to the embodiment of the present invention, the system further incorporates a graph-based structuring methodology to organize the compressed data in the graph representation module (160). In this methodology, data entities are represented as nodes, while their interrelationships are depicted as edges within a graph and all data codes are linked in a graph structure, enabling efficient streaming and querying. This graphical representation provides a clear and intuitive visualization of the dataset’s connections. The graph-based structure not only preserves the contextual relationships but also enables advanced querying and analytical operations, ensuring data fidelity at all stages of compression and retrieval.
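The graph-based structuring may be sketched with adjacency lists, where entities are nodes and labelled edges carry their interrelationships. The edge labels and node names shown are hypothetical examples, not part of the described system.

```python
from collections import defaultdict

class ContextGraph:
    """Entities as nodes, relationships as labelled directed edges."""

    def __init__(self):
        self.edges = defaultdict(list)  # node -> [(label, neighbour), ...]

    def add_relation(self, src, label, dst):
        self.edges[src].append((label, dst))

    def query(self, src, label):
        """Return all nodes reachable from src via edges with the given label."""
        return [dst for lbl, dst in self.edges[src] if lbl == label]

g = ContextGraph()
g.add_relation("main.c", "includes", "util.h")
g.add_relation("main.c", "includes", "log.h")
g.add_relation("main.c", "links", "libm")
```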
According to the embodiment of the present invention, while addressing scenarios where datasets exceed predefined memory constraints, the system employs a dynamic adjustment mechanism based on memory block management. This mechanism enables the system to iteratively process and compress data, allowing large and complex datasets to be accommodated effectively. The iterative approach ensures that all data, irrespective of its volume or intricacy, is compressed and stored within the specified memory limitations without compromising its usability or contextual integrity. The Context Window Management Module (170) dynamically adjusts the compression process to fit within constrained memory environments (e.g., LLM context windows) and enforces memory limits. The Query Execution Module (180), at the end of the process, allows for the retrieval of data and generation of responses, enabling the system to deliver actionable insights.
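The iterative dynamic adjustment may be sketched as a loop that compresses until the data fits the memory block. The whitespace token estimate and the drop-longest heuristic are simplifying assumptions; a real system would use the model's tokenizer and the LLM-based encoding.

```python
def fits(text, budget):
    """Crude token estimate via whitespace-delimited tokens (an assumption;
    a real system would count tokens with the model's tokenizer)."""
    return len(text.split()) <= budget

def compress_to_budget(sentences, budget):
    """Iteratively drop the longest remaining sentence until the joined
    result fits the memory block; stands in for the dynamic adjustment."""
    kept = list(sentences)
    while kept and not fits(" ".join(kept), budget):
        kept.remove(max(kept, key=len))
    return " ".join(kept)

out = compress_to_budget(
    ["alpha beta gamma delta", "alpha beta", "gamma"], budget=4)
```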
According to the embodiment of the present invention, the present invention utilizes advanced methods to compress data while preserving its meaning and interrelationships. This invention employs an innovative approach to identify and categorize data entities, extract their relationships, and encode them in a manner that maintains both their integrity and contextual relevance. By leveraging modern computational techniques, the system ensures that the essential elements of the dataset are preserved even when memory resources are limited. This solution enables efficient querying and analysis of large datasets within constrained memory environments, overcoming the inefficiencies and limitations of traditional data compression methods. Thus, this invention provides a comprehensive and scalable solution for modern data processing, offering significant improvements in both performance and accuracy, and ensuring that data remains actionable and contextually meaningful.
The invention’s adaptability to varying memory capacities is achieved through a Context Window Management Module (170), which dynamically adjusts the compression process. This module leverages real-time analysis to optimize memory usage, ensuring the efficient storage and retrieval of data while maintaining its contextual relevance. The system’s ability to dynamically resize context windows based on input data complexity and target environment constraints further enhances its efficiency and scalability.
This invention finds utility across a wide range of industries, including but not limited to telecommunications, healthcare, finance, and information management. For instance, in telecommunications, it enables the compression of extensive call records and user data into manageable formats, facilitating efficient analysis and storage. In the healthcare domain, the system can compress large-scale medical records while preserving their structural and contextual information for seamless retrieval and review. Similarly, in the financial sector, the invention supports the compression and analysis of transactional and customer data, ensuring efficient data management without loss of integrity.
By presenting a systematic and well-defined method for data compression, the present invention achieves an optimal balance between storage efficiency and accessibility. The use of graph-based organization, dynamic adjustment mechanisms, and metadata enrichment ensures that the data remains fully accessible and meaningful, even in constrained computational environments. The invention thus provides a robust solution to contemporary data management challenges, eliminating the risk of memory overload and offering a scalable, efficient, and context-preserving approach to data compression and retrieval.
The invention hereafter will be cited by way of examples only for a better and detailed understanding.
EXAMPLE-
The present invention can be used with a software repository. The input for the system is a 5 GB software repository consisting of source code files, binaries, and metadata. Applying the method of the present invention, the source code files are first compressed to include definitions, method abstracts, and high-level designs. The binaries retain only their relationships with other files. The data is structured as a graph and streamed for querying. Compression iterates until the entire repository fits within the required 128K context window. As an outcome, the repository is structured within the LLM's context window, enabling efficient and contextual querying.
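This repository example may be sketched as follows, using a toy heuristic that keeps only definition-like lines as a stand-in for the method abstracts; the file contents, the token estimate, and the heuristic are illustrative assumptions.

```python
def summarize_source(path, text):
    """Keep only definition-like lines (toy stand-in for the method
    abstracts and high-level designs produced by the real system)."""
    keep = [ln for ln in text.splitlines()
            if ln.lstrip().startswith(("def ", "class "))]
    return {"path": path, "summary": keep}

def pack_repository(files, budget_tokens):
    """Compress each source file to its definitions, then check the
    combined size against the target context window (e.g., 128K tokens)."""
    packed = [summarize_source(p, t) for p, t in files.items()]
    total = sum(len(" ".join(f["summary"]).split()) for f in packed)
    return packed, total <= budget_tokens

files = {"a.py": "def f(x):\n    return x\n\nclass C:\n    pass\n"}
packed, ok = pack_repository(files, budget_tokens=128_000)
```

A full implementation would re-run the compression with progressively coarser summaries whenever the budget check fails, mirroring the iterative mechanism of the Context Window Management Module (170).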
While considerable emphasis has been placed herein on the specific elements of the preferred embodiment, it will be appreciated that many alterations and modifications can be made to the preferred embodiment without departing from the principles of the invention. These and other changes in the preferred embodiments of the invention will be apparent to those skilled in the art from the disclosure herein, whereby it is to be distinctly understood that the foregoing descriptive matter is to be interpreted merely as illustrative of the invention and not as a limitation.
Claims: We claim,
1. A system and method for compressing context data into defined memory blocks
characterised in that,
the system (100) comprises a Data Analysis Module (120), an Entity Extraction Module (130), an Encoding Module (140), a Serialized Storage Module (150), a Graph Representation Module (160), a Context Window Management Module (170), and a Query Execution Module (180); and
the method comprises the steps of
a. Identifying and categorizing source data and attributes using a Large Language Model (LLM);
b. Extracting entities and determining relationships within the source data using data-specific algorithms;
c. Encoding the identified entities and relationships into compressed data codes using an LLM;
d. Storing the compressed data codes alongside associated metadata in serialized storage;
e. Structuring the compressed data codes into a graph representation to facilitate efficient querying and retrieval;
f. Adjusting the compression process to fit within predefined memory blocks, such as those in a context window of an LLM.
2. The system and method as claimed in claim 1, wherein the system comprises:
a. Data Analysis Module (120) configured to utilize a Large Language Model (LLM) for identifying data types and their associated attributes;
b. Entity Extraction Module (130) designed to extract entities and determine relationships within the data;
c. Encoding Module (140) responsible for transforming entities and relationships into compressed data codes using an LLM;
d. Serialized Storage Module (150) configured to store the compressed data codes alongside metadata;
e. Graph Representation Module (160) for organizing the compressed data codes into a graph structure for efficient querying and retrieval;
f. Context Window Management Module (170) configured to adjust the compression process to fit predefined memory blocks in constrained environments;
g. Query Execution Module (180) that allows for the retrieval of data and generation of responses, enabling the system to deliver actionable insights.
3. The system and method as claimed in claim 1, wherein the Data Analysis Module (120) utilizes contextual embeddings generated by the LLM to improve the accuracy of data type identification and attribute extraction.
4. The system and method as claimed in claim 1, wherein the data entities that represent the core components or elements of the dataset, are identified and extracted by the entity extraction module (130) that forms the basis for defining and preserving the relationships inherent within the data.
5. The system and method as claimed in claim 1, wherein the encoding process in the encoding module (140) encodes the relationships between the said data entities into a compressed representation by transforming the data into a compact and contextually rich format, ensuring no loss of meaning or structural coherence by using contextual embeddings and dynamic analysis, to retain the intricate interdependencies within the data.
6. The system and method as claimed in claim 1, wherein the compressed representation is enriched with metadata that provides detailed descriptions of the attributes and interrelations of the entities, enabling accurate reconstruction and seamless analysis during data retrieval operations and this ensures that the contextual relevance of the data is preserved throughout the compression process, facilitating efficient access and usability.
7. The system and method as claimed in claim 1, wherein the serialized storage format is employed to systematically organize and store the compressed data in the serialised storage module (150) and this includes mechanisms for indexing and linking data entities, thereby enhancing retrieval efficiency and supporting the execution of complex queries across the dataset.
8. The system and method as claimed in claim 1, wherein the system incorporates the graph-based structuring methodology to organize the compressed data in the graph representation module (160) where the data entities are represented as nodes, while their interrelationships are depicted as edges within a graph and all data codes are linked in a graph structure, enabling efficient streaming and querying.
9. The system and method as claimed in claim 1, wherein the Context Window Management Module (170) dynamically adjusts the compression process to fit within constrained memory environments, e.g., LLM context windows, and also enforces memory limits, and wherein the Query Execution Module (180), at the end of the process, allows for the retrieval of data and generation of responses, enabling the system to deliver actionable insights.
10. The system and method as claimed in claim 1, wherein the adjustment of the compression process involves dynamically resizing context windows based on the complexity of the input data and the constraints of the target environment.
| # | Name | Date |
|---|---|---|
| 1 | 202421101207-STATEMENT OF UNDERTAKING (FORM 3) [20-12-2024(online)].pdf | 2024-12-20 |
| 2 | 202421101207-FORM 1 [20-12-2024(online)].pdf | 2024-12-20 |
| 3 | 202421101207-FIGURE OF ABSTRACT [20-12-2024(online)].pdf | 2024-12-20 |
| 4 | 202421101207-DRAWINGS [20-12-2024(online)].pdf | 2024-12-20 |
| 5 | 202421101207-DECLARATION OF INVENTORSHIP (FORM 5) [20-12-2024(online)].pdf | 2024-12-20 |
| 6 | 202421101207-COMPLETE SPECIFICATION [20-12-2024(online)].pdf | 2024-12-20 |
| 7 | 202421101207-FORM-26 [23-12-2024(online)].pdf | 2024-12-23 |
| 8 | Abstract1.jpg | 2025-02-06 |
| 9 | 202421101207-POA [22-02-2025(online)].pdf | 2025-02-22 |
| 10 | 202421101207-MARKED COPIES OF AMENDEMENTS [22-02-2025(online)].pdf | 2025-02-22 |
| 11 | 202421101207-FORM 13 [22-02-2025(online)].pdf | 2025-02-22 |
| 12 | 202421101207-AMMENDED DOCUMENTS [22-02-2025(online)].pdf | 2025-02-22 |
| 13 | 202421101207-FORM-9 [25-09-2025(online)].pdf | 2025-09-25 |
| 14 | 202421101207-FORM 18 [01-10-2025(online)].pdf | 2025-10-01 |