Abstract: SYSTEM AND METHOD FOR DATA COMPRESSION AND AGGREGATION. The present invention relates to a system (108) and a method (600) for data compression and aggregation. The method (600) includes the step of retrieving data from a distributed file system (110). The method (600) further includes the step of analysing, utilizing an Artificial Intelligence/Machine Learning (AI/ML) model (220), at least one of data types, data patterns, and data anagrams of the retrieved data to identify at least one pattern related to one or more formats in the retrieved data. The method (600) further includes the step of selecting, utilizing the AI/ML model (220), one or more output formats based on the identified at least one pattern to compress the retrieved data based on the analysis. The method (600) further includes the step of bucketing, utilizing the AI/ML model (220), the compressed data to group the compressed data in a plurality of buckets. The method (600) further includes the step of aggregating, utilizing the AI/ML model (220), the data within the plurality of buckets. The method (600) further includes the step of storing the aggregated data in the distributed file system (110). Ref. Fig. 2
DESC:
FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENTS RULES, 2003
COMPLETE SPECIFICATION
(See section 10 and rule 13)
1. TITLE OF THE INVENTION
SYSTEM AND METHOD FOR DATA COMPRESSION AND AGGREGATION
2. APPLICANT(S)
NAME NATIONALITY ADDRESS
JIO PLATFORMS LIMITED INDIAN OFFICE-101, SAFFRON, NR. CENTRE POINT, PANCHWATI 5 RASTA, AMBAWADI, AHMEDABAD 380006, GUJARAT, INDIA
3. PREAMBLE TO THE DESCRIPTION
THE FOLLOWING SPECIFICATION PARTICULARLY DESCRIBES THE NATURE OF THIS INVENTION AND THE MANNER IN WHICH IT IS TO BE PERFORMED.
FIELD OF THE INVENTION
[0001] The present invention relates to the field of wireless communication systems, and more particularly to a method and a system for data compression and aggregation.
BACKGROUND OF THE INVENTION
[0002] A network incorporating various network elements generates a large volume of data which must be stored for further analysis, in order to assess the network variables and network performance as well as to improve service quality. Such a huge amount of data imposes a large storage requirement and needs a longer preprocessing time during analysis.
[0003] The data generated for the network elements, such as various Network Functions (NFs), is stored in a distributed file system, and the data primarily pertains to network performance data. The volume of the data generated for each NF in a day can vary from 1 TB to 50 TB. Storing such a large volume of data for a certain time period, for example, 6 months, would entail a requirement of data space of 150 TB to 7.5 PB (petabytes).
[0004] Moreover, analyzing such a sheer volume of data needs a longer time for preprocessing so as to remove any unnecessary information. In other words, the data generated for the NFs and stored in the distributed file system is huge, so a longer time is required to read, extract, format, and process the data to obtain useful information from raw information. Therefore, there is a requirement for an organized mechanism to address the issues of storage requirement and time consumption for analysis.
[0005] There is, therefore, a need for a system and method thereof to compress and aggregate the generated data in the network.
SUMMARY OF THE INVENTION
[0006] One or more embodiments of the present disclosure provides a method and a system for data compression and aggregation.
[0007] In one aspect of the present invention, the method for data compression and aggregation is disclosed. The method includes the step of retrieving, by one or more processors, data from a distributed file system. The method further includes the step of analysing, by the one or more processors, utilizing an Artificial Intelligence/Machine Learning (AI/ML) model, data types, data patterns and data anagrams of the retrieved data to identify at least one pattern related to one or more formats in the retrieved data. The method further includes the step of selecting, by the one or more processors, utilizing the AI/ML model, one or more output formats based on the identified at least one pattern to compress the retrieved data based on the analysis. The method further includes the step of bucketing, by the one or more processors, utilizing the AI/ML model, the compressed data to group the compressed data in a plurality of buckets. The method further includes the step of aggregating, by the one or more processors, utilizing the AI/ML model, the data within the plurality of buckets. The method further includes the step of storing, by the one or more processors, the aggregated data in the distributed file system.
[0008] In another embodiment, the data corresponds to network performance data of one or more Network Functions (NFs).
[0009] In yet another embodiment, upon retrieving the data, the method comprises the steps of identifying, by the one or more processors, a format of the retrieved data, wherein the format of the retrieved data is one of a Comma-Separated Values (CSV) file format, a parquet format, and a Tape Archive (TAR) file format and extracting, by the one or more processors, one or more features from the retrieved data.
[0010] In yet another embodiment, the output format is one of a Comma-Separated Values (CSV) file format, a parquet format, and a Tape Archive (TAR) file format.
[0011] In yet another embodiment, the data types are one of a numerical data, categorical data, and textual data.
[0012] In yet another embodiment, the bucketing and aggregating is performed based on the requirement of the user.
[0013] In another aspect of the present invention, the system for data compression and aggregation is disclosed. The system includes a retrieving module configured to retrieve, data from a distributed file system. The system further includes an analysing module configured to analyse, utilizing an Artificial Intelligence/Machine Learning (AI/ML) model, at least one of, data types, data patterns and data anagrams of the retrieved data to identify at least one pattern related to one or more formats in the retrieved data. The system further includes a compressor module configured to select, utilizing the AI/ML model, one or more output formats based on the identified at least one pattern to compress the retrieved data based on the analysis. The system further includes an aggregation module configured to bucket, utilizing the AI/ML model, compressed data to group the compressed data in a plurality of buckets. The aggregation module is further configured to aggregate, utilizing the AI/ML model, the data within the plurality of buckets. The aggregation module is further configured to store the aggregated data in the distributed file system.
[0014] In yet another aspect of the present invention, a non-transitory computer-readable medium is disclosed, having stored thereon computer-readable instructions that, when executed by a processor, cause the processor to perform the following. The processor is configured to retrieve data from a distributed file system. The processor is further configured to analyse, utilizing the Artificial Intelligence/Machine Learning (AI/ML) model, data types, data patterns and data anagrams of the retrieved data to identify at least one pattern related to one or more formats in the retrieved data. The processor is further configured to select, utilizing the AI/ML model, one or more output formats based on the identified at least one pattern to compress the retrieved data based on the analysis. The processor is further configured to bucket, utilizing the AI/ML model, the compressed data to group the compressed data in a plurality of buckets. The processor is further configured to aggregate, utilizing the AI/ML model, the data within the plurality of buckets. The processor is further configured to store the aggregated data in the distributed file system.
[0015] Other features and aspects of this invention will be apparent from the following description and the accompanying drawings. The features and advantages described in this summary and in the following detailed description are not all-inclusive, and particularly, many additional features and advantages will be apparent to one of ordinary skill in the relevant art, in view of the drawings, specification, and claims hereof. Moreover, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes and may not have been selected to delineate or circumscribe the inventive subject matter, resort to the claims being necessary to determine such inventive subject matter.
BRIEF DESCRIPTION OF THE DRAWINGS
[0016] The accompanying drawings, which are incorporated herein, and constitute a part of this disclosure, illustrate exemplary embodiments of the disclosed methods and systems in which like reference numerals refer to the same parts throughout the different drawings. Components in the drawings are not necessarily to scale, emphasis instead being placed upon clearly illustrating the principles of the present disclosure. Some drawings may indicate the components using block diagrams and may not represent the internal circuitry of each component. It will be appreciated by those skilled in the art that disclosure of such drawings includes disclosure of electrical components, electronic components or circuitry commonly used to implement such components.
[0017] FIG. 1 is an exemplary block diagram of an environment for data compression and aggregation, according to one or more embodiments of the present invention;
[0018] FIG. 2 is an exemplary block diagram of a system for the data compression and aggregation, according to one or more embodiments of the present invention;
[0019] FIG. 3 is an exemplary architecture of the system of FIG. 2, according to one or more embodiments of the present invention;
[0020] FIG. 4 is an exemplary architecture for the data compression and aggregation, according to one or more embodiments of the present disclosure;
[0021] FIG. 5 is an exemplary signal flow diagram illustrating the flow for the data compression and aggregation; and
[0022] FIG. 6 is a flow diagram of a method for the data compression and aggregation, according to one or more embodiments of the present invention.
[0023] The foregoing shall be more apparent from the following detailed description of the invention.
DETAILED DESCRIPTION OF THE INVENTION
[0024] Some embodiments of the present disclosure, illustrating all its features, will now be discussed in detail. It must also be noted that as used herein and in the appended claims, the singular forms "a", "an" and "the" include plural references unless the context clearly dictates otherwise.
[0025] Various modifications to the embodiment will be readily apparent to those skilled in the art and the generic principles herein may be applied to other embodiments. However, one of ordinary skill in the art will readily recognize that the present disclosure including the definitions listed here below are not intended to be limited to the embodiments illustrated but is to be accorded the widest scope consistent with the principles and features described herein.
[0026] A person of ordinary skill in the art will readily ascertain that the illustrated steps detailed in the figures and here below are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope of the disclosed embodiments.
[0027] Various embodiments of the present invention provide a system and a method for data compression and aggregation. The system is configured to address the problem of larger data space or storage requirement as well as higher time consumption for processing the data. The most unique aspect of the invention lies in the ability of the system to utilize Artificial Intelligence/Machine Learning (AI/ML) to reduce the data space or storage requirement by up to 90% in a distributed file system. The system is configured to at least one of, identify a suitable format for writing the data, identify a suitable compression methodology/technique to compress the data, and group the compressed data in a plurality of buckets. Further, the data within the plurality of buckets is aggregated. In particular, the system is configured to decrease the data size by compressing and aggregating the data, which also decreases the time taken to perform the analysis utilizing the compressed data.
[0028] Referring to FIG. 1, FIG. 1 illustrates an exemplary block diagram of an environment 100 for data compression and aggregation according to one or more embodiments of the present invention. The environment 100 includes a User Equipment (UE) 102, a server 104, a network 106, a system 108, and a distributed file system 110.
[0029] For the purpose of description and explanation, the description will be explained with respect to one or more User Equipments (UEs) 102, or to be more specific, with respect to a first UE 102a, a second UE 102b, and a third UE 102c, and should nowhere be construed as limiting the scope of the present disclosure. Each of the at least one UE 102, namely the first UE 102a, the second UE 102b, and the third UE 102c, is configured to connect to the server 104 via the network 106.
[0030] In an embodiment, each of the first UE 102a, the second UE 102b, and the third UE 102c is one of, but not limited to, any electrical, electronic, or electro-mechanical equipment, or a combination of one or more of the above devices, such as smartphones, Virtual Reality (VR) devices, Augmented Reality (AR) devices, laptop, a general-purpose computer, desktop, personal digital assistant, tablet computer, mainframe computer, or any other computing device.
[0031] The network 106 includes, by way of example but not limitation, one or more of a wireless network, a wired network, an internet, an intranet, a public network, a private network, a packet-switched network, a circuit-switched network, an ad hoc network, an infrastructure network, a Public-Switched Telephone Network (PSTN), a cable network, a cellular network, a satellite network, a fiber optic network, or some combination thereof. The network 106 may include, but is not limited to, a Third Generation (3G), a Fourth Generation (4G), a Fifth Generation (5G), a Sixth Generation (6G), a New Radio (NR), a Narrow Band Internet of Things (NB-IoT), an Open Radio Access Network (O-RAN), and the like.
[0032] The network 106 may also include, by way of example but not limitation, at least a portion of one or more networks having one or more nodes that transmit, receive, forward, generate, buffer, store, route, switch, process, or a combination thereof, etc. one or more messages, packets, signals, waves, voltage or current levels, some combination thereof, or so forth.
[0033] The environment 100 includes the server 104 accessible via the network 106. The server 104 may include by way of example but not limitation, one or more of a standalone server, a server blade, a server rack, a bank of servers, a server farm, hardware supporting a part of a cloud service or system, a home server, hardware running a virtualized server, a processor executing code to function as a server, one or more machines performing server-side functionality as described herein, at least a portion of any of the above, or some combination thereof. In an embodiment, the entity operating the server 104 may include, but is not limited to, a vendor, a network operator, a company, an organization, a university, a lab facility, a business enterprise side, a defense facility side, or any other facility that provides service.
[0034] The environment 100 further includes the distributed file system 110 which acts as at least one source among a plurality of sources. In one embodiment, the distributed file system 110 is the origin from which the data is retrieved and utilized for at least one of, but not limited to, analysis, research, and decision-making. In one embodiment, the distributed file system 110 is a file system that allows access to files stored on multiple servers or machines in the network 106 as if the files were stored on a single system. In particular, the distributed file system 110 allows users to access and manage files seamlessly, regardless of where the data is physically stored.
[0035] In an alternate embodiment, the distributed file system 110 acts as the one or more sources 110 which includes at least one of, but not limited to, one or more network nodes, a plurality of applications and one or more databases. In one embodiment, the one or more network nodes refer to the various devices and components present in the network 106 that facilitate communication, manage traffic, and ensure connectivity. In one embodiment, the one or more network nodes include at least one of, but not limited to, Radio Access Network (RAN), core network, centralized unit, and gNodeB (gNB). In another embodiment, the one or more network nodes include at least one of, but not limited to, servers, routers, switches, and load balancers. In one embodiment, the plurality of applications are applications from which multiple types of data are received in the system 108 related to the one or more network failure events. The plurality of applications typically involves a combination of various monitoring, logging, and diagnostic tools.
[0036] The environment 100 further includes the system 108 communicably coupled to the server 104, the UE 102, and the distributed file system 110 via the network 106. The system 108 is adapted to be embedded within the server 104 or is implemented as an individual entity.
[0037] Operational and construction features of the system 108 will be explained in detail with respect to the following figures.
[0038] FIG. 2 is an exemplary block diagram of the system 108 for data compression and aggregation, according to one or more embodiments of the present invention.
[0039] As per the illustrated and preferred embodiment, the system 108 for the data compression and aggregation, includes one or more processors 202, a memory 204, a storage unit 206 and an Artificial Intelligence/Machine Learning (AI/ML) model 220. The one or more processors 202, hereinafter referred to as the processor 202, may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, single board computers, and/or any devices that manipulate signals based on operational instructions. However, it is to be noted that the system 108 may include multiple processors as per the requirement and without deviating from the scope of the present disclosure. Among other capabilities, the processor 202 is configured to fetch and execute computer-readable instructions stored in the memory 204.
[0040] As per the illustrated embodiment, the processor 202 is configured to fetch and execute computer-readable instructions stored in the memory 204 as the memory 204 is communicably connected to the processor 202. The memory 204 is configured to store one or more computer-readable instructions or routines in a non-transitory computer-readable storage medium, which may be fetched and executed for the data compression and aggregation. The memory 204 may include any non-transitory storage device including, for example, volatile memory such as RAM, or non-volatile memory such as disk memory, EPROMs, FLASH memory, unalterable memory, and the like.
[0041] The system 108 further includes the storage unit 206. As per the illustrated embodiment, the storage unit 206 is configured to store data retrieved from the distributed file system 110. The storage unit 206 is one of, but not limited to, a centralized database, a cloud-based database, a commercial database, an open-source database, a distributed database, an end-user database, a graphical database, a Not only Structured Query Language (NoSQL) database, an object-oriented database, a personal database, an in-memory database, a document-based database, a time series database, a wide column database, a key value database, a search database, a cache database, and so forth. The foregoing examples of the storage unit 206 types are non-limiting and may not be mutually exclusive, e.g., the database can be both commercial and cloud-based, or both relational and open-source, etc.
[0042] As per the illustrated embodiment, the system 108 includes the AI/ML model 220. In another embodiment, the system 108 includes a plurality of AI/ML models 220. The model 220 facilitates the system 108 in performing tasks such as at least one of, detecting anomalies, recognizing patterns, making predictions, solving problems, enhancing decision-making, and providing insights across various fields. For example, the AI/ML model 220 facilitates solving real-world problems without extensive manual intervention. In an alternate embodiment, the AI/ML model 220 is pretrained.
[0043] As per the illustrated embodiment, the system 108 includes the processor 202 for data compression and aggregation. The processor 202 includes a retrieving module 208, an analysing module 210, a training unit 212, a compressor module 214, and an aggregation module 216. The processor 202 is communicably coupled to the one or more components of the system 108 such as the memory 204, the storage unit 206 and the AI/ML model 220. In an embodiment, operations and functionalities of the retrieving module 208, the analysing module 210, the training unit 212, the compressor module 214, the aggregation module 216, and the one or more components of the system 108 can be used in combination or interchangeably.
[0044] In one embodiment, initially the retrieving module 208 of the processor 202 is configured to retrieve data from the distributed file system 110. In one embodiment, the data corresponds to network performance data of one or more Network Functions (NFs). In one embodiment, the one or more Network Functions (NFs) are logical, software-based components or entities that perform specific tasks within the network 106. The one or more NFs are designed to handle various aspects of network operation, such as at least one of, but not limited to, user data handling, control, management, authentication, and service delivery. Herein, the one or more NFs include at least one of, but not limited to, an Access and Mobility Management Function (AMF), a Session Management Function (SMF), and a User Plane Function (UPF).
[0045] In one embodiment, the network performance data of the one or more NFs refers to at least one of, but not limited to, metrics and Key Performance Indicators (KPIs) that facilitate the system 108 in monitoring and evaluating the efficiency, reliability, and scalability of the one or more NFs in the network 106. For example, the network performance data of the one or more NFs includes at least one of, but not limited to, traffic metrics, resource utilization, and connection metrics. Herein, the traffic metrics include at least one of, but not limited to, packet loss rate, throughput, and latency. Herein, the resource utilization includes at least one of, but not limited to, Central Processing Unit (CPU) usage, and memory usage. Herein, the connection metrics pertain to at least one of, but not limited to, the number of active connections managed by the one or more NFs.
[0046] In one embodiment, the performance indicators are the performance data aggregated over a group of NFs, such as, for example, average latency along the network slice. The performance indicators can be derived from the performance measurements collected at the NFs that belong to the group. The aggregation method is identified in the performance indicator definition. The performance indicators at a network slice subnet level can be derived from the performance measurements collected at the NFs that belong to the network slice subnets or to the constituent network slice subnets. The performance indicators at the network slice subnet level can be made available via the corresponding performance management service for the network slice subnet. The performance indicators at the network slice level can be derived from the network slice subnet level performance indicators collected at the constituent network slice subnets and/or NFs. The network slice level performance indicators can be made available via the corresponding performance management service for the network slice.
[0047] In one embodiment, the retrieving module 208 receives the data from the distributed file system 110, which may be present within the network 106 or outside the network 106. In one embodiment, the distributed file system 110 periodically transmits the data to the system 108. In one embodiment, the distributed file system 110 is located at the NFs. In particular, the distributed file system 110 is integrated within the NFs. For example, in cloud-native environments (e.g., 5G core network functions running in a cloud environment), the distributed file system 110 stores and manages data related to NFs such as at least one of, the Access and Mobility Management Function (AMF), the Session Management Function (SMF), and the User Plane Function (UPF).
[0048] In one embodiment, the retrieving module 208 retrieves the data from the distributed file system 110 via an interface. In one embodiment, the interface includes at least one of, but not limited to, one or more Application Programming Interfaces (APIs) which are used for retrieving the data from the distributed file system 110. The one or more APIs are sets of rules and protocols that allow different entities to communicate with each other. The one or more APIs define the methods and data formats that entities can use to request and exchange information, enabling integration and functionality across various platforms. In particular, the APIs are essential for integrating different systems, accessing services, and extending functionality.
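By way of a non-limiting illustration, a minimal Python sketch of such API-based retrieval is set out below; the endpoint, query parameters, and response handling are assumptions for illustration only and do not represent a prescribed interface.

    import requests

    # Sketch of retrieving NF performance data from the distributed file system (110)
    # over a hypothetical REST API; the URL and parameter names are illustrative.
    def retrieve_data(base_url: str, nf_name: str, date: str) -> bytes:
        response = requests.get(
            f"{base_url}/nf-data",                # hypothetical endpoint
            params={"nf": nf_name, "date": date},
            timeout=30,
        )
        response.raise_for_status()               # fail fast on HTTP errors
        return response.content                   # raw file bytes (e.g., CSV, JSON, TAR)

    # Usage (hypothetical endpoint):
    # raw = retrieve_data("https://dfs.example.internal/api/v1", "AMF", "2024-01-01")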
[0049] In one embodiment, upon retrieving the data from the distributed file system 110, the retrieving module 208 is configured to identify a format of the retrieved data. Herein, the format of the retrieved data is at least one of, but not limited to, a Comma-Separated Values (CSV) file format, a parquet format, a Tape Archive (TAR) file format, JavaScript Object Notation (JSON), eXtensible Markup Language (XML) and ZIP. In one embodiment, the retrieving module 208 identifies the format by checking file extensions. For example, the CSV format is identified by checking for the ".csv" extension. In another example, the JSON format is identified by checking for the ".json" extension.
[0050] In another embodiment, the retrieving module 208 identifies the format of the retrieved data by checking a structure of the retrieved data. For example, let us consider that the retrieved data is delimited by commas or other delimiters like tabs and contains rows and columns; based on this check, the retrieving module 208 identifies that the retrieved data is in the CSV format.
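By way of a non-limiting illustration, the extension-based check of paragraph [0049] and the structural check of paragraph [0050] may be sketched in Python as follows; the probing heuristics (e.g., the ten-row delimiter check) are assumptions, not the claimed implementation.

    import json
    import pathlib

    def identify_format(path: str, sample: str) -> str:
        # First check the file extension, per paragraph [0049].
        by_extension = {".csv": "CSV", ".json": "JSON", ".xml": "XML",
                        ".parquet": "PARQUET", ".tar": "TAR", ".zip": "ZIP"}
        ext = pathlib.Path(path).suffix.lower()
        if ext in by_extension:
            return by_extension[ext]
        # Otherwise probe the structure, per paragraph [0050]: JSON parses cleanly;
        # CSV-like data shows a stable delimiter count across its first rows.
        try:
            json.loads(sample)
            return "JSON"
        except ValueError:
            pass
        rows = [r for r in sample.splitlines() if r.strip()]
        if rows and rows[0].count(",") > 0 and len({r.count(",") for r in rows[:10]}) == 1:
            return "CSV"
        return "UNKNOWN"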
[0051] In one embodiment, upon identifying the format of the retrieved data, the retrieving module 208 is further configured to extract one or more features from the retrieved data. In one embodiment, the one or more features are information related to the retrieved data. In other words, the information pertains to the metadata of the retrieved data. For example, the information related to the retrieved data includes at least one of, but not limited to, a size of the retrieved data such as 50 Megabytes (MB), a source of the retrieved data such as a database, a format of the retrieved data such as JSON, and a time stamp of the retrieved data such as its retrieval date.
[0052] In one embodiment, upon extracting the one or more features from the retrieved data, the retrieving module 208 is further configured to preprocess the data associated with the extracted one or more features. In particular, the retrieving module 208 is configured to preprocess the data to ensure the consistency and quality of the data within the system 108. The retrieving module 208 performs at least one of, but not limited to, data normalization, data definition and data cleaning procedures.
[0053] In one embodiment, for preprocessing, the retrieving module 208 performs at least one of, but not limited to, reorganizing the data, removing redundant data, formatting the data, removing null values from the data, cleaning the data, and handling missing values in the retrieved data. The main goal of the preprocessing is to achieve a standardized data format across the system 108. While preprocessing, duplicate data and inconsistencies are eliminated from the retrieved data. Subsequent to preprocessing, the retrieved data is referred to as pre-processed data. The retrieving module 208 is further configured to store the pre-processed data in at least one of, the distributed file system 110 and the storage unit 206 for subsequent retrieval and analysis.
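By way of a non-limiting illustration, the feature extraction of paragraph [0051] and the preprocessing of paragraphs [0052]-[0053] may be sketched in Python as follows; the concrete cleaning policy (e.g., filling missing values with zero) is an assumption.

    import os
    import datetime
    import pandas as pd

    def extract_features(path: str, source: str, fmt: str) -> dict:
        # Metadata of the retrieved data: size, source, format, and time stamp.
        return {
            "size_bytes": os.path.getsize(path),
            "source": source,
            "format": fmt,
            "retrieved_at": datetime.datetime.now().isoformat(),
        }

    def preprocess(df: pd.DataFrame) -> pd.DataFrame:
        # Eliminate duplicates and inconsistencies, handle missing values, and
        # standardize the data format across the system (108).
        df = df.drop_duplicates()
        df = df.dropna(how="all")        # drop fully-null rows
        df = df.fillna(0)                # assumed missing-value policy
        df.columns = [c.strip().lower() for c in df.columns]
        return df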
[0054] Upon preprocessing the retrieved data, the training unit 212 of the processor 202 is configured to train the AI/ML model 220 using the pre-processed data and the extracted one or more features. In order to train the AI/ML model 220, the training unit 212 configures one or more hyperparameters of the AI/ML model 220. In one embodiment, the one or more hyperparameters of the AI/ML model 220 include at least one of, but not limited to, a learning rate, a batch size, and a number of epochs.
[0055] Upon configuring the one or more hyperparameters of the AI/ML model 220, the training unit 212 is further configured to split the pre-processed data into at least one of, but not limited to, training data and testing data for training. For example, the training unit 212 splits the pre-processed data such that 90% of the pre-processed data is considered as the training data and 10% of the pre-processed data is considered as the testing data. Thereafter, the training unit 212 feeds the training data to the AI/ML model 220 based on which the AI/ML model 220 is trained by the training unit 212.
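By way of a non-limiting illustration, the hyperparameter configuration of paragraph [0054] and the 90/10 split of paragraph [0055] may be sketched as follows; the hyperparameter values are assumed placeholders, not a claimed configuration.

    from sklearn.model_selection import train_test_split

    # Assumed example values for the hyperparameters named in paragraph [0054].
    hyperparameters = {"learning_rate": 0.01, "batch_size": 64, "epochs": 10}

    def split_data(features, labels):
        # 90% training data, 10% testing data, per the example in paragraph [0055].
        return train_test_split(features, labels, test_size=0.10, random_state=42)

    # Usage: X_train, X_test, y_train, y_test = split_data(X, y)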
[0056] In one embodiment, the training unit 212 trains the AI/ML model 220 by applying one or more logics. In one embodiment, the one or more logics may include at least one of, but not limited to, a k-means clustering, a hierarchical clustering, a Principal Component Analysis (PCA), an Independent Component Analysis (ICA), deep learning logics such as Artificial Neural Networks (ANNs), Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Long Short-Term Memory networks (LSTMs), Generative Adversarial Networks (GANs), Q-Learning, Deep Q-Networks (DQN), reinforcement learning logics, etc.
[0057] In one embodiment, the AI/ML model 220 is trained using historical data pertaining to historical data types, data patterns, and data anagrams of the retrieved data. The historical data refers to past data that has been collected over a period of time. Herein, the data patterns refer to recurring trends or relationships in the retrieved data. Herein, a data anagram in machine learning is an arrangement of data which contains at least one of, but not limited to, the same words and characters within the data, but in which the order of at least one of the words and the characters is different. The AI/ML model 220 is trained to recognize and understand different packet arrangements; for example, in the network 106, packets may arrive in a different order from their original transmission sequence due to routing or congestion, which is seen as a form of "anagram", where the same data is transmitted but the order of the packets is different.
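By way of a non-limiting illustration, the "data anagram" notion described above, i.e., the same elements arriving in a different order, may be checked with a simple multiset comparison; this reading of the paragraph is an interpretation offered for clarity.

    from collections import Counter

    def is_anagram(seq_a, seq_b) -> bool:
        # Same elements (e.g., packets), order aside, but not the identical order.
        return Counter(seq_a) == Counter(seq_b) and list(seq_a) != list(seq_b)

    # Packets transmitted as [p1, p2, p3] but received as [p2, p1, p3]:
    assert is_anagram(["p1", "p2", "p3"], ["p2", "p1", "p3"])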
[0058] In one embodiment, while training the AI/ML model 220 with the pre-processed data, the AI/ML model 220 learns at least one of, but not limited to, data types and data patterns associated with the retrieved data. Herein, the data types are at least one of, but not limited to, numerical data, categorical data, and textual data. Herein, the data patterns refer to at least one of, but not limited to, regularities, repeated sequences, or recurring characteristics related to the retrieved data. In another embodiment, the AI/ML model 220 learns at least one of, but not limited to, trends/patterns based on historical data related to the network performance data of the one or more NFs.
[0059] In one embodiment, the trained AI/ML model 220 is fed with the testing data in order to evaluate the performance of the trained AI/ML model 220. Upon training the AI/ML model 220, the analysing module 210 of the processor 202 is configured to analyse, utilizing the trained AI/ML model 220, at least one of, but not limited to, the data types, the data patterns and the data anagrams of the retrieved data to identify at least one pattern related to the one or more formats in the retrieved data. Herein, the pattern refers to a repetition of the one or more output formats provided by the system 108 over a period of time. In one embodiment, the manner in which the data is organized or structured in a specific format constitutes at least one pattern, such as the use of delimiters in CSV files, the structure of JSON objects, or XML tags. In one embodiment, the analysing module 210 identifies the at least one pattern by applying the one or more logics. Herein, identifying patterns related to the one or more formats pertains to identifying structures or behaviors that emerge due to the way the data is formatted. Herein, the patterns are identified based on the past one or more output formats. So, based on the past one or more output formats, the analysing module 210 identifies that the data is stored in at least one of, the CSV format and the JSON format in the distributed file system 110.
[0060] In one embodiment, the analysing module 210 identifies at least one pattern related to the one or more formats which are mostly utilized by the system 108 for analysis. In particular, the analysing module 210 determines the one or more formats based on the identified at least one pattern that can influence the choice of the one or more output formats. In particular, the analysing module 210 identifies the at least one pattern related to the one or more formats based on the requirement of the user. For example, the user predefines the requirement for the one or more formats. The requirement of the user pertains to the one or more output formats desired by the user. In one embodiment, the user has predefined criteria such as a list of the one or more output formats that meets the user's specific needs. In other words, the user decides the format of the output. Herein, the user is at least one of, but not limited to, a network operator. Based on the user requirement, the analysing module 210 identifies at least one format that may be the output format. In one embodiment, the one or more formats are different from the initially known one or more formats. In an alternate embodiment, the one or more formats are the same, depending on the pattern identified from the analysis.
[0061] In one embodiment, the analysing module 210 identifies at least one pattern related to the one or more formats based on historical data pertaining to execution of most frequent buckets in the past. Herein, the most frequent buckets refer to the data buckets or categories that appear most frequently in the historical data, potentially indicating common or significant patterns of the one or more formats that are utilized to store data in the distributed file system 110.
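By way of a non-limiting illustration, identifying the most frequent formats (or buckets) from the historical record, per paragraphs [0059]-[0061], may be sketched as follows; the shape of the history input is an assumption.

    from collections import Counter

    def most_frequent_formats(history, top_n=2):
        # history: past output formats or bucket labels, e.g., ["CSV", "JSON", "CSV"].
        return [fmt for fmt, _ in Counter(history).most_common(top_n)]

    # most_frequent_formats(["CSV", "JSON", "CSV", "PARQUET", "CSV"]) -> ["CSV", "JSON"]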
[0062] Upon identifying the at least one pattern related to the one or more formats, the compressor module 214 of the processor 202 is configured to select one or more output formats based on the identified at least one pattern to compress the retrieved data based on the analysis. Herein, based on the identified at least one pattern, the compressor module 214 selects the one or more output formats. For example, let us consider that the identified at least one pattern suggests at least one of, but not limited to, trends over time and user preferences, such as the preferred one or more output formats; the compressor module 214 then selects the one or more output formats accordingly.
[0063] In one embodiment, upon selecting the one or more output formats, the compressor module 214 compresses the retrieved data based on the analysis. In order to compress the retrieved data, the compressor module 214 utilizes a compression methodology associated with the one or more output formats. In one embodiment, the data compression methodology is crucial for optimizing storage, enhancing transmission speeds, reducing network bandwidth usage, and speeding up data processing. There are various types of compression methods, each suitable for different types of data (e.g., textual, binary, multimedia).
[0064] Herein, the compression methodologies are techniques used to reduce the size of the retrieved data, which makes the retrieved data more efficient to store in the distributed file system 110. Furthermore, due to the compression of the retrieved data, the compressed data is efficient for transmission and processing. In one embodiment, the compression methodology includes at least one of, but not limited to, a lossless compression, a lossy compression, a transform based compression, and a protocol specific compression.
[0065] In one embodiment, the lossless compression reduces the size of the retrieved data without any loss of information. In one embodiment, the lossy compression sacrifices some accuracy of the retrieved data to achieve higher compression ratios. In one embodiment, the transform based compression converts the retrieved data into a format that allows more efficient representation, often by applying mathematical transforms to the retrieved data. In one embodiment, protocols like Hypertext Transfer Protocol (HTTP), Secure Shell (SSH), or Virtual Private Networks (VPNs) often employ their own compression mechanisms, i.e., protocol specific compression, to optimize the transfer of data.
[0066] In one embodiment, for example, let us consider that the analysed data type is the numerical data; the compressor module 214 then selects the at least one compression methodology such that the size of the numerical data is reduced to the maximum level. In another embodiment, the compressor module 214 selects the compression methodology based on comparing the compression levels of the retrieved data. For example, let us consider a compression methodology A and a compression methodology B. Herein, the compression methodology A reduces the retrieved data by 80% of its actual size and the compression methodology B reduces the retrieved data by 90% of its actual size. Therefore, the compression methodology B is selected by the compressor module 214.
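By way of a non-limiting illustration, the comparison between methodology A and methodology B described above may be sketched with two stock lossless codecs; the candidate codecs are assumptions, since the specification does not name specific algorithms.

    import gzip
    import bz2

    def pick_codec(data: bytes):
        # Compress with each candidate and select the greatest size reduction.
        candidates = {"gzip": gzip.compress, "bz2": bz2.compress}
        results = {name: fn(data) for name, fn in candidates.items()}
        best = min(results, key=lambda name: len(results[name]))
        reduction = 1 - len(results[best]) / len(data)
        return best, reduction

    # pick_codec(b"abc" * 10_000) returns the better codec and its size reduction.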
[0067] Subsequent to selecting the one or more output formats, the compressor module 214 is configured to compress the retrieved data utilizing the selected compression methodology. Upon compressing the retrieved data, the aggregation module 216 of the processor 202 is configured to bucket, utilizing the AI/ML model 220, the compressed data to group the compressed data in a plurality of buckets. For example, the aggregation module 216 performs bucketing of the retrieved data which is compressed. Herein, the bucketing is performed by the aggregation module 216 based on the requirements of the user. In particular, the requirements of the user are predefined by the user. For example, let us consider that the retrieved data corresponds to the network performance data. Furthermore, let us consider that the network performance data pertains to the traffic associated with the NFs. Herein, based on the requirement of the user, the traffic is grouped into the plurality of buckets based on the amount of data being transmitted over the network 106 at any given time. The aggregation module 216 buckets traffic into categories like "low traffic," "medium traffic," and "high traffic", such as Bucket 1: Low Traffic (0-50 Mbps), Bucket 2: Medium Traffic (51-200 Mbps), and Bucket 3: High Traffic (201+ Mbps).
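By way of a non-limiting illustration, the traffic bucketing above may be sketched as follows, using the example thresholds from the description; the thresholds follow the user-predefined requirement in the text, and the function form is an assumption.

    def bucket_traffic(throughput_mbps: float) -> str:
        # Thresholds per the example in paragraph [0067].
        if throughput_mbps <= 50:
            return "Bucket 1: Low Traffic (0-50 Mbps)"
        if throughput_mbps <= 200:
            return "Bucket 2: Medium Traffic (51-200 Mbps)"
        return "Bucket 3: High Traffic (201+ Mbps)"

    # bucket_traffic(120) -> "Bucket 2: Medium Traffic (51-200 Mbps)"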
[0068] In another example, aggregation module 216 buckets the compressed data in the plurality of buckets such as at least one of, but not limited to, hourly retrieved data, daily retrieved data or monthly retrieved data. In another example, traffic data such as packets, throughput, latency, and errors are often grouped into the plurality of buckets for easier analysis. The plurality of buckets typically represents data grouped based on certain characteristics, such as time intervals, ranges of values, categories, or event types.
[0069] Upon bucketing the compressed data in the plurality of buckets, the aggregation module 216 is further configured to aggregate the data within the plurality of buckets. Herein, aggregating the data within the plurality of buckets refers to the process of performing operations using aggregation methods which summarize the data within the plurality of buckets into a single value or a smaller set of values. Herein, the aggregation methods include at least one of, but not limited to, sum, count, average, maximum, and minimum. In one embodiment, the aggregation facilitates reducing complexity by providing an overview of the data, making the data within the plurality of buckets easier to analyze and interpret. For example, let us consider the network performance data which is bucketed based on bandwidth usage such as Bucket 1: 0-50 Mbps, Bucket 2: 51-100 Mbps, Bucket 3: 101-200 Mbps, Bucket 4: 201-500 Mbps and Bucket 5: 500+ Mbps. Each bucket represents a range or category of values.
[0070] Further, the aggregation module 216 selects the aggregation method such as at least one of, but not limited to, the sum, the average, the maximum, and the minimum. Then the data within each of the plurality of buckets is summarized by the aggregation module 216, such as the average of the data within the Bucket 1 being 2000 MB. In another example, the average across all the buckets is computed by the aggregation module 216. Let us consider that the sum of the total bandwidth used in each bucket is Bucket 1 (0-50 Mbps): 2000 MB, Bucket 2 (51-100 Mbps): 3000 MB, Bucket 3 (101-200 Mbps): 2500 MB, Bucket 4 (201-500 Mbps): 1500 MB, Bucket 5 (500+ Mbps): 800 MB; then the total bandwidth usage (sum of all buckets) is 9800 MB. So, the average bandwidth usage per bucket is 9800/5, which is equal to 1960 MB per bucket. So instead of storing the records for the plurality of buckets, the average bandwidth usage is stored in the distributed file system 110.
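The arithmetic of the example above can be reproduced with a short sketch; only the consolidated summary, rather than the per-record data, would then be stored.

    # Sum of total bandwidth used in each bucket, per the example in paragraph [0070].
    bucket_totals_mb = {
        "Bucket 1 (0-50 Mbps)": 2000,
        "Bucket 2 (51-100 Mbps)": 3000,
        "Bucket 3 (101-200 Mbps)": 2500,
        "Bucket 4 (201-500 Mbps)": 1500,
        "Bucket 5 (500+ Mbps)": 800,
    }
    total_mb = sum(bucket_totals_mb.values())        # 9800 MB
    average_mb = total_mb / len(bucket_totals_mb)    # 1960 MB per bucket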
[0071] Upon aggregating the data within the plurality of buckets, the aggregation module 216 is further configured to store the aggregated data in the distributed file system 110. Herein, the aggregated data requires much less space within the distributed file system 110 as compared to the retrieved data. Advantageously, the size of the data to be stored in the distributed file system 110 is reduced. Due to the reduced size of the data, the time required to at least one of, but not limited to, read, process, execute requests and generate the output using the compressed and aggregated data is decreased.
[0072] The retrieving module 208, the analysing module 210, the training unit 212, the compressor module 214, and the aggregation module 216, in an exemplary embodiment, are implemented as a combination of hardware and programming (for example, programmable instructions) to implement one or more functionalities of the processor 202. In the examples described herein, such combinations of hardware and programming may be implemented in several different ways. For example, the programming for the processor 202 may be processor-executable instructions stored on a non-transitory machine-readable storage medium and the hardware for the processor may comprise a processing resource (for example, one or more processors), to execute such instructions. In the present examples, the memory 204 may store instructions that, when executed by the processing resource, implement the processor 202. In such examples, the system 108 may comprise the memory 204 storing the instructions and the processing resource to execute the instructions, or the memory 204 may be separate but accessible to the system 108 and the processing resource. In other examples, the processor 202 may be implemented by electronic circuitry.
[0073] FIG. 3 illustrates an exemplary architecture for the system 108, according to one or more embodiments of the present invention. More specifically, FIG. 3 illustrates the system 108 for data compression and aggregation. It is to be noted that the embodiment with respect to FIG. 3 will be explained with respect to the UE 102 for the purpose of description and illustration and should nowhere be construed as limiting the scope of the present disclosure.
[0074] FIG. 3 shows communication between the UE 102, the system 108, and the distributed file system 110. For the purpose of description of the exemplary embodiment as illustrated in FIG. 3, the UE 102 uses a network protocol connection to communicate with the system 108 and the distributed file system 110. In an embodiment, the network protocol connection is the establishment and management of communication between the UE 102, the system 108, and the distributed file system 110 over the network 106 (as shown in FIG. 1) using a specific protocol or set of protocols. The network protocol connection includes, but is not limited to, Session Initiation Protocol (SIP), System Information Block (SIB) protocol, Transmission Control Protocol (TCP), User Datagram Protocol (UDP), File Transfer Protocol (FTP), Hypertext Transfer Protocol (HTTP), Simple Network Management Protocol (SNMP), Internet Control Message Protocol (ICMP), Hypertext Transfer Protocol Secure (HTTPS) and Terminal Network (TELNET).
[0075] In an embodiment, the UE 102 includes a primary processor 302, a memory 304, and a User Interface (UI) 306. In alternate embodiments, the UE 102 may include more than one primary processor 302 as per the requirement of the network 106. The primary processor 302 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, single board computers, and/or any devices that manipulate signals based on operational instructions.
[0076] In an embodiment, the primary processor 302 is configured to fetch and execute computer-readable instructions stored in the memory 304. The memory 304 may be configured to store one or more computer-readable instructions or routines in a non-transitory computer-readable storage medium, which may be fetched and executed for data compression and aggregation. The memory 304 may include any non-transitory storage device including, for example, volatile memory such as RAM, or non-volatile memory such as disk memory, EPROMs, FLASH memory, unalterable memory, and the like.
[0077] In an embodiment, the User Interface (UI) 306 includes a variety of interfaces, for example, a graphical user interface, a web user interface, a Command Line Interface (CLI), and the like. The UI 306 of the UE 102 allows users to transmit requests to the one or more processors 202 for the data compression and aggregation. In one embodiment, the user is at least one of, but not limited to, a network operator. Further, the UE 102 receives information regarding the data compressed and aggregated by the one or more processors 202.
[0078] As mentioned earlier in FIG. 2, the system 108 includes the processor 202, the memory 204 and the storage unit 206, for data compression and aggregation, which are already explained in FIG. 2. For the sake of brevity, a similar description related to the working and operation of the system 108 as illustrated in FIG. 2 has been omitted to avoid repetition.
[0079] Further, as mentioned earlier the processor 202 includes the retrieving module 208, the analysing module 210, the training unit 212, the compressor module 214, and the aggregation module 216 which are already explained in FIG. 2. Hence, for the sake of brevity, a similar description related to the working and operation of the system 108 as illustrated in FIG. 2 has been omitted to avoid repetition. The limited description provided for the system 108 in FIG. 3, should be read with the description provided for the system 108 in the FIG. 2 above, and should not be construed as limiting the scope of the present disclosure.
[0080] FIG. 4 is an exemplary architecture 400 of the system 108 for the data compression and aggregation, according to one or more embodiments of the present disclosure.
[0081] The architecture 400 includes the distributed file system 110, which is in communication with a distributed data aggregator 402. The distributed file system 110 includes the data corresponding to the network performance data associated with the one or more NFs. Herein, the distributed data aggregator 402 includes an AI/ML model 402a, a compressor 402b, and an aggregator 402c communicably coupled to each other via the network 106.
[0082] In one embodiment, the data stored in the distributed file system 110 is split and stored across multiple servers or nodes in the network 106. Herein, the nodes may be located physically apart (e.g., in different geographical locations). The distributed file system 110 allows scalability such that more nodes can be added to the network 106 to increase storage.
[0083] In one embodiment, the distributed data aggregator 402 periodically retrieves the data pertaining to the one or more NFs via the one or more APIs. The data acts as the input stream provided by the one or more APIs which is crucial for training the AI/ML model 402a. In one embodiment, the trained model 402a selects the one or more output formats and the compression methodology to compress the retrieved data. Further, the trained model 402a selects the one or more aggregation buckets.
[0084] In one embodiment, the compressor 402b compresses the retrieved data using the selected compression methodology. Herein, the compressor 402b performs the data compression process which reduces the size of the retrieved data by encoding the retrieved data. In one embodiment, the compression includes at least one of, but not limited to, the lossless compression and the lossy compression.
[0085] In one embodiment, the aggregator 402c performs a data aggregation process in which the aggregator 402c collects and summarizes data retrieved from the distributed file system 110. In another embodiment, the aggregator 402c performs bucketing of the retrieved data which is compressed by the compressor 402b. Herein, bucketing is performed based on the requirements of the user. In particular, the requirements of the user are predefined by the user. For example, let us consider that the retrieved data is related to the network performance data such as network latency of the NFs. Herein, based on the requirement of the user, the aggregator 402c buckets latency values into the plurality of buckets. Herein, the plurality of buckets refers to categories based on the level of performance such as Bucket 1: Low Latency (<50 milliseconds (ms)), Bucket 2: Moderate Latency (50-150 ms), and Bucket 3: High Latency (>150 ms). Furthermore, the aggregator 402c performs aggregation on the data within the plurality of buckets. So, aggregation is indeed an operation performed on bucketed data, where each bucket represents a subset of the data, and aggregation is applied to each subset.
[0086] Herein, the aggregator 402c summarizes the bucketed data into a single, consolidated view. In particular, the goal of the aggregator 402c is to simplify the analysis of large datasets by reducing the volume of data and providing a higher-level summary that highlights key patterns, trends, or insights related to the one or more NFs.
[0087] In one embodiment, the distributed file system 110 acts as the storage unit which includes a structured collection of at least one of, but not limited to, the compressed and the aggregated data related to the one or more NFs, which are managed and organized in a way that allows the system 108 easy access, retrieval, and manipulation. The distributed file system 110 is used to store, manage, and retrieve large amounts of information efficiently.
[0088] FIG. 5 is a signal flow diagram illustrating the flow for data compression and aggregation, according to one or more embodiments of the present disclosure.
[0089] At step 502, the system 108 receives the data from the distributed file system 110. For example, the data is associated with at least one of, but not limited to, the performance data corresponding to the network performance data of the one or more NFs. In one embodiment, the system 108 transmits at least one of, but not limited to, a Hyper Text Transfer Protocol (HTTP) request to the distributed file system 110 to retrieve data associated with the one or more NFs. In one embodiment, a connection is established between the system 108 and the distributed file system 110 before retrieving the data from the distributed file system 110.
[0090] At step 504, the system 108 trains the AI/ML model 220 with the retrieved data. More particularly, the system 108 trains the AI/ML model 220 with the retrieved data subsequent to retrieving the data. Herein, the retrieved data is pre-processed and the pre-processed data is stored in the storage unit 206 for training the AI/ML model 220.
[0091] At step 506, the system 108 performs the data compression and aggregation utilizing the trained AI/ML model 220. Herein, the system 108 identifies and learns the patterns between the multiple types of data. Based on the identified patterns, the system 108 selects the one or more output formats and the compression methodology to compress the retrieved data. Further, the system 108 compresses the data and thereafter the system 108 buckets the compressed data in the plurality of buckets. Thereafter, the system 108 aggregates the data within the plurality of buckets.
[0092] At step 508, the system 108 stores the compressed and aggregated data in the distributed file system 110. Further, the system 108 transmits a notification pertaining to the compressed and aggregated data generated by the system 108 to the user. Herein, the system 108 transmits the notification, which includes at least one of, but not limited to, the report related to the compressed and aggregated data, via at least one of, but not limited to, the HTTP request. Further, the user can view the notification generated by the system 108 in at least one of, but not limited to, a graphical format and a tabular format on the UI 306 of the UE 102.
[0093] FIG. 6 is a flow diagram of a method 600 for the data compression and aggregation, according to one or more embodiments of the present invention. For the purpose of description, the method 600 is described with the embodiments as illustrated in FIG. 2 and should nowhere be construed as limiting the scope of the present disclosure.
[0094] At step 602, the method 600 includes the step of retrieving the data from the distributed file system 110. In one embodiment, the retrieving module 208 retrieves the data from the distributed file system 110. In particular, the retrieving module 208 utilizes at least one of, but not limited to, the one or more APIs for retrieving the data from the distributed file system 110. Further, the retrieving module 208 identifies the format of the retrieved data. For example, the format includes at least one of, but not limited to, the CSV format, the TAR format, and the parquet format. Furthermore, the retrieving module 208 extracts one or more features from the retrieved data and preprocesses the data associated with the extracted one or more features.
[0095] At step 604, the method 600 includes the step of analysing, utilizing the trained AI/ML model 220, at least one of, the data types, the data patterns and the data anagrams of the retrieved data to identify at least one pattern related to the one or more formats in the retrieved data. For example, the analysing module 210 utilizes the trained model 220 to identify the pattern related to at least one format which was used as the output format the maximum number of times in the past. In another example, the analysing module 210 analyses historical data to identify at least one pattern related to the one or more formats in the retrieved data.
[0096] At step 606, the method 600 includes the step of selecting, utilizing the trained AI/ML model 220, the one or more output formats based on the identified at least one pattern to compress the retrieved data based on the analysis. For example, based on the analysis, the compressor module 214 selects the one or more output formats which have been utilized by the system 108 the maximum number of times in the past. Further, the compressor module 214 selects the compression methodology to compress the retrieved data. For example, based on the data type, such as the textual data, and the data pattern related to the one or more output formats, the compressor module 214 selects a compression methodology such as the lossless compression.
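A hedged rule-of-thumb version of this selection is sketched below; the 0.5 threshold, the chosen formats and the lossless codecs are assumptions made for illustration, not the claimed selection logic.

    def select_output(summary: dict) -> tuple:
        """Pick an output format and a lossless codec from the type summary."""
        if summary["numeric_fraction"] >= 0.5:
            return ("parquet", "snappy")  # columnar format, lossless codec
        return ("csv", "gzip")            # textual data, lossless compression

    fmt, codec = select_output({"numeric_fraction": 0.8, "textual_fraction": 0.2})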
[0097] At step 608, the method 600 includes the step of bucketing, utilizing the trained AI/ML model 220, the compressed data to group the compressed data in a plurality of buckets. In one embodiment, the aggregation module 216 buckets the compressed data to group the compressed data. For example, let us consider a scenario in which the distributed file system 110 includes the data associated with the one or more NFs for the last 6 months. Herein, the aggregation module 216 compresses the data associated with the one or more NFs for the last 6 months and then groups the data associated with the one or more NFs in the plurality of buckets on a daily basis. In another embodiment, the aggregation module 216 buckets the compressed data in the plurality of buckets based on the user requirements. For example, for time-series data, the aggregation module 216 buckets the data by time intervals such as, at least one of, but not limited to, daily and monthly.
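A minimal pandas sketch of such daily bucketing of time-series records follows; the column names and sample values are hypothetical.

    import pandas as pd

    df = pd.DataFrame({
        "timestamp": pd.date_range("2024-01-01", periods=6, freq="12h"),
        "nf": ["AMF", "SMF", "AMF", "SMF", "AMF", "SMF"],
        "throughput_mbps": [910, 450, 870, 470, 930, 440],
    })

    # One bucket per calendar day, mirroring the daily-bucketing example above.
    buckets = {day: grp for day, grp in df.groupby(df["timestamp"].dt.date)}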
[0098] At step 610, the method 600 includes the step of aggregating, utilizing the trained AI/ML model 220, the data within the plurality of buckets. In one embodiment, the aggregation module 216 is further configured to aggregate the data within the plurality of buckets. For example, let us consider the plurality of buckets such as Bucket 1: traffic data for Router 1 (throughput, packet loss, latency) over the last 10 minutes, Bucket 2: traffic data for Router 2 over the last 10 minutes, and Bucket 3: traffic data for Router 3 over the last 10 minutes. The aggregation module 216 aggregates the traffic data for each router such that, instead of saving the individual records of the traffic data for the 10 minutes, the aggregation module 216 stores a summary of the traffic data. In another example, the aggregation module 216 aggregates the traffic data across all routers to obtain an overall picture of network performance. Advantageously, due to the data compression and aggregation, the size of the data is reduced, due to which less space is required to store the data in the distributed file system 110.
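The per-router aggregation described above can be illustrated with the pandas sketch below, which replaces raw 10-minute records with one summary row per router; the particular statistics chosen are illustrative assumptions.

    import pandas as pd

    records = pd.DataFrame({
        "router": ["R1"] * 3 + ["R2"] * 3,
        "throughput_mbps": [120, 115, 130, 95, 100, 90],
        "packet_loss_pct": [0.1, 0.2, 0.1, 0.4, 0.3, 0.5],
        "latency_ms": [12, 14, 11, 20, 22, 19],
    })

    # Collapse the raw records into a compact per-router summary.
    summary = records.groupby("router").agg(
        throughput_mean=("throughput_mbps", "mean"),
        packet_loss_max=("packet_loss_pct", "max"),
        latency_mean=("latency_ms", "mean"),
    )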
[0099] At step 612, the method 600 includes the step of storing the aggregated data in the distributed file system 110. In one embodiment, the aggregation module 216 is further configured to store the aggregated data in the distributed file system 110. Herein, the aggregated data requires much less data space within the distributed file system 110. Advantageously, the size of the data to be stored in the distributed file system 110 is reduced.
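A short sketch of the write-back, assuming a pyarrow-backed pandas installation and a hypothetical mount path for the distributed file system:

    import pandas as pd

    summary = pd.DataFrame(
        {"throughput_mean": [121.7, 95.0], "latency_mean": [12.3, 20.3]},
        index=pd.Index(["R1", "R2"], name="router"),
    )

    # Write the compact summary back as losslessly compressed parquet.
    summary.to_parquet("/dfs/nf_performance/summary/2024-01-19.parquet",
                       compression="snappy")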
[00100] In yet another aspect of the present invention, a non-transitory computer-readable medium having stored thereon computer-readable instructions that, when executed by a processor 202, cause the processor 202 to perform the following operations. The processor 202 is configured to retrieve data from a distributed file system 110. The processor 202 is further configured to analyse, utilizing an Artificial Intelligence/Machine Learning (AI/ML) model 220, at least one of, data types, data patterns and data anagrams of the retrieved data to identify at least one pattern related to one or more formats in the retrieved data. The processor 202 is further configured to select, utilizing the AI/ML model 220, one or more output formats based on the identified at least one pattern to compress the retrieved data based on the analysis. The processor 202 is further configured to bucket, utilizing the AI/ML model 220, compressed data to group the compressed data in a plurality of buckets. The processor 202 is further configured to aggregate, utilizing the AI/ML model 220, the data within the plurality of buckets. The processor 202 is further configured to store the aggregated data in the distributed file system 110.
[00101] A person of ordinary skill in the art will readily ascertain that the illustrated embodiments and steps in description and drawings (FIG.1-6) are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope of the disclosed embodiments.
[00102] The present disclosure provides technical advancements of data compression and aggregation executed by the system in order to save the storage space within the distributed file system. Herein, before storing the one or more NFs' data in the distributed file system, the system first compresses and aggregates the data, due to which the size of the data to be stored in the distributed file system is reduced and much less data space is required to store the data in the distributed file system. For example, previously, storing the data of the last 6 months required space in a range of 150TB to 7.5PB. Now, due to the data compression and aggregation, the space required to store the data in the distributed file system is reduced to 15TB to 750TB (a reduction of 90 percent). Due to the reduced size of the data, the time required to at least one of, but not limited to, read, process, execute requests on, and generate output from the compressed and aggregated data is decreased.
[00103] The present invention offers multiple advantages over the prior art, and the above-listed are a few examples to emphasize some of the advantageous features. The listed advantages are to be read in a non-limiting manner.
REFERENCE NUMERALS
[00104] Environment – 100;
[00105] User Equipment (UE) – 102;
[00106] Server – 104;
[00107] Network – 106;
[00108] System – 108;
[00109] Distributed file system – 110;
[00110] Processor – 202;
[00111] Memory – 204;
[00112] Storage unit – 206;
[00113] Retrieving module – 208;
[00114] Analysing module – 210;
[00115] Training unit – 212;
[00116] Compressor module – 214;
[00117] Aggregation module – 216;
[00118] AI/ML Model – 220;
[00119] Primary Processor – 302;
[00120] Memory – 304;
[00121] User Interface (UI) – 306;
[00122] Distributed data aggregator – 402;
[00123] AI/ML model – 402a;
[00124] Compressor – 402b;
[00125] Aggregator – 402c.
CLAIMS
We Claim:
1. A method (600) of data compression and aggregation, the method (600) comprising the steps of:
retrieving, by one or more processors (202), data from a distributed file system (110);
analysing, by the one or more processors (202), utilizing an Artificial Intelligence/Machine Learning (AI/ML) model (220), at least one of, data types, data patterns, and data anagrams of the retrieved data to identify at least one pattern related to one or more formats in the retrieved data;
selecting, by the one or more processors (202), utilizing the AI/ML model (220), one or more output formats based on the identified at least one pattern to compress the retrieved data based on the analysis;
bucketing, by the one or more processors (202), utilizing the AI/ML model (220), compressed data to group the compressed data in a plurality of buckets;
aggregating, by the one or more processors (202), utilizing the AI/ML model (220), the data within the plurality of buckets; and
storing, by the one or more processors (202), the aggregated data in the distributed file system (110).
2. The method (600) as claimed in claim 1, wherein the data corresponds to network performance data of one or more Network Functions (NFs).
3. The method (600) as claimed in claim 1, wherein upon retrieving the data, the method (600) comprises the steps of:
identifying, by the one or more processors (202), a format of the retrieved data, wherein the format of the retrieved data is one of a Comma-Separated Values (CSV) file format, a parquet format, and a Tape Archive (TAR) file format; and
extracting, by the one or more processors (202), one or more features from the retrieved data.
4. The method (600) as claimed in claim 1, wherein the output format is one of a Comma-Separated Values (CSV) file format, a parquet format, and a Tape Archive (TAR) file format.
5. The method (600) as claimed in claim 1, wherein the data types are one of a numerical data, categorical data, and textual data.
6. The method (600) as claimed in claim 1, wherein the bucketing and aggregating are performed by the one or more processors (202) based on the requirement of the user.
7. A system (108) for data compression and aggregation, the system (108) comprising:
a retrieving module (208) configured to retrieve data from a distributed file system (110);
an analysing module (210) configured to analyse, utilizing an Artificial Intelligence/Machine Learning (AI/ML) model (220), at least one of, data types, data patterns and data anagrams of the retrieved data to identify at least one pattern related to one or more formats in the retrieved data;
a compressor module (214) configured to select, utilizing the AI/ML model (220), one or more output formats based on the identified at least one pattern to compress the retrieved data based on the analysis;
an aggregation module (216) configured to bucket, utilizing the AI/ML model (220), compressed data to group the compressed data in a plurality of buckets;
the aggregation module (216) configured to aggregate, utilizing the AI/ML model (220), the data within the plurality of buckets; and
the aggregation module (216) configured to store the aggregated data in the distributed file system (110).
8. The system (108) as claimed in claim 7, wherein the data corresponds to network performance data of one or more Network Functions (NFs).
9. The system (108) as claimed in claim 7, wherein upon retrieving the data, the analysing module (210) is configured to:
identify, a format of the retrieved data, wherein the format of the retrieved data is one of a Comma-Separated Values (CSV) file format, a parquet format, and a Tape Archive (TAR) file format; and
extract, one or more features from the retrieved data.
10. The system (108) as claimed in claim 7, wherein the output format is one of a Comma-Separated Values (CSV) file format, a parquet format, and a Tape Archive (TAR) file format.
11. The system (108) as claimed in claim 7, wherein the data types are one of a numerical data, categorical data, and textual data.
12. The system (108) as claimed in claim 7, wherein the bucketing and aggregating are performed based on the requirement of the user.