System And Method For Standardizing Network Data



Abstract: A system (120) and a method (500) for standardizing network data are disclosed. The system (120) includes an integrated unit (220) configured to retrieve data from one or more data sources. The system (120) includes a selection and input unit (225) configured to receive one or more threshold values via one of a User Interface (UI) (215) and a User Equipment (UE) (110). The system (120) includes a processing unit (230) configured to compare each of the retrieved data assigned to a respective column to the one or more threshold values. The processing unit (230) is further configured to assign a standard value to the respective column based on a predefined condition, thereby standardizing the network data. Ref. Fig. 2


Patent Information

Application #

Filing Date

06 October 2023

Publication Number

15/2025

Publication Type

INA

Invention Field

COMPUTER SCIENCE

Status


Parent Application

Applicants

JIO PLATFORMS LIMITED

OFFICE-101, SAFFRON, NR. CENTRE POINT, PANCHWATI 5 RASTA, AMBAWADI, AHMEDABAD 380006, GUJARAT, INDIA

Inventors

1. Aayush Bhatnagar

Reliance Corporate Park, Thane - Belapur Road, Ghansoli, Navi Mumbai, Maharashtra 400701, India

2. Ankit Murarka

Reliance Corporate Park, Thane - Belapur Road, Ghansoli, Navi Mumbai, Maharashtra 400701, India

3. Jugal Kishore

Reliance Corporate Park, Thane - Belapur Road, Ghansoli, Navi Mumbai, Maharashtra 400701, India

4. Chandra Ganveer

Reliance Corporate Park, Thane - Belapur Road, Ghansoli, Navi Mumbai, Maharashtra 400701, India

5. Sanjana Chaudhary

Reliance Corporate Park, Thane - Belapur Road, Ghansoli, Navi Mumbai, Maharashtra 400701, India

6. Gourav Gurbani

Reliance Corporate Park, Thane - Belapur Road, Ghansoli, Navi Mumbai, Maharashtra 400701, India

7. Yogesh Kumar

Reliance Corporate Park, Thane - Belapur Road, Ghansoli, Navi Mumbai, Maharashtra 400701, India

8. Avinash Kushwaha

Reliance Corporate Park, Thane - Belapur Road, Ghansoli, Navi Mumbai, Maharashtra 400701, India

9. Dharmendra Kumar Vishwakarma

Reliance Corporate Park, Thane - Belapur Road, Ghansoli, Navi Mumbai, Maharashtra 400701, India

10. Sajal Soni

Reliance Corporate Park, Thane - Belapur Road, Ghansoli, Navi Mumbai, Maharashtra 400701, India

11. Niharika Patnam

Reliance Corporate Park, Thane - Belapur Road, Ghansoli, Navi Mumbai, Maharashtra 400701, India

12. Shubham Ingle

Reliance Corporate Park, Thane - Belapur Road, Ghansoli, Navi Mumbai, Maharashtra 400701, India

13. Harsh Poddar

Reliance Corporate Park, Thane - Belapur Road, Ghansoli, Navi Mumbai, Maharashtra 400701, India

14. Sanket Kumthekar

Reliance Corporate Park, Thane - Belapur Road, Ghansoli, Navi Mumbai, Maharashtra 400701, India

15. Mohit Bhanwria

Reliance Corporate Park, Thane - Belapur Road, Ghansoli, Navi Mumbai, Maharashtra 400701, India

16. Shashank Bhushan

Reliance Corporate Park, Thane - Belapur Road, Ghansoli, Navi Mumbai, Maharashtra 400701, India

17. Vinay Gayki

Reliance Corporate Park, Thane - Belapur Road, Ghansoli, Navi Mumbai, Maharashtra 400701, India

18. Aniket Khade

Reliance Corporate Park, Thane - Belapur Road, Ghansoli, Navi Mumbai, Maharashtra 400701, India

19. Durgesh Kumar

Reliance Corporate Park, Thane - Belapur Road, Ghansoli, Navi Mumbai, Maharashtra 400701, India

20. Zenith Kumar

Reliance Corporate Park, Thane - Belapur Road, Ghansoli, Navi Mumbai, Maharashtra 400701, India

21. Gaurav Kumar

Reliance Corporate Park, Thane - Belapur Road, Ghansoli, Navi Mumbai, Maharashtra 400701, India

22. Manasvi Rajani

Reliance Corporate Park, Thane - Belapur Road, Ghansoli, Navi Mumbai, Maharashtra 400701, India

23. Kishan Sahu

Reliance Corporate Park, Thane - Belapur Road, Ghansoli, Navi Mumbai, Maharashtra 400701, India

24. Sunil Meena

Reliance Corporate Park, Thane - Belapur Road, Ghansoli, Navi Mumbai, Maharashtra 400701, India

25. Supriya Kaushik De

Reliance Corporate Park, Thane - Belapur Road, Ghansoli, Navi Mumbai, Maharashtra 400701, India

26. Kumar Debashish

Reliance Corporate Park, Thane - Belapur Road, Ghansoli, Navi Mumbai, Maharashtra 400701, India

27. Mehul Tilala

Reliance Corporate Park, Thane - Belapur Road, Ghansoli, Navi Mumbai, Maharashtra 400701, India

28. Satish Narayan

Reliance Corporate Park, Thane - Belapur Road, Ghansoli, Navi Mumbai, Maharashtra 400701, India

29. Rahul Kumar

Reliance Corporate Park, Thane - Belapur Road, Ghansoli, Navi Mumbai, Maharashtra 400701, India

30. Harshita Garg

Reliance Corporate Park, Thane - Belapur Road, Ghansoli, Navi Mumbai, Maharashtra 400701, India

31. Kunal Telgote

Reliance Corporate Park, Thane - Belapur Road, Ghansoli, Navi Mumbai, Maharashtra 400701, India

32. Ralph Lobo

Reliance Corporate Park, Thane - Belapur Road, Ghansoli, Navi Mumbai, Maharashtra 400701, India

33. Girish Dange

Reliance Corporate Park, Thane - Belapur Road, Ghansoli, Navi Mumbai, Maharashtra 400701, India

Specification

DESC:
FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENTS RULES, 2003

COMPLETE SPECIFICATION
(See section 10 and rule 13)
1. TITLE OF THE INVENTION
SYSTEM AND METHOD FOR STANDARDIZING NETWORK DATA
2. APPLICANT(S)
NAME: JIO PLATFORMS LIMITED
NATIONALITY: INDIAN
ADDRESS: OFFICE-101, SAFFRON, NR. CENTRE POINT, PANCHWATI 5 RASTA, AMBAWADI, AHMEDABAD 380006, GUJARAT, INDIA
3. PREAMBLE TO THE DESCRIPTION

THE FOLLOWING SPECIFICATION PARTICULARLY DESCRIBES THE NATURE OF THIS INVENTION AND THE MANNER IN WHICH IT IS TO BE PERFORMED.

FIELD OF THE INVENTION
[0001] The present invention relates to the field of wireless communication networks, and more particularly to a system and a method for standardizing network data.
BACKGROUND OF THE INVENTION
[0002] Telecommunication networks include processing systems for executing a diverse range of algorithms and predictive tasks, including anomaly detection. These processing systems may be powered by Large Language Models (LLMs) and their functions may include conducting thorough analysis of network and operational data using Machine Learning (ML) techniques to extract deep insights into the network data.
[0003] Input network data utilized for training the ML models is expected to be well-defined and cleansed. Generally, datasets are fed to the ML model using a data frame, which includes rows and columns. As such, data cleaning is a crucial step that includes identification and rectification or removal of inaccurate records, inconsistencies, errors, and other noise in the dataset. The quality and reliability of the dataset significantly impacts the performance and dependability of ML models.
[0004] Further, there may be instances when the data stored in the data frame has inconsistent values. For example, each column may have a different value compared to the other columns. Due to the inconsistency and variations in the data values of the columns, it may be a cumbersome and time-consuming task for the user to filter the data, such as by removing the columns that do not meet the pre-defined data values. However, in some situations, even when the data values of the columns do not meet the criteria of the pre-defined values, the user may intend to retain them because the user considers the said columns important enough to be treated as part of the training data. In these instances, standardizing the data values may be quite a challenge, as it is time consuming and can be error prone. Further, an expert developer may be required to write a program to standardize the data values of the data frame, which may again be a complex and cumbersome task.
[0005] There is, therefore, a need for standardizing the data values for training Machine Learning (ML) models.
SUMMARY OF THE INVENTION
[0006] One or more embodiments of the present disclosure provide a method and a system for standardizing network data.
[0007] In one aspect of the present invention, the method for standardizing the network data is disclosed. The method includes the step of retrieving, by one or more processors, data from one or more data sources. The method includes the step of receiving, by the one or more processors, one or more threshold values via one of a User Interface (UI) and a User Equipment (UE). The method includes the step of comparing, by the one or more processors, each of the retrieved data assigned to a respective column to the one or more threshold values. The method includes the step of assigning, by the one or more processors, a standard value to the respective column based on a predefined condition, and thereby standardizing the network data.
[0008] In one embodiment, the one or more data sources include at least, a file input, a source path, an input stream, a Hyper Text Transfer Protocol 2 (HTTP 2), a Distributed File System (DFS), and a Network Attached Storage (NAS).
[0009] In another embodiment, the retrieved data is stored in a data frame.
[0010] In yet another embodiment, the standardized data is stored in a storage unit and used for Machine Learning (ML) training.
[0011] In yet another embodiment, the standard value is defined by the user based on a historic value.
[0012] In yet another embodiment, the predefined condition includes the retrieved data assigned to the respective column being at least one of greater than or less than the one or more threshold values.
[0013] In another aspect of the present invention, the system for standardizing the network data is disclosed. The system includes a retrieving unit configured to retrieve data from one or more data sources. The system includes a receiving unit configured to receive a threshold value via one of a User Interface (UI) and a User Equipment (UE). The system includes a comparing unit configured to compare each of the retrieved data assigned to a respective column to the threshold value. The system includes an assigning unit configured to assign a standard value to the respective column based on a predefined condition, thereby standardizing the network data.
[0014] In another aspect of the embodiment, a User Equipment (UE) is disclosed. One or more primary processors are communicatively coupled to the one or more processors. The one or more primary processors are coupled with a memory unit. The memory unit stores instructions which when executed by the one or more primary processors cause the UE to select a column and set a standard value and a threshold value for the column via the User Interface (UI).
[0015] In another aspect of the embodiment, a non-transitory computer-readable medium having stored thereon computer-readable instructions is disclosed. The instructions, when executed by a processor, configure the processor to retrieve data from one or more data sources. The processor is configured to receive one or more threshold values via one of a User Interface (UI) and a User Equipment (UE). The processor is configured to compare each of the retrieved data assigned to a respective column to the one or more threshold values. The processor is configured to assign a standard value to the respective column based on a predefined condition, thereby standardizing the network data.
[0016] Other features and aspects of this invention will be apparent from the following description and the accompanying drawings. The features and advantages described in this summary and in the following detailed description are not all-inclusive, and particularly, many additional features and advantages will be apparent to one of ordinary skill in the relevant art, in view of the drawings, specification, and claims hereof. Moreover, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes and may not have been selected to delineate or circumscribe the inventive subject matter, resort to the claims being necessary to determine such inventive subject matter.

BRIEF DESCRIPTION OF THE DRAWINGS
[0017] The accompanying drawings, which are incorporated herein, and constitute a part of this disclosure, illustrate exemplary embodiments of the disclosed methods and systems in which like reference numerals refer to the same parts throughout the different drawings. Components in the drawings are not necessarily to scale, emphasis instead being placed upon clearly illustrating the principles of the present disclosure. Some drawings may indicate the components using block diagrams and may not represent the internal circuitry of each component. It will be appreciated by those skilled in the art that disclosure of such drawings includes disclosure of electrical components, electronic components or circuitry commonly used to implement such components.
[0018] FIG. 1 is an exemplary block diagram of an environment for standardizing network data, according to one or more embodiments of the present disclosure;
[0019] FIG. 2 is an exemplary block diagram of a system for standardizing the network data, according to the one or more embodiments of the present disclosure;
[0020] FIG. 3 is a schematic representation of the system in which operations of various entities are explained, according to the one or more embodiments of the present disclosure;
[0021] FIG. 4 is a block diagram of an architecture that can be implemented in the system of FIG. 2, according to the one or more embodiments of the present disclosure;
[0022] FIG. 5 is a signal flow diagram illustrating the standardizing of the network data, according to the one or more embodiments of the present disclosure; and
[0023] FIG. 6 is a flow diagram illustrating a method for standardizing the network data, according to the one or more embodiments of the present disclosure.
[0024] The foregoing shall be more apparent from the following detailed description of the invention.
DETAILED DESCRIPTION OF THE INVENTION
[0025] Some embodiments of the present disclosure, illustrating all its features, will now be discussed in detail. It must also be noted that as used herein and in the appended claims, the singular forms "a", "an" and "the" include plural references unless the context clearly dictates otherwise.
[0026] Various modifications to the embodiment will be readily apparent to those skilled in the art and the generic principles herein may be applied to other embodiments. However, one of ordinary skill in the art will readily recognize that the present disclosure including the definitions listed here below are not intended to be limited to the embodiments illustrated but is to be accorded the widest scope consistent with the principles and features described herein.
[0027] A person of ordinary skill in the art will readily ascertain that the illustrated steps detailed in the figures and here below are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope and spirit of the disclosed embodiments.
[0028] Embodiments of the present invention provide a system and method for standardizing data values for training Machine Learning (ML) models. The present invention performs pre-processing such as modifying the data values of one or more columns whose data values exceed or do not meet the criteria of user defined threshold values. In an example, the system allows the user to assign a standard value for those columns whose data values do not meet the criteria of user defined threshold values. In the event that the user does not provide the data values for those incompatible columns, the system assigns the standard value for each of those incompatible columns, thereby standardizing the data values of all the columns of the data frame, which can then be used as training data for ML models. Further, noise such as duplicates, missing values, or inaccurate values is removed from the modified data frame, thereby facilitating the ML model to provide contextual and accurate predictions based on the cleansed training data fed to it.
[0029] Referring to FIG. 1, FIG. 1 illustrates an exemplary block diagram of an environment 100 for standardizing network data, according to one or more embodiments of the present invention. The environment 100 includes a network 105, a User Equipment (UE) 110, a server 115, and a system 120. The UE 110 enables a user to interact with the system 120 for standardizing the data. In an embodiment, the user is at least one of a network operator and a service provider. Standardizing data refers to the process of transforming data into a consistent format or scale, ensuring that the data meets defined criteria for uniformity. Standardized data is especially significant in data analysis and Machine Learning (ML), where it helps to improve the accuracy and effectiveness of the ML model.
[0030] For the purpose of description and explanation, the description will be explained with respect to the UE 110, or to be more specific will be explained with respect to a first UE 110a, a second UE 110b, and a third UE 110c, and should nowhere be construed as limiting the scope of the present disclosure. Each UE 110 from among the first UE 110a, the second UE 110b, and the third UE 110c is configured to connect to the server 115 via the network 105. In an embodiment, each of the first UE 110a, the second UE 110b, and the third UE 110c is one of, but not limited to, any electrical, electronic, or electro-mechanical equipment, or a combination of one or more of the above devices, such as a smartphone, a virtual reality (VR) device, an augmented reality (AR) device, a laptop, a general-purpose computer, a desktop, a personal digital assistant, a tablet computer, a mainframe computer, or any other computing device.
[0031] The network 105 includes, by way of example but not limitation, one or more of a wireless network, a wired network, an internet, an intranet, a public network, a private network, a packet-switched network, a circuit-switched network, an ad hoc network, an infrastructure network, a Public-Switched Telephone Network (PSTN), a cable network, a cellular network, a satellite network, a fiber optic network, or some combination thereof. The network 105 may include, but is not limited to, a Third Generation (3G), a Fourth Generation (4G), a Fifth Generation (5G), a Sixth Generation (6G), a New Radio (NR), a Narrow Band Internet of Things (NB-IoT), an Open Radio Access Network (O-RAN), and the like.
[0032] The server 115 may include, by way of example but not limitation, one or more of a standalone server, a server blade, a server rack, a bank of servers, a server farm, hardware supporting a part of a cloud service or system, a home server, hardware running a virtualized server, one or more processors executing code to function as a server, one or more machines performing server-side functionality as described herein, at least a portion of any of the above, or some combination thereof. In an embodiment, the server 115 is associated with an entity, which may include, but is not limited to, a vendor, a network operator, a company, an organization, a university, a lab facility, a business enterprise, a defense facility, or any other facility that provides content.
[0033] The environment 100 further includes the system 120 communicably coupled to the server 115 and each of the first UE 110a, the second UE 110b, and the third UE 110c via the network 105. The system 120 is configured for standardizing the network data. The system 120 is adapted to be embedded within the server 115 or to be implemented as an individual entity, as per multiple embodiments of the present invention.
[0034] Operational and construction features of the system 120 will be explained in detail with respect to the following figures.
[0035] FIG. 2 is an exemplary block diagram of the system 120 for standardizing the network data, according to one or more embodiments of the present disclosure.
[0036] The system 120 includes a processor 205, a memory 210, a user interface 215, and a storage unit 240. For the purpose of description and explanation, the description will be explained with respect to one or more processors 205, or to be more specific will be explained with respect to the processor 205 and should nowhere be construed as limiting the scope of the present disclosure. The one or more processors 205, hereinafter referred to as the processor 205, may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, single board computers, and/or any devices that manipulate signals based on operational instructions.
[0037] As per the illustrated embodiment, the processor 205 is configured to fetch and execute computer-readable instructions stored in the memory 210. The memory 210 may be configured to store one or more computer-readable instructions or routines in a non-transitory computer-readable storage medium, which may be fetched and executed to create or share data packets over a network service. The memory 210 may include any non-transitory storage device including, for example, volatile memory such as RAM, or non-volatile memory such as EPROM, flash memory, and the like.
[0038] The User Interface (UI) 215 includes a variety of interfaces, for example, interfaces for a Graphical User Interface (GUI), a web user interface, a Command Line Interface (CLI), and the like. The user interface 215 facilitates communication of the system 120. In one embodiment, the user interface 215 provides a communication pathway for one or more components of the system 120. Examples of the one or more components include, but are not limited to, the UE 110, and the storage unit 240. The terms “storage unit” and “database” are used interchangeably hereinafter, without limiting the scope of the disclosure.
[0039] The storage unit 240 is one of, but not limited to, a centralized database, a cloud-based database, a commercial database, an open-source database, a distributed database, an end-user database, a graphical database, a Not Only Structured Query Language (NoSQL) database, an object-oriented database, a personal database, an in-memory database, a document-based database, a time series database, a wide column database, a key value database, a search database, a cache database, and so forth. The foregoing examples of storage unit 240 types are non-limiting and may not be mutually exclusive, e.g., a database can be both commercial and cloud-based, or both relational and open-source, etc.
[0040] Further, the processor 205, in an embodiment, may be implemented as a combination of hardware and programming (for example, programmable instructions) to implement one or more functionalities of the processor 205. In the examples described herein, such combinations of hardware and programming may be implemented in several different ways. For example, the programming for the processor 205 may be processor-executable instructions stored on a non-transitory machine-readable storage medium and the hardware for processor 205 may comprise a processing resource (for example, one or more processors), to execute such instructions. In the present examples, the memory 210 may store instructions that, when executed by the processing resource, implement the processor 205. In such examples, the system 120 may comprise the memory 210 storing the instructions and the processing resource to execute the instructions, or the memory 210 may be separate but accessible to the system 120 and the processing resource. In other examples, the processor 205 may be implemented by electronic circuitry.
[0041] In order for the system 120 to standardize the network data, the processor 205 includes an integrated unit 220, a selection and input unit 225, and a processing unit 230 communicably coupled to each other. In an embodiment, operations and functionalities of the integrated unit 220, the selection and input unit 225, and the processing unit 230 can be used in combination or interchangeably.
[0042] Initially, a request is transmitted by the user via the UI 215 for cleaning and normalizing the data from a data frame by assigning a standard value for one or more columns. The data frame is a two-dimensional, tabular data structure used to store and manipulate the data in rows and columns. The data frame allows for easy data analysis and processing, especially in machine learning tasks. Each column in the data frame represents a variable, and each row represents a data point. The data frame contains data to be used for machine learning training.
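By way of non-limiting illustration only, the row-and-column structure described above may be sketched in plain Python as follows. The column names, the sample values, and the helper `column` are hypothetical and are chosen solely for this sketch; the specification does not prescribe any particular library or representation for the data frame.

```python
# Hypothetical sample frame: each dictionary is one row (a data point),
# and each key is one column (a variable).
rows = [
    {"device": "router1", "latency_ms": 25.0, "packet_loss_pct": 0.1},
    {"device": "router2", "latency_ms": 35.0, "packet_loss_pct": 0.4},
]


def column(frame, name):
    """Return all values of one column, in row order."""
    return [row[name] for row in frame]


latencies = column(rows, "latency_ms")
print(latencies)  # [25.0, 35.0]
```

Keeping each row as a dictionary makes column-wise operations, such as a per-column threshold comparison, a simple traversal over the rows.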
[0043] Upon receiving the request, the integrated unit 220 is configured to retrieve the data from one or more data sources. In an embodiment, the one or more data sources include at least a file input, a source path, an input stream, a Hyper Text Transfer Protocol 2 (HTTP 2), a Distributed File System (DFS), and a Network Attached Storage (NAS). The file input refers to reading data from files stored locally or on the server 115. The files can be in different formats, including, but not limited to, Comma Separated Values (CSV), JavaScript Object Notation (JSON), Extensible Markup Language (XML), or text files. In an exemplary embodiment, the data is stored in a CSV file, and the integrated unit 220 fetches the data for processing. The integrated unit 220 retrieves the data from the file and loads the data into the memory 210 for further processing.
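As a minimal sketch of the file input case, assuming the data arrives as CSV text, the retrieval and loading into memory may look like the following. The function name `load_csv` and the sample content are assumptions made for illustration; an in-memory stream stands in for a file stored locally or on the server 115.

```python
import csv
import io


def load_csv(stream):
    """Read CSV text from a file-like object into a list of row dictionaries."""
    reader = csv.DictReader(stream)
    return [dict(row) for row in reader]


# Hypothetical sample content standing in for a stored CSV file.
sample = io.StringIO("device,latency_ms\nrouter1,25\nrouter2,35\n")
frame = load_csv(sample)
```

Note that `csv.DictReader` returns every field as a string, so numeric columns would need to be converted before any comparison against a threshold value.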
[0044] The source path typically refers to the directory or network location where the data files are stored. The integrated unit 220 fetches the data by following the provided file path. In an exemplary embodiment for the source path, the system 120 stores images in a specific directory. The integrated unit 220 navigates to a designated source path and retrieves all files that match the required criteria (e.g., .jpg images). The input stream refers to continuous data that is read in real-time from a stream of data (e.g., data being transmitted over the network 105 or generated by sensors). In an exemplary embodiment, the data is being received from an Application Programming Interface (API) or a live data stream, and the integrated unit 220 fetches it in real-time. The HTTP 2 is a protocol used for communication over the web, which improves upon HTTP/1.1 by offering multiplexing and better performance for handling multiple requests. In an exemplary embodiment, the integrated unit 220 retrieves the data from the web server using the HTTP 2. The integrated unit 220 uses HTTP 2 to fetch the data from remote web servers or APIs.
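For the source path case, one possible sketch is given below, assuming the matching criterion is a file-name pattern such as `*.jpg`. The helper `find_files` and the temporary directory used to make the sketch self-contained are hypothetical and do not form part of the disclosed system.

```python
import tempfile
from pathlib import Path


def find_files(source_path, pattern):
    """Return the sorted names of files under source_path matching a glob pattern."""
    return sorted(p.name for p in Path(source_path).glob(pattern) if p.is_file())


# A temporary directory stands in for the designated source path.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.jpg", "b.jpg", "notes.txt"):
        (Path(tmp) / name).write_text("placeholder")
    matched = find_files(tmp, "*.jpg")

print(matched)  # ['a.jpg', 'b.jpg']
```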
[0045] The DFS is a distributed file system used to store large datasets across multiple machines. The DFS is commonly used in big data environments to store and retrieve large amounts of data. The integrated unit 220 connects to the DFS, for example a Hadoop Distributed File System (HDFS), to retrieve the files for processing. The NAS is a dedicated file storage system that provides Local Area Network (LAN) access to the data. The NAS allows multiple users or systems to access the data from a centralized storage device. The integrated unit 220 fetches the data from a NAS device over the network 105. In an exemplary embodiment, if the data is stored on the NAS, the integrated unit 220 fetches the data via network protocols. The data from the one or more data sources is useful for analysis, information retrieval and knowledge extraction after training the ML model. Upon retrieving the data from the one or more data sources, the retrieved data is stored in a data frame for further processing.
[0046] Once the retrieved data is stored in the data frame, the selection and input unit 225 is configured to receive one or more threshold values via one of the UI 215 and the UE 110. In an embodiment, the one or more threshold values are defined by the user via one of the UI 215 and the UE 110. The one or more threshold values refer to pre-set limits used to categorize or filter the data. The one or more threshold values act as boundaries that help in decision-making, allowing users to determine which data points need to be considered valid, flagged as outliers, or excluded from further analysis. The one or more threshold values are set for managing outliers, defining acceptable ranges, and ensuring data quality. The user defines the one or more threshold values for at least one column of the one or more columns based on the data analysis. Each column is assigned a specific threshold value. Further, each column is compared with a standard value (such as a default value). In the data analysis, the one or more threshold values are often used to filter, classify, or make decisions based on the data being processed.
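By way of illustration only, the per-column threshold values received from the user may be held as a mapping from column name to threshold value and checked before use. The function `validate_thresholds` and the column names are hypothetical; the specification does not define how the threshold values are represented or validated.

```python
def validate_thresholds(columns, thresholds):
    """Check that every threshold refers to a known column and is numeric."""
    unknown = [name for name in thresholds if name not in columns]
    if unknown:
        raise KeyError(f"thresholds given for unknown columns: {unknown}")
    for name, value in thresholds.items():
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"threshold for {name!r} must be a number")
    return dict(thresholds)


# Hypothetical columns and a user-supplied threshold for one of them.
accepted = validate_thresholds(["latency_ms", "jitter_ms"], {"latency_ms": 30})
```

Rejecting a threshold for an unknown column early avoids silently skipping a column the user intended to standardize.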
[0047] Upon receiving the one or more threshold values from the selection and input unit 225, the processing unit 230 is configured to compare each of the retrieved data assigned to a respective column to the threshold value. In an embodiment, the processing unit 230 is also referred to as a pre-processing unit. The processing unit 230 obtains each of the retrieved data from the dataset and compares each of the retrieved data assigned to the respective column with the threshold value. If the retrieved data assigned to the respective column is within the threshold value, the system 120 retains the retrieved data for further processing without modification. Otherwise, the retrieved data assigned to the respective column is evaluated against the threshold value based on a predefined condition. In an embodiment, the predefined condition includes the retrieved data assigned to the respective column being greater than the one or more threshold values. In another embodiment, the predefined condition includes the retrieved data assigned to the respective column being less than the one or more threshold values. The processing unit 230 is also configured to assign the standard value to the respective column.
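The two predefined conditions described above, greater than and less than the threshold value, may be sketched as a small lookup of comparison operators. The condition labels `greater_than` and `less_than` and the function `meets_condition` are assumptions introduced for this sketch only.

```python
import operator

# Hypothetical labels for the two predefined conditions.
CONDITIONS = {
    "greater_than": operator.gt,
    "less_than": operator.lt,
}


def meets_condition(value, threshold, condition):
    """Return True when value satisfies the predefined condition against threshold."""
    return CONDITIONS[condition](value, threshold)


print(meets_condition(35, 30, "greater_than"))  # True
print(meets_condition(25, 30, "greater_than"))  # False
```

A value for which the selected condition holds would be a candidate for replacement by the standard value, while any other value would be retained without modification.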
[0048] Upon comparing each of the retrieved data assigned to the respective column with the one or more threshold values, the processing unit 230 is configured to assign the standard value to the respective column. In an embodiment, the standard value is defined by the user based on a historic value. The standard value is a predefined value that the user can set, often based on historical data trends or a statistical measure such as the average (mean), mode, median, maximum, or minimum. In an exemplary embodiment, consider a dataset of latency measurements from multiple network devices, in which the measured latency for router1 is 25 milliseconds and the measured latency for router2 is 35 milliseconds. The threshold value for the measured latency is 30 milliseconds. The processing unit 230 compares the measured latency for router1 and the measured latency for router2 to the threshold value. Because the measured latency for router1 is within the threshold value, the system 120 retains that data for further processing without modification. Because the measured latency for router2 exceeds the threshold value, the processing unit 230 assigns the standard value, such as 30 milliseconds, to the respective column. If a data point exceeds or does not meet the threshold value, the processing unit 230 replaces the missing or inconsistent data values with the standard value, which improves the accuracy and quality of the dataset and makes the dataset more reliable.
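The latency example above may be sketched as follows, using the threshold value of 30 milliseconds and the standard value of 30 milliseconds from the example. The function `standardize_column` and its `exceeds` parameter are hypothetical names introduced for this sketch; treating a missing entry as `None` is likewise an assumption.

```python
def standardize_column(values, threshold, standard_value, exceeds=True):
    """Return a copy of values with missing or out-of-range entries replaced.

    exceeds=True  replaces values greater than the threshold;
    exceeds=False replaces values less than the threshold.
    Missing entries (None) are always replaced by the standard value.
    """
    result = []
    for value in values:
        if value is None:
            result.append(standard_value)
        elif exceeds and value > threshold:
            result.append(standard_value)
        elif not exceeds and value < threshold:
            result.append(standard_value)
        else:
            result.append(value)
    return result


latency_ms = {"router1": 25, "router2": 35}
cleaned = standardize_column(list(latency_ms.values()), threshold=30, standard_value=30)
standardized = dict(zip(latency_ms, cleaned))
print(standardized)  # {'router1': 25, 'router2': 30}
```

The value for router1 is retained unchanged, while the value for router2 is replaced by the standard value, matching the outcome described in the example.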
[0049] Upon assigning the standard value to the respective column, the standardized data is generated. The standardized data is stored in the storage unit 240 and used for Machine Learning (ML) training. Upon storing the standardized data, the valid data is transmitted to the ML training unit 415 (as shown in FIG. 4) to train the ML model. Standardization enhances the data by assigning the standard value to the respective column, which helps Artificial Intelligence/Machine Learning (AI/ML) models to understand the structure and learn the patterns at a more granular level. The standardized data is provided for AI/ML model training, and its clarity is enhanced by the removal of inconsistent or irrelevant data from the dataset. The standardized data helps the AI/ML model to generate more accurate and contextually appropriate responses. In one embodiment, few-shot learning is used, which is a machine learning framework in which the AI/ML model learns to make accurate predictions by training on a small number of labeled examples. By standardizing the data, the system 120 reduces the number of unique data present in the one or more data sources and removes the variations in the data. Further, the AI/ML model utilizes a variety of ML techniques, such as supervised learning, unsupervised learning, and reinforcement learning.
[0050] In one embodiment, the supervised learning is a type of machine learning algorithm, which is trained on a labeled dataset. In supervised learning, each training example is paired with an output label. The supervised learning algorithm learns to map inputs to a correct output. The supervised learning uses various algorithms (such as linear regression, decision trees, or neural networks) to learn the mapping from inputs to outputs. It minimizes the error between predicted outputs and actual labels through techniques like gradient descent. In one embodiment, the unsupervised learning is a type of machine learning algorithm, which is trained on data without any labels. The unsupervised learning algorithm tries to learn the underlying structure or distribution in the data in order to discover patterns or groupings. The unsupervised learning algorithm uses various techniques, such as k-means clustering and hierarchical clustering, to group similar data points. In one embodiment, the reinforcement learning is a type of machine learning where an agent learns to make decisions by performing actions in an environment to maximize cumulative reward. The agent receives feedback in the form of rewards or penalties based on the actions it takes, and it learns a policy that maps states of the environment to the best actions.
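The statement that supervised learning minimizes the error between predicted outputs and actual labels through gradient descent can be illustrated with a small sketch. The function `fit_line`, the learning rate, the step count, and the training data are all assumptions chosen for illustration; the disclosure does not specify any particular model or hyperparameters.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction error for every labeled example.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step against the gradient to reduce the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical labeled data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))
```

With these inputs the fitted slope and intercept approach the values used to generate the labels.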
[0051] In an embodiment, the system 120 pre-processes the data upon receiving the user-defined threshold value. The data values of the one or more columns are modified automatically at code level by assigning the standard value. Further, the processor 205 has the capability to perform pre-processing in order to remove noise such as, but not limited to, duplicates, missing values or inaccurate values from the remaining columns at the code level after deleting the unwanted rows, thereby enabling the ML model to provide contextual and accurate predictions based on the cleansed data. Advantageously, the present invention ensures that only filtered, cleansed, and relevant data without variations is fed as training data to the ML model, in accordance with the user's intent as to how the ML model is to be trained.
[0052] By performing standardization of the data, the system 120 removes the noise from the respective column based on the threshold value. Data cleaning is performed by comparing the column values based on the threshold value and the standardized value, which reduces the number of unique data present in the data source. Further, the system 120 removes the variations in the data and also cleans the data by removing the redundant data or the irrelevant data, thus improving data accuracy and efficiency, increasing processing speed of the processor 205 and reducing requirement of memory space.
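As a rough sketch of the noise removal and the reduction in unique values described above, the following example drops rows with missing values and exact duplicates, then counts distinct values before and after standardization. The helper `clean_rows`, the row layout, and the sample values other than router1 and router2 are hypothetical.

```python
def clean_rows(rows):
    """Drop rows with missing values, then drop exact duplicate rows."""
    seen = set()
    cleaned = []
    for row in rows:
        if any(value is None for value in row):
            continue  # row contains a missing value
        if row in seen:
            continue  # row is an exact duplicate
        seen.add(row)
        cleaned.append(row)
    return cleaned


# Hypothetical (device, latency in ms) rows containing noise.
rows = [
    ("router1", 25),
    ("router2", 35),
    ("router2", 35),    # duplicate
    ("router3", None),  # missing value
    ("router4", 41),
]
cleaned = clean_rows(rows)
latencies = [latency for _, latency in cleaned]

# Assumed threshold and standard value of 30 ms, as in the latency example.
standardized = [30 if latency > 30 else latency for latency in latencies]
print(len(set(latencies)), len(set(standardized)))  # 3 2
```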
[0053] FIG. 3 is a schematic representation of the system 120 in which the operations of various entities are explained, according to one or more embodiments of the present disclosure. FIG. 3 describes the system 120 for standardizing the network data. It is to be noted that the embodiment with respect to FIG. 3 will be explained with respect to the first UE 110a for the purpose of description and illustration and should not be construed as limiting the scope of the present disclosure.
[0054] As mentioned earlier with reference to FIG. 1, in an embodiment, the first UE 110a may encompass electronic apparatuses. These devices are illustrative of, but not restricted to, personal computers, laptops, tablets, smartphones (including phones), or other devices enabled for web connectivity. The scope of the first UE 110a explicitly extends to a broad spectrum of electronic devices capable of executing computing operations and accessing networked resources, thereby providing users with a versatile range of functionalities for both personal and professional applications. This embodiment acknowledges the evolving nature of electronic devices and their integral role in facilitating access to digital services and platforms. In an embodiment, the first UE 110a can be associated with multiple users. Each UE 110 is communicatively coupled with the processor 205 via the network 105.
[0055] The first UE 110a includes one or more primary processors 305 communicably coupled to the one or more processors 205 of the system 120. The one or more primary processors 305 are coupled with a memory unit 310 storing instructions which are executed by the one or more primary processors 305. Execution of the stored instructions by the one or more primary processors 305 enables the first UE 110a to select a column and set the standard value and the one or more threshold values for the column via the UI 215.
[0056] Furthermore, the one or more primary processors 305 within the UE 110 are uniquely configured to execute a series of steps as described herein. This configuration underscores the capability of the processor 205 to standardize the network data. The operational synergy between the one or more primary processors 305 and the additional processors, guided by the executable instructions stored in the memory unit 310, facilitates a seamless standardization of the network data.
[0057] As mentioned earlier in FIG.2, the system 120 includes the one or more processors 205, the memory 210, and the user interface 215. The operations and functions of the one or more processors 205, the memory 210, and the user interface 215 are already explained in FIG. 2. For the sake of brevity, a similar description related to the working and operation of the system 120 as illustrated in FIG. 2 has been omitted to avoid repetition.
[0058] Further, the processor 205 includes the integrated unit 220, the selection and input unit 225, and the processing unit 230. The operations and functions of the integrated unit 220, the selection and input unit 225, and the processing unit 230 are already explained in FIG. 2. Hence, for the sake of brevity, a similar description related to the working and operation of the system 120 as illustrated in FIG. 2 has been omitted to avoid repetition. The limited description provided for the system 120 in FIG. 3, should be read with the description provided for the system 120 in the FIG. 2 above, and should not be construed as limiting the scope of the present disclosure.
[0059] FIG. 4 is a block diagram of an architecture 400 that can be implemented in the system of FIG.2, according to one or more embodiments of the present disclosure. The architecture 400 of the system 120 includes an integrated unit 220, a load balancer 405, and the processor 205. The processor 205 includes the selection and input unit 225, a pre-processing unit 230, a data source unit 410, and a machine learning training unit 415.
[0060] The architecture 400 of the system 120 is configured to interact with the integrated unit 220 and the load balancer 405. The integrated unit 220 is configured to access the relevant data in the network 105 and is capable of interacting with the server 115 and the storage unit 240 to collect the data. In an embodiment, the integrated unit 220 includes, but is not limited to, the one or more data sources from which the data can be retrieved. In an embodiment, the data is retrieved via the file input, the source path, the input stream, the HTTP 2, the DFS and the NAS.
[0061] The load balancer 405 is configured to distribute the request traffic of the one or more data sources across the one or more processors 205. The distribution of this request traffic helps in managing and optimizing the workload, ensuring that no single processor is overwhelmed while improving overall system performance and reliability.
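One simple way to spread request traffic so that no single processor is overwhelmed is round-robin assignment. The sketch below assumes a round-robin policy, which the disclosure does not specify, and uses hypothetical request and processor names.

```python
from itertools import cycle


def distribute(requests, processors):
    """Assign each request to a processor in round-robin order."""
    assignment = {processor: [] for processor in processors}
    # zip stops when the requests run out; cycle repeats the processors.
    for request, processor in zip(requests, cycle(processors)):
        assignment[processor].append(request)
    return assignment


# Hypothetical data source requests and processor identifiers.
requests = ["file_input", "input_stream", "http2", "dfs", "nas"]
processors = ["processor_a", "processor_b"]
assignment = distribute(requests, processors)
print(assignment)
```

Other policies, such as least-loaded assignment, would fit the same description equally well.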
[0062] Once the data is retrieved and stored in the data frame, the selection and input unit 225 is configured to receive the one or more threshold values via one of the UI 215 and the UE 110. In an embodiment, the one or more threshold values are defined by the user via one of the UI 215 and the UE 110. The user selects the one or more threshold values to be set for the column values. Setting the threshold value is essential for managing outliers, defining acceptable ranges, and ensuring data quality. The user selects the one or more threshold values for the one or more columns based on the data analysis. In the data analysis, the threshold values are often used to filter, classify, or make decisions based on the data being processed.
[0063] Upon selection of the one or more threshold values, the pre-processing unit 230 is configured to compare each of the retrieved data assigned to the respective column to the one or more threshold values. If each of the retrieved data assigned to the respective column is within the one or more threshold values, the system 120 retains each of the retrieved data for further processing without modification. If each of the retrieved data assigned to the respective column is greater than the one or more threshold values, the system 120 assigns the standard value to the respective column. In an embodiment, the standard value is defined by the user based on the historic value. The standard value is the predefined value that the user can set, often at least based on the historical data trends or statistical distribution such as average, mean, mode, median, maximum, minimum. In one embodiment, the standard value is defined by a pre-trained model.
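The statistical measures listed above as possible bases for the standard value can be computed with Python's standard `statistics` module. The historical sample and the choice of the median as the standard value are assumptions for illustration only; in the disclosure the user (or a pre-trained model) defines the standard value.

```python
import statistics

# Hypothetical historical latency measurements in milliseconds.
historical_latency_ms = [22, 25, 25, 28, 30]

# Candidate standard values derived from the historical distribution.
candidates = {
    "mean": statistics.mean(historical_latency_ms),
    "median": statistics.median(historical_latency_ms),
    "mode": statistics.mode(historical_latency_ms),
    "maximum": max(historical_latency_ms),
    "minimum": min(historical_latency_ms),
}

# Assumed user choice: use the median as the standard value.
standard_value = candidates["median"]
print(candidates, standard_value)
```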
[0064] Upon assigning the standard value to the respective column, the standardized data is generated. The data source unit 410 is configured to store the standardized data, which is used for Machine Learning (ML) training. The data source is updated after the pre-processing of the data. The pre-processed standardized data is stored in the data source unit 410. Upon storing the standardized data in the data source unit 410, the data source unit 410 transmits the standardized data to the ML training unit 415. The ML training unit 415 is configured to train the ML model by using the standardized data from the one or more data sources. Further, the ML training unit 415 applies various machine learning algorithms to the standardized data to create predictive or analytical models.
[0065] FIG. 5 is a signal flow diagram illustrating a process for standardizing the network data, according to one or more embodiments of the present disclosure.
[0066] At 505, the user initially transmits a request via the UI 215 for cleaning and normalizing the data from the data frame by assigning the standard value to the one or more columns.
[0067] At 510, upon receiving the request, the integrated unit 220 is configured to retrieve the data from the one or more data sources. In an embodiment, the one or more data sources include at least, the file input, the source path, the input stream, the HTTP 2, the DFS, and the NAS. Upon retrieving the data, the retrieved data is stored in the data frame for further processing.
[0068] At 515, once the data is retrieved and stored in the data frame, the selection and input unit 225 is configured to receive the one or more threshold values via one of the UI 215 and the UE 110. In an embodiment, the one or more threshold values are defined by the user via one of the UI 215 and the UE 110. The user selects the one or more threshold values to be set for the column values. Setting the threshold value is essential for managing outliers, defining acceptable ranges, and ensuring data quality. The user selects the one or more threshold values for the one or more columns based on the data analysis. In the data analysis, the threshold values are often used to filter, classify, or make decisions based on the data being processed.
[0069] At 520, upon selection of the one or more threshold values, the pre-processing unit 230 is configured to compare each of the retrieved data assigned to the respective column to the one or more threshold values. If each of the retrieved data assigned to the respective column is within the one or more threshold values, the system 120 retains each of the retrieved data for further processing without modification. If each of the retrieved data assigned to the respective column exceeds the one or more threshold values, the system 120 assigns the standard value to the respective column. In an embodiment, the standard value is defined by the user based on the historic value. The standard value is the predefined value that the user can set, often at least based on the historical data trends or statistical distribution such as average, mean, mode, median, maximum, minimum. In one embodiment, the standard value is defined by a pre-trained model.
[0070] At 525, upon assigning the standard value to the respective column, the standardized data is generated. The data source unit 410 is configured to store the standardized data, which is used for Machine Learning (ML) training. The data source is updated after the pre-processing of the data. The pre-processed standardized data is stored in the data source unit 410. Upon storing the standardized data in the data source unit 410, the data source unit 410 transmits the standardized data to the ML training unit 415. The ML training unit 415 is configured to train the ML model by using the standardized data from the one or more data sources. Further, the ML training unit 415 applies various machine learning algorithms to the standardized data to create predictive or analytical models.
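The signal flow of steps 505 to 525 can be sketched end to end as follows. Every name here (`retrieve`, `standardize`, `train`, the column names, and the sample data) is hypothetical, and `train` is only a placeholder that returns a column mean rather than an actual ML model; the sketch assumes a "greater than" condition.

```python
def retrieve(source):
    """Step 510: load data from a source into a column-oriented data frame."""
    return {name: list(values) for name, values in source.items()}


def standardize(frame, thresholds, standard_values):
    """Steps 520 and 525: replace values above a column's threshold."""
    result = {}
    for column, values in frame.items():
        if column in thresholds:
            limit = thresholds[column]
            standard = standard_values[column]
            values = [standard if v > limit else v for v in values]
        result[column] = list(values)
    return result


def train(frame, column):
    """Placeholder for the ML training unit: returns the column mean."""
    values = frame[column]
    return sum(values) / len(values)


# Hypothetical source data; step 505 (the user request) triggers the flow.
source = {"latency_ms": [25, 35, 28, 90], "hops": [3, 4, 3, 5]}
frame = retrieve(source)                 # step 510
thresholds = {"latency_ms": 30}          # step 515: user-defined threshold
standard_values = {"latency_ms": 30}     # user-defined standard value
stored = standardize(frame, thresholds, standard_values)  # steps 520, 525
model = train(stored, "latency_ms")      # handed to the training stage
print(stored["latency_ms"], model)       # [25, 30, 28, 30] 28.25
```

Columns without a threshold, such as `hops` here, pass through unchanged.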
[0071] FIG. 6 is a flow diagram illustrating a method 600 for standardizing the network data, according to one or more embodiments of the present disclosure.
[0072] At step 605, the method 600 includes the step of retrieving the data from the one or more data sources by the integrated unit 220. In an embodiment, the one or more data sources include at least, the file input, the source path, the input stream, the Hyper Text Transfer Protocol 2 (HTTP 2), the Distributed File System (DFS), and the Network Attached Storage (NAS). The file input refers to reading data from the files stored locally or on the server 115. The files can be in different formats, including, but not limited to, the Comma Separated Values (CSV), the JavaScript Object Notation (JSON), the extensible Markup Language (XML), or the text files. In an exemplary embodiment, the data is stored in the CSV file, and the integrated unit 220 fetches the data for processing. The integrated unit 220 retrieves the data from the file and loads it into the memory 210 for further processing.
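A minimal sketch of the CSV file input case, using Python's standard `csv` module, is shown below. The column names, the in-memory `io.StringIO` stand-in for a file, and the dict-of-lists data frame layout are assumptions made so the example is self-contained.

```python
import csv
import io

# Hypothetical CSV content; in practice this would be read from a file.
csv_text = "device,latency_ms\nrouter1,25\nrouter2,35\n"


def load_csv(stream):
    """Read CSV rows into a column-oriented data frame (dict of lists)."""
    reader = csv.DictReader(stream)
    frame = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            frame[name].append(value)
    return frame


frame = load_csv(io.StringIO(csv_text))
# CSV values are read as strings, so convert the numeric column explicitly.
frame["latency_ms"] = [int(v) for v in frame["latency_ms"]]
print(frame)  # {'device': ['router1', 'router2'], 'latency_ms': [25, 35]}
```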
[0073] At step 610, the method 600 includes the step of receiving the one or more threshold values via one of the UI 215 and the UE 110 by the selection and input unit 225. In an embodiment, the one or more threshold values are defined by the user via one of the UI 215 and the UE 110. The one or more threshold values refer to predetermined limits that help in making decisions about how to handle specific data points. Setting the one or more threshold values is essential for managing outliers, defining acceptable ranges, and ensuring data quality. The user defines the one or more threshold values for the one or more columns based on the data analysis. In the data analysis, the threshold values are often used to filter, classify, or make decisions based on the data being processed.
[0074] At step 615, the method 600 includes the step of comparing each of the retrieved data assigned to a respective column to the one or more threshold values by the processing unit 230. The processing unit 230 obtains each data point from the retrieved dataset and compares each of the retrieved data assigned to the respective column with the one or more threshold values. If the retrieved data assigned to the respective column is within the threshold value, the system 120 retains the retrieved data for further processing without modification. If the retrieved data assigned to the respective column is greater than the one or more threshold values, the processing unit 230 is configured to assign a standard value to the respective column.
[0075] At step 620, the method 600 includes the step of assigning the standard value to the respective column based on a predefined condition by the processing unit 230. In an embodiment, the predefined condition includes the retrieved data assigned to the respective column being greater than the one or more threshold values. In another embodiment, the predefined condition includes the retrieved data assigned to the respective column being less than the one or more threshold values. In an embodiment, the standard value is defined by the user based on the historic value. The standard value is the predefined value that the user can set, often at least based on the historical data trends or statistical distribution such as average, mean, mode, median, maximum, minimum. Upon assigning the standard value to the respective column, the standardized data is generated. The standardized data is stored in the storage unit 240 and used for Machine Learning (ML) training. Upon storing the standardized data, the valid data is transmitted to the ML training unit 415 to train the ML model. Standardization enhances the data by assigning the standard value to the respective column, which helps Artificial Intelligence/Machine Learning (AI/ML) models to understand the structure and learn the patterns at a more granular level. The standardized data is provided for AI/ML model training, and its clarity is enhanced by the removal of inconsistent or irrelevant data from the dataset. The standardized data helps the AI/ML model to generate more accurate and contextually appropriate responses.
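The two predefined conditions described for step 620 (greater than, or less than, the threshold) can be expressed with a selectable comparison. The function `assign_standard`, the condition labels, and the signal-strength sample are hypothetical; only the latency threshold of 30 follows the earlier example in the text.

```python
import operator

# Hypothetical labels mapped to the two comparisons named in the text.
CONDITIONS = {"greater_than": operator.gt, "less_than": operator.lt}


def assign_standard(values, threshold, standard_value, condition):
    """Replace each value that satisfies the condition against the threshold."""
    compare = CONDITIONS[condition]
    return [standard_value if compare(v, threshold) else v for v in values]


latency_ms = [25, 35, 28, 90]
print(assign_standard(latency_ms, 30, 30, "greater_than"))  # [25, 30, 28, 30]

# Hypothetical signal strength column in dBm with a lower bound.
signal_dbm = [-70, -95, -80]
print(assign_standard(signal_dbm, -90, -90, "less_than"))  # [-70, -90, -80]
```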
[0076] A person of ordinary skill in the art will readily ascertain that the illustrated embodiments and steps in description and drawings (FIGS.1-6) are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope and spirit of the disclosed embodiments.
[0077] The present disclosure provides a technical advancement in assigning the standard value to the one or more columns. Upon receiving the user-defined threshold values or ranges, the data is pre-processed and the data values of the one or more columns are modified automatically at code level by assigning the standard value. Further, the processor 205 also has the capability to perform pre-processing in order to remove noise such as, but not limited to, duplicates, missing values or inaccurate values from the remaining columns at the code level after deleting the unwanted rows, thereby enabling the ML model to provide contextual and accurate predictions based on the cleansed data. Advantageously, the present invention ensures that only filtered, cleansed, and relevant data without variations is fed as training data to the ML model, in accordance with the user's intent as to how the ML model is to be trained.
[0078] The present invention offers multiple advantages over the prior art, and those listed above are a few examples that emphasize some of the advantageous features. The listed advantages are to be read in a non-limiting manner.

REFERENCE NUMERALS

[0079] Environment - 100
[0080] Network - 105
[0081] User equipment - 110
[0082] Server - 115
[0083] System - 120
[0084] Processor - 205
[0085] Memory - 210
[0086] User interface - 215
[0087] Storage unit - 240
[0088] Integrated unit - 220
[0089] Selection and input unit - 225
[0090] Pre-processing unit - 230
[0091] Primary processor - 305
[0092] Memory unit - 310
[0093] Load balancer - 405
[0094] Data source unit - 410
[0095] Machine Learning training unit - 415
CLAIMS
We Claim:
1. A method (500) of standardizing network data, the method (500) comprising the steps of:
retrieving, by one or more processors (205), network data from one or more data sources;
receiving, by the one or more processors (205), one or more threshold values via one of a User Interface (UI) (215) and a User Equipment (UE) (110);
comparing, by the one or more processors (205), each of the retrieved data assigned to a respective column to the one or more threshold values; and
assigning, by the one or more processors (205), a standard value to the respective column based on a predefined condition, and thereby standardizing the network data.

2. The method (500) as claimed in claim 1, wherein the one or more data sources include at least, a file input, a source path, an input stream, a Hyper Text Transfer Protocol 2 (HTTP 2), a Distributed File System (DFS), and a Network Attached Storage (NAS).

3. The method (500) as claimed in claim 1, wherein the retrieved data is stored in a data frame.

4. The method (500) as claimed in claim 1, wherein the standardized data is stored in a storage unit (240) and used for Machine Learning (ML) training.

5. The method (500) as claimed in claim 1, wherein the standard value is defined by a user at least based on a historic value or statistical distribution.

6. The method (500) as claimed in claim 1, wherein the predefined condition comprises the retrieved data assigned to the respective column is at least one of, greater than, or less than the one or more threshold values.

7. A system (120) for standardizing network data, the system (120) comprising:
an integrated unit (220) configured to retrieve data from one or more data sources;
a selection and input unit (225) configured to receive one or more threshold values via one of a User Interface (UI) (215) and a User Equipment (UE) (110);
a processing unit (230) configured to compare each of the retrieved data assigned to a respective column to the one or more threshold values; and
the processing unit (230) configured to assign a standard value to the respective column based on a predefined condition, and thereby standardizing the network data.

8. The system (120) as claimed in claim 7, wherein the one or more data sources include at least, a file input, a source path, an input stream, a Hyper Text Transfer Protocol 2 (HTTP 2), a Distributed File System (DFS), and a Network Attached Storage (NAS).

9. The system (120) as claimed in claim 7, wherein the retrieved data is stored in a data frame.

10. The system (120) as claimed in claim 7, wherein the standardized data is stored in a storage unit (240) and used for Machine Learning (ML) training.

11. The system (120) as claimed in claim 7, wherein the standard value is defined by the user based on a historic value.

12. The system (120) as claimed in claim 7, wherein the predefined condition comprises the retrieved data assigned to the respective column is at least one of, greater than, or less than the one or more threshold values.

13. A User Equipment (UE) (110), comprising:
one or more primary processors (305) communicatively coupled to the one or more processors (205), the one or more primary processors (305) coupled with a memory unit (310), wherein said memory unit (310) stores instructions which when executed by the one or more primary processors (305) cause the UE (110) to:
select a column and set a standard value and the one or more threshold values for the column via the User Interface (UI) (215); and
wherein the one or more processors (205) are configured to perform the steps as claimed in claim 1.

Documents

Application Documents

#	Name	Date
1	202321067269-STATEMENT OF UNDERTAKING (FORM 3) [06-10-2023(online)].pdf	2023-10-06
2	202321067269-PROVISIONAL SPECIFICATION [06-10-2023(online)].pdf	2023-10-06
3	202321067269-FORM 1 [06-10-2023(online)].pdf	2023-10-06
4	202321067269-FIGURE OF ABSTRACT [06-10-2023(online)].pdf	2023-10-06
5	202321067269-DRAWINGS [06-10-2023(online)].pdf	2023-10-06
6	202321067269-DECLARATION OF INVENTORSHIP (FORM 5) [06-10-2023(online)].pdf	2023-10-06
7	202321067269-FORM-26 [27-11-2023(online)].pdf	2023-11-27
8	202321067269-DRAWING [06-10-2024(online)].pdf	2024-10-06
9	202321067269-COMPLETE SPECIFICATION [06-10-2024(online)].pdf	2024-10-06
10	Abstract.jpg	2024-12-07