Abstract: A computer-implemented method and system for balancing load in a distributed computing platform while performing data processing tasks is disclosed herein. According to the method and system of the invention, the data processing tasks of data extract, data transform and data load are performed by a cluster of nodes at a specific time stamp. The data to be processed is extracted in assorted formats. Further, the extracted data is a combination of structured, unstructured and semi-structured data. The system further develops and schedules a data flow model that represents the flow of data from a source node to a target node. Further, the system is adapted to transform the extracted data by injecting one or more rules and loading the data to the target node for future analysis and references. [FIG. 1]
FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003
COMPLETE SPECIFICATION
(See Section 10 and Rule 13)
Title of invention:
SYSTEM AND METHOD ENABLING LOAD BALANCING OF DISTRIBUTED COMPUTING PLATFORMS INVOLVED IN LARGE DATA PROCESSING
Applicant:
TATA Consultancy Services Limited
A company Incorporated in India under The Companies Act, 1956
Having address:
Nirmal Building, 9th Floor,
Nariman Point, Mumbai 400021,
Maharashtra, India
The following specification particularly describes the invention and the manner in which it is to be performed.
FIELD OF THE INVENTION
The present invention relates to the field of management and processing of data such as Big Data. More particularly, it relates to a system and method for enabling load balancing of distributed computing platforms facilitating Big Data analytics and processing.
BACKGROUND OF THE INVENTION
With the advent of Information Technology (IT), the amount of data to be stored and processed is escalating. In general, there has been tremendous growth in the storage of large amounts of data received from different sources, which is further processed and analyzed to retrieve information to meet the business and technical needs of the different IT-enabled enterprises worldwide. The different sources from which data originates include structured data sources, unstructured data sources, semi-structured data sources or combinations thereof.
The unstructured data sources are those having unstructured data in which the data is stored in assorted formats. Unstructured data does not have any pre-defined data model and does not fit well into relational tables. The term unstructured data refers to data that has no identifiable structure. Examples of such unstructured data include heavy text data, email data, rich documents, images, relational data, social media data, dates, numbers, facts and the like. Further, unstructured data lacks a relational data schema having defined fields, attributes and properties for different data types. Hence, the data received from unstructured sources is un-organized, staggered and lacks relational metadata.
In contrast, structured data is stored in a structured manner using a database table comprising several rows and columns. The intersection of each row and column is represented as a data cell, and data is stored in each cell based on its data type. Further, structured data refers to data that is readily identifiable, as it is organized in a structured manner that enables the relevant data to be extracted using any of the extraction tools known in the art. The structured data is also searchable based on its data type within content and is easily understood by computers and human readers, as it is efficiently organized in the source storing it.
As the data to be processed and/or analyzed is received from different data sources, this results in irregularities and ambiguities in understanding the complexities associated with the data for analysis using traditional computer programs. Such complexities further increase if the data extracted from a variety of said data sources is of significant size, commonly referred to as big data. Big data usually includes data sets with sizes beyond the ability of commonly-used software tools to capture, manage, and process the data within a tolerable elapsed time. Big data sizes are a constantly moving target, ranging from a few dozen terabytes to many petabytes of data in a single data set. The variety of such big data available across data warehouses, data marts, content management systems, file systems, streaming data, etc., needs to be analyzed/processed to extract information to meet the business and technical needs of the target.
At present, there are some platforms available to process raw, un-meaningful data and extract meaningful data for fulfilling the operational needs of the organization. One of the most common platforms available is a traditional Extraction, Transform and Load (ETL) framework that enables extracting the data from data sources, transforming the extracted data and loading the transformed data into the end target, usually the data warehouse. It has also been observed that when large data (on the order of hundreds of terabytes or petabytes) is processed in traditional systems such as the ETL framework, the performance of the systems degrades as the load increases. Further, supporting a variety of data extracted from different data sources in a scalable model is challenging for the existing ETL framework. Further, as the data size grows, it becomes very difficult to manage and process the data with traditional ETL tools. In addition, there is an ample amount of information in unstructured data that needs to be extracted from a variety of data sources. Since most of the relevant information available in unstructured data sources is in assorted formats, it becomes very challenging for the existing systems and methods to analyze and process the structured and unstructured data in a scalable manner. Further, the processing of data in the range of several terabytes and/or petabytes using existing frameworks may result in the degradation, overloading and imbalance of the systems.
Thus, in view of the lacunae observed in the art, there is a long-felt need for a system and method that enables scalable data processing in a network by facilitating parallel execution of multiple processing tasks, thereby avoiding degradation, imbalance and overloading of the network. Further, there is a need in the art for a system and method that enables processing of large amounts of data, such as big data, received from structured, unstructured and semi-structured sources.
OBJECTS OF THE INVENTION
The primary object of the present invention is to enable a system and method for load balancing across distributed computing platforms while performing data processing tasks of data extraction, data transformation or processing and data loading.
Yet another object of the invention is to provide a system and method for modeling a data flow to be extracted, transformed and loaded from a source node to a target node by a cluster of nodes.
Yet another object of the invention is to provide a system and method for scheduling the modeled data flow for execution of the data processing tasks by the cluster of nodes.
Still another object of the invention is to provide a system and method for extracting data in assorted formats from a variety of data sources, injecting a series of rules into the extracted data to derive transformed data and loading the transformed data into the target node for analysis.
SUMMARY OF THE INVENTION
Before the present systems and methods are described, it is to be understood that this application is not limited to the particular systems and methodologies described, as there can be multiple possible embodiments which are not expressly illustrated in the present disclosure. It is also to be understood that the terminology used in the description is for the purpose of describing the particular versions or embodiments only, and is not intended to limit the scope of the present application.
In one embodiment, the present invention discloses a system and method for balancing load in a distributed computing platform by parallel execution of the data processing tasks of data extract, data transform and data load. Further, the method and system enables generating a data flow model representing the flow of data and inter-platform dependencies amongst the plurality of nodes in the distributed computing platform. The method and system also enables storing the generated flow model in a model flow repository that can be dynamically modified as per the load balancing requirements using a web-based interface.
In an aspect of the present invention, a method for load balancing while processing a large volume of data in a distributed computing platform is disclosed herein. According to the method of the present invention, said load balancing is achieved by parallel execution of the data processing tasks of data extract, data transform and data load by one or more nodes from a cluster of nodes. In an aspect of the invention, the method is adapted to identify the data to be extracted from a structured or an unstructured data source. The said method is further adapted to identify a set of nodes in the cluster to execute said data processing tasks at a particular time stamp on the basis of the current load on the cluster of nodes. The method further models a data flow representing the flow of data to extract and load data from a source node to a target node. It further represents dependencies between tasks amongst the identified set of nodes for scheduling the modeled data flow for execution of the data processing tasks. The said method enables processing large amounts of data extracted from both structured and unstructured data sources by concurrently executing the steps of extracting data from a source node in assorted formats, transforming the extracted data into the target node's compatible format by injecting one or more rules, and loading the transformed data to the target node based on the scheduled modeled data flow.
Another embodiment of the present invention discloses a system for load balancing across distributed computing platforms. The system comprises a pre-processing module adapted to identify a set of nodes in the cluster to execute said data processing tasks at a particular time stamp on the basis of the current load on the cluster of nodes. A data flow modeler is configured to derive a data flow representing the flow of data to extract and load data from a source node to a target node and to derive dependencies between tasks amongst the identified set of nodes. The system further comprises a data scheduler adapted to schedule the derived data flow for execution of the data processing tasks. In one embodiment of the invention, a data extraction module is configured to extract data from a source node in assorted formats, wherein said data extraction module further comprises a text query language module and a data transformation language module. The text query language module and the data transformation language module are configured to extract data from unstructured and structured data sources respectively. A data transforming module converts the extracted data into the target node's compatible format by injecting one or more rules. Further, a data loader is adapted to load the transformed data to the target node based on the scheduled modeled data flow, and a report generator is configured to generate real-time reports for future analysis and reference. The system further enables dynamic modification of the derived data flow to modify the job processing tasks performed by the individual nodes in the distributed platform by means of a common modeling environment.
BRIEF DESCRIPTION OF DRAWINGS
The foregoing summary, as well as the following detailed description of embodiments, is better understood when read in conjunction with the appended drawings. For the purpose of illustrating the invention, there is shown in the present document example constructions of
the invention; however, the invention is not limited to the specific methods and apparatus disclosed in the document and the drawings:
Figure 1 is an architecture diagram (100) illustrating various system elements enabling the process for balancing load while performing big data processing tasks of data extraction, data transformation and data loading in a distributed computing platform according to an exemplary embodiment of the invention.
Figure 2 illustrates a flow diagram (200) showing various steps implemented by various system elements collectively for balancing load while performing big data processing tasks across distributed computing platforms according to an exemplary embodiment of the invention.
The figures depict various embodiments of the present invention for purposes of illustration only. One skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles of the invention described herein.
DETAILED DESCRIPTION OF THE INVENTION
Some embodiments of this invention, illustrating all its features, will now be discussed in detail. The words "comprising," "having," "containing," and "including," and other forms thereof, are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms "a," "an," and "the" include plural references unless the context clearly dictates otherwise. Although any systems and methods similar or equivalent to those described herein can be used in the practice or testing of embodiments of the present invention, the exemplary systems and methods are now described. The disclosed embodiments are merely exemplary of the invention, which may be embodied in various forms.
The present invention enables a system and method of improving the processing speed of a distributed computing platform adapted to process a large volume of data, herein referred to as big data. The data to be processed is extracted from a variety of different data sources in assorted formats. The platform is further adapted to identify whether the data extracted is from a structured or an unstructured data source. The platform monitors the load in the network in real time to identify a set of nodes in the cluster for performing the big data processing tasks of data extraction, data transformation and data loading. In an embodiment of the invention, a common data processing language (CPL) is configured to extract big data from both structured and unstructured data sources. The system comprises a connector library configured to connect to the structured data sources and unstructured data sources. The distributed computing platform further distributes the data processing tasks by dividing the tasks into splits. Each split of data is operated on by the identified cluster of nodes. Further, the distributed computing platform enables parallel processing of the data processing tasks by dividing them into splits that get distributed amongst said cluster of nodes. The distributed computing platform further integrates a plurality of modules that assist in assigning the big data processing tasks amongst said cluster of nodes. Thus it enables the distributed computing platform to balance load while processing a large volume of data processing tasks and to improve the big data processing speed by distributing the data processing tasks amongst the identified cluster of nodes.
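By way of illustration only, and not as part of the claimed implementation, the following Python sketch shows one way the split-and-distribute step described above could look; all names (make_splits, process_split, run_in_parallel) are hypothetical, and a local process pool merely stands in for the cluster of nodes.

```python
# Minimal sketch of dividing data processing work into splits and running
# them in parallel; a local stand-in for distribution across cluster nodes.
# All names here are illustrative assumptions, not part of the specification.
from concurrent.futures import ProcessPoolExecutor
from typing import List


def make_splits(records: List[dict], split_size: int) -> List[List[dict]]:
    """Divide the extracted records into fixed-size splits."""
    return [records[i:i + split_size] for i in range(0, len(records), split_size)]


def process_split(split: List[dict]) -> List[dict]:
    """Per-split data processing task (extract/transform work would go here)."""
    return [{**rec, "processed": True} for rec in split]


def run_in_parallel(records: List[dict], split_size: int = 1000) -> List[dict]:
    """Distribute splits across worker processes, mimicking cluster nodes."""
    results: List[dict] = []
    with ProcessPoolExecutor() as pool:  # one worker per available processor
        for processed in pool.map(process_split, make_splits(records, split_size)):
            results.extend(processed)
    return results
```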
In one embodiment of the present invention, the system and method is configured to generate a common flow model to model a dataflow representing the flow of data while performing the big data processing tasks of data extraction, data transformation and data loading from the source node to the target node. The system and method further orchestrates dependencies between big data processing tasks amongst the identified cluster of nodes. The system and method further provides a common monitoring environment across the distributed computing platform to monitor the tasks performed by the cluster of nodes involved in processing the tasks. In one embodiment of the invention, the common monitoring can be performed through a web based interface for monitoring and managing the modeling of data flow amongst said cluster of nodes. The system and method further enables dynamic modeling of the data flow model using an extreme data controller, wherein the system enables modifications in the modeled data flow based on the load on each node in the cluster. It further enables modification of the cluster of nodes by adding or removing a node from the cluster on the basis of the load on each node at a particular time stamp through the web based interface. In an alternative embodiment of the invention, the system and method further enables a user to manipulate the identified cluster of nodes by adding or removing a node from the cluster for performing data processing through the web based interface by verifying the load on each node at a particular time stamp.
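Purely as an illustrative sketch, and under assumed names such as NodeLoad and rebalance_cluster, the following Python fragment shows how a node could be removed from or added to the cluster on the basis of the load observed on each node at a particular time stamp; the thresholds are arbitrary assumptions.

```python
# Illustrative sketch (assumed names, not the patented implementation) of
# dynamically modifying cluster membership on the basis of the load observed
# on each node at a particular time stamp.
from dataclasses import dataclass
from typing import List


@dataclass
class NodeLoad:
    node_id: str
    load: float  # fraction of capacity in use at the sampled time stamp


def rebalance_cluster(active: List[NodeLoad],
                      standby: List[str],
                      high_water: float = 0.85,
                      low_water: float = 0.20) -> List[str]:
    """Return the revised set of node ids that should execute the tasks."""
    # Drop nodes that are already overloaded at this time stamp.
    keep = [n.node_id for n in active if n.load < high_water]
    # Promote standby nodes to replace the nodes that were dropped.
    removed = len(active) - len(keep)
    keep.extend(standby[:removed])
    # Optionally shrink the cluster when every remaining active node is nearly idle.
    if len(keep) > 1 and all(n.load < low_water for n in active if n.node_id in keep):
        keep = keep[:-1]
    return keep
```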
In one embodiment of the present invention, the system and method is adapted to schedule the derived data flow for execution of the big data processing tasks through a common data scheduler. The system and method further allows storing the schedule of each node of the distributed computing platform in the data flow repository for future reference. In one embodiment of the invention, the system and method is further configured to convert the extracted data into the target node's compatible format by injecting one or more rules. The transformation of extracted data can be performed by applying a series of rules/functions to derive the data in the end target node's compatible format for loading. In an embodiment, data extracted from some data sources will require very little or even no manipulation to meet the business and technical needs of the target node. The system and method is further configured to load the transformed data into the target node based on the scheduled modeled data flow.
In one embodiment of the invention, the above defined parallel data processing tasks of data extraction, data transformation and data load are performed by the distributed computing platform in a common data processing runtime environment (CDPR). This is achieved using the common data processing language (CPL) and with the help of common flow modeling. The common processing language further comprises a text query language (TQL) and a data transformation language (DTL) to extract data from unstructured and structured data sources respectively. The platform further connects to various structured and unstructured data sources and extracts or loads data in parallel by using the connector library. The platform is further adapted to monitor tasks and schedule the data processing tasks assigned to each node in the cluster in a common monitoring and common scheduling environment.
Various modifications to the embodiments will be readily apparent to those skilled in the art, and the generic principles herein may be applied to other embodiments. For example, although the present invention will be described in the context of a system and method for balancing load in a distributed computing platform for performing big data (large volume of structured and unstructured data) processing tasks, one of ordinary skill in the art will readily recognize that the system and method can be utilized in any situation where there is a need to extract, transform and load (ETL) data, especially in data warehousing, which involves extracting data from outside sources, transforming it to fit operational needs and loading it into the end target (database or data warehouse) for future analysis and references. Thus, the present invention is not intended to be limited to the embodiments illustrated, but is to be accorded the widest scope consistent with the principles and features described herein. Various embodiments of the present invention will now be described with the help of the appended figures.
Referring to figure 1, a system architecture diagram illustrates multiple system elements adapted for balancing load while performing big data processing tasks of data extraction, data transformation and data loading in a distributed computing platform (100) according to an exemplary embodiment of the invention. As illustrated in figure 1, the distributed computing platform (100) comprises a variety of data source nodes (104) receiving data from a combination of structured data sources (106) and unstructured data sources (108). The structured data sources (106) generally include data wherein specific information is stored based on a methodology of columns and rows. In contrast, the term unstructured data sources (108) refers to data that has no identifiable structure. A few examples of unstructured data include images, videos, email, documents, text, etc., which can all be combined within
a dataset. In an exemplary embodiment, the distributed computing platform (100) is an electronic device selected from a group consisting of a computer, a laptop, a Smartphone and combinations thereof. In one embodiment, the distributed computing platform (100) is a stand-alone electronic device. In alternative embodiments, the distributed computing platform (100) is an electronic device electronically coupled to various other electronic devices in a communication network.
As illustrated in figure 1, in an exemplary embodiment, the distributed computing platform (100) is further electronically coupled with a web based interface (102). In one embodiment, said web based interface (102) resides in the distributed computing platform (100). In another embodiment, the web based interface (102) is a computing device electronically coupled to the distributed computing platform (100) through a communication network. In alternative embodiments, said communication network is selected from a group consisting of a LAN, a MAN, a WAN, an intranet, the internet, Wi-Fi, a cellular network and combinations thereof. Further, the web based interface (102) is built on a common modeling platform, hereinafter referred to as an extreme data controller (XDC) (110), configured to model a dataflow and responsible for controlling and managing the injection of rules enabling transformation of data in the distributed computing platform (100).
Further, as illustrated in figure 1, the distributed computing platform (100) further comprises a report generator (112), a data scheduler (114), a data flow modeler (116), a rule engine (118), a monitoring module (120), a data profiler (122), a pre-processing module (124) and a data extraction module (126), wherein said data extraction module (126) further comprises a text query language module (128) and a data transform language module (130) for extracting data from the data source node (104) in assorted formats. Further, the system (100) comprises a data transforming module (132) that, along with said data extraction module (126) and a data loader (134), performs data processing jobs on a large volume of data such as big data. The system further comprises a target node (136) for loading the transformed data based on the scheduled modeled data flow.
In an exemplary embodiment of the invention, the method of the present invention enables balancing load in a distributed computing platform during processing of a large volume of structured and unstructured data such as big data. In an exemplary embodiment, such big data is extracted from the structured data sources (106) and the unstructured data sources (108) in the distributed computing platform (100) using a connector library (not shown in figure 1). The balancing of load in the platform is achieved by parallel execution of the data processing tasks of data extract, data transform and data load on a cluster of nodes. In one embodiment of the invention, the data can be extracted from the structured data source and the unstructured data source using a common data processing language (CPL). The structured data is extracted from the structured data source (106) such as relational databases like Oracle, MySQL or combinations thereof. The data stored in relational databases is arranged in a sequential manner and can be easily retrieved by using various third party open source tools. In an embodiment of the present invention, the data can also be extracted from the unstructured data sources (108) like email, rich documents, images, relational data, social media data, etc.
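The following Python sketch is offered for illustration only; it assumes a hypothetical connector-library interface (register_connector, extract) and does not reflect the actual CPL syntax, which is not detailed in this specification.

```python
# Hedged sketch of a connector library: a registry mapping a source type to
# an extraction callable. Source types and connector names are illustrative
# assumptions only.
from typing import Callable, Dict, Iterable

ConnectorFn = Callable[[str], Iterable[dict]]
_CONNECTORS: Dict[str, ConnectorFn] = {}


def register_connector(source_type: str):
    """Decorator registering an extraction routine for a given source type."""
    def wrap(fn: ConnectorFn) -> ConnectorFn:
        _CONNECTORS[source_type] = fn
        return fn
    return wrap


@register_connector("relational")
def extract_relational(uri: str) -> Iterable[dict]:
    # A real connector would open a database connection (e.g. MySQL/Oracle)
    # and stream rows; here we only illustrate the interface shape.
    yield {"source": uri, "kind": "structured"}


@register_connector("email")
def extract_email(uri: str) -> Iterable[dict]:
    yield {"source": uri, "kind": "unstructured"}


def extract(source_type: str, uri: str) -> Iterable[dict]:
    """Dispatch extraction to the connector registered for the source type."""
    return _CONNECTORS[source_type](uri)
```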
In an exemplary embodiment of the invention, the pre-processing module (124) is configured to identify the cluster of nodes that are assigned to execute the big data processing tasks in a common data processing runtime environment. The big data processing tasks are performed in a specific data flow at a particular time stamp based on the current load on the cluster of nodes. Further, the cluster of nodes (not shown in the figure) is the set of individual processing units capable of processing the big data in the distributed computing platform. The processing units can be any computing device comprising a processor, said processor coupled to a memory storing instructions that, when executed, enable the processor to process the big data processing tasks.
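For illustration only, the following sketch (identify_nodes is a hypothetical helper) shows how a set of least-loaded nodes could be selected from load readings taken at a particular time stamp.

```python
# Sketch (assumed interface) of how a pre-processing step might pick the set
# of nodes that will execute the data processing tasks at a given time stamp,
# based on the current load reported by each node in the cluster.
from typing import Dict, List


def identify_nodes(current_load: Dict[str, float], tasks: int) -> List[str]:
    """Return the least-loaded nodes, one per pending task (at most all nodes)."""
    ranked = sorted(current_load, key=current_load.get)  # ascending by load
    return ranked[:min(tasks, len(ranked))]


# Example: loads sampled at one time stamp.
# identify_nodes({"n1": 0.7, "n2": 0.2, "n3": 0.4}, tasks=2) -> ["n2", "n3"]
```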
In an exemplary embodiment of the invention, the data flow modeler (116) is configured to derive a common data flow model representing the flow of data, having both structured and unstructured data, as it is extracted from the source node and loaded into the target node. The data flow modeler is adapted to organize the network of the identified nodes and to design the flow of data amongst the network of nodes for further processing. The data flow modeler (116) further assists in interpreting the modeled flow and triggering the required tasks to complete the flow. The data flow modeler (116) is further enabled to arrange dependencies amongst the identified set of platforms. In an embodiment of the invention, the data flow modeler (116) further assists in classifying the load on each node of the cluster, which enables the pre-processing module (124) to identify and allocate specific nodes for processing specific tasks. In one embodiment of the invention, the derived current and any previous data flow models are stored in a data flow repository (not shown in the figure) for future reference and monitoring.
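The following Python sketch, provided for illustration only, models a data flow as a directed acyclic graph of tasks with dependencies and resolves an execution order; the class name DataFlowModel and its methods are assumptions and do not describe the modeler's internal design.

```python
# Illustrative data flow model: tasks as nodes of a DAG with dependencies,
# resolved into an execution order. Class and method names are assumptions.
from collections import deque
from typing import Dict, List, Set


class DataFlowModel:
    def __init__(self) -> None:
        self.deps: Dict[str, Set[str]] = {}  # task -> prerequisite tasks

    def add_task(self, task: str, depends_on: List[str] = ()) -> None:
        self.deps.setdefault(task, set()).update(depends_on)
        for d in depends_on:
            self.deps.setdefault(d, set())

    def execution_order(self) -> List[str]:
        """Topological order: a task runs only after its dependencies."""
        remaining = {t: set(d) for t, d in self.deps.items()}
        ready = deque(t for t, d in remaining.items() if not d)
        order: List[str] = []
        while ready:
            task = ready.popleft()
            order.append(task)
            for t, d in remaining.items():
                if task in d:
                    d.remove(task)
                    if not d:
                        ready.append(t)
        if len(order) != len(remaining):
            raise ValueError("cyclic dependency in the modeled data flow")
        return order


# Example flow: extract -> transform -> load, as in the specification.
flow = DataFlowModel()
flow.add_task("extract")
flow.add_task("transform", depends_on=["extract"])
flow.add_task("load", depends_on=["transform"])
# flow.execution_order() -> ["extract", "transform", "load"]
```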
In an exemplary embodiment of the invention, the distributed computing platform (100) further provides a common monitoring environment across the distributed computing platform to monitor the overall data processing tasks. More particularly, the data processing tasks of data extraction, data transformation and data load for data received from both structured and unstructured sources are monitored using a common monitoring environment such as the monitoring module (120). Further, the monitoring module (120) is adapted to monitor the performance and load on each node in the derived data flow for execution of the big data processing tasks. In one embodiment of the invention, the said platform (100) further configures the web-based interface (102) for monitoring and managing the modeled data flow. The data flow modeler (116) further enables dynamic modifications based on the load on each node in the cluster through said web based interface (102). It further enables modification of the cluster of nodes by adding or removing a node from the cluster on the basis of the load on each node at a particular time stamp within the distributed computing platform (100). In another embodiment of the invention, the modeler further allows a user to manipulate the identified cluster of nodes by adding or removing a node from the cluster for performing big data processing through the web based interface (102) by verifying the load on each node at a particular time stamp. Further, in one embodiment of the invention, the platform (100) enables scheduling of the modeled data flow through a common scheduler. The said scheduler is the data scheduler (114) configured to schedule said derived data flow for execution of the data processing tasks for big data received from both structured and unstructured sources. The data scheduler (114) is responsible for scheduling the data flow derived by said data flow modeler (116).
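As an illustrative sketch only, the following fragment records a per-node schedule of tasks against time stamps; DataScheduler and its methods are assumed names, not the specification's interfaces.

```python
# Sketch of a common scheduler that records, for each node, when it should
# run its part of the derived data flow; names and structure are illustrative.
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Tuple


@dataclass
class DataScheduler:
    # node id -> list of (time stamp, task name) entries
    schedule: Dict[str, List[Tuple[datetime, str]]] = field(default_factory=dict)

    def schedule_task(self, node_id: str, task: str, when: datetime) -> None:
        self.schedule.setdefault(node_id, []).append((when, task))

    def tasks_due(self, node_id: str, now: datetime) -> List[str]:
        """Tasks on this node whose scheduled time stamp has been reached."""
        return [task for when, task in self.schedule.get(node_id, []) if when <= now]
```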
In an exemplary embodiment of the invention, the big data can be extracted from a variety of data sources, such as structured and unstructured sources, using a common data processing language (CPL). The CPL enables extraction of unstructured and structured data from the variety of data sources by triggering the text query language (TQL) and the data transformation language (DTL) respectively. The system and method further enables the connector library to connect to the structured data sources and unstructured data sources for data extraction and retrieval. In an embodiment of the invention, the data extraction module (126) is configured to extract data from the data source node (104) in assorted formats, wherein said data extraction module (126) further comprises the text query language module (128) and the data transform language module (130). The data extraction module (126) is further adapted to extract data from the structured data source (106) or the unstructured data source (108). The text query language module (128) and the data transform language module (130) are configured to extract data from unstructured and structured data sources respectively. In an embodiment of the invention, the text query language module (128) is configured to extract the unstructured data using the text query language (TQL). In another embodiment of the invention, the data transform language module (130) is configured to extract the structured data using the data transform language (DTL). The data extraction module (126) further allows import and export of data from structured data stores such as relational databases, enterprise data warehouses, NoSQL systems, etc.
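For illustration, the following sketch routes an extraction request to a hypothetical TQL handler for unstructured sources or a DTL handler for structured sources; the actual TQL and DTL languages are not specified here, so the handlers are placeholders.

```python
# Hypothetical sketch of dispatching an extraction request to the text query
# language (TQL) module for unstructured sources or the data transform
# language (DTL) module for structured sources, as described above.
from typing import Iterable


def tql_extract(query: str, source: str) -> Iterable[str]:
    # Placeholder for running a TQL query against an unstructured source.
    return [f"TQL({query}) from {source}"]


def dtl_extract(query: str, source: str) -> Iterable[str]:
    # Placeholder for running a DTL statement against a structured source.
    return [f"DTL({query}) from {source}"]


def extract_with_cpl(query: str, source: str, structured: bool) -> Iterable[str]:
    """Route the request through the appropriate language module."""
    return dtl_extract(query, source) if structured else tql_extract(query, source)
```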
In an exemplary embodiment of the invention, the data transforming module (132) is further configured to convert the extracted data into the target node's (136) compatible format. The rules are generated by the rule engine (118), consisting of rules, functions or the like. The rule engine (118) is adapted to transform the extracted data by injecting one or more rules. The rules are used to externalize the data transformation rules and parameters, wherein the rule metadata is stored in a rule repository (not shown in figure 1). The transformed data is further loaded into the target node (136) by the data loader (134) based on the scheduled modeled data flow.
In an alternative embodiment, the data extracted from a variety of sources requires very little or even no manipulation, as the extracted data is already in the target node's (136) compatible format. In other cases, one or more of the transformation types are required to meet the business and technical needs of the target node (136).
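The rule-injection idea can be illustrated with the following sketch, in which apply_rules and the sample rule are assumptions; an empty rule list corresponds to the pass-through case described above, where the extracted data is already in the target node's compatible format.

```python
# Minimal sketch, under assumed names, of injecting transformation rules into
# the extracted records; when no rules apply, the data passes through unchanged.
from typing import Callable, Dict, Iterable, List

Rule = Callable[[Dict], Dict]


def uppercase_names(record: Dict) -> Dict:
    """Example rule: translate a free-form value into a canonical form."""
    if "name" in record:
        record = {**record, "name": str(record["name"]).upper()}
    return record


def apply_rules(records: Iterable[Dict], rules: List[Rule]) -> List[Dict]:
    """Apply the injected rules in sequence; an empty rule list is a pass-through."""
    out: List[Dict] = []
    for record in records:
        for rule in rules:
            record = rule(record)
        out.append(record)
    return out


# apply_rules([{"name": "alice"}], [uppercase_names]) -> [{"name": "ALICE"}]
```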
In addition to the above defined components, the system further comprises a report generator (112) configured to generate real-time dynamic reports from the transformed data, including data retrieval using JDBC (Java Database Connectivity), as well as support for parameters, expressions, variables and groups, for future reference and analysis. In one embodiment, the system further comprises a data profiler (122). The data profiler (122) profiles the source data (structured and unstructured) before data transformation for ensuring data quality while performing big data processing tasks.
Figure 2 illustrates a flow diagram (200) showing various steps implemented by various system elements collectively for balancing load while performing big data processing tasks in a distributed computing platform according to an exemplary embodiment of the invention.
At step (202), the system and method of the present invention identifies a set of nodes to execute big data processing tasks.
At step (204), the system and method of the present invention develops a model of a data flow representing the flow of data between identified set of nodes.
At step (206), the system and method of the present invention schedules the modeled data flow and stores the scheduled modeled flow for future references.
The system and method of the invention then concurrently executes the steps (208) to (212). At step (208), the system and method of the present invention extracts data from a source node in assorted formats.
At step (210), the system and method of the present invention further transforms the extracted data into one or more formats by injecting one or more rules; and
At step (212) the system and method of the present invention loads the transformed data into the target node for analysis and future references.
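Tying the steps of figure 2 together, the following illustrative sketch (all helper names assumed) runs the extract, transform and load steps of steps (208) to (212) concurrently over a set of splits using a thread pool; it is a sketch under those assumptions, not the claimed implementation.

```python
# End-to-end sketch of the flow of figure 2 under assumed helper names: after
# the nodes are identified and the flow is modeled and scheduled, the extract,
# transform and load steps for the splits run concurrently in a thread pool.
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def extract(split_id: int) -> List[Dict]:
    return [{"split": split_id, "value": split_id * 10}]


def transform(records: List[Dict]) -> List[Dict]:
    return [{**r, "value": r["value"] + 1} for r in records]


def load(records: List[Dict]) -> int:
    # A real loader would write to the target node; here we just count records.
    return len(records)


def run_split(split_id: int) -> int:
    """Steps (208) to (212) for a single split: extract, transform, load."""
    return load(transform(extract(split_id)))


def run_pipeline(splits: List[int]) -> int:
    with ThreadPoolExecutor() as pool:  # concurrent execution across splits
        return sum(pool.map(run_split, splits))


# run_pipeline([1, 2, 3]) -> 3 records loaded, one per split
```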
ADVANTAGES OF THE INVENTION
The present invention enables a reduction in the time required to process a large volume of data extracted from a variety of data sources.
The present invention also enables data extraction from unstructured data sources consisting of a variety of data like email, rich documents, images, relational data, social media data, etc.
The present invention enables load balancing in a distributed computing platform while performing parallel data processing tasks such as data extraction, data transformation and data loading from a source node to a target node.
The present invention enables transformation of extracted data based on the target node's compatible format by injecting certain rules.
The present invention enables modeling of the data flow representing the flow of data to extract and load data from a source node to a target node, and further enables a user to monitor it through a web based interface.
The methodology and techniques described with respect to the exemplary embodiments can be performed using a machine or other computing device within which a set of instructions, when executed, may cause the machine to perform any one or more of the methodologies discussed above. The machine may comprise a server computer, a client user computer, a
personal computer (PC), a tablet PC, a laptop computer, a desktop computer, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, while a single machine is illustrated, the term "machine" shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.
The machine may include a processor (e.g., a central processing unit (CPU)) and a memory, which communicate with each other via a bus. The memory stores instructions that, when executed, may cause the processor of the machine to perform any one or more of the methodologies discussed above.
The illustrations of arrangements described herein are intended to provide a general understanding of the structure of various embodiments, and they are not intended to serve as a complete description of all the elements and features of systems that might make use of the structures described herein. Many other arrangements will be apparent to those of skill in the art upon reviewing the above description. Other arrangements may be utilized and derived therefrom, such that structural and logical substitutions and changes may be made without departing from the scope of this disclosure. Figures are also merely representational and may not be drawn to scale. Certain proportions thereof may be exaggerated, while others may be minimized. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.
CLAIMS:
1. A method for load balancing in a distributed computing platform characterized by
parallel execution of data processing tasks of data extract, data transform and data
load on a cluster of nodes, wherein said data being extracted from a structured or an
unstructured data source, the method comprising processor implemented steps of:
a) identifying a set of nodes in the cluster to execute said data processing tasks at a particular time stamp on the basis of current load on the cluster of nodes;
b) modeling a data flow representing the flow of data to extract and load data from a source node to target node and dependencies between tasks amongst identified set of nodes;
c) scheduling the modeled data flow for execution of the data processing tasks;
d) concurrently executing the steps of:
i. extracting data from a source node in assorted formats;
ii. transforming the extracted data into the target node's compatible format by injecting one or more rules; and
iii. loading the transformed data to the target node based on the scheduled modeled data flow; and
e) dynamically modifying the modeled and scheduled data flow through a web-based interface.
2. The method of claim 1, wherein said assorted formats can be selected from group consisting of structured, unstructured data sources or combinations thereof.
3. The method of claim 2, wherein said structured data sources can be selected from group consisting of relational databases, flat files or Information Management System (IMS), data structures or combinations thereof.
4. The method of claim 2, wherein said unstructured data sources can be selected from group consisting of email, rich documents, images, relational data, social media data or combinations thereof.
5. The method of claim 1, wherein said one or more rules are injected to select certain columns in the database, translating coded values, encoding free-form values, deriving a new calculated value, sorting, joining data from multiple sources, aggregation, disaggregation or alike or combinations thereof.
6. The method of claim 1, wherein said modeled data flow is stored in a flow repository for future reference and monitoring.
7. The method of claim 1, wherein each node in the scheduled model flow is assigned to perform the data processing tasks of extracting data, transforming data and loading data at the specified time period.
8. A system for load balancing in a distributed computing platform characterized by parallel execution of data processing tasks of data extract, data transform and data load on a cluster of nodes, wherein said data being extracted from a structured or an unstructured data source, the system comprising:
a) a pre-processing module adapted to identify a set of nodes in the cluster to execute said data processing tasks at a particular time stamp on the basis of current load on the cluster of nodes;
b) a data flow modeler configured to derive a data flow representing the flow of data to extract and load data from a source node to target node and orchestrate dependencies between tasks amongst identified set of nodes;
c) a data scheduler adapted to schedule the derived data flow for execution of the data processing tasks;
d) a data extraction module to extract data from a data source node in assorted formats wherein said data extraction module further comprising of a text query language module and a data transformation language module;
e) a data transforming module to convert the extracted data into the target node's compatible format by injecting one or more rules;
f) a data loader adapted to load the transformed data to the target node based on the scheduled modeled data flow;
g) a report generator configured to generate a real-time report from the transformed data for future references and analysis; and
h) an extreme data controller adapted to dynamically modifying the modeled and scheduled data flow through a web-based interface.
9. The system of claim 8, further comprising a data flow repository storing the current and previous data flow models for future references and monitoring.
10. The system of claim 8, wherein said one or more rules are injected by a rule-engine from a rule repository database.