Abstract: A system and a method for managing distributed data processing across a distributed computing environment. The method includes receiving a data processing task with one or more template features via a user interface (UI) and determining, by a processor, one or more requirements of the data processing task. Determining the one or more requirements comprises at least one of: understanding input data, a desired output, a processing logic, one or more dependencies, and a plurality of additional configurations or parameters. A set of jobs is created and defined based on a job schedule. The method includes compiling, by the processor, the one or more jobs into an executable format for the distributed computing environment. The method includes executing the one or more jobs through one or more nodes in the distributed computing environment, and aggregating, by the processor, results of the one or more jobs and storing the results. FIGURE 5
FORM 2
THE PATENTS ACT, 1970 (39 of 1970) THE PATENTS RULES, 2003
COMPLETE SPECIFICATION
(See section 10; rule 13)
TITLE OF THE INVENTION
DISTRIBUTED DATA PROCESSING ORCHESTRATOR SYSTEM AND METHOD THEREOF
APPLICANT
JIO PLATFORMS LIMITED
of Office-101, Saffron, Nr. Centre Point, Panchwati 5 Rasta, Ambawadi, Ahmedabad -
380006, Gujarat, India; Nationality: India
The following specification particularly describes
the invention and the manner in which
it is to be performed
RESERVATION OF RIGHTS
[0001] A portion of the disclosure of this patent document contains material that is subject to intellectual property rights such as, but not limited to, copyright, design, trademark, Integrated Circuit (IC) layout design, and/or trade dress protection, belonging to Jio Platforms Limited (JPL) or its affiliates (hereinafter referred to as the owner). The owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all rights whatsoever. All rights to such intellectual property are fully reserved by the owner.
FIELD OF DISCLOSURE
[0002] The present disclosure relates to the field of distributed computing or data engineering. More particularly, the present disclosure relates to a system and a method for managing an execution of distributed data processing tasks across a distributed
computing environment.
BACKGROUND OF DISCLOSURE
[0003] The following description of related art is intended to provide
background information pertaining to the field of the disclosure. This section may
include certain aspects of the art that may be related to various features of the present disclosure. However, it should be appreciated that this section is used only to enhance the understanding of the reader with respect to the present disclosure, and not as an admission of prior art.
[0004] Traditionally, several problems are encountered in executing tasks on a distributed computing environment. The execution of tasks on a distributed computing cluster involves manual code development, where users have to write code for each job or task by hand, which consumes time and requires expertise in distributed computing frameworks. Further, deploying the manually written code on the production environment and configuring it for the distributed cluster are complex and
error-prone processes, involving infrastructure setup, dependency management, and compatibility concerns.
[0005] Furthermore, scaling task execution across the cluster was difficult, requiring manual workload distribution, resource monitoring, and performance optimization. In addition, there was a lack of flexibility, as adapting to changing requirements or adding new tasks might require modifying or writing new code, disrupting workflows and thereby causing delays. There were also technical expertise requirements for utilizing a distributed computing environment, which made it inaccessible to users without expertise in distributed systems.
[0006] Hence, the conventional systems and methods face difficulty in streamlining task execution, eliminating manual code development, simplifying deployment, improving scalability, enabling flexibility, and reducing the requirement for technical expertise.
[0007] There is, therefore, a need in the art to provide a method and a system that can overcome the shortcomings of the existing prior art.
OBJECTS OF THE PRESENT DISCLOSURE
[0008] Some of the objects of the present disclosure, which at least one embodiment herein satisfies, are listed herein below.
[0009] An object of the present disclosure is to provide a system and method to allow
users to create jobs and tasks without the need for programming expertise.
[0010] An object of the present disclosure is to provide a system and method that eliminates the need for manual code deployment, saving users significant time and effort.
[0011] An object of the present disclosure is to provide a system and method to
improve efficiency in distributed data processing.
[0012] An object of the present disclosure is to provide a system and method to enable
quick adaptation to change in requirements.
[0013] An object of the present disclosure is to promote interoperability by abstracting the underlying infrastructure and providing a unified interface for managing and executing tasks.
[0014] These and other objectives and advantages of the embodiments of the present invention will become readily apparent from the following detailed description taken in conjunction with the accompanying drawings.
SUMMARY
[0015] The present disclosure discloses a system and a method for managing an
execution of distributed data processing tasks across a distributed computing environment. The system and method enable dynamic job creation and deployment on a distributed computing environment without the requirement of coding. The dynamic job creation and deployment on a distributed computing environment include multiple-job scheduling, job compilation, parallel execution, data aggregation, and execution of tasks over the distributed computing environment. The data is collected from the distributed file system and is pre-processed. The system then performs computation over the pre-processed data and sends the computed response to the user interface (UI). The user receives a notification of network performance based on the executed task. In this manner, the presented solution saves time, eliminates manual effort, and improves the efficiency of distributed data processing.
In an embodiment, a method for managing distributed data processing in a distributed computing environment includes receiving a data processing task with one or more template features via a user interface (UI). The one or more template features include a set of job parameters. The method includes determining, by a processor, one or more requirements of the data processing task. The determining of the one or more requirements comprises at least one of: understanding input data, a desired output, a processing logic, one or more dependencies, and a plurality of additional configurations or parameters, by creating and defining a set of jobs based on a job schedule. The method includes compiling, by the processor, the one or more jobs into an executable format for the
distributed computing environment based on the determination. The method includes executing, by the processor, the one or more jobs through one or more nodes in the distributed computing environment. The method includes aggregating, by the processor, results of the one or more jobs and storing the results.
[0016] In an embodiment, determining the one or more requirements includes incorporating, by the processor, one or more fault tolerance mechanisms, detecting, by the processor, one or more failures in execution of the jobs, and re-routing the jobs associated with the one or more failures to healthy nodes.
[0017] In an embodiment, executing the one or more jobs in parallel includes dividing input data into one or more chunks, and distributing the one or more chunks to corresponding nodes across available processing nodes in the distributed computing environment based on at least one of: load balancing strategies and data locality.
[0018] In an embodiment, aggregating the results further comprises performing at least one of: merging results from the one or more nodes, combining partial results, or applying specific aggregation functions comprising at least one of: a sum, an average, and a count, based on the one or more requirements of the data processing task.
[0019] In an embodiment, a system for managing distributed data processing across a distributed computing environment includes a memory for storing one or more executable modules and data associated with one or more parameters associated with the distributed data processing, an interface to facilitate communication through the system, and a processor for executing the one or more executable modules for managing and coordinating an execution of distributed data processing tasks across a network of interconnected computing resources. The one or more executable modules include a dynamic job module configured to receive a data processing task with one or more template features via a user interface (UI) and determine one or more requirements of the data processing task. The one or more template features include a set of job parameters. The determining of the one or more requirements comprises at least one of: understanding input data, a desired output, a processing logic, one or more dependencies, and a plurality of additional configurations or parameters,
by creating and defining a set of jobs based on a job schedule. The one or more executable modules include a compilation module configured to compile one or more jobs into an executable format for the distributed computing environment based on the determination and execute the one or more jobs through one or more nodes in the distributed computing environment. The one or more executable modules include a collection module configured to aggregate results of the one or more jobs and store the results.
[0020] In an embodiment, the dynamic job module is further configured to incorporate one or more fault tolerance mechanisms, detect one or more failures, and reroute the jobs associated with the one or more failures to healthy nodes.
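The re-routing behaviour described in the embodiment above can be illustrated with a brief, non-limiting sketch. The `submit` callable, the node names, and the retry bound below are illustrative assumptions, not part of the disclosed system:

```python
import random

def run_with_failover(job, nodes, submit, max_attempts=3):
    """Run a job on one of the given nodes; on a detected failure,
    re-route the job to one of the remaining healthy nodes."""
    healthy = list(nodes)
    for _ in range(max_attempts):
        if not healthy:
            break
        node = random.choice(healthy)      # pick any currently healthy node
        try:
            return submit(node, job)       # attempt execution on that node
        except RuntimeError:
            healthy.remove(node)           # failure detected: drop the node
    raise RuntimeError(f"job {job!r} could not be placed on a healthy node")
```

In practice, `submit` would dispatch the job to a cluster node, and failure detection would rely on heartbeats or timeouts rather than a raised exception.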
[0021] In an embodiment, the compilation module is further configured to schedule the one or more jobs to be executed in parallel, distribute the one or more jobs across multiple nodes, split a job into smaller tasks, and distribute the smaller tasks to different nodes for concurrent execution.
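A minimal sketch of the scheduling behaviour described above, splitting a job into smaller tasks and running them concurrently, is shown below; a thread pool stands in for the cluster nodes, and the chunking scheme is an illustrative assumption:

```python
from concurrent.futures import ThreadPoolExecutor

def split(data, n_tasks):
    """Divide the input data into n_tasks roughly equal chunks."""
    k, m = divmod(len(data), n_tasks)
    return [data[i * k + min(i, m):(i + 1) * k + min(i + 1, m)]
            for i in range(n_tasks)]

def run_parallel(data, task, n_workers=4):
    """Split a job into smaller tasks and execute them concurrently,
    one worker standing in for each node of the cluster."""
    chunks = split(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(task, chunks))  # one partial result per chunk
```

For example, `run_parallel(list(range(10)), sum, n_workers=3)` yields per-chunk subtotals that a collection module could later merge.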
[0022] In an embodiment, the collection module is further configured to perform at least one of: merging results from the one or more nodes, combining partial results, or applying specific aggregation functions comprising at least one of: sum, average, and count, based on the requirements of the data processing task.
[0023] In an embodiment, a computer program product comprises a non-transitory computer-readable medium comprising instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of: receiving a data processing task with one or more template features via a user interface (UI), wherein the one or more template features include a set of job parameters; determining, by a processor, one or more requirements of the data processing task, wherein determining the one or more requirements comprises at least one of: understanding input data, a desired output, a processing logic, one or more dependencies, and a plurality of additional configurations or parameters, by creating and defining a set of jobs based on a job schedule; compiling, by the processor, one or more jobs into an executable format for the distributed computing environment based
on the determination; executing, by the processor, the one or more jobs through one or more nodes in the distributed computing environment; and aggregating, by the processor, results of the one or more jobs and storing the results.
[0024] In an embodiment, a user equipment is communicatively coupled to a system. The user equipment includes a processor and a computer-readable storage medium storing programming for execution by the processor, the programming including instructions to: provide a data processing task with one or more template features via a user interface (UI), wherein the one or more template features include a set of job parameters; and receive results of the data processing task on the UI based on processing the data processing task using the method for managing distributed data processing in a distributed computing environment.
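The aggregation step recited in the embodiments above, merging per-node results with a sum, an average, or a count, can be sketched as follows. The `(subtotal, count)` partial-result format is an illustrative assumption; note that an average must be computed from merged sums and counts, since averaging per-node averages is incorrect when chunks have unequal sizes:

```python
def aggregate(partials, how="sum"):
    """Merge per-node partial results, given as (subtotal, count) pairs,
    by applying a specific aggregation function: sum, average, or count."""
    total = sum(subtotal for subtotal, _ in partials)
    count = sum(c for _, c in partials)
    if how == "sum":
        return total
    if how == "count":
        return count
    if how == "average":
        return total / count if count else 0.0
    raise ValueError(f"unsupported aggregation function: {how}")
```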
[0025] The various embodiments of the present disclosure provide several advantages, including simplified job creation, where users can create jobs or tasks without the need for programming knowledge. This user-friendly approach makes it accessible to a wider range of users, regardless of their technical background. The system automates job deployment and execution, eliminating the need for manual code development and deployment, which saves significant time and effort for users, allowing them to focus on job requirements and analysis. The present disclosure allows the user to create dynamic jobs tailored to specific requirements. Additionally, the present disclosure allows the users to define parameters, configurations, and dependencies, enabling flexibility in executing various tasks on the distributed cluster. With automated job deployment and execution, the present disclosure improves overall efficiency in distributed data processing. The users can focus on defining requirements and analyzing results instead of dealing with code intricacies. The present disclosure enables quick adaptation to changing requirements by easily creating new jobs or modifying existing ones. This promotes agility and responsiveness in addressing evolving data processing needs. The system in the present disclosure simplifies job creation, saves time and effort, enables dynamic task execution, enhances efficiency, and offers agility in adapting to changing requirements. In an aspect, the most unique
feature of the present disclosure lies in its ability to enable dynamic job creation and deployment on a distributed computing environment without the requirement of coding. This feature allows users to define job requirements and configurations through a user-friendly platform, eliminating the need for manual coding and making it accessible to a wider range of users. By automating the deployment and execution processes, the present disclosure streamlines job execution, saving time and effort. Hence, the present disclosure overcomes the technical barriers, enhances efficiency, and provides a flexible solution that can easily adapt to changing requirements. The present disclosure provides a system for efficiently executing distributed data processing orchestration. The present disclosure provides a system wherein users can create jobs or tasks without the need for programming knowledge, as the user-friendly approach makes it accessible to a wider range of users, regardless of their technical background. The present disclosure provides a system that automates job deployment and execution, eliminating the need for manual code development and deployment, thus saving significant time and effort for users and allowing them to focus on job requirements and analysis. The present disclosure provides a system that allows for the creation of dynamic jobs tailored to specific requirements, wherein users can define parameters, configurations, and dependencies, enabling flexibility in executing various tasks on the distributed cluster. The present disclosure provides a system that improves overall efficiency in distributed data processing so that users can focus on defining requirements and analyzing results instead of dealing with code intricacies. The present disclosure provides a system that enables quick adaptation to changing requirements by easily creating new jobs or modifying existing ones to promote agility and responsiveness in addressing evolving data processing needs. The present disclosure provides a system that simplifies job creation, saves time and effort, enables dynamic task execution, enhances efficiency, and offers agility in adapting to changing requirements. The users are allowed to create jobs or tasks without the need for programming knowledge. This user-friendly approach makes it accessible to a wider range of users, regardless of their technical background. The system automates job
deployment and execution, eliminating the need for manual code development and deployment. This saves significant time and effort for users, allowing them to focus on job requirements and analysis. It allows for the creation of dynamic jobs tailored to specific requirements. Users can define parameters, configurations, and dependencies, enabling flexibility in executing various tasks on the distributed cluster. With automated job deployment and execution, the present disclosure improves overall efficiency in distributed data processing. Users can focus on defining requirements and analyzing results instead of dealing with code intricacies. It enables quick adaptation to changing requirements by easily creating new jobs or modifying existing ones. This promotes agility and responsiveness in addressing evolving data processing needs. In summary, the present disclosure simplifies job creation, saves time and effort, enables dynamic task execution, enhances efficiency, and offers agility in adapting to changing requirements.
[0026] Various objects, features, aspects, and advantages of the inventive subject matter will become apparent from the following detailed description of preferred embodiments, along with the accompanying drawing figures, in which like numerals represent like components.
[0027] The foregoing description of the specific embodiments will so fully reveal the general nature of the embodiments of the present invention that others can, by applying current knowledge, readily modify and/or adapt such specific embodiments for various applications without departing from the generic concept, and, therefore, such adaptations and modifications are intended to be comprehended within the meaning and range of equivalents of the disclosed embodiments.
BRIEF DESCRIPTION OF THE DRAWINGS
[0028] The specification of the present disclosure is accompanied by drawings of the system and method to aid a better understanding of the invention. The drawings are in no way limitations of the present disclosure; rather, they are meant to illustrate the ideal embodiments of the disclosure.
[0029] In the figures, similar components and/or features may have the same reference label. Further, various components of the same type may be distinguished by following the reference label with a second label that distinguishes among the similar components. If only the first reference label is used in the specification, the description is applicable to any one of the similar components having the same first reference label irrespective of the second reference label.
[0030] FIG. 1 illustrates an exemplary architecture for managing and coordinating an execution of distributed data processing tasks across a network of interconnected computing resources, according to an embodiment of the present disclosure;
[0031] FIG. 2 illustrates an exemplary block diagram of all the executing modules in the system, according to an embodiment of the present disclosure;
[0032] FIG. 3 illustrates a block diagram representing a process flow in a distributed data processing orchestrator, according to an embodiment of the present disclosure;
[0033] FIG. 4 illustrates an exemplary flow diagram for execution on the distributed data processing orchestrator, according to an embodiment of the present disclosure;
[0034] FIG. 5 illustrates a flowchart of a method for managing and coordinating an execution of distributed data processing tasks across a network of interconnected computing resources, according to an embodiment of the present disclosure; and
[0035] FIG. 6 illustrates an exemplary computer system in which or with which embodiments of the present disclosure may be implemented.
[0036] FIG. 7 illustrates another exemplary computer system in which or with which embodiments of the present disclosure may be implemented.
[0037] The foregoing shall be more apparent from the following more detailed description of the disclosure.
LIST OF REFERENCE NUMERALS
100- Network Architecture
102-1,102-2...102-N- User equipment
104-1, 104-2…104-N- Computing device
106- Network
108-System
110- Distributed Data processing unit
202- Processor(s)
204- Memory
206- Interface(s)
208- Processing Engine
210- Dynamic jobs module
212- Compilation Module
214- Collection Module
216- Other modules
218-Database
302- User
304- User Interface
314- Job template creation
322- Data lake
326- Job processing
330- Job deployment
602- Input devices
604- Central processing unit (CPU)
606- Data flow & Control flow
608- Output devices
610- Secondary storage devices
612- Control unit
614- Arithmetic and Logical Unit
616- Memory unit
DETAILED DESCRIPTION OF THE EMBODIMENTS
[0038] In the following description, for explanation, various specific details are outlined in order to provide a thorough understanding of embodiments of the present disclosure. It will be apparent, however, that embodiments of the present disclosure may be practiced without these specific details. Several features described hereafter can each be used independently of one another or with any combination of other features. An individual feature may not address all of the problems discussed above or might address only some of the problems discussed above. Some of the problems discussed above might not be fully addressed by any of the features described herein.
[0039] The ensuing description provides exemplary embodiments only and is not
intended to limit the scope, applicability, or configuration of the disclosure. Rather, the ensuing description of the exemplary embodiments will provide those skilled in the art with an enabling description for implementing an exemplary embodiment. It should be understood that various changes may be made in the function and arrangement of elements without departing from the spirit and scope of the disclosure as set forth.
[0040] Specific details are given in the following description to provide a thorough understanding of the embodiments. However, it will be understood by one of ordinary skill in the art that the embodiments may be practiced without these specific details. For example, circuits, systems, networks, processes, and other components may be shown as components in block diagram form in order not to obscure the embodiments
in unnecessary detail. In other instances, well-known circuits, processes, algorithms, structures, and techniques may be shown without unnecessary detail to avoid obscuring the embodiments.
[0041] Also, it is noted that individual embodiments may be described as a process that is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or
a block diagram. Although a flowchart may describe the operations as a sequential
process, many of the operations can be performed in parallel or concurrently. In
addition, the order of the operations may be re-arranged. A process is terminated when
its operations are completed but could have additional steps not included in a figure. A
process may correspond to a method, a function, a procedure, a subroutine, a
subprogram, etc. When a process corresponds to a function, its termination can correspond to a return of the function to the calling function or the main function.
[0042] The word “exemplary” and/or “demonstrative” is used herein to mean serving as an example, instance, or illustration. For the avoidance of doubt, the subject matter disclosed herein is not limited by such examples. In addition, any aspect or design described herein as “exemplary” and/or “demonstrative” is not necessarily to be construed as preferred or advantageous over other aspects or designs, nor is it meant to preclude equivalent exemplary structures and techniques known to those of ordinary skill in the art. Furthermore, to the extent that the terms “includes,” “has,” “contains,”
and other similar words are used in either the detailed description or the claims, such terms are intended to be inclusive like the term “comprising” as an open transition word without precluding any additional or other elements.
[0043] Reference throughout this specification to “one embodiment” or “an embodiment” or “an instance” or “one instance” means that a particular feature,
structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Thus, the appearances of the phrases
manner in one or more embodiments.
[0044] The terminology used herein is for describing particular embodiments only and is not intended to limit the disclosure. As used herein, the singular forms “a”, “an”, and “the” are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features,
integers, steps, operations, elements, and/or components, but do not preclude the
presence or addition of one or more other features, integers, steps, operations, elements,
components, and/or groups thereof. As used herein, the term “and/or” includes any
combinations of one or more of the associated listed items.
[0045] In an embodiment, the system and method are described for task execution on a distributed computing environment. The system and method enable dynamic job creation and deployment on a distributed computing environment without the requirement of coding. The dynamic job creation and deployment on a distributed computing environment include multiple-job scheduling, job compilation, parallel execution, and data aggregation during execution of tasks over the distributed computing environment. The data is collected from the distributed file system and is pre-processed. Further, the system performs computation over the pre-processed data and sends the computed response to the user interface (UI). The user receives a notification of network performance based on the executed task. In this manner, the present disclosure saves time, eliminates manual effort, and improves the efficiency of distributed data processing.
[0046] In an embodiment, the distributed data processing orchestrator operates by a method in which users interact with a user-friendly interface to create jobs without coding, defining parameters and dependencies. In the present disclosure, the users input
specifications for their jobs, including data sources, computations, and output format. Further, the present disclosure includes compiling the job specifications into an executable format for the distributed computing environment. The jobs are executed in parallel across the distributed cluster, distributing the workload effectively, and the results are aggregated and collected in a centralized location for easy access and
analysis. The distributed data processing orchestrator simplifies task execution on a distributed computing environment. Additionally, the users can create jobs without coding, specify job requirements, and the method automates job compilation, parallel execution, and data aggregation. This streamlined approach saves time, eliminates manual effort, and improves the efficiency of distributed data processing.
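The end-to-end flow described above, a job specification entered via the UI, compiled into executable chunks, run in parallel, and aggregated, can be summarised in a non-limiting sketch. The dictionary-based job specification and the in-process execution loop are illustrative assumptions standing in for the UI and the distributed cluster:

```python
def orchestrate(spec, nodes):
    """Sketch of the orchestrator: turn a declarative job specification into
    executable chunks, run them, and aggregate the partial results."""
    data, logic = spec["input"], spec["logic"]
    # "Compile" the specification into one job per node-sized chunk.
    size = max(1, len(data) // len(nodes))
    jobs = [data[i:i + size] for i in range(0, len(data), size)]
    # Execute the jobs (a real system would dispatch them to cluster nodes).
    partials = [logic(chunk) for chunk in jobs]
    # Aggregate the partial results and return them for display on the UI.
    return spec["aggregate"](partials)

# A hypothetical job: total of the numbers 1..8, split across two nodes.
spec = {"input": list(range(1, 9)), "logic": sum, "aggregate": sum}
result = orchestrate(spec, ["node-1", "node-2"])  # 10 + 26 = 36
```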
[0047] The term UI, as used herein, refers to a user interface, which is the point of human-computer interaction and communication in a device.
[0048] The term data lake, as used herein, refers to a centralized repository designed to store, process, and secure large amounts of structured, semi-structured, and unstructured data. It can store data in its native format and process any variety of it, ignoring size limits.
[0049] The term distributed computing environment, as used herein, refers to an environment that divides resource-intensive processes among multiple CPUs and computers. Distributed computing makes multiple computers work together to solve a common problem.
[0050] The term node, as used herein, can refer to a server, a client, or a peer. Nodes communicate with each other over a network, sharing information and collaborating to perform tasks.
[0051] The term data processing, as used herein, refers to an approach of handling and analyzing data across multiple interconnected devices or nodes.
[0052] The term parallel execution, as used herein, refers to an approach of processing data simultaneously to increase the computational speed of a computer system.
[0053] The term synchronization mechanism, as used herein, refers to a technique that ensures the correct and consistent operation of concurrent tasks in a system.
[0054] The term fault tolerance mechanism, as used herein, involves tuning the distributed environment to continue operating without interruption when one or more of its components fail.
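As a brief, non-limiting illustration of the synchronization mechanism defined above, the sketch below uses a lock from Python's standard library so that concurrent tasks update a shared result consistently; the counter and thread counts are illustrative assumptions:

```python
import threading

counter = 0
lock = threading.Lock()              # the synchronization mechanism

def record(increments):
    """Each concurrent task adds to the shared counter under the lock."""
    global counter
    for _ in range(increments):
        with lock:                   # serialise access to shared state
            counter += 1

threads = [threading.Thread(target=record, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# With the lock held for each update, counter is exactly 4 * 1000 = 4000.
```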
[0055] The various embodiments throughout the disclosure will be explained in more detail with reference to FIGS. 1-7.
[0056] The present disclosure relates to the field of distributed computing or data engineering. More precisely, the present disclosure relates to the system and method for managing and coordinating the execution of distributed data processing tasks across
a network of interconnected computing resources.
[0057] In an embodiment, the present disclosure discloses a system and method for task execution on a distributed computing environment. The system and method enable dynamic job creation and deployment on a distributed computing environment without the requirement of coding. The dynamic job creation and deployment on a distributed computing environment include multiple-job scheduling, job compilation, parallel execution, data aggregation, and execution of tasks over the distributed computing environment. The data is collected from the distributed file system and is pre-processed. The system then performs computation over the pre-processed data and
sends the computed response to the User Interface (UI). The user receives a notification
for network performance based on the executed task. In this manner, the presented solution saves time, eliminates manual effort, and improves the efficiency of distributed data processing.
[0058] FIG. 1 illustrates an exemplary architecture 100 for managing and coordinating an execution of distributed data processing tasks across a network of interconnected computing resources, according to an embodiment of the present disclosure. In an aspect, the execution of distributed data processing tasks across a network of interconnected computing resources may be executed through a mobile application.
[0059] In an exemplary aspect, the interconnected computing resources may be hardware and software components within a network. The interconnected computing resources may enable communication between one node and another node. The interconnected computing resources may enable data processing between one node and another node. The interconnected computing resources may enable resource sharing between one node and another. For example, the node may be a client, a server, or a peer. The node may be a computing device.
[0060] Referring to FIG. 1, the network architecture 100 may include one or more computing devices or user equipments 104-1, 104-2…104-N associated with one or more users 102-1, 102-2…102-N in an environment. A person of ordinary skill in the art will understand that the one or more users 102-1, 102-2…102-N may be individually referred to as the user 102 and collectively referred to as the users 102. Similarly, a person of ordinary skill in the art will understand that the one or more user equipments 104-1, 104-2…104-N may be individually referred to as the user equipment 104 and collectively referred to as the user equipments 104. A person of ordinary skill in the art will appreciate that the terms “computing device(s)” and “user equipment” may be used interchangeably throughout the disclosure. Although three user equipments 104 are depicted in FIG. 1, any number of the user equipments 104 may be included without departing from the scope of the ongoing description.
[0061] In an embodiment, the user equipment 104 may include, but is not limited to, a handheld wireless communication device (e.g., a mobile phone, a smart phone, a phablet device, and so on), a wearable computer device (e.g., a head-mounted display computer device, a head-mounted camera device, a wristwatch computer device, and so on), a Global Positioning System (GPS) device, a laptop computer, a tablet computer or another type of portable computer, a media playing device, a portable gaming system, and/or any other type of computer device with wireless communication capabilities, and the like. In an embodiment, the user equipment 104 may include, but is not limited to, any electrical, electronic, electro-mechanical equipment, or a combination of one or more of the above devices, such as virtual reality (VR) devices, augmented reality (AR) devices, a laptop, a general-purpose computer, a desktop, a personal digital assistant, a tablet computer, a mainframe computer, or any other computing device, wherein the user equipment 104 may include one or more in-built or externally coupled accessories including, but not limited to, a visual aid device such as a camera, an audio aid, a microphone, a keyboard, and input devices for receiving input from the user 102 or the entity, such as a touch pad, a touch enabled screen, an electronic pen, and the like. A person of ordinary skill in the art will appreciate that the user equipment 104 may not be restricted to the mentioned devices and various other devices may be used. The architecture includes a system 108 which performs the execution of distributed data processing tasks across a network of interconnected computing resources by receiving an input via the user equipment 104.
[0062] In an embodiment, the user equipment 104 may include smart devices operating in a smart environment, for example, an Internet of Things (IoT) system. In such an embodiment, the user equipment 104 may include, but is not limited to, smart phones, smart watches, smart sensors (e.g., mechanical, thermal, electrical, magnetic, etc.), networked appliances, networked peripheral devices, networked lighting systems, communication devices, networked vehicle accessories, networked vehicular devices, smart accessories, tablets, smart televisions (TVs), computers, smart security systems, smart home systems, other devices for monitoring or interacting with or for the users 102 and/or entities, or any combination thereof. A person of ordinary skill in the art will appreciate that the user equipment 104 may include, but is not limited to, intelligent, multi-sensing, network-connected devices, which can integrate seamlessly with each other and/or with a central server or a cloud-computing system or any other device that is network-connected.
[0063] In an aspect, the architecture includes a distributed data processing unit 110 coupled with a system 108 which performs the execution of distributed data processing tasks across a network of interconnected computing resources by receiving an input from the user through the user equipment 104.
[0064] FIG. 2 illustrates an exemplary block diagram of the executing modules in the system 108, according to an embodiment of the present disclosure.
[0065] As illustrated in FIG. 2, an exemplary block diagram of the executing modules in the system 108 comprising a dynamic jobs module 210, a compilation module 212, a collection module 214 and other executing modules 216 is disclosed. The system also includes a processor 202, a memory 204, an interface 206, and a database 218 to facilitate storing of relevant data and fetching and execution of instructions.
[0066] In an aspect, the system 108 may include one or more processor(s) 202. The one or more processor(s) 202 may be implemented as one or more microprocessors, microcomputers, microcontrollers, edge or fog microcontrollers, digital signal processors, central processing units, logic circuitries, and/or any devices that process data based on operational instructions. Among other capabilities, the one or more processor(s) 202 may be configured to fetch and execute computer-readable instructions stored in a memory 204 of the system 108. The memory 204 may be configured to store one or more computer-readable instructions or routines in a non-transitory computer-readable storage medium, which may be fetched and executed to create or share data packets over a network service. The memory 204 may include any non-transitory storage device including, for example, volatile memory such as Random Access Memory (RAM), or non-volatile memory such as Erasable Programmable Read-Only Memory (EPROM), flash memory, and the like.
[0067] Referring to FIG. 2, the system 108 may include an interface(s) 206. The interface(s) 206 may include a variety of interfaces, for example, interfaces for data input and output devices, referred to as I/O devices, storage devices, and the like. The interface(s) 206 may facilitate communication to/from the system 108. The interface(s) 206 may also provide a communication pathway for one or more components of the system 108. The system 108 includes a memory 204 for storing one or more executable modules and data associated with distributed data processing tasks and an interface 206 to facilitate communication through the system 108. In an aspect, the system 108 stores parameters associated with distributed data processing. The parameters may include user input, desired output format and computations. The system 108 further includes a processor 202 for executing the one or more executable modules for managing and coordinating an execution of distributed data processing tasks across a network of interconnected computing resources. The one or more executable modules include a dynamic job module 210 configured for enabling at least one of a scheduling and execution of one or more jobs in a distributed computing environment and enabling the scheduling of the one or more jobs based on a plurality of criteria including at least one of time, resource availability, dependencies, and data availability. The one or more executable modules further include a compilation module 212 configured for compiling and preparing one or more job scripts for execution in the distributed computing environment. The one or more executable modules further include a collection module 214 configured for collecting and aggregating at least one of one or more results or output generated by distributed data processing jobs.
[0068] In an embodiment, the dynamic jobs module 210 in the system is responsible for enabling the scheduling and execution of jobs in a distributed computing environment. The dynamic jobs module 210 allows for the efficient processing of large volumes of data by distributing the workload across multiple nodes or machines. In examples, the dynamic jobs module 210 may receive a data processing task with one or more template features via a user interface (UI). The one or more template features include a set of job parameters. The dynamic jobs module 210 may determine one or more requirements of the data processing task and generate one or more jobs. The one or more jobs are configured to be executed based on a defined job schedule. In an aspect, determining the one or more requirements includes at least one of: understanding an input data, a desired output, a processing logic, one or more dependencies and at least one of: additional configurations or parameters, by creating and defining a set of jobs based on a job schedule.
[0069] The dynamic jobs module 210 enables the scheduling of jobs according to the job schedule based on various criteria such as time, resource availability, dependencies, and data availability. The dynamic jobs module 210 allows users to define and manage job schedules dynamically. The dynamic jobs module 210 also manages the allocation of computing resources for executing jobs. It takes into account the availability and capacity of distributed resources such as processing units, memory, and storage. The dynamic jobs module 210 ensures balanced resource utilization by distributing the workload evenly across available nodes. The dynamic jobs module 210 monitors the system's resource usage and dynamically adjusts job assignments to optimize performance and prevent resource bottlenecks. The dynamic jobs module 210 incorporates fault tolerance mechanisms to handle failures in the distributed environment. The dynamic jobs module 210 detects failures, reroutes jobs to healthy nodes, and ensures the uninterrupted processing of data. It enables the system to scale horizontally by adding or removing nodes dynamically based on the workload. It allows for the efficient utilization of resources and accommodates changing data processing demands.
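By way of a non-limiting illustration, the dependency- and resource-aware scheduling behaviour described above may be sketched as follows; the names used (Job, schedule, free_cpu) are hypothetical and do not form part of the dynamic jobs module 210:

```python
# Illustrative sketch only: a job becomes runnable once all of its
# dependencies have been dispatched and enough free CPU capacity remains.

class Job:
    def __init__(self, job_id, depends_on=None, cpu=1):
        self.job_id = job_id
        self.depends_on = depends_on or []
        self.cpu = cpu

def schedule(jobs, free_cpu):
    """Return the job IDs that can be dispatched in this scheduling pass."""
    dispatched = set()
    runnable = []
    for job in jobs:
        deps_met = all(d in dispatched for d in job.depends_on)
        if deps_met and job.cpu <= free_cpu:
            runnable.append(job.job_id)
            free_cpu -= job.cpu          # account for allocated resources
            dispatched.add(job.job_id)
    return runnable
```

For instance, with a chain extract, transform, load and only two free CPUs, the load job (requiring two CPUs) is held back until capacity becomes available, reflecting the resource-availability criterion above.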
[0070] In an embodiment, a compilation module 212 in a distributed data processing orchestrator is a component responsible for compiling and preparing code or job scripts for execution in a distributed computing environment. In examples, the compilation module 212 compiles the one or more jobs into an executable format for the distributed computing environment. The compilation module 212 supports different programming languages and compilers to convert the source code into executable form. The compilation module 212 handles the compilation process specific to the programming language used in the data processing jobs. The compilation module 212 manages dependencies required by the job scripts or code. The compilation module 212 ensures that all necessary libraries, frameworks, and external resources are available for successful compilation and execution. The compilation module 212 may include code optimization techniques to enhance the performance and efficiency of the compiled code. The compilation module 212 configures and prepares the necessary resources required for executing the compiled code on distributed nodes. The compilation module 212 determines the specific resources needed, such as CPU, memory, and network, and allocates them accordingly. In the distributed computing environment of the system, the compilation module 212 may incorporate techniques for parallelizing and distributing the compiled code across multiple nodes. It splits the code or job into smaller tasks and distributes them to different nodes for concurrent execution. In an aspect, the compiled one or more jobs are executed by the processing engine 208 through one or more nodes in the distributed computing environment.
[0071] In an embodiment, the collection module 214 is responsible for collecting and aggregating the results or output generated by distributed data processing tasks or jobs. The collection module 214 retrieves the processed data or intermediate results generated by distributed tasks across the distributed system. The retrieval involves communicating with the appropriate nodes or machines to fetch the relevant data. The collection module 214 aggregates the collected data from different nodes into a single result. The aggregation can involve merging data sets, combining partial results, or applying specific aggregation functions like sum, average, count, etc., depending on the requirements of the data processing task. In certain other embodiments, the collection module 214 may ensure the ordered collection of data based on specific criteria. The collection module 214 can sort the collected data based on timestamps, keys, or other attributes to facilitate further processing or analysis. The collection module 214 handles failures or errors that may occur during data retrieval. The collection module 214 employs fault tolerance mechanisms to recover from node failures, network issues, or other exceptional conditions. This ensures the reliability and resilience of the data collection process. The collection module 214 may perform data validation and cleansing operations to ensure the quality and integrity of the collected data.
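By way of a non-limiting illustration, the combination of partial results described above may be sketched as follows; the representation of each partial result as a (count, sum) pair is a hypothetical choice that allows a correct global average to be derived:

```python
# Illustrative sketch only: each node reports a (count, sum) pair for its
# data partition; the collection step merges them into one global result
# carrying the sum, count, and average aggregation functions named above.

def merge_partials(partials):
    count = sum(c for c, _ in partials)
    total = sum(s for _, s in partials)
    return {"count": count, "sum": total, "average": total / count}
```

Merging (count, sum) pairs rather than per-node averages avoids the classic error of averaging averages over unevenly sized partitions.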
[0072] In an aspect, the dynamic job module 210 is further configured for incorporating one or more fault tolerance mechanisms to handle failures in the distributed environment and detecting one or more failures, rerouting jobs associated with the one or more failures to healthy nodes, and ensuring an uninterrupted processing of data.
[0073] In an aspect, the compilation module 212 is further configured for incorporating one or more techniques for parallelizing and distributing the compiled jobs or codes across multiple nodes and splitting at least one of a code or a job into smaller tasks and distributing the smaller tasks to different nodes for concurrent execution. In aspects, the tasks or jobs may need input. The input data is divided into one or more chunks and provided to corresponding nodes across available processing nodes in the distributed computing environment based on at least one of: load balancing strategies and data locality.
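By way of a non-limiting illustration, the division of input data into chunks described above may be sketched as follows; the round-robin strategy shown is one hypothetical load balancing strategy among those contemplated:

```python
# Illustrative sketch only: round-robin chunking spreads the input records
# across the nodes so that no node receives more than one record above its
# fair share of the workload.

def split_into_chunks(data, num_nodes):
    chunks = [[] for _ in range(num_nodes)]
    for i, record in enumerate(data):
        chunks[i % num_nodes].append(record)
    return chunks
```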
[0074] In an aspect, the collection module 214 is further configured for performing at least one of merging one or more data sets, combining partial results, or applying specific aggregation functions including at least one of sum, average, count, depending on the requirements of the data processing task.
[0075] As illustrated in FIG. 3, all the components in the distributed data processing orchestrator 300 (also referred to as the orchestrator) are disclosed. In an exemplary aspect, the orchestrator may be a data processing architecture. The orchestrator may manage and coordinate the execution of tasks across multiple computing resources. The computing resources may be the one or more user equipments 104 connected in the network 106. In an aspect, the user 302 provides login details as input through the User Interface (UI) 304 and the raw data stored in the database 218 is collected from the system 108. The system 108 is communicatively associated with the distributed data processing unit 110 that enables the distributed data processing orchestration. The user 302 creates a job schedule 310 with template features 312 using the raw data obtained from the database 218. The user 302 can interact with a user-friendly interface to create jobs without coding, defining parameters and dependencies. The user 302 can also input specifications for their jobs, including data sources, computations, and output format. Further, the HTTP request 316 is received from the job template creation 314 unit, the request for execution starts 318 is initiated, and the response sent 320 module sends the feedback to the job template creation 314 unit as a confirmation for the request.
[0076] In an aspect, the HTTP request 316 includes the user 302 specific requirement to be applied on a resource in the distributed data processing unit 110 to initiate the job. Further, after the request is received, the request execution starts 318 in the distributed data processing unit 110. A feedback that the job has been initiated is sent to the user 302 via the response sent 320 module.
[0077] In an embodiment, the distributed data processing orchestrator 300 includes a compile and deploy jobs 328 unit that processes the job specifications into an executable format for the distributed computing environment in a job deployment 330 unit. The formatted job specification from the job deployment 330 unit is sent to the execute task in parallel 324 unit across the distributed cluster, distributing the workload effectively in job processing 326. In an aspect, the formatted job specification involves compiling the user 302 input into machine-readable form. The results are aggregated and collected in a centralized location for easy access and analysis. The results aggregation includes merging the results of parallel execution tasks to render the desired output. The desired result is aggregated and stored in a central repository. The distributed data processing orchestrator 300 includes a data lake 322 unit connected with the computation master cluster 334. In an aspect, the data lake 322 unit is a centralized repository that stores and allows processing of large volumes of data in its original form.
[0078] In an aspect, the distributed data processing orchestrator 300 simplifies task execution on a distributed computing environment by a computation master cluster 334 to the applications 332A/332B of workers. The user 302 (used interchangeably with the term users) can create jobs without coding and specify job requirements, and the orchestrator automates job compilation, parallel execution, and data aggregation.
[0079] In an aspect, the computation master cluster 334 includes one or more application units with workers 332A/332B. The computation master cluster 334 provides management services and controls the workers 332A/332B. The computation master cluster 334 manages the process for resource allocation, maintenance, scheduling, and monitoring.
[0080] In an aspect, the application unit with workers 332A/332B has one or more cores that enable the parallel execution of tasks. The cores enable execution of parallel tasks, wherein the tasks are split into smaller units and executed simultaneously to speed up the processing time.
[0081] In an aspect, the execute task in parallel 324 unit in a distributed data processing orchestrator is a key aspect of achieving high performance and scalability. The input data is divided into smaller partitions or chunks to enable parallel processing. Each partition is assigned to a processing node or worker for independent execution. The parallel execution of tasks distributes the tasks across the available processing nodes in the cluster based on load balancing strategies or data locality to minimize data transfer between nodes. Each processing node independently executes its assigned tasks including applying transformations, computations, or analysis on the assigned data partitions. The tasks may operate on a single data partition or perform aggregations across multiple partitions. Sharing intermediate results can be achieved through message passing, shared data structures, or distributed file systems. In some cases, tasks may need to synchronize their execution to ensure correct results.
[0082] In an exemplary aspect, when multiple tasks depend on the output of a previous task, they need to wait until the dependency is resolved. The present task may be dependent on the results of the previous task. The issue of dependency can be addressed by synchronization mechanisms, such as barriers or locks, that are used to coordinate task execution. The system typically incorporates fault tolerance mechanisms, checkpointing and data replication techniques to handle failures. If a processing node fails, the orchestrator redistributes the failed task to another available node for execution.
[0083] In an aspect, as shown in FIG. 3, the user 302 may interact with the distributed data processing unit 110 through the system 108. In an aspect, the data processing unit 110 may receive a data processing task via the user interface 304. The data processing task may have one or more template features. The one or more template features may be provided to the user from the template feature 312. The template feature includes a set of job parameters. The set of job parameters may include, but not be limited to, a job definition, a job ID, a job type, a resource allocation and a scheduling policy. The
job template creation 314 may send the job schedule through the HTTP request 316. The request execution starts 318 by determining one or more requirements of the data processing task. The data processing task includes understanding an input data, a desired output, a processing logic, one or more dependencies, and at least one of: additional configurations or parameters. The determined set of jobs may be sent to the response sent 320 module. The response sent 320 module passes on the task from the job template creation 314 module to the job deployment 330 module. The compile and deploy jobs 328 module may compile the one or more jobs into an executable format for the distributed computing environment. The job processing 326 module may execute one or more jobs through one or more nodes in the distributed computing environment. The job processing 326 module may distribute the one or more jobs to the computation master cluster 334. The computation master cluster 334 may aggregate results of the one or more jobs. The aggregated results are stored in the data lake 322.
[0084] In an aspect, referring to FIGs. 1, 2 and 3, the system 108 may be coupled with the distributed data processing unit 110. The system 108 may be communicatively coupled with the user 302. The system 108 may receive a data processing task with one or more template features from the user 302 via the user interface 304. The user 302 may provide the data processing task via a job schedule 310. The dynamic jobs module 210 may trigger the job template creation 314 module. The job template creation 314 module may provide the user with one or more template features via the template feature 312 module. Further, the job template creation 314 may send the job schedule through the HTTP request 316. The request execution starts 318 by determining one or more requirements of the data processing task. The data processing task includes understanding an input data, a desired output, a processing logic, one or more dependencies, and at least one of: additional configurations or parameters. The dynamic jobs module 210 may send the determined set of jobs to the compilation module 212. The compilation module 212 may pass on the determined set of jobs to the job deployment 330 module. The compile and deploy jobs 328 may compile the one or more jobs into an executable format for the distributed computing environment. Further, the job processing 326 module may execute one or more jobs through one or more nodes in the distributed computing environment. The job processing 326 module may distribute the one or more jobs to the computation master cluster 334. The collection module 214 may communicate with the computation master cluster 334. The computation master cluster 334 may aggregate results of the one or more jobs. The aggregated results are stored in the data lake 322.
[0085] FIG. 4 illustrates an exemplary flow diagram 400 for execution on the distributed data processing orchestrator, according to an embodiment of the present disclosure.
[0086] As illustrated in FIG. 4, an exemplary flow diagram 400 for the execution of the distributed data processing orchestrator is disclosed.
[0087] At step 402, the method 400 includes creation of jobs through the user interaction with the User Interface (UI). In an aspect, the user logs in via the user interface and selects the desired template to create a job. The specific requirements for the data processing task, including understanding the input data, desired output, processing logic, dependencies, and any additional configurations or parameters, are determined and, accordingly, a set of jobs is created and defined.
[0088] At step 404, the method 400 includes job definition by the user. The job definition includes user specified parameters and properties regarding the process to be carried out on the data. The user specified parameters and properties may include the time allocated for processing, processing logic, prioritising the order of execution and allocation of resources. In an aspect, allocation of resources includes assigning the cores in the processor for processing the job.
[0089] At step 406, the method 400 includes compiling the job specifications into an executable format for the distributed computing environment. Upon compilation of the job specification into the executable format, the step of parallel execution is performed in order to distribute the workload effectively.
[0090] At step 408, the method 400 includes parallel execution of tasks. During the execution of tasks, the overall processing logic is split into smaller, manageable tasks. The user also specifies the input data sources and the output data destination where the processed results will be stored. During the execution of tasks, there may be a need for communication and data exchange between different processing nodes. This can involve sharing intermediate results, combining partial outputs, or coordinating the overall processing flow. Once the tasks are completed, the intermediate results or processed data from each node are collected and aggregated. This can be done using the collection module, as discussed above, to combine the results and generate a unified output.
[0091] At step 410, the method 400 includes data aggregation. In an aspect, the
aggregated result can be delivered to the desired destination, which can be a storage system, a streaming platform, or presented to the user/application consuming the processed data. This ensures the accessibility and availability of the output for further analysis, visualization, or integration with downstream systems.
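By way of a non-limiting illustration, the flow of steps 402 to 410 (job definition, compilation, parallel execution, and data aggregation) may be sketched end to end as follows; threads stand in for distributed worker nodes, and the squaring operation is a hypothetical stand-in for the user-specified processing logic:

```python
# Illustrative sketch only: split -> parallel execution -> aggregation,
# mirroring the flow of steps 402-410 above.
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk):
    # Hypothetical processing logic applied independently on each node.
    return sum(x * x for x in chunk)

def run_pipeline(data, workers=3):
    chunks = [data[i::workers] for i in range(workers)]   # partition the input
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(process_chunk, chunks)        # parallel execution
    return sum(partials)                                  # data aggregation
```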
[0092] FIG. 5 illustrates a flowchart 500 of a method for managing and coordinating an execution of distributed data processing tasks across a network of interconnected computing resources, according to an embodiment of the present disclosure.
[0093] In an aspect, the distributed computing environment may be a computing architecture with one or more computing resources connected via a network. The computing resources may include, but are not limited to, a computer hardware, a software, a communication device and a computing device. The computing resources may work together towards a common goal. The distributed computing environment may handle complex computational tasks.
[0094] At step 502, the method includes receiving a data processing task with one or more template features via a user interface (UI) by a processor. The one or more template features include a set of job parameters. The set of job parameters may include, but not be limited to, a job definition, a job ID, a job type, a resource allocation and a scheduling policy. In an example, the data processing task may be received by the processor via a user interface (UI) of a computing device from a user. In an aspect, the data processing task may define the job details that include instructions to execute the job, job time, job queue and resource assignment. The user creates the job schedule through the login via the user interface (UI). In an example, a job may be a task or work. The task or work may be executed by the computing resource in the distributed environment. For example, a job may be assigned to computing resources to calculate the sum of given numbers.
[0095] In an exemplary aspect, the user may log in to an application via the user interface using the user equipment. The user may use login credentials to log in to the application. The login credentials may include, but not be limited to, a username, a password, a PIN number and a fingerprint.
[0096] In an exemplary aspect, the one or more template features may include a set of job parameters. The set of job parameters may include, but not be limited to, a job definition, a job ID, a job type, a resource allocation and a scheduling policy. The one or more templates may be used in the application of the user interface. The one or more templates may be helpful in scheduling the job.
[0097] In an exemplary aspect, the job schedule may be a process of determining when a task is to be executed. The job schedule may include determining where the task should be executed in a distributed environment. For example, the user may assign a task via the UI. The user may schedule the task, using the job scheduling, to the computing resource in the distributed environment.
[0098] At step 504, the method includes determining one or more requirements of the data processing task by the processor. The determining of the one or more requirements comprises at least one of understanding an input data, a desired output, a processing logic, one or more dependencies, and at least one of additional configurations or parameters, by creating and defining a set of jobs based on a user interaction with the UI. In an exemplary aspect, a job may be a work or task. The work or task needs to be executed by the computing resources in the distributed environment. The set of jobs is created and defined via the user interface. For example, job creation and definition involve a process of setting up a task or a work by using one or more job parameters. The job parameters include, but not limited to, an input source, an expected output, a dependency on other tasks and a resource location.
[0099] In an aspect, determining one or more requirements of the data processing task involves grasping the input data and computing for a desired output by applying a processing logic based on the user specified parameters. The requirements may include the time allocated to the task, processing logic, prioritising the order of execution and allocation of resources. In an example, the one or more dependencies may include a data dependency, a resource dependency and an output dependency. For example, a job-1 result may be an input for a job-2. The job-2 has an output dependency on job-1.
[00100] In an aspect, determining the one or more requirements includes incorporating one or more fault tolerance mechanisms by the processor. The fault tolerance mechanism is incorporated to handle failures in the distributed computing environment. In an aspect, determining the one or more requirements may include detecting one or more failures in the execution of the jobs and re-routing jobs to healthy nodes. The job re-routing may be performed for ensuring an uninterrupted processing of data. In an example, the one or more failures may include, but are not limited to, a node failure, a network failure, a data loss, a resource limitation and a coordination issue. In an exemplary aspect, a healthy node may be a computing resource. The computing resource can perform all necessary functions. The computing resources may include, but not limited to, a computer hardware, a software, a communication device and a computing device.
[00101] In an exemplary aspect, the re-routing of a job may be a process of re-assigning the task or job in case of a node failure. The re-routing involves assigning an on-going task of one node to another node. For example, re-routing jobs to healthy nodes involves diverting the jobs to a different worker node in a processor in case of an assigned worker node failure, for ensuring an uninterrupted processing of data.
[00102] In an aspect, the fault tolerance mechanism involves tuning the distributed environment to continue operating without interruption when one or more of the components fail to work. The fault tolerance mechanism may be used to maintain reliability in the distributed data processing. The fault tolerance mechanism may ensure consistency of data in the distributed data processing. The fault tolerance mechanism may be helpful in minimizing data processing failures. In an exemplary aspect, the fault tolerance mechanism in the distributed environment may implement a task retry mechanism. The task retry mechanism involves automatically retrying a failed task on another node in the distributed data processing. For example, a node can be a server, a client or a peer. The nodes may communicate with other nodes over a network, sharing information and collaborating to perform tasks.
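A task retry mechanism of the kind described above can be sketched as follows; this is a minimal illustrative sketch under the assumption that a failed attempt raises an exception, and the names (`run_with_retry`, `max_attempts`) are hypothetical.

```python
def run_with_retry(task, nodes, max_attempts=3):
    """Automatically retry a failed task, moving it to another node each attempt."""
    last_error = None
    for attempt in range(max_attempts):
        node = nodes[attempt % len(nodes)]   # pick the next node in rotation
        try:
            return task(node)                # success: return the task result
        except RuntimeError as err:
            last_error = err                 # record the failure and retry elsewhere
    raise last_error                         # all attempts exhausted
```

If the first node fails, the task is transparently retried on the next node, so the caller only sees the final result.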
[00103] At step 506, the processor may compile the one or more jobs into an executable format for the distributed computing environment based on the determination. In an aspect, compiling the one or more jobs involves changing the specific requirement data into a machine-readable form for carrying out the execution process by the distributed computing environment. In an exemplary aspect, the executable format may be a machine-readable format. The machine-readable format may include, but is not limited to, a binary format, a CSV (comma-separated values), a JSON (JavaScript object notation) and an XML (eXtensible markup language).
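Using JSON as the machine-readable format, the compilation step can be sketched as below. The record fields (`job_id`, `job_type`, `parameters`) are hypothetical examples of job details, not fields mandated by the disclosure.

```python
import json

def compile_job(job_id, job_type, parameters):
    """Render user-specified requirements as a machine-readable JSON job record."""
    record = {"job_id": job_id, "job_type": job_type, "parameters": parameters}
    return json.dumps(record, sort_keys=True)  # machine-readable executable form
```

The resulting string can be shipped to any node in the cluster and parsed back into the same job record.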
[00104] At step 508, the processor may execute the one or more jobs through one or more nodes in the distributed computing environment. In an aspect, the jobs may be distributed among the one or more nodes in the distributed computing system. Each of the nodes may be assigned one or more jobs. For example, Node-1 may be assigned the job to perform addition. A node may include, but is not limited to, an individual computer or computing device interconnected with other nodes via a network. In an aspect, execution of jobs in parallel involves several jobs executed simultaneously to speed up the processing time. The parallel execution of tasks involves executing one or more tasks at the same time to reduce the load on the processor.
[00105] In an aspect, the parallel execution of tasks distributes the tasks across the available processing nodes in the cluster based on load balancing strategies or data locality to minimize data transfer between nodes. The load balancing strategies involve distributing the tasks equally across the cores of the processor to speed up the data processing.
[00106] In an embodiment, executing the one or more jobs in parallel includes dividing an input data into one or more chunks and distributing the chunks to corresponding nodes across available processing nodes in the distributed computing environment based on one or more load balancing strategies and data locality. In an exemplary aspect, the input data may be divided into one or more chunks to enable parallel processing. The partitioned data may be a subset of the original data. The original data may be the user input. The input data may be the executable form of the user input. The executable form of the user input may contain job details. The job details may include, but are not limited to, a job description, a job ID, a job type and a resource address.
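The chunking described above can be sketched as follows; this is an illustrative sketch of one simple partitioning scheme (roughly equal chunks), not the only scheme contemplated by the disclosure.

```python
def split_into_chunks(data, num_chunks):
    """Divide input data into roughly equal chunks for parallel processing."""
    size, remainder = divmod(len(data), num_chunks)
    chunks, start = [], 0
    for i in range(num_chunks):
        end = start + size + (1 if i < remainder else 0)  # spread the remainder
        chunks.append(data[start:end])
        start = end
    return chunks
```

Each chunk is a subset of the original data and can be handed to a different processing node.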
[00107] In an exemplary aspect, an available processing node may be an actively participating computing resource in a distributed computing environment. The available processing node may be waiting for a job from a job queue. The job queue may manage the job schedule of the user.
[00108] In an aspect, the one or more load balancing strategies may be strategies used to distribute the one or more jobs across the one or more nodes in the distributed computing environment. The load balancing strategies may be applied to the distributed computing environment to ensure no single node is overburdened with jobs. The load balancing strategies may be used to utilize the capacity of the one or more nodes in the distributed computing environment. The load balancing strategies may be used to improve the performance of the computing resources in the distributed computing environment. In an exemplary aspect, the load balancing strategies may include, but are not limited to, a round robin strategy, a randomized load balancing strategy, a weighted round robin strategy, an adaptive load balancing strategy and a content-based routing strategy. For example, the round robin load balancing strategy involves equal distribution of jobs to the one or more nodes by equal allocation of the resources required for job completion.
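The round robin strategy named above can be sketched in one line; this is an illustrative sketch, and the job/node labels are hypothetical.

```python
def round_robin_assign(jobs, nodes):
    """Distribute jobs over nodes in rotation so no single node is overburdened."""
    return {job: nodes[i % len(nodes)] for i, job in enumerate(jobs)}
```

With four jobs and two nodes, each node receives exactly two jobs, in alternation.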
[00109] In an exemplary aspect, the data locality may be a process of executing a job at the data-residing node in the distributed computing environment. The data-residing node may be the node having the input data for the data processing task. The data locality may reduce the data movement across the computing resources of the distributed computing environment. For example, Node-1 may have the set of job parameters for a task-1. The execution of the task-1 at the Node-1 may reduce the time taken for task completion.
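Data-locality-aware node selection can be sketched as below; this is an illustrative sketch, with a hypothetical `data_location` map recording which node holds each job's input.

```python
def pick_node_for_job(job_id, data_location, available_nodes):
    """Prefer the node already holding the job's input data (data locality)."""
    local_node = data_location.get(job_id)   # node where the input data resides
    if local_node in available_nodes:
        return local_node                    # run where the data lives: no transfer
    return available_nodes[0]                # otherwise fall back to any free node
```

Running task-1 on the node that already stores its parameters avoids moving that data across the network.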
[00110] At step 510, the processor may aggregate one or more job results and store the results. In an aspect, aggregating the one or more job results involves combining or summarizing one or more outcomes of the one or more jobs. For example, each node in the distributed computing environment may perform a small portion of the job. The results from each node may be combined to arrive at a desired result for the job. In an aspect, the aggregated results may be stored in a central storage medium. The central storage medium may be a data lake 322.
[00111] In an aspect, aggregating a result and collecting the result includes performing at least one of merging data sets, combining partial results, or applying specific aggregation functions including at least one of a sum, an average, and a count, depending on one or more requirements of the data processing task. In an exemplary aspect, the partial results may be incomplete or interim outcomes received from each of the computing resources. The partial results are generated during parallel execution of the task. The partial results may be the outcome of a single computing resource in the distributed computing environment.
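The merge-then-aggregate step can be sketched as below; this is an illustrative sketch, supporting the three example functions (sum, average, count) named in the disclosure.

```python
def aggregate_partial_results(partial_results, func="sum"):
    """Merge per-node partial results and apply a specific aggregation function."""
    merged = [value for part in partial_results for value in part]  # merge data sets
    if func == "sum":
        return sum(merged)
    if func == "average":
        return sum(merged) / len(merged)
    if func == "count":
        return len(merged)
    raise ValueError("unknown aggregation function: " + func)
```

For instance, two nodes returning [1, 2] and [3, 4] yield a sum of 10, an average of 2.5, and a count of 4.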
[00112] In an exemplary aspect, aggregation of results involves combining the outputs of the parallel execution and storing them in a centralized storage. The parallel processed data is received and aggregated into the desired output as required by the user.
[00113] In an embodiment, determining the one or more specific requirements further includes incorporating, by the processor, one or more fault tolerance mechanisms to handle failures in the distributed environment and detecting, by the processor, one or more failures and re-routing jobs to healthy nodes, for ensuring an uninterrupted processing of data. In an aspect, the fault tolerance mechanism involves tuning the distributed environment to continue operating without interruption when one or more of the components fail to work.
[00114] In an aspect, re-routing jobs to healthy nodes involves diverting the jobs to a different worker core in the processor in case the assigned worker core fails, for ensuring an uninterrupted processing of data.
[00115] In an embodiment, aggregating the results further includes performing at least one of merging results from the one or more nodes, combining partial results, or applying specific aggregation functions including at least one of a sum, an average, and a count, depending on the one or more requirements of the data processing task. In an aspect, merging results from the nodes involves combining the results of each node to arrive at a desired outcome for the received data processing task. For example, merging results from the one or more nodes involves aggregating the outputs generated by the computing resources in the network.
[00116] In an exemplary aspect, the specific aggregation functions may be used to combine results from one or more computing resources in the distributed computing environment. The specific aggregation functions may be useful for consolidating the computed results of the computing resources in the distributed computing environment. For example, the specific aggregation functions may include a sum, an average and a count. The count function may be used for determining the size of a data set.
[00117] FIG. 6 illustrates an exemplary computer system in which or with which
embodiments of the present disclosure may be utilized, according to an embodiment of the present disclosure.
[00118] Referring to FIG. 6, a block diagram 600 of an exemplary computer system 108 is disclosed. The computer system includes input devices 602 connected through I/O peripherals. The system also includes a Central Processing Unit (CPU) 604 and output devices 608, connected through the I/O peripherals. The CPU 604 is also attached to a memory unit 616 along with an Arithmetic and Logical Unit (ALU) 614 and a control unit 612, along with secondary storage devices 610 such as hard disks and a Secure Digital (SD) card. The data flow and control flow 606 are indicated by a straight arrow and a dashed arrow, respectively. The CPU consists of data registers that hold the data bits, pointers, cache, Random Access Memory (RAM) 604, and a main processing unit containing the processing engine 208. The system also consists of communication buses that are used to transport the data internally in the system.
[00119] In an embodiment, the processor 202 of the system is used to process all the data that is required for the identification of a higher-ranked neighbor cell. A person skilled in the art will appreciate that the system may include more than one processor 202 and communication ports for ease of function. The processor 202 may include various modules associated with embodiments of the present disclosure such as the dynamic jobs module 210, the compilation module 212, the collection module 214 and other modules 216. The input component can also include communication ports, ethernet ports, gigabit ports, a parallel port, or a Universal Serial Bus (USB) port. The communication port can be chosen depending on a specific network such as a Wide Area Network (WAN), a Local Area Network (LAN), or a Personal Area Network (PAN). The communication port can be an RS-232 port that can be used with the remote dialling and internet connection options of the system. The Gigabit port can be used to connect the system to the internet at all times. The Gigabit port can use copper or fibre for connection.
[00120] FIG. 7 illustrates another exemplary computer system 700 in which or
with which embodiments of the present disclosure may be implemented. For example,
the system 108, the distributed data processing unit 110, the computing device 104-1-
N, etc., may be implemented using the computer system 700.
[00121] As shown in FIG. 7, the computer system 700 may include an external storage device 710, a bus 720, a main memory 730, a read-only memory 740, a mass storage device 750, a communication port(s) 760, and a processor 770. A person skilled in the art will appreciate that the computer system 700 may include more than one processor and communication ports. The processor 770 may include various modules associated with embodiments of the present disclosure. The communication port(s) 760 may be any of an RS-232 port for use with a modem-based dialup connection, a 10/100 Ethernet port, a Gigabit or 10 Gigabit port using copper or fiber, a serial port, a parallel port, or other existing or future ports. The communication port(s) 760 may be chosen depending on a network, such as a Local Area Network (LAN), a Wide Area Network (WAN), or any network to which the computer system 700 connects. In an aspect, referring to FIG. 1 and FIG. 7, the exemplary computer system 700 may be implemented in the computing device 104. The exemplary computer system may be implemented in the system 108. The exemplary computer system 700 may be implemented in the distributed data processing unit 110.
20 [00122] In an embodiment, the main memory 730 may be Random Access
Memory (RAM), or any other dynamic storage device commonly known in the art. The read-only memory 740 may be any static storage device(s) e.g., but not limited to, a Programmable Read Only Memory (PROM) chip for storing static information e.g., start-up or basic input/output system (BIOS) instructions for the processor 770. The
25 mass storage device 750 may be any current or future mass storage solution, which can be used to store information and/or instructions. Exemplary mass storage solutions include, but are not limited to, Parallel Advanced Technology Attachment (PATA) or Serial Advanced Technology Attachment (SATA) hard disk drives or solid-state drives
(internal or external, e.g., having Universal Serial Bus (USB) and/or Firewire interfaces).
[00123] In an embodiment, the bus 720 may communicatively couple the processor 770 with the other memory, storage, and communication blocks. The bus 720 may be, e.g., a Peripheral Component Interconnect (PCI) / PCI Extended (PCI-X) bus, a Small Computer System Interface (SCSI), a Universal Serial Bus (USB), or the like, for connecting expansion cards, drives, and other subsystems as well as other buses, such as a front side bus (FSB), which connects the processor 770 to the computer system 700.
[00124] In another embodiment, operator and administrative interfaces, e.g., a display, keyboard, and cursor control device, may also be coupled to the bus 720 to support direct operator interaction with the computer system 700. Other operator and administrative interfaces can be provided through network connections connected through the communication port(s) 760. The components described above are meant only to exemplify various possibilities. In no way should the aforementioned exemplary computer system 700 limit the scope of the present disclosure.
[00125] It is to be appreciated by a person skilled in the art that while various embodiments of the present disclosure have been elaborated for a distributed data processing orchestrator system, the teachings of the present disclosure are also applicable to other types of applications, and the system and method for a distributed data processing orchestrator are equally implementable in other industries as well; all such embodiments are well within the scope of the present disclosure without any limitation.
[00126] Moreover, in interpreting the specification, all terms should be interpreted in the broadest possible manner consistent with the context. In particular, the terms “comprises” and “comprising” should be interpreted as referring to elements, components, or steps in a non-exclusive manner, indicating that the referenced elements, components, or steps may be present, or utilized, or combined with other elements, components, or steps that are not expressly referenced. Where the specification or claims refer to at least one of something selected from the group consisting of A, B, C….and N, the text should be interpreted as requiring only one element from the group, not A plus N, or B plus N, etc.
[00127] The various embodiments of the present disclosure provide several advantages, including simplified job creation wherein users can create jobs or tasks without the need for programming knowledge. This user-friendly approach makes it accessible to a wider range of users, regardless of their technical background. The platform automates job deployment and execution, eliminating the need for manual code development and deployment. This saves significant time and effort for users, allowing them to focus on job requirements and analysis. The present disclosure allows for the creation of dynamic jobs tailored to specific requirements. Users can define parameters, configurations, and dependencies, enabling flexibility in executing various tasks on the distributed cluster. With automated job deployment and execution, the present disclosure improves overall efficiency in distributed data processing. The users can focus on defining requirements and analyzing results instead of dealing with code intricacies. The present disclosure enables quick adaptation to changing requirements by easily creating new jobs or modifying existing ones. This promotes agility and responsiveness in addressing evolving data processing needs. In an aspect, the system in the present disclosure simplifies job creation, saves time and effort, enables dynamic task execution, enhances efficiency, and offers agility in adapting to changing requirements.
[00128] In an aspect, the most unique feature of the present disclosure lies in its ability to enable dynamic job creation and deployment on a distributed computing environment without the requirement of coding. This feature allows users to define job requirements and configurations through a user-friendly platform, eliminating the need for manual coding and making it accessible to a wider range of users. By automating the deployment and execution processes, the present disclosure streamlines job execution, saving time and effort. This removes technical barriers, enhances efficiency, and provides a flexible solution that can easily adapt to changing requirements. The present disclosure provides a system for efficiently executing a distributed data processing orchestration. The present disclosure provides a system wherein users can create jobs or tasks without the need for programming knowledge, as the user-friendly approach makes it accessible to a wider range of users, regardless of their technical background. The present disclosure provides a system that automates job deployment and execution, eliminating the need for manual code development and deployment, thus saving significant time and effort for users and allowing them to focus on job requirements and analysis. The present disclosure provides a system that allows for the creation of dynamic jobs tailored to specific requirements, wherein users can define parameters, configurations, and dependencies, enabling flexibility in executing various tasks on the distributed cluster. The present disclosure provides a system that improves overall efficiency in distributed data processing so that users can focus on defining requirements and analyzing results instead of dealing with code intricacies. The present disclosure provides a system that enables quick adaptation to changing requirements by easily creating new jobs or modifying existing ones to promote agility and responsiveness in addressing evolving data processing needs. In an aspect, the present disclosure provides simplified job creation that saves time and effort, enables dynamic task execution which enhances efficiency, and offers agility in adapting to changing requirements.
[00129] The present disclosure provides a technical advancement related to the distributed data processing orchestrator, namely its ability to enable dynamic job creation and deployment on a distributed computing environment without the requirement of coding. It allows the user to create dynamic jobs with specific requirements according to the user's need without the requirement of coding. Further, parallel execution of tasks enables distributing the workload efficiently across the computing cluster.
[00130] While considerable emphasis has been placed herein on the preferred embodiments, it will be appreciated that many embodiments can be made and that many changes can be made in the preferred embodiments without departing from the principles of the disclosure. These and other changes in the preferred embodiments of the disclosure will be apparent to those skilled in the art from the disclosure herein, whereby it is to be distinctly understood that the foregoing descriptive matter is to be implemented merely as illustrative of the disclosure and not as a limitation.
ADVANTAGES OF THE PRESENT DISCLOSURE
[00131] The present disclosure provides a system and method for managing and coordinating an execution of distributed data processing tasks across a network of interconnected computing resources to streamline task execution in an automated manner.
[00132] The present disclosure provides a system and method for simplified job creation in distributed data processing.
[00133] The present disclosure provides automated job deployment and execution that saves time and effort of the user and eliminates the need for manual code development and deployment.
[00134] The present disclosure provides dynamic job execution by allowing the user to define parameters, configurations, and dependencies, enabling flexibility in executing various tasks on the distributed cluster.
[00135] The present disclosure provides automated job deployment and execution that improves overall efficiency in distributed data processing.
[00136] The present disclosure provides a system and method for managing and coordinating an execution of distributed data processing tasks across a network of interconnected computing resources by enabling quick adaptation to changing requirements according to the user requirements.
We Claim:
1. A method (500) for managing distributed data processing in a distributed
computing environment, the method comprising:
receiving (502) a data processing task with one or more template
features via a user interface (UI), wherein the one or more template features includes a set of job parameters;
determining (504), by a processor (202), one or more requirements of
the data processing task and generating one or more jobs, wherein the
determining the one or more requirements comprises at least one of:
understanding an input data, a desired output, a processing logic, one or more dependencies, and a plurality of at least one of: additional configurations or parameters;
compiling (506), by the processor (202), the one or more jobs into an
executable format for the distributed computing environment based on the
determination;
executing (508), by the processor (202), the one or more jobs through one or more nodes in the distributed computing environment; and
aggregating (510), by the processor (202), results of the one or more
jobs and storing the results.
2. The method (500) as claimed in claim 1, wherein determining the one or more
requirements further comprises:
incorporating, by the processor (202), one or more fault tolerance
mechanisms; and
detecting, by the processor (202), at least one of one or more failures in execution of the jobs and re-routing the jobs associated with the one or more failures to healthy nodes.
3. The method (500) as claimed in claim 2, wherein executing one or more jobs in
parallel comprises:
dividing an input data into one or more chunks; and
distributing the one or more chunks to corresponding nodes across
available processing nodes in the distributed computing environment based on
one or more load balancing strategies and data locality.
4. The method (500) as claimed in claim 1, wherein aggregating the results further
comprises:
performing at least one of: merging results from the one or more nodes,
combining partial results, or applying specific aggregation functions comprising at least one of: a sum, an average, and a count, based on the one or more requirements of the data processing task.
5. A system (108) for managing distributed data processing across a distributed
computing environment, the system (108) comprising:
a memory (204) for storing one or more executable modules and data
associated with one or more parameters associated with the distributed data
processing; and
an interface (206) to facilitate communication through the system (108);
a processor (202) for executing the one or more executable modules for
managing and coordinating an execution of distributed data processing tasks
across a network of interconnected computing resources, the one or more
executable modules comprising:
a dynamic job module (210) configured to:
receive a data processing task with one or more template features via a user interface (UI), wherein the one or more template features include a set of job parameters;
41
determine one or more requirements of the data
processing task and generating one or more jobs, wherein the
determining the one or more requirements comprises at least one
of: understanding an input data, a desired output, a processing
logic, one or more dependencies, and a plurality of at least one of: additional configurations or parameters;
a compilation module (212) configured to:
compile one or more jobs into an executable format for
the distributed computing environment based on the
determination;
execute the one or more jobs through one or more nodes in the distributed computing environment; and
a collection module (214) configured to:
aggregate results of the one or more jobs and store the results.
6. The system (108) as claimed in claim 5, wherein the dynamic job module (210)
is further configured to:
incorporate one or more fault tolerance mechanisms; and
detect at least one of one or more failures, and reroute the jobs
associated with the one or more failures to healthy nodes.
7. The system (108) as claimed in claim 5, wherein the compilation module (212) is further configured to:
schedule the one or more jobs to be executed in parallel and distribute the one or more jobs across multiple nodes; and
split at least one of: a code or a job into smaller tasks and distribute the smaller tasks to different nodes for concurrent execution.
8. The system (108) as claimed in claim 5, wherein the collection module (214) is
further configured for:
performing at least one of: merging results from the one or more nodes,
combining partial results, or applying specific aggregation functions
comprising at least one of: sum, average, and count based on the requirements of the data processing task.
9. A user equipment (104) communicatively coupled to a system (108), the user
equipment comprising:
a processor; and
a computer readable storage medium storing programming for execution by the processor, the programming including instructions to:
provide (502) a data processing task with one or more template features via a user interface (UI), wherein the one or more template features includes a set of job parameters; and
receive results of the data processing task on the UI based on processing the data processing task using the method (500) for managing distributed data processing in a distributed computing environment as claimed in claim 1.