Abstract: Title: SYSTEM AND METHOD OF ALLOCATING CORE BASED ON WORKLOAD CHARACTERIZATION ABSTRACT A system (100) of allocating core based on workload characterization, the system comprising: an execution unit (102) and a non-transitory storage medium (104), wherein the non-transitory storage medium (104) comprises instruction modules to be executed by the execution unit (102), the non-transitory storage medium (104) comprising: a workload identifier (106) adapted to identify an existing and/or an upcoming workload on an embedded system; a parameter extraction module (108) adapted to extract and measure workload parameters to provide a set of test data; a workload characterization module (110) adapted to characterize the workload based on the parameters extracted by the parameter extraction module (108); a learning framework (112) adapted to train the workload parameters using the set of test data; and a core allocation module (114) adapted to allocate a core based on an input received from the learning framework (112) to improve an efficiency of the embedded system. Claims: 10, Figures: 5. Figure 1 is selected.
CLAIMS
I/We Claim:
1. A system (100) of allocating core based on workload characterization, the system comprising:
an execution unit (102) and a non-transitory storage medium (104), wherein the non-transitory storage medium (104) comprises instruction modules to be executed by the execution unit (102), the non-transitory storage medium (104) comprises:
a workload identifier (106) adapted to identify an existing and/or an upcoming workload on an embedded system;
a parameter extraction module (108) adapted to extract and measure workload parameters to provide a set of test data;
a workload characterization module (110) adapted to characterize the workload based on the parameters extracted by the parameter extraction module (108);
a learning framework (112) adapted to train the workload parameters using the set of test data and to frame an embedded database with training datasets obtained from the test data collected at multiple instances; and
a core allocation module (114) adapted to predict an accuracy of the test data based on the training datasets of the embedded system and configured to allocate a core based on the workload characterization to improve an efficiency of the embedded system.
2. The system (100) as claimed in claim 1, wherein the parameter extraction module (108) is an integration of a simulation tool (202) and/or a timing analyzer (204).
3. The system (100) as claimed in claim 1, wherein the workload parameters are measured on embedded Advanced RISC Machines (ARM) and/or Cortex architectures.
4. The system (100) as claimed in claim 1, wherein the workload parameters are power, performance, memory, branch, Translation Lookaside Buffers (TLB), energy consumption and/or energy-delay product.
5. The system (100) as claimed in claim 1, wherein the learning framework (112) is configured with a machine learning algorithm which is primarily based on Extreme Learning Machines along with cognitive rule sets.
6. The system (100) as claimed in claim 1, wherein after training using the learning framework (112), the test data is used to validate the accuracy of the machine learning algorithm.
7. The system (100) as claimed in claim 1, wherein the test data comprises an instructions per cycle (IPC) count, an L1-D cache, an L1-I cache, an L2-cache access, a cache miss ratio, an arithmetic integer/float/add/mul mix, branch mis-prediction data, data-TLB (dTLB) misses, instruction-TLB (iTLB) misses, an average power consumption and/or an overall execution time of the workload.
8. The system (100) as claimed in claim 1, wherein the workload parameters are collected using binary instrumentation systems.
9. The system (100) as claimed in claim 1, wherein the embedded system architecture is a multi-core architecture.
10. A method (400) of allocating core based on workload characterization, the method (400) comprising:
identifying, using a workload identifier (106), workloads for an embedded system;
extracting, using a parameter extraction module (108), workload parameters of the identified workload in accordance with architectures to provide a set of test data;
characterizing, using a workload characterization module (110), the workload based on the set of the test data;
implementing, using a learning framework (112), a learning-based algorithm to collect the test data and to frame the test data as an embedded database to train an embedded environment;
exploring, using an execution unit (102), architectures employing a technique of training and testing; and
allocating core, using a core allocation module (114) in accordance with the workload characterization to achieve a high output and a low power consumption.
Date: 24 March, 2021
Place: Noida
Dr. Keerti Gupta
Agent for the Applicant
(IN/PA-1529)
FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENTS RULES, 2003
COMPLETE SPECIFICATION
(See Section 10, and rule 13)
TITLE OF THE INVENTION
SYSTEM AND METHOD OF ALLOCATING CORE BASED ON WORKLOAD CHARACTERIZATION
APPLICANT(S)
NAME: DR. PAMULAPTI ANURADHA
NATIONALITY: INDIAN
ADDRESS: S R ENGINEERING COLLEGE, ANANTHASAGAR (V), HASANPARTHY (M), WARANGAL, TELANGANA 506371
The following specification particularly describes the invention and the manner in which it is to be performed
BACKGROUND
Field of Invention
[001] Embodiments of the present invention generally relate to a system and method for optimizing hardware configuration and particularly to a system and method of allocating core based on workload characterization for embedded systems.
Description of Related Art
[002] With the rapid growth in technology, computing systems are shrinking in size and weight while delivering high performance. This has made embedded systems commonplace in everyday life. These systems are used in many diverse application areas, namely, automated industry applications, automotive applications, avionics, defense applications, consumer electronics, etc. The embedded systems are particularly made to perform real-time tasks, in which the timing of tasks plays a vital role.
[003] Additionally, due to the increasing complexity of embedded system design, multi-core architectures have now replaced single-core architectures in most applications. These multi-core architectures find their place in many applications, such as wearable devices, agriculture, and, most importantly, intelligent electronics. Power consumption and operational efficiency are very important criteria for embedded systems. Although the multi-core architecture plays an important role, the power and performance achievable by the multi-core architecture of an embedded system environment remain unidentified.
[004] Despite the technological advancement in the prior art, there is still a need for an intelligent framework that can describe the workload corresponding to an embedded core efficiency. The existing technology discloses a conventional mapping algorithm for multi-threaded workloads in which computation threads are allocated on big cores and memory threads on little cores. With this methodology, the computation threads achieved better performance with high energy consumption, and the memory threads achieved low performance with less energy consumption. This allocation method leads to suboptimal hardware configurations for multithreaded application executions. Moreover, the inherent problem of this method is that it does not utilize the heterogeneity of asymmetric cores, which degrades the scheduling performance and increases the overheads.
[005] There is a need for a system and a method for workload characterization and distribution of core in a more efficient manner.
SUMMARY
[006] Embodiments in accordance with the present invention provide a system and a method of allocating core based on workload characterization in an embedded system-based environment.
[007] The system comprises: an execution unit and a non-transitory storage medium, wherein the non-transitory storage medium comprises instruction modules to be executed by the execution unit, the non-transitory storage medium comprising: a workload identifier adapted to identify an existing and/or an upcoming workload on an embedded system; a parameter extraction module adapted to extract and measure workload parameters to provide a set of test data; a workload characterization module adapted to characterize the workload based on the parameters extracted by the parameter extraction module; a learning framework adapted to train the workload parameters using the set of test data and to frame an embedded database with training datasets obtained from the test data collected at multiple instances; and a core allocation module adapted to predict an accuracy of the test data based on the training datasets of the embedded system and configured to allocate the core based on the workload characterization to improve an efficiency of the embedded system.
[008] Embodiments in accordance with the present invention further provide a method of allocating core based on workload characterization. The method comprising: identifying, using a workload identifier, workloads for an embedded system; extracting, using a parameter extraction module, workload parameters of the identified workload in accordance with architectures and to provide a set of test data; characterizing, using a workload characterization module, the workload based on the set of the test data; implementing, using a learning framework, a learning-based algorithm to collect the test data and to frame the test data as an embedded database to train an embedded environment; exploring, using an execution unit, architectures employing a method of training and testing; and allocating core, using a core allocation module in accordance with the workload characterization to achieve a high output and a low power consumption.
[009] Embodiments of the present invention may provide several advantages depending on its particular configuration. First, embodiments of the present application provide a system and a method of allocating core based on workload characterization.
[0010] Next, embodiments of the present application provide a system which is configured for measuring workload parameters in accordance with the embedded system architectures such as a single core and/or a multi-core architecture.
[0011] Next, embodiments of the present application provide a system and a method which is adapted for dividing workloads in multi-core embedded systems to achieve a high performance and power consumption efficiency.
[0012] Next, embodiments of the present application provide an intelligent computing framework for efficient embedded systems using workload characterization.
[0013] Next, embodiments of the present application provide a cognitive rule set which works on the Extreme Learning Machine (ELM) principle for the allocation of cores in accordance with the workloads for obtaining a high performance and a low energy consumption.
[0014] Next, embodiments of the present application provide a cognitive rule set which is configured to have three subsets: a calculation of workload parameters, an exploration of architectures using a method of training and testing, and a distribution of core workloads to achieve high output and low power consumption.
[0015] Next, embodiments of the present application provide an extreme learning algorithm to optimize design parameters such as power, performance, energy consumption, and energy-delay product.
[0016] These and other advantages will be apparent from the present application of the embodiments described herein.
[0017] The preceding is a simplified summary to provide an understanding of some embodiments of the present invention. This summary is neither an extensive nor exhaustive overview of the present invention and its various embodiments. The summary presents selected concepts of the embodiments of the present invention in a simplified form as an introduction to the more detailed description presented below. As will be appreciated, other embodiments of the present invention are possible utilizing, alone or in combination, one or more of the features set forth above or described in detail below.
BRIEF DESCRIPTION OF THE DRAWINGS
[0018] The above and still further features and advantages of embodiments of the present invention will become apparent upon consideration of the following detailed description of embodiments thereof, especially when taken in conjunction with the accompanying drawings, and wherein:
[0019] FIG. 1 illustrates a block diagram depicting a system of allocating core based on workload characterization, according to an embodiment of the present invention;
[0020] FIG. 2 illustrates a functional block diagram of the system of allocating core based on workload characterization, according to an embodiment of the present invention;
[0021] FIG. 3 depicts a functional block diagram of the system of allocating core based on workload characterization, according to an embodiment of the present invention;
[0022] FIG. 4 depicts a flowchart of a method of allocating core based on workload characterization, according to another embodiment of the present invention; and
[0023] FIG. 5 depicts a table-1 for comparison of the system of allocating core based on workload characterization with an existing product, according to another embodiment of the present invention.
[0024] The headings used herein are for organizational purposes only and are not meant to be used to limit the scope of the description or the claims. As used throughout this application, the word "may" is used in a permissive sense (i.e., meaning having the potential to), rather than the mandatory sense (i.e., meaning must). Similarly, the words “include”, “including”, and “includes” mean including but not limited to. To facilitate understanding, reference numerals have been used, where possible, to designate elements common to the figures. Optional portions of the figures may be illustrated using dashed or dotted lines unless the context of usage indicates otherwise.
DETAILED DESCRIPTION
[0025] The following description includes the preferred best mode of one embodiment of the present invention. It will be clear from this description of the invention that the invention is not limited to these illustrated embodiments but that the invention also includes a variety of modifications and embodiments thereto. Therefore, the present description should be seen as illustrative and not limiting. While the invention is susceptible to various modifications and alternative constructions, it should be understood, that there is no intention to limit the invention to the specific form disclosed, but, on the contrary, the invention is to cover all modifications, alternative constructions, and equivalents falling within the spirit and scope of the invention as defined in the claims.
[0026] In any embodiment described herein, the open-ended terms "comprising," "comprises," and the like (which are synonymous with "including," "having," and "characterized by") may be replaced by the respective partially closed phrases "consisting essentially of," "consists essentially of," and the like, or the respective closed phrases "consisting of," "consists of," and the like.
[0027] As used herein, the singular forms “a”, “an”, and “the” designate both the singular and the plural, unless expressly stated to designate the singular only.
[0028] Embodiments of the present invention may provide an intelligent computing framework for an efficient embedded system. As depicted in FIG. 1, an embodiment of the present invention provides a system 100 of allocating core based on workload characterization. In an embodiment of the present invention, the system 100 may comprise an execution unit 102 and a non-transitory storage medium 104.
[0029] In an embodiment of the present invention, the non-transitory storage medium 104 may comprise non-limiting instruction modules such as a workload identifier 106, a parameter extraction module 108, a workload characterization module 110, a learning framework 112, and a core allocation module 114.
[0030] In an embodiment of the present invention, the execution unit 102 may be adapted to optimize workload parameters for allocating core in the embedded system (not shown) by executing instructions incorporated in form of the non-limiting modules stored in the non-transitory storage medium 104.
[0031] In an embodiment of the present invention, the execution unit 102 may be a processor which may be, but is not limited to, a microcontroller, a microprocessor, a computing unit, a conventional processor, a controller, a state machine, and so forth. In another embodiment of the present invention, the execution unit 102 may be implemented as a combination of computing devices, e.g., a combination of a Digital Signal Processor (DSP) and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a Digital Signal Processor (DSP) core, or any other such configuration.
[0032] In an embodiment of the present invention, the workload identifier 106 may be adapted to identify an existing and/or an upcoming workload. The workload identifier 106 may gather information relating to requests from runtime workloads for resource allocations, responses to the requests for resource allocations, actual usage of resources by the workloads, network traffic and/or any other suitable information.
[0033] The parameter extraction module 108 may be adapted to extract workload parameters. In an embodiment of the present invention, the workload parameters may include, but are not limited to, power, performance, memory, branch, Translation Lookaside Buffers (TLB), energy consumption and/or an energy-delay product. The parameter extraction module 108 may identify a set of test data for each parameter, such as an instructions per cycle (IPC) count, an L1-D cache, an L1-I cache, an L2-cache access, a cache miss ratio, an arithmetic integer/float/add/mul mix, branch mis-prediction data, data-TLB (dTLB) misses, instruction-TLB (iTLB) misses, an average power consumption and/or an overall execution time of the workload.
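The raw counts listed above yield the derived metrics used for characterization. The following is a minimal illustrative sketch, not the module's actual interface; the counter names are hypothetical placeholders for values a profiler or simulator would report:

```python
def derive_metrics(counters: dict) -> dict:
    """Compute characterization metrics from raw hardware counters.

    The counter names are hypothetical placeholders, not a real
    profiler's output format.
    """
    energy_j = counters["avg_power_w"] * counters["exec_time_s"]
    return {
        # instructions per cycle: retired instructions over elapsed cycles
        "ipc": counters["instructions"] / counters["cycles"],
        # fraction of cache accesses that missed
        "cache_miss_ratio": counters["cache_misses"] / counters["cache_accesses"],
        # energy (J) = average power (W) * execution time (s)
        "energy_j": energy_j,
        # energy-delay product penalizes slow, power-hungry configurations
        "edp": energy_j * counters["exec_time_s"],
    }

sample = {"cycles": 1_000_000, "instructions": 800_000,
          "cache_accesses": 50_000, "cache_misses": 2_500,
          "avg_power_w": 1.5, "exec_time_s": 2.0}
metrics = derive_metrics(sample)
```

Such derived quantities, rather than the raw counts, are what a learning framework would typically consume, since they are comparable across workloads of different lengths.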
[0034] In an embodiment of the present invention, the workload characterization module 110 may be adapted to characterize the workload based on the parameters extracted by the parameter extraction module 108. The ideal workload characteristics may be obtained by configuring the execution unit 102 for an optimum clock frequency so that it may consume less energy without sacrificing performance. The clock frequency may be measured using the cpufrequtils commands and the performance may be measured using the cpustat command, in an embodiment of the present invention.
[0035] In an embodiment of the present invention, the learning framework 112 may be adapted to train an environment of the embedded system with the test data obtained from the parameter extraction module 108. The learning framework 112 may be further adapted to frame the set of the test data as an embedded database (not shown). In another embodiment of the present invention, the learning framework 112 may be adapted to frame the embedded database with training datasets obtained from the test data collected at multiple instances, as discussed above.
[0036] The embedded database may be used as a training dataset which may evolve with each cycle of the workload characterization. In an embodiment of the present invention, the training datasets may be collected by measuring the ideal characteristics of the execution unit 102 used in the embedded system. In an embodiment of the present invention, the core allocation module 114 may be adapted to predict the accuracy of the test data based on the training datasets of the embedded system. The core allocation module 114 may allocate a core based on workload characterization to improve the efficiency of the embedded system.
[0037] As depicted in FIG. 2, in an embodiment of the present invention, the parameter extraction module 108 may be an integration of a simulation tool 202 and a timing analyzer 204.
[0038] In an embodiment of the present invention, the parameter extraction module 108 may receive an input workload of the embedded system. To evaluate the costs associated with the test of the workload parameters, the input workload may be exercised using benchmark programs such as MiBench, CoreMark, the Bristol/Embecosm Embedded Benchmark Suite (BEEBS), SPEC CPU 2010 and/or Dhrystone. In an embodiment of the present invention, the benchmark programs may be employed for measurement of the energy consumption and/or the performance of the execution unit 102 used in the embedded system.
[0039] In a preferred embodiment of the present invention, the workload parameters may be measured on embedded Advanced RISC Machines (ARM) and/or Cortex architectures. In another embodiment of the present invention, the workload parameters may be collected using binary instrumentation systems that may be designed specifically for certain architectures, such as Pin for IA-32 and x86-64 architectures, Valgrind for x86 and MIPS64 architectures, ATOM for Alpha architectures, and DynamoRIO for IA-32 and x86-64 architectures.
[0040] The simulation tool 202 may comprise simulators such as SIM_BP and SIM_INSTR. The simulators may be used for measuring the branch prediction ratio and the instructions per second. The simulation tool 202 may further comprise CHEETAH, which may be used for performance testing.
[0041] In an embodiment of the present invention, the parameter extraction module 108 may be configured to measure a number of branches taken, a cache size estimation, an instruction mix, and a number of clock signals used in the execution. In an embodiment of the present invention, the parameter extraction module 108 may be adapted to provide the set of the test data based on the measurement of the workload parameters. The test data may comprise a branch prediction ratio, a cache hit-miss ratio, instructions, and instructions per cycle.
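The ratio quantities in this test-data set follow directly from the measured counts. As a hedged illustration (the function names and inputs are assumptions, not the simulators' real output):

```python
def branch_prediction_ratio(branches: int, mispredictions: int) -> float:
    """Fraction of executed branches that were predicted correctly,
    e.g. from counts a branch-prediction simulator would report."""
    return (branches - mispredictions) / branches

def cache_hit_miss_ratio(hits: int, misses: int) -> float:
    """Hits per miss; a higher value means the cache serves
    the workload better."""
    return hits / misses

bp = branch_prediction_ratio(1000, 50)   # 950 of 1000 branches predicted
hm = cache_hit_miss_ratio(900, 100)      # 9 hits for every miss
```

These are standard definitions; whether the patented simulators report the raw counts or the ratios directly is not specified in the text.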
[0042] In an embodiment of the present invention, the embedded system architecture may be explored using a method of training and testing of the workload parameters. In an embodiment of the present invention, the test data may be fed to the learning framework 112. The test data may be used for training the learning framework 112 that may be configured to design and explore the multi-core heterogeneous architectures. In another embodiment of the present invention, the learning framework 112 may be adapted to be executed and tested with different families of the ARM architectures to predict an accuracy of the test data.
[0043] As depicted in FIG. 3, the present invention may rely on cognitive rule sets that may have three subsets. A first subset 302 may require a setup of the workloads. In a second subset 304, the characterization of the workload may be performed using the parameter extraction module 108 and by deploying a machine learning algorithm that may be primarily based on extreme learning machines. In a third subset 306, the prediction and allocation of the core may be performed based on the test data.
[0044] An embodiment of the present invention may provide the cognitive rule sets for the utilization of the cores in accordance with the workloads. In an embodiment of the present invention, the cognitive rule sets may be deployed on a cognitive learning machine. The cognitive learning machine may consist of a single hidden-layer feed-forward network with an intelligent rule system.
[0045] In an embodiment of the present invention, the learning framework 112 may be used to train on the workload parameters and to predict the cores in accordance with the workloads. In accordance with an embodiment of the present invention, various machine learning-based algorithms may be utilized, such as neural networks, support vector machines, fuzzy neural networks, etc. In a preferred embodiment of the present invention, the machine learning algorithms may be based on extreme learning machines along with the cognitive rule sets. In an embodiment of the present invention, the machine learning algorithm may act as a single brain-like neural network in which input weights and hidden nodes may be randomly chosen such that the machine learning algorithm may determine the output weights with high speed and high accuracy.
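The extreme learning machine principle described here — random input weights and biases, with only the output weights solved in closed form — can be sketched in pure Python. This is an illustrative toy, not the patented framework: the features, labels, hidden-layer size, and ridge term are all assumptions made for the example.

```python
import math
import random

def elm_train(X, y, hidden=12, ridge=1e-6, seed=7):
    """Minimal Extreme Learning Machine sketch: fix random input
    weights, apply a sigmoid hidden layer, then solve the output
    weights by ridge-regularized least squares."""
    rng = random.Random(seed)
    d = len(X[0])
    # Random, never-trained input weights and biases (the ELM principle).
    W = [[rng.uniform(-1, 1) for _ in range(hidden)] for _ in range(d)]
    b = [rng.uniform(-1, 1) for _ in range(hidden)]

    def hidden_layer(row):
        return [1.0 / (1.0 + math.exp(-(sum(row[i] * W[i][j] for i in range(d)) + b[j])))
                for j in range(hidden)]

    H = [hidden_layer(row) for row in X]
    n = len(H)
    # Solve (H^T H + ridge*I) beta = H^T y by Gaussian elimination.
    A = [[sum(H[k][i] * H[k][j] for k in range(n)) + (ridge if i == j else 0.0)
          for j in range(hidden)] for i in range(hidden)]
    rhs = [sum(H[k][i] * y[k] for k in range(n)) for i in range(hidden)]
    for col in range(hidden):
        piv = max(range(col, hidden), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, hidden):
            f = A[r][col] / A[col][col]
            for c in range(col, hidden):
                A[r][c] -= f * A[col][c]
            rhs[r] -= f * rhs[col]
    beta = [0.0] * hidden
    for i in range(hidden - 1, -1, -1):
        beta[i] = (rhs[i] - sum(A[i][j] * beta[j] for j in range(i + 1, hidden))) / A[i][i]
    return lambda row: sum(h * w for h, w in zip(hidden_layer(row), beta))

# Toy characterization task (assumed data): +1 = compute-bound
# (high IPC, few misses), -1 = memory-bound. Features: [ipc, miss_ratio].
X = [[2.1, 0.01], [1.9, 0.02], [2.3, 0.015], [2.0, 0.03],
     [0.4, 0.25], [0.5, 0.30], [0.3, 0.22], [0.6, 0.28]]
y = [1, 1, 1, 1, -1, -1, -1, -1]
predict = elm_train(X, y)
labels = [1 if predict(row) > 0 else -1 for row in X]
```

Because only the linear output layer is solved, training reduces to one linear solve, which is the speed advantage the paragraph attributes to the approach.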
[0046] As depicted in FIG. 4, further embodiments of the present invention may provide a method 400 of allocating core based on workload characterization. The method 400 may comprise steps that may be performed in any order, such as:
[0047] At step 402, the system 100 may identify workloads using the workload identifier 106 for the embedded system.
[0048] At step 404, the system 100 may extract workload parameters of the identified workload in accordance with the architectures using the parameter extraction module 108. The system 100 may further provide the set of the test data.
[0049] At step 406, the system 100 may characterize the workload based on the set of the test data using the workload characterization module 110.
[0050] At step 408, the system 100 may implement the learning-based algorithm using the learning framework 112 to collect the test data and to frame the test data as the embedded database to train the embedded environment.
[0051] At step 410, the system 100 may explore architectures by employing the technique of training and testing using the execution unit 102.
[0052] At step 412, the system 100 may allocate core using a core allocation module 114 in accordance with the workload characterization to achieve a high output and a low power consumption. After the allocation, the method 400 may conclude.
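The steps above can be sketched end to end in miniature. The allocation rule below is a hypothetical stand-in for the trained learning framework 112 (in the invention the decision is learned, not hard-coded), and all names, thresholds, and counter values are assumptions for illustration:

```python
def allocate_core(ipc: float, cache_miss_ratio: float) -> str:
    """Hypothetical stand-in for the trained framework's decision:
    compute-bound workloads go to a high-performance core,
    memory-bound workloads to an efficiency core."""
    if ipc >= 1.0 and cache_miss_ratio < 0.1:
        return "performance-core"
    return "efficiency-core"

def run_pipeline(workloads: dict) -> dict:
    """Steps 402-412 in miniature: identify each workload, extract
    its parameters, characterize it, and allocate a core."""
    plan = {}
    for name, counters in workloads.items():                 # step 402: identify
        ipc = counters["instructions"] / counters["cycles"]  # step 404: extract
        miss = counters["cache_misses"] / counters["cache_accesses"]
        plan[name] = allocate_core(ipc, miss)                # steps 406-412
    return plan

plan = run_pipeline({
    "http_server": {"instructions": 2_000_000, "cycles": 1_000_000,
                    "cache_misses": 1_000, "cache_accesses": 100_000},
    "mqtt_logger": {"instructions": 300_000, "cycles": 1_000_000,
                    "cache_misses": 30_000, "cache_accesses": 100_000},
})
```

The workload names echo the HTTP and MQTT workloads mentioned in the evaluation below, but the counter values and the two-core split are invented for the sketch.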
[0053] As shown in FIG. 5, the performance of the system 100 may be compared with existing products. As shown in Table 1, a comparison of performance may be conducted between the present system 100 and existing products for multiple constraints, such as a communication range, a cost, GUI support, scalability analysis, sharing of information, and a control on situations. The system 100 may have better performance in terms of the aforementioned constraints, and thereby may yield better efficiency. In an embodiment of the present invention, a Raspberry Pi 3 B+ model with a quad Cortex-A7 core may be used to test the aforementioned constraints. With the present application, power consumption reductions for various workloads, such as HTTP, MQTT, WiFi storage and reference applications, may be tested and are found to be in the range of 66.22% (minimum) to 89.18% (maximum). Therefore, it may be contemplated that the system 100 may reduce the time required to perform various functional loads by about 60% to 90%.
[0054] Embodiments of the invention are described above with reference to block diagrams and schematic illustrations of methods and systems according to embodiments of the invention. It will be understood that each block of the diagrams and combinations of blocks in the diagrams can be implemented by computer program instructions. These computer program instructions may be loaded onto one or more general-purpose computers, special purpose computers, or other programmable data processing apparatus to produce machines, such that the instructions which execute on the computers or other programmable data processing apparatus create means for implementing the functions specified in the block or blocks. Such computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the function specified in the block or blocks.
[0055] While the invention has been described in connection with what is presently considered to be the most practical and various embodiments, it is to be understood that the invention is not to be limited to the disclosed embodiments, but on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.
[0056] This written description uses examples to disclose the invention, including the best mode, and also to enable any person skilled in the art to practice the invention, including making and using any devices or systems and performing any incorporated methods. The patentable scope of the invention is defined in the claims and may include other examples that occur to those skilled in the art. Such other examples are intended to be within the scope of the claims if they have structural elements that do not differ from the literal language of the claims, or if they include equivalent structural elements with insubstantial differences from the literal language of the claims.
| # | Name | Date |
|---|---|---|
| 1 | 202141014949-STATEMENT OF UNDERTAKING (FORM 3) [31-03-2021(online)].pdf | 2021-03-31 |
| 2 | 202141014949-FORM 1 [31-03-2021(online)].pdf | 2021-03-31 |
| 3 | 202141014949-FIGURE OF ABSTRACT [31-03-2021(online)].pdf | 2021-03-31 |
| 4 | 202141014949-DRAWINGS [31-03-2021(online)].pdf | 2021-03-31 |
| 5 | 202141014949-DECLARATION OF INVENTORSHIP (FORM 5) [31-03-2021(online)].pdf | 2021-03-31 |
| 6 | 202141014949-COMPLETE SPECIFICATION [31-03-2021(online)].pdf | 2021-03-31 |
| 7 | 202141014949-PA [30-12-2021(online)].pdf | 2021-12-30 |
| 8 | 202141014949-FORM28 [30-12-2021(online)].pdf | 2021-12-30 |
| 9 | 202141014949-ASSIGNMENT DOCUMENTS [30-12-2021(online)].pdf | 2021-12-30 |
| 10 | 202141014949-8(i)-Substitution-Change Of Applicant - Form 6 [30-12-2021(online)].pdf | 2021-12-30 |