Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed for composable machine learning compute nodes. An example apparatus includes interface circuitry to receive a workload, instructions in the apparatus, and processor circuitry to at least one of execute or instantiate the instructions to generate a first configuration of one or more machine-learning models based on a workload, generate a second configuration of hardware, determine an evaluation parameter based on an execution of the workload, the execution of the workload based on the first configuration and the second configuration, and, in response to the evaluation parameter satisfying a threshold, execute the one or more machine-learning models in the first configuration on the hardware in the second configuration, the one or more machine-learning models and the hardware to execute the workload.
Description:
RELATED APPLICATION
[0001] The present application claims priority to India Provisional Patent Application No. 202141036070, filed 10 August 2021 and titled “METHODS AND APPARATUS FOR COMPUTING SYSTEMS,” the entire disclosure of which is hereby incorporated by reference.
[0002] The present application claims priority to U.S. Non-Provisional Patent Application No. 17/558,284, filed 21 December 2021 and titled “APPARATUS, ARTICLES OF MANUFACTURE, AND METHODS FOR COMPOSABLE MACHINE LEARNING COMPUTE NODES,” the entire disclosure of which is hereby incorporated by reference.
FIELD OF THE DISCLOSURE
[0003] This disclosure relates generally to machine learning and, more particularly, to apparatus, articles of manufacture, and methods for composable machine learning compute nodes.
BACKGROUND
[0004] Compute workloads may be carried out by using machine-learning models. Machine-learning models, such as neural networks, are useful tools that have demonstrated their value in solving complex problems regarding pattern recognition, natural language processing, automatic speech recognition, etc. Identifying an optimal combination of hardware and/or software (e.g., a machine-learning model) to execute a compute workload is complex due to the vast range of available types of hardware and/or machine-learning models and customization(s) thereof.
BRIEF DESCRIPTION OF THE DRAWINGS
[0005] FIG. 1 is an illustration of an example automatic machine learning (AutoML) architecture including an example machine-learning system configurator to identify and/or generate a composable machine learning compute node.
[0006] FIG. 2 is a block diagram of an example implementation of the machine-learning system configurator of FIG. 1.
[0007] FIG. 3 is a block diagram of an example implementation of the machine-learning system configurator of FIGS. 1 and/or 2.
[0008] FIG. 4 is an illustration of an example workflow to generate a composable machine learning compute node.
[0009] FIG. 5 is an illustration of another example workflow to identify a composable machine learning compute node.
[0010] FIG. 6 is an illustration of an example implementation of an example ontology database.
[0011] FIG. 7 is an illustration of yet another example workflow to identify a composable machine learning compute node.
[0012] FIG. 8 is a flowchart representative of example machine readable instructions and/or example operations that may be executed by example processor circuitry to implement the example composable machine learning system configurator of FIGS. 1, 2, and/or 3 to execute a workload with a composable machine learning compute node.
[0013] FIG. 9 is a flowchart representative of example machine readable instructions and/or example operations that may be executed by example processor circuitry to implement the example composable machine learning system configurator of FIGS. 1, 2, and/or 3 to generate a first configuration of one or more machine-learning models based on a machine-learning workload.
[0014] FIG. 10 is a flowchart representative of example machine readable instructions and/or example operations that may be executed by example processor circuitry to implement the example composable machine learning system configurator of FIGS. 1, 2, and/or 3 to generate a second configuration of hardware.
[0015] FIG. 11 is a flowchart representative of example machine readable instructions and/or example operations that may be executed by example processor circuitry to implement the example composable machine learning system configurator of FIGS. 1, 2, and/or 3 to adjust a first configuration based on an evaluation parameter.
[0016] FIG. 12 is a flowchart representative of example machine readable instructions and/or example operations that may be executed by example processor circuitry to implement the example composable machine learning system configurator of FIGS. 1, 2, and/or 3 to adjust a second configuration based on an evaluation parameter.
[0017] FIG. 13 is a flowchart representative of example machine readable instructions and/or example operations that may be executed by example processor circuitry to implement the example composable machine learning system configurator of FIGS. 1, 2, and/or 3 to deploy a compute node to execute a machine-learning workload.
[0018] FIG. 14 is a block diagram of an example processing platform including processor circuitry structured to execute the example machine readable instructions and/or the example operations of FIGS. 8-13 to implement the example composable machine learning system configurator of FIGS. 1, 2, and/or 3.
[0019] FIG. 15 is a block diagram of an example implementation of the processor circuitry of FIG. 14.
[0020] FIG. 16 is a block diagram of another example implementation of the processor circuitry of FIG. 14.
[0021] FIG. 17 is a block diagram of an example software distribution platform (e.g., one or more servers) to distribute software (e.g., software corresponding to the example machine readable instructions of FIGS. 8-13) to client devices associated with end users and/or consumers (e.g., for license, sale, and/or use), retailers (e.g., for sale, re-sale, license, and/or sub-license), and/or original equipment manufacturers (OEMs) (e.g., for inclusion in products to be distributed to, for example, retailers and/or to other end users such as direct buy customers).
[0022] In general, the same reference numbers will be used throughout the drawing(s) and accompanying written description to refer to the same or like parts. The figures are not to scale.
DETAILED DESCRIPTION
[0023] As used herein, connection references (e.g., attached, coupled, connected, and joined) may include intermediate members between the elements referenced by the connection reference and/or relative movement between those elements unless otherwise indicated. As such, connection references do not necessarily imply that two elements are directly connected and/or in fixed relation to each other.
[0024] Unless specifically stated otherwise, descriptors such as “first,” “second,” “third,” etc., are used herein without imputing or otherwise indicating any meaning of priority, physical order, arrangement in a list, and/or ordering in any way, but are merely used as labels and/or arbitrary names to distinguish elements for ease of understanding the disclosed examples. In some examples, the descriptor “first” may be used to refer to an element in the detailed description, while the same element may be referred to in a claim with a different descriptor such as “second” or “third.” In such instances, it should be understood that such descriptors are used merely for identifying those elements distinctly that might, for example, otherwise share a same name.
Claims:
1. An apparatus to generate a compute node, the apparatus comprising:
interface circuitry to receive a workload;
instructions in the apparatus; and
processor circuitry to at least one of execute or instantiate the instructions to:
generate a first configuration of one or more machine-learning models based on the workload, the first configuration stored in a first configuration database, the first configuration database including a plurality of machine-learning models, the plurality of the machine-learning models including the one or more machine-learning models;
generate a second configuration of hardware, the second configuration stored in a second configuration database, the second configuration database including one or more portions of a plurality of hardware, the plurality of the hardware including the hardware;
determine an evaluation parameter based on an execution of the workload, the execution of the workload based on the first configuration and the second configuration; and
in response to the evaluation parameter satisfying a threshold, execute the one or more machine-learning models in the first configuration on the hardware in the second configuration, the one or more machine-learning models and the hardware to execute the workload.
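The apparatus operations above can be pictured as a configure-evaluate-deploy search loop. The following is a minimal illustrative sketch only, not the claimed apparatus: the database contents, the proposal order, and the accuracy-style evaluation function are all hypothetical assumptions.

```python
# Hypothetical sketch of the configure-evaluate-deploy loop of claim 1.
# The two databases and the evaluation function below are toy stand-ins.

MODEL_DB = [{"model": "cnn", "layers": 8}, {"model": "mlp", "layers": 4}]  # first configuration database
HW_DB = [{"hw": "cpu", "cores": 8}, {"hw": "gpu", "sms": 80}]              # second configuration database

def evaluate(workload, model_cfg, hw_cfg):
    # Stand-in for executing the workload with both configurations and
    # measuring an evaluation parameter (e.g., accuracy or throughput).
    return 0.5 + 0.05 * model_cfg["layers"] + (0.1 if hw_cfg["hw"] == "gpu" else 0.0)

def compose_compute_node(workload, threshold):
    """Search (model config, hardware config) pairs until the evaluation
    parameter satisfies the threshold; return the pair to deploy."""
    for model_cfg in MODEL_DB:          # first configuration candidates
        for hw_cfg in HW_DB:            # second configuration candidates
            if evaluate(workload, model_cfg, hw_cfg) >= threshold:
                return model_cfg, hw_cfg  # compute node to execute the workload
    return None                           # no satisfying configuration found
```

If no pair satisfies the threshold, the claims' fallback behavior (e.g., claims 4 and 7) would substitute alternative models or hardware from the databases and repeat the evaluation.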
2. The apparatus of claim 1, wherein the first configuration includes at least one of a number of model layers, weights for the model layers, a type of machine-learning training, or one or more hyperparameters associated with the one or more machine-learning models.
3. The apparatus of any one of claims 1 or 2, wherein the one or more portions include at least one of a first block, a second block, or a third block, and the processor circuitry is to at least one of execute or instantiate the instructions to:
identify the first block of the hardware to execute a matrix-matrix workload;
identify the second block of the hardware to execute a vector-vector workload;
identify the third block of the hardware to execute a matrix-vector workload; and
identify register files for respective ones of the first block, the second block, and the third block, the register files to store states for the respective ones of the first block, the second block, and the third block, the second configuration based on a topology including at least one of the first block, the second block, or the third block.
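The block identification of claim 3 amounts to routing each kernel type to a dedicated hardware block, with a per-block register file holding that block's state. This toy sketch uses assumed block names and a dict-based register file; neither is part of the claims.

```python
# Illustrative mapping of workload kernels to hardware blocks (claim 3).
# Block names and the register-file representation are assumptions.

BLOCKS = {
    "matrix-matrix": "block_1",   # e.g., a GEMM-style unit
    "vector-vector": "block_2",   # e.g., a SIMD/vector unit
    "matrix-vector": "block_3",   # e.g., a GEMV-style unit
}

# One register file per block, to store that block's state.
REGISTER_FILES = {block: {"state": None} for block in BLOCKS.values()}

def plan_topology(kernels):
    """Return an ordered list of blocks (a topology) for a workload
    expressed as a sequence of kernel types."""
    topology = []
    for kernel in kernels:
        block = BLOCKS[kernel]                    # identify the block for this kernel
        REGISTER_FILES[block]["state"] = kernel   # record state in its register file
        topology.append(block)
    return topology
```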
4. The apparatus of any one of claims 1 or 2, wherein the one or more machine-learning models include a first machine-learning model, and the processor circuitry is to at least one of execute or instantiate the instructions to, in response to the evaluation parameter not satisfying the threshold:
identify a second machine-learning model in the first configuration database;
generate a third configuration of the second machine-learning model;
determine the evaluation parameter based on an execution of the workload based on the third configuration; and
deploy the second machine-learning model to execute the workload based on the third configuration.
5. The apparatus of any one of claims 1 or 2, wherein the one or more machine-learning models include a first machine-learning model, and the processor circuitry is to at least one of execute or instantiate the instructions to, in response to the evaluation parameter not satisfying the threshold:
determine one or more first layers of the first machine-learning model to execute a first portion of the workload;
identify a second machine-learning model in the first configuration database;
determine one or more second layers of the second machine-learning model to execute a second portion of the workload; and
determine a third configuration based on a topology of the one or more first layers and the one or more second layers, the topology based on an output from the one or more first layers as an input to the one or more second layers.
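The third configuration of claim 5 chains layers from two models, with the first model's output feeding the second model's input. A minimal sketch, assuming toy numeric layers in place of real machine-learning layers:

```python
# Hypothetical sketch of claim 5: the first layers of one model execute
# a first portion of the workload, and their output is the input to the
# second layers of another model. Layers here are toy scaling functions.

def make_layer(scale):
    return lambda x: [scale * v for v in x]

model_a_layers = [make_layer(2), make_layer(3)]   # first machine-learning model
model_b_layers = [make_layer(5)]                  # second machine-learning model

def composed_topology(x):
    """Third configuration: first layer(s) of model A feed layer(s) of model B."""
    for layer in model_a_layers[:1]:   # one or more first layers
        x = layer(x)
    for layer in model_b_layers:       # one or more second layers
        x = layer(x)
    return x
```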
6. The apparatus of any one of claims 1 or 2, wherein the one or more machine-learning models include a first machine-learning model, and the processor circuitry is to at least one of execute or instantiate the instructions to:
identify the first machine-learning model in the first configuration database;
identify a second machine-learning model based on a query of an ontology database with an identifier of the first machine-learning model as an input, the ontology database including an association of the first machine-learning model and the second machine-learning model; and
in response to the evaluation parameter satisfying the threshold, update the ontology database based on the first configuration.
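The ontology database of claim 6 can be thought of as a lookup keyed by a model identifier that returns associated models, updated whenever a configuration satisfies the threshold. The dict representation and model names below are illustrative assumptions only.

```python
# Toy sketch of the ontology database of claim 6. The storage format
# and model identifiers are assumptions, not the claimed structure.

ONTOLOGY = {
    "resnet50": ["mobilenet_v2", "efficientnet_b0"],  # associated alternatives
}

def query_ontology(model_id):
    """Return the models the ontology associates with model_id."""
    return ONTOLOGY.get(model_id, [])

def update_ontology(model_id, associated_id):
    """On a satisfying evaluation, record the new association."""
    ONTOLOGY.setdefault(model_id, []).append(associated_id)
```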
7. The apparatus of any one of claims 1 or 2, wherein the hardware is first hardware, and the processor circuitry is to at least one of execute or instantiate the instructions to, in response to the evaluation parameter not satisfying the threshold:
identify second hardware in the second configuration database;
generate a third configuration of the second hardware;
determine the evaluation parameter based on an execution of the workload by the second hardware in the third configuration; and
deploy the second hardware with the third configuration to execute the one or more machine-learning models to execute the workload.
8. The apparatus of any one of claims 1 or 2, wherein the hardware is first hardware, and the processor circuitry is to at least one of execute or instantiate the instructions to, in response to the evaluation parameter not satisfying the threshold:
determine one or more first portions of the first hardware to execute a first portion of the workload;
identify second hardware in the second configuration database;
determine one or more second portions of the second hardware to execute a second portion of the workload; and
determine a third configuration based on a topology of the one or more first portions and the one or more second portions, the topology based on an output from the one or more first portions as an input to the one or more second portions.
9. The apparatus of claim 8, wherein the first hardware and the second hardware are one of a central processor unit, a graphics processing unit, a digital signal processor, an Artificial Intelligence processor, a Neural Network processor, or a Field Programmable Gate Array.
10. The apparatus of any one of claims 1 or 2, wherein the evaluation parameter is a first evaluation parameter, and the processor circuitry is to at least one of execute or instantiate the instructions to:
output a reward function including the first evaluation parameter with a first weight and a second evaluation parameter with a second weight, the first weight greater than the second weight; and
in response to determining that at least one of the first evaluation parameter or the second evaluation parameter does not satisfy the threshold, modify at least one of the first configuration or the second configuration to at least one of increase the first evaluation parameter or decrease the second evaluation parameter.
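The reward function of claim 10 weights the first evaluation parameter (to be increased) more heavily than the second (to be decreased). A minimal sketch, where the weights and the accuracy/latency parameter names are assumptions:

```python
# Illustrative weighted reward function for claim 10. The first weight
# exceeds the second; parameter names and values are assumptions.

W1, W2 = 0.8, 0.2   # first weight greater than second weight

def reward(first_param, second_param):
    """Higher is better: increasing the first parameter (e.g., accuracy)
    raises the reward; the second parameter (e.g., latency) lowers it."""
    return W1 * first_param - W2 * second_param
```

A search over configurations would then modify the first and/or second configuration to maximize this reward when either parameter fails its threshold.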
11. The apparatus of any one of claims 1 or 2, wherein the evaluation parameter is at least one of an accuracy, a cost, an energy consumption, a latency, a performance, or a throughput associated with at least one of the one or more machine-learning models or the hardware.
12. A method for generating a compute node, the method comprising:
generating a first configuration of one or more machine-learning models based on a workload, the first configuration stored in a first configuration database, the first configuration database including a plurality of machine-learning models, the plurality of the machine-learning models including the one or more machine-learning models;
generating a second configuration of hardware, the second configuration stored in a second configuration database, the second configuration database including one or more portions of a plurality of hardware, the plurality of the hardware including the hardware;
determining an evaluation parameter based on an execution of the workload, the execution of the workload based on the first configuration and the second configuration; and
in response to the evaluation parameter satisfying a threshold, executing the one or more machine-learning models in the first configuration on the hardware in the second configuration, the one or more machine-learning models and the hardware to execute the workload.
13. The method of claim 12, wherein the first configuration includes at least one of a number of model layers, weights for the model layers, a type of machine-learning training, or one or more hyperparameters associated with the one or more machine-learning models.
14. The method of any one of claims 12 or 13, wherein the one or more portions include at least one of a first block, a second block, or a third block, and further including:
identifying the first block of the hardware to execute a matrix-matrix workload;
identifying the second block of the hardware to execute a vector-vector workload;
identifying the third block of the hardware to execute a matrix-vector workload; and
identifying register files for respective ones of the first block, the second block, and the third block, the register files to store states for the respective ones of the first block, the second block, and the third block, the second configuration based on a topology including at least one of the first block, the second block, or the third block.
15. The method of any one of claims 12 or 13, wherein the one or more machine-learning models include a first machine-learning model, and further including, in response to the evaluation parameter not satisfying the threshold:
identifying a second machine-learning model in the first configuration database;
generating a third configuration of the second machine-learning model;
determining the evaluation parameter based on an execution of the workload based on the third configuration; and
deploying the second machine-learning model to execute the workload based on the third configuration.
16. The method of any one of claims 12 or 13, wherein the one or more machine-learning models include a first machine-learning model, and further including, in response to the evaluation parameter not satisfying the threshold:
determining one or more first layers of the first machine-learning model to execute a first portion of the workload;
identifying a second machine-learning model in the first configuration database;
determining one or more second layers of the second machine-learning model to execute a second portion of the workload; and
determining a third configuration based on a topology of the one or more first layers and the one or more second layers, the topology based on an output from the one or more first layers as an input to the one or more second layers.
17. The method of any one of claims 12 or 13, wherein the one or more machine-learning models include a first machine-learning model, and further including:
identifying the first machine-learning model in the first configuration database;
identifying a second machine-learning model based on a query of an ontology database with an identifier of the first machine-learning model as an input, the ontology database including an association of the first machine-learning model and the second machine-learning model; and
in response to the evaluation parameter satisfying the threshold, updating the ontology database based on the first configuration.
18. The method of any one of claims 12 or 13, wherein the hardware is first hardware, and further including, in response to the evaluation parameter not satisfying the threshold:
identifying second hardware in the second configuration database;
generating a third configuration of the second hardware;
determining the evaluation parameter based on an execution of the workload by the second hardware in the third configuration; and
deploying the second hardware with the third configuration to execute the one or more machine-learning models to execute the workload.
19. The method of any one of claims 12 or 13, wherein the hardware is first hardware, and further including, in response to the evaluation parameter not satisfying the threshold:
determining one or more first portions of the first hardware to execute a first portion of the workload;
identifying second hardware in the second configuration database;
determining one or more second portions of the second hardware to execute a second portion of the workload; and
determining a third configuration based on a topology of the one or more first portions and the one or more second portions, the topology based on an output from the one or more first portions as an input to the one or more second portions.
20. The method of claim 19, wherein the first hardware and the second hardware are one of a central processor unit, a graphics processing unit, a digital signal processor, an Artificial Intelligence processor, a Neural Network processor, or a Field Programmable Gate Array.
21. The method of any one of claims 12 or 13, wherein the evaluation parameter is a first evaluation parameter, and further including:
outputting a reward function including the first evaluation parameter with a first weight and a second evaluation parameter with a second weight, the first weight greater than the second weight; and
in response to determining that at least one of the first evaluation parameter or the second evaluation parameter does not satisfy the threshold, adjusting at least one of the first configuration or the second configuration to at least one of increase the first evaluation parameter or decrease the second evaluation parameter.
22. The method of any one of claims 12 or 13, wherein the evaluation parameter is at least one of an accuracy, a cost, an energy consumption, a latency, a performance, or a throughput associated with at least one of the one or more machine-learning models or the hardware.
23. A machine readable medium including code that, when executed, causes a machine to perform the method of any one of claims 12-22.
| # | Name | Date |
|---|---|---|
| 1 | 202244039794-FORM 1 [11-07-2022(online)].pdf | 2022-07-11 |
| 2 | 202244039794-DRAWINGS [11-07-2022(online)].pdf | 2022-07-11 |
| 3 | 202244039794-DECLARATION OF INVENTORSHIP (FORM 5) [11-07-2022(online)].pdf | 2022-07-11 |
| 4 | 202244039794-COMPLETE SPECIFICATION [11-07-2022(online)].pdf | 2022-07-11 |
| 5 | 202244039794-FORM-26 [28-07-2022(online)].pdf | 2022-07-28 |
| 6 | 202244039794-FORM 3 [11-01-2023(online)].pdf | 2023-01-11 |
| 7 | 202244039794-FORM 3 [11-07-2023(online)].pdf | 2023-07-11 |
| 8 | 202244039794-FORM 3 [10-01-2024(online)].pdf | 2024-01-10 |
| 9 | 202244039794-FORM 18 [11-07-2025(online)].pdf | 2025-07-11 |