Abstract: Various embodiments are generally directed to techniques to detect fusible operators with machine learning, for example by evaluating a set of operators in a graph of a machine learning model to identify fusion candidates, each comprising a subgraph of the graph with two or more operators to combine. Some embodiments are particularly directed to utilizing a machine learning classifier to evaluate fusion candidates using a set of features of each fusion candidate.
1. An apparatus, comprising:
a processor; and
a memory comprising instructions that when executed by the processor cause the processor to:
identify input comprising one or more machine learning models that each include a graph of operators;
mine the one or more machine learning models based on one or more operational parameters to determine one or more fusion candidates, each of the one or more fusion candidates comprising a subgraph of at least one graph of operators, wherein each subgraph includes two or more operators;
extract a feature set from each of the one or more fusion candidates;
utilize a machine learning classifier to evaluate the one or more fusion candidates based on the feature sets extracted from each of the one or more fusion candidates; and
provide, as output, a proposed candidate of the one or more fusion candidates to fuse based on evaluation of the one or more fusion candidates.
2. The apparatus of claim 1, the memory comprising instructions that when executed by the processor cause the processor to combine each operator in the subgraph of the proposed candidate to fuse the proposed candidate into a fused candidate.
3. The apparatus of claim 2, the memory comprising instructions that when executed by the processor cause the processor to evaluate computational efficiency of a first machine learning model with the proposed candidate and a second machine learning model with the fused candidate to validate the proposed candidate.
4. The apparatus of claim 3, the memory comprising instructions that when executed by the processor cause the processor to utilize compiler stacks to evaluate computational efficiency of the first and second machine learning models.
5. The apparatus of claim 3, the memory comprising instructions that when executed by the processor cause the processor to utilize a tensor virtual machine (TVM) to evaluate computational efficiency of the first and second machine learning models.
6. The apparatus of claim 1, wherein the one or more machine learning models comprise a deep neural network (DNN) model and each operator includes a layer in the DNN model.
7. The apparatus of claim 1, the memory comprising instructions that when executed by the processor cause the processor to rank each of the one or more fusion candidates based on the feature sets to identify the proposed candidate.
8. The apparatus of claim 1, wherein the feature set includes the one or more operational parameters.
9. The apparatus of claim 1, wherein the one or more operational parameters include one or more of a frequency of utilization, a computational cost, and a memory cost.
10. The apparatus of claim 1, the memory comprising instructions that when executed by the processor cause the processor to utilize weighted frequent subgraph mining to mine the one or more machine learning models based on the one or more operational parameters to determine the one or more fusion candidates.
11. The apparatus of claim 10, the memory comprising instructions that when executed by the processor cause the processor to generate an edge weight metric based on the one or more operational parameters to mine the one or more machine learning models.
12. The apparatus of claim 1, each feature set comprising one or more core features and one or more uncore features.
13. The apparatus of claim 12, the core features comprising one or more of instructions retired, elapsed core clock ticks, core frequency, L2 cache hits and misses, and L3 cache hits and misses.
14. The apparatus of claim 12, the uncore features comprising one or more of read bytes from memory controllers, bytes written to memory controllers, and data traffic transferred via interconnect links.
15. At least one non-transitory computer-readable medium comprising a set of instructions that, in response to being executed by a processor circuit, cause the processor circuit to:
identify input comprising one or more machine learning models that each include a graph of operators;
mine the one or more machine learning models based on one or more operational parameters to determine one or more fusion candidates, each of the one or more fusion candidates comprising a subgraph of at least one graph of operators, wherein each subgraph includes two or more operators;
extract a feature set from each of the one or more fusion candidates;
utilize a machine learning classifier to evaluate the one or more fusion candidates based on the feature sets extracted from each of the one or more fusion candidates; and
identify a proposed candidate of the one or more fusion candidates to fuse based on evaluation of the one or more fusion candidates.
16. The at least one non-transitory computer-readable medium of claim 15, comprising instructions that, in response to being executed by the processor circuit, cause the processor circuit to utilize a performance counter monitor (PCM) to extract the feature sets.
17. The at least one non-transitory computer-readable medium of claim 15, wherein each feature set includes indications of one or more of data movement patterns, computation patterns, system resource utilization, frequency, computation cost, and memory cost.
18. The at least one non-transitory computer-readable medium of claim 15, the machine learning classifier comprising a recurrent neural network (RNN).
19. The at least one non-transitory computer-readable medium of claim 18, comprising instructions that, in response to being executed by the processor circuit, cause the processor circuit to map the feature sets to vectors corresponding to fusibility.
20. The at least one non-transitory computer-readable medium of claim 19, comprising instructions that, in response to being executed by the processor circuit, cause the processor circuit to calculate a probability that each fusion candidate is fusible with the vectors corresponding to fusibility.
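The scoring and ranking recited in claims 7, 19 and 20 can be illustrated with a minimal sketch. It is not the claimed classifier: a plain logistic function stands in for the recurrent neural network of claim 18, and the candidate names, the three-element feature vectors, the weights and the bias are all hypothetical values chosen for illustration.

```python
import math
from typing import Dict, List, Tuple


def fusibility_probability(features: List[float],
                           weights: List[float],
                           bias: float) -> float:
    """Map a feature vector to a probability in (0, 1).

    Assumption: a logistic function replaces the RNN classifier.
    """
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-score))


def rank_candidates(feature_sets: Dict[str, List[float]],
                    weights: List[float],
                    bias: float) -> List[Tuple[str, float]]:
    """Return (candidate, probability) pairs, most fusible first."""
    scored = [(name, fusibility_probability(feats, weights, bias))
              for name, feats in feature_sets.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical feature sets:
# [normalized frequency, normalized computation cost, normalized memory cost]
feature_sets = {
    "conv+relu": [0.9, 0.4, 0.7],
    "matmul+softmax": [0.2, 0.8, 0.1],
}
weights = [2.0, -1.0, 1.5]  # made-up parameters, not trained values
bias = -1.0

ranking = rank_candidates(feature_sets, weights, bias)
proposed_candidate = ranking[0][0]  # highest-probability candidate
```

In this sketch the proposed candidate is simply the top-ranked entry; a real system could instead apply a probability threshold before proposing any candidate.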
21. A computer-implemented method, comprising:
identifying input comprising one or more machine learning models that each include a graph of operators;
mining the one or more machine learning models based on one or more operational parameters to determine one or more fusion candidates, each of the one or more fusion candidates comprising a subgraph of at least one graph of operators, wherein each subgraph includes two or more operators;
extracting a feature set from each of the one or more fusion candidates;
utilizing a machine learning classifier to evaluate the one or more fusion candidates based on the feature sets extracted from each of the one or more fusion candidates; and
identifying a proposed candidate of the one or more fusion candidates to fuse based on evaluation of the one or more fusion candidates.
22. The computer-implemented method of claim 21, comprising combining each operator in the subgraph of the proposed candidate to fuse the proposed candidate into a fused candidate.
23. The computer-implemented method of claim 22, comprising evaluating computational efficiency of a first machine learning model with the proposed candidate and a second machine learning model with the fused candidate to validate the proposed candidate.
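The fusing and validation steps of claims 22 and 23 can be sketched as follows, under simplifying assumptions. Two toy element-wise operators (`scale` and `add_bias`, both hypothetical) stand in for model operators, the fused version is written by hand, and wall-clock timing from Python's `timeit` module stands in for the compiler-stack or TVM-based evaluation named in claims 4 and 5.

```python
import timeit
from typing import Callable, List, Tuple

Vector = List[float]


def scale(xs: Vector) -> Vector:
    """First toy operator: multiply every element by 2."""
    return [2.0 * x for x in xs]


def add_bias(xs: Vector) -> Vector:
    """Second toy operator: add 1 to every element."""
    return [x + 1.0 for x in xs]


def unfused(xs: Vector) -> Vector:
    """Proposed candidate: two operators with an intermediate list."""
    return add_bias(scale(xs))


def fused(xs: Vector) -> Vector:
    """Fused candidate: both operators combined into a single pass."""
    return [2.0 * x + 1.0 for x in xs]


def validate(first: Callable[[Vector], Vector],
             second: Callable[[Vector], Vector],
             sample: Vector,
             repeats: int = 20) -> Tuple[float, float]:
    """Check that both versions agree, then time each one."""
    if first(sample) != second(sample):
        raise ValueError("fused candidate changes the output")
    t_first = timeit.timeit(lambda: first(sample), number=repeats)
    t_second = timeit.timeit(lambda: second(sample), number=repeats)
    return t_first, t_second


sample = [float(i) for i in range(1000)]
t_unfused, t_fused = validate(unfused, fused, sample)
```

Comparing `t_unfused` with `t_fused` gives only a rough indication of relative efficiency; the measured values depend on the machine and are not meaningful beyond this toy setting.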
24. An apparatus, comprising:
means for identifying input comprising one or more machine learning models that each include a graph of operators;
means for mining the one or more machine learning models based on one or more operational parameters to determine one or more fusion candidates, each of the one or more fusion candidates comprising a subgraph of at least one graph of operators, wherein each subgraph includes two or more operators;
means for extracting a feature set from each of the one or more fusion candidates;
means for utilizing a machine learning classifier to evaluate the one or more fusion candidates based on the feature sets extracted from each of the one or more fusion candidates; and
means for identifying a proposed candidate of the one or more fusion candidates to fuse based on evaluation of the one or more fusion candidates.
25. The apparatus of claim 24, comprising means for utilizing weighted frequent subgraph mining to mine the one or more machine learning models based on the one or more operational parameters to determine the one or more fusion candidates.
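The weighted frequent subgraph mining of claims 10, 11 and 25 can be illustrated with a deliberately reduced sketch. It considers only two-operator subgraphs (single edges) rather than general subgraphs, and it assumes an edge weight equal to the occurrence count multiplied by a per-edge cost. The function name `mine_operator_pairs`, the cost table and the threshold are hypothetical and are not taken from the specification.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# An edge connects a producer operator type to a consumer operator type.
Edge = Tuple[str, str]


def mine_operator_pairs(models: List[List[Edge]],
                        edge_cost: Dict[Edge, float],
                        min_weight: float) -> Dict[Edge, float]:
    """Return operator pairs whose weighted frequency meets the threshold.

    Assumption: weight = occurrences across all models * per-edge cost,
    with a default cost of 1.0 for edges missing from the cost table.
    """
    frequency: Dict[Edge, int] = defaultdict(int)
    for edges in models:
        for edge in edges:
            frequency[edge] += 1

    weighted = {edge: count * edge_cost.get(edge, 1.0)
                for edge, count in frequency.items()}
    return {edge: w for edge, w in weighted.items() if w >= min_weight}


# Two toy models, each given as a list of operator-to-operator edges.
models = [
    [("conv", "relu"), ("relu", "pool")],
    [("conv", "relu"), ("relu", "conv"), ("conv", "relu")],
]
# Hypothetical per-edge cost combining computation and memory cost.
edge_cost = {("conv", "relu"): 2.0, ("relu", "pool"): 0.5}

candidates = mine_operator_pairs(models, edge_cost, min_weight=2.0)
```

Here only the `("conv", "relu")` pair survives, because it occurs three times with a cost of 2.0. Extending the sketch to subgraphs with more than two operators would require a real frequent subgraph mining algorithm, which is outside the scope of this illustration.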
| # | Name | Date |
|---|---|---|
| 1 | 202047053320-PROOF OF RIGHT [08-12-2020(online)].pdf | 2020-12-08 |
| 2 | 202047053320-FORM 1 [08-12-2020(online)].pdf | 2020-12-08 |
| 3 | 202047053320-DRAWINGS [08-12-2020(online)].pdf | 2020-12-08 |
| 4 | 202047053320-DECLARATION OF INVENTORSHIP (FORM 5) [08-12-2020(online)].pdf | 2020-12-08 |
| 5 | 202047053320-COMPLETE SPECIFICATION [08-12-2020(online)].pdf | 2020-12-08 |
| 6 | 202047053320-FORM-26 [16-02-2021(online)].pdf | 2021-02-16 |
| 7 | 202047053320-FORM 3 [07-06-2021(online)].pdf | 2021-06-07 |
| 8 | 202047053320.pdf | 2021-10-18 |
| 9 | 202047053320-abstract.jpg | 2021-10-18 |
| 10 | 202047053320-FORM 3 [08-12-2021(online)].pdf | 2021-12-08 |
| 11 | 202047053320-FORM 18 [13-04-2022(online)].pdf | 2022-04-13 |
| 12 | 202047053320-FER.pdf | 2022-08-24 |
| 13 | 202047053320-Information under section 8(2) [22-12-2022(online)].pdf | 2022-12-22 |
| 14 | 202047053320-FORM 3 [22-12-2022(online)].pdf | 2022-12-22 |
| 15 | 202047053320-Proof of Right [08-02-2023(online)].pdf | 2023-02-08 |
| 16 | 202047053320-PETITION UNDER RULE 137 [24-02-2023(online)].pdf | 2023-02-24 |
| 17 | 202047053320-OTHERS [24-02-2023(online)].pdf | 2023-02-24 |
| 18 | 202047053320-FER_SER_REPLY [24-02-2023(online)].pdf | 2023-02-24 |
| 19 | 202047053320-CLAIMS [24-02-2023(online)].pdf | 2023-02-24 |
| 20 | 202047053320-FORM 3 [13-09-2023(online)].pdf | 2023-09-13 |
| 21 | 202047053320-FORM 3 [13-03-2024(online)].pdf | 2024-03-13 |
| 22 | 202047053320-US(14)-HearingNotice-(HearingDate-10-12-2024).pdf | 2024-11-18 |
| 23 | 202047053320-Correspondence to notify the Controller [18-11-2024(online)].pdf | 2024-11-18 |
| 24 | 202047053320-Information under section 8(2) [20-12-2024(online)].pdf | 2024-12-20 |
| 25 | 202047053320-FORM 3 [20-12-2024(online)].pdf | 2024-12-20 |
| 26 | 202047053320-Written submissions and relevant documents [24-12-2024(online)].pdf | 2024-12-24 |
| 27 | 202047053320-Annexure [24-12-2024(online)].pdf | 2024-12-24 |
| 28 | 202047053320-PatentCertificate30-01-2025.pdf | 2025-01-30 |
| 29 | 202047053320-IntimationOfGrant30-01-2025.pdf | 2025-01-30 |
| 1 | 202047053320E_23-08-2022.pdf |