Method and System for Non-Intrusive Profiling of High-Level Synthesis (HLS) Based Applications

Abstract: State of the art techniques provide dedicated High-Level Synthesis (HLS) performance estimator tools that can give insights into performance bottlenecks, stall rate, stall cause, etc., in HLS designs. These estimators often limit themselves to simple loop topologies and limited pragma use, which makes them unreliable for large designs with complex datapaths. Embodiments herein provide a method and system for non-intrusive profiling of HLS based applications. The method provides a cycle-accurate, fine-grained performance profiling framework that is non-intrusive and provides an end-to-end profile of the design. Such a profiling tool can help the designer or a design space exploration (DSE) tool quickly identify performance bottlenecks and take a guided approach to tuning the design. [To be published with FIG. 1B]

Patent Information

Application #:
Filing Date: 25 March 2022
Publication Number: 39/2023
Publication Type: INA
Invention Field: COMPUTER SCIENCE
Status:
Parent Application:

Applicants

Tata Consultancy Services Limited
Nirmal Building, 9th floor, Nariman point, Mumbai 400021, Maharashtra, India

Inventors

1. SUMEET, Nupur
Tata Consultancy Services Limited, Yantra Park, Opp Voltas HRD Training Center, Subhash Nagar, Pokhran Road No. 2, Thane West 400601, Maharashtra, India
2. NAMBIAR, Manoj Karunakaran
Tata Consultancy Services Limited, Olympus - A, Opp Rodas Enclave, Hiranandani Estate, Ghodbunder Road, Patlipada, Thane West 400607, Maharashtra, India
3. KASHYAP, Deeksha
Tata Consultancy Services Limited, 4th & 5th Floor, PTI Building, 4 Parliament Street, New Delhi 110001, Delhi, India

Specification

Claims:

We Claim:

1. A processor implemented method (200) for non-intrusive profiling of High-Level Synthesis (HLS) applications, the method comprising:

synthesizing a design for a source code for an HLS based application, by one or more hardware processors, using an HLS compiler, in accordance with a synthesis time period, wherein the design is specified by the HLS compiler in terms of a plurality of hardware description language (HDL) files, a synthesis report, a verbose binding report and a plurality of database files (202);

co-simulating the design of the plurality of HDL files, by the one or more hardware processors using a co-simulator, based on a test bench identified for the source code to generate a Value Change Dump (VCD) file comprising a plurality of HDL signals in the plurality of HDL files cycle-by-cycle and a corresponding plurality of HDL signal values for an entire execution time at every time instant with a plurality of commands (204);

extracting structured information (206), by the one or more hardware processors, using an HLS profiler by:
(a) parsing the VCD file to generate timing information for the plurality of HDL signals from the plurality of HDL files with the corresponding plurality of HDL signal values;
(b) parsing the synthesis report to generate a plurality of module names to which the plurality of HDL signals belong and a corresponding plurality of source code functions;
(c) parsing the verbose binding report to link an initial name to an HDL signal name for each of the plurality of HDL signals;
(d) parsing the plurality of HDL files to obtain whether the HDL signal name, associated with each of the plurality of HDL signals, is one of a wire and a register;
(e) parsing the source code to record a plurality of variables, a plurality of array variables, a precision data type status, and a multiplication status at a line number for each of a plurality of code lines of the source code; and
(f) parsing the plurality of database files to link the initial name for each of the HDL signals to the plurality of variables in the source code; and

analyzing, by the one or more hardware processors, using the HLS profiler, the extracted structured information in accordance with one or more rules from a set of associative rules to define associations between the line number of the source code, the plurality of variables, the HDL signal name for each of the plurality of HDL signals and the corresponding plurality of HDL signal values, providing visibility into cycle-by-cycle hardware execution of the source code for the entire HLS based application execution time, to generate a performance profile table for the source code (208).

2. The method as claimed in claim 1, wherein the set of associative rules is based on source code semantics, parsed output corresponding to the extracted structured information and HDL design semantics, and enables suppression of invalid transitions that would otherwise be captured in standard HLS profiling, wherein a first subset of rules among the set of associative rules enables obtaining status at the line number level and a second subset of rules among the set of associative rules enables obtaining profile correctness in the performance profile table.

3. A system (100) for non-intrusive profiling for High-Level Synthesis (HLS) based applications, the system (100) comprising: a memory (102) storing instructions; one or more Input/Output (I/O) interfaces (106); and one or more hardware processors (104) coupled to the memory (102) via the one or more I/O interfaces (106), wherein the one or more hardware processors (104) are configured by the instructions to:

synthesize a design for a source code for an HLS based application using an HLS compiler, in accordance with a synthesis time period, wherein the design is specified by the HLS compiler in terms of a plurality of hardware description language (HDL) files, a synthesis report, a verbose binding report and a plurality of database files;

co-simulate the design of the plurality of HDL files using a co-simulator, based on a test bench identified for the source code to generate a Value Change Dump (VCD) file comprising a plurality of HDL signals in the plurality of HDL files cycle-by-cycle and a corresponding plurality of HDL signal values for an entire execution time at every time instant with a plurality of commands;

extract structured information using an HLS profiler by:
(a) parsing the VCD file to generate timing information for the plurality of HDL signals from the plurality of HDL files with the corresponding plurality of HDL signal values;
(b) parsing the synthesis report to generate a plurality of module names to which the plurality of HDL signals belong and a corresponding plurality of source code functions;
(c) parsing the verbose binding report to link an initial name to an HDL signal name for each of the plurality of HDL signals;
(d) parsing the plurality of HDL files to obtain whether the HDL signal name, associated with each of the plurality of HDL signals, is one of a wire and a register;
(e) parsing the source code to record a plurality of variables, a plurality of array variables, a precision data type status, and a multiplication status at a line number for each of a plurality of code lines of the source code; and
(f) parsing the plurality of database files to link the initial name for each of the HDL signals to the plurality of variables in the source code; and

analyze, using the HLS profiler, the extracted structured information in accordance with one or more rules from a set of associative rules to define associations between the line number of the source code, the plurality of variables, the HDL signal name for each of the plurality of HDL signals and the corresponding plurality of HDL signal values, providing visibility into cycle-by-cycle hardware execution of the source code for the entire HLS based application execution time, to generate a performance profile table for the source code.

4. The system as claimed in claim 3, wherein the set of associative rules is based on source code semantics, parsed output corresponding to the extracted structured information and HDL design semantics, and enables suppression of invalid transitions that would otherwise be captured in standard HLS profiling, wherein a first subset of rules among the set of associative rules enables obtaining status at the line number level and a second subset of rules among the set of associative rules enables obtaining profile correctness in the performance profile table.

Dated this 25th Day of March 2022

Tata Consultancy Services Limited
By their Agent & Attorney
(Adheesh Nargolkar) of Khaitan & Co
Reg No IN-PA-1086

Description:

FORM 2
THE PATENTS ACT, 1970 (39 of 1970) & THE PATENT RULES, 2003
COMPLETE SPECIFICATION (See Section 10 and Rule 13)

Title of invention: METHOD AND SYSTEM FOR NON-INTRUSIVE PROFILING OF HIGH-LEVEL SYNTHESIS (HLS) BASED APPLICATIONS

Applicant
Tata Consultancy Services Limited
A company incorporated in India under the Companies Act, 1956
Having address: Nirmal Building, 9th floor, Nariman point, Mumbai 400021, Maharashtra, India

Preamble to the description: The following specification particularly describes the invention and the manner in which it is to be performed.

TECHNICAL FIELD

[001] The embodiments herein generally relate to High-Level Synthesis (HLS) profiling and, more particularly, to a method and system for non-intrusive profiling of High-Level Synthesis (HLS) based applications.

BACKGROUND

[002] Performance profilers are software development tools designed to assist in performance analysis of applications and to improve poorly performing sections of code. They provide measurements of the time a routine takes to execute, the proportion of total time spent in it, its parent routine, etc. Performance profiling is a common practice in the software paradigm because mature profiling tools are available. However, this is not the case in hardware development, particularly for Field Programmable Gate Arrays (FPGAs). FPGAs are becoming a popular choice as application accelerators due to their support for deep pipelines as well as low latency realizations. The datapath based design in FPGAs is favorable to performance sensitive applications.
Traditional hardware development involves coding the design in Hardware Description Languages (HDLs), such as Verilog and Very High Speed Integrated Circuit (VHSIC) Hardware Description Language (VHDL), and defining the Register Transfer Level (RTL) datapath from input to output to achieve the desired functionality. A more recent approach to hardware design development is through High-Level Synthesis (HLS) tools. With HLS, design development productivity improves because high-level languages (C/C++) are supported and the process of creating the HDL description, defining the RTL datapath, scheduling operations, etc. is abstracted away from the developer.

[003] The HLS development flow includes an HLS compiler for generating the HDL description of the source code and a co-simulation step for cycle-accurate functional analysis. It should be noted that HLS development requires an additional implementation step to generate the FPGA executable. Although the HLS development flow provides a simplified, faster, and highly abstracted way of developing hardware designs, better parallel or pipelined algorithms that are better suited to the FPGA architecture can often be designed. The special directives (e.g., #pragma in Xilinx HLS tools) available in HLS tools help in design space exploration to improve the design micro-architecture and FPGA hardware matching, but their efficient use depends on the programming abilities and experience of the developer. As in software design, a performance profile of the hardware design can help identify the performance bottlenecks and aid the developer in fine-tuning the design performance. Vivado HLS™ and Intel HLS™ are popular HLS tools used in industry. These HLS tools provide the overall latency of the source code along with the cycle count at the loop or sub-function level, but cycle accounting for every line of source code is not available.
Although it is possible to relate the synthesized HDL to clock cycles through waveforms, associating the source code statements with the synthesized HDL is not straightforward, since a single C statement can be expressed as multiple HDL signals triggering one another while being spread across multiple clock cycles.

[004] The missing correspondence between source code and performance information can be addressed by line-by-line profiling. This concept is not new and is available in software through tools like Gprof™. Performance profilers can outline how much time has been spent in each line of code. However, this information is difficult to obtain for highly parallel superscalar-like architectures. Additionally, it is normal for one line of source code to translate into many assembly instructions that can be scheduled with out-of-order parallelism. Depending on the ISA scheduling, it is possible that more than one line of code executes in a single clock cycle. Moreover, works in the literature indicate that existing profiling tools are intrusive in nature and often introduce performance overheads. This is because of the additional profiling instructions that are added to the compiled application code to collect the required data. Apart from profiling tools, the use of 'print' statements in software is a common method of collecting profile data.

[005] On hardware platforms like FPGAs, the HLS_Print framework™ can be used to derive profiling information. Similar to other profiling frameworks, HLS_Print is intrusive in nature and introduces additional circuitry in the application. Moreover, it requires the time-consuming implementation step followed by programming the FPGA and testing the application on hardware. On hardware platforms like FPGAs, this step can be avoided since cycle-accurate simulators are available. The design signal waveforms can be viewed in the simulator to estimate the profile of the application.
The signals are part of the RTL design that is generated by the HLS compiler.

SUMMARY

[006] Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems.

[007] For example, in one embodiment, a method for non-intrusive profiling of High-Level Synthesis (HLS) based applications is provided. The method includes synthesizing a design for a source code for an HLS based application using an HLS compiler, in accordance with a synthesis time period, wherein the design is specified by the HLS compiler in terms of a plurality of hardware description language (HDL) files, a synthesis report, a verbose binding report and a plurality of database files. Further, the method includes co-simulating the design of the plurality of HDL files using a co-simulator, based on a test bench identified for the source code to generate a Value Change Dump (VCD) file comprising a plurality of HDL signals in the plurality of HDL files cycle-by-cycle and a corresponding plurality of HDL signal values for an entire execution time at every time instant with a plurality of commands.
Further, the method includes extracting structured information using an HLS profiler by: (a) parsing the VCD file to generate timing information for the plurality of HDL signals from the plurality of HDL files with the corresponding plurality of HDL signal values; (b) parsing the synthesis report to generate a plurality of module names to which the plurality of HDL signals belong and a corresponding plurality of source code functions; (c) parsing the verbose binding report to link an initial name to an HDL signal name for each of the plurality of HDL signals; (d) parsing the plurality of HDL files to obtain whether the HDL signal name, associated with each of the plurality of HDL signals, is one of a wire and a register; (e) parsing the source code to record a plurality of variables, a plurality of array variables, a precision data type status, and a multiplication status at a line number for each of a plurality of code lines of the source code; and (f) parsing the plurality of database files to link the initial name for each of the HDL signals to the plurality of variables in the source code. Furthermore, the method includes analyzing, using the HLS profiler, the extracted structured information in accordance with one or more rules from a set of associative rules to define associations between the line number of the source code, the plurality of variables, the HDL signal name for each of the plurality of HDL signals and the corresponding plurality of HDL signal values, providing visibility into cycle-by-cycle hardware execution of the source code for the entire HLS based application execution time, to generate a performance profile table for the source code. The set of associative rules is based on source code semantics, parsed output corresponding to the extracted structured information and HDL design semantics, and enables suppression of invalid transitions that would otherwise be captured in standard HLS profiling.
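
Step (a) above can be pictured with a small example. The following is a minimal Python sketch, not the disclosed HLS profiler: it reads a simplified subset of the VCD text format (single-line header declarations, scalar changes and binary vector changes) and returns, per signal, the list of (time, value) changes. The function name `parse_vcd`, the sample signal names and the sample dump are assumptions made purely for illustration.

```python
from collections import defaultdict


def parse_vcd(text):
    """Parse a simplified VCD dump.

    Returns (names, changes) where names maps VCD identifier codes to
    hierarchical signal names and changes maps each signal name to a list
    of (time, value) tuples in the order they appear in the dump.
    """
    names = {}
    changes = defaultdict(list)
    scope = []
    now = 0
    in_header = True
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if in_header:
            tok = line.split()
            if tok[0] == "$scope":            # $scope module <name> $end
                scope.append(tok[2])
            elif tok[0] == "$upscope":
                scope.pop()
            elif tok[0] == "$var":            # $var <type> <width> <id> <name> $end
                names[tok[3]] = ".".join(scope + [tok[4]])
            elif tok[0] == "$enddefinitions":
                in_header = False
            continue
        if line.startswith("#"):              # timestamp line, e.g. "#10"
            now = int(line[1:])
        elif line.startswith("$"):            # $dumpvars / $end markers
            continue
        elif line[0] in "bBrR":               # vector change: "b0101 <id>"
            value, ident = line.split()
            changes[names[ident]].append((now, value[1:]))
        else:                                 # scalar change: "<value><id>"
            changes[names[line[1:]]].append((now, line[0]))
    return names, dict(changes)


# Hypothetical dump with one clock and one 8-bit register.
SAMPLE_VCD = """\
$timescale 1ns $end
$scope module top $end
$var wire 1 ! ap_clk $end
$var reg 8 % sum_reg $end
$upscope $end
$enddefinitions $end
#0
$dumpvars
0!
b00000000 %
$end
#5
1!
b00000011 %
#10
0!
"""

names, changes = parse_vcd(SAMPLE_VCD)
for signal, events in changes.items():
    print(signal, events)
```

A real dump from a co-simulator can contain multi-line header sections and real-valued signals, which this sketch does not attempt to handle.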

[008] In another aspect, a system for non-intrusive profiling of High-Level Synthesis (HLS) based applications is provided. The system comprises a memory storing instructions; one or more Input/Output (I/O) interfaces; and one or more hardware processors coupled to the memory via the one or more I/O interfaces, wherein the one or more hardware processors are configured by the instructions to synthesize a design for a source code for an HLS based application using an HLS compiler, in accordance with a synthesis time period, wherein the design is specified by the HLS compiler in terms of a plurality of hardware description language (HDL) files, a synthesis report, a verbose binding report and a plurality of database files. Further, the one or more hardware processors are configured to co-simulate the design of the plurality of HDL files using a co-simulator, based on a test bench identified for the source code to generate a Value Change Dump (VCD) file comprising a plurality of HDL signals in the plurality of HDL files cycle-by-cycle and a corresponding plurality of HDL signal values for an entire execution time at every time instant with a plurality of commands.
Further, the one or more hardware processors are configured to extract structured information using an HLS profiler by: (a) parsing the VCD file to generate timing information for the plurality of HDL signals from the plurality of HDL files with the corresponding plurality of HDL signal values; (b) parsing the synthesis report to generate a plurality of module names to which the plurality of HDL signals belong and a corresponding plurality of source code functions; (c) parsing the verbose binding report to link an initial name to an HDL signal name for each of the plurality of HDL signals; (d) parsing the plurality of HDL files to obtain whether the HDL signal name, associated with each of the plurality of HDL signals, is one of a wire and a register; (e) parsing the source code to record a plurality of variables, a plurality of array variables, a precision data type status, and a multiplication status at a line number for each of a plurality of code lines of the source code; and (f) parsing the plurality of database files to link the initial name for each of the HDL signals to the plurality of variables in the source code. Furthermore, the one or more hardware processors are configured to analyze, using the HLS profiler, the extracted structured information in accordance with one or more rules from a set of associative rules to define associations between the line number of the source code, the plurality of variables, the HDL signal name for each of the plurality of HDL signals and the corresponding plurality of HDL signal values, providing visibility into cycle-by-cycle hardware execution of the source code for the entire HLS based application execution time, to generate a performance profile table for the source code. The set of associative rules is based on source code semantics, parsed output corresponding to the extracted structured information and HDL design semantics, and enables suppression of invalid transitions that would otherwise be captured in standard HLS profiling.
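
Step (e) above, recording per-line facts about the source code, might look roughly like the Python sketch below. It is an illustration under stated assumptions rather than the disclosed parser: it uses regular expressions instead of a real C front end, it assumes that the "precision data type status" flags the use of arbitrary-precision types such as `ap_int` or `ap_fixed`, and the names `scan_source` and `SAMPLE_SOURCE` are hypothetical.

```python
import re

# Words that should not be reported as variables (deliberately incomplete).
C_KEYWORDS = {"for", "if", "else", "while", "return", "int", "float", "double",
              "void", "char", "short", "long", "unsigned", "const"}
IDENT = re.compile(r"[A-Za-z_]\w*")
ARRAY = re.compile(r"([A-Za-z_]\w*)\s*\[")        # identifier followed by '['
AP_TYPE = re.compile(r"\bap_u?(?:int|fixed)\b")   # assumed meaning of "precision type"
MULT = re.compile(r"[\w\)\]]\s*\*\s*[\w\(]")      # binary '*' (naive heuristic)


def scan_source(source):
    """Return {line_number: facts} for every non-empty line of C source."""
    records = {}
    for number, line in enumerate(source.splitlines(), start=1):
        code = line.split("//", 1)[0]             # drop trailing line comments
        if not code.strip():
            continue
        arrays = set(ARRAY.findall(code))
        idents = {name for name in IDENT.findall(code)
                  if name not in C_KEYWORDS and not name.startswith("ap_")}
        records[number] = {
            "variables": sorted(idents - arrays),
            "array_variables": sorted(arrays),
            "precision_type": bool(AP_TYPE.search(code)),
            "multiplication": bool(MULT.search(code)),
        }
    return records


# Hypothetical HLS kernel used only to exercise the scanner.
SAMPLE_SOURCE = """\
int mac(int a[4], int b[4]) {
    int acc = 0;
    for (int i = 0; i < 4; i++) {
        acc += a[i] * b[i];  // multiply-accumulate
    }
    ap_int<12> scaled = acc * gain;
    return scaled;
}
"""

for number, facts in scan_source(SAMPLE_SOURCE).items():
    print(number, facts)
```

The heuristics are intentionally crude: function names are reported as variables, pointer declarations such as `int *p` would be misread as multiplications, and compound `*=` is not detected. A production parser would need a proper C/C++ front end.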

[009] In yet another aspect, there are provided one or more non-transitory machine-readable information storage mediums comprising one or more instructions which, when executed by one or more hardware processors, cause a method for non-intrusive profiling of High-Level Synthesis (HLS) based applications to be performed. The method includes synthesizing a design for a source code for an HLS based application using an HLS compiler, in accordance with a synthesis time period, wherein the design is specified by the HLS compiler in terms of a plurality of hardware description language (HDL) files, a synthesis report, a verbose binding report and a plurality of database files. Further, the method includes co-simulating the design of the plurality of HDL files using a co-simulator, based on a test bench identified for the source code to generate a Value Change Dump (VCD) file comprising a plurality of HDL signals in the plurality of HDL files cycle-by-cycle and a corresponding plurality of HDL signal values for an entire execution time at every time instant with a plurality of commands.
Further, the method includes extracting structured information using an HLS profiler by: (a) parsing the VCD file to generate timing information for the plurality of HDL signals from the plurality of HDL files with the corresponding plurality of HDL signal values; (b) parsing the synthesis report to generate a plurality of module names to which the plurality of HDL signals belong and a corresponding plurality of source code functions; (c) parsing the verbose binding report to link an initial name to an HDL signal name for each of the plurality of HDL signals; (d) parsing the plurality of HDL files to obtain whether the HDL signal name, associated with each of the plurality of HDL signals, is one of a wire and a register; (e) parsing the source code to record a plurality of variables, a plurality of array variables, a precision data type status, and a multiplication status at a line number for each of a plurality of code lines of the source code; and (f) parsing the plurality of database files to link the initial name for each of the HDL signals to the plurality of variables in the source code. Furthermore, the method includes analyzing, using the HLS profiler, the extracted structured information in accordance with one or more rules from a set of associative rules to define associations between the line number of the source code, the plurality of variables, the HDL signal name for each of the plurality of HDL signals and the corresponding plurality of HDL signal values, providing visibility into cycle-by-cycle hardware execution of the source code for the entire HLS based application execution time, to generate a performance profile table for the source code. The set of associative rules is based on source code semantics, parsed output corresponding to the extracted structured information and HDL design semantics, and enables suppression of invalid transitions that would otherwise be captured in standard HLS profiling.
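
Step (d) above, deciding whether each HDL signal is a wire or a register, could be approximated as in the following Python sketch. It assumes Verilog output with one declaration statement per `wire` or `reg` keyword and does not cover VHDL; the function name `classify_signals` and all signal names in the sample are hypothetical and were chosen only to resemble HLS-generated identifiers.

```python
import re

# Matches declarations such as "reg [31:0] a, b;" or "output wire x;".
DECL = re.compile(
    r"^\s*(?:(?:input|output|inout)\s+)?"   # optional port direction
    r"(wire|reg)\b\s*"                      # signal kind
    r"(?:signed\s+)?"                       # optional signedness
    r"(?:\[[^\]]*\]\s*)?"                   # optional bit range
    r"([^;]+);",                            # comma-separated names
    re.M,
)


def classify_signals(verilog):
    """Return {signal_name: "wire" | "reg"} for explicit declarations."""
    kinds = {}
    for kind, names in DECL.findall(verilog):
        for name in names.split(","):
            # Drop initialisers ("= ...") and memory dimensions ("[0:15]").
            name = name.split("=")[0].split("[")[0].strip()
            if name:
                kinds[name] = kind
    return kinds


# Hypothetical fragment of generated Verilog.
SAMPLE_VERILOG = """\
module mac (ap_clk, ap_rst);
input ap_clk;
input ap_rst;
reg [31:0] acc_reg_120;
wire [31:0] mul_ln4_fu_88_p2;
reg ap_enable_reg_pp0_iter1, ap_done_reg;
wire grp_fu_60_ce;
endmodule
"""

for name, kind in classify_signals(SAMPLE_VERILOG).items():
    print(kind, name)
```

Ports declared without an explicit `wire` or `reg` keyword (such as `ap_clk` in the sample) are skipped here; how such implicit nets should be treated is left open by this sketch.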

[0010] It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.

BRIEF DESCRIPTION OF THE DRAWINGS

[0011] The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:

[0012] FIG. 1A is a functional block diagram of a system, for non-intrusive profiling of High-Level Synthesis (HLS) based applications, in accordance with some embodiments of the present disclosure.

[0013] FIG. 1B depicts an overview of the system of FIG. 1A, in accordance with some embodiments of the present disclosure.

[0014] FIG. 2 is a flow diagram illustrating a method for non-intrusive profiling of High-Level Synthesis (HLS) based applications, using the system of FIG. 1B, in accordance with some embodiments of the present disclosure.

[0015] FIGS. 3A and 3B (collectively referred to as FIG. 3) depict an architecture of an HLS profiler of the system of FIG. 1B, in accordance with some embodiments of the present disclosure.

[0016] It should be appreciated by those skilled in the art that any block diagrams herein represent conceptual views of illustrative systems and devices embodying the principles of the present subject matter. Similarly, it will be appreciated that any flow charts, flow diagrams, and the like represent various processes which may be substantially represented in computer readable medium and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.

DETAILED DESCRIPTION OF EMBODIMENTS

[0017] Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears.
Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments.

[0018] Code profiling entails dynamic program analysis that captures code execution statistics including space (memory) or time complexity, frequency, and duration of function calls, etc. The profiling information aids in program optimization and is commonly obtained by instrumenting either the program source code or its binary executable form using profiler tools. The output of the profiler tools can be a statistical summary of the code with profiling data (for example, the number of times a line of code is executed) annotated against the source code. On the other hand, performance issues in parallel programs often depend on the time relationship of events and thus require a full trace of code execution. Profiler tools work on the principle of either sampling or instrumentation, where the former supports coarser profiling granularity than the latter. A sampling profiler probes the program call stack at regular intervals using interrupts. Sampling profiles are less numerically accurate and specific but allow the target program to run at near full speed. The instrumentation technique effectively adds instructions to the target program to collect the profiling information. However, code instrumentation can cause performance changes, and may in some cases lead to inaccurate results. The instrumentation can be added manually or automatically at the source-code, intermediate-code or compiled executable level. A performance profile is an important instrument for efficiently exploring the design space and systematically improving design performance.
The challenges in relating the HLS code to a synthesized RTL design act as a deterrent in analyzing the implementation performance when it comes to non-intrusive analysis. Moreover, control and branching statements add to the difficulty of performance evaluation based only on static information.

[0019] Thus, embodiments herein provide a method and system for non-intrusive profiling of High-Level Synthesis (HLS) based applications. The method combines static information with hardware signal waveforms to arrive at approaches that can be used to profile the application's actual execution on Field Programmable Gate Arrays (FPGAs). The method formulates useful association rules that help relate hardware signals and waveforms to the source code variables and line numbers. The association rules, when incorporated into the HLS profiler, enable it to correctly profile diverse applications.

[0020] Referring now to the drawings, and more particularly to FIGS. 1A through 3B, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.

[0021] FIG. 1A is a functional block diagram of a system 100, for non-intrusive profiling of High-Level Synthesis (HLS) based applications, in accordance with some embodiments of the present disclosure. In an embodiment, the system 100 includes a processor(s) 104, communication interface device(s), alternatively referred to as input/output (I/O) interface(s) 106, and one or more data storage devices or a memory 102 operatively coupled to the processor(s) 104. The system 100 with one or more hardware processors is configured to execute functions of one or more functional blocks of the system 100.

[0022] Referring to the components of system 100, in an embodiment, the processor(s) 104 can be one or more hardware processors 104.
In an embodiment, the one or more hardware processors 104 can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the one or more hardware processors 104 are configured to fetch and execute computer-readable instructions stored in the memory 102. In an embodiment, the system 100 can be implemented in a variety of computing systems including laptop computers, notebooks, hand-held devices such as mobile phones, workstations, mainframe computers, servers, and the like.

[0023] The I/O interface(s) 106 can include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface and the like, and can facilitate multiple communications within a wide variety of networks N/W and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular and the like. In an embodiment, the I/O interface(s) 106 can include one or more ports for connecting to a number of external devices or to another server or devices.

[0024] The memory 102 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes.

[0025] Further, the memory 102 includes a plurality of modules (not shown), a database 108 and the like. The database 108 may also store generated performance profile tables for a given source code. Further, the memory 102 may comprise information pertaining to input(s)/output(s) of each step performed by the processor(s) 104 of the system 100 and methods of the present disclosure.
In an embodiment, the database 108 may be external (not shown) to the system 100 and coupled to the system via the I/O interface 106. Further, the system 100 includes functional blocks such as an HLS compiler 110, a co-simulator 112, and an HLS profiler 114.

[0026] FIG. 1B depicts an overview of the system 100 of FIG. 1A with the process flow between the HLS compiler 110, the co-simulator 112 and the HLS profiler 114, in accordance with some embodiments of the present disclosure. The HLS compiler 110 can be implemented via any available HLS compiler tools and platforms. As part of HLS synthesis, it converts the source code of an HLS-based application (alternatively referred to hereinafter as the application), developed in a high-level language, to a low-level HDL implementation by following three steps: 1) scheduling, 2) binding, and 3) control extraction. In scheduling, the logic operations are distributed across the clock cycles, and the number of such operations depends on the clock frequency, optimization directives and FPGA technology library. Binding assigns hardware resources for carrying out the logic operations, which are sequenced using a state machine produced by the control extraction step.

[0027] The co-simulator 112 provides a C/RTL co-simulation that uses a C test bench to automatically verify the RTL design. The HLS compiler 110 generates the input test vectors based on the C test bench and uses them for RTL simulation of the synthesized RTL. The RTL simulation output is stored as output vectors that are verified for correctness by the C test bench. The designer can review the waveform generated by C/RTL co-simulation using a Wave Viewer to analyze the temporal changes of design RTL signals. The temporal changes of RTL signals can also be captured in the form of a Value Change Dump (VCD) report. The VCD report contains the values of RTL or HDL signals at every timestamp for the entire simulation duration.
The C/RTL co-simulation reflects the dynamic behavior of the design and depends on the run-time values of the design variables.
[0028] To support profiling in HLS-based designs, the HLS profiler 114, also referred to as the profiler framework, provides an automated and non-intrusive performance profiling tool that generates a cycle-by-cycle association to every line of source code (SC) for the entire execution time of an HLS-based application. The HLS profiler 114 collects waveforms, source code, and HDL files from the HLS development flow, along with verbose binding files, the test bench, database files such as ‘.adb’ files, and synthesis report files, and translates them into profiling information as shown in FIG. 1B. The profiler framework is based on static analysis and dynamic traces, available from the HLS compilers, which are used along with associative rules to generate cycle-accurate profiles. The HLS compiler 110 generates the RTL equivalent of the C source code and represents the association between source code variables and RTL or HDL signals in the form of static information. The dynamic behavior captured in the form of RTL waveforms does not exhibit a direct relation to C source code variables. This disconnect, a technical limitation of the existing HLS development tools in the art, acts as a deterrent to performance fine-tuning of HLS designs. Unlike the existing profiling tools, the HLS profiler 114 disclosed herein associates every line of C source code to a clock cycle. The HLS profiler 114 uses reports generated by the HLS compiler 110 during synthesis and co-simulation and presents a fine-grained performance profile table of designs developed using HLS development tools. In addition to the reports generated by the HLS compiler 110, the HLS profiler 114 makes use of a set of associative rules to suppress invalid transitions that would otherwise be captured in the performance profile.
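To make the Value Change Dump (VCD) report described above more concrete, the following is a minimal sketch of reading a small VCD text into (time instant, module name, HDL signal name, HDL signal value) records. It is an illustration only and not the parser used by the HLS profiler 114: the function name `parse_vcd`, the sample text, and the simplifying assumption of one declaration or value change per line are all hypothetical.

```python
def parse_vcd(text):
    """Return (time, module, signal, value) records from a simplified VCD text.

    Assumption: every $scope/$var declaration and every value change
    sits on its own line, which real VCD files do not guarantee.
    """
    scope = []      # stack of enclosing module names
    symbols = {}    # identifier code -> (module name, signal name)
    records = []
    now = 0
    in_header = True
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if in_header:
            if tokens[0] == "$scope":
                scope.append(tokens[2])
            elif tokens[0] == "$upscope":
                scope.pop()
            elif tokens[0] == "$var":
                # $var <type> <width> <identifier code> <name> $end
                symbols[tokens[3]] = (scope[-1] if scope else "", tokens[4])
            elif tokens[0] == "$enddefinitions":
                in_header = False
            continue
        if tokens[0].startswith("#"):
            now = int(tokens[0][1:])            # new time instant
        elif tokens[0][0] in "bB":
            module, name = symbols[tokens[1]]   # vector change: b<bits> <code>
            records.append((now, module, name, tokens[0][1:]))
        elif tokens[0][0] in "01xXzZ":
            module, name = symbols[tokens[0][1:]]  # scalar change: <value><code>
            records.append((now, module, name, tokens[0][0]))
    return records


# Hypothetical dump with a clock and one 8-bit register.
SAMPLE = """$timescale 1ns $end
$scope module top $end
$var wire 1 ! ap_clk $end
$var reg 8 % i_fu_142 $end
$upscope $end
$enddefinitions $end
#0
0!
b0 %
#5
1!
b1 %
#10
0!
"""

vcd_records = parse_vcd(SAMPLE)
for record in vcd_records:
    print(record)
```

Each printed tuple corresponds to one value change of one signal at one time instant, which is the kind of structured record the later processing steps operate on.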
[0029] Functions of the components of the system 100 are explained in conjunction with the flow diagram of FIG. 2 and the example of FIG. 3.
[0030] FIG. 2 is a flow diagram illustrating a method 200 for non-intrusive profiling of High-Level Synthesis (HLS) based applications, using the system of FIG. 1, in accordance with some embodiments of the present disclosure. In an embodiment, the system 100 comprises one or more data storage devices or the memory 102 operatively coupled to the processor(s) 104 and is configured to store instructions for execution of steps of the method 200 by the processor(s) or one or more hardware processors 104 using the HLS compiler 110, the co-simulator 112, and the HLS profiler 114. The steps of the method 200 of the present disclosure will now be explained with reference to the components or blocks of the system 100 as depicted in FIGS. 1A and 1B, the steps of the flow diagram as depicted in FIG. 2, and an HLS profiler architecture of the system 100 of FIG. 1 as depicted in FIGS. 3A and 3B (collectively referred to as FIG. 3). Although process steps, method steps, techniques or the like may be described in a sequential order, such processes, methods, and techniques may be configured to work in alternate orders. In other words, any sequence or order of steps that may be described does not necessarily indicate a requirement that the steps be performed in that order. The steps of processes described herein may be performed in any order practical. Further, some steps may be performed simultaneously.
[0031] Referring to the steps of the method 200, at step 202 of the method 200, the one or more hardware processors 104, using the HLS compiler 110, synthesize a design (RTL design) for a source code received for an HLS based application (application), in accordance with a synthesis time period. The source code can be a C source code.
The design is specified by the HLS compiler 110 in terms of a plurality of hardware description language (HDL) files, a synthesis report, a verbose binding report and a plurality of database files such as ‘.adb’ files.
[0032] At step 204 of the method 200, the one or more hardware processors 104, using the co-simulator 112, co-simulate the design for the plurality of HDL files. The co-simulator utilizes the test bench (for example, a C test bench) identified for the source code to generate the Value Change Dump (VCD) file comprising a plurality of HDL signals in the HDL files cycle-by-cycle and a corresponding plurality of HDL signal values for the entire execution time at every time instant with a plurality of commands.
[0033] At step 206 of the method 200, the one or more hardware processors 104, using the HLS profiler 114, extract structured information by parsing the plurality of hardware description language (HDL) files, the synthesis report, the verbose binding report, and the plurality of .adb files as explained in steps (a) to (f) below.
a) Parse the VCD file to generate timing information for the plurality of HDL signals from the plurality of HDL files with the corresponding plurality of HDL signal values;
b) Parse the synthesis report to generate a plurality of module names to which the plurality of HDL signals belong and a corresponding plurality of source code functions;
c) Parse the verbose binding report to link an initial name to an HDL signal name for each of the plurality of HDL signals;
d) Parse the plurality of HDL files to obtain whether the HDL signal name, associated with each of the plurality of HDL signals, is a wire or register;
e) Parse the source code to record a plurality of variables, a plurality of array variables, precision data type status, and multiplication status at a line number for each of a plurality of code lines of the source code;
f) Parse the plurality of .adb files to link the initial name for each of the HDL signals to the plurality of variables in the source code.
[0034] At step 208 of the method 200, the one or more hardware processors 104, using the HLS profiler 114, analyze the extracted structured information in accordance with one or more rules from a set of associative rules to define associations between the line number of the source code, the plurality of variables, the HDL signal name for each of the plurality of HDL signals and the corresponding plurality of HDL signal values, providing visibility into the cycle-by-cycle hardware execution of the source code for the entire application execution time, to generate the performance profile table for the source code. The set of associative rules is based on source code semantics, the parsed output corresponding to the extracted structured information, and HDL design semantics, and enables suppression of invalid transitions that get captured in standard HLS profiling.
The set of associative rules comprises a first subset of rules that enable obtaining status at the line number level and a second subset of rules that enable obtaining profile correctness in the performance profile table.
[0035] Once the performance profile table is generated, it can be shared with a developer, who can tune the performance of the design using the HLS profiler 114 with the following steps:
Inputs: Source code, Test Bench, Synthesis Frequency and Top Function Name
1. Run the HLS_Profiler script.
2. Performance profile of the source code is available in a text file.
3. Identify and optimize the bottleneck, and update the source code with the appropriate pragma directive.
4. Repeat Steps 1-3 until the performance target is met.
[0036] FIGS. 3A and 3B (collectively referred to as FIG. 3) depict the architecture of the HLS profiler 114 shown in the system 100 of FIG. 1, in accordance with some embodiments of the present disclosure.
Inputs: Source code, test bench, synthesis time period
Output: Cycle-by-cycle performance profile
STEP 1: Create an HLS project using HLS development tools, Xilinx Vivado™ HLS or Xilinx Vitis™ HLS. Add the source code and test bench to the project.
STEP 2: Synthesize the design. Synthesis takes the source code and synthesis time period as inputs and generates the HDL (Verilog and VHDL) files of the source code. Synthesis also generates the synthesis report, verbose binding and .adb files.
STEP 3: Co-simulate the design. Co-simulation takes the HDL files of the source code and the test bench as input and generates the timing behavior of all the signals in the HDL files of the source code cycle-by-cycle. The co-simulation information is dumped in the form of a VCD (value change dump) file that contains all HDL signals from all HDL files and their signal values for the entire execution time at every time instant, with the following commands:
1. restart
2. open_vcd
3. log_vcd
4. run #entire execution time(ns)
5. close_vcd
STEP 4: Run the VCDVCD parser, which is an open source file processing tool. The VCDVCD parser takes the VCD file as input and generates the timing information for all HDL signals from all HDL files with their values. It also generates the module name to which the HDL signals belong. The module is equivalent to a source code function (Table 1). This is essentially a data processing step that gives the VCD file the structure required by the following processing steps.
Time Instant | Module Name | HDL signal name | HDL signal value
Table 1: depicts table format
STEP 5: Parse the synthesis report to generate Table 2 with columns ‘module name’ and ‘source code function’ by searching for the ‘instance’ keyword in the ‘Utilization Estimates’ section of the synthesis report. A module is the HDL equivalent of a source code function.
Module Name | Source code function
Table 2: depicts table format
STEP 6: Merge Table 1 and Table 2 by comparing Table1.Module Name and Table2.Module Name to create Table 3. In addition to the Table 1 data, Table 3 contains the source code function name for all HDL signals.
Time Instant | Module Name | Source code function | HDL signal name | HDL signal value
Table 3: depicts format
STEP 7: The signal transition records that occur at time instants other than the rising edge of the clock are deleted from Table 3. For this, the “ap_clk” HDL signal is used. At the rising edge of the clock, the “ap_clk” HDL signal value is 1; otherwise it is 0. Records with signal transitions at clock instants when the ap_clk HDL signal is 0 are deleted from Table 3. For the retained records, the time instants are converted to clock cycle numbers using the time period provided during synthesis and the following formula:
Clock cycle # = (((time instant/(synthesis time period/2))-1)/2)+1
This clock cycle # information is added to the Table 3 entries as shown below.
Time Instant | Clock cycle # | Module | Source code function | HDL signal name | HDL signal value
Table 3: format (updated with clock cycle #)
STEP 8: The verbose binding report contains design states and the sequence of operations active in a state. The operations are distributed among states (bind_states) by the HLS compiler during synthesis. The operation record contains the target signal (in the form of the initial name of the HDL signal), the operator type, operands, predicate, and line numbers. The operands are the inputs of the operation and are available in the form of initial names of HDL signals. The predicate indicates the condition that needs to be satisfied for the target signal to attain the output of the operation. The predicate contains either “TRUE” or a signal expression. When the predicate is “TRUE” or the signal expression evaluates to “TRUE”, the output of the operation is assigned to the target signal; otherwise the target signal is not updated. The line number indicates the source code line number for the target signal. Parse the verbose binding report to record the target signals belonging to a particular state (state_bind) and their predicate, operation, operands, and line numbers. The signals here are indicated as initial_names and do not directly correspond to HDL signal names. This information is presented in tabular format (Table 4).
state_bind # | initial_name | operator | operands | predicate | Line number
2 | i | add | i_0,1 | true | 3
Table 4: format and an example.
STEP 9: Parse the verbose binding report to link the initial_name to the HDL signal name. For every initial_name, the verbose binding report contains the HDL signal name. This information is presented in tabular format (Table 5).
Initial_name | HDL signal name
i | i_fu_142
Table 5: format and an example.
STEP 10: Merge Table 4 and Table 5 to generate Table 6. In addition to the Table 4 entries, Table 6 contains the HDL signal name for every initial name.
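As a hedged illustration of STEP 7 and STEP 10 above, the sketch below filters records down to rising clock edges, converts the retained time instants to clock cycle numbers with the stated formula, and merges the example rows of Table 4 and Table 5 on the initial name. The record layout, the function names, and the 10 ns synthesis time period are assumptions made for the example, not part of the described tool.

```python
def clock_cycle(time_instant, synthesis_period):
    """Clock cycle # = (((t / (period / 2)) - 1) / 2) + 1, as given in STEP 7."""
    return int(((time_instant / (synthesis_period / 2)) - 1) / 2 + 1)


def keep_rising_edges(records, clock_name="ap_clk"):
    """Keep only records whose time instant coincides with the clock being 1."""
    rising = {t for (t, _module, name, value) in records
              if name == clock_name and value == "1"}
    return [r for r in records if r[0] in rising]


def merge_binding(table4, table5):
    """Add the HDL signal name to each Table 4 row via its initial_name."""
    return [dict(row, hdl_signal_name=table5[row["initial_name"]])
            for row in table4]


# Hypothetical records (time, module, signal, value) for a 10 ns period,
# so that rising edges fall at 5 ns, 15 ns, 25 ns, and so on.
records = [
    (0, "top", "ap_clk", "0"),
    (5, "top", "ap_clk", "1"), (5, "top", "i_fu_142", "0"),
    (10, "top", "ap_clk", "0"), (12, "top", "i_fu_142", "x"),
    (15, "top", "ap_clk", "1"), (15, "top", "i_fu_142", "1"),
]
retained = keep_rising_edges(records)
cycles = [(clock_cycle(t, 10), name, value) for (t, _m, name, value) in retained]

# The example rows of Table 4 and Table 5 from the text.
table4 = [{"state_bind": 2, "initial_name": "i", "operator": "add",
           "operands": "i_0,1", "predicate": "true", "line_number": 3}]
table5 = {"i": "i_fu_142"}
table6 = merge_binding(table4, table5)

print(cycles)
print(table6)
```

With a 10 ns period, the formula maps the time instants 5 ns and 15 ns to clock cycles 1 and 2, and the transition recorded at 12 ns is dropped because it does not coincide with a rising edge.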
state_bind # | HDL signal name | initial_name | operator | operands | predicate | Line number
2 | i_fu_142 | i | add | i_0,1 | true | 3
Table 6: format and an example.
STEP 11: Parse the HDL file to obtain whether the HDL signal name is a wire or a register. The ‘type’ information is added to Table 6 to generate Table 7.
state_bind | HDL signal name | type (wire/reg) | initial_name | operator | operands | predicate | Line number
2 | i_fu_142 | wire | i | add | i_0,1 | true | 3
Table 7: format and an example
STEP 12: Parse the .adb file to link initial_names to source code variables. This information is presented in tabular format in Table 8.
Initial_name | Source code variable
i | i
Table 8: format and an example
STEP 13: Merge Table 7 and Table 8 to generate Table 9. The source code variable information is added to Table 7 to generate Table 9.
state_bind | HDL signal name | type (wire/reg) | initial_name | Source code variable | operator | operands | predicate | Line number
2 | i_fu_142 | wire | i | i | add | i_0,1 | true | 3
Table 9: format and an example
STEP 14: Parse the source code to record the target source code variables, array variables, precision data type status and multiplication status at each line, along with line numbers, in Table 10. Target source code variable holds the final variable at every line. Array variables stores the source code variables whose data structure is an array and which appear on the right-hand side of the assignment (‘=’) operator. At any line, if a source code variable is followed by square brackets, i.e. [], then that source code variable is an array; the square brackets may be empty, or they may contain the size of the array as a constant or a variable. If no array is present at a line, then array variables holds ‘NA’, i.e., a non-available value. Precision data type status stores the status of lines, i.e., whether the line includes variables with a precision data type or not. If the source code contains the #ap_int.h or #ap_fixed.h library then assign precision_data.value = 1 else precision_data.value = 0.
If precision_data.value == 1 and any variable on any line number is declared as a precision data type, i.e., of type ap_int or ap_fixed (and not a standard data type (int, short, char, double, float)), then the precision data type status on that line is TRUE else it is FALSE. Multiplication status holds TRUE when a multiplication operation, i.e., the ‘*’ operator, is used at any line else it is FALSE. If an ‘if/else’ condition is present in the source code, i.e., the ‘if’ keyword appears at any line, then assign if/else.value = 1 else if/else.value = 0. If multiple functions are defined in the source code and Table 2 has more than 1 entry then assign mult_func.value = 1 else mult_func.value = 0.
mult_func.value = 0
if/else.value = 0
precision_data.value = 0
Line # | Line | source code variable | array variables | precision data type status | Multiplication status
3 | outer:for(i=0;i
Apply RULE 13, and check if mult_func.value obtained at STEP 14 is 1 --> Apply RULE 14, otherwise proceed to STEP 28.
STEP 28: Repeat STEP 25-27 for all HDL signals in state_bind.
STEP 29: Populate the retained record in the performance table (Table 16) with the clock cycle #, line number, source code variable and HDL signal value entries from Table 15.
Clock cycle # | Line number | Source code variable | HDL signal value
Table 16: format
STEP 30: Repeat STEP 22-STEP 29 to generate the entire performance profile in tabular format, i.e., for all HDL signals for all states. Keep populating Table 16 for the entire execution time.
[0038] SECOND SUBSET OF ASSOCIATIVE RULES (to obtain profile correctness):
RULE 7: Applies to any line number whose line number availability is ‘avail’ and whose status is ‘incorrect’. These are the cases when line numbers other than the line numbers present in Table 11 come up in Table 9. In such cases, unwanted line numbers from Table 9 are suppressed by deleting those line numbers from the Table9.line number column.
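The suppression described in RULE 7 can be sketched as a simple filter. This is a minimal illustration under stated assumptions: Table 11 is represented only as a set of valid line numbers, the row layout and the second signal name (`tmp_fu_150`) are hypothetical, and "deleting the line number" is read here as clearing the line number field of the row; dropping the whole row would be an alternative reading.

```python
def apply_rule_7(table9_rows, table11_line_numbers):
    """Clear Table 9 line numbers that do not appear in Table 11."""
    cleaned = []
    for row in table9_rows:
        row = dict(row)  # copy so the caller's rows are not modified
        if row["line_number"] not in table11_line_numbers:
            row["line_number"] = None  # unwanted line number suppressed
        cleaned.append(row)
    return cleaned


# Hypothetical Table 9 rows; only the first uses the example from the text.
table9 = [
    {"hdl_signal_name": "i_fu_142", "source_variable": "i", "line_number": 3},
    {"hdl_signal_name": "tmp_fu_150", "source_variable": "tmp", "line_number": 42},
]
valid_lines = {3, 5, 7}  # assumed line numbers present in Table 11

result = apply_rule_7(table9, valid_lines)
for row in result:
    print(row)
```

The first row keeps its line number because 3 is in the assumed Table 11 set, while the second row's line number 42 is cleared.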
RULE 8: For any line number/source code variable present in Table 11, if the line number/source code variable availability is ‘NA’, the status is ‘NA’ and precision_data.value = 1, then the line number/source code variable can be generated through the code-transformation parser (discussed in STEP R8.1-R8.6).
Code-transformation parser steps:
STEP R8.1: If the source code contains #ap_int.h or #ap_fixed.h, then replace the arbitrary precision data type of all variables with converted data types based on Table 17:
Arbitrary precision Datatype | Converted data type
ap_uint ; 0; 8 ;16 ; 0; 8 ;16 ;32; 0; 32; 0; 32

Documents

Application Documents

# Name Date
1 202221017111-STATEMENT OF UNDERTAKING (FORM 3) [25-03-2022(online)].pdf 2022-03-25
2 202221017111-REQUEST FOR EXAMINATION (FORM-18) [25-03-2022(online)].pdf 2022-03-25
3 202221017111-FORM 18 [25-03-2022(online)].pdf 2022-03-25
4 202221017111-FORM 1 [25-03-2022(online)].pdf 2022-03-25
5 202221017111-FIGURE OF ABSTRACT [25-03-2022(online)].jpg 2022-03-25
6 202221017111-DRAWINGS [25-03-2022(online)].pdf 2022-03-25
7 202221017111-DECLARATION OF INVENTORSHIP (FORM 5) [25-03-2022(online)].pdf 2022-03-25
8 202221017111-COMPLETE SPECIFICATION [25-03-2022(online)].pdf 2022-03-25
9 202221017111-Proof of Right [13-05-2022(online)].pdf 2022-05-13
10 202221017111-FORM-26 [23-06-2022(online)].pdf 2022-06-23
11 Abstract1.jpg 2022-07-26
12 202221017111-Power of Attorney [09-01-2023(online)].pdf 2023-01-09
13 202221017111-Form 1 (Submitted on date of filing) [09-01-2023(online)].pdf 2023-01-09
14 202221017111-Covering Letter [09-01-2023(online)].pdf 2023-01-09
15 202221017111-CORRESPONDENCE(IPO)-(CERTIFIED COPY WIPO DAS)-(16-01-2023).pdf 2023-01-16
16 202221017111-FORM 3 [21-07-2023(online)].pdf 2023-07-21
17 202221017111-FER.pdf 2025-03-19
18 202221017111-Information under section 8(2) [17-06-2025(online)].pdf 2025-06-17
19 202221017111-FORM 3 [17-06-2025(online)].pdf 2025-06-17
20 202221017111-PETITION UNDER RULE 137 [22-08-2025(online)].pdf 2025-08-22
21 202221017111-OTHERS [22-08-2025(online)].pdf 2025-08-22
22 202221017111-FER_SER_REPLY [22-08-2025(online)].pdf 2025-08-22
23 202221017111-CLAIMS [22-08-2025(online)].pdf 2025-08-22
24 202221017111-ORIGINAL UR 6(1A) FORM 26-250825.pdf 2025-09-01

Search Strategy

1 searchdoc-11E_01-07-2024.pdf