
Method And System For Processing In Interconnect

Abstract: The present disclosure provides a method and a system for processing-in-interconnect to enable energy-efficient computation architectures. As such, an interconnect uses time-encoding of pulses for executing instructions, thereby exploiting the interconnect for computation and storage. The interconnect performs time encoding of pulses based on input data, and a plurality of operations may be performed on such time-encoded pulses using delay elements. The proposed interconnect employs a time-to-event margin propagation (TEMP) method, which applies the margin-propagation (MP) methodology for time-based representation and processing-in-interconnect. In general, the TEMP based computing uses a Time-To-First-Spike (TTFS) encoding where the occurrence of an event (spike or pulse) encodes the magnitude of a variable. Accordingly, the MP methodology naturally lends itself to event-based computation, such as sorting, temporal difference and causality as non-linearity, to implement the following computational primitives: addition, subtraction, thresholding, memory, sorting, etc. Figure 2


Patent Information

Application #:
Filing Date: 11 August 2022
Publication Number: 33/2023
Publication Type: INA
Invention Field: ELECTRONICS
Status:
Parent Application:
Patent Number:
Legal Status:
Grant Date: 2024-06-03
Renewal Date:

Applicants

INDIAN INSTITUTE OF SCIENCE
CV Raman Road, Bengaluru, Karnataka 560012, India

Inventors

1. CHETAN SINGH THAKUR
CV Raman Road, Bengaluru, Karnataka 560012, India
2. SHANTANU CHAKRABARTTY
715, Westwood drive Apt 2W, Clayton, Missouri USA-63105

Specification

TECHNICAL FIELD

[0001] The present disclosure relates to an interconnect architecture for computing and storage functions and more specifically, to a method and a system for computing using an interconnect for providing efficient computing architectures.

BACKGROUND

[0002] In recent times, computer architectures, for example analog computing and VLSI, are facing major challenges in executing a high number of operations on large amounts of data. More specifically, complex applications involving artificial intelligence (AI), different types of Neural Network (NN) techniques and Machine Learning (ML) techniques require processing large amounts of data and performing numerous computations, making existing computer architectures incapable of delivering the required performance. The complexity of such processing is a direct consequence of well-known limitations such as limited memory bandwidth, latency, power dissipation, chip architecture, etc.
[0003] Conventionally, computer architectures include a Central Processing Unit (CPU) at the core, with a storage and an input/output interface that enable the CPU to access the storage for performing computations or any computation-related work. Figure 1A illustrates one such exemplary conventional computing architecture 100. The core processing element, the CPU 102 (also referred to herein as ‘Processing Element 102’ or ‘PE 102’), is generally not co-located with the memory 106. As a result, for example, neural network implementations or other advanced computing implementations on a conventional computation architecture need to repeatedly fetch parameters stored in the memory 106 over an interconnect 104 to be processed by the PE 102. As such, the interconnect 104 between the PE 102 and the memory 106 plays a passive role, or has no role, in the computation and/or storage. Thus, the performance of the computing architecture 100 is restricted by the memory bandwidth. In general, exemplary conventional computing architectures such as the computing architecture 100 illustrated in Figure 1A face challenges in power dissipation and latency while implementing neural networks or other advanced algorithms.
[0004] To mitigate such bottlenecks encountered in computing architectures, some systems, for example neural network hardware accelerators, adopt a compute-in-memory (CIM) architecture 150 as illustrated in Figure 1B. In this type of computing architecture 150, some of the basic computing tasks are integrated in close proximity to the memory 106. As illustrated in Figure 1B, a PE 152 is integrated within the memory 106 to perform computations such as matrix-vector multiplications or other complex applications. As a result, the CIM architecture 150 can enhance memory bandwidth and, in the process, improve the energy efficiency of the system. However, the interconnect 104 in the exemplary CIM architecture 150 of Figure 1B still plays a passive role and limits the overall performance and efficiency of the system.
[0005] In view of the above, there is a need to efficiently utilize the interconnect in any type of computing system for improving the overall efficiency of the system. Moreover, it would be advantageous to design an interconnect that is compatible with other substrates employing photonic architectures or any other types of computing architectures.

SUMMARY

[0006] Embodiments of the present disclosure aim to provide an energy-efficient computing architecture that essentially exploits the interconnect between a processing element and a storage for computation. More specifically, various embodiments of the present disclosure provide a method and a system for an interconnect capable of handling advanced processing faster and more efficiently. To that effect, an interconnect is disclosed which uses time-encoding of pulses for executing instructions. In other words, the interconnect performs time encoding of pulses based on input data, and a plurality of operations may be performed on such time-encoded pulses using delay elements.

[0007] The interconnect or interconnect processing unit as disclosed herein in accordance with the present disclosure employs a time-to-event margin propagation (TEMP) method, which applies the margin-propagation (MP) technique for time-based representation and processing-in-interconnect. In general, the TEMP based computing uses a Time-To-First-Spike (TTFS) encoding, where the occurrence of an event (spike or pulse) encodes the magnitude of a variable. Accordingly, the MP technique tends to naturally lend itself to event-based computation, such as, for example, sorting, temporal difference and causality as non-linearity. TEMP computes using the time taken for the occurrence of the first event (or pulse) and can implement computational primitives such as addition, subtraction, thresholding, memory, sorting and the like. In general, any delay in the interconnect can be exploited for computation and memory. By using these primitives, the margin-propagation technique is implemented for efficient computation architectures, for example energy-efficient architectures, which utilize the interconnect for processing and storage. A TEMP circuit or module is used to replace the basic MP modules, and the MP-based neural networks are translated into time-based encoding without sacrificing dynamic range or introducing significant latency.
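By way of illustration only, the following is a minimal numerical sketch in Python of this kind of time-based processing. It assumes the standard margin-propagation constraint, in which the output spike time tz satisfies sum_i max(tz - t_i, 0) = gamma over the input spike times t_i; the function names (encode_ttfs, temp_mp) and the linear TTFS mapping are hypothetical and are not part of the disclosed hardware.

```python
import numpy as np

def encode_ttfs(x, t_max=1.0):
    """Time-To-First-Spike encoding (illustrative linear mapping):
    larger magnitudes spike earlier within the window [0, t_max]."""
    x = np.asarray(x, dtype=float)
    return t_max - np.clip(x, 0.0, t_max)

def temp_mp(t_in, gamma):
    """Time-domain margin propagation: return the output spike time tz
    satisfying sum_i max(tz - t_i, 0) = gamma, so only spikes arriving
    before tz (temporal causality) contribute."""
    t = np.sort(np.asarray(t_in, dtype=float))   # spikes arrive pre-sorted in time
    csum = np.cumsum(t)
    for k in range(1, len(t) + 1):
        tz = (csum[k - 1] + gamma) / k           # candidate with the k earliest spikes active
        if k == len(t) or tz <= t[k]:            # later spikes would arrive after tz: ignored
            return tz

# Example: four input spike times and the resulting output spike time
print(temp_mp([0.1, 0.4, 0.7, 0.9], gamma=0.2))  # fires shortly after the earliest spikes
```

In this sketch only spikes that occur before the output spike contribute, which is the temporal-causality property discussed later in the description.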

[0008] The foregoing summary is illustrative only and is not intended to be in any way limiting. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features will become apparent by reference to the drawings and the following detailed description. For a better understanding of exemplary embodiments of the present invention, together with other and further features and advantages thereof, reference is made to the following description, taken in conjunction with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

[0009] The embodiments of the disclosure itself, as well as a preferred mode of use, further objectives, and advantages thereof, will best be understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings. One or more embodiments are now described, by way of example only, with reference to the accompanying drawings in which:

[0010] Figure 1A illustrates an exemplary conventional computation architecture for performing operations as known in the prior art.

[0011] Figure 1B illustrates an exemplary compute in memory architecture employed by neural network hardware accelerators as known in prior art.

[0012] Figure 2 illustrates an exemplary processing in interconnect architecture that exploits the interconnect for computing and storage functions to improve energy efficiency, in accordance with an embodiment of the present disclosure.

[0013] Figure 3 illustrates an exemplary interconnect processing unit of Figure 2, in accordance with an embodiment of the present disclosure.

[0014] Figure 4A illustrates time-to-event margin propagation (TEMP) that implements margin-propagation (MP) algorithm to perform time-domain sorting, in accordance with an embodiment of the present disclosure.

[0015] Figure 4B illustrates an exemplary time-to-event margin propagation (TEMP) that implements margin-propagation (MP) technique to perform addition/subtraction in time-domain, in accordance with an embodiment of the present disclosure.

[0016] Figure 5 illustrates an exemplary time-to-event margin propagation (TEMP) using temporal causality to implement the thresholding in the MP technique, in accordance with an embodiment of the present disclosure.

[0017] Figure 6 illustrates an exemplary embodiment of implementing time-to-event margin propagation (TEMP) using asynchronous and synchronous digital logic, in accordance with an embodiment of the present disclosure.

[0018] Figure 7 illustrates an exemplary embodiment of a pulse-based backpropagation and update technique for training time-to-event margin propagation (TEMP), in accordance with an embodiment of the present disclosure.

[0019] Figure 8 illustrates an exemplary method of processing-in-interconnect using the TEMP methodology in accordance with an embodiment of the present disclosure.

[0020] The figures depict exemplary embodiments of the present disclosure for purposes of illustration only. A person of ordinary skill in the art will readily recognize from the following description that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles and scope of the disclosure described herein.

DETAILED DESCRIPTION

[0021] In describing and claiming the present disclosure, the following terminology will be used in accordance with the definitions set forth below. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by a person of ordinary skill in the art. Although any methods, systems and materials similar or equivalent to those described herein can be used in the practice or testing of the present disclosure, exemplary methods, systems and materials are described herein. As used herein, each of the following terms has the meaning associated with it in this section. Specific and preferred values listed below for individual process parameters, substituents and ranges are for purposes of illustration only, and they do not exclude other defined values or other values falling within the preferred defined ranges.

[0022] In the present document, the word "exemplary" is used herein to mean "serving as an example, instance, or illustration." Any embodiment or implementation of the present subject matter described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments. As used herein, the singular forms "a," "an," and "the" include plural reference unless the context clearly dictates otherwise. The terms “preferred” and “preferably” refer to embodiments of the invention that may afford certain benefits, under certain circumstances. However, other embodiments may also be preferred, under the same or other circumstances.

[0023] The terms “comprise(s)”, “comprising”, or any other variations thereof, are intended to cover a non-exclusive inclusion, such that a setup, device, or method that comprises a list of components or steps does not include only those components or steps but may include other components or steps not expressly listed or inherent to such setup or device or method. In other words, one or more elements in a device or system or apparatus preceded by “comprises… a” does not, without more constraints, preclude the existence of other elements or additional elements in the device or system or apparatus.

[0024] Furthermore, the recitation of one or more preferred embodiments does not imply that other embodiments are not useful and is not intended to exclude other embodiments from the scope of the invention. When the term “about” is used in describing a value or an endpoint of a range, the disclosure should be understood to include both the specific value and the end-point referred to. As used herein, the terms “comprising,” “including,” “having,” “containing,” “involving,” and the like are to be understood to be open-ended, i.e., to mean including but not limited to.

[0025] While the disclosure is susceptible to various modifications and alternative forms, specific or preferred embodiments thereof have been shown by way of example in the drawings and are described in detail below. It should be understood, however, that it is not intended to limit the disclosure to the particular forms disclosed; on the contrary, the disclosure is to cover all modifications, equivalents, and alternatives falling within the spirit and the scope of the disclosure.

[0026] In the following detailed description of the embodiments of the disclosure, reference is made to the accompanying drawings that form a part hereof, and in which are shown by way of illustration specific or exemplary embodiments in which the disclosure may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the disclosure, and it is to be understood that other embodiments may be utilized and that changes may be made without departing from the scope of the present disclosure. The following description is, therefore, not to be taken in a limiting sense.

[0027] For a better understanding of the present disclosure, various aspects of the present disclosure will be described in more detail with reference to the accompanying drawings. It should be understood that the detailed description is merely illustrative of exemplary embodiments of the present application and does not limit the scope of the present application in any way. Throughout the specification, the same reference numerals designate the same elements.

[0028] The present disclosure provides an efficient computing architecture, for example an energy-efficient computing architecture, that may be configured to exploit the interconnect for computation and storage. More specifically, various embodiments of the present disclosure provide an interconnect (also referred to as an interconnect processing unit) that may be configured to employ a time-to-event margin propagation (TEMP) method, which applies a margin-propagation (MP) methodology or technique for a time-based representation of input data (information or content) and processing in the interconnect. In general, the TEMP based computing uses a Time-To-First-Spike (TTFS) encoding, where, in accordance with the TTFS, the occurrence of an event (identified as a pulse) encodes the magnitude of the input data at that point in time. Accordingly, a delay element may be used for event-based computation, such as, for example, sorting, temporal difference and causality as non-linearity, for implementing computational primitives such as addition, subtraction, thresholding, memory, sorting and the like.
[0029] Figure 2 illustrates a processing in interconnect architecture 200 (also referred to as the interconnect system) for improving efficiency, in particular energy efficiency, in accordance with an embodiment of the present disclosure. The architecture 200 includes a processing element 202, a memory 206 and an interconnect processing unit 204, which forms part of the memory 206. The interconnect 204 (also referred to herein as an interconnect processing unit) is employed for computing and storage functions. It may be noted that the interconnect 204 need not be a physical connection between the processing element 202 and the memory 206, but may also be represented as a virtual connection, which for example may be between different software processes and/or software threads that may be running or executing on the memory 206. In an embodiment, the internet as a whole may be used as a framework for the processing in interconnect architecture 200.

[0030] The interconnect 204 is configured to employ a time-to-event margin propagation (TEMP) methodology, where the TEMP methodology employs or applies the margin-propagation (MP) technique for a time-based representation and processing of the input data that is received in the system 200. In general, TEMP based computing employs a Time-To-First-Spike (TTFS) encoding, in which the occurrence of an event encodes the magnitude of the input data. Accordingly, in accordance with the embodiments of the present disclosure, the interconnect 204 may be configured to execute instructions to perform a plurality of operations. More specifically, the interconnect 204 may be configured to execute instructions to perform event-based computation, for example sorting, temporal difference and causality as non-linearity, which can implement computational primitives such as addition, subtraction, thresholding, memory, sorting and the like. As such, these basic functions need to be mapped onto operations that can be easily implemented using time-encoding of pulses over the interconnect. Some of these functions and their corresponding mappings are summarized in Table 1. It should be obvious that other mappings may be defined and/or created, and all such mappings to implement the processing in interconnect fall within the scope of the present disclosure.

Operation | Basic Function
Addition | Delay
Multiplication | Sorting
ReLU | Causality
Memory | Delay

Table 1: Mapping of basic functions to operations
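As a purely illustrative sketch of the first and last rows of Table 1, the snippet below (in Python, with hypothetical function names) shows how addition and memory reduce to delays when values are carried as spike times.

```python
def add_in_time(t_spike, operand):
    """Addition mapped to delay: adding a constant shifts the spike later
    by that amount, so the time code directly accumulates the operand."""
    return t_spike + operand

def memory_as_delay(t_spike, hold):
    """Memory mapped to delay: 'storing' a value for `hold` time units is
    just re-emitting the same spike after that delay (a delay line)."""
    return t_spike + hold

# A value encoded as a spike at t = 0.3 has 0.2 added, then is held for 1.0 time units:
t_out = memory_as_delay(add_in_time(0.3, 0.2), hold=1.0)   # spike re-appears at t = 1.5
```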

[0031] The interconnect 204 may employ the TEMP method to perform other operations such as, but not limited to, multiply and inner-product computation, adaptive M-of-N encoding, Sparse Distributed Memory (SDM), and the like. Again, it should be obvious to a person of ordinary skill in the art that only a few exemplary operations are defined to illustrate the implementation of the processing in interconnect architecture; other operations may be implemented, and all such operations fall within the scope of the embodiments of the present disclosure. Further, the interconnect 204 may also be used to perform complex computations pertaining to neural network architectures, for example max-pooling, weight implementation using routing tables, and implementation of other DNN modules, CNN models, Genetic Algorithms, Machine Learning algorithms, Artificial Intelligence algorithms, etc. In an illustrative example, weights may be implemented as delays by using existing look-up table or DNS-lookup infrastructure for training a neural network and drawing inference from the neural network. In general, any delay in the interconnect 204 can be exploited for computation and/or memory, which may be a volatile memory or a non-volatile memory.

[0032] It may be noted that the operations shown in Table 1 and the corresponding basic functions performed by the interconnect 204 are for exemplary purposes only and other operations may be suitably performed based on creation of a suitable mapping of the operations to other basic functions using time-encoding of pulses, or a combination of basic functions to perform a variety of other operations. The interconnect 204 is explained in detail with reference to Figure 3.

[0033] Figure 3 illustrates an exemplary schematic representation 300 of the interconnect 204 of Figure 2, in accordance with an embodiment of the present disclosure. The interconnect 204, which is part of a memory and/or storage, includes at least one or more Arithmetic Logic Units (ALUs) 302, a memory 304, an Input/Output (I/O) interface 306, a control unit 308, and a delay unit 310. It should be obvious to a person of ordinary skill in the art that various other elements may also be combined or included in the interconnect 204, and all such representations fall within the scope of the present disclosure.

[0034] The ALU 302 may be considered as a basic block of the interconnect 204 and may be configured to perform arithmetic and logic operations. More specifically, the ALU 302 may be configured to perform all processes related to arithmetic and logic operations such as addition, subtraction, and shifting operations, including Boolean comparisons (XOR, OR, AND, and NOT operations). Further, the ALU 302 may include two units: an arithmetic unit (AU) and a logic unit (LU). The interconnect 204 may have one or more ALUs, such as the ALU 302, for example one for fixed-point operations and another for floating-point operations. It should be obvious to a person of ordinary skill in the art that, with advancements of computing technologies, other combinations with the ALU may be designed, and all such combinations fall within the scope of the present disclosure.

[0035] The memory 304 may include a plurality of instructions which may be executed by various modules/units of the interconnect 204, which may be configured for performing one or more operations and storage functions. The memory 304 may be a volatile or a non-volatile memory. The I/O interface 306 may be configured to facilitate communication with external entities such as peripheral devices for exchange of information and/or data. For example, the I/O interface 306 receives input data from the PE 202 and provides the information to be stored in the memory 206. The control unit 308 may be configured to generate appropriate timing and control signals for performing various operations in the interconnect 204. In other words, the control unit 308 may be configured to fetch and retrieve instructions from the memory 206 in proper sequence and interpret them so as to manage the function of the other components in the interconnect 204 at the appropriate moment. The delay unit 310 includes at least one or more delay elements configured to introduce delay in the interconnect 204. In general, such delays produced by the delay unit 310 may be exploited to perform various operations and/or computations as disclosed previously. For example, the delay in an interconnect may be introduced due to electrical delays, photonic delays, optical delays and the like. Some examples of the delay elements in the delay unit 310 for introducing delay include, but are not limited to, memristors, Resistor-Capacitance (RC) circuits, optical delay elements, delay lines and the like that may introduce any delay in the interconnect 204. In general, the interconnect 204 may be configured to exploit any delay in the interconnect for computation and memory.
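The following is a loose organisational sketch, in Python, of how the components listed above could be modelled in software for simulation purposes; the class and method names are hypothetical, and each component is reduced to the single role the description assigns it.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class InterconnectProcessingUnit:
    """Organisational sketch of the unit of Figure 3 (illustrative only)."""
    memory: Dict[str, float] = field(default_factory=dict)   # memory 304: stored instructions/values
    delays: List[float] = field(default_factory=list)        # delay unit 310: available delay elements

    def io_receive(self, data):
        """I/O interface 306: accept input data from the processing element."""
        return list(data)

    def control_encode(self, data, t_max=1.0):
        """Control unit 308: time-encode magnitudes as spike times (TTFS)."""
        return [t_max - min(max(x, 0.0), t_max) for x in data]

    def alu_compare(self, t_a, t_b):
        """ALU 302: a logic comparison realised in time (the earlier spike wins)."""
        return t_a <= t_b

    def apply_delay(self, t_spike, index):
        """Delay unit 310: route a spike through one of the delay elements."""
        return t_spike + self.delays[index]
```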

[0036] In an embodiment, the memory 304 stores one or more instructions which may be used for time encoding a plurality of pulses based on input data and performing one or more operations on the plurality of encoded pulses for implementing computations in the interconnect. As such, the I/O interface 306 may be configured to receive input data for processing or storage from a processor such as the PE 202. The input data received at the I/O interface 306 may correspond to analog or digital variables, and the I/O interface 306 may be configured to forward the input data to the control unit 308.

[0037] The control unit 308 may be configured to perform a time encoding of a plurality of pulses based on the input data, wherein the input data received by the I/O interface 306 is related to an occurrence of an event (pulse). In other words, the control unit 308 employs the TEMP based computing which uses a Time-To-First-Spike (TTFS) encoding, where the occurrence of an event encodes the magnitude of the input data for a time-based representation of the input data. The time-based representation of the input data is then forwarded to the delay unit 310. The delay unit 310 is configured to perform a plurality of operations on the encoded input data (i.e., the time-based representation of the input data). More specifically, the delay unit 310 uses the margin-propagation (MP) technique or methodology for processing-in-interconnect. As such, operations are mapped onto basic functions that may be performed using time encoding of pulses. As already shown in Table 1, the delay unit 310 executes instructions to perform event-based computation, such as sorting, temporal difference and causality as non-linearity.

[0038] The TEMP method employed within the framework of the embodiments of the present disclosure computes the time taken for the occurrence of the first event (or pulse) and can implement computational primitives such as sorting, addition, subtraction, thresholding and memory. As the input data are time-based representations, the variables arrive at a node pre-sorted when a pulse occurs, thereby providing sorted input data. Similarly, the time between pulses may be measured to perform addition/subtraction of variables in the input data. For storing variables using TEMP processing, memory is just a delay and can be implemented using physical devices (delay lines) or using counting modules (interrupts). It may be noted that the components of the interconnect 204 have been shown for illustration purposes only and the interconnect 204 may include fewer or more components than those depicted in Figure 3. Some examples of operations performed by exploiting delay are explained with reference to Figures 4A, 4B, Figure 5 and Figure 6.

[0039] Figure 4A and Figure 4B illustrate the time-to-event margin propagation (TEMP) methodology that implements the margin-propagation (MP) technique to perform time-domain sorting and addition/subtraction, respectively, in accordance with an embodiment of the present disclosure. The input data (i.e., variables) are mapped to a time-domain representation and appear as time-encoded pulses. In other words, the TEMP module or circuit is given times-to-spike (t1, t2, t3 and t4) as input and provides an output spike (tz) as output (see Figure 4A). As the input data are represented by the time when a pulse occurs, by definition, the variables arrive at a node pre-sorted as shown in Figure 4A. For example, the interconnect 204 performs sorting of the time variables when a pulse is generated and provides the sorted variables to the nodes. In some embodiments, the delay unit 310 may be configured to determine the time between pulses for performing addition/subtraction in the time domain as shown in Figure 4B. In one illustrative example, an up-down counter or a capacitor may be implemented for computing the sum/difference between operands in the input data. In another illustrative example, an integrate-and-fire neuron may be used for performing the addition/subtraction operations. It may be noted that the counter and the integrate-and-fire neuron are provided here by way of examples, and the addition/subtraction operations may be realized using a variety of other modules or circuitry that may be configured to measure the time difference between pulses.
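A minimal sketch of the counter-based addition/subtraction mentioned above is shown below; the tick size and the function name are assumptions made for illustration, and the counter simply counts clock ticks between the two pulses.

```python
def time_difference(t_start, t_stop, tick=0.01):
    """Up-down-counter style measurement of the interval between two pulses:
    count clock ticks while only the first pulse has arrived, giving an
    approximation of t_stop - t_start (signed)."""
    count, t = 0, min(t_start, t_stop)
    while t < max(t_start, t_stop):
        count += 1          # one counter increment per clock tick
        t += tick
    interval = count * tick
    return interval if t_stop >= t_start else -interval

# Two operands encoded as pulse times 0.25 and 0.60: their difference is ~0.35
print(time_difference(0.25, 0.60))
```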

[0040] Figure 5 illustrates time-to-event margin propagation (TEMP) using temporal causality to implement the thresholding in the MP technique, in accordance with an embodiment of the present disclosure. Typically, temporal causality acts as an ideal diode in the time domain. As such, the present timestep may act as a threshold. Any pulse that occurs before the present timestep may be used for computation and storage, whereas any pulse that arrives after the present timestep may be ignored and effectively does not contribute towards the computation. An example implementation of TEMP using asynchronous and synchronous digital logic is shown in Figure 6.
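A small illustrative sketch of this thresholding behaviour follows: treating the present timestep as the threshold, spikes that have already occurred pass through and later spikes are dropped. The function name is hypothetical.

```python
def causal_threshold(t_spike, t_now):
    """Temporal causality as thresholding: a spike is used only if it has
    already occurred at the present timestep; later spikes are ignored,
    which behaves like an ideal diode / ReLU in the time domain."""
    return t_spike if t_spike <= t_now else None   # None: no contribution

contributions = [causal_threshold(t, t_now=0.5) for t in (0.2, 0.45, 0.8)]
# -> [0.2, 0.45, None]: the pulse at 0.8 arrives after the threshold and is dropped
```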

[0041] In another exemplary embodiment, the TEMP-based interconnect processing unit 204 may be configured to implement neural network architectures containing weights. These weights may be equivalent to delay lines, and the routing may be part of the computation. Therefore, the multiplication by weights of the neural network may be equivalent to delaying a pulse before routing the pulse to its destination node. This delay may be implemented in the routing tables that are used for implementing address-event routing of spikes or events. Hence, using TEMP in neural network architectures, the routing becomes part of the computation, thereby utilizing the delays in the interconnect for processing and/or storing weights of the neural network. Further, the TEMP framework achieves accuracy on large-scale neural networks by providing one-to-one mapping. In an example scenario, a conventional CNN was trained as a baseline deep learning network. The conventional CNN included 1 convolution layer with 6 channels and 1 hidden layer with 15 nodes followed by a Softmax layer for the MNIST dataset, and its accuracy was compared with the equivalent MP-based neural network architecture (i.e., TEMP-based CNN) implemented using the interconnect processing unit 204. The accuracy of the TEMP-based CNN was the same as that of the baseline CNN, as shown in Table 2.

Method | Batch / Hidden Nodes | Test Accuracy / Train Accuracy
Dot Product | 64 / 15 | 97.96% / 99.81%
MP (BN only in last layer, loss scaled, no BN layer in inference) | 64 / 15 | 98% / 99.3%

Table 2: Accuracy of the baseline CNN and the equivalent TEMP/MP-based CNN on MNIST

[0042] Moreover, the path delays between the nodes of the neural network may be used as weight parameters of the neural network. These path delays (i.e., weight parameters) may be fixed during compile time, or the path delays may be made variable by inserting a controllable “delay standard cell”. Such an implementation results in a significant saving of memory and logic gates as the weights need not be stored. As stated above, the processing in interconnect architecture as disclosed in the exemplary embodiments of the present disclosure is robust to skew/routing delay, thereby enabling a Globally-Asynchronous and Locally-Synchronous (GALS) architecture to be built to improve the Power-Performance-Area (PPA) metrics.
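The sketch below illustrates, under the same assumed MP constraint as the earlier sketch, how path delays can stand in for weights: each spike is delayed by its "weight" en route and the destination node combines the delayed arrival times. The function name and numeric values are illustrative only.

```python
import numpy as np

def delay_weighted_layer(t_in, delay_weights, gamma):
    """Weights realised as path delays: each input spike is delayed by its
    'weight' on the way to the destination node, and the node combines the
    delayed arrival times with the assumed time-domain MP rule."""
    arrivals = np.asarray(t_in, dtype=float) + np.asarray(delay_weights, dtype=float)
    t = np.sort(arrivals)                      # delayed spikes arrive pre-sorted in time
    csum = np.cumsum(t)
    for k in range(1, len(t) + 1):
        tz = (csum[k - 1] + gamma) / k         # candidate with the k earliest arrivals active
        if k == len(t) or tz <= t[k]:
            return tz

# Three spikes routed over paths with different delays to one destination node
print(delay_weighted_layer([0.1, 0.2, 0.3], delay_weights=[0.05, 0.30, 0.10], gamma=0.1))
```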

[0043] Figure 7 illustrates an exemplary schematic representation 700 of a pulse-based backpropagation and update technique for training time-to-event margin propagation (TEMP), in accordance with an embodiment of the present disclosure. In digital IC design, clock jitter is one of the significant challenges to overcome. However, it can be used as an advantage in the interconnect 204. During the training of an MP-based neural network, the hyperparameter gamma (γ) may be changed dynamically to avoid a solution at a local minimum. Accordingly, jitter may be exploited to modulate gamma (γ) as an advantage rather than an artifact. This allows the system's performance to be pushed, as the clock of the system can run at a much higher frequency rather than being limited by jitter artifacts.
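As a hedged illustration of this idea, the snippet below perturbs a base gamma value with a small jitter-like term each training step; the schedule, decay factor and scale are assumptions made for illustration and are not taken from the disclosure.

```python
import numpy as np

rng = np.random.default_rng(0)

def gamma_schedule(step, gamma_base=0.2, jitter_scale=0.02, decay=0.999):
    """Illustrative dynamic gamma: the base hyperparameter decays slowly over
    training while a small jitter-like perturbation is added each step,
    mimicking clock jitter being reused to keep the MP optimisation out of
    local minima."""
    return gamma_base * (decay ** step) + jitter_scale * rng.standard_normal()

gammas = [gamma_schedule(s) for s in range(5)]   # a slightly different gamma every step
```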

[0044] In an exemplary case of digital ICs, routing of signals adds significant delays, and minimizing relative delays in the clock paths (i.e., skew) poses a big challenge for IC designers; as such, synthesizing a clock tree in a digital IC may be considered one of the major steps to reduce skew. However, the clock tree consumes significant power and area on a chip. Such delays introduced due to synthesizing the clock tree may be utilized to perform computations and storage in the interconnect processing unit 204, resulting in significant area and power savings.

[0045] In another exemplary embodiment, by replacing the DFT logic with training logic in a digital IC, the chip may be trained for a specific purpose as per the customer's requirements. The processing in interconnect architecture 200 is based on the learning paradigm and can compensate for a fault that occurred due to the manufacturing process. This could result in the elimination of DFT logic in a chip built based on the processing in interconnect architecture.

[0046] The present disclosure may advantageously eliminate DFT tests on a digital IC by replacing the DFT logic with training logic. The DFT logic assists in automated testing of the chip based on standard models. The present disclosure is advantageously based on the learning paradigm and compensates for faults that occur due to the manufacturing process, which results in the elimination of DFT logic in a chip and reduces the cost required for testing the chip. The present disclosure further advantageously eliminates the clock-tree optimization in a digital IC by exploiting the RC routing delays, which results in area and power savings, thus providing energy-efficient machine learning.

[0047] The present disclosure advantageously uses jitter to modulate the gamma, which assists in pushing a system's performance as the clock of the system can run at a much higher frequency rather than being limited by jitter artifacts. The present disclosure advantageously enables utilizing the path delays between the nodes of a neural network as weight parameters of the network, which can be fixed during compile time, or the path delays can be made variable by inserting a controllable “delay standard cell”, resulting in a significant saving of memory and logic gates as the weights need not be stored. This feature allows building a Globally-Asynchronous and Locally-Synchronous (GALS) architecture to improve the Power-Performance-Area (PPA) metrics.

[0048] In an exemplary embodiment, a computing system includes at least a processing element and a memory. The memory includes an interconnect configured to receive input data from the processing element. The interconnect is further configured to determine a time to first spike (TTFS), wherein the TTFS is configured to encode a magnitude of the input data for time-based representation of the input data. The interconnect is further configured to perform a plurality of operations on the encoded input data based on mapping the operation to a function in a lookup table, wherein the lookup table is stored in the memory.

[0049] In an exemplary embodiment, the input data may comprise a plurality of pulses, and the input data received is related to occurrence of an event. In an exemplary embodiment, the interconnect is configured for processing the input data for computation of data and storage of data.

[0050] In an exemplary embodiment, the interconnect includes an input/output interface configured to facilitate communication with external entities for exchange of data, wherein the input data comprises analog variables or digital variables and the input/output interface is configured to provide the received data to a control unit. In an exemplary embodiment, the interconnect further includes an Arithmetic Logic Unit (ALU), wherein the ALU is configured to perform arithmetic and logic operations. In a further exemplary embodiment, the interconnect is part of the memory, wherein the memory includes a plurality of instructions to be executed in the interconnect. In an exemplary embodiment, the interconnect includes a control unit configured to generate a timing signal or a control signal and perform time encoding of the received input data. In an exemplary embodiment, the interconnect includes a delay unit configured to introduce one or more delays in the interconnect to the received input data.

[0051] In an exemplary embodiment, the I/O interface receives input data from the processing element and provides the received input data to be stored in the memory. In an exemplary embodiment, the memory may be a volatile memory or a non-volatile memory. In an exemplary embodiment, the interconnect connecting the processing element and the memory includes at least one of a physical element and/or a software process running in the memory and/or a thread in the memory and/or a combination thereof.

[0052] In an exemplary embodiment, the control unit is configured to retrieve the received input data from the memory in a pre-defined sequence; convert the received input data into a plurality of digital pulses; and interpret the received input data for the interconnect at a given instant of time.

[0053] In an exemplary embodiment, the delay is at least one of an electrical delay or a photonic delay or an optical delay or a combination thereof. In an exemplary embodiment, the delay is related to various computations and/or operations. In an exemplary embodiment, the computation includes at least one of sorting or temporal difference or causality. In an exemplary embodiment, the operations include at least one of a memory or a sorting or an addition or a subtraction or multiplication or a thresholding.

[0054] In an exemplary embodiment, the delay unit includes at least one of a memristor or a resistor-capacitance circuit or an optical delay element or delay lines.

[0055] In an exemplary embodiment, on receipt of input information or data, the interconnect is configured to determine a time to first spike, wherein the time to first spike is associated with the occurrence of an event. In an exemplary embodiment, the occurrence of the event encodes the magnitude of the received input data.

[0056] In an exemplary embodiment, the memory is configured to store one or more instructions for time encoding the input data and performing one or more operations on the encoded input data for implementing computation and/or operation.

[0057] In an exemplary embodiment, the control unit is configured to determine the time-to-first spike (TTFS) for a plurality of input data t1, t2, t3, t4…tn, and provides an output spike tz. In an exemplary embodiment, the delay unit is configured to perform operations on the encoded time-based representation of the input data, and wherein the operation relates to at least one of an add, a subtract, a multiply, a divide, a threshold, a memory, a sort and a combination thereof.

[0058] In an exemplary embodiment, the interconnect is configured for performing sorting of the time variables when a pulse or event associated with the input data is generated, and providing the sorted variables to a node. In an exemplary embodiment, the delay unit is configured to determine a time between the pulses or events associated with the input data for performing the operation in a time domain. In an exemplary embodiment, a counter is configured to store a potential and update a value of the counter dynamically, and the counter is implemented for performing the operation, wherein performing the operation comprises computing a sum and/or a difference in the received input data.

[0059] An exemplary embodiment provides a method for performing computations on a system as illustrated in Figure 8. In step 810, the method includes receiving input data from a processing element, wherein the received input data comprises at least one of an analog signal or a digital signal. In step 820, the method includes providing the received input data to a memory, wherein the memory comprises an interconnect, and wherein the received input data is used to perform an operation or is stored. In step 830, the method includes encoding a magnitude of the received input data in the time domain at the interconnect at the occurrence of an event. In step 840, the method comprises performing an operation on the data in the time domain at the interconnect. In a further embodiment, the method includes determining a time to first spike (TTFS) for the encoded data, wherein for input data t1, t2, …, an output tz is obtained. In a further embodiment, the method includes performing an operation which includes introducing a delay in the encoded received input data, or performing a computation which includes at least one of a sorting or a temporal difference or a causality.
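For illustration, the sketch below strings steps 810 to 840 together in Python under the same assumptions as the earlier sketches (a linear TTFS encoding and the MP constraint on the output spike time); the function name and parameter values are hypothetical.

```python
import numpy as np

def process_in_interconnect(input_data, gamma=0.2, t_max=1.0):
    """Illustrative walk through steps 810-840: receive data, hand it to the
    interconnect, time-encode each magnitude, then perform a time-domain
    operation (here the MP combination used in the earlier sketches)."""
    x = np.asarray(input_data, dtype=float)       # step 810: receive analog/digital values
    t_spikes = t_max - np.clip(x, 0.0, t_max)     # steps 820-830: encode magnitudes as spike times
    t = np.sort(t_spikes)                          # spikes arrive pre-sorted in time
    csum = np.cumsum(t)
    for k in range(1, len(t) + 1):                 # step 840: operate on the time-encoded data
        tz = (csum[k - 1] + gamma) / k
        if k == len(t) or tz <= t[k]:
            return tz

print(process_in_interconnect([0.9, 0.6, 0.3, 0.1]))
```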

[0060] In an exemplary embodiment, the method includes performing the operation which includes at least one of a memory and/or a sorting or an addition and/or a subtraction and/or a multiplication and/or a division and/or a thresholding and/or a combination thereof.

[0061] An exemplary method includes processing in an interconnect. In an exemplary embodiment, the method includes receiving, by the interconnect processing unit, input data. In an exemplary embodiment, the method includes generating, by the interconnect, encoded data by time encoding a plurality of pulses based on the input data. In an exemplary embodiment, the method includes performing, by the interconnect processing unit, a plurality of operations on the encoded data.

[0062] The language used in the specification has been principally selected for readability and instructional purposes, and it may not have been selected to delineate or circumscribe the inventive subject matter.

[0063] With respect to the use of substantially any plural and/or singular terms herein, those having skill in the art can translate from the plural to the singular and/or from the singular to the plural as is appropriate to the context and/or application. The various singular/plural permutations may be expressly set forth herein for sake of clarity.

[0064] While various aspects and embodiments have been disclosed herein, other aspects and embodiments will be apparent to those skilled in the art.

CLAIMS:
1. A computing system 200 comprising:
a processing element 202 and a memory 206, wherein the memory comprises an interconnect 204 configured to:
receive input data from the processing element 202;
determine a time to first spike (TTFS), wherein the TTFS is configured to encode a magnitude of the input data for time-based representation of the input data;
perform a plurality of operations on the encoded input data based on mapping the operation to a function in a lookup table, wherein the lookup table is stored in the memory.

2. The system 200 as claimed in claim 1, wherein the input data may comprise a plurality of pulses, and the input data received is related to occurrence of an event.

3. The system 200 as claimed in claim 1, wherein the interconnect is configured for processing the input data for computation of data and storage of data.

4. The system as claimed in claim 1, wherein the interconnect 300 comprises:
an input/output interface 306 configured to facilitate communication with external entities for exchange of data, wherein the input data comprises analog variables or digital variables and the input/output interface 306 is configured to provide the received data to a control unit 308;
an Arithmetic Logic Unit (ALU) 302, wherein the ALU 302 is configured to perform arithmetic and logic operations;
a memory 304, wherein the memory comprises a plurality of instructions to be executed in the interconnect 204, wherein the interconnect is part of the memory 304;
a control unit 308 configured to generate a timing signal or a control signal, and perform time encoding of the received input data;
a delay unit 310 configured to introduce one or more delays in the interconnect 204 to the received input data.

5. The system as claimed in claim 4, wherein the I/O interface 306 receives input data from the processing element 202 and provides the received input data to be stored in the memory 206.

6. The system as claimed in claim 4, wherein the memory 304 may be a volatile memory or a non-volatile memory.

7. The system as claimed in claim 4, wherein the interconnect 204 connecting the processing element 202 and the memory 206 comprises at least one of a physical element and/or a software process running in the memory and/or a thread in the memory and/or a combination thereof.

8. The system as claimed in claim 4, wherein the control unit 308 is configured to:
retrieve the received input data from the memory 304 in a pre-defined sequence;
convert the received input data into a plurality of digital pulses; and
interpret the received input data for the interconnect 204 at a given instant of time.

9. The system as claimed in claim 4, wherein the delay is at least one of an electrical delay or a photonic delay or an optical delay or a combination thereof.

10. The system as claimed in claim 4, wherein the delay is related to various computations and/or operations.

11. The system as claimed in claim 10, wherein
the computation comprises at least one of sorting or temporal difference or causality; and
the operations comprise at least one of a memory or a sorting or an addition or a subtraction or multiplication or a thresholding.

12. The system as claimed in claim 4, wherein the delay unit comprises at least one of a memristor or a resistor-capacitance circuit or an optical delay element or delay lines.

13. The system as claimed in claim 4, wherein on receipt of input information or data, the interconnect is configured to determine a time to first spike, wherein the time to first spike is associated with the occurrence of an event.

14. The system as claimed in claim 13, wherein the occurrence of the event encodes the magnitude of the received input data.

15. The system as claimed in claim 4, wherein the memory is configured to store one or more instructions for
time encoding the input data; and
performing one or more operations on the encoded input data for implementing computation and/or operation.

16. The system as claimed in claim 15, wherein the control unit 308 is configured to determine the time-to-first spike (TTFS) for a plurality of input data t1, t2, t3, t4…,tn and provides an output spike tz.

17. The system as claimed in claim 14, wherein the delay unit is configured to perform operations on the encoded time-based representation of the input data, and wherein the operation relates to at least one of an add, a subtract, a multiply, a divide, a threshold, a memory, a sort and a combination thereof.

18. The system as claimed in claim 4, wherein the interconnect 204 is configured for:
performing sorting of the time variables when a pulse or event is generated, wherein the pulse or event is associated with the input data; and
providing the sorted variables to a node.

19. The system as claimed in claim 4, wherein the delay unit is configured to determine a time between the pulses or events for performing the operation in a time-domain, wherein the pulse or event is associated with the input data.

20. The system as claimed in claim 4, wherein a counter is configured to store a potential and update a value of the counter dynamically, and the counter is implemented for performing the operation, wherein performing the operation comprises computing a sum and/or a difference in the received input data.

21. A method for performing computations on a system, the method comprising:
receiving input data from a processing element, wherein the received input data comprises at least one of an analog signal or a digital signal;
providing the received input data to a memory, wherein the memory comprises an interconnect, wherein the received input data is used to perform an operation or computation, or is stored in the memory;
encoding a magnitude of the received input data in the time domain at the interconnect at the occurrence of an event;
performing an operation on the information and/or data in the time domain at the interconnect.

22. The method as claimed in claim 21, comprises:
determining a time to first spike (TTFS) for the encoded data, wherein for input data t1, t2, …, tn, an output tz is obtained.

23. The method as claimed in claim 21, wherein the memory may be a volatile memory or a non-volatile memory.

24. The method as claimed in claim 21, wherein the interconnect comprises at least one of a physical element and/or a software process executing in the memory and/or a thread executing in the memory and/or a combination thereof.

25. The method as claimed in claim 21, wherein performing an operation comprises:
introducing a delay in the encoded received input data.

26. The method as claimed in claim 21, wherein the computation comprises:
at least one of a sorting or a temporal difference or a causality.

27. The method as claimed in claim 21, wherein the operation comprises:
at least one of a memory and/or a sorting or an addition and/or a subtraction and/or a multiplication and/or a division and/or a thresholding and/or a combination thereof.

28. The method as claimed in claim 21, wherein a counter stores a potential and is updated dynamically for performing the operation, wherein performing the operation comprises computing a sum and/or a difference in the received input data.

29. The method as claimed in claim 21, wherein at the occurrence of an event a spike is detected, and at a threshold value a spike (tz) is emitted.

30. A method for processing in an interconnect processing unit, the method comprising:
receiving, by the interconnect, input data;
generating, by the interconnect, an encoded data by time encoding a plurality of pulses based on the input data;
performing, by the interconnect, a plurality of operations on the encoded data.

Documents

Application Documents

# Name Date
1 202241039593-STATEMENT OF UNDERTAKING (FORM 3) [11-07-2022(online)].pdf 2022-07-11
2 202241039593-PROVISIONAL SPECIFICATION [11-07-2022(online)].pdf 2022-07-11
3 202241039593-POWER OF AUTHORITY [11-07-2022(online)].pdf 2022-07-11
4 202241039593-FORM 1 [11-07-2022(online)].pdf 2022-07-11
5 202241039593-DRAWINGS [11-07-2022(online)].pdf 2022-07-11
6 202241039593-DECLARATION OF INVENTORSHIP (FORM 5) [11-07-2022(online)].pdf 2022-07-11
7 202241039593-PostDating-(11-07-2023)-(E-6-235-2023-CHE).pdf 2023-07-11
8 202241039593-APPLICATIONFORPOSTDATING [11-07-2023(online)].pdf 2023-07-11
9 202241039593-POA [09-08-2023(online)].pdf 2023-08-09
10 202241039593-FORM-26 [09-08-2023(online)].pdf 2023-08-09
11 202241039593-FORM 13 [09-08-2023(online)].pdf 2023-08-09
12 202241039593-AMENDED DOCUMENTS [09-08-2023(online)].pdf 2023-08-09
13 202241039593-ENDORSEMENT BY INVENTORS [10-08-2023(online)].pdf 2023-08-10
14 202241039593-DRAWING [10-08-2023(online)].pdf 2023-08-10
15 202241039593-CORRESPONDENCE-OTHERS [10-08-2023(online)].pdf 2023-08-10
16 202241039593-COMPLETE SPECIFICATION [10-08-2023(online)].pdf 2023-08-10
17 202241039593-FORM-9 [11-08-2023(online)].pdf 2023-08-11
18 202241039593-FORM 18A [16-08-2023(online)].pdf 2023-08-16
19 202241039593-EVIDENCE OF ELIGIBILTY RULE 24C1f [16-08-2023(online)].pdf 2023-08-16
20 202241039593-CORRECTED PAGES [29-08-2023(online)].pdf 2023-08-29
21 202241039593-FER.pdf 2023-09-22
22 202241039593-FORM 13 [07-12-2023(online)].pdf 2023-12-07
23 202241039593-AMENDED DOCUMENTS [07-12-2023(online)].pdf 2023-12-07
24 202241039593-FER_SER_REPLY [21-02-2024(online)].pdf 2024-02-21
25 202241039593-CLAIMS [21-02-2024(online)].pdf 2024-02-21
26 202241039593-ABSTRACT [21-02-2024(online)].pdf 2024-02-21
27 202241039593-PatentCertificate03-06-2024.pdf 2024-06-03
28 202241039593-IntimationOfGrant03-06-2024.pdf 2024-06-03
29 202241039593-EDUCATIONAL INSTITUTION(S) [30-08-2024(online)].pdf 2024-08-30

Search Strategy

1 Search_202241039593E_08-09-2023.pdf

ERegister / Renewals

3rd: 30 Aug 2024

From 11/08/2024 - To 11/08/2025

4th: 07 Aug 2025

From 11/08/2025 - To 11/08/2026