
Hybrid Memory Cube-Oriented Image Classification Using Convolutional Neural Network

Abstract: This invention concerns the deployment of a convolutional neural network model, a deep learning algorithm, on a hybrid memory cube. The deployment accelerates the classification of large volumes of brain tumour images. The hardware acceleration enables faster classification than conventional simulation-based or hardware-oriented designs.


Patent Information

Application #: 202141047867
Filing Date: 21 October 2021
Publication Number: 48/2021
Publication Type: INA
Invention Field: COMPUTER SCIENCE
Status:
Parent Application:

Applicants

1. MR. V ASHOK KUMAR
ASSISTANT PROFESSOR, COMPUTER SCIENCE AND ENGINEERING, VEL TECH RANGARAJAN DR. SAGUNTHALA R&D INSTITUTE OF SCIENCE AND TECHNOLOGY, CHENNAI, TAMIL NADU 600062.
2. MR. HEMANT KUMAR
ASSISTANT PROFESSOR, INFORMATION TECHNOLOGY, UNIVERSITY INSTITUTE OF ENGINEERING AND TECHNOLOGY, CHHATRAPATI SHAHU JI MAHARAJ UNIVERSITY, KANPUR, UTTAR PRADESH 208024
3. DR. K VIJILA RANI
ASSISTANT PROFESSOR, DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING, UDAYA SCHOOL OF ENGINEERING, VELLAMODI, TAMIL NADU 629204
4. DR. S SUGUMARAN
PROFESSOR, DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING, VISHNU INSTITUTE OF TECHNOLOGY, KOVVADA, ANDHRA PRADESH 534202
5. MS. R. ASMITHA SHREE
ASSISTANT PROFESSOR, DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING, SRI KRISHNA COLLEGE OF TECHNOLOGY, KOVAIPUDUR, COIMBATORE, TAMIL NADU 641042.
6. DR. G. KIRUTHIGA
ASSOCIATE PROFESSOR, DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING, IES COLLEGE OF ENGINEERING, THRISSUR, KERALA 680551.
7. MS. D. NIRMALA
ASSISTANT PROFESSOR, DEPT. OF COMPUTER SCIENCE AND ENGINEERING, SRI KRISHNA COLLEGE OF TECHNOLOGY, KOVAIPUDUR, COIMBATORE, TAMIL NADU 641042
8. MS. R. NITHYA
ASSISTANT PROFESSOR, DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING, BANNARI AMMAN INSTITUTE OF TECHNOLOGY, SATHYAMANGALAM, TAMIL NADU 638401
9. DR. S. SREE PRIYA
ASSISTANT PROFESSOR, DEPARTMENT OF EEE, ARUNACHALA COLLEGE OF ENGINEERING FOR WOMEN, MANAVILAI, TAMIL NADU 629203

Inventors

1. MR. V ASHOK KUMAR
ASSISTANT PROFESSOR, COMPUTER SCIENCE AND ENGINEERING, VEL TECH RANGARAJAN DR. SAGUNTHALA R&D INSTITUTE OF SCIENCE AND TECHNOLOGY, CHENNAI, TAMIL NADU 600062.
2. MR. HEMANT KUMAR
ASSISTANT PROFESSOR, INFORMATION TECHNOLOGY, UNIVERSITY INSTITUTE OF ENGINEERING AND TECHNOLOGY, CHHATRAPATI SHAHU JI MAHARAJ UNIVERSITY, KANPUR, UTTAR PRADESH 208024
3. DR. K VIJILA RANI
ASSISTANT PROFESSOR, DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING, UDAYA SCHOOL OF ENGINEERING, VELLAMODI, TAMIL NADU 629204
4. DR. S SUGUMARAN
PROFESSOR, DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING, VISHNU INSTITUTE OF TECHNOLOGY, KOVVADA, ANDHRA PRADESH 534202
5. MS. R. ASMITHA SHREE
ASSISTANT PROFESSOR, DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING, SRI KRISHNA COLLEGE OF TECHNOLOGY, KOVAIPUDUR, COIMBATORE, TAMIL NADU 641042.
6. DR. G. KIRUTHIGA
ASSOCIATE PROFESSOR, DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING, IES COLLEGE OF ENGINEERING, THRISSUR, KERALA 680551.
7. MS. D. NIRMALA
ASSISTANT PROFESSOR, DEPT. OF COMPUTER SCIENCE AND ENGINEERING, SRI KRISHNA COLLEGE OF TECHNOLOGY, KOVAIPUDUR, COIMBATORE, TAMIL NADU 641042
8. MS. R. NITHYA
ASSISTANT PROFESSOR, DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING, BANNARI AMMAN INSTITUTE OF TECHNOLOGY, SATHYAMANGALAM, TAMIL NADU 638401
9. DR. S. SREE PRIYA
ASSISTANT PROFESSOR, DEPARTMENT OF EEE, ARUNACHALA COLLEGE OF ENGINEERING FOR WOMEN, MANAVILAI, TAMIL NADU 629203

Specification

"HYBRID MEMORY CUBEORIENTED IMAGE CLASSIFICATION USING CONVOLUTIONAL NEURAL NETWORK"
FIELD OF THE INVENTION
The present invention relates to the field of computer science. It concerns the deployment of a convolutional neural network model, a deep learning algorithm, on a hybrid memory cube. The deployment accelerates the classification of large volumes of brain tumour images, and the hardware acceleration enables faster classification than conventional simulation-based or hardware-oriented designs.
BACKGROUND OF THE INVENTION
[0001] Background description includes information that may be useful in understanding the present invention. It is not an admission that any of the information provided herein is prior art or relevant to the presently claimed invention, or that any publication specifically or implicitly referenced is prior art.
[0002] Raw compute is not the critical resource for deep learning workloads, because much of the data is reused. FPGAs, GPUs, and even some CPUs can all manage these workloads, but specialised deep learning hardware is ultimately more powerful in terms of performance per watt. The aim is to employ a custom multiplier-and-accumulator structure within the HMC that preserves a great deal of memory bandwidth and energy, which is exactly what many of the new deep learning chip startups have discovered.
[0003] It should be acknowledged that HMC is not the only stacked-memory technology on the market, and more stacked memory products will be coming. The Pascal GPU architecture includes HBM, an AMD-backed technology. With HBM, Nvidia has an alternative to Intel and Micron's collaboration, which is critical in its competition with Intel.
[0004] This is especially relevant in the context of the rapidly growing deep learning market, where Nvidia has long held the advantage but now finds Intel closing in. Nvidia's latest design differs from previous ones in that it does not feature a traditional logic layer or a collection of bespoke functions; rather, the new systems are designed to be scalable and to take advantage of HBM. Deep learning chip startups and research efforts such as NeuroCube are, for the most part, focused on designing devices around the HMC (Hybrid Memory Cube). At this time, such devices raise scalability issues and bring their own suite of hurdles.
[0005] In the past, many organisations have pointed to the memory bandwidth limits faced by offload systems and to the FPGA's ability to be reprogrammed. With both firms currently intent on protecting the magic that enables scalability, low power, and high performance for deep learning frameworks, only restricted architectural details have been available for examination. NeuroCube, a programme developed by Georgia Tech that builds on the existing logic layer of the HMC, may bring some illumination.
[0006] The above information is presented as background information only, to assist with an understanding of the present disclosure. No determination has been made, and no assertion is made, as to whether any of the above might be applicable as prior art with regard to the present invention.
OBJECTIVE OF THE INVENTION
[0007] As the HMC is a significant enabler in deep learning designs for reprogrammable FPGA and GPU workloads, viewing it from a different angle shows that it is also crucial to other vendors' HMC-based deep learning architectures. There is a level ground between the value propositions of ASICs, FPGAs, and GPUs. Our goal was a neural computing architecture that is software programmable, with power efficiency figures similar to a custom ASIC.
[0008] One architecture designer has argued that, beyond flexibility and power economy, memory bandwidth is the real difficulty for all architectures: if you want to begin, start with a high-bandwidth memory technique. The present design is a more general network with an accelerator, built into the HMC, that supports both recurrent and convolutional neural networks and is customizable at runtime.
[0009] For illustrative purposes, Mukhopadhyay offered a visualisation of the programmability, performance, and power tradeoffs of bespoke ASICs, GPUs, and CPUs. The functions on the left require far more memory than is available on-chip, yet the operations themselves are basic: all they require is multiply and accumulate capability. While this neuromorphic platform is remarkably simple, its design is hindered by the vast amount of data that cannot be brought onto the chip.
[0010] To determine what sort of neural network is being run, the host CPU transmits parameters to the NeuroCube, which then pushes the data to the DRAM. Computation takes place in the HMC logic layer, which contains many cores and buffer engines (the yellow region of the figure). A further examination of the inner layer of the HMC reveals a simplistic design with fairly basic arithmetic operations, despite what appear to be quite simple procedures from a user perspective.
[0011] These, together with other objects of the invention and the various features of novelty which characterize the invention, are pointed out with particularity in the disclosure. For a better understanding of the invention, its operating advantages, and the specific objects attained by its use, reference should be had to the accompanying figures and descriptive matter in which preferred embodiments of the invention are illustrated.
SUMMARY OF THE INVENTION
[0012] In convolutional neural networks, the convolution operation is computed as a dot product between the input activations and the weight kernels, and each output channel accumulates the dot products over every input feature map. Conventional deep learning procedures use 32-bit single-precision floating-point numbers for training deep neural networks. Because the 32-bit floating-point format offers a wide bit width and a large dynamic range, weight parameters rarely share the same value.
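For concreteness, a minimal NumPy sketch of this per-channel dot-product convolution follows; the function name, shapes, and data are illustrative and not part of the disclosed hardware.

```python
# Naive convolution: each output point is the dot product of an input
# patch with a weight kernel, accumulated across all input channels.
import numpy as np

def conv2d_naive(x, w):
    """x: (C_in, H, W) input feature maps; w: (C_out, C_in, K, K) kernels."""
    c_in, h, wd = x.shape
    c_out, _, k, _ = w.shape
    out = np.zeros((c_out, h - k + 1, wd - k + 1), dtype=x.dtype)
    for co in range(c_out):
        for i in range(h - k + 1):
            for j in range(wd - k + 1):
                # Dot product of the patch with the kernel, summed over
                # every input channel, as paragraph [0012] describes.
                patch = x[:, i:i + k, j:j + k]
                out[co, i, j] = np.sum(patch * w[co])
    return out

x = np.random.randn(3, 8, 8).astype(np.float32)
w = np.random.randn(4, 3, 3, 3).astype(np.float32)
print(conv2d_naive(x, w).shape)  # (4, 6, 6)
```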
[0013] Network compression and quantization reduce the number of multiplications required. After quantization, 16 weight values provide sufficient accuracy for the vast majority of deep neural networks, so each weight parameter is uniformly mapped onto one of 16 distinct values. Each weight value is identified by a 4-bit tag, and the positions sharing the same weight tag are gathered together during computation.
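A short sketch of this quantization step, assuming uniform spacing of the 16 levels over the weight range (the disclosure does not fix the spacing); `quantize_weights` and its shapes are hypothetical.

```python
# Quantize float weights to 4-bit tags plus a 16-entry codebook.
import numpy as np

def quantize_weights(w, levels=16):
    lo, hi = w.min(), w.max()
    codebook = np.linspace(lo, hi, levels, dtype=np.float32)
    # 4-bit tag: index of the nearest codebook entry for every weight.
    tags = np.abs(w[..., None] - codebook).argmin(axis=-1).astype(np.uint8)
    return tags, codebook

w = np.random.randn(4, 3, 3, 3).astype(np.float32)
tags, codebook = quantize_weights(w)
w_hat = codebook[tags]              # dequantized (shared-value) weights
print(tags.max() < 16, w_hat.shape)
```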
[0014] Once the accumulation is accomplished, the multiplication with each weight is performed only once. Since fewer multiplications are required to obtain a single output point, the overall number of multiplications is lowered: no more than 16 multiplications are necessary to compute one output value, regardless of the kernel size.
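The accumulate-then-multiply trick can be checked numerically; the following sketch sums same-tag inputs first and applies each shared weight once, so one output point costs at most 16 multiplications. All names are illustrative.

```python
# Weight-sharing dot product from [0013]-[0014]: accumulate inputs per
# weight tag, then multiply each of the (at most 16) weights once.
import numpy as np

def shared_weight_dot(patch, tags, codebook):
    """One output point: at most len(codebook) multiplications."""
    acc = np.zeros(len(codebook), dtype=np.float32)
    for x, t in zip(patch.ravel(), tags.ravel()):
        acc[t] += x                  # accumulation only, no multiply yet
    return float(np.dot(acc, codebook))  # 16 multiplies at most

patch = np.random.randn(3, 3, 3).astype(np.float32)
tags = np.random.randint(0, 16, size=(3, 3, 3))
codebook = np.linspace(-1, 1, 16, dtype=np.float32)
ref = float(np.sum(patch * codebook[tags]))  # conventional dot product
print(np.isclose(shared_weight_dot(patch, tags, codebook), ref))
```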
[0015] See Fig. 1 for a graphical representation of the process of computing a convolution using the weight-sharing approach. In keeping with HMC Specification 2.1, alterations to the HMC architecture are kept to a minimum. Only light-weight arithmetic operations are introduced into the HMC, so no substantial heat issues are expected.
[0016] To help machine learning software perform deep neural network operations, two HMC operations are implemented: one to store data that shares the same weight, and one to multiply the data by the weights. A MAC operation is also included to store the generated products. The redesigned HMC architecture is shown in Fig. 2.
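A behavioural sketch (software, not RTL) of these two operations may help; `VaultModel` and the per-tag bins are hypothetical stand-ins for the hardware, anticipating the 16-vault layout described next.

```python
# Behavioural model of the two HMC operations from [0016]: data tagged
# with the same weight is accumulated together, then multiplied once.
class VaultModel:
    def __init__(self):
        self.acc = 0.0                 # running sum of same-weight data

    def store_accumulate(self, value): # "store data with same weight" op
        self.acc += value

    def multiply_weight(self, weight): # "multiply data by weight" op
        return self.acc * weight

vaults = [VaultModel() for _ in range(16)]   # one bin per 4-bit weight tag
data = [(0.5, 3), (1.25, 3), (-0.75, 7)]     # (value, weight tag)
for value, tag in data:
    vaults[tag].store_accumulate(value)      # same tag -> same bin
codebook = [i / 8.0 - 1.0 for i in range(16)]
result = sum(v.multiply_weight(w) for v, w in zip(vaults, codebook))
print(result)
```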
[0017] Most recent deep convolutional neural networks can use 16 weight values and still achieve sufficient accuracy. To accommodate the 16 weight cases, the HMC design is updated so that 16 vaults are employed for parallel processing. When data is transferred into HMC memory, data sharing the same weight value is routed to the same vault.
[0018] These, together with other aspects of the summary of the invention and the various features of novelty which characterize the invention, are pointed out with particularity in the disclosure. For a better understanding of the invention, its operating advantages, and the specific results attained by its use, reference should be had to the accompanying figures and descriptive matter in which preferred embodiments of the invention are illustrated.
DETAILED DESCRIPTION OF THE INVENTION
[0019] In the following detailed description, reference is made to the accompanying figures which form a part hereof, and in which specific embodiments in which the invention may be practiced are shown by way of illustration. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention, and it is to be understood that the embodiments may be combined, that other embodiments may be utilized, and that structural and logical changes may be made without departing from the scope of the present invention.
[0020] While the present invention is described herein by way of example using several embodiments and illustrative figures, those skilled in the art will recognize that the invention is neither intended to be limited to the embodiments or figures described, nor intended to represent the scale of the various components.
[0021] Further, some components that may form a part of the invention may not be illustrated in certain figures, for ease of illustration, and such omissions do not limit the embodiments outlined in any way. It should be understood that the figures and the detailed description thereto are not intended to limit the invention to the particular form disclosed; on the contrary, the invention covers all modifications, equivalents, and alternatives falling within the spirit and scope of the present invention as defined by the appended claims. The headings are used for organizational purposes only and are not meant to limit the scope of the description or claims.
[0022] The 16-bit fixed-point operations have a comparatively low cost when compared to floating-point operations, making them a good fit for deep neural networks. Storage or other devices send the input data to the HMC vaults. Because there are only 16 distinct values, the weights are loaded into the registers of the host CPU; they are used during the multiply-accumulate operation as the immediate values to be multiplied and accumulated.
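As an illustration of such 16-bit fixed-point arithmetic, here is a sketch in a hypothetical Q8.8 format (8 integer bits, 8 fractional bits); the disclosure does not specify the exact format.

```python
# 16-bit fixed-point (Q8.8) encode, decode, and multiply.
def to_q8_8(x: float) -> int:
    """Encode a float as a 16-bit two's-complement Q8.8 value."""
    return int(round(x * 256)) & 0xFFFF

def from_q8_8(v: int) -> float:
    if v & 0x8000:                 # restore the sign of negative values
        v -= 0x10000
    return v / 256.0

def q8_8_mul(a: int, b: int) -> int:
    """16x16 -> 16-bit fixed-point multiply with truncation."""
    sa = a - 0x10000 if a & 0x8000 else a
    sb = b - 0x10000 if b & 0x8000 else b
    return ((sa * sb) >> 8) & 0xFFFF

a, b = to_q8_8(1.5), to_q8_8(-0.25)
print(from_q8_8(q8_8_mul(a, b)))   # -0.375
```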
[0023] The instruction flow is as follows: to begin an operation, the host processor sends an instruction to the HMC devices. Only one instruction is needed to set the HMC going, after which the HMC takes care of the remaining work by applying sequential operations on a one-for-one basis. No additional host processor instructions are required to perform HMC functions.
[0024] Two instructions, DOA and Multiplication and Accumulation (MA), are added to the HMC instruction set for weight-sharing activities.
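The wire format of these instructions is not given in the disclosure; the following sketch shows one hypothetical request-packet layout carrying the fields mentioned in paragraphs [0023], [0027], and [0029].

```python
# Hypothetical host request packet for the two added HMC instructions.
from dataclasses import dataclass

@dataclass
class HMCRequest:
    opcode: str        # "DOA" or "MA"
    start_addr: int    # initial vault address ([0027])
    count: int         # number of accumulations to perform ([0027])
    immediate: int     # 16-bit fixed-point weight for MA ([0029])

def issue(req: HMCRequest):
    # Single host instruction; the HMC sequences the rest ([0023]).
    print(f"{req.opcode} addr=0x{req.start_addr:08x} "
          f"count={req.count} imm=0x{req.immediate:04x}")

issue(HMCRequest("DOA", 0x1000, 64, 0))
issue(HMCRequest("MA", 0x1000, 1, 0x0180))  # weight 1.5 in Q8.8
```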
[0025] When deep neural network (DNN) operations are performed using the modified weight-sharing HMC, several vaults are used, each holding the data for a certain weight value. Data accumulation therefore requires visiting numerous vaults in parallel. The conventional HMC architecture limits a memory request to accessing just one vault, so the memory controller must be updated to allow a single request to reach its respective vaults simultaneously.
[0026] It is not difficult to modify the HMC memory request controller. In this design, the initial memory request is examined and multiple vault-access requests are generated from it. For each vault controller to receive a request and begin accumulating, the associated vault must be selected.
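A behavioural sketch of this fan-out, with threads standing in for parallel vault controllers; `fan_out`, `VAULTS`, and the data are illustrative.

```python
# One incoming request is examined and fanned out into per-vault
# access requests that run in parallel, as [0025]-[0026] describe.
from concurrent.futures import ThreadPoolExecutor

VAULTS = {tag: [float(i) for i in range(4)] for tag in range(16)}

def vault_accumulate(tag: int) -> float:
    # Each vault controller accumulates its own same-weight data.
    return sum(VAULTS[tag])

def fan_out(request_tags):
    # Generate one access request per selected vault; run them together.
    with ThreadPoolExecutor(max_workers=len(request_tags)) as pool:
        return dict(zip(request_tags, pool.map(vault_accumulate, request_tags)))

print(fan_out(range(16)))  # partial sums from all 16 vaults at once
```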
[0027] The DOA instruction sets the initial address and the number of accumulations to be performed. Each vault then looks up the memory value at the starting address and performs a succession of accumulations. The accumulation operation is implemented by combining the integers with atomic integer addition.
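A per-vault sketch of this DOA behaviour, with a lock standing in for the hardware's atomic integer addition; the class and values are illustrative.

```python
# Per-vault DOA: walk `count` words from `start_addr` and fold them
# into an accumulator with atomic integer addition ([0027]).
import threading

class Vault:
    def __init__(self, words):
        self.mem = list(words)      # vault memory, 16-bit integers
        self.acc = 0
        self._lock = threading.Lock()

    def doa(self, start_addr: int, count: int) -> int:
        for addr in range(start_addr, start_addr + count):
            with self._lock:        # atomic integer addition
                self.acc += self.mem[addr]
        return self.acc

v = Vault([3, 1, 4, 1, 5, 9, 2, 6])
print(v.doa(start_addr=2, count=4))   # 4 + 1 + 5 + 9 = 19
```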
[0028] The conventional HMC architecture already includes an arithmetic unit, so the accumulation operation requires no new logic. For the MA operation, a 16-bit fixed-point MAC unit is connected to each vault controller, and each MAC unit is followed by a register that stores its partial result. The MAC units employ a serial technique in which the result of one MAC operation is used as the starting point for the next, each product being accumulated in turn to produce the next partial result.
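The serial chaining can be sketched as a simple loop in which a register holds the running partial result; plain integers stand in for the 16-bit fixed-point units.

```python
# Serial MAC chain from [0028]: each step multiplies one operand pair
# and adds the previous partial result held in a register.
def serial_mac(pairs):
    partial = 0                     # register holding the partial result
    for data, weight in pairs:
        partial = data * weight + partial   # one MAC per step
    return partial

# Accumulated vault data paired with its shared weight value:
pairs = [(19, 2), (7, -1), (11, 3)]
print(serial_mac(pairs))            # 19*2 + 7*(-1) + 11*3 = 64
```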
[0029] One operand of the MAC is the data that has been accumulated over time in the vault memory. The other operand is the weight value, taken from a register set in the host CPU. HMC operation requires an immediate value from the host processor in the memory request packet, so the weights are stored in host processor registers or buffers and supplied as the immediate values when MA operations begin. This strategy is practical because a convolutional neural network contains only a limited number of weight values.
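Putting the pieces together, a hedged end-to-end sketch of the flow in paragraphs [0022] to [0029]: the host keeps the 16 shared weights in its registers and, per vault, an accumulation is followed by one multiply with the weight as the immediate operand. All names are illustrative.

```python
# End-to-end behavioural model: DOA per vault, then MA with the weight
# supplied as an immediate value from the host's register set.
HOST_WEIGHT_REGS = [i - 8 for i in range(16)]   # 16 shared weight values

def run_layer(vault_data):
    """vault_data: list of 16 lists, one per vault (same-tag values)."""
    total = 0
    for tag, words in enumerate(vault_data):
        acc = sum(words)                        # DOA inside the vault
        total += acc * HOST_WEIGHT_REGS[tag]    # MA with immediate weight
    return total                                # one output point

vault_data = [[tag, tag + 1] for tag in range(16)]
print(run_layer(vault_data))
```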

Documents

Application Documents

# Name Date
1 202141047867-Abstract_As Filed_21-10-2021.pdf 2021-10-21
2 202141047867-Claims_As Filed_21-10-2021.pdf 2021-10-21
3 202141047867-Correspondence_As Filed_21-10-2021.pdf 2021-10-21
4 202141047867-Description Complete_As Filed_21-10-2021.pdf 2021-10-21
5 202141047867-Form 2(Title Page)Complete_21-10-2021.pdf 2021-10-21
6 202141047867-Form-1_As Filed_21-10-2021.pdf 2021-10-21
7 202141047867-Form-3_As Filed_21-10-2021.pdf 2021-10-21
8 202141047867-Form-5_As Filed_21-10-2021.pdf 2021-10-21
9 202141047867-Form9_Early Publication_21-10-2021.pdf 2021-10-21