A Control Unit For Provided An Efficient Code Summary And A Method


A Control Unit For Provided An Efficient Code Summary And A Method Thereof

Abstract: A control unit for providing an efficient code summary and a method thereof. The control unit 10 comprises a code database 12 storing multiple code snippets 14 and their corresponding code summaries, and a retriever module 15 that provides related information upon comparing at least one stored code snippet 14(a) and an inputted code snippet 16. The control unit 10 generates a code summary that is relevant to the inputted code snippet 16 by a Large Language Model (LLM) 18, based on the comparison between the at least one stored code snippet 14(a) and the inputted code snippet 16. (Figures 1 & 2)


Patent Information

Application #

Filing Date

29 March 2024

Publication Number

40/2025

Publication Type

INA

Invention Field

COMPUTER SCIENCE

Status

Parent Application

Applicants

Bosch Global Software Technologies Private Limited

123, Industrial Layout, Hosur Road, Koramangala, Bengaluru – 560095, Karnataka, India

Robert Bosch GmbH

Postfach 300220, 0-70442, Stuttgart, Germany

Inventors

1. Paheli Bhattacharya

Gouri Apartment, 5/1/A Deshbandhu Road (East), Kolkata 700035, West Bengal, India

2. Rishabh Gupta

104, Ansal Krishna 1, Hosur Road, Adugodi, Bangalore 560030, Karnataka, India

Specification

Description: Complete Specification:

The following specification describes and ascertains the nature of this invention and the manner in which it is to be performed.
Field of the invention
[0001] The invention relates to a control unit for providing an efficient code summary and a method thereof.

Background of the invention

[0002] Understanding legacy code in a large code repository is a challenge in the domain of software engineering. Liang et al. (2018) estimated that only 15% of the methods are documented (in the Java GitHub repository). This makes it difficult and time-consuming for developers to comprehend the underlying functionality. For programmers new to a particular programming language, it is helpful if code snippets are labelled with the corresponding intent, describing the purpose each snippet achieves (MacNeil, 2023). Automating the task of code documentation through comments and explanations can therefore prove beneficial in many ways.

[0003] A technical paper by M. Geng, S. Wang, D. Dong, H. Wang, G. Li, Z. Jin, X. Mao, and X. Liao, “Large language models are few-shot summarizers: multi-intent comment generation via in-context learning”, accepted at the 46th International Conference on Software Engineering (ICSE 2024), discloses a similar concept.

Brief description of the accompanying drawings
[0004] Figure 1 illustrates a control unit for providing an efficient code summary according to one embodiment of the invention; and
[0005] Figure 2 illustrates a flowchart for a method of providing an efficient code summary by a control unit according to the present invention.

Detailed description of the embodiments
[0006] Figure 1 illustrates a control unit for providing an efficient code summary in accordance with one embodiment of the invention. The control unit 10 comprises a code database 12 storing multiple code snippets 14 and their corresponding code summaries, and a retriever module 15 that provides related information upon comparing at least one stored code snippet 14(a) and an inputted code snippet 16. The control unit 10 generates a code summary that is relevant to the inputted code snippet 16 by a Large Language Model (LLM) 18, based on the comparison between the at least one stored code snippet 14(a) and the inputted code snippet 16.

[0007] The construction of the control unit and its components is now explained in detail. The control unit 10 is chosen from a group of control units comprising a microcontroller, a microprocessor, a digital circuit, an integrated chip, and the like. The code database 12 comprises multiple code snippets 14 and their corresponding summaries. For instance, the code snippets 14 that are stored in the code database 12 are referred to as (d1, d2, d3, …, dn). The control unit 10 performs at least one function using multiple hardware units, such as a retriever module 15 that comprises three sub-modules: a NER (named entity recognition) naming module 15(a), a similarity score generation module 15(b), and a ranking module 15(c). The NER naming module 15(a) segregates each of the words present in the code snippet 14(a) into multiple categories, wherein the multiple categories are a library, a class, a data structure, a function, and the like.

[0008] The named entities are the different keywords related to the methodology, such as function, library, algorithm, data type, etc. Thus, given any code snippet (14(a) or 16), the NER naming module 15(a) extracts the relevant entities that are present in the code. The NER naming module 15(a) categorizes the words present in both the stored code snippet 14(a) and the input code snippet 16. The similarity score generation module 15(b) generates multiple similarity matrices and their corresponding similarity scores upon comparing each of the categorized stored code snippets 14(a) with the categorized input code snippet 16. This module 15(b) determines the extent, as a percentage, to which two code snippets (14(a) & 16) are similar to each other. For instance, let “q” be the inputted code snippet 16 and “d” be one of the code snippets 14(a) from the code database 12. The NER naming module 15(a) first segregates each of the words present in d and q (14(a) & 16) and obtains their entities. The control unit 10, using the similarity score generation module 15(b), then calculates the similarity between the corresponding entities of d and q to obtain a similarity score for the pair (d, q).
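The specification does not state the formula behind the similarity score. As a minimal sketch only, the following assumes a Jaccard overlap computed per entity category and averaged over the categories; the category keys, the `Entities` type, and both function names are hypothetical and are not taken from the application.

```python
from typing import Dict, Set

# Hypothetical category keys, mirroring the entity types the text lists.
CATEGORIES = ("library", "class", "data_structure", "function")

Entities = Dict[str, Set[str]]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two sets in [0, 1]; two empty sets are treated as 0.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_score(d: Entities, q: Entities) -> float:
    """Average per-category overlap between stored snippet d and query q.

    Assumption: Jaccard is a stand-in; the source does not give the formula.
    """
    per_category = [jaccard(d.get(c, set()), q.get(c, set())) for c in CATEGORIES]
    return sum(per_category) / len(CATEGORIES)


# Entities of a stored snippet d and a query snippet q (illustrative values).
d = {"library": {"os"}, "function": {"listdir", "print"}}
q = {"library": {"os"}, "function": {"listdir"}}
score = similarity_score(d, q)
```

The per-category values computed inside `similarity_score` play the role of one row of a similarity matrix; how the application actually aggregates them is not recoverable from the text.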

[0009] The ranking module 15(c) ranks each of the generated similarity matrices after the above-disclosed comparison. The control unit 10 selects the relevant stored code snippets 14(a) based on the ranking for generation of the code summary of the input code snippet 16. To obtain the NER similarity-based ranking of the legacy code, the control unit 10 applies the similarity score generation method of the second sub-module (i.e., the similarity score generation module 15(b)) between the query code snippet q (16) and all code snippets (d1, d2, d3, …, dn) (14(a)) in the legacy code database 12 to obtain their similarity scores. The control unit 10 then ranks the code snippets 14(a) from the code database 12 in decreasing order of similarity and picks the top k code snippets 14(a) and their corresponding summaries. The control unit 10, with the help of the LLM 18, generates the code summary based on the input code snippet 16 and the relevant stored code snippets 14(a) received from the retriever module 15. The relevant stored code snippets 14(a) are those in the code database 12 that have the highest similarity scores.
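The ranking step described above (score every stored snippet against the query, sort in decreasing order, keep the top k) can be sketched as follows. This is an illustration under assumptions: `top_k_examples` and `token_overlap` are hypothetical names, and the whitespace-token scorer is only a placeholder for whatever score the similarity score generation module 15(b) produces.

```python
from typing import Callable, List, Tuple

StoredExample = Tuple[str, str]  # (code snippet, summary)


def top_k_examples(
    query: str,
    database: List[StoredExample],
    score: Callable[[str, str], float],
    k: int = 3,
) -> List[StoredExample]:
    """Rank stored (code, summary) pairs by similarity to the query, highest first."""
    ranked = sorted(database, key=lambda pair: score(pair[0], query), reverse=True)
    return ranked[:k]


def token_overlap(d: str, q: str) -> float:
    """Placeholder scorer (assumption): Jaccard overlap of whitespace tokens."""
    a, b = set(d.split()), set(q.split())
    return len(a & b) / len(a | b) if a | b else 0.0


database = [
    ("import os\nos.listdir(path)", "Lists the entries of a directory."),
    ("import json\njson.loads(text)", "Parses a JSON string."),
    ("import os\nos.remove(path)", "Deletes a file."),
]
selected = top_k_examples(
    "import os\nprint(os.listdir(dname))", database, token_overlap, k=2
)
```

Passing the scorer in as a parameter keeps the ranking logic independent of how similarity is computed, so an entity-based score can be substituted without touching the sort.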

[0010] Figure 2 illustrates a method of providing an efficient code summary by a control unit, in accordance with the present invention. In step S1, multiple code snippets 14 and their corresponding code summaries are stored in a code database 12. In step S2, related information is retrieved by a retriever module 15 upon comparing at least one stored code snippet 14(a) and an inputted code snippet 16. In step S3, a code summary that is relevant to the inputted code snippet 16 is generated by an LLM 18, based on the comparison between the at least one stored code snippet 14(a) and the inputted code snippet 16.
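Steps S1 to S3 can be wired together as in the sketch below. It is a hypothetical arrangement, not the application's implementation: the class name `ControlUnitSketch`, its fields, and the two stub callables are assumptions, and the stubs exist only so the example runs without a real retriever or language model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (stored code snippet, its summary)


@dataclass
class ControlUnitSketch:
    """Hypothetical wiring of steps S1-S3."""

    retrieve: Callable[[str, List[Example]], List[Example]]  # stands in for retriever module 15
    llm: Callable[[str, List[Example]], str]  # stands in for LLM 18
    database: List[Example] = field(default_factory=list)  # stands in for code database 12

    def store(self, code: str, summary: str) -> None:
        """S1: keep a documented snippet for later retrieval."""
        self.database.append((code, summary))

    def summarize(self, query: str) -> str:
        """S2 then S3: retrieve related examples, then generate the summary."""
        related = self.retrieve(query, self.database)
        return self.llm(query, related)


# Stubs (assumptions) so the sketch runs without a model.
def first_two(query: str, db: List[Example]) -> List[Example]:
    return db[:2]


def fake_llm(query: str, examples: List[Example]) -> str:
    return f"summary generated from {len(examples)} example(s)"


unit = ControlUnitSketch(retrieve=first_two, llm=fake_llm)
unit.store("os.listdir(p)", "Lists a directory.")
result = unit.summarize("print(os.listdir(dname))")
```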

[0011] The above method is explained in detail. The present invention addresses the broader goal of legacy code understanding by summarizing code snippets. The input given here is a line of code or a method. The output is the summary that explains what the code does. The methodology employed in the control unit 10 leverages the documented code from the historical/legacy code database 12. The present invention employs the Large Language Model (LLM) 18 to achieve the task. LLMs have advanced the state of the art in many generative AI (artificial intelligence) tasks, including several use cases in the software engineering sector.

[0012] The LLM 18 is used in an unsupervised way through prompting, with either of two kinds of prompting techniques. The first is a zero-shot technique, wherein only the code snippet 16 whose summary is to be generated is given. The second is a few-shot technique, wherein some code snippets 14(a), together with their corresponding summaries, are given along with the input code snippet 16, so that the LLM 18 can better understand the task and generate coherent responses.
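The difference between the two prompting techniques comes down to what is placed in the prompt. The sketch below builds both variants as plain strings; the instruction wording, the "Code:/Summary:" layout, and the name `build_prompt` are assumptions, since the specification does not give a prompt template.

```python
from typing import List, Sequence, Tuple


def build_prompt(query_code: str, examples: Sequence[Tuple[str, str]] = ()) -> str:
    """Return a zero-shot prompt when `examples` is empty, else a few-shot prompt.

    The instruction text and field labels are illustrative assumptions.
    """
    parts: List[str] = ["Summarize what the following code does."]
    for code, summary in examples:
        parts.append(f"Code:\n{code}\nSummary: {summary}")
    # The query goes last, with an empty summary slot for the model to fill in.
    parts.append(f"Code:\n{query_code}\nSummary:")
    return "\n\n".join(parts)


zero_shot = build_prompt("print(os.listdir(dname))")
few_shot = build_prompt(
    "print(os.listdir(dname))",
    examples=[("os.remove(path)", "Deletes the file at path.")],
)
```

In the few-shot case the stored snippets and their summaries precede the query, so the model sees worked examples of the task before the snippet it has to summarize.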



[0013] While the control unit 10 can choose random code snippets 14(a) in the few-shot technique, the study affirms that selecting examples that are relevant to the current input code snippet 16 achieves better performance. Therefore, the present invention is designed to use the “few-shot context retriever module 15” (the retriever module 15 of the control unit 10), which retrieves relevant few-shot code snippets 14(a). These code snippets 14(a) are added to the prompt, which enables the LLM 18 to accurately generate the code summary relevant to the inputted code snippet 16.

[0014] The user provides a code snippet as input to the control unit 10 via a user interface 20. This input goes to the few-shot context retriever module 15, which selects a few relevant code snippets 14(a) from the historical code database 12. These code snippets 14(a), along with the input code snippet 16, are wrapped in a prompt and fed to the LLM 18. The LLM 18 then generates the summary of the input code snippet 16. In this process, the few-shot retriever module 15 uses the three sub-modules. The NER naming module 15(a) segregates the different keywords related to the software/algorithm, such as function, library, algorithm, data type, etc., as named entities.

[0015] The control unit 10 then hypothesizes that code snippets 14(a) with similar entities (e.g., function, library, etc.) are similar to the input code snippet 16. For instance, if the code snippet 14(a) is “print(os.listdir(dname))”, then the NER naming module 15(a) segregates the code snippet 14(a) into multiple categories/entities such as class, function, library, and data structure. The same is performed for the input code snippet 16 received via the user interface 20. After completion of the above task, the retriever module 15 takes as input the query code snippet 16 and the code database 12 containing documented code, and extracts the list of similar code snippets 14(a), along with their comments, from the code database 12.
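To make the categorization of the example snippet concrete, the sketch below sorts its identifiers into the four categories named in the text. It is a rule-based stand-in built on Python's standard `ast` module, not a trained NER model, and it only handles Python input; the function name `extract_entities`, the category keys, and the heuristics (for example, treating the base name of an attribute call as a library) are assumptions.

```python
import ast
from typing import Dict, Set


def extract_entities(code: str) -> Dict[str, Set[str]]:
    """Sort identifiers of a Python snippet into coarse entity categories.

    Heuristics (assumptions): imported names and the base name of an
    attribute call count as libraries; called names count as functions;
    list/dict/set/tuple literals count as data structures.
    """
    entities: Dict[str, Set[str]] = {
        "library": set(), "class": set(), "data_structure": set(), "function": set(),
    }
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            entities["library"].update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            entities["library"].add(node.module)
        elif isinstance(node, ast.ClassDef):
            entities["class"].add(node.name)
        elif isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name):
                entities["function"].add(func.id)
            elif isinstance(func, ast.Attribute):
                entities["function"].add(func.attr)
                if isinstance(func.value, ast.Name):
                    entities["library"].add(func.value.id)
        elif isinstance(node, (ast.List, ast.Dict, ast.Set, ast.Tuple)):
            entities["data_structure"].add(type(node).__name__.lower())
    return entities


entities = extract_entities("print(os.listdir(dname))")
```

For the example snippet this yields `print` and `listdir` as functions and `os` as a library, with the class and data-structure categories left empty. The attribute heuristic would also label an ordinary object in `obj.method()` as a library, which is one reason a learned entity recognizer may be preferable.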

[0016] The ranking module 15(c) then ranks each of the generated similarity matrices, obtained by comparing the inputted code snippet 16 to each of the code snippets 14(a) of the code database 12. The ranking module 15(c) provides the similarity scores, and from this list the control unit 10 selects a set of the k most similar/relevant examples. The selected code snippets 14(a) of the code database 12, along with their summaries and the input code snippet 16, are fed as a prompt to the LLM 18, which generates the code summary for the input code snippet 16 by considering the similar code snippets 14(a) of the code database 12.

 [0017] It should be understood that embodiments explained in the description above are only illustrative and do not limit the scope of this invention. Many such embodiments and other modifications and changes in the embodiment explained in the description are envisaged. The scope of the invention is only limited by the scope of the claims.

 

Claims: We Claim:

1.	A control unit (10) for providing an efficient code summary, said control unit (10) comprising:

-	a code database (12) adapted to store multiple code snippets (14) and their corresponding code summaries;

-	a retriever module (15) adapted to provide related information upon comparing at least one stored code snippet (14(a)) and an inputted code snippet (16);

-	wherein said control unit (10) generates a code summary that is relevant to said inputted code snippet (16) by a large language model (LLM) (18), based on the comparison between the at least one stored code snippet (14(a)) and said inputted code snippet (16).

2.	The control unit (10) as claimed in claim 1, wherein said retriever module (15) comprises three sub-modules: a NER naming module (15(a)), a similarity score generation module (15(b)), and a ranking module (15(c)).

3.	The control unit (10) as claimed in claim 2, wherein said NER naming module (15(a)) segregates each of the words present in the code snippet (16/14) into multiple categories, wherein said multiple categories are a library, a class, a data structure, and a function.

4.	The control unit (10) as claimed in claim 3, wherein said NER naming module (15(a)) categorizes the words present in both the stored code snippet (14(a)) and the input code snippet (16).

5.	The control unit (10) as claimed in claim 2, wherein said similarity score generation module (15(b)) generates multiple similarity matrices and corresponding similarity scores upon comparing each of the categorized stored code snippets (14(a)) with the categorized input code snippet (16).

6.	The control unit (10) as claimed in claim 2, wherein said ranking module (15(c)) is adapted to rank each of the generated similarity matrices and their similarity scores after the comparison.

7.	The control unit (10) as claimed in claim 2, wherein the control unit (10) selects the relevant stored code snippets (14(a)) based on the ranking for generation of code summary of the input code snippet (16).

8.	The control unit (10) as claimed in claim 1, wherein the LLM (18) of the control unit (10) generates the code summary based on the input code snippet (16) and relevant stored code snippets (14(a)) received from the retriever module (15).

9.	A method for providing an efficient code summary by a control unit (10), said method comprising:

-	storing multiple code snippets (14) and their corresponding code summaries in a code database (12);

-	retrieving related information by a retriever module (15) upon comparing at least one stored code snippet (14(a)) and an inputted code snippet (16);

-	generating a code summary that is relevant to said inputted code snippet (16) by a large language model (LLM) (18), based on the comparison between the at least one stored code snippet (14(a)) and said inputted code snippet (16).

Documents

Application Documents

#	Name	Date
1	202441025792-POWER OF AUTHORITY [29-03-2024(online)].pdf	2024-03-29
2	202441025792-FORM 1 [29-03-2024(online)].pdf	2024-03-29
3	202441025792-DRAWINGS [29-03-2024(online)].pdf	2024-03-29
4	202441025792-DECLARATION OF INVENTORSHIP (FORM 5) [29-03-2024(online)].pdf	2024-03-29
5	202441025792-COMPLETE SPECIFICATION [29-03-2024(online)].pdf	2024-03-29
6	202441025792-Power of Attorney [29-07-2025(online)].pdf	2025-07-29
7	202441025792-Covering Letter [29-07-2025(online)].pdf	2025-07-29