A System And Method For Analyzing Large Repositories And Generating Actionable Insights

Abstract: A system and method for analyzing large repositories and generating actionable insights by combining Large Language Models (LLMs) and Graph Neural Networks (GNNs) to generate repository-level intelligence; wherein the system comprises a parsing module (110), a summarization module (120), an embedding module (130), a graph construction module (140), an insights module (150), a visualization module (160), a dynamic update module (170), an anomaly detection module (180), a clustering module (190), an impact analysis module (200), a report generation module (210) and a system integration module (220); and employs a method thereof. The system constructs a graph representation of a repository, with nodes and edges; LLMs produce semantic summaries of code entities, which are transformed into embeddings that enrich the graph nodes; a GNN processes the enriched graph to extract insights; the graph is dynamically updated as the repository changes, and a visualization engine renders the insights, allowing users to interpret and act upon the findings.

Patent Information

Application #

Filing Date

26 December 2024

Publication Number

40/2025

Publication Type

INA

Invention Field

COMPUTER SCIENCE

Status

Parent Application

Applicants

Persistent Systems

Bhageerath, 402, Senapati Bapat Rd, Shivaji Cooperative Housing Society, Gokhale Nagar, Pune - 411016, Maharashtra, India.

Inventors

1. Mr. Nitish Shrivastava

10764 Farallone Dr, Cupertino, CA 95014-4453, United States.

Specification

Description:
FIELD OF INVENTION
The present invention relates to the fields of system-based application engineering and artificial intelligence. More particularly, it relates to a system and method for analyzing large repositories and generating actionable insights for large system-based application codebases using a combination of Large Language Models (LLMs) and Graph Neural Networks (GNNs).

BACKGROUND
Modern system-based application repositories are centralized platforms where source code, documentation, and other resources related to system-based application projects are stored and managed. They are essential tools in the system-based application development lifecycle, supporting version control, collaboration, and the integration of code changes across multiple contributors. These repositories host interdependent files, including code, libraries, configurations, and test cases, that interact to build and maintain system-based applications. The repositories are integral to practices like continuous integration and continuous deployment, allowing teams to collaborate in real time and streamline development processes.
As system-based application projects scale, the repositories grow increasingly complex. Developers and teams rely on these repositories to manage numerous interdependencies between components, which can evolve and change rapidly. Despite advances in repository management tools, existing analysis methods primarily focus on static evaluations or isolated metrics, such as code quality scores or basic dependency checks, without providing deeper insights into how different code elements interact or how changes might affect the overall system-based application ecosystem.
As a result, the need for a more robust and dynamic framework to interpret the relationships within a repository, as well as the semantic meaning of the code, has become increasingly apparent. Such a framework would empower developers to make informed decisions regarding code quality, interdependencies, and impact analysis, significantly improving the system-based application development process.

PRIOR ART
US patent document US10831372B2 describes a method and system for automated repository monitoring and management. This system is designed to monitor repository storage usage and implement actions when repositories exceed predetermined storage limits. It identifies and tracks repositories for excessive storage consumption, automatically modifying repository states to safeguard against issues and notifying responsible parties. While this approach addresses the issue of resource consumption within repositories, it does not offer an understanding of the complex relationships between code components, nor does it provide any contextual or semantic insights into the code itself.
US patent document US2020278842A1 presents systems and methods for mining system-based application repositories using bots. This approach aims to automate the process of extracting useful information from system-based application repositories by deploying bots that assist developers in answering common questions related to system-based application development and maintenance. Although it improves the efficiency of data retrieval from repositories, this invention does not account for the underlying relationships between code components or provide a deeper, semantic analysis of the repository. The bot-based approach focuses on information extraction without addressing the need for understanding the dynamic interactions and dependencies within a repository.
None of the prior art references above address the challenge of understanding the full context of repository data, the intricate relationships between various components, or the semantic meaning of code. Current systems provide limited insights into code dependencies, quality, and the potential impact of changes. An advanced solution is therefore necessary to address these shortcomings by integrating Large Language Models (LLMs) for semantic code understanding and Graph Neural Networks (GNNs) for contextual relationship analysis.
The present invention thus provides a system that enables scanning a repository (of code) to generate full details about architecture, framework, design patterns, etc. and use them to create a summary of the repository that can be used for advanced actions like defect fix, generate documentation, reverse engineering and the like.

OBJECTS OF THE INVENTION
The primary object of the present invention is to provide a system and method for analyzing large repositories and generating actionable insights.
Another object of the present invention is to provide a system and method that analyses large application repositories, enabling the generation of actionable insights by integrating Large Language Models (LLMs) and Graph Neural Networks (GNNs).
Yet another object of the present invention is to construct a graph representation of the repository.
Yet another object of the present invention is to use LLMs to generate semantic summaries for each code entity and convert these summaries into embeddings to enrich graph nodes.
Yet another object of the present invention is to enable the Graph Neural Network (GNN) to analyze the enriched graph to extract insights.
Yet another object of the present invention is to support real-time updates in response to changes within the repository.
Yet another object of the present invention is to offer a visualization engine that renders the graph with actionable insights, allowing users to interpret and act upon the generated data effectively.

SUMMARY OF THE INVENTION
Before the present invention is described, it is to be understood that present invention is not limited to particular methodologies and materials described, as these may vary as per the person skilled in the art. It is also to be understood that the terminology used in the description is for the purpose of describing the particular embodiments only, and is not intended to limit the scope of the present invention.
This invention provides a system and method for analyzing large repositories and generating actionable insights for large system-based application codebases using a combination of Large Language Models (LLMs) and Graph Neural Networks (GNNs); wherein the system constructs a graph representation of the repository, where nodes represent code entities (files, classes, functions) and edges represent contextual relationships (imports, calls, co-modifications). LLMs are used to generate semantic summaries for each code entity, which are converted into embeddings to enrich graph nodes. GNNs analyze the enriched graph to extract insights, such as refactoring opportunities, impact predictions, and clustering of related files. The system dynamically updates with repository changes and supports visualization for actionable insights.

BRIEF DESCRIPTION OF THE DRAWINGS
A complete understanding of the present invention may be had by reference to the following detailed description, which is to be taken in conjunction with the accompanying drawing. The accompanying drawing, which is incorporated into and constitutes a part of the specification, illustrates one or more embodiments of the present invention and, together with the detailed description, serves to explain the principles and implementations of the invention.
Fig. 1. illustrates an overview of a system and method for analyzing large repositories and generating actionable insights.

DETAILED DESCRIPTION OF THE DRAWINGS
Before the present invention is described, it is to be understood that this invention is not limited to particular methodologies described, as these may vary as per the person skilled in the art. It is also to be understood that the terminology used in the description is for the purpose of describing the particular embodiments only, and is not intended to limit the scope of the present invention. Throughout this specification, the word “comprise”, or variations such as “comprises” or “comprising”, will be understood to imply the inclusion of a stated element, integer or step, or group of elements, integers or steps, but not the exclusion of any other element, integer or step, or group of elements, integers or steps. The use of the expression “at least” or “at least one” suggests the use of one or more elements or ingredients or quantities, as the use may be in the embodiment of the invention to achieve one or more of the desired objects or results. Various embodiments of the present invention are described below. It is, however, noted that the present invention is not limited to these embodiments, but rather the intention is that modifications that are apparent are also included.
The present invention discloses a system (100) and method for analyzing large repositories and generating actionable insights for large system-based application codebases using a combination of Large Language Models (LLMs) and Graph Neural Networks (GNNs), utilizing advanced parsing, graph construction, and machine learning techniques. The system (100) comprises various modules including a parsing module (110), a summarization module (120), an embedding module (130), a graph construction module (140), an insights module (150), a visualization module (160), a dynamic update module (170), an anomaly detection module (180), a clustering module (190), an impact analysis module (200), a report generation module (210) and a system integration module (220), enabling efficient repository analysis and insights. The components work in synergy to extract metadata, generate embeddings, build contextual graphs, and provide actionable insights based on repository data.
According to one embodiment, the parsing module (110) is configured to extract files, classes, and functions from the system-based application repository, breaking down the repository into its fundamental components for analysis. The summarization module (120) employs a large language model (LLM) to generate concise and meaningful summaries for each extracted code entity. The embedding module (130) converts these summaries into semantic embeddings, creating numerical representations of contextual meaning to enable computational processing.
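As an illustration of the parsing step only, the following minimal sketch extracts classes, functions, and imports from a single Python source file using the standard-library `ast` module. The specification does not name a parser or a target language, so the choice of Python and the names `CodeEntity` and `parse_source` are assumptions made for this example. The summarization and embedding steps depend on an external LLM and are not shown.

```python
import ast
from dataclasses import dataclass, field


@dataclass
class CodeEntity:
    """Hypothetical record for one extracted code entity."""
    kind: str   # "file", "class", or "function"
    name: str
    path: str
    imports: list = field(default_factory=list)  # populated for files only


def parse_source(path: str, source: str) -> list:
    """Return the file entity followed by the classes and functions it defines."""
    tree = ast.parse(source)
    file_entity = CodeEntity(kind="file", name=path, path=path)
    entities = [file_entity]
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # "import os, sys" yields one alias per imported module
            file_entity.imports.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            file_entity.imports.append(node.module)
        elif isinstance(node, ast.ClassDef):
            entities.append(CodeEntity("class", node.name, path))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            entities.append(CodeEntity("function", node.name, path))
    return entities
```

Each returned entity could then be passed to the summarization module; the recorded imports are one source of the edges used later in graph construction.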
According to another embodiment, the graph construction module (140) organizes the parsed repository data into a graph representation, where nodes represent files or functions, and edges depict relationships such as imports, calls, and commit histories. The relationships extracted during graph construction are enriched with contextual metadata, such as interaction patterns, to enhance the accuracy of the graph.
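One possible in-memory layout for such a graph is sketched below using plain dictionaries. The class `RepoGraph`, the function `add_co_modifications`, the edge-type labels, and the `count` metadata field are hypothetical; the specification does not prescribe a data structure. Co-modification edges are derived here from an assumed input listing the files touched by each commit.

```python
from collections import defaultdict
from itertools import combinations


class RepoGraph:
    """Hypothetical repository graph with typed, annotated edges."""

    def __init__(self):
        self.nodes = {}                 # node id -> attribute dict
        self.edges = defaultdict(dict)  # (src, dst) -> {edge type: metadata}

    def add_node(self, node_id, **attrs):
        self.nodes.setdefault(node_id, {}).update(attrs)

    def add_edge(self, src, dst, kind, **metadata):
        # Endpoints are registered automatically so no edge dangles.
        self.add_node(src)
        self.add_node(dst)
        self.edges[(src, dst)][kind] = metadata

    def neighbors(self, node_id):
        return sorted({dst for (src, dst) in self.edges if src == node_id})


def add_co_modifications(graph, commits):
    """Add a 'co_modified' edge for every pair of files changed together.

    `commits` is an assumed input: an iterable of lists of file paths,
    one list per commit. The edge metadata counts the shared commits.
    """
    counts = defaultdict(int)
    for files in commits:
        for a, b in combinations(sorted(set(files)), 2):
            counts[(a, b)] += 1
    for (a, b), n in counts.items():
        graph.add_edge(a, b, "co_modified", count=n)
        graph.add_edge(b, a, "co_modified", count=n)
```

Keeping the edge type as a key lets one pair of nodes carry several relationships at once, for example an import and a co-modification count.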
According to another embodiment, the constructed graph is analyzed by a graph neural network (GNN) within the insights module (150), which evaluates the relationships, identifies patterns, detects anomalies, and clusters related files or functions. The insights module (150) then uses the results of the GNN analysis to provide actionable recommendations, including refactoring suggestions, dependency management, and impact analysis of proposed changes.
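The specification does not disclose a GNN architecture. The sketch below shows only the neighborhood-aggregation step that GNN layers build on: one round of mean-neighbor message passing over node vectors, with no learned weights and no nonlinearity. The function names and the `self_weight` parameter are assumptions for illustration, and the node vectors stand in for the summary embeddings.

```python
def message_passing_round(features, adjacency, self_weight=0.5):
    """Blend each node's vector with the mean of its neighbors' vectors.

    features:  dict node id -> list of floats (e.g. summary embeddings)
    adjacency: dict node id -> list of neighbor ids
    A trained GNN layer would additionally apply learned weight matrices
    and a nonlinearity; both are omitted in this sketch.
    """
    updated = {}
    for node, vec in features.items():
        neighbors = [n for n in adjacency.get(node, []) if n in features]
        if not neighbors:
            updated[node] = list(vec)  # isolated nodes keep their vector
            continue
        mean = [sum(features[n][i] for n in neighbors) / len(neighbors)
                for i in range(len(vec))]
        updated[node] = [self_weight * v + (1 - self_weight) * m
                         for v, m in zip(vec, mean)]
    return updated


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = (sum(a * a for a in u) ** 0.5) * (sum(b * b for b in v) ** 0.5)
    return dot / norm if norm else 0.0
```

After aggregation, connected nodes have more similar vectors than before, which is why similarity measures such as `cosine` over the updated vectors can support clustering and outlier detection.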
According to yet another embodiment, the visualization module (160) allows developers to view the graph representation and insights in an interactive format, aiding in decision-making and repository optimization. Additionally, the dynamic update module (170) ensures that any changes made to the repository are integrated into the graph representation in real time, keeping the system accurate and up to date.
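The specification does not state how repository changes are detected. One simple approach, assumed here purely for illustration, is to compare content hashes between two snapshots so that only added, modified, or removed files need to be re-parsed and re-embedded. The function names `snapshot` and `diff_snapshots` are hypothetical.

```python
import hashlib


def snapshot(files):
    """Map each path to the SHA-256 digest of its content.

    `files` is an assumed input: dict of path -> source text.
    """
    return {path: hashlib.sha256(text.encode("utf-8")).hexdigest()
            for path, text in files.items()}


def diff_snapshots(old, new):
    """Return (added, modified, removed) paths between two snapshots."""
    added = sorted(p for p in new if p not in old)
    removed = sorted(p for p in old if p not in new)
    modified = sorted(p for p in new if p in old and new[p] != old[p])
    return added, modified, removed
```

Nodes for removed paths can be dropped from the graph, while added and modified paths go back through parsing, summarization, and embedding. In practice the same three lists could also be obtained from version-control metadata.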
The system (100) is designed to adapt dynamically to evolving repositories, leveraging advanced machine learning and graph-based methodologies to address the limitations of traditional repository analysis methods, ensuring optimal resource utilization and maintenance efficiency.
According to another preferred embodiment, the system (100) employs a method for analyzing large repositories and generating actionable insights using a combination of Large Language Models (LLMs) and Graph Neural Networks (GNNs). The method comprises the steps as follows:
1. Parsing Module (110):
The parsing module (110) extracts files, classes, and functions from the repository. This module is responsible for analyzing the structure and content of the repository to break it down into meaningful components for further processing.
2. Summarization Module (120):
The summarization module (120) leverages a large language model (LLM) to generate file summaries. These summaries encapsulate the functionality and purpose of each file in the repository, providing a comprehensive overview.
3. Embedding Module (130):
The embedding module (130) converts the generated summaries into semantic embeddings. These embeddings numerically represent the contextual meaning of files, facilitating computational processing.
4. Graph Construction Module (140):
The graph construction module (140) creates a graph-based representation of the repository. Nodes represent files or functions, and edges represent relationships like imports, calls, or commit histories. This graph maps the structural and contextual interdependencies within the repository.
5. Insights Module (150):
The insights module (150) uses graph neural networks (GNNs) to analyze the repository graph. This analysis identifies patterns, detects anomalies, and clusters related files. Insights such as recommendations for refactoring and dependency management are generated.
6. Visualization Module (160):
The visualization module (160) provides an interactive interface to visualize the repository graph and analysis results. This enables developers to interpret insights and make data-driven decisions effectively.
7. Dynamic Update Module (170):
The dynamic update module (170) integrates repository changes into the system (100) to ensure the graph and insights remain accurate and up to date in real time. This adaptability ensures ongoing relevance in evolving repositories.
8. Anomaly Detection Module (180):
The anomaly detection module (180) leverages GNNs and metadata to identify unusual patterns or outliers within the repository graph. This module highlights critical areas needing review or optimization.
9. Clustering Module (190):
The clustering module (190) groups related files, classes, or functions based on contextual similarity, facilitating modularization and code management.
10. Impact Analysis Module (200):
The impact analysis module (200) evaluates the effect of proposed changes on the repository. It provides insights into dependencies and potential conflicts to minimize disruption.
11. Report Generation Module (210):
The report generation module (210) compiles actionable insights, analysis results, and visualizations into comprehensive reports for stakeholders.
12. System Integration Module (220):
The system integration module (220) ensures compatibility with existing development tools and workflows, enabling seamless adoption of the repository intelligence system (100).
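Of the modules listed above, impact analysis lends itself to a short illustration. The sketch below is a plain graph-traversal baseline, not the GNN-based prediction described in the specification: it reverses the dependency edges and walks breadth-first from a changed entity to find everything that depends on it, directly or transitively. The function name `impacted_by` and the `max_hops` parameter are assumptions.

```python
from collections import defaultdict, deque


def impacted_by(change, depends_on, max_hops=None):
    """Return {entity: hop distance} for everything that transitively
    depends on `change`.

    depends_on: dict entity -> iterable of entities it imports or calls.
    """
    dependents = defaultdict(set)  # reversed dependency edges
    for src, targets in depends_on.items():
        for dst in targets:
            dependents[dst].add(src)

    distance = {}
    queue = deque([(change, 0)])
    while queue:
        node, hops = queue.popleft()
        if max_hops is not None and hops >= max_hops:
            continue  # do not expand beyond the requested radius
        for parent in sorted(dependents[node]):
            if parent not in distance and parent != change:
                distance[parent] = hops + 1
                queue.append((parent, hops + 1))
    return distance
```

The hop distance gives a rough ordering of review priority: entities one hop away call or import the changed entity directly, while more distant ones are affected only through intermediaries.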
While considerable emphasis has been placed herein on the specific elements of the preferred embodiment, it will be appreciated that many alterations can be made and that many modifications can be made in preferred embodiment without departing from the principles of the invention. These and other changes in the preferred embodiments of the invention will be apparent to those skilled in the art from the disclosure herein, whereby it is to be distinctly understood that the foregoing descriptive matter is to be interpreted merely as illustrative of the invention and not as a limitation.
CLAIMS:
We Claim:
1. A system (100) and method for analyzing large repositories and generating actionable insights for large system-based application codebases using a combination of Large Language Models (LLMs) and Graph Neural Networks (GNNs);
wherein the system (100) comprises components including a parsing module (110), a summarization module (120), an embedding module (130), a graph construction module (140), an insights module (150), a visualization module (160), a dynamic update module (170), an anomaly detection module (180), a clustering module (190), an impact analysis module (200), a report generation module (210) and a system integration module (220);
characterised in that:
the parsing module (110) is configured to extract structural information from a system-based application repository, including files, functions, and classes;
the summarization module (120) utilizes a large language model to generate semantic summaries for the extracted code entities and convert them into embeddings;
the embedding module (130) converts the generated summaries into semantic embeddings; the graph construction module (140) is configured to create a graph-based representation of the repository;
the insights module (150) uses a graph neural network (GNN) analyzer configured to process the constructed graph by identifying unusual patterns and structures in the repository graph for anomaly detection (180), code clustering (190), and impact analysis and prediction (200);
and the visualization module (160) is configured to generate actionable insights, including refactoring recommendations, and to visualize the graph with annotated nodes and edges.

2. The system as claimed in claim 1, wherein the system dynamically integrates repository changes into the graph and adapts its insights based on updated metadata and real-time repository conditions.

3. The system as claimed in claim 1, wherein the embedding module (130) utilizes a large language model to generate contextual embeddings for code entities to facilitate semantic similarity analysis.

4. The system as claimed in claim 1, wherein the graph construction module (140) establishes relationships such as import dependencies, function calls, and co-modification patterns between the code entities to enrich the repository graph; wherein the nodes represent code entities, edges represent relationships, and metadata enriches the relationships.

5. The system as claimed in claim 1, wherein the insights module (150) and the visualization module (160) cluster related code entities, provide impact analysis for proposed changes, and identify areas for potential refactoring.

6. The method as claimed in claim 1, comprising the steps of:
- extracting structural information from a system-based application repository, including files, classes, and functions, using a parsing module;
- generating semantic summaries for the extracted code entities using a large language model and converting the summaries into embeddings;
- constructing a graph representation of the repository, wherein nodes represent code entities, edges represent relationships, and metadata enriches the edges;
- analyzing the repository graph using a graph neural network to identify anomalies, perform clustering, and predict the impact of code changes;
- generating actionable insights and visualizations of the graph for developers to interpret and optimize repository quality; and
- dynamically updating the repository graph with changes made to the repository to ensure ongoing relevance of the insights.

7. The method as claimed in claim 6, wherein metadata for relationships includes input/output schemas, co-modification data, and interaction patterns.
8. The method as claimed in claim 6, wherein the actionable insights include recommendations for dependency management, code refactoring, and impact minimization of proposed changes.

Dated this 26th day of December, 2024.

Documents

Application Documents

#	Name	Date
1	202421103238-STATEMENT OF UNDERTAKING (FORM 3) [26-12-2024(online)].pdf	2024-12-26
2	202421103238-POWER OF AUTHORITY [26-12-2024(online)].pdf	2024-12-26
3	202421103238-FORM 1 [26-12-2024(online)].pdf	2024-12-26
4	202421103238-FIGURE OF ABSTRACT [26-12-2024(online)].pdf	2024-12-26
5	202421103238-DRAWINGS [26-12-2024(online)].pdf	2024-12-26
6	202421103238-DECLARATION OF INVENTORSHIP (FORM 5) [26-12-2024(online)].pdf	2024-12-26
7	202421103238-COMPLETE SPECIFICATION [26-12-2024(online)].pdf	2024-12-26
8	Abstract1.jpg	2025-02-12
9	202421103238-POA [22-02-2025(online)].pdf	2025-02-22
10	202421103238-MARKED COPIES OF AMENDEMENTS [22-02-2025(online)].pdf	2025-02-22
11	202421103238-FORM 13 [22-02-2025(online)].pdf	2025-02-22
12	202421103238-AMMENDED DOCUMENTS [22-02-2025(online)].pdf	2025-02-22
13	202421103238-FORM-9 [25-09-2025(online)].pdf	2025-09-25
14	202421103238-FORM 18 [01-10-2025(online)].pdf	2025-10-01