Abstract: The invention provides a system for automated software modernization using hybrid semantic and structural graph reasoning. The system comprises an Ingestion Layer that collects artifacts from Git, Jira, and documentation platforms; an AST Extractor that performs language-agnostic structural analysis of source code; and a Cross-Artifact Mapper that establishes semantic relationships across development artifacts. These relationships are represented in a Knowledge Graph composed of typed nodes and edges, supporting recursive summarization and dynamic updates. A Theme-Driven Reasoning Engine evaluates the graph using symbolic rules and prompts aligned with modernization themes such as performance, security, and maintainability. Based on these insights, a Recommendation Generator suggests architectural improvements, including modularization and interface simplification. A Visualization Layer renders side-by-side architecture views to aid technical decision-making. The system enables structured, goal-driven software modernization aligned with organizational priorities, supporting continuous improvement throughout the software lifecycle with minimal manual effort.
Description:
FIELD OF INVENTION
The present invention relates to the field of software engineering and modernization. More particularly, it pertains to a system and method for automating modernization planning using knowledge graphs, LLMs, and multi-source artifact analysis.
BACKGROUND
In today’s rapidly evolving digital world, software systems need to keep up with changing technologies, business needs, and user expectations. Many organizations still rely on legacy systems, older software that may have served them well in the past but now struggles to meet modern requirements. These systems are often difficult to manage, expensive to maintain, and hard to integrate with newer tools or platforms. As a result, they slow down development, limit innovation, and expose businesses to risks like security vulnerabilities or system failures. Modernization is the process of upgrading these outdated systems so they can perform better, cost less to maintain, and support current technologies. Instead of replacing everything from scratch, modernization focuses on improving and adapting what already exists, making it more efficient, secure, and scalable. It allows companies to stay competitive, reduce long-term risks, and make the most of their existing software investments.
Organizations have tried various approaches to modernization over the years. Many still depend on manual processes, such as reviewing code line by line or preparing documentation by hand, which are time-consuming and prone to errors. Some rely on tools that work only with specific programming languages or apply rule-based automation scripts that offer limited adaptability. While a few advanced tools claim to automate parts of the process, they often operate within rigid, predefined templates and lack the flexibility to handle real-world development scenarios. These methods generally cannot keep pace with the fast-moving and ever-changing nature of modern software workflows. More importantly, they do not offer a holistic view of the system. They are unable to effectively connect related development elements such as source code, issue trackers, documentation, and architectural decisions, which are essential for meaningful and traceable modernization. As a result, such approaches frequently lead to costly, full-scale rewrites, increasing risk and complexity for the organization.
The present invention introduces an adaptable system designed to simplify the complex task of software modernization. By using knowledge graphs and large language models (LLMs), it brings together information from different sources, such as code repositories, issue trackers, and internal notes, to build a clear picture of the software’s structure and context. Unlike traditional tools that follow rigid formats, this system supports modular, theme-based updates, so development teams can focus on specific goals such as improving cloud efficiency, boosting performance, or enhancing security. The system provides context-aware recommendations and easy-to-understand visualizations, and works across various languages and technologies, making the modernization process quicker, safer, and better aligned with the needs of the organization.
PRIOR ARTS
US11593438B2 discloses systems and methods for generating theme-based folders by clustering digital media items in a semantic space. Although this invention applies semantic analysis to group content thematically, it focuses solely on organizing media files. It does not deal with software engineering artifacts or modernization strategies. Unlike our invention, it lacks capabilities for semantic and structural reasoning over software development components or for generating modernization plans.
CN104778056B discloses an autonomous type software upgrading method involving subject-based retrieval and version management of vehicle software components. While it introduces theme-based update mechanisms, it is limited to a domain-specific application and does not offer general-purpose, language-agnostic modernization. It also lacks visualization, knowledge graph construction, and modular theme-based reasoning which are central to our invention.
US11748411B2 describes cognitive session graphs using enriched data from multiple sources, including blockchain, to perform cognitive inference. While it uses graph-based reasoning, it does not construct hierarchical, typed knowledge graphs representing code, features, and documentation relationships. It also does not provide theme-driven modernization recommendations, which is a core feature of our system.
CN106294639B presents a semantic-based cross-language patent novelty prediction system using NLP and clustering techniques. Although it incorporates semantic reasoning and hierarchical clustering, it is specific to patent analysis and language translation. It does not include software modernization, multi-source artifact ingestion, or modular reasoning using fine-tuned LLMs as our invention does.
The present invention provides a system and method for automated, theme-driven software modernization using hybrid semantic and structural graph reasoning. It uses fine-tuned large language models (LLMs) and rule-based logic to analyze artifacts and identify modernization opportunities, reducing manual effort and enabling continuous, context-aware transformation.
DEFINITIONS
“Product Artifacts” refers to development inputs such as source code repositories, issue trackers, and internal documentation formats, including Markdown, PDF, and Confluence pages.
“Cross-Artifact Mapping” refers to the capability of relating code changes in Git directly to associated tickets and documentation references using LLM-powered semantic linking.
“Typed Property Knowledge Graph” refers to a multi-layer, recursive graph structure where nodes represent files, folders, functions, classes, Jira tickets, documentation snippets, and features, and edges define relationships such as calls, references, extends, defined in, related to ticket, implements feature, and is documented by.
“Language-Agnostic Abstract Syntax Tree (AST) Extractors” refers to tools such as Tree-sitter used by the system to uniformly extract structural elements like function definitions, imports, classes, and dependencies from source code written in different programming languages.
“Theme-Driven Reasoning Engine” refers to the component that applies symbolic rules, fine-tuned LLM prompts, and graph traversal logic to detect modernization opportunities tailored to themes like cloud cost optimization, performance, security, and maintainability.
“Hybrid Semantic and Structural Analysis” refers to the system’s analysis method that combines rule-based logic and fine-tuned LLMs to evaluate files and folders for architectural roles, logic complexity, coupling, cohesion, and anti-patterns.
“Modular Modernization” refers to the approach of incrementally modernizing software based on specific themes, allowing updates to targeted areas without requiring a full system rewrite.
“Visualization Layer” refers to the part of the system that generates visualizations of current and recommended architectures, including dependency maps and phase-wise modernization paths.
OBJECTS OF THE INVENTION
The primary object of the present invention is to provide a system and method for automated, theme-driven software modernization using hybrid semantic and structural graph reasoning.
Another object of the invention is to ingest product artifacts from multiple sources.
Yet another object of the invention is to extract metadata and perform cross-artifact mapping using language-agnostic techniques and LLM-powered semantic linking.
Another object of the invention is to construct a typed property knowledge graph representing relationships among various development entities.
An additional object of the invention is to perform semantic and structural analysis using a hybrid of rule-based logic and fine-tuned large language models.
A further object of the invention is to support modular, theme-based modernization strategies.
Another object of the invention is to generate context-aware modernization recommendations and architecture transformations.
Another object of the invention is to visualize current and target architectures and enable incremental modernization.
SUMMARY
Before the present invention is described, it is to be understood that the present invention is not limited to specific methodologies and materials described, as these may vary as per the person skilled in the art. It is also to be understood that the terminology used in the description is for the purpose of describing the particular embodiments only and is not intended to limit the scope of the present invention.
The present invention describes a system and method for automatically generating modernization plans from product artifacts such as source code repositories, issue trackers, and internal documentation. The system constructs a universal knowledge graph representing the relationships between files, folders, modules, features, and issues. Using fine-tuned LLMs in conjunction with rule-based reasoning, it analyzes functional logic, dependencies, architecture, and thematic constraints (e.g., cloud optimization, security). The system outputs visualizations and recommendations for incremental or full modernization, and provides before-and-after architecture maps.
BRIEF DESCRIPTION OF DRAWINGS
A complete understanding of the present invention may be had by reference to the following detailed description, which is to be taken in conjunction with the accompanying drawing. The accompanying drawing, which is incorporated into and constitutes a part of the specification, illustrates one or more embodiments of the present invention and, together with the detailed description, serves to explain the principles and implementations of the invention.
FIG. 1 illustrates the system architecture and workflow of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
Before the present invention is described, it is to be understood that this invention is not limited to methodologies described, as these may vary as per the person skilled in the art. It is also to be understood that the terminology used in the description is for the purpose of describing the particular embodiments only and is not intended to limit the scope of the present invention. Throughout this specification, the word “comprise”, or variations such as “comprises” or “comprising”, will be understood to imply the inclusion of a stated element, integer or step, or group of elements, integers or steps, but not the exclusion of any other element, integer or step, or group of elements, integers or steps. The use of the expression “at least” or “at least one” suggests the use of one or more elements or ingredients or quantities, as the use may be in the embodiment of the invention to achieve one or more of the desired objects or results. Various embodiments of the present invention are described below. It is, however, noted that the present invention is not limited to these embodiments, but rather the intention is that modifications that are apparent are also included.
The present invention relates to a system for automated, theme-driven software modernization using hybrid semantic and structural graph reasoning, where "automated" denotes operation with minimal human intervention. The system is designed to analyze functional logic, architecture, and thematic constraints from product artifacts such as source code repositories, issue trackers, and internal documentation formats. It provides recommendations and visualizations that assist organizations in modernizing their systems in a modular, efficient, and goal-oriented manner.
The system comprises the Ingestion Layer, the AST Extractor, the Cross-Artifact Mapper, the Knowledge Graph, the Theme-Driven Reasoning Engine, the Recommendation Generator, and the Visualization Layer, arranged sequentially to form the modernization pipeline. Each of these components performs a distinct role within the system and operates in an automated manner, with minimal human intervention, to enable continuous and context-aware modernization.
In an embodiment of the invention, the process begins with the Ingestion Layer, which collects inputs from various development and documentation sources. These sources include Git repositories, Jira issues, and internal documentation formats such as Markdown, PDF, and Confluence pages. The Ingestion Layer is responsible for normalizing these diverse inputs into a structured format, so that data collected from different systems can be processed uniformly in subsequent stages. By performing this collection and normalization automatically, with minimal human intervention, the Ingestion Layer provides a consistent input for the downstream components of the system.
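The normalization step described above can be illustrated with a minimal sketch. The `Artifact` record, its field names, and the shape of the raw input dictionaries are all assumptions made for illustration; the specification does not prescribe a schema.

```python
from dataclasses import dataclass, field


@dataclass
class Artifact:
    """Hypothetical normalized record shared by all downstream stages."""
    source: str        # "git", "jira", or "doc"
    kind: str          # e.g. "commit", "story", "md"
    identifier: str    # commit hash, ticket key, or document path
    text: str          # free text used later for semantic linking
    metadata: dict = field(default_factory=dict)


def normalize_commit(raw: dict) -> Artifact:
    # Assumes a raw commit dict with "sha" and "message" keys.
    return Artifact("git", "commit", raw["sha"], raw["message"].strip(),
                    {"author": raw.get("author"), "files": raw.get("files", [])})


def normalize_issue(raw: dict) -> Artifact:
    # Assumes a raw issue dict with "key" and "summary" keys.
    body = f'{raw["summary"]}\n{raw.get("description", "")}'.strip()
    return Artifact("jira", raw.get("type", "issue").lower(), raw["key"], body,
                    {"status": raw.get("status")})


def normalize_document(path: str, body: str) -> Artifact:
    # The file extension stands in for the document format.
    kind = path.rsplit(".", 1)[-1].lower() if "." in path else "page"
    return Artifact("doc", kind, path, body.strip())
```

Because every source is reduced to the same record type, later stages can iterate over one list of artifacts without source-specific branching.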
In another embodiment of the invention, once the inputs are collected and normalized by the Ingestion Layer, the AST Extractor performs a language-agnostic analysis of the source code. This component uses abstract syntax tree (AST) extractors such as Tree-sitter to handle source code written in different programming languages. The AST Extractor identifies and extracts key programming constructs, including function definitions, imports, classes, and dependencies. It also captures supporting details such as change history and file ownership. This enables the system to interpret the structural aspects of the codebase uniformly across different technologies. The output of the AST Extractor is passed to the next component, the Cross-Artifact Mapper, for semantic linking and analysis, so that processing continues automatically, with minimal human intervention.
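As an illustration of the kind of structural extraction described above, the following sketch uses Python's built-in `ast` module as a single-language stand-in for a multi-language extractor such as Tree-sitter. The function name `extract_constructs` and the output layout are assumptions, not part of the specification.

```python
import ast


def extract_constructs(source: str) -> dict:
    """Collect function, class, and import names from Python source text."""
    tree = ast.parse(source)
    found = {"functions": [], "classes": [], "imports": []}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            found["functions"].append(node.name)
        elif isinstance(node, ast.ClassDef):
            found["classes"].append(node.name)
        elif isinstance(node, ast.Import):
            found["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # Relative imports without a module name are recorded as ".".
            found["imports"].append(node.module or ".")
    # Sort so the result does not depend on traversal order.
    return {key: sorted(values) for key, values in found.items()}


SAMPLE = '''
import os
from json import loads

class Invoice:
    def total(self):
        return 0

def load(path):
    return loads(open(path).read())
'''

constructs = extract_constructs(SAMPLE)
```

A language-agnostic extractor would produce the same categories of output for each supported language, which is what allows the later stages to treat all code uniformly.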
In yet another embodiment of the invention, following the structural extraction by the AST Extractor, the Cross-Artifact Mapper performs semantic linking between the collected and processed development inputs. It uses large language models (LLMs) to relate code changes in Git to associated Jira tickets and documentation references. The Cross-Artifact Mapper ensures that all related information, whether it comes from code, issues, or documentation, is linked. This cross-referencing enables the system to establish a coherent narrative of what has been built, why it was built, and how it evolved. The output from the Cross-Artifact Mapper is then passed to the Knowledge Graph, where the relationships between entities are represented in a structured and recursive manner.
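The linking of commits to tickets can be sketched in simplified form. The example below substitutes a plain pattern match on ticket keys for the LLM-based semantic linking that the embodiment describes; the key pattern, the function name, and the input shapes are assumptions for illustration only.

```python
import re
from collections import defaultdict

# Assumed ticket key shape: uppercase project prefix, hyphen, number (e.g. PAY-7).
TICKET_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def link_commits_to_tickets(commits: dict, known_tickets: set) -> dict:
    """Map each known ticket key to the commit ids whose messages mention it."""
    links = defaultdict(list)
    for sha, message in commits.items():
        for key in TICKET_KEY.findall(message):
            # Filtering by known tickets avoids false matches such as "UTF-8".
            if key in known_tickets and sha not in links[key]:
                links[key].append(sha)
    return dict(links)


commits = {
    "a1": "PAY-7 fix rounding in invoice totals",
    "b2": "refactor billing, see PAY-7 and OPS-3",
    "c3": "switch encoding to UTF-8",
}
links = link_commits_to_tickets(commits, known_tickets={"PAY-7"})
```

A pattern match only finds explicit mentions. The semantic approach described above would also be expected to link a commit and a ticket that describe the same change in different words.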
In yet another embodiment of the invention, the relationships established by the Cross-Artifact Mapper are represented within the Knowledge Graph, which forms the core of the system. The Knowledge Graph is a typed property knowledge graph composed of nodes and edges. Nodes represent entities such as files, folders, classes, Jira tickets, documentation snippets, and features. Edges represent relationships such as calls, references, extends, defined in, related to ticket, implements feature, and is documented by. The graph supports recursive summarization and traversal queries, which are critical for downstream reasoning. As new data is ingested or existing data is modified, the graph is updated automatically, with minimal human intervention, so that it reflects the current state of the software project. The output of the Knowledge Graph is passed to the Theme-Driven Reasoning Engine, where symbolic rules and reasoning are applied to identify modernization opportunities.
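A typed property graph with a traversal query can be sketched as follows. The class `KnowledgeGraph`, its method names, and the snake_case edge labels are hypothetical; the sketch shows only in-memory storage and traversal, not recursive summarization or persistence.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Minimal in-memory typed property graph (illustrative only)."""

    def __init__(self):
        self.nodes = {}                  # node id -> {"type": ..., properties}
        self.edges = defaultdict(list)   # source id -> [(edge type, target id)]

    def add_node(self, node_id, node_type, **props):
        self.nodes[node_id] = {"type": node_type, **props}

    def add_edge(self, source, edge_type, target):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges[source].append((edge_type, target))

    def reachable(self, start, edge_types):
        """Return all nodes reachable from start via the given edge types."""
        seen, stack = set(), [start]
        while stack:
            current = stack.pop()
            for edge_type, target in self.edges.get(current, []):
                if edge_type in edge_types and target not in seen:
                    seen.add(target)
                    stack.append(target)
        return seen


graph = KnowledgeGraph()
graph.add_node("billing.py", "file")
graph.add_node("billing.total", "function")
graph.add_node("PAY-7", "ticket", status="done")
graph.add_node("invoicing", "feature")
graph.add_edge("billing.total", "defined_in", "billing.py")
graph.add_edge("billing.py", "related_to_ticket", "PAY-7")
graph.add_edge("PAY-7", "implements_feature", "invoicing")

context = graph.reachable(
    "billing.total", {"defined_in", "related_to_ticket", "implements_feature"})
```

Restricting a traversal to a set of edge types is what lets a theme-specific query follow only the relationships relevant to that theme.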
In yet another embodiment of the invention, with the Knowledge Graph in place, the Theme-Driven Reasoning Engine applies symbolic logic and LLM-based reasoning to identify modernization opportunities. Each modernization theme, such as cloud cost optimization, performance, security, or maintainability, has its own set of rules, prompts, and traversal paths. The Theme-Driven Reasoning Engine evaluates the graph using these mechanisms to detect tightly coupled modules, logic complexity, performance bottlenecks, insecure data flows, and architectural anti-patterns. This reasoning is performed automatically, with minimal human intervention, and aligns with the organizational goals expressed by the selected themes. The insights generated by the reasoning engine are then passed to the Recommendation Generator.
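The symbolic-rule part of this reasoning can be sketched as a table of per-theme predicates applied to module metrics. The rule labels, metric names, and thresholds below are invented for illustration, and the LLM-based reasoning is not modeled.

```python
# Hypothetical rule table: theme -> list of (finding label, predicate on metrics).
# The thresholds are arbitrary illustrative values, not taken from the specification.
THEME_RULES = {
    "maintainability": [
        ("tightly_coupled", lambda m: m["fan_out"] >= 5),
        ("high_complexity", lambda m: m["complexity"] >= 15),
    ],
    "security": [
        ("unvalidated_input",
         lambda m: m["reads_user_input"] and not m["validates_input"]),
    ],
}


def evaluate(theme: str, modules: dict) -> list:
    """Return (module, finding) pairs for every rule of the theme that fires."""
    findings = []
    for name, metrics in sorted(modules.items()):
        for label, predicate in THEME_RULES.get(theme, []):
            if predicate(metrics):
                findings.append((name, label))
    return findings


modules = {
    "orders": {"fan_out": 7, "complexity": 9,
               "reads_user_input": True, "validates_input": False},
    "utils": {"fan_out": 1, "complexity": 20,
              "reads_user_input": False, "validates_input": False},
}
maintainability_findings = evaluate("maintainability", modules)
security_findings = evaluate("security", modules)
```

Keeping the rules in a table keyed by theme means that selecting a different theme changes which checks run without changing the evaluation code.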
In an embodiment of the invention, the insights generated by the Theme-Driven Reasoning Engine are passed to the Recommendation Generator, which converts them into actionable modernization outputs. These may include suggestions for modularization, service extraction, refactoring, interface simplification, or restructuring of components. The Recommendation Generator operates automatically, with minimal human intervention, and produces context-aware recommendations tailored to the current structure of the system and the issues identified in it. The recommendations are then forwarded to the Visualization Layer for representation in visual form.
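One simple way to picture the conversion of findings into recommendations is a lookup from finding label to suggested action, grouped by module. The `PLAYBOOK` mapping and its wording are hypothetical, and a real generator would also draw on graph context.

```python
# Hypothetical mapping from finding labels to suggested actions.
PLAYBOOK = {
    "tightly_coupled": "Introduce a narrower interface; consider service extraction",
    "high_complexity": "Refactor into smaller functions",
    "unvalidated_input": "Add input validation at the module boundary",
}


def recommend(findings: list) -> dict:
    """Group suggested actions by module, preserving the order of findings."""
    grouped = {}
    for module, label in findings:
        action = PLAYBOOK.get(label, "Review manually")
        grouped.setdefault(module, []).append(action)
    return grouped


recommendations = recommend([
    ("orders", "tightly_coupled"),
    ("orders", "unvalidated_input"),
    ("utils", "unknown_label"),
])
```

Unrecognized findings fall back to a manual-review action rather than being dropped, so nothing detected upstream is silently lost.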
In an embodiment of the invention, the Visualization Layer provides visual output of both the existing and the recommended architectures. It renders architecture diagrams using tools such as Graphviz, Mermaid, and D3.js. The Visualization Layer presents module-level and component-level breakdowns, before-and-after comparisons, and change paths. These visualizations help teams understand the impact of recommended transformations and plan implementation phases. The visualizations are generated automatically, with minimal human intervention, and support filtering by theme or technical component. This final output gives teams the visual representation they need to execute the modernization process.
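A before-and-after comparison can be sketched by emitting Mermaid flowchart text for each architecture and computing the edges that differ. The function names are assumptions, and the sketch assumes module names are simple identifiers that are valid as Mermaid node ids.

```python
def to_mermaid(title: str, edges: set) -> str:
    """Render dependency edges as Mermaid flowchart text (left-to-right)."""
    lines = ["graph LR", f"    %% {title}"]
    for source, target in sorted(edges):
        lines.append(f"    {source} --> {target}")
    return "\n".join(lines)


def diff_edges(before: set, after: set) -> dict:
    """List the dependency edges removed and added by a recommendation."""
    return {
        "removed": sorted(before - after),
        "added": sorted(after - before),
    }


# Hypothetical example: route database access through the billing module.
current = {("web", "billing"), ("web", "db")}
recommended = {("web", "billing"), ("billing", "db")}

current_diagram = to_mermaid("current architecture", current)
recommended_diagram = to_mermaid("recommended architecture", recommended)
change_path = diff_edges(current, recommended)
```

The two diagram strings can be rendered side by side, while the edge difference gives a compact list of what a modernization phase would need to change.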
In a preferred embodiment of the invention, the various components of the system are configured to perform the automated theme-driven software modernization process comprising the steps of:
● Ingesting project artifacts from Git repositories, Jira issues, and internal documentation formats such as Markdown, PDF, and Confluence pages using the Ingestion Layer;
● Extracting structural constructs from source code using the AST Extractor;
● Mapping relationships between the artifacts using the Cross-Artifact Mapper;
● Constructing a knowledge graph to represent the relationships between artifacts using the Knowledge Graph;
● Identifying modernization opportunities using symbolic logic and LLM-based reasoning with the Theme-Driven Reasoning Engine;
● Generating actionable modernization outputs through the Recommendation Generator; and
● Rendering architecture visualizations using the Visualization Layer.
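The sequential arrangement of the steps above can be sketched as a list of stage functions applied in order to a shared state. The stage names, the state keys, and the placeholder outputs are hypothetical; each stub only records that it ran.

```python
def make_stage(name: str, output_key: str):
    """Build a placeholder stage that records its name and a dummy output."""
    def stage(state: dict) -> dict:
        new_state = dict(state)  # copy so earlier states are not mutated
        new_state[output_key] = f"<{name} output>"
        new_state["trace"] = state.get("trace", []) + [name]
        return new_state
    return stage


# Order mirrors the seven steps listed above.
PIPELINE = [
    make_stage("ingest", "artifacts"),
    make_stage("extract_ast", "constructs"),
    make_stage("map_artifacts", "links"),
    make_stage("build_graph", "graph"),
    make_stage("reason_by_theme", "findings"),
    make_stage("recommend", "recommendations"),
    make_stage("visualize", "diagrams"),
]


def run(pipeline: list, state: dict = None) -> dict:
    """Apply each stage to the output of the previous one."""
    state = dict(state or {})
    for stage in pipeline:
        state = stage(state)
    return state


result = run(PIPELINE, {"theme": "security"})
```

Passing one state dictionary through every stage keeps the selected theme and all intermediate outputs available to later stages.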
According to an embodiment of the present invention, the Visualization Layer ensures that the generated architecture diagrams and visualizations are accurate, context-aware, and suitable for ongoing use across the modernization lifecycle.
An embodiment of the invention discloses various advantages of the automated theme-driven software modernization system of the present invention. The significant advantages are as follows:
• The present invention provides a scalable and intelligent software modernization solution that reduces manual intervention and minimizes errors by utilizing hybrid semantic and structural graph reasoning and large language models to automate modernization tasks.
• The system enables continuous and context-aware modernization by analyzing functional logic, architecture, and thematic constraints from project artifacts, ensuring actionable recommendations aligned with modernization goals.
• The system ensures organized data flow, traceability, and structured representation through a dynamic knowledge graph, which maps relationships among source code, Jira issues, documentation, and features.
While considerable emphasis has been placed herein on the specific elements of the preferred embodiment, it will be appreciated that many alterations and modifications can be made in the preferred embodiment without departing from the principles of the invention. These and other changes in the preferred embodiments of the invention will be apparent to those skilled in the art from the disclosure herein, whereby it is to be distinctly understood that the foregoing descriptive matter is to be interpreted merely as illustrative of the invention and not as a limitation.
Claims:
We claim,
1. A system for automated theme-driven software modernization using hybrid semantic and structural graph reasoning, wherein the system comprises an ingestion layer, an AST extractor, a cross-artifact mapper, a knowledge graph, a theme-driven reasoning engine, a recommendation generator, and a visualization layer, arranged functionally and sequentially to enable the automated modernization of software systems through context-aware recommendations and visualizations;
characterized in that:
the ingestion layer collects project artifacts, including commits, branches, pull requests, and diff summaries from version control systems such as Git; epics, stories, and sub-tasks from project management tools; and documentation in formats such as Markdown, PDF, and Confluence pages, and normalizes the data for uniform processing across the system;
the AST extractor performs language-agnostic structural analysis on the source code artifacts obtained from the ingestion layer, identifying key constructs such as function definitions, imports, classes, and dependencies, and ensures accurate identification and extraction of structural information to create a detailed understanding of the source code, independent of programming language;
the cross-artifact mapper establishes relationships between the artifacts ingested in the previous steps by mapping code changes from Git to corresponding Jira issues and documentation, thereby ensuring traceability and enabling semantic relationships to be created between different parts of the project;
the knowledge graph stores and organizes these relationships in a contextual representation, forming a dynamic and continuously updated graph that allows recursive summarization and traversal queries to explore how artifacts such as commits, features, documents, and tickets are interrelated, supporting the evolving state of the project;
the theme-driven reasoning engine uses symbolic rules and prompts related to the selected modernization themes to analyze the knowledge graph, evaluating the graph to detect tightly coupled modules, complexity, performance bottlenecks, insecure data flows, and architectural anti-patterns, thereby identifying areas for software modernization;
the recommendation generator produces actionable outputs, including recommendations for architectural restructuring such as modularization, service extraction, interface simplification, and other relevant modernization suggestions based on the insights derived from the reasoning engine, wherein these recommendations align with the organization's selected themes; and
the visualization layer generates and presents visual outputs of the current and recommended architectures, supporting visual comparisons such as before-and-after views of the software architecture, module-level and component-level breakdowns, and change paths, wherein users can filter these visualizations by theme or technical component to plan and execute the modernization process.
2. The system as claimed in claim 1, wherein the Ingestion Layer collects development artifacts from tools such as Git (commits, branches, pull requests), Jira (epics, stories, sub-tasks), and documentation repositories (Markdown, PDF, Confluence, Google Docs).
3. The system as claimed in claim 1, wherein the AST Extractor performs analysis across multiple programming languages to extract structural constructs such as functions, classes, dependencies, and imports, enabling language-agnostic code analysis.
4. The system as claimed in claim 1, wherein the Cross-Artifact Mapper establishes relationships between related artifacts, including linking code changes from Git to Jira issues and corresponding documentation, thus forming a cohesive representation of the software system's evolution.
5. The system as claimed in claim 1, wherein the Knowledge Graph continuously updates to reflect the current state of the project, enabling real-time traversal queries to maintain an up-to-date, contextual representation of the system.
6. The system as claimed in claim 1, wherein the Theme-Driven Reasoning Engine identifies modernization opportunities using predefined rules and prompts aligned with selected themes, such as optimizing performance, enhancing security, or improving maintainability.
7. The system as claimed in claim 1, wherein the Recommendation Generator provides modularization, service extraction, and interface simplification recommendations based on the insights from the Theme-Driven Reasoning Engine.
8. The system as claimed in claim 1, wherein the Visualization Layer enables users to view side-by-side visualizations of the existing and recommended architectures, providing clear insights into the impact of modernization and helping with the planning of implementation phases.
9. A method for automated theme-driven software modernization using the system as claimed in claim 1, the method comprising the steps of:
a. ingesting project artifacts and internal document repositories using the ingestion layer;
b. extracting structural constructs from source code using the AST extractor;
c. mapping relationships between artifacts using the cross-artifact mapper;
d. constructing a knowledge graph representing these relationships using the knowledge graph;
e. identifying modernization opportunities using the theme-driven reasoning engine;
f. generating actionable recommendations using the recommendation generator; and
g. rendering visual outputs of the existing and recommended architectures using the visualization layer.
10. The system as claimed in claim 1, wherein the system accelerates the modernization process, maintains continuity in modernization efforts, and ensures alignment with organizational goals, thereby enabling effective and efficient transformation of software architecture and functionality.
| # | Name | Date |
|---|---|---|
| 1 | 202521036850-STATEMENT OF UNDERTAKING (FORM 3) [16-04-2025(online)].pdf | 2025-04-16 |
| 2 | 202521036850-POWER OF AUTHORITY [16-04-2025(online)].pdf | 2025-04-16 |
| 3 | 202521036850-FORM 1 [16-04-2025(online)].pdf | 2025-04-16 |
| 4 | 202521036850-FIGURE OF ABSTRACT [16-04-2025(online)].pdf | 2025-04-16 |
| 5 | 202521036850-DRAWINGS [16-04-2025(online)].pdf | 2025-04-16 |
| 6 | 202521036850-DECLARATION OF INVENTORSHIP (FORM 5) [16-04-2025(online)].pdf | 2025-04-16 |
| 7 | 202521036850-COMPLETE SPECIFICATION [16-04-2025(online)].pdf | 2025-04-16 |
| 8 | 202521036850-FORM-9 [26-09-2025(online)].pdf | 2025-09-26 |
| 9 | 202521036850-FORM 18 [01-10-2025(online)].pdf | 2025-10-01 |
| 10 | Abstract.jpg | 2025-10-08 |