

System And Method For Transforming Repository Data To Model Context Protocol Compliant APIs And Tools

Abstract: The present invention describes a system and method that transforms software repository data into Model Context Protocol (MCP) compliant APIs and tools using an intent-driven universal transformation algorithm. The system ingests, interprets, and transforms repository data into MCP-compliant formats using a Universal Intent-Aware Transformation Algorithm (UITA). The invention leverages semantic inference, universal schema mapping, and combinatorial indexing to generate APIs and tools consumable by AI systems and modern applications. The system comprises a Repository Listener Module that extracts real-time repository events; an Intent Inference Engine module powered by a hybrid rule-based and LLM-driven classifier; a Universal MCP Mapper module that encodes data into flexible, composable MCP entities; a Combination Generator module that enables intent-causal reverse lookups; and an API Generator & Tool Exporter module that exposes the data for any downstream use.


Patent Information

Application #

Filing Date

07 July 2025

Publication Number

40/2025

Publication Type

INA

Invention Field

COMPUTER SCIENCE

Status

Parent Application

Applicants

Persistent Systems

Bhageerath, 402, Senapati Bapat Rd, Shivaji Cooperative Housing Society, Gokhale Nagar, Pune - 411016, Maharashtra, India

Inventors

1. Mr. Nitish Shrivastava

10764 Farallone Dr, Cupertino, CA 95014-4453, United States

2. Mr. Pradeep Sharma

20200 Lucille Ave Apt 62 Cupertino CA 95014, United States

3. Ms. Anusha Srivastava

D601, Rustomjee Elita, D.N. Nagar, Andheri West, Mumbai, Maharashtra, 400053, India

Specification

Description:FIELD OF THE INVENTION
The present invention relates to the field of software engineering. More specifically, it concerns transforming software repository data into Model Context Protocol (MCP) compliant Application Programming Interfaces (APIs) and tools using an intent-aware, universal transformation algorithm.
BACKGROUND OF THE INVENTION
Software repositories like GitHub have become essential to how modern software is built and maintained. They store a wide range of information, including commits, pull requests, issues, branches, and tags, that reflects how a project evolves over time. This data is critical for understanding the structure and progress of software development. However, even though repositories contain rich information, making meaningful use of it across tools and workflows remains difficult. Much of the data exists in different formats, and there is no consistent way to represent it in a standard, structured form that captures both the content and the context of development activities.
Some existing tools attempt to make use of repository data through log parsers or predefined keyword-based tagging systems. These can be useful for generating reports or extracting specific metrics, but they often capture only what was done and not the underlying intent. Such systems usually depend on fixed workflows and therefore lack flexibility when dealing with ambiguous or complex repository activity. They do not provide a general-purpose or scalable way to interpret diverse repository events, nor do they enable seamless integration with downstream tools or platforms.
The present invention introduces a system and method for transforming software repository data into Model Context Protocol (MCP)-compliant formats with the help of an intent-driven, universal transformation algorithm. The system continuously listens to repository activity and uses a hybrid model engine to infer the developer’s intent, such as whether a commit is a bug fix, a refactor, or a documentation update. The inferred intent, along with relevant associated metadata, is then processed by a Universal MCP Mapper that converts the information into a standardized structure of entities, actions, and context. A Combination Generator then creates multiple retrieval paths for reverse lookups and traceability. Finally, the data is exposed through Application Programming Interfaces (APIs) and exportable tools.
Prior Art:
For instance, US20200118014A1 discloses adaptable systems and methods for discovering intent from enterprise data. While it introduces methods for identifying user or system intent using structured and unstructured datasets, it focuses primarily on enterprise document processing and lacks a structured framework to transform real-time repository events into a standardized, consumable format. Moreover, it does not address the transformation of software repository artifacts into a schema like Model Context Protocol (MCP), nor does it support the generation of downstream tools and APIs based on inferred intent.
US20240211783A1, a continuation of the above, refines intent extraction methods but remains centered around enterprise data analytics rather than software repository transformation. It does not disclose a universal transformation algorithm that captures developer activity across commits, issues, or branches, nor does it propose a layered schema abstraction that allows flexible and consistent representation of heterogeneous repository sources.
US12265565B2 describes methods for intent-driven query processing through classification pipelines. While this invention enables intent-aware querying, it does not consider live repository event listening, multi-layered intent disambiguation (including large language model-based fallback mechanisms), or the generation of standardized Application Programming Interfaces (APIs) based on repository context and developer behavior.
Although these existing systems attempt to infer intent and improve enterprise data usability, they do not provide a complete system that continuously listens to repository activity, interprets developer intent in a semantic and structured manner, and transforms that information into Model Context Protocol (MCP)-compliant formats. They also lack important components such as generating multiple permutations for reverse lookups, ensuring bidirectional traceability, and creating integrated tools or APIs for downstream use.
DEFINITIONS
The expression “system” used hereinafter in this specification refers to an ecosystem comprising, but not limited to, a system with a user, input and output devices, a processing unit, a plurality of mobile devices, a mobile device-based application to identify dependencies and relationships between diverse businesses, a visualization platform, and an output; and is extended to computing systems such as mobiles, laptops, computers, PCs, etc.
The expression “input unit” used hereinafter in this specification refers to, but is not limited to, mobile, laptops, computers, PCs, keyboards, mouse, pen drives or drives.
The expression “output unit” used hereinafter in this specification refers to, but is not limited to, an onboard output device, a user interface (UI), a display kit, a local display, a screen, a dashboard, or a visualization platform enabling the user to visualize, observe or analyse any data or scores provided by the system.
The expression “processing unit” refers to, but is not limited to, a processor of at least one computing device that optimizes the system.
The expression “large language model (LLM)” used hereinafter in this specification refers to a type of machine learning model designed for natural language processing tasks such as language generation. LLMs are language models with many parameters, and are trained with self-supervised learning on a vast amount of text.
The term “Model Context Protocol (MCP)” used hereinafter in this specification refers to a protocol that standardizes how AI models, especially large language models (LLMs), interact with external tools and data sources. It essentially acts as a universal connector, enabling AI to access information, execute tasks, and utilize services in a structured and secure way.
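To make the definition concrete: the MCP specification describes each tool a server exposes to a model by a `name`, a human-readable `description`, and an `inputSchema` expressed as JSON Schema. The sketch below builds such a descriptor for a repository query. The function `make_tool_descriptor`, the tool name `find_changes_by_intent`, its parameters and its enum values are hypothetical, and registering the tool with an actual MCP server is out of scope.

```python
import json


def make_tool_descriptor(name: str, description: str,
                         dimensions: dict[str, list[str]],
                         required: list[str]) -> dict:
    """Build an MCP-style tool definition whose arguments are query dimensions."""
    properties = {
        dim: {
            "type": "string",
            "enum": sorted(values),
            "description": f"Filter on the {dim} dimension",
        }
        for dim, values in dimensions.items()
    }
    # Reject required fields that the schema does not define.
    missing = sorted(set(required) - set(properties))
    if missing:
        raise ValueError(f"required fields not in schema: {missing}")
    return {
        "name": name,
        "description": description,
        "inputSchema": {  # JSON Schema describing the tool's arguments
            "type": "object",
            "properties": properties,
            "required": list(required),
        },
    }


tool = make_tool_descriptor(
    name="find_changes_by_intent",
    description="Return PRs or commits matching an inferred intent and optional context.",
    dimensions={"intent": ["refactor", "bug_fix", "feature"],
                "context": ["payment", "auth"]},
    required=["intent"],
)
print(json.dumps(tool, indent=2))
```

Deriving the descriptor from the same dimension lists that drive indexing keeps what a model is told it can ask for aligned with what the system can actually answer.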
The term “Universal Schema Layer (USL)” used hereinafter in this specification refers to the abstraction layer that hides the differences across repository sources and ensures consistent transformation into the MCP format.
The term “bidirectional graph” used hereinafter in this specification refers to a dynamic execution structure where each module or task (node) is connected to others through directed edges that support both forward progression and backward tracing. Unlike a linear or one-way flow (like a linked list), this setup allows the orchestrator not only to move forward from one module to the next (e.g., from retrieval to summarization to response generation) but also to trace backward through the graph to inspect, revise, or explain intermediate steps. This backward navigation is crucial for traceability, error correction, and explainability, as it lets the system revisit earlier outputs, reprocess data with updated parameters, or provide a transparent reasoning path to the user or another agent.
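A bidirectional graph in this sense can be sketched as a pair of adjacency maps, one for forward edges and one for reverse edges, kept in sync on every insertion, so that the orchestrator can walk downstream from a step or trace back to everything that fed it. This is a minimal sketch assuming string node names; the class `BidirectionalGraph` and its methods are hypothetical.

```python
from collections import defaultdict


class BidirectionalGraph:
    """Directed graph that keeps forward and backward adjacency in sync."""

    def __init__(self) -> None:
        self._forward: dict[str, set[str]] = defaultdict(set)
        self._backward: dict[str, set[str]] = defaultdict(set)

    def add_edge(self, src: str, dst: str) -> None:
        # Record the edge in both directions so either walk is equally cheap.
        self._forward[src].add(dst)
        self._backward[dst].add(src)

    def _walk(self, start: str, adj: dict[str, set[str]]) -> list[str]:
        # Depth-first traversal; returns the reachable nodes in sorted order.
        seen: set[str] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in adj.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return sorted(seen)

    def downstream(self, node: str) -> list[str]:
        """Forward progression: every step that depends on `node`."""
        return self._walk(node, self._forward)

    def upstream(self, node: str) -> list[str]:
        """Backward tracing: every step that contributed to `node`."""
        return self._walk(node, self._backward)


g = BidirectionalGraph()
g.add_edge("retrieval", "summarization")
g.add_edge("summarization", "response_generation")
print(g.downstream("retrieval"))          # ['response_generation', 'summarization']
print(g.upstream("response_generation"))  # ['retrieval', 'summarization']
```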
OBJECTS OF THE INVENTION
The primary object of the present invention is to offer a system and method that transforms software repository data into Model Context Protocol (MCP) compliant representations using an intent-driven universal transformation algorithm.
Another object of the invention is to infer developer intent from repository activities by using a hybrid model engine that combines rule-based logic with language-driven disambiguation.
Yet another object is to convert the inferred intent and related metadata into standardized MCP entities, actions, and context through a universal schema layer.
A further object of the invention is to enable the generation of multiple permutations of context–action–intent triples to support reverse lookups and ensure bidirectional traceability.
An additional object is to make the transformed data available through automatically generated Application Programming Interfaces (APIs) and tools for seamless integration.
SUMMARY
Before the present invention is described, it is to be understood that the present invention is not limited to specific methodologies and materials described, as these may vary as per the person skilled in the art. It is also to be understood that the terminology used in the description is for the purpose of describing the particular embodiments only and is not intended to limit the scope of the present invention.
The present invention describes a system and method that transforms software repository data into Model Context Protocol (MCP) compliant representations such as APIs and tools using an intent-driven universal transformation algorithm. The system and method of the present invention ingest, interpret, and transform repository data into MCP-compliant formats using a Universal Intent-Aware Transformation Algorithm (UITA). This algorithm captures not just the syntax of changes (commits, PRs, branches) but also their semantic intent (bug fix, refactor, enhancement), and generates multiple data combinations to support retrieval for any intent or use case. The invention leverages semantic inference, universal schema mapping, and combinatorial indexing to generate APIs and tools consumable by AI systems and modern applications.
According to an aspect of the present invention, the system comprises an input unit, a processing unit and an output unit, wherein the processing unit further comprises a Repository Listener Module to extract real-time repository events; an Intent Inference Engine module powered by a hybrid rule-based and LLM-driven classifier; a Universal MCP Mapper module that encodes data into flexible, composable MCP entities; a Combination Generator module to enable intent-causal reverse lookups; and an API Generator & Tool Exporter module to expose the data for any downstream use.
According to an aspect of the present invention, the disclosed invention introduces a novel, non-obvious, and patentable method for transforming heterogeneous repository artifacts into Model Context Protocol (MCP)-compliant representations by leveraging a hybrid intent-classification engine and a universal transformation layer. Unlike conventional systems that rely solely on syntactic parsing, this invention dynamically infers developer intent through a multi-stage classification mechanism combining rule-based logic, semantic embeddings, and large language model (LLM)-driven disambiguation. The inferred intent is then mapped to a universal schema abstraction layer, enabling consistent transformation across diverse repository sources. To ensure retrieval across any downstream use case, the algorithm generates combinatorial permutations of context-action-intent triples, allowing bidirectional traceability and seamless generation of APIs, tools, or data exports conforming to MCP specifications. This approach ensures extensibility, language-agnostic adaptability, and deterministic reproducibility, rendering it uniquely suitable for modern AI-augmented software ecosystems.
BRIEF DESCRIPTION OF DRAWINGS
A complete understanding of the present invention may be obtained by reference to the following detailed description, which is to be taken in conjunction with the accompanying drawings. The accompanying drawings, which are incorporated into and constitute a part of the specification, illustrate one or more embodiments of the present invention and, together with the detailed description, serve to explain the principles and implementations of the invention.
FIG. 1 illustrates the system architecture of the present invention.
FIG. 2 illustrates a flowchart of the workflow of the present invention.
FIG. 3 illustrates the sequence diagram of the system for transforming repository data to MCP of the present invention.

DETAILED DESCRIPTION OF INVENTION:
Before the present invention is described, it is to be understood that this invention is not limited to methodologies described, as these may vary as per the person skilled in the art. It is also to be understood that the terminology used in the description is for the purpose of describing the particular embodiments only and is not intended to limit the scope of the present invention. Throughout this specification, the word “comprise”, or variations such as “comprises” or “comprising”, will be understood to imply the inclusion of a stated element, integer or step, or group of elements, integers or steps, but not the exclusion of any other element, integer or step, or group of elements, integers or steps. The use of the expression “at least” or “at least one” suggests the use of one or more elements or ingredients or quantities, as the use may be in the embodiment of the invention to achieve one or more of the desired objects or results. Various embodiments of the present invention are described below. It is, however, noted that the present invention is not limited to these embodiments, but rather the intention is that modifications that are apparent are also included.
The present invention describes a system and method that transforms software repository data into Model Context Protocol (MCP) compliant representations such as APIs and tools using an intent-driven universal transformation algorithm. The system and method of the present invention ingest, interpret, and transform repository data into MCP-compliant formats using a Universal Intent-Aware Transformation Algorithm (UITA). This algorithm captures not just the syntax of changes (commits, PRs, branches) but also their semantic intent (bug fix, refactor, enhancement), and generates multiple data combinations to support retrieval for any intent or use case. The invention leverages semantic inference, universal schema mapping, and combinatorial indexing to generate APIs and tools consumable by AI systems and modern applications. The system, as illustrated in FIG. 1, comprises an input unit, a processing unit and an output unit, wherein the processing unit further comprises:
• Repository Listener Module to extract real-time repository events.
• Intent Inference Engine module powered by a hybrid rule-based and LLM-driven classifier.
• Universal MCP Mapper module that encodes data into flexible, composable MCP entities.
• Combination Generator module to enable intent-causal reverse lookups.
• API Generator & Tool Exporter module to expose the data for any downstream use.
According to the embodiment of the present invention, the repository listener module continuously listens for changes via Git hooks, webhooks, or APIs. Git hooks are customizable scripts that Git executes automatically on specific repository events such as commit, merge, or push. They are primarily used to enforce coding standards, run automated tests, or block invalid commits in the developer’s local environment. Webhooks are HTTP-based callbacks triggered by predefined events in a system. They enable real-time communication between systems by sending structured payloads (e.g., JSON) to external URLs, and are often used to initiate external workflows such as continuous integration pipelines or notification systems. APIs (Application Programming Interfaces) provide a standardized mechanism for external systems to interact with software components. They allow querying, updating, or managing data and functionality through well-defined endpoints and methods, typically over HTTP following RESTful conventions. This module extracts data types such as commits (message, author, diff), pull requests (description, review threads, labels), issues (title, body, assignees) and branching/merging operations.
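As an illustration of the listener's extraction step, the sketch below normalizes a push-style webhook body into flat commit events. The payload shape (a `commits` list whose items carry `id`, `message`, `author`, `added`, `removed` and `modified`) is modeled loosely on GitHub's push webhook and should be treated as an assumption; such payloads do not carry full diffs, which would be fetched separately. `RepoEvent` and `events_from_push` are hypothetical names.

```python
import json
from dataclasses import dataclass, field


@dataclass
class RepoEvent:
    """Source-agnostic record emitted by the (hypothetical) listener."""
    kind: str                 # e.g. "commit", "pull_request", "issue"
    ref: str                  # commit SHA, PR number, issue id, ...
    author: str
    text: str                 # commit message, PR description, issue body
    files: list[str] = field(default_factory=list)


def events_from_push(raw_body: str) -> list[RepoEvent]:
    """Turn a push-style JSON webhook body into one RepoEvent per commit."""
    payload = json.loads(raw_body)
    events = []
    for c in payload.get("commits", []):
        # Touched files = added + removed + modified paths, de-duplicated in order.
        touched = c.get("added", []) + c.get("removed", []) + c.get("modified", [])
        events.append(RepoEvent(
            kind="commit",
            ref=c["id"],
            author=c.get("author", {}).get("name", "unknown"),
            text=c.get("message", ""),
            files=list(dict.fromkeys(touched)),
        ))
    return events


body = json.dumps({"commits": [{
    "id": "a1b2c3",
    "message": "Fix null check in payment validator",
    "author": {"name": "dev1"},
    "modified": ["payment/validator.py"],
}]})
print(events_from_push(body)[0].files)  # ['payment/validator.py']
```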
According to the embodiment of the present invention, the intent inference engine module extracts semantic meaning from commit messages and associated code diffs using the following:
• a trained classification model, trained on labeled historical commit data, that classifies commits into predefined categories such as bug fix, test addition or modification, code refactoring, documentation update and feature addition. This model uses commit messages and code diffs (e.g., added/removed lines, file types, structural changes) as input features;
• LLM-based inference for commits that are ambiguous, multi-intent, or poorly labeled, which infers semantic intent through contextual understanding of the commit message and diff content; and
• a fine-tuned intent taxonomy engine that refines and aligns the output to a customized, fine-grained taxonomy of commit types, which may be hierarchical or domain-specific.
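The hybrid flow above can be sketched as cheap deterministic rules first, an LLM fallback only when the rules find no match or several conflicting matches, and a final mapping onto a finer taxonomy. In this sketch keyword regexes stand in for the trained classification model (a deliberate simplification), the LLM is a caller-supplied function so no particular model or API is assumed, and the taxonomy labels and all function names are invented for illustration.

```python
import re
from typing import Callable

# Keyword rules standing in for the trained classifier (illustrative only).
RULES = {
    "bug_fix": re.compile(r"\b(fix|fixes|fixed|bug|hotfix)\b", re.I),
    "refactor": re.compile(r"\b(refactor|cleanup|rename)\b", re.I),
    "docs": re.compile(r"\b(docs?|readme|typo)\b", re.I),
    "test": re.compile(r"\b(tests?)\b", re.I),
    "feature": re.compile(r"\b(add|adds|feature|implement)\b", re.I),
}

# Hypothetical fine-grained taxonomy: coarse label -> hierarchical label.
TAXONOMY = {
    "bug_fix": "change/corrective/bug_fix",
    "refactor": "change/perfective/refactor",
    "docs": "change/documentation/update",
    "test": "change/quality/test",
    "feature": "change/adaptive/feature",
}


def infer_intent(message: str, diff: str,
                 llm_fallback: Callable[[str, str], str]) -> tuple[str, str]:
    """Return (taxonomy_label, source), where source is "rule" or "llm"."""
    hits = [label for label, pattern in RULES.items() if pattern.search(message)]
    if len(hits) == 1:
        coarse, source = hits[0], "rule"
    else:
        # No hit, or several conflicting hits: treat as ambiguous / multi-intent.
        coarse, source = llm_fallback(message, diff), "llm"
    return TAXONOMY.get(coarse, "change/unclassified"), source


def stub_llm(message: str, diff: str) -> str:
    # Stand-in for a real LLM call that would prompt a model with message and diff.
    return "refactor"


print(infer_intent("Fix crash on empty cart", "", stub_llm))
# ('change/corrective/bug_fix', 'rule')
print(infer_intent("Fix typo in README", "", stub_llm))
# ('change/perfective/refactor', 'llm')
```

Because the fallback fires on conflicting rule hits as well as on misses, a message such as "Fix typo in README" (matching both the bug-fix and documentation rules) is routed to the LLM instead of being forced into whichever rule matched first.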
According to the embodiment of the present invention, the Universal Model Context Protocol (MCP) Mapper module transforms inferred intent and associated metadata into MCP schema objects. These objects include:
• MCPEntity: entities such as PRs, commits and issues;
• MCPAction: actions such as merge, review and comment; and
• MCPContext: context such as affected files, modules and user intent.
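One way to picture the three object types is as small immutable records. The class names `MCPEntity`, `MCPAction` and `MCPContext` follow the text; their fields, the wrapper `MCPRecord` and its `to_dict` method are assumptions made for this sketch, not a published MCP schema.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class MCPEntity:
    kind: str          # "pull_request", "commit", "issue", ...
    ref: str           # PR number, commit SHA, issue id


@dataclass(frozen=True)
class MCPAction:
    verb: str          # "merge", "review", "comment", ...
    actor: str


@dataclass(frozen=True)
class MCPContext:
    intent: str                        # inferred intent label
    files: tuple[str, ...] = ()        # affected files
    modules: tuple[str, ...] = ()      # affected modules/components


@dataclass(frozen=True)
class MCPRecord:
    """One mapped repository event: entity + action + context (hypothetical wrapper)."""
    entity: MCPEntity
    action: MCPAction
    context: MCPContext

    def to_dict(self) -> dict:
        # asdict recurses into the nested dataclasses; tuple fields stay tuples.
        return asdict(self)


rec = MCPRecord(
    MCPEntity("pull_request", "42"),
    MCPAction("merge", "dev1"),
    MCPContext("bug_fix", files=("payment/validator.py",), modules=("payment",)),
)
print(rec.to_dict()["context"]["modules"])  # ('payment',)
```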
This transformation follows a Universal Schema Layer (USL) that abstracts away source-specific structures. To ensure interoperability and uniform processing across the pipeline, the USL converts all incoming data into a common, source-independent representation: source-specific attributes, field names, hierarchies, and data types are mapped to a canonical schema that is consistent across all sources.
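A minimal sketch of the USL idea: per-source field maps translate each source's native field names into one canonical set, so downstream stages never see source-specific keys. The two source layouts (a GitHub-style pull request using `number`, `body`, `user.login` and a GitLab-style merge request using `iid`, `description`, `author.username`), the canonical field names, and the functions below are illustrative assumptions; real payloads carry many more fields.

```python
from typing import Any

# Canonical field -> dotted path in each source payload (illustrative mappings).
FIELD_MAPS: dict[str, dict[str, str]] = {
    "github": {"change_id": "number", "title": "title",
               "description": "body", "author": "user.login"},
    "gitlab": {"change_id": "iid", "title": "title",
               "description": "description", "author": "author.username"},
}


def _get_path(obj: Any, path: str) -> Any:
    """Follow a dotted path like 'user.login'; return None if any hop is missing."""
    for key in path.split("."):
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def to_canonical(source: str, payload: dict[str, Any]) -> dict[str, Any]:
    """Map a source-specific change-request payload onto the canonical schema."""
    mapping = FIELD_MAPS[source]
    record = {name: _get_path(payload, path) for name, path in mapping.items()}
    if record["change_id"] is not None:
        record["change_id"] = str(record["change_id"])  # unify int/str ids
    record["source"] = source
    return record


gh = {"number": 7, "title": "Fix rounding", "body": "Round amounts to cents",
      "user": {"login": "dev1"}}
gl = {"iid": 7, "title": "Fix rounding", "description": "Round amounts to cents",
      "author": {"username": "dev1"}}
print(to_canonical("github", gh) == {**to_canonical("gitlab", gl), "source": "github"})  # True
```

Adding a new source then means adding one field map rather than touching the mapper or any later stage.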
According to the embodiment of the present invention, the Combination Generator module supports full intent retrieval. To this end, the module generates multiple permutations of relevant dimensions such as context, tool and user intent. These permutations represent possible contextual interpretations of a repository change, supporting exhaustive intent coverage. The module also constructs linked indices to support reverse intent queries, allowing users or systems to retrieve change sets or pull requests (PRs) that match specific semantic conditions (e.g., “find PRs that fixed bugs in the payment module”). These indices are optimized for efficient backward traceability. The module further builds bidirectional graphs that represent the evolution of a repository over time, annotated with semantic intent.
The API Generator and Tool Exporter module automatically generates RESTful and GraphQL APIs based on the underlying MCP schema definitions. These APIs expose semantically enriched repository data, tool outputs, and inferred user intents to external systems. This module also generates Command Line Interface (CLI) tools, web-based dashboards and visualization interfaces, and search interfaces for AI assistants.
A Command Line Interface (CLI) is a text-based interface that allows users to interact with a computer program by typing commands into a terminal or console. Unlike graphical user interfaces (GUIs), where users click buttons or icons, a CLI relies on command syntax and parameters to control a system or application. In a CLI, users issue commands to execute programs, run scripts, pass arguments, interact with APIs or orchestrators, navigate file systems and trigger workflows.
In the context of LLMs and orchestrators (e.g., LangGraph, AutoGen), a CLI serves as an interface layer to manually trigger toolchains or agent workflows, pass input data or queries to an LLM pipeline, inspect intermediate outputs (for traceability), monitor execution paths in workflows, or re-run or modify specific steps in a modular system.
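A generated CLI of the kind described above might expose one sub-command per query type and translate arguments into a request body for the pipeline or orchestrator. This is a sketch under stated assumptions: the program name `repo-mcp`, the `query` and `trace` sub-commands, and the request fields `op`, `filters` and `change_id` are all hypothetical; only the standard-library `argparse` module is used.

```python
import argparse
import json


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical command name "repo-mcp"; sub-commands mirror the exported API.
    parser = argparse.ArgumentParser(prog="repo-mcp")
    sub = parser.add_subparsers(dest="command", required=True)

    query = sub.add_parser("query", help="reverse intent lookup")
    query.add_argument("--intent", required=True)
    query.add_argument("--context")
    query.add_argument("--action")

    trace = sub.add_parser("trace", help="walk the intent graph backwards from a change")
    trace.add_argument("change_id")
    return parser


def to_request(argv: list[str]) -> dict:
    """Turn CLI arguments into the JSON body a pipeline or orchestrator would receive."""
    args = build_parser().parse_args(argv)
    if args.command == "query":
        given = (("intent", args.intent), ("context", args.context), ("action", args.action))
        return {"op": "reverse_lookup", "filters": {k: v for k, v in given if v is not None}}
    return {"op": "trace_back", "change_id": args.change_id}


print(json.dumps(to_request(["query", "--intent", "bug_fix", "--context", "payment"])))
# {"op": "reverse_lookup", "filters": {"intent": "bug_fix", "context": "payment"}}
```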
According to an embodiment of the present invention, the method for transforming software repository data into Model Context Protocol (MCP) compliant Application Programming Interfaces (APIs) and tools, as illustrated in FIG. 2, comprises the steps of:
• receiving data from a repository as input;
• analyzing the received repository data to determine its intent or purpose through a multi-stage classification mechanism combining rule-based logic, semantic embeddings, and large language model (LLM)-driven disambiguation;
• applying a universal logic or universal schema abstraction layer to transform the repository data based on the determined intent;
• generating combinatorial permutations of context-action-intent triples, allowing bidirectional traceability, to ensure retrieval across any downstream use case;
• generating multiple combinations or permutations of the transformed data that conform to the specifications defined by MCP;
• generating output as usable components such as APIs, tools, or other relevant items; and
• generating the final MCP-compliant output, completing the transformation process.
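The order of these steps can be sketched as a single function that threads one repository event through each stage. Every stage here is a deliberately trivial, hypothetical stand-in (a one-rule classifier, a dict-based mapper, a subset-key generator), so only the data flow between steps is shown; none of it should be read as the claimed implementation.

```python
from itertools import combinations


def receive(event: dict) -> dict:
    # Receive: accept raw repository data (already-parsed JSON here).
    return {"id": event["sha"], "message": event["message"], "module": event["module"]}


def analyze_intent(rec: dict) -> dict:
    # Analyze: one-rule stand-in for the multi-stage classifier.
    intent = "bug_fix" if "fix" in rec["message"].lower() else "unclassified"
    return {**rec, "intent": intent}


def apply_schema(rec: dict) -> dict:
    # Transform: map onto a canonical entity / action / context shape.
    return {
        "entity": {"kind": "commit", "ref": rec["id"]},
        "action": {"verb": "commit"},
        "context": {"module": rec["module"], "intent": rec["intent"]},
    }


def lookup_keys(mcp: dict) -> list[frozenset]:
    # Combine: every non-empty combination of (context, action, intent) values.
    pairs = [("context", mcp["context"]["module"]),
             ("action", mcp["action"]["verb"]),
             ("intent", mcp["context"]["intent"])]
    return [frozenset(c) for size in range(1, len(pairs) + 1)
            for c in combinations(pairs, size)]


def run_pipeline(event: dict) -> dict:
    # Output: the MCP record plus its lookup keys, ready for API / tool export.
    mcp = apply_schema(analyze_intent(receive(event)))
    return {"record": mcp, "keys": lookup_keys(mcp)}


out = run_pipeline({"sha": "a1b2c3", "message": "Fix refund rounding", "module": "payment"})
print(out["record"]["context"], len(out["keys"]))
# {'module': 'payment', 'intent': 'bug_fix'} 7
```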
The present invention describes a system and method for transforming software repository data into Model Context Protocol (MCP)-compliant formats using an intent-first, universal algorithm. As described in FIG. 3, the disclosed method introduces a novel, non-obvious, and patentable method for transforming heterogeneous repository artifacts into Model Context Protocol (MCP)-compliant representations by leveraging a hybrid intent-classification engine and a universal transformation layer. Unlike conventional systems that rely solely on syntactic parsing, this invention dynamically infers developer intent through a multi-stage classification mechanism combining rule-based logic, semantic embeddings, and large language model (LLM)-driven disambiguation. The inferred intent is then mapped to a universal schema abstraction layer, enabling consistent transformation across diverse repository sources. To ensure retrieval across any downstream use case, the algorithm generates combinatorial permutations of context-action-intent triples, allowing bidirectional traceability and seamless generation of APIs, tools, or data exports conforming to MCP specifications. This approach ensures extensibility, language-agnostic adaptability, and deterministic reproducibility, rendering it uniquely suitable for modern AI-augmented software ecosystems.
Advantages:
• Intent-First Design: Unlike conventional syntactic log parsers, this approach uses inferred developer intent as the primary dimension of transformation.
• Universal Schema Layer: Enables decoupling of Git-based and non-Git-based repo structures.
• Multi-Combination Indexing: Supports flexible, low-latency querying by downstream agents or tools.
• AI with Deterministic Blend: Uses rule-based filters with fallback to generative models to maximize robustness.
Claims:
We claim,
1. A system and method for transforming software repository data to Model Context Protocol compliant Application Programming Interfaces and tools
characterized in that
the system ingests, interprets, and transforms repository data into MCP-compliant formats using a Universal Intent-Aware Transformation Algorithm;
the system comprises an input unit, a processing unit and an output unit, wherein the processing unit further comprises a Repository Listener Module to extract real-time repository events; an Intent Inference Engine module powered by a hybrid rule-based and LLM-driven classifier; a Universal MCP Mapper module that encodes data into flexible, composable MCP entities; a Combination Generator module to enable intent-causal reverse lookups; and an API Generator & Tool Exporter module to expose the data for any downstream use;
the method for transforming software repository data into Model Context Protocol (MCP) compliant Application Programming Interfaces (APIs) and tools comprises the steps of:
• receiving data from a repository as input;
• analyzing the received repository data to determine its intent or purpose through a multi-stage classification mechanism combining rule-based logic, semantic embeddings, and large language model (LLM)-driven disambiguation;
• applying a universal logic or universal schema abstraction layer to transform the repository data based on the determined intent;
• generating combinatorial permutations of context-action-intent triples, allowing bidirectional traceability, to ensure retrieval across any downstream use case;
• generating multiple combinations or permutations of the transformed data that conform to the specifications defined by MCP;
• generating output as usable components such as APIs, tools, or other relevant items; and
• generating the final MCP-compliant output, completing the transformation process.

2. The system and method as claimed in claim 1, wherein the repository listener module continuously listens for changes via Git hooks, webhooks, or APIs, and the module extracts data types including commits (message, author, diff), pull requests (description, review threads, labels), issues (title, body, assignees) and branching/merging operations.

3. The system and method as claimed in claim 1, wherein the intent inference engine module extracts semantic meaning from commit messages and associated code diffs using a trained classification model, LLM-based inference and a fine-tuned intent taxonomy engine.

4. The system and method as claimed in claim 3, wherein the trained classification model is trained on labelled historical commit data to classify commits into predefined categories such as bug fix, test addition or modification, code refactoring, documentation update and feature addition, and this model uses commit messages and code diffs, such as added or removed lines, file types and structural changes, as input features.

5. The system and method as claimed in claim 3, wherein the LLM-based inference is applied to commits that are ambiguous, multi-intent, or poorly labelled, to infer semantic intent through contextual understanding of the commit message and diff content.

6. The system and method as claimed in claim 3, wherein the fine-tuned intent taxonomy engine refines and aligns the output to a customized, fine-grained taxonomy of commit types, which may be hierarchical or domain-specific.

7. The system and method as claimed in claim 1, wherein the Universal Model Context Protocol Mapper module transforms inferred intent and associated metadata into MCP schema objects, and this transformation follows a Universal Schema Layer (USL) that abstracts away source-specific structures.

8. The system and method as claimed in claim 7, wherein the said objects include entities such as PRs, commits and issues; actions such as merge, review and comment; and context such as affected files, modules and user intent.

9. The system and method as claimed in claim 1, wherein the Combination Generator module functions to support full intent retrieval; the module generates multiple permutations of relevant dimensions such as context, tool and user intent that represent possible contextual interpretations of a repository change, supporting exhaustive intent coverage; the module also constructs linked indices to support reverse intent queries, allowing users or systems to retrieve change sets or pull requests that match specific semantic conditions; and the module builds bidirectional graphs that represent the evolution of a repository over time, annotated with semantic intent.

10. The system and method as claimed in claim 1, wherein the API Generator and Tool Exporter module automatically generates RESTful and GraphQL APIs based on the underlying MCP schema definitions, these APIs expose semantically enriched repository data, tool outputs, and inferred user intents to external systems, and this module also generates Command Line Interface (CLI) tools, web-based dashboards and visualization interfaces, and search interfaces for AI assistants.

Documents

Application Documents

#	Name	Date
1	202521064488-STATEMENT OF UNDERTAKING (FORM 3) [07-07-2025(online)].pdf	2025-07-07
2	202521064488-POWER OF AUTHORITY [07-07-2025(online)].pdf	2025-07-07
3	202521064488-FORM 1 [07-07-2025(online)].pdf	2025-07-07
4	202521064488-FIGURE OF ABSTRACT [07-07-2025(online)].pdf	2025-07-07
5	202521064488-DRAWINGS [07-07-2025(online)].pdf	2025-07-07
6	202521064488-DECLARATION OF INVENTORSHIP (FORM 5) [07-07-2025(online)].pdf	2025-07-07
7	202521064488-COMPLETE SPECIFICATION [07-07-2025(online)].pdf	2025-07-07
8	Abstract.jpg	2025-07-29
9	202521064488-FORM-9 [26-09-2025(online)].pdf	2025-09-26
10	202521064488-FORM 18 [01-10-2025(online)].pdf	2025-10-01