ABSTRACT
TITLE: A SYSTEM AND METHOD OF IDENTIFYING MOST SUITABLE OPEN-SOURCE LIBRARIES USING LANGUAGE MODELS AND CONTEXT GRAPHS
A system and method of identifying the most suitable open-source libraries for building a product or application; the system comprises an input unit, a processing unit with repository crawling (210), summarization utilizing fine-tuned large language models (LLMs) (220), static and dynamic code analysis (230), scoring repositories (240), building the knowledge web (250), handling inference requests (260), generating implementation plans (270), identifying compatible components (280), creating the final manifest (290), and an output unit (300); and wherein the method involves an advanced mechanism for collecting and analyzing repository data from public sources, extracting attributes such as workflows, use cases, and dependencies using fine-tuned language models. The relationships and compatibility between repositories are visualized through a context graph, facilitating the identification of suitable libraries. The system employs scoring pre-defined instructions and contextual evaluation to rank repositories and generate detailed implementation plans; thereby enabling efficient selection and seamless integration of open-source components, streamlining development workflows and reducing project complexities.
DESCRIPTION
FIELD OF INVENTION
The present invention generally relates to a system and method of leveraging advanced technologies such as fine-tuned language models, context graphs, and scoring pre-defined instructions. More particularly, the present invention relates to a system and method of identifying most suitable open-source libraries using language models and context graphs for building a product or application, to analyze open-source repositories, evaluate their compatibility and quality, and generate implementation plans tailored to user requirements.
BACKGROUND
The development of modern software applications increasingly depends on open-source libraries, which provide pre-built functionalities to streamline the development process. While these libraries offer significant benefits in terms of cost and time efficiency, identifying the most suitable library for a given application poses considerable challenges. Developers must assess libraries based on factors such as functionality, security, compatibility, and relevance to specific project requirements. This task is often performed manually, making it time-consuming, inconsistent, and prone to errors, especially as the volume and complexity of available repositories grow.
Traditional methods for evaluating open-source libraries lack a standardized and systematic approach, often failing to account for critical aspects like vulnerability assessments, architectural compatibility, and integration ease. Additionally, there is no efficient way to visualize relationships and dependencies among repositories, which limits informed decision-making.
PRIOR ART
US20220414228A1 discloses a system for optimizing machine learning models through dynamic adaptive training. It utilizes data streams to adjust the learning process in real-time. While this system improves training efficiency, it does not account for the integration of dynamic contextual knowledge from external sources, a feature that the present invention introduces to enhance adaptability and precision across diverse applications.
US20240020096A1 presents a method for automating the configuration of software applications by analyzing user behavior patterns. While this approach effectively personalizes user interfaces, it lacks the ability to incorporate data from external system dependencies, an area where the present invention innovates by leveraging a comprehensive dependency graph for context-driven recommendations and optimization.
These limitations highlight the need for a solution that can automate the evaluation process and provide developers with actionable insights to seamlessly integrate the best-suited libraries into their applications.
DEFINITIONS:
The expression “system” used hereinafter in this specification refers to an ecosystem comprising, but not limited to, a scoring system with a user, input and output devices, processing unit, plurality of mobile devices, a mobile device-based application to collect and auto-analyze data, a visualization platform, and output. It is extended to computing systems like mobile phones, laptops, computers, PCs, and other digital computing devices.
The term “repositories” refers to digital storage locations that house codebases, libraries, or software components. These repositories can be public or private, version-controlled, and may include, but are not limited to platforms like GitHub, GitLab, or Bitbucket. Repositories are the source of data used for analysis, scoring, and component selection.
The term “SBOM” refers to a comprehensive list or inventory of all software components, dependencies, libraries, and modules used in a software project. It identifies the origins, licenses, and vulnerabilities associated with each component, enabling transparency and informed decision-making in software development and maintenance.
A “scoring system” refers to a mechanism designed to evaluate and rank repositories or components based on various factors such as code quality, frequency of updates, security compliance, and community support. This score reflects the reliability, trustworthiness, and usability of the repository or component.
The term “context graph” refers to a structured representation of relationships between various components, repositories, and their attributes. This graph visually maps dependencies, interactions, and hierarchies, enabling stakeholders to analyze the ecosystem effectively and identify optimal pathways for upgrades and modifications.
The term “manifest” refers to the final output generated by the workflow. It is a document or file containing a detailed summary of the analysis, including recommendations for upgrading components, resolving vulnerabilities, and optimizing performance. The manifest serves as a guide for implementation.
The term “crawling” refers to the automated process of scanning, indexing, and extracting data from various repositories. This process gathers relevant information such as metadata, dependencies, and files for further analysis and processing.
The term “visualization platform” refers to a digital interface or tool designed to display data, relationships, and analyses in a user-friendly visual format. This can include graphs, charts, and diagrams, aiding stakeholders in understanding complex datasets.
The term “processing unit” refers to the computational hardware or software that performs the core analysis, scoring, and generation of context graphs. It includes servers, CPUs, GPUs, or cloud-based systems that handle intensive computations.
The term “output devices” refers to hardware or digital tools that present processed information to users. Examples include computer monitors, mobile screens, printers, or online dashboards.
The term “dependencies” refers to the relationships between components where one component relies on another for functionality. Managing dependencies is critical to ensure compatibility and stability in software systems.
The term “vulnerabilities” refers to weaknesses or flaws in software components that may pose security risks. Identifying and resolving vulnerabilities is essential to maintain system integrity and security.
The term “stakeholders” refers to individuals or entities involved in or affected by the workflow process, including developers, project managers, end-users, and organizations utilizing the system.
The term “component” refers to an individual part of a software system, such as a module, library, or plugin. Components are often used as building blocks for applications and may have dependencies on other components.
OBJECTS OF THE INVENTION:
The primary object of the present invention is to provide a system and method of identifying most suitable open-source libraries using language models and context graphs, for building a product or application.
Yet another object of the present invention is to automate the analysis and summarization of open-source repositories by leveraging fine-tuned large language models to extract use cases, workflows, features, and dependencies.
Yet another object of the present invention is to evaluate repositories systematically using advanced scoring pre-defined instructions that account for security vulnerabilities, architectural complexity, and quality metrics.
Yet another object of the present invention is to generate a context graph that visualizes the relationships and dependencies among libraries, enabling developers to make informed decisions about library selection and compatibility.
Further, the object of the present invention is to streamline the integration process by generating detailed implementation plans, including compatible versions of libraries and frameworks tailored to specific project requirements.
SUMMARY
Before the present invention is described, it is to be understood that the present invention is not limited to specific methodologies and materials described, as these may vary as per the person skilled in the art. It is also to be understood that the terminology used in the description is for the purpose of describing the particular embodiments only and is not intended to limit the scope of the present invention.
The invention relates to a system and method for identifying the most suitable open-source libraries using advanced technologies such as fine-tuned language models, context graphs, and scoring mechanisms. The system automates the evaluation process, overcoming challenges posed by traditional manual methods, which are time-consuming, inconsistent, and error-prone. It includes components such as a repository crawling module, a summarization module, a static and dynamic code analysis module, a scoring mechanism, and a context graph for visualizing relationships and dependencies.
The system analyzes open-source repositories, extracting attributes like workflows, features, and dependencies. It evaluates these repositories using a scoring formula that factors in security vulnerabilities and architectural complexity. Additionally, it constructs a context graph to map relationships, aiding in the informed selection of compatible components.
Thus, the invention addresses the complexity of choosing open-source libraries by providing an automated, data-driven solution. The system generates detailed implementation plans, including a comprehensive manifest of compatible components and their dependencies. This approach streamlines workflows, optimizes library selection, and enhances the overall efficiency and quality of software development projects.
BRIEF DESCRIPTION OF DRAWINGS
FIG. 1 shows an overview of the system of the present invention.
FIG. 2 shows a detailed workflow of the method employed by the processing unit of the present system.
DETAILED DESCRIPTION OF INVENTION:
Before the present invention is described, it is to be understood that this invention is not limited to methodologies described, as these may vary as per the person skilled in the art. It is also to be understood that the terminology used in the description is for the purpose of describing the particular embodiments only and is not intended to limit the scope of the present invention. Throughout this specification, the word “comprise”, or variations such as “comprises” or “comprising”, will be understood to imply the inclusion of a stated element, integer or step, or group of elements, integers or steps, but not the exclusion of any other element, integer or step, or group of elements, integers or steps. The use of the expression “at least” or “at least one” suggests the use of one or more elements or ingredients or quantities, as the use may be in the embodiment of the invention to achieve one or more of the desired objects or results. Various embodiments of the present invention are described below. It is, however, noted that the present invention is not limited to these embodiments, but rather the intention is that modifications that are apparent are also included.
To understand the invention clearly, the various components of the system are referred to as below:
| No. | Component |
|---|---|
| 10 | System |
| 100 | Input unit |
| 200 | Processing unit |
| 300 | Output unit |
| 210 | Crawling Public Repositories |
| 220 | Summarizing Repositories Using LLMs |
| 230 | SBOM and Static/Dynamic Code Analysis |
| 240 | Scoring Repositories |
| 250 | Building the Knowledge Web |
| 260 | Handling Inference Requests |
| 270 | Generating Implementation Plans |
| 280 | Identifying Compatible Components |
| 290 | Creating the Final Manifest |
The present invention is directed to a system and method of identifying the most suitable open-source libraries using language models and context graphs for building a product or application, wherein the system comprises an input unit (100), a processing unit (200), and an output unit (300). The processing unit (200) employs a method comprising the steps of repository crawling (210), summarization utilizing fine-tuned large language models (LLMs) (220), static and dynamic code analysis (230), scoring repositories (240), building the knowledge web (250), handling inference requests (260), generating implementation plans (270), identifying compatible components (280), and creating the final manifest (290). The system operates to analyze repository metadata, source code, and user requirements, enabling organizations to streamline workflows and ensure optimized library selection.
According to a preferred embodiment, the repository crawling (210) acts as the primary interface for identifying and collecting repositories from public platforms such as, but not limited to GitHub or GitLab. This extracts metadata, including contributors, stars, forks, and commit histories, and retrieves manifest files and source code, organizing the collected data for downstream processing.
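As a minimal sketch of how the collected data might be organized for downstream processing, the following Python fragment filters a repository's file listing for manifest files and builds one structured record. The record fields, the function names, and the sample repository are hypothetical and serve only as illustration; the manifest names are the two examples given in this specification, and no network access is performed.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath

# Manifest file names mentioned in this specification; a real crawler
# would likely recognize more (assumption).
MANIFEST_NAMES = {"package.json", "requirements.txt"}


@dataclass
class RepoRecord:
    """Hypothetical structured record for one crawled repository."""
    name: str
    stars: int
    forks: int
    contributors: int
    manifests: list = field(default_factory=list)


def find_manifests(paths):
    """Return the paths whose file name is a known manifest file."""
    return [p for p in paths if PurePosixPath(p).name in MANIFEST_NAMES]


def build_record(name, metadata, paths):
    """Combine platform metadata and the file listing into one record."""
    return RepoRecord(
        name=name,
        stars=metadata.get("stars", 0),
        forks=metadata.get("forks", 0),
        contributors=metadata.get("contributors", 0),
        manifests=find_manifests(paths),
    )


# Illustrative input only; the repository and its numbers are made up.
record = build_record(
    "example/repo",
    {"stars": 120, "forks": 14, "contributors": 9},
    ["README.md", "package.json", "src/index.js", "tools/requirements.txt"],
)
print(record.manifests)
```

In practice the metadata and file listing would come from a platform API or a scraper; here they are passed in directly so the structuring step can be read in isolation.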
According to another embodiment, the summarization (220) step employs fine-tuned language models to analyze and condense repository content into concise summaries that include critical details such as use cases, workflows, key features, and component dependencies, facilitating structured representation and effective mapping.
According to yet another embodiment, the static and dynamic code analysis (230) leverages tools like SBOM generators and security analysis software to evaluate repositories for vulnerabilities and code quality. This generates detailed metrics, including maintainability scores, vulnerability assessments, and technical debt, ensuring repositories meet stringent security and quality criteria.
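The vulnerability assessments produced in this step feed the scoring step, which works with counts of critical, high, and medium/low vulnerabilities. A minimal sketch of that reduction is shown below, assuming each finding is a dictionary with a `severity` field; the finding format, the function name, and the sample components are hypothetical and do not correspond to the output of any particular SBOM or security tool.

```python
from collections import Counter


def count_by_severity(findings):
    """Collapse individual findings into the three counts used for scoring."""
    counts = Counter(f["severity"].lower() for f in findings)
    return {
        "critical": counts["critical"],
        "high": counts["high"],
        # Medium and low findings are counted together, as in the formula.
        "medium_low": counts["medium"] + counts["low"],
    }


# Illustrative findings only; component names are made up.
findings = [
    {"component": "lib-a", "severity": "CRITICAL"},
    {"component": "lib-a", "severity": "high"},
    {"component": "lib-b", "severity": "medium"},
    {"component": "lib-c", "severity": "low"},
]
summary = count_by_severity(findings)
print(summary)
```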
According to a further embodiment, the scoring (240) computes repository scores from security vulnerabilities and architectural complexity in order to rank libraries for specific use cases. The scoring formula provides a balanced evaluation by penalizing critical, high, and medium/low vulnerabilities while rewarding architectural complexity. The formula is expressed as:
S = (α × Cc) + (β × Ch) + (γ × Cm) + (δ × M)
i.e., with the default weight values, Score = (-2 × Count of critical vulnerabilities) + (-1.5 × Count of high vulnerabilities) + (-0.5 × Count of medium/low vulnerabilities) + (1.5 × Complexity)
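The formula can be written as a short Python function. This is a sketch rather than the actual implementation: the function and parameter names are hypothetical, while the default weights are the values stated in this specification.

```python
def repository_score(critical, high, medium_low, complexity,
                     alpha=-2.0, beta=-1.5, gamma=-0.5, delta=1.5):
    """Compute S = (alpha * Cc) + (beta * Ch) + (gamma * Cm) + (delta * M)."""
    return (alpha * critical
            + beta * high
            + gamma * medium_low
            + delta * complexity)


# Cc=2, Ch=3, Cm=5, M=8 with the default weights.
print(repository_score(2, 3, 5, 8))
```

Because the weights are keyword arguments, a different weighting can be tried without changing the function, for example `repository_score(2, 3, 5, 8, delta=1.0)`.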
In yet another embodiment, building the knowledge web (250) visualizes relationships between repository components, creating a comprehensive map of dependencies and interconnections that enables stakeholders to understand the structural and functional relationships within the data, aiding in informed decision-making.
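One way to picture the knowledge web is as a labelled directed graph. The sketch below stores nodes with a kind (repository, library, or workflow) and edges with a relation, then walks the `depends_on` edges to collect direct and transitive dependencies. The class, the relation names, and the sample nodes are hypothetical; the specification does not prescribe a storage format.

```python
from collections import defaultdict


class ContextGraph:
    """Hypothetical in-memory context graph with typed nodes and edges."""

    def __init__(self):
        self.kinds = {}                 # node name -> kind
        self.edges = defaultdict(set)   # source -> {(relation, target)}

    def add_node(self, name, kind):
        self.kinds[name] = kind

    def add_edge(self, source, relation, target):
        self.edges[source].add((relation, target))

    def dependencies(self, name):
        """Return all direct and transitive dependencies of `name`."""
        seen = set()
        stack = [name]
        while stack:
            current = stack.pop()
            for relation, target in self.edges.get(current, ()):
                if relation == "depends_on" and target not in seen:
                    seen.add(target)
                    stack.append(target)
        return seen


# Illustrative nodes only; the names are made up.
graph = ContextGraph()
for node in ("web-app", "http-lib", "tls-lib", "other-app"):
    graph.add_node(node, "repository")
graph.add_edge("web-app", "depends_on", "http-lib")
graph.add_edge("http-lib", "depends_on", "tls-lib")
graph.add_edge("web-app", "shares_use_case", "other-app")
print(sorted(graph.dependencies("web-app")))
```

The `seen` set also guards against dependency cycles, so the traversal terminates even if two components depend on each other.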
According to another embodiment, handling inference requests (260) processes user queries and iteratively refines requirements to ensure that repository recommendations align precisely with user objectives. By leveraging probabilistic ranking and real-time data from the knowledge web, it identifies repositories that best meet user needs.
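The iterative refinement of requirements can be sketched as a loop over missing fields: whatever the user has not yet specified becomes a clarifying question, and each answer is merged back into the requirements. The field names and question texts below are hypothetical, and the sketch uses a fixed question table rather than a language model.

```python
# Hypothetical requirement fields and the question asked when one is missing.
QUESTIONS = {
    "language": "Which programming language will the project use?",
    "use_case": "What is the main use case the library must cover?",
    "license": "Are there licence constraints on the components?",
}


def clarifying_questions(requirements):
    """Return the questions for every field that is still unspecified."""
    return [q for key, q in QUESTIONS.items() if not requirements.get(key)]


def refine(requirements, answers):
    """Merge non-empty answers into a copy of the requirements."""
    merged = dict(requirements)
    merged.update({k: v for k, v in answers.items() if v})
    return merged


requirements = {"use_case": "PDF generation"}
pending = clarifying_questions(requirements)
requirements = refine(requirements, {"language": "Python", "license": ""})
remaining = clarifying_questions(requirements)
print(pending)
print(remaining)
```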
According to a further embodiment, generating implementation plans (270) consolidates the results of the analysis, scoring, and inference handling into structured and actionable plans, including detailed workflows, suggested libraries, and optimal integration strategies tailored to the user's specific requirements.
In another embodiment, identifying compatible components (280) performs an advanced analysis to determine library compatibility, accounting for dependencies, system architecture, and operational constraints to recommend the most viable components.
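As an illustration of one compatibility check, the sketch below keeps only the library versions whose supported runtime range contains the project's runtime version, and returns them newest first. The candidate format, the field names, and the simple dotted-integer version parsing are assumptions; real version schemes with pre-release tags would need a more careful parser.

```python
def parse_version(text):
    """Parse a dotted numeric version such as '3.10' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def compatible_versions(candidates, runtime_version):
    """Return candidates that support `runtime_version`, newest first."""
    runtime = parse_version(runtime_version)
    supported = [
        c for c in candidates
        if parse_version(c["min_runtime"]) <= runtime <= parse_version(c["max_runtime"])
    ]
    return sorted(supported, key=lambda c: parse_version(c["version"]), reverse=True)


# Illustrative candidates only; versions and ranges are made up.
candidates = [
    {"version": "1.4.0", "min_runtime": "3.6", "max_runtime": "3.9"},
    {"version": "2.0.1", "min_runtime": "3.8", "max_runtime": "3.12"},
    {"version": "2.1.0", "min_runtime": "3.10", "max_runtime": "3.12"},
]
chosen = compatible_versions(candidates, "3.9")
print([c["version"] for c in chosen])
```

Comparing parsed tuples rather than raw strings matters here: as strings, "3.10" would sort before "3.9".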
Finally, creating the final manifest (290) generates a comprehensive manifest, including selected libraries, dependency hierarchies, and detailed implementation guidelines, ensuring the streamlined adoption of open-source libraries aligned with project goals.
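A manifest of this kind could, for instance, be serialized as JSON. The layout below (a project name plus a list of components with versions and dependencies) is a hypothetical format chosen for illustration; the specification does not fix a file format.

```python
import json


def build_manifest(project, components):
    """Serialize the selected components into a JSON manifest string."""
    entries = [
        {
            "name": c["name"],
            "version": c["version"],
            "depends_on": sorted(c.get("depends_on", [])),
        }
        # Sort by name so the output is stable across runs.
        for c in sorted(components, key=lambda c: c["name"])
    ]
    return json.dumps({"project": project, "components": entries}, indent=2)


# Illustrative components only; names and versions are made up.
manifest_text = build_manifest(
    "demo-app",
    [
        {"name": "web-framework", "version": "2.0.1", "depends_on": ["http-lib"]},
        {"name": "http-lib", "version": "1.4.0"},
    ],
)
print(manifest_text)
```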
In another preferred embodiment of the invention, a method of identifying the most suitable open-source libraries using language models and context graphs for building a product or application is provided. The method comprises the following steps:
1. Crawling public repositories (210): The system starts by crawling public repositories (210) such as, but not limited to GitHub, GitLab, or Bitbucket, thereby identifying repositories that contain manifest files (e.g., package.json, requirements.txt) and source code. The process involves using API integrations or web scraping to gather metadata and download relevant files, organizing the data into a structured dataset for analysis.
2. Summarizing repositories using LLMs (220): Fine-tuned language models analyze repository metadata and source code to extract attributes such as purpose, key features, workflows, dependencies, and quality metrics, enabling structured representations for downstream processes like scoring and compatibility analysis.
3. SBOM and static/dynamic code analysis (230): The system employs tools for Software Bill of Materials (SBOM) generation and code analysis to evaluate repositories for security and quality metrics. This includes generating a list of components, conducting vulnerability assessments, and measuring code quality metrics such as maintainability and technical debt.
4. Scoring repositories (240): A scoring formula is applied to evaluate repositories based on security vulnerabilities and complexity; wherein the scoring helps prioritize repositories for further analysis, ensuring that the most secure and reliable options are considered. The system assigns scores to repositories based on the predefined scoring formula:
S = (α × Cc) + (β × Ch) + (γ × Cm) + (δ × M)
Where:
● S: Score of the repository.
● Cc: Count of critical vulnerabilities.
● Ch: Count of high vulnerabilities.
● Cm: Count of medium and low vulnerabilities.
● M: Complexity score of the repository.
● α,β,γ,δ: Weight factors representing the impact of each parameter.
Default weight values:
● α=−2
● β=−1.5
● γ=−0.5
● δ=1.5
5. Building the knowledge web (250): Relationships between repositories and components are mapped into a context graph, which serves as a relational database to store insights. Nodes represent repositories, libraries, and workflows, while edges indicate dependencies and shared use cases.
6. Handling Inference Requests (260): The system receives user intents and uses probing workflows to refine the requirements by generating clarifying questions. This helps define the project scope more accurately based on user needs.
7. Generating Implementation Plans (270): The system generates a detailed implementation plan that includes a list of features, use cases, workflows, and necessary pre-defined instructions and frameworks for seamless integration into the final product.
8. Identifying Compatible Components (280): The system queries the context graph to find libraries with the highest compatibility and retrieves versioning information to ensure seamless integration of components.
9. Creating the Final Manifest (290): Finally, the system creates a manifest file that lists all required components and their versions, ensuring that all dependencies are clearly defined for the application development process.
The present invention offers significant advantages by transforming the process of identifying suitable open-source libraries for software development through a comprehensive, data-driven methodology. By integrating advanced components such as crawling for metadata retrieval, the fine-tuned language model for summarizing repository content, scoring for evaluating security and complexity, and the context graph for mapping relationships, the invention ensures accurate identification of compatible libraries and their dependencies. Additionally, the manifest generation compiles all necessary components and their versions for seamless integration. This innovative approach overcomes the challenges posed by traditional methods, providing a robust solution that enables developers to efficiently analyze, select, and implement the most suitable open-source libraries, ultimately enhancing the application development lifecycle and improving application quality.
WORKING EXAMPLE-
The invention hereafter will be cited by way of examples only for a better and detailed understanding:
Consider a scenario where a development team seeks to identify the most suitable open-source libraries for their software project. The system begins its operation with the crawling public repositories step. It collects metadata and source code from various open-source repositories such as GitHub and GitLab, focusing on components with manifest files like package.json or requirements.txt. The collected data includes details such as contributors, stars, forks, and commit histories.
In the summarizing repositories using LLMs step, the system employs fine-tuned language models to analyze repository content. This generates concise summaries that highlight use cases, workflows, features, and dependencies, helping the development team quickly understand the repository's potential applicability.
Next, in the SBOM and static/dynamic code analysis phase, tools like SBOM generators and security analysis software evaluate repositories. The system identifies vulnerabilities, assesses code quality, and calculates maintainability metrics, ensuring the repositories meet required security and quality standards.
The system then moves to the scoring repositories step, where it assigns scores to repositories based on the predefined scoring formula:
For example, a repository with 2 critical vulnerabilities, 3 high vulnerabilities, no medium or low vulnerabilities, and a complexity score of 8 is scored as follows: Score = (-2 × 2) + (-1.5 × 3) + (-0.5 × 0) + (1.5 × 8) = -4 - 4.5 - 0 + 12 = 3.5
The detailed formula for scoring repositories is:
S = (α × Cc) + (β × Ch) + (γ × Cm) + (δ × M)
Where:
● S: Score of the repository.
● Cc: Count of critical vulnerabilities.
● Ch: Count of high vulnerabilities.
● Cm: Count of medium and low vulnerabilities.
● M: Complexity score of the repository.
● α,β,γ,δ: Weight factors representing the impact of each parameter.
Default weight values:
● α=−2
● β=−1.5
● γ=−0.5
● δ=1.5
Example Calculation:
● Cc=2, Ch=3, Cm= 5, M=8
● Substituting these values into the formula:
S= (−2×2) + (−1.5×3) + (−0.5×5) + (1.5×8)
S=−4−4.5−2.5+12
S=1
This scoring system helps prioritize repositories by penalizing vulnerabilities and rewarding complexity, ensuring a balanced and effective evaluation.
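To show how such scores can prioritize repositories, the sketch below scores three repositories with the default weights and sorts them from highest to lowest. Only the values for `repo-a` (Cc=2, Ch=3, Cm=5, M=8) come from the example calculation; the other two repositories, the dictionary layout, and the function names are made up for illustration.

```python
# Default weights from the specification, keyed by hypothetical field names.
WEIGHTS = {"critical": -2.0, "high": -1.5, "medium_low": -0.5, "complexity": 1.5}


def score(repo):
    """Weighted sum of vulnerability counts and complexity for one repository."""
    return sum(WEIGHTS[key] * repo[key] for key in WEIGHTS)


def rank(repos):
    """Return the repositories ordered from highest to lowest score."""
    return sorted(repos, key=score, reverse=True)


repos = [
    {"name": "repo-a", "critical": 2, "high": 3, "medium_low": 5, "complexity": 8},
    {"name": "repo-b", "critical": 0, "high": 1, "medium_low": 2, "complexity": 6},
    {"name": "repo-c", "critical": 1, "high": 0, "medium_low": 0, "complexity": 3},
]
for repo in rank(repos):
    print(repo["name"], score(repo))
```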
In the building the knowledge web phase, the system maps the relationships and dependencies between repositories into a context graph. This visual representation helps the development team understand the structural and functional interconnections among components, aiding in informed decision-making.
Finally, in the creating the final manifest step, the system generates a detailed document summarizing the recommended components, their compatible versions, and implementation guidelines. This manifest provides the development team with a clear and actionable roadmap to integrate the best-suited libraries into their project, ensuring compatibility and enhancing the overall quality of the application.
While considerable emphasis has been placed herein on the specific elements of the preferred embodiment, it will be appreciated that many alterations and modifications can be made in the preferred embodiment without departing from the principles of the invention. These and other changes in the preferred embodiments of the invention will be apparent to those skilled in the art from the disclosure herein, whereby it is to be distinctly understood that the foregoing descriptive matter is to be interpreted merely as illustrative of the invention and not as a limitation.
CLAIMS
We claim,
1. A system and method to identify the most suitable open-source libraries using fine-tuned language models and context graphs, for building a product or application; wherein the system (10) comprises an input unit (100), a processing unit (200), and an output unit (300);
characterized in that:
the processing unit (200) of the system (10) employs a stepwise workflow comprising the steps of:
a. crawling public repositories (210), that enables initiating the system by retrieving metadata and source code through API integrations or web scraping to gather relevant repositories and create a structured dataset;
b. summarizing repositories using LLMs (220), that enables utilizing fine-tuned large language models to analyze and summarize repository content, including use cases, workflows, features, and component dependencies;
c. SBOM and static/dynamic code analysis (230), that includes constructing a Software Bill of Materials (SBOM) and performing static and dynamic code analyses to evaluate vulnerabilities, code quality, and maintainability;
d. scoring repositories (240), that enables assigning scores to repositories based on predefined criteria, including but not limited to security, architectural complexity, and usability for specific project requirements, wherein the system calculates a repository score based on a pre-defined formula;
e. building a context graph (250), that includes mapping repository relationships and dependencies into a graphical structure to enable a visual understanding of interconnections and hierarchies;
f. handling inference requests (260), that enables refining user requirements through probing workflows, clarifying intents, and tailoring repository recommendations to meet specific needs;
g. generating implementation plans (270), that include producing detailed plans with features, workflows, predefined instructions, and compatible frameworks for seamless integration;
h. identifying compatible components (280), ensuring compatibility of selected repositories with user requirements by analyzing dependencies, architectural constraints, and operational considerations;
i. creating the final manifest (290), that includes outputting a manifest file with required components, their versions, and implementation guidelines for efficient application development.
2. The system and method as claimed in claim 1, wherein the repositories may include open-source libraries, public or private code repositories, and other data sources.
3. The system and method as claimed in claim 1, wherein the predefined criteria include the quality of the code, the frequency of updates, the activity of contributors, and the repository's compliance with security standards.
4. The system and method as claimed in claim 1, wherein the scoring is based on the said predefined formula, S = (α × Cc) + (β × Ch) + (γ × Cm) + (δ × M); where: S: Score of the repository, Cc: Count of critical vulnerabilities, Ch: Count of high vulnerabilities, Cm: Count of medium and low vulnerabilities, M: Complexity score of the repository, α,β,γ,δ: Weight factors representing the impact of each parameter; and the default weight values are α=−2, β=−1.5, γ=−0.5 and δ=1.5.
5. The system as claimed in claim 1, wherein the context graph is a graphical representation of relationships between various repositories, components, and their attributes; such that it provides a visual and analytical means to understand how components interact, their dependencies, and their relative importance within the software ecosystem.
6. The system and method as claimed in claim 1, wherein the context-graph is queried to recommend reliable upgrade paths for components, providing an automated way to identify and resolve dependency updates.
7. The system as claimed in claim 1, wherein the system is capable of automatically generating a Software Bill of Materials (SBOM) to map application dependencies and ensure compatibility with the latest component versions.
Dated this 06th day of January, 2025.
| # | Name | Date |
|---|---|---|
| 1 | 202521001043-STATEMENT OF UNDERTAKING (FORM 3) [06-01-2025(online)].pdf | 2025-01-06 |
| 2 | 202521001043-POWER OF AUTHORITY [06-01-2025(online)].pdf | 2025-01-06 |
| 3 | 202521001043-FORM 1 [06-01-2025(online)].pdf | 2025-01-06 |
| 4 | 202521001043-FIGURE OF ABSTRACT [06-01-2025(online)].pdf | 2025-01-06 |
| 5 | 202521001043-DRAWINGS [06-01-2025(online)].pdf | 2025-01-06 |
| 6 | 202521001043-DECLARATION OF INVENTORSHIP (FORM 5) [06-01-2025(online)].pdf | 2025-01-06 |
| 7 | 202521001043-COMPLETE SPECIFICATION [06-01-2025(online)].pdf | 2025-01-06 |
| 8 | Abstract1.jpg | 2025-02-21 |
| 9 | 202521001043-POA [22-02-2025(online)].pdf | 2025-02-22 |
| 10 | 202521001043-MARKED COPIES OF AMENDEMENTS [22-02-2025(online)].pdf | 2025-02-22 |
| 11 | 202521001043-FORM 13 [22-02-2025(online)].pdf | 2025-02-22 |
| 12 | 202521001043-AMMENDED DOCUMENTS [22-02-2025(online)].pdf | 2025-02-22 |
| 13 | 202521001043-FORM-9 [25-09-2025(online)].pdf | 2025-09-25 |
| 14 | 202521001043-FORM 18 [01-10-2025(online)].pdf | 2025-10-01 |