Generative And Agentic Artificial Intelligence Based System And Method

Generative And Agentic Artificial Intelligence Based System And Method Of Operation

Abstract: The present invention relates to a Gen AI and agentic AI-based system (102) configured to receive metadata. The system (102) is integrated into enterprise databases and metadata catalogues to identify and extract subject areas, entity/table names, attributes, relationships, and data types. The system (102) uses an LLM (318) to interpret the metadata (306) and understand the business context with respect to industry best practices. The system (102) utilizes RAG (320) to process metadata embeddings (310), retrieves metadata, and recommends potential use cases, data products, and attributes from within the enterprise metadata. The system (102) recommends structured data products based on metadata (306), suggests attributes for corresponding data products, and maps attributes back to the original metadata (306) sources. The system (102) generates an output (324) of structured tables for data products, attributes, and use cases while protecting sensitive information, and generates executable SQL queries for execution and data extraction.

Patent Information

Application #

Filing Date

30 May 2025

Publication Number

38/2025

Publication Type

INA

Invention Field

COMPUTER SCIENCE

Status

Parent Application

Applicants

TECH MAHINDRA LIMITED

Tech Mahindra Limited, Phase III, Rajiv Gandhi Infotech Park Hinjewadi, Pune - 411057, Maharashtra, India

Inventors

1. JHA, Saurabh

Tech Mahindra Limited, Oberoi Garden Estate, Chandivali, Off Saki Vihar Road, Andheri East, Mumbai - 400072, Maharashtra, India

2. TIWARI, Sushil

Tech Mahindra Limited, Phase 3, Hinjawadi Rajiv Gandhi Infotech Park, Hinjawadi, Pimpri-Chinchwad, Pune – 411057, Maharashtra, India

3. MAHARAJAN, Ezhilarasan

Tech Mahindra, KIADB Industrial Area, Plot No. 45- 47, First Main Rd, Phase II, Electronic City, Bengaluru - 560100, Karnataka, India

4. JAJOO, Sachin

Tech Mahindra Limited ,7th & 8th Floor, Capital Cyberscape, Sector-59, Golf Course Extension Road, Gurugram - 122102, Haryana, India

5. VISHNU, Bonthala Kanth

Tech Mahindra, Survey number 64,Plot number 35,36,#techmahindra, hitech city, Jubilee enclave, Madhapur, Hyderabad -500081, Telangana, India

6. KHAYALIA, Sonali

Tech Mahindra Limited ,7th & 8th Floor, Capital Cyberscape, Sector-59, Golf Course Extension Road, Gurugram - 122102, Haryana, India

7. KALYAN, Pavan C

Tech Mahindra, KIADB Industrial Area, Plot No. 45- 47, First Main Rd, Phase II, Electronic City, Bengaluru - 560100, Karnataka, India

8. VELAMAKURI, Himabindu

Tech Mahindra, KIADB Industrial Area, Plot No. 45- 47, First Main Rd, Phase II, Electronic City, Bengaluru - 560100, Karnataka, India

Specification

FORM 2

THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003

COMPLETE SPECIFICATION
(See Section 10 and Rule 13)

Title of Invention:
GENERATIVE AND AGENTIC ARTIFICIAL INTELLIGENCE-BASED SYSTEM AND METHOD OF OPERATION

Applicant:
TECH MAHINDRA LIMITED
A company Incorporated in India under the Companies Act, 1956
Having address:
Tech Mahindra Limited, Phase III, Rajiv Gandhi Infotech Park Hinjewadi,
Pune - 411057, Maharashtra, India

The following specification particularly describes the invention and the manner in which it is to be performed.
CROSS REFERENCE TO RELATED APPLICATION AND PRIORITY
[001] The present invention claims priority from Indian patent application 202521052658 filed on 30th May 2025.
TECHNICAL FIELD
[002] The invention pertains to the field of data analytics and artificial intelligence, specifically focusing on a Generative Artificial Intelligence (Gen AI) and Agentic AI-based recommendation system for data engineering and data management. Particularly, the system uses advanced natural language processing, large language models (LLMs), and a Retrieval-Augmented Generation (RAG) framework, implemented using the LangChain and LangGraph agentic framework, to automate the discovery, analysis, and operationalization of metadata.
BACKGROUND OF THE INVENTION
[003] Organizations across industries face major challenges in their data modernization process. A key obstacle to data mesh adoption is the manual identification and governance of data products across different domains. Currently, organizations have difficulty understanding the full potential of their existing metadata and therefore struggle to utilize their data efficiently.
[004] Lack of Clarity on Potential Analytical Use Cases: As organizations grow and mergers and acquisitions take place, the complexity of their data ecosystems increases manifold. Many organizations struggle to identify valuable analytical use cases that can be derived from their existing data assets to generate actionable insights. This widens the gap between the data assets an organization possesses and the insights it generates from those data assets. Business teams rely on data scientists, SMEs, source system analysts, and business analysts to define potential insights, causing delays and inefficiencies. This typically takes weeks of effort.
[005] Inefficient Data Product Generation and Subsequent Generation of DDL Scripts and Data Models Across Data Domains: Groups of SMEs in data teams manually design data products, leading to slow, inconsistent processes that fail to leverage available metadata efficiently and often result in duplicate data products. Once the data products are defined, data teams typically take weeks to months to identify the attributes required for those data products, design a data model based on the data products and attributes, and finally generate DDL scripts to create the table structures in the desired target database to load the data.
[006] Inconsistent Metadata Utilization: Organizations may enrich their metadata repositories but lack standardized ways to map metadata to real business use cases. This leads to data misinterpretation and redundancy, and increases the chance of missing insights.
[007] Inefficient Data Monetization and Privacy Concerns: Organizations may have valuable datasets; however, even after generating data products, they cannot share or monetize those datasets because of privacy laws (GDPR, HIPAA, and CCPA) and the risk of exposing sensitive customer information.
[008] Slow and Manual Data Preparation for AI/ML Models: Organizations spend a large share of AI/ML project time on data preparation and feature engineering. This slows down model deployment and innovation.
[009] Data Mesh Implementation: Organizations transitioning to data mesh architectures struggle with identifying domain-specific data products, ensuring metadata standardization, and governing decentralized data ownership. Traditional approaches require manual effort from data stewards, delaying implementation.
[0010] One Solution for All Industries: Multiple industries/domains may have to depend upon multiple customized solutions as one solution does not fit the requirements of every industry.
[0011] Target Database Deduplication: During migration or data product on-boarding to address business requirements, many similar data products are often on-boarded, which makes the entire database inefficient and cost-intensive.
[0012] Therefore, to overcome the problems associated with traditional systems, there is a need for a system that offers a Generative Artificial Intelligence (Gen AI) and Agentic AI-driven approach to automate insight generation, metadata utilization, data product creation, data attribute generation, data model creation, DDL script creation, and synthetic data generation.
OBJECTS OF THE INVENTION
[0013] The primary objective of the present invention is to provide a Generative Artificial Intelligence (Gen AI) and Agentic AI-based system that transforms metadata into structured insights, automates data product discovery, use case identification, DDL script creation, data model development, and synthetic data generation, and enables data monetization.
[0014] Another objective of the present invention is to provide a system that leverages advanced natural language processing capabilities combined with the LangChain and LangGraph agentic AI framework, LLMs, a vector database, and a RAG approach to analyze enterprise metadata, and that is designed to accelerate data mesh implementation.
[0015] Yet another objective of the present invention is to generate SQL queries for extracting the data products and attributes associated with the input metadata.
[0016] Another objective of the present invention is to automatically identify analytical use cases, generate relevant data products, and map attributes to the organization's metadata, taking organizational metadata as input to show lineage, and to further use the data products to generate attributes, DDL scripts, and data models for AI-driven insights.
SUMMARY OF THE INVENTION
[0017] Before the present system is described, it is to be understood that this application is not limited to the particular machine, device, or system, as there can be multiple possible embodiments that are not expressly illustrated in the present disclosures. It is also to be understood that the terminology used in the description is to describe the particular versions or embodiments only, and is not intended to limit the scope of the present application. This summary is provided to introduce aspects related to a generative artificial-intelligence-based recommendation system and method, and the aspects are further elaborated below in the detailed description. This summary is not intended to identify essential features of the proposed subject matter nor is it intended for use in determining or limiting the scope of the proposed subject matter.
[0018] In an embodiment, the present invention provides a Generative Artificial Intelligence (Gen AI) and Agentic AI-based recommendation system. The system takes metadata (subject area, table name, and column name) or a dictionary as input via file upload or database connection. The system is tuned to offer a domain-centric, industry-specific solution that caters to multiple verticals without requiring any change to the solution components. The system is integrated into enterprise databases and metadata catalogues and is configured to identify and extract subject areas, entity/table names, attributes, relationships, and data types. The system uses a Large Language Model (LLM) to interpret the supplied metadata and understand the business context with respect to industry best practices. The system utilizes Retrieval-Augmented Generation (RAG), implemented using the LangChain and LangGraph agentic framework, to process metadata embeddings using a large language model. The system gives the user the flexibility to select relevant subject areas to chunk; the system then creates embeddings and stores them in a vector database. The system operates LLMs within the organization's metadata scope, preventing hallucination. The user inputs a query into a natural language interface and, accordingly, the system retrieves metadata and recommends potential use cases, data products, and attributes from within the enterprise metadata. The system may also provide the flexibility to use global knowledge in addition to the input metadata to provide use case, data product, and attribute recommendations in line with industry best practices.
[0019] In an embodiment, the system generates recommendations for analytical use cases based on metadata patterns. The system offers two workflows: (a) it can generate use cases from the organization's metadata, after which any or all of the use cases can be selected to generate data products; or (b) it enables users to directly generate attributes and data products for predefined use cases while ensuring business mapping according to the provided metadata. As part of the mapping, the system searches the existing database/catalogue to identify whether any similar data products already exist. If so, the system provides attribute-level mapping using its Gen AI-based intelligent search capabilities, which search by synonyms, intent, and metadata context to surface similar data products or data assets that may already be present, and thereby help to avoid duplication in the target database. Additionally, the generated data products are mapped to the input metadata to provide lineage of data products and attributes. The system recommends structured data products based on metadata, suggests attributes for corresponding data products, and maps attributes back to the original metadata sources (entities, subject areas, schema, source system).
[0020] In an embodiment, the present invention provides a system for automated metadata analysis and data product generation. The system includes a storage device and a processing unit. The processing unit provides a user interface. This interface receives structured and unstructured data sources. The processing unit runs a metadata extraction module. This module extracts metadata and converts it into a structured format. It preserves relationships and contextual associations. The processing unit runs a chunking module. This module segments selected subject areas into data chunks. Segmentation is based on user input from the interface. The data chunks are converted into embeddings using a large language model (LLM). These embeddings represent metadata in a vectorised format. They retain relationships and context. The embeddings are stored in a vector database. The processing unit indexes the embeddings hierarchically. It supports incremental updates to handle evolving metadata. A retrieval-augmented generation (RAG) module retrieves relevant embeddings. Retrieval is based on natural language queries from the user interface. The LLM processes the retrieved embeddings. It generates recommendations for analytical use cases, data products, or attributes. An output generation module produces structured outputs. It also generates privacy-compliant datasets. These datasets preserve statistical properties of the original data sources.
[0021] In an embodiment, the present invention provides that the user interface is configured to support input of both structured and unstructured data sources. Structured data sources include relational databases. Unstructured data sources include natural language queries, text files, and documents. The interface supports input from multiple user devices. These devices include computers, mobile devices, and IoT devices.
[0022] In yet another embodiment, the present invention provides that the vector database is a FAISS-based index. It is coupled with the storage device. The processing unit stores high-dimensional metadata embeddings. These embeddings enable efficient similarity search.
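By way of non-limiting illustration, the kind of similarity search that such an index supports may be sketched in plain Python. The sketch below does not use FAISS; it performs an exhaustive cosine-similarity search over hypothetical three-dimensional embeddings. The function names `cosine` and `top_k`, the entity names, and the vector values are assumptions made for illustration only.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(index, query, k=2):
    """Return the k keys whose embeddings are most similar to the query."""
    scored = [(cosine(vector, query), key) for key, vector in index.items()]
    scored.sort(reverse=True)
    return [key for _, key in scored[:k]]


# Hypothetical embeddings for three metadata entities.
index = {
    "sales.orders": [0.9, 0.1, 0.0],
    "sales.order_items": [0.8, 0.2, 0.1],
    "hr.employees": [0.0, 0.1, 0.9],
}
result = top_k(index, [1.0, 0.0, 0.0], k=2)
```

A production index would typically replace the exhaustive loop with an approximate nearest-neighbour structure; the ranking principle is unchanged.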
[0023] In still another embodiment, the present invention provides that the RAG module uses the LangChain agentic framework. This framework enhances contextual understanding of metadata queries.
[0024] In an embodiment, the present invention provides that the processing unit, when executing the metadata extraction module, connects to structured and unstructured data sources. Structured sources include relational database management systems. Unstructured sources include file uploads. The connection is made via a connector framework. The processing unit extracts schema information. It enriches metadata with domain-specific contextual associations.
[0025] In yet another embodiment, the present invention provides that the processing unit, when executing the chunking module, performs automated clustering. Clustering is based on schema relationships and historical usage patterns. It groups related metadata entities.
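By way of non-limiting illustration, grouping related metadata entities by schema relationships may be sketched as a connected-components computation over foreign-key links. The sketch below covers only the schema-relationship part of the clustering and ignores historical usage patterns; the function name `cluster_by_relationships` and the sample tables are hypothetical.

```python
def cluster_by_relationships(tables, foreign_keys):
    """Group tables into clusters connected by foreign-key relationships."""
    parent = {table: table for table in tables}

    def find(table):
        # Follow parent links to the cluster root, compressing the path.
        while parent[table] != table:
            parent[table] = parent[parent[table]]
            table = parent[table]
        return table

    for child, referenced in foreign_keys:
        root_a, root_b = find(child), find(referenced)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for table in tables:
        clusters.setdefault(find(table), []).append(table)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical schema: (child table, referenced table) pairs.
tables = ["orders", "order_items", "customers", "employees", "departments"]
foreign_keys = [
    ("order_items", "orders"),
    ("orders", "customers"),
    ("employees", "departments"),
]
clusters = cluster_by_relationships(tables, foreign_keys)
```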
[0026] In still another embodiment, the present invention provides that the processing unit, when managing the vector database, implements a custom schema. This schema stores embeddings with metadata relationships and contextual associations. The processing unit uses hybrid search techniques. These combine vector similarity and relationship-based retrieval. It also supports incremental updates. Embeddings can be added or modified without rebuilding the database.
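By way of non-limiting illustration, hybrid search and incremental updates may be sketched with a small in-memory store. The class `MetadataVectorStore`, its methods, the `boost` parameter, and the sample vectors are all hypothetical; the additive bonus for related entities is only one possible way to combine vector similarity with relationship-based retrieval.

```python
import math


class MetadataVectorStore:
    """Toy in-memory store; all names here are hypothetical."""

    def __init__(self):
        self.vectors = {}   # entity name -> embedding
        self.related = {}   # entity name -> set of related entity names

    def upsert(self, name, vector, related=()):
        """Add or modify a single entry without touching the others."""
        self.vectors[name] = list(vector)
        self.related[name] = set(related)

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norms if norms else 0.0

    def hybrid_search(self, query, anchor, k=2, boost=0.3):
        """Rank by cosine similarity plus a bonus for entities related to `anchor`."""
        neighbours = self.related.get(anchor, set())
        scored = []
        for name, vector in self.vectors.items():
            score = self._cosine(vector, query)
            if name in neighbours:
                score += boost
            scored.append((score, name))
        scored.sort(reverse=True)
        return [name for _, name in scored[:k]]


store = MetadataVectorStore()
store.upsert("orders", [1.0, 0.0], related={"customers"})
store.upsert("customers", [0.6, 0.8], related={"orders"})
store.upsert("products", [0.8, 0.6])

vector_only = store.hybrid_search([1.0, 0.0], anchor="orders", boost=0.0)
hybrid = store.hybrid_search([1.0, 0.0], anchor="orders", boost=0.3)

# Incremental update: a new entity is added without rebuilding the store.
store.upsert("returns", [0.9, 0.1], related={"orders"})
after_update = store.hybrid_search([1.0, 0.0], anchor="orders", boost=0.0)
```

In this sketch the relationship bonus lifts `customers` above `products` even though `products` is closer to the query by vector similarity alone.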
[0027] In an embodiment, the present invention provides that the processing unit, when executing the RAG module using the LangChain agentic framework, employs a supervisor agent. This agent interprets natural language queries. It coordinates tasks among co-worker agents. These agents include a use case generator, a data product generator, and an attribute generator. The system performs semantic search to retrieve relevant embeddings based on query intent. The LLM validates and refines recommendations. This ensures contextual accuracy and relevance.
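By way of non-limiting illustration, the coordination between a supervisor agent and its co-worker agents may be sketched in plain Python. The sketch does not use the LangChain API; a simple keyword match stands in for the LLM-based query interpretation, and the function names, the `WORKERS` table, and the default routing are assumptions made for illustration only.

```python
def use_case_generator(query, context):
    return f"use cases derived from {len(context)} metadata chunks"


def data_product_generator(query, context):
    return f"data products derived from {len(context)} metadata chunks"


def attribute_generator(query, context):
    return f"attributes derived from {len(context)} metadata chunks"


# Hypothetical routing table: intent keyword -> co-worker agent.
WORKERS = {
    "use case": use_case_generator,
    "data product": data_product_generator,
    "attribute": attribute_generator,
}


def supervisor(query, context):
    """Route the query to every co-worker whose keyword appears in it."""
    lowered = query.lower()
    selected = [name for name in WORKERS if name in lowered]
    if not selected:
        selected = ["use case"]  # assumed default when no intent is detected
    return {name: WORKERS[name](query, context) for name in selected}


context = ["Sales / orders: order_id, amount", "Sales / customers: customer_id"]
answer = supervisor("Which data products and attributes support churn analysis?", context)
```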
[0028] In an embodiment, the present invention provides a method for automated metadata analysis and data product generation. The method begins by receiving structured and unstructured data sources. Data is received via a user interface provided by a processing unit. The processing unit extracts metadata using a metadata extraction module. The metadata is transformed into a structured format. Relationships and contextual associations are preserved. The processing unit segments selected subject areas into data chunks. Segmentation is done using a chunking module. It is based on user input from the interface. The data chunks are converted into embeddings using a large language model (LLM). The embeddings represent metadata in a vectorised format. They retain relationships and context. The embeddings are stored in a vector database. The processing unit indexes them hierarchically. It supports incremental updates to handle evolving metadata. A retrieval-augmented generation (RAG) module retrieves relevant embeddings. Retrieval is based on natural language queries from the user interface. The LLM processes the retrieved embeddings. It generates recommendations for analytical use cases, data products, or attributes. An output generation module produces structured outputs. It also generates privacy-compliant datasets. These datasets preserve statistical properties of the original data sources.

BRIEF DESCRIPTION OF DRAWING
[0029] The foregoing summary, as well as the following detailed description of embodiments, is better understood when read in conjunction with the appended drawings. For the purpose of illustrating the disclosure, there is shown in the present document example constructions of the disclosure, however, the disclosure is not limited to the specific methods and device disclosed in the document and the drawing. The detailed description is described with reference to the following accompanying figures.
[0030] Figure 1: illustrates a network implementation of a system, in accordance with an embodiment of the present subject matter.
[0031] Figure 2: illustrates the block diagram of the Gen AI and agentic AI-based system, in accordance with an embodiment of the present subject matter.
[0032] Figure 3: illustrates the architecture of the Gen AI and agentic AI-based system, in accordance with an embodiment of the present subject matter.
[0033] Figure 4: illustrates a flow diagram for the data ingestion and response generation process, in accordance with an embodiment of the present subject matter.
[0034] The figures depict various embodiments of the present disclosure for purposes of illustration only. One skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures illustrated herein may be employed without departing from the principles of the disclosure described herein.
DETAILED DESCRIPTION OF THE INVENTION
[0035] Some embodiments of this disclosure, illustrating all its features, will now be discussed in detail. The words "comprising", "having", and "including", and other forms thereof, are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms "a", "an", and "the" include plural references unless the context clearly dictates otherwise. Although any devices and methods similar or equivalent to those described herein can be used in the practice or testing of embodiments of the present disclosure, the exemplary devices and methods are now described. The disclosed embodiments are merely exemplary of the disclosure, which may be embodied in various forms.
[0036] Various modifications to the embodiment will be readily apparent to those skilled in the art and the generic principles herein may be applied to other embodiments. However, one of ordinary skill in the art will readily recognize that the present disclosure is not intended to be limited to the embodiments illustrated, but is to be accorded the widest scope consistent with the principles and features described herein.
[0037] Following is a list of elements and reference numerals used to explain various embodiments of the present subject matter.
Reference Numeral    Element Description
100                  Network implementation of a system
102                  System
106                  Network
202                  Processing Unit
206                  Storage Device
302                  User interface
304                  Pre-processing
306                  Metadata/Dictionary
308                  Chunking
310                  Embedding
312                  Vector Database (DB)
314                  Processing
316                  LangChain agentic framework
318                  Large Language Model (LLM)
320                  Retrieval-Augmented Generation (RAG)
322                  Generative Artificial Intelligence (Gen AI)
324                  Output
402                  Data Ingestion
404                  Response generation/Output generation
500                  Method
[0038] Referring now to Figure 1, a network implementation (100) of a system (102) is illustrated. Although the system (102) is described as being implemented on a server, it can be understood that it may also operate on various computing systems, such as laptops, desktops, notebooks, workstations, mainframes, or within cloud-based environments. Multiple users or stakeholders can access the system (102) through various user devices (104-1), (104-2), (104-3), (104-4), which may include IoT devices, IoT gateways, portable computers, personal digital assistants, handheld devices, and workstations, all communicatively coupled to the system (102) through a network (106). This network (106) may be wireless, wired, or a combination of both, and can take the form of intranets, local area networks (LAN), wide area networks (WAN), or the internet, utilizing various protocols such as HTTP, HTTPS, and TCP/IP. Furthermore, the network (106) may comprise a range of devices, including routers, bridges, servers, and storage devices.
[0039] In accordance with an embodiment illustrated in Figure 2, a block diagram (200) of the Generative Artificial Intelligence-based recommendation system (102) is illustrated. The system (102) includes at least one processing circuitry or processing unit (202), a user interface (302), and a storage device (206). The processing unit (202), which may consist of microprocessors, microcontrollers, or digital signal processors, is responsible for executing computer-readable instructions stored in the storage device (206). The user interface (302) facilitates interaction with users and communication with other computing devices, supporting multiple network types and protocols. The storage device (206) encompasses various computer-readable media, including volatile and non-volatile memory, and contains a plurality of modules including Generative Artificial Intelligence (Gen AI) (322), a Large Language Model (LLM) (318), chunking (308), embedding (310), the LangChain agentic framework (316), a data model, and a vector database (DB) (312). The plurality of modules comprises routines and programs that execute specific tasks, and the vector database (312) serves as a storage space for processed, received, and generated data, including data associated with the invention. The system (102) is built on container technologies, so the system (102) is portable to any cloud platform.
[0040] The system (102) comprises a plurality of components and modules, including Generative Artificial Intelligence (Gen AI) (322), a Large Language Model (LLM) (318), chunking (308), embedding (310), and a vector database (DB) (312). The system (102) takes metadata/dictionary (306) as input via file upload or database connection. The system (102) is integrated into enterprise databases and metadata catalogues and is configured to identify and extract subject areas, entity/table names, attributes, relationships, and data types. The system (102) uses a Large Language Model (LLM) (318) to interpret the supplied metadata (306) and understand the business context with respect to industry best practices. The system (102) utilizes Retrieval-Augmented Generation (RAG) (320), implemented using the LangChain agentic framework (316), to process metadata embeddings (310) using a large language model.
[0041] The system (102) gives the user the flexibility to select relevant subject areas to chunk. The system is configured to perform chunking (308), and then the system (102) creates a plurality of embeddings (310) and stores them in a vector database (312). In an embodiment, the vector database (312) is used to store the plurality of embeddings (310), the metadata repository, and the generated assets store (for SQL, sample data, etc.).
[0042] The system (102) operates LLMs (318) within the organization's metadata scope, preventing hallucination. The user inputs a query into a natural language interface and, accordingly, the system (102) retrieves metadata (306) and recommends potential use cases, data products, and attributes from within the enterprise metadata. In an embodiment, the natural language interface may be referred to as the user interface (UI) (302). The system also provides the flexibility to use global knowledge in addition to the input metadata (306) to provide use case, data product, and attribute recommendations in line with industry best practices.
[0043] The system (102) generates recommendations for analytical use cases based on metadata (306) patterns. It enables direct generation of attributes and data products for predefined use cases while ensuring business mapping according to the provided metadata (306). The system (102) recommends structured data products based on metadata (306), suggests attributes for corresponding data products, and maps attributes back to the original metadata (306) sources (entities and subject areas).
[0044] The system (102) finally generates an output (324) of structured tables for data products, attributes, and use cases while protecting sensitive information. Additionally, the system (102) generates executable SQL queries for execution and data extraction, mapping reports, and synthetic datasets for model training and data monetization.
[0045] In another embodiment, Figure 3 illustrates the architecture (300) of the Gen AI and agentic AI-based system (102). The architecture comprises a user interface (UI) (302) for the users to interact with the system (102). This UI (302) may be used by different users depending on their needs. The system (102) serves all users involved in the end-to-end implementation of AI use cases.
[0046] The system (102) comprises a modern, modular architecture including ingestion, pre-processing (304), and interaction components. The ingestion layer comprises a plurality of metadata connectors for various database systems, a schema extraction service, a relationship mapping engine, and an embedding generation module. The core intelligence layer comprises a vector database (312) (for storage of metadata embeddings), LangChain framework (316) integration, Gen AI model (322) integration, a context management system, and a query generation engine.
[0047] In an embodiment, the interaction layer of the system (102) comprises a web-based user interface (302), a subject area navigator, a use case discovery panel, a data product designer, an SQL generation service, a sample data generator, and a conversational interface with natural language processing.
[0048] In an embodiment, the storage layer (which may be referred to as the storage device (206)) comprises a vector database (312) to store the plurality of embeddings (310), a metadata repository, and a generated assets store (for SQL, sample data, etc.). The plurality of implementation steps to build the architecture are described below in detail.
[0049] Step 1: Metadata (306) ingestion and pre-processing (304): The system (102) is configured to connect with enterprise databases/catalogues and extract subject areas, entity/table names, attributes and their definitions, relationships, and data types. The system (102) uses LLM-based parsing to understand business context from metadata (306). A generative AI-powered, domain-centric (applicable across industries) system (102) takes metadata (306) as input via file upload or database connection. The system (102) is integrated into enterprise databases and metadata catalogues and is configured to identify and extract subject areas, entity/table names, attributes, relationships, and data types.
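By way of non-limiting illustration, the extraction of table names, attributes, data types, and relationships from a connected database may be sketched using Python's standard `sqlite3` module. An in-memory SQLite database stands in for an enterprise source here; the function name `extract_metadata` and the sample tables are hypothetical, and other database systems would require their own catalogue queries.

```python
import sqlite3


def extract_metadata(conn):
    """Return columns with data types and foreign-key relationships per table."""
    metadata = {}
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [(row[1], row[2])
                   for row in conn.execute(f'PRAGMA table_info("{table}")')]
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
        relationships = [(row[3], row[2], row[4])
                         for row in conn.execute(f'PRAGMA foreign_key_list("{table}")')]
        metadata[table] = {"columns": columns, "relationships": relationships}
    return metadata


# Hypothetical source schema used only for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),
    amount REAL
);
""")
metadata = extract_metadata(conn)
```

Each relationship is returned as a (source column, referenced table, referenced column) tuple, which is the raw material for the relationship mapping described above.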
[0050] Step 2: RAG (320)-based approach using the LangChain agentic framework (316): The user may select the subject areas to chunk. Further, the system (102) may create a plurality of embeddings (310). Further, the system (102) may load the vector database (DB) (312) with the selected subject areas and the corresponding entities and attributes. The system (102) may contextualize the global knowledge of the LLMs (318) so that it is restricted to the local metadata (306) of the organization used to create the vector DB (312). The system (102) may provide a natural language interface (302) to generate use cases, data products, or attributes in any order.
[0051] The system (102) uses a Large Language Model (LLM) (318) to interpret the supplied metadata (306) and understand the business context with respect to industry best practices. The system (102) utilizes Retrieval-Augmented Generation (RAG) (320), implemented using the LangChain agentic framework (316), to process metadata embeddings (310) using a large language model. It gives the user the flexibility to select relevant subject areas to chunk, and then the system (102) creates the plurality of embeddings (310) and stores them in a vector database (312).
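By way of non-limiting illustration, chunking of user-selected subject areas may be sketched as grouping metadata rows into one text chunk per entity. The function name `chunk_subject_areas`, the row layout (subject area, entity, attribute), and the chunk text format are assumptions made for illustration; each chunk's text is what would subsequently be passed to an embedding model.

```python
from collections import defaultdict


def chunk_subject_areas(rows, selected):
    """Build one text chunk per entity, restricted to the selected subject areas."""
    grouped = defaultdict(list)
    for subject_area, entity, attribute in rows:
        if subject_area in selected:
            grouped[(subject_area, entity)].append(attribute)
    return [
        {
            "subject_area": area,
            "entity": entity,
            "text": f"{area} / {entity}: {', '.join(attributes)}",
        }
        for (area, entity), attributes in grouped.items()
    ]


# Hypothetical metadata rows: (subject area, entity, attribute).
rows = [
    ("Sales", "orders", "order_id"),
    ("Sales", "orders", "amount"),
    ("Sales", "customers", "customer_id"),
    ("HR", "employees", "employee_id"),
]
chunks = chunk_subject_areas(rows, selected={"Sales"})
```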
[0052] Step 3: Analytical Use Case Recommendation: The system (102) may recommend potential business use cases based on the metadata (306) structure. The system (102) may directly generate data products or attributes if the use case is available in the organization's metadata (306). The system (102) is configured to map business goals/use cases to relevant datasets and attributes. The system (102) may provide explainability for why a specific use case is suggested. The system (102) operates LLMs (318) within the organization's metadata scope, preventing hallucination. The user inputs a query into a natural language interface (302) and, accordingly, the system (102) retrieves metadata (306) and recommends potential use cases, data products, and attributes from within the enterprise metadata. It also provides the flexibility to use global knowledge in addition to the input metadata (306) to provide use case, data product, and attribute recommendations in line with industry best practices.
[0053] Step 4: Data Product Generation: The system (102) may suggest structured data products based on metadata (306) for the corresponding use case. The system (102) may generate attributes for the corresponding data products. The system (102) may provide traceability of attributes to the input metadata (306) (entities and subject areas).
[0054] The system (102) generates recommendations for analytical use cases based on metadata patterns. It enables direct generation of attributes and data products for predefined use cases while ensuring business mapping according to the provided metadata (306). As part of the mapping, the system (102) searches the existing database/catalogue to identify whether any similar data products already exist and, if so, provides attribute-level mapping. Additionally, the generated data products are mapped to the input metadata (306) to provide lineage of data products and attributes. The system (102) recommends structured data products based on metadata (306), suggests attributes for corresponding data products, and maps attributes back to the original metadata (306) sources (entities and subject areas).
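By way of non-limiting illustration, the search for similar existing data products may be sketched with a synonym-aware token overlap (Jaccard similarity) on names. This lexical measure is a stand-in and is not the Gen AI-based search itself; the `SYNONYMS` table, the threshold, and the catalogue entries are hypothetical. The same function could be applied to attribute names for attribute-level mapping.

```python
# Hypothetical synonym table mapping variants to a canonical token.
SYNONYMS = {"client": "customer", "revenue": "sales", "turnover": "sales"}


def normalise(name):
    """Split a name into lower-case tokens and replace known synonyms."""
    tokens = name.lower().replace("-", "_").split("_")
    return {SYNONYMS.get(token, token) for token in tokens if token}


def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_similar(candidate, catalogue, threshold=0.5):
    """Return existing names whose token overlap with the candidate meets the threshold."""
    tokens = normalise(candidate)
    scored = [(jaccard(tokens, normalise(existing)), existing) for existing in catalogue]
    return [name for score, name in sorted(scored, reverse=True) if score >= threshold]


catalogue = ["customer_sales_summary", "employee_attendance", "customer_sales_daily"]
similar = find_similar("client_revenue_summary", catalogue)
```

Here the proposed product `client_revenue_summary` is flagged as a likely duplicate of `customer_sales_summary`, which is the situation the deduplication step is meant to catch before a new product is on-boarded.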
[0055] The system (102) finally generates an output (324) of structured tables for data products, attributes, and use cases by protecting sensitive information. Additionally, the system (102) generates executable SQL Queries for execution and data extraction, Mapping Reports, and Synthetic Datasets for model training and data monetization.
[0056] Step 5: Synthetic Data Creation: The system (102) may generate sample/synthetic data in case of restrictions on the sharing of real data. The system (102) may ensure data utility while preserving privacy. The system (102) may provide AI-ready datasets for training models or marketplace publishing.
[0057] Step 6: Output (324) generation: The system (102) may generate output (324) in tabular format showing the insights (structured tables for data products and attributes, corresponding to the use cases). The system (102) may generate the output (324) of SQL queries for easy execution. The system (102) may generate the output (324) of a mapping report. The synthetic datasets generated by the system (102) are used for Machine Learning (ML) model training and data monetization.
[0058] In an embodiment, the system (102) is configured to receive input parameters from the user interface (302). The input parameters include metadata inputs (306), natural language queries, and subject area selection according to the use case preferences. In an embodiment, the metadata input (306) may include a database name, table names, column names (subject area, entity name, and attribute name), column data types and relationships (foreign keys, joins), and business metadata (if available). In an embodiment, a natural language query may be, for example, "What are the data products required to analyze sales performance in USA and India across all channels?". In an embodiment, the subject area selection for use case preferences may be sales, customer retention, fraud detection, and the like.
[0059] In an embodiment, the system (102) is configured to generate a plurality of outputs (324) including the recommended analytical use cases, generated data products with mapped attributes, business metadata mapping for tables and columns, auto-generated SQL queries for exploration, AI-generated synthetic data, and data products with provision to select attributes and save them in the database.
[0060] In another embodiment, Figure 4 illustrates a flow diagram (400) for the data ingestion (402) and response generation (404) process. The system (102) may follow a structured pattern that enables transition from metadata (306) to actionable insights. The pattern includes metadata (306) ingestion, vectorization, storage, intelligence processing, and response generation (404). In an embodiment, the response generation (404) may be further referred to as the output generation (404).
[0061] The metadata (306) ingestion process is illustrated further in detail. The database metadata is extracted through connectors. The system (102) identifies the relationships and constraints associated with the metadata. Further, the metadata (306) is transformed into a structured format and enriched with context from business glossary terms (if available). This helps the system (102) to understand the primary key and foreign key relationships, data types, and hierarchies of subject areas, entities, and attributes present in the metadata, in order to correlate, analyse, and recommend potential use cases, data products, and attributes.
[0062] In an embodiment, the vectorization and storage process is illustrated further in detail. The metadata (306) elements are converted to the plurality of embeddings (310) using the LLM (318). Further, the embeddings (310) are stored in a vector database (312) with relationship information, and contextual associations are preserved in the vector space.
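By way of a non-limiting illustration, the storage of embeddings together with relationship information may be sketched as follows. This is a minimal sketch under stated assumptions: the function `toy_embed` is a hypothetical hashed bag-of-words stand-in for the LLM (318) embedding call, the class `MetadataVectorStore` is a hypothetical in-memory stand-in for the vector database (312), and the table and column names are invented for the example.

```python
import hashlib
import math

DIM = 64  # assumed toy dimensionality; real embedding models use far larger vectors


def toy_embed(text, dim=DIM):
    """Hashed bag-of-words vector, L2-normalised (stand-in for an LLM embedding)."""
    vec = [0.0] * dim
    for token in text.lower().replace(".", " ").replace("_", " ").split():
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class MetadataVectorStore:
    """In-memory store that keeps each vector next to its relationship information."""

    def __init__(self):
        self.records = []

    def add(self, element_id, text, relationships=()):
        self.records.append({
            "id": element_id,
            "text": text,
            "vector": toy_embed(text),
            "relationships": list(relationships),  # e.g. foreign-key links
        })

    def related(self, element_id):
        """Return the ids of elements linked to the given element."""
        for record in self.records:
            if record["id"] == element_id:
                return [target for _, target in record["relationships"]]
        return []


store = MetadataVectorStore()
store.add("customer.customer_id", "customer customer_id INTEGER primary key")
store.add(
    "sales_order.customer_id",
    "sales_order customer_id INTEGER",
    relationships=[("foreign_key", "customer.customer_id")],
)
```

Keeping the relationship list in the same record as the vector is one simple way to let a later retrieval step follow foreign-key links from a matched element; other storage layouts are equally possible.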
[0063] In an embodiment, the intelligence processing of the system (102) is illustrated in further detail. The user selects a subject area of interest and, accordingly, the system (102) retrieves relevant metadata vectors. The LLM (318) and RAG (320) analyze the metadata context, and then the system (102) generates recommendations including use cases, data products, and attributes related to the input metadata (306).
[0064] The output generation (404) of the system (102) is illustrated in further detail. The system (102) generates executable SQL Queries for execution and data extraction, Mapping Reports, and Synthetic Datasets for model training and data monetization. The system (102) constructs the SQL query based on the selected attributes. In an embodiment, the system (102) provides sample data generation through the LLM (318). The system (102) provides export and download capabilities for generated assets.
[0065] In an embodiment, the metadata (306) ingestion and pre-processing (304) enables the system (102) to extract, analyze, and vectorise database metadata (306), forming the foundation for all subsequent intelligence operations. The technical implementation of the metadata (306) ingestion process is illustrated further. The comprehensive connector framework is configured to support major database systems or file uploads, a custom connector SDK for proprietary systems, and change detection for incremental updates.
[0066] The system (102) is configured to perform the metadata (306) extraction. The system (102) is configured to perform schema, table, and column identification, data type analysis and classification, primary and foreign key relationship mapping, and constraint identification and interpretation.
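By way of a non-limiting illustration, the extraction of tables, columns, data types, and key relationships may be sketched against a SQLite database using the Python standard library. SQLite is chosen here only because it needs no external service; the specification does not name a particular database. The function name `extract_metadata` and the example schema are hypothetical.

```python
import sqlite3


def extract_metadata(conn):
    """Return {table: {"columns": [...], "foreign_keys": [...]}} for a SQLite database."""
    metadata = {}
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    )]
    for table in tables:
        # PRAGMA table_info rows: cid, name, type, notnull, dflt_value, pk
        columns = [
            {"name": c[1], "type": c[2], "primary_key": bool(c[5])}
            for c in conn.execute(f'PRAGMA table_info("{table}")')
        ]
        # PRAGMA foreign_key_list rows: id, seq, table, from, to, ...
        foreign_keys = [
            {"column": f[3], "ref_table": f[2], "ref_column": f[4]}
            for f in conn.execute(f'PRAGMA foreign_key_list("{table}")')
        ]
        metadata[table] = {"columns": columns, "foreign_keys": foreign_keys}
    return metadata


# Hypothetical two-table schema used only for the demonstration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    country TEXT
);
CREATE TABLE sales_order (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(customer_id),
    amount REAL
);
""")
metadata = extract_metadata(conn)
```

The resulting dictionary is one possible structured format for the later chunking and embedding steps; other database systems expose the same information through their own catalogue views.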
[0067] The system (102) supports semantic enhancement. The system (102) integrates with business glossaries. The system (102) is configured to perform column name normalization and classification, entity recognition for domain-specific concepts, relationship strength assessment, and context interpretation through LLM (318) analysis.
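By way of a non-limiting illustration, column name normalization against a business glossary may be sketched as follows. The glossary entries and the function name `business_name` are hypothetical; in the described system this interpretation may instead be performed through LLM (318) analysis.

```python
import re

# Hypothetical glossary mapping technical abbreviations to business terms.
GLOSSARY = {
    "cust": "customer",
    "amt": "amount",
    "dt": "date",
    "id": "identifier",
    "qty": "quantity",
}


def business_name(column):
    """Expand abbreviations in a snake_case column name into a readable label."""
    parts = re.split(r"[_\s]+", column.strip().lower())
    return " ".join(GLOSSARY.get(part, part) for part in parts if part).title()


def map_columns(columns):
    """Keep the technical name next to the business label for traceability."""
    return {column: business_name(column) for column in columns}


mapping = map_columns(["CUST_ID", "order_dt", "sales_amt"])
```

This sketch only handles underscore-separated names; camelCase or free-text column names would need additional splitting rules.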
[0068] The system (102) is further configured to perform the embedding (310) generation. The system (102) applies custom prompt engineering for metadata contextualization. The system (102) uses generative AI-powered embedding (310) creation, relationship-preserving vectorization techniques, incremental update mechanisms for the vector database (312), and context window optimization for large metadata sets.
[0069] In an embodiment, the feature of the subject area selection of the metadata (306) is illustrated further. This feature allows users to focus their exploration on specific domains or segments of their data landscape, ensuring that the system (102) generates relevant and targeted recommendations.
[0070] In an embodiment, the technical implementation of the subject area selection feature of the system (102) is illustrated further. The system (102) is configured to perform the subject area identification of the metadata (306). The system (102) is configured to perform automated clustering of related data entities, schema-based grouping, business glossary integration for domain alignment, user-defined custom groupings, and historical usage pattern analysis.
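By way of a non-limiting illustration, schema-based grouping of related entities may be sketched by treating foreign keys as links and collecting the connected groups of tables. The specification does not state which clustering technique is used; the union-find approach, the function name `group_by_relationships`, and the example tables below are assumptions made for the sketch.

```python
def group_by_relationships(tables, foreign_keys):
    """Group tables that are connected, directly or indirectly, by foreign keys."""
    parent = {table: table for table in tables}

    def find(table):
        # Follow parent links to the group representative, compressing the path.
        while parent[table] != table:
            parent[table] = parent[parent[table]]
            table = parent[table]
        return table

    for child, referenced in foreign_keys:
        parent[find(child)] = find(referenced)

    groups = {}
    for table in tables:
        groups.setdefault(find(table), []).append(table)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical schema: three sales-related tables and two network-related tables.
TABLES = ["customer", "sales_order", "order_line", "cell_site", "alarm"]
FOREIGN_KEYS = [
    ("sales_order", "customer"),
    ("order_line", "sales_order"),
    ("alarm", "cell_site"),
]
candidate_subject_areas = group_by_relationships(TABLES, FOREIGN_KEYS)
```

Each resulting group is only a candidate subject area; glossary alignment or user-defined groupings could then rename, merge, or split the groups.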
[0071] In an embodiment, the system (102) may provide an interactive selection interface. The system (102) provides a visual representation of data domains, hierarchical navigation capabilities, search and filter functionality, and contextual information display.
[0072] In an embodiment, the system (102) is configured to perform the session context management. The system (102) may preserve the selected context across functional areas, support the combination of subject areas, and support context switching with history tracking.
[0073] In an embodiment, the system (102) is configured to perform the use case generation. This feature analyses metadata (306) patterns to identify and suggest valuable business applications for the selected data domains. The technical implementation of the use case generation feature of the system (102) is illustrated further.
[0074] In an embodiment, the system (102) comprises a pattern recognition engine to identify the common data structures and perform the domain-specific pattern matching. The system (102) further comprises a use case recommendation logic. The RAG (320)-powered relevance ranking technique is configured to provide industry-specific use case recommendations, custom prompt construction, and LLM (318)-based use case narrative generation.
[0075] In an embodiment, the system (102) may provide user interaction mechanisms. The system (102) is configured to provide one-click generation of use case recommendations, interactive refinement capabilities, "Add more" functionality with context preservation, use case categorization and filtering, and detailed explanation of recommendations with metadata mapping.
[0076] In an embodiment, the system (102) may provide the data product generation. This feature of the system (102) may transform identified use cases into concrete data product recommendations that may be implemented to address specific business needs. The technical implementation of the data product generation feature of the system (102) is illustrated further.
[0077] In an embodiment, the system (102) may comprise the data product identification process. The system (102) may support use case to data product mapping framework, required entity identification, data transformation recommendation, aggregation level determination, and implementation complexity assessment.
[0078] In an embodiment, the system (102) is configured to perform the relevance optimization. The system (102) comprises a context-aware ranking technique and performs the relationship depth analysis. In an embodiment, the interactive selection interface provides the visual data product exploration, dependency visualization, implementation requirement details, alternative approach suggestions, and "Add more" capability with context.
[0079] In an embodiment, the system (102) may be configured to provide attribute and SQL Query generation. This feature of the system (102) may identify the specific data attributes required to build the selected data products and automatically generate the necessary SQL queries for implementation. The technical implementation of the attribute and SQL generation feature of the system (102) is illustrated further. The attribute identification process is configured to perform the data product requirement analysis, critical attribute detection, optional attribute recommendation, and cross-entity relationship traversal.
[0080] In an embodiment, the system (102) may comprise the SQL Generation Engine. The SQL generation engine is configured to perform the schema-aware query construction, join path optimization, performance consideration for complex queries, database dialect adaptation, and error handling and validation.
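By way of a non-limiting illustration, schema-aware query construction with join path determination may be sketched as a breadth-first search over the foreign-key graph. This is a sketch under stated assumptions: the foreign-key list, the table and column names, and the functions `join_path` and `build_query` are hypothetical, and shortest-path search is only one possible reading of "join path optimization".

```python
from collections import deque

# Hypothetical foreign keys: (child_table, child_column, parent_table, parent_column)
FOREIGN_KEYS = [
    ("sales_order", "customer_id", "customer", "customer_id"),
    ("order_line", "order_id", "sales_order", "order_id"),
    ("order_line", "product_id", "product", "product_id"),
]


def join_path(start, goal, fks):
    """Breadth-first search; returns [(table_to_join, fk), ...] from start to goal."""
    adjacency = {}
    for fk in fks:
        child, _, parent, _ = fk
        adjacency.setdefault(child, []).append((parent, fk))
        adjacency.setdefault(parent, []).append((child, fk))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        table, path = queue.popleft()
        if table == goal:
            return path
        for neighbour, fk in adjacency.get(table, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, path + [(neighbour, fk)]))
    raise ValueError(f"no join path from {start} to {goal}")


def build_query(attributes, fks):
    """Build a SELECT over 'table.column' attributes, joining tables along FK paths."""
    tables = []
    for attribute in attributes:
        table = attribute.split(".")[0]
        if table not in tables:
            tables.append(table)
    base, joined, clauses = tables[0], [tables[0]], []
    for target in tables[1:]:
        for neighbour, (child, child_col, parent, parent_col) in join_path(base, target, fks):
            if neighbour not in joined:
                joined.append(neighbour)
                clauses.append(
                    f"JOIN {neighbour} ON {child}.{child_col} = {parent}.{parent_col}"
                )
    return "\n".join([f"SELECT {', '.join(attributes)}", f"FROM {base}", *clauses])


sql = build_query(["customer.country", "product.name"], FOREIGN_KEYS)
```

Here the two selected attributes live in tables that are not directly related, so the sketch inserts the intermediate `sales_order` and `order_line` joins. Dialect adaptation, aggregation, and validation are outside the scope of the sketch.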
[0081] In an embodiment, the system (102) is configured to perform the sample data generation. This feature of the system (102) may create realistic sample datasets based on the selected attributes and data products, enabling rapid prototyping and validation. The technical implementation of the sample data generation feature of the system (102) is illustrated further.
[0082] In an embodiment, the sample data generation is configured to perform the schema analysis for data type constraints, referential integrity preservation, domain-specific value generation, and relationship consistency.
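By way of a non-limiting illustration, sample data generation that preserves referential integrity may be sketched as follows. The described system may generate values through the LLM (318); this sketch substitutes a seeded pseudo-random generator so that it runs without any external service. The function name `generate_sample`, the table layout, and the channel values are hypothetical; the default of 30 child records follows the default volume of 30-40 records stated in this description.

```python
import random


def generate_sample(n_customers=5, n_orders=30, seed=42):
    """Generate parent and child rows in which every foreign key points at a real parent."""
    rng = random.Random(seed)  # seeded so that repeated runs give identical data
    countries = ["USA", "India"]
    channels = ["online", "retail", "partner"]  # hypothetical domain values

    customers = [
        {"customer_id": i, "country": rng.choice(countries)}
        for i in range(1, n_customers + 1)
    ]
    customer_ids = [c["customer_id"] for c in customers]

    orders = [
        {
            "order_id": i,
            # Drawing the key from existing parents preserves referential integrity.
            "customer_id": rng.choice(customer_ids),
            "channel": rng.choice(channels),
            "amount": round(rng.uniform(10, 500), 2),
        }
        for i in range(1, n_orders + 1)
    ]
    return customers, orders


customers, orders = generate_sample()
```

Generating parent rows first and sampling child keys from them is the simplest way to keep relationships consistent; matching the statistical properties of real data would need additional modelling.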
[0083] In an embodiment, the LLM (318)-powered generation feature of the system (102) is configured to perform the context-aware sample data creation, industry-specific value patterns, realistic data variation, edge case inclusion, and custom prompt engineering for data authenticity.
[0084] In an embodiment, the system (102) may provide the export and format options. The system (102) may provide multiple format support (CSV, Excel), download capabilities, direct database insertion option, sampling strategy selection, and volume control (30-40 records default).
[0085] In an embodiment, the system (102) may provide the conversational Interface. In an embodiment, the conversational interface (302) may be referred to as the user interface (302). This interface (302) provides an alternative, natural language interaction mode where users can directly express their requirements and receive guidance throughout the data discovery process.
[0086] The technical implementation of the conversational interface (302) is illustrated further. The system (102) may comprise a Natural Language Understanding feature. The system (102) is configured to perform intent recognition for data-related queries, entity extraction for database elements, context preservation across conversation turns, ambiguity resolution through clarification, and domain-specific terminology comprehension.
[0087] In an embodiment, the system (102) may be configured to provide conversational flow management. The conversational flow management performs sequential and non-sequential interaction support, contextual response generation (404), conversation history maintenance, and multi-turn reasoning. The system (102) uses the data products and attributes to generate DDL scripts and a data model as required for data-warehouse design and development.
[0088] In an embodiment, the system (102) provides a direct query resolution feature. The system (102) is configured to perform immediate attribute recommendation, data product suggestion from requirements, SQL generation from natural language, and explanation generation for recommendations.
[0089] In some embodiments, the system (102) may simplify data mesh adoption by automatically recommending domain-specific data products, ensuring metadata (306) consistency, interoperability, and self-service analytics. Additionally, the system (102) may facilitate data monetization by creating synthetic, privacy-compliant data products. The system (102) allows businesses to securely share structured datasets on data marketplaces for commercial use. The system (102) may reduce data preparation time by suggesting relevant attributes and generating training/synthetic data, accelerating AI/ML model development while ensuring compliance with data privacy regulations.
[0090] In an embodiment, the system (102) embeds natural language capabilities for discovering existing data products/assets to de-duplicate the target database and leverage an existing set of data products for any business use.
[0091] In an embodiment, the system (102) may bridge the gap between metadata (306) and business insights. The system (102) may empower enterprises to rapidly implement decentralized data governance, unlock new revenue streams, and drive AI innovation. The system (102) may transform data into a strategic asset by leveraging the global knowledge of the LLM (318). The LLM (318) is contextualized on the local knowledge of the organization’s metadata (306), and the system (102) enables a data-driven process for generating insights.
[0092] In an embodiment, the system (102) acts as an AI-powered assistant configured to eliminate manual data discovery, mapping, and structuring efforts. The system (102) automates the process of generating use cases, data products, and synthetic datasets. The system (102) further helps organizations maximize the value of their metadata (306) and increase new business opportunities.
[0093] In an embodiment, the system (102) may provide a significant advancement in the data analytics space by combining the Agentic Framework with Retrieval Augmented Generation (RAG) (320), Large Language Models (LLMs) (318), and vector database (312) technology to transform how organizations discover, analyze, and operationalize their data assets. The system (102) addresses critical challenges in data discovery, use case identification, and data product creation by providing an intelligent framework that understands the relationships between data entities and their potential applications across various business domains.
[0094] In an embodiment, the system (102) is a Generative AI (322)-powered, domain-centric (multi-domain) solution (that leverages advanced natural language processing capabilities combined with the LangChain framework (316) and vector database (312) to analyze enterprise data metadata (306)) designed to accelerate data mesh implementation, automate data product creation, discovery, and enable data monetization with synthetic data. The system (102) helps businesses to automatically identify analytical use cases, generate relevant data products, and map attributes to Organizations' metadata (306) by taking input from organizational metadata and AI-driven insights. The system (102) may eliminate manual effort and ensure a structured, scalable data strategy.
[0095] The system (102) is a Gen AI (322)-powered data intelligence platform that leverages Generative AI model (322), machine learning, and metadata pre-processing (304) to automate the identification of analytical use cases, generation of data products, mapping of attributes to metadata, and creation of synthetic data. The system (102) is designed to accelerate data mesh implementation, facilitate self-service analytics, and enable data monetization.
[0096] The system (102) is configured to provide other features. For example, the system (102) may generate DDL scripts for use with the synthetic data. The system (102) may also generate the data model by using an agentic framework.
[0097] The system is a GenAI and Agentic AI-based system. The system is configured to receive the organization’s metadata as the input (namely, Subject Area, Table Name, Column Names) in order to chunk the metadata input and embed it into the vector database. The proposed system is a domain-centric solution which is applicable to all industries to obtain potential use cases that may be analysed based upon the metadata associated with the organization. The system may also suggest the kind of data products required to enable those use cases. Further, it generates the attributes that are required to be captured to develop those data products, and the system may provide lineage of particular attributes to show from where these attributes are retrieved by the system. Additionally, the system may generate use case-based DDL scripts and a data model, and may generate synthetic data (if required) or leverage the organization’s existing data corresponding to the data products generated by the system, to provide natural language-based actionable insights.
[0098] In an embodiment, the system may follow the agentic workflow to perform the plurality of steps to obtain the desired output. The agentic workflow may comprise a supervisor agent (which may be referred to as the Data Product Manager). The supervisor agent is built with co-worker agents (Use Case Generator, Data Product Generator, Data Attribute Generator, DDL Script Generator, Data Modeler, Synthetic Data Generator, Data Ingestion, Insights Generation). The supervisor agent, built in LangGraph, understands the natural language prompt given by the user and triggers the right agent depending upon the intent and metadata extracted from the user query or prompt. As per one example, if the user has asked to generate data products directly, then the supervisor agent triggers the data product generator agent. As per another example, if the user asks to generate data attributes, DDL scripts, and a data model all together, then the supervisor agent triggers the data attribute generator agent first, then passes the output of the data attribute generator agent to the DDL script generator agent, and the output of the DDL script generator agent is fed to the data model generator agent. Finally, all the outputs are consolidated by the supervisor agent and passed on to the end user in a structured format. The system uses a reasoning model that validates and reasons over the output of all the agents, so that before the final output is shared with the end user, everything is validated and, if required, regenerated to get the right response.
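By way of a non-limiting illustration, the supervisor pattern described above may be sketched in plain Python. This sketch does not use LangGraph and does not call an LLM: intent detection is reduced to keyword matching, the co-worker agents return fixed hypothetical outputs, and the validation step merely checks that each agent produced something (it raises an error rather than regenerating). All function and variable names are assumptions made for the sketch.

```python
# Hypothetical co-worker agents; each reads and extends a shared state dictionary.
def attribute_agent(state):
    state["attributes"] = ["country", "channel", "amount"]
    return state


def ddl_agent(state):
    columns = ",\n  ".join(f"{name} TEXT" for name in state["attributes"])
    state["ddl"] = f"CREATE TABLE data_product (\n  {columns}\n);"
    return state


def data_model_agent(state):
    state["data_model"] = {
        "entity": "data_product",
        "attributes": state["attributes"],
        "ddl": state["ddl"],
    }
    return state


AGENTS = {"attributes": attribute_agent, "ddl": ddl_agent, "data model": data_model_agent}
REQUIRES = {"ddl": ["attributes"], "data model": ["ddl"]}  # upstream dependencies
OUTPUT_KEY = {"attributes": "attributes", "ddl": "ddl", "data model": "data_model"}


def detect_intents(prompt):
    """Keyword matching as a stand-in for LLM-based intent recognition."""
    text = prompt.lower()
    return [name for name in AGENTS if name in text]


def plan(intents):
    """Order the requested steps so that each agent runs after its dependencies."""
    ordered = []

    def visit(step):
        for dependency in REQUIRES.get(step, []):
            visit(dependency)
        if step not in ordered:
            ordered.append(step)

    for intent in intents:
        visit(intent)
    return ordered


def supervisor(prompt):
    """Route the prompt through the planned agents and consolidate their outputs."""
    state = {"prompt": prompt, "trace": []}
    for step in plan(detect_intents(prompt)):
        state = AGENTS[step](state)
        if not state.get(OUTPUT_KEY[step]):  # minimal validation before moving on
            raise RuntimeError(f"agent '{step}' produced no output")
        state["trace"].append(step)
    return state


result = supervisor("Generate data attributes, DDL scripts and the data model")
```

A request for only a DDL script still runs the attribute agent first, because the dependency table records that the DDL agent consumes attribute output.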
[0099] In an embodiment, the system may use RAG and the LLM with reasoning capabilities to provide an explanation of each stage's output with proper reasoning, and to feed that output as the input for the next agent. The system may provide two provisions. The first provision generates output by using the organization metadata and contextualizing the LLM’s global knowledge by applying it to the input data or metadata. The second provision augments the LLM’s global knowledge in conjunction with the metadata so that industry-standard use cases and data products may be generated by the system.
[00100] In an embodiment, the metadata is loaded into the vector DB, and then the retrieval engine is configured to analyse the user input (given in the form of natural language), which may be a use case, a data product, etc., to correlate it with the vectors present in the vector DB. The system may perform a semantic search and retrieve relevant information. The retrieved information is passed to the LLM for reasoning and evaluation. Finally, the system generates the response and returns it to the user in natural language. In an embodiment, the user's natural language queries are also converted into vectors and matched and correlated with the vectors present in the vector DB (consisting of metadata) to retrieve the most contextual answers, which are then refined with the LLM and its reasoning capabilities to generate the final answers.
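By way of a non-limiting illustration, the retrieval flow above may be sketched as follows: the query and the metadata descriptions are converted into vectors, ranked by cosine similarity, and the best matches are placed into a prompt for the LLM. This sketch substitutes simple term-frequency vectors for LLM embeddings and stops at prompt construction, so no model is called. The metadata strings, the prompt wording, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def vectorise(text):
    """Term-frequency vector over lowercase alphanumeric tokens (embedding stand-in)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    """Return the k metadata descriptions most similar to the query."""
    query_vector = vectorise(query)
    ranked = sorted(
        documents, key=lambda doc: cosine(query_vector, vectorise(doc)), reverse=True
    )
    return ranked[:k]


def build_prompt(query, documents, k=2):
    """Assemble retrieved metadata and the user question into one LLM prompt."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return f"Answer using only this metadata:\n{context}\n\nQuestion: {query}"


# Hypothetical metadata descriptions, one per subject area.
METADATA = [
    "Sales subject area: table sales_order with columns order_id, customer_id, channel, amount",
    "Customer subject area: table customer with columns customer_id, country, segment",
    "Network subject area: table cell_site with columns site_id, latency, throughput",
]
top = retrieve("sales amount by channel", METADATA, k=1)
prompt = build_prompt("sales amount by channel", METADATA, k=1)
```

Restricting the prompt to retrieved metadata is one way to keep the model's answer within the organization's metadata scope, as the description requires.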
[00101] In an embodiment, the system may deliver multiple benefits to customers across multiple industries, which are listed below.
[00102] In an embodiment, the complexity and volume of data models keep increasing as organizations grow, merge, and acquire. Use case ideation slows down and becomes a time-consuming process in which a domain SME, Data Scientist, Business Analyst, and Source System Analyst ideate on which analytical use cases can be generated (such as network performance and optimization, dynamic pricing, quality of service, predicting the right campaigns, etc.). Once the use cases are identified, manual efforts are needed to derive data products, map the data required, create data models, and generate insights. The system may automate the plurality of steps by using a series of agents. The series of agents works as humans would to retrieve the results for necessary validations, thereby reducing manual workload.
[00103] In an embodiment, the system may be used for data modernization. The system may use the data mesh architecture that is built on the framework of data products that are generated across various data domains.
[00104] In an embodiment, the system may be used in data monetization by hosting data products on the data marketplace. The system may have the feature of synthetic data generation, which ensures that the data products retain the organization’s structure while avoiding PII, data privacy, and compliance issues.
[00105] In an embodiment, the system (102) provides automated metadata ingestion and vectorization that preserves relationships between data entities.
[00106] In an embodiment, the system (102) may provide automated use case discovery. The system (102) may provide Gen AI (322)-driven use case identification based on industry contexts and data patterns.
[00107] In an embodiment, the system (102) may provide the data product generation. The system (102) may recommend structured data products with mapped attributes and metadata alignment.
[00108] In an embodiment, the system (102) may provide the attribute and metadata mapping. The system (102) may link business-friendly names to technical column names across databases.
[00109] In an embodiment, the system (102) may provide synthetic data generation. The system (102) may produce privacy-preserving synthetic data products for AI training and monetization.
[00110] In an embodiment, the system (102) uses the Natural Language Interface (302) that enables business users to interact with data using conversational AI.
[00111] In an embodiment, the system (102) may provide rationale and explanations for generated use cases, product recommendations, and metadata mappings.
[00112] In an embodiment, the system (102) may provide Data Mesh and Marketplace support to facilitate decentralized data ownership and to generate data products that may be published to data marketplaces.
[00113] In an embodiment, the system (102) may provide multi-domain support to enable domain-centric generation of use cases, data products, and attributes for multiple industries without the need for customization.
[00114] In an embodiment, the system (102) may provide de-duplication of the database. The system (102) uses NLP capability to search and discover data products, which further helps to avoid on-boarding an additional data product if a similar product already exists.
[00115] In an embodiment, the system (102) offers unique data and analytics capabilities with Generative AI-driven automation and contextual intelligence. The system (102) provides a plurality of features, such as:
[00116] In an embodiment the system (102) may provide a structured Metadata Vectorization. The system (102) may provide custom embedding (310) techniques for schema elements, Relationship-preserving vector representations, Hierarchical metadata encoding, Context window optimization for database structures, and incremental update mechanism for evolving schemas.
[00117] In an embodiment, the system (102) may provide a metadata-aware prompt Engineering. The system (102) may provide dynamic prompt construction incorporating schema elements, relationship context inclusion in prompts, a technical-to-business translation layer, domain-specific instruction tuning, and context compression techniques for large schemas.
[00118] In an embodiment, the system (102) may provide a hybrid Retrieval mechanism. The system (102) may combine the vector similarity, relationship-based retrieval, contextual relevance, and temporal awareness for schema evolution.
[00119] In an embodiment, the system (102) may provide a pattern recognition capability. The system (102) is configured to perform identification of common data structures across domains, schema pattern matching against known use cases, relationship complexity analysis, data completeness assessment for use case viability, and implementation requirement determination.
[00120] In an embodiment, the system (102) may provide a schema-aware query construction. The system (102) provides the features of database structure understanding, optimal join path determination, appropriate aggregation level selection, performance consideration for complex queries, and error prevention through validation.
[00121] In an embodiment, the system (102) may utilise a LangChain framework (316). The LangChain framework (316) is used for custom chain development for metadata analysis, memory mechanisms for context preservation, tool integration for database interaction, prompt template management for consistent outputs (324), and integration of the Gen AI model (322). The Gen AI model (322) configuration is illustrated further.
[00122] In an embodiment, the system (102) may provide fine-tuned prompts for database domains, temperature adjustment for different functional areas, context window optimization techniques, and token usage efficiency measures.
[00123] In an embodiment, the system (102) may comprise an Embedding Storage Architecture. The architecture provides a custom schema for metadata representation, hierarchical indexing strategies, performance optimization for retrieval operations, and incremental update support.
[00124] In an embodiment, the system (102) may provide retrieval mechanisms. The retrieval mechanism provides a hybrid search combining vector similarity and relationships, context-aware filtering, and multi-dimensional query support.
[00125] In an embodiment, the system (102) may provide a progressive disclosure pattern. The system (102) provides step-by-step guidance through the discovery process, contextual information display at each stage, detailed expansion on demand, visual relationship representation, and intuitive navigation between functional areas.
[00126] In an embodiment, the plurality of features of the system (102) is used in different industries for different purposes.
[00127] In some embodiments, the system (102) may provide AI-driven use case discovery. The system (102) may use Generative AI (322) to automatically identify analytical use cases from metadata, eliminating manual exploration.
[00128] In some embodiments, the system (102) may provide automated data product structuring for Data Mesh. The system (102) may recommend and structure domain-specific data products to accelerate data mesh implementation.
[00129] In an embodiment, the system (102) may provide AI-powered attribute suggestions. The system (102) may dynamically suggest KPIs and relevant attributes based on table structures.
[00130] In an embodiment, the system (102) may provide secure monetization with synthetic data. The system (102) may create privacy-compliant synthetic datasets for AI training and marketplace sharing.
[00131] In an embodiment, the system (102) may provide conversational AI for Data Exploration. The system (102) may enable natural language querying, eliminating the need for SQL or technical expertise.
[00132] In an embodiment, the system (102) may provide fully automated insight generation, reducing manual effort in data discovery, metadata mapping, and use case identification.
[00133] In an embodiment, the system (102) may provide AI-powered contextualization, where the system (102) uses the agentic framework of LangChain and LangGraph (LLMs (318) and RAG (320) approach) to understand business data relationships. The LangGraph agentic framework uses an AI agent, an LLM, a storage medium, and tools. These components are used by all the co-worker agents working with the supervisor agent to perform their tasks individually, such as generating use cases, data products, attributes, DDL scripts, data models, and synthetic data, and searching for similar data products. All of these tasks are performed by the worker agents under the supervision of the supervisor agent.
[00134] In an embodiment, the system (102) may enable Data Mesh and Marketplace Publishing to accelerate enterprise-wide data decentralization & monetization.
[00135] In an embodiment, the system (102) may generate the SQL query to execute the corresponding data products and attributes.
[00136] In an embodiment, the system (102) works with metadata of any industry to generate domain centric use cases and data products without the need to update the back-end.
[00137] In an embodiment, the system (102) may work with any LLM (318) (commercial or Open-source).
[00138] In an embodiment, the system (102) may provide the data product discovery and de-duplication. The system (102) may not only generates use cases and data products. However, the system (102) also has capability to do discover existing similar data products (1:1 or 1:M tables) that helps to reduce the data-asset on-boarding time and also de-duplicate the target database to enhance efficiency and reduce maintenance cost.
[00139] In an embodiment, the system (102) uses Gen AI (322) to transform metadata into actionable insights. The system (102) may accelerates data-driven decision-making, fosters AI innovation, enables data mesh adoption, and unlocks new revenue streams through synthetic data monetization, while ensuring compliance and data security.
[00140] In an embodiment, the present invention provide a Generative Artificial Intelligence (Gen AI) and Agentic AI-based recommendation system. The system takes metadata (Subject area, table name and column name), dictionary as input via file upload or database connection. The system is tuned to offer domain centric, industry specific solution that caters to multiple verticals without need to change anything in the solution components. The system is integrated into enterprise databases and metadata catalogues, configured to identify and extract subject areas, entity/table names, attributes, relationships, and data types. The system uses a Large Language Model (LLM) to interpret the metadata passed to understand the business context w.r.t to industry best practices. The system utilizes a Retrieval-Augmented Generation (RAG) called as LangChain and LangGraph Agentic Framework, to process metadata embedding’s using a large language model. The system gives user the flexibility to selects relevant subject areas to chunk, and then the system creates embedding’s and stores them in a vector database. The system operates LLMs within the organization’s metadata scope, preventing hallucination. The user input a query into a Natural Language Interface and accordingly, the system retrieves metadata and recommends potential use cases, data products, and attribute from within the enterprise metadata. The system may also provides flexibility to use global knowledge in addition to the input metadata to provide use case, data products and attributes recommendations as per industries best practices.
[00141] In an embodiment, the system generates recommendations for analytical use cases based on metadata patterns. The system offers two workflows: a) it can generate use cases from the organization's metadata, after which any or all of the use cases can be selected to generate data products; or b) it enables users to directly generate attributes and data products for predefined use cases while ensuring business mapping according to the provided metadata. As part of the mapping, the system searches the existing database/catalogue to identify whether any similar data products already exist. If so, the system provides attribute-level mapping using its Gen AI-based intelligent search capabilities, which search by synonyms, intent, and metadata context to surface similar data products or data assets that may already be present, and hence helps avoid duplication in the target database. Additionally, the generated data products are mapped to the input metadata to provide lineage of data products and attributes. The system recommends structured data products based on metadata, suggests attributes for corresponding data products, and maps attributes back to the original metadata sources (entities, subject areas, schema, source system).
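By way of a non-limiting illustration, the attribute-level mapping used to detect similar existing data products may be sketched as follows. The sketch substitutes a small hand-written synonym table and token overlap for the Gen AI-based intelligent search described above. The names `SYNONYMS`, `normalize`, `map_attributes`, and `overlap_ratio`, and the 0.5 threshold, are hypothetical and do not appear in the specification.

```python
# Hypothetical synonym table; the described system would use Gen AI-based
# semantic search (synonyms, intent, metadata context) instead.
SYNONYMS = {"cust": "customer", "client": "customer",
            "acct": "account", "amt": "amount"}


def normalize(attribute):
    """Split an attribute name into canonical lower-case tokens."""
    tokens = attribute.lower().replace("-", "_").split("_")
    return frozenset(SYNONYMS.get(t, t) for t in tokens if t)


def map_attributes(proposed, existing, threshold=0.5):
    """Map each proposed attribute to its best existing match, or None."""
    mapping = {}
    for p in proposed:
        p_tokens = normalize(p)
        best, best_score = None, 0.0
        for e in existing:
            e_tokens = normalize(e)
            union = p_tokens | e_tokens
            # Jaccard similarity over canonical tokens.
            score = len(p_tokens & e_tokens) / len(union) if union else 0.0
            if score > best_score:
                best, best_score = e, score
        mapping[p] = best if best_score >= threshold else None
    return mapping


def overlap_ratio(mapping):
    """Fraction of proposed attributes already covered by an existing product."""
    if not mapping:
        return 0.0
    return sum(1 for v in mapping.values() if v is not None) / len(mapping)


proposed = ["cust_id", "acct_amt", "churn_score"]
existing = ["customer_id", "account_amount", "region"]
mapping = map_attributes(proposed, existing)
print(mapping, overlap_ratio(mapping))
```

A high overlap ratio would suggest that an equivalent data product may already exist and could be reused rather than created again.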
[00142] In an embodiment, the present invention provides a system (102) for automated metadata analysis and data product generation. The system includes a storage device and a processing unit (202). The processing unit provides a user interface (302). This interface receives structured and unstructured data sources. The processing unit runs a metadata extraction module (306). This module extracts metadata and converts it into a structured format. It preserves relationships and contextual associations. The processing unit runs a chunking module (308). This module segments selected subject areas into data chunks. Segmentation is based on user input from the interface. The data chunks are converted into embeddings using a large language model (LLM) (318). These embeddings represent metadata in a vectorised format. They retain relationships and context. The embeddings are stored in a vector database (312). The processing unit indexes the embeddings hierarchically. It supports incremental updates to handle evolving metadata. A retrieval-augmented generation (RAG) module (320) retrieves relevant embeddings. Retrieval is based on natural language queries from the user interface. The LLM processes the retrieved embeddings. It generates recommendations for analytical use cases, data products, or attributes. An output generation module (404) produces structured outputs (324). It also generates privacy-compliant datasets. These datasets preserve statistical properties of the original data sources.
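By way of a non-limiting illustration, generating a dataset that preserves statistical properties of an original numeric column may be sketched as follows. The sketch assumes a simple Gaussian fit, which preserves only the mean and standard deviation of a single column and does not by itself establish privacy compliance. The function name `synthesize_column` and the sample values are hypothetical.

```python
import random
import statistics


def synthesize_column(values, n, seed=0):
    """Draw n synthetic values from a normal distribution fitted to `values`.

    Only the mean and standard deviation of the original column are carried
    over; no original record is copied into the output.
    """
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [rng.gauss(mu, sigma) for _ in range(n)]


# Hypothetical original column (for example, transaction amounts).
original = [52.0, 48.5, 61.2, 45.0, 55.3, 58.1, 49.9, 50.7]
synthetic = synthesize_column(original, n=5000)

print(round(statistics.mean(original), 2), round(statistics.mean(synthetic), 2))
print(round(statistics.stdev(original), 2), round(statistics.stdev(synthetic), 2))
```

Preserving correlations across columns or non-Gaussian distributions would require a richer generative model than this sketch.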
[00143] In an embodiment, the present invention provides that the user interface (302) is configured to support input of both structured and unstructured data sources. Structured data sources include relational databases. Unstructured data sources include natural language queries, text files, and documents. The interface supports input from multiple user devices. These devices include computers, mobile devices, and IoT devices.
[00144] In an embodiment, the present invention provides that the vector database (312) is a FAISS-based index. It is coupled with the storage device. The processing unit (202) stores high-dimensional metadata embeddings. These embeddings enable efficient similarity search.
[00145] In an embodiment, the present invention provides that the RAG module (320) uses the LangChain agentic framework (316). This framework enhances contextual understanding of metadata queries.
[00146] In an embodiment, the present invention provides that the processing unit (202), when executing the metadata extraction module (306), connects to structured and unstructured data sources. Structured sources include relational database management systems. Unstructured sources include file uploads. The connection is made via a connector framework. The processing unit extracts schema information. It enriches metadata with domain-specific contextual associations.
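By way of a non-limiting illustration, extracting schema information from a relational source may be sketched as follows. SQLite is assumed here only because it ships with the Python standard library; the specification does not name a particular database or connector. The function name `extract_schema` and the example tables are hypothetical.

```python
import sqlite3


def extract_schema(conn):
    """Return table names, column types, and foreign-key relationships."""
    schema = {}
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    for table in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [(r[1], r[2])
                   for r in conn.execute(f'PRAGMA table_info("{table}")')]
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
        foreign_keys = [(r[3], r[2], r[4])
                        for r in conn.execute(f'PRAGMA foreign_key_list("{table}")')]
        schema[table] = {"columns": columns, "foreign_keys": foreign_keys}
    return schema


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(id),
        total REAL
    );
""")
schema = extract_schema(conn)
print(schema["orders"]["foreign_keys"])
```

The resulting dictionary keeps the relationships between tables alongside the column metadata, which is the information later steps rely on for chunking and lineage.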
[00147] In an embodiment, the present invention provides that the processing unit (202), when executing the chunking module (308), performs automated clustering. Clustering is based on schema relationships and historical usage patterns. It groups related metadata entities.
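By way of a non-limiting illustration, clustering related metadata entities may be sketched as grouping tables into connected components, where two tables are linked either by a foreign key or by being queried together often enough. The union-find approach, the function name `cluster_tables`, the `min_co_usage` threshold, and the example data are assumptions made for this sketch; the specification does not prescribe a clustering algorithm.

```python
from collections import defaultdict


def cluster_tables(tables, foreign_keys, co_usage=None, min_co_usage=5):
    """Group tables linked by foreign keys or by frequent joint usage."""
    parent = {t: t for t in tables}

    def find(t):
        # Follow parent links to the cluster root, compressing the path.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in foreign_keys:
        union(a, b)
    for (a, b), count in (co_usage or {}).items():
        if count >= min_co_usage:  # ignore incidental co-occurrence
            union(a, b)

    groups = defaultdict(list)
    for t in tables:
        groups[find(t)].append(t)
    return sorted(sorted(g) for g in groups.values())


tables = ["customer", "orders", "order_items", "employee", "payroll", "audit_log"]
foreign_keys = [("orders", "customer"), ("order_items", "orders")]
co_usage = {("employee", "payroll"): 12, ("audit_log", "customer"): 1}
print(cluster_tables(tables, foreign_keys, co_usage))
```

Each resulting group could then be treated as one chunk, so that related entities are embedded together.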
[00148] In an embodiment, the present invention provides that the processing unit (202), when managing the vector database (312), implements a custom schema. This schema stores embeddings with metadata relationships and contextual associations. The processing unit uses hybrid search techniques. These combine vector similarity and relationship-based retrieval. It also supports incremental updates. Embeddings can be added or modified without rebuilding the database.
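By way of a non-limiting illustration, hybrid search with incremental updates may be sketched as follows. The sketch uses an in-memory dictionary and exact cosine similarity as a stand-in for the vector database (312), and expands vector hits with explicitly stored relationships. The class name `MetadataVectorStore`, its methods, and the three-dimensional example embeddings are hypothetical.

```python
import math


class MetadataVectorStore:
    """In-memory stand-in for a vector database with relationship metadata."""

    def __init__(self):
        self.vectors = {}  # key -> embedding
        self.related = {}  # key -> set of related keys

    def upsert(self, key, vector, related=()):
        """Add or modify one embedding without rebuilding the store."""
        self.vectors[key] = list(vector)
        self.related[key] = set(related)

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def search(self, query, k=1):
        """Return the top-k vector hits followed by their related entries."""
        ranked = sorted(self.vectors,
                        key=lambda key: self._cosine(query, self.vectors[key]),
                        reverse=True)
        results = ranked[:k]
        for key in ranked[:k]:
            for neighbour in sorted(self.related[key]):
                if neighbour in self.vectors and neighbour not in results:
                    results.append(neighbour)
        return results


store = MetadataVectorStore()
store.upsert("customer.email", [0.9, 0.1, 0.0], related=["customer.id"])
store.upsert("customer.id", [0.1, 0.9, 0.0])
store.upsert("order.total", [0.0, 0.1, 0.9], related=["customer.id"])
print(store.search([1.0, 0.0, 0.0], k=1))

# Incremental update: a new embedding becomes searchable immediately.
store.upsert("customer.email_address", [1.0, 0.0, 0.0], related=["customer.id"])
print(store.search([1.0, 0.0, 0.0], k=1))
```

A production index would replace the exhaustive scan with an approximate nearest-neighbour structure, while the relationship expansion step could remain conceptually the same.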
[00149] In an embodiment, the present invention provides that the processing unit (202), when executing the RAG module (320) using the LangChain agentic framework (316), employs a supervisor agent. This agent interprets natural language queries. It coordinates tasks among co-worker agents. These agents include a use case generator, a data product generator, and an attribute generator. The system performs semantic search to retrieve relevant embeddings based on query intent. The LLM (318) validates and refines recommendations. This ensures contextual accuracy and relevance.
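By way of a non-limiting illustration, the coordination between a supervisor agent and its co-worker agents may be sketched in plain Python as follows. The sketch does not use the LangChain or LangGraph APIs; keyword matching stands in for LLM-based query interpretation, and each co-worker returns a placeholder string rather than a generated recommendation. All function names and the default routing choice are hypothetical.

```python
def use_case_generator(query):
    """Placeholder co-worker; a real agent would call the LLM here."""
    return f"use cases for: {query}"


def data_product_generator(query):
    return f"data products for: {query}"


def attribute_generator(query):
    return f"attributes for: {query}"


# Intent keyword -> co-worker agent.
CO_WORKERS = {
    "use case": use_case_generator,
    "data product": data_product_generator,
    "attribute": attribute_generator,
}


def supervisor(query):
    """Interpret the query and dispatch it to the matching co-workers."""
    lowered = query.lower()
    selected = [intent for intent in CO_WORKERS if intent in lowered]
    if not selected:
        selected = ["use case"]  # assumed default when no intent is detected
    return {intent: CO_WORKERS[intent](query) for intent in selected}


result = supervisor("Suggest data products and attributes for churn analysis")
print(list(result))
```

Keeping the routing table separate from the co-worker functions means a further generator could be added without changing the supervisor logic.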
[00150] In an embodiment, the present invention provides a method (500) for automated metadata analysis and data product generation. The method begins by receiving structured and unstructured data sources. Data is received via a user interface (302) provided by a processing unit (202). The processing unit extracts metadata using a metadata extraction module (306). The metadata is transformed into a structured format. Relationships and contextual associations are preserved. The processing unit segments selected subject areas into data chunks. Segmentation is done using a chunking module (308). It is based on user input from the interface. The data chunks are converted into embeddings using a large language model (LLM) (318). The embeddings represent metadata in a vectorised format. They retain relationships and context. The embeddings are stored in a vector database (312). The processing unit indexes them hierarchically. It supports incremental updates to handle evolving metadata. A retrieval-augmented generation (RAG) module (320) retrieves relevant embeddings. Retrieval is based on natural language queries from the user interface. The LLM processes the retrieved embeddings. It generates recommendations for analytical use cases, data products, or attributes. An output generation module (404) produces structured outputs (324). It also generates privacy-compliant datasets. These datasets preserve statistical properties of the original data sources.
[00151] In an embodiment, the system (102) is a Gen AI (322)-powered solution that transforms metadata into structured insights, automates data product discovery, use case identification, and synthetic data generation, and enables data monetization. The system (102) may be used across multiple industries, platforms, and services.
[00152] Enterprise Data and Analytics Platforms: The system (102) may reveal hidden insights by recommending potential use cases for cross-functional, actionable insights.
[00153] AI and Machine Learning Applications: The system (102) may enable AI model training with synthetic data, feature engineering, and metadata-driven ML pipelines.
[00154] Data Monetization and Marketplaces: The system (102) may help businesses create structured, privacy-compliant datasets for secure data sharing and monetization.
[00155] Privacy, Security, and Compliance Solutions: The system (102) may automate data anonymization, regulatory reporting, and privacy-preserving AI training.
[00156] Data Mesh: The system (102) may enhance data mesh adoption.
[00157] BFSI: The system (102) may be used for fraud detection, risk analytics, credit scoring.
[00158] Retail: The system (102) may be used for customer segmentation, demand forecasting.
[00159] Healthcare: The system (102) may be used for AI-powered patient insights, clinical trial data synthesis.
[00160] Manufacturing: The system (102) may be used for predictive maintenance, supply chain optimization.
[00161] Telecom: The system (102) may be used for network optimization, churn prediction.
[00162] The system (102) may be used in organizations for uncovering new insights, optimizing AI workflows, and maximizing the business value of their data faster, smarter, and at scale.
[00163] In an embodiment, the present invention discloses a method (500) for automated metadata analysis and data product generation. The method includes receiving, via a user interface (302) provided by a processing unit (202) coupled with a storage device, structured and/or unstructured data sources.
[00164] The method includes extracting, by the processing unit (202), metadata using the metadata extraction module (306) and transforming the metadata into a structured format, preserving relationships and contextual associations.
[00165] The method includes segmenting, by the processing unit (202), selected subject areas of the metadata into data chunks using the chunking module (308) based on user input received via the user interface (302).
[00166] The method includes converting, by the processing unit (202), the data chunks into a plurality of embeddings using the LLM (318), wherein the embeddings represent the metadata in a vectorised format while preserving relationships and contextual associations.
[00167] The method includes storing, by the processing unit (202), the plurality of embeddings in the vector database (312), indexing the embeddings hierarchically to enable efficient retrieval, and supporting incremental updates to accommodate evolving metadata.
[00168] The method includes executing, by the processing unit (202), the RAG module (320) to retrieve relevant embeddings from the vector database (312) based on natural language queries received via the user interface (302).
[00169] The method includes processing the retrieved embeddings using the LLM (318) to generate recommendations for analytical use cases, data products, or attributes.
[00170] The method includes executing, by the processing unit (202), the output generation module (404) to produce structured outputs (324) and generate privacy-compliant datasets preserving statistical properties of the received structured and/or unstructured data sources.
[00171] The method further comprises ranking and refining suggested data products using feedback from users or automated model validation.
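By way of a non-limiting illustration, ranking suggested data products using user feedback may be sketched as follows. The sketch assumes each candidate carries a model-assigned relevance score and that users submit votes of +1 or -1. The additive scoring formula, the `feedback_weight` parameter, and the example names are hypothetical.

```python
def rank_data_products(candidates, feedback, feedback_weight=0.5):
    """Order candidates by model score adjusted by average user vote.

    candidates: {name: model relevance score}
    feedback:   {name: list of +1 / -1 votes}
    """
    def combined(name):
        votes = feedback.get(name, [])
        average_vote = sum(votes) / len(votes) if votes else 0.0
        return candidates[name] + feedback_weight * average_vote

    return sorted(candidates, key=combined, reverse=True)


candidates = {"customer_360": 0.70, "churn_features": 0.65, "sales_summary": 0.80}
feedback = {"churn_features": [1, 1, 1], "sales_summary": [-1, 1, -1]}
print(rank_data_products(candidates, feedback))
```

In this example, consistent positive feedback lifts a lower-scored candidate above one that the model initially preferred.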
[00172] Equivalents
[00173] With respect to the use of substantially any plural and/or singular terms herein, those having skill in the art can translate from the plural to the singular and/or from the singular to the plural as is appropriate to the context and/or application. The various singular/plural permutations may be expressly set forth herein for the sake of clarity.
[00174] It will be understood by those within the art that, in general, terms used herein, and especially in the appended claims (e.g., bodies of the appended claims) are generally intended as "open" terms (e.g., the term "including" should be interpreted as "including but not limited to," the term "having" should be interpreted as "having at least," the term "includes" should be interpreted as "includes but is not limited to," etc.). It will be further understood by those within the art that if a specific number of an introduced claim recitation is intended, such an intent will be explicitly recited in the claim, and in the absence of such recitation no such intent is present.
[00175] Although implementations for the Gen AI and agentic AI-based system and method have been described in language specific to structural features and/or methods, it is to be understood that the appended claims are not necessarily limited to the specific features described. Rather, the specific features are disclosed as examples of implementation for the Gen AI and agentic AI-based system and method.
CLAIMS:
1. A generative and agentic artificial intelligence-based system (102) for automated metadata analysis and data product generation, comprising:
a storage device coupled with a processing unit (202), wherein the processing unit (202) is configured to:
provide a user interface (302) to receive structured and/or unstructured data sources;
execute a metadata extraction module (306) to extract metadata from structured and unstructured data sources and transform the metadata into a structured format, preserving relationships and contextual associations;
execute a chunking module (308) to segment selected subject areas of the metadata into data chunks based on user input received via the user interface (302);
convert the data chunks into a plurality of embeddings using a large language model (LLM) (318), wherein the embeddings represent the metadata in a vectorised format while preserving relationships and contextual associations, and store the embeddings in a vector database (312);
index the embeddings hierarchically in the vector database (312) and support incremental updates to accommodate evolving metadata;
execute a retrieval-augmented generation (RAG) module (320), to retrieve relevant embeddings from the vector database (312) based on natural language queries received via the user interface (302);
process the retrieved embeddings using the LLM (318) to generate recommendations for analytical use cases, data products, or attributes; and
execute an output generation module (404) coupled with the storage device to produce structured outputs (324) based on the processed embeddings, and produce privacy-compliant datasets preserving statistical properties of the received structured and/or unstructured data sources.

2. The system (102) as claimed in claim 1, wherein the user interface (302) is further configured to support input of structured data sources, comprising relational databases, and unstructured data sources, comprising natural language query input, text files, and documents, from a plurality of user devices comprising computers, mobile devices, and IoT devices.
3. The system (102) as claimed in claim 1, wherein the vector database (312) is a FAISS-based index coupled with the storage device, and the processing unit (202) is configured to store high-dimensional metadata embeddings for efficient similarity search.

4. The system (102) as claimed in claim 1, wherein the RAG module (320) utilizes the LangChain agentic framework (316) to enhance contextual understanding of metadata queries.

5. The system (102) as claimed in claim 1, wherein the processing unit (202), in executing the metadata extraction module (306), is further configured to:
connect to structured data sources, comprising relational database management systems, and unstructured data sources, comprising file uploads, via a connector framework; and
extract schema information and enrich metadata with domain-specific contextual associations.

6. The system (102) as claimed in claim 1, wherein the processing unit (202), in executing the chunking module (308), is further configured to perform automated clustering of related metadata entities based on schema relationships and historical usage patterns.

7. The system (102) as claimed in claim 1, wherein the processing unit (202), in managing the vector database (312), is configured to:
implement a custom schema for storing embeddings with metadata relationships and contextual associations;
utilize hybrid search techniques combining vector similarity and relationship-based retrieval for efficient query processing; and
support incremental updates by adding or modifying embeddings without rebuilding the vector database (312).

8. The system (102) as claimed in claim 1, wherein the processing unit (202), in executing the RAG module (320) implemented using the LangChain agentic framework (316), is further configured to:
employ a supervisor agent to interpret natural language queries and coordinate tasks among co-worker agents, including a use case generator, a data product generator, and an attribute generator;
perform semantic search to retrieve relevant embeddings from the vector database (312) based on query intent; and
validate and refine recommendations using the LLM (318) to ensure contextual accuracy and relevance.

9. A method (500) for automated metadata analysis and data product generation, comprising:
receiving, via a user interface (302) provided by a processing unit (202) coupled with a storage device, structured and/or unstructured data sources;
extracting, by the processing unit (202), metadata using the metadata extraction module (306) and transforming the metadata into a structured format, preserving relationships and contextual associations;
segmenting, by the processing unit (202), selected subject areas of the metadata into data chunks using the chunking module (308) based on user input received via the user interface (302);
converting, by the processing unit (202), the data chunks into a plurality of embeddings using the LLM (318), wherein the embeddings represent the metadata in a vectorised format while preserving relationships and contextual associations;
storing, by the processing unit (202), the plurality of embeddings in the vector database (312), indexing the embeddings hierarchically to enable efficient retrieval, and supporting incremental updates to accommodate evolving metadata;
executing, by the processing unit (202), the RAG module (320) to retrieve relevant embeddings from the vector database (312) based on natural language queries received via the user interface (302);
processing the retrieved embeddings using the LLM (318) to generate recommendations for analytical use cases, data products, or attributes; and
executing, by the processing unit (202), the output generation module (404) to produce structured outputs (324) and generate privacy-compliant datasets preserving statistical properties of the received structured and/or unstructured data sources.
10. The method (500) as claimed in claim 9, further comprising ranking and refining suggested data products using feedback from users or automated model validation.

Documents

Application Documents

#	Name	Date
1	202521052658-STATEMENT OF UNDERTAKING (FORM 3) [30-05-2025(online)].pdf	2025-05-30
2	202521052658-PROVISIONAL SPECIFICATION [30-05-2025(online)].pdf	2025-05-30
3	202521052658-POWER OF AUTHORITY [30-05-2025(online)].pdf	2025-05-30
4	202521052658-FORM 1 [30-05-2025(online)].pdf	2025-05-30
5	202521052658-FIGURE OF ABSTRACT [30-05-2025(online)].pdf	2025-05-30
6	202521052658-DRAWINGS [30-05-2025(online)].pdf	2025-05-30
7	202521052658-DECLARATION OF INVENTORSHIP (FORM 5) [30-05-2025(online)].pdf	2025-05-30
8	202521052658-FORM-5 [05-09-2025(online)].pdf	2025-09-05
9	202521052658-FORM 3 [05-09-2025(online)].pdf	2025-09-05
10	202521052658-FORM 18 [05-09-2025(online)].pdf	2025-09-05
11	202521052658-DRAWING [05-09-2025(online)].pdf	2025-09-05
12	202521052658-COMPLETE SPECIFICATION [05-09-2025(online)].pdf	2025-09-05
13	202521052658-FORM-9 [08-09-2025(online)].pdf	2025-09-08
14	Abstract.jpg	2025-09-15