
A Business Centric Multi Layered Big Data System (DaaS) For Enhanced Service Provision And Its Method Thereof

Abstract: The system comprises a metadata management module(102) to scan the data assets and generate a DataMap to act as a communication point for metadata tracking; an access management module(104) to define role-based access, ensuring that users not authorized to view certain datasets also cannot view the associated metadata; a request workflow module(106) to manage data access requests in two categories selected from Data Fetch Requests and Data Catalogue Requests and to allow high-priority users to shuffle unattended requests while adhering to a predetermined protocol defining user or group permissions for queue management; a data quality assessment module(110) to evaluate and record the quality of data before its cataloging; a central processing unit(112) to generate a data quality report, which is further used for long-term data reporting; and a storage module(114) to store all configurations and logs at locations that can be accessed for serving the application and for decision-making.


Patent Information

Filing Date: 21 August 2023
Publication Number: 41/2023
Publication Type: INA
Invention Field: COMPUTER SCIENCE
Grant Date: 2024-12-11

Applicants

Rubiscape Pvt. Ltd.
Inteliment House, SN 106, Baner Rd, next to Westside, Laxman Nagar, Baner, Pune, Maharashtra 411045

Inventors

1. Mr. Sachin Balasaheb Chougule
A-503, Chaitanya Platinum, Opposite Bharati Vidyapith School, Balewadi, Pune - 411045, Maharashtra, India
2. Dr. Prashant Shantaram Pansare
Dwarka, 24 Giriraj Hsg Soc, 81/A Baner Road, Aundh, Pune - 411007, Maharashtra, India
3. Mr. Anand Shantaram Pansare
Dwarka, 24 Giriraj Hsg Soc, 81/A Baner Road, Aundh, Pune - 411007, Maharashtra, India
4. Dr. Sheetal Naresh Ghorpade
G -1, A - 16, Planet Millennium, Aundh Camp, Pune - 411027, Maharashtra, India
5. Mr. Nagesh Naik
902, Ramswaroop Palai Tower CHS, Baburao Parulekar Marg, Dadar West, Mumbai - 400028

Specification

Description: FIELD OF THE INVENTION

The present invention relates generally to the field of data management, and more specifically, to the development and implementation of a Multi-layered Business Centric Big Data as a Service system (MBC-BDaaSS). The invention focuses on data services for organizations' business needs, including metadata management, data quality management, data approval mechanisms, user-interaction features, and an overall logical system that optimizes data flows within the system.

BACKGROUND OF THE INVENTION

In the realm of data management, creating a user-friendly, scalable, and configurable Data Governance system that visualizes the entire data stack of an organization is of great importance. The cornerstone of this concept lies in metadata management and representation. The system comprises a Metadata Layer, a Processing Layer, a User Interface (UI) Layer, and a Reporting Layer. These multi-layered data systems have proved useful for data management. The latest system, termed a Multi-layered Business Centric Big-Data as a Service System (MBC-BDaaSS), has evolved from the pain points that businesses have experienced in the previous versions of Data as a Service (DaaS).

Historically, instances of data leakage occurred because data could move without an approval mechanism and business justification. The current solution incorporates an approval process to ensure maximum data security. The metadata layer maintains the details of all the changes that happen to the data and keeps them updated in a timely fashion. The processing layer works as a mediator that communicates with all the other layers. This layer reads from the metadata layer and is responsible for feeding data to the Reporting layer and the UI layer.

Data as a Service provides a one-stop location to go through the entire data landscape and request access to data assets from a unified location. The concept was originally focused on buying, selling, and trading soft-copy data as a service, with the intent of making data available to the end user with ease. The present invention focuses on making data available to the user with multiple add-on services, thereby providing a novel offering. The data quality indicators, the ability to request data in a self-service fashion, the ability to automate repeated data delivery, and a steward view that gives a bird's-eye view of the entire system are all novel add-ons to the initial offerings that had been designed previously. This innovation also discusses a mechanism to ensure the data remains safe under different circumstances. The entire system is built on the Cloud and hence represents a novel approach to building a Data as a Service offering.

In view of the foregoing discussion, there is clearly a need for a business-centric multi-layered big data system and method for enhanced service provision. The system captures and displays the data dictionary and lineage for data, making it easier for business users to start deriving more meaning from the data at hand. The Home view displays all the data sources that are connected for governance and all the assets that are a subset of these sources. The data assets are also classified based on business terms, also known as "Glossary Terms", making them easier for business users to locate. The catalog search option has also been incorporated to provide a capability to perform a semantic search on the assets from a single location, based on keywords. Each data asset also supports access requests, wherein a user can request access to the data and the request goes through an approval mechanism before data access is provided to the user. The system keeps an audit trail of all activities performed on the web portal, for audit purposes as well as to support further visualization or insights. The system also allows users to rate and comment on the data assets to enable data democratization. The Steward View helps understand the entire portal functioning in all its aspects, thus helping make better decisions for future revisions of data policies.

SUMMARY OF THE INVENTION

The present disclosure seeks to provide a Multi-layered Business Centric Big Data as a Service system (MBC-BDaaSS), catering to an organization's business-centric needs. The system streamlines the flow of Metadata management and Data Quality management, providing a comprehensive understanding of the request flow system. MBC-BDaaSS empowers users to locate data, evaluate its quality, rate and comment on it, and share metadata. It also provides organizations with an approval mechanism, offering a clear overview of data access. Compared to previous frameworks, MBC-BDaaSS stands out with numerous advantages. These include the calculation of data quality; a platform for users to share metadata, rate, and comment; support for approval, reminder, and feedback mechanisms; compatibility with all types of data assets, including dashboards; and a steward view aiding in better data policy creation. Moreover, being entirely cloud-based, MBC-BDaaSS facilitates businesses and organizations to offer an internal marketplace for data, thus fostering value gains from big data. The system has the potential for future enhancements toward greater maturity.

In an embodiment, a business-centric multi-layered big data system for enhanced service provision is disclosed. The system includes a metadata management module equipped with a cloud virtual machine to scan the data assets and ensure tracking of the data assets, wherein the metadata management module comprises: a) a processing unit, connected to the metadata management module, to generate a DataMap to act as a communication point for metadata tracking, including data asset details, updates, and inquiries; and b) a Cloud Serverless Executor, connected to the processing unit, to track and determine patterns and frequency of data flow into the system to optimize compute resources for managing metadata.
The system further includes an access management module, coupled with the metadata management module, to define role-based access to the data based on an analysis of user types and data types, wherein the role-based access governs access to metadata, ensuring that users not authorized to view certain datasets also lack the ability to view associated metadata, thereby enhancing metadata security. The visibility of metadata allows users to request access to associated datasets.
The system further includes a request workflow module, coupled with the access management module, to manage data access requests in two categories selected from Data Fetch Requests and Data Catalogue Requests, wherein the Data Fetch Requests allow users to search for an available data asset and request access to the underlying data, whereas Data Catalogue Requests are raised when unidentified singular data assets are submitted by users for the perusal of others or when a new data source is observed and needs to be tracked, wherein the request workflow module comprises: a) a dynamic request queue processor, connected to the request workflow module, to allow high-priority users to shuffle unattended requests while adhering to a predetermined protocol defining user or group permissions for queue management.
The system further includes a graphical user interface, connected to the request workflow module, to host the web user interface, enabling end users to interact with the application, while ensuring seamless communication with backend services.
The system further includes a data quality assessment module equipped with a processor, connected to the graphical user interface, to evaluate and record the quality of data before its cataloging, wherein the data quality assessment module incorporates a rule-definition unit to define specific data quality checks, with newly defined checks thoroughly tested before integration into the pipeline. During data quality processing, the dataset is cross-referenced with the mapping table to determine the types of rules to be applied to each data column, with rule definitions retrieved from a Rule Definitions table.
The system further includes a central processing unit, connected to the data quality assessment module, to generate a data quality report and store the generated data quality report in a Data Quality Management (DQM) Results table, which is further used for long-term data reporting.
The system further includes a storage module, connected to the central processing unit, to store all configurations and logs at locations that can be accessed for serving the application and for decision-making purposes.

In another embodiment, a business-centric multi-layered big data method for enhanced service provision is disclosed. The method includes creating and maintaining a DataMap that stores metadata details, is informed by data volume, and operates in compliance with identified data inflow patterns and metadata refresh frequencies.
The method further includes implementing a pull mechanism to perform scheduled data scans at determined intervals to monitor changes in the data layer and retrieve the index of all data assets and implementing a push mechanism to capture specific points of data pipelines or services interacting with the data.
The method further includes assigning data columns to corresponding business aspects, facilitating interaction with users, and classifying data columns automatically based on predefined characteristics during data scans.
The method further includes determining data scan frequency in accordance with data refresh rates to optimize compute resource allocation and determining user access levels to specific data types, ensuring group-level access security for metadata and datasets.
The method further includes implementing a data request workflow, capable of handling two types of data requests selected from data fetch requests and data cataloging requests and including an approval process in the data request workflow, which verifies the nature of the requested data, validates the request, and upon approval, places the request in a dynamic request queue for execution.
The method further includes prioritizing the requests in the dynamic request queue based on the priority of the users.
The method further includes storing predefined quality check rules in a rule definitions table and dataset-to-rule mappings in a mapping table and applying the stored predefined quality check rules during data quality assessments based on the stored dataset-to-rule mappings.
The method further includes capturing quality metrics of the data for further reporting on the application.

An object of the present disclosure is to streamline metadata management to keep track of all changes happening to the data in a timely fashion.

Another object of the present disclosure is to implement an effective Data Quality Management system to calculate data quality, which helps to add more value to the overall data of the organization.

Another object of the present disclosure is to implement a secure approval mechanism, providing controlled access to data and ensuring data security within the organization.

Another object of the present disclosure is to construct a user-friendly platform allowing users to locate, evaluate, and share metadata, fostering a culture of data democratization.

Another object of the present disclosure is to leverage cloud-based infrastructure for the system, ensuring scalability, reliability, and efficient data management.

Yet another object of the present invention is to deliver an expeditious and cost-effective Multi-layered Business Centric Big Data as a Service system (MBC-BDaaSS) that caters to the data service requirements of organizations in a business-centric manner.

To further clarify the advantages and features of the present disclosure, a more particular description of the invention will be rendered by reference to specific embodiments thereof, which are illustrated in the appended drawings. It is appreciated that these drawings depict only typical embodiments of the invention and are therefore not to be considered limiting of its scope. The invention will be described and explained with additional specificity and detail in the accompanying drawings.

BRIEF DESCRIPTION OF FIGURES

These and other features, aspects, and advantages of the present disclosure will become better understood when the following detailed description is read in conjunction with the accompanying drawings, in which like characters represent like parts throughout the drawings, wherein:

Figure 1 illustrates a block diagram of a business-centric multi-layered big data system for enhanced service provision in accordance with an embodiment of the present disclosure;
Figure 2 illustrates a flow chart of a business-centric multi-layered big data method for enhanced service provision in accordance with an embodiment of the present disclosure;
Figure 3 illustrates a process flow of metadata management in accordance with an embodiment of the present disclosure;
Figure 4 illustrates a process flow of request workflow in accordance with an embodiment of the present disclosure;
Figure 5 illustrates a process flow of data quality management in accordance with an embodiment of the present disclosure;
Figure 6 illustrates Table 1, which depicts component details; and
Figure 7 illustrates Table 2, which depicts a comparison with previous work.

Further, skilled artisans will appreciate that elements in the drawings are illustrated for simplicity and may not necessarily have been drawn to scale. For example, the flow charts illustrate the method in terms of the most prominent steps involved to help improve understanding of aspects of the present disclosure. Furthermore, in terms of the construction of the device, one or more components of the device may have been represented in the drawings by conventional symbols, and the drawings may show only those specific details that are pertinent to understanding the embodiments of the present disclosure so as not to obscure the drawings with details that will be readily apparent to those of ordinary skill in the art having the benefit of the description herein.

DETAILED DESCRIPTION:

To promote an understanding of the principles of the invention, reference will now be made to the embodiment illustrated in the drawings and specific language will be used to describe the same. It will nevertheless be understood that no limitation of the scope of the invention is thereby intended, such alterations and further modifications in the illustrated system, and such further applications of the principles of the invention as illustrated therein being contemplated as would normally occur to one skilled in the art to which the invention relates.

It will be understood by those skilled in the art that the foregoing general description and the following detailed description are exemplary and explanatory of the invention and are not intended to be restrictive thereof.

Reference throughout this specification to “an aspect”, “another aspect” or similar language means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Thus, appearances of the phrase “in an embodiment”, “in another embodiment” and similar language throughout this specification may, but do not necessarily, all refer to the same embodiment.

The terms "comprises", "comprising", or any other variations thereof, are intended to cover a non-exclusive inclusion, such that a process or method that comprises a list of steps does not include only those steps but may include other steps not expressly listed or inherent to such process or method. Similarly, one or more devices or sub-systems or elements or structures or components proceeded by "comprises...a" does not, without more constraints, preclude the existence of other devices or other sub-systems or other elements or other structures or other components or additional devices or additional sub-systems or additional elements or additional structures or additional components.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The system, methods, and examples provided herein are illustrative only and not intended to be limiting.

Embodiments of the present disclosure will be described below in detail with reference to the accompanying drawings.

Referring to Figure 1, a block diagram of a business-centric multi-layered big data system for enhanced service provision is illustrated in accordance with an embodiment of the present disclosure. System 100 includes a metadata management module (102) equipped with a cloud virtual machine (102a) to scan the data assets and ensure tracking of the data assets.

The metadata management module (102) includes a processing unit (102b) connected to the metadata management module (102) to generate a DataMap that acts as a communication point for metadata tracking, including data asset details, updates, and inquiries. In one embodiment, a Cloud Serverless Executor (102c) is connected to the processing unit (102b) to track and determine patterns and frequency of data flow into the system to optimize compute resources for managing metadata.

In an embodiment, an access management module (104) is coupled with the metadata management module (102) to define role-based access to the data based on an analysis of user types and data types, wherein the role-based access governs access to metadata, ensuring that users not authorized to view certain datasets also lack the ability to view associated metadata, thereby enhancing metadata security. The visibility of metadata allows users to request access to associated datasets.

In an embodiment, a request workflow module (106) is coupled with the access management module (104) to manage data access requests in two categories selected from Data Fetch Requests and Data Catalogue Requests, wherein the Data Fetch Requests allow users to search for an available data asset and request access to the underlying data, whereas Data Catalogue Requests are raised when unidentified singular data assets are submitted by users for the perusal of others or when a new data source is observed and needs to be tracked.

The request workflow module (106) includes a dynamic request queue processor (106a) connected to the request workflow module (106) to allow high-priority users to shuffle unattended requests while adhering to a predetermined protocol defining user or group permissions for queue management.

In an embodiment, a graphical user interface (108) is connected to the request workflow module (106) to host the web user interface, enabling end users to interact with the application, while ensuring seamless communication with backend services.

In an embodiment, a data quality assessment module (110) is equipped with a processor and connected to the graphical user interface (108) to evaluate and record the quality of data before its cataloging, wherein the data quality assessment module (110) incorporates a rule-definition unit to define specific data quality checks, with newly defined checks thoroughly tested before integration into the pipeline. During data quality processing, the dataset is cross-referenced with the mapping table to determine the types of rules to be applied to each data column, with rule definitions retrieved from a Rule Definitions table.
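The cross-referencing described above can be sketched in Python. The table contents, rule names, and function names below are illustrative assumptions standing in for the Rule Definitions and mapping tables, not details taken from the specification:

```python
from typing import Callable, Dict, List

# Stand-in for the Rule Definitions table: rule name -> check over one value.
RULE_DEFINITIONS: Dict[str, Callable[[object], bool]] = {
    "not_null": lambda v: v is not None,
    "positive": lambda v: isinstance(v, (int, float)) and v > 0,
}

# Stand-in for the mapping table: dataset -> column -> rules to apply.
RULE_MAPPING: Dict[str, Dict[str, List[str]]] = {
    "orders": {"amount": ["not_null", "positive"], "id": ["not_null"]},
}

def assess_quality(dataset: str, rows: List[dict]) -> Dict[str, float]:
    """Cross-reference the dataset with the mapping table, apply the mapped
    rule definitions to each column, and score the column as the fraction
    of rows that pass all of its rules (one plausible DQM result row)."""
    scores: Dict[str, float] = {}
    for column, rule_names in RULE_MAPPING.get(dataset, {}).items():
        checks = [RULE_DEFINITIONS[r] for r in rule_names]
        passed = sum(all(c(row.get(column)) for c in checks) for row in rows)
        scores[column] = passed / len(rows) if rows else 1.0
    return scores

report = assess_quality("orders", [{"id": 1, "amount": 10},
                                   {"id": 2, "amount": -5}])
```

A central processing unit would then persist such a `report` to the DQM Results table for long-term reporting.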

In an embodiment, a central processing unit (112) is connected to the data quality assessment module (110) to generate a data quality report and store the generated data quality report in a Data Quality Management (DQM) Results table, which is further used for long-term data reporting.

In an embodiment, a storage module (114) is connected to the central processing unit (112) to store all configurations and logs at locations that can be accessed for serving the application and for decision-making purposes.

In another embodiment, the metadata management module (102) comprises a pull mechanism that is deployed to schedule data scans at a fixed frequency and monitor changes in the data layer. In one embodiment, a push mechanism is deployed to push activity details from specific points of the data pipelines or service layer to the metadata management module (102).

In another embodiment, the pull and push mechanism further comprises a rule-definition module (116) deployed to generate business terms and classification rules. The rule-definition module (116) defines business terms to ensure the representation of business facets in the data columns, thereby enhancing system interactions and user understanding. The rule-definition module (116) includes a classification rule processing unit for defining classification rules aimed at automated classification of data columns based on certain characteristics, thus reducing manual classification efforts, wherein these classification rules are activated during data scans for improved data understanding.
In one embodiment, the pull and push mechanism further comprises a scan planning module (118) to determine the scoping and scheduling of data scans, wherein the scan planning module (118) is configured to determine the frequency of data refreshes, enabling efficient and effective planning of data scans, considering the high computational efforts associated with scanning data.

In another embodiment, the request workflow module (106) incorporates a personally identifiable information (PII) evaluation component, simplifying the approval process for non-PII data requests and utilizing classification rules and user request forms to identify PII data. It enables the submission of supporting documentation in multiple formats for both Data Fetch and Data Catalogue requests, with provision for approvers to add comments, request clarification, or decline requests. Data Fetch Requests, post-approval and queue placement, undergo a historical verification process for potential re-execution of old data fetch scripts to save time and effort. Upon acceptance of a data sample by the requestor, the full dataset is shared and cataloged for future availability and searchability in the system.

In another embodiment, Data Catalogue Requests follow a process where data is scanned and scripts are built for seamless future data deliveries, and then entries are made in the data catalog, wherein following Data Fetch or Data Catalogue procedures, data quality is measured and the details are appended to the data catalog for further reference.

In another embodiment, the data quality assessment module (110) utilizes the Rule Definitions table to store the details of defined data quality checks, and a separate mapping table to store dataset-to-rule mappings, wherein these predefined rules and mappings are used for efficient data quality assessments.

In another embodiment, role-based access control is employed to define which types of data should be available to what types of users, ensuring the metadata and data assets are secure and visible only to specific user groups.

In another embodiment, a communication module (120) is deployed to trigger email notifications at each interaction point in the system, from request raising to final feedback, providing an end-to-end business-friendly system approach.
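Such a notification trigger can be sketched as a small event hook. The event names, addresses, and callback wiring here are assumptions; a real deployment would plug in an actual email sender in place of the stub:

```python
from typing import Callable, Dict, List

class Notifier:
    """Triggers a notification at each interaction point, from request
    raising to final feedback (email delivery itself is stubbed out)."""
    def __init__(self, send: Callable[[str, str], None]):
        self._send = send  # in practice, a wrapper around an email service
        self._subscribers: Dict[str, List[str]] = {}

    def subscribe(self, event: str, address: str) -> None:
        self._subscribers.setdefault(event, []).append(address)

    def interaction(self, event: str, detail: str) -> None:
        # Called at each interaction point in the workflow.
        for address in self._subscribers.get(event, []):
            self._send(address, f"{event}: {detail}")

sent = []  # captured messages, standing in for delivered emails
notifier = Notifier(lambda addr, msg: sent.append((addr, msg)))
notifier.subscribe("request_raised", "approver@example.com")
notifier.interaction("request_raised", "Data Fetch Request #17")
```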

In another embodiment, an approval mechanism is connected to the storage module (114) to ensure data security by controlling the movement of data.
Figure 2 illustrates a flow chart of a business-centric multi-layered big data method for enhanced service provision in accordance with an embodiment of the present disclosure. At step 202, method 200 includes creating and maintaining a DataMap that stores metadata details, is informed by data volume, and operates in compliance with identified data inflow patterns and metadata refresh frequencies.
At step 204, method 200 includes implementing a pull mechanism to perform scheduled data scans at determined intervals to monitor changes in the data layer and retrieve the index of all data assets and implementing a push mechanism to capture specific points of data pipelines or services interacting with the data.
At step 206, method 200 includes assigning data columns to corresponding business aspects, facilitating interaction with users, and classifying data columns automatically based on predefined characteristics during data scans.
At step 208, method 200 includes determining data scan frequency in accordance with data refresh rates to optimize compute resource allocation and determining user access levels to specific data types, ensuring group-level access security for metadata and datasets.
At step 210, method 200 includes implementing a data request workflow, capable of handling two types of data requests selected from data fetch requests and data cataloging requests and including an approval process in the data request workflow, which verifies the nature of the requested data, validates the request, and upon approval, places the request in a dynamic request queue for execution.
At step 212, method 200 includes prioritizing the requests in the dynamic request queue based on the priority of the users.
At step 214, method 200 includes storing predefined quality check rules in a rule definitions table and dataset-to-rule mappings in a mapping table and applying the stored predefined quality check rules during data quality assessments based on the stored dataset-to-rule mappings.
At step 216, method 200 includes capturing quality metrics of the data for further reporting on the application.
Figure 3 illustrates a process flow of metadata management in accordance with an embodiment of the present disclosure. The main part of the flow is the management of Metadata to ensure that all data assets are tracked accurately. To start with, a DataMap is created that can store the details of the metadata. The DataMap acts as the point of all communication for the metadata, all its updates, and inquiries. Before utilizing the DataMap, the pattern in which data is expected to flow into the system and the frequency at which the metadata refresh is expected must first be understood. This guarantees that unnecessary computational resources are not expended on continuous tracking. Another approach to maintaining the DataMap is the Push approach, wherein the modules that perform some action on the data can push the details to the DataMap. This approach is not preferred on its own, as it only covers files that are used in some code blocks and also introduces manual effort and errors while pushing the activity details to the DataMap. The configuration of the DataMap is typically determined by the volume of data in consideration.
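As a minimal sketch of the DataMap acting as the single communication point for registrations, updates, and inquiries (the class, field, and asset names are illustrative, not taken from the specification):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List

@dataclass
class AssetRecord:
    """Metadata entry for a single data asset tracked in the DataMap."""
    name: str
    source: str
    columns: List[str]
    last_scanned: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

class DataMap:
    """Single communication point for metadata: registrations and inquiries."""
    def __init__(self):
        self._assets: Dict[str, AssetRecord] = {}

    def register(self, record: AssetRecord) -> None:
        # Both the pull (scan) and push mechanisms report through this call.
        self._assets[record.name] = record

    def inquire(self, name: str) -> AssetRecord:
        return self._assets[name]

    def all_assets(self) -> List[str]:
        return sorted(self._assets)

dm = DataMap()
dm.register(AssetRecord("sales_2023", source="warehouse",
                        columns=["id", "amount"]))
```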
The push and pull mechanism for metadata management exists in the designed system and is implemented as follows:
a. Pull mechanism – Data scans that are scheduled at a fixed frequency to keep monitoring the changes in the data layer. These keep fetching the index of all data assets and thus track changes in the data layer.
b. Push Mechanism – Specific points of the Data Pipelines, Service layer, or any other code/service that interacts with the data can be covered by this mechanism. If the source and sink for the interacting service are already under monitoring by the Pull / Scan mechanism, it gives insight into the transformed data but leaves open the question of who transformed the data. The Push mechanism plays a crucial role in this aspect to ensure that everything is under better governance.
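The two mechanisms can be sketched together as follows; the catalog listing callable, the store layout, and the activity payload shape are all assumptions made for illustration:

```python
from typing import Callable, Dict, List

class MetadataStore:
    """Stands in for the DataMap; maps asset name -> recorded details."""
    def __init__(self):
        self.entries: Dict[str, dict] = {}

class PullScanner:
    """Pull mechanism: on each scheduled run, fetch the full asset index
    so that changes in the data layer keep being tracked."""
    def __init__(self, store: MetadataStore,
                 list_assets: Callable[[], List[str]]):
        self.store = store
        self.list_assets = list_assets  # e.g. a catalog/index API (assumed)

    def run_scan(self) -> None:
        for name in self.list_assets():
            entry = self.store.entries.setdefault(name, {})
            entry["seen_by_scan"] = True

def push_activity(store: MetadataStore, asset: str,
                  actor: str, action: str) -> None:
    """Push mechanism: a pipeline step reports who did what to which asset,
    answering the 'who transformed the data' question a scan cannot."""
    entry = store.entries.setdefault(asset, {})
    entry.setdefault("activity", []).append({"actor": actor, "action": action})

store = MetadataStore()
PullScanner(store, lambda: ["orders", "customers"]).run_scan()
push_activity(store, "orders", actor="etl_job_42", action="transform")
```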
For the Pull / Scan Mechanism, there are a set of other activities that should be performed simultaneously:
a. Defining the Business Terms – This ensures that it is known which facet of the Business each data column represents. It is a critical part that ensures any system built on top of these layers has a better understanding of the Business and can interact with users in a better way.
b. Defining the Classification Rules – This ensures that the columns can be automatically classified based on certain characteristics, thus reducing the effort of manual classification. Classification rules run at the time of data scans to understand the data better.
c. Planning the Scoping and Scheduling for Scans – Scanning data can be a huge compute effort, and hence it is necessary to understand which data refreshes at what frequency. This helps define the scans in a better and more efficient way.
Another activity that needs to be done at the time of data analysis is to understand which types of data should be available to which types of users. This is used to define role-based access for the datasets. This group-level access security ensures that the metadata is also secure: users who are not supposed to view certain datasets are not able to view even the metadata for those datasets. This typically helps protect the sensitive data of an organization, wherein the metadata for such data is visible only to specific groups. Once the metadata is visible to the users, they can request access to the datasets.
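The rule that metadata visibility follows dataset access can be sketched with group-level grants; the group names, dataset names, and function names are illustrative assumptions:

```python
from typing import Dict, Set

# Illustrative group-to-dataset grants; metadata visibility follows them.
GROUP_GRANTS: Dict[str, Set[str]] = {
    "finance_team": {"revenue", "payroll"},
    "marketing_team": {"campaigns"},
}

def visible_metadata(user_groups: Set[str]) -> Set[str]:
    """A user sees metadata only for datasets some group of theirs may view,
    so unauthorized users cannot even discover that a dataset exists."""
    visible: Set[str] = set()
    for group in user_groups:
        visible |= GROUP_GRANTS.get(group, set())
    return visible

def can_view_metadata(user_groups: Set[str], dataset: str) -> bool:
    return dataset in visible_metadata(user_groups)
```

From the visible metadata, a user would then raise an access request for the underlying dataset through the request workflow.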
Figure 4 illustrates a process flow of request workflow in accordance with an embodiment of the present disclosure. The request workflow helps manage the data access requests that can be raised in the system. Data requests in this system fall into two types:
a. Data Fetch Requests - This is a frequently used request type where users search for an available data asset and request access to the underlying data.
b. Data Catalogue Requests – A cataloguing request is raised when unidentified singular data assets are submitted by users for the perusal of others, or when a new data source is observed and needs to be tracked.
The process has been designed with some important features to ensure that data security, governance, and monitoring happen accurately. Once a request is raised, it goes to a higher authority for approval. To keep Personal Data secure, the system evaluates whether the requested data contains PII. For non-PII data, a simpler approval process applies. PII data is identified in two ways: classification rules that look for certain traits in the data, and the request forms raised by users. The approach makes provision for uploading supporting documentation in multiple formats to present a strong case and evidence for data requests, both Data Fetch and Data Catalogue. Approvers have the option to send back a request if they seek further clarification, or to decline it, and may add their comments while deciding on the request.
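The two-way PII determination and the resulting split in approval paths can be sketched as a routing function. The field names below are assumptions for illustration only.

```python
def approval_route(request: dict) -> str:
    """Pick the approval path for a data request: data flagged as PII,
    either by classification rules during scans or declared on the
    user's request form, takes the fuller approval chain; non-PII data
    takes the simpler one."""
    has_pii = (request.get("classified_pii", False)
               or request.get("form_declares_pii", False))
    return "full_approval" if has_pii else "simple_approval"
```

Either signal alone is enough to trigger the fuller chain, so a user's own declaration cannot be undercut by a rule miss, and vice versa.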
The request queue is dynamic in nature, allowing high-priority users to shuffle requests that are not yet being worked on. This ensures that businesses can cater to critical requests that support the business but arrive late. Interactions on the request queue are governed by a predetermined protocol for the organization, under which only certain users or specific groups can shuffle queues. After approval, requests are placed in the queue. Once a Data Fetch Request is picked up for execution, it is first checked for any historic executions; if one exists, the old data fetch script can be rerun to save time and effort. Otherwise, the script for data delivery is created and a sample of the dataset is presented to the requester. The complete dataset is shared with the user once the sample is accepted, and the same dataset is cataloged as a version, made available and searchable in the system.
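A minimal sketch of the dynamic queue follows, with the shuffle gated by group membership per a predetermined protocol. The privileged group names are assumptions; the specification leaves the protocol configurable.

```python
from collections import deque

class RequestQueue:
    """Queue of approved, not-yet-attended requests; only privileged
    groups may move a pending request to the front (the 'shuffle')."""
    PRIVILEGED = {"stewards", "admins"}  # assumed protocol configuration

    def __init__(self):
        self._pending = deque()

    def enqueue(self, request_id: str):
        # Post-approval, the request joins the back of the queue.
        self._pending.append(request_id)

    def prioritize(self, request_id: str, user_groups: set):
        # Shuffling is permitted only to users covered by the protocol,
        # and only for requests still sitting unattended in the queue.
        if not (self.PRIVILEGED & user_groups):
            raise PermissionError("user may not shuffle the queue")
        self._pending.remove(request_id)
        self._pending.appendleft(request_id)

    def next(self) -> str:
        return self._pending.popleft()
```

Because only unattended requests live in `_pending`, a request already picked up for execution can never be reshuffled.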
For a Data Catalogue request, the process differs slightly. The data is scanned to verify it is fit to be cataloged, a script is built to make future data deliveries seamless, and then the entry is made in the catalog. After the Data Fetch or Data Catalogue step, the Data Quality is measured and the details are appended to the catalog for further reference.
Email notifications are triggered at each interaction point in the system, from raising the request to the final feedback, making this an end-to-end, business-friendly system.
Figure 5 illustrates a process flow of data quality management in accordance with an embodiment of the present disclosure. Once a dataset is ready to be cataloged, the system assesses and records the Quality of the data. As a one-time activity, the mapping of data quality checks must be set up in the system. This consists of two steps. The first is to define exactly how a particular data check works; the logical piece of the execution lies here, and any new check type is defined and tested thoroughly here before being added to the pipeline. The second step maps the check types to the datasets. The first step is stored in the Rule Definitions table, while the dataset-to-query mapping details are stored in a mapping table.
When a file starts getting processed for Data Quality, the dataset is looked up in the custom mapping table. This fetches the details of which types of rules need to run on which columns of the data. Using this list, the exact rule definitions are looked up from the rule definitions table.
The processing then takes place and the data quality report is produced for the dataset. It is kept as a CSV report for easy sharing and perusal, and stored in a DQM Results table, which is later used in making data reports over the long run.
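The two-table lookup and CSV output described above can be sketched as follows. The table shapes, rule names, and dataset names are assumptions; in the system these would live in the Rule Definitions and mapping tables rather than in code.

```python
import csv
import io

# Assumed in-memory stand-ins for the two tables described above.
RULE_DEFINITIONS = {                 # Rule Definitions table
    "not_null": lambda vals: all(v is not None for v in vals),
    "positive": lambda vals: all(v is not None and v > 0 for v in vals),
}
DATASET_RULE_MAPPING = {             # dataset-to-rule mapping table
    "orders": {"amount": ["not_null", "positive"]},
}

def run_quality_checks(dataset_name: str, columns: dict):
    """Look up which rules apply to which columns, execute them, and
    emit both result rows (for the DQM Results table) and a CSV report
    (for easy sharing and perusal)."""
    rows = []
    for col, rule_ids in DATASET_RULE_MAPPING.get(dataset_name, {}).items():
        for rule_id in rule_ids:
            passed = RULE_DEFINITIONS[rule_id](columns[col])
            rows.append({"dataset": dataset_name, "column": col,
                         "rule": rule_id, "passed": passed})
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["dataset", "column", "rule", "passed"])
    writer.writeheader()
    writer.writerows(rows)
    return rows, buf.getvalue()
```

Keeping rule logic and dataset mapping in separate tables, as the description specifies, lets a new check type be tested once and then mapped to any number of datasets.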
Figure 6 illustrates Table 1, which depicts component details.
Figure 7 illustrates Table 2, which depicts a comparison with previous work. Previous works on data as a service have been examined, and this information was used to distinguish MBC-BDaaSS from preceding efforts. MBC-BDaaSS currently supports numerous features that can adapt to the needs of businesses in the big data era.

Data as a Service is consistently evolving to accommodate rapidly changing requirements. This innovation presents a Multi-layered Business Centric Big Data as a Service system (MBC-BDaaSS) that is designed to cater to an organization's specific needs for Data as a Service. The flow of Metadata Management, Data Quality management, and the overall logical architecture have been developed to facilitate an understanding of the system's request flow. MBC-BDaaSS provides users with the ability to locate data, see its quality, rate and comment, and share metadata, while organizations gain an approval mechanism and an overview of who has access to what data. In comparison with previous frameworks, MBC-BDaaSS has many advantages: it can calculate data quality; provide a marketplace for users to share metadata, rate, and comment; support approval, reminder, and feedback mechanisms; support all types of data assets along with dashboards; offer a steward view to shape better data policies; and it is built entirely on the cloud. This makes it easier for such a model to help businesses and organizations provide an internal marketplace for data that fosters value gains from big data. Enhancing this architecture to the next level of maturity remains a future development.

The drawings and the foregoing description give examples of embodiments. Those skilled in the art will appreciate that one or more of the described elements may well be combined into a single functional element. Alternatively, certain elements may be split into multiple functional elements. Elements from one embodiment may be added to another embodiment. For example, orders of processes described herein may be changed and are not limited to the manner described herein. Moreover, the actions of any flow diagram need not be implemented in the order shown; nor do all of the acts necessarily need to be performed. Also, those acts that are not dependent on other acts may be performed in parallel with the other acts. The scope of embodiments is by no means limited by these specific examples. Numerous variations, whether explicitly given in the specification or not, such as differences in structure, dimension, and use of material, are possible. The scope of embodiments is at least as broad as given by the following claims.

Benefits, other advantages, and solutions to problems have been described above with regard to specific embodiments. However, the benefits, advantages, solutions to problems, and any component(s) that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as a critical, required, or essential feature or component of any or all the claims.

Claims:
1. A business-centric multi-layered big data system for enhanced service provision, the system comprises:
a metadata management module (102) equipped with a cloud virtual machine (102a), to scan the data assets and ensure tracking of the data assets, wherein the metadata management module (102) comprises:
a) a processing unit (102b), connected to the metadata management module (102), to generate a DataMap to act as a communication point for metadata tracking, including data asset details, updates, and inquiries;
b) a Cloud Serverless Executor (102c), connected to the processing unit (102b), to track and determine patterns and frequency of data flow into the system to optimize compute resources for managing metadata;
an access management module (104), coupled with the metadata management module (102), to define role-based access to the data based on an analysis of user types and data types, wherein the role-based access governs access to metadata, ensuring that users not authorized to view certain datasets also lack the ability to view associated metadata, thereby enhancing metadata security;
wherein the visibility of metadata allows users to request access to associated datasets;
a request workflow module (106), coupled with the access management module (104), to manage data access requests in two categories selected from Data Fetch Requests, and Data Catalogue Requests, wherein the Data Fetch Requests allow users to search for an available data asset and request access to the underlying data, whereas a Data Catalogue Request is raised when unidentified singular data assets are submitted by users for the perusal of others or when a new data source is observed and needs to be tracked, wherein the request workflow module (106) comprises:
a) a dynamic request queue processor (106a), connected to the request workflow module (106), to allow high-priority users to shuffle unattended requests while adhering to a predetermined protocol defining user or group permissions for queue management;
a graphical user interface (108), connected to the request workflow module (106), to host the web user interface, enabling end users to interact with the application, while ensuring seamless communication with backend services;
a data quality assessment module (110) equipped with a processor, connected to the graphical user interface (108), to evaluate and record the quality of data before its cataloging, wherein the data quality assessment module (110) incorporates a rule-definition unit to define specific data quality checks, with newly defined checks thoroughly tested before integration into the pipeline;
wherein during data quality processing, the dataset is cross-referenced with the mapping table to determine the types of rules to be applied to each data column, with rule definitions retrieved from a Rule Definitions table;
a central processing unit (112), connected to the data quality assessment module (110), to generate a data quality report and store the generated data quality report in a Data Quality Management (DQM) Results table, which is further used for long-term data reporting; and
a storage module (114), connected to the central processing unit (112), to store all configurations and logs at locations that can be accessed for serving the application and for decision-making purposes.

2. The system as claimed in claim 1, wherein the metadata management module (102) comprises:
a pull mechanism, deployed to schedule data scans at a fixed frequency, and monitor changes in the data layer; and
a push mechanism, deployed to push activity details from specific points of the data pipelines or service layer to the metadata management module (102).

3. The system as claimed in claim 2, wherein the pull and push mechanisms further comprise:
a rule-definition module (116) deployed to generate business terms and classification rules;
wherein the rule-definition module (116) defines business terms to ensure the representation of business facets in the data columns, thereby enhancing system interactions and user understanding;
wherein the rule-definition module (116) includes a classification rule processing unit for defining classification rules aimed at an automated classification of data columns based on certain characteristics, thus reducing manual classification efforts, wherein these classification rules are activated during data scans for improved data understanding; and
a scan planning module (118) to determine the scoping and scheduling of data scans, wherein the scan planning module (118) is configured to determine the frequency of data refreshes, enabling efficient and effective planning of data scans, considering the high computational efforts associated with scanning data.

4. The system as claimed in claim 1, wherein the request workflow module (106) incorporates a personal identification information (PII) evaluation component, simplifying the approval process for non-PII data requests and utilizing classification rules and user request forms to identify PII data for enabling the submission of supporting documentation in multiple formats for both Data Fetch and Data Catalogue requests, with provision for approvers to add comments, request clarification, or decline requests;
wherein Data Fetch Requests, post-approval and queue placement, undergo a historical verification process for potential re-execution of old data fetch scripts to save time and effort; and
wherein upon acceptance of a data sample by the requestor, the full dataset is shared and cataloged for future availability and searchability in the system.

5. The system as claimed in claim 1, wherein Data Catalogue Requests follow a process where data is scanned and scripts are built for seamless future data deliveries, and then entries are made in the data catalog; and
wherein following Data Fetch or Data Catalogue procedures, data quality is measured and the details are appended to the data catalog for further reference.

6. The system as claimed in claim 1, wherein the data quality assessment module (110) utilizes the Rule Definitions table to store the details of defined data quality checks, and a separate mapping table to store dataset-to-rule mappings, wherein these predefined rules and mappings are used for efficient data quality assessments.

7. The system as claimed in claim 1, wherein a role-based access control is employed to define which types of data should be available to what types of users, ensuring the metadata and data assets are secure and visible only to specific user groups.

8. The system as claimed in claim 1, further comprises a communication module (120) to trigger email notifications at each interaction point in the system, from request raising to final feedback, providing an end-to-end business-friendly system approach.

9. The system as claimed in claim 1, further comprises an approval mechanism, connected to the storage module (114), to ensure data security by controlling the movement of data.

10. A business-centric multi-layered big data method for enhanced service provision, the method comprising:

creating and maintaining a DataMap that stores metadata details, informed by data volume and operating in compliance with identified data inflow patterns and metadata refresh frequencies;
implementing a pull mechanism to perform scheduled data scans at determined intervals to monitor changes in the data layer and retrieve the index of all data assets and implementing a push mechanism to capture specific points of data pipelines or services interacting with the data;
assigning data columns to corresponding business aspects, facilitating interaction with users, and classifying data columns automatically based on predefined characteristics during data scans;
determining data scan frequency by data refresh rates to optimize compute resource allocation and determining user access levels to specific data types, ensuring group-level access security for metadata and datasets;
implementing a data request workflow, capable of handling two types of data requests selected from data fetch requests and data cataloging requests and including an approval process in the data request workflow, which verifies the nature of the requested data, validates the request, and upon approval, places the request in a dynamic request queue for execution;
prioritizing the requests in the dynamic request queue based on the priority of the users;
storing predefined quality check rules in a rule definitions table and dataset-to-rule mappings in a mapping table and applying the stored predefined quality check rules during data quality assessments based on the stored dataset-to-rule mappings; and
capturing quality metrics of the data for further reporting on the application.

Documents

Application Documents

# Name Date
1 202321055792-STATEMENT OF UNDERTAKING (FORM 3) [21-08-2023(online)].pdf 2023-08-21
2 202321055792-PROOF OF RIGHT [21-08-2023(online)].pdf 2023-08-21
3 202321055792-FORM FOR STARTUP [21-08-2023(online)].pdf 2023-08-21
4 202321055792-FORM FOR SMALL ENTITY(FORM-28) [21-08-2023(online)].pdf 2023-08-21
5 202321055792-FORM 1 [21-08-2023(online)].pdf 2023-08-21
6 202321055792-FIGURE OF ABSTRACT [21-08-2023(online)].pdf 2023-08-21
7 202321055792-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [21-08-2023(online)].pdf 2023-08-21
8 202321055792-EVIDENCE FOR REGISTRATION UNDER SSI [21-08-2023(online)].pdf 2023-08-21
9 202321055792-DRAWINGS [21-08-2023(online)].pdf 2023-08-21
10 202321055792-DECLARATION OF INVENTORSHIP (FORM 5) [21-08-2023(online)].pdf 2023-08-21
11 202321055792-COMPLETE SPECIFICATION [21-08-2023(online)].pdf 2023-08-21
12 202321055792-FORM-9 [02-09-2023(online)].pdf 2023-09-02
13 202321055792-FORM-26 [02-09-2023(online)].pdf 2023-09-02
14 Abstact.jpg 2023-10-06
15 202321055792-STARTUP [06-05-2024(online)].pdf 2024-05-06
16 202321055792-FORM28 [06-05-2024(online)].pdf 2024-05-06
17 202321055792-FORM 18A [06-05-2024(online)].pdf 2024-05-06
18 202321055792-FER.pdf 2024-06-10
19 202321055792-OTHERS [01-07-2024(online)].pdf 2024-07-01
20 202321055792-FER_SER_REPLY [01-07-2024(online)].pdf 2024-07-01
21 202321055792-CLAIMS [01-07-2024(online)].pdf 2024-07-01
22 202321055792-US(14)-HearingNotice-(HearingDate-15-10-2024).pdf 2024-09-03
23 202321055792-Correspondence to notify the Controller [09-10-2024(online)].pdf 2024-10-09
24 202321055792-FORM-26 [10-10-2024(online)].pdf 2024-10-10
25 202321055792-Written submissions and relevant documents [26-10-2024(online)].pdf 2024-10-26
26 202321055792-PatentCertificate11-12-2024.pdf 2024-12-11
27 202321055792-IntimationOfGrant11-12-2024.pdf 2024-12-11

Search Strategy

1 SearchHistoryE_04-06-2024.pdf

ERegister / Renewals

3rd: 21 Aug 2025

From 21/08/2025 - To 21/08/2026