
Data Validation

Abstract: The present subject matter relates to a computer implementable method to validate data in a data system. The method includes generating at least one test query based on a testing scenario, compiling the at least one test query to form at least one test case, storing the at least one test case in a data repository for subsequent utilization, selecting one of the stored test cases based on fresh data to be validated, and modifying and executing the selected test case. Fig. 3


Patent Information

Application #
Filing Date
04 October 2011
Publication Number
15/2013
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Email
Parent Application

Applicants

TATA CONSULTANCY SERVICES LIMITED
Nirmal Building, 9th Floor, Nariman Point, Mumbai, Maharashtra

Inventors

1. RATH, Silpa
IT/ITES Special Economic Zone, Plot - 35, Chandaka Industrial Estate, Patia, Chandrasekharpur, Bhubaneswar, Orissa 751024
2. MAHAPATRA, Rashmi Ranjan
IT/ITES Special Economic Zone, Plot - 35, Chandaka Industrial Estate, Patia, Chandrasekharpur, Bhubaneswar, Orissa 751024
3. ACHARYA, Saswati
IT/ITES Special Economic Zone, Plot - 35, Chandaka Industrial Estate, Patia, Chandrasekharpur, Bhubaneswar, Orissa 751024
4. PASUPATHY, Vaithiya S
M/s. Tata Consultancy Services Ltd., 200 Ft. Thoraipakkam - Pallavaram Ring Road, Chennai, Tamil Nadu 600096
5. NARAYANASWAMY, Kumaresan
M/s. Tata Consultancy Services Ltd., 200 Ft. Thoraipakkam - Pallavaram Ring Road, Chennai, Tamil Nadu 600096
6. NOOKALA, Suresh
M/s. Tata Consultancy Services Ltd., 200 Ft. Thoraipakkam - Pallavaram Ring Road, Chennai, Tamil Nadu 600096

Specification

FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENTS RULES, 2003
COMPLETE SPECIFICATION
(See section 10, rule 13)
1. Title of the invention: DATA VALIDATION
2. Applicant(s)
NAME NATIONALITY ADDRESS
TATA CONSULTANCY SERVICES LIMITED    Nirmal Building, 9th Floor, Nariman Point, Mumbai, Maharashtra 400021, India
3. Preamble to the description
COMPLETE SPECIFICATION
The following specification particularly describes the invention and the manner in which it is to be performed.

TECHNICAL FIELD
[0001] The present subject matter is related, in general to data validation and, particularly but not exclusively, to a method and a system to validate data in a data system.
BACKGROUND
[0002] A data warehouse is a repository of data, such as data derived from transaction data. For example, in a banking domain, multiple transactions pertaining to a user can be stored in a database, and this transaction data can be transformed and stored in the data warehouse. Therefore, for multiple users, this data can be generated on a continuous basis and stored in the database. Due to the constant updating of the data in the database, the data in the data warehouse is dynamic in nature.
[0003] The data stored in such data warehouses is generally queried to generate reports pertaining to decision support systems. Generally, data warehouses are optimized for querying and analysis, which is why they are often referred to as online analytical processing (OLAP) databases. For example, data warehouses may be subject-oriented in order for the users to effectively analyze the data pertaining to a specific industry or market sector. In order for the data warehouses to respond quickly to analytical questions or queries that are targeted at the data therein, the data warehouses are read-optimized.
[0004] Due to the constant transactions that occur on a daily basis in businesses and industries, the corresponding data stored in the data warehouses is also subject to constant change and updates. Furthermore, due to differences in data format between various disparate data sources, the data is generally integrated into a consistent format by transformation of said data. Therefore, the data is extracted, transformed, and loaded by what is generally referred to as an Extract, Transform, and Load (ETL) module, which utilizes business rules associated with each of the transformations before loading said data into the data warehouse. Due to the vast amounts of data that are transformed and stored in the data warehouses, the data needs to be analyzed and validated on a regular basis to ensure integrity and completeness of said data.
SUMMARY

[0005] This summary is provided to introduce concepts related to data validation, and the concepts are further described below in the detailed description. This summary is not intended to identify essential features of the claimed subject matter nor is it intended for use in determining or limiting the scope of the claimed subject matter.
[0006] In one implementation, a computer implementable method to validate data in a data system is provided. In an implementation, the method includes generating at least one test query based on a testing scenario, compiling the at least one test query to form at least one test case, storing the at least one test case in a data repository for subsequent utilization, selecting one of the stored test cases based on fresh data to be validated, and modifying and executing the selected test case.
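The sequence of steps recited above can be illustrated with a minimal sketch. All names here (the repository dictionary, the scenario label, the table name) are assumptions for illustration only, not from the specification; SQLite stands in for the data system.

```python
import sqlite3

# Sketch of the claimed method: generate a test query from a testing
# scenario, compile it into a test case, store the case in a repository,
# then select, modify, and execute it against fresh data.

def generate_test_query(scenario):
    # Generate a parameterizable test query for the scenario (illustrative).
    if scenario == "record_count":
        return "SELECT COUNT(*) FROM {table}"
    raise ValueError("unknown scenario")

test_case_repository = {}  # stands in for the data repository

def compile_and_store(scenario):
    # Compile the generated query into a test case and store it for reuse.
    query = generate_test_query(scenario)
    test_case_repository[scenario] = {"scenario": scenario, "queries": [query]}

def select_modify_execute(conn, scenario, table):
    # Select a stored test case, modify it for the fresh data, and execute it.
    case = test_case_repository[scenario]
    query = case["queries"][0].format(table=table)
    return conn.execute(query).fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (id INTEGER)")
conn.executemany("INSERT INTO transactions VALUES (?)", [(1,), (2,), (3,)])
compile_and_store("record_count")
count = select_modify_execute(conn, "record_count", "transactions")
print(count)  # 3
```

Storing the template query rather than a fully bound query is what makes the final "modify and execute" step possible on fresh data.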
BRIEF DESCRIPTION OF THE DRAWINGS
[0007] The detailed description is described with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same numbers are used throughout the drawings to reference like features and components.
[0008] Fig. 1 illustrates a network environment implementing a data validation system, in accordance with an implementation of the present subject matter.
[0009] Fig. 2 illustrates a data validation system for validating data stored in a data warehouse, in accordance with an implementation of the present subject matter.
[00010] Fig. 3 illustrates a method for validating data in a data warehouse, in accordance with an implementation of the present subject matter.
[00011] Fig. 4 illustrates a screenshot of a graphical user interface of the data validation system, in accordance with an implementation of the present subject matter.
DETAILED DESCRIPTION
[00012] Systems and methods for data validation are described herein. The systems and methods can be implemented in a variety of computing devices, such as laptops, desktops, workstations, tablet-PCs, smart phones, notebooks or portable computers, tablet computers, mainframe computers, mobile computing devices, entertainment devices, computing platforms, internet appliances and similar systems. Although the description herein is with reference to certain networks, the systems and methods may be implemented in other networks and devices, albeit with a few variations, as will be understood by a person skilled in the art.
[00013] Data warehouses are utilized by many organizations and government agencies to consolidate data from across the organization into a single consistent historical view. These data warehouses are generally optimized for reporting and analysis. The data warehouses collect and organize data for performing various functions, such as strategic decision making. For storing data in these data warehouses, the data is generally extracted over a period of time from various sources, such as transaction systems and databases. Upon extraction of the data, business rules can be associated with said extracted data, and subsequently the extracted data can be transformed into a format consistent with the data warehouse, where the transformed data is loaded.
[00014] Due to the volume of the data that is extracted, transformed, and stored, errors can be incurred during the process of moving data from a source (e.g., database) to a target (e.g., data warehouse). The errors can arise due to different reasons, such as incorrect implementation of business rules and errors during migration processes. Ensuring quality of the extracted and transformed data requires recurrent checks and validations. Furthermore, in order to minimize these errors, data warehouses incorporate data testing processes, where the data stored therein can be validated, for example, for integrity and completeness. Testing and validation of data is critical for the success of a data warehouse project, as users need to rely on the quality of the data stored therein.
[00015] Conventional methods involve defining test categories in order to validate the data in the data warehouses for record counts (expected vs. actual), duplicate checks, reference data validity, and referential integrity. Once the test categories have been defined, one or more test cases can be formed based on the test categories. The test cases may then be executed to validate the data. Test cases can be defined as a set of inputs, execution conditions, and expected results developed for a particular objective, such as to verify compliance with a specific requirement. Therefore, based on the type of data stored in the data warehouse, specialized test cases need to be designed and executed.
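Two of the conventional test categories named above, record counts (expected vs. actual) and duplicate checks, can be sketched as follows. The table and column names are invented for illustration; SQLite stands in for the data warehouse.

```python
import sqlite3

# Illustrative test queries for two validation categories: a record count
# check (expected vs. actual) and a duplicate check on a key column.

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (cust_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "A"), (2, "B"), (2, "B"), (3, "C")])

# Record count: compare the actual row count against the expected count.
expected_count = 4
actual_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
count_ok = (actual_count == expected_count)

# Duplicate check: any key appearing more than once is a validation failure.
duplicates = conn.execute(
    "SELECT cust_id FROM customers GROUP BY cust_id HAVING COUNT(*) > 1"
).fetchall()
print(count_ok, duplicates)  # True [(2,)]
```

Each such query is one input/execution-condition/expected-result triple, i.e., one building block of a test case in the sense defined above.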

[00016] Moreover, due to the ever changing nature of businesses and industry, the historical analytical data associated with the operation of such businesses and industry, i.e., the data stored in data warehouses, is also in a constant state of change or improvement. Due to this dynamic nature of the data, the testing processes also need to keep pace. The testing processes are critical for maintaining substantially high levels of quality and effectively exposing defects therein. Therefore, there is a need for data warehouse testing technology, which can maintain pace with the ever changing data stored in data warehouses, along with reducing manual effort.
[00017] The present subject matter describes systems and methods for validating data in a data system, such as a data warehouse. The validation can be based on a plurality of criteria, such as integrity and completion checks of data that is extracted, transformed, and loaded into the data warehouse. In one implementation, a data validation system allows for a user to create one or more test cases for validating the data in the data warehouse. In one example, the test cases can be a group of test steps, which in turn can include queries. These test cases can be used to query the data warehouse to determine a quality of the data stored therein. Furthermore, the data validation system facilitates an execution of the test cases from multiple points within the data validation system. In one implementation, a single test case may be executed at a time in the data validation system.
[00018] In one implementation, the data validation system includes an in-built query builder and data repository. The query builder may be used to create the test steps of the test case, according to a test scenario, which further depends on the type of data to be tested. Generally, a test scenario refers to an overall workflow or process regarding one or more transactions. For example, a test scenario might be "using an ATM". The usage of an ATM can involve a series of steps, such as enquiring a bank balance, depositing a cheque, and withdrawing/depositing cash. There can be various stages of transactions in this scenario, and the test cases may be created based on an analysis of the above test scenario. In one implementation, the data validation system can store the test cases, for example, in the in-built data repository, hereinafter referred to as a test case repository, for easy access and execution of the test cases when required. Furthermore, in one implementation, the data validation system can be interfaced to a plurality of types of databases and transaction systems, thereby imparting enhanced adaptability and flexibility to the data validation system. The test cases can then be executed on any of these databases and transaction systems in order to validate the data stored therein.
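The in-built test case repository described above can be sketched as a small store keyed by test scenario, here using the "using an ATM" scenario from the text. The class and the query strings are illustrative assumptions, not the specification's implementation.

```python
# Minimal sketch of a test case repository: test cases are grouped per
# test scenario so they can be retrieved and re-executed when required.

class TestCaseRepository:
    def __init__(self):
        self._cases = {}

    def store(self, scenario, test_steps):
        # Each test case is a list of test steps (here, query strings).
        self._cases.setdefault(scenario, []).append(test_steps)

    def retrieve(self, scenario):
        # Return all stored test cases for the scenario.
        return self._cases.get(scenario, [])

repo = TestCaseRepository()
repo.store("using an ATM", [
    "SELECT balance FROM accounts WHERE acct_id = :id",    # balance enquiry
    "SELECT COUNT(*) FROM deposits WHERE acct_id = :id",   # cheque deposit
    "SELECT COUNT(*) FROM withdrawals WHERE acct_id = :id",  # cash withdrawal
])
cases = repo.retrieve("using an ATM")
print(len(cases), len(cases[0]))  # 1 3
```

Keying the store by scenario is what allows the system, as described next, to pull up previously stored test cases when a similar test scenario recurs.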

[00019] In one implementation, during operation, the data validation system may be interfaced with a data system, such as a source data system. In one example, the source data system can be an operational database. The data validation system can then be configured to query a target source data within the source data system. The target source data can subsequently be associated with customizable business rules, and then transformed based on the associated business rules into a particular format in order to migrate the data into the data warehouse. The data warehouse in this case, may be referred to as a destination data system. The data validation that is carried out at this stage involves the query builder generating a further test case for validating or querying the data in the destination data system. The validation can include integrity and completeness checks on the data in the destination data system, to verify if the data has been properly migrated with the proper implementation of the business rules.
[00020] In one implementation, the data validation system can be configured to access previously stored test cases in the test case repository in order to validate data in a similar test scenario. In an example, the test cases may be modified as required using the query builder, and executed in the data system, i.e., either the source data system, the destination data system (the data warehouse), or a combination thereof. As mentioned earlier, in one example, a single test case can be selected from the test case repository and executed at a given point of time. By facilitating storage and easy accessibility of said stored test cases, the present subject matter provides a substantially high degree of reusability. Moreover, the data validation according to the present subject matter provides a substantially high degree of flexibility and adaptability with various databases and data systems. The data validation system according to the present subject matter provides an effective framework with which to monitor data accuracy, completeness, and consistency on an ongoing basis, substantially automating the data validation process in a data warehouse. As a result, the manual effort required to maintain and validate the data warehouses is also substantially reduced. Furthermore, according to the present subject matter, providing said degree of automation not only improves timelines for data warehousing, but also substantially increases confidence in the quality of data stored in the data warehouses.
[00021] In one implementation, the present subject matter allows for a creation of summary reports or test reports by which the user can be informed of an overall data quality and a summarized version of the data validation processes. These test reports can be effectively stored in a chosen file format and accessed when required.

[00022] In one implementation, according to the present subject matter, the data validation system is provided with a convenient and easy to understand graphical user interface (UI). In an example, the UI may include but is not limited to, a home page, a user configuration page, a database configuration page, a file configuration page, a report configuration page, a password management page, and a test case module page.
[00023] These and other advantages of the present subject matter will be described in greater detail in conjunction with the following figures. While aspects of described systems for data validation in data warehouses can be implemented in any number of different computing systems, environments, and/or configurations, the embodiments are described in the context of the following exemplary system(s).
[00024] Fig. 1 illustrates a network environment 100 implementing a system for data validation, according to an implementation of the present subject matter. Hereinafter, the system for data validation may be referred to as a data validation system 102. In the network environment 100, the data validation system 102 is connected to a network 104. Moreover, a source data system 105, a destination data system 106, and an extraction, transformation, and loading (ETL) system 107 are also connected to the network 104. In one implementation, the destination data system can be implemented as a data warehouse 106. In one implementation, the ETL system 107 can be configured to extract data of interest from the source data system 105, and perform transformation processes to the extracted data before loading said transformed data into the data warehouse 106. Furthermore, one or more client devices 108-1, 108-2... 108-N, collectively referred to as client devices 108, are also connected to the network 104.
[00025] The data validation system 102 can be implemented as any computing device connected to the network 104. For instance, the data validation system 102 may be implemented as mainframe computers, workstations, personal computers, desktop computers, multiprocessor systems, laptops, network computers, minicomputers, servers and the like. In addition, the data validation system 102 may include multiple servers to perform mirrored tasks for users, thereby relieving congestion or minimizing traffic.
[00026] Furthermore, the data validation system 102 is connected to the client devices 108 through the network 104. Examples of the client devices 108 include, but are not limited to, personal computers, desktop computers, smart phones, PDAs, and laptops. Communication links between the client devices 108 and the data validation system 102 are enabled through a desired form of connections, for example, via dial-up modem connections, cable links, digital subscriber lines (DSL), wireless or satellite links, or any other suitable form of communication.
[00027] Moreover, the network 104 may be a wireless network, a wired network, or a combination thereof. The network 104 can also be an individual network or a collection of many such individual networks interconnected with each other and functioning as a single large network, e.g., the internet or an intranet. The network 104 can be implemented as one of the different types of networks, such as intranet, local area network (LAN), wide area network (WAN), the internet and such. The network 104 may either be a dedicated network or a shared network, which represents an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), etc., to communicate with each other. Further, the network 104 may include network devices, such as network switches, hubs, routers, host bus adapters (HBAs), for providing a link between the data validation system 102, the data warehouse 106, the source data system 105, the ETL system 107 and the client devices 108. The network devices within the network 104 may interact with the data validation system 102, the data warehouse 106, the source data system 105, the ETL system 107 and the client devices 108 through communication links.
[00028] In one implementation, the data validation system 102 includes a query generation module 112 and a test case module 114. In one implementation, the query generation module 112 can be configured to generate at least one test query, such as an SQL query, to form part of a test case. In one example, one or more of the test queries can form a test case for a given test scenario. The test case may be used to validate data in the source data system 105, or the ETL system 107, or the data warehouse 106, or any combination thereof. In one implementation, the test case module 114 can be configured to compile the one or more test queries to form the test case based on the test scenario. The test scenario generally refers to an overall workflow or process regarding one or more transactions. Furthermore, in one implementation, the test case module 114 can further be configured to execute the test cases, based on the test scenario. For example, the test case module 114 can be configured to execute the test cases either in the data that is extracted from the source data system 105, or the extracted data once it is transformed based on specified business rules in the ETL system 107, or data that is loaded into the destination data system, such as the data warehouse 106. Hereinafter, the location of the data to be validated, either in the source data system 105, the ETL system 107, or the data warehouse 106, can be referred to interchangeably as data systems. In this manner, the test case module 114 can be configured to specify a target location to execute the test cases. In one implementation, the test case module 114 can be configured to execute a single test case at a given point of time.
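The compile-and-target behaviour attributed to the test case module 114 might look like the following sketch. The class, the target labels, and the validation rule are all assumptions for illustration; the specification does not define a concrete API.

```python
# Sketch of compiling test queries into a test case bound to one of the
# three target locations (source system, ETL system, or data warehouse).

TARGETS = {"source", "etl", "warehouse"}

class TestCase:
    def __init__(self, scenario, queries, target):
        if target not in TARGETS:
            raise ValueError("target must be one of %s" % sorted(TARGETS))
        self.scenario = scenario
        self.queries = queries   # compiled from one or more test queries
        self.target = target     # where the test case will be executed

def compile_test_case(scenario, queries, target):
    # Compile one or more test queries into a single executable test case.
    return TestCase(scenario, list(queries), target)

tc = compile_test_case("completeness",
                       ["SELECT COUNT(*) FROM customers"],
                       "warehouse")
print(tc.target, len(tc.queries))  # warehouse 1
```

Binding the target at compile time reflects the text's point that the module specifies a target location and executes one test case at a given point of time.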
[00029] In one implementation, the user may utilize the client devices 108 to access and control the query generation module 112 and the test case module 114. In one example, the user may specify rules and constraints with which to generate the query. In another example, the user may specify the rules and constraints based on the test scenario via the client devices 108. The test scenario may also specify a set of input data and the results expected from running the test case. In one implementation, the user may select the source data system 105, the ETL system 107, the data warehouse 106, or any combination thereof, to validate the data through the client devices 108 via the network 104. The manner in which the data validation system 102 validates the data stored in a data warehouse is described in further detail in conjunction with Fig. 2.
[00030] Fig. 2 illustrates the data validation system 102, in accordance with an implementation of the present subject matter. In said implementation, the data validation system 102 includes one or more processor(s) 202, interface(s) 204, and a memory 206 coupled to the processor 202. The processor 202 can be a single processing unit or a number of units, all of which could also include multiple computing units. The processor 202 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the processor 202 is configured to fetch and execute computer-readable instructions and data stored in the memory 206.
[00031] The interfaces 204 may include a variety of software and hardware interfaces, for example, interfaces for peripheral device(s), such as a keyboard, a mouse, an external memory, and a printer. Further, the interfaces 204 may enable the data validation system 102 to communicate with other computing devices, such as web servers and external data repositories in the communication network (not shown in the figure). The interfaces 204 may facilitate multiple communications within a wide variety of protocols and networks, such as a network, including wired networks, e.g., LAN, cable, etc., and wireless networks, e.g., WLAN, cellular, satellite, etc. The interfaces 204 may include one or more ports for connecting the data validation system 102 to a number of computing devices. In one implementation, the data validation system 102 can be interfaced with the other computing devices with Open Database Connectivity (ODBC) drivers.
[00032] The memory 206 may include any computer-readable medium known in the art including, for example, volatile memory such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. The memory 206 also includes module(s) 208 and data 210.
[00033] The module(s) 208 include routines, programs, objects, components, data structures, etc., which perform particular tasks or implement particular abstract data types. In one implementation, the module(s) 208 include the query generation module 112, the test case module 114, and other module(s) 216. The other module(s) 216 may include programs or coded instructions that supplement applications and functions of the data validation system 102.
[00034] On the other hand, the data 210, inter alia, serves as a repository for storing data processed, received, and generated by one or more of the module(s) 208. The data 210 includes, for example, query generation data 220, test case data 222, and other data 224. The other data 224 includes data generated as a result of the execution of one or more modules in the module(s) 208.
[00035] In one implementation, the data validation system 102 validates data in a data system based on rules specified by a user. In order to validate the data in the data system, the data validation system 102 can be configured to form test cases depending on a test scenario. In another example, the query generation module 112 can be configured to create the test scenario, where the user may specify a mandatory set of inputs. In one implementation, once the test scenario is created, the rules pertaining to the test scenario can be stored in the query generation data 220. In one example, the test case can include multiple test queries. The test queries rely on an analytical approach to validating data stored in the data system. For example, the test queries may fall under various test categories, such as data completeness and data transformation. Data completeness verifies that the data has been entirely loaded from the source data system 105, via the ETL system 107, to the data warehouse 106, and data transformation verifies that business rules specified by the user have been correctly implemented during the transformation stage in the ETL system 107.
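The data completeness category above reduces to comparing what left the source with what arrived in the warehouse. A minimal sketch, with invented table names and SQLite standing in for both the source system and the warehouse:

```python
import sqlite3

# Completeness check: verify that every row extracted from the source
# system arrived in the warehouse after the ETL run.

src = sqlite3.connect(":memory:")  # stands in for the source data system
src.execute("CREATE TABLE orders (order_id INTEGER)")
src.executemany("INSERT INTO orders VALUES (?)", [(i,) for i in range(5)])

dwh = sqlite3.connect(":memory:")  # stands in for the data warehouse
dwh.execute("CREATE TABLE fact_orders (order_id INTEGER)")
dwh.executemany("INSERT INTO fact_orders VALUES (?)",
                [(i,) for i in range(5)])

source_count = src.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
target_count = dwh.execute("SELECT COUNT(*) FROM fact_orders").fetchone()[0]
complete = (source_count == target_count)
print(complete)  # True
```

A count comparison is the coarsest completeness test; per-key reconciliation would catch rows that were dropped and replaced, at the cost of heavier queries.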
[00036] Furthermore, in one example, the query generation module 112 can be configured to generate the test queries. In an example, the test query can be in the form of an SQL query. The user may be provided with a list of functions in order to effectively generate a test query based on the type of data that needs to be validated. In one implementation, the test queries thus generated can be stored in the query generation data 220. In another example, the test queries can be stored in a data repository, such as a MYSQL® database.
[00037] Furthermore, in one example, the data validation system 102 may be interfaced to one or more of the source data systems 105, such as for example, an online transaction processing (OLTP) database. In one implementation, the data validation system 102 can include an extraction module (not shown in figure), configured to extract the data of interest from the one or more source data systems 105. In one example, the extraction module of the data validation system 102 can be configured to extract data irrespective of file format, i.e., fixed length files and delimited files can be extracted by the extraction module. Each of the source data systems 105 might not utilize the same data format or organizational structure. Moreover, the ETL system 107 can be configured to extract data from the source data systems 105 of various data formats and organizational structure and convert said data into a standardized format for transformation processes. The transformation process in the ETL system 107 can include but is not limited to parsing, standardization, aggregation, cleansing, reformatting, or application of one or more business rules. These business rules are generally associated with the data during the transformation stage, where the business rules define the manner in which the data must be transformed before loading into a destination data system, such as, the data warehouse 106. In one example, for a customer details table in the source data system 105, which needs to be migrated to the data warehouse 106, the user may specify a business rule in the ETL system 107 to combine a customer name field and a customer address field. Upon transformation, the ETL system 107 may be configured to load the transformed data into the data warehouse 106.
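The customer-details example above (a business rule combining the name and address fields) suggests how the data transformation category can be validated: re-derive the expected value from the source row and compare it with what was loaded. The rule, field names, and sample rows below are illustrative assumptions.

```python
# Transformation check for the example business rule: the ETL step is
# supposed to combine the customer name and address fields, so the test
# re-applies the rule to each source row and compares with the loaded row.

def apply_rule(name, address):
    # The assumed business rule: concatenate name and address.
    return name + ", " + address

source_rows = [("Asha", "12 MG Road"), ("Ravi", "5 Park St")]
loaded_rows = ["Asha, 12 MG Road", "Ravi, 5 Park St"]

mismatches = [
    (src, got)
    for src, got in zip(source_rows, loaded_rows)
    if apply_rule(*src) != got
]
print(len(mismatches))  # 0
```

An empty mismatch list indicates the business rule was implemented correctly for the sampled rows; any entry pinpoints a row where the transformation diverged from the rule.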
[00038] In one implementation, the test case module 114 can be configured to compile the test queries generated by the query generation module 112 to form a test case. In said implementation, the test case module 114 can be configured to select a target location for the execution of the test case. In one implementation, the test case module 114 can be further configured to execute the test case in order to validate the data. For example, the test case can be executed either in the source database during extraction and before transformation, or after transformation in the ETL system 107 before loading in the data warehouse 106, or in the data warehouse 106 after loading, or in any combination thereof. For example, the test case module 114 can be configured to check completeness and integrity of the data before association of the business transformation rules in the ETL system 107. Furthermore, in another example, the test case module 114 can be configured to check the completeness and integrity of the data in the data warehouse 106 after transformation and loading. Moreover, in another implementation, the test case module 114 can be further configured to verify correct implementation of the business rules on transformation of the data in the ETL system 107.
[00039] In one implementation, the test case module 114 can be configured to create a summary or report containing results of the test case execution as described above. For example, if a test case pertaining to the completeness of data is executed in the data that is loaded in the data warehouse 106, the test case module 114 can be configured to extract a summary or report of the test case execution. In one example, the report can include a detailed report on case statistics such as number of entries tested, number of data mismatches, time taken to execute the test case, date and time of the test case and an overall quality of the loaded data. In another example, the user can specify an output format of the report, for example in Hyper Text Markup Language (HTML) or an excel format. In one implementation, the test case module 114 can be configured to store the reports in the test case data 222.
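The case statistics listed above can be gathered into a simple report structure. The field names and the quality metric below are assumptions for illustration; JSON stands in here for the HTML and Excel output formats named in the text.

```python
import json
import time

# Sketch of a test report: entries tested, data mismatches, execution
# time, and an overall quality figure, serialized in a chosen format.

def build_report(entries_tested, mismatches, seconds):
    # Quality here is an assumed metric: percentage of entries that matched.
    return {
        "entries_tested": entries_tested,
        "data_mismatches": mismatches,
        "execution_seconds": seconds,
        "overall_quality": round(100.0 * (1 - mismatches / entries_tested), 2),
    }

started = time.time()
# ... the test case would be executed here ...
report = build_report(entries_tested=1000, mismatches=5,
                      seconds=round(time.time() - started, 3))
serialized = json.dumps(report)  # user-chosen output format
print(report["overall_quality"])  # 99.5
```

Storing the serialized report, as the text describes for the test case data 222, lets earlier validation runs be compared against later ones for the same scenario.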
[00040] In one implementation, the test case module 114 can be configured to store the test cases in a data repository, such as the test case data 222. In one example, the test case module 114 can be configured to access the stored test cases in the test case data 222. Therefore, the test cases can be stored, accessed, modified and re-executed depending on the requirement of the data validation process. For example, for regression testing purposes, the test cases can be easily accessed from the data repository, modified as per the requirement of the test scenario, and executed in the target location as described earlier. In this manner, a substantially high degree of re-usability of the test cases can be provided by the present subject matter, thereby reducing the need to re-create test cases for each case of data validation. This further reduces testing times, manual labor, and chances of errors during the data validation process. Furthermore, in one implementation, the test case module 114 can be configured to upload the test cases to a quality management system, where said quality management system can be provided with predetermined parameters in order to assess the quality of the uploaded test cases.
[00041] Fig. 3 illustrates a method 300 for validating data stored in a data warehouse 106, according to one implementation of the present subject matter. The method 300 may be implemented in a variety of computing systems in several different ways. For example, the method 300, described herein, may be implemented using the data validation system 102, as described above.
[00042] The method 300, completely or partially, may be described in the general context of computer executable instructions. Generally, computer executable instructions can include routines, programs, objects, components, data structures, procedures, modules, functions, etc., that perform particular functions or implement particular abstract data types. A person skilled in the art will readily recognize that steps of the method can be performed by programmed computers. Herein, some embodiments are also intended to cover program storage devices, e.g., digital data storage media, which are machine or computer readable and encode machine-executable or computer-executable programs of instructions, wherein said instructions perform some or all of the steps of the described method 300.
[00043] The order in which the method 300 is described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method, or an alternative method. Additionally, individual blocks may be deleted from the method without departing from the spirit and scope of the subject matter described herein. Furthermore, the methods can be implemented in any suitable hardware, software, firmware, or combination thereof. It will be understood that even though the method 300 is described with reference to the data validation system 102, the description may be extended to other systems as well.
[00044] At block 302, a test scenario may be created depending on the data to be validated. A test scenario may pertain to a testing strategy that depends on the type of data. The test scenario can include, but is not limited to, the type of test query to be executed and the order in which to execute the test queries. For example, in a data migration process, data that is present on a source

data system 105 may undergo an extraction, transformation and loading (ETL) process in the manner described earlier. During this ETL process, for example, if customer details, such as customer names, addresses and phone numbers, from one or more of the source data systems 105 are extracted and loaded into a destination data system, such as a data warehouse 106, the testing scenario can include, but is not limited to, testing for data completeness and integrity of the extracted, transformed and loaded data. This could include a data validation process for the data from the one or more source data systems 105, for data transformed in an ETL system 107 as provided earlier, as well as for data loaded into the data warehouse 106. In one implementation, a query generation module 112 of a data validation system 102 as described earlier can be configured to create the test scenario, where the user can specify a mandatory set of inputs. As described earlier, once the test scenario is created, the query generation module 112 can be configured to store rules pertaining to the test scenario in the query generation data 220.
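The scenario-creation step described above can be sketched as a small data structure. This is a minimal illustration, not the described system's actual implementation; the names (`TestScenario`, `query_types`, `table`) are assumptions introduced for the example.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a test scenario record. The field names are
# invented; the description only requires that a scenario capture the
# type of test queries and the order in which they are executed.
@dataclass
class TestScenario:
    name: str
    table: str
    # Ordered list of query types, e.g. completeness before integrity.
    query_types: list = field(default_factory=list)

scenario = TestScenario(
    name="customer_etl_migration",
    table="customers",
    query_types=["completeness", "integrity"],
)
```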
[00045] At block 304, based on the testing scenario, one or more test queries can be generated. In one example, the test queries can be in the form of SQL queries. As described earlier, a list of functions may be provided in order to effectively generate a test query based on the test scenario as defined at block 302. In one example, as described earlier, the query generation module 112 of the data validation system 102 can be configured to generate the test queries and further configured to store the test queries in a data repository, such as a MYSQL® database.
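A minimal sketch of generating SQL test queries from a scenario, assuming one simple SQL template per query type; the templates, the `generate_test_queries` name, and the `id` column are all illustrative assumptions, not part of the disclosed system.

```python
# Hypothetical templates mapping a query type to a SQL test query.
QUERY_TEMPLATES = {
    "completeness": "SELECT COUNT(*) FROM {table}",
    "integrity": "SELECT COUNT(*) FROM {table} WHERE id IS NULL",
}

def generate_test_queries(table, query_types):
    # Produce one SQL query per requested query type, in order.
    return [QUERY_TEMPLATES[t].format(table=table) for t in query_types]

queries = generate_test_queries("customers", ["completeness", "integrity"])
```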
[00046] At block 306, the one or more test queries can be compiled to form a test case. The test cases thus compiled are specific to the testing scenario as defined in the block 302. For example, if the testing scenario involves extraction, transformation, and loading of data pertaining to financial transactions in a bank server, the test case can include test queries relating to checking for correct implementation of transformation rules, data completeness, and data integrity. In one example, a test case module 114 of a data validation system 102 can be configured to compile the one or more test queries into the test case. Furthermore, the test case module 114 can be configured to fetch the one or more test queries from an internal data repository, such as the query generation data 220 in order to compile the test cases.
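Compiling the generated queries into a test case, as described above, can be sketched as bundling the ordered queries with the scenario they belong to; the dictionary layout and function name here are assumptions made for illustration.

```python
def compile_test_case(scenario_name, queries):
    # A test case bundles the ordered queries with the scenario that
    # produced them, so it can later be stored, fetched and re-executed.
    return {"scenario": scenario_name, "queries": list(queries)}

test_case = compile_test_case(
    "customer_etl_migration",
    ["SELECT COUNT(*) FROM customers",
     "SELECT COUNT(*) FROM customers WHERE id IS NULL"],
)
```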
[00047] At block 308, one of the test cases can be executed at a specific location in the data systems, i.e., the source data system 105, the ETL system 107, the data warehouse 106, or any

combination thereof. As described earlier, in one example, a single test case can be executed at a given point of time. For example, in one case, the test case can be executed on the data that is extracted from the one or more source data systems 105. In another example, the test case can be executed on the data that is loaded into the destination data system, such as the data warehouse 106. In yet another example, the test case can also be executed on the data that is transformed in the ETL system 107, before loading into the data warehouse 106. In this example, the data can be checked for correct implementation of transformation or business rules as described earlier. In a further example, the test cases can be checked for quality, such as by means of a quality management system as provided earlier.
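Execution at a chosen location can be sketched as running each query of a test case against a database connection. In this sketch an in-memory SQLite database stands in for whichever of the source, ETL, or warehouse locations is targeted; the `execute_test_case` function and sample table are assumptions.

```python
import sqlite3

def execute_test_case(test_case, conn):
    # Run every query of the test case at the target location and
    # collect the first column of each single-row result.
    cur = conn.cursor()
    return [cur.execute(q).fetchone()[0] for q in test_case["queries"]]

# In-memory SQLite stands in for the target data system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "a"), (2, "b")])

results = execute_test_case(
    {"queries": ["SELECT COUNT(*) FROM customers",
                 "SELECT COUNT(*) FROM customers WHERE id IS NULL"]},
    conn,
)
# results -> [2, 0]: two rows loaded, none with a missing id
```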
[00048] In one implementation, upon execution of the test case, as provided at block 308, a result can be generated in the form of a summarized report or the like. In one implementation, the report can include, but is not limited to, the time taken for the execution of the test case, the date of the test case execution, the number of entries tested, the overall quality of the data validation, the type of data validation, and the location of the test case execution. The report thus generated can be output in a configurable file format, such as HTML or Excel, and saved in a preconfigured folder. The reports thus generated can be stored and accessed when required.
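The summarized report can be sketched as a simple mapping over the fields named above; the exact field names and the dictionary representation are assumptions for illustration (the description itself only fixes the report's contents, not its layout).

```python
import datetime

def build_report(entries_tested, time_taken_s, validation_type, location):
    # Fields mirror those listed in the description; names are illustrative.
    return {
        "time_taken_s": time_taken_s,
        "date": datetime.date.today().isoformat(),
        "entries_tested": entries_tested,
        "validation_type": validation_type,
        "location": location,
    }

report = build_report(2, 0.42, "completeness", "data warehouse")
```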
[00049] At block 310, the test cases can be stored for further use. For example, the test cases can be stored in an external or internal data repository, from where the test cases can be accessed. As described earlier, in one example, the test cases can be stored in the test case data 222 of a data validation system 102. By the provision of accessible storage of the test cases, a substantially high degree of reusability is conferred on the data validation method as provided in the present subject matter.
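Storing and later fetching test cases from a repository can be sketched with a single table; the schema (a name column plus the queries serialized as JSON) and the in-memory SQLite stand-in are assumptions, not the disclosed repository.

```python
import json
import sqlite3

# In-memory SQLite stands in for the test case repository; the schema
# is a hypothetical minimal layout.
repo = sqlite3.connect(":memory:")
repo.execute("CREATE TABLE test_case_data (name TEXT PRIMARY KEY, queries TEXT)")

def store_test_case(name, queries):
    # Serialize the ordered query list so the test case survives for reuse.
    repo.execute("INSERT OR REPLACE INTO test_case_data VALUES (?, ?)",
                 (name, json.dumps(queries)))

def fetch_test_case(name):
    row = repo.execute("SELECT queries FROM test_case_data WHERE name = ?",
                       (name,)).fetchone()
    return json.loads(row[0]) if row else None

store_test_case("customer_etl_migration", ["SELECT COUNT(*) FROM customers"])
fetched = fetch_test_case("customer_etl_migration")
```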
[00050] At block 312, for a fresh data validation, one or more of the stored test cases can be accessed and one of the test cases selected. In one example, the data validation system 102 can be configured to provide a menu for selection of one of the previously stored test cases for further modification and re-execution. In another example, the test case module 114 can be configured to query the test case data 222 and fetch information pertaining to the test cases stored therein in a manner previously described.
[00051] At block 314, the selected test case can be suitably modified based on the type of data to be validated in the fresh data validation project. For example, the test case module 114 can be

configured to provide an interactive menu to the user, through which the user can suitably modify the stored test cases or test queries and save the changes as a fresh test case. Furthermore, in one implementation, the target location at which to execute the modified test case can be specified. For example, the test case module 114 can be configured to specify a target location of the data to be validated and execute the modified test case therein.
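Modifying a selected test case for fresh data and saving the result as a new test case can be sketched as a simple substitution; retargeting by table name is one illustrative modification among many the description allows, and the function name is an assumption.

```python
def retarget_test_case(queries, old_table, new_table):
    # Produce a fresh test case by pointing each stored query at the
    # table holding the fresh data; saved separately, the original
    # test case remains available for later reuse.
    return [q.replace(old_table, new_table) for q in queries]

fresh = retarget_test_case(
    ["SELECT COUNT(*) FROM customers",
     "SELECT COUNT(*) FROM customers WHERE id IS NULL"],
    "customers", "customers_2012",
)
```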
[00052] Therefore, in cases where regression testing of data has to be performed, or data validation for the same or similar data has to be performed, test cases from previous data validation projects can be obtained, modified as per the requirement, and re-executed. In this manner, manual effort is substantially reduced and the overall efficiency of the data validation process can be increased.
[00053] In one implementation, an easy-to-understand and user-friendly graphical user interface (GUI) is provided as a part of the data validation tool. The GUI is designed to provide easy control of the data validation system 102 by a user of even average computing skill. Various options indicative of the different functions of the data validation system 102 are presented to the user in order to facilitate easy operation and control of the data validation system 102.
[00054] Fig. 4 illustrates a screenshot of a graphical user interface of the data validation system 102, in accordance with an implementation of the present subject matter. Fig. 4 illustrates a home screen 400 of the data validation system 102. The home screen 400 serves as a navigation point to other modules of the data validation system 102. In an example, the home screen 400 contains navigation keys that facilitate efficient navigation between the various modules in the data validation system 102. Further, the GUI can have other pages, icons and features for interfacing with the user.
[00055] In one example, the GUI can include a user configuration page. In one example, a user with administrator privileges on the data validation system 102 can add and edit users through this page. In one example, the user can edit access privileges of other users through the user configuration page. In another example, a database configuration page can be provided, where the user can configure connectivity of the data validation system 102 with one or more other systems. In one example, the user can configure the connectivity of the data validation system 102 with the MYSQL® database, where the test cases can be stored. In a further example, a file

configuration page can be provided, where, in one example, the user can set fixed-length file header and footer information as well as delimited file information. In yet another example, a report configuration page can be provided in the GUI, where the user can select a report path, i.e., the destination folder where the test case report will be generated, and also a format in which the test case report can be generated. Furthermore, in another example, a password management page can be provided, where, in one example, the user can edit access credentials, such as user names and passwords. Moreover, a test case module page can be provided in the GUI, where the user can create, store, access, modify and execute test cases in the data validation system 102.
[00056] Although implementations of data validation in a data system have been described in language specific to structural features and/or methods, it is to be understood that the present subject matter is not necessarily limited to the specific features or methods described. Rather, the specific features and methods are disclosed as implementations for data validation in a data system.

I/We claim:
1. A computer implementable method to validate data in a data system, the method
comprising:
generating at least one test query based on a testing scenario;
compiling the at least one test query to form at least one test case;
storing the at least one test case in a data repository for subsequent utilization;
selecting one of the stored test cases based on fresh data to be validated; and
modifying and executing the selected test case.
2. The method as claimed in claim 1, wherein the method further comprises defining the testing scenario based on at least one of a location of data and a type of data.
3. The method as claimed in claim 1, wherein the executing further comprises creating a report including at least one of a time taken for the execution of the test case, date of the execution of the selected test case, number of entries tested, overall quality of the data validation, type of data validation, and the location of the execution.
4. A data validation system (102) for validating data stored in a data system, the data validation system (102) comprising:
a processor (202); and
a memory (206) coupled to the processor (202), the memory (206) comprising:
a query generation module (112) configured to generate at least one test query based on a testing scenario; and
a test case module (114) configured to:
compile the at least one test query to form at least one test case; and
store the at least one test case in a test case data (222) for subsequent access, modification and re-execution.

5. The data validation system (102) as claimed in claim 4, wherein the query generation module (112) is further configured to define the testing scenario based on a set of user inputs.
6. The data validation system (102) as claimed in claim 4, wherein the test case module (114) is configured to:
select at least one target location in the data system; and
execute the test case at the at least one target location.
7. The data validation system (102) as claimed in claim 4, wherein the test case module (114) is further configured to validate implementation of business rules during transformation of data in an extraction, transformation, and loading (ETL) system (107).
8. The data validation system (102) as claimed in claim 4, wherein the test case module (114) is further configured to create a report based on the test case execution, wherein the report comprises at least one of a time taken for the execution of the test, date of the test case execution, number of entries tested, overall quality of the data validation, type of data validation, and the location of the test case execution.
9. The data validation system (102) as claimed in claim 4, wherein the test case module (114) is further configured to upload the at least one test case to a quality management system.
10. A computer-readable medium having embodied thereon a computer program for executing a method comprising:
generating at least one test query based on a testing scenario;
compiling the at least one test query to form at least one test case;
storing the at least one test case in a data repository for subsequent utilization;
selecting one of the stored test cases based on fresh data to be validated; and
modifying and executing the selected test case.

Documents

Application Documents

# Name Date
1 2839-MUM-2011-POWER OF ATTORNEY(14-11-2011).pdf 2011-11-14
2 2839-MUM-2011-CORRESPONDENCE(14-11-2011).pdf 2011-11-14
3 2839-MUM-2011-FORM 5(28-12-2011).pdf 2011-12-28
4 2839-MUM-2011-FORM 3(28-12-2011).pdf 2011-12-28
5 2839-MUM-2011-FORM 2(TITLE PAGE)-(28-12-2011).pdf 2011-12-28
6 2839-MUM-2011-FORM 2(28-12-2011).pdf 2011-12-28
7 2839-MUM-2011-FORM 18(28-12-2011).pdf 2011-12-28
8 2839-MUM-2011-FORM 1(28-12-2011).pdf 2011-12-28
9 2839-MUM-2011-DRAWING(28-12-2011).pdf 2011-12-28
10 2839-MUM-2011-DESCRIPTION(COMPLETE)-(28-12-2011).pdf 2011-12-28
11 2839-MUM-2011-CORRESPONDENCE(28-12-2011).pdf 2011-12-28
12 2839-MUM-2011-CORRESPONDENCE(28-12-2011)-.pdf 2011-12-28
13 2839-MUM-2011-CLAIMS(28-12-2011).pdf 2011-12-28
14 2839-MUM-2011-ABSTRACT(28-12-2011).pdf 2011-12-28
15 2839-MUM-2011-CORRESPONDENCE(3-1-2012).pdf 2018-08-10
16 2839-MUM-2011-FORM 1(3-1-2012).pdf 2018-08-10
17 2839-MUM-2011-FER.pdf 2018-08-10
18 Form-1.pdf 2018-08-10
19 Form-3.pdf 2018-08-10
20 Drawings.pdf 2018-08-10
21 ABSTRACT1.jpg 2018-08-10
22 2839-MUM-2011-FORM 4(ii) [09-11-2018(online)].pdf 2018-11-09
23 2839-MUM-2011-CLAIMS [10-01-2019(online)].pdf 2019-01-10
24 2839-MUM-2011-COMPLETE SPECIFICATION [10-01-2019(online)].pdf 2019-01-10
25 2839-MUM-2011-FER_SER_REPLY [10-01-2019(online)].pdf 2019-01-10
26 2839-MUM-2011-OTHERS [10-01-2019(online)].pdf 2019-01-10
27 2839-MUM-2011-Correspondence to notify the Controller [12-10-2020(online)].pdf 2020-10-12
28 2839-MUM-2011-FORM-26 [15-10-2020(online)].pdf 2020-10-15
29 2839-MUM-2011-Written submissions and relevant documents [03-11-2020(online)].pdf 2020-11-03
30 2839-MUM-2011-US(14)-HearingNotice-(HearingDate-19-10-2020).pdf 2021-10-03

Search Strategy

1 2839_MUM_2011_08-02-2018.pdf