
Data Validation

Abstract: The present subject matter discloses a system and a method for data validation. In one implementation, the method includes receiving at least one mapping rule and at least one conversion rule pertaining to a source data repository and a target data repository and mapping at least one field of the source data repository with at least one field of the target data repository based in part on the at least one mapping rule. The method further comprises determining the relationship between the at least one field of the source data repository and the at least one mapped field of the target data repository based in part on the at least one conversion rule and scanning the values present in the at least one field of the source data repository and the mapped at least one field of the target data repository to validate the data present in the source data repository and the target data repository.


Patent Information

Application #
Filing Date
30 September 2011
Publication Number
19/2014
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Parent Application
Patent Number
Legal Status
Grant Date
2020-10-29
Renewal Date

Applicants

TATA CONSULTANCY SERVICES LIMITED
Nirmal Building  9th Floor  Nariman Point  Mumbai Maharashtra

Inventors

1. RAJAGOPALAN  Sudharsan
26 Karumari Amman Nagar  Madambakkam  Chennai 600073 Tamil Nadu

Specification

FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENTS RULES, 2003
COMPLETE SPECIFICATION
(See section 10, rule 13)
1. Title of the invention: DATA VALIDATION
2. Applicant(s)
NAME: TATA CONSULTANCY SERVICES LIMITED
NATIONALITY: Indian
ADDRESS: Nirmal Building, 9th Floor, Nariman Point, Mumbai 400021, Maharashtra, India
3. Preamble to the description
COMPLETE SPECIFICATION
The following specification particularly describes the invention and the manner in which it
is to be performed.

TECHNICAL FIELD
[0001] The present subject matter relates, in general, to data repositories and, in
particular, to validation of data in data repositories.
BACKGROUND
[0002] Migration of data from one data repository to another data repository is often
performed in an organization. As part of the migration process, the organization may optimize the physical media on which the data is stored to take advantage of efficient storage technologies. In order to optimize the physical storage media, physical blocks of data are moved from one physical medium, such as a magnetic tape or a disk, to another physical medium, often using data migration techniques.
[0003] In another example, the organization may choose to move data from a database
provided by one vendor, such as Sybase™, MySQL™, DB2™, SQL Server™, or Oracle™, to another database provided by the same or a different vendor. In such a scenario, a physical transformation process is usually performed to counter any resulting changes in the underlying data format which may affect the behavior of the data in the application layer of the data repository. For example, changes in the behavior of the application layer depend on whether the data manipulation language, i.e., the language used to manipulate the data repository through operations such as inserting a record, deleting a record, and modifying a record, has been changed.
[0004] Moreover, a software tool associated with the data repository may have
undergone changes, and thus may require data in a new format for which data conversions are necessary. For example, changes made in the software tool, for instance a new customer relationship management (CRM) system or an enterprise resource planning (ERP) system, usually involve substantial transformation of data, as almost every software tool is associated with its respective data model. Other examples of such software tools include supply chain management (SCM) systems.
[0005] The transfer and conversion of data may also be driven by multiple business
requirements. These changes in the business processes of the organization may require changes to be made in the data repository so as to reflect the changes made in the business processes, products, services, and operations.
[0006] Once the data has been migrated, a data validation process to ensure data
consistency and integrity in a source and a target data repository is performed. The data validation process may also determine if the data is in a format which is compatible with the new or upgraded software tool. For example, the data validation process may be done to ensure that the data is in accordance with the data model of the new or upgraded software tool.
SUMMARY
[0007] This summary is provided to introduce concepts related to systems and
methods for data validation based on pre-defined rules, and the concepts are further described below in the detailed description. This summary is not intended to identify essential features of the claimed subject matter, nor is it intended for use in determining or limiting the scope of the claimed subject matter.
[0008] In one implementation, the method of data validation based on pre-defined
rules includes receiving at least one mapping rule and at least one conversion rule pertaining to a source data repository and a target data repository, and mapping at least one field of the source data repository with at least one field of the target data repository based in part on the at least one mapping rule. Once the mapping rule and the conversion rule are obtained, a relationship between the at least one field of the source data repository and the at least one mapped field of the target data repository is determined based in part on the at least one conversion rule, and the values present in the at least one field of the source data repository and the mapped at least one field of the target data repository are scanned to validate the data present in the source data repository and the target data repository.
BRIEF DESCRIPTION OF THE DRAWINGS
[0009] The detailed description is described with reference to the accompanying
figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same numbers are used throughout the drawings to reference like features and components.
[0010] Fig. 1 illustrates a network environment implementing a data validation
system, according to an embodiment of the present subject matter.
[0011] Fig. 2 illustrates an exemplary method of data validation, according to an
embodiment of the present subject matter.
DETAILED DESCRIPTION
[0012] Systems and methods for data validation based on pre-defined rules are
described herein. The systems and methods can be implemented in a variety of computing systems. Examples of such computing systems include, but are not restricted to, mainframe computers, workstations, personal computers, desktop computers, minicomputers, servers, multiprocessor systems, laptops, network servers, and the like.
[0013] Conventionally, a software tool is used to automate the various business
processes associated with the operations of an organization. Examples of such software tools include enterprise resource planning (ERP) systems, customer relationship management (CRM) systems, and supply chain management (SCM) systems. The software tools are usually associated with various data repositories that store various types of data, such as enterprise data and customer data. However, in due course of time, the software tools may be upgraded or changed, or the data repository may be upgraded or changed. These changes may be made to enhance the security of the data, enhance the data storage capacity, optimize the data for faster response times, ensure compatibility of software tools using the data stored in the data repository, and so on.
[0014] It is well known that software tools, such as those indicated above, are usually
associated with their own specific data models, and each data repository has its own proprietary technique of storing data and may implement its respective protocols and language. Thus, an upgrade of or change in the software tools or the data repository often requires migration of data from the source data repository undergoing the upgrade or change to another, target data repository. Further, it should be appreciated by those skilled in the art that the source data repository and the target data repository may be part of the same storage medium or different storage media hosted in the same computing system or different computing systems. In one example, the data may be migrated from selected blocks or sectors of a data repository to other sectors or blocks of the same data repository. In another example, the source data repository itself may undergo changes, say in terms of data structure, format of data, etc., and data validation may be carried out against a backup copy of the data repository taken before the changes.
[0015] The data migration phase is often followed by a data validation phase. In the
data validation phase, the data in the target data repository is validated against the data in the source data repository based on pre-defined rules. Data validation usually checks for the correctness, integrity, and security of data stored in a data repository based on various validation rules, such as data integrity rules and procedure-based business rules. In case the data which has been migrated from the source data repository to the target data repository, henceforth referred to as the migrated data, does not conform to these data validation rules, the data may not be compatible with the upgraded or new software tool. Further, data which is not validated may lead to corruption and inconsistency in the whole data repository or may compromise the security of the data repository.
[0016] The conventional data validation tools are usually dependent on various
parameters, such as the nature of the software tool and the structure of the data repository. Thus, for validating the data after data migration, various mapping and conversion rules have to be defined to specify the relationship between various fields of the source data repository and the target data repository. Hence, such data validation tools have to be developed, tested, and deployed every time a data migration is completed, making data validation an expensive and time-consuming process.
[0017] The present subject matter describes systems and methods for data validation.
It should be appreciated by those skilled in the art that though the systems and methods for data validation are described in the context of data migration, the same should not be construed as a limitation. For example, the systems and methods for data validation may be implemented for various other purposes, such as determining data consistency and determining compatibility of a data repository with software tools.
[0018] In one implementation, the method of data validation includes configuring a
rule sheet. In said implementation, the rule sheet includes various configurable mapping rules and configurable conversion rules. The mapping rules define the corresponding fields in a source and a target data repository. For example, a column named 'students' in the source data repository may correspond to a column named 'pass outs' in the target data repository. In one implementation, the rule sheet may include placeholders so as to facilitate a user to mention the names of the corresponding fields, tables, and databases in the rule sheet. As would be known by those skilled in the art, a first category of the conventional data repositories stores data in a rigid structure referred to as a table, wherein the table comprises one or more columns. In the said category, the data is stored in the form of rows in the data repository. In one example, the columns of the source and target data repositories may differ in terms of names of the columns, structure of the columns, data format of the columns, etc., as would be defined by the conversion rules. Moreover, multiple columns in the source data repository may correspond to a single column of the target data repository, or a single column of the source data repository may correspond to multiple columns of the target data repository, as would be defined by the mapping rules. Further, the rule sheet also facilitates the user to define a default value for a field in the target repository for which a corresponding field in the source repository does not exist. The default value may be a constant or a value computed based in part on other values stored in the source data repository.
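The rule-sheet structure described above can be sketched in code. The following is a minimal illustration, assuming a simple dictionary-based representation; the field, table, and database names and the helper functions are hypothetical, not taken from the specification.

```python
# A minimal sketch of a configurable rule sheet. Placeholders name the
# databases, tables, and columns on each side; all names are illustrative.
RULE_SHEET = {
    "source": {"database": "school", "table": "students_tbl"},
    "target": {"database": "alumni", "table": "pass_outs_tbl"},
    # Each mapping rule pairs one or more source fields with one or more
    # target fields. An empty source list means the target field has no
    # counterpart in the source and falls back to a default value.
    "mapping_rules": [
        {"source_fields": ["students"], "target_fields": ["pass_outs"]},
        {"source_fields": ["first_name", "last_name"], "target_fields": ["full_name"]},
        {"source_fields": [], "target_fields": ["status"], "default": "active"},
    ],
}

def mapped_targets(rule_sheet, source_field):
    """Return the target fields mapped to a given source field."""
    targets = []
    for rule in rule_sheet["mapping_rules"]:
        if source_field in rule["source_fields"]:
            targets.extend(rule["target_fields"])
    return targets

def default_for(rule_sheet, target_field):
    """Return the default value for a target field lacking a source field."""
    for rule in rule_sheet["mapping_rules"]:
        if not rule["source_fields"] and target_field in rule["target_fields"]:
            return rule.get("default")
    return None
```

Because the placeholders carry all repository-specific names, only this structure needs updating for a new migration, which is what makes the rule sheet reusable.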
[0019] The rule sheet may also include various conversion rules defining the
relationship between a field in the source data repository and a corresponding field in the target data repository. In one implementation, the relationship may be defined as a function or a procedure. Further, for defining the relationship, the rule sheet may be configured to facilitate the user to define customized functions or user-defined functions as well as in-built functions, such as expressions or functions that may be supported by the data repository. For example, if the data repository is based on Oracle™, the rule sheet may be configured to support one or more functions supported by Oracle™, such as add_months, avg, chr(n), concat(string1, string2), convert(character_to_convert, new_character_set, old_character_set), count(*), decode, floor, lower(char), mod(x,y), replace(char, search_str[, replace_str]), round, substr, to_lob, to_number, to_date, to_char, translate, trim, trunc, and upper. The rule sheet may be further configured to support additional rule parameters so as to provide values which may be required by the customized functions, user-defined functions, and in-built functions which define the relationship between a field in the source data repository and a corresponding field in the target data repository.
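One way to support both in-built and user-defined conversion functions, as described above, is a registry keyed by rule name. The sketch below is an assumption about how such wiring might look, not the actual implementation; `equal`, `lower`, and `add_prefix` are illustrative rule names, with `lower` mirroring the database built-in of the same name.

```python
# Registry mapping a conversion-rule name to the function implementing it.
CONVERSIONS = {}

def conversion(name):
    """Decorator registering a conversion function under a rule name."""
    def register(fn):
        CONVERSIONS[name] = fn
        return fn
    return register

@conversion("equal")
def equal(value, *params):
    # The simplest rule: source and target values must be identical.
    return value

@conversion("lower")
def lower(value, *params):
    # Mirrors the database's in-built lower(char) function.
    return value.lower()

@conversion("add_prefix")
def add_prefix(value, prefix):
    # A user-defined function consuming an additional rule parameter.
    return prefix + value

def expected_target_value(rule_name, source_value, *params):
    """Apply a named conversion rule to a source value, passing any
    additional rule parameters through to the function."""
    return CONVERSIONS[rule_name](source_value, *params)
```

The additional rule parameters mentioned above arrive as the trailing `*params` arguments, so user-defined functions can take as many extra values as they need.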
[0020] The method of data validation further includes validating the values in
corresponding fields of the source and the target data repository based on the configured mapping rules and conversion rules defined in the rule sheet. In one implementation, the data stored in the data repositories may be implicitly parsed so as to convert the values of corresponding mapped fields stored as different data types or formats, or stored as data types of different lengths, to a base data type and base data format. In said implementation, the data validation may be done by generating a hash or a signature corresponding to each of the values stored in the source and the target data repository and comparing the same to determine either a match or a mismatch. In one implementation, the validation of data may be carried out in various scan modes, such as table scan, row scan, rule-based scan, dual scan, and sample scan. In table scan mode, the data validation is performed on the entire corresponding tables of the source data repository and the target data repository, whereas in the row scan mode, the data validation is performed on each row of the source data repository and the target data repository. In rule-based scan, the values in a row of the source data repository and the values in the corresponding mapped row of the target data repository are validated based on specified mapping and conversion rules. In one example, the specified mapping and conversion rules may be identified by a unique rule identification code. Further, the user may also save any combination of specified mapping and conversion rules as a rule group identified by a unique rule group identification code. In sample scan mode, a selected portion of the data, for example a random sample of the data stored in the data repositories, may be validated. For example, in one implementation, the user may select the portion of data to be validated using various selection parameters, such as a percentage of the total number of records stored in the data repositories, a specific number of records stored in the data repositories, or specific records selected by the user. Based on the evaluation, various data evaluation reports may be generated. For example, a random sample scan may be performed to evaluate a portion of the data and, based on the same, a sample data evaluation report may be generated, wherein the report may include, for example, the total rows scanned, the total number of conversion and mapping rules evaluated, and the total number of conversion and mapping rules validated.
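The hash-based comparison and two of the scan modes described above can be sketched as follows. This is an illustrative assumption, not the system's implementation: the normalization inside `signature` stands in for the implicit parsing to a base data type and format, and the function names are hypothetical.

```python
import hashlib
import random

def signature(value):
    """Hash a value after implicit parsing to a base type and format."""
    # Normalize: strip padding and fold case, so that differences such as
    # CHAR(10) versus VARCHAR(40) storage do not cause false mismatches.
    normalized = str(value).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def row_scan(source_rows, target_rows):
    """Row scan mode: compare every corresponding row pair;
    return the indices of mismatched rows."""
    mismatches = []
    for i, (src, tgt) in enumerate(zip(source_rows, target_rows)):
        if signature(src) != signature(tgt):
            mismatches.append(i)
    return mismatches

def sample_scan(source_rows, target_rows, percent, seed=0):
    """Sample scan mode: validate only a random percentage of the rows."""
    rng = random.Random(seed)
    count = max(1, len(source_rows) * percent // 100)
    indices = rng.sample(range(len(source_rows)), count)
    return [i for i in indices
            if signature(source_rows[i]) != signature(target_rows[i])]
```

Comparing fixed-length digests rather than raw values keeps the comparison uniform regardless of the underlying column types, which is the point of the hash-or-signature step described above.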
[0021] Thus, the systems and methods for data validation reduce the time, costs, and risks
involved in validating data in the source and target data repository. In the said systems and methods for data validation, the rules need not be defined afresh for every data migration. For validating the data after data migration, the placeholders of the rule sheet may be updated on a case-by-case basis. Thus, the conversion rules, which have been defined as user-defined functions, may be used again. This saves the costs and time involved in developing the rule sheet to be used for data validation. These and other features of the present subject matter are described in greater detail in conjunction with the following figures. While aspects of the described systems and methods for data validation can be implemented in any number of different computing systems, environments, and/or configurations, the embodiments are described in the context of the following exemplary system(s).
[0022] Fig. 1 illustrates a network environment 100 implementing a data validation
system 102, according to an embodiment of the present subject matter. In said embodiment, the network environment 100 includes the data validation system 102 designed to validate data based on pre-defined rules. In one implementation, the data validation system 102 may be included within an existing information technology infrastructure or an existing software tool of an organization. The data validation system 102 may be implemented in a variety of computing systems, such as a laptop computer, a desktop computer, a notebook, a workstation, a mainframe computer, a server, a network server, and the like. It will be understood that the data validation system 102 may be accessed by users through one or more client devices 104-1, 104-2, 104-3, ..., 104-N, collectively referred to as client devices 104. Examples of the client devices 104 include, but are not limited to, a desktop computer, a portable computer, a mobile phone, a handheld device, and a workstation. As shown in the figure, such client devices 104 are communicatively coupled to the data validation system 102 through a network 106 for facilitating one or more end users to access and operate the data validation system 102.
[0023] In one example, various computing systems 107-1, 107-2, 107-3, ..., 107-N,
such as an enterprise resource planning (ERP) system, a customer relationship management (CRM) system, and a supply chain management (SCM) system, henceforth collectively referred to as the application servers 107 and singularly referred to as the application server 107, may be connected to the network 106. The application server 107 may be implemented in the form of a laptop computer, a desktop computer, a notebook, a workstation, a mainframe computer, a server, a network server, and the like. The application servers 107 may be configured to host and run various software tools used by the organization to conduct its business and/or automate business processes. It should be appreciated by those skilled in the art that the application servers may be configured to dedicatedly host a single software tool or may be configured to host and run multiple software tools. Further, a single software tool may be hosted on multiple application servers 107 so as to distribute the load on the application servers 107 using conventionally known load balancing techniques or to provide redundancy in case of failure of an application server 107. The client devices 104 may communicate with the application servers 107 either directly or through the network 106.
[0024] Each of the application servers 107 comprises its respective data. For
example, the application server 107 hosting the CRM system may comprise client-related data, while the application server 107 hosting the ERP system may comprise enterprise-resource-related data. This data may either be stored in a local memory component of each of the application servers 107 or may be present in an external data store (not shown in the figure) that may be coupled to the respective application server 107 either directly or through the network 106.
[0025] The network 106 may be a wireless network, wired network or a combination
thereof. The network 106 can be implemented as one of the different types of networks, such as an intranet, a local area network (LAN), a wide area network (WAN), the internet, and such. The network 106 may either be a dedicated network or a shared network, which represents an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), Wireless Application Protocol (WAP), etc., to communicate with each other. Further, the network 106 may include a variety of network devices, including routers, bridges, servers, computing devices, storage devices, etc.
[0026] In one implementation, the data validation system 102 includes a processor(s)
108, input-output (I/O) interface(s) 110, and a memory 112. The processor(s) 108 are electronically coupled to the memory 112. The processor(s) 108 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the processor(s) 108 are configured to fetch and execute computer-readable instructions stored in the memory 112.
[0027] The I/O interface(s) 110 may include a variety of software and hardware
interfaces, for example, a web interface, a graphical user interface, etc., allowing the data validation system 102 to interact with the client devices 104. Further, the I/O interface(s) 110 may enable the data validation system 102 to communicate with other computing devices, such as web servers and external data servers (not shown in figure). The I/O interface(s) 110 can facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example LAN, cable, etc., and wireless networks such as WLAN, cellular, or satellite. The I/O interface(s) 110 may include one or more ports for connecting the data validation system 102 to a number of devices to or to another server.
[0028] The memory 112 can include any computer-readable medium known in the art
including, for example, volatile memory (e.g., RAM) and/or non-volatile memory (e.g., EPROM, flash memory, etc.). In one embodiment, the memory 112 includes module(s) 114 and program data 116. The module(s) 114 further include a rule validation module 118, a scanning module 120, a report generation module 122, and other module(s) 124. It will be appreciated that such modules may be represented as a single module or a combination of different modules. Additionally, the memory 112 further includes data 116 that serves, amongst other things, as a repository for storing data fetched, processed, received, and generated by one or more of the module(s) 114. The data 116 includes, for example, mapping rules 126, conversion rules 128, scanning modes 130, and other data 132. Additionally, the aforementioned data can be organized using data models, such as relational or hierarchical data models.
[0029] As explained previously, each of the application servers 107 comprises its
respective data, which may either be stored in a local memory component of each of the application servers 107 or may be present in an external data store. In one example depicted in the figure, it may be considered that data associated with the application server 107-1 is stored in a source data repository 134-1 and is migrated to a target data repository 134-2.
[0030] The source data repository 134-1 and the target data repository 134-2 are
coupled to the data validation system 102 and are collectively referred to as the data repositories 134. Further, though the source data repository 134-1 and the target data repository 134-2 have been shown in Fig. 1 as separate entities, it should be appreciated by those skilled in the art that the source data repository 134-1 and the target data repository 134-2 may also be a single physical entity. For example, the source data repository 134-1 and the target data repository 134-2 may be different sections of a large database. Moreover, in an implementation, the source data repository 134-1 and the target data repository 134-2 may be an integral part of the data validation system 102.
[0031] The source data repository 134-1 may store various data generated and/or used
by the application servers 107-1 for providing various functionalities of the application servers 107-1. In said example, the application servers 107-1 communicate with the source data repository 134-1 over the network 106. In another example, the application servers 107-1 may be connected directly with the source data repository 134-1. It would be appreciated by those skilled in the art that the application servers 107-1 and the data repositories 134 may be implemented in the same computing system or different computing systems. The source data repository 134-1 may store data in various formats, such as relational tables, object-oriented relational tables, and indexed tables.
[0032] In one example, the data from the source data repository 134-1 may be
migrated to the target data repository 134-2. The migration may take place due to various reasons, such as upgradation of existing software tools or installation of new software tools in the application servers 107, or changing the data structure or data format of the stored data for added security or compatibility with the upgraded or new software tools. In another example, the source data repository 134-1 and the target data repository 134-2 may correspond to data repositories provided by different vendors, such as Sybase™, MySQL™, DB2™, SQL Server™, and Oracle™. Further, it should be appreciated by those skilled in the art that, though the data validation is explained in the context of data migration from the source data repository 134-1 to the target data repository 134-2, the same should not be construed as a limitation. Data migration may also be done within the same data repository.
[0033] As mentioned previously, the source data repository 134-1 may be storing data
in various formats, such as relational tables, object-oriented relational tables, and indexed tables, and it is possible that this format of storing data differs from the format in which the target data repository 134-2 may store data. Accordingly, during migration, various checks, such as cardinality checks, consistency checks, data type checks, hash totals, and uniqueness checks, are done to ensure data consistency. Post-migration validation is done to ensure cross-system consistency, i.e., consistency of the data present in the source data repository 134-1 and the target data repository 134-2. Validation may also be done for other reasons, such as ensuring compliance with the software tools running on the application servers 107, checking for security vulnerabilities, etc.
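A few of the consistency checks named above can be expressed as simple predicates. The following is an illustrative sketch under the assumption that rows and key columns are available as plain sequences; it is not the system's actual implementation.

```python
def cardinality_check(source_rows, target_rows):
    """Cardinality check: row counts must match after migration."""
    return len(source_rows) == len(target_rows)

def uniqueness_check(key_values):
    """Uniqueness check: a key column must not contain duplicates."""
    return len(key_values) == len(set(key_values))

def hash_total_check(source_numbers, target_numbers):
    """Hash-total check: control totals over a numeric column must agree."""
    return sum(source_numbers) == sum(target_numbers)
```

A hash total only detects that something changed, not what; it is typically combined with the row-level checks, which is why the migration phase runs several kinds of checks together.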
[0034] The data validation system 102 is provided for validation of data migrated from
the source data repository 134-1 to the target data repository 134-2. It is only for the sake of explanation that the operation of the data validation system 102 is explained in the context of an exemplary scenario in which data is migrated from the source data repository 134-1 to the target data repository 134-2, and this should not be construed as a limitation.
[0035] In operation, the data validation system 102 facilitates the user to define
mapping rules 126 using the client devices 104. Each of the defined mapping rules 126 may be identified by a unique rule identification code or rule ID. The mapping rules 126 define the corresponding fields in the source data repository 134-1 and the target data repository 134-2. For example, a table named 'customer_details' in a database named 'customers' in the source data repository 134-1 may correspond to a table named 'customer' in a database named 'customer_master' in the target data repository 134-2. Further, in the said example, the columns named 'customer_no', 'title_code', 'customer_name', 'address', 'credit_rating', and 'dob' of the table 'customer_details' may correspond to the columns named 'customer_reference', 'salutation', 'name', 'shipping_address', 'score', and 'date_of_birth' of the table named 'customer'. In said example, the column named 'activity_status' of the table 'customer' may not have a corresponding column in the table named 'customer_details'. In one implementation, the data validation system 102 may include placeholders so as to facilitate the user to specify the names of the corresponding fields, tables, and databases, also referred to as mapped fields, of the data repositories 134 in the mapping rules. Further, multiple fields of the source data repository 134-1 may be mapped with a single field of the target data repository 134-2. Alternatively, a single field of the source data repository 134-1 may be mapped with multiple fields of the target data repository 134-2. The mapping rules 126 make the data validation process, as implemented by the data validation system 102, independent of the number of fields and/or the size of the fields being considered for data validation. Table 1 below shows an exemplary denotation of mapping rules.

Source Database    Source Table        Target Database    Target Table
customers          customer_details    customer_master    customer

Source Column      Target Column
customer_no        customer_reference
title_code         salutation
customer_name      name
address            shipping_address
credit_rating      score
dob                date_of_birth
(none)             activity_status

Table-1
[0036] The data validation system 102 further facilitates the user to define various
conversion rules 128 defining the relationship between the values stored in a field in the source data repository 134-1 and the values stored in the corresponding mapped field in the target data repository 134-2. Each of the defined conversion rules 128 may be identified by a unique rule identification code or rule ID. The conversion rules 128 may be in the form of in-built functions, procedures, user-defined functions, functions supported by the data manipulation language of the data repositories 134, etc. The conversion rules 128 ensure the consistency of values in the data repositories 134 despite differences in data format, data type, etc., that may exist between the data repositories 134. For example, in one implementation, a conversion rule 128 may state that the values in the corresponding fields are to be equal; another conversion rule 128 may state that the value in the source data repository 134-1 is to be used to query a lookup table and the resultant value in the lookup table is to be used to verify the value in the corresponding field in the target data repository 134-2. Examples of other conversion rules 128 include truncating a portion of the value in the source data repository 134-1 to obtain the value stored in the target data repository, etc. Table 2 is an exemplary table depicting the conversion rules.

Source Column    Conversion Rule Function    Additional Parameters                  Target Column
customer_no      equal                                                              customer_reference
title_code       map                         01, Mr., 02, Ms., 03, Mrs., 04, Dr.    salutation
customer_name    substr                      1,40                                   name
address          equal                                                              shipping_address
credit_rating    equal                                                              score
dob                                                                                 date_of_birth
(none)           value                       1                                      activity_status

Table-2
[0037] As shown in Table 2, the conversion rules 128 state that the values in the
column named customerno would be the same as the values in the column named customer_reference, the values in the column named address would be the same as the values in the column named shipping address, and the values in the column named credit rating would be the same as the values in the column named score. However, for the values stored in the column named customername, only the first 40 characters would be stored in the column named name. As mentioned in Table 2, the conversion rule function defined for the column named customername is substr. As would be known to those skilled in the art, the function substr generates a new string, which is a sub-string of an old string. The substr function requires two arguments, a first argument denoting the starting character number of the old string and a second argument denoting the ending character number of the old string. All the characters of the old string in between and inclusive of the first and second argument are used to generate the new string. In said implementation, the first argument and the second argument are stored as additional

parameters and are separated by a delimiter. Examples of delimiters include, but are not limited to, a comma, semi-colon, colon, back slash, and special characters. As mentioned earlier, a conversion rule 128 may also be an in-built or user-defined function such as map. In said example, the map function may be in the format map(argument 1, argument 2), wherein if argument 1 is present in any value of the source data repository 134-1, the argument 1 is replaced by the argument 2 in the target data repository 134-2. As mentioned earlier, the arguments for the map function are stored as additional parameters and are separated by a delimiter.
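The Equal, Substr, and Map rule functions of Table 2 can be sketched in code. The following is a hypothetical Python illustration: the function names and the 1-indexed, inclusive substr semantics follow Table 2 and paragraph [0037], while the dispatch-table arrangement and the helper name `convert` are assumptions of this sketch:

```python
def substr(value, params):
    # substr(first, last): per paragraph [0037], arguments are 1-indexed
    # and inclusive of both the first and the last character.
    first, last = (int(p.strip()) for p in params.split(","))
    return value[first - 1:last]

def map_rule(value, params):
    # map(argument 1, argument 2): if argument 1 is present in a source
    # value, it is replaced by argument 2 in the target ([0037]).  The
    # arguments arrive delimiter-separated, as in Table 2.
    parts = [p.strip() for p in params.split(",")]
    for arg1, arg2 in zip(parts[0::2], parts[1::2]):
        if arg1 in value:
            value = value.replace(arg1, arg2)
    return value

# Dispatch table keyed by the rule function names used in Table 2.
rules = {
    "Equal": lambda value, params: value,
    "Substr": substr,
    "Map": map_rule,
}

def convert(value, rule, params=""):
    return rules[rule](value, params)
```

For example, `convert("01", "Map", "01, Mr., 02, Ms., 03, Mrs., 04, Dr.")` yields `"Mr."`, and applying Substr with parameters `"1,40"` to a 50-character customername keeps only the first 40 characters, as described for the name column.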
[0038] Examples of conversion rules 128 include null to value mapping, wherein a
null value in any field of the source data repository 134-1 is stored as a specified value in the corresponding mapped field of the target data repository 134-2; and value to null mapping, wherein a specified value in the source data repository 134-1 is stored as a null value in the corresponding mapped field of the target data repository 134-2.
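The null to value and value to null mappings of paragraph [0038] can be sketched as follows; note that the substitute values "UNKNOWN" and "N/A" are illustrative assumptions of this sketch, not values taken from the specification:

```python
def null_to_value(value, specified="UNKNOWN"):
    # Null to value mapping ([0038]): a null in the source field is stored
    # as a specified value in the mapped target field.
    return specified if value is None else value

def value_to_null(value, specified="N/A"):
    # Value to null mapping ([0038]): a specified source value is stored
    # as a null in the mapped target field.
    return None if value == specified else value
```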
[0039] To perform data validation, the user may configure the rule validation module
118 to initiate the validation of data stored in the source data repository 134-1 and the target data repository 134-2. The rule validation module 118 may be configured to retrieve various mapping rules 126 and conversion rules 128 to initiate the validation of data. In one example, the defined mapping rules 126 and the conversion rules 128 may be uploaded by the user to the data validation system 102 on the fly, in various formats such as a tabular format or a spreadsheet, and may be stored as the mapping rules 126 and the conversion rules 128 for future use during the process of data validation. Further, the data validation system 102 may be configured to deactivate any pre-existing mapping rules 126 and conversion rules 128 associated with the data repositories 134 on uploading any new mapping rules 126 and conversion rules 128. In another implementation, the user may be facilitated to deactivate selected mapping rules 126 and conversion rules 128 associated with the data repositories 134 on uploading any new mapping rules 126 and conversion rules 128. The scanning module 120 initiates the scanning of the data repositories 134.
[0040] In one implementation, various scanning modes may be defined as
scanning modes 130. In one implementation, the scanning module 120 may be configured to

scan the data in the data repositories 134 in various scan modes, such as table scan, row scan, rule based scan, dual scan, and sample scan. In table scan mode, the scanning module 120 scans entire corresponding tables of the source data repository 134-1 and the target data repository 134-2, whereas in row scan mode, the scanning module 120 scans each row of the source data repository 134-1 and the target data repository 134-2 to validate the data. Thus, in table scan mode, the data validation system 102 would notify the user if a mapping rule 126 or a conversion rule 128 fails; however, the data validation system 102 would not be able to detect the exact record(s) due to which the said rule failed. In row scan mode, since the validation is carried out in more depth, the data validation system 102 would notify the user of the exact record(s) which caused a mapping rule 126 or a conversion rule 128 to fail. Moreover, in rule based scan mode, the scanning module 120 validates the values in the data repositories 134 based on mapping rules 126 and conversion rules 128 selected by the user. For example, the user may select the mapping rules 126 and the conversion rules 128 based on an associated rule ID.
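The contrast between table scan and row scan modes can be illustrated with a small sketch; the hash-per-table approach, the helper names, and the sample rows below are assumptions of this illustration:

```python
import hashlib

def table_hash(rows):
    # Table scan: one digest over the whole table yields a single
    # pass-or-fail verdict, without locating the failing record(s).
    digest = hashlib.sha256()
    for row in rows:
        digest.update(repr(row).encode())
    return digest.hexdigest()

source_rows = [("1001", "Mr."), ("1002", "Ms.")]
target_rows = [("1001", "Mr."), ("1002", "Mrs.")]

table_ok = table_hash(source_rows) == table_hash(target_rows)

# Row scan: each row is compared individually, so the exact record(s)
# that caused a rule to fail can be reported to the user.
failed_rows = [i for i, (s, t) in enumerate(zip(source_rows, target_rows))
               if s != t]
```

Here `table_ok` is `False`, telling the user only that a rule failed somewhere in the table, while `failed_rows` pinpoints the offending record at index 1.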
[0041] In the sample scan mode, the scanning module 120, instead of validating the
whole data stored in the data repositories 134, validates a selected portion of the data stored in the data repositories. The user may select the portion of data to be validated using various selection parameters, such as a percentage of the total number of records stored in the data repositories 134, a specific number of records stored in the data repositories 134, or specific records selected by the user. In one implementation, the scanning module 120 may be configured to determine one or more hashes or signatures corresponding to the data stored in each of the source data repository 134-1 and the target data repository 134-2 and compare the generated hashes or signatures to validate the data.
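The selection parameters of the sample scan mode might be sketched as a single helper; the function name and signature are assumptions of this illustration, while the three selection parameters follow paragraph [0041]:

```python
import random

def select_sample(records, percent=None, count=None, chosen=None):
    # Selection parameters from paragraph [0041]: specific records picked
    # by the user, a specific number of records, or a percentage of the
    # total number of records stored in the data repositories.
    if chosen is not None:
        return [records[i] for i in chosen]
    if count is None:
        count = max(1, (len(records) * percent) // 100)
    return random.sample(records, count)
```

For 100 records, `percent=10` validates a random 10-record sample rather than the whole repository.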
[0042] If the generated hashes or the values in the corresponding mapped fields of the
data repositories 134 are in accordance with the mapping rules 126 and the conversion rules 128, the values are deemed to be valid and the mapping rules 126 and the conversion rules 128 are said to have passed for the said values. In case the values in the corresponding mapped fields of the data repositories 134 do not conform to the mapping rules 126 and the conversion rules 128, the values are said to be invalid and the mapping rules 126 and the

conversion rules 128 are said to have failed for the said values. Further, in one implementation, the scanning module 120 may be configured to implicitly parse the data stored in the data repositories 134 so as to convert the values of corresponding mapped fields that are stored in the data repositories 134 as different data types, in different formats, or as data types of different lengths to a base data type and base data format before generating the hashes. This makes the data validation process independent of changes made in the data structure, data format, or data type of any field in either of the repositories 134. In yet another implementation, the scanning module 120 may be configured to generate queries which, when run on the data repositories 134, perform the data validation. For example, the scanning module 120 may generate queries in a language compatible with the data repositories 134, such as structured query language (SQL), which may be run on the data repositories 134 for validating the data stored in the data repositories 134.
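The query-generation idea can be sketched for an Equal conversion rule. The SQL shape below (a LEFT JOIN that reports unmatched source values) and the helper name are assumptions of this sketch, not text from the specification; the table and column names echo Tables 1 and 2:

```python
def equal_rule_query(src_table, src_col, tgt_table, tgt_col):
    # Generate a query that, when run on the repositories, reports source
    # values with no counterpart in the mapped target column -- i.e. rows
    # for which an "Equal" conversion rule fails.
    return (
        f"SELECT s.{src_col} FROM {src_table} s "
        f"LEFT JOIN {tgt_table} t ON s.{src_col} = t.{tgt_col} "
        f"WHERE t.{tgt_col} IS NULL"
    )

print(equal_rule_query("customers", "customerno",
                       "customer_details", "customer_reference"))
```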
[0043] Based on the scanning, the report generation module 122 may be configured to
generate various data evaluation reports. In one implementation, the report generation module 122 may associate a unique identification code, for example an alphanumeric string, with each scan of the scanning module 120. The various data evaluation reports generated by the report generation module 122 may be indicative of the total rows scanned, the total number of conversion rules 128 and mapping rules 126 evaluated, and the total number of conversion and mapping rules validated. As would be understood by those skilled in the art, the data validation system 102 may also be run in various other modes, such as table scan mode or row scan mode for specified rule(s), or sample scan mode for specified rule(s).
[0044] Thus the data validation system 102 reduces costs, time and risks involved in
validating data after migration to a new data repository or after upgrading to a new version or a new software tool. Further, the data validation system 102 does not store any data for the purpose of data validation. Also, the mismatched records of the data repositories 134 may be stored as hashes, thus ensuring data security and data confidentiality during the data validation process. The data validation system 102 may also be configured to be simultaneously accessible to multiple users through various client devices 104 or, in one

embodiment, for security purposes, may be accessible only to selected individuals, for example, the data analyst or the database administrator. Moreover, in one implementation, the data validation system 102 may be configured to support remote objects. For example, if the data repositories 134 are Oracle™ databases, the data validation system 102 may facilitate the user to define various remote objects, such as tables, views, and procedures, throughout a distributed database. The defined remote objects may be referenced in SQL queries using global object names. For example, in Oracle™, the global name of a remote object comprises the name of the schema that contains the object, followed by the object name, an '@' symbol, and a database name. Though the operation of the data validation system 102 has been explained in the context of validating data after migration, the same should not be construed as a limitation, and the described concepts may be used for other purposes as well. For example, the data validation system 102 may be used to validate data after critical or major updates or patches and fixes, albeit with modifications, as will be understood by those skilled in the art.
[0045] Fig. 2 illustrates an exemplary method of data validation implemented by the
data validation system 102, according to an embodiment of the present subject matter. The method 200 may be described in the general context of computer executable instructions. Generally, computer executable instructions can include routines, programs, objects, components, data structures, procedures, modules, functions, etc., that perform particular functions or implement particular abstract data types. The method 200 may also be practiced in a distributed computing environment where functions are performed by remote processing devices that are linked through a communication network. In a distributed computing environment, computer executable instructions may be located in both local and remote computer storage media, including memory storage devices.
[0046] The order in which the method 200 is described is not intended to be construed
as a limitation, and any number of the described method blocks can be combined in any order to implement the method 200 or alternative methods. Additionally, individual blocks may be deleted from the method 200 without departing from the spirit and scope of the subject matter described herein. Furthermore, the method 200 can be implemented in any suitable hardware,

software, firmware, or a combination thereof. The method 200 is provided for data validation.
[0047] At block 202, the details pertaining to the source data repository 134-1 and the
target data repository 134-2 are received. The details may include the name and location of the data repositories, for example in the form of a network address, and the names of the databases and tables of the data repositories. In one example, the relationships between the data repositories 134 may be defined by the user in the form of mapping rules 126 and conversion rules 128. In one implementation, the user may define the mapping rules 126 and conversion rules 128 in various formats, such as a tabular format, an indexed file, or a spreadsheet, and upload the same.
[0048] At block 204, the various mapping rules 126 and conversion rules 128
corresponding to the data repositories 134 are retrieved. The mapping rules 126 define the mapping of fields between the source data repository 134-1 and the target data repository 134-2. The conversion rules 128 define the relationship between the fields of the source data repository 134-1 and corresponding mapped fields of the target data repository 134-2. The conversion rules 128 also account for differences between the two data repositories 134, for example in terms of data representation, data format, data type, and so on.
[0049] As depicted in block 206, at least one field of the source data repository is
mapped to at least one field of the target data repository based on the defined mapping rules. The mapping rules may be configured to map multiple fields in the source data repository 134-1 to a single field of the target data repository 134-2 or map a single field of the source data repository 134-1 to multiple fields of the target data repository 134-2. In one implementation, the mapping rules may be defined in form of a spreadsheet with placeholders for the user to fill in the names of the corresponding mapped fields of the source data repository 134-1 and target data repository 134-2.
[0050] As illustrated in block 208, the data stored in the corresponding fields of the
data repositories is evaluated based in part on the mapping rules 126 and the conversion rules 128. In one implementation, the data in the data repositories 134 may be scanned in various modes, such as row scan, table scan, rule based scan, or sample scan, to evaluate the data stored

in the data repositories. The evaluation of the data may also be understood to include retrieving a value from a field of the source data repository 134-1, querying a lookup table using the retrieved value to obtain a resultant value, and comparing the resultant value with a value stored in the corresponding mapped field of the target data repository 134-2.
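The lookup-table evaluation described for block 208 can be sketched as follows; this is a hypothetical illustration, reusing the salutation codes from the Map parameters of Table 2, and the helper name `evaluate` is an assumption of this sketch:

```python
# Hypothetical lookup-table evaluation: the retrieved source value keys
# into a lookup table, and the resultant value is compared with the value
# stored in the corresponding mapped field of the target repository.
lookup_table = {"01": "Mr.", "02": "Ms.", "03": "Mrs.", "04": "Dr."}

def evaluate(source_value, target_value):
    return lookup_table.get(source_value) == target_value
```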
[0051] As depicted in block 210, the validation of data stored in the data repositories
134 is performed based on the evaluation. In one implementation, the data stored in the data repositories 134 may be implicitly parsed so as to convert the values of corresponding mapped fields that are stored in the data repositories as different data types, in different formats, or as data types of different lengths to a base data type and base data format. Further, a hash or signature corresponding to the data or the portion of data may be generated for each of the data repositories 134. The generated hashes or signatures may be compared to perform the validation.
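The normalise-then-hash step of block 210 can be sketched in a few lines; the choice of ISO-style text as the base format and the helper names are assumptions of this sketch:

```python
import hashlib
from datetime import date, datetime

def to_base(value):
    # Implicit parsing per block 210: values held as different data types
    # or formats collapse to one base representation (here, trimmed text
    # with ISO-style dates) before the hash is generated.
    if isinstance(value, (date, datetime)):
        return value.strftime("%Y-%m-%d")
    return str(value).strip()

def signature(values):
    joined = "|".join(to_base(v) for v in values)
    return hashlib.sha256(joined.encode()).hexdigest()
```

Equivalent data held as a date object in one repository and as the text "1980-01-02" in the other then produce identical signatures, making the comparison independent of data type or format differences.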
[0052] As shown in block 212, various data evaluation reports may be generated
based on the validation. In one implementation, every scan for validating the data in the data repositories 134 may be identified by a unique scan identification code, for example an alphanumeric string. The data evaluation reports may be indicative of the type of scan, the date and time of the scan, the total rows scanned, the total number of conversion and mapping rules evaluated, and the total number of conversion and mapping rules validated.
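A data evaluation report per block 212 might take the following shape; the field names and sample values are illustrative assumptions, not taken from the specification:

```python
import uuid

# Hypothetical data evaluation report: each scan carries a unique
# alphanumeric identification code plus the counts described in [0052].
report = {
    "scan_id": uuid.uuid4().hex,        # unique scan identification code
    "scan_type": "row scan",
    "scanned_on": "2020-09-15 10:30",   # date and time of the scan (sample)
    "rows_scanned": 1000,
    "rules_evaluated": 7,
    "rules_validated": 6,
}
```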
[0053] Thus the data validation system 102 facilitates data validation of data stored in
the source data repository 134-1 and the target data repository 134-2. The system and method for data validation as described in the present subject matter are generic and platform independent and thus can be used for various types of systems. For example, by appropriate modification of the mapping rules and conversion rules, the same user defined functions used to validate the data for a core banking system may be reused to validate data pertaining to a human resource management system. Hence, the time and costs involved in validating data after a data migration process, or after the upgradation or installation of a software tool, are reduced.

I/We claim:
1. A method for data validation, the method comprising:
receiving at least one mapping rule and at least one conversion rule pertaining to a source data repository and a target data repository;
mapping at least one field of the source data repository with at least one field of the target data repository based in part on the at least one mapping rule;
determining the relationship between the at least one field of the source data repository and the at least one mapped field of the target data repository based in part on the at least one conversion rule; and
scanning the values present in the at least one field of the source data repository and the mapped at least one field of the target data repository to validate the data present in the source data repository and the target data repository.
2. The method as claimed in claim 1, wherein the scanning further comprises generating at least one hash corresponding to each of the values present in the at least one field of the source data repository and the mapped at least one field of the target data repository.
3. The method as claimed in claim 1, wherein the scanning further comprises converting, by implicit parsing, each of the values present in the at least one field of the source data repository and the mapped at least one field of the target data repository to at least one of a base data format, base data type and base data length.
4. The method as claimed in claim 1, wherein the scanning is done in at least one of a table scan mode, row scan mode, rule based scan mode and a sample scan mode.
5. The method as claimed in claim 1, wherein the at least one mapping rule and at least one conversion rule are defined in at least one of a tabular format, an indexed file format, and a spreadsheet format.
6. The method as claimed in claim 1, wherein the at least one mapping rule and at least one conversion rule are defined in form of at least one of a user defined function and an inbuilt function.
7. A data validation system (102) comprising:
a processor (108); and
a memory (112) coupled to the processor (108), the memory (112) comprising: a rule validation module (118) configured to:

map at least one field of a source data repository (134-1) with at least one field of a target data repository (134-2) based on at least one mapping rule (126);
determine the relationship between the at least one field of a source data repository (134-1) and the at least one mapped field of the target data repository (134-2) based on at least one conversion rule (128); and
a scanning module (120) configured to validate the values stored in the at least one field of a source data repository (134-1) and the at least one mapped field of the target data repository (134-2).
8. The data validation system (102) as claimed in claim 7 wherein the scanning module
(120) is further configured to:
generate at least one hash based on each of the values stored in the at least one field of a source data repository (134-1) and the at least one mapped field of the target data repository (134-2); and
compare the generated hashes to determine at least one of a match and a mismatch between the values stored in the at least one field of a source data repository (134-1) and the at least one mapped field of the target data repository (134-2).
9. The data validation system (102) as claimed in claim 7, wherein the scanning module (120) is further configured to scan the data stored in at least one of the source data repository (134-1) and the target data repository (134-2) in at least one of a table scan mode, row scan mode, rule based scan mode, dual scan mode, sample scan mode.
10. The data validation system (102) as claimed in claim 7, wherein the scanning module (120) is further configured to convert, by implicit parsing, each of the values present in the at least one field of the source data repository and the mapped at least one field of the target data repository to at least one of a base data format, base data type and base data length.
11. The data validation system (102) as claimed in claim 7, wherein the at least one conversion rule (128) is at least one of an inbuilt function, a customized function and a user defined function.
12. The data validation system (102) as claimed in claim 7 further comprising a report generation module (122) configured to generate at least one data evaluation report indicative of at least one of the total rows scanned, total number of conversion and mapping rules evaluated and the total number of conversion and mapping rules validated based on the validation.

13. A computer-readable medium having embodied thereon a computer program for executing a method comprising:
retrieving at least one mapping rule and at least one conversion rule pertaining to a source data repository and a target data repository;
mapping at least one field of the source data repository with at least one field of the target data repository based in part on the at least one mapping rule;
determining the relationship between the at least one field of the source data repository and the at least one mapped field of the target data repository based in part on the at least one conversion rule; and
scanning the values present in the at least one field of the source data repository and the mapped at least one field of the target data repository to validate the data present in the source data repository and the target data repository.

Documents

Application Documents

# Name Date
1 2804-MUM-2011-FORM 18(24-10-2011).pdf 2011-10-24
2 2804-MUM-2011-CORRESPONDENCE(24-10-2011).pdf 2011-10-24
3 2804-MUM-2011-POWER OF ATTORNEY(13-12-2011).pdf 2011-12-13
4 2804-MUM-2011-CORRESPONDENCE(13-12-2011).pdf 2011-12-13
5 2804-MUM-2011-FORM 1(2-11-2011).pdf 2018-08-10
6 2804-MUM-2011-CORRESPONDENCE(2-11-2011).pdf 2018-08-10
7 2804-MUM-2011-FER.pdf 2018-08-10
8 Form-1.pdf 2018-08-10
9 Form-3.pdf 2018-08-10
10 Drawings.pdf 2018-08-10
11 ABSTRACT1.jpg 2018-08-10
12 2804-MUM-2011-CLAIMS [29-01-2019(online)].pdf 2019-01-29
13 2804-MUM-2011-COMPLETE SPECIFICATION [29-01-2019(online)].pdf 2019-01-29
14 2804-MUM-2011-DRAWING [29-01-2019(online)].pdf 2019-01-29
15 2804-MUM-2011-FER_SER_REPLY [29-01-2019(online)].pdf 2019-01-29
16 2804-MUM-2011-OTHERS [29-01-2019(online)].pdf 2019-01-29
17 2804-MUM-2011-US(14)-HearingNotice-(HearingDate-01-09-2020).pdf 2020-07-31
18 2804-MUM-2011-Correspondence to notify the Controller [10-08-2020(online)].pdf 2020-08-10
19 2804-MUM-2011-Written submissions and relevant documents [15-09-2020(online)].pdf 2020-09-15
20 2804-MUM-2011-IntimationOfGrant29-10-2020.pdf 2020-10-29
21 2804-MUM-2011-PatentCertificate29-10-2020.pdf 2020-10-29
22 2804-MUM-2011-RELEVANT DOCUMENTS [27-09-2022(online)].pdf 2022-09-27
23 2804-MUM-2011-RELEVANT DOCUMENTS [26-09-2023(online)].pdf 2023-09-26

Search Strategy

1 search_08-06-2018.pdf

ERegister / Renewals

3rd: 01 Nov 2020 (30/09/2013 to 30/09/2014)
4th: 01 Nov 2020 (30/09/2014 to 30/09/2015)
5th: 01 Nov 2020 (30/09/2015 to 30/09/2016)
6th: 01 Nov 2020 (30/09/2016 to 30/09/2017)
7th: 01 Nov 2020 (30/09/2017 to 30/09/2018)
8th: 01 Nov 2020 (30/09/2018 to 30/09/2019)
9th: 01 Nov 2020 (30/09/2019 to 30/09/2020)
10th: 01 Nov 2020 (30/09/2020 to 30/09/2021)
11th: 12 Aug 2021 (30/09/2021 to 30/09/2022)
12th: 02 Sep 2022 (30/09/2022 to 30/09/2023)
13th: 14 Sep 2023 (30/09/2023 to 30/09/2024)
14th: 12 Sep 2024 (30/09/2024 to 30/09/2025)
15th: 19 Sep 2025 (30/09/2025 to 30/09/2026)