Bigdata Mismatch Identification

Abstract: At least one mismatched line from amongst one or more lines is ascertained based on source values assigned to a key corresponding to each line. The lines are read from a plurality of files obtained from a first data warehouse system (108-1) and a second data warehouse system (108-2). A record string is created for the at least one mismatched line by combining the key corresponding to that line with the source value. The key includes one or more fields of the mismatched line separated by a predefined field delimiter. The source value is either a positive value or a negative value. A string identifier corresponding to the record string is generated using the predefined field delimiter and predetermined fields from among the one or more fields. One or more mismatched fields in the at least one mismatched line are then determined based at least on the source value.
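The first stage of the abstract can be read as a MapReduce-style reconciliation: every line is keyed, tagged with a source value according to the warehouse it came from, and the source values are summed per key, so identical lines from the two warehouses cancel out and a non-zero sum flags a mismatch. The following is a minimal in-memory Python sketch under stated assumptions, not the patented implementation: the key is assumed to be the whole line, `|` is assumed as the predefined field delimiter, the first warehouse is assumed to contribute `1` and the second `-1`, and a record string is assumed to be the key followed by the delimiter and the source value. The names `map_lines` and `find_mismatched` are hypothetical.

```python
from collections import defaultdict

DELIM = "|"  # assumed predefined field delimiter


def map_lines(lines, source_value):
    """Mapper sketch: emit (key, source_value) for every line.

    Assumption: the key is the whole line (all fields joined by DELIM),
    so a difference in any field produces a different key.
    """
    for line in lines:
        yield line.rstrip("\r\n"), source_value


def find_mismatched(pairs):
    """Reducer sketch: sum source values per key; a non-zero sum marks a mismatch.

    Returns record strings of the assumed form "<key><DELIM><1 or -1>", one per
    unmatched occurrence so that duplicated lines remain visible downstream.
    """
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value

    records = []
    for key, total in totals.items():
        # A zero total extends the list with nothing: the lines matched.
        sign = 1 if total > 0 else -1
        records.extend([f"{key}{DELIM}{sign}"] * abs(total))
    return sorted(records)
```

For example, mapping `101|Alice|500` and `102|Bob|300` from the first warehouse with `1`, and `101|Alice|500` and `102|Bob|350` from the second with `-1`, yields the record strings `102|Bob|300|1` and `102|Bob|350|-1`; the two `101` lines cancel out. Emitting one record per unmatched occurrence, rather than a single net total, is a design choice that keeps duplicated lines countable in any later pass.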

Patent Information

Application #

874/MUM/2015

Filing Date

16 March 2015

Publication Number

40/2016

Publication Type

INA

Invention Field

COMPUTER SCIENCE

Email

iprdel@lakshmisri.com

Grant Date

2023-10-26

Applicants

TATA CONSULTANCY SERVICES LIMITED

Nirmal Building, 9th Floor, Nariman Point, Mumbai, Maharashtra 400021, India

Inventors

1. MANAKKAL, Rony Pius

Tata Consultancy Services, Infopark, Kusumagiri Post, Kakkanad, Kochi 682030, Kerala, India

Specification

CLAIMS:
1. A method for identifying mismatch in big data obtained from different data warehouse systems, the method comprising:
ascertaining, by a processor (112), at least one mismatched line from amongst one or more lines based on values assigned to a key corresponding to each line, wherein the one or more lines are read from a plurality of files obtained from a first data warehouse system (108-1) and a second data warehouse system (108-2);
creating, by the processor (112), a record string for the at least one mismatched line by combining a key corresponding to the at least one mismatched line with a source value, wherein the key includes one or more fields of the at least one mismatched line separated by a predefined field delimiter, and wherein the source value is one of a positive value and a negative value assigned to the string based on the source of the mismatched line for which the string is created;
generating, by the processor (112), a string identifier corresponding to the record string using predetermined fields from among the one or more fields of the mismatched lines and the predefined field delimiter; and
determining, by the processor (112), one or more mismatched fields in the at least one mismatched line based at least on the source value.
2. The method as claimed in claim 1, wherein the ascertaining further comprises:
reading, by the processor (112), the one or more lines;
generating, by the processor (112), the key for each of the one or more lines; and
assigning, by the processor (112), the source value to the key based on the source of the files.
3. The method as claimed in claim 1, wherein the determining further comprises:
combining the string identifier and the record string to create string arrays, where the positive value and the negative value of the source value are each equal to one; and
ascertaining the one or more mismatched fields by comparing each element of the string arrays.
4. The method as claimed in claim 3, wherein the ascertaining the one or more mismatched fields further comprises identifying a mismatch in the number of elements in the string arrays based on the comparing.
5. The method as claimed in claim 1, wherein the determining further comprises:
identifying whether either the positive source value or the negative source value of the record string is not equal to one; and
ascertaining a unique key violation based on the identification.
6. The method as claimed in claim 1, wherein the determining further comprises:
identifying whether the record string has one of the positive value and the negative value; and
ascertaining lines missing from the first data warehouse system and the second data warehouse system based on the identification.
7. The method as claimed in claim 1, further comprising obtaining, by the processor (112), unique key fields and a predefined field delimiter based on user inputs.
8. A mismatch identification system (102) for identifying mismatch in big data obtained from different data warehouse systems, the mismatch identification system (102) comprising:
a processor (112);
a mapper module (120) coupled to the processor (112) to,
read one or more lines from a plurality of files obtained from a first data warehouse system and a second data warehouse system, wherein the plurality of files are obtained from big data stored in the first data warehouse system and the second data warehouse system;
generate a key for each of the one or more lines, wherein the key includes one or more fields of the at least one mismatched line separated by a predefined field delimiter; and
assign a source value to the key based on the source of the files, wherein the source value is one of a positive value and a negative value assigned to the string based on the source of the mismatched line for which the string is created; and
a reducer module (122) coupled to the processor (112) to ascertain at least one mismatched line from amongst the one or more lines based on a summation of the source values assigned to the key.
9. The mismatch identification system (102) as claimed in claim 8, wherein the mapper module (120) is further to,
generate a string identifier using predetermined fields from among the one or more fields of the mismatched lines and the predefined field delimiter; and
create a record string for the at least one mismatched line by combining the key with the source value.
10. The mismatch identification system (102) as claimed in claim 9, wherein the reducer module (122) is to,
combine the string identifier and the record string to create string arrays, where the positive value and the negative value of the source value are each equal to one; and
determine mismatched fields in the at least one mismatched line based on the source value assigned to the record string.
11. The mismatch identification system (102) as claimed in claim 10, wherein the reducer module (122) is further to ascertain the one or more mismatched fields by comparing each element of the string arrays.
12. The mismatch identification system (102) as claimed in claim 11, wherein the reducer module (122) is further to identify a mismatch in the number of elements in the string arrays based on the comparing.
13. The mismatch identification system (102) as claimed in claim 9, wherein the reducer module (122) is to,
identify whether either the positive source value or the negative source value of the record string is not equal to one; and
ascertain a unique key violation based on the identification.
14. The mismatch identification system (102) as claimed in claim 9, wherein the reducer module (122) is to,
identify whether the record string has one of the positive value and the negative value; and
ascertain lines missing from the first data warehouse system and the second data warehouse system based on the identification.
15. The mismatch identification system (102) as claimed in claim 9, wherein the reducer module (122) writes the mismatched lines to a mismatch string stored in the comparison data (126).
16. A non-transitory computer-readable medium having embodied thereon a computer program for executing a method of identifying mismatch in big data obtained from different data warehouse systems, the method comprising:
ascertaining at least one mismatched line from amongst one or more lines based on values assigned to a key corresponding to each line, wherein the one or more lines are read from a plurality of files obtained from a first data warehouse system (108-1) and a second data warehouse system (108-2);
creating a record string for the at least one mismatched line by combining a key corresponding to the at least one mismatched line with a source value, wherein the key includes one or more fields of the at least one mismatched line separated by a predefined field delimiter, and wherein the source value is one of a positive value and a negative value assigned to the string based on the source of the mismatched line for which the string is created;
generating a string identifier corresponding to the record string using predetermined fields from among the one or more fields of the mismatched lines and the predefined field delimiter; and
determining one or more mismatched fields in the at least one mismatched line based at least on the source value.
17. The non-transitory computer-readable medium as claimed in claim 16, wherein the method further comprises:
reading the one or more lines;
generating the key for each of the one or more lines; and
assigning the source value to the key based on source of the files.
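The second pass set out in claims 3 to 6 (mirrored by claims 10 to 14) groups the record strings of mismatched lines by a string identifier built from the unique-key fields, then classifies each group. The following minimal Python sketch is one possible reading, not the patented implementation. It assumes record strings of the form `<fields>|<1 or -1>`, with `1` marking the first warehouse and `-1` the second; `|` as the predefined field delimiter; and the first field as the sole unique-key field (claim 7 says these come from user input). The names `string_identifier`, `classify`, `KEY_FIELDS` and the result labels are hypothetical.

```python
from collections import defaultdict

DELIM = "|"       # assumed predefined field delimiter
KEY_FIELDS = [0]  # assumed unique-key field positions (user input, cf. claim 7)


def string_identifier(fields):
    """Join the predetermined unique-key fields with the delimiter."""
    return DELIM.join(fields[i] for i in KEY_FIELDS)


def classify(record_strings):
    """Group record strings by string identifier and classify each group.

    Each record string is assumed to look like "<fields><DELIM><1 or -1>",
    where 1 marks the first warehouse and -1 the second.
    """
    groups = defaultdict(lambda: {1: [], -1: []})
    for record in record_strings:
        line, _, source = record.rpartition(DELIM)
        fields = line.split(DELIM)
        groups[string_identifier(fields)][int(source)].append(fields)

    results = {}
    for sid, sides in sorted(groups.items()):
        pos, neg = sides[1], sides[-1]
        if not neg:
            # Only positive records: line absent from the second warehouse.
            results[sid] = ("missing_in_second",)
        elif not pos:
            # Only negative records: line absent from the first warehouse.
            results[sid] = ("missing_in_first",)
        elif len(pos) != 1 or len(neg) != 1:
            # More than one record on a side: the unique key is violated.
            results[sid] = ("unique_key_violation", len(pos), len(neg))
        else:
            a, b = pos[0], neg[0]
            if len(a) != len(b):
                # The two string arrays differ in number of elements.
                results[sid] = ("field_count_mismatch", len(a), len(b))
            else:
                # Element-wise comparison gives the mismatched field positions.
                diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
                results[sid] = ("field_mismatch", diffs)
    return results
```

For instance, passing the record strings `102|Bob|300|1`, `102|Bob|350|-1` and `103|Carol|200|1` returns `{'102': ('field_mismatch', [2]), '103': ('missing_in_second',)}`. A group with records on only one side is reported as missing even if it holds several records; that ordering of checks is a simplification of this sketch.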

Documents

Application Documents

#	Name	Date
1	PD012059IN-SC, FIGURES FOR FILING.pdf.pdf	2018-08-11
2	PD012059IN-SC SPEC FOR FILING.pdf	2018-08-11
3	PD012059IN-SC FORM 5.pdf	2018-08-11
4	PD012059IN-SC FORM 3.pdf	2018-08-11
5	874-MUM-2015-POWER OF ATTORNEY(8-6-2015).pdf	2018-08-11
6	874-MUM-2015-FORM 1-070415.pdf	2018-08-11
7	874-MUM-2015-CORRESPONDENCE-070415.pdf	2018-08-11
8	874-MUM-2015-CORRESPONDENCE(8-6-2015).pdf	2018-08-11
9	874-MUM-2015-FER.pdf	2019-09-27
10	874-MUM-2015-OTHERS [26-03-2020(online)].pdf	2020-03-26
11	874-MUM-2015-FER_SER_REPLY [26-03-2020(online)].pdf	2020-03-26
12	874-MUM-2015-COMPLETE SPECIFICATION [26-03-2020(online)].pdf	2020-03-26
13	874-MUM-2015-CLAIMS [26-03-2020(online)].pdf	2020-03-26
14	874-MUM-2015-PatentCertificate26-10-2023.pdf	2023-10-26
15	874-MUM-2015-IntimationOfGrant26-10-2023.pdf	2023-10-26

Search Strategy

1	SearchStrategyMatrix_27-09-2019.pdf