Storage Of Structured Data Onto Distributed Data Storage

Abstract: System(s) and method(s) for processing and storage of structured data onto a columnar database hosted on a distributed data storage are described. According to the present subject matter, a mapping system (102) and a reducing system (104) are described for processing and storage of structured data. The described systems implement methods that include receiving a segment of the structured data including at least one record. Further, the method includes determining a primary set of columns and a secondary set of columns from amongst a plurality of columns of each record. The method also includes re-arranging values of the secondary set of columns in a pre-defined order and generating an intermediate key based on the primary set of columns and a corresponding intermediate value comprising the re-arranged values of the secondary set of columns, where the intermediate key and the intermediate value form an intermediate key-value pair.
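The mapping step summarized above, and recited in claims 1 to 5, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The separator character, the `schema` dictionary mapping each secondary column to a hypothetical (column family, qualifier) pair, and the function names `map_record` and `map_segment` are all hypothetical. The pre-defined order used here is the lexicographic family-then-qualifier order of claim 2, and the row key is composed from the primary-column values as in claim 5.

```python
from typing import Dict, List, Tuple

# Hypothetical separator: the source only says the intermediate value contains
# "at least one separator". The ASCII unit separator is assumed never to occur
# inside the data itself.
SEPARATOR = "\x1f"

# Hypothetical schema type: each secondary column maps to an HBase
# (column family, qualifier) pair.
Schema = Dict[str, Tuple[str, str]]


def _checked(value: str, column: str) -> str:
    """Reject values containing the separator, since splitting them later would break."""
    if SEPARATOR in value:
        raise ValueError(f"value of column {column!r} contains the separator")
    return value


def map_record(
    record: Dict[str, str], primary_columns: List[str], schema: Schema
) -> Tuple[str, str]:
    """Turn one record into an intermediate (key, value) pair."""
    # Primary set of columns: their values, in the given order, compose the row key.
    key = SEPARATOR.join(_checked(record[c], c) for c in primary_columns)

    # Secondary set of columns: every other column that the schema maps to a family.
    secondary = [c for c in record if c not in primary_columns and c in schema]

    # Pre-defined order: lexicographic by family, then qualifier, compared as
    # UTF-8 bytes, since HBase compares these names as byte arrays.
    secondary.sort(
        key=lambda c: (schema[c][0].encode("utf-8"), schema[c][1].encode("utf-8"))
    )

    # Intermediate value: the re-arranged secondary values joined by the separator.
    value = SEPARATOR.join(_checked(record[c], c) for c in secondary)
    return key, value


def map_segment(
    segment: List[Dict[str, str]], primary_columns: List[str], schema: Schema
) -> List[Tuple[str, str]]:
    """Apply map_record to every record of a received segment."""
    return [map_record(r, primary_columns, schema) for r in segment]


schema = {"name": ("info", "name"), "city": ("addr", "city"), "age": ("info", "age")}
print(map_record({"id": "42", "name": "Asha", "city": "Pune", "age": "31"}, ["id"], schema))
# ('42', 'Pune\x1f31\x1fAsha')
```

Because the intermediate value carries only the values and not their column names, the reducing side presumably needs the same schema and ordering to split the value back into cells. Sharing that ordering is what keeps each intermediate pair compact.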

Patent Information

Filing Date: 24 March 2014
Publication Number: 40/2015
Publication Type: INA
Invention Field: COMPUTER SCIENCE
Email: iprdel@lakshmisri.com
Grant Date: 2023-04-18

Applicants

TATA CONSULTANCY SERVICES LIMITED

Nirmal Building, 9th Floor, Nariman Point, Mumbai, Maharashtra 400021

Inventors

1. OMATHIL, Gelesh George

Omathil House, Meemutty PO, Kozhikode, Kerala, 673580

Specification

Claims

1. A method for processing structured data for storage onto a columnar database hosted on a distributed data storage, the method comprising:
receiving a segment of the structured data including at least one record, wherein the at least one record comprises values stored for a plurality of columns;
determining, for each record of the received segment, a primary set of columns and a secondary set of columns from amongst the plurality of columns, wherein the primary set of columns forms the row key of the at least one record, and wherein the secondary set of columns includes values corresponding to a set of column families;
re-arranging, for each record, values of the secondary set of columns in a pre-defined order;
generating, for each record, an intermediate key based on the primary set of columns and a corresponding intermediate value comprising the re-arranged values of the secondary set of columns, wherein the intermediate key and the intermediate value form an intermediate key-value pair; and
transmitting, for each record, the intermediate key-value pair for processing to store onto the columnar database.

2. The method as claimed in claim 1, wherein the pre-defined order is based on a lexicographic arrangement of column families and their associated qualifiers.

3. The method as claimed in claim 1, wherein the pre-defined order is based on the nature of the columnar database in which the structured data is stored.

4. The method as claimed in claim 1, wherein the values of the row key uniquely identify the at least one record of the structured data.

5. The method as claimed in claim 4, wherein the row key is a composition of a list of values.

6. The method as claimed in claim 1, wherein the distributed data storage is a Hadoop distributed file system (HDFS) and the columnar database is HBase.

7. A method for processing structured data for storage onto a columnar database hosted on a distributed data storage, the method comprising:
receiving an intermediate key-value pair comprising an intermediate key and a corresponding intermediate value, wherein the intermediate value includes multiple values corresponding to a secondary set of columns, combined in a pre-defined order;
splitting, corresponding to the intermediate key, the intermediate value into multiple values based on at least one separator included in the intermediate value; and
generating a set of output key-value pairs based on the intermediate key and the multiple values, wherein the multiple values are considered in the pre-defined order for generating the set of output key-value pairs.

8. The method as claimed in claim 7, wherein the method further comprises emitting the output key-value pairs into a pre-defined format.

9. The method as claimed in claim 8, wherein the method further comprises emitting the output key-value pairs into HFiles in a First-In-First-Out (FIFO) order.

10. The method as claimed in claim 7, wherein the method further comprises storing the output key-value pairs onto the distributed data storage.

11. The method as claimed in claim 7, wherein a key corresponding to each of the output key-value pairs is an ImmutableBytesWritable object.

12. The method as claimed in claim 7, wherein a value corresponding to each of the output key-value pairs is an HBase KeyValue object.

13. A mapping system (102) for processing structured data for storage onto a columnar database hosted on a distributed data storage (106), the mapping system (102) comprising:
a processor (112-1);
a communication module (126) coupled to the processor (112-1) to receive a segment of the structured data including at least one record, wherein the at least one record comprises values stored for a plurality of columns;
a classification module (122) coupled to the processor (112-1) to determine, for each record of the received segment, a primary set of columns and a secondary set of columns from amongst the plurality of columns, wherein the primary set of columns forms the row key of the at least one record, and wherein the secondary set of columns includes values corresponding to a set of column families; and
a mapping module (124) coupled to the processor (112-1) to:
re-arrange, for each record, values of the secondary set of columns in a pre-defined order; and
generate, for each record, an intermediate key based on the primary set of columns and a corresponding intermediate value comprising the re-arranged values of the secondary set of columns, wherein the intermediate key and the intermediate value form an intermediate key-value pair.

14. The mapping system (102) as claimed in claim 13, wherein the communication module (126) is further configured to transmit, for each record, the intermediate key-value pair for further processing.

15. The mapping system (102) as claimed in claim 13, wherein the mapping module (124) identifies the pre-defined order based on the lexicographic occurrence of column families and their associated qualifiers.

16. The mapping system (102) as claimed in claim 13, wherein the distributed data storage (106) is a Hadoop distributed file system (HDFS).

17. A reducing system (104) for processing structured data for storage onto a columnar database hosted on a distributed data storage (106), the reducing system (104) comprising:
a processor (112-2);
a splitting module (134) coupled to the processor (112-2) to:
receive an intermediate key-value pair comprising an intermediate key and a corresponding intermediate value, wherein the intermediate value includes multiple values corresponding to a secondary set of columns, combined in a pre-defined order; and
split, corresponding to the intermediate key, the intermediate value into multiple values based on at least one separator included in the intermediate value; and
an output module (136) coupled to the processor (112-2) to generate a set of output key-value pairs based on the intermediate key and the multiple values, wherein the multiple values are considered in a First-In-First-Out (FIFO) order for generating the set of output key-value pairs.

18. The reducing system (104) as claimed in claim 17, wherein the output module (136) further emits the output key-value pairs into HFiles in the FIFO order.

19. The reducing system (104) as claimed in claim 17, wherein the output module (136) further stores the output key-value pairs onto the distributed data storage (106).

20. The reducing system (104) as claimed in claim 17, wherein a key corresponding to each of the output key-value pairs is an ImmutableBytesWritable object.

21. The reducing system (104) as claimed in claim 17, wherein a value corresponding to each of the output key-value pairs is an HBase KeyValue object.

22. A non-transitory computer-readable medium having embodied thereon a computer program for executing a method comprising:
receiving a segment of the structured data including at least one record, wherein the at least one record comprises values stored for a plurality of columns;
determining, for each record of the received segment, a primary set of columns and a secondary set of columns from amongst the plurality of columns, wherein the primary set of columns forms the row key of the at least one record, and wherein the secondary set of columns includes values corresponding to a set of column families;
re-arranging, for each record, values of the secondary set of columns in a pre-defined order;
generating, for each record, an intermediate key based on the primary set of columns and a corresponding intermediate value comprising the re-arranged values of the secondary set of columns, wherein the intermediate key and the intermediate value form an intermediate key-value pair; and
transmitting, for each record, the intermediate key-value pair for further processing.

23. A non-transitory computer-readable medium having embodied thereon a computer program for executing a method comprising:
receiving an intermediate key-value pair comprising an intermediate key and a corresponding intermediate value, wherein the intermediate value includes multiple values corresponding to a secondary set of columns, combined in a pre-defined order;
splitting, corresponding to the intermediate key, the intermediate value into multiple values based on at least one separator included in the intermediate value; and
generating a set of output key-value pairs based on the intermediate key and the multiple values, wherein the multiple values are considered in a First-In-First-Out (FIFO) order for generating the set of output key-value pairs.
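The reducing side recited in claims 7 to 12 and 17 to 23 can be sketched the same way. This is again a hedged illustration rather than the patented implementation, and no Hadoop or HBase classes are used. `SEPARATOR` and `COLUMN_ORDER` are hypothetical and must match whatever the mapping side used. `Cell` is a simplified stand-in for an HBase `KeyValue` (its type field is omitted), and the row-key bytes stand in for an `ImmutableBytesWritable`. `SortedSink` only mimics the behaviour of an HFile writer, which rejects a cell that sorts before the previously written one.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Tuple

# Hypothetical: both must match what the mapping side used.
SEPARATOR = "\x1f"
COLUMN_ORDER: List[Tuple[str, str]] = [("addr", "city"), ("info", "age"), ("info", "name")]


@dataclass(frozen=True)
class Cell:
    """Simplified stand-in for an HBase KeyValue (the type field is omitted)."""
    row: bytes
    family: bytes
    qualifier: bytes
    timestamp: int
    value: bytes


def reduce_pair(key: str, value: str, timestamp: int) -> List[Tuple[bytes, Cell]]:
    """Split an intermediate value and build (row key, cell) output pairs."""
    parts = value.split(SEPARATOR)  # keeps empty strings, so blank values survive
    if len(parts) != len(COLUMN_ORDER):
        raise ValueError(f"expected {len(COLUMN_ORDER)} values, got {len(parts)}")
    row = key.encode("utf-8")
    pairs = []
    # The i-th split value belongs to the i-th column of the pre-defined order.
    for (family, qualifier), part in zip(COLUMN_ORDER, parts):
        cell = Cell(row, family.encode("utf-8"), qualifier.encode("utf-8"),
                    timestamp, part.encode("utf-8"))
        pairs.append((row, cell))
    return pairs


class SortedSink:
    """Stand-in for an HFile writer, which rejects out-of-order cells."""

    def __init__(self) -> None:
        self.cells: List[Cell] = []

    def append(self, cell: Cell) -> None:
        if self.cells:
            last = self.cells[-1]
            if (cell.row, cell.family, cell.qualifier) < (last.row, last.family, last.qualifier):
                raise ValueError("cell sorts before the previously written cell")
        self.cells.append(cell)


def emit_fifo(pairs: List[Tuple[bytes, Cell]], sink: SortedSink) -> None:
    """Write cells first-in-first-out; nothing is re-sorted here."""
    queue: Deque[Tuple[bytes, Cell]] = deque(pairs)
    while queue:
        _, cell = queue.popleft()
        sink.append(cell)


sink = SortedSink()
emit_fifo(reduce_pair("42", "Pune\x1f31\x1fAsha", timestamp=1), sink)
print([c.qualifier for c in sink.cells])  # [b'city', b'age', b'name']
```

Since the mapping side already arranged the values in lexicographic family-then-qualifier order, writing them out first-in-first-out appears to be enough to keep each row's cells sorted, so the reducer needs no extra sort. That seems to be the point of fixing the order before the intermediate value is built.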
Specification: As attached

Documents

Application Documents

#	Name	Date
1	SPEC FOR FILING.pdf	2018-08-11
2	FORM 5.pdf	2018-08-11
3	FORM 3.pdf	2018-08-11
4	FIG FOR FILING.pdf	2018-08-11
5	ABSTRACT1.jpg	2018-08-11
6	988-MUM-2014-Power of Attorney-130215.pdf	2018-08-11
7	988-MUM-2014-FORM 18.pdf	2018-08-11
8	988-MUM-2014-FORM 1(22-4-2014).pdf	2018-08-11
9	988-MUM-2014-Correspondence-130215.pdf	2018-08-11
10	988-MUM-2014-CORRESPONDENCE(22-4-2014).pdf	2018-08-11
11	988-MUM-2014-FER.pdf	2019-10-17
12	988-MUM-2014-OTHERS [16-04-2020(online)].pdf	2020-04-16
13	988-MUM-2014-FER_SER_REPLY [16-04-2020(online)].pdf	2020-04-16
14	988-MUM-2014-CLAIMS [16-04-2020(online)].pdf	2020-04-16
15	988-MUM-2014-US(14)-HearingNotice-(HearingDate-22-02-2023).pdf	2023-01-30
16	988-MUM-2014-Correspondence to notify the Controller [08-02-2023(online)].pdf	2023-02-08
17	988-MUM-2014-FORM-26 [17-02-2023(online)].pdf	2023-02-17
18	988-MUM-2014-Written submissions and relevant documents [03-03-2023(online)].pdf	2023-03-03
19	988-MUM-2014-PatentCertificate18-04-2023.pdf	2023-04-18
20	988-MUM-2014-IntimationOfGrant18-04-2023.pdf	2023-04-18

Search Strategy

1	STM_15-10-2019.pdf
2	searchTPO_15-10-2019.pdf