Storage Of Structured Data Onto Distributed Data Storage

Abstract: System(s) and method(s) for processing and storage of structured data onto a columnar database hosted on a distributed data storage are described. According to the present subject matter, a mapping system (102) and a reducing system (104) are described for processing and storage of structured data. The described systems implement methods that include receiving a segment of the structured data including at least one record. Further, the method includes determining a primary set of columns and a secondary set of columns from amongst a plurality of columns of each record. The method also includes re-arranging values of the secondary set of columns in a pre-defined order and generating an intermediate key based on the primary set of columns and a corresponding intermediate value comprising the re-arranged values of the secondary set of columns, where the intermediate key and the intermediate value form an intermediate key-value pair.
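
The mapping step summarised in the abstract can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the column names, the column-family layout, and both separators are hypothetical and chosen only for illustration. The "pre-defined order" is assumed here to be the lexicographic order of (column family, qualifier), as claim 2 suggests.

```python
from typing import Dict, List, Tuple

# Hypothetical schema: which input columns form the row key (primary set)
# and which column family / qualifier each remaining column maps to.
PRIMARY_COLUMNS = ["customer_id", "order_id"]
SECONDARY_COLUMNS = {
    "name": ("info", "name"),
    "city": ("info", "city"),
    "amount": ("billing", "amount"),
}
KEY_SEPARATOR = "|"      # assumed; field values must not contain it
VALUE_SEPARATOR = "\t"   # assumed; field values must not contain it


def ordered_secondary_columns() -> List[str]:
    """Input column names sorted lexicographically by (family, qualifier)."""
    return sorted(SECONDARY_COLUMNS, key=lambda col: SECONDARY_COLUMNS[col])


def map_record(record: Dict[str, str]) -> Tuple[str, str]:
    """Build one intermediate key-value pair from one record."""
    # Intermediate key: composition of the primary (row key) column values.
    intermediate_key = KEY_SEPARATOR.join(record[c] for c in PRIMARY_COLUMNS)
    # Intermediate value: secondary values re-arranged into the agreed order.
    intermediate_value = VALUE_SEPARATOR.join(
        record[c] for c in ordered_secondary_columns()
    )
    return intermediate_key, intermediate_value


def map_segment(segment: List[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Apply map_record to every record of a received segment."""
    return [map_record(record) for record in segment]


if __name__ == "__main__":
    record = {
        "customer_id": "C42",
        "order_id": "O7",
        "name": "Asha",
        "city": "Pune",
        "amount": "199.50",
    }
    print(map_record(record))  # ('C42|O7', '199.50\tPune\tAsha')
```

Because every record's values are laid out in the same agreed order, a downstream consumer can rebuild the cells without re-sorting them, which appears to be the purpose of the re-arrangement step.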

Patent Information

Application #:
Filing Date: 24 March 2014
Publication Number: 40/2015
Publication Type: INA
Invention Field: COMPUTER SCIENCE
Status:
Email: iprdel@lakshmisri.com
Parent Application:
Patent Number:
Legal Status:
Grant Date: 2023-04-18
Renewal Date:

Applicants

TATA CONSULTANCY SERVICES LIMITED
Nirmal Building, 9th Floor, Nariman Point, Mumbai, Maharashtra 400021

Inventors

1. OMATHIL, Gelesh George
Omathil House, Meemutty PO, Kozhikode, Kerala, 673580

Specification

CLAIMS:
1. A method for processing structured data for storage onto a columnar database hosted on a distributed data storage, the method comprising:
receiving a segment of the structured data including at least one record, wherein the at least one record comprises values stored for a plurality of columns;
determining, for each record of the received segment, a primary set of columns and a secondary set of columns from amongst the plurality of columns, wherein the primary set of columns form row key of the at least one record, and wherein the secondary set of columns include values corresponding to a set of column families;
re-arranging, for each record, values of the secondary set of columns in a pre-defined order;
generating, for each record, an intermediate key based on the primary set of columns and a corresponding intermediate value comprising the re-arranged values of the secondary set of columns, wherein the intermediate key and the intermediate value form an intermediate key-value pair; and
transmitting, for each record, the intermediate key-value pair for processing to store onto the columnar database.

2. The method as claimed in claim 1, wherein the pre-defined order is based on lexicographic arrangement of column families and their associated qualifiers.

3. The method as claimed in claim 1, wherein the pre-defined order is based on nature of the columnar database to which the structured data is stored.

4. The method as claimed in claim 1, wherein values of the row key uniquely identify the at least one record of the structured data.

5. The method as claimed in claim 4, wherein the row key is a composition of a list of values.

6. The method as claimed in claim 1, wherein the distributed data storage is a Hadoop distributed file system (HDFS) and the columnar database is HBase.

7. A method for processing structured data for storage onto a columnar database hosted on a distributed data storage, the method comprising:
receiving an intermediate key-value pair comprising an intermediate key and a corresponding intermediate value, and wherein the intermediate value includes multiple values corresponding to a set of secondary set of columns, combined in a pre-defined order;
splitting, corresponding to the intermediate key, the intermediate value into multiple values based on at least one separator included in the intermediate value; and
generating a set of output key-value pairs based on the intermediate key and the multiple values, wherein the multiple values are considered in the pre-defined order for generating the set of output key-value pairs.

8. The method as claimed in claim 5, wherein the method further comprises emitting the output key-value pairs into a pre-defined format.

9. The method as claimed in claim 8, wherein the method further comprises emitting into HFiles, the output key-value pairs in a First-in-First-out (FiFo) order.

10. The method as claimed in claim 7, wherein the method further comprises storing the output key-value pairs onto the distributed data storage.

11. The method as claimed in claim 7, wherein a key corresponding to each of the output key-value pairs is an ImmutableBytesWritable object.

12. The method as claimed in claim 7, wherein a value corresponding to each of the output key-value pairs is an HBase key-value object.

13. A mapping system (102) for processing structured data for storage onto a columnar database hosted on a distributed data storage (106), the mapping system (102) comprising:
a processor (112-1);
a communication module (126) coupled to the processor (112-1) to receive a segment of the structured data including at least one record, wherein the at least one record comprises values stored for a plurality of columns;
a classification module (122) coupled to the processor (112-1) to determine, for each record of the received segment, a primary set of columns and a secondary set of columns from amongst the plurality of columns, wherein the primary set of columns form row key of the at least one record, and wherein the secondary set of columns include values corresponding to a set of column families; and
a mapping module (124) coupled to the processor (112-1) to:
re-arrange, for each record, values of the secondary set of columns in a pre-defined order; and
generate, for each record, an intermediate key based on the primary set of columns and a corresponding intermediate value comprising the re-arranged values of the secondary set of columns, wherein the intermediate key and the intermediate value form an intermediate key-value pair.

14. The mapping system (102) as claimed in claim 13, wherein the communication module (126) is further configured to transmit, for each record, the intermediate key-value pair for further processing.

15. The mapping system (102) as claimed in claim 13, wherein the mapping module (124) identifies the pre-defined order based on lexicographic occurrence of column families and their associated qualifiers.

16. The mapping system (102) as claimed in claim 13, wherein the distributed data storage (106) is a Hadoop distributed file system (HDFS).

17. A reducing system (104) for processing structured data for storage onto a columnar database hosted on a distributed data storage (106), the reducing system (104) comprising:
a processor (112-2);
a splitting module (134) coupled to the processor (112-2) to:
receive an intermediate key-value pair comprising an intermediate key and a corresponding intermediate value, and wherein the intermediate value includes multiple values corresponding to a set of secondary set of columns, combined in a pre-defined order; and
split, corresponding to an intermediate key, the intermediate value into multiple values based on at least one separator included in the intermediate value; and
an output module (136) coupled to the processor (112-2) to generate a set of output key-value pairs based on the intermediate key and the multiple values, wherein the multiple values are considered in a First-in-First-out (FiFo) order for generating the set of output key-value pairs.

18. The reducing system (104) as claimed in claim 17, wherein the output module (136) further emits the output key-value pairs into HFiles in the FiFo order.

19. The reducing system (104) as claimed in claim 17, wherein the output module (136) further stores the output key-value pairs onto the distributed data storage (106).

20. The reducing system (104) as claimed in claim 17, wherein a key corresponding to each of the output key-value pairs is an ImmutableBytesWritable object.

21. The reducing system (104) as claimed in claim 17, wherein a value corresponding to each of the output key-value pairs is an HBase key-value object.

22. A non-transitory computer-readable medium having embodied thereon a computer program for executing a method comprising:
receiving a segment of the structured data including at least one record, wherein the at least one record comprises values stored for a plurality of columns;
determining, for each record of the received segment, a primary set of columns and a secondary set of columns from amongst the plurality of columns, wherein the primary set of columns form row key of the at least one record, and wherein the secondary set of columns include values corresponding to a set of column families;
re-arranging, for each record, values of the secondary set of columns in a pre-defined order;
generating, for each record, an intermediate key based on the primary set of columns and a corresponding intermediate value comprising the re-arranged values of the secondary set of columns, wherein the intermediate key and the intermediate value form an intermediate key-value pair; and
transmitting, for each record, the intermediate key-value pair for further processing.

23. A non-transitory computer-readable medium having embodied thereon a computer program for executing a method comprising:
receiving an intermediate key-value pair comprising an intermediate key and a corresponding intermediate value, and wherein the intermediate value includes multiple values corresponding to a set of secondary set of columns, combined in a pre-defined order;
splitting, corresponding to an intermediate key, the intermediate value into multiple values based on at least one separator included in the intermediate value; and
generating a set of output key-value pairs based on the intermediate key and the multiple values, wherein the multiple values are considered in a First-in-First-out (FiFo) order for generating the set of output key-value pairs.

SPECIFICATION: As Attached
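
The reducing side described in claims 7, 17 and 23 can be sketched in the same spirit. This is an illustrative Python outline under stated assumptions, not the claimed implementation: the tab separator and the column list are hypothetical, plain `bytes` stand in for the ImmutableBytesWritable key, and a 4-tuple stands in for the HBase key-value object. The values are consumed in the order they arrive (first in, first out), relying on the pre-defined order already applied upstream.

```python
from typing import List, Tuple

# Hypothetical, pre-agreed order of (column family, qualifier); it must match
# the order in which the intermediate value was assembled upstream.
ORDERED_COLUMNS = [("billing", "amount"), ("info", "city"), ("info", "name")]
VALUE_SEPARATOR = "\t"  # assumed separator embedded in the intermediate value

# Stand-in for an HBase key-value object: (row, family, qualifier, value).
Cell = Tuple[bytes, bytes, bytes, bytes]


def reduce_pair(intermediate_key: str,
                intermediate_value: str) -> List[Tuple[bytes, Cell]]:
    """Split one intermediate value and emit one output pair per column."""
    values = intermediate_value.split(VALUE_SEPARATOR)
    if len(values) != len(ORDERED_COLUMNS):
        raise ValueError("intermediate value does not match the column order")

    row = intermediate_key.encode("utf-8")  # stand-in for the output key
    output = []
    # Values are taken strictly in arrival order; no sorting happens here.
    for (family, qualifier), value in zip(ORDERED_COLUMNS, values):
        cell = (row, family.encode("utf-8"),
                qualifier.encode("utf-8"), value.encode("utf-8"))
        output.append((row, cell))
    return output


if __name__ == "__main__":
    for key, cell in reduce_pair("C42|O7", "199.50\tPune\tAsha"):
        print(key, cell)
```

In an actual Hadoop job the output pairs would be written through an HFile output format rather than collected in a list; that part is omitted here because it depends on the cluster and HBase version in use.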

Documents

Application Documents

# Name Date
1 SPEC FOR FILING.pdf 2018-08-11
2 FORM 5.pdf 2018-08-11
3 FORM 3.pdf 2018-08-11
4 FIG FOR FILING.pdf 2018-08-11
5 ABSTRACT1.jpg 2018-08-11
6 988-MUM-2014-Power of Attorney-130215.pdf 2018-08-11
7 988-MUM-2014-FORM 18.pdf 2018-08-11
8 988-MUM-2014-FORM 1(22-4-2014).pdf 2018-08-11
9 988-MUM-2014-Correspondence-130215.pdf 2018-08-11
10 988-MUM-2014-CORRESPONDENCE(22-4-2014).pdf 2018-08-11
11 988-MUM-2014-FER.pdf 2019-10-17
12 988-MUM-2014-OTHERS [16-04-2020(online)].pdf 2020-04-16
13 988-MUM-2014-FER_SER_REPLY [16-04-2020(online)].pdf 2020-04-16
14 988-MUM-2014-CLAIMS [16-04-2020(online)].pdf 2020-04-16
15 988-MUM-2014-US(14)-HearingNotice-(HearingDate-22-02-2023).pdf 2023-01-30
16 988-MUM-2014-Correspondence to notify the Controller [08-02-2023(online)].pdf 2023-02-08
17 988-MUM-2014-FORM-26 [17-02-2023(online)].pdf 2023-02-17
18 988-MUM-2014-Written submissions and relevant documents [03-03-2023(online)].pdf 2023-03-03
19 988-MUM-2014-PatentCertificate18-04-2023.pdf 2023-04-18
20 988-MUM-2014-IntimationOfGrant18-04-2023.pdf 2023-04-18

Search Strategy

1 STM_15-10-2019.pdf
2 searchTPO_15-10-2019.pdf

ERegister / Renewals

3rd: 11 May 2023, From 24/03/2016 - To 24/03/2017
4th: 11 May 2023, From 24/03/2017 - To 24/03/2018
5th: 11 May 2023, From 24/03/2018 - To 24/03/2019
6th: 11 May 2023, From 24/03/2019 - To 24/03/2020
7th: 11 May 2023, From 24/03/2020 - To 24/03/2021
8th: 11 May 2023, From 24/03/2021 - To 24/03/2022
9th: 11 May 2023, From 24/03/2022 - To 24/03/2023
10th: 11 May 2023, From 24/03/2023 - To 24/03/2024
11th: 14 Mar 2024, From 24/03/2024 - To 24/03/2025
12th: 19 Mar 2025, From 24/03/2025 - To 24/03/2026