Abstract: System(s) and method(s) for processing and storage of structured data onto a columnar database hosted on a distributed data storage are described. According to the present subject matter, a mapping system (102) and a reducing system (104) are described for processing and storage of structured data. The described systems implement methods that include receiving a segment of the structured data including at least one record. Further, the method includes determining a primary set of columns and a secondary set of columns from amongst a plurality of columns of each record. The method also includes re-arranging values of the secondary set of columns in a pre-defined order and generating an intermediate key based on the primary set of columns and a corresponding intermediate value comprising the re-arranged values of the secondary set of columns, where the intermediate key and the intermediate value form an intermediate key-value pair.
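By way of illustration only, the mapping step summarised above may be sketched as follows. The sketch assumes that each secondary column is associated with a (column family, qualifier) pair and that the pre-defined order is the lexicographic byte order of those pairs. The function name `map_record`, the separators `KEY_SEP` and `VALUE_SEP`, and the sample column names are hypothetical and do not appear in the specification.

```python
from typing import Dict, List, Tuple

Column = Tuple[str, str]  # (column family, qualifier)

# Hypothetical separators; the source does not name specific characters.
KEY_SEP = "|"
VALUE_SEP = "\x1f"


def map_record(
    record: Dict[str, str],
    primary: List[str],
    secondary: Dict[str, Column],
) -> Tuple[str, str]:
    """Build one intermediate key-value pair for a single record.

    primary   -- names of the columns that compose the row key, in key order
    secondary -- maps each remaining column name to its (family, qualifier)
    """
    # Intermediate key: composition of the primary column values.
    row_key = KEY_SEP.join(record[name] for name in primary)

    # Re-arrange the secondary columns by family, then qualifier (byte order).
    ordered = sorted(
        secondary,
        key=lambda name: tuple(part.encode("utf-8") for part in secondary[name]),
    )

    # Intermediate value: the re-arranged values joined with a separator.
    value = VALUE_SEP.join(record[name] for name in ordered)
    return row_key, value


if __name__ == "__main__":
    sample = {
        "cust_id": "42",
        "date": "2014-03-21",
        "name": "Asha",
        "city": "Pune",
        "amount": "100",
    }
    columns = {
        "name": ("info", "name"),
        "amount": ("txn", "amount"),
        "city": ("info", "city"),
    }
    print(map_record(sample, ["cust_id", "date"], columns))
```

Sorting once per record at the mapping stage means the values already arrive in the required order at the later stage, which is the property the re-arrangement is intended to provide.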
CLAIMS: 1. A method for processing structured data for storage onto a columnar database hosted on a distributed data storage, the method comprising:
receiving a segment of the structured data including at least one record, wherein the at least one record comprises values stored for a plurality of columns;
determining, for each record of the received segment, a primary set of columns and a secondary set of columns from amongst the plurality of columns, wherein the primary set of columns form a row key of the at least one record, and wherein the secondary set of columns include values corresponding to a set of column families;
re-arranging, for each record, values of the secondary set of columns in a pre-defined order;
generating, for each record, an intermediate key based on the primary set of columns and a corresponding intermediate value comprising the re-arranged values of the secondary set of columns, wherein the intermediate key and the intermediate value form an intermediate key-value pair; and
transmitting, for each record, the intermediate key-value pair for processing to store onto the columnar database.
2. The method as claimed in claim 1, wherein the pre-defined order is based on a lexicographic arrangement of column families and their associated qualifiers.
3. The method as claimed in claim 1, wherein the pre-defined order is based on nature of the columnar database to which the structured data is stored.
4. The method as claimed in claim 1, wherein values of the row key uniquely identify the at least one record of the structured data.
5. The method as claimed in claim 4, wherein the row key is a composition of a list of values.
6. The method as claimed in claim 1, wherein the distributed data storage is a Hadoop distributed file system (HDFS) and the columnar database is HBase.
7. A method for processing structured data for storage onto a columnar database hosted on a distributed data storage, the method comprising:
receiving an intermediate key-value pair comprising an intermediate key and a corresponding intermediate value, wherein the intermediate value includes multiple values corresponding to a secondary set of columns, combined in a pre-defined order;
splitting, corresponding to the intermediate key, the intermediate value into multiple values based on at least one separator included in the intermediate value; and
generating a set of output key-value pairs based on the intermediate key and the multiple values, wherein the multiple values are considered in the pre-defined order for generating the set of output key-value pairs.
8. The method as claimed in claim 7, wherein the method further comprises emitting the output key-value pairs into a pre-defined format.
9. The method as claimed in claim 8, wherein the method further comprises emitting the output key-value pairs into HFiles in a First-in-First-out (FiFo) order.
10. The method as claimed in claim 7, wherein the method further comprises storing the output key-value pairs onto the distributed data storage.
11. The method as claimed in claim 7, wherein a key corresponding to each of the output key-value pairs is an ImmutableBytesWritable object.
12. The method as claimed in claim 7, wherein a value corresponding to each of the output key-value pairs is an HBase KeyValue object.
13. A mapping system (102) for processing structured data for storage onto a columnar database hosted on a distributed data storage (106), the mapping system (102) comprising:
a processor (112-1);
a communication module (126) coupled to the processor (112-1) to receive a segment of the structured data including at least one record, wherein the at least one record comprises values stored for a plurality of columns;
a classification module (122) coupled to the processor (112-1) to determine, for each record of the received segment, a primary set of columns and a secondary set of columns from amongst the plurality of columns, wherein the primary set of columns form a row key of the at least one record, and wherein the secondary set of columns include values corresponding to a set of column families; and
a mapping module (124) coupled to the processor (112-1) to:
re-arrange, for each record, values of the secondary set of columns in a pre-defined order; and
generate, for each record, an intermediate key based on the primary set of columns and a corresponding intermediate value comprising the re-arranged values of the secondary set of columns, wherein the intermediate key and the intermediate value form an intermediate key-value pair.
14. The mapping system (102) as claimed in claim 13, wherein the communication module (126) is further configured to transmit, for each record, the intermediate key-value pair for further processing.
15. The mapping system (102) as claimed in claim 13, wherein the mapping module (124) identifies the pre-defined order based on lexicographic occurrence of column families and their associated qualifiers.
16. The mapping system (102) as claimed in claim 13, wherein the distributed data storage (106) is a Hadoop distributed file system (HDFS).
17. A reducing system (104) for processing structured data for storage onto a columnar database hosted on a distributed data storage (106), the reducing system (104) comprising:
a processor (112-2);
a splitting module (134) coupled to the processor (112-2) to:
receive an intermediate key-value pair comprising an intermediate key and a corresponding intermediate value, wherein the intermediate value includes multiple values corresponding to a secondary set of columns, combined in a pre-defined order; and
split, corresponding to an intermediate key, the intermediate value into multiple values based on at least one separator included in the intermediate value; and
an output module (136) coupled to the processor (112-2) to generate a set of output key-value pairs based on the intermediate key and the multiple values, wherein the multiple values are considered in a First-in-First-out (FiFo) order for generating the set of output key-value pairs.
18. The reducing system (104) as claimed in claim 17, wherein the output module (136) further emits the output key-value pairs into HFiles in the FiFo order.
19. The reducing system (104) as claimed in claim 17, wherein the output module (136) further stores the output key-value pairs onto the distributed data storage (106).
20. The reducing system (104) as claimed in claim 17, wherein a key corresponding to each of the output key-value pairs is an ImmutableBytesWritable object.
21. The reducing system (104) as claimed in claim 17, wherein a value corresponding to each of the output key-value pairs is an HBase KeyValue object.
22. A non-transitory computer-readable medium having embodied thereon a computer program for executing a method comprising:
receiving a segment of the structured data including at least one record, wherein the at least one record comprises values stored for a plurality of columns;
determining, for each record of the received segment, a primary set of columns and a secondary set of columns from amongst the plurality of columns, wherein the primary set of columns form a row key of the at least one record, and wherein the secondary set of columns include values corresponding to a set of column families;
re-arranging, for each record, values of the secondary set of columns in a pre-defined order;
generating, for each record, an intermediate key based on the primary set of columns and a corresponding intermediate value comprising the re-arranged values of the secondary set of columns, wherein the intermediate key and the intermediate value form an intermediate key-value pair; and
transmitting, for each record, the intermediate key-value pair for further processing.
23. A non-transitory computer-readable medium having embodied thereon a computer program for executing a method comprising:
receiving an intermediate key-value pair comprising an intermediate key and a corresponding intermediate value, wherein the intermediate value includes multiple values corresponding to a secondary set of columns, combined in a pre-defined order;
splitting, corresponding to the intermediate key, the intermediate value into multiple values based on at least one separator included in the intermediate value; and
generating a set of output key-value pairs based on the intermediate key and the multiple values, wherein the multiple values are considered in a First-in-First-out (FiFo) order for generating the set of output key-value pairs.
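By way of illustration only, the reducing step recited in the claims above may be sketched as follows. The sketch assumes that the reducing side knows the same ordered list of (column family, qualifier) pairs that was used to combine the intermediate value. Plain Python tuples stand in for the HBase key and value objects named in the claims; the function name `reduce_pair` and the separator `VALUE_SEP` are hypothetical and do not appear in the specification.

```python
from typing import List, Tuple

Column = Tuple[str, str]                  # (column family, qualifier)
Cell = Tuple[bytes, bytes, bytes, bytes]  # (row key, family, qualifier, value)

# Hypothetical separator; it must match the one used to build the value.
VALUE_SEP = "\x1f"


def reduce_pair(
    row_key: str,
    value: str,
    ordered_columns: List[Column],
) -> List[Tuple[bytes, Cell]]:
    """Split one intermediate value and build the output key-value pairs.

    ordered_columns must list the secondary columns in the same pre-defined
    order in which their values were combined.
    """
    parts = value.split(VALUE_SEP)
    if len(parts) != len(ordered_columns):
        raise ValueError(
            f"expected {len(ordered_columns)} values, got {len(parts)}"
        )

    key = row_key.encode("utf-8")
    output = []
    # Values are consumed first-in-first-out, so the output keeps the
    # pre-defined order without any further sorting.
    for (family, qualifier), part in zip(ordered_columns, parts):
        cell = (key, family.encode("utf-8"), qualifier.encode("utf-8"),
                part.encode("utf-8"))
        output.append((key, cell))
    return output


if __name__ == "__main__":
    columns = [("info", "city"), ("info", "name"), ("txn", "amount")]
    for pair in reduce_pair("42|2014-03-21", "Pune\x1fAsha\x1f100", columns):
        print(pair)
```

Because the values were ordered before transmission, this step only splits and emits; it performs no sorting of its own.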
| # | Name | Date |
|---|---|---|
| 1 | SPEC FOR FILING.pdf | 2018-08-11 |
| 2 | FORM 5.pdf | 2018-08-11 |
| 3 | FORM 3.pdf | 2018-08-11 |
| 4 | FIG FOR FILING.pdf | 2018-08-11 |
| 5 | ABSTRACT1.jpg | 2018-08-11 |
| 6 | 988-MUM-2014-Power of Attorney-130215.pdf | 2018-08-11 |
| 7 | 988-MUM-2014-FORM 18.pdf | 2018-08-11 |
| 8 | 988-MUM-2014-FORM 1(22-4-2014).pdf | 2018-08-11 |
| 9 | 988-MUM-2014-Correspondence-130215.pdf | 2018-08-11 |
| 10 | 988-MUM-2014-CORRESPONDENCE(22-4-2014).pdf | 2018-08-11 |
| 11 | 988-MUM-2014-FER.pdf | 2019-10-17 |
| 12 | 988-MUM-2014-OTHERS [16-04-2020(online)].pdf | 2020-04-16 |
| 13 | 988-MUM-2014-FER_SER_REPLY [16-04-2020(online)].pdf | 2020-04-16 |
| 14 | 988-MUM-2014-CLAIMS [16-04-2020(online)].pdf | 2020-04-16 |
| 15 | 988-MUM-2014-US(14)-HearingNotice-(HearingDate-22-02-2023).pdf | 2023-01-30 |
| 16 | 988-MUM-2014-Correspondence to notify the Controller [08-02-2023(online)].pdf | 2023-02-08 |
| 17 | 988-MUM-2014-FORM-26 [17-02-2023(online)].pdf | 2023-02-17 |
| 18 | 988-MUM-2014-Written submissions and relevant documents [03-03-2023(online)].pdf | 2023-03-03 |
| 19 | 988-MUM-2014-PatentCertificate18-04-2023.pdf | 2023-04-18 |
| 20 | 988-MUM-2014-IntimationOfGrant18-04-2023.pdf | 2023-04-18 |

| # | Name |
|---|---|
| 1 | STM_15-10-2019.pdf |
| 2 | searchTPO_15-10-2019.pdf |