Abstract: Systems and methods for processing unstructured data and storing it on a distributed data storage are described. According to the present subject matter, a mapping system (102) and a reducing system (104) are described for processing and storage of unstructured data. The described systems implement methods that include receiving a segment of the unstructured data including at least one record, where the at least one record comprises values for one or more qualifiers. The methods may also include identifying a column family and at least one qualifier, from amongst the one or more qualifiers, corresponding to the values of each record. Further, the methods include determining a key qualifier, for each record, from amongst the at least one qualifier identified for that record, and generating an enhanced key, for each record, based on at least one of the key qualifier, the identified column family, and the at least one qualifier for the record.
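The mapping side described in the abstract can be illustrated with a minimal Python sketch. It is not the implementation of the mapping system (102): the `qualifier=value` record format, the `COLUMN_FAMILIES` table, the choice of `id` as key qualifier, the `|` separator, and the CRC-based reducer choice are all assumptions made for illustration.

```python
import zlib
from typing import Dict, Iterator, List, Tuple

# Hypothetical schema (not from the source): qualifier name -> column family.
COLUMN_FAMILIES: Dict[str, str] = {"id": "meta", "name": "profile", "city": "profile"}
KEY_QUALIFIER = "id"  # assumed: the qualifier whose value serves as the row key
SEP = "|"             # assumed separator between the parts of the enhanced key


def parse_record(line: str) -> Dict[str, str]:
    """Parse one record written as comma-separated 'qualifier=value' pairs."""
    return dict(field.split("=", 1) for field in line.strip().split(","))


def map_segment(segment: List[str]) -> Iterator[Tuple[str, str]]:
    """Yield one intermediate (enhanced key, value) pair per value of each record."""
    for line in segment:
        record = parse_record(line)
        row_key = record[KEY_QUALIFIER]  # value of the key qualifier
        for qualifier, value in record.items():
            family = COLUMN_FAMILIES[qualifier]  # identified column family
            # Enhanced key: key qualifier value, column family, qualifier,
            # joined in a fixed (pre-defined) order.
            yield SEP.join((row_key, family, qualifier)), value


def choose_reducer(enhanced_key: str, num_reducers: int) -> int:
    """Pick a reducer from the enhanced key so one row always lands on one reducer."""
    row_key = enhanced_key.split(SEP, 1)[0]
    return zlib.crc32(row_key.encode("utf-8")) % num_reducers
```

For example, `list(map_segment(["id=42,name=Ada,city=London"]))` yields three intermediate pairs whose enhanced keys all begin with `42`, so `choose_reducer` routes them to the same reducer.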
CLAIMS: 1. A method for processing unstructured data for storage onto a distributed data storage, the method comprising:
receiving a segment of the unstructured data including at least one record, wherein the at least one record comprises values for one or more qualifiers;
identifying column family and at least one qualifier, from amongst the one or more qualifiers, corresponding to values of each record from amongst the at least one record;
determining a key qualifier, for each record, from amongst the at least one qualifier identified for the each record;
generating an enhanced key, for each record, based on at least one of the key qualifier, the identified column family, and the at least one qualifier for each record; and
transmitting, for each record, an intermediate key-value pair, wherein the intermediate key-value pair includes the enhanced key and a value corresponding to the enhanced key.
2. The method as claimed in claim 1, wherein the transmitting is based on determination of a reducer, wherein the determination is based on the enhanced key.
3. The method as claimed in claim 1, wherein the determining the key qualifier is based on nature of the distributed data storage onto which the structured data is stored.
4. The method as claimed in claim 1, wherein the enhanced key comprises identified column family and the at least one qualifier in a pre-defined order.
5. The method as claimed in claim 1, wherein the distributed data storage is a Hadoop distributed file system (HDFS).
6. A method for processing unstructured data for storage onto a distributed data storage, the method comprising:
receiving a plurality of intermediate key-value pairs, wherein each intermediate key-value pair from amongst the plurality of intermediate key-value pairs includes an enhanced key and a corresponding intermediate value;
sorting the plurality of intermediate key-value pairs based on the enhanced key in a pre-defined order; and
processing each intermediate key-value pair from amongst the plurality of intermediate key-value pairs to generate a set of output key-value pairs.
7. The method as claimed in claim 6, wherein the sorting is based on lexicographic order of the enhanced keys corresponding to the plurality of intermediate key-value pairs.
8. The method as claimed in claim 6, wherein the method further comprises emitting the output key-value pairs into HFiles in a First-in-First-out (FIFO) order.
9. The method as claimed in claim 6, wherein the method further comprises storing the output key-value pairs onto the distributed data storage.
10. A mapping system (102) for processing unstructured data for storage onto a distributed data storage (106), the mapping system (102) comprising:
a processor (112-1);
a communication module (126) coupled to the processor (112-1) to receive a segment of the unstructured data including at least one record, wherein the at least one record comprises values for one or more qualifiers;
a classification module (122) coupled to the processor (112-1) to:
identify column family and at least one qualifier from amongst the one or more qualifiers, corresponding to values of each record from amongst the at least one record; and
determine a key qualifier, for each record, from amongst the at least one qualifier identified for the each record; and
a mapping module (124) coupled to the processor (112-1) to generate an enhanced key, for each record, based on at least one of the key qualifier, the identified column family, and the at least one qualifier for each record.
11. The mapping system (102) as claimed in claim 10, wherein the communication module (126) further transmits, for each record, an intermediate key-value pair, wherein the intermediate key-value pair includes the enhanced key and a value corresponding to the enhanced key.
12. The mapping system (102) as claimed in claim 11, wherein the communication module (126) transmits the intermediate key-value pairs based on determination of a reducer, wherein the determination is based on the enhanced key.
13. The mapping system (102) as claimed in claim 11, wherein the mapping module (124) generates the enhanced key based on combination of identified column family and the at least one qualifier in a pre-defined order.
14. A reducing system (104) for processing unstructured data for storage onto a distributed data storage (106), the reducing system (104) comprising:
a processor (112-2);
a sorting module (134) coupled to the processor (112-2) to:
receive a plurality of intermediate key-value pairs, wherein each intermediate key-value pair from amongst the plurality of intermediate key-value pairs includes an enhanced key and a corresponding intermediate value; and
sort the plurality of intermediate key-value pairs based on the enhanced key in a pre-defined order; and
an output module (136) to process each intermediate key-value pair from amongst the plurality of intermediate key-value pairs to generate a set of output key-value pairs.
15. The reducing system (104) as claimed in claim 14, wherein the pre-defined order of sorting is based on lexicographic order of the enhanced keys corresponding to the plurality of intermediate key-value pairs.
16. The reducing system (104) as claimed in claim 14, wherein the output module (136) generates the output key-value pairs in a First-in-First-out (FIFO) order.
17. A non-transitory computer-readable medium having embodied thereon a computer program for executing a method comprising:
receiving a segment of the unstructured data including at least one record, wherein the at least one record comprises values for one or more qualifiers;
identifying column family and at least one qualifier from amongst the one or more qualifiers, corresponding to values of each record from amongst the at least one record;
determining a key qualifier, for each record, from amongst the at least one qualifier identified for the each record;
generating an enhanced key, for each record, based on at least one of the key qualifier, the identified column family, and the at least one qualifier for each record; and
transmitting, for each record, an intermediate key-value pair, wherein the intermediate key-value pair includes the enhanced key and a value corresponding to the enhanced key.
18. A non-transitory computer-readable medium having embodied thereon a computer program for executing a method comprising:
receiving a plurality of intermediate key-value pairs, wherein each intermediate key-value pair from amongst the plurality of intermediate key-value pairs includes an enhanced key and a corresponding intermediate value;
sorting the plurality of intermediate key-value pairs based on the enhanced key in a pre-defined order; and
processing each intermediate key-value pair from amongst the plurality of intermediate key-value pairs to generate a set of output key-value pairs.
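The reducing-side steps recited in claims 6 to 8 (receive, sort by enhanced key, emit in order) can be sketched in Python as follows. This is only an illustration under stated assumptions: the `|`-separated string keys are hypothetical, the returned list stands in for an HFile, and no actual HFile writing is shown.

```python
from collections import deque
from typing import Deque, Iterable, List, Tuple

KeyValue = Tuple[str, str]  # (enhanced key, intermediate value)


def reduce_pairs(pairs: Iterable[KeyValue]) -> List[KeyValue]:
    """Sort intermediate pairs by enhanced key, then emit them first-in-first-out."""
    # Python compares strings code point by code point, which gives the
    # lexicographic order on enhanced keys described in claim 7.
    ordered = sorted(pairs, key=lambda kv: kv[0])

    # Queue the sorted pairs and drain from the front, so output order equals
    # sorted order (FIFO). The list is a stand-in for the output file.
    queue: Deque[KeyValue] = deque(ordered)
    output: List[KeyValue] = []
    while queue:
        output.append(queue.popleft())
    return output
```

Sorting before emitting means the writer never has to reorder pairs: each pair is appended in the position it was dequeued.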
| # | Name | Date |
|---|---|---|
| 1 | SPECIFICATION.pdf | 2018-08-11 |
| 2 | FORM 5.pdf | 2018-08-11 |
| 3 | FORM 3.pdf | 2018-08-11 |
| 4 | FIGURES.pdf | 2018-08-11 |
| 5 | ABSTRACT1.jpg | 2018-08-11 |
| 6 | 987-MUM-2014-Power of Attorney-130215.pdf | 2018-08-11 |
| 7 | 987-MUM-2014-FORM 18.pdf | 2018-08-11 |
| 8 | 987-MUM-2014-FORM 1(14-8-2014).pdf | 2018-08-11 |
| 9 | 987-MUM-2014-Correspondence-130215.pdf | 2018-08-11 |
| 10 | 987-MUM-2014-CORRESPONDENCE(14-8-2014).pdf | 2018-08-11 |
| 11 | 987-MUM-2014-FER.pdf | 2019-10-31 |
| 12 | 987-MUM-2014-FORM-26 [23-03-2020(online)].pdf | 2020-03-23 |
| 13 | 987-MUM-2014-OTHERS [29-04-2020(online)].pdf | 2020-04-29 |
| 14 | 987-MUM-2014-FER_SER_REPLY [29-04-2020(online)].pdf | 2020-04-29 |
| 15 | 987-MUM-2014-COMPLETE SPECIFICATION [29-04-2020(online)].pdf | 2020-04-29 |
| 16 | 987-MUM-2014-CLAIMS [29-04-2020(online)].pdf | 2020-04-29 |
| 17 | 987-MUM-2014-Correspondence to notify the Controller [27-10-2020(online)].pdf | 2020-10-27 |
| 18 | 987-MUM-2014-Written submissions and relevant documents [18-11-2020(online)].pdf | 2020-11-18 |
| 19 | 987-MUM-2014-US(14)-HearingNotice-(HearingDate-05-11-2020).pdf | 2021-10-03 |
| 20 | 987-MUM-2014-US(14)-ExtendedHearingNotice-(HearingDate-05-07-2022).pdf | 2022-06-10 |
| 21 | 987-MUM-2014-Correspondence to notify the Controller [14-06-2022(online)].pdf | 2022-06-14 |
| 22 | 987-MUM-2014-FORM-26 [04-07-2022(online)].pdf | 2022-07-04 |
| 23 | 987-MUM-2014-Written submissions and relevant documents [19-07-2022(online)].pdf | 2022-07-19 |
| 24 | 987-MUM-2014-PatentCertificate17-08-2022.pdf | 2022-08-17 |
| 25 | 987-MUM-2014-IntimationOfGrant17-08-2022.pdf | 2022-08-17 |

| # | Name |
|---|---|
| 1 | SearchStrategy_A987MUM2014AE_14-07-2020.pdf |
| 2 | SearchStrategyMatrix_987MUM2014_31-10-2019.pdf |
| 3 | d2npljin2011_31-10-2019.pdf |