Storage Of Unstructured Data Onto Distributed Data Storage

Abstract: System(s) and method(s) for processing unstructured data and storing it onto a distributed data storage are described. According to the present subject matter, a mapping system (102) and a reducing system (104) are described for processing and storage of the unstructured data. The described systems implement methods that include receiving a segment of the unstructured data including at least one record, where the at least one record comprises values for one or more qualifiers. The methods may also include identifying a column family and at least one qualifier, from amongst the one or more qualifiers, corresponding to the values of each record from amongst the at least one record. Further, the methods include determining a key qualifier, for each record, from amongst the at least one qualifier identified for that record, and generating an enhanced key, for each record, based on at least one of the key qualifier, the identified column family, and the at least one qualifier for that record.

Patent Information

Application #:
Filing Date: 24 March 2014
Publication Number: 40/2015
Publication Type: INA
Invention Field: COMPUTER SCIENCE
Status:
Email: iprdel@lakshmisri.com
Parent Application:
Patent Number:
Legal Status:
Grant Date: 2022-08-17
Renewal Date:

Applicants

TATA CONSULTANCY SERVICES LIMITED
Nirmal Building, 9th Floor, Nariman Point, Mumbai, Maharashtra 400021

Inventors

1. OMATHIL, Gelesh George
Omathil House, Meemutty PO, Kozhikode, Kerala 673580

Specification

CLAIMS:
1. A method for processing unstructured data for storage onto a distributed data storage, the method comprising:
receiving a segment of the unstructured data including at least one record, wherein the at least one record comprises values for one or more qualifiers;
identifying column family and at least one qualifier, from amongst the one or more qualifiers, corresponding to values of each record from amongst the at least one record;
determining a key qualifier, for each record, from amongst the at least one qualifier identified for the each record;
generating an enhanced key, for each record, based on at least one of the key qualifier, the identified column family, and the at least one qualifier for each record; and
transmitting, for each record, an intermediate key-value pair, wherein the intermediate key-value pair includes the enhanced key and a value corresponding to the enhanced key.
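
The steps of claim 1 can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the claimed implementation: the input format (`qualifier=value` fields separated by `;`), the `SCHEMA` table, the choice of `id` as key qualifier, the `|` separator, and the decision to emit one pair per value are all hypothetical.

```python
from typing import Dict, Iterator, Tuple

# Hypothetical schema: maps each qualifier to its column family.
SCHEMA: Dict[str, str] = {"id": "info", "name": "info", "city": "address"}
KEY_QUALIFIER = "id"  # assumed key qualifier; its value prefixes every key
SEP = "|"             # assumed separator between the parts of the key


def parse_record(line: str) -> Dict[str, str]:
    """Parse 'qualifier=value;qualifier=value' into a dict (assumed format)."""
    fields = (f.split("=", 1) for f in line.strip().split(";") if "=" in f)
    return {q.strip(): v.strip() for q, v in fields}


def map_segment(segment: str) -> Iterator[Tuple[str, str]]:
    """Yield (enhanced_key, value) pairs for every record in a segment."""
    for line in segment.splitlines():
        record = parse_record(line)
        if KEY_QUALIFIER not in record:
            continue  # no key qualifier found: skip the record in this sketch
        row = record[KEY_QUALIFIER]
        for qualifier, value in record.items():
            family = SCHEMA.get(qualifier)
            if family is None:
                continue  # qualifier unknown to the hypothetical schema
            # Fixed order: key qualifier value, column family, qualifier.
            enhanced_key = SEP.join((row, family, qualifier))
            yield enhanced_key, value
```

For the record `id=42;name=Ada;city=Pune`, this sketch yields the pairs `("42|info|id", "42")`, `("42|info|name", "Ada")`, and `("42|address|city", "Pune")`.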

2. The method as claimed in claim 1, wherein the transmitting is based on determination of a reducer, wherein the determination is based on the enhanced key.
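
Claim 2 does not state how the reducer is determined from the enhanced key. One possibility, sketched here as an assumption, is range partitioning: each reducer owns a contiguous range of keys, so the keys stay ordered across reducers. The names `choose_reducer` and `split_points` are hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def choose_reducer(enhanced_key: str, split_points: Sequence[str]) -> int:
    """Return the index of the reducer that owns the range containing the key.

    split_points must be sorted; N split points define N + 1 reducers.
    Keys below the first split point go to reducer 0, and a key equal to a
    split point goes to the reducer on its right.
    """
    return bisect_right(split_points, enhanced_key)
```

With `split_points = ["h", "p"]`, a key starting with `a` goes to reducer 0, one starting with `k` to reducer 1, and one starting with `z` to reducer 2.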

3. The method as claimed in claim 1, wherein the determining the key qualifier is based on nature of the distributed data storage onto which the structured data is stored.

4. The method as claimed in claim 1, wherein the enhanced key comprises identified column family and the at least one qualifier in a pre-defined order.

5. The method as claimed in claim 1, wherein the distributed data storage is a Hadoop distributed file system (HDFS).

6. A method for processing unstructured data for storage onto a distributed data storage, the method comprising:
receiving a plurality of intermediate key-value pairs, wherein each intermediate key-value pair from amongst the plurality of intermediate key-value pairs includes an enhanced key and a corresponding intermediate value;
sorting the plurality of intermediate key-value pairs based on the enhanced key in a pre-defined order; and
processing each intermediate key-value pair from amongst the plurality of intermediate key-value pairs to generate a set of output key-value pairs.

7. The method as claimed in claim 6, wherein the sorting is based on lexicographic order of the enhanced keys corresponding to the plurality of intermediate key-value pairs.
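
The lexicographic sort of claim 7 can be sketched as follows. Comparing the UTF-8 bytes of each key is an assumption of this example, chosen because byte-wise comparison is a common reading of "lexicographic" for stored keys; `sort_pairs` is a hypothetical helper.

```python
from typing import Iterable, List, Tuple


def sort_pairs(pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Sort intermediate pairs by the byte-wise order of their enhanced keys.

    Python's sort is stable, so pairs with equal keys keep their arrival order.
    """
    return sorted(pairs, key=lambda kv: kv[0].encode("utf-8"))
```

Because the comparison is character by character rather than numeric, the key `10|info|id` sorts before `9|info|id`. Zero-padding numeric parts of the key is one way to avoid that if numeric order matters.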

8. The method as claimed in claim 6, wherein the method further comprises emitting into Hfiles, the output key-value pairs in a First-in-First-out (FiFo) order.
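
The first-in-first-out emission of claim 8 can be pictured with a queue. In this sketch the sink is a plain text stream standing in for an HFile writer; the tab-separated line format and the name `emit_fifo` are assumptions of the example, not the HFile format.

```python
from collections import deque
from typing import Iterable, TextIO, Tuple


def emit_fifo(sorted_pairs: Iterable[Tuple[str, str]], sink: TextIO) -> int:
    """Write pairs to the sink in the order they were queued; return the count."""
    queue = deque(sorted_pairs)  # first pair queued is the first pair written
    written = 0
    while queue:
        key, value = queue.popleft()
        sink.write(f"{key}\t{value}\n")  # stand-in for appending to an HFile
        written += 1
    return written
```

Since the pairs enter the queue already sorted, FIFO emission preserves the sorted order in the output.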

9. The method as claimed in claim 6, wherein the method further comprises storing the output key-value pairs onto the distributed data storage.

10. A mapping system (102) for processing unstructured data for storage onto a distributed data storage (106), the mapping system (102) comprising:
a processor (112-1);
a communication module (126) coupled to the processor (112-1) to receive a segment of the unstructured data including at least one record, wherein the at least one record comprises values for one or more qualifiers;
a classification module (122) coupled to the processor (112-1) to:
identify column family and at least one qualifier from amongst the one or more qualifiers, corresponding to values of each record from amongst the at least one record; and
determine a key qualifier, for each record, from amongst the at least one qualifier identified for the each record; and
a mapping module (124) coupled to the processor (112-1) to generate an enhanced key, for each record, based on at least one of the key qualifier, the identified column family, and the at least one qualifier for each record.

11. The mapping system (102) as claimed in claim 10, wherein the communication module (126) further transmits, for each record, an intermediate key-value pair, wherein the intermediate key-value pair includes the enhanced key and a value corresponding to the enhanced key.

12. The mapping system (102) as claimed in claim 11, wherein the communication module (126) transmits the intermediate key-value pairs based on determination of a reducer, wherein the determination is based on the enhanced key.

13. The mapping system (102) as claimed in claim 11, wherein the mapping module (124) generates the enhanced key based on combination of identified column family and the at least one qualifier in a pre-defined order.

14. A reducing system (104) for processing unstructured data for storage onto a distributed data storage (106), the reducing system (104) comprising:
a processor (112-2);
a sorting module (134) coupled to the processor (112-2) to:
receive a plurality of intermediate key-value pairs, wherein each intermediate key-value pair from amongst the plurality of intermediate key-value pairs includes an enhanced key and a corresponding intermediate value; and
sort the plurality of intermediate key-value pairs based on the enhanced key in a pre-defined order; and
an output module (136) to process each intermediate key-value pair from amongst the plurality of intermediate key-value pairs to generate a set of output key-value pairs.

15. The reducing system (104) as claimed in claim 14, wherein the pre-defined order of sorting is based on lexicographic order of the enhanced keys corresponding to the plurality of intermediate key-value pairs.

16. The reducing system (104) as claimed in claim 14, wherein the output module (136) generates the output key-value pairs in a First-in-First-out (FiFo) order.

17. A non-transitory computer-readable medium having embodied thereon a computer program for executing a method comprising:
receiving a segment of the unstructured data including at least one record, wherein the at least one record comprises values for one or more qualifiers;
identifying column family and at least one qualifier from amongst the one or more qualifiers, corresponding to values of each record from amongst the at least one record;
determining a key qualifier, for each record, from amongst the at least one qualifier identified for the each record;
generating an enhanced key, for each record, based on at least one of the key qualifier, the identified column family, and the at least one qualifier for each record; and
transmitting, for each record, an intermediate key-value pair, wherein the intermediate key-value pair includes the enhanced key and a value corresponding to the enhanced key.

18. A non-transitory computer-readable medium having embodied thereon a computer program for executing a method comprising:
receiving a plurality of intermediate key-value pairs, wherein each intermediate key-value pair from amongst the plurality of intermediate key-value pairs includes an enhanced key and a corresponding intermediate value;
sorting the plurality of intermediate key-value pairs based on the enhanced key in a pre-defined order; and
processing each intermediate key-value pair from amongst the plurality of intermediate key-value pairs to generate a set of output key-value pairs.

Documents

Application Documents

# Name Date
1 SPECIFICATION.pdf 2018-08-11
2 FORM 5.pdf 2018-08-11
3 FORM 3.pdf 2018-08-11
4 FIGURES.pdf 2018-08-11
5 ABSTRACT1.jpg 2018-08-11
6 987-MUM-2014-Power of Attorney-130215.pdf 2018-08-11
7 987-MUM-2014-FORM 18.pdf 2018-08-11
8 987-MUM-2014-FORM 1(14-8-2014).pdf 2018-08-11
9 987-MUM-2014-Correspondence-130215.pdf 2018-08-11
10 987-MUM-2014-CORRESPONDENCE(14-8-2014).pdf 2018-08-11
11 987-MUM-2014-FER.pdf 2019-10-31
12 987-MUM-2014-FORM-26 [23-03-2020(online)].pdf 2020-03-23
13 987-MUM-2014-OTHERS [29-04-2020(online)].pdf 2020-04-29
14 987-MUM-2014-FER_SER_REPLY [29-04-2020(online)].pdf 2020-04-29
15 987-MUM-2014-COMPLETE SPECIFICATION [29-04-2020(online)].pdf 2020-04-29
16 987-MUM-2014-CLAIMS [29-04-2020(online)].pdf 2020-04-29
17 987-MUM-2014-Correspondence to notify the Controller [27-10-2020(online)].pdf 2020-10-27
18 987-MUM-2014-Written submissions and relevant documents [18-11-2020(online)].pdf 2020-11-18
19 987-MUM-2014-US(14)-HearingNotice-(HearingDate-05-11-2020).pdf 2021-10-03
20 987-MUM-2014-US(14)-ExtendedHearingNotice-(HearingDate-05-07-2022).pdf 2022-06-10
21 987-MUM-2014-Correspondence to notify the Controller [14-06-2022(online)].pdf 2022-06-14
22 987-MUM-2014-FORM-26 [04-07-2022(online)].pdf 2022-07-04
23 987-MUM-2014-Written submissions and relevant documents [19-07-2022(online)].pdf 2022-07-19
24 987-MUM-2014-PatentCertificate17-08-2022.pdf 2022-08-17
25 987-MUM-2014-IntimationOfGrant17-08-2022.pdf 2022-08-17

Search Strategy

1 SearchStrategy_A987MUM2014AE_14-07-2020.pdf
2 SearchStrategyMatrix_987MUM2014_31-10-2019.pdf
3 d2npljin2011_31-10-2019.pdf

ERegister / Renewals

3rd: 07 Sep 2022 (From 24/03/2016 - To 24/03/2017)
4th: 07 Sep 2022 (From 24/03/2017 - To 24/03/2018)
5th: 07 Sep 2022 (From 24/03/2018 - To 24/03/2019)
6th: 07 Sep 2022 (From 24/03/2019 - To 24/03/2020)
7th: 07 Sep 2022 (From 24/03/2020 - To 24/03/2021)
8th: 07 Sep 2022 (From 24/03/2021 - To 24/03/2022)
9th: 07 Sep 2022 (From 24/03/2022 - To 24/03/2023)
10th: 07 Sep 2022 (From 24/03/2023 - To 24/03/2024)
11th: 14 Mar 2024 (From 24/03/2024 - To 24/03/2025)
12th: 19 Mar 2025 (From 24/03/2025 - To 24/03/2026)