Optimizing Gene Sequence Alignment In Cloud

Abstract: Methods and systems for gene sequencing are described. An input sequence file including an input gene sequence is received. The input sequence file is segmented into a plurality of segmented sequence files (114), each having a portion of the input gene sequence (116). The plurality of segmented sequence files (114) are analyzed using gene sequencing techniques with respect to a plurality of reference gene sequences to generate an alignment score for each of the plurality of segmented sequence files. Based on the alignment scores, a plurality of reduced documents are generated, each including the alignment score of the one or more segmented sequence files having a common portion of the input gene sequence. Based on the plurality of reduced documents, a final document indicative of a degree of alignment of the input gene sequence is determined.
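
The segmentation and per-segment scoring described in the abstract can be illustrated with a minimal sketch. The record does not name a scoring method, so the Smith-Waterman local alignment score used here is an assumption, as are the fixed segment size, the scoring parameters, and all function names (segment, local_alignment_score, score_segments).

```python
def segment(sequence, size):
    """Split a sequence into consecutive portions of `size` bases.

    Fixed-size splitting is an assumption; the last portion may be shorter.
    """
    return [sequence[i:i + size] for i in range(0, len(sequence), size)]


def local_alignment_score(query, reference, match=2, mismatch=-1, gap=-2):
    """Return the Smith-Waterman local alignment score (no traceback).

    The match, mismatch and gap values are illustrative defaults.
    """
    prev = [0] * (len(reference) + 1)  # previous row of the scoring matrix
    best = 0
    for q in query:
        curr = [0]
        for j, r in enumerate(reference, start=1):
            diag = prev[j - 1] + (match if q == r else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def score_segments(sequence, references, size):
    """Pair each portion with its best score over all reference sequences."""
    return [
        (part, max(local_alignment_score(part, ref) for ref in references))
        for part in segment(sequence, size)
    ]
```

For example, score_segments("ACGTTTTT", ["ACGT"], 4) scores the portions "ACGT" and "TTTT" independently, which is what allows the portions to be processed in parallel on separate nodes.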

Patent Information

Application #:
Filing Date: 22 June 2012
Publication Number: 28/2014
Publication Type: INA
Invention Field: MICRO BIOLOGY
Status:
Email:
Parent Application:
Patent Number:
Legal Status:
Grant Date: 2022-01-13
Renewal Date:

Applicants

TATA CONSULTANCY SERVICES LIMITED
Nirmal Building  9th Floor  Nariman Point  Mumbai  Maharashtra 400021

Inventors

1. Vijayakumar  Senthilkumar
TATA Consultancy Services LTD  ITPB  Pioneer Building  12th Floor  Whitefield Road  Bangalore 560066
2. Bhargavi  Anjani
TATA Consultancy Services LTD  ITPB  Pioneer Building  12th Floor  Whitefield Road  Bangalore 560066
3. Ahamed  Syed Azar
TATA Consultancy Services LTD  ITPB  Pioneer Building  12th Floor  Whitefield Road  Bangalore 560066
4. Praseeda  Uma
TATA Consultancy Services LTD  ITPB  Pioneer Building  12th Floor  Whitefield Road  Bangalore 560066

Specification

Description: GENE SEQUENCING
Claims:
1. A method for gene sequencing, the method comprising:
receiving, by a system (108), an input gene sequence file having an input gene sequence;
segmenting, by the system (108), the input gene sequence file into a plurality of segmented sequence files (114), each of the plurality of segmented sequence files having a portion of the input gene sequence (116), wherein the input gene sequence file is segmented based on at least one of a query optimization technique and a query plan;
analyzing, by the system (108), the plurality of segmented sequence files (114) with respect to reference gene sequences to generate an alignment score for each of the plurality of segmented sequence files (114);
generating, by the system (108), a plurality of reduced documents based on the alignment score for each of the plurality of segmented sequence files (114) and the portion of the input gene file included in the plurality of segmented gene files (114), wherein each of the plurality of reduced documents includes the alignment score of one or more segmented sequence files (114) having a common portion of the input gene sequence (116); and
determining, by the system (108), a final document indicative of a degree of alignment of the input gene sequence (116) based on the plurality of reduced documents.
2. The method as claimed in claim 1, wherein the determining further comprises:
computing, by the system (108), an aggregate of the alignment score based on alignment scores of each of the plurality of reduced documents to generate final alignment score;
merging, by the system (108), the portions of the input gene sequence (116) in each of the plurality of reduced documents to regenerate the input gene sequence (116); and
generating, by the system (108), the final document based on the final alignment score and the input gene sequence.
3. The method as claimed in claim 1, wherein the analyzing, by the system (108), further comprises generating a plurality of mapped documents (118) from the plurality of segmented sequence files (114) based on a first Key-Value pair generation framework, wherein the Key in the first Key-Value pair generation framework is indicative of a location of a corresponding segmented sequence file (114) in the system (108) and the Value in the first Key-Value pair generation framework is indicative of the portion of the input gene sequence (116) in the corresponding segmented sequence file (114).
4. The method as claimed in claim 3, wherein the analyzing further comprises:
comparing, by the system (108), for each of the plurality of the mapped documents (118), the portion of the input gene sequence (116), in a corresponding map document, with the plurality of reference gene sequences;
generating the alignment score of the portion of the input gene sequence (116) based on the comparison; and
generating an intermediate document for each of the plurality of mapped documents (118) using a second Key-Value pair generation framework based on the alignment score, wherein the Key in the second Key-Value pair generation framework is indicative of the portion of the input gene sequence in the corresponding map document and the Value in the second Key-Value pair generation framework is indicative of the alignment score of the portion of the input gene sequence in the corresponding map document.
5. The method as claimed in claim 4, wherein the analyzing further comprises:
identifying, by the system (108), common portions of the input gene sequence in the plurality of intermediate documents; and
computing, by the system (108), an aggregate of the alignment score based on the alignment scores of the identified common portions of the input gene sequence in the plurality of intermediate documents to generate the plurality of reduced documents.
6. The method as claimed in claim 5, wherein the plurality of reduced documents are generated using a third Key-Value pair generation framework, wherein the Key in the third Key-Value pair generation framework is indicative of the common portions of the input gene sequence and the Value in the third Key-Value pair generation framework is indicative of the aggregated alignment score of the common portions of the input gene sequence.
7. The method as claimed in claim 1, wherein the query optimization technique and the query plan are implemented by the system (108).
8. The method as claimed in claim 1, wherein the analyzing the plurality of segmented sequence files (114) is based on a sequence search algorithm.
9. A system (108) for gene sequence alignment over a Cloud computing network, the system (108) comprising:
a massive parallel processing (MPP) database (104) to:
receive an input gene sequence file having at least one input gene sequence;
segment the input gene sequence file into a plurality of segmented sequence files (114), each of the plurality of segmented sequence files (114) having a portion of the input gene sequence (116), wherein the input gene sequence file is segmented based on a query optimization technique and a query plan; and
a distributed file system (106) to:
analyze the plurality of segmented sequence files (114) with respect to reference gene sequences to generate an alignment score for each of the plurality of segmented sequence files (114); and
generate a plurality of reduced documents based on the alignment score for each of the plurality of segmented sequence files (114) and the portions of the input gene file included in the plurality of segmented gene files, wherein each of the plurality of reduced documents includes the alignment score of the one or more segmented sequence files (114) having a common portion of the input gene sequence.
10. The system (108) as claimed in claim 9, wherein the MPP database (104) is further configured to:
compute an aggregate of the alignment score based on alignment scores of the portion of the input gene sequence in each of the plurality of reduced documents to generate a final alignment score;
merge the portions of the input gene sequence in each of the plurality of reduced documents to regenerate the input gene sequence; and
generate a final document indicative of a degree of alignment of the input gene sequence based on the final alignment score and the input gene sequence.
11. The system (108) as claimed in claim 9, wherein the distributed file system (106) is further configured to generate a plurality of map documents from the plurality of segmented sequence files based on a first Key-Value pair generation framework, wherein the Key in the first Key-Value pair generation framework is indicative of a location of a corresponding segmented sequence file and the Value in the first Key-Value pair generation framework is indicative of the portion of the input gene sequence in the corresponding segmented sequence file (114).
12. The system (108) as claimed in claim 11, wherein the distributed file system (106) is further configured to:
compare the portion of the input gene sequence in a corresponding map document with the plurality of reference gene sequences;
generate the alignment score of the portion of the input gene sequence based on the comparison; and
generate an intermediate document for each of the plurality of map documents using a second Key-Value pair generation framework based on the alignment score, wherein the Key in the second Key-Value pair generation framework is indicative of the portion of the input gene sequence in the corresponding map document and the Value in the second Key-Value pair generation framework is indicative of the alignment score of the portion of the input gene sequence in the corresponding map document.
13. The system (108) as claimed in claim 12, wherein the distributed file system (106) is further configured to:
identify common portions of the input gene sequence in the plurality of intermediate documents; and
compute an aggregate of the alignment score based on the alignment scores of the identified common portions of the input gene sequence in the plurality of intermediate documents to generate the plurality of reduced documents.
14. The system (108) as claimed in claim 13, wherein the distributed file system (106) is further configured to generate the plurality of reduced documents using a third Key-Value pair generation framework, wherein the Key in the third Key-Value pair generation framework is indicative of the common portions of the input gene sequence and the Value in the third Key-Value pair generation framework is indicative of the aggregated alignment score of the common portions of the input gene sequence.
15. The system (108) as claimed in claim 9, wherein the MPP database (104) is implemented over a cluster of servers.
16. The system (108) as claimed in claim 15, wherein the MPP database (104) is a Greenplum database.
17. The system (108) as claimed in claim 9, wherein the distributed file system (106) is implemented over a cluster of servers.
18. The system (108) as claimed in claim 17, wherein the distributed file system (106) is a Hadoop distributed file system.
19. A non-transitory computer readable medium having a set of computer readable instructions that, when executed, cause a computing system to:
receive an input gene sequence file having at least one input gene sequence;
segment the input gene sequence file into a plurality of segmented sequence files, each of the plurality of segmented sequence files having a portion of the input gene sequence, wherein the input gene sequence file is segmented based on a query optimization technique and a query plan;
analyze the plurality of segmented sequence files with respect to reference gene sequences to generate an alignment score for each of the plurality of segmented sequence files;
generate a plurality of reduced documents based on the alignment score for each of the plurality of segmented sequence files and the portions of the input gene file included in the plurality of segmented gene files, wherein each of the plurality of reduced documents includes the alignment score of the one or more segmented sequence files having a common portion of the input gene sequence; and
determine a final document indicative of a degree of alignment of the input gene sequence based on the plurality of reduced documents.
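
The three Key-Value stages recited in claims 3 to 6 (and 11 to 14), followed by the final aggregation of claims 2 and 10, can be sketched in a few lines. This is only an illustration under stated assumptions: the claims do not say how scores are aggregated, so summation is assumed; identity_score is a hypothetical placeholder for the unspecified sequence search algorithm; the segment file locations in the usage note are invented; and merging the portions back into the input gene sequence is omitted.

```python
from collections import defaultdict


def identity_score(portion, reference):
    """Hypothetical placeholder score: count of position-wise matching bases."""
    return sum(1 for a, b in zip(portion, reference) if a == b)


def map_stage(segment_files):
    """First stage: Key = location of a segment file, Value = its portion."""
    return list(segment_files.items())


def score_stage(mapped, references, score_fn):
    """Second stage: Key = portion, Value = its best score over the references."""
    return [
        (portion, max(score_fn(portion, ref) for ref in references))
        for _location, portion in mapped
    ]


def reduce_stage(intermediate):
    """Third stage: Key = common portion, Value = aggregated score.

    Summation is an assumption; the claims only say "an aggregate".
    """
    grouped = defaultdict(list)
    for portion, score in intermediate:
        grouped[portion].append(score)
    return {portion: sum(scores) for portion, scores in grouped.items()}


def final_score(reduced):
    """Aggregate the reduced scores into one final alignment score (sum assumed)."""
    return sum(reduced.values())
```

With the hypothetical input {"/segments/part-0": "ACGT", "/segments/part-1": "TTGA", "/segments/part-2": "ACGT"} and the single reference "ACGT", the two identical "ACGT" portions fall under one key in the reduce stage, so their scores are combined before the final score is computed.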

Documents

Application Documents

# Name Date
1 1810-MUM-2012-RELEVANT DOCUMENTS [26-09-2023(online)].pdf 2023-09-26
2 1810-MUM-2012-IntimationOfGrant13-01-2022.pdf 2022-01-13
3 1810-MUM-2012-PatentCertificate13-01-2022.pdf 2022-01-13
4 1810-MUM-2012-Written submissions and relevant documents [24-12-2021(online)].pdf 2021-12-24
5 1810-MUM-2012-FORM-26 [07-12-2021(online)].pdf 2021-12-07
6 1810-MUM-2012-Correspondence to notify the Controller [19-11-2021(online)].pdf 2021-11-19
7 1810-MUM-2012-US(14)-HearingNotice-(HearingDate-16-12-2021).pdf 2021-11-15
8 1810-MUM-2012-CLAIMS [17-03-2020(online)].pdf 2020-03-17
9 1810-MUM-2012-COMPLETE SPECIFICATION [17-03-2020(online)].pdf 2020-03-17
10 1810-MUM-2012-FER_SER_REPLY [17-03-2020(online)].pdf 2020-03-17
11 1810-MUM-2012-FER.pdf 2019-09-18
12 1810-MUM-2012-CORRESPONDENCE(16-8-2012).pdf 2018-08-11
13 1810-MUM-2012-CORRESPONDENCE(2-11-2012).pdf 2018-08-11
14 1810-MUM-2012-CORRESPONDENCE(2-7-2014).pdf 2018-08-11
15 1810-MUM-2012-FORM 1(2-11-2012).pdf 2018-08-11
16 1810-MUM-2012-FORM 5(2-7-2014).pdf 2018-08-11
17 1810-MUM-2012-POWER OF ATTORNEY(16-8-2012).pdf 2018-08-11
18 ABSTRACT1.jpg 2018-08-11
19 Drawings.pdf 2018-08-11
20 Form-2(Online).pdf 2018-08-11

Search Strategy

1 1810_mum_2012AE_26-10-2021.pdf
2 2019-09-0917-45-30_11-09-2019.pdf

ERegister / Renewals

3rd: 17 Jan 2022 (From 22/06/2014 - To 22/06/2015)
4th: 17 Jan 2022 (From 22/06/2015 - To 22/06/2016)
5th: 17 Jan 2022 (From 22/06/2016 - To 22/06/2017)
6th: 17 Jan 2022 (From 22/06/2017 - To 22/06/2018)
7th: 17 Jan 2022 (From 22/06/2018 - To 22/06/2019)
8th: 17 Jan 2022 (From 22/06/2019 - To 22/06/2020)
9th: 17 Jan 2022 (From 22/06/2020 - To 22/06/2021)
10th: 17 Jan 2022 (From 22/06/2021 - To 22/06/2022)
11th: 17 Jan 2022 (From 22/06/2022 - To 22/06/2023)
12th: 14 Jun 2023 (From 22/06/2023 - To 22/06/2024)
13th: 10 Jun 2024 (From 22/06/2024 - To 22/06/2025)
14th: 12 Jun 2025 (From 22/06/2025 - To 22/06/2026)