Optimizing Gene Sequence Alignment In Cloud

Abstract: Methods and systems for gene sequencing are described. An input sequence file including an input gene sequence is received. The input sequence file is segmented into a plurality of segmented sequence files (114), each having a portion of the input gene sequence (116). The plurality of segmented sequence files (114) are analyzed using gene sequencing techniques with respect to a plurality of reference gene sequences to generate an alignment score for each of the plurality of segmented sequence files. Based on the alignment scores, a plurality of reduced documents is generated, each including the alignment score of the one or more segmented sequence files having a common portion of the input gene sequence. Based on the plurality of reduced documents, a final document indicative of a degree of alignment of the input gene sequence is determined.
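The segmentation and scoring steps summarized in the abstract can be illustrated with a minimal Python sketch. Neither the abstract nor the claims name a specific alignment method (claim 8 says only "a sequence search algorithm"), so the example assumes fixed-size segmentation in place of the database query plan and uses Smith-Waterman local alignment with a linear gap penalty as a stand-in scorer. The function names and scoring parameters are hypothetical.

```python
from typing import Dict, List


def segment(sequence: str, size: int) -> List[str]:
    """Split the input gene sequence into consecutive, non-overlapping portions.

    Fixed-size chunking is an assumption; the source segments the file
    using a query optimization technique and/or query plan instead.
    """
    return [sequence[i:i + size] for i in range(0, len(sequence), size)]


def smith_waterman(a: str, b: str, match: int = 2,
                   mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score of a against b.

    Stand-in scorer (not taken from the source): Smith-Waterman with a
    linear gap penalty, keeping only one previous DP row in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def score_segments(segments: List[str], references: List[str]) -> Dict[int, int]:
    """Map each segment index to its best score over all reference sequences."""
    return {idx: max(smith_waterman(seg, ref) for ref in references)
            for idx, seg in enumerate(segments)}
```

For example, `score_segments(segment("ACGTACGTTTGA", 4), ["ACGTAC", "TTGACC"])` returns `{0: 8, 1: 8, 2: 8}`, because each 4-base segment matches one of the references exactly (4 matches at 2 points each). Taking the maximum over references is likewise an assumption; the claims only require an alignment score per segmented file.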

Patent Information

Filing Date

22 June 2012

Publication Number

28/2014

Publication Type

INA

Invention Field

MICRO BIOLOGY

Grant Date

2022-01-13

Applicants

TATA CONSULTANCY SERVICES LIMITED

Nirmal Building 9th Floor Nariman Point Mumbai Maharashtra 400021

Inventors

1. Vijayakumar Senthilkumar

TATA Consultancy Services LTD ITPB Pioneer Building 12th Floor Whitefield Road Bangalore 560066

2. Bhargavi Anjani

TATA Consultancy Services LTD ITPB Pioneer Building 12th Floor Whitefield Road Bangalore 560066

3. Ahamed Syed Azar

TATA Consultancy Services LTD ITPB Pioneer Building 12th Floor Whitefield Road Bangalore 560066

4. Praseeda Uma

TATA Consultancy Services LTD ITPB Pioneer Building 12th Floor Whitefield Road Bangalore 560066

Specification

Description: GENE SEQUENCING

Claims:
1. A method for gene sequencing, the method comprising:
receiving, by a system (108), an input gene sequence file having an input gene sequence;
segmenting, by the system (108), the input gene sequence file into a plurality of segmented sequence files (114), each of the plurality of segmented sequence files having a portion of the input gene sequence (116), wherein the input gene sequence file is segmented based on at least one of a query optimization technique and a query plan;
analyzing, by the system (108), the plurality of segmented sequence files (114) with respect to reference gene sequences to generate an alignment score for each of the plurality of segmented sequence files (114);
generating, by the system (108), a plurality of reduced documents based on the alignment score for each of the plurality of segmented sequence files (114) and the portions of the input gene sequence (116) included in the plurality of segmented sequence files (114), wherein each of the plurality of reduced documents includes the alignment score of one or more segmented sequence files (114) having a common portion of the input gene sequence (116); and
determining, by the system (108), a final document indicative of a degree of alignment of the input gene sequence (116) based on the plurality of reduced documents.
2. The method as claimed in claim 1, wherein the determining further comprises:
computing, by the system (108), an aggregate of the alignment score based on alignment scores of each of the plurality of reduced documents to generate a final alignment score;
merging, by the system (108), the portions of the input gene sequence (116) in each of the plurality of reduced documents to regenerate the input gene sequence (116); and
generating, by the system (108), the final document based on the final alignment score and the input gene sequence.
3. The method as claimed in claim 1, wherein the analyzing, by the system (108), further comprises generating a plurality of mapped documents (118) from the plurality of segmented sequence files (114) based on a first Key-Value pair generation framework, wherein the Key in the first Key-Value pair generation framework is indicative of a location of a corresponding segmented sequence file (114) in the system (108) and the Value in the first Key-Value pair generation framework is indicative of the portion of the input gene sequence (116) in the corresponding segmented sequence file (114).
4. The method as claimed in claim 3, wherein the analyzing further comprises:
comparing, by the system (108), for each of the plurality of mapped documents (118), the portion of the input gene sequence (116) in the corresponding mapped document with the plurality of reference gene sequences;
generating the alignment score of the portion of the input gene sequence (116) based on the comparison; and
generating an intermediate document for each of the plurality of mapped documents (118) using a second Key-Value pair generation framework based on the alignment score, wherein the Key in the second Key-Value pair generation framework is indicative of the portion of the input gene sequence in the corresponding mapped document and the Value in the second Key-Value pair generation framework is indicative of the alignment score of the portion of the input gene sequence in the corresponding mapped document.
5. The method as claimed in claim 4, wherein the analyzing further comprises:
identifying, by the system (108), common portions of the input gene sequence in the plurality of intermediate documents; and
computing, by the system (108), an aggregate of the alignment score based on the alignment scores of the identified common portions of the input gene sequence in the plurality of intermediate documents to generate the plurality of reduced documents.
6. The method as claimed in claim 5, wherein the plurality of reduced documents are generated using a third Key-Value pair generation framework, wherein the Key in the third Key-Value pair generation framework is indicative of the common portions of the input gene sequence and the Value in the third Key-Value pair generation framework is indicative of the aggregated alignment score of the common portions of the input gene sequence.
7. The method as claimed in claim 1, wherein the query optimization technique and the query plan are implemented by the system (108).
8. The method as claimed in claim 1, wherein the analyzing the plurality of segmented sequence files (114) is based on a sequence search algorithm.
9. A system (108) for gene sequence alignment over a Cloud computing network, the system (108) comprising:
a massively parallel processing (MPP) database (104) to:
receive an input gene sequence file having at least one input gene sequence;
segment the input gene sequence file into a plurality of segmented sequence files (114), each of the plurality of segmented sequence files (114) having a portion of the input gene sequence (116), wherein the input gene sequence file is segmented based on a query optimization technique and a query plan; and
a distributed file system (106) to:
analyze the plurality of segmented sequence files (114) with respect to reference gene sequences to generate an alignment score for each of the plurality of segmented sequence files (114); and
generate a plurality of reduced documents based on the alignment score for each of the plurality of segmented sequence files (114) and the portions of the input gene sequence included in the plurality of segmented sequence files (114), wherein each of the plurality of reduced documents includes the alignment score of the one or more segmented sequence files (114) having a common portion of the input gene sequence.
10. The system (108) as claimed in claim 9, wherein the MPP database (104) is further configured to:
compute an aggregate of the alignment score based on alignment scores of the portion of the input gene sequence in each of the plurality of reduced documents to generate a final alignment score;
merge the portions of the input gene sequence in each of the plurality of reduced documents to regenerate the input gene sequence; and
generate a final document indicative of a degree of alignment of the input gene sequence based on the final alignment score and the input gene sequence.
11. The system (108) as claimed in claim 9, wherein the distributed file system (106) is further configured to generate a plurality of map documents from the plurality of segmented sequence files based on a first Key-Value pair generation framework, wherein the Key in the first Key-Value pair generation framework is indicative of a location of a corresponding segmented sequence file and the Value in the first Key-Value pair generation framework is indicative of the portion of the input gene sequence in the corresponding segmented sequence file (114).
12. The system (108) as claimed in claim 11, wherein the distributed file system (106) is further configured to:
compare the portion of the input gene sequence in a corresponding map document with the plurality of reference gene sequences;
generate the alignment score of the portion of the input gene sequence based on the comparison; and
generate an intermediate document for each of the plurality of map documents using a second Key-Value pair generation framework based on the alignment score, wherein the Key in the second Key-Value pair generation framework is indicative of the portion of the input gene sequence in the corresponding map document and the Value in the second Key-Value pair generation framework is indicative of the alignment score of the portion of the input gene sequence in the corresponding map document.
13. The system (108) as claimed in claim 12, wherein the distributed file system (106) is further configured to:
identify common portions of the input gene sequence in the plurality of intermediate documents; and
compute an aggregate of the alignment score based on the alignment scores of the identified common portions of the input gene sequence in the plurality of intermediate documents to generate the plurality of reduced documents.
14. The system (108) as claimed in claim 13, wherein the distributed file system (106) is further configured to generate the plurality of reduced documents using a third Key-Value pair generation framework, wherein the Key in the third Key-Value pair generation framework is indicative of the common portions of the input gene sequence and the Value in the third Key-Value pair generation framework is indicative of the aggregated alignment score of the common portions of the input gene sequence.
15. The system (108) as claimed in claim 9, wherein the MPP database (104) is implemented over a cluster of servers.
16. The system (108) as claimed in claim 15, wherein the MPP database (104) is a Greenplum database.
17. The system (108) as claimed in claim 9, wherein the distributed file system (106) is implemented over a cluster of servers.
18. The system (108) as claimed in claim 17, wherein the distributed file system (106) is a Hadoop distributed file system.
19. A non-transitory computer readable medium having a set of computer readable instructions that, when executed, cause a computing system to:
receive an input gene sequence file having at least one input gene sequence;
segment the input gene sequence file into a plurality of segmented sequence files, each of the plurality of segmented sequence files having a portion of the input gene sequence, wherein the input gene sequence file is segmented based on a query optimization technique and a query plan;
analyze the plurality of segmented sequence files with respect to reference gene sequences to generate an alignment score for each of the plurality of segmented sequence files;
generate a plurality of reduced documents based on the alignment score for each of the plurality of segmented sequence files and the portions of the input gene sequence included in the plurality of segmented sequence files, wherein each of the plurality of reduced documents includes the alignment score of the one or more segmented sequence files having a common portion of the input gene sequence; and
determine a final document indicative of a degree of alignment of the input gene sequence based on the plurality of reduced documents.
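The three Key-Value pair generation frameworks recited in claims 3 to 6 (and 11 to 14), together with the final document of claim 2, follow a map/reduce pattern. The Python sketch below illustrates that data flow under stated assumptions; it is not the claimed implementation. Segmented files are held in an in-memory dict keyed by a hypothetical location string, the scoring function is a caller-supplied placeholder, the aggregate is a plain sum, and the input sequence is rebuilt by ordering portions by location. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


def map_documents(segment_files: Dict[str, str]) -> List[Tuple[str, str]]:
    """First framework: key = location of a segmented sequence file,
    value = the portion of the input gene sequence it holds.

    Sorting by location string preserves input order only if locations
    sort lexicographically (e.g. zero-padded names); this is an assumption.
    """
    return sorted(segment_files.items())


def intermediate_documents(mapped: List[Tuple[str, str]],
                           score: Callable[[str], int]) -> List[Tuple[str, int]]:
    """Second framework: key = sequence portion, value = its alignment score."""
    return [(portion, score(portion)) for _location, portion in mapped]


def reduced_documents(intermediate: List[Tuple[str, int]]) -> Dict[str, int]:
    """Third framework: key = common portion, value = aggregated score."""
    totals: Dict[str, int] = defaultdict(int)
    for portion, value in intermediate:
        totals[portion] += value  # summation is an assumed aggregate
    return dict(totals)


def final_document(mapped: List[Tuple[str, str]],
                   reduced: Dict[str, int]) -> Dict[str, object]:
    """Aggregate the reduced scores and regenerate the input sequence.

    Reduced documents are keyed by portion, so repeated portions collapse;
    the original order is therefore recovered from the mapped locations.
    """
    sequence = "".join(portion for _location, portion in mapped)
    return {"sequence": sequence, "final_score": sum(reduced.values())}
```

As a usage example with a toy scorer that merely counts G and C bases (a placeholder, not an alignment score): for files `{"hdfs/seg-0": "ACGT", "hdfs/seg-1": "TTGA", "hdfs/seg-2": "ACGT"}`, the reduced documents are `{"ACGT": 4, "TTGA": 1}` and the final document is `{"sequence": "ACGTTTGAACGT", "final_score": 5}`. The repeated portion `ACGT` is merged in the reduce stage, which mirrors how claim 5 aggregates scores over common portions.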

Documents

Application Documents

#	Name	Date
1	1810-MUM-2012-RELEVANT DOCUMENTS [26-09-2023(online)].pdf	2023-09-26
2	Form-2(Online).pdf	2018-08-11
3	1810-MUM-2012-IntimationOfGrant13-01-2022.pdf	2022-01-13
4	Drawings.pdf	2018-08-11
5	ABSTRACT1.jpg	2018-08-11
6	1810-MUM-2012-PatentCertificate13-01-2022.pdf	2022-01-13
7	1810-MUM-2012-Written submissions and relevant documents [24-12-2021(online)].pdf	2021-12-24
8	1810-MUM-2012-POWER OF ATTORNEY(16-8-2012).pdf	2018-08-11
9	1810-MUM-2012-FORM-26 [07-12-2021(online)].pdf	2021-12-07
10	1810-MUM-2012-FORM 5(2-7-2014).pdf	2018-08-11
11	1810-MUM-2012-FORM 1(2-11-2012).pdf	2018-08-11
12	1810-MUM-2012-Correspondence to notify the Controller [19-11-2021(online)].pdf	2021-11-19
13	1810-MUM-2012-US(14)-HearingNotice-(HearingDate-16-12-2021).pdf	2021-11-15
14	1810-MUM-2012-CORRESPONDENCE(2-7-2014).pdf	2018-08-11
15	1810-MUM-2012-CORRESPONDENCE(2-11-2012).pdf	2018-08-11
16	1810-MUM-2012-CLAIMS [17-03-2020(online)].pdf	2020-03-17
17	1810-MUM-2012-COMPLETE SPECIFICATION [17-03-2020(online)].pdf	2020-03-17
18	1810-MUM-2012-CORRESPONDENCE(16-8-2012).pdf	2018-08-11
19	1810-MUM-2012-FER.pdf	2019-09-18
20	1810-MUM-2012-FER_SER_REPLY [17-03-2020(online)].pdf	2020-03-17

Search Strategy

1	1810_mum_2012AE_26-10-2021.pdf
2	2019-09-0917-45-30_11-09-2019.pdf