Encoding And Decoding Of RNA Data

Abstract: Encoding of RNA data comprising a nucleotide sequence string (165) and a structure string (170) for an RNA molecule, and decoding of the encoded data, are described. In an example, one or more contiguous structure stretches (210) of each of a plurality of character types may be identified in the structure string (170). Each contiguous structure stretch (210) may include one or more structural characters of the same character type. Further, for each of the one or more contiguous structure stretches (210), a start position and an end position of the contiguous structure stretch (210) may be determined to identify a corresponding contiguous nucleotide stretch (208) in the nucleotide sequence string (165). For each of the one or more contiguous structure stretches (210), the structural character of the character type indicated by the contiguous structure stretch (210) may be appended to the corresponding contiguous nucleotide stretch (208) to obtain an encoded string (212).

Patent Information

Application #

Filing Date

20 March 2014

Publication Number

46/2015

Publication Type

INA

Invention Field

COMMUNICATION

Status

Email

iprdel@lakshmisri.com

Parent Application

Patent Number

Legal Status

Grant Date

2021-08-17

Renewal Date

Applicants

TATA CONSULTANCY SERVICES LIMITED

Nirmal Building, 9th Floor, Nariman Point, Mumbai, Maharashtra 400021

Inventors

1. MANDE, Sharmila S.

TCS Innovation Labs, Tata Research Design and Development Centre, 54-B, Hadapsar Industrial Estate, Hadapsar, Pune 411013, Maharashtra

2. BOSE, Tungadri

TCS Innovation Labs, Tata Research Design and Development Centre, 54-B, Hadapsar Industrial Estate, Hadapsar, Pune 411013, Maharashtra

3. DUTTA, Anirban

TCS Innovation Labs, Tata Research Design and Development Centre, 54-B, Hadapsar Industrial Estate, Hadapsar, Pune 411013, Maharashtra

4. HAQUE, Mohammed Monzoorul

TCS Innovation Labs, Tata Research Design and Development Centre, 54-B, Hadapsar Industrial Estate, Hadapsar, Pune 411013, Maharashtra

5. GANDHI, Hemang

TCS Innovation Labs, Tata Research Design and Development Centre, 54-B, Hadapsar Industrial Estate, Hadapsar, Pune 411013, Maharashtra

Specification

CLAIMS:
1. A computer implemented method for encoding Ribonucleic acid (RNA) data comprising a nucleotide sequence string (165) and a structure string (170), the method comprising:
obtaining, by a processor (110), the nucleotide sequence string (165) and the structure string (170) for an RNA molecule, the nucleotide sequence string (165) comprising a plurality of nucleotide characters indicating order of nucleotides in the RNA molecule, and the structure string (170) including a structural character corresponding to each of the plurality of nucleotide characters in the nucleotide sequence string, wherein the structural character is indicative of a structural attribute of a corresponding nucleotide character;
identifying, by the processor (110), in the structure string (170), one or more contiguous structure stretches (210) of each of a plurality of character types, wherein each of the one or more contiguous structure stretches (210) may include one or more structural characters of the same character type, and wherein the structural attribute is defined by a character type of the structural character;
determining, by the processor (110), for each of the one or more contiguous structure stretches (210), a start position and an end position of a contiguous structure stretch (210) to identify a corresponding contiguous nucleotide stretch (208) in the nucleotide sequence string (165); and
for each of the one or more contiguous structure stretches (210), affixing, by the processor (110), the structural character of a character type indicated by the contiguous structure stretch (210) to the corresponding contiguous nucleotide stretch (208) to obtain an encoded string (212).
2. The computer implemented method as claimed in claim 1, wherein the affixing further comprises one of:
prefixing the structural character of the character type indicated by the contiguous structure stretch (210) to the corresponding contiguous nucleotide stretch (208) to obtain the encoded string (212); and
suffixing the structural character of the character type indicated by the contiguous structure stretch (210) to the corresponding contiguous nucleotide stretch (208) to obtain the encoded string (212).
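As a rough illustration of claims 1 and 2, the sketch below groups a structure string into contiguous stretches and affixes the character of each stretch to the matching nucleotide stretch. It assumes dot-bracket notation ("(" and ")" for paired positions, "." for unpaired positions), which the claims themselves do not specify. The function names and the `mode` parameter are hypothetical and chosen only for this example.

```python
from itertools import groupby


def structure_stretches(structure):
    """Return (character, start, end) for each run of identical
    structural characters; positions are 0-based, end inclusive."""
    stretches = []
    pos = 0
    for char, run in groupby(structure):
        length = len(list(run))
        stretches.append((char, pos, pos + length - 1))
        pos += length
    return stretches


def encode_stretches(sequence, structure, mode="suffix"):
    """Affix one structural character to each contiguous nucleotide stretch.

    mode="suffix" places the character after the stretch,
    mode="prefix" places it before the stretch.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    if mode not in ("suffix", "prefix"):
        raise ValueError("mode must be 'suffix' or 'prefix'")
    parts = []
    for char, start, end in structure_stretches(structure):
        stretch = sequence[start:end + 1]
        parts.append(stretch + char if mode == "suffix" else char + stretch)
    return "".join(parts)


# Hypothetical hairpin: a 3-base stem enclosing a 3-base loop.
print(encode_stretches("GGGAAACCC", "(((...)))"))            # GGG(AAA.CCC)
print(encode_stretches("GGGAAACCC", "(((...)))", "prefix"))  # (GGG.AAA)CCC
```

Each run of identical structural characters collapses to a single character, so the length of a stretch is carried by the nucleotides it is affixed to rather than by repetition.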
3. The computer implemented method as claimed in claim 1, wherein the method further comprises removing, by the processor (110), from the encoded string (212), one or more redundant structural characters of an unpaired character type to generate a modified encoded string (214), and wherein the one or more redundant structural characters are identified based on positioning of a structural character with respect to positioning of adjacent structural characters in the encoded string (212).
4. The computer implemented method as claimed in claim 3, wherein the method further comprises:
identifying, by the processor (110), a first set of nucleotide characters and a second set of nucleotide characters forming a stem region (180) in the modified encoded string (214), wherein the first set of nucleotide characters is identified based on a position of a structural character indicating opening of the stem region (180), and the second set of nucleotide characters is identified based on a position of a corresponding structural character indicating closing of the stem region (180); and
removing, by the processor (110), one of the first set of nucleotide characters and the second set of nucleotide characters from the modified encoded string (214), and retaining the other set of nucleotide characters to obtain encoded RNA data (226), the encoded RNA data (226) including sequence information and secondary structural information pertaining to the RNA molecule.
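The stem step of claim 4 can be sketched for a simplified case. The example below works on a suffix-style encoded string in dot-bracket notation, keeps the nucleotides before each "(" and drops the nucleotides before each matching ")". It makes several assumptions that are not in the claims: the redundant-character removal of claim 3 is skipped, every opening stretch is mirrored by exactly one closing stretch of the same length (no bulges or internal loops splitting a stem), and the opening side is the one retained. The name `drop_closing_stems` is hypothetical.

```python
import re

# One token = a nucleotide stretch followed by its structural character.
TOKEN = re.compile(r"([ACGU]+)([().])")


def drop_closing_stems(encoded):
    """Remove the nucleotides of each closing stem stretch, keeping ')'."""
    tokens = TOKEN.findall(encoded)
    if "".join(stretch + char for stretch, char in tokens) != encoded:
        raise ValueError("unexpected characters in encoded string")
    open_lengths = []  # stack of lengths of unmatched opening stretches
    out = []
    for stretch, char in tokens:
        if char == "(":
            open_lengths.append(len(stretch))
            out.append(stretch + char)
        elif char == ")":
            # Simplifying assumption: the closing stretch mirrors the most
            # recent unmatched opening stretch exactly.
            if not open_lengths or open_lengths.pop() != len(stretch):
                raise ValueError("closing stretch does not mirror an opening stretch")
            out.append(char)  # nucleotides dropped, structural character kept
        else:
            out.append(stretch + char)
    if open_lengths:
        raise ValueError("unmatched opening stretch")
    return "".join(out)


print(drop_closing_stems("GGG(AAA.CCC)"))  # GGG(AAA.)
```

Under these assumptions the dropped nucleotides can be regenerated from the retained side by complementing, which is why the encoded string can be shorter without losing sequence information.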
5. The computer implemented method as claimed in claim 4, wherein the method further comprises:
ascertaining, by the processor (110), for each retained set of nucleotide characters, whether each of the nucleotide characters in the retained set of nucleotide characters follows standard base pairing rules;
inserting, by the processor (110), a structural character indicating non-standard base-pairing in the retained set of nucleotide characters after a nucleotide character, which does not follow the standard base pairing rules; and
appending, by the processor (110), a non-standard complementary nucleotide character to the nucleotide character after the structural character indicating non-standard base-pairing to obtain the encoded RNA data (226).
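A minimal sketch of the non-standard pairing step of claim 5 follows. It assumes that "standard base pairing rules" means Watson-Crick pairs only (A-U and G-C), so a G-U wobble pair is treated as non-standard. The marker character "*" and the function name are hypothetical, since the claims do not name the structural character used.

```python
# Assumption: only Watson-Crick pairs count as standard.
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}


def annotate_nonstandard(retained, removed, marker="*"):
    """Mark bases in the retained strand whose partner is non-standard.

    retained: nucleotides of the stem side that is kept (5'->3')
    removed:  nucleotides of the stem side that is dropped (5'->3')
    The strands pair antiparallel, so retained[0] pairs with removed[-1].
    """
    if len(retained) != len(removed):
        raise ValueError("stem strands must have equal length")
    out = []
    for base, partner in zip(retained, reversed(removed)):
        out.append(base)
        if WATSON_CRICK.get(base) != partner:
            # Insert the marker, then the actual partner, after the base.
            out.append(marker + partner)
    return "".join(out)


# The third G pairs with U, which is not a Watson-Crick pair.
print(annotate_nonstandard("GGG", "UCC"))  # GGG*U
```

Only the exceptions cost extra characters; a stem made entirely of standard pairs is returned unchanged.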
6. A computer implemented method for decoding encoded Ribonucleic acid (RNA) data (226), the method comprising:
identifying, by a processor (110), contiguous nucleotide stretches (208) satisfying unpaired structural character insertion criteria, the unpaired structural character insertion criteria being based on positioning of a structural character indicating closing or opening of a stem region (180);
for each of the identified contiguous nucleotide stretches (208), ascertaining, by the processor (110), whether an identified contiguous nucleotide stretch is affixed to a structural character of unpaired structural character type, wherein the affixing is prefixing when a contiguous structure stretch (210) is prefixed to a corresponding contiguous nucleotide stretch (208) to obtain an encoded string (212) during encoding of RNA data, and the affixing is suffixing when the contiguous structure stretch (210) is suffixed to the corresponding contiguous nucleotide stretch (208) to obtain the encoded string (212) during encoding of RNA data;
appending, by the processor (110), the unpaired structural character to an identified contiguous nucleotide stretch (208) to obtain a decoded string, based on the ascertaining;
identifying, by the processor (110), in the decoded string, corresponding pairs of structural characters indicating opening and closing of the stem region (180), based on the standard nucleotide positioning rules;
for each pair of structural characters indicating opening and closing of the stem region (180), determining, by the processor (110), a set of nucleotide characters associated with the structural character indicating one of opening of the stem region (180) and closing of the stem region (180), based on the set that was retained while encoding;
for each of the determined set of nucleotide characters, inserting, by the processor (110), a set of complementary nucleotide characters to generate a modified decoded string, wherein the set of complementary nucleotide characters is appended to the structural character indicating one of the closing of a corresponding stem region (180) and opening of the corresponding stem region (180), based on the determining of the set of nucleotide characters;
determining, by the processor (110), a number of the nucleotide characters associated with each of the structural characters in the modified decoded string; and
generating, by the processor (110), decoded RNA data comprising a decoded structure string (170) and a decoded sequence string (165), based on the number of the nucleotide characters associated with each of the structural characters in the modified decoded string, wherein,
for each instance of a structural character in the modified decoded string, the structural character is repeated in the decoded structure string (170) the same number of times as the number of nucleotide characters associated with the structural character in the modified decoded string; and
the structural characters are removed from the modified decoded string to provide the decoded sequence string.
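The final generating step of claim 6 can be sketched as follows: each structural character is repeated once per nucleotide associated with it, and the structural characters are then stripped to leave the sequence. The sketch assumes dot-bracket characters and an RNA alphabet of A, C, G and U. The function name `expand_decoded` and its `mode` parameter are hypothetical.

```python
import re


def expand_decoded(modified_decoded, mode="suffix"):
    """Split a modified decoded string into (sequence, structure).

    mode="suffix": each structural character follows its nucleotides.
    mode="prefix": each structural character precedes its nucleotides.
    """
    if mode == "suffix":
        pattern = re.compile(r"([ACGU]+)([().])")
    elif mode == "prefix":
        pattern = re.compile(r"([().])([ACGU]+)")
    else:
        raise ValueError("mode must be 'suffix' or 'prefix'")
    sequence, structure = [], []
    consumed = 0
    for match in pattern.finditer(modified_decoded):
        if match.start() != consumed:
            raise ValueError("malformed modified decoded string")
        first, second = match.groups()
        stretch, char = (first, second) if mode == "suffix" else (second, first)
        sequence.append(stretch)
        # Repeat the structural character once per associated nucleotide.
        structure.append(char * len(stretch))
        consumed = match.end()
    if consumed != len(modified_decoded):
        raise ValueError("malformed modified decoded string")
    return "".join(sequence), "".join(structure)


print(expand_decoded("GGG(AAA.CCC)"))  # ('GGGAAACCC', '(((...)))')
```

This step is the inverse of the stretch-affixing step used during encoding, so it requires that every nucleotide stretch already carries its structural character.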
7. The computer implemented method as claimed in claim 6, wherein inserting the set of complementary nucleotide characters further comprises:
ascertaining, by the processor (110), for each of the sets of nucleotide characters, whether the set of nucleotide characters includes a structural character indicating a non-standard base-pairing pattern;
removing, by the processor (110), the non-standard base-pairing pattern structural character from the decoded string, based on the ascertaining;
deleting, by the processor (110), the non-standard complementary nucleotide character preceded by the structural character indicating the non-standard base-pairing pattern from the set of the nucleotide characters; and
inserting, by the processor (110), the non-standard complementary nucleotide character in the set of complementary nucleotide characters at a position corresponding to the nucleotide character, which violated the standard base-pairing rule.
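The complement-restoration step of claims 6 and 7 might look like the sketch below. It rebuilds the dropped stem strand from the retained strand, using Watson-Crick complements by default and the stored partner wherever a non-standard marker appears. The marker "*", the Watson-Crick-only definition of "standard", and the function name are assumptions made for this example.

```python
# Assumption: only Watson-Crick pairs count as standard.
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}


def restore_complement(annotated, marker="*"):
    """Return (retained_strand, regenerated_strand), both read 5'->3'.

    annotated: retained stem nucleotides in which a non-standard pair is
    written as base + marker + partner, for example 'G*U'.
    """
    retained, partners = [], []
    i = 0
    while i < len(annotated):
        base = annotated[i]
        if i + 1 < len(annotated) and annotated[i + 1] == marker:
            if i + 2 >= len(annotated):
                raise ValueError("marker without a partner nucleotide")
            partner = annotated[i + 2]  # stored non-standard partner
            i += 3
        else:
            partner = WATSON_CRICK[base]  # standard complement
            i += 1
        retained.append(base)
        partners.append(partner)
    # Stems pair antiparallel, so the regenerated strand is reversed.
    return "".join(retained), "".join(reversed(partners))


print(restore_complement("GGG*U"))  # ('GGG', 'UCC')
```

The marker and the stored partner are consumed during the scan, so neither appears in the retained strand that is returned.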
8. A computing system (100) for encoding Ribonucleic acid (RNA) data comprising a nucleotide sequence string (165) and a structure string (170), the computing system (100) comprising:
a processor (110);
a data gathering module (128) coupled to the processor (110) to obtain the nucleotide sequence string (165) and the structure string (170) for an RNA molecule, the nucleotide sequence string (165) comprising a plurality of nucleotide characters indicating order of nucleotides in the RNA molecule, and the structure string (170) including a structural character corresponding to each of the plurality of nucleotide characters in the nucleotide sequence string (165), wherein the structural character is indicative of a structural attribute of a corresponding nucleotide character; and
an encoding module (130) coupled to the processor (110), the encoding module (130) comprising a structure information removal (IR) module (190) to:
identify, in the structure string (170), one or more contiguous structure stretches (210) of each of a plurality of character types, wherein each of the one or more contiguous structure stretches (210) may include one or more structural characters of the same character type, and wherein the structural attribute is defined by a character type of the structural character;
determine, for each of the one or more contiguous structure stretches (210), a start position and an end position of a contiguous structure stretch (210) to identify a corresponding contiguous nucleotide stretch (208) in the nucleotide sequence string (165); and
for each of the one or more contiguous structure stretches (210), affix the structural character of a character type indicated by the contiguous structure stretch (210) to the corresponding contiguous nucleotide stretch (208) to obtain an encoded string (212).
9. The computing system (100) as claimed in claim 8, wherein, to affix the structural character, the structure IR module (190) performs one of:
prefixing the structural character of the character type indicated by the contiguous structure stretch (210) to the corresponding contiguous nucleotide stretch (208) to obtain the encoded string (212); and
suffixing the structural character of the character type indicated by the contiguous structure stretch (210) to the corresponding contiguous nucleotide stretch (208) to obtain the encoded string (212).
10. The computing system (100) as claimed in claim 8, wherein the structure information removal (IR) module (190) further removes, from the encoded string (212), one or more redundant structural characters of unpaired character type to generate a modified encoded string (214), and wherein the one or more redundant structural characters are identified based on positioning of a structural character with respect to positioning of adjacent structural characters in the encoded string (212).
11. The computing system (100) as claimed in claim 10, wherein the encoding module (130) further comprises a sequence IR module (195) to:
identify a first set of nucleotide characters and a second set of nucleotide characters forming a stem region (180) in the modified encoded string (214), wherein the first set of nucleotide characters is identified based on a position of a structural character indicating opening of the stem region (180), and the second set of nucleotide characters is identified based on a position of a corresponding structural character indicating closing of the stem region (180); and
remove one of the first set of nucleotide characters and the second set of nucleotide characters from the modified encoded string (214), and retain the other set of nucleotide characters to obtain encoded RNA data, the encoded RNA data including sequence information and secondary structural information pertaining to the RNA molecule.
12. The computing system (100) as claimed in claim 11, wherein the sequence IR module (195) further:
ascertains, for each retained set of nucleotide characters, whether each of the nucleotide characters in the retained set of nucleotide characters follows standard base pairing rules;
inserts a structural character indicating non-standard base-pairing in the retained set of nucleotide characters after a nucleotide character, which does not follow the standard base pairing rules; and
appends a non-standard complementary nucleotide character with the nucleotide character after the structural character indicating non-standard base-pairing to obtain the encoded RNA data.
13. A computing system (100) for decoding encoded RNA data (226), the computing system (100) comprising:
a processor (110); and
a decoding module (135) coupled to the processor (110), wherein the decoding module (135):
identifies contiguous nucleotide stretches (208) satisfying unpaired structural character insertion criteria, the unpaired structural character insertion criteria being based on positioning of a structural character indicating closing or opening of a stem region (180);
for each of the identified contiguous nucleotide stretches (208), ascertains whether the identified contiguous nucleotide stretch (208) is followed by a structural character of unpaired structural character type, when a contiguous structure stretch (210) is suffixed to a corresponding contiguous nucleotide stretch (208) to obtain an encoded string (212) during encoding of RNA data;
suffixes the unpaired structural character to an identified nucleotide stretch to obtain a decoded string, based on the ascertaining;
identifies, in the decoded string, corresponding pairs of structural characters indicating opening and closing of the stem region (180), based on standard nucleotide positioning rules;
for each pair of structural characters indicating opening and closing of the stem region (180), determines a set of nucleotide characters preceding the structural character indicating one of opening of the stem region (180) and closing of the stem region (180), based on the set that was retained while encoding;
generates a modified decoded string, based on a set of complementary nucleotide characters corresponding to each of the sets of the nucleotide characters, wherein for each of the determined set of nucleotide characters, a set of complementary nucleotide characters is prefixed to the structural character indicating one of the closing of a corresponding stem region (180) and the opening of corresponding stem region (180), based on the determining of the set of nucleotide characters;
determines a number of the nucleotide characters preceding each of the structural characters in the modified decoded string; and
generates decoded RNA data comprising a decoded structure string (170) and a decoded sequence string, based on the number of the nucleotide characters preceding each of the structural characters in the modified decoded string, wherein,
for each instance of a structural character in the modified decoded string, the structural character is repeated in the decoded structure string (170) the same number of times as the number of nucleotide characters preceding the structural character in the modified decoded string; and
the structural characters are removed from the modified decoded string to provide the decoded sequence string.
14. The computing system (100) as claimed in claim 13, wherein, to generate the modified decoded string, the decoding module (135):
ascertains, for each of the sets of nucleotide characters, whether the set of nucleotide characters includes a structural character indicating a non-standard base-pairing pattern;
removes the non-standard base-pairing pattern structural character from the decoded string, based on the ascertaining;
deletes the non-standard complementary nucleotide character preceded by the structural character indicating the non-standard base-pairing pattern from the set of the nucleotide characters; and
inserts the non-standard complementary nucleotide character in the set of complementary nucleotide characters at a position corresponding to the nucleotide character, which violated the standard base-pairing rule.
15. A non-transitory computer readable medium having a set of computer readable instructions that, when executed, perform a method for encoding Ribonucleic acid (RNA) data comprising a nucleotide sequence string (165) and a structure string (170), the method comprising:
obtaining, by a processor (110), the nucleotide sequence string (165) and the structure string (170) for an RNA molecule, the nucleotide sequence string (165) comprising a plurality of nucleotide characters indicating order of nucleotides in the RNA molecule, and the structure string (170) including a structural character corresponding to each of the plurality of nucleotide characters in the nucleotide sequence string, wherein the structural character is indicative of a structural attribute of a corresponding nucleotide character;
identifying, by the processor (110), in the structure string (170), one or more contiguous structure stretches (210) of each of a plurality of character types, wherein each of the one or more contiguous structure stretches (210) may include one or more structural characters of the same character type, and wherein the structural attribute is defined by a character type of the structural character;
determining, by the processor (110), for each of the one or more contiguous structure stretches (210), a start position and an end position of a contiguous structure stretch (210) to identify a corresponding contiguous nucleotide stretch (208) in the nucleotide sequence string (165); and
for each of the one or more contiguous structure stretches (210), affixing, by the processor (110), the structural character of a character type indicated by the contiguous structure stretch (210) to the corresponding contiguous nucleotide stretch (208) to obtain an encoded string (212).
16. A non-transitory computer readable medium having a set of computer readable instructions that, when executed, perform a method for decoding encoded Ribonucleic acid (RNA) data, comprising:
identifying, by a processor (110), contiguous nucleotide stretches (208) satisfying unpaired structural character insertion criteria, the unpaired structural character insertion criteria being based on positioning of a structural character indicating closing or opening of a stem region (180);
for each of the identified contiguous nucleotide stretches (208), ascertaining whether an identified contiguous nucleotide stretch (208) is preceded by a structural character of unpaired structural character type, when a contiguous structure stretch (210) is prefixed to a corresponding contiguous nucleotide stretch (208) to obtain an encoded string (212) during encoding of RNA data;
prefixing the unpaired structural character to an identified nucleotide stretch to obtain a decoded string, based on the ascertaining;
identifying, in the decoded string, corresponding pairs of structural characters indicating opening and closing of the stem region (180), based on the standard nucleotide positioning rules;
for each pair of structural characters indicating opening and closing of the stem region (180), determining a set of nucleotide characters following the structural character indicating one of opening of the stem region (180) and closing of the stem region (180), based on the set that was retained while encoding;
for each of the determined set of nucleotide characters, inserting a set of complementary nucleotide characters to generate a modified decoded string, wherein the set of complementary nucleotide characters is inserted after the structural character indicating one of the closing of a corresponding stem region (180) and opening of the corresponding stem region (180), based on the determining of the set of nucleotide characters;
determining a number of the nucleotide characters following each of the structural characters in the modified decoded string; and
generating decoded RNA data comprising a decoded structure string (170) and a decoded sequence string, based on the number of the nucleotide characters following each of the structural characters in the modified decoded string, wherein,
for each instance of a structural character in the modified decoded string, the structural character is repeated in the decoded structure string (170) the same number of times as the number of nucleotide characters following the structural character in the modified decoded string; and
the structural characters are removed from the modified decoded string to provide the decoded sequence string.

Documents

Application Documents

#	Name	Date
1	920-MUM-2014-Request For Certified Copy-Online(18-09-2014).pdf	2014-09-18
2	SPEC IN.pdf	2018-08-11
3	Sequence Listing.txt	2018-08-11
4	PD011652IN-SC_Request for Priority Documents-PCT.pdf	2018-08-11
5	FORM 5.pdf	2018-08-11
6	FORM 3.pdf	2018-08-11
7	FIG IN.pdf	2018-08-11
8	ABSTRACT1.jpg	2018-08-11
9	920-MUM-2014-Power of Attorney-130215.pdf	2018-08-11
10	920-MUM-2014-FORM 18.pdf	2018-08-11
11	920-MUM-2014-FORM 1(22-4-2014).pdf	2018-08-11
12	920-MUM-2014-Correspondence-130215.pdf	2018-08-11
13	920-MUM-2014-CORRESPONDENCE(22-4-2014).pdf	2018-08-11
14	920-MUM-2014-FER.pdf	2019-06-24
15	920-MUM-2014-PETITION UNDER RULE 137 [04-12-2019(online)].pdf	2019-12-04
16	920-MUM-2014-FORM 3 [04-12-2019(online)].pdf	2019-12-04
17	920-MUM-2014-Information under section 8(2) (MANDATORY) [06-12-2019(online)].pdf	2019-12-06
18	920-MUM-2014-OTHERS [24-12-2019(online)].pdf	2019-12-24
19	920-MUM-2014-FER_SER_REPLY [24-12-2019(online)].pdf	2019-12-24
20	920-MUM-2014-COMPLETE SPECIFICATION [24-12-2019(online)].pdf	2019-12-24
21	920-MUM-2014-CLAIMS [24-12-2019(online)].pdf	2019-12-24
22	920-MUM-2014-Response to office action [16-08-2021(online)].pdf	2021-08-16
23	920-MUM-2014-PatentCertificate17-08-2021.pdf	2021-08-17
24	920-MUM-2014-IntimationOfGrant17-08-2021.pdf	2021-08-17
25	920-MUM-2014-RELEVANT DOCUMENTS [26-09-2023(online)].pdf	2023-09-26

Search Strategy

1	2019-06-2117-46-05_21-06-2019.pdf

ERegister / Renewals

3rd: 19 Aug 2021

From 20/03/2016 - To 20/03/2017

4th: 19 Aug 2021

From 20/03/2017 - To 20/03/2018

5th: 19 Aug 2021

From 20/03/2018 - To 20/03/2019

6th: 19 Aug 2021

From 20/03/2019 - To 20/03/2020

7th: 19 Aug 2021

From 20/03/2020 - To 20/03/2021

8th: 19 Aug 2021

From 20/03/2021 - To 20/03/2022

9th: 15 Feb 2022

From 20/03/2022 - To 20/03/2023

10th: 15 Mar 2023

From 20/03/2023 - To 20/03/2024

11th: 14 Mar 2024

From 20/03/2024 - To 20/03/2025

12th: 13 Mar 2025

From 20/03/2025 - To 20/03/2026