Abstract: The present invention relates to the use of a novel bioinformatics approach for predicting and identifying Scaffold/Matrix attachment regions (S/MARs) from different genomic databases.
FIELD OF THE INVENTION
The present invention relates to the use of a novel bioinformatics approach for predicting and identifying Scaffold/Matrix attachment regions (S/MARs) from different genomic databases.
BACKGROUND AND PRIOR ART OF THE INVENTION
A variety of patterns have been observed in DNA sequences and proteins that serve as control points for gene expression and cellular functions. Owing to their vital role, these patterns are of great interest. Among them, Scaffold/Matrix attachment regions (S/MARs) are some of the most important DNA sequences. In the nucleus of eukaryotic cells, specific regions of the DNA are attached to the nuclear matrix; these regions are called S/MARs. It is believed that there are tens of thousands of S/MARs in the genome of higher organisms (Boulikas, T. 1995). They are believed to be responsible for the attachment of chromatin loops to the nuclear scaffold or matrix (Heng et. al. 2004). These sequences are involved in chromatin remodeling and subsequent transcriptional activation, and also in the protection of transgenes from position effects (Widlak, W. and Widlak, P. 2004, Cockerill et. al. 1987 and Walter et. al. 1998). They also have a strong effect on the level of expression of transgenes, as shown by Allen, GC. et. al. in 2000. Insertion of these sequences into the vector backbone has been shown to enhance the expression of therapeutic proteins (Girod, PA. and Mermod, N. 2003).
A major constraint on the experimental detection of S/MARs is that they vary in length and nucleotide sequence, a trait that is yet to be fully explored. Experimental detection is therefore not suitable for large-scale screening of genomic sequences, and a bioinformatics approach is a prerequisite for the analysis of whole genomes.
Several bioinformatics methods for S/MAR prediction have been developed as a result of a considerable amount of research. The MAR-Finder method scores sub-sequences of DNA by the abundance of DNA motifs thought to be correlated with S/MARs (Singh et. al. 1997). SMARTest (Frisch et. al. 2002) and ChrClass (Glazko et. al. 2001) are two different methods that use a training set to predict motifs. The basis of the MAR-Wiz rule for predicting S/MARs is that a long run of bases that does not contain a G binds to the matrix (Dickinson et. al. 1992). Kieffer et al. calculated free energy to predict S/MARs (Thermodyn). In addition, experimental groups have suggested particular motifs: the MAR recognition signature (MRS), consisting of two consensus sequences (van Drunen et. al. 1999), and a "consensus" sequence by Wang et. al. in 1995. Recently, researchers at Selexis SA and the University of Lausanne have reported the identification of MARs using a novel bioinformatics approach called SMARScan (Girod et. al. 2007), which suggests that S/MAR sequences adopt a curved DNA structure and bind specific transcription factors.
MAR-Finder
The MAR-Finder method uses the pattern density on a DNA sequence as the basis for predicting the occurrence of Matrix Association Regions (MARs). It uses a set of DNA-sequence motifs that are biologically known to be present in S/MARs. In a window of fixed length, the number of occurrences of each motif is determined and compared to the expected number of occurrences in a random DNA sequence of the same length as the window. Using a statistical algorithm, a MAR-potential is calculated, which is the average of the scores for the positive and negative strands. This step is repeated for each window along the sequence, and those windows that have a MAR-potential above a given threshold are predicted to contain a putative S/MAR. MAR-Finder gives a sensitivity of 32% and a precision of 80%.
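The windowed motif-density idea described above can be sketched as follows. This is a minimal illustration and not MAR-Finder's actual statistical model: the motif list passed in, the window size, the step, the assumption of equal base frequencies for the expected count, the log-ratio score and the threshold are all hypothetical choices made for the example.

```python
import math

def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def count_occurrences(seq, motif):
    """Count occurrences of motif in seq, allowing overlaps."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)

def expected_count(window_len, motif):
    # Assumption: random DNA with equal base frequencies (0.25 each).
    positions = max(window_len - len(motif) + 1, 0)
    return positions * 0.25 ** len(motif)

def window_score(window, motifs):
    """Hypothetical MAR-potential: the summed log2((observed + 1) / (expected + 1))
    over all motifs, averaged over the two strands."""
    strand_scores = []
    for strand in (window, reverse_complement(window)):
        score = 0.0
        for motif in motifs:
            observed = count_occurrences(strand, motif)
            expected = expected_count(len(strand), motif)
            score += math.log2((observed + 1) / (expected + 1))
        strand_scores.append(score)
    return sum(strand_scores) / 2

def predict_windows(seq, motifs, window=100, step=50, threshold=1.0):
    """Return (start, end, score) for every window scoring above the threshold."""
    seq = seq.upper()
    hits = []
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        score = window_score(seq[start:start + window], motifs)
        if score > threshold:
            hits.append((start, start + window, score))
    return hits
```

Calling `predict_windows(sequence, ["ATATTT", "AATATT"])` returns the windows whose motif density exceeds what would be expected by chance under these assumptions; a real application would need calibrated motifs and thresholds.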
MAR-Wiz Rule
It has been found that a long run of bases that does not contain a G binds to the matrix (Dickinson et. al. 1992). The computational approach to finding MARs in MAR-Wiz is based on the co-occurrence of 20 DNA patterns that are known to occur in the neighborhood of MARs. These motifs are used to define higher-order rules, which are in turn defined using the various combinations in which the patterns are known to co-occur. The mathematical density of the rule occurrences in a region is assumed to imply the presence of a MAR in that region.
MRS Signature
The MAR recognition signature is a bipartite sequence that consists of two individual sequences, AATAAYAA and AWWRTAANNWWGNNNC, where Y = C or T, W = A or T, R = A or G, and N = A or C or G or T. It has been suggested to be an indicator for the presence of an S/MAR. It has also been suggested that these motifs should appear within about 200 bp of each other, independent of strand and order, and that they could even overlap.
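A search for the bipartite signature can be sketched as below. The two motifs, the IUPAC codes and the 200 bp limit are taken from the description above; the function names and the choice to measure distance between motif start positions are assumptions made for illustration.

```python
import re

# IUPAC codes used by the two MRS motifs.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "Y": "[CT]", "W": "[AT]", "R": "[AG]", "N": "[ACGT]"}

def iupac_to_regex(motif):
    """Translate an IUPAC motif into a regular expression."""
    return "".join(IUPAC[base] for base in motif)

def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_sites(seq, motif):
    """Start positions (forward-strand coordinates) of motif on either strand."""
    # A lookahead is used so that overlapping matches are also reported.
    pattern = re.compile("(?=" + iupac_to_regex(motif) + ")")
    n, k = len(seq), len(motif)
    forward = {m.start() for m in pattern.finditer(seq)}
    reverse = {n - k - m.start()
               for m in pattern.finditer(reverse_complement(seq))}
    return sorted(forward | reverse)

def find_mrs(seq, max_distance=200):
    """Return (pos_first, pos_second) pairs whose start positions lie within
    max_distance bp of each other, independent of strand and order."""
    seq = seq.upper()
    first = find_sites(seq, "AATAAYAA")
    second = find_sites(seq, "AWWRTAANNWWGNNNC")
    return [(a, b) for a in first for b in second if abs(a - b) <= max_distance]
```

For a sequence containing AATAACAA followed 50 bases later by ATTATAACCATGCCCC, `find_mrs` reports one pair; the same pair is found when the reverse complement of that sequence is scanned.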
SMARTest
This approach is based on a library of S/MAR-associated, AT-rich patterns derived from comparative sequence analysis of experimentally defined S/MAR sequences. Initially, using experimentally defined S/MAR sequences as the training set, a library of new S/MAR-associated, AT-rich patterns described as weight matrices was generated. Potential S/MARs were then identified by performing a density analysis based on the S/MAR matrix library. Currently, a proprietary library of 97 S/MAR-associated weight matrices is used to test genomic DNA sequences for the occurrence of potential S/MAR regions. S/MAR predictions were also evaluated using six genomic sequences from animals and plants for which S/MARs and non-S/MARs had been experimentally mapped. SMARTest reached a sensitivity of 38% and a specificity of 68%.
SMARScan
SMARScan works on the hypothesis that the activation of gene expression by MARs may require sequences determining structural properties of the DNA, such as DNA curvature, as well as motifs serving as binding sites for transcription factors. The SMARScan I program was assembled to automatically compute structural features of DNA using the GeneExpress algorithms, which are designed to predict the melting temperature, curvature, major groove depth and minor groove width of the DNA. SMARScan I was later coupled to the prediction of potential transcription factor binding sites, resulting in SMARScan II.
ChrClass
Multivariate linear discriminant analysis revealed significant differences between the frequencies of simple nucleotide motifs in S/MAR sequences and in sequences extracted directly from various nuclear matrix elements, such as the nuclear lamina, cores of rosette-like structures, and the synaptonemal complex. Based on this result, ChrClass was developed for the prediction of regions associated with various elements of the nuclear matrix in a query sequence.
Stress-induced destabilization
Stress-induced destabilization (SIDD) calculations predict where the DNA strands can easily separate; it has been suggested that this is an indication of the presence of an S/MAR (Benham et. al. 1997). It has been shown by computational analysis that S/MARs conform to a specific design whose essential attribute is the presence of stress-induced base-unpairing regions (BURs). SIDD profiles are calculated using a previously developed statistical mechanical procedure in which the superhelical deformation is partitioned between strand separation, twisting within denatured regions, and residual superhelicity.
Consensus sequence
The consensus sequence consisted of concatemerized repeats of a 25-base pair SATB1 recognition sequence (TCTTTAATTTCTAATATATTTAGAA), which is derived from the core unwinding element of the MAR downstream of the mouse immunoglobulin heavy chain enhancer.
Thermodyn
Thermodyn is a calculation of the free energy of strand separation derived from summing the contributions of each doublet in a window to the thermodynamic quantities ΔH and ΔS.
AT-percentage
A simple measure of AT-percentage was also used for predicting S/MARs. The AT-percentage was calculated as the proportion of bases that are A or T in a sliding window of 300 bases.
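The sliding-window calculation can be sketched as follows. The 300-base window comes from the description above; the function name and the one-base step are assumptions made for the example.

```python
def at_percentage_profile(seq, window=300):
    """Return the AT-percentage of every window of the given length,
    indexed by window start position (step of one base)."""
    seq = seq.upper()
    if len(seq) < window:
        return []
    is_at = [1 if base in "AT" else 0 for base in seq]
    count = sum(is_at[:window])
    profile = [100.0 * count / window]
    for start in range(1, len(seq) - window + 1):
        # Slide by one base: add the base entering, drop the base leaving.
        count += is_at[start + window - 1] - is_at[start - 1]
        profile.append(100.0 * count / window)
    return profile
```

Windows whose value exceeds a chosen cutoff would then be reported as candidate regions; the cutoff itself is a tuning choice.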
Comparative studies of the different methods (Evans et. al. 2007) have suggested that existing methods can pick out a few true positive S/MARs; however, it is also clear that there is a need for a new bioinformatics approach that will identify S/MARs with good precision. In contrast to previous algorithms developed for the prediction of S/MARs, which were based on pattern and density analysis, a new approach based on gene expression levels has been developed. In this study, a genome-scale analysis of expression levels to predict intergenic S/MAR elements has been undertaken. Experimentally defined S/MAR sequences were used as the training set, and a library of new S/MAR-associated sequences has been generated based on higher and constitutive gene expression. This approach is independent of sequence context and is suitable for the analysis of complete chromosomes. These findings will open new perspectives for the identification of S/MARs, which will help in understanding the importance of S/MARs in gene regulation.
References:
1. Boulikas, T. Int Rev Cytol. 162A, 279-388 (1995)
2. Heng, HHQ. et. al. J Cell Sci. 117, 999-1008 (2004)
3. Widlak, W. and Widlak, P. Cell Mol Biol Lett. 9, 123-133 (2004)
4. Cockerill, PN. et. al. J Biol Chem. 262, 5394-5397 (1987)
5. Walter, WR. et. al. Biochem Biophys Res Commun. 242, 419-422 (1998)
6. Allen, GC. et. al. Plant Molecular Biology. 43, 361-376 (2000)
7. Girod, PA. and Mermod, N. Gene Transfer and Expression in Mammalian Cells, Elsevier Sciences, 359-379 (2003)
8. Singh, GB. et. al. NAR. 25, 1419-1425 (1997)
9. Frisch, M. et. al. Genom. Biol. 12, 349-354 (2002)
10. Glazko, GV. et. al. Biochim Biophys Acta. 1517, 351-364 (2001)
11. Dickinson, LA. et. al. Cell. 70, 631-645 (1992)
12. van Drunen, CM. et. al. NAR. 27, 2924-2930 (1999)
13. Wang, B. et. al. J Biol Chem. 270, 23239-23242 (1995)
14. Girod, PA. et. al. Nature Methods. 4, 747-753 (2007)
15. Benham, C. et. al. J Mol Biol. 274, 181-196 (1997)
16. Evans, K. et. al. BMC Bioinformatics. 8, 71-99 (2007)
OBJECTS OF THE INVENTION
The main object of the present invention is to develop a method for identifying a Scaffold/Matrix attachment region (S/MAR) sequence.
Another object of the present invention is to obtain a Scaffold/Matrix attachment region (S/MAR) sequence[s] or its complementary sequence[s], variant[s] and fragment[s] thereof.
Yet another object of the present invention is to use (S/MAR) sequence[s] or its complementary sequence[s], variant[s] and fragment[s] for increased protein production through enhanced expression of genes.
STATEMENT OF THE INVENTION
Accordingly, the present invention relates to a method for identifying a Scaffold/Matrix attachment region (S/MAR) sequence, said method comprising steps of (a) generating a library of subsets of genes based on higher and constitutive gene expression predicted from datasets derived from the human anatomic gene expression library; and (b) assessing 5' UTR intergenic sequences for the subsets to identify the MAR sequence; and a Scaffold/Matrix attachment region (S/MAR) sequence[s] or its complementary sequence[s], variant[s] and fragment[s] thereof.
BRIEF DESCRIPTION OF ACCOMPANYING FIGURE
Figure 1: Work-Flow for In Silico Identification of human S/MARs
DETAILED DESCRIPTION OF THE INVENTION
The present invention relates to a method for identifying a Scaffold/Matrix attachment region (S/MAR) sequence, said method comprising steps of:
a) generating a library of subsets of genes based on higher and constitutive gene expression predicted from datasets derived from the human anatomic gene expression library; and
b) assessing 5' UTR intergenic sequences for the subsets to identify the MAR sequence.
In another embodiment of the present invention, the intergenic sequence was retrieved within a defined region of the genome using Ensembl Slice.
In still another embodiment of the present invention, the MAR sequence is selected from a group comprising structural motifs, DNA-unwinding motif, replication initiator protein sites, homo-oligonucleotide repeats, hexanucleotide motifs, stretches of either T or A residues, SATB1 recognition sequence, kinked DNA, intrinsically curved DNA and motif TTTAAA.
In still another embodiment of the present invention, the S/MAR sequence was identified by assessing the 5' UTR intergenic region using a Perl program.
The present invention relates to a Scaffold/Matrix attachment region (S/MAR) sequence[s] or its complementary sequence[s], variant[s] and fragment[s] thereof.
In still another embodiment of the present invention, the MAR sequences are selected from a group comprising structural motifs, DNA-unwinding motif, replication initiator protein sites, homo-oligonucleotide repeats, hexanucleotide motifs, stretches of either T or A residues, SATB1 recognition sequence, kinked DNA, intrinsically curved DNA and motif TTTAAA.
In still another embodiment of the present invention, the Scaffold/Matrix attachment region (S/MAR) sequence[s] or its complementary sequence[s], variant[s] and fragment[s] increase protein production through enhanced expression of genes.
The present invention relates to the use of a novel bioinformatics approach for predicting and identifying MARs from different genomic databases (Figure 1). This object has been achieved by providing a novel bioinformatics approach for identifying new S/MARs from nucleotide sequences in different genomic databases. In contrast to existing algorithms for the prediction of S/MARs, which were based on pattern and density analysis, this approach is based on gene expression levels. In addition, the approach has been validated by identifying some novel sequences with good sensitivity.
An in silico genome-wide analysis for the identification of scaffold/matrix attachment regions was performed in humans based on gene expression levels. Two different expression datasets were used for the prediction. These datasets were derived from the Human Anatomic Gene Expression Library, which is composed of expression patterns of transcriptional products in ten tissue categories, based on tissue-specific expression data from seven experimental platforms, which include iAFLP, PCR-based quantitative expression profiling, long oligomer cDNA arrays, short oligomer cDNA arrays, nylon cDNA macroarrays, glass slide cDNA microarrays, SAGE, EST, and MPSS. The tissue categories included in the dataset were neural, blood/spleen, dermal connective tissue, muscle/heart, placenta/testis/ovary, stomach/colon, liver, lungs, kidney, endocrine, exocrine.
Constitutive expression dataset: For the selection of constitutive genes, a subset of 100 genes was picked that were expressed in all ten tissue categories and had average expression values greater than 0.12. This filtering removed genes with low expression values and genes that were not expressed in all ten tissue categories.
Higher expression dataset: For the selection of highly expressed genes, a subset of 100 genes was picked that had high average expression profiles (< 1) out of all the gene transcripts. This filtering removed genes with low expression values, which may not allow expressed and non-expressed genes to be differentiated.
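The two selection steps can be sketched as below. The subset size of 100, the ten tissue categories and the 0.12 average-expression cutoff come from the description above; the data layout (a dictionary mapping each gene to per-tissue values), the function names, the treatment of "expressed" as a value greater than zero, and the ranking by average expression are assumptions made for the example. The sketch keeps the top-ranked genes and leaves any absolute cutoff on the average for the higher expression dataset as a separate filter.

```python
def average(values):
    """Arithmetic mean of a non-empty collection of numbers."""
    values = list(values)
    return sum(values) / len(values)

def constitutive_genes(expression, n_tissues=10, min_mean=0.12, n=100):
    """Genes expressed in every tissue category with an average above min_mean.

    expression: dict mapping gene name -> dict of tissue category -> value.
    Assumption: a gene counts as expressed in a tissue if its value is > 0.
    """
    selected = []
    for gene, by_tissue in expression.items():
        in_all_tissues = (len(by_tissue) == n_tissues
                          and all(v > 0 for v in by_tissue.values()))
        if in_all_tissues and average(by_tissue.values()) > min_mean:
            selected.append(gene)
    selected.sort(key=lambda g: average(expression[g].values()), reverse=True)
    return selected[:n]

def highly_expressed_genes(expression, n=100):
    """The n genes with the highest average expression across tissues."""
    ranked = sorted(expression,
                    key=lambda g: average(expression[g].values()),
                    reverse=True)
    return ranked[:n]
```

The two resulting gene lists correspond to the constitutive and higher expression subsets whose upstream intergenic regions are then assessed.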
The 5' UTR was assessed for the resulting subsets of genes for the identification of S/MARs. For the assessment, the intergenic sequence within a defined region of the genome between two genes was retrieved using Ensembl Slice. Promoter mapping was performed using the Eukaryotic Promoter Database, which is an annotated, non-redundant collection of eukaryotic POL II promoters for which the transcription start site has been determined experimentally. Using a Perl program, MAR recognition signatures were identified by scanning the 5' UTR intergenic region.
MAR recognition signatures were divided into the following:
Structural motifs, which include higher AT-content (60-70%), the bipartite sequence AATAAYAA and AWWRTAANNWWGNNNC within 200 bp, the DNA-unwinding motif (AATATATTAATATT), replication initiator protein sites (ATTA and ATTTA), homo-oligonucleotide repeats (A-box AATAAAYAAA, T-box TTWTWTTWTT), hexanucleotide motifs ATATTT/ATATAT, stretches of either T or A residues AAAAAA/TTTTTT, ATTTTTATAAAAA, the SATB1 recognition sequence (TCTTTAATTTCTAATATATTTAGAA), kinked DNA, which includes copies of the dinucleotides TG, CA or TA separated by 2-4 or 9-12 nucleotides, and intrinsically curved DNA, which includes repeats of the motif AAAAn7AAAn7AAAA as well as the motif TTTAAA.
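A scan for several of the structural motifs listed above can be sketched as follows. The description states that a Perl program was used; Python is used here purely for illustration. The motif strings are taken from the list above, while the catalogue labels, the restriction to the forward strand, and the reading of "n7" as any seven nucleotides are assumptions.

```python
import re

# IUPAC codes needed for the motifs below (Y = C or T, W = A or T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "[CT]", "W": "[AT]"}

def to_regex(motif):
    return "".join(IUPAC[base] for base in motif)

# Hypothetical catalogue: labels are illustrative, motifs are from the text.
MOTIFS = {
    "replication initiator ATTA": to_regex("ATTA"),
    "replication initiator ATTTA": to_regex("ATTTA"),
    "A-box": to_regex("AATAAAYAAA"),
    "T-box": to_regex("TTWTWTTWTT"),
    "hexanucleotide ATATTT": to_regex("ATATTT"),
    "hexanucleotide ATATAT": to_regex("ATATAT"),
    "TTTAAA motif": to_regex("TTTAAA"),
    # Assumption: "n7" means a spacer of any seven nucleotides.
    "curved DNA": "AAAA[ACGT]{7}AAA[ACGT]{7}AAAA",
}

def scan_motifs(seq):
    """Return a dict of motif label -> forward-strand start positions."""
    seq = seq.upper()
    hits = {}
    for label, regex in MOTIFS.items():
        # Lookahead so that overlapping occurrences are all reported.
        pattern = re.compile("(?=" + regex + ")")
        hits[label] = [m.start() for m in pattern.finditer(seq)]
    return hits
```

Scanning the reverse complement in the same way would cover the opposite strand; the per-motif hit lists can then be combined into whatever scoring scheme is chosen for ranking candidate intergenic regions.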
MAR functional elements that uniquely bind to the MARs, which include high mobility group HMG-I/Y protein binding sites (TATTATATAA), topoisomerase II sites (RNYNNCNNGYNGKTNYNY), transcription factors such as H-box (A/T25), MTAATA, Y-box (CCAAT) and CTAT repeat-binding proteins, and CDP (CCAAT displacement protein).
Comparative sequence analysis was performed using 7 experimentally defined prototype human S/MAR sequences collected on the basis of literature evidence. The final set of S/MAR sequences identified by the above procedures will be sent to the laboratory to validate the enhancement of expression of genes downstream of these S/MAR sequences.
We Claim:
1) A method for identifying a Scaffold/Matrix attachment region (S/MAR) sequence, said method comprising steps of:
a) generating a library of subsets of genes based on higher and constitutive gene expression predicted from datasets derived from the human anatomic gene expression library; and
b) assessing 5' UTR intergenic sequences for the subsets to identify the MAR sequence.
2) The method as claimed in claim 1, wherein the intergenic sequence was retrieved within a defined region of the genome using Ensembl Slice.
3) The method as claimed in claim 1, wherein the MAR sequence is selected from a group comprising structural motifs, DNA-unwinding motif, replication initiator protein sites, homo-oligonucleotide repeats, hexanucleotide motifs, stretches of either T or A residues, SATB1 recognition sequence, kinked DNA, intrinsically curved DNA and motif TTTAAA.
4) The method as claimed in claim 1, wherein the MAR sequence was identified by assessing the 5' UTR intergenic region using a Perl program.
5) A Scaffold/Matrix attachment region (S/MAR) sequence[s] or its complementary sequence[s], variant[s] and fragment[s] thereof.
6) The MAR sequences as claimed in claim 5, wherein the MAR sequences are selected from a group comprising structural motifs, DNA-unwinding motif, replication initiator protein sites, homo-oligonucleotide repeats, hexanucleotide motifs, stretches of either T or A residues, SATB1 recognition sequence, kinked DNA, intrinsically curved DNA and motif TTTAAA.
7) The Scaffold/Matrix attachment region (S/MAR) sequence[s] or its complementary sequence[s], variant[s] and fragment[s] as claimed in claim 5, wherein said sequence[s] increase protein production through enhanced expression of genes.
8) The method and the scaffold/matrix attachment region (S/MAR) sequences as substantially herein described with accompanying examples and figures.
| # | Name | Date |
|---|---|---|
| 1 | 1413-che-2008 form-1.pdf | 2011-09-03 |
| 2 | 1413-che-2008 form-2.pdf | 2011-09-03 |
| 3 | 1413-che-2008 form-3.pdf | 2011-09-03 |
| 4 | 1413-che-2008 form-5.pdf | 2011-09-03 |
| 5 | 1413-che-2008 abstract.pdf | 2011-09-03 |
| 6 | 1413-che-2008 claims.pdf | 2011-09-03 |
| 7 | 1413-che-2008 drawings.pdf | 2011-09-03 |
| 8 | 1413-che-2008 description-(provisional).pdf | 2011-09-03 |
| 9 | 1413-che-2008 description (complete).pdf | 2011-09-03 |
| 10 | 1413-che-2008 correspondence-others.pdf | 2011-09-03 |
| 11 | 1413-CHE-2008 FORM-18 11-06-2012.pdf | 2012-06-11 |
| 12 | 1413-CHE-2008 CORRESPONDENCE OTHERS 11-06-2012.pdf | 2012-06-11 |
| 13 | 1413-CHE-2008-FER.pdf | 2018-10-25 |
| 14 | 1413-CHE-2008-AbandonedLetter.pdf | 2019-04-29 |
| 15 | 1413CHE2008searchstrategy_25-10-2018.pdf | |