
Genome Wide Identification Of Human Scaffold/Matrix Attachment Regions That Enhances Gene Expression Levels

Abstract: The present invention relates to use of novel bioinformatics approach for predicting and identifying Scaffold/Matrix attachment regions (S/MARs) from different genomic database.


Patent Information

Application #
Filing Date
07 December 2009
Publication Number
23/2011
Publication Type
INA
Invention Field
BIOTECHNOLOGY
Status
Email
Parent Application

Applicants

AVESTHAGEN LIMITED
'DISCOVERER', 9TH FLOOR, INTERNATIONAL TECH PARK, WHITEFIELD ROAD,BANGALORE - 560 066.

Inventors

1. PATELL, VILLOO MORAWALA
C/O AVESTHAGEN LIMITED, 'DISCOVERER', 9TH FLOOR, INTERNATIONAL TECH PARK, WHITEFIELD ROAD, BANGALORE - 560 066.
2. MAITY, SUNIT
C/O AVESTHAGEN LIMITED, 'DISCOVERER', 9TH FLOOR, INTERNATIONAL TECH PARK, WHITEFIELD ROAD, BANGALORE - 560 066.
3. GOPALAKRISHNAN, CHELLAPPA
C/O AVESTHAGEN LIMITED, 'DISCOVERER', 9TH FLOOR, INTERNATIONAL TECH PARK, WHITEFIELD ROAD, BANGALORE - 560 066.
4. GUZDER, SAMI NOSIR
C/O AVESTHAGEN LIMITED, 'DISCOVERER', 9TH FLOOR, INTERNATIONAL TECH PARK, WHITEFIELD ROAD, BANGALORE - 560 066.
5. WARARAJ, NATTESH
C/O AVESTHAGEN LIMITED, 'DISCOVERER', 9TH FLOOR, INTERNATIONAL TECH PARK, WHITEFIELD ROAD, BANGALORE - 560 066.
6. PRASAD, ABHIMANYU
C/O AVESTHAGEN LIMITED, 'DISCOVERER', 9TH FLOOR, INTERNATIONAL TECH PARK, WHITEFIELD ROAD, BANGALORE - 560 066.
7. PAKKIRISAMY, BARANIDHARAN
C/O AVESTHAGEN LIMITED, 'DISCOVERER', 9TH FLOOR, INTERNATIONAL TECH PARK, WHITEFIELD ROAD, BANGALORE - 560 066.

Specification

Genome-wide identification of human scaffold/matrix attachment regions that enhances gene expression levels
FIELD OF THE INVENTION
The present invention relates to a novel method for predicting and identifying Scaffold/Matrix attachment regions (S/MARs) from different genomic database and the use of these characterized S/MAR sequences in enhancing the gene expression.
BACKGROUND OF THE INVENTION
A variety of patterns have been observed on the DNA and protein sequences that serve as control points for gene expression and cellular functions. Owing to the vital role of such patterns discovered on biological sequences, they are generally cataloged and maintained within internationally shared databases. Furthermore, the variability in a family of observed patterns is often represented using computational models in order to facilitate their search within an uncharacterized biological sequence.
What are S/MARs
The interphase nucleus contains a 3-dimensional filamentous protein network referred to as the nuclear matrix (also called the scaffold or skeleton). The nuclear matrix is a complex structure consisting of various elements. It provides a framework to maintain the overall size and shape of the nucleus. MARs are defined as the genomic DNA sequences at which the chromatin is anchored to the nuclear matrix proteins during interphase. The matrix acts as a structural attachment site for the DNA loops during interphase. Evolutionarily highly conserved 300-1000 bp long DNA sequences, referred to as SARs (Scaffold Associated Regions), have been identified that define the base of DNA loops, anchoring them to specific proteins. Scaffold/Matrix-associated region (S/MAR) sequences are DNA regions that are attached to the nuclear matrix and participate in many cellular processes: they help to organize chromosomes, localize genes, and regulate DNA transcription and replication within the nucleus, and thereby subdivide the eukaryotic genome into structural and functional domains.
Location of S/MARs
They are found at the base of the chromatin loops into which the eukaryotic genome appears to be organized.

Functions of S/MARs
S/MARs occur exclusively in eukaryotic genomes. Functionally, S/MARs are very important as they participate in many cellular processes. They typically augment transcription rates in a highly context dependent manner (Schubeler et al., 1996). A few of their functions are listed below.
A. MAR-mediated transcriptional regulation
A proposed major function of S/MARs is the coordination of the expression of gene loci. Attachment of a genomic segment to the nuclear matrix places a gene in close proximity to its transcription factors, providing an essential step to expression (Bode et al. 1995, 2000; Boulikas 1995). S/MARs form the anchor points of loop domains with domain sizes ranging from a few kb to more than 100 kb (Bode et al. 1996).
S/MARs can activate enhancer regions (Cockerill et al, 1987). S/MARs determine which one of a class of genes to transcribe [Walter WR et al, 1998]. S/MARs have a strong effect on the level of expression of transgenes [Allen GC et al, 2000; Girod PA et al, 2005]. They are major determinants of locus control of gene expression and can shield gene expression from position effects [Phi-Van, L et al, 1990].
B. MARs and loop domain organization
The organization of chromatin into discrete looped domains is believed to contribute structurally to the packaging of DNA in the nucleus and functionally to the regulation of expression and replication of the genome. The metabolic activities of the matrix proteins acting at MARs constrain the chromatin fiber and bring about the morphological changes of chromosomes by close packing of pre-existing or de novo formed loops. Thus, within individual chromosome territories, the chromatin fiber appears highly contorted, looping back and forth between the interior and periphery bringing together distant loop domains into close proximity.
C. S/MARs function as origins of replication
An additional feature of MARs is their function as origins of replication in combination with other genetic elements. MAR AT-rich sequences were reported to facilitate dissociation of the two DNA strands, and may thereby open chromatin and allow interaction with factors of the DNA replication machinery (Bode, et al, 1992). This has allowed the construction of episomally replicating expression vectors for mammalian cells (Piechaczek, c et al, 1999). MAR association with elements of the nuclear matrix was found to be essential for stable plasmid maintenance during mitosis [Baiker et al, 2000].

Characteristics of S/MARs
Scaffold/matrix attachment regions (S/MARs) are essential regulatory DNA elements of eukaryotic cells. It has been estimated that the human genome contains approximately 100,000 S/MARs (Boulikas et al. 1995; Bode et al. 1996), which demonstrates the functional importance of S/MARs.
The model of loop-domain organization of eukaryotic chromosomes is now generally accepted (Bodnar, J. W. (1988), Gasser et al (1989), Blasquez et al (1989)). According to this model, topologically independent chromatin loops are attached to the nuclear matrix/scaffold. A number of proteins of the nuclear matrix/scaffold presumably participating in the loop organization of chromosomes have been identified and some of their characteristics are known (Gasser et al (1989), Breyne, P., et al (1994)). Regions of attachment of the chromosomal DNA to the nuclear scaffold/matrix (S/MARs) were identified as those involved in very important cellular processes: transcription, replication and recombination.
The interaction of MARs with the matrix is gene- and cell-type specific, and this interaction is somehow linked with the expression status of adjacent genes. For example, the use of chicken lysozyme MAR (cLysMAR) elements cotransfected with the erythropoietin gene allowed regulated and long-term expression in undifferentiated C2C12 myoblasts. However, the erythropoietin gene became silenced in most C2C12 cells when they were committed to myotube differentiation (M. Imhof and N. Mermod, unpublished data).
Examination of the distribution of MARs in long genomic DNA sequences of plants and animals revealed that it is similar to the average gene density, implying that each gene has its own MAR [Van Drunen et al, 1999].
MARs were generally found to lie close to enhancers and promoters or in the first intron. Furthermore, the MAR-mediated transcriptional effect was found to depend on its distance from the promoter and the direction of transcription, with the gene proximal to the MAR being strongly expressed whereas those located more distally were weakly expressed (Jackson et al, 1998; Van Drunen et al, 1999; Chernov et al, 2002; Frisch et al, 2002; Bode et al, 2000).
S/MARs are non-coding sites containing putative regulatory elements and binding sites of DNA-topoisomerase II.

S/MARs are known to have a minimum sequence length of 200 to 300 base pairs (Mielke et al. 1990). AT-rich patterns are present in S/MARs, and the number of these motifs will determine the stable and specific binding of S/MARs to the nuclear matrix (Romig et al. 1994).
It is generally accepted that DNA replication is associated with the nuclear matrix. It has also been shown that S/MARs and the origins of replication share the ATTA, ATTTA and ATTTTA motifs. Intrinsically curved DNA has been revealed at or near several S/MARs. Curved DNA is thought to play an important role in many molecular processes that involve the interaction of DNA and proteins, such as recombination, replication and transcription. Significant curvature can be expected for sequences with repeats of two motifs: AAAA(N)7AAA(N)7AAAA and TTTAAA.
The majority of S/MARs have been identified in intergenic regions, which suggests a structural role, i.e., delimitation of chromatin domains (Chernov IP et al 2002).
AT islands are involved in the organization of the genomic DNA on the nuclear matrix by acting as scaffold/matrix attachment regions, S/MARs. DNA duplexes of AT islands are unusually flexible and prone to base unpairing, which are crucial MAR attributes. (Woynarowski JM, 2004 Mar)
S/MAR Matrix Library:
Most S/MAR-associated patterns that have been published are defined solely as IUPAC descriptions (Sander and Hsieh 1985; Cockerill and Garrard 1986; Gasser and Laemmli 1986; Spitzner and Muller 1988; Mielke et al. 1990; Boulikas 1993, 1995; Bode et al. 1995; van Drunen et al. 1997).
S/MARtDB:
S/MARt DB collects information about scaffold/matrix attached regions and the nuclear matrix proteins that are supposed to be involved in the interaction of these elements with the nuclear matrix. It covers the whole range from yeast to human.
This background information is provided for the purpose of making known information believed by the applicant to be of possible relevance to the present invention. No admission is necessarily intended, nor should be construed, that any of the preceding information constitutes prior art against the present invention.

OBJECTS OF THE INVENTION
The main object of the present invention is to develop a method for identifying human Scaffold/Matrix attachment region (S/MAR) sequence.
Another object of the present invention is to obtain a Scaffold/Matrix attachment region (S/MAR) sequence[s] or its complementary sequence[s], variant[s] and fragment[s] thereof.
Yet another object of the present invention is to use (S/MAR) sequence[s] or its complementary sequence[s], variant[s] and fragment[s] for increased protein production through enhanced expression of genes.
A further object of this invention is to identify S/MAR regions in the complete human genome with the objective of using them to increase expression levels of genes of interest.
SUMMARY OF THE INVENTION
The present invention relates to a method for identifying Scaffold/Matrix attachment region (S/MAR) sequence from a eukaryotic genome, said method comprising steps of (a) generating a library of subset of genes based on higher and constitutive gene expression predicted from datasets derived from human autonomic gene expression library using eukaryotic exon sequences as negative controls; and (b) assessing the enrichment ratio of the motifs known to be associated with sequences modulating expression levels (i.e., motifs in known S/MAR sequences) and eukaryotic exon sequences and identifying intergenic sequences for the subsets to identify the MAR sequence; and a Scaffold/Matrix attachment region (S/MAR) sequence[s] or its complementary sequence[s], variant[s] and fragment[s] thereof.
Although the said method has been employed to identify S/MAR sequences in the whole human genome, it can be applied to identify S/MAR sequences in any eukaryotic genome.

SUMMARY OF THE ACCOMPANYING DRAWINGS
FIGURE 1: shows the percentage of sequence coverage by positively regulating DNA motifs in the selected 2151 400-base length S/MAR sequences arranged in descending order of the coverage percentage.
FIGURE 2: shows the percentage of sequence coverage by positively regulating DNA motifs in the selected 800 800-base length S/MAR sequences arranged in descending order of the coverage percentage.
FIGURE 3: shows the flowchart for determining the enrichment values of DNA motifs in known S/MAR sequences.
FIGURE 4: shows the flowchart for determining the enrichment and significance values of DNA motifs in human intergenic sequences.
FIGURE 5: shows the general flowchart for DNA motif enrichment calculation in known S/MAR sequences.
FIGURE 6: shows the general flow for identifying eukaryotic S/MAR sequences using significance of DNA motif enrichment values.
DETAILED DESCRIPTION OF THE INVENTION
Scaffold/matrix attachment regions (S/MARs) are operationally defined as DNA elements that bind specifically to the nuclear matrix or as DNA fragments that co-purify with the nuclear matrix. S/MARs are sequences in the DNA of eukaryotic chromosomes where the nuclear matrix attaches. These elements constitute anchor points of the DNA for the chromatin scaffold and serve to organize the chromatin into structural domains. These are found at the base of the chromatin loops into which the eukaryotic genome appears to be organized.
These regions are about 300 bp to several kb in length and are present in all higher eukaryotes, including mammals and plants (Bode et al., 1996; Allen et al., 2000). S/MARs are notable for their AT richness and likely narrowing of the minor groove (Gasser et al., 1989; Bode et al., 1995, 1996). They belong to non-coding sites in the genome. Scaffold/matrix attachment regions (S/MARs) are essential regulatory DNA elements of eukaryotic cells.
Functionally, MARs are very important as they participate in many cellular processes. They typically augment transcription rates in a highly context dependent manner (Schubeler et al., 1996) but are separable from enhancer sequences on the basis of transient expression analyses (Bode et al., 1995). S/MARs act independently of orientation and of distance, provided the distance is at least several kilobases. They can activate enhancer regions (Cockerill et al., 1987) and determine which one of a class of genes to transcribe (Walter et al., 1998). They also have a strong effect on the level of expression of transgenes (Allen et al., 2000; Girod et al., 2005).
The promoter-S/MAR distance is an important factor in the correct functioning of the S/MAR (Mlynarova et al., 1995; Schubeler et al., 1996). In addition to the S/MAR-associated enhancement of gene expression, S/MARs have a proposed role in the negative regulation of gene expression. Such negative regulation is the proposed default mode of action for S/MARs that are closely associated with the promoter sequence or that appear downstream of the promoter (Schubeler et al., 1996). Such S/MARs would block progression by RNA polymerase II, so they may be either nonfunctional in vivo or have a regulated matrix-binding activity (Schubeler et al., 1996).
An additional feature of MARs is their function as origins of replication in combination with other genetic elements. MAR AT-rich sequences were reported to facilitate dissociation of the two DNA strands, and may thereby open chromatin and allow interaction with factors of the DNA replication machinery. This has allowed the construction of episomally replicating expression vectors for mammalian cells. Due to these features, S/MARs are of intrinsic interest for the understanding of gene regulation, which will help to enhance gene expression and increase protein production in eukaryotic cells. However, MARs exhibit considerable variation in length and nucleotide sequence that is still unexplored, and experimental detection is not suitable for large-scale screening of genomic sequences. Hence a bioinformatics approach is a prerequisite for the analysis of whole genomes. A great deal of research work has been focused on computer prediction of S/MARs. A number of methods have been proposed to predict S/MARs, such as MAR-Finder (Singh et al., 1997), the H rule (Dickinson et al., 1992), the MRS signature, SMARtest (Frisch et al., 2002), Duplex Destabilization and Thermodyne. Evans et al. compared these methods and concluded that all of them have little predictive power and that a simple rule based on A-T percentage is generally competitive with the other methods (Evans et al., 2007).
The instant invention is focused on "in silico Prediction of Human Scaffold/Matrix Attachment Regions specifically enhancing gene expression". Expression data and sequence information were obtained from UniGene and Ensembl, respectively. The potential sequences are identified after screening the sequences for specific S/MAR features based on the algorithm developed by the applicants. Further, the identified S/MAR sequences will be used for construction of episomally replicating high expression vectors for mammalian cells.

The present invention employs bioinformatics programs that contain methods and algorithms to calculate the nuclear scaffold / matrix binding potential of the human intergenic sequences and their fragments using DNA motif enrichment and significance values.
The present invention relates to using DNA motifs, listed in TABLE 1 and 2, known to be associated with sequences modulating expression levels, and also uses eukaryotic exon sequences as sequences that do not contain any S/MAR features.
TABLE 1: DNA Patterns and motifs that are known to positively regulate gene expression levels.

Motif name                                    Pattern                                                                                                                            References
Core unwinding motifs (CUEs)                  ATATTT / ATATAT / AATATATTT / AATATATTAATATT                                                                                       2,3,4
HMG-I/Y protein binding sites                 TATTATATAA / TAATAAAATTTT                                                                                                          2,37
H-box (A/T 25)                                [ATC]{25,}                                                                                                                         5
T-Box                                         TT[AT]T[AT]TT[AT]TT                                                                                                                3,2
A-Box                                         AATAAA[TC]AAA                                                                                                                      3,2
Topoisomerase II binding sites                [AG][ATGC][TC][ATGC][ATGC]C[ATGC][ATGC]G[TC][ATGC]G[GT]T[ATGC][TC][ATGC][TC] / GT[ATGC][AT]A[CT]ATT[ATGC]AT[ATGC][ATGC][AG]        2,3,6
Origin of replication                         ATTA / ATTTA                                                                                                                       1,2
CTAT repeats-binding proteins regions         CTAT                                                                                                                               2
Y-box                                         CCAAT                                                                                                                              2
MAR recognition signature                     AATAA[TC]AA and A[AT][AT][AG]TAA[ATGC][ATGC][AT][AT]G[ATGC][ATGC][ATGC]C within 200 bp                                             2
SAF-A binding region [A{3,}/T{3,} pattern]    A{4,}|T{4,}{1,35}A{4,}|T{4,}                                                                                                       9
Arabidopsis S/MARs                            TA[AT]A[AT][AT][AT][ATGC][ATGC]A[AT][AT][AG]TAA[ATGC][ATGC][AT][AT]G                                                               6
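The bracketed patterns in TABLE 1 can be read directly as regular expressions. The following is a minimal sketch, not the applicants' program, of how a few of them might be counted in a sequence. The dictionary labels and function names are hypothetical, and counting overlapping occurrences is an assumption, since the specification does not say how overlaps are treated.

```python
import re

# A few patterns transcribed from TABLE 1; the keys are illustrative labels.
MOTIFS = {
    "T-box": "TT[AT]T[AT]TT[AT]TT",
    "A-box": "AATAAA[TC]AAA",
    "Y-box": "CCAAT",
    "Origin of replication": "ATTA|ATTTA",
}


def count_motif(pattern, sequence):
    """Count occurrences of a motif pattern in a DNA sequence.

    Assumption: overlapping occurrences are counted. The zero-width
    lookahead makes the regex engine test every start position.
    """
    return len(re.findall(f"(?=(?:{pattern}))", sequence.upper()))


def count_all(sequence):
    """Return a {motif label: count} dictionary for one sequence."""
    return {name: count_motif(p, sequence) for name, p in MOTIFS.items()}


if __name__ == "__main__":
    print(count_all("CCAATTATTTA"))
```

If non-overlapping counts are wanted instead, the lookahead wrapper can be dropped and `re.findall(pattern, sequence)` used directly.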

TABLE 2: DNA motifs that are enriched in known S/MAR sequences

Motif name                                    Pattern                                                                                                                            References
Core unwinding motifs (CUEs)                  ATATTT / ATATAT / AATATATTT / AATATATTAATATT                                                                                       2,3,4
HMG-I/Y protein binding sites                 TATTATATAA / TAATAAAATTTT                                                                                                          2,37
H-box (A/T 25)                                [ATC]{25,}                                                                                                                         5
Origin of replication                         ATTA / ATTTA                                                                                                                       1,2
SAF-A binding region [A{3,}/T{3,} pattern]    A{4,}|T{4,}{1,35}A{4,}|T{4,}                                                                                                       9
CTAT repeats-binding proteins regions         CTAT                                                                                                                               2
T-Box                                         TT[AT]T[AT]TT[AT]TT                                                                                                                3,2
A-Box                                         AATAAA[TC]AAA                                                                                                                      3,2
Topoisomerase II binding sites                [AG][ATGC][TC][ATGC][ATGC]C[ATGC][ATGC]G[TC][ATGC]G[GT]T[ATGC][TC][ATGC][TC] / GT[ATGC][AT]A[CT]ATT[ATGC]AT[ATGC][ATGC][AG]        2,3,6
A DNA sequence whose binding potential to the nuclear scaffold / matrix is unknown, and which is to be identified, is hereinafter referred to as a test sequence.
This method relies on calculating the enrichment levels of the DNA motifs listed in TABLE 1 in known S/MAR sequences as compared to eukaryotic exon sequences, and on comparing these enrichment levels with the enrichment levels of the same DNA motifs in full-length human intergenic sequences and their fragments, thus arriving at the significance values.
For the intergenic sequences, the S/MAR potential for the full-length sequence, 400 base fragments and 800 base fragments has been calculated. The 400 base fragments were generated with an overlapping window of 200 bases starting from the upstream gene side, and the 800 base fragments were generated with an overlapping window of 400 bases starting from the upstream gene side. The method as disclosed herein can generate fragments of any length from a given sequence and with any amount of overlap among the fragments. A genome-wide identification of the S/MAR sequences in the human genome has been done using the above approach.
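The overlapping-window fragmentation described above can be sketched as follows. This is an illustrative sketch only: the function name is hypothetical, and dropping a trailing piece shorter than the fragment length is an assumption, since the specification does not state how the end of a sequence is handled.

```python
def make_fragments(sequence, length, overlap):
    """Split a sequence into fixed-length fragments that overlap.

    Returns a list of (start_offset, fragment) pairs, starting from the
    first base (taken here to be the upstream gene side).
    Assumption: a trailing piece shorter than `length` is discarded.
    """
    if not 0 <= overlap < length:
        raise ValueError("overlap must be non-negative and smaller than length")
    step = length - overlap
    return [
        (start, sequence[start:start + length])
        for start in range(0, len(sequence) - length + 1, step)
    ]


if __name__ == "__main__":
    seq = "A" * 2000  # placeholder 2 kb sequence
    print(len(make_fragments(seq, 400, 200)))  # 400-base windows, 200-base overlap
    print(len(make_fragments(seq, 800, 400)))  # 800-base windows, 400-base overlap
```

Because `length` and `overlap` are parameters, the same function covers the 400/200 and 800/400 settings used in the text as well as any other window size.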
DNA motifs that are known to be involved in modulation of gene expression levels are listed in TABLE 1 and TABLE 4. The enrichment levels of these motifs have been calculated in known S/MAR sequences, eukaryotic exon sequences and in full-length human intergenic sequences and their fragments. All the DNA motifs listed in TABLE 1 are known to enhance gene expression levels and hence have been treated as positively regulating motifs, while motifs such as SATB1, CDP and ARBP/MeCP2, listed in TABLE 4, are known to repress gene expression and hence have been referred to as negatively regulating motifs. Identification of CpG islands is done using the EMBOSS CpGplot program.
TABLE 4: DNA Patterns and motifs that are known to negatively regulate gene expression levels.

Motif name                     Pattern                                                        References
SATB1 binding site             TATTA[GCA]{1,2}TAATAA / AA[TA]TTCTAATAT                        10
CDP binding sites              AT[CT]GAT[TCA]A[ATGC][T/C] / [CT]GAT[TCA]A[ATGC][TC]           11,12,13
ARBP/MeCP2 binding regions     GGTGT                                                          14,15
Algorithm for identifying S/MAR sequences
The process to identify S/MAR sequences comprises four stages:
A. Determining enrichment of DNA motifs, listed in TABLE 1 and TABLE 4, in known S/MAR sequences
B. Determining enrichment of DNA motifs, listed in TABLE 1 and TABLE 4, and their significance values in intergenic sequences and their fragments
C. Initial selection of potential S/MAR sequences
D. Selecting the best S/MAR sequences

A. Determining enrichment of DNA motifs in known S/MAR sequences
GO ontology binding proteins help to maintain the overall size and shape of the nucleus. Hence, the genes for these proteins need to be constitutively expressed. Also, the transcription regulating genes need to be modulated according to the stage and type of the cell. Since S/MARs are sequences that are known both for stable expression and for regulating the expression levels of genes, the upstream intergenic regions of the genes coding for binding proteins and transcription regulating proteins are expected to be enriched with S/MAR motifs. Hence we have analyzed these upstream sequences for the presence of specific S/MAR motifs and determined their enrichment in relation to exon sequences, which are not expected to contain any S/MAR motifs as S/MARs are seen only in the non-coding region of the genome.
The method to calculate the enrichment of each of the DNA motifs associated with sequences known to modulate expression levels is as follows:
Calculating counts for each of the DNA motifs in sequences known to bind nuclear scaffold / matrix:
i. Obtain DNA sequences known to bind nuclear scaffold / matrix (total length of all sequences = SL).
ii. For each DNA motif Mi, where i = 1 to n, scan each known S/MAR sequence and get the count, CMi, for the motif Mi in that sequence.
iii. The total count for the motif Mi in all the sequences known to bind nuclear scaffold / matrix is SMi = Σ CMi over all sequences.
Calculating counts for each of the DNA motifs in eukaryotic exon sequences:
iv. Obtain human exon sequences. The total length of the exon sequences (NSL) should be equal to SL, the total length of all sequences known to bind nuclear scaffold / matrix.
v. For each DNA motif Mi, where i = 1 to n, scan each of the exon sequences and get the count, CMi, for the motif Mi in that sequence.
vi. The total count for the motif Mi in human exon sequences of length NSL is NSMi = Σ CMi over all human exon sequences considered.
Enrichment of each DNA motif Mi in DNA sequences known to bind nuclear scaffold / matrix is ESi = SMi / NSMi
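Steps i to vi and the ratio ESi = SMi / NSMi can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names are hypothetical, overlapping matches are counted, and an undefined enrichment (a motif absent from the exon set) is returned as `None`, a case the specification does not address.

```python
import re


def total_count(pattern, sequences):
    """Sum the per-sequence counts CMi of one motif over a set of sequences."""
    # Assumption: overlapping occurrences are counted (lookahead trick).
    return sum(
        len(re.findall(f"(?=(?:{pattern}))", seq.upper())) for seq in sequences
    )


def enrichment(pattern, smar_seqs, exon_seqs):
    """Compute ESi = SMi / NSMi for one motif.

    Requires the two sets to have the same total length (SL == NSL),
    as the method prescribes. Returns None when NSMi is zero.
    """
    sl = sum(len(s) for s in smar_seqs)
    nsl = sum(len(s) for s in exon_seqs)
    if sl != nsl:
        raise ValueError("S/MAR and exon sets must have equal total length")
    sm = total_count(pattern, smar_seqs)
    nsm = total_count(pattern, exon_seqs)
    return None if nsm == 0 else sm / nsm


if __name__ == "__main__":
    # Toy sequences, invented purely for illustration.
    smar = ["ATTATTA", "GGATTAG"]
    exon = ["GCGCATTAGCGC", "GC"]
    print(enrichment("ATTA", smar, exon))
```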

C. Initial selection of potential S/MAR sequences
Since three of the 12 positively regulating motifs occur with low counts in known S/MAR sequences, only the following 9 DNA motifs are used for selecting the potential S/MAR sequences from the human full-length intergenic sequences and their fragments. They are CUEs, HMG-I/Y protein binding sites, H-box, T-box, A-box, Topoisomerase II binding sites, Origin of replication, CTAT repeats-binding proteins regions and SAF-A binding region.
Also, among these 9 DNA motifs, we have observed in many known S/MAR sequences that the T-box, A-box and Topoisomerase II binding sites appear mutually exclusively. Hence these three DNA motifs have been clubbed together as a group motif, and the criterion is that any one of the three motifs has to be present. Hence, from the above, there are in total 7 DNA motifs: 6 individual motifs and 1 group motif. The 6 individual motifs are listed in TABLE 2 and the mutually exclusive motifs are listed in TABLE 3.
TABLE 3: DNA motifs that appear mutually exclusively.

Motif name                        Pattern                                                                                                                            References
T-Box                             TT[AT]T[AT]TT[AT]TT                                                                                                                3,2
A-Box                             AATAAA[TC]AAA                                                                                                                      3,2
Topoisomerase II binding sites    [AG][ATGC][TC][ATGC][ATGC]C[ATGC][ATGC]G[TC][ATGC]G[GT]T[ATGC][TC][ATGC][TC] / GT[ATGC][AT]A[CT]ATT[ATGC]AT[ATGC][ATGC][AG]        2,3,6
The criterion to select potential S/MAR sequences is that any full-length intergenic sequence or fragment should have a significance value >= 0.9 for at least 6 of the above 7 positively regulating motifs.
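The selection rule above (6 individual motifs plus 1 group motif, at least 6 of 7 passing) can be sketched as a small predicate. This is an illustrative sketch: the motif labels and function name are hypothetical, the significance values are taken as given inputs because their exact formula is not reproduced here, and treating the group motif as passing when any one of its three members reaches the threshold is an interpretation of "any one of the three motifs has to be present".

```python
# Hypothetical labels for the 6 individual motifs and the 3 grouped motifs.
INDIVIDUAL = ["CUE", "HMG-I/Y", "H-box", "Origin of replication", "CTAT", "SAF-A"]
GROUP = ["T-box", "A-box", "Topoisomerase II"]


def passes_initial_selection(significance, threshold=0.9, min_passing=6):
    """Apply the 'at least 6 of 7 motifs' criterion to one sequence.

    `significance` maps motif label -> significance value; missing
    labels are treated as 0.0. Assumption: the group motif counts once
    if any member reaches the threshold.
    """
    passed = sum(significance.get(m, 0.0) >= threshold for m in INDIVIDUAL)
    if any(significance.get(m, 0.0) >= threshold for m in GROUP):
        passed += 1
    return passed >= min_passing


if __name__ == "__main__":
    example = {m: 1.0 for m in INDIVIDUAL[:5]}  # five individual motifs pass
    example["A-box"] = 0.95                     # group motif passes via A-box
    print(passes_initial_selection(example))
```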
D. Selecting the best S/MAR sequences
Of the sequences and fragments that have passed the initial selection criteria, the following filters are applied, in the order given below, to get the best S/MAR sequences:
i. Sequences and fragments that have AT% >= 89
ii. Sequences and fragments that have value < 0.9 for all the three negatively regulating motifs
iii. Sequences and fragments that do NOT have values below 0.9 for SAF-A binding region or HMG-I/Y protein binding sites
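The three filters can be sketched as a single predicate applied in the stated order. This is a hedged illustration only: the function and label names are hypothetical, missing significance values are treated as 0.0 by assumption, and the AT% cut-off is left as a required argument rather than hard-coded.

```python
NEGATIVE = ("SATB1", "CDP", "ARBP/MeCP2")   # negatively regulating motifs (TABLE 4)
REQUIRED = ("SAF-A", "HMG-I/Y")             # motifs that must not fall below threshold


def at_percent(sequence):
    """Percentage of A and T bases in a sequence (0.0 for an empty one)."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("A") + seq.count("T")) / len(seq)


def passes_final_filters(sequence, significance, min_at_percent, threshold=0.9):
    """Apply filters i, ii and iii in order to one candidate sequence."""
    # i. AT content at or above the chosen cut-off.
    if at_percent(sequence) < min_at_percent:
        return False
    # ii. All negatively regulating motifs below the threshold.
    if any(significance.get(m, 0.0) >= threshold for m in NEGATIVE):
        return False
    # iii. SAF-A and HMG-I/Y must both be at or above the threshold.
    return all(significance.get(m, 0.0) >= threshold for m in REQUIRED)


if __name__ == "__main__":
    sig = {"SAF-A": 1.0, "HMG-I/Y": 0.95, "SATB1": 0.2}
    print(passes_final_filters("ATATATATGC", sig, min_at_percent=75))
```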

Significance of the motifs
HMG-I/Y protein binding sites: The cooperative binding of the high mobility group protein HMG-I/Y, which associates with histone acetyltransferases such as the general co-activator p300, may lead to the acetylation of nucleosome core histones and to the displacement of histone H1. In this respect, it is interesting to note that the cLysMAR has nucleosome positioning sequences and that it is bordered by potential HMG-I/Y binding sites.
SAF-A binding region: Tracts of homopolymeric stretches (n ≥ 3) of adenines (A-patch) and thymines (T-patch) are a characteristic landmark for MARs.
Statistics for genome-wide analysis of S/MAR sequences in human genome
The statistics for the count of human intergenic sequences and fragments generated, and the final count of sequences that have passed all the selection criteria, are as follows:
Human intergenic sequences taken for analysis (on 24 chromosomes): 31053
400 and 800 base fragments generated from the above intergenic sequences: 39636640
Best S/MAR sequences with 75% AT content and of 400 and 800 base lengths: 2951
Coverage of the intergenic sequence fragments by the DNA motifs
For the best S/MAR sequences with 75% AT content and of 400-base and 800-base lengths, the percentage of coverage of these sequences by the DNA motifs listed in TABLE 2 and TABLE 3 is given in TABLE 5 and TABLE 6.
TABLE 5: Coverage of the identified 400-base length S/MAR sequences by positively regulating DNA motifs

Coverage of the 400 base S/MAR sequences by the positively regulating DNA motifs    Coverage %
Maximum                                                                             100
Minimum                                                                             26.40
Average                                                                             62.43

TABLE 6: Coverage of the identified 800-base length S/MAR sequences by positively regulating DNA motifs

Coverage of the 800 base S/MAR sequences by the positively regulating DNA motifs    Coverage %
Maximum                                                                             100
Minimum                                                                             38.75
Average                                                                             63.55
The maximum and minimum percentage of coverage of the identified 400-base S/MAR sequences by the DNA motifs is 100 and 26.4 respectively. The average coverage percentage for 2151 400-base sequences is 62.43. FIG 1 shows the percentage of sequence coverage by DNA motifs in 400 base lengths S/MAR sequences arranged in descending order of the coverage percentage.
The maximum and minimum percentage of coverage of the identified 800-base S/MAR sequences by the DNA motifs is 100 and 38.75 respectively. The average coverage percentage for 800 800-base sequences is 63.55. FIG 2 shows the percentage of sequence coverage by DNA motifs in 800 base lengths S/MAR sequences arranged in descending order of the coverage percentage.
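One plausible way to compute a coverage percentage of this kind is sketched below. It is an assumption, not a definition taken from the specification: coverage is read here as the share of bases that fall inside at least one motif match, with overlapping matches counted once. The function name is hypothetical.

```python
import re


def motif_coverage_percent(sequence, patterns):
    """Percentage of bases lying inside at least one motif match.

    Assumption: a base covered by several (possibly overlapping)
    matches is counted once. Patterns must not contain their own
    capturing groups, because group 1 is used to read the match span.
    """
    seq = sequence.upper()
    if not seq:
        return 0.0
    covered = [False] * len(seq)
    for pattern in patterns:
        # Lookahead with a capturing group finds overlapping matches
        # while still exposing the span of each one.
        for match in re.finditer(f"(?=({pattern}))", seq):
            for i in range(match.start(1), match.end(1)):
                covered[i] = True
    return 100.0 * sum(covered) / len(seq)


if __name__ == "__main__":
    print(motif_coverage_percent("ATTAGGGGGG", ["ATTA"]))
```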
EXAMPLES
Example 1
Determining enrichment of DNA motifs, listed in TABLE 1 and TABLE 4, in known S/MAR sequences
Obtaining knowledge from known S/MAR sequences
• Get experimentally proved vertebrate S/MAR sequences from NCBI.
• Calculate the total length of the S/MAR sequences.
• Calculate the occurrence of each of the motifs in each of the sequence and tabulate them.
• For a particular motif, get the total number of times it is appearing in all the sequences.
Let us say, for example, that S/MAR1, S/MAR2, S/MAR3, S/MAR4 and S/MAR5 are known S/MAR sequences with a total length of 10 KB, and that the counts of motifs 1, 2, 3 and 4 in them are as given below.

Seq        Motif 1    Motif 2    Motif 3    Motif 4
S/MAR1     3          6          3          1
S/MAR2     5          2          6          4
S/MAR3     1          0          3          2
S/MAR4     8          4          3          0
S/MAR5     4          3          8          2
Total      21         15         23         9
Obtaining knowledge from Non-S/MAR sequences
• Get exon sequences such that the total length of the exons taken equals the total length of the S/MAR sequences considered above.
• Calculate the occurrence of each of the motifs in each of the sequences and tabulate them.
• For a particular motif, get the total number of times it appears in all the sequences.
Let us say, for example, that Non-S/MAR1, Non-S/MAR2, Non-S/MAR3, Non-S/MAR4 and Non-S/MAR5 are exon sequences with a total length of 10 KB, and that the counts of motifs 1, 2, 3 and 4 in them are as given below.

Seq            Motif 1    Motif 2    Motif 3    Motif 4
Non-S/MAR1     1          0          2          1
Non-S/MAR2     0          1          3          0
Non-S/MAR3     1          2          1          1
Non-S/MAR4     2          0          0          0
Non-S/MAR5     2          1          3          0
Total          6          4          8          2
Let us say that the S/MAR sequences and the exon sequences (non-S/MAR sequences) considered are each 10,000 bp long in total. Since the length of sequences considered is the same, dividing the number of times a motif appears in S/MAR sequences by the number of times the same motif appears in exon sequences (non-S/MAR sequences) gives the number of times the motif is enriched in S/MAR sequences relative to non-S/MAR or exon sequences.
So in the above, the number of times each of the motifs is enriched in S/MAR sequences when compared to non-S/MAR sequences is:
Motif 1 = 21/6 = 3.5
Motif 2 = 15/4 = 3.75
Motif 3 = 23/8 = 2.875
Motif 4 = 9/2 = 4.5
So, motifs 1, 2, 3 and 4 are 3.5, 3.75, 2.875 and 4.5 times more likely, respectively, to be present in S/MAR sequences than in non-S/MAR or exon sequences. Any sequence that contains any of the motifs at or above these thresholds is therefore a potential candidate to be a S/MAR sequence.
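The arithmetic of this example can be reproduced with a short script. The sketch below uses only the "Total" rows of the two tables in Example 1; the variable names are illustrative.

```python
# Total motif counts from the "Total" rows of the two tables in Example 1.
smar_totals = {"Motif 1": 21, "Motif 2": 15, "Motif 3": 23, "Motif 4": 9}
exon_totals = {"Motif 1": 6, "Motif 2": 4, "Motif 3": 8, "Motif 4": 2}

# Both sets have the same total length (10,000 bp), so the enrichment
# is simply the ratio of the two totals for each motif.
enrichment = {
    motif: smar_totals[motif] / exon_totals[motif] for motif in smar_totals
}

for motif, value in enrichment.items():
    print(f"{motif}: {value}")
```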
Example 2
Determining enrichment of DNA motifs, listed in TABLE 1 and TABLE 4, and their significance values in intergenic sequences and their fragments
We take the eukaryotic intergenic sequences and calculate the occurrence of each of the motifs in these sequences. For each sequence, we calculate the motif occurrences in three ways:
• Complete intergenic sequence
• For each intergenic sequence, starting from the upstream gene side, make 400-base fragments with an overlapping window of 200 bases
• For each intergenic sequence, starting from the upstream gene side, make 800-base fragments with an overlapping window of 400 bases
The number of times that the motifs are appearing in the intergenic sequence or fragment will be normalized to the total length of the exon sequences taken (same as the total length of the S/MAR sequences taken) so that we have the count of motifs in equal lengths of the test sequence, the exon sequences and known S/MAR sequences. The enrichment values for each of the DNA motifs in the test sequence is calculated by comparing the counts of that motif in the test sequence with the count of that motif in the exon sequences. Finally, the significance of the sequence is checked by comparing the enrichment level of each motif in the test sequence with the enrichment level of that motif in known S/MAR sequences.
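The length normalization and the test-sequence enrichment described above can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, "comparing the counts" is read as a ratio in line with the enrichment ratio used for known S/MAR sequences, and the final significance step is left out because its exact formula is not given in this passage. The numbers in the usage lines are illustrative.

```python
def normalized_count(count, test_length, reference_length):
    """Scale a motif count from a test sequence to the reference length.

    `reference_length` is the total length of the exon set (equal to
    the total length of the known S/MAR set).
    """
    if test_length <= 0:
        raise ValueError("test_length must be positive")
    return count * reference_length / test_length


def fragment_enrichment(count, test_length, reference_length, exon_count):
    """Enrichment of one motif in a test sequence relative to exons.

    Assumption: enrichment is the normalized count divided by the exon
    count for the same motif. Returns None if the exon count is zero.
    """
    if exon_count == 0:
        return None
    return normalized_count(count, test_length, reference_length) / exon_count


if __name__ == "__main__":
    # Illustrative values: 6 hits in a 2,000-base test sequence, a
    # 10,000-base reference length, and 6 hits in the exon set.
    print(normalized_count(6, 2000, 10000))
    print(fragment_enrichment(6, 2000, 10000, 6))
```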
For the intergenic sequences, we have calculated the S/MAR potential for the full-length sequence. To find a region within this complete sequence that can be a S/MAR sequence, we calculate the enrichment of each of the motifs in fragments of different lengths with specific overlaps. We consider overlapping fragments because, if we simply split the sequence at a specific length without overlaps, a potential S/MAR sequence could be split between two fragments.
Suppose we have the following sequence, in which the S/MAR region is highlighted as 'S/MARsequence':
S/MARsequence
Now, if we just split the sequence at specified lengths, it could be split as
S/MARs equence
Since S/MARs are known to have a minimum sequence length of 200 to 300 base pairs, we started with 400 base fragments. We generated 400 and 800 base fragments. The 400 base fragments were generated with an overlapping window of 200 bases starting from the upstream gene side. The 800 base fragments were generated with an overlapping window of 400 bases starting from the upstream gene side. Our bioinformatics programs can generate fragments of any length from a given sequence and with any amount of overlap among the fragments. Our methodology can be applied to a DNA sequence of any length.
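Fragment generation of this kind can be sketched as below. This is a minimal illustration, not the actual program; the function name and the decision to keep a shorter final fragment when the sequence length is not an exact fit are assumptions of the sketch.

```python
def overlapping_fragments(sequence, length, overlap):
    """Split a sequence into fragments of `length` bases that overlap by
    `overlap` bases, starting from the first base (the upstream gene side).

    Returns a list of (start_position, fragment) tuples. The last fragment
    may be shorter than `length` if the sequence does not divide evenly.
    """
    if not 0 <= overlap < length:
        raise ValueError("overlap must be non-negative and smaller than length")
    step = length - overlap
    fragments = []
    for start in range(0, len(sequence), step):
        fragments.append((start, sequence[start:start + length]))
        # Stop once a fragment has reached the end of the sequence.
        if start + length >= len(sequence):
            break
    return fragments


# A dummy 2,000-base sequence, used only to show the fragment boundaries.
dummy = "A" * 2000
starts_800 = [start for start, _ in overlapping_fragments(dummy, 800, 400)]
print(starts_800)
```

For a 2,000-base sequence, 800 base fragments with 400 base overlaps start at positions 0, 400, 800 and 1200, so an S/MAR that straddles one fragment boundary lies wholly inside the neighbouring fragment.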
Let us take a 2.0 kb sequence. This sequence is analyzed as:
Complete intergenic sequence
400 base fragments with 200 base overlaps
800 base fragments with 400 base overlaps
Calculating the occurrence of each of the motifs in the complete sequence and the 400 and 800 base fragments

Sequence              Motif 1   Motif 2   Motif 3   Motif 4

Complete              6         2         3         4

400 base fragments
1st part              1         0         0         1
2nd part              0         0         1         0
3rd part              2         1         1         0
4th part              1         0         0         1
5th part              2         1         1         2

800 base fragments
1st overlap           1         0         1         1
2nd overlap           2         1         2         0
3rd overlap           3         1         1         1
4th overlap           3         2         1         3

Motif enrichment in the complete sequence:
Motif 1 appears 6 times in 2 kb. Therefore, over a 10 kb length, it would appear 30 times. So the enrichment of motif 1 in this sequence compared with non-S/MAR sequence is 30/6 = 5 [Note: 6 is the number of times motif 1 appears in 10 kb of non-S/MAR sequence, from Example 1].
Likewise, motifs 2, 3 and 4 have enrichments of 2.5, 1.875 and 10 respectively.
Note: The enrichment for motifs 1 - 4 in known S/MAR sequences is 3.5, 3.75, 2.875 and 4.5 times respectively. Here motifs 1 and 4 are more enriched in the test sequence than in known S/MAR sequences.
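The normalization and enrichment steps for the complete sequence can be reproduced with the short Python sketch below. It uses only the example counts given above; the function names are assumptions of this sketch.

```python
REFERENCE_LENGTH = 10_000  # total length of the exon (non-S/MAR) sequences

# Counts per 10 kb of exon sequence, from Example 1.
exon_counts = {"Motif 1": 6, "Motif 2": 4, "Motif 3": 8, "Motif 4": 2}

# Counts observed in the complete 2 kb test sequence.
sequence_counts = {"Motif 1": 6, "Motif 2": 2, "Motif 3": 3, "Motif 4": 4}
sequence_length = 2_000


def normalized_count(count, length, reference_length=REFERENCE_LENGTH):
    """Scale a motif count to the length of the reference (exon) sequences."""
    return count * reference_length / length


def sequence_enrichment(counts, length, exon):
    """Normalized count in the test sequence divided by the exon count."""
    return {
        motif: normalized_count(count, length) / exon[motif]
        for motif, count in counts.items()
    }


enrichment_complete = sequence_enrichment(sequence_counts, sequence_length, exon_counts)
print(enrichment_complete)
```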
Calculating the significance values for each motif:
The significance of the enrichment value for each DNA motif in the test sequence is calculated by dividing the enrichment value of that DNA motif in the test sequence by the enrichment value of that DNA motif in known S/MAR sequences.
From the above, the significance values for motifs 1 - 4 in the complete sequence are:
Significance value for motif 1 is 5/3.5 = 1.43
Significance value for motif 2 is 2.5/3.75 = 0.67
Significance value for motif 3 is 1.875/2.875 = 0.65
Significance value for motif 4 is 10/4.5 = 2.22
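The significance calculation is a single division per motif, as the following Python sketch shows. The input values are the enrichment values from the worked example; the names used are assumptions of this sketch.

```python
# Enrichment of each motif in the complete test sequence (worked example).
sequence_enrichment = {"Motif 1": 5.0, "Motif 2": 2.5, "Motif 3": 1.875, "Motif 4": 10.0}

# Enrichment of each motif in known S/MAR sequences (Example 1).
smar_enrichment = {"Motif 1": 3.5, "Motif 2": 3.75, "Motif 3": 2.875, "Motif 4": 4.5}


def significance(test, known):
    """Enrichment in the test sequence divided by enrichment in known S/MARs."""
    return {motif: test[motif] / known[motif] for motif in test}


significance_complete = significance(sequence_enrichment, smar_enrichment)
for motif, value in significance_complete.items():
    print(motif, round(value, 2))
```

A value above 1 means the motif is more enriched in the test sequence than in the known S/MAR sequences, and a value below 1 means it is less enriched.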
Motif enrichment in 400 base fragments:
For the first 400 base fragment, motif 1 appears 1 time. So, when it is normalized to 10 kb, it would appear
10000/400 * 1 = 25 times.
Likewise, the first 400 base fragment would contain motifs 2, 3 and 4 at 0, 0 and 25 times respectively.
The complete TABLE for all the 400 bp fragments is given below

Fragment              Motif 1   Motif 2   Motif 3   Motif 4
1st part              25        0         0         25
2nd part              0         0         25        0
3rd part              50        25        25        0
4th part              25        0         0         25
5th part              50        25        25        50
A 10 kb non-S/MAR fragment contains 6, 4, 8 and 2 occurrences of motifs 1, 2, 3 and 4 respectively (from Example 1). Dividing the normalized counts by these values gives the enrichment of each motif in each fragment:

Fragment              Motif 1 enrichment   Motif 2 enrichment   Motif 3 enrichment   Motif 4 enrichment
1st part              4.17                 0                    0                    12.5
2nd part              0                    0                    3.125                0
3rd part              8.33                 6.25                 3.125                0
4th part              4.17                 0                    0                    12.5
5th part              8.33                 6.25                 3.125                25
The enrichment for motifs 1 - 4 calculated from known S/MAR sequences is 3.5, 3.75, 2.875 and 4.5 times respectively.
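The per-fragment enrichment values can be regenerated from the raw fragment counts with a sketch such as the following. It assumes the counts listed for the five 400 base parts and the exon counts from Example 1; the data layout and names are choices made for this illustration only.

```python
REFERENCE_LENGTH = 10_000      # length of the exon sequences taken
EXON_COUNTS = [6, 4, 8, 2]     # motifs 1-4 per 10 kb of exon sequence

# Raw counts of motifs 1-4 in each 400 base fragment (worked example).
fragment_counts_400 = {
    "1st part": [1, 0, 0, 1],
    "2nd part": [0, 0, 1, 0],
    "3rd part": [2, 1, 1, 0],
    "4th part": [1, 0, 0, 1],
    "5th part": [2, 1, 1, 2],
}


def fragment_enrichment(counts, fragment_length):
    """Normalize each count to the reference length, then divide by the
    corresponding exon count."""
    scale = REFERENCE_LENGTH / fragment_length
    return [count * scale / exon for count, exon in zip(counts, EXON_COUNTS)]


enrichment_400 = {
    name: fragment_enrichment(counts, 400)
    for name, counts in fragment_counts_400.items()
}
for name, values in enrichment_400.items():
    print(name, [round(v, 3) for v in values])
```

Passing a fragment length of 800 to the same function handles the 800 base fragments.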
Motif enrichment in 800 base fragments
For the first 800 base fragment, motif 1 appears 1 time. So, when it is normalized to 10 kb (the length of the exon sequences taken), it would appear
10000/800 * 1 = 12.5 times.
Likewise, the first 800 base fragment would contain motifs 2, 3 and 4 at 0, 12.5 and 12.5 times respectively.
The complete TABLE for all the 800 base fragments is given below

Fragment              Motif 1   Motif 2   Motif 3   Motif 4
1st overlap           12.5      0         12.5      12.5
2nd overlap           25        12.5      25        0
3rd overlap           37.5      12.5      12.5      12.5
4th overlap           37.5      25        12.5      37.5
A 10 kb non-S/MAR fragment contains 6, 4, 8 and 2 occurrences of motifs 1, 2, 3 and 4 respectively (from Example 1). Dividing the normalized counts by these values gives the enrichment of each motif in each fragment:

Fragment              Motif 1 enrichment   Motif 2 enrichment   Motif 3 enrichment   Motif 4 enrichment
1st overlap           2.08                 0                    1.5625               6.25
2nd overlap           4.17                 3.125                3.125                0
3rd overlap           6.25                 3.125                1.5625               6.25
4th overlap           6.25                 6.25                 1.5625               18.75
The enrichment for motifs 1 - 4 calculated from known S/MAR sequences is 3.5, 3.75, 2.875 and 4.5 times respectively.
Calculating the significance values for each motif:
The significance of the enrichment value for each DNA motif in each fragment is calculated in the same way as for the complete sequence, by dividing the enrichment value of that DNA motif in the fragment by the enrichment value of that DNA motif in known S/MAR sequences.
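As an illustration only, the fragment-level significance values can be derived from the example counts as sketched below for the 800 base fragments. The resulting numbers are not stated in the text above; they follow from applying the described division to the example data, and the names used are assumptions of this sketch.

```python
REFERENCE_LENGTH = 10_000
EXON_COUNTS = [6, 4, 8, 2]                    # motifs 1-4 per 10 kb of exon sequence
KNOWN_ENRICHMENT = [3.5, 3.75, 2.875, 4.5]    # motifs 1-4 in known S/MAR sequences

# Raw counts of motifs 1-4 in each 800 base fragment (worked example).
fragment_counts_800 = {
    "1st overlap": [1, 0, 1, 1],
    "2nd overlap": [2, 1, 2, 0],
    "3rd overlap": [3, 1, 1, 1],
    "4th overlap": [3, 2, 1, 3],
}


def fragment_significance(counts, fragment_length):
    """Return the significance of motifs 1-4 for one fragment."""
    values = []
    for count, exon, known in zip(counts, EXON_COUNTS, KNOWN_ENRICHMENT):
        # Enrichment: count normalized to the reference length, over the exon count.
        enrichment = count * REFERENCE_LENGTH / fragment_length / exon
        # Significance: fragment enrichment over known S/MAR enrichment.
        values.append(enrichment / known)
    return values


significance_800 = {
    name: fragment_significance(counts, 800)
    for name, counts in fragment_counts_800.items()
}
for name, values in significance_800.items():
    print(name, [round(v, 2) for v in values])
```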
REFERENCES:
1. Bode et al, Transcriptional Augmentation; Modulation of Gene Expression by Scaffold/Matrix Attached Regions (S/MAR Elements), Crit Rev Eukaryot Gene Expr. 2000; 10(1): 73-90
2. Pierre-Alain Girod et al., Use of scaffold/matrix-attachment regions for protein production. Chapter 10, S.C. Makrides (Ed.) Gene Transfer and Expression in Mammalian Cells
3. Liebich et al, Evaluation of sequence motifs found in scaffold/matrix-attached regions (S/MARs), Nucleic Acids Res. 2002 August 1; 30(15): 3433-3442
4. Frisch et al., In Silico Prediction of Scaffold/Matrix Attachment Regions in Large Genomic Sequences, Genome Research 12:349-354
5. Pierre Rollini et al. Identification and characterization of nuclear matrix-attachment regions in the human serpin gene cluster at 14q32.1, Nucleic Acids Research 27:3779-3791 (1999)
6. Cornelis M. van Drunen et al, Analysis of the chromatin domain organisation around the plastocyanin gene reveals an MAR-specific sequence element in Arabidopsis thaliana, Nucleic Acids Research, 1997, Vol. 25, No. 19
7. Loc Phi-Van and Wolf H. Strätling, The matrix attachment regions of the chicken lysozyme gene co-map with the boundaries of the chromatin domain, The EMBO Journal 1988, vol. 7 no. 3 pp. 655-664
8. Bode et al, Transcription-promoting genomic sites in mammalia: their elucidation and architectural principles, Gene Ther Mol Biol Vol 1, 551-580. March 1998
9. Ken Tsutsui, Synthetic concatemers as artificial MAR: importance of a particular configuration of short AT-tracts for protein recognition, Gene Ther Mol Biol Vol 1, 581-590. March, 1998
10. Prabhat Kumar Purbey et al, PDZ domain-mediated dimerization and homeodomain-directed specificity are required for high-affinity DNA binding by SATB1, Nucleic Acids Research, 2008, Vol. 36, No. 7 2107-2122
11. Andres et al, A new bipartite DNA-binding domain: cooperative interaction between the cut repeat and homeo domain of the cut homeo proteins, Genes Dev. 1994 8:245-257
12. Scott Pattison et al, CCAAT Displacement Protein, a Regulator of Differentiation-Specific Gene Expression, Binds a Negative Regulatory Element within the 5' End of the Human Papillomavirus Type 6 Long Control Region, J. Virology, Mar. 1997, p. 2013-2022
13. Nongnit Teerawatanasuk et al., CCAAT Displacement Protein (CDP/Cut) Binds a Negative Regulatory Element in the Human Tryptophan Hydroxylase Gene, J. Neurochem. Vol. 72, No. 1, 1999
14. Buhrmester et al. Chicken MAR-Binding Protein ARBP Is Homologous to Rat Methyl-CpG-Binding Protein MeCP2, Molecular and Cellular Biology, Sept. 1997, Vol. 17, No. 9
15. Schubeler D, Mielke C, Maass K, Bode J, Scaffold/matrix-attached regions act upon transcription in a context-dependent manner, Biochemistry. 1996 Aug 27; 35(34): 11160-9
16. Amelia K. Linnemann, Adrian E. Platts and Stephen A. Krawetz, Differential nuclear scaffold/matrix attachment marks expressed genes. Human Molecular Genetics, 2009, Vol. 18, No. 4 645-654
17. Mielke C, Maass K, Tümmler M, Bode J., Anatomy of highly expressing chromosomal sites targeted by retroviral vectors, Biochemistry. 1996 Feb 20; 35(7): 2239-52
18. Gautam B. Singh, Jeffrey A. Kramer and Stephen A. Krawetz, Mathematical model to predict regions of chromatin attachment to the nuclear matrix. Nucleic Acids Research, 1997, Vol. 25, No. 7 1419-1425
19. Dickinson LA, Joh T, Kohwi Y, Kohwi-Shigematsu T., A tissue-specific MAR/SAR DNA-binding protein with unusual binding site recognition. Cell. 1992 Aug 21; 70(4): 631-45
20. CM van Drunen, RG Sewalt, RW Oosterling, PJ Weisbeek, SC Smeekens and R van Driel, A bipartite sequence element associated with matrix/scaffold attachment regions, Nucleic Acids Research, Vol 27, Issue 14 2924-2930
21. Stephen Rudd, Matthias Frisch, Korbinian Grote, Blake C. Meyers, Klaus Mayer, and Thomas Werner, Genome-Wide in Silico Mapping of Scaffold/Matrix Attachment Regions in Arabidopsis Suggests Correlation of Intragenic Scaffold/Matrix Attachment Regions with Gene Expression, Plant Physiology, June 2004, Vol. 135, pp. 715-722
22. Mark H. Kaplan, Rui-Ting Zong, Richard F. Herrscher, Richard H. Scheuermann, and Philip W. Tucker, Transcriptional Activation by a Matrix Associating Region-binding Protein, The Journal of Biological Chemistry, Vol. 276, No. 24, Issue of June 15, pp. 21325-21330, 2001
23. Luis A. Fernández, Michael Winkler, and Rudolf Grosschedl, Matrix Attachment Region-Dependent Function of the Immunoglobulin μ Enhancer Involves Histone Acetylation at a Distance without Changes in Enhancer Occupancy, Molecular and Cellular Biology, January 2001, Vol. 21, No. 1, p. 196-208
24. Joost H. A. Martens, Matty Verlaan, Eric Kalkhoven, Josephine C. Dorsman, and Alt Zantema, Scaffold/Matrix Attachment Region Elements Interact with a p300-Scaffold
25. Buhrmester et al. Nuclear matrix protein ARBP recognizes a novel DNA sequence motif with high affinity, Biochemistry 1995 Mar 28;34(12):4108-117
26. Buhrmester et al, Attachment Factor A Complex and Are Bound by Acetylated Nucleosomes, Molecular and Cellular Biology, Apr. 2002, p. 2598-2606
27. Matthias C. Huber, Gudrun Krüger and Constanze Bonifer, Genomic position effects lead to an inefficient reorganization of nucleosomes in the 5'-regulatory region of the chicken lysozyme locus in transgenic mice. Nucleic Acids Research, 1996, Vol. 24, No. 8 1443-1452
28. US patent application publication US 2007/0178469 A1 - High efficiency gene transfer and expression in mammalian cells by a multiple transfection procedure for MAR sequences.
29. Ian de Belle, Shutao Cai, and Terumi Kohwi-Shigematsu, The Genomic Sequences Bound to Special AT-rich Sequence-binding Protein 1 (SATB1) In Vivo in Jurkat T Cells Are Tightly Associated with the Nuclear Matrix at the Bases of the Chromatin Loops, The Journal of Cell Biology, Volume 141, Number 2, April 20, 1998 335-348
Web site references
SMARTest: http://www.genomatix.de/online help/help gems/SMARTest.html
S/MARt DB: http://www.bioinfo.de/isb/gcb99/poster/liebich/
http://smartdb.bioinfmed.uni-goettingen.de/cgi-bin/SMARtDB/smar.cgi
We Claim:
1. An identified, purified and isolated DNA sequence that contains
a. At least 6 of the 7 positively regulating motifs listed in TABLE 1, in any combination and each of the DNA motifs appearing any number of times
b. Where the positively regulating DNA motifs considered for initial selection are CUEs, HMG-I/Y protein binding sites, H-box, Origin of replication, CTAT repeats-binding proteins regions and SAF-A binding region and any one of the three DNA motifs, T-box, A-box and Topoisomerase II binding sites.
c. At least 74 percent of the sequence is comprised of the nucleotides A and T.
2. An isolated nucleic acid comprising one or more sequences selected from the group consisting of SEQ ID NO: 1-100, complements, variants, and functional fragments thereof and sequences being at least 70% homologous thereto.
3. The isolated nucleic acid of claim 2, wherein the one or more sequences comprise S/MAR sequences.
4. The isolated nucleic acid of claim 3, wherein the one or more S/MAR sequences increase expression of a biomolecule when said sequences are used in an expression system.
5. The isolated nucleic acid of claim 3, wherein the one or more S/MAR sequences increase
expression of the biomolecule in both orientations of said sequences in the expression
system.
6. The isolated nucleic acid of claim 3, wherein the one or more S/MAR sequences contain
one or more nucleotide sequence motifs.
7. The isolated nucleic acid of claim 3, wherein the one or more nucleotide sequence motifs includes at least one AT-rich nucleotide motif
8. A method for constructing an expression vector having increased expression efficiency, the method comprising inserting the isolated nucleic acid of claim 1 into an expression vector.

9. The method according to claim 8, wherein the expression vector is a mammalian expression system or any other expression system including vectors, cell lines, etc.
10. A vector comprising the isolated nucleic acid of claim 2.
11. A method for producing a recombinant host cell, the method comprising introducing the isolated nucleic acid of claim 1 or the expression vector of claim 9 into a host cell.
12. The method according to claim 11, wherein the isolated nucleic acid or the expression
vector is introduced by way of transfection.
13. The method according to claim 11, wherein the isolated nucleic acid gets integrated with the genome of the recombinant host cell upon transfection.
14. A host cell produced according to the method of claim 11.
15. The host cell according to claim 14, wherein said host cell is a eukaryotic cell.
16. The host cell according to claim 14, wherein said host cell is a mammalian cell.
17. An expression vector comprising a nucleic acid molecule that comprises (a) a sequence encoding a protein operably linked to one or more expression control elements and (b) one or more S/MAR sequences selected from the group consisting of SEQ ID NO: 1-100, complements, variants, and functional fragments thereof and sequences being at least 70% homologous thereto.
18. The expression vector of claim 10, wherein the one or more expression control elements comprise at least one of transcriptional promoter, transcriptional enhancer, transcriptional termination sequence, transcriptional repressor, polyadenylation site, origin of replication site, translation initiation signal and translation termination signal.
19. The expression vector of claim 18, wherein the one or more S/MAR sequences are located upstream of the transcriptional promoter.
20. The expression vector of claim 18, wherein the one or more S/MAR sequences are located downstream of the transcriptional promoter.
21. The expression vector of claim 18, wherein the one or more S/MAR sequences are located downstream of the translation termination signal.
22. The vector of claim 18, wherein the one or more S/MAR sequences are located upstream of the transcriptional promoter and downstream of the translation termination signal.

23. The vector of claim 18, wherein the one or more S/MAR sequences are located downstream of the transcriptional promoter and of the translation termination signal.
24. The expression vector of claim 18, wherein the one or more S/MAR sequences are located at a distance of 0 to 10 KB from the sequence encoding biomolecule.
25. The expression vector of claim 18, wherein the one or more S/MAR sequences are located at a distance of 0 to 10 KB from the origin of replication site.
26. A method for producing a protein, the method comprising the steps of (a) transfecting a mammalian cell with an expression vector comprising (I) a sequence encoding the protein and (II) one or more S/MAR sequences selected from the group consisting of SEQ ID NO: 1-100, complements, variants and functional fragments thereof and sequences being at least 70% homologous thereto; (b) culturing the transfected mammalian cell under conditions suitable for expression of the protein; and (c) isolating the expressed protein.
27. A factor which influences the activity of one or more S/MAR sequences, wherein the one or more S/MAR sequences comprises one or more sequences selected from the group consisting of SEQ ID NO: 1-100, complements, variants, and functional fragments thereof and sequences being at least 70% homologous thereto.
28. The factor of claim 27, wherein said factor is at least one of a genetic factor, or an epigenetic factor.
29. A method for identification of S/MAR sequences using bioinformatics programs comprising computing values for each of the DNA motifs listed in TABLE 1 and TABLE 4.
30. The method of claim 29, wherein said bioinformatics programs contain algorithms to calculate the nuclear scaffold / matrix binding potential of the intergenic sequence and their fragments.
31. The method of claim 30, wherein said algorithm is based on
a. Scanning for the presence of each of the DNA motifs listed in TABLE 1 and TABLE 4, in DNA sequences known to bind nuclear scaffold / matrix region, human exon sequences and intergenic sequence or their fragments. A region of the sequence might contain only one DNA motif listed in TABLE 1 and TABLE 4, or more than one DNA motif may share the same region of the sequence.
b. Computing enrichment values for each of the DNA motifs listed in TABLE 1 and TABLE 4, in DNA sequences known to bind nuclear scaffold / matrix in comparison to human or any eukaryotic exon sequences.
c. Computing enrichment values for each of the DNA motifs listed in TABLE 1 and TABLE 4, in intergenic sequence or their fragments in comparison to human or any eukaryotic exon sequences.
d. Calculating the significance of the enrichment of each of the DNA motifs in the intergenic sequence or their fragments by comparing the enrichment values of each of the DNA motifs listed in TABLE 1 and TABLE 4, between the intergenic sequences and fragments and the known S/MAR sequences.
32. The DNA motifs of claim 30 are divided into two categories as, positively regulating
motifs and negative regulating motifs.
a. The positive regulating motifs are listed in TABLE 1
b. The negative regulating motifs are listed in TABLE 4
33. The method of claim 30, wherein the method to calculate the enrichment of each of the DNA motifs listed in TABLE 1 and TABLE 4 in DNA sequences known to bind nuclear scaffold / matrix is
a. DNA sequences known to bind nuclear scaffold / matrix (Total length = SL)
b. Motif scan for each of the DNA motifs listed in TABLE 1 and TABLE 4, Mi
c. Total number of motif Mi in all the sequences, SMi = Σ CMi of all sequences
d. Obtain human exon sequences (Total length = NSL, and NSL is equal to SL)
e. Motif scan for each of the DNA motifs listed in TABLE 1 and TABLE 4, Mi
f. Total number of each DNA motif Mi in human exon sequences, NSMi = Σ CMi of all sequences
Enrichment for each motif Mi in DNA sequences known to bind nuclear scaffold / matrix is ESi = SMi / NSMi
34. The method of claim 30, wherein the method to calculate the enrichment of each of the DNA motifs listed in TABLE 1 and TABLE 4, in intergenic sequence or its fragments is
a. Intergenic sequence (Length = TL)
b. For each motif Mi, TMi = Count of motif
c. For each motif Mi, Normalized count Ni = NSL * TMi / TL
d. For each motif Mi, Enrichment, Ei = Ni / NSMi
35. The method of claim 30, wherein the method to calculate the significance of each of the DNA motifs listed in TABLE 1 and TABLE 4, in the intergenic sequence or its fragments is
For each DNA motif Mi, Significance, Si = Ei / ESi
36. The method of claim 30, wherein the intergenic sequence or fragment has a significance value >= 0.9 for all 6 of the positively regulating DNA motifs listed in TABLE 2 and at least one of the positively regulating DNA motifs listed in TABLE 3.
37. The method of claim 30, wherein the intergenic sequence or fragment has significance value < 0.9 for all the three negatively regulating motifs listed in TABLE 4.
38. The method of claim 30, wherein the intergenic sequence or fragment does NOT have a value below 0.9 for the SAF-A binding region and HMG-I/Y protein binding site DNA motifs.
39. The identified, purified and isolated DNA sequence of claim 1, with at least 26.5 % of length covered by the DNA motifs listed in TABLE 2 and TABLE 3.
40. A method for identifying S/MAR sequences by using or by modifying any of the claims 30 - 39, either singly or in part or in combination thereof or variants.
41. A method of claim 30, wherein the method for identifying S/MAR sequences comprises using one or more eukaryotic exon sequences as non-S/MAR sequence, either independently or in combination.
42. A method of claim 30, wherein the method for identifying S/MAR sequences comprises using enrichment of any DNA motifs in intergenic sequences or its fragments in comparison to one or more eukaryotic exon sequences, either independently or in combination.

43. A method of claim 30, wherein the method for identifying S/MAR sequences comprises
using significance values obtained by comparing enrichment of DNA motifs in intergenic
sequences or its fragments with the enrichment of DNA motifs in known scaffold / matrix
binding sequences.
44. The method of claim 30, wherein the said bioinformatics programs contain algorithms and methods to
a. Obtain overlapping DNA fragments of specified length
b. Scan for the presence of each of the DNA motifs mentioned in TABLE 1 and TABLE
4 in nucleotide sequences.
c. Scan each of the nucleotide sequences using one or more sequence patterns for each
of the DNA motifs mentioned in TABLE 1 and TABLE 4.
d. Compute the enrichment values and significance values of each of the DNA motifs
mentioned in TABLE 1 and TABLE 4 in each of the nucleotide sequences.
e. Calculate the percentages of different nucleotides in a nucleotide sequence.
45. The method of claim 32, which can be applied to any eukaryotic intergenic sequences
46. The method of claim 32, which can be applied to a DNA sequence of any length.
47. Claim any nucleic acid sequence combination of modified or unmodified nucleotide(s) and / or other chemical modifications that may be employed for binding to S/MAR sequence for regulating expression of the biomolecule.

Documents

Application Documents

# Name Date
1 3007-che-2009 abstract 07-12-2009.pdf 2009-12-07
1 3007-che-2009 form-5 07-12-2009.pdf 2009-12-07
2 3007-che-2009 claims 07-12-2009.pdf 2009-12-07
2 3007-che-2009 form-3 07-12-2009.pdf 2009-12-07
3 3007-che-2009 description(complete) 07-12-2009.pdf 2009-12-07
3 3007-che-2009 form-2 07-12-2009.pdf 2009-12-07
4 3007-che-2009 drawings 07-12-2009.pdf 2009-12-07
4 3007-che-2009 form-1 07-12-2009.pdf 2009-12-07
5 3007-che-2009 correspondence others 07-12-2009.pdf 2009-12-07
6 3007-che-2009 drawings 07-12-2009.pdf 2009-12-07
6 3007-che-2009 form-1 07-12-2009.pdf 2009-12-07
7 3007-che-2009 description(complete) 07-12-2009.pdf 2009-12-07
7 3007-che-2009 form-2 07-12-2009.pdf 2009-12-07
8 3007-che-2009 claims 07-12-2009.pdf 2009-12-07
8 3007-che-2009 form-3 07-12-2009.pdf 2009-12-07
9 3007-che-2009 abstract 07-12-2009.pdf 2009-12-07
9 3007-che-2009 form-5 07-12-2009.pdf 2009-12-07