Abstract: The present invention relates to a nucleotide sequence having a SNP associated with virulence, infectivity and/or latency for all infectious diseases, more particularly tuberculosis. The present invention also includes a method for the identification and selection of polymorphisms associated with virulence and/or infectivity in infectious diseases, more particularly in tuberculosis, by a comparative genomic analysis of the sequences of different clinical isolates/strains of infectious organisms. Figs. 1 to 3
FIELD OF THE INVENTION
The instant invention pertains to novel single nucleotide polymorph(s) corresponding to gene sequences associated with virulence in members of the genus Mycobacteria and particularly in members of the mycobacterial complex Mycobacterium tuberculosis. The invention also provides nucleotide fragment(s) corresponding to the genomic and/or coding regions of these genes, which comprise at least one polymorphic site per fragment. Further, the instant invention also includes a method for the identification and selection of polymorphisms associated with virulence and/or infectivity in infectious diseases by a comparative genomic analysis of the sequences of different clinical isolates/strains of infectious organisms. More particularly, the invention relates to 477 polymorphs for 46 spoligotyped strains. The invention further relates to diagnostic and therapeutic methods for applying these novel single nucleotide polymorph(s) to the diagnosis, treatment, and/or prevention of various diseases and/or disorders, particularly tuberculosis and other related symptoms. The invention also relates to the identification of virulence factors of M. tuberculosis strains and other infectious organisms to be included in a diagnostic DNA chip allowing identification of the strain, typing of the strain and finally giving an indication of its potential degree of virulence. The regions of polymorphism can also act as potential drug targets and vaccine targets.
BACKGROUND OF THE INVENTION
Tuberculosis (TB), a global epidemic, is still a major cause of death worldwide. There were an estimated 8.8 million new TB cases in 2005, 7.4 million in Asia and sub-Saharan Africa. A total of 1.6 million people died of TB, including 195 000 patients infected with HIV. TB prevalence and death rates have probably been falling globally for several years. In 2005, the TB incidence rate was stable or in decline in all six WHO regions, and had reached a peak worldwide. However, the total number of new TB cases was still rising slowly, because the caseload continued to grow in the African, Eastern Mediterranean and South-East Asia regions.
Tuberculosis is caused by infection with Mycobacterium tuberculosis, wherein the tubercle bacilli are inhaled and then ingested by alveolar macrophages. Several species of Mycobacteria which are known to be pathogenic to humans and/or animals are non-spore forming, rod-shaped, acid-fast, aerobic bacilli. Although the disease has a worldwide presence and has been identified as a global health problem, little is known about the molecular basis of tuberculosis pathogenesis. As is the case with most pathogens, infection with M. tuberculosis does not always result in disease. The infection is often arrested by developing cell-mediated immunity (CMI), resulting in the formation of microscopic lesions, or tubercles, in the lung. If CMI does not limit the spread of M. tuberculosis, caseous necrosis, bronchial wall erosion, and pulmonary cavitations may occur. The factors that determine whether infection with M. tuberculosis results in disease are not well understood and, therefore, determining the factors associated with its virulence is of prime importance.
The tuberculosis complex is a group of four mycobacterial species that are so closely related genetically that it has been proposed that they be combined into a single species. Three important members of the complex are Mycobacterium tuberculosis, the major cause of human tuberculosis; Mycobacterium africanum, a major cause of human tuberculosis in some populations; and Mycobacterium bovis, the cause of bovine tuberculosis. None of these mycobacteria is restricted to being pathogenic for a single host species. For example, M. bovis causes tuberculosis in a wide range of animals including humans, in which it causes a disease that is clinically indistinguishable from that caused by M. tuberculosis. Human tuberculosis is a major cause of mortality throughout the world, particularly in less developed countries, while bovine tuberculosis, as well as causing a small percentage of these human cases, is a major cause of animal suffering and large economic costs in the animal industries.
These microbial pathogens are known to use a variety of complex strategies to subvert host cellular functions to ensure their multiplication and survival. Some pathogens that have co-evolved or have had a long-standing association with their hosts utilize finely tuned host-specific strategies to establish a pathogenic relationship. During infection, pathogens encounter different conditions, and respond by expressing virulence factors that are appropriate for the particular environment, host, or both.
The main focus in tackling this disease has been on making the best use of currently available tools for diagnosis, treatment and prevention of TB and the improved tools that are likely to become available in future through research to develop new diagnostics, drugs and vaccines. The goal of eliminating TB by 2050 depends on the development of new diagnostics, drugs and vaccines. WHO is working in close collaboration with the Stop TB Partnership to enable and promote program-based operational research to develop new diagnostics, drugs and vaccines.
The limitations of the existing tools for diagnosis and treatment of TB, namely the smear microscopy test and "short-course" chemotherapy, make standard TB care demanding for both patients and care providers. A truly effective vaccine is lacking. The need to rely on these tools has substantially hindered the pace of progress in global TB control.
Although antibiotics have been effective tools in treating infectious disease, the emergence of drug resistant pathogens is becoming problematic in the clinical setting. New antibiotic or antipathogenic molecules are therefore needed to combat such drug resistant pathogens. Accordingly, there is a need in the art for screening methods aimed not only at identifying and characterizing potential antipathogenic agents, but also for identifying and characterizing the virulence factors that enable pathogens to infect and debilitate their hosts.
Antibiotic treatment of tuberculosis is very expensive and requires prolonged administration of a combination of several anti-tuberculosis drugs. Treatment with single antibiotics is not advisable, as tuberculosis organisms can develop resistance to the therapeutic levels of all antibiotics that are effective against them. Strains of M. tuberculosis that are resistant to one or more anti-tuberculosis drugs are becoming more frequent, and treatment of patients infected with such strains is expensive and difficult. In a small but increasing percentage of human tuberculosis cases the tuberculosis organisms have become resistant to the two most useful antibiotics, isoniazid and rifampicin. Treatment of these patients presents extreme difficulty and in practice is often unsuccessful. In the current situation there is clearly an urgent need to develop new methods for detecting virulent strains of mycobacteria and to develop tuberculosis therapies.
Multidrug resistance and human immunodeficiency virus (HIV-1) infections are factors which have had a profound impact on the tuberculosis problem. An increase in the frequency of Mycobacterium tuberculosis strains resistant to one or more anti-mycobacterial agents has been reported (Block, et al., (1994)). Immunocompromised HIV-1 infected patients not infected with M. tuberculosis are frequently infected with M. avium complex (MAC) or M. avium-M. intracellulare (MAI) complex. These mycobacteria species are often resistant to the drugs used to treat M. tuberculosis. These factors have re-emphasized the importance of the accurate determination of drug sensitivities and mycobacteria species identification.
In HIV-1 infected patients, the correct diagnosis of the mycobacterial disease is essential since treatment of M. tuberculosis infections differs from that called for by other mycobacteria infections, Hoffner, S. E. (1994). Non-tuberculosis mycobacteria commonly associated with HIV-1 infections include M. kansasii, M. xenopi, M. fortuitum, M. avium and M. intracellulare, Wolinsky, E., (1992), Shafer, R. W. and Sierra, M. F. Additionally, 13% of new cases (HIV-1 infected and non-infected) of M. tuberculosis are resistant to one of the primary anti-tuberculosis drugs (isoniazid [INH], rifampin [RIF], streptomycin [STR], ethambutol [EMB] and pyrazinamide [PZA]) and 3.2% are resistant to both RIF and INH, Block, et al., (1994). Consequently, mycobacterial species identification and the determination of drug resistance have become central concerns during the diagnosis of mycobacterial diseases.
There is a recognized vaccine for tuberculosis, which is an attenuated form of M. bovis known as BCG. This is very widely used but it provides incomplete protection. The development of BCG was completed in 1921, but the reason for its avirulence was, and has continued to remain, unknown. Methods of attenuating tuberculosis strains to produce a vaccine in a more rational way have been investigated but have not been successful for a variety of reasons. However, in view of the evidence that dead M. bovis BCG was less effective in conferring immunity than live BCG, there exists a need for attenuated strains of mycobacteria that can be used in the preparation of vaccines.
A variety of compounds have been proposed as virulence factors for tuberculosis but, despite numerous investigations, good evidence to support these proposals is lacking. Nevertheless, the discovery of a virulence factor or factors for tuberculosis is very important and is an active area of current research. Such a discovery would not only enable the possible development of a new generation of tuberculosis vaccines but might also provide a target for the design or discovery of new or improved anti-tuberculosis drugs or therapies.
Present methods for the identification and characterization of mycobacteria in samples from human and animal diseases are Ziehl-Neelsen staining, in vitro and in vivo culture, biochemical testing and serological typing. These methods are generally slow and do not readily discriminate between closely related mycobacterial strains and species, particularly, for example, Mycobacterium paratuberculosis and Mycobacterium avium. Mycobacteria are widespread in the environment, and rapid methods do not exist for the identification of specific pathogenic strains from amongst the many environmental strains, which are generally non-pathogenic. Difficulties with existing methods of mycobacterial identification and characterization have increased relevance for the analysis of microbial isolates from Crohn's disease (Regional Ileitis) in humans and Johne's disease in animals (particularly cattle, sheep and goats) as well as for M. avium strains from AIDS patients with mycobacterial superinfections. Although recognition of the causative agents of human leprosy and tuberculosis is clear, clinico-pathological forms of each disease exist, such as the tuberculoid form of leprosy, in which mycobacterial tissue abundance is low and identification correspondingly difficult. Improvements in the specific recognition and characterization of mycobacteria may also increase in relevance if current evidence linking diseases such as rheumatoid arthritis to mycobacterial antigens is substantiated. Emerging drug resistance in mycobacteria, including M. avium isolates from AIDS patients and Mycobacterium tuberculosis from TB patients, is an increasing problem.
There is no data or technical information in the prior art which permits the specific selection of potential new targets and protective antigens for new drugs and vaccine compositions to treat and prevent infectious diseases, particularly tuberculosis. Furthermore, there is a need for the development of new tools for the selection of genes which encode essential proteins or regulatory nucleotidic sequences involved in the survival or infection of mycobacterium species and which are useful for the design of anti-tuberculosis drugs and vaccines based on the knowledge of comparative mycobacterial genomics.
A method of using DNA probes for the precise identification of mycobacteria and discrimination between closely related mycobacterial strains and species by genotype characterization is essential. The method of genotypic analysis is further applicable to the rapid identification of phenotypic properties such as drug resistance and pathogenicity. The invention aids in fulfilling these needs in the art. The method according to the invention has the advantage of drastically reducing the number of potential new targets and protective antigens by giving, for the first time, an exhaustive description of conserved SNPs in tuberculosis. The isolated polynucleotides described in the present invention, which are highly conserved in the genomic sequences of both virulent and avirulent strains, are by this characteristic essential for the survival or the virulence of these mycobacteria in the host. The identification of antigens and potentially therapeutic targets has been made by a method of comparative genomic analysis.
Methods used to detect and to identify Mycobacterium species vary considerably. For detection of Mycobacterium tuberculosis, microscopic examination of acid-fast stained smears and cultures are still the methods of choice in most microbiological clinical laboratories. However, culture of clinical samples is hampered by the slow growth of mycobacteria. A mean time of four weeks is required before sufficient growth is obtained to enable detection and possible identification. Recently, two more rapid methods for culture have been developed involving a radiometric system, Stager, C. E. et al., (1991) J. Clin. Microbiol. 29:154-157, and a biphasic (broth/agar) system, Sewell, et al., (1993) J. Clin. Microbiol. 29:2689-2472. Once grown, cultured mycobacteria can be analyzed by lipid composition, the use of species specific antibodies, species specific DNA or RNA probes and PCR-based sequence analysis of the 16S rRNA gene (Schirm, et al. (1995) J. Clin. Microbiol. 33:3221-3224; Kox, et al. (1995) J. Clin. Microbiol. 33:3225-3233) and IS6110 specific repetitive sequence analysis (For a review see, e.g., Small et al., P. M. and van Embden, J. D. A. (1994) Am. Society for Microbiology, pp. 569-582). The analysis of 16S rRNA sequences (RNA and DNA) has been the most informative molecular approach to identify Mycobacteria species (Jonas, et al., J. Clin. Microbiol. 31:2410-2416 (1993)). However, to obtain drug sensitivity information for the same isolate, additional protocols (culture) or alternative gene analysis is necessary.
To determine drug sensitivity information, culture methods are still the protocols of choice. Mycobacteria are judged to be resistant to particular drugs by use of either the standard proportional plate method or minimal inhibitory concentration (MIC) method. However, given the inherent lengthy times required by culture methods, approaches to determine drug sensitivity based on molecular genetics have been recently developed.
Diagnostic assays are frequently performed on samples removed from patients. Preferably, these samples are obtained in a minimally invasive manner, for example serum or urine samples. However, such assays can only provide information concerning the state of the marker in the particular sample. They are not able to provide direct information concerning the exact location of metastases and/or the degree of tumor shrinkage, for example.
The ability to establish differences between DNA samples from two different sources or from the same source but under different developmental or environmental conditions is very important. Subtle differences in the genetic material can often yield valuable information, which can help understand physiological processes as well as can provide powerful techniques with wide applications. The approach has broad applications in areas such as forensic science, determination of predisposition of individuals to certain diseases, tissue typing, molecular taxonomy etc. DNA fingerprinting is already being used for a variety of purposes. Single nucleotide polymorphism (SNP) screening promises to be yet another powerful tool intended for use in some of these applications.
Single nucleotide polymorphisms occur with greater frequency and are spaced more uniformly throughout the genome than other forms of polymorphism. The greater frequency and uniformity of single nucleotide polymorphisms means that there is a greater probability that such a polymorphism will be found in close proximity to a genetic locus of interest than would be the case for other polymorphisms. The different forms of characterized single nucleotide polymorphisms are often easier to distinguish than other types of polymorphism (e.g., by use of assays employing allele-specific hybridization probes or primers). For the above reasons, it would be desirable to have simpler methods, which identify and characterize microorganisms, such as Mycobacteria, both at the phenotypic and genotypic level. This invention fulfills that and related needs.
PRIOR ART OF THE INVENTION
Patent application WO 02074903 describes a method of selection of purified nucleotidic sequences or polynucleotides encoding proteins or parts of proteins carrying at least an essential function for the survival or the virulence of mycobacterium species by a comparative genomic analysis of the sequence of the genome of M. tuberculosis aligned on the genome sequence of M. leprae. M. tuberculosis and M. leprae marker polypeptides, nucleotides encoding the polypeptides, and methods for using the nucleotides and the encoded polypeptides are also disclosed.
US patent no. 6,228,575 provides oligonucleotide based arrays and methods for speciating and phenotyping organisms, for example, using oligonucleotide sequences based on the Mycobacterium tuberculosis rpoB gene. The groups or species to which an organism belongs may be determined by comparing hybridization patterns of target nucleic acid from the organism to hybridization patterns in a database.
Patent application no. WO9954487 and US patent no. 6,492,506 describe a method for isolating a polynucleotide of interest that is present or is expressed in a genome of a first mycobacterium strain and that is absent or altered in a genome of a second mycobacterium strain which is different from the first mycobacterium strain, using a bacterial artificial chromosome (BAC) vector. That invention further relates to a polynucleotide isolated by this method and the recombinant BAC vector used in this method. In addition, it comprises a method and kit for detecting the presence of mycobacteria in a biological sample.
US patent no. 5,783,386 describes polynucleotides associated with virulence in mycobacteria, and particularly a fragment of DNA isolated from M. bovis that contains a region encoding a putative sigma factor. Also provided are methods for a DNA sequence or sequences associated with virulence determinants in mycobacteria, and particularly in M. tuberculosis and M. bovis. In addition, the invention provides a method for producing strains with altered virulence or other properties, which can themselves be used to identify and manipulate individual genes.
US patent no. 5,955,077 relates to novel antigens from mycobacteria capable of evoking early (within 4 days) immunological responses from T-helper cells in the form of gamma-interferon release in memory immune animals after rechallenge infection with mycobacteria of the tuberculosis complex. The antigens of the invention are believed useful especially in vaccines, but also in diagnostic compositions, especially for diagnosing infection with virulent mycobacteria. Also disclosed are nucleic acid fragments encoding the antigens as well as methods of immunizing animals/humans and methods of diagnosing tuberculosis.
US patent no. 6,596,281 describes two genes for proteins of M. tuberculosis that have been sequenced. The DNAs and their encoded polypeptides can be used for immunoassays and vaccines. Cocktails of at least three purified recombinant antigens, and cocktails of at least three DNAs encoding them, can be used for improved assays and vaccines for bacterial pathogens and parasites.
US patent no. 5,700,683 provides specific genetic deletions that result in an avirulent phenotype of a mycobacterium. These deletions may be used as phenotypic markers, providing a means for distinguishing between disease-producing and non-disease-producing mycobacteria.
US patent no. 5,225,324 relates to a family of DNA insertion sequences (ISMY) of mycobacterial origin and other DNA probes which may be used as probes in assay methods for the identification of mycobacteria and the differentiation between closely related mycobacterial strains and species. The use of ISMY, and of proteins and peptides encoded by ISMY, in vaccines, pharmaceutical preparations and diagnostic test kits is also disclosed.
WO0066157 patent application provides for polypeptides encoded by open reading frames present in the genome of Mycobacterium tuberculosis but absent from the genome of BCG and diagnostic and prophylactic methodologies using these polypeptides.
US 6,458,366 discloses compounds and methods for diagnosing tuberculosis. The compounds provided include polypeptides that contain at least one antigenic portion of one or more M. tuberculosis proteins, and DNA sequences encoding such polypeptides. Diagnostic kits containing such polypeptides or DNA sequences and a suitable detection reagent may be used for the detection of M. tuberculosis infection in patients and biological samples. Antibodies directed against such polypeptides are also provided.
S. T. Cole has sequenced the complete genome of the best-characterized strain of Mycobacterium tuberculosis, H37Rv. The sequence has been analyzed in order to improve our understanding of the biology of this slow-growing pathogen and to help the conception of new prophylactic and therapeutic interventions. [Nature 393, 537-544 (1998)]
A multicomponent analysis to determine the association of polymorphisms with the degree of virulence and infectivity is in progress. These polymorphisms constitute a set of putative virulence markers that are being validated in 120 clinical isolates of tuberculosis. The study results in a set of virulence markers, which could be used in predicting the degree of virulence and infectivity of Mycobacterium infections.
There is no data or technical information in the prior art which permits the specific selection of potential new targets and protective antigens for new drugs and vaccine compositions to treat and prevent infectious diseases including mycobacterial diseases, particularly tuberculosis and leprosy.
OBJECTS OF THE INVENTION
The main object of the present invention is to obtain a nucleotide sequence of SEQ ID No. 1, having at least one single nucleotide polymorphism (SNP), and a corresponding polypeptide thereof.
Another main object of the present invention is to identify SNP(s) associated with virulence in bacterial species, preferably members of genus Mycobacteria.
Yet another object of the present invention is to obtain a kit for diagnosing plurality of bacterial diseases, preferably tuberculosis.
Still another object of the present invention is to develop a method to identify SNP(s) in a nucleotide sequence.
Still another object of the present invention is to obtain a vector comprising nucleotide sequence of SEQ ID No. 1, having at least one single nucleotide polymorphism (SNP).
Still another object of the present invention is to obtain a drug composition or a vaccine comprising SEQ ID No. 1, preferably nanotechnology enabled composition and/or vaccine or transcribed RNA or polypeptides encoded by SEQ ID No. 1, or metabolomes or cell lines arising from SEQ ID No. 1, optionally along with pharmaceutically acceptable carriers.
Still another object of the present invention is to obtain antibody raised against polypeptide sequence encoded by SEQ ID No. 1 for the treatment of plurality of bacterial diseases, preferably Tuberculosis.
Still another object of the present invention is to develop a method of treating plurality of bacterial diseases, preferably Tuberculosis, said method comprising administering to a subject in need thereof, a therapeutically effective amount of a drug composition or a vaccine comprising SEQ ID No. 1 or transcribed RNA or polypeptides encoded by SEQ ID No. 1 or metabolomes or cell lines arising from SEQ ID No. 1, optionally along with pharmaceutically acceptable carriers or antibody raised against polypeptide sequence encoded by SEQ ID No. 1.
Still another object of the present invention is to provide novel nucleotide sequences that are associated with virulence in members of the genus Mycobacteria and particularly in members of the Mycobacterial complex.
Still another object of the present invention is to provide for the identification of strains including mycobacterium in disease samples.
STATEMENT OF THE INVENTION
Accordingly, the present invention relates to a nucleotide sequence of SEQ ID No. 1, having at least one single nucleotide polymorphism (SNP) and a corresponding polypeptide thereof; a kit for diagnosing plurality of bacterial diseases, preferably tuberculosis, said kit comprising (a) SEQ ID No. 1 or transcribed RNA or polypeptides encoded by SEQ ID No. 1, or metabolomes or cell lines arising from SEQ ID No. 1; and (b) buffer components; a method to identify SNP(s) in a nucleotide sequence of SEQ ID No. 1, said method comprising steps of: (a) aligning genomic sequences of different species to select highly conserved polynucleotide sequence (homologous sequence) amongst the virulent strains; and (b) comparing the highly conserved polynucleotide sequence with homologous sequence in avirulent strain to identify said SNP(s); a vector comprising nucleotide sequence of SEQ ID No. 1, having at least one single nucleotide polymorphism (SNP); a drug composition or a vaccine comprising SEQ ID No. 1, preferably nanotechnology enabled composition and/or vaccine or transcribed RNA or polypeptides encoded by SEQ ID No. 1, or metabolomes or cell lines arising from SEQ ID No. 1, optionally along with pharmaceutically acceptable carriers; an antibody raised against polypeptide sequence encoded by SEQ ID No. 1 for the treatment of plurality of bacterial diseases, preferably Tuberculosis; and a method of treating plurality of bacterial diseases, preferably Tuberculosis, said method comprising administering to a subject in need thereof, a therapeutically effective amount of a drug composition or a vaccine comprising SEQ ID No. 1 or transcribed RNA or polypeptides encoded by SEQ ID No. 1 or metabolomes or cell lines arising from SEQ ID No. 1, optionally along with pharmaceutically acceptable carriers or antibody raised against polypeptide sequence encoded by SEQ ID No. 1.
BRIEF DESCRIPTION OF ACCOMPANYING FIGURES
Figure 1: Clustal-W alignment showing a SNP C in BCG and T in H37Rv. The SNPs identified across rest of the spoligotyped strains are also shown. The alignments also picked up the indels and long polymorphs.
Figure 2: Gene ontology mapping; Locus id: 4094022; Gene name: dnaQ, Rv3711c; GO: 0003677, GO: 0003887, GO: 0004527
Figure 3: shows the multiple alignment of the sequences derived from public domain namely BCG, H37Rv and CDC 1551 and those derived from wet lab namely AGTH37Rv, BS, SI, NIR and WIS and 45 spoligotyped strains. The figure provides evidence of occurrence of SNP at position 4094022 defined in BCG coordinate.
SEQ ID No. 1 refers to virulent locus 4094022 having SNP as indicated in Figure 3. The virulent locus 4094022 corresponds to gene dnaQ, Rv3711c.
Table 1: Results of the strain analysis showing SNPs across different spoligotyped strains
Table 2: Results showing the virulent SNP(s)
Table 3: Results of mapping the virulent SNP(s)
Table 4: Results showing the unique counts of SNP along with other SNPs
Table 5: The other SNP(s) validated along with the SNP of the invention
DETAILED DESCRIPTION OF THE INVENTION
The present invention relates to a nucleotide sequence of SEQ ID No. 1, having at least one single nucleotide polymorphism (SNP) and a corresponding polypeptide thereof.
In another embodiment of the present invention, the SNP is associated with virulence in bacterial species selected from a group comprising gram-positive and gram-negative bacteria, preferably members of genus Mycobacteria.
The present invention also relates to a kit for diagnosing plurality of bacterial diseases, preferably tuberculosis, said kit comprising,
a) SEQ ID No. 1 or transcribed RNA or polypeptides encoded by SEQ ID No. 1, or metabolomes or cell lines arising from SEQ ID No. 1; and
b) buffer components.
The present invention also relates to a method to identify SNP(s) in a nucleotide sequence of SEQ ID No. 1, said method comprising steps of:
a) aligning genomic sequences of different species to select highly conserved polynucleotide sequence (homologous sequence) amongst the virulent strains; and
b) comparing the highly conserved polynucleotide sequence with homologous sequence in avirulent strain to identify said SNP(s).
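By way of illustration only, steps (a) and (b) above can be sketched in Python as follows. The sketch assumes that the sequences have already been aligned (for example, by a multiple alignment program) into equal-length strings with "-" marking gaps. The function name `find_candidate_snps` and the toy sequences are hypothetical and are not taken from the specification; positions are reported as 1-based ungapped coordinates of the avirulent sequence.

```python
def find_candidate_snps(virulent_aligned, avirulent_aligned):
    """Return (position, avirulent_base, virulent_base) tuples.

    virulent_aligned: dict of strain name -> aligned sequence (gaps as '-').
    avirulent_aligned: aligned sequence of the avirulent strain.
    Positions are 1-based coordinates in the ungapped avirulent sequence.
    """
    seqs = list(virulent_aligned.values())
    length = len(avirulent_aligned)
    if not seqs or any(len(s) != length for s in seqs):
        raise ValueError("need at least one virulent sequence, all of equal aligned length")

    snps = []
    ref_pos = 0  # running ungapped coordinate in the avirulent sequence
    for col in range(length):
        ref_base = avirulent_aligned[col]
        if ref_base != "-":
            ref_pos += 1
        # Step (a): keep only columns conserved amongst the virulent strains.
        column = {s[col] for s in seqs}
        if len(column) != 1:
            continue
        vir_base = column.pop()
        # Gapped columns are indels, not single nucleotide polymorphisms.
        if vir_base == "-" or ref_base == "-":
            continue
        # Step (b): compare the conserved base with the avirulent strain.
        if vir_base != ref_base:
            snps.append((ref_pos, ref_base, vir_base))
    return snps


# Toy, made-up sequences (not SEQ ID No. 1).
virulent = {"strain_A": "ACGTTGCA", "strain_B": "ACGTTGCA"}
avirulent = "ACGCTGCA"
print(find_candidate_snps(virulent, avirulent))
```

Columns that differ among the virulent strains themselves are skipped in this sketch, since they do not satisfy the conservation requirement of step (a).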
In still another embodiment of the present invention, the strains are bacterial species, preferably members of genus Mycobacteria.
The present invention also relates to a vector comprising nucleotide sequence of SEQ ID No. 1, having at least one single nucleotide polymorphism (SNP).
The present invention also relates to a drug composition or a vaccine comprising SEQ ID No. 1, preferably nanotechnology enabled composition and/or vaccine or transcribed RNA or polypeptides encoded by SEQ ID No. 1, or metabolomes or cell lines arising from SEQ ID No. 1, optionally along with pharmaceutically acceptable carriers.
The present invention also relates to an antibody raised against polypeptide sequence encoded by SEQ ID No. 1 for the treatment of plurality of bacterial diseases, preferably Tuberculosis.
The present invention also relates to a method of treating plurality of bacterial diseases, preferably Tuberculosis, said method comprising administering to a subject in need thereof, a therapeutically effective amount of a drug composition or a vaccine comprising SEQ ID No. 1 or transcribed RNA or polypeptides encoded by SEQ ID No. 1 or metabolomes or cell lines arising from SEQ ID No. 1, optionally along with pharmaceutically acceptable carriers or antibody raised against polypeptide sequence encoded by SEQ ID No. 1.
More particularly, the invention is directed to identifying virulence in M. tuberculosis and other infectious organisms, using both strands of DNA, allowing identification of the strain, typing of the strain and finally giving an indication of its potential degree of virulence, infectivity and/or latency.
The present invention provides for the identification of strains including mycobacterium in disease samples for the specific recognition of pathogenic strains, for precisely distinguishing closely related strains including mycobacterial strains and for defining virulence and resistance patterns.
The invention relates to diagnostic and therapeutic methods for applying these novel single nucleotide polymorph(s) for diagnosis, treatment, and/or prevention of various diseases and/or disorders, particularly tuberculosis and other related symptoms for early and rapid detection of virulent strains of Mycobacterium tuberculosis.
As described herein, the nature of SNPs in the coding regions of Mycobacterium tuberculosis genes has been explored. SNP(s) was identified in virulent locus 4094022 which corresponds to gene dnaQ, Rv3711c relevant to virulence/infectivity, by screening an average of 956 independent polymorphic regions using screening methods. To ensure high accuracy, the reported SNP(s) was confirmed by DNA sequencing. The identified SNP(s) was correlated with the real time infectivity in patients with Tuberculosis.
The protocol involved identifying SNP(s) between the avirulent Mycobacterium bovis BCG and the virulent Mycobacterium tuberculosis H37Rv and CDC 1551. Part of this SNP set was then extrapolated across 5 strains, and the number of SNP(s) was further scaled down and again extrapolated onto 46 spoligotyped strains. The method included screening the SNP(s) already amplified across 5 strains for functions that may be associated with virulence. These regions or SNP(s) were then amplified across 46 spoligotyped patient samples for which prior drug resistance data across 4 drugs, namely Rifampicin, Isoniazid, Ethambutol and Streptomycin, and spoligotyped family details were available. This information was coupled with an in-silico genomic subtraction technique, by which SNP(s) unique to the virulent genome were identified. The SNP(s) were also functionally annotated, and a unique pathway or activity, namely oxido-reductase, was identified. Because the analysis was done on samples isolated from patients already infected with tuberculosis, identifying the SNP(s) in them strongly validates the whole method or process, thereby rendering it novel.
The invention relates to genes which comprise a single nucleotide polymorphism at a specific location. In a particular embodiment the invention relates to DNA having a single nucleotide polymorphism, which differs from a reference DNA by one nucleotide at the site(s) identified in Figure 3. Complements of these nucleic acid segments are also included.
The invention further provides a method of analyzing a nucleic acid from different strains of Mycobacterium tuberculosis. The method determines which base is present at any one of the polymorphic sites shown in Figure 3. Optionally, a set of bases occupying a set of the polymorphic sites shown in Figure 3 is determined. This type of analysis can be performed on a number of strains, which are tested for the presence of virulent SNP(s). The presence or absence of virulent SNP(s) is then correlated with a base or set of bases present at the polymorphic site or sites in the strains tested.
The invention enables the early detection of virulent M. tuberculosis strains that would otherwise take weeks to identify. As an added advantage, a larger number of SNPs, along with the SNP of the invention, validated for virulence across a larger and more diverse array of bacterial strains would increase the probability of detecting uncommon virulent strains in a diverse patient set.
The method according to the invention has the advantage of drastically reducing the number of potential new targets and protective antigens by giving, for the first time, an exhaustive description of conserved SNP(s) in different M. tuberculosis strains which cause tuberculosis. The isolated nucleotides described in the present invention, which are highly conserved in genomic sequences of virulent strains, are essential for the survival or the virulence of these strains, in particular mycobacteria, in the host. The identification of antigens and potentially therapeutic targets has been made by a method of comparative genomic analysis.
The invention relates to the identification and analysis of non-synonymous SNP(s) to predict conservative and non-conservative amino acid substitutions. The effect of the substitution on the function of the proteins encoded provides a powerful insight in predicting SNP(s) correlating with virulence and infectivity in infectious disease organisms, for example M. tuberculosis.
The invention further relates to proteins, RNA, DNA and metabolites encoded by the region carrying the polymorphisms in tuberculosis and other infectious disease causing organisms. These can be utilized for developing drugs and vaccines effective against tuberculosis and other infectious diseases, and play an important role in gene therapy, RNAi technology and imaging.
The invention is also directed to a process for the production of recombinant polypeptides and chimeric polypeptides comprising them, antibodies generated against these polypeptides, immunogenic or vaccine compositions comprising at least one polypeptide useful as a protective antigen or capable of inducing a protective response in vivo or in vitro against mycobacterium infections, immunotherapeutic compositions comprising at least one such polypeptide according to the invention, and the use of such nucleic acids and polypeptides in diagnostic methods, vaccines, kits, or antimicrobial therapy. Diagnostic markers are important for early diagnosis of many diseases, as well as for predicting response to treatment, monitoring treatment and determining prognosis of such diseases.
The instant invention provides a method for the identification and selection of polymorphisms associated with the virulence and/or infectivity in infectious diseases by a comparative genomic analysis of the sequences of different clinical isolates/strains of infectious organisms. The method comprises the steps of aligning the genomic sequences of different Mycobacteria species to:
a. Select a polynucleotide sequence highly conserved amongst the virulent strains and corresponding to an essential gene for the survival or the virulence of mycobacterium species;
b. Select polymorphisms between virulent and avirulent strains to identify genes and regions conferring virulence to the former strains;
c. Optionally, test the polynucleotide selected for its capacity for virulence or its involvement in the survival of a mycobacterium species, said testing being based on the activation or inactivation of said polynucleotide in a bacterial host or on the activity of the product of expression of said polynucleotide in vivo or in vitro; and
d. Identify polymorphisms, such as an identical nucleotide in virulent strains/species but a different nucleotide in avirulent strains/species at the same position, and/or a difference among virulent strains in the nucleotide sequence at specific positions while sharing the nucleotide sequence (flanking regions) with that of avirulent strains, which have potential to be used as reagents and in diagnostics, drug and vaccine development for infectious diseases.
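By way of illustration only, the selection in step (d) can be sketched in Python as follows. The sketch assumes that the sequences have already been aligned to equal length by an external tool; the function name, the toy sequences and the handling of gap characters are hypothetical and are not taken from the specification.

```python
def discriminating_positions(virulent, avirulent):
    """Return (column, virulent_base, avirulent_bases) for every alignment
    column where all virulent sequences share one base and no avirulent
    sequence carries that base. Columns are 0-based. Gapped columns are
    skipped (an assumption made for this sketch)."""
    seqs = list(virulent.values()) + list(avirulent.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be pre-aligned to equal length")

    hits = []
    for i in range(length):
        v_bases = {s[i] for s in virulent.values()}
        a_bases = {s[i] for s in avirulent.values()}
        if "-" in v_bases | a_bases:
            continue  # ignore columns containing alignment gaps
        if len(v_bases) == 1 and v_bases.isdisjoint(a_bases):
            hits.append((i, next(iter(v_bases)), "".join(sorted(a_bases))))
    return hits


# Toy aligned fragments (invented for illustration, not real genomic data)
virulent = {"H37Rv": "ATGCCGTA", "CDC1551": "ATGCCGTA"}
avirulent = {"BCG": "ATGTCGTA"}
print(discriminating_positions(virulent, avirulent))  # [(3, 'C', 'T')]
```

In this sketch a column is reported only when the virulent sequences agree with each other and differ from every avirulent sequence, which corresponds to the first alternative recited in step (d).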
The availability of complete genomic sequences of various organisms promises to significantly advance our understanding of various fundamental aspects of biology. It also promises to provide unparalleled applied benefits such as understanding genetic basis of certain diseases, providing new targets for therapeutic intervention, developing a new generation of diagnostic tests etc. However, new and improved tools will be needed to harvest and fully realize the potential of genomics research.
The ability to establish differences between DNA samples from two different sources or from the same source but under different developmental or environmental conditions is very important. Subtle differences in the genetic material can often yield valuable information, which can help understand physiological processes as well as can provide powerful techniques with wide applications. The approach has broad applications in areas such as forensic science, determination of predisposition of individuals to certain diseases, tissue typing, molecular taxonomy etc. DNA fingerprinting is already being used for a variety of purposes. Single nucleotide polymorphism (SNP) screening promises to be yet another powerful tool intended for use in some of these applications.
Single nucleotide polymorphisms occur with greater frequency and are spaced more uniformly throughout the genome than other forms of polymorphism. The greater frequency and uniformity of single nucleotide polymorphisms means that there is a greater probability that such a polymorphism will be found in close proximity to a genetic locus of interest than would be the case for other polymorphisms. The different forms of characterized single nucleotide polymorphisms are often easier to distinguish than other types of polymorphism (e.g., by use of assays employing allele-specific hybridization probes or primers).
"Polymorphism" as used herein, refers to the occurrence of two or more genetically determined alternative sequences. A polymorphic marker or site is the locus at which divergence occurs. A polymorphic locus may be as small as one base pair.
A "single nucleotide polymorphism" usually arises due to substitution of one nucleotide for another at the polymorphic site. A single nucleotide polymorphism occurs at a polymorphic site occupied by a single nucleotide, which is the site of variation between allelic sequences. The site is usually preceded by and followed by highly conserved sequences of the allele.
The present invention provides nucleotide sequences that are associated with virulence in members of the genus Mycobacteria, and particularly in members of the mycobacterial complex. The Mycobacterium tuberculosis complex comprises species including M. tuberculosis, M. bovis, M. canettii, M. microti and M. africanum. Of these, the genomes of two different strains of M. tuberculosis, which are virulent and infective to humans, have been completely sequenced, while the complete genome of M. bovis BCG, which is non-virulent and non-infective, has also been sequenced. Only partial sequences are available for the other species. To perform a comprehensive survey, as described herein, gene sequences for a collection of genes known to play important roles in the virulence/infectivity of M. tuberculosis were obtained from the NCBI, EMBL, GENBANK, Sanger and TIGR databases. Where multiple sequences were available, a consensus sequence was derived. Determination of coding sequence, untranslated regions and intronic region was based on annotation at the public database, although internal checks were performed to ensure accurate determination of start and stop codons, open reading frames and the like. The genes were chosen because of their relevance to common, clinically significant virulence. They encode proteins involved in virulence. Variation in these genes was studied in a set of five Mycobacterium tuberculosis strains with known virulence, which were screened for the polymorphisms. Strains chosen for the study were:
a. H37Rv - a reference laboratory strain known to be infective to mice but only mildly infective in humans. It has undergone a number of passages in the lab since its isolation. It is the standard used in studies on tuberculosis in different laboratories across the world.
b. Beijing strain - a clinical isolate with known virulence and infectivity in humans. 70% of the patients with tuberculosis in certain areas of India and China are infected with this strain. The strain was isolated from a patient in the Western Indian city of Mumbai.
c. S.I - a South Indian strain with only mild virulence and infectivity in humans, isolated from a patient residing in the South Indian city of Hyderabad.
d. N.I.F - a fatal North Indian strain isolated at Safdarjung hospital, Delhi, from a patient who developed pulmonary tuberculosis and died.
e. N.I.NF - a non-fatal North Indian strain isolated at Safdarjung hospital, Delhi.
"Virulence" is the relative capacity of a pathogen to overcome body defenses; it is also the relative ability to cause disease in an infected host. In gram-negative bacterial pathogens, virulence is generally determined by a multiplicity of traits that endow the pathogen with its ability to exploit anatomical weaknesses and overcome the immune defenses of the host. It is expected that a similar multiplicity of traits determines the virulence of pathogenic Mycobacteria. Properties associated with virulence in microorganisms include those listed as below:
1. Infectious: capable of being spread from one individual to another.
2. Capable of entering mammalian host cells.
3. Capable of surviving or escaping phagocyte cellular defenses.
4. Capable of multiplying in host cells.
5. Capable of spreading from one infected cell to an uninfected cell.
6. Capable of causing cell injury that results in pathology.
In addition, a virulent organism may be capable of killing the infected host.
As used herein, the term "virulence factor encoding sequence" denotes a nucleotide sequence that encodes a product that is associated with virulence in a member of the mycobacterial species. This term is encompassed within the term "sequence associated with virulence", which denotes a polynucleotide sequence that confers a trait associated with virulence on an avirulent mycobacterium, whether or not the polynucleotide encodes a product.
The virulence factors thus identified could be used as:
i. Diagnostic markers in prediction of disease and its progress in the patient,
ii. Drug targets for development of new and effective treatments for TB,
iii. Candidate genes/sequences in DNA vaccine,
iv. In development of siRNA technology for combating tuberculosis.
Analysis of SNP(s)
The SNPs identified were of two kinds:
i. Identical nucleotide in CDC1551 and H37Rv, but a different nucleotide in BCG at the same position,
ii. One of the three sequences was polymorphic; the nucleotide sequences of CDC 1551 and H37Rv were different from each other and one of them was identical to the BCG sequence at identical positions.
The SNP(s) thus identified were categorized according to their location in Open Reading Frames. "Open reading frame" [ORF] as used herein refers to polymorphs in the coding region. SNP(s) falling within the ORF of both BCG and H37Rv were identified (Figure 1). The results were validated by determining the presence of SNP(s) in the ORFs of BCG and CDC1551. The SNPs falling in ORFs were further categorized into synonymous and non-synonymous SNPs. "Synonymous" refers to a substitution leading to the same amino acid in spite of a SNP.
A SNP was said to cause a non-synonymous change, that is, a substitution leading to a different amino acid because of a SNP, if:
1) It occurs in an ORF
2) It occurs in the same ORF in the genome it is being compared to.
In some cases a SNP can be in one ORF in the reference sequence but in another ORF in the comparison sequence, due to a frame-shift mutation earlier in the sequence. So, before assigning SNP(s) to 'Non Synonymous' or 'Synonymous' groupings, all SNP(s) which either did not fall in an ORF, or fell into different ORFs on the reference and comparison sequences, were eliminated. The BCG and H37Rv genomes have been annotated with respect to one another. However, CDC 1551 has not been so thoroughly annotated, which made it harder to assess whether an ORF in BCG was the corresponding ORF in CDC 1551. Therefore, a metric was devised to eliminate spurious comparisons.
The non-synonymous SNP(s) thus identified were analysed to predict conservative and non-conservative amino acid substitutions. "Conservative amino acid substitution" as used herein means the replacement, as a result of a SNP, of an amino acid by another belonging to the same functional group, such as valine by isoleucine.
"Non-conservative substitution" means the replacement, as a result of a SNP, of an amino acid by one belonging to a different functional group, such as valine by serine. The effect of the substitution on the function of the proteins encoded was predicted. This provides a powerful insight in predicting SNP(s) correlating with virulence and infectivity in M. tuberculosis.
"Indels" are insertions and deletions of one or more bases in the sequence with respect to the BCG sequence. These indels could be of one or more nucleotides. Considering BCG as the reference sequence, indels in both the strains of M. tuberculosis, H37Rv and CDC1551, were identified.
"Long polymorphs" are the regions with numerous changes in the sequences aligned, which are insertions or deletions of long stretches of nucleotides with respect to the BCG sequence.
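For illustration, the synonymous, non-synonymous, conservative and non-conservative categories defined above can be sketched in Python. The codon table is the standard genetic code; the grouping of amino acids into functional groups is an assumption chosen for this sketch (the specification gives only the valine/isoleucine and valine/serine examples), and the function name is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Assumed functional grouping (one of several conventions in use)
GROUPS = {
    "nonpolar": "GAVLIMP",
    "aromatic": "FWY",
    "polar": "STCNQ",
    "positive": "KRH",
    "negative": "DE",
}
GROUP_OF = {aa: name for name, members in GROUPS.items() for aa in members}


def classify_substitution(ref_codon, alt_codon):
    """Classify a codon change as synonymous, conservative or non-conservative."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    # Stop codons ('*') belong to no group and are treated as non-conservative
    if GROUP_OF.get(ref_aa) is not None and GROUP_OF.get(ref_aa) == GROUP_OF.get(alt_aa):
        return "non-synonymous, conservative"
    return "non-synonymous, non-conservative"


print(classify_substitution("GTT", "GTC"))  # valine -> valine
print(classify_substitution("GTT", "ATT"))  # valine -> isoleucine
print(classify_substitution("GCT", "TCT"))  # alanine -> serine
```

A different grouping of amino acids would change which substitutions count as conservative, so the grouping should be read as a parameter of the sketch rather than as part of the method.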
An annotation table consisting of the GENBANK features of the genes, such as coding region, database reference and product information, to name a few, was constructed.
Overall, a sample of 1229 polymorphic regions was screened for SNPs in a total of 4411.530 kb, consisting of 1130 coding regions and 99 adjacent non-coding regions. Sequences were amplified by the polymerase chain reaction (PCR) and screened.
The candidate polymorphisms were included in subsequent confirmation tests. PCR assays spanning each exon were designed using Primer 3.0 release 0.7. PCR was performed according to standard protocols. Primers have been designed to encompass the regions of polymorphisms. DNA from the five strains has been amplified under optimal conditions determined for each primer pair. The amplified fragments have been sequenced and the sequences obtained from different strains compared.
The term primer as used herein refers to a single-stranded oligonucleotide capable of acting as a point of initiation of template-directed DNA synthesis under appropriate conditions (i.e., in the presence of four different nucleoside triphosphates and an agent for polymerization, such as DNA or RNA polymerase or reverse transcriptase) in an appropriate buffer and at a suitable temperature. The appropriate length of a primer depends on the intended use of the primer but typically ranges from 15 to 30 nucleotides. Short primer molecules generally require cooler temperatures to form sufficiently stable hybrid complexes with the template. A primer need not reflect the exact sequence of the template but must be sufficiently complementary to hybridize with a template. The term primer site refers to the area of the target DNA to which a primer hybridizes. The term primer pair means a set of primers including a 5' upstream primer that hybridizes with the 5' end of the DNA sequence to be amplified and a 3' downstream primer that hybridizes with the complement of the 3' end of the sequence to be amplified.
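The dependence of hybridization temperature on primer length can be illustrated with the Wallace rule, a widely used rough approximation for short oligonucleotides in which each A or T contributes about 2 °C and each G or C about 4 °C to the melting temperature. The rule is introduced here purely as an illustration and is not stated in the specification; the example primers are invented.

```python
def wallace_tm(primer):
    """Approximate melting temperature (deg C) by the Wallace rule:
    Tm = 2 * (A + T) + 4 * (G + C). A rough estimate for short primers only."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    if at + gc != len(primer):
        raise ValueError("primer must contain only A, C, G and T")
    return 2 * at + 4 * gc


short_primer = "ATGCATGCATGCATGC"      # 16 nucleotides, invented sequence
long_primer = "ATGCATGCATGCATGCATGC"   # 20 nucleotides, invented sequence
print(wallace_tm(short_primer), wallace_tm(long_primer))  # 48 60
```

With the same base composition, the shorter primer yields the lower estimate, consistent with the statement that short primers generally require cooler temperatures.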
In all, 1229 primer sets were amplified, chosen by selecting SNP(s) from all categories, namely synonymous, non-synonymous, a few indels and a few long polymorphs. The idea was to cover as many SNP(s) as possible within these 1229 primers. Out of these 1229 primers, 956 polymorphs were identified. The primers governing these 956 polymorphs were also amplified across 5 strains, namely AGTH37RV, BS, SI, NIR and WIS. Out of these 956 polymorphs, 222 regions or genes were chosen based on functional annotation by COG and KEGG. These 222 regions were amplified across 46 spoligotyped strains. Of these 222 regions, 189 passed the quality checks of PCR, sequencing and alignment. In total, 477 polymorphs were identified in these 189 regions. SNP(s) was then identified, from in silico studies, in virulent locus 4094022, which corresponds to gene dnaQ (Rv3711c), relevant to virulence/infectivity.
Because screening methods can generate a significant number of false positives, it was important to confirm each reported SNP. Samples implicated as containing a candidate SNP were thus subjected to manual curation of chromatograms from DNA sequencing, to identify and confirm the presence of the SNP.
"Spoligotyping" as mentioned herein refers to a method used for simultaneous detection and typing of Mycobacterium tuberculosis isolates. It is based on analysis of polymorphisms in the direct repeat (DR) region consisting of identical direct repeats and unique spacer sequences. Spoligotyping has a lower level of discrimination than IS6110 typing, suggesting a lower mutation rate in the DR region. This method is based on polymerase chain reaction (PCR) amplification of a highly polymorphic direct repeat locus in the M. tuberculosis genome. Results can be obtained from a M. tuberculosis culture within 1 day. Thus, the clinical usefulness of spoligotyping is determined by its rapidity, both in detecting causative bacteria and in providing epidemiologic information on strain identities. Implementing such a method in clinical settings is useful in surveillance of tuberculosis transmission and in interventions to prevent further spread of this disease.
"Clinical Isolates" refers to samples obtained from patients infected with TB.
The terms "homology" and "homologous" refer to a degree of identity. There may be partial homology or complete homology. A partially homologous sequence is one that is less than 100% identical to another sequence.
From the foregoing, it is apparent that the invention includes a number of general uses that can be expressed concisely as follows. The invention provides for the use of the nucleic acid segments described above in the diagnosis or monitoring of diseases, and infection by microorganisms. The invention further provides for the use of the nucleic acid segments in the manufacture of a medicament for the treatment or prophylaxis of such diseases. The invention further provides for the use of the DNA segments as a pharmaceutical.
While this invention has been particularly shown and described with references to preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Reference will now be made in detail to the preferred embodiments of the invention. While the invention will be described in conjunction with the preferred embodiments, it will be understood that they are not intended to limit the invention to these embodiments. On the contrary, the invention is intended to cover alternatives, modifications and equivalents, which may be included within the spirit and scope of the invention.
The field of genomics has taken rapid strides in recent years. It started with efforts to determine the entire nucleotide sequence of simpler organisms such as viruses and bacteria. As a result, genomic sequences of Haemophilus influenzae (Fleischmann et al., Science 269: 496-512 [1995]) and a number of other bacterial strains (Escherichia coli, Mycobacterium tuberculosis, Helicobacter pylori, Campylobacter jejuni, Mycobacterium leprae) are now available. This was followed by the determination of the complete nucleotide sequence of a number of eukaryotic organisms including budding yeast (Saccharomyces cerevisiae) (Goffeau et al., Science 274: 563-567 [1996]), nematode (Caenorhabditis elegans) (C. elegans sequencing consortium, Science 282: 2012-2018 [1998]) and fruit fly (Drosophila melanogaster) (Adams et al., Science 287: 2185-2195 [2000]). Genome sequencing is rapidly advancing and several genomes are now complete or partially complete, including the human, mouse, and rice genomes.
In one of the embodiments, the invention details the screening for polymorphisms shortlisted from the comparative genome database in isolates of Mycobacterium tuberculosis with known clinical profile, which further helps to arrive at polymorphisms with a strong correlation with virulence and infectivity. A set of 5 M. tuberculosis strains with known virulence/lack of virulence is screened for the polymorphisms identified from the database. The same may not in any way be considered restrictive to the present invention. The studies described here help to identify crucial SNP(s) that could be marked as virulent. They primarily comprised the steps of (a) Identification of SNP(s) across 46 spoligotyped strains and Strain analysis, (b) Genomic subtraction study, (c) SNP(s) to spoligotyping, (d) Mapping genomic subtraction to drug resistance and (e) Gene ontology mapping.
The invention is further elaborated with the help of following examples. However, these examples should not be construed to limit the scope of the invention.
Example 1
Strain analysis
The method used in identifying the SNPs across 46 spoligotyped strains was the same as that used for the 4 strains that were analysed earlier. A total of 222 potentially virulent regions were picked up after screening them across several annotations which included COG, KEGG pathway and gene ontology.
An elaborate bioinformatic analysis of the sequences was done to identify the SNP position and/or to find SNPs in new locations, comprising:
> Quality checking of the DNA sequences of the strains (using Emboss tools)
> Fixing the window length for the detection of polymorphisms (Perl script)
> Extracting polymorphisms at the predetermined location as well as in new places (Clustal-W, Blast, etc.).
The quality checking of the sequences was done by removing the poly-A tails and terminal N's. The window length of the sequences was fixed by considering the presence of an identical nucleotide across the 5 genomes and the strains under study as a starting point. The same was done to identify an identical nucleotide across the 3 genomes and strains and, finally, to look at all polymorphisms that exist between BCG and the rest of the strains and genomes. By this procedure any false positives, sequencing errors and terminal N's, if any, were eliminated and strain-specific polymorphisms were detected and selected. The results of the analysis are shown in table 1.
Legend: The above table lists the SNPs between BCG and each of the 46 spoligotyped strains under study. The highlighted records indicate the SNPs for which primers were designed. The table shows the complete result of the alignment, inclusive of forward and reverse strands. The row 'Group' stands for the spoligotyped group of the strains; the row 'Resistance' indicates susceptibility or resistance, so that, for example, SRRS refers to susceptibility or resistance to Rifampicin, Isoniazid, Ethambutol and Streptomycin respectively. 'NA' denotes absence of SNP data; 'NA, NA' refers to SNP data absent in both forward and reverse strands. '*' denotes a deletion of a base.
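The quality-checking step of this example, removal of poly-A tails and terminal N's, can be sketched as follows. The specification reports that Emboss tools were used for this step; the Python function below is a simplified stand-in, and the minimum tail length of 8 is a hypothetical threshold chosen for illustration.

```python
import re


def clean_read(seq, min_poly_a=8):
    """Strip terminal N's from both ends and remove a trailing poly-A run
    of at least `min_poly_a` bases (threshold is an assumed parameter)."""
    seq = seq.upper().strip("N")
    seq = re.sub(r"A{%d,}$" % min_poly_a, "", seq)
    # Removing the tail may expose further N's at the 3' end
    return seq.strip("N")


raw = "NN" + "ACGTACGT" + "A" * 10 + "NN"   # invented read
print(clean_read(raw))  # ACGTACGT
```

Runs of A shorter than the threshold are retained, since they may be genuine sequence rather than a tail.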
Example 2
Genomic subtraction
From the alignment, the SNP position in H37Rv in relation to the reference BCG is known. In order to look at SNP(s) that are common to both BCG and H37Rv, the SNP(s) within the ORF are considered. In order to achieve this, a database of the SNP(s) and the 15 bases flanking each SNP on either side was constructed using BCG as the database. Next, the identical SNP region was taken for H37Rv and a BLAST search was done against BCG as the database. The idea was to identify unique genes in H37Rv. This is done by comparing the identity scores of the BCG versus H37Rv alignment. This set of genes is then used to analyse the strains in order to identify which SNP(s) are homologous to H37Rv, or virulent SNP(s) with respect to H37Rv. These unique SNP regions of H37Rv serve as a query against a database of the SNP data of all strains, comprising the 15 bases flanking either side of each SNP.
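The construction of the flanking-region records used in the genomic subtraction can be sketched as follows. The specification performs the comparison with BLAST; the ungapped, position-by-position identity computed here is a simplified substitute used only to show the idea, and the function names and toy sequence are hypothetical.

```python
def flanking_window(genome, pos, flank=15):
    """Return the SNP base at 0-based `pos` together with up to `flank`
    bases on either side (shorter near the ends of the sequence)."""
    start = max(0, pos - flank)
    end = min(len(genome), pos + flank + 1)
    return genome[start:end]


def percent_identity(a, b):
    """Ungapped percent identity between two windows of equal length."""
    if len(a) != len(b):
        raise ValueError("windows must have equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


genome = "ACGT" * 10                      # invented 40-base sequence
window = flanking_window(genome, 20)      # 15 + 1 + 15 = 31 bases
print(len(window), window[15] == genome[20])
print(percent_identity("ACGT", "ACGA"))   # 75.0
```

A window from H37Rv whose identity against the corresponding BCG window falls below 100% carries at least one difference, which is the property exploited when searching for regions unique to the virulent genome.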
Table 2 shows the virulent SNP(s) identified across the 45 spoligotyped strains by using the above-mentioned genomic subtraction. The same study was also done for the 4-strain dataset, namely BS, SI, NIR and WIS, that was sequenced earlier.
Legend: The table above displays the unique SNP loci that are homologous to H37Rv; these can be inferred from the HSP length and match. The region can be used as a marker for virulent SNP(s).
Example 3
SNP(s) to spoligotyping
The SNPs that were identified across the 46 spoligotyped strains were linked to the spoligotyping information to ascertain which of these SNPs are conserved across the virulent strains. This would enable us to pick out those SNP(s) as virulent. This can be achieved by obtaining a count of each strain across the SNP(s) identified.
Example 4
Mapping genomic subtraction to drug resistance
In order to validate the results obtained in the genomic subtraction study, wherein the SNP(s) was identified as virulent, an analysis was done considering the SNP data together with the drug resistance data. Isoniazid, Rifampicin, Ethambutol and Streptomycin were the 4 drugs for which the susceptibility, resistance or multidrug resistance of the strains is already known. The basic idea was to check whether the base was conserved across all the resistant strains.
The analysis involved marking each SNP record as virulent or not, individually for each of the 4 drugs, and then collating the results to obtain a uniform scaling of virulence with respect to the drugs. To achieve this, two counts were made for each SNP record: the total resistant data points for which SNP data is present (that is, either forward or reverse strand data of the SNP) and the total resistant data points for which the SNP data is not available. Based on this data, the output was filtered to obtain high quality results, and the resulting output was classified with respect to its degree of quality and virulence as 80-85%, 85-90%, 90-95% and 95-100%. This method was applied for all 4 drugs independently and extrapolated across the SNP(s) identified as virulent. The results are shown in tables 3 and 4 below.
Table 4: Results showing the unique counts of SNP along with other SNPs
Legend: The above table provides information on unique counts of locus for each group [intervals] and for each drug.
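The counting and binning described in this example can be sketched as follows for one SNP and one drug. The specification does not give the exact formula; the sketch assumes that the degree of conservation is the share of resistant strains with SNP data that carry the most frequent base, and that each interval includes its lower bound. Both assumptions, and all names used, are hypothetical.

```python
from collections import Counter

# Interval labels from the text; lower-bound-inclusive binning is assumed
BINS = [(95.0, "95-100%"), (90.0, "90-95%"), (85.0, "85-90%"), (80.0, "80-85%")]


def conservation(bases):
    """`bases` holds the SNP base observed in each resistant strain, or None
    where neither forward nor reverse strand data is available.
    Returns (percent carrying the most common base, number missing)."""
    present = [b for b in bases if b is not None]
    missing = len(bases) - len(present)
    if not present:
        return None, missing
    top_count = Counter(present).most_common(1)[0][1]
    return 100.0 * top_count / len(present), missing


def bin_label(pct):
    """Map a percentage to its interval label, or None if below 80%."""
    for lower, label in BINS:
        if pct >= lower:
            return label
    return None


# Invented data: 9 resistant strains carry C, 1 carries T, 2 have no data
observed = ["C"] * 9 + ["T"] + [None, None]
pct, missing = conservation(observed)
print(pct, missing, bin_label(pct))  # 90.0 2 90-95%
```

Records falling below the lowest interval return no label, which mirrors the filtering of low quality results mentioned in the text.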
From the two methods of genomic subtraction and mapping it to drug resistance, we were able to pick up the same SNP(s) as virulent. The most important achievement of this study was to identify genes that have been found to be associated with virulence, namely the katG and fadD26 genes, as shown in table 5. Further, it also helped in identification of the ermB gene, which encodes a protein involved in multi-drug resistance (efflux protein).
The other SNP(s) validated along with the SNP of the invention are as follows:
Example 5
Gene ontology mapping
Figure 2 shows the mapping of the SNP(s) identified as virulent to the gene ontology. Gene ontology mapping was done in order to obtain a complete annotation of the SNP(s) in terms of pathway or activity, since not much could be inferred with respect to function from either the COG or the KEGG annotation shown above. By looking at the gene ontology functions, it can be inferred that the locus falls under the primary class of oxidoreductase activity. The mechanism involved is catalysis of an oxidation-reduction (redox) reaction, a reversible chemical reaction in which the oxidation states of an atom or atoms within a molecule are altered. One substrate acts as a hydrogen or electron donor and becomes oxidized, while the other acts as a hydrogen or electron acceptor and becomes reduced.
The genes encoding oxidoreductases are known to be involved in bacterial defense during entry into the persistence stage. For example, the role of KatG, a catalase-peroxidase bifunctional enzyme known to play a role in resistance to isoniazid (INH), has been assessed as a potential virulence factor.
The locus ID or SNP ID along with gene name and locus tag and gene ontology ID information is provided in the map as in figure 2.
Example 6
Validation of SNP(s)
Determining the virulence of M. tuberculosis means finding the factors that are important for progression of tuberculosis. M. tuberculosis virulence will be studied in mice as an animal model. Female BALB/c mice in the age group of 6-8 weeks will be inducted into the study. Animals will be experimentally infected in a biosafety level 3 (BSL 3) facility by the inhalation route in an aerosol infection chamber. In a preliminary experiment the concentration of bacilli required to yield 100 and 300 cfu/lung will be determined.
After infection with 100 and 300 bacilli/lung, a group of animals will be sacrificed at 24 hrs after aerosol infection and the extent of infection will be evaluated by plating the lung homogenate. The animals will be housed for 5 months in BSL3 facility. At 3 weeks after infection 50% of the animals will be sacrificed, the organism load will be determined in the lung and spleen. Histopathology will be done on both the organs.
During the course of the study, body weight will be recorded weekly. Clinical signs, morbidity and mortality will be recorded daily. Animals that die during the course of the study will be subjected to detailed gross and histopathological examinations. The organism load in lung and spleen will be determined by plating serial dilutions of individual whole organ homogenates on nutrient 7H11 agar and assessing bacterial colony formation. After completion of the study period (5 months), the remaining surviving animals will be sacrificed and subjected to gross and histopathological examination, and the organism load in lung and spleen will be determined.
The virulence of the M. tuberculosis strains will be measured in terms of organism load, morbidity and mortality. The organism load allows a comparison of the fitness of different bacterial strains to survive the host response to infection. Morbidity is measured by histopathological analysis, body weight reduction and clinical signs, to characterize M. tuberculosis strains that differ in virulence without differing in bacterial load. Mortality is the percentage of infected animals that die and is also measured as the time taken for all animals to die after being infected.
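The arithmetic behind these read-outs can be illustrated with a minimal sketch. It assumes that organism load is back-calculated from a single countable plate of a serial dilution, and that mortality is summarised as described above. The function names, parameter names and all numeric values are hypothetical illustrations and are not part of the study protocol.

```python
def cfu_per_organ(colonies, dilution_factor, plated_volume_ml, homogenate_volume_ml):
    """Estimate CFU in a whole organ from one countable plate.

    dilution_factor is the fold dilution of the plated sample
    (e.g. 1000 for a 10^-3 dilution). Assumed calculation, not from the source.
    """
    if plated_volume_ml <= 0 or homogenate_volume_ml <= 0:
        raise ValueError("volumes must be positive")
    cfu_per_ml = colonies * dilution_factor / plated_volume_ml
    return cfu_per_ml * homogenate_volume_ml


def mortality_percent(n_dead, n_infected):
    """Percentage of infected animals that died."""
    if n_infected <= 0:
        raise ValueError("n_infected must be positive")
    return 100.0 * n_dead / n_infected


def time_to_full_mortality(days_to_death, n_infected):
    """Day on which the last animal died, or None if some animals survived."""
    if len(days_to_death) < n_infected:
        return None
    return max(days_to_death)


if __name__ == "__main__":
    # Hypothetical plate: 45 colonies, 10^-3 dilution, 0.1 mL plated, 2 mL homogenate
    print(cfu_per_organ(45, 1000, 0.1, 2.0))
    # Hypothetical group of 10 infected animals, 4 of which died
    print(mortality_percent(4, 10))
    print(time_to_full_mortality([60, 75, 90, 110], 10))
```

Comparing strains on the logarithm of the organ load rather than on raw counts is a common choice, but the source does not specify how loads are to be compared.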
Thus, the outcome of the study would be the identification of virulence factors or sequences that could be used as:
1. Diagnostic markers in prediction of disease and its progress in the patient
2. Drug targets for development of new and effective treatments of TB
3. Candidate genes / sequences in DNA vaccine.
We Claim:
1) A nucleotide sequence of SEQ ID No. 1, having at least one single nucleotide polymorphism (SNP) and a corresponding polypeptide thereof.
2) The nucleotide sequence as claimed in claim 1, wherein the SNP is associated with virulence in bacterial species selected from a group comprising gram-positive and gram-negative bacteria, preferably members of genus Mycobacteria.
3) A kit for diagnosing plurality of bacterial diseases, preferably tuberculosis, said kit comprising,
a) SEQ ID No. 1 or transcribed RNA or polypeptides encoded by SEQ ID No. 1, or metabolomes or cell lines arising from SEQ ID No. 1; and
b) buffer components.
4) A method to identify SNP(s) in a nucleotide sequence of SEQ ID No. 1, said method comprising steps of:
a) aligning genomic sequences of different species to select highly conserved polynucleotide sequence (homologous sequence) amongst the virulent strains; and
b) comparing the highly conserved polynucleotide sequence with homologous sequence in avirulent strain to identify said SNP(s).
5) The method as claimed in claim 4, wherein the strains are bacterial species, preferably members of genus Mycobacteria.
6) A vector comprising nucleotide sequence of SEQ ID No. 1, having at least one single nucleotide polymorphism (SNP).
7) A drug composition or a vaccine comprising SEQ ID No. 1, preferably nanotechnology enabled composition and/or vaccine or transcribed RNA or polypeptides encoded by SEQ ID No. 1, or metabolomes or cell lines arising from SEQ ID No. 1, optionally along with pharmaceutically acceptable carriers.
8) An antibody raised against polypeptide sequence encoded by SEQ ID No. 1 for the treatment of plurality of bacterial diseases, preferably Tuberculosis.
9) A method of treating plurality of bacterial diseases, preferably Tuberculosis, said method comprising administering to a subject in need thereof, a therapeutically effective amount of a drug composition or a vaccine comprising SEQ ID No. 1 or transcribed RNA or polypeptides encoded by SEQ ID No. 1 or metabolomes or cell lines arising from SEQ ID No. 1, optionally along with pharmaceutically acceptable carriers or antibody raised against polypeptide sequence encoded by SEQ ID No. 1.
10) The nucleotide sequence of SEQ ID No. 1, the kit for diagnosing plurality of bacterial diseases, preferably Tuberculosis, the method to identify SNP(s), the vector, the drug composition or the vaccine, the antibody and the method of treating plurality of bacterial diseases, preferably Tuberculosis as substantially herein described along with accompanying examples and figures.
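By way of illustration only, the two comparison steps recited in claim 4 can be sketched as follows. The sketch assumes that the sequences have already been aligned to equal length by a separate alignment tool, with '-' marking gaps. The function name and the toy sequences are hypothetical and do not correspond to SEQ ID No. 1 or to any real strain.

```python
def find_candidate_snps(virulent_aligned, avirulent_aligned):
    """Return (position, virulent_base, avirulent_base) tuples, 1-based.

    A column is reported when every virulent strain carries the same base
    (conserved) and that base differs from the avirulent strain.
    Columns containing a gap ('-') are skipped.
    """
    length = len(avirulent_aligned)
    if any(len(seq) != length for seq in virulent_aligned):
        raise ValueError("all sequences must be aligned to the same length")

    snps = []
    for i in range(length):
        column = {seq[i] for seq in virulent_aligned}
        if len(column) != 1:
            continue  # not conserved amongst the virulent strains
        base = next(iter(column))
        ref = avirulent_aligned[i]
        if base == "-" or ref == "-":
            continue  # ignore gapped positions
        if base != ref:
            snps.append((i + 1, base, ref))
    return snps


if __name__ == "__main__":
    # Hypothetical toy alignment, for illustration only
    virulent = ["ACGTAC", "ACGTAC", "ACGTTC"]
    avirulent = "ACCTAC"
    print(find_candidate_snps(virulent, avirulent))
```

In this toy alignment, position 5 is not reported because the virulent sequences disagree there, so the column does not meet the conservation requirement of step (a).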
| # | Name | Date |
|---|---|---|
| 1 | abs-207-che-2008-8.jpg | 2011-09-02 |
| 2 | abs-207-che-2008-7.jpg | 2011-09-02 |
| 3 | abs-207-che-2008-6.jpg | 2011-09-02 |
| 4 | abs-207-che-2008-5.jpg | 2011-09-02 |
| 5 | abs-207-che-2008-4.jpg | 2011-09-02 |
| 6 | abs-207-che-2008-3.jpg | 2011-09-02 |
| 7 | abs-207-che-2008-2.jpg | 2011-09-02 |
| 8 | abs-207-che-2008-1.jpg | 2011-09-02 |
| 9 | 207-che-2008-form 5.pdf | 2011-09-02 |
| 10 | 207-che-2008-form 3.pdf | 2011-09-02 |
| 11 | 207-che-2008-form 1.pdf | 2011-09-02 |
| 12 | 207-che-2008-drawings.pdf | 2011-09-02 |
| 13 | 207-che-2008-description(provisional).pdf | 2011-09-02 |
| 14 | 207-che-2008-correspondnece-others.pdf | 2011-09-02 |
| 15 | 207-che-2008-claims.pdf | 2011-09-02 |
| 16 | 207-che-2008-abstract.pdf | 2011-09-02 |
| 17 | 0207-che-2008 others.pdf | 2011-09-02 |
| 18 | 0207-che-2008 form-5.pdf | 2011-09-02 |
| 19 | 0207-che-2008 form-3.pdf | 2011-09-02 |
| 20 | 0207-che-2008 form-2.pdf | 2011-09-02 |
| 21 | 0207-che-2008 form-13.pdf | 2011-09-02 |
| 22 | 0207-che-2008 form-1.pdf | 2011-09-02 |
| 23 | 0207-che-2008 description (provisional).pdf | 2011-09-02 |
| 24 | 0207-che-2008 correspondence-others.pdf | 2011-09-02 |