Abstract: The present invention relates to SNP markers for identification of a subset of HLA-DQ haplotypes and alleles. In particular, the present invention relates to SNP markers for identification of celiac disease (CD) specific HLA-DQ haplotypes and alleles. The said SNP markers are highly specific and accurately identify the CD specific HLA-DQ haplotypes and alleles specifically in north Indian human population. The SNP markers as identified are rs1129740, rs9273012 and rs7744001 and are highly specific to identify the CD specific HLA-DQ alleles viz HLA-DQB1*02, DQB1*03:02, DQA1*05:01, DQA1*02:01 and DQA1*03:01 and haplotypes viz DQ2.5, DQ2.2, and DQ8. Figure 1
Description:
FIELD OF THE INVENTION
[001] The present invention relates to the field of genetics to identify celiac disease (CD) specific HLA-DQ haplotypes and alleles by using molecular markers. In particular, the present invention relates to Single Nucleotide Polymorphism (SNP) markers capable of identifying CD specific HLA-DQ haplotypes and alleles and method for identifying, sorting and selecting CD specific HLA-DQ haplotypes and alleles using these markers.
BACKGROUND OF THE INVENTION
[002] Celiac disease results from a dysregulated immune response to dietary wheat gluten and related cereal proteins. The disease is an acquired disorder, but with a strong hereditary component. The evidence for the importance of genes comes from familial and twin studies. HLA-DQ2 and HLA-DQ8 confer the major CD susceptibility. Most individuals expressing alleles encoding HLA-DQ2 or HLA-DQ8 never develop the disease, even though recent studies confirmed that >85% of CD patients are positive for HLA-DQ2/8. In addition, the concordance rate among HLA-identical dizygotic twins, who share their HLA genes in addition to, on average, half of their other genes, is much lower than that of monozygotic twins.
[003] A long series of linkage-based, genome-wide scans with the use of hundreds of multiallelic markers in multiplex families (typically affected sibling pairs) have confirmed the major role of the HLA locus and also suggested several minor loci outside the HLA locus. Genome-wide association studies in CD have identified more than 40 confirmed non-HLA genetic loci with limited trans-ethnic replicability. Strikingly, however, with the exception of the HLA locus, few genetic loci have been identified in the different linkage studies. A region on chromosome 5q31-33, first identified in a genome-wide linkage screen of an Italian dataset has been identified in several linkage studies. A recent comprehensive study using single nucleotide polymorphisms (SNP) to fine map the responsible mutation, identified several SNPs weakly associated with celiac disease, but concluded that none of the associated markers could alone explain the strong linkage signal and the causative gene(s) in this region remain(s) elusive. Moreover, studies of Dutch cohorts have indicated a susceptibility gene located on chromosome 19p13.1, but this finding has been hard to replicate in other populations. Thus, the enormous effort to identify celiac disease susceptibility genes by linkage analysis has, by and large, been a frustrating endeavor.
[004] HLA-DQA1*02:01 and HLA-DQB1*02 alleles constitute HLA-DQ2.2, HLA-DQA1*05:01 and HLA-DQB1*02 alleles constitute HLA-DQ2.5 and HLA-DQA1*03:01 and HLA-DQB1*03:02 constitute HLA-DQ8 haplotype.
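The allele pairings listed above can be written as a small lookup table. The following Python sketch is illustrative only: the names HAPLOTYPE_ALLELES and haplotypes_from_alleles are hypothetical, and the sketch assumes that phase (cis or trans arrangement of the alleles) can be ignored, which the present paragraph does not address.

```python
# Allele pairs that constitute each CD specific haplotype, as listed in [004].
HAPLOTYPE_ALLELES = {
    "DQ2.2": ("DQA1*02:01", "DQB1*02"),
    "DQ2.5": ("DQA1*05:01", "DQB1*02"),
    "DQ8": ("DQA1*03:01", "DQB1*03:02"),
}


def haplotypes_from_alleles(typed_alleles):
    """Return the haplotypes whose DQA1 and DQB1 alleles were both typed.

    Assumption (not from the source): phase is ignored, so a haplotype is
    reported whenever both of its constituent alleles are present.
    """
    typed = set(typed_alleles)
    return sorted(
        name
        for name, (dqa1, dqb1) in HAPLOTYPE_ALLELES.items()
        if dqa1 in typed and dqb1 in typed
    )
```

For example, a sample typed as carrying DQA1*05:01 and DQB1*02 would be reported as DQ2.5 under these assumptions.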
[005] High-risk HLA-DQ haplotypes, i.e., DQ2.5, DQ2.2, and DQ8, are crucial in the pathogenesis of CD, and HLA-DQ2.5 positivity carries the highest risk with the largest effect size. These haplotypes nevertheless have limited use in initial diagnosis owing to a low positive predictive value. Their high negative predictive value, however, helps to exclude non-CD subjects in ambiguous cases, when CD is suspected but other diagnostic tests (such as anti-tTG antibodies or intestinal biopsy) are inconclusive. HLA-DQ typing is an effective method for risk stratification, specifically in populations with high gluten consumption. Additionally, it serves as a useful tool for screening relatives of CD patients who are at higher risk, aiding genetic counseling and preventive measures. The estimated prevalence of CD in northern India is 1.04%.
[006] In India, CD is most frequent in northern India and symptomatic CD was observed only in that geographical region. Prevalence of HLA-DQ2/8 is similar across different regions of India, i.e., about one-third of the population. Frequency of DQ2.2 is higher in the south Indian population, while DQ2.5 is more prevalent in northern and western Indian populations.
[007] It is essential to also consider the presence of the DQB1*02 and DQA1*05 alleles, even though their relative risk for CD development is relatively low compared to DQ2.5. A study showed that 5.8% of patients who lack HLA-DQ2 and DQ8 variants still carry the DQB1*02 allele.
[008] Identifying HLA haplotypes and alleles using DNA based methods is costly, tedious and unsuitable for efficient risk prediction in the clinic and for large-scale population level screening for risk assessment in a country like India. Thus, there is a long-felt need for a SNP marker panel capable of specifically and accurately identifying CD specific HLA-DQ haplotypes and alleles and a method of identifying, sorting and selecting CD specific HLA-DQ haplotypes and alleles.
OBJECTS OF THE INVENTION
[009] It is an object of the present invention, to provide molecular markers capable of identifying subset of HLA-DQ haplotypes and alleles.
[010] It is another object of the present invention, to provide SNP markers for identifying CD specific HLA-DQ haplotypes and alleles.
[011] It is yet another object of the present invention, to provide a method for identifying, sorting and selecting CD specific HLA-DQ haplotypes and alleles.
[012] It is yet another object of the present invention, to provide a method for identifying, sorting and selecting CD specific HLA-DQ haplotypes and alleles that is highly specific and sensitive.
SUMMARY OF THE INVENTION
[013] The present invention provides for highly specific molecular markers for identifying a subset of HLA-DQ haplotypes and alleles and a method thereof. In particular the present invention provides for SNP markers that accurately identify the CD specific HLA-DQ haplotypes and alleles and a method of identifying, sorting and selecting CD specific HLA-DQ haplotypes and alleles.
[014] According to a basic aspect of the present invention is provided a SNP marker for identifying a Celiac Disease (CD) specific HLA-DQ haplotype and allele, wherein the SNP marker comprises allelic variation at a SNP marker locus selected from the group consisting of: SNP rs1129740 at position chr6:32641328 in SEQ ID No: 1 (S1), SNP rs9273012 at position chr6:32643864 in SEQ ID No.: 2 (S2), and SNP rs7744001 at position chr6:32658309 in SEQ ID No.: 3 (S3).
[015] In another aspect, the present invention provides the SNP marker, wherein the SNP marker is capable of identifying CD specific allele and wherein the CD specific allele is selected from the group consisting of HLA-DQB1*02, DQB1*03:02, DQA1*05:01, DQA1*02:01 and DQA1*03:01.
[016] A further aspect of the present invention is to provide the SNP marker wherein the SNP marker is further capable of identifying CD specific haplotype and wherein the CD specific HLA haplotype is selected from the group consisting of DQ2.5, DQ2.2, and DQ8.
[017] In a still further aspect the present invention provides a method for identifying CD specific haplotype and allele, wherein the method comprises the step of detecting the presence or absence of SNP marker in a subset of HLA-DQ haplotype and allele wherein the SNP marker is SNP rs1129740 at position chr6:32641328 in SEQ ID No: 1 (S1), SNP rs9273012 at position chr6:32643864 in SEQ ID No.: 2 (S2), and SNP rs7744001 at position chr6:32658309 in SEQ ID No.: 3 (S3).
[018] In yet further aspect the present invention provides the method, wherein the subset of HLA-DQ haplotype and allele comprises CD specific allele selected from the group consisting of HLA-DQB1*02, DQB1*03:02, DQA1*05:01, DQA1*02:01 and DQA1*03:01; and CD specific HLA-DQ haplotype selected from the group consisting of DQ2.5, DQ2.2, and DQ8.
[019] Still further aspect of the present invention provides the method, wherein detecting the presence or absence of SNP marker comprises the step of analyzing a DNA sample for presence or absence of one or more SNP markers; wherein the DNA sample is DNA obtained from an individual in the North Indian population and wherein the individual is CD positive or CD negative.
[020] Yet further aspect of the present invention provides the method, wherein analyzing the DNA sample for presence of one or more SNP marker comprises assessing the SNP marker to identify CD specific HLA-DQ haplotypes and alleles.
[021] Another aspect of the present invention provides the method, wherein assessing the SNP marker to identify CD specific HLA-DQ haplotypes and alleles comprises assessing the index allele in SNP marker; and wherein assessing the index allele in SNP marker comprises the steps of:
(a) assessing the presence of index allele G in SNP rs1129740 at position chr6:32641328; wherein the presence of index allele G in SNP rs1129740 at position chr6:32641328 indicates the absence of allele HLA-DQB1*02, DQB1*03:02, DQA1*05:01, DQA1*02:01 and DQA1*03:01; or
(b) assessing the presence of index allele G in SNP rs9273012 at position chr6:32643864; wherein the presence of index allele G in SNP rs9273012 at position chr6:32643864 indicates the presence of allele HLA-DQA1*05:01; or
(c) assessing the presence of index allele A in SNP rs7744001 at position chr6:32658309; wherein the presence of index allele A in SNP rs7744001 at position chr6:32658309 indicates the absence of haplotype HLA-DQ8; or
(d) a combination thereof.
[022] Yet another aspect of the present invention provides for the method, wherein identifying the CD specific HLA-DQ haplotype further comprises assessing the SNP genotypes; wherein assessing the SNP genotype comprises identifying genotype combinations of SNP rs1129740, rs9273012 and rs7744001; wherein the presence of identified genotype combination of SNP rs1129740, rs9273012 and rs7744001 identifies the CD specific haplotype.
[023] Still another aspect of the present invention provides for the method, wherein the genotype combination of SNP to identify the CD specific haplotype comprises:
i. identifying genotype combination I of SNP rs1129740, rs9273012 and rs7744001; wherein the presence of genotype combination I identifies haplotype HLA-DQ2.5; and/or
ii. identifying genotype combination II of SNP rs1129740, rs9273012 and rs7744001; wherein the presence of genotype combination II identifies haplotype HLA-DQ2.2; and/or
iii. identifying genotype combination III of SNP rs1129740, rs9273012 and rs7744001; wherein the presence of genotype combination III identifies absence of haplotype HLA-DQ8;
wherein:
a) the genotype combination I includes combination Ia, Ib, Ic, Id or Ie;
Allele combination | SNP rs1129740 | SNP rs9273012 | SNP rs7744001
Ia | AA | GG | -
Ib | AA | AG | AA
Ic | AA | AG | AG
Id | AA | AG | GG
Ie | AG | AG | -
b) the genotype combination II includes combination IIa or IIb; and
Allele combination | SNP rs1129740 | SNP rs9273012 | SNP rs7744001
IIa | AA | AA | AA
IIb | AA | AG | AA
c) the genotype combination III includes combination IIIa;
Allele combination | SNP rs1129740 | SNP rs9273012 | SNP rs7744001
IIIa | - | - | AA
wherein "-" represents "AA or AG or GG".
[024] In this respect, before explaining at least one embodiment of the invention in detail, it is to be understood that the invention is not limited in its application to the details of construction and to the arrangements of the components set forth in the following description or illustrated in the drawings. The invention is capable of other embodiments and of being practiced and carried out in various ways. Also, it is to be understood that the phraseology and terminology employed herein are for the purpose of description and should not be regarded as limiting. These together with other objects of the invention, along with the various features of novelty which characterize the invention, are pointed out with particularity in the disclosure. For a better understanding of the invention, its operating advantages and the specific objects attained by its uses, reference should be had to the accompanying drawings and descriptive matter in which there are illustrated preferred embodiments of the invention.
BRIEF DESCRIPTION OF THE DRAWINGS
[025] For a better understanding of the embodiments of the methods described herein, and to show more clearly how they may be carried into effect, reference will now be made, by way of example, to the accompanying drawings, wherein like reference numerals represent like elements/components throughout and wherein:
[026] Figure 1 illustrates the location of the SNP markers on the chromosome with respect to the genes of interest, i.e., HLA-DQA1 and HLA-DQB1, according to an embodiment of the present invention.
[027] SEQ ID No.: 1 corresponds to a nucleotide sequence comprising the SNP rs1129740.
[028] SEQ ID No.: 2 corresponds to a nucleotide sequence comprising the SNP rs9273012.
[029] SEQ ID No.: 3 corresponds to a nucleotide sequence comprising the SNP rs7744001.
DETAILED DESCRIPTION OF THE INVENTION
[030] The embodiments of the invention will now be described herein, with reference to the accompanying examples and drawings. It will be understood by those skilled in the art that the foregoing general description and the following detailed description are exemplary and explanatory of the invention and are not intended to be restrictive thereof.
[031] The present invention provides highly specific and sensitive SNP markers for identifying Celiac Disease specific HLA-DQ haplotypes and alleles. The present invention further provides a method of identifying and selecting subset of HLA-DQ haplotypes and alleles wherein the HLA-DQ haplotypes and alleles are identified by the presence or absence of at least one of the SNP markers disclosed herein.
DEFINITIONS:
[032] As used herein, a "polymorphism" is a variation in the DNA within a particular locus or physical location in the genome between two or more individuals within a population. A polymorphism preferably has a frequency of at least 1 % in a population. A useful polymorphism can include a single nucleotide polymorphism (SNP), a simple sequence repeat (SSR), an insertion/deletion polymorphism etc.
[033] As used herein, the term "allele" refers to one of two or more different nucleotide sequences that occur at a specific locus or physical location in the genome.
[034] As used herein the term "Allele frequency" refers to the relative frequency of an allele at a genetic locus within a population. One can estimate the allele frequency within a population by averaging the allele frequencies of a sample of individuals from that population. Similarly, one can calculate the allele frequency within a population of groups of individuals by averaging the allele frequencies of groups that make up the population. For a population with a finite number of individuals, an allele frequency can be expressed as a count of individuals (or any other specified grouping) containing the allele in homozygous or heterozygous conditions.
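By way of illustration only, the allele frequency of a biallelic SNP can be estimated from a sample of genotypes as sketched below in Python. The function name allele_frequency and the representation of each genotype as a two-letter string (e.g., "AG") are assumptions made for this example and are not part of the disclosure.

```python
def allele_frequency(genotypes, allele):
    """Estimate the frequency of `allele` from diploid genotypes.

    Each genotype is assumed to be a two-letter string such as "AA" or "AG";
    every individual contributes two chromosomes to the denominator.
    """
    total_chromosomes = 2 * len(genotypes)
    if total_chromosomes == 0:
        raise ValueError("no genotypes supplied")
    # Count copies of the allele: 0, 1 or 2 per individual.
    copies = sum(genotype.count(allele) for genotype in genotypes)
    return copies / total_chromosomes
```

With the hypothetical sample ["AA", "AG", "GG", "AG"], the allele G is carried on four of eight chromosomes, giving a frequency of 0.5.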
[035] An allele is "associated with" a trait when it is part of, or linked to, a DNA sequence or allele that is correlated with the expression of the trait. An allele "negatively" correlates with a trait when it is linked to it and when presence of the allele is an indicator that a desired trait or trait form will not occur in an individual comprising the allele. An allele "positively" correlates with a trait when it is linked to it and when presence of the allele is an indicator that the desired trait or trait form will occur in an individual comprising the allele.
[036] As used herein, the term "locus" refers to a position on a chromosome, e.g. where a nucleotide, gene, sequence, or marker is located.
[037] As used herein, the term "marker locus" refers to a specific chromosome location in the genome of a species where a specific marker is located.
[038] As used herein, the term "molecular markers" or "genetic markers" refers to nucleic acids that are polymorphic in a population. The term includes nucleic acid sequences complementary to the genomic sequences, such as nucleic acids used as probes. Markers corresponding to genetic polymorphisms between members of a population can be detected by methods well-established in the art. These include, but are not limited to, e.g., PCR-based sequence specific amplification methods, detection of restriction fragment length polymorphisms (RFLP), detection of isozyme markers, detection of polynucleotide polymorphisms by allele specific hybridization (ASH), detection of amplified variable sequences of the genome, detection of self-sustained sequence replication, detection of simple sequence repeats (SSRs), detection of single nucleotide polymorphisms (SNPs), or detection of amplified fragment length polymorphisms (AFLPs). Well established methods are also known for the detection of expressed sequence tags (ESTs) and SSR markers derived from EST sequences and randomly amplified polymorphic DNA (RAPD).
[039] As used herein, the term, "marker allele", used interchangeably with the term "allele of a marker locus", can refer to one of a plurality of polymorphic nucleotide sequences found at a marker locus in a population.
[040] As used herein, the term "haplotype" is the genotype of an individual at a plurality of genetic loci, i.e. a combination of alleles. Typically, the genetic loci described by a haplotype are physically and genetically linked, i.e., on the same chromosome segment. The term "haplotype" can refer to alleles at a particular locus, or to alleles at multiple loci along a chromosomal segment.
[041] As used herein, the term "marker haplotype" refers to a combination of marker alleles at a marker locus.
[042] As used herein, the term "complement" refers to a nucleotide sequence that is complementary to a given index nucleotide sequence.
[043] SNPs disclosed herein can be detected by any of the methods known in art, examples of which include, but are not limited to, DNA sequencing, PCR-based sequence specific amplification methods, detection of polynucleotide polymorphisms by allele specific hybridization (ASH), dynamic allele-specific hybridization (DASH), molecular beacons, microarray hybridization, oligonucleotide ligase assays, Flap endonucleases, 5' endonucleases, primer extension, single strand conformation polymorphism (SSCP) or temperature gradient gel electrophoresis (TGGE). DNA sequencing, such as the pyrosequencing and next generation sequencing technology has the advantage of being able to detect a series of linked SNP alleles that constitute a haplotype.
[044] As used herein, the term "linkage" is used to describe the degree with which one marker locus is associated with another marker locus or some other locus. The linkage relationship between a molecular marker and a locus affecting a phenotype is given as a "probability" or "adjusted probability".
[045] As used herein, the term "linkage disequilibrium" (or LD) refers to a non-random segregation of genetic loci or traits (or both). In either case, linkage disequilibrium implies that the relevant loci are within sufficient physical proximity along a length of a chromosome so that they segregate together with greater than random (i.e., non-random) frequency. Markers that show linkage disequilibrium are considered linked. Linked loci co-segregate more than 50% of the time, e.g., from about 51% to about 100% of the time. In other words, two markers that co-segregate have a recombination frequency of less than 50% (and by definition, are separated by less than 50 cM on the same linkage group). As used herein, linkage can be between two markers, or alternatively between a marker and a locus affecting a phenotype. A marker locus can be "associated with" (linked to) a trait. The degree of linkage of a marker locus and a locus affecting a phenotypic trait is measured, e.g., as a statistical probability of co-segregation of that molecular marker with the phenotype (e.g., an F statistic or LOD score).
[046] As used herein, "linkage equilibrium" describes a situation where two markers independently segregate, i.e., sort among progeny randomly. Markers that show linkage equilibrium are considered unlinked (whether or not they lie on the same chromosome).
[047] The "logarithm of odds (LOD) value" or "LOD score" is used in genetic interval mapping to describe the degree of linkage between two marker loci. A LOD score of three between two markers indicates that linkage is 1000 times more likely than no linkage, while a LOD score of two indicates that linkage is 100 times more likely than no linkage. LOD scores greater than or equal to two may be used to detect linkage. LOD scores can also be used to show the strength of association between marker loci and quantitative traits in "quantitative trait loci" mapping. In this case, the LOD score's size is dependent on the closeness of the marker locus to the locus affecting the quantitative trait, as well as the size of the quantitative trait effect.
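The relationship between a LOD score and the corresponding odds stated above (a LOD score of three corresponding to odds of 1000 to 1) follows from the base-10 logarithm. A minimal Python sketch is given below; the function names lod_score and lod_to_odds are hypothetical and chosen for this example only.

```python
import math


def lod_score(likelihood_linked, likelihood_unlinked):
    """Base-10 logarithm of the ratio of the two likelihoods."""
    if likelihood_linked <= 0 or likelihood_unlinked <= 0:
        raise ValueError("likelihoods must be positive")
    return math.log10(likelihood_linked / likelihood_unlinked)


def lod_to_odds(lod):
    """Convert a LOD score back to odds in favour of linkage."""
    return 10 ** lod
```

Under this definition, lod_to_odds(3) gives 1000 and lod_to_odds(2) gives 100, matching the values recited in the paragraph above.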
[048] As used herein, the term, "probability value" or "p-value" is the statistical likelihood that the particular combination of a phenotype and the presence or absence of a particular marker allele is random. Thus, the lower the probability score, the greater the likelihood that a locus and a phenotype are associated. The probability score can be affected by the proximity of the first locus (usually a marker locus) and the locus affecting the phenotype, plus the magnitude of the phenotypic effect (the change in phenotype caused by an allele substitution). In some aspects, the probability score is considered "significant" or "nonsignificant". In some embodiments, a probability score of 0.05 (p=0.05, or a 5% probability) of random assortment is considered a significant indication of association. However, an acceptable probability can be any probability of less than 50% (p=0.5). For example, a significant probability can be less than 0.25, less than 0.20, less than 0.15, less than 0.1, less than 0.05, less than 0.01, or less than 0.001.
[049] As used herein, the term "reference sequence" or a "consensus sequence" refers to a defined sequence used as a basis for sequence comparison.
[050] As used herein, the term "rs_id" means rs-ID (reference SNP identifier), which is an independent identifier assigned to all SNPs initially registered by the NCBI dbSNP.
[051] As used herein, the term “Positive Predicting value” of a tag SNP refers to the likelihood that an individual truly possesses a specific genetic variant within a haplotype when the tag SNP test indicates its presence. Essentially, it reflects how reliably the tag SNP predicts the actual occurrence of the variant, particularly in regions of high linkage disequilibrium. A high positive predictive value means the tag SNP is highly accurate in identifying individuals who carry the target variant when the test result is positive.
[052] As used herein, the term “Negative Predicting value” of a tag SNP represents the likelihood that an individual who tests negative for the tag SNP truly does not possess the associated disease-related haplotype. That is, if the tag SNP indicates no risk, it is highly probable that the person does not carry the genetic risk factor. This measure essentially reflects how reliable a negative test result is in ruling out the presence of the target haplotype.
[053] As used herein, the term “Sensitivity” means SNP accurately identifies nearly all individuals who carry the target haplotype, ensuring a high true positive detection rate when used as a genetic marker. A tag SNP with high sensitivity is a single SNP that effectively detects a specific genetic variant with a sensitivity near 100%.
[054] As used herein, the term “Specificity” means SNP accurately identifies the presence of a specific haplotype, with a low rate of false positives, typically considered to be above 95% in most research contexts; essentially, if a tag SNP indicates a specific haplotype, it is highly likely that the variant is truly present.
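The four measures defined above are commonly computed from a 2x2 table of tag-SNP results against the reference HLA typing. The present description does not recite the formulas, so the following Python sketch uses the standard definitions as an assumption; the function name diagnostic_metrics and the counts in the usage note are hypothetical and are not data from the present study.

```python
def _percent(numerator, denominator):
    """Return a percentage, or None when the denominator is zero."""
    if denominator == 0:
        return None
    return 100.0 * numerator / denominator


def diagnostic_metrics(tp, fp, fn, tn):
    """Compute sensitivity, specificity, PPV and NPV (all in %).

    tp: tag SNP positive, target present (true positives)
    fp: tag SNP positive, target absent  (false positives)
    fn: tag SNP negative, target present (false negatives)
    tn: tag SNP negative, target absent  (true negatives)
    """
    return {
        "sensitivity": _percent(tp, tp + fn),
        "specificity": _percent(tn, tn + fp),
        "ppv": _percent(tp, tp + fp),
        "npv": _percent(tn, tn + fn),
    }
```

For instance, with hypothetical counts tp=90, fp=5, fn=10 and tn=95, the sensitivity is 90% and the specificity is 95%.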
[055] As used herein, the term “subject” refers to a human individual that is studied in a research or experimental setting to gather clinical or statistical data, test hypotheses, or draw conclusions.
[056] As used herein, the term “CD-positive subjects” refers to individuals who have been diagnosed with CD based on serological tests, genetic markers, or biopsy results. These individuals have an immune-mediated reaction to gluten, leading to small intestine damage.
[057] As used herein, the term “CD-negative subjects” refers to healthy individuals of the same ethnicity as CD-positive subjects, meaning they do not show any manifestation or diagnostic markers indicating the presence of the disease. These individuals do not have an immune-mediated reaction to gluten that leads to damage in the small intestine.
[058] As used herein, the term “Genotype” refers to the genetic makeup of an organism, specifically the combination of alleles present at a particular gene or locus in its DNA.
[059] The present invention provides for a SNP marker for identifying CD specific HLA-DQ haplotypes and alleles, wherein the SNP marker comprises allelic variation at a SNP marker locus and wherein the SNP marker is selected from the group consisting of: SNP rs1129740 at position chr6:32641328 in SEQ ID No: 1 (S1) , SNP rs9273012 at position chr6:32643864 in SEQ ID No.: 2 (S2) , and SNP rs7744001 at position chr6:32658309 in SEQ ID No.: 3 (S3).
[060] According to an embodiment of the present invention, the SNP marker as disclosed herein is identified in north Indian population.
[061] According to an embodiment of the present invention, the SNP marker disclosed herein is capable of identifying CD specific allele at the SNP marker loci and wherein the CD specific allele is selected from the group consisting of HLA-DQB1*02, DQB1*03:02, DQA1*05:01, DQA1*02:01 and DQA1*03:01.
[062] According to another embodiment of the present invention, the SNP marker disclosed herein is capable of identifying CD specific haplotype and wherein the CD specific haplotype is selected from the group consisting of DQ2.5, DQ2.2, and DQ8.
[063] According to further embodiment of the present invention, the sensitivity and specificity of the index allele G of SNP marker rs1129740 in negatively predicting any allele (i.e., HLA-DQB1*02, DQB1*03:02, DQA1*05:01, DQA1*02:01 and DQA1*03:01) is 87.63% and 97.95% respectively.
[064] According to yet further embodiment of the present invention, the positive predicting value and negative predicting value of the index allele G of SNP marker rs1129740 in negatively predicting any allele (i.e., HLA-DQB1*02, DQB1*03:02, DQA1*05:01, DQA1*02:01 and DQA1*03:01) is 85.86% and 98.24% respectively.
[065] According to another embodiment of the present invention, the sensitivity and specificity of the index allele G of SNP marker rs9273012 in positively predicting DQA1*05:01, is 95% and 90.95% respectively.
[066] According to yet another embodiment of the present invention, the positive predicting value and negative predicting value of the index allele G of SNP marker rs9273012 in positively predicting DQA1*05:01 is 96.84% and 86.19% respectively.
[067] According to another embodiment of the present invention, the sensitivity and specificity of the index allele A of SNP marker rs7744001 in negatively predicting HLA-DQ8 is 100% and 99.87% respectively.
[068] According to yet another embodiment of the present invention, the positive predicting value and negative predicting value of the index allele A of SNP marker rs7744001 in negatively predicting HLA-DQ8 is 99.82% and 100% respectively.
[069] According to an embodiment of the present invention is provided a method for identifying CD specific haplotypes and alleles, the method comprising the steps of: detecting the presence or absence of SNP marker, wherein the SNP marker as disclosed herein is capable of identifying CD specific subset of HLA-DQ haplotype and allele.
[070] In another embodiment of the present invention is provided the method as disclosed herein, wherein the method further comprises the steps of:
a. analyzing DNA sample for detecting the presence or absence of one or more SNP markers as disclosed herein.
[071] According to yet another embodiment of the present invention, the DNA sample is DNA obtained from subject in North Indian population and wherein the subject is CD positive or CD negative.
[072] According to a further embodiment of the present invention, analyzing the DNA sample for detecting the presence of one or more SNP marker disclosed herein comprises assessing the SNP marker to identify CD specific HLA-DQ haplotypes and alleles.
[073] According to another embodiment of the present invention, the SNP marker is capable of identifying CD specific haplotype and allele at the SNP marker loci and wherein the CD specific allele is selected from the group consisting of HLA-DQB1*02, DQB1*03:02, DQA1*05:01, DQA1*02:01 and DQA1*03:01; and wherein the CD specific haplotype is selected from the group consisting of DQ2.5, DQ2.2, and DQ8.
[074] According to yet further embodiment of the present invention, assessing SNP marker comprises assessing the index allele in SNP marker; and wherein assessing the index allele in SNP marker comprises the steps of:
(a) assessing the presence of index allele G in SNP rs1129740 at position chr6:32641328; wherein the presence of index allele G in SNP rs1129740 at position chr6:32641328 indicates the absence of allele HLA-DQB1*02, DQB1*03:02, DQA1*05:01, DQA1*02:01 and DQA1*03:01; or
(b) assessing the presence of index allele G in SNP rs9273012 at position chr6:32643864; wherein the presence of index allele G in SNP rs9273012 at position chr6:32643864 indicates the presence of allele HLA-DQA1*05:01; or
(c) assessing the presence of index allele A in SNP rs7744001 at position chr6:32658309; wherein the presence of index allele A in SNP rs7744001 at position chr6:32658309 indicates the absence of haplotype HLA-DQ8; or
(d) a combination thereof.
[075] According to an embodiment of the present invention, the index allele G of SNP rs1129740 at position chr6:32641328 is a negative predicting allele for HLA-DQ alleles HLA-DQB1*02, DQB1*03:02, DQA1*05:01, DQA1*02:01 and DQA1*03:01 with a negative predictive value of 98.24%. Further, the index allele G of SNP rs9273012 at position chr6:32643864 is a positive predicting allele for HLA-DQA1*05:01 with a positive predictive value of 96.84%, and the index allele A of SNP rs7744001 at position chr6:32658309 is a negative predicting allele for HLA-DQ8 with a negative predictive value of 100%.
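The index allele rules recited above can be sketched as a simple lookup, shown here in Python for illustration. The names INDEX_ALLELE_RULES and interpret_index_alleles are hypothetical. The sketch assumes that one or more copies of the index allele count as "presence"; the description does not state how heterozygous genotypes are to be treated, so this is an assumption of the example rather than part of the disclosed method.

```python
# SNP -> (index allele, interpretation when the index allele is present),
# following steps (a) to (c) above.
INDEX_ALLELE_RULES = {
    "rs1129740": (
        "G",
        "absence of HLA-DQB1*02, DQB1*03:02, DQA1*05:01, "
        "DQA1*02:01 and DQA1*03:01",
    ),
    "rs9273012": ("G", "presence of HLA-DQA1*05:01"),
    "rs7744001": ("A", "absence of HLA-DQ8"),
}


def interpret_index_alleles(genotypes):
    """Map each genotyped SNP carrying its index allele to its interpretation.

    `genotypes` is a dict such as {"rs9273012": "AG"}.
    Assumption: any copy of the index allele counts as "present".
    """
    calls = {}
    for snp, (index_allele, meaning) in INDEX_ALLELE_RULES.items():
        genotype = genotypes.get(snp)
        if genotype is not None and index_allele in genotype:
            calls[snp] = meaning
    return calls
```

SNPs that were not genotyped, or that do not carry the index allele, are simply omitted from the returned dictionary.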
[076] According to a further embodiment of the present invention, identifying the CD specific HLA-DQ haplotype further comprises assessing the SNP genotypes; wherein assessing the SNP genotype comprises identifying genotype combinations of SNP rs1129740, rs9273012 and rs7744001 wherein the presence of said identified genotype combination of SNP identifies the CD specific haplotype.
[077] According to another embodiment of the present invention, identifying the genotype combinations of SNP to identify the CD specific haplotype comprises:
I) identifying genotype combination I of SNP rs1129740, rs9273012 and rs7744001; wherein the presence of genotype combination I identifies haplotype HLA-DQ2.5; and/or
II) identifying genotype combination II of SNP rs1129740, rs9273012 and rs7744001; wherein the presence of genotype combination II identifies haplotype HLA-DQ2.2; and/or
III) identifying genotype combination III of SNP rs1129740, rs9273012 and rs7744001; wherein the presence of genotype combination III identifies absence of haplotype HLA-DQ8;
wherein:
a. the genotype combination I includes:
| | SNP rs1129740 | SNP rs9273012 | SNP rs7744001 |
|---|---|---|---|
| Allele | AA | GG | - |
| | AA | AG | AA |
| | AA | AG | AG |
| | AA | AG | GG |
| | AG | AG | - |
b. the genotype combination II includes:
| | SNP rs1129740 | SNP rs9273012 | SNP rs7744001 |
|---|---|---|---|
| Allele | AA | AA | AA |
| | AA | AG | AA |
c. the genotype combination III includes:
| | SNP rs1129740 | SNP rs9273012 | SNP rs7744001 |
|---|---|---|---|
| Allele | - | - | AA |
wherein “-” represents “AA or AG or GG”.
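As a non-limiting sketch, the genotype combinations I to III can be expressed as a pattern lookup in which “-” acts as a wildcard. The names `COMBINATIONS`, `normalize`, `matches` and `haplotype_calls` are hypothetical and are introduced here only for illustration. Treating "GA" and "AG" as the same unordered genotype is an assumption.

```python
# Illustrative sketch only; patterns are copied from genotype combinations I-III.
WILDCARD = "-"  # stands for AA, AG or GG, as defined in the text

# Each pattern is (rs1129740, rs9273012, rs7744001).
COMBINATIONS = {
    "HLA-DQ2.5": [
        ("AA", "GG", "-"),
        ("AA", "AG", "AA"),
        ("AA", "AG", "AG"),
        ("AA", "AG", "GG"),
        ("AG", "AG", "-"),
    ],
    "HLA-DQ2.2": [
        ("AA", "AA", "AA"),
        ("AA", "AG", "AA"),
    ],
    "absence of HLA-DQ8": [
        ("-", "-", "AA"),
    ],
}


def normalize(genotype):
    """Upper-case and sort the two alleles so "GA" and "AG" compare equal."""
    return "".join(sorted(genotype.upper()))


def matches(pattern, observed):
    """True if every non-wildcard position of the pattern equals the observed genotype."""
    return all(p == WILDCARD or p == normalize(o) for p, o in zip(pattern, observed))


def haplotype_calls(rs1129740, rs9273012, rs7744001):
    """Return every label whose genotype combination matches the observed triple."""
    observed = (rs1129740, rs9273012, rs7744001)
    return [
        label
        for label, patterns in COMBINATIONS.items()
        if any(matches(p, observed) for p in patterns)
    ]


if __name__ == "__main__":
    print(haplotype_calls("AG", "AG", "GG"))
```

Because the triple AA / AG / AA is listed under both combination I and combination II, the sketch returns every matching label rather than a single call.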
Example
[078] According to an embodiment of the present invention, the CD specific haplotypes and alleles were identified using SNP markers. Briefly, DNA samples from 459 CD positive individuals and 450 CD negative individuals were genotyped for approximately 200,000 variations, including the Major Histocompatibility Complex (MHC) locus of chromosome 6, on the Illumina Immunochip platform. Afterwards, these DNA samples were subjected to HLA typing using allele specific primers through the SSP-PCR method. HLA typing identified the HLA-DQB1*02, DQB1*03:02, DQA1*05:01, DQA1*02:01 and DQA1*03:01 alleles and the HLA-DQ2.2, HLA-DQ2.5 and HLA-DQ8 haplotypes in these samples. Further, all the variations obtained from the Immunochip assay from the MHC locus (chr6:28510120-33480577) harboring HLA genes were considered for LD (r² > 0.9) pruning using the PLINK tool. Thereafter, LD independent variations from only HLA-DQA1 and HLA-DQB1 were considered for subjectively prioritizing tag-SNP identification. The efficiency of the tag-SNPs in identifying HLA-DQA1 and HLA-DQB1 alleles and HLA-DQ haplotypes was assessed using sensitivity, specificity, positive predictive value and negative predictive value. The results obtained are summarized in tabulated form below:
Table 1: Three SNP markers and their efficiencies in predicting the celiac disease specific HLA-DQ alleles.
| DQ2/8 | Tag-SNP | Index allele | Location | Sensitivity (%) | Specificity (%) | Positive Predictive Value (PPV) (%) | Negative Predictive Value (NPV) (%) |
|---|---|---|---|---|---|---|---|
| Any allele # | rs1129740 | G (negative predicting allele) | chr6:32641328; HLA-DQA1 coding exon | 87.63 | 97.95 | 85.86 | 98.24 |
| HLA-DQA1*05:01 | rs9273012 | G (positive predicting allele) | chr6:32643864; HLA-DQA1 intron variant | 95.00 | 90.95 | 96.84 | 86.19 |
| HLA-DQ8 | rs7744001 | A (negative predicting allele) | chr6:32658309; HLA-DQB1 1.2kb downstream / HLA-DQB1-AS1 2kb upstream | 100 | 99.87 | 99.82 | 100 |
# DQA1*05:01, DQA1*02:01, DQA1*03:01, DQB1*02, and DQB1*03:02
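Table 2: Serial combination of SNP marker genotypes and their efficiencies in predicting the celiac disease specific HLA-DQ haplotypes.
| Haplotype | rs1129740 | rs9273012 | rs7744001 | Sensitivity (%) | Specificity (%) | Positive Predictive Value (%) | Negative Predictive Value (%) |
|---|---|---|---|---|---|---|---|
| HLA-DQ2.5 (positive) | AA | GG | - | 88.52 | 83.02 | 91.00 | 78.85 |
| | AA | AG | AA | | | | |
| | AA | AG | AG | | | | |
| | AA | AG | GG | | | | |
| | AG | AG | - | | | | |
| HLA-DQ2.2 (positive) | AA | AA | AA | 41.06 | 93.63 | 76.60 | 75.78 |
| | AA | AG | AA | | | | |
| HLA-DQ8 (negative) | - | - | AA | 100 | 99.87 | 99.82 | 100 |
| | - | - | - | | | | |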
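For reference, sensitivity, specificity, positive predictive value and negative predictive value are conventionally computed from a 2x2 confusion matrix, as in the minimal sketch below. The function name `diagnostic_metrics` and the counts in the usage example are hypothetical; the specification reports only the resulting percentages, not the underlying counts, and does not state which SNP result was treated as the "positive" test outcome for the negative predicting alleles.

```python
# Illustrative sketch only; standard confusion-matrix definitions, hypothetical names.
def _percent(numerator, denominator):
    """Return numerator/denominator as a percentage, or NaN if undefined."""
    if denominator == 0:
        return float("nan")
    return 100.0 * numerator / denominator


def diagnostic_metrics(tp, fp, fn, tn):
    """Compute the four efficiency measures from confusion-matrix counts.

    tp: marker positive, HLA typing positive
    fp: marker positive, HLA typing negative
    fn: marker negative, HLA typing positive
    tn: marker negative, HLA typing negative
    """
    return {
        "sensitivity": _percent(tp, tp + fn),
        "specificity": _percent(tn, tn + fp),
        "ppv": _percent(tp, tp + fp),
        "npv": _percent(tn, tn + fn),
    }


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the formulas.
    for name, value in diagnostic_metrics(tp=80, fp=5, fn=20, tn=95).items():
        print(f"{name}: {value:.2f}%")
```

Here the SSP-PCR HLA typing described in the Example plays the role of the reference result against which each tag-SNP or genotype combination is scored.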
[079] The results indicated that the index allele G of SNP rs1129740 at position chr6:32641328 was a negative predicting allele for HLA-DQ alleles HLA-DQB1*02, DQB1*03:02, DQA1*05:01, DQA1*02:01 and DQA1*03:01 with a negative predictive value of 98.24%. Further, the index allele G of SNP rs9273012 at position chr6:32643864 was a positive predicting allele for HLA-DQA1*05:01 with a positive predictive value of 96.84%, and the index allele A of SNP rs7744001 at position chr6:32658309 was a negative predicting allele for HLA-DQ8 with a negative predictive value of 100%.
[080] It will be apparent to those skilled in the art that various modifications and variations can be made in the products and methods of the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention cover the modifications and variations of this invention provided they come within the scope of the appended claims and their equivalents. Additionally, the foregoing examples are appended for the purpose of illustrating the claimed invention, and should not be construed so as to limit the scope of the claimed invention.
Claims:We claim:
1. A SNP marker for identifying Celiac Disease (CD) specific HLA-DQ haplotype and allele, wherein the SNP marker comprises allelic variation at a SNP marker locus, and
wherein the SNP marker is selected from the group consisting of:
SNP rs1129740 at position chr6:32641328 in SEQ ID No.: 1 (S1);
SNP rs9273012 at position chr6:32643864 in SEQ ID No.: 2 (S2); and
SNP rs7744001 at position chr6:32658309 in SEQ ID No.: 3 (S3).
2. The SNP marker as claimed in claim 1, wherein the SNP marker is capable of identifying CD specific allele and wherein the CD specific allele is selected from the group consisting of HLA-DQB1*02, DQB1*03:02, DQA1*05:01, DQA1*02:01 and DQA1*03:01.
3. The SNP marker as claimed in claim 1, wherein the SNP marker is further capable of identifying CD specific HLA haplotype and wherein the CD specific HLA haplotype is selected from the group consisting of DQ2.5, DQ2.2, and DQ8.
4. A method for identifying CD specific haplotype and allele, wherein the method comprises the step of detecting the presence or absence of SNP marker in a subset of HLA-DQ haplotype and allele, wherein the SNP marker is SNP rs1129740 at position chr6:32641328 in SEQ ID No.: 1 (S1), SNP rs9273012 at position chr6:32643864 in SEQ ID No.: 2 (S2), and SNP rs7744001 at position chr6:32658309 in SEQ ID No.: 3 (S3).
5. The method as claimed in claim 4, wherein the subset of HLA-DQ haplotype and allele comprises CD specific allele selected from the group consisting of HLA-DQB1*02, DQB1*03:02, DQA1*05:01, DQA1*02:01 and DQA1*03:01; and CD specific HLA-DQ haplotype selected from the group consisting of DQ2.5, DQ2.2, and DQ8.
6. The method as claimed in claim 4, wherein detecting the presence or absence of SNP marker comprises the step of:
a. analyzing a DNA sample for the presence or absence of one or more SNP markers;
wherein the DNA sample is DNA obtained from an individual in the North Indian population and wherein the individual is CD positive or CD negative.
7. The method as claimed in claim 4, wherein analyzing the DNA sample for presence or absence of one or more SNP marker comprises assessing the SNP marker to identify CD specific HLA-DQ haplotypes and alleles.
8. The method as claimed in claim 4, wherein assessing the SNP marker to identify CD specific HLA-DQ haplotypes and alleles comprises assessing the index allele in SNP marker; and wherein assessing the index allele in SNP marker comprises the steps of:
(a) assessing the presence of index allele G in SNP rs1129740 at position chr6:32641328; wherein the presence of index allele G in SNP rs1129740 at position chr6:32641328 indicates the absence of allele HLA-DQB1*02, DQB1*03:02, DQA1*05:01, DQA1*02:01 and DQA1*03:01; or
(b) assessing the presence of index allele G in SNP rs9273012 at position chr6:32643864; wherein the presence of index allele G in SNP rs9273012 at position chr6:32643864 indicates the presence of allele HLA-DQA1*05:01; or
(c) assessing the presence of index allele A in SNP rs7744001 at position chr6:32658309; wherein the presence of index allele A in SNP rs7744001 at position chr6:32658309 indicates the absence of haplotype HLA-DQ8; or
(d) a combination thereof.
9. The method as claimed in claim 4, wherein identifying the CD specific HLA-DQ haplotype further comprises assessing the SNP marker genotypes; wherein assessing the SNP marker genotype comprises identifying genotype combinations of SNP rs1129740, rs9273012 and rs7744001; wherein the presence of identified genotype combination of SNP rs1129740, rs9273012 and rs7744001 identifies the CD specific haplotype HLA-DQ2.5, HLA-DQ2.2 and HLA-DQ8.
10. The method as claimed in claim 4, wherein the genotype combination of SNP marker to identify the CD specific haplotype comprises:
i. identifying genotype combination I of SNP rs1129740, rs9273012 and rs7744001; wherein the presence of genotype combination I identifies haplotype HLA-DQ2.5; and/or
ii. identifying genotype combination II of SNP rs1129740, rs9273012 and rs7744001; wherein the presence of genotype combination II identifies haplotype HLA-DQ2.2; and/or
iii. identifying genotype combination III of SNP rs1129740, rs9273012 and rs7744001; wherein the presence of genotype combination III identifies absence of haplotype HLA-DQ8;
wherein:
a) the genotype combination I includes a combination Ia, Ib, Ic, Id or Ie;
| Allele combination | SNP rs1129740 | SNP rs9273012 | SNP rs7744001 |
|---|---|---|---|
| Ia | AA | GG | - |
| Ib | AA | AG | AA |
| Ic | AA | AG | AG |
| Id | AA | AG | GG |
| Ie | AG | AG | - |
b) the genotype combination II includes a combination IIa or IIb; and
| Allele combination | SNP rs1129740 | SNP rs9273012 | SNP rs7744001 |
|---|---|---|---|
| IIa | AA | AA | AA |
| IIb | AA | AG | AA |
c) the genotype combination III includes a combination IIIa;
| Allele combination | SNP rs1129740 | SNP rs9273012 | SNP rs7744001 |
|---|---|---|---|
| IIIa | - | - | AA |
wherein “-” represents AA, AG or GG.
| # | Name | Date |
|---|---|---|
| 1 | 202511034984-STATEMENT OF UNDERTAKING (FORM 3) [09-04-2025(online)].pdf | 2025-04-09 |
| 2 | 202511034984-Sequence Listing in txt [09-04-2025(online)].txt | 2025-04-09 |
| 3 | 202511034984-Sequence Listing in PDF [09-04-2025(online)].pdf | 2025-04-09 |
| 4 | 202511034984-FORM 1 [09-04-2025(online)].pdf | 2025-04-09 |
| 5 | 202511034984-FIGURE OF ABSTRACT [09-04-2025(online)].pdf | 2025-04-09 |
| 6 | 202511034984-DRAWINGS [09-04-2025(online)].pdf | 2025-04-09 |
| 7 | 202511034984-DECLARATION OF INVENTORSHIP (FORM 5) [09-04-2025(online)].pdf | 2025-04-09 |
| 8 | 202511034984-COMPLETE SPECIFICATION [09-04-2025(online)].pdf | 2025-04-09 |
| 9 | 202511034984-POA [06-05-2025(online)].pdf | 2025-05-06 |
| 10 | 202511034984-FORM 13 [06-05-2025(online)].pdf | 2025-05-06 |
| 11 | 202511034984-AMMENDED DOCUMENTS [06-05-2025(online)].pdf | 2025-05-06 |
| 12 | 202511034984-FORM-9 [17-05-2025(online)].pdf | 2025-05-17 |
| 13 | 202511034984-FORM-26 [17-05-2025(online)].pdf | 2025-05-17 |
| 14 | 202511034984-FORM 18A [02-06-2025(online)].pdf | 2025-06-02 |
| 15 | 202511034984-EVIDENCE OF ELIGIBILTY RULE 24C1f [02-06-2025(online)].pdf | 2025-06-02 |