Protein Sequencing Using Tandem Mass Spectrometry

Abstract: The present invention discloses a mass spectrometry method for identifying the sequence of a protein or antibody at the amino acid level. The disclosed method employs multiple proteases to obtain 100% coverage of the amino acid sequence.

Patent Information

Application #

Filing Date

12 March 2019

Publication Number

24/2021

Publication Type

INA

Invention Field

BIO-CHEMISTRY

Status

srinivasvr@drreddys.com

Parent Application

Applicants

Dr. Reddy’s Laboratories Limited

8-2-337 Road No. 3, Banjara Hills, Hyderabad Telangana, India - 500034.

Inventors

1. Rakesh Komarla Sathyanarayana Setty

500/96, East End Main Road, Jayanagar, 9th Block, East, Bangalore, Karnataka, India - 500069.

2. Murali Jayaraman

Third Street, Nandivaram, Guduvancheri Post Kancheepuram (Dist) Tamilnadu India 603202

3. Avinash Bharati

Railway Bungalow No. L-62-B Railway colony Aburoad Sirohi (District) Rajasthan India-307026.

Specification

INTRODUCTION
The present invention relates to amino acid sequencing of a protein, particularly an antibody molecule, with high accuracy, viz. accuracy at the level of 100%.
BACKGROUND OF INVENTION
With the advent of biotherapeutic drugs, specifically monoclonal antibody drug products, treatment of many chronic diseases, including cancer and autoimmune diseases, has become feasible. Development of biotherapeutic products has therefore gained high significance in the therapeutic medical sciences and in the commercial market. In particular, development of low-cost biosimilar products (therapeutic products that are similar in safety, efficacy and quality to an already approved reference biotherapeutic product) provides affordability and improved access for a large patient population. For any biosimilar product development, a verified product/protein sequence is the first and most critical requirement.
The complete protein sequences of reference biotherapeutics are generally assembled from public-domain sources such as literature, patents and regulatory documents. However, such sequences may differ from one another, and the amino acid at every position of the sequence in a reference therapeutic product needs to be verified to ensure the accuracy of the sequence. In that case, the reference therapeutic product itself serves as the decisive source for the actual protein sequence.
Mass spectrometry (MS) has been widely used for the identification of protein sequences. Typically, the technique employs a protease to digest the protein or polypeptide into fragments, followed by ionization of the fragments and assignment of amino acids based on the mass-to-charge ratio (m/z) of the generated fragment ions. However, a single protease digest does not provide sequence coverage of all the amino acids in a protein/antibody, and the missing ones (when compared with the reference protein/antibody) are generally assigned from a sequence library or various other sources. The sequence obtained is therefore never 100% accurate, although it is corroborated by data from various other sources. Even experimental replicates of a single protease digestion cannot identify the missing or concealed amino acids, and hence the technique does not render 100% sequence coverage or identification.
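As an illustration of how amino acids are assigned from fragment ion masses, the following minimal Python sketch computes the m/z of a peptide and of its singly charged b- and y-type fragment ions. It is not part of the disclosed method. The function names are hypothetical, the masses are standard monoisotopic values, and cysteine is assumed to carry the carbamidomethyl group that iodoacetamide alkylation adds. The difference between consecutive ions of one series equals the mass of one residue, which is how a sequence is read from an MS/MS spectrum.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276
CARBAMIDOMETHYL = 57.021464  # mass added to Cys by iodoacetamide alkylation


def residue_mass(aa, alkylated_cys=True):
    """Mass of one residue; Cys is treated as alkylated by default (assumption)."""
    mass = RESIDUE_MASS[aa]
    if aa == "C" and alkylated_cys:
        mass += CARBAMIDOMETHYL
    return mass


def precursor_mz(peptide, charge=1, alkylated_cys=True):
    """m/z of the intact peptide ion carrying `charge` protons."""
    neutral = sum(residue_mass(aa, alkylated_cys) for aa in peptide) + WATER
    return (neutral + charge * PROTON) / charge


def fragment_ions(peptide, alkylated_cys=True):
    """Singly charged b-ions (b1..b(n-1)) and y-ions (y1..y(n-1))."""
    masses = [residue_mass(aa, alkylated_cys) for aa in peptide]
    n = len(masses)
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y_ions = [sum(masses[i:]) + WATER + PROTON for i in range(n - 1, 0, -1)]
    return b_ions, y_ions


if __name__ == "__main__":
    b, y = fragment_ions("GAS")  # toy peptide, for illustration only
    print("b ions:", [round(m, 4) for m in b])
    print("y ions:", [round(m, 4) for m in y])
    print("b2 - b1 (mass of Ala):", round(b[1] - b[0], 5))
```

A residue whose flanking fragment ions are absent from the spectrum cannot be assigned this way, which is the gap that a single-protease digest can leave.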
The objective of the present invention is to address this gap and difficulty in sequencing a protein/antibody molecule using the MS technique.
SUMMARY OF THE INVENTION
The present invention discloses a method of identifying the sequence of a protein or antibody at the amino acid level. The method involves digestion of the protein/antibody molecule with multiple proteases and employs optimized conditions for the identification and analysis of the protein or antibody sequence, such that every amino acid in the protein or antibody is identified (i.e., 100% sequence coverage). The method yields experimental evidence in which every amino acid in a fragment of the protein/antibody is represented at the tandem MS (i.e., MS/MS) level. Thus, the method of the instant invention does not assign any amino acid to fill in missing peptide fragment masses, and the protein, biosimilar or biotherapeutic product is sequenced with full coverage of amino acids (100% accuracy in determining the sequence).
The disclosed method is sensitive enough to determine the amino acid sequence of a peptide at a concentration of 0.08 ng/µl at the MS/MS level.
DETAILED DESCRIPTION OF THE INVENTION
The present invention discloses a mass spectrometry method to sequence a protein at the amino acid level, wherein the method employs multiple proteases to cover 100% of the sequence of the protein.
In an embodiment, the invention discloses a method of identifying a protein sequence using the MS technique, comprising the steps of:
a) denaturation of the protein
b) reduction and alkylation of the protein sample
c) proteolytic digestion of the protein with multiple (at least 3) proteases, wherein aliquots of the protein sample are incubated separately, each with a separate protease, for 4 h to generate fragments of the protein
d) subjecting the fragments to liquid chromatography followed by
e) ionization of the fragments in MS/MS
f) analysis of the fragments using the High Definition Data Dependent Acquisition (HD-DDA) method
g) identification of the amino acids of the protein by the HD-DDA method.
In the above mentioned embodiment of the invention, the protein is a glycoprotein.
In the above mentioned embodiment of the invention, denaturation of the protein is performed using urea or guanidinium hydrochloride.
In the above mentioned embodiment of the invention, the denatured protein is further reduced using dithiothreitol (DTT).
In the above mentioned embodiment of the invention, the reduced protein is alkylated using iodoacetamide.
In an embodiment, the protease or proteolytic enzyme recognizes a specific sequence of amino acids and cleaves at a site within, adjacent to, or at a distance from that sequence. The proteases used to digest the protein sample to generate protein fragments are trypsin, elastase, Asp-N, Glu-C, thermolysin, chymotrypsin and Lys-C. Each protease covers sequence regions missed by another, so the use of multiple proteases helps in achieving 100% amino acid sequence coverage.
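To illustrate why proteases with different specificities yield different, overlapping peptide sets, the following Python sketch performs a simple in-silico digestion. It is an illustration only and not part of the disclosed method. The cleavage rules are simplified textbook rules (an assumption): real digests show missed and non-specific cleavages, and elastase, thermolysin and chymotrypsin are omitted because their specificity is broader. The function name and the toy sequence are hypothetical.

```python
# Simplified cleavage rules: (residues recognised, side of the residue that is cut).
# "C" = cut after the residue, "N" = cut before the residue.
CLEAVAGE_RULES = {
    "trypsin": ("KR", "C"),  # after Lys/Arg, not before Pro (handled below)
    "Lys-C": ("K", "C"),     # after Lys
    "Glu-C": ("E", "C"),     # after Glu
    "Asp-N": ("D", "N"),     # before Asp
}


def digest(sequence, protease):
    """Split `sequence` into peptides using the simplified rule for `protease`."""
    residues, side = CLEAVAGE_RULES[protease]
    cut_points = []
    for i in range(1, len(sequence)):
        # i indexes the bond between sequence[i - 1] and sequence[i]
        if side == "C" and sequence[i - 1] in residues:
            if protease == "trypsin" and sequence[i] == "P":
                continue  # conventional rule: trypsin does not cut before Pro
            cut_points.append(i)
        elif side == "N" and sequence[i] in residues:
            cut_points.append(i)
    bounds = [0] + cut_points + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    toy = "AKRPGDEK"  # hypothetical sequence, not from the patent
    for name in CLEAVAGE_RULES:
        print(name, digest(toy, name))
```

Because each enzyme cuts at different bonds, a stretch that ends up in a peptide too short or too long for good MS/MS fragmentation with one enzyme can fall inside a well-behaved peptide produced by another.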
In the above mentioned embodiment of the invention, a single digestion buffer is used for multiple proteases. The single digestion condition for multiple proteases provides peptide arrays, which are in turn analyzed by the MS/MS technique.
In the above mentioned embodiment, the single digestion buffer used for multiple proteases comprises 1 M urea, 1 mM EDTA, 20 mM hydroxylammonium chloride and 0.1 M Tris, and the pH of the buffer is 7.5.
In the above mentioned embodiments of the invention, a liquid chromatography technique is used to separate the peptides of a protein after treatment with multiple proteases. Further, the liquid chromatography is reverse-phase chromatography.
In an embodiment, the invention discloses a method of identifying an antibody sequence using the MS technique, comprising the steps of:
a) denaturation of the antibody
b) reduction and alkylation of the antibody sample
c) preparation of multiple proteases using a single digestion buffer
d) proteolytic digestion of the antibody with multiple (at least 3) proteases, wherein aliquots of the antibody sample are incubated separately, each with a separate protease, for 4 h to generate fragments of the antibody
e) subjecting the fragments to liquid chromatography followed by
f) ionization of the fragments in MS/MS
g) analysis of the fragments using the High Definition Data Dependent Acquisition (HD-DDA) method
h) identification of the amino acids of the antibody by the HD-DDA method.
In the above mentioned embodiment, the antibody is a therapeutic antibody and is selected from the group consisting of anti-TNF-α antibody, anti-CTLA4 antibody, anti-PD1 antibody, anti-PDL1 antibody, anti-Her2 antibody, anti-IL6R antibody, anti-VEGFR antibody, anti-IL17A antibody, anti-α4β7 antibody, and anti-IgE antibody.
In an embodiment, the single digestion buffer used for multiple proteases comprises 1 M urea, 1 mM EDTA, 20 mM hydroxylammonium chloride and 0.1 M Tris, and the pH of the buffer is 7.5.
The disclosed method employs tandem mass spectrometry (MS/MS) analysis with a sensitive, specific and information-rich method called high definition data dependent acquisition (HD-DDA) to verify a protein sequence at the amino acid level.
In an embodiment, the inventive method as described above enables detection of post-translational modifications in a glycoprotein, including variants of the glycoprotein such as truncation, fragmentation, N- and C-terminal variants, and amino acid substitutions/mutations.
The method does not assign any amino acid to fill in missing peptide fragment masses and thus enables identification of the sequence of a biosimilar or biotherapeutic product with 100% accuracy.
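The idea of reporting coverage strictly from experimentally identified peptides, without filling gaps from a library, can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the analysis software used with the method: the function names are hypothetical, and the peptide lists stand in for peptides confirmed at the MS/MS level in each protease digest. Any residue not covered by at least one identified peptide is reported as a gap rather than assigned.

```python
def covered_positions(sequence, peptides):
    """Flag every residue of `sequence` that lies inside an identified peptide."""
    covered = [False] * len(sequence)
    for peptide in peptides:
        if not peptide:
            continue
        start = sequence.find(peptide)
        while start != -1:  # mark every occurrence of the peptide
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = sequence.find(peptide, start + 1)
    return covered


def coverage_report(sequence, peptides_by_protease):
    """Return per-protease coverage (%), combined coverage (%) and uncovered positions."""
    combined = [False] * len(sequence)
    per_protease = {}
    for protease, peptides in peptides_by_protease.items():
        covered = covered_positions(sequence, peptides)
        per_protease[protease] = 100.0 * sum(covered) / len(sequence)
        combined = [a or b for a, b in zip(combined, covered)]
    gaps = [i + 1 for i, flag in enumerate(combined) if not flag]  # 1-based
    return per_protease, 100.0 * sum(combined) / len(sequence), gaps


if __name__ == "__main__":
    toy_sequence = "AKRPGDEKLM"  # hypothetical sequence, not from the patent
    identified = {               # hypothetical MS/MS-confirmed peptides
        "trypsin": ["AK"],
        "Glu-C": ["RPGDE"],
        "Asp-N": ["DEKLM"],
    }
    per, total, gaps = coverage_report(toy_sequence, identified)
    print(per, total, gaps)
```

With only the trypsin list supplied, the report lists positions 3 to 10 as uncovered; adding the other two digests closes the gaps without any residue being inferred.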
A systematic understanding of the molecule and its sequence analysis was carried out at various structural levels such as intact protein, subunit protein and middle-up protein LC MS analyses to understand the molecule and verify the sequence at each of these levels.
DEFINITIONS
The term “antibody” as used herein encompasses whole antibodies and any antigen binding fragment (i.e., “antigen-binding portion”) or single chains or fusion protein thereof. An “antibody” refers to a glycoprotein comprising at least two heavy (H) chains and two light (L) chains inter-connected by disulfide bonds, or an antigen binding portion thereof.
The term “glycoprotein” refers to protein or polypeptide having at least one glycan moiety. Thus, any polypeptide attached to a saccharide moiety is termed as glycoprotein.
The terms “protein denaturation” or “immunoglobulin denaturation” or “denaturation” refer to alteration of the higher order structure (secondary, tertiary or quaternary structure) of proteins that may result in loss of activity. A variety of exemplary denaturing agents without any limitation may include urea, guanidine hydrochloride, etc.
The terms “protein reduction” or “antibody reduction” or “immunoglobulin reduction” or “reduction” refer to reduction of the disulphide bonds between cysteine residues in order to separate the individual polypeptide chains. A variety of exemplary reducing agents without any limitation may include dithiothreitol, tris [2-carboxyethyl] phosphine (TCEP) etc.
“Protein alkylation” or “antibody alkylation” or “immunoglobulin alkylation” or “alkylation” refers to modifying the free sulfhydryl group on cysteine residues (after the reduction step) to prevent the re-formation of the disulphide bonds. A variety of exemplary alkylating agents without any limitation may include 2-iodoacetamide, 2-iodoacetic acid, N-ethylmaleimide (NEM).
As used herein, the term “peptide” refers to a class of compounds of low molecular weight which yield two or more amino acids on hydrolysis and form the constituent parts of proteins.
“Reverse phase chromatography” is a chromatographic technique wherein a mobile phase solute (e.g. proteins, peptides etc.) binds to an immobilized n-alkyl hydrocarbon or aromatic ligand via hydrophobic interaction. The biomolecules are then generally eluted using gradient elution instead of isocratic elution. While biomolecules are strongly adsorbed to the surface of a reversed phase matrix under aqueous (low organic modifier) conditions, they desorb from the matrix within a very narrow window of increased organic modifier concentration. Since biomolecules vary in their hydrophobicity, a gradient of organic modifier is an efficient way to separate them.
Mass spectrometry is an analytical technique that is used to identify unknown compounds, quantify known materials, and elucidate the structural and physical properties of ions. Mass Spectrometry can be used in conjunction with chromatography techniques, such as LC-MS and GC-MS. Examples of mass spectrometry tools for use as detection agents include, but are not limited to, electron ionisation (EI), chemical ionisation (CI), fast atom bombardment (FAB)/liquid secondary ionisation (LSIMS), matrix assisted laser desorption ionisation (MALDI), and electrospray ionisation (ESI). See, for example, Gary Siuzdak, Mass Spectrometry for Biotechnology, Academic Press, San Diego, 1996.
High definition data dependent acquisition (HD-DDA) is a novel MS method which incorporates instrumental and application benefits for the identification of proteins and peptides, where ion mobility spectrometry is incorporated into a quadrupole time-of-flight mass spectrometer. HD-DDA uses a high duty cycle mode and enhanced decision making to provide a highly sensitive and selective/specific experiment. HD-DDA enhancements include full support for Wideband Enhancement, which affords a signal increase of five- to ten-fold, as well as enhanced decision-making logic when switching between MS and MS/MS modes. Wideband Enhancement utilizes ion mobility separation of product ions of a single charge state in combination with pusher synchronization to achieve nearly 100% duty cycle. Spectral quality for low abundance species/peptides is significantly increased by HD-DDA. The percentage of MS/MS spectra generating a positive match is dramatically increased using HD-DDA data.
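For readers unfamiliar with data dependent acquisition in general, the following Python sketch shows the basic decision it makes when switching from MS to MS/MS: choose the most intense precursors in a survey scan, above an intensity threshold, skipping those already fragmented. This is a generic top-N illustration and an assumption on our part; it is not the HD-DDA decision logic, which is instrument-specific and not detailed in this specification. The function name, parameters and scan values are hypothetical.

```python
def select_precursors(survey_scan, top_n=3, min_intensity=1000.0,
                      excluded=(), tolerance=0.01):
    """Pick up to `top_n` precursor m/z values for MS/MS from one survey scan.

    survey_scan: iterable of (mz, intensity) pairs from an MS scan.
    excluded:    m/z values already fragmented (dynamic exclusion list).
    tolerance:   m/z window used to match a peak to an excluded value.
    """
    candidates = [
        (mz, intensity)
        for mz, intensity in survey_scan
        if intensity >= min_intensity
        and not any(abs(mz - ex) <= tolerance for ex in excluded)
    ]
    # Most intense first; fragment only the strongest top_n candidates.
    candidates.sort(key=lambda peak: peak[1], reverse=True)
    return [mz for mz, _ in candidates[:top_n]]


if __name__ == "__main__":
    scan = [(400.2, 5000.0), (512.3, 800.0), (623.8, 12000.0), (745.4, 3000.0)]
    first = select_precursors(scan, top_n=2)
    second = select_precursors(scan, top_n=2, excluded=first)
    print(first, second)
```

In this toy scan the second call returns only the one remaining peak above the threshold, showing how exclusion moves the selection towards lower-abundance precursors in later cycles.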
Specific embodiments of the invention are more fully defined by reference to the following examples. These examples should not, however, be construed as limiting the scope of the invention.
Examples:
Example 1:
A human anti-TNF-α monoclonal antibody, adalimumab, available as Humira® (hereafter referred to as the RMP, reference medicinal product) from the innovator AbbVie, was procured for amino acid sequencing. The concentration of the RMP as obtained was 50 mg/ml; the RMP was mixed with denaturation buffer (8.2 M guanidinium HCl, 1 mM EDTA and 0.1 M Tris, pH 7.5) to bring the final protein concentration to 1 mg/ml. After mixing, the sample was kept at room temperature for a few minutes. The denatured sample was then reduced by addition of 5 mM DTT and incubated at 37 °C for 10 minutes to reduce the inter-chain disulfide bonds and produce heavy chain (HC) and light chain (LC) molecules. The reduced protein sample was alkylated by addition of 10 mM iodoacetamide and incubated at room temperature for 40 minutes. Sample cleanup was then performed using PD10 cartridges to remove salts, excipients, buffer components and denaturing agents. The cleaned-up sample was treated with 2 µl of PNGase F and incubated at 37 °C for 4 hours to remove N-glycans (to improve MS/MS fragmentation of N-linked glycopeptides). This sample was further aliquoted for separate proteolytic digestion reactions with trypsin, Asp-N, Glu-C or elastase as shown in Table 1. All proteolytic enzymes were prepared using a single digestion buffer (1 M urea, 1 mM EDTA, 20 mM hydroxylammonium chloride and 0.1 M Tris, pH 7.5).

Enzyme reaction	Enzyme:Protein	Digestion condition
Proteolytic digestion with Trypsin	1 µg : 50 µg	37 °C, 4 h
Proteolytic digestion with Glu-C	1 µg : 50 µg	37 °C, 4 h
Proteolytic digestion with Asp-N	1 µg : 50 µg	37 °C, 4 h
Proteolytic digestion with Elastase	1 µg : 50 µg	37 °C, ½ h and 4 h
Table 1: Enzymatic digestion conditions used for RMP sequencing
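The dilution from 50 mg/ml to 1 mg/ml and the 1 µg : 50 µg enzyme-to-protein ratio of Table 1 are simple proportional calculations. A minimal Python sketch follows; the function names and the example volume and aliquot size are hypothetical choices, not values given in this example.

```python
def stock_volume_ul(stock_mg_per_ml, final_mg_per_ml, final_volume_ul):
    """Volume of stock needed so that C_stock * V_stock = C_final * V_final."""
    return final_mg_per_ml * final_volume_ul / stock_mg_per_ml


def enzyme_mass_ug(protein_ug, enzyme_parts=1.0, protein_parts=50.0):
    """Enzyme mass for a given protein mass at an enzyme:protein w/w ratio."""
    return protein_ug * enzyme_parts / protein_parts


if __name__ == "__main__":
    # 50 mg/ml RMP diluted to 1 mg/ml in an assumed final volume of 1000 µl.
    rmp_ul = stock_volume_ul(50.0, 1.0, 1000.0)
    print(f"RMP: {rmp_ul} µl, denaturation buffer: {1000.0 - rmp_ul} µl")
    # Enzyme needed for an assumed 100 µg protein aliquot at 1:50 (w/w).
    print(f"Enzyme: {enzyme_mass_ug(100.0)} µg")
```

At the stated concentrations this corresponds to a 1:50 dilution of the RMP, and each 50 µg of protein in an aliquot calls for 1 µg of protease.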

After incubation with the multiple proteases, each protease reaction mixture was subjected to RP-UPLC using a 2.1 mm × 150 mm Peptide BEH C18 column with 1.7 µm particle size and 300 Å pore size (Waters ACQUITY UPLC H Class Bio). The operating parameters and the mobile phase gradient used during reverse phase chromatography are provided in Table 2 and Table 3 respectively. The eluate from RP-UPLC was then subjected to MS using a Synapt G2-Si HDMS instrument. The critical parameters for the mass spectrometer are given in Table 4. A new HD-DDA method was employed to obtain mass spectral data of the sensitivity and selectivity (specificity) required for reliable RMP amino acid sequencing. The MS/MS-based amino acid sequence coverage of the adalimumab light chain and heavy chain by the proteolytic enzymes trypsin, Glu-C, Asp-N and elastase is shown in Figure 1 (a) and (b). The expected sequences shown in Figure 1 (a) and (b) are from publicly available databases such as DrugBank, IMGT and the Pharmaceuticals and Medical Devices Agency (PMDA).
Parameter name	Value/ranges
Column temperature	60 °C
Injection volume	20 µL
Flow rate	0.3 mL/min
Detection wavelength	214 nm and 280 nm
Mobile phase A	Water
Mobile phase B	Acetonitrile
Mobile phase C	1.0 % TFA in water
Table 2: Operating parameters for reverse phase UPLC
Time (min)	Flow rate (mL/min)	% Solvent A	% Solvent B	% Solvent C
0.0	0.3	87.0	3.0	10.0
1.0	0.3	87.0	3.0	10.0
16.0	0.2	78.0	12.0	10.0
32.0	0.3	70.0	20.0	10.0
61.0	0.3	50.0	40.0	10.0
64.0	0.3	10.0	80.0	10.0
68.0	0.3	10.0	80.0	10.0
68.2	0.3	87.0	3.0	10.0
75.0	0.3	87.0	3.0	10.0
Table 3: Mobile phase gradient used for reverse phase chromatography
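The gradient in Table 3 can be checked and queried with a short Python sketch. It assumes linear ramps between the listed time points, which the specification does not state, and the function names are hypothetical. It also shows that holding solvent C (1.0 % TFA in water, Table 2) at 10 % keeps the TFA concentration in the mobile phase constant at 0.1 %.

```python
# (time in min, flow in mL/min, %A, %B, %C) as listed in Table 3.
GRADIENT = [
    (0.0, 0.3, 87.0, 3.0, 10.0),
    (1.0, 0.3, 87.0, 3.0, 10.0),
    (16.0, 0.2, 78.0, 12.0, 10.0),
    (32.0, 0.3, 70.0, 20.0, 10.0),
    (61.0, 0.3, 50.0, 40.0, 10.0),
    (64.0, 0.3, 10.0, 80.0, 10.0),
    (68.0, 0.3, 10.0, 80.0, 10.0),
    (68.2, 0.3, 87.0, 3.0, 10.0),
    (75.0, 0.3, 87.0, 3.0, 10.0),
]
TFA_PERCENT_IN_C = 1.0  # mobile phase C is 1.0 % TFA in water (Table 2)


def composition_at(t):
    """Return (%A, %B, %C) at time t, assuming linear ramps between rows."""
    if not GRADIENT[0][0] <= t <= GRADIENT[-1][0]:
        raise ValueError("time outside the gradient programme")
    for (t0, _, *c0), (t1, _, *c1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t <= t1:
            frac = (t - t0) / (t1 - t0)
            return tuple(a + frac * (b - a) for a, b in zip(c0, c1))


def tfa_percent_at(t):
    """Effective TFA concentration (%) in the mixed mobile phase at time t."""
    return composition_at(t)[2] * TFA_PERCENT_IN_C / 100.0


if __name__ == "__main__":
    # Every row should describe a complete mixture.
    assert all(abs(a + b + c - 100.0) < 1e-9 for _, _, a, b, c in GRADIENT)
    print(composition_at(8.5), tfa_percent_at(8.5))
```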

MS method parameters	Optimal value	Range
Capillary voltage (kV)	3.0	0.0-4.0
Collision energy (V): Low energy (Trap and/or Transfer)	2 to 6	0-200
Collision energy (V): High energy (Trap and/or Transfer)	15-60	0-200
Cone gas flow (L/h)	0.0	0-100
Data acquisition time	0-75	As per requirement
Desolvation gas flow (L/h)	500.0	100-1200
Desolvation temperature (°C)	300.0	100-600
Mass range	50-1995	50-5000
Nebuliser gas flow (Bar)	6.5	0-100
Resolution	20000.0	Minimum 15000
Sample cone voltage (V)	25	0-150
Source offset (V)	30.0	0-100
Source temperature (°C)	120.0	80-150
Table 4: MS method operating parameters

Figure 1 (a): MS/MS-based amino acid sequence coverage of the adalimumab light chain by the proteolytic enzymes trypsin, Glu-C, Asp-N and elastase.

Figure 1 (b): MS/MS-based amino acid sequence coverage of the adalimumab heavy chain by the proteolytic enzymes trypsin, Glu-C, Asp-N and elastase.

Figure 1 (a) and (b) show the overlapping amino acid sequence coverage of the adalimumab RMP sample obtained by digestion with the proteolytic enzymes trypsin, Glu-C, Asp-N and elastase. The major part of the adalimumab protein sequence was identified and confirmed at the MS/MS (amino acid) level using trypsin and the high definition DDA MS method. However, certain regions of the sequence were missed at the MS/MS level by trypsin, and the Asp-N, Glu-C and/or elastase digests were used to confirm these regions. Use of multiple proteases in the experiment helps in achieving 100% amino acid sequence coverage of a protein.
CLAIMS
We Claim:
1). A method of identifying a protein sequence using the Mass Spectrometry (MS) technique, comprising the steps of:
a) denaturation of the protein
b) reduction and alkylation of the protein sample
c) proteolytic digestion of the protein with multiple (at least 3) proteases, wherein aliquots of the protein sample are incubated separately, each with a separate protease, for 4 h to generate fragments of the protein
d) subjecting the fragments to liquid chromatography followed by
e) ionization of the fragments in MS/MS
f) analysis of the fragments using the High Definition Data Dependent Acquisition (HD-DDA) method
g) identification of the amino acids of the protein by the HD-DDA method
wherein the method does not assign any amino acid to fill in the missing peptide fragment masses in identifying the protein sequence.
2). The method as claimed in claim 1, wherein the protein is a glycoprotein which includes antibody.
3). The antibody as claimed in claim 2, is a therapeutic antibody and is selected from the group consisting of anti-TNF-α antibody, anti-CTLA4 antibody, anti-PD1 antibody, anti-PDL1 antibody, anti-Her2 antibody, anti-IL6R antibody, anti-VEGFR antibody, anti-IL17A antibody, anti-α4β7 antibody, and anti-IgE antibody.
4). The method as claimed in claim 1, wherein the denaturation of the protein is performed using urea or guanidinium hydrochloride.
5). The method as claimed in claim 1, wherein the denatured protein is further reduced using dithiothreitol (DTT).
6). The method as claimed in claim 1, wherein the alkylation step is done using iodoacetamide.
7). The method as claimed in claim 1, wherein the proteases used to digest the protein sample are trypsin, elastase, Asp-N, Glu-C, thermolysin, chymotrypsin and Lys-C.
8). The method as claimed in claim 1, wherein the proteolytic digestion of the protein is performed using a single digestion buffer and single digestion condition for the multiple proteases, employed in step c) of claim 1.
9). The single digestion buffer as claimed in claim 8, comprises 1 M urea, 1 mM EDTA, 20 mM hydroxylammonium chloride and 0.1 M Tris, with a pH value of 7.5.
10). The method as claimed in claim 1, wherein the liquid chromatography technique of step d) of claim 1, is a reverse phase chromatography.

Documents

Application Documents

#	Name	Date
1	201941009557-STATEMENT OF UNDERTAKING (FORM 3) [12-03-2019(online)].pdf	2019-03-12
2	201941009557-PROVISIONAL SPECIFICATION [12-03-2019(online)].pdf	2019-03-12
3	201941009557-FORM 1 [12-03-2019(online)].pdf	2019-03-12
4	Form3_After Filling_21-03-2019.pdf	2019-03-21
5	Form2 Title Page_Provisional_21-03-2019.pdf	2019-03-21
6	Form1_After Filling_21-03-2019.pdf	2019-03-21
7	Description Provisional_After Filling_21-03-2019.pdf	2019-03-21
8	Correspondence by Applicant_F1, F2 and F3_21-03-2019.pdf	2019-03-21
9	201941009557-ENDORSEMENT BY INVENTORS [11-03-2020(online)].pdf	2020-03-11
10	201941009557-CORRESPONDENCE-OTHERS [11-03-2020(online)].pdf	2020-03-11
11	201941009557-COMPLETE SPECIFICATION [11-03-2020(online)].pdf	2020-03-11
12	201941009557-FORM 18 [10-03-2023(online)].pdf	2023-03-10