Abstract: The present disclosure provides a method and system reliant on artificial intelligence and data analytics, wherein the said method and system are adapted for application in the field of protein chemistry and solubility. More particularly, the present disclosure provides a method and system for determining enzyme(s) that enhance the solubility of protein(s). The method of the present disclosure allows determination of enzyme(s) suitable for enhancing solubility of a protein while ensuring high accuracy and low turnaround time.
FORM 2
THE PATENTS ACT 1970
[39 OF 1970]
&
THE PATENTS RULES, 2003
COMPLETE SPECIFICATION
[See section 10 and Rule 13]
“METHOD AND SYSTEM FOR DETERMINING ENZYME(S) SUITABILITY FOR FACILITATING PROTEIN SOLUBILIZATION”
Name and Address of the Applicant: RELIANCE INDUSTRIES LIMITED
3rd Floor, Maker Chamber-IV, 222, Nariman Point, Mumbai - 400 021, Maharashtra, India
Nationality: India
The following specification particularly describes the invention and the manner in which it is to be performed.
TECHNICAL FIELD
[001] The present disclosure generally relates to the field of artificial intelligence and data analytics. The present disclosure provides a method and system reliant on artificial intelligence and data analytics, wherein the said method and system are adapted for application in the field of protein chemistry and solubility. More particularly, the present disclosure provides a method and system for determining enzyme(s) that enhance the solubility of protein(s).
BACKGROUND
[002] Proteins are macromolecules that are essential to living organisms and carry out or are associated with many functions within organisms, including, for example, catalysing metabolic reactions, facilitating DNA replication, responding to stimuli, providing structure to cells and tissue, and transporting molecules. Proteins are made of one or more chains of amino acids and typically form three-dimensional conformations.
[003] Protein solvation is a major issue in the use of protein-based products. This problem is not new; it dates back to the introduction of soy protein to the market. One of the solutions to this problem is to digest the protein beforehand using various chemical and enzymatic methods. Numerous experiments have been performed to find the best and most economically viable method for soy protein digestion to improve its solubility.
[004] However, determining a suitable enzyme manually is time-, cost-, and energy-intensive. The need of the hour is a method that improves over the traditional methods by reducing the total number of experiments required to achieve the desired protein solvation, while remaining scientifically explainable. While machine learning based methods for predicting protein structure and solubility have been previously reported, these methods have been restricted to prediction of a particular protein property, without providing any leads to enhance such properties, more so by reliance on external agents such as enzymes.
[005] The information disclosed in this background of the disclosure section is only for enhancement of understanding of the general background of the invention and should not be taken as an acknowledgement or any form of suggestion that this information forms the prior art already known to a person skilled in the art.
SUMMARY
[006] Addressing the above identified need in the art, the present disclosure provides a method
of determining enzyme(s) suitability for facilitating protein solubilization, the method
comprising:
receiving, by a solubility determination system, data on a protein;
determining, by the solubility determination system, a solubility score of the protein based on
feature(s) of the protein extracted from the data using a protein solvation identification model,
wherein the protein solvation identification model is trained by correlating training feature(s)
derived from a protein repository dataset with a corresponding solubility score;
determining, by the solubility determination system, a solubility score of each peptide of a
plurality of peptides generated by in-silico digestion of the protein by one or more enzyme(s)
using the protein solvation identification model; and
determining, by the solubility determination system, the suitability of the one or more
enzyme(s) for protein solubilization based on a comparison of the solubility score of the
plurality of peptides with the solubility score of the protein.
[007] In some embodiments, the data on the protein is selected from structural data of the protein and/or composition of the protein(s); wherein the structural data is selected from a group comprising primary structure, secondary structure, tertiary structure and quaternary structure of the protein or any combination thereof. In some embodiments, the feature(s) of the protein comprise one-dimensional, two-dimensional or three-dimensional properties of the complete protein or any part thereof, or any combination thereof.
[008] In some embodiments, the suitability of the one or more enzyme(s) for protein solubilization is determined when the solubility score of the plurality of peptide(s) is higher than the solubility score of the protein.
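In a non-limiting illustration, the comparison described above may be sketched as follows. The function name `enzyme_is_suitable` is hypothetical, and the use of an unweighted mean to aggregate the peptide scores is an assumption, since the disclosure does not fix the aggregation function.

```python
from statistics import mean


def enzyme_is_suitable(protein_score, peptide_scores):
    """Return True when the aggregated peptide score exceeds the protein score.

    Assumption: peptide scores are aggregated by an unweighted mean.
    """
    if not peptide_scores:
        return False  # no peptides were generated, so there is nothing to compare
    return mean(peptide_scores) > protein_score


# Illustrative values only; they are not measured solubility scores.
print(enzyme_is_suitable(0.40, [0.55, 0.62, 0.48]))
print(enzyme_is_suitable(0.40, [0.20, 0.35]))
```

Other aggregations, such as an availability-weighted mean, could be substituted without changing the comparison itself.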
[009] In some embodiments, the method of the present disclosure further comprises predicting probability of formation of each peptide of the plurality of peptides during the in-silico digestion, based on the data on the protein and the one or more enzyme(s) using a peptide prediction model; wherein the peptide prediction model is trained using information on digestion site and associated cleavage potency of each enzyme from an enzyme repository dataset.
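For illustration only, the relationship between cleavage potency and probability of peptide formation may be sketched with a simple rule-based stand-in for the trained peptide prediction model. The sketch assumes that each digestion site is cleaved independently with a probability equal to its cleavage potency; this independence assumption, the function name `peptide_probabilities` and the potency values are all hypothetical.

```python
from itertools import combinations


def peptide_probabilities(sequence, site_potency):
    """List (peptide, probability) pairs for one protein and one enzyme.

    site_potency maps a cut position i (a cut between sequence[i-1] and
    sequence[i]) to a cleavage potency in [0, 1].
    Assumption: sites are cleaved independently of one another.
    """
    # The two termini behave as boundaries that are always present.
    potency = {0: 1.0, len(sequence): 1.0, **site_potency}
    positions = sorted(potency)
    peptides = []
    for start, end in combinations(positions, 2):
        # Both boundaries must be cut ...
        probability = potency[start] * potency[end]
        # ... and every site in between must have been missed.
        for mid in positions:
            if start < mid < end:
                probability *= 1.0 - potency[mid]
        if probability > 0.0:
            peptides.append((sequence[start:end], probability))
    return peptides


# Hypothetical sequence with two digestion sites of differing potency.
for peptide, probability in peptide_probabilities("AKGRW", {2: 0.9, 4: 0.5}):
    print(peptide, round(probability, 3))
```

Under this assumption, the probabilities of all peptides that begin at the same position sum to 1, which offers a quick consistency check.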
[0010] In some embodiments, the method of the present disclosure further comprises determining total availability of each peptide of the plurality of peptide(s) based on a function of the composition of the protein(s) and the probability of formation of the said peptide(s).
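By way of a non-limiting sketch, one possible function of composition and probability of formation is a composition-weighted sum. The product form used here is an assumption, and the identifiers `total_availability`, `P1` and `P2` are hypothetical.

```python
def total_availability(composition, formation_probability):
    """Combine protein composition with peptide formation probabilities.

    composition: protein id -> fraction of that protein in the consortium.
    formation_probability: protein id -> {peptide: probability of formation}.
    Assumption: availability is the sum over proteins of fraction * probability.
    """
    availability = {}
    for protein_id, fraction in composition.items():
        for peptide, probability in formation_probability.get(protein_id, {}).items():
            availability[peptide] = availability.get(peptide, 0.0) + fraction * probability
    return availability


# Illustrative two-protein consortium with made-up values.
composition = {"P1": 0.7, "P2": 0.3}
formation_probability = {
    "P1": {"AK": 0.9, "GR": 0.45},
    "P2": {"AK": 0.5},
}
print(total_availability(composition, formation_probability))
```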
[0011] In some embodiments, the method of the present disclosure further comprises ranking the one or more enzyme(s) with respect to their suitability for facilitating protein solubilization based on the determined solubility score of the plurality of peptides.
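As a minimal sketch, the ranking may be obtained by sorting the enzymes on an aggregate of their peptide solubility scores. The mean is assumed as the aggregate, the function name `rank_enzymes` is hypothetical, and the scores shown are illustrative values rather than results of the disclosed system.

```python
from statistics import mean


def rank_enzymes(peptide_scores_by_enzyme):
    """Return (enzyme, aggregated score) pairs, best enzyme first.

    Assumption: the peptide scores of each enzyme are aggregated by their mean.
    Enzymes that produced no peptides are left out of the ranking.
    """
    aggregated = {
        enzyme: mean(scores)
        for enzyme, scores in peptide_scores_by_enzyme.items()
        if scores
    }
    return sorted(aggregated.items(), key=lambda item: item[1], reverse=True)


# Illustrative scores only.
ranking = rank_enzymes({
    "trypsin": [0.6, 0.7],
    "pepsin": [0.5, 0.4],
    "papain": [0.8, 0.9],
})
print([enzyme for enzyme, _ in ranking])
```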
[0012] In some embodiments, the method of the present disclosure further comprises predicting a group of enzyme(s) for facilitating solubilization of the protein, and optionally, predicting ratio of each enzyme in the group of enzymes, based on the solubility score of each peptide of the plurality of peptides and the total availability of each peptide of the plurality of peptide(s), using an enzyme grouping model.
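Purely as an illustration, a heuristic stand-in for the enzyme grouping model may be sketched as below. It is not the disclosed model: it assumes that each enzyme is credited with the availability-weighted gain in solubility score over the undigested protein, that only enzymes with a positive gain enter the group, and that ratios are proportional to the gains. The name `group_enzymes` and the enzyme labels are hypothetical.

```python
def group_enzymes(protein_score, peptides_by_enzyme):
    """Select a group of enzymes and a ratio for each member.

    peptides_by_enzyme: enzyme -> list of (solubility score, total availability).
    Assumption: ratio is proportional to the availability-weighted score gain.
    """
    gains = {}
    for enzyme, peptides in peptides_by_enzyme.items():
        total = sum(availability for _, availability in peptides)
        if total == 0:
            continue  # this enzyme yields no available peptides
        weighted_score = sum(score * availability for score, availability in peptides) / total
        gain = weighted_score - protein_score
        if gain > 0:
            gains[enzyme] = gain
    normaliser = sum(gains.values())
    return {enzyme: gain / normaliser for enzyme, gain in gains.items()}


# Illustrative values: enzyme_C lowers the score and is therefore excluded.
ratios = group_enzymes(0.4, {
    "enzyme_A": [(0.6, 0.5), (0.8, 0.5)],
    "enzyme_B": [(0.5, 1.0)],
    "enzyme_C": [(0.3, 1.0)],
})
print(ratios)
```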
[0013] In some embodiments, the method of the present disclosure further comprises determining an order of the one or more enzyme(s) for the protein solubilization using information on digestion site and associated cleavage potency of each enzyme from an enzyme repository dataset.
[0014] In some embodiments, the method of the present disclosure further comprises re-training of the protein solvation identification model based on the determined enzyme suitability and the ranking.
[0015] In some embodiments, the method of the present disclosure comprises:
receiving, by the solubility determination system, the data on a protein;
determining, by the solubility determination system, a solubility score of the protein based on
feature(s) of the protein extracted from the data using the protein solvation identification
model;
determining, by the solubility determination system, a solubility score of each peptide of a
plurality of peptides generated by in-silico digestion of the protein by one or more enzyme(s)
using the protein solvation identification model, wherein the one or more enzyme(s) are
selected based on the ranking with respect to their suitability for facilitating solubilization of
previously encountered protein(s) characterized by feature(s) same or similar to the extracted
feature(s) ; and
[0016] determining, by the solubility determination system, the suitability of the one or more enzyme(s) for protein solubilization based on a comparison of the solubility score of the plurality of peptides with the solubility score of the protein.
[0017] In some embodiments, the method of the present disclosure comprises: receiving, by a solubility determination system, data on a protein; determining, by the solubility determination system, a solubility score of the protein based on feature(s) of the protein extracted from the data using a protein solvation identification model, wherein the protein solvation identification model is trained by correlating training feature(s) derived from a protein repository dataset with a corresponding solubility score; determining, by the solubility determination system, a solubility score of each peptide of a plurality of peptides generated by in-silico digestion of the protein by one or more enzyme(s) using the protein solvation identification model; determining, by the solubility determination system, the suitability of the one or more enzyme(s) for protein solubilization based on a comparison of the solubility score of the plurality of peptides with the solubility score of the protein; and additionally, predicting probability of formation of each peptide of the plurality of peptides during the in-silico digestion, based on the data on the protein and the one or more enzyme(s) using a peptide prediction model; wherein the peptide prediction model is trained using information on digestion site(s) and associated cleavage potency of each enzyme at the said digestion site(s) from an enzyme repository dataset;
determining total availability of each peptide of the plurality of peptide(s) based on a function of the composition of the protein(s) and the probability of formation of the said peptide(s); and predicting a group of enzyme(s) for facilitating solubilization of the protein(s), and optionally, predicting ratio of each enzyme in the group of enzymes, based on the solubility score of each peptide of the plurality of peptides and the total availability of each peptide of the plurality of peptide(s), using an enzyme grouping model.
[0018] In some embodiments, further provided in the present disclosure is a solubility determination system (201) for determining enzyme suitability for facilitating protein solubilization, comprising: a processor (211); and
a memory (209) communicatively coupled to the processor, wherein the memory stores processor instructions, which, on execution, cause the processor to:
receive data on a protein;
determine a solubility score of the protein based on feature(s) of the protein extracted from
the data using a protein solvation identification model (213), wherein the protein solvation
identification model is trained by correlating training feature(s) derived from a protein
repository dataset (203) with a corresponding solubility score;
determine a solubility score of each peptide of a plurality of peptides generated by in-silico
digestion of the protein by one or more enzyme(s) using the protein solvation identification
model; and
determine the suitability of the one or more enzyme(s) for protein solubilization based on
a comparison of the solubility score of the peptide(s) with the solubility score of the
protein.
[0019] In some embodiments, the processor (211) receives the data on the protein selected from a group comprising primary structure, secondary structure, tertiary structure and quaternary structure of the protein or any combination thereof.
[0020] In some embodiments, the feature(s) of the protein comprise one-dimensional, two-dimensional or three-dimensional properties of the protein or any part thereof, or any combination thereof.
[0021] In some embodiments, the processor (211) determines suitability of the one or more enzyme(s) for protein solubilization when the solubility score of the peptide(s) is higher than the solubility score of the protein.
[0022] In some embodiments, the processor (211) predicts probability of formation of each peptide of the plurality of peptides during the in-silico digestion, based on the data on the protein and the one or more enzyme(s) using a peptide prediction model; wherein the peptide prediction model is trained using information on digestion site and associated cleavage potency of each enzyme from an enzyme repository dataset.
[0023] In some embodiments, the processor (211) determines total availability of each peptide of the plurality of peptide(s) based on a function of the composition of the protein(s) and the probability of formation of the said peptide(s).
[0024] In some embodiments, the processor (211) ranks the one or more enzyme(s) with respect to their suitability for facilitating protein solubilization based on the determined solubility score of the plurality of peptides.
[0025] In some embodiments, the processor (211) predicts a group of enzyme(s) for facilitating solubilization of the protein, and optionally, predicts the ratio of each enzyme in the group of enzymes, based on the solubility score of each peptide of the plurality of peptides and the total availability of each peptide of the plurality of peptide(s), using an enzyme grouping model.
[0026] In some embodiments, the processor (211) re-trains the protein solvation identification model (213) based on the determined enzyme suitability and the ranking.
BRIEF DESCRIPTION OF THE ACCOMPANYING DRAWINGS
[0027] The novel features and characteristics of the disclosure are set forth in the appended claims. The disclosure itself, however, as well as a preferred mode of use, further objectives, and advantages thereof, will best be understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying figures. One or more embodiments are now described, by way of example only, with reference to the accompanying figures wherein like reference numerals represent like elements and in which:
[0028] Figure 1 depicts a block diagram of the method of determining enzyme(s) suitability for facilitating protein solubilization, in accordance with embodiments of the present disclosure.
[0029] Figure 2 depicts a block diagram of the system for determining enzyme(s) suitability for facilitating protein solubilization, in accordance with embodiments of the present disclosure.
[0030] Figure 3 depicts an exemplary flow-diagram of the steps that characterize the method of the present disclosure.
[0031] Figure 4 depicts an exemplary graph demonstrating validation of the output derived from the system of the present disclosure with respect to enzymes suitable for enhancement of algal protein solubility, by protein digestion and solubility analysis in real time.
[0032] Figure 5 provides an exemplary representation of the architecture of the system of the present disclosure.
[0033] Figure 6 provides a schematic representation of the relationship between modules for predicting enzyme or enzyme combination for maximizing solubility of the protein(s) based on amino acid sequences.
[0034] It should be appreciated by those skilled in the art that any block diagram herein represents conceptual views of illustrative systems embodying the principles of the present subject matter. Similarly, it will be appreciated that any flow charts, flow diagrams, state transition diagrams, pseudo code, and the like represent various processes which may be substantially represented in computer readable medium and executed by a computer or processor, whether or not such computer or processor is explicitly shown.
DETAILED DESCRIPTION
Definitions
[0035] The phrase “facilitating solubilization” or equivalent phrases bearing the same meaning in the context of the present disclosure are intended to refer to enhancement of solubility of a protein.
[0036] The term “solubility” refers to the tangible value assigned to the tendency for a protein to dissolve in a solvent. The solubility score is a value out of “1”. Thus, the highest solubility score for a protein is 1 till its saturation point. In an embodiment, unless otherwise defined, “solubility” refers to the solubility of the protein or peptide in water. However, it is within the ambit of the present disclosure that the solubility of a protein or peptide in any solvent of interest may be determined by the same method and system as defined herein, wherein the datasets accessed may, therefore, additionally include data on solubility of the protein or peptide in the said solvent of interest.
[0037] As used herein, the term “cleavage potency” refers to the propensity of enzyme digestion at a particular site.
[0038] As used herein, the term “artificial intelligence” generally refers to machines or computers that can perform tasks in a manner that is “intelligent”, that is, non-repetitive rather than rote or pre-programmed.
[0039] As used herein, the term “neural net” refers to an artificial neural network. An artificial neural network has the general structure of an interconnected group of nodes. The nodes are often organized into a plurality of layers in which each layer comprises one or more nodes. Signals can propagate through the neural network from one layer to the next.
[0040] As used herein, the term “trained” or “training” in the context of a model in the present disclosure refers to at least one model trained on at least one data set. Examples of models can be linear models, transformers, or neural networks such as convolutional neural networks (CNNs), Bayesian Neural Networks (BNN), and the like. In some embodiments, the model is trained on one or more of the data sets.
[0041] As used herein, the term “machine learning” refers to a type of learning in which the machine (e.g., computer program) can learn on its own without being explicitly programmed.
[0042] As used herein, the term “peptide prediction model” refers to a machine learning model that predicts the formation of peptides, preferably along with the probability of formation of the said peptides from an input protein or protein consortium, using one or more enzymes derived from an enzyme repository data set.
[0043] As used herein, the term “protein solvation identification model” refers to a machine learning model that determines enzyme(s) suitable for facilitating solubilization of protein(s) based on a comparison of solubility score of the protein(s) and the peptide(s) formed therefrom.
[0044] As used herein, the term “enzyme grouping model” refers to a model capable of predicting ratio of each enzyme in the group of enzymes determined to be suitable for improving the solubility of a particular protein or protein consortium.
[0045] A description of an embodiment with several components in communication with each other does not imply that all such components are required. On the contrary, a variety of optional components are described to illustrate the wide variety of possible embodiments of the invention.
[0046] When a single device or article is described herein, it will be readily apparent that more than one device/article (whether or not they cooperate) may be used in place of a single device/article. Similarly, where more than one device or article is described herein (whether or not they cooperate), it will be readily apparent that a single device/article may be used in place of the more than one device or article or a different number of devices/articles may be used instead of the shown number of devices or programs. The functionality and/or the features of a device may be alternatively embodied by one or more other devices which are not explicitly described as having such functionality/features. Thus, other embodiments of the invention need not include the device itself.
[0047] The term "about" is used herein to mean approximately, in the region of, roughly, or around. When the term "about" is used in conjunction with a numerical value/range, it modifies that value/range by extending the boundaries above and below the numerical value(s) set forth. As used herein, the term “about” a number refers to that number plus or minus 25%, plus or minus 15% or plus or minus 10% of that number.
[0048] The term “exemplary” is used herein to mean serving as an example, instance, or illustration. Any embodiment or implementation of the present subject matter described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments.
[0049] Throughout this specification, the word “comprise”, or variations such as “comprises” or “comprising” wherever used, will be understood to imply the inclusion of a stated element, integer or step, or group of elements, integers or steps, but not the exclusion of any other element, integer or step, or group of elements, integers or steps.
[0050] With respect to the use of any plural and/or singular terms herein, those having skill in the art can translate from the plural to the singular and/or from the singular to the plural as is appropriate to the context and/or application. The various singular/plural permutations may be expressly set forth herein for sake of clarity.
[0051] As regards the embodiments characterized in this specification, it is intended that each embodiment be read independently as well as in combination with another embodiment. For example, in case of an embodiment 1 reciting 3 alternatives A, B and C, an embodiment 2 reciting 3 alternatives D, E and F and an embodiment 3 reciting 3 alternatives G, H and I, it is to be understood that the specification unambiguously discloses embodiments corresponding to combinations A, D, G; A, D, H; A, D, I; A, E, G; A, E, H; A, E, I; A, F, G; A, F, H; A, F, I; B, D, G; B, D, H; B, D, I; B, E, G; B, E, H; B, E, I; B, F, G; B, F, H; B, F, I; C, D, G; C, D, H; C, D, I; C, E, G; C, E, H; C, E, I; C, F, G; C, F, H; C, F, I, unless specifically mentioned otherwise.
Disclosure
Method
[0052] Addressing the need in the art with respect to determination of enzyme(s) suitable for enhancement of solubility of specific proteins as elucidated above, the present disclosure provides an artificial intelligence (AI) reliant method, addressing the need to find means to enhance the solubility of proteins or protein-based products. Particularly, the present disclosure provides a method for determining the suitability of specific enzymes or combinations thereof for improving the solubility of individual proteins or a protein consortium.
[0053] The devices, software, systems, and methods described herein leverage the capabilities of artificial intelligence or machine learning techniques for polypeptide or protein analysis to make predictions with respect to protein susceptibility to digestion by one or more enzyme(s) and/or, the impact of the said one or more enzyme(s) in enhancing protein solubility. Machine learning techniques enable the generation of models with increased predictive ability compared to standard non-ML approaches.
[0054] Particularly provided in the present disclosure is a method of determining enzyme(s) suitability for facilitating protein solubilization, the method comprising: receiving, by a solubility determination system, data on a protein; determining, by the solubility determination system, a solubility score of the protein based on feature(s) of the protein extracted from the data using a protein solvation identification model, wherein the protein solvation identification model is trained by correlating training feature(s) derived from a protein repository dataset with a corresponding solubility score; determining, by the solubility determination system, a solubility score of each peptide of a plurality of peptides generated by in-silico digestion of the protein by one or more enzyme(s) using the protein solvation identification model; and determining, by the solubility determination system, the suitability of the one or more enzyme(s) for protein solubilization based on a comparison of the solubility score of the plurality of peptides with the solubility score of the protein.
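The steps of the above method may be sketched end to end as follows, by way of example only. The sketch makes two loud assumptions: the in-silico digestion uses the commonly cited trypsin rule (cleavage after K or R unless the next residue is P), and the trained protein solvation identification model is replaced by a toy placeholder, `toy_solubility_score`, which merely returns the fraction of hydrophilic residues. All function names and the input sequence are hypothetical.

```python
import re
from statistics import mean

HYDROPHILIC = set("DEKRHNQST")


def digest_trypsin(sequence):
    """In-silico digestion: cleave after K or R unless followed by P."""
    return [peptide for peptide in re.split(r"(?<=[KR])(?!P)", sequence) if peptide]


def toy_solubility_score(sequence):
    """Placeholder for the trained model: fraction of hydrophilic residues."""
    return sum(residue in HYDROPHILIC for residue in sequence) / len(sequence)


def assess(sequence, digest, score=toy_solubility_score):
    """Score the protein, digest it, score each peptide, then compare."""
    protein_score = score(sequence)
    peptides = digest(sequence)
    peptide_scores = [score(peptide) for peptide in peptides]
    # Assumption: peptide scores are aggregated by an unweighted mean.
    aggregated = mean(peptide_scores)
    return {
        "protein_score": protein_score,
        "peptides": peptides,
        "peptide_scores": peptide_scores,
        "suitable": aggregated > protein_score,
    }


report = assess("MLLAVLFAGKDEK", digest_trypsin)
print(report["peptides"], report["suitable"])
```

In practice the `score` argument would be bound to the trained model, and `digest` to a digestion routine for each candidate enzyme.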
[0055] In some embodiments, the data on the protein is selected from structural data of the protein and/or composition of the protein.
[0056] In some embodiments, structural data of the protein is selected from primary, secondary, tertiary and quaternary structure of the protein or any combination thereof.
[0057] Without intending to be limited by theory, primary structure of a protein may include whole and partial amino acid sequence for the given protein. Secondary structure may include a designation of whether an amino acid or a sequence of amino acids in a polypeptide is predicted to have an alpha helical structure, a beta sheet structure, or a disordered or loop structure. Tertiary structure may include the location or positioning of amino acids or portions of the polypeptide in three-dimensional space. Quaternary structure may include the location or positioning of multiple polypeptides forming a single protein.
[0058] In some embodiments, the data on the protein is the protein sequence i.e., primary structure of the protein.
[0059] In some embodiments, the data on the protein composition refers to the definition of the components of a protein, for instance, in case of a complex protein or protein consortium, optionally along with the ratio of the respective components in the protein or protein consortium. A non-limiting example of a protein consortium includes algal protein that is composed of a consortium of more than 3000 proteins usually derived after cell lysis and/or total protein extraction.
[0060] In some embodiments, described herein is a method for identifying a previously unknown association between a protein sequence and one or more enzyme(s) suitable for enhancing solubility of the protein represented by the said sequence, the method comprising: receiving, by a solubility determination system, the protein sequence; determining, by the solubility determination system, a solubility score of the protein based on feature(s) of the protein extracted from the protein sequence using a protein solvation identification model, wherein the protein solvation identification model is trained by correlating training feature(s) derived from a protein repository dataset with a corresponding solubility score;
determining, by the solubility determination system, a solubility score of each peptide of a plurality of peptides generated by in-silico digestion of the protein by one or more enzyme(s) using the protein solvation identification model; and
determining, by the solubility determination system, the suitability of the one or more enzyme(s) for protein solubilization based on a comparison of the solubility score of the plurality of peptides with the solubility score of the protein.
[0061] In some embodiments, the protein is a single protein or is a consortium of a plurality of proteins.
[0062] In some embodiments, the identity of the protein is known or unknown.
[0063] In some embodiments, in case of an unknown protein, the identity of the protein is derived from a protein solvation identification model, wherein the protein solvation identification model performs:
extraction of features from the received data on the protein; and
determination of solubility score of the protein based on a correlation between the
extracted features and the solubility score derived from a protein repository dataset.
[0064] As mentioned in the above embodiments, the protein solvation identification model is trained by correlating training feature(s) derived from a protein repository dataset with a corresponding solubility score.
[0065] In some embodiments, the protein repository dataset comprises data on a plurality of proteins collected from different sources. In some embodiments, the protein repository dataset is a static or dynamic dataset. In some embodiments, the protein repository dataset is accessed from a database wherein the database may be internet-based, web-based, cloud computing-based or based on one or more local computer storage devices.
[0066] In some embodiments, the protein repository dataset comprises data on a plurality of proteins including a plurality of classes of proteins. Non-limiting examples of classes of proteins include but are not limited to structural proteins, contractile proteins, storage proteins, defensive proteins (e.g., antibodies), transport proteins, signal proteins, plant proteins, animal proteins and microbial proteins. In some embodiments, the classes of proteins may include proteins having amino acid sequences sharing one or more functional and/or structural similarities.
[0067] In some embodiments, the protein repository dataset comprises data such as but not limited to protein ID, source organism, amino acid sequence, EC number and solubility score of each of the plurality of proteins, optionally along with data on one or more of the secondary protein structure and the tertiary protein structure of each of the plurality of proteins.
[0068] In some embodiments, the protein repository dataset categorizes the plurality of proteins into at least three classes - highly soluble, partially soluble, and insoluble, based on the solubility score.
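As a non-limiting sketch, such a categorization may be implemented by thresholding the solubility score. The cut-off values 0.33 and 0.66 are assumptions made for illustration, as the disclosure does not specify the class boundaries; the function name `solubility_class` is likewise hypothetical.

```python
def solubility_class(score, low=0.33, high=0.66):
    """Map a solubility score in [0, 1] to one of three classes.

    Assumption: the thresholds `low` and `high` are illustrative defaults.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("solubility score must lie between 0 and 1")
    if score >= high:
        return "highly soluble"
    if score >= low:
        return "partially soluble"
    return "insoluble"


for value in (0.9, 0.5, 0.1):
    print(value, solubility_class(value))
```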
[0069] In some embodiments, the protein repository data set is pruned to remove redundant data on identical or similar proteins. In a non-limiting embodiment, redundancy may be removed based on biophysical properties, such as solubility, structural features, secondary or tertiary motifs, thermostability, and other features known in the art.
[0070] In a non-limiting embodiment, the redundant or highly similar protein data is removed from the protein repository data set based on a greedy incremental technique that classifies each sequence as a redundant or representative sequence based on its similarities to the existing representatives. In some embodiments, the protein repository data set is a database cluster, with a predefined tolerance threshold with respect to protein identity. For the purpose of exemplification, in a non-limiting embodiment, when the predefined threshold is about 0.9, it is implied that proteins having a sequence similarity of about 90% or higher may be removed from the data set to reduce redundancy in the protein repository data set.
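The greedy incremental technique may be sketched as below, for illustration only. Sequences are visited longest first, and each sequence either joins the representatives or is discarded as redundant. The similarity measure used here, `difflib.SequenceMatcher.ratio`, is a convenient stand-in and not a true sequence identity obtained by alignment; the function name `prune_redundant` and the example sequences are hypothetical.

```python
from difflib import SequenceMatcher


def prune_redundant(sequences, threshold=0.9):
    """Keep one representative per group of highly similar sequences.

    Assumption: SequenceMatcher.ratio() approximates sequence similarity.
    """
    representatives = []
    # Longest sequences are considered first, so they become representatives.
    for sequence in sorted(sequences, key=len, reverse=True):
        is_redundant = any(
            SequenceMatcher(None, sequence, representative, autojunk=False).ratio() >= threshold
            for representative in representatives
        )
        if not is_redundant:
            representatives.append(sequence)
    return representatives


dataset = [
    "MKTAYIAKQRQISFVKSHFSRQ",
    "MKTAYIAKQRQISFVKSHFSRA",  # differs from the first entry at one residue
    "GSHMLEDPVAGT",
]
print(prune_redundant(dataset))
```

With a threshold of 0.9, the second entry is discarded because its similarity to the first is about 0.95, while the unrelated third entry is retained.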
[0071] In another non-limiting embodiment, the pruning of the protein repository dataset may rely on one or more data descriptor techniques to identify and merge unique features including but not limited to 1, 2 and 3-dimensional properties, biophysical properties, such as solubility, structural features, secondary or tertiary motifs, thermostability, and other features known in the art, taking forward sequences characterized by features with low similarity coefficient to form the protein repository dataset and the basis of training data set.
[0072] Techniques for such removal of redundancy from the protein repository dataset would be well known to a person skilled in the art, wherein suitable methods may be selected based on accessibility, size of the dataset and efficiency of the technique. Accordingly, alternative techniques, not restricted to the above provided example, may be relied upon for pruning the protein repository dataset.
[0073] In some embodiments, the above-described removal of redundancy provides a clustered data set containing low similarity coefficient based representatives of the plurality of protein sequences that form the protein repository dataset and form the basis of the training feature(s).
[0074] In some embodiments, to facilitate the above defined method, the protein solvation identification model is trained by correlating training feature(s) derived from the protein repository dataset with a corresponding solubility score. Said training, in some embodiments, is achieved by machine learning technique.
[0075] In a non-limiting embodiment, the machine learning type relied upon for training the protein solvation identification model includes but is not limited to methods that use deep neural networks. In some embodiments, the machine learning method utilizes a predictive model such as a neural network, a decision tree, a support vector machine, or other applicable model. In some embodiments, the machine learning method is selected from the group including a supervised, semi-supervised and unsupervised learning, such as, for example, a support vector machine (SVM), a Naïve Bayes classification, a random forest, a neural network, a decision tree, a K-means, learning vector quantization (LVQ), self-organizing map (SOM), graphical model, regression method (e.g., linear, logistic, multivariate, association rule learning, deep learning, dimensionality reduction and ensemble selection methods).
[0076] In some embodiments, the protein solvation identification model comprises a neural network selected from convolutional neural network and/or recurrent neural network. In some embodiments, the convolutional network may comprise architecture from VGG16, VGG19, Deep ResNet, Inception/GoogLeNet (V1-V4), Inception/GoogLeNet ResNet, Xception, AlexNet, LeNet, MobileNet, DenseNet, and/or NASNet.
[0077] In some embodiments, the machine learning method relies on a neural network comprising 3 or more layers.
[0078] In a preferred embodiment, the machine learning method relies on a neural network comprising 5 layers.
[0079] In an exemplary embodiment, the machine learning method relies on a neural network comprising 5 layers, wherein the 5 layers are individually selected from a group comprising input layers, flattening layer and dense layers. In some embodiments, each said layer comprises about 128 neurons. In some embodiments, the said neural network comprises an activation function such as, but not limited to, rectified linear unit (ReLU) and Softmax. In some embodiments, the said neural network comprises an optimizer such as Adam. In some embodiments, loss calculation in such a neural network is based on Sparse Categorical Cross-Entropy (SCC). Said architecture is depicted in Figure 5.
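By way of non-limiting illustration, the activation functions and the loss named above may be written out in plain Python as follows. This is a sketch of the arithmetic only, not the training code of the disclosure; the toy layer sizes, weights and the ordering of the three solubility classes are assumptions made for the example.

```python
import math


def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One dense layer: each output is a weighted sum of inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def softmax(logits):
    """Numerically stable softmax turning logits into class probabilities."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_categorical_cross_entropy(probs, label):
    """Loss for one sample whose true class is given as an integer index."""
    return -math.log(probs[label])


# Hypothetical toy network: 2 input features -> 2 hidden units -> 3 classes
# (assumed order: highly soluble, partially soluble, insoluble).
hidden = relu(dense([0.5, -1.0], [[1.0, 0.5], [-0.5, 1.0]], [0.0, 0.1]))
logits = dense(hidden, [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]], [0.0, 0.0, 0.0])
probs = softmax(logits)
loss = sparse_categorical_cross_entropy(probs, label=0)
print(probs, loss)
```

In practice an optimizer such as Adam would adjust the weights to reduce this loss over the training set.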
[0080] In some embodiments, the method of determining enzyme(s) suitability for facilitating protein solubilization finds application in real time, in determining the enzyme(s) suitable for facilitating solubilization of a protein whose data is introduced as input into the system.
[0081] In some embodiments, the protein data provided as input to the system is converted into a set of features used as identifiers, based on which protein identity and/or solubility score is determined by the trained solubility determination system.
[0082] In some embodiments, the protein solvation identification model provides an accuracy score of about 75% to about 90% for the identification of the protein sample introduced as input into the system. In a non-limiting embodiment, said accuracy of the protein solvation identification model is based on the machine learning design of the present disclosure.
[0083] In some embodiments, the protein solvation identification model determines/predicts the category of protein (highly soluble, partially soluble and insoluble) along with solubility score of the proteins. Determination of solubility score in the method of the present disclosure, in addition to the category of protein in terms of solubility allows accurate determination of enzyme(s) that enhance solubility of the protein.
[0084] In some embodiments, the in-silico digestion is performed using one or more enzyme(s) using an enzyme repository dataset comprising data on a plurality of protease enzymes. The method of the present disclosure is thus characterized by reliance on two datasets, one comprising data on a plurality of proteins and the other comprising data on a plurality of protease enzymes.
[0085] In some embodiments, the enzyme repository dataset comprising data on a plurality of protease enzymes comprises data on the enzyme identity [for instance, enzyme name, enzyme class, enzyme structure/sequence, Enzyme Commission number (EC number)] and optionally, site of proteolysis and proteolytic efficiency or cleavage potency.
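By way of non-limiting illustration, one entry of such an enzyme repository dataset may be represented as a simple record. The class name, field names and the potency values below are hypothetical and chosen for this sketch only; the EC number shown is that of trypsin.

```python
from dataclasses import dataclass, field


@dataclass
class ProteaseRecord:
    """One hypothetical entry of the enzyme repository dataset."""
    name: str
    enzyme_class: str
    ec_number: str
    sequence: str = ""  # optional in this sketch
    # Optional: residue after which cleavage occurs -> cleavage potency (0..1).
    cleavage_potency: dict = field(default_factory=dict)


trypsin = ProteaseRecord(
    name="trypsin",
    enzyme_class="serine protease",
    ec_number="3.4.21.4",
    cleavage_potency={"K": 0.9, "R": 0.9},  # illustrative values, not measured
)

# A repository keyed by EC number, as one possible lookup scheme.
repository = {trypsin.ec_number: trypsin}
print(repository["3.4.21.4"].name)
```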
[0086] In some embodiments, the enzyme repository dataset comprising data on a plurality of protease enzymes is a static or dynamic dataset. In some embodiments, the enzyme repository comprising data on a plurality of protease enzymes is accessed from a database wherein, the said database may be internet-based, web-based, cloud computing-based or based on one or more local computer storage devices.
[0087] In some embodiments, the in-silico digestion of each protein introduced as input into the solubility determination system yields a plurality of peptides.
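By way of non-limiting illustration, an in-silico digestion may be sketched as a rule-based cut of the primary sequence. The example uses the commonly cited simplification for trypsin (cleavage on the C-terminal side of K or R, blocked when the next residue is P) and assumes complete digestion with no missed cleavages; the function and parameter names are hypothetical.

```python
def digest(sequence, cleave_after, blocked_by=""):
    """Cut `sequence` after every residue in `cleave_after`, unless the
    following residue is in `blocked_by`. Returns the list of peptides."""
    if not sequence:
        return []
    peptides, start = [], 0
    # The last residue is skipped: cutting after it would yield nothing.
    for i, residue in enumerate(sequence[:-1]):
        if residue in cleave_after and sequence[i + 1] not in blocked_by:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    peptides.append(sequence[start:])
    return peptides


peptides = digest("MKTAYIAKQRPLK", cleave_after="KR", blocked_by="P")
print(peptides)  # ['MK', 'TAYIAK', 'QRPLK']
```

Concatenating the peptides in order always restores the input sequence, which is a convenient sanity check for any digestion rule.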
[0088] In some embodiments, the plurality of peptides is used as input for the protein solvation identification model and solubility of each of the plurality of peptides is determined by the same mechanism as that used for determining solubility of the protein.
[0089] In some embodiments, the in-silico digestion followed by determination of solubility of each peptide of the plurality of peptides arising from the digestion of the protein is repeated with all enzymes of the dataset comprising data on a plurality of protease enzymes.
[0090] In a non-limiting embodiment, the said repetition of the method is performed using a greedy technique to identify the enzyme that yields the most soluble peptides. While greedy technique has been referred to for the purpose of exemplification, a person skilled in the art would be equipped to use an alternative technique for achieving the same objective. Accordingly, the said reference to ‘greedy technique’ is not intended to be restrictive in any manner.
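By way of non-limiting illustration, the repetition over all enzymes may be sketched as a loop that keeps the best-scoring enzyme seen so far, which is one possible reading of the 'greedy technique' referred to above. The enzyme names, cleavage rules and the scoring function (fraction of charged residues) are hypothetical stand-ins; in the method of the disclosure the score would come from the protein solvation identification model.

```python
import re
from statistics import mean

# Hypothetical cleavage rules: residues after which each enzyme cuts.
RULES = {"enzyme_A": "KR", "enzyme_B": "FWY"}


def toy_digest(sequence, enzyme):
    """Split after each residue listed for the enzyme (complete digestion)."""
    pattern = f"(?<=[{RULES[enzyme]}])"
    return [p for p in re.split(pattern, sequence) if p]


def toy_score(peptide):
    """Stand-in solubility score: fraction of charged residues (D, E, K, R)."""
    return sum(res in "DEKR" for res in peptide) / len(peptide)


def pick_best_enzyme(protein, enzymes, digest_fn, score_fn):
    """Keep the enzyme whose peptides have the highest average score."""
    best_name, best_score = None, float("-inf")
    for name in enzymes:
        peptides = digest_fn(protein, name)
        if not peptides:
            continue
        average = mean(score_fn(p) for p in peptides)
        if average > best_score:
            best_name, best_score = name, average
    return best_name, best_score


best = pick_best_enzyme("MKTAYIAKQRDEFW", RULES, toy_digest, toy_score)
print(best)
```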
[0091] In some embodiments, the suitability of the one or more enzyme(s) for protein solubilization is determined when the solubility score of the plurality of peptide(s) is higher than the solubility score of the protein. This scheme is depicted in the in-silico experimental work flow provided in Figure 3.
[0092] In some embodiments, the suitability of the one or more enzyme(s) for protein solubilization is determined when average of the solubility score of the plurality of peptides is higher than the solubility score of the protein.
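By way of non-limiting illustration, the two comparison criteria described above may be sketched as a single check. Reading the first criterion as requiring every peptide to score higher than the protein is an assumption made for this sketch; the function and parameter names are hypothetical.

```python
from statistics import mean


def is_suitable(protein_score, peptide_scores, use_average=True):
    """Return True when the enzyme's peptides score higher than the protein.

    use_average=True  -> compare the average peptide score with the protein.
    use_average=False -> require every peptide to exceed the protein score
                         (an assumed reading of the non-averaged criterion).
    """
    if not peptide_scores:
        return False
    if use_average:
        return mean(peptide_scores) > protein_score
    return all(score > protein_score for score in peptide_scores)


scores = [0.5, 0.3, 0.6]  # hypothetical peptide solubility scores
print(is_suitable(0.4, scores), is_suitable(0.4, scores, use_average=False))
```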
[0093] In some embodiments, the method further comprises predicting probability of formation of each peptide of the plurality of peptides during the in-silico digestion, based on the data on the protein and the one or more enzyme(s) using a peptide prediction model, wherein the peptide prediction model is trained using information on digestion site(s) and associated cleavage potency of each enzyme at the said digestion site(s) from the enzyme repository dataset.
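By way of non-limiting illustration, a probability of formation may be derived from per-site cleavage potencies under a deliberately simple assumption, namely that each bond is cleaved independently. This is not the trained peptide prediction model of the disclosure; it is a hand-written stand-in, and all names and values are hypothetical.

```python
from math import prod


def formation_probability(cut_probs, start, end):
    """Probability that the peptide spanning residues start..end (inclusive,
    0-based) is released, assuming independent cleavage at each bond.

    cut_probs[k] is the probability that the bond after residue k is cut,
    so a protein of n residues has n - 1 entries.
    """
    n = len(cut_probs) + 1
    # Protein termini are treated as always 'cut'.
    left = 1.0 if start == 0 else cut_probs[start - 1]
    right = 1.0 if end == n - 1 else cut_probs[end]
    # Every bond inside the peptide must stay intact.
    intact = prod(1.0 - cut_probs[k] for k in range(start, end))
    return left * right * intact


# Hypothetical 5-residue protein with potencies for its 4 bonds.
cut_probs = [0.0, 0.9, 0.0, 0.5]
print(formation_probability(cut_probs, 2, 3))
```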
[0094] Accordingly, in some embodiments, the method of determining enzyme(s) suitability for facilitating protein solubilization comprises: receiving, by a solubility determination system, data on a protein; determining, by the solubility determination system, a solubility score of the protein based on feature(s) of the protein extracted from the data using a protein solvation identification model, wherein the protein solvation identification model is trained by correlating training feature(s) derived from a protein repository dataset with a corresponding solubility score; determining, by the solubility determination system, a solubility score of each peptide of a plurality of peptides generated by in-silico digestion of the protein by one or more enzyme(s) using the protein solvation identification model; determining, by the solubility determination system, the suitability of the one or more enzyme(s) for protein solubilization based on a comparison of the solubility score of the plurality of peptides with the solubility score of the protein; and additionally, predicting probability of formation of each peptide of the plurality of peptides during the in-silico digestion based on the data on the protein and the one or more enzyme(s) using a peptide prediction model; wherein the peptide prediction model is trained using information on digestion site(s) and associated cleavage potency of each enzyme at the said digestion site(s) from an enzyme repository dataset.
[0095] In some embodiments, the peptide prediction model may be trained using Bayesian Neural Networks (BNN).
[0096] In some embodiments, the method of the present disclosure further comprises determining total availability of each peptide of the plurality of peptide(s) based on a function of the composition of the protein(s) and the probability of formation of the said peptide(s). In an embodiment, the function may include multiplication. Without intending to be limited by theory, the said multiplication may involve multiplying the amount (in %) of a protein in a consortium with the peptide formation probability to get the peptide composition in the plurality of peptide(s). In a non-limiting example, if the amount of a protein in a consortium is about 30% and the probability of formation of a peptide is about 0.1, then the availability of the peptide would be about 30% × 0.1, i.e., about 3%.
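By way of non-limiting illustration, the multiplication described above may be written as a one-line function. The sketch reads the formation probability as a fraction between 0 and 1 (so that 30% of protein and a probability of 0.1 give about 3%); the function name and the input checks are assumptions made for this example.

```python
def total_availability(protein_percent, formation_probability):
    """Availability (in %) of a peptide: the share of its parent protein in
    the consortium (in %) multiplied by the peptide formation probability."""
    if not 0.0 <= protein_percent <= 100.0:
        raise ValueError("protein_percent must lie between 0 and 100")
    if not 0.0 <= formation_probability <= 1.0:
        raise ValueError("formation_probability must lie between 0 and 1")
    return protein_percent * formation_probability


availability = total_availability(30.0, 0.1)
print(round(availability, 6))
```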
[0097] Without intending to be limited by theory, the information on the composition of the protein(s) may be received as a manual input into the system by the user or as an input from the protein repository dataset and comprises information with regard to the ratio of each protein in a protein consortium or ratio of each protein chain in a complex protein.
[0098] Accordingly, in some embodiments, the method of determining enzyme(s) suitability for facilitating protein solubilization comprises: receiving, by a solubility determination system, data on a protein; determining, by the solubility determination system, a solubility score of the protein based on feature(s) of the protein extracted from the data using a protein solvation identification model, wherein the protein solvation identification model is trained by correlating training feature(s) derived from a protein repository dataset with a corresponding solubility score; determining, by the solubility determination system, a solubility score of each peptide of a plurality of peptides generated by in-silico digestion of the protein by one or more enzyme(s) using the protein solvation identification model; determining, by the solubility determination system, the suitability of the one or more enzyme(s) for protein solubilization based on a comparison of the solubility score of the plurality of peptides with the solubility score of the protein; and optionally, predicting probability of formation of each peptide of the plurality of peptides during the in-silico digestion, based on the data on the protein and the one or more enzyme(s) using a peptide prediction model; wherein the peptide prediction model is trained using information on digestion site(s) and associated cleavage potency of each enzyme at the said digestion site(s) from an enzyme repository dataset; and
determining total availability of each peptide of the plurality of peptide(s) based on a function of the composition of the protein(s) and the probability of formation of the said peptide(s).
[0099] Therefore, in some embodiments, envisaged herein is a method of predicting probability of formation of peptide(s) from a protein or protein consortium and determining total availability of each such peptide, said method comprising: receiving, by a solubility determination system, data on protein(s); determining, by the solubility determination system, a solubility score of the protein based on feature(s) of the protein extracted from the data using a protein solvation identification model, wherein the protein solvation identification model is trained by correlating training feature(s) derived from a protein repository dataset with a corresponding solubility score;
determining, by the solubility determination system, a solubility score of each peptide of a plurality of peptides generated by in-silico digestion of the protein by one or more enzyme(s) using the protein solvation identification model; determining, by the solubility determination system, the suitability of the one or more enzyme(s) for protein solubilization based on a comparison of the solubility score of the plurality of peptides with the solubility score of the protein; and additionally, predicting probability of formation of each peptide of the plurality of peptides during the in-silico digestion, based on the data on the protein and the one or more enzyme(s) using a peptide prediction model; wherein the peptide prediction model is trained using information on digestion site(s) and associated cleavage potency of each enzyme at the said digestion site(s) from an enzyme repository dataset; and
determining total availability of each peptide of the plurality of peptide(s) based on a function of the composition of the protein(s) and the probability of formation of the said peptide(s).
[00100] In some embodiments, the method of determining enzyme(s) suitability for facilitating protein solubilization, as defined above, further comprises ranking the one or more enzyme(s) with respect to their suitability for facilitating protein solubilization based on the determined solubility score of the plurality of peptides.
[00101] In some embodiments, the method of determining enzyme(s) suitability for facilitating protein solubilization further comprises ranking the one or more enzyme(s) with respect to their suitability for facilitating protein solubilization based on an average of the determined solubility score of the plurality of peptides.
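By way of non-limiting illustration, ranking by the average of the determined solubility scores may be sketched as a sort over per-enzyme averages. The enzyme names and score values are hypothetical.

```python
from statistics import mean


def rank_enzymes(peptide_scores_by_enzyme):
    """Return (enzyme, average score) pairs, best first. Enzymes that
    produced no peptides are left out of the ranking."""
    averages = {
        name: mean(scores)
        for name, scores in peptide_scores_by_enzyme.items()
        if scores
    }
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


ranking = rank_enzymes({
    "enzyme_A": [0.5, 0.7],
    "enzyme_B": [0.9, 0.2],
    "enzyme_C": [0.8],
    "enzyme_D": [],
})
print([name for name, _ in ranking])
```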
[00102] In some embodiments, the method of the present disclosure, based on the aforesaid ranking, suggests the suitability of one or more enzyme(s) for a particular protein or protein consortium. In other words, the method of the present disclosure allows optimization of the choice of enzyme or combination of enzymes for a particular protein or protein consortium. Without intending to be limited by theory, the method of the present disclosure determines the optimum enzyme or combination of enzymes for a particular protein or protein consortium. In some embodiments, a single or a combination of enzymes may be determined to be suitable for a single protein, wherein the protein may be a simple or a complex protein. In some embodiments, a single or a combination of enzymes may be determined to be suitable for a protein consortium, wherein the protein consortium may comprise proteins having similar or different solvation behaviour.
[00103] In an exemplary embodiment, the method of the present disclosure determines or confirms suitability of a combination of enzymes to improve the solubility of a particular protein or protein consortium.
[00104] In some embodiments, the method of the present disclosure further comprises predicting a group of enzyme(s) for facilitating solubilization of the protein(s), and optionally, predicting ratio of each enzyme in the group of enzymes, based on the solubility score of each peptide of the plurality of peptides and the total availability of each peptide of the plurality of peptide(s), using an enzyme grouping model.
[00105] In some embodiments, the method of the present disclosure further comprises predicting a group of enzyme(s) and ratio of each enzyme in the group of enzymes, for facilitating solubilization of the protein based on the solubility score of each peptide of the plurality of peptides and the total availability of each peptide of the plurality of peptide(s), using the enzyme grouping model.
[00106] Without intending to be limited by theory, the method of the present disclosure may, in addition to predicting suitability of an enzyme or a combination of enzymes for cleavage of a particular protein or protein consortium, also predict the ratio in which the said enzyme(s) must be employed to maximize the solubility of the protein(s). Said prediction, in some embodiments, is determined by an enzyme grouping model which is trained based on the total availability of each peptide of the plurality of peptide(s) yielded by a particular enzyme and the solubility score of each peptide of a plurality of peptides yielded by the said enzyme. In a non-limiting embodiment, said prediction is facilitated by an optimizer function or a fitness function of a genetic algorithm that optimizes the parameter of solubility based on the choice of enzyme(s) for a particular protein. Other non-limiting examples of such optimizers include Artificial Neural Network (ANN) based swarm optimization, gradient descent, and the like. Particularly, the optimizer function may optimize solubility of the protein based on the total availability of each peptide of the plurality of peptide(s) yielded and the solubility score of each peptide of a plurality of peptides associated with a combination of each of the one or more enzymes.
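By way of non-limiting illustration, a fitness function combining total availability and solubility score, together with a coarse search over enzyme ratios, may be sketched as follows. The linear fitness and the grid search are assumptions made for this sketch and stand in for the genetic algorithm or other optimizer mentioned above; all names and values are hypothetical.

```python
from itertools import product


def weighted_solubility(peptides):
    """Availability-weighted mean solubility of (availability, score) pairs."""
    total = sum(avail for avail, _ in peptides)
    if total == 0:
        return 0.0
    return sum(avail * score for avail, score in peptides) / total


def fitness(ratios, peptides_by_enzyme):
    """Assumed linear fitness: ratio-weighted sum of per-enzyme solubility."""
    return sum(ratio * weighted_solubility(peptides_by_enzyme[name])
               for name, ratio in ratios.items())


def grid_search(peptides_by_enzyme, steps=10):
    """Try every ratio combination on a grid whose entries sum to 1."""
    names = sorted(peptides_by_enzyme)
    best_ratios, best_fitness = None, float("-inf")
    for combo in product(range(steps + 1), repeat=len(names)):
        if sum(combo) != steps:
            continue
        ratios = {name: part / steps for name, part in zip(names, combo)}
        value = fitness(ratios, peptides_by_enzyme)
        if value > best_fitness:
            best_ratios, best_fitness = ratios, value
    return best_ratios, best_fitness


# Hypothetical (availability in %, solubility score) pairs per enzyme.
data = {
    "enzyme_A": [(3.0, 0.8), (1.0, 0.4)],
    "enzyme_B": [(2.0, 0.5), (2.0, 0.6)],
}
best_ratios, best_fitness = grid_search(data)
print(best_ratios, best_fitness)
```

Because the assumed fitness is linear in the ratios, the search places all weight on the single best enzyme; a fitness that models interaction between enzymes would be needed for a mixed ratio to emerge as the optimum.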
[00107] In some embodiments, when the input to the solubility determination system is a single protein, the method may predict a single enzyme or a combination of enzymes for cleavage, and thus solubility enhancement, of the single protein.
[00108] In some embodiments, when the input to the solubility determination system is a protein consortium or a complex protein, the method may predict a combination of enzymes and optionally, ratio of each enzyme in the said combination of enzymes for cleavage, and thus solubility enhancement, of the said protein(s).
[00109] Therefore, in some embodiments, when the input to the solubility determination system comprises data on a protein consortium, the system, based on the proportion of each protein in the protein consortium used as input into the system, determines the ratio of respective enzymes determined to be suitable for each protein in the protein consortium in the combination of enzymes determined to be suitable for the digestion of the protein consortium to enhance solubility of the protein consortium.
[00110] Accordingly, in some embodiments, provided herein is a method of determining a group of enzyme(s) for facilitating solubilization of a protein or protein consortium, said method comprising:
receiving, by a solubility determination system, data on the protein(s); determining, by the solubility determination system, a solubility score of the protein based on feature(s) of the protein extracted from the data using a protein solvation identification model, wherein the protein solvation identification model is trained by correlating training feature(s) derived from a protein repository dataset with a corresponding solubility score; determining, by the solubility determination system, a solubility score of each peptide of a plurality of peptides generated by in-silico digestion of the protein by one or more enzyme(s) using the protein solvation identification model; determining, by the solubility determination system, the suitability of the one or more enzyme(s) for protein solubilization based on a comparison of the solubility score of the plurality of peptides with the solubility score of the protein; and additionally, predicting probability of formation of each peptide of the plurality of peptides during the in-silico digestion, based on the data on the protein and the one or more enzyme(s) using a peptide prediction model; wherein the peptide prediction model is trained using information on digestion site(s) and associated cleavage potency of each enzyme at the said digestion site(s) from an enzyme repository dataset;
determining total availability of each peptide of the plurality of peptide(s) based on a function of the composition of the protein(s) and the probability of formation of the said peptide(s); and
predicting a group of enzyme(s) for facilitating solubilization of the protein(s), and optionally, predicting ratio of each enzyme in the group of enzymes, based on the solubility score of each peptide of the plurality of peptides and the total availability of each peptide of the plurality of peptide(s), using an enzyme grouping model.
[00111] Without intending to be limited by theory, the present disclosure provides an in-silico protein digestion based method wherein a peptide prediction model can provide the peptides formed from digesting a protein utilizing enzyme(s) from an enzyme repository dataset, along with probability of formation of the said peptides. These peptide sequences are utilized by a pre-trained protein solvation identification model which computes the solubility for each peptide. This information is provided to the enzyme grouping model which maximizes solubility of the protein by grouping the most suited enzymes for the given protein or mixture of proteins. The enzyme grouping model may predict the proportion of each enzyme to be utilized to get the maximum possible solubility of a protein or protein consortium based on the input on composition of the protein or protein consortium along with the solubility score of each peptide of the plurality of peptides as derived from the protein solvation identification model and the total availability of each peptide of the plurality of peptides that is determined based on a function of the composition of the protein(s) and the probability of formation of each peptide of the plurality of peptides.
[00112] In some embodiments, the method of determining enzyme(s) suitability for facilitating protein solubilization further comprises re-training of the protein solvation identification model based on the determined enzyme suitability and the ranking. In view of the same, in some embodiments, the protein solvation identification model is configured to comprise a feedback loop whereby the feature(s) of the protein and the enzyme suitability output are utilized to re-train the protein solvation identification model.
[00113] In some embodiments, the said re-training of the protein solvation identification model is based on a feedback mechanism based on the determined enzyme suitability and the ranking i.e. the output of the solubility determination system or the method employing the same. The said feedback mechanism ensures continuous learning and accurate output along with a reduction in run time.
[00114] The feedback mechanism allows the solubility determination system to specifically choose enzyme(s) from the dataset comprising data on the plurality of protease enzymes which were determined to be suitable and ranked relatively high with respect to their suitability for facilitating solubilization of previously encountered protein(s) having same or similar feature(s) as that extracted from the protein data inputted into the system, for the in-silico digestion of the said protein.
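By way of non-limiting illustration, the selection of enzyme(s) based on previously encountered protein(s) may be sketched as a nearest-neighbour lookup over stored feature vectors. The use of cosine similarity, the similarity threshold, and the layout of the history are assumptions made for this sketch; the names are hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def shortlist_enzymes(features, history, min_similarity=0.95, top_k=2):
    """Return the top-ranked enzymes of the most similar earlier protein, or
    None when nothing similar enough exists (fall back to the full dataset)."""
    best = max(history, key=lambda rec: cosine(features, rec[0]), default=None)
    if best is None or cosine(features, best[0]) < min_similarity:
        return None
    return best[1][:top_k]


# Hypothetical history: (feature vector, enzymes ranked best first).
history = [
    ([0.2, 0.5, 0.3], ["enzyme_C", "enzyme_A", "enzyme_B"]),
    ([0.9, 0.05, 0.05], ["enzyme_B", "enzyme_A"]),
]
print(shortlist_enzymes([0.21, 0.49, 0.30], history))
print(shortlist_enzymes([0.0, 0.0, 1.0], history))
```

Restricting the in-silico digestion to such a shortlist is one way the feedback mechanism could reduce run time.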
[00115] Thus, in some embodiments, the method of determining enzyme(s) suitability for facilitating protein solubilization comprises: receiving, by the solubility determination system, data on the protein; determining, by the solubility determination system, a solubility score of the protein based on feature(s) of the protein extracted from the data using the protein solvation identification model;
determining, by the solubility determination system, a solubility score of each peptide of a plurality of peptides generated by in-silico digestion of the protein by one or more enzyme(s) using the protein solvation identification model, wherein the one or more enzyme(s) are selected based on the ranking with respect to their suitability for facilitating solubilization of previously encountered protein(s) characterized by feature(s) same or similar to the extracted feature(s); and
determining, by the solubility determination system, the suitability of the one or more enzyme(s) for protein solubilization based on a comparison of the solubility score of the plurality of peptides with the solubility score of the protein.
[00116] In some embodiments, in view of the aforesaid feedback mechanism, the protein solvation identification model employed in the method of determining enzyme(s) suitability for facilitating protein solubilization is trained to correlate the protein feature(s) extracted from protein data used as input to the system and the determined enzyme suitability for a protein characterized by the said feature(s). The said correlation calls for training the protein solvation identification model using data from the dataset comprising data on a plurality of protease enzymes in addition to data from the protein repository dataset.
[00117] In a non-limiting embodiment, the method of the present disclosure provides the order in which each enzyme in a combination of enzymes determined to be suitable for protein digestion may be used to enhance protein solubility. In some embodiments, the order is determined using a knowledge-based expert system that provides said prediction based on information on the site of protein digestion for the respective protease enzymes as present in the enzyme repository dataset and protein structural data provided as input.
[00118] Figure 1 shows a flowchart illustrating method of the present disclosure.
[00119] As illustrated in Figure 1, the method may include one or more blocks illustrating the method for determining enzyme(s) suitability for facilitating protein solubilization. The method may be described in the general context of computer executable instructions. Generally, computer executable instructions can include routines, programs, objects, components, data structures, procedures, modules, and functions, which perform specific functions or implement specific abstract data types.
[00120] The order in which the method is described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method. Additionally, individual blocks may be deleted from the methods without departing from the scope of the subject matter described herein. Furthermore, the method can be implemented in any suitable hardware, software, firmware, or combination thereof.
[00121] At block 101, the method includes receiving, by a solubility determination system, data on a protein or protein structure, wherein the data may be received from any source. In a non-limiting embodiment, the data may be manually inputted by a user or be received from an automated source or server. In a non-limiting embodiment, the data on the protein may be selected from primary, secondary, or tertiary structure of the protein.
[00122] At block 103, the method includes extraction of feature(s) of the protein from the inputted data using a protein solvation identification model; and determination of the solubility score of the protein, based on its extracted features. In some embodiments, extracted feature(s) of the protein include but are not limited to one-dimensional, two-dimensional or three-dimensional properties of the complete protein or any part thereof, or any combination thereof. Said determination of the solubility score of the protein is performed using a protein solvation identification model, wherein the protein solvation identification model is trained by correlating training feature(s) derived from a protein repository dataset with a corresponding solubility score.
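By way of non-limiting illustration, the extraction of one-dimensional feature(s) at block 103 may be sketched as computing the amino-acid composition of the primary sequence. The choice of composition as the feature set is an assumption made for this sketch; the disclosure leaves the exact feature(s) open, and the names used are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def extract_features(sequence):
    """Return a 20-element vector holding the fraction of each standard
    amino acid in the sequence (non-standard letters are ignored)."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return [counts[aa] / len(seq) for aa in AMINO_ACIDS]


features = extract_features("MKTAYIAKQR")
print(len(features), features[AMINO_ACIDS.index("K")])
```

Such a fixed-length vector can be fed to the protein solvation identification model regardless of the length of the protein or peptide.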
[00123] At block 105, the method includes in-silico digestion of the protein by one or more enzyme(s), wherein the one or more enzyme(s) are derived from a dataset comprising data on a plurality of protease enzymes. The said in-silico digestion yields peptide(s) which are used as input in the protein solvation identification model of block 103. This is followed by determination of the solubility score of the peptide(s), generated by digestion of the protein by the one or more enzyme(s), by the protein solvation identification model.
[00124] At block 107, the method includes determination of the suitability of the one or more enzyme(s) for protein solubilization based on a comparison of the solubility score of the plurality of peptides or average of the solubility score of the plurality of peptides with the solubility score of the protein inputted into the solubility determination system. In some embodiments, the suitability of the one or more enzyme(s) for solubilization of the protein is determined when the solubility score of the plurality of peptide(s) is higher than the solubility score of the protein.
System
[00125] In order to facilitate the above method, the present disclosure further provides a solubility determination system for determining enzyme suitability for facilitating protein solubilization, said system comprising: a processor; and
a memory communicatively coupled to the processor, wherein the memory stores processor instructions, which, on execution, cause the processor to: receive, by a solubility determination system, data on a protein; determine a solubility score of the protein based on feature(s) of the protein extracted from the data using a protein solvation identification model, wherein the protein solvation identification model is trained by correlating training feature(s) derived from a protein repository dataset with a corresponding solubility score;
determine a solubility score of each peptide of a plurality of peptides generated by in-silico digestion of the protein by one or more enzyme(s) using the protein solvation identification model; and
determine the suitability of the one or more enzyme(s) for protein solubilization based on a comparison of the solubility score of the peptide(s) with the solubility score of the protein.
[00126] In some embodiments, envisaged herein is a system for determining enzyme suitability for facilitating protein solubilization and optionally, a group of enzyme(s) to facilitate solubilization of a protein or protein consortium, said system comprising: a processor; and
a memory communicatively coupled to the processor, wherein the memory stores processor instructions, which, on execution, cause the processor to: receive, by a solubility determination system, data on the protein(s); determine a solubility score of the protein based on feature(s) of the protein extracted from the data using a protein solvation identification model, wherein the protein solvation identification model is trained by correlating training feature(s) derived from a protein repository dataset with a corresponding solubility score;
determine a solubility score of each peptide of a plurality of peptides generated by in-silico digestion of the protein by one or more enzyme(s) using the protein solvation identification model;
determine the suitability of the one or more enzyme(s) for protein solubilization based on a comparison of the solubility score of the peptide(s) with the solubility score of the protein; and optionally,
predict probability of formation of each peptide of the plurality of peptides during the in-silico digestion using a peptide prediction model; wherein the peptide prediction model is trained using information on digestion site and associated cleavage potency of the one or more enzyme(s) from an enzyme repository dataset; and predict a group of enzyme(s) for facilitating solubilization of the protein, and optionally, ratio of each enzyme in the group of enzyme(s), based on the solubility score of each peptide of the plurality of peptides and the total availability of the said peptide(s), using an enzyme grouping model.
[00127] In some embodiments, envisaged herein is a system for determining a group of enzyme(s) to facilitate solubilization of a protein or protein consortium, said system comprising: a processor; and
a memory communicatively coupled to the processor, wherein the memory stores processor instructions, which, on execution, cause the processor to: receive, by a solubility determination system, data on a protein; determine a solubility score of the protein based on feature(s) of the protein extracted from the data using a protein solvation identification model, wherein the protein solvation identification model is trained by correlating training feature(s) derived from a protein repository dataset with a corresponding solubility score;
determine a solubility score of each peptide of a plurality of peptides generated by in-silico digestion of the protein by one or more enzyme(s) using the protein solvation identification model;
determine the suitability of the one or more enzyme(s) for protein solubilization based on a comparison of the solubility score of the peptide(s) with the solubility score of the protein; and additionally,
predict probability of formation of each peptide of the plurality of peptides during the in-silico digestion using a peptide prediction model; wherein the peptide prediction model is trained using information on digestion site and associated cleavage potency of the one or more enzyme(s) from an enzyme repository dataset; and
predict a group of enzyme(s) for facilitating solubilization of the protein, and optionally, ratio of each enzyme in the group of enzyme(s), based on the solubility score of each peptide of the plurality of peptides and the total availability of the said peptide(s), using an enzyme grouping model.
[00128] The above system of the present disclosure is depicted in Figure 2.
[00129] Figure 2A illustrates an exemplary environment comprising a solubility determination system (201) and one or more sources (203) for determining the suitability of the one or more enzyme(s) for protein solubilization.
[00130] Figure 2B illustrates an exemplary environment comprising a solubility determination system (201), which additionally comprises the peptide prediction model (215) and the enzyme grouping model (217), and one or more sources (203) for determining the suitability of the one or more enzyme(s) for protein solubilization.
[00131] The solubility determination system (201) of the present disclosure may receive data on protein(s) as input from one or more sources (203). In some embodiments, the solubility determination system (201) may be connected with the one or more sources (203) via a communication network (205) to receive such input. In some embodiments, the solubility determination system (201) may include an input device to receive the input. Non-limiting examples of the input device include a keyboard, a mouse, a trackball, a track pad, or a stylus. In some embodiments, the input device is a touch screen or a multi-touch screen.
[00132] In some embodiments, the solubility determination system (201) comprises a machine learning model such as the protein solvation identification model (213) for determining enzyme suitability for facilitating protein solubilization based on input data such as primary, secondary or tertiary structure data of a protein.
[00133] In some embodiments, as depicted in Figure 2B, the solubility determination system (201) comprises a peptide prediction model (215) for predicting probability of formation of each peptide of the plurality of peptides during the in-silico digestion; wherein the peptide prediction model (215) is trained using information on digestion site and associated cleavage potency of each enzyme from an enzyme repository dataset.
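By way of a non-limiting illustration, the relationship between per-site cleavage potency and the probability of formation of a peptide may be sketched as follows. The sketch is not the trained peptide prediction model (215). It assumes, purely for illustration, that cleavage events at different sites are independent, so that a peptide forms when both of its terminal sites are cut and no internal site is cut. The function name `peptide_probabilities` and the `potency` mapping are hypothetical and do not appear in the disclosure.

```python
from typing import Dict, List, Tuple


def peptide_probabilities(
    sequence: str, potency: Dict[int, float]
) -> List[Tuple[str, float]]:
    """Enumerate peptides of `sequence` with an assumed formation probability.

    `potency` maps a cut position i (a cut between sequence[i-1] and
    sequence[i]) to the assumed probability that the enzyme cleaves there.
    Independence between sites is an illustrative assumption.
    """
    n = len(sequence)
    # The two ends of the chain are treated as always "cut".
    cut_prob = {0: 1.0, n: 1.0}
    cut_prob.update({i: p for i, p in potency.items() if 0 < i < n})
    sites = sorted(cut_prob)

    results = []
    for a_idx, start in enumerate(sites):
        for b_idx in range(a_idx + 1, len(sites)):
            end = sites[b_idx]
            # Both terminal sites of the peptide must be cut ...
            prob = cut_prob[start] * cut_prob[end]
            # ... and every site strictly inside it must remain uncut.
            for inner in sites[a_idx + 1 : b_idx]:
                prob *= 1.0 - cut_prob[inner]
            if prob > 0.0:
                results.append((sequence[start:end], prob))
    return results
```

For the toy sequence `"AKGRT"` with `potency = {2: 0.8, 4: 0.5}`, the peptide `"AK"` is assigned a probability of 0.8 and the undigested chain a probability of 0.1. Peptides that overlap, because an internal site was missed, each receive their own probability.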
[00134] In some embodiments, as depicted in Figure 2B, the solubility determination system (201) comprises an enzyme grouping model (217) for predicting a group of enzyme(s) for facilitating solubilization of the protein, and optionally, predicting ratio of each enzyme in the group of enzymes, based on the solubility score of each peptide of the plurality of peptides and the total availability of each peptide of the plurality of peptide(s).
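By way of a non-limiting illustration, the total availability of a peptide may be sketched as a function of the composition of the protein(s) and the probability of formation of the peptide. The disclosure does not fix the form of this function. The sketch below assumes a simple sum, over the proteins of a consortium, of the protein fraction multiplied by the formation probability. The function name `total_availability` and the argument names are hypothetical.

```python
from collections import defaultdict
from typing import Dict


def total_availability(
    composition: Dict[str, float],
    formation_prob: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Assumed form: sum over proteins of (fraction) x (formation probability).

    `composition` maps protein name -> fraction of the consortium.
    `formation_prob` maps protein name -> {peptide: probability of formation}.
    """
    availability: Dict[str, float] = defaultdict(float)
    for protein, fraction in composition.items():
        for peptide, prob in formation_prob.get(protein, {}).items():
            # The same peptide may arise from more than one protein.
            availability[peptide] += fraction * prob
    return dict(availability)
```

Under this assumption, a peptide that forms with probability 0.8 from a protein making up half of the consortium, and with probability 0.2 from the other half, has a total availability of 0.5.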
[00135] In some embodiments, the solubility determination system (201) may be a computing device such as, but not limited to, server computers, desktop computers, laptop computers, notebook computers, sub-notebook computers, netbook computers, notepad computers, set-top computers, handheld computers, internet appliances, mobile smartphones, tablet computers and personal digital assistants.
[00136] In some embodiments, the solubility determination system (201) may include an Input/Output (I/O) interface (207), a memory (209) and a processor (211), as shown in Figures 2A-2B. The I/O interface (207) may receive the input protein data from one or more sources (203).
[00137] In some embodiments, as per Figure 2A, the memory (209) may be communicatively coupled to the processor (211). The memory (209) stores instructions, executable by the processor (211), which, on execution, may cause the solubility determination system (201) to determine the suitability of the one or more enzyme(s) for protein solubilization, as disclosed in the present disclosure. When the system additionally comprises the peptide prediction model (215) and the enzyme grouping model (217), as depicted in Figure 2B, the processor (211) may execute instructions from the memory (209) causing the solubility determination system (201) to additionally predict probability of formation of each peptide of the plurality of peptides during the in-silico digestion using the peptide prediction model (215). Further, the processor (211) may execute instructions and cause the solubility determination system (201) to predict a group of enzyme(s) for facilitating solubilization of the protein, and optionally, ratio of each enzyme in the group of enzyme(s) using the enzyme grouping model (217), as disclosed in the present disclosure.
[00138] In some embodiments, the processor (211) may comprise a plurality of processing units configured to generate the trained model(s). In some embodiments, the system comprises a number of logical cores, composed of arithmetic logic units (ALUs), control units, and memory caches, that are smaller than those of central processing units (CPUs). In some embodiments, the system comprises one or more tensor processing units (TPUs), which are AI application-specific integrated circuits (ASICs) for neural network machine learning.
[00139] In some embodiments, the solubility determination system (201) comprises one or more hardware central processing units (CPUs) that carry out the functions of the system. The solubility determination system (201) further comprises an operating system configured to perform executable instructions. The system may be optionally connected to the Internet such that it accesses the World Wide Web. The solubility determination system (201) may be optionally connected to a cloud computing infrastructure.
[00140] In some embodiments, the solubility determination system (201) includes an operating system configured to perform executable instructions. The operating system is, for example, software, including programs and data, which manages hardware and provides services for execution of applications. Those of skill in the art will recognize that suitable server operating systems include, by way of non-limiting examples, FreeBSD, OpenBSD, NetBSD, Linux, Apple® Mac OS X Server®, Oracle® Solaris®, Windows Server®, and Novell® NetWare®. Those of skill in the art will recognize that suitable personal computer operating systems include, by way of non-limiting examples, Microsoft® Windows®, Apple® Mac OS X®, UNIX®, and UNIX-like operating systems such as GNU/Linux®. In some embodiments, the operating system is provided by cloud computing.
[00141] The solubility determination system (201) of the present disclosure, in some embodiments, includes one or more non-transitory computer readable storage media encoded with a program including instructions executable by the operating system of an optionally networked system. In some embodiments, the computer readable storage medium is optionally removable from the system. In some embodiments, a computer readable storage medium includes, by way of non-limiting examples, CD-ROMs, DVDs, flash memory devices, solid state memory, magnetic disk drives, magnetic tape drives, optical disk drives, cloud computing systems and services, and the like. In some cases, the program and instructions are permanently, substantially permanently, semi-permanently, or non-transitorily encoded on the media.
[00142] In some embodiments, the solubility determination system (201) is a computer based system. Such embodiments include a CPU including a processor and memory which may be in the form of a non-transitory computer readable storage medium. These system embodiments further include software that is typically stored in memory (such as in the form of a non-transitory computer readable storage medium) where the software is configured to cause the processor to carry out a function. Software embodiments incorporated into the systems described herein contain one or more modules.
[00143] In some embodiments, the solubility determination system (201) as described herein comprises a network element for communicating with a server. In some embodiments, the system is configured to upload to and/or download data from the server. In some embodiments, the server is configured to store input data, output, and/or other information. In some embodiments, the server is configured to backup data from the system or apparatus.
[00144] In some embodiments, the solubility determination system (201) includes or is operatively coupled to a storage and/or memory device. The storage and/or memory device is one or more physical apparatuses used to store data or programs on a temporary or permanent basis. In some embodiments, the device is a storage device including, by way of non-limiting examples, CD-ROMs, DVDs, flash memory devices, magnetic disk drives, magnetic tape drives, optical disk drives, and cloud computing based storage or any combination thereof.
[00145] Figure 6 provides a schematic representation of the above method of the present disclosure. As depicted in Figure 6, in an exemplary embodiment, the depicted modules are pre-trained models. Module 1 shows processing at the peptide prediction model, which relies on in-silico digestion of proteins by enzyme(s) to predict the resulting peptides and to provide the propensity/probability of each peptide; Module 2 shows processing at the pre-trained protein solvation identification model, which may use the output of the peptide prediction model to predict the solubility of each peptide; Module 3 shows processing in the enzyme grouping model, which processes and optimizes the output of the peptide prediction model and the protein solvation identification model to predict the combination of enzymes for maximizing solubility of the protein(s) input in Module 1.
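By way of a non-limiting illustration, the flow of data between the three modules may be sketched as follows. The trained models themselves are not reproduced. The callables `digest` and `score` are hypothetical placeholders standing in for the peptide prediction model and the protein solvation identification model respectively, and the final sort is only a simple stand-in for the enzyme grouping model. Weighting each peptide score by its formation probability is likewise an assumption made for the sketch.

```python
from typing import Callable, Dict, List, Tuple

# (protein sequence, enzyme name) -> {peptide: formation probability}
Digest = Callable[[str, str], Dict[str, float]]
# sequence -> solubility score
Score = Callable[[str], float]


def rank_enzymes(
    protein: str, enzymes: List[str], digest: Digest, score: Score
) -> List[Tuple[str, float]]:
    """Return enzymes whose peptides score above the intact protein, best first."""
    baseline = score(protein)
    ranked = []
    for enzyme in enzymes:
        peptides = digest(protein, enzyme)  # Module 1: peptides and probabilities
        total = sum(peptides.values())
        if total == 0:
            continue  # no peptide predicted for this enzyme
        # Module 2: score each peptide, weighted by its formation probability.
        weighted = sum(score(p) * w for p, w in peptides.items()) / total
        # Suitability: peptide score must exceed the score of the protein.
        if weighted > baseline:
            ranked.append((enzyme, weighted))
    # Stand-in for Module 3: order the suitable enzymes by weighted score.
    return sorted(ranked, key=lambda item: item[1], reverse=True)
```

Passing the models in as callables keeps the sketch independent of how either model is trained. An enzyme that leaves the protein undigested returns the baseline score and is therefore not reported as suitable.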
[00146] Advantages of the embodiments of the present disclosure are illustrated herein:
[00147] In an embodiment, the present disclosure helps in accurate identification of a protein, whose data is inputted into the system and determination of its solubility in a particular solvent.
[00148] In an embodiment, the present disclosure determines individual enzymes or a combination of enzymes that can enhance solubility of a protein or protein consortium. Thus, in terms of application, it allows flexibility of input with respect to one or a combination of proteins.
[00149] In an embodiment, the present disclosure determines ranking of the ability of the enzymes to enhance solubility, thus giving the user a choice to choose a suitable enzyme depending on the extent of enhancement of solubility required.
[00150] In an embodiment, the feedback mechanism in the method and the system of the present disclosure gradually helps to reduce the run-time of the method, in view of the re-training based on the correlation of the protein feature(s) and the determined enzyme suitability for previously encountered proteins characterized by the same or similar features.
[00151] In an embodiment, the method and the system of the present disclosure, while suggesting suitability of a combination of enzymes for a protein consortium, provide the ratio between the different enzymes and the order in which the enzymes must be employed for protein digestion to allow the enzymes to efficiently act in synergy and enhance solubility, thereby reducing the burden of experimentation on the user.
[00152] In some embodiments, a system or method as described herein generates a database containing or comprising input and/or output data.
[00153] In an embodiment, the method and the system of the present disclosure are versatile in their intended application and can be relied upon to determine individual enzymes or combinations thereof for any known or unknown protein, irrespective of its source or type, wherein the solubility could be solubility in any solvent of interest, provided the data set(s) is updated accordingly.
[00154] In a non-limiting embodiment, examples of such applications include but are not limited to the use of the method or the system as described above for development of novel food ingredients and formulation of food and feed.
[00155] Evidently, the present disclosure has a practical application and provides a technically advanced solution to the technical problems associated with existing techniques for determining enzymes suitable for enhancement of protein solubility. The aforesaid technical advancements and practical applications of the disclosed method may be attributed to the aspect of efficiently training the protein solvation identification model to draw a correlation between specific protein features, protein solubility and enzyme suitability by relying on data sets comprising protein data and enzyme data respectively.
[00156] In light of the technical advancements provided by the disclosed method and system, the claimed steps, as discussed above, are not routine, conventional, or well-known aspects in the art, as the claimed steps provide the aforesaid solutions to the technical problems existing in the conventional technologies. Further, the claimed steps clearly bring an improvement in determination of enzymes for enhancement of protein solubility.
[00157] While the disclosure is susceptible to various modifications and alternative forms, specific embodiments thereof have been shown by way of example in the drawings and will be described in detail below. It should be understood, however, that it is not intended to limit the disclosure to the particular forms disclosed, but on the contrary, the disclosure is to cover all modifications, equivalents, and alternatives falling within the scope of the disclosure.
EXAMPLES:
EXAMPLE 1: Application of the method on algal proteins
[00158] As an example, some of the abundant proteins in algal protein powder were introduced as input to the system of the present disclosure. An enzyme repository dataset comprising data on a multiplicity of enzymes was relied upon for enzyme data.
[00159] The said proteins that were taken into consideration are present in equal amounts in the algal protein powder, and in view of the same, the scores were averaged and used to rank peptidases (based on effectiveness in a protein consortium) for enhanced protein solubility.
[00160] Some peptidases suggested by the system in terms of their potential to improve solubility of algal proteins were as follows: Kexin, Jararhagin, Hepsin, Insulysin and Staphylolysin.
[00161] In a similar manner, the enzymes of low relevance, that were found to have low potential to improve solubility of the algal proteins were also determined by the system as follows: Neprosin and Peptidyl-D-Amino Acid Hydrolase (Cephalopod).
EXAMPLE 2: Application of the method on algal proteins – validation in real time
[00162] Primary structure data on the algal protein consortium was used as input in the solubility determination system of the present disclosure. The system determined the enzymes Trypsin and Papain to be among those suitable for improving protein solubility. Going by solubility score, the said enzymes were found to increase the solubility score up to 3 times that of the original protein. The data pre and post in-silico digestion is provided in Table 1.
Table 1: Pre and post in-silico digestion solubility scores for some proteins of interest present in algal protein powder
S. No. | Protein name | Solubility score before digestion | Solubility score after Trypsin digestion | Solubility score after Papain digestion
1 | Ribulose bisphosphate carboxylase | 0.79 | 0.94 | 0.89
2 | ATP synthase | 0.51 | 0.87 | 0.91
3 | Translational elongation factor 1 | 0.20 | 0.82 | 0.89
4 | Translational elongation factor Tu | 0.78 | 0.85 | 0.80
5 | ATP synthase subunit alpha | 0.65 | 0.96 | 0.86
6 | Chlorophyll a/b light-harvesting protein | 0.79 | 0.76 | 0.83
7 | Putative fructose-bisphosphate aldolase | 0.14 | 0.68 | 0.56
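By way of a non-limiting illustration, the comparison of pre- and post-digestion scores and the averaging across an equally weighted protein consortium may be reproduced on the values of Table 1 as follows. The scores are transcribed from Table 1. The function name `summarize` and the structure of its output are hypothetical, and the strict "greater than" test is one reading of the comparison step.

```python
from typing import Dict, Tuple

# Solubility scores from Table 1: (before digestion, after Trypsin, after Papain).
TABLE_1: Dict[str, Tuple[float, float, float]] = {
    "Ribulose bisphosphate carboxylase": (0.79, 0.94, 0.89),
    "ATP synthase": (0.51, 0.87, 0.91),
    "Translational elongation factor 1": (0.20, 0.82, 0.89),
    "Translational elongation factor Tu": (0.78, 0.85, 0.80),
    "ATP synthase subunit alpha": (0.65, 0.96, 0.86),
    "Chlorophyll a/b light-harvesting protein": (0.79, 0.76, 0.83),
    "Putative fructose-bisphosphate aldolase": (0.14, 0.68, 0.56),
}


def summarize(table: Dict[str, Tuple[float, float, float]]) -> Dict[str, dict]:
    """Average the scores over equally weighted proteins and list improvements."""
    n = len(table)
    mean_before = sum(row[0] for row in table.values()) / n
    summary = {}
    for idx, enzyme in ((1, "Trypsin"), (2, "Papain")):
        mean_after = sum(row[idx] for row in table.values()) / n
        # A protein counts as improved when its post-digestion score is higher.
        improved = [name for name, row in table.items() if row[idx] > row[0]]
        summary[enzyme] = {
            "mean_before": mean_before,
            "mean_after": mean_after,
            "improved": improved,
        }
    return summary
```

On these values the mean score rises from about 0.55 before digestion to about 0.84 after Trypsin digestion and about 0.82 after Papain digestion. Papain raises the score of all seven proteins, whereas Trypsin raises six and slightly lowers that of the chlorophyll a/b light-harvesting protein (0.79 to 0.76).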
[00163] To validate the above results, concentration-based experiments testing the impact of papain on the solubility of algal protein isolate were conducted. Algal protein solution was treated with about 0.5% papain at room temperature. After digestion, the concentration of dissolved protein in the solution was determined using the Bradford assay. These experiments were conducted at two temperature conditions, viz. 25 °C and 50 °C. It was found that there was a high level of correlation between the system-predicted increment and the experimental observation (Figure 4). This experiment thus validated the reliability of the output of the system of the present disclosure.
EXAMPLE 3: Application of the method on pea proteins
[00164] As an example, the following proteins from pea protein powder – Albumin-1, Convicilin, Glutelin, Legumin, Prolamin and Vicilin were used as input to the system of the present disclosure. An enzyme repository dataset comprising data on 540 enzymes was relied upon for enzyme data.
[00165] The said proteins are present in almost equal amounts in the pea protein powder. Accordingly, the scores were averaged and used to rank peptidases (based on effectiveness in a protein consortium) for enhanced protein solubility. Some of the peptidases determined to be suitable by the system, in terms of their potential to improve solubility, were as follows: Endothiapepsin, Ananain, Pepsin A5 (Homo Sapiens), Human Enterovirus 71 3C Peptidase, Methionyl Aminopeptidase 2, Caspase-10, Poliovirus-Type Picornain 3C, Methionyl Aminopeptidase 1 (Escherichia-Type) and Caspase-1.
[00166] Endothiapepsin, Ananain and Pepsin A5 (Homo Sapiens) were grouped into one enzyme combination deemed suitable for digestion of the said proteins. Human Enterovirus 71 3C Peptidase, Methionyl Aminopeptidase 2 and Caspase-10 were grouped into another enzyme group, whereas Poliovirus-Type Picornain 3C, Methionyl Aminopeptidase 1 (Escherichia-Type) and Caspase-1 were grouped into a further enzyme group for digestion of the said proteins and solubility enhancement of the said 6 pea proteins. The data on peptides generated by each of the enzymes starting from the individual proteins is extensive. For the purposes of exemplification and brevity, the table below provides a representative portion of the peptide data generated by the system of the present disclosure using Legumin as the input and Pepsin A5 as the enzyme:
Table 2: Representative table to exemplify peptides obtained from digestion of Legumin with enzyme Pepsin A5
Sr. No. | Peptides after digestion | Length | Propensity
1 | V | 1 | 0.540179
2 | R | 1 | 0.557604
3 | Q | 1 | 0.562791
4 | L | 1 | 0.60199
5 | LKSN | 4 | 0.547991
6 | MAKLL | 5 | 0.818985
7 | KFLVPARE | 8 | 0.703985
8 | NNPFKFLVPARE | 12 | 0.543286
9 | AGIARLAGTSSVINN | 15 | 0.355121
10 | SLSDRFSYVAFKTNDR | 16 | 0.365903
11 | NLQRNEARQLKSNNPF | 16 | 0.51518
12 | RQLKSNNPFKFLVPAR | 16 | 0.64033
13 | RAGIARLAGTSSVINNLPL | 19 | 0.460163
14 | MAKLLALSLSFCFLLLGGCFA | 21 | 0.3605
15 | EDEERQPRHQRRRGEEEEEDKK | 22 | 0.828125
16 | NCNGNTVFDGELEAGRALTVPQN | 23 | 0.484375
17 | NYAVAAKSLSDRFSYVAFKTNDR | 23 | 0.504651
18 | TVPQNYAVAAKSLSDRFSYVAFK | 23 | 0.539801
19 | LREQPQQNECQLERLDALEPDNRIESEGGLIETWNPN | 37 | 0.37656
20 | MAKLLALSLSFCFLLLGGCFALREQPQQNECQLERLD | 37 | 0.2715
[00167] The above represents only a portion of the large volume of data that was generated while determining the suitability of different enzymes with varying degrees of impact on solubility of the studied proteins, and is limited in order to maintain brevity of the description. The above data is nevertheless reflective of the output provided by the system, wherein the above referred enzymes were grouped into 3 different groups based on their respective rankings determined by the system. Based on the above, it was inferred that at least 3 separate proteins or protein consortia are present in the pea protein powder.
[00168] The above is depictive of the versatility of the method and the system of the present disclosure to multiple types of proteins and protein consortia, independent of their source.
EXAMPLE 4: Application of the method on soy proteins
[00169] As an example, some of the most abundant proteins in soy protein powder were introduced as input to the system of the present disclosure. An enzyme repository dataset comprising data on a multiplicity of enzymes was relied upon for enzyme data.
[00170] Since the proteins taken into consideration are present in equal amounts in the soy protein powder, the scores were averaged and used to rank peptidases (based on effectiveness in a protein consortium) for enhanced protein solubility. Based on observations capturing suitability of readily available enzymes and their respective ranks, some of the peptidases in terms of potential to improve solubility were determined to be as follows: Enteropeptidase, Bacterial Collagenase H, LAST Peptidase (Limulus-Type), SENP5 Peptidase, Pz-Peptidase A, Leukotriene A4 Hydrolase, Retropepsin (Human T-cell Leukemia Virus), Rpumapepsin, Presenilin 1 and Penicillopepsin.
[00171] Enteropeptidase, Bacterial Collagenase H, LAST Peptidase (Limulus-Type), SENP5 Peptidase and Pz-Peptidase A were grouped into one enzyme combination deemed suitable for digestion of the said proteins. Leukotriene A4 Hydrolase, Retropepsin (Human T-cell Leukemia Virus), Rpumapepsin, Presenilin 1 and Penicillopepsin were grouped into a further enzyme group for digestion of the said proteins and solubility enhancement of the said 6 soy proteins.
[00172] While the above represents only a portion of the large volume of data that was generated while determining the suitability of different enzymes with varying degrees of impact on solubility of the studied proteins, in order to maintain brevity of the description, the above data is reflective of the output provided by the system, wherein the above listed enzymes were grouped into 2 different groups based on their respective rankings, indicating, at least based on the above, the presence of 2 separate proteins or protein consortia in soy protein powder.
EXAMPLE 5: Application of the method on whey proteins
[00173] As an example, proteins Afamin, Alpha-lactalbumin, Beta-lactoglobulin, Immunoglobulins, Lactoperoxidase and Lactotransferrin present in whey protein powder were introduced as input to the system of the present disclosure. An enzyme repository dataset comprising data on a multiplicity of enzymes was relied upon for enzyme data.
[00174] The considered proteins are present in equal amounts in the whey protein powder, and accordingly, the scores were averaged and used to rank peptidases (based on effectiveness in a protein consortium) for enhanced protein solubility. The following enzymes were determined to be well-performing enzymes in terms of enhancing the solubility of the whey protein consortium: Griselysin Bap1 Peptidase (Bothrops Asper), Leukotriene A4 Hydrolase, Presenilin 1, Penicillopepsin, Comosain, Human Enterovirus 71 3C Peptidase, Bovine Immunodeficiency Virus Retropepsin and Dipeptidyl-Peptidase I.
[00175] Griselysin Bap1 Peptidase (Bothrops Asper) and Leukotriene A4 Hydrolase were grouped into one enzyme combination deemed suitable for digestion of the said proteins. Presenilin 1, Penicillopepsin and Comosain were grouped into another enzyme group, whereas Human Enterovirus 71 3C Peptidase, Bovine Immunodeficiency Virus Retropepsin and Dipeptidyl-Peptidase I were grouped into a further enzyme group for digestion of the said proteins and solubility enhancement of the said 6 whey proteins.
[00176] While the above represents only a portion of the large volume of data that was generated while determining the suitability of different enzymes with varying degrees of impact on solubility of the studied proteins, in order to maintain brevity of the description, the above data is reflective of the output provided by the system, wherein the above listed enzymes were grouped into 3 different groups based on their respective rankings, indicating, at least based on the above, the presence of 3 separate proteins or protein consortia in whey protein powder.
[00177] The above is depictive of the versatility of the method and the system of the present disclosure to multiple types of proteins and protein consortia, independent of their source.
[00178] The foregoing description of the specific embodiments fully reveals the general nature of the embodiments herein that others can, by applying current knowledge, readily modify and/or adapt for various applications such specific embodiments without departing from the generic concept, and, therefore, such adaptations and modifications should and are intended to be comprehended within the meaning and range of equivalents of the disclosed embodiments. It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation. Therefore, while the embodiments in this disclosure have been described in terms of preferred embodiments, those skilled in the art will recognize that the embodiments herein can be practiced with modification within the spirit and scope of the embodiments as described herein.
[00179] While considerable emphasis has been placed herein on the particular features of this disclosure, it will be appreciated that various modifications can be made, and that many changes can be made in the preferred embodiments without departing from the principles of the disclosure. These and other modifications in the nature of the disclosure or the preferred embodiments will be apparent to those skilled in the art from the disclosure herein, whereby it is to be distinctly understood that the foregoing descriptive matter is to be interpreted merely as illustrative of the disclosure and not as a limitation.
[00180] All references, articles, publications, general disclosures etc. cited herein are incorporated by reference in their entireties for all purposes. However, mention of any reference, article, publication etc. cited herein is not, and should not be taken as, an acknowledgment or any form of suggestion that they constitute valid prior art or form part of the common general knowledge in any country in the world.
WE CLAIM:
1. A method of determining enzyme(s) suitability for facilitating protein solubilization, the method comprising:
receiving, by a solubility determination system, data on a protein;
determining, by the solubility determination system, a solubility score of the protein based on feature(s) of the protein extracted from the data using a protein solvation identification model, wherein the protein solvation identification model is trained by correlating training feature(s) derived from a protein repository dataset with a corresponding solubility score;
determining, by the solubility determination system, a solubility score of each peptide of a plurality of peptides generated by in-silico digestion of the protein by one or more enzyme(s) using the protein solvation identification model; and
determining, by the solubility determination system, the suitability of the one or more enzyme(s) for protein solubilization based on a comparison of the solubility score of the plurality of peptides with the solubility score of the protein.
2. The method as claimed in claim 1, wherein the data on the protein is selected from structural data of the protein and/or composition of the protein(s); wherein the structural data is selected from a group comprising primary structure, secondary structure, tertiary structure and quaternary structure of the protein or any combination thereof.
3. The method as claimed in claim 1, wherein the feature(s) of the protein comprise one-dimensional, two-dimensional or three-dimensional properties of the complete protein or any part thereof, or any combination thereof.
4. The method as claimed in claim 1, wherein the suitability of the one or more enzyme(s) for protein solubilization is determined when the solubility score of the plurality of peptide(s) is higher than the solubility score of the protein.
5. The method as claimed in claim 1, further comprising predicting probability of formation of each peptide of the plurality of peptides during the in-silico digestion, based on the data on the protein and the one or more enzyme(s) using a peptide prediction model; wherein the peptide prediction model is trained using information on digestion site and associated cleavage potency of each enzyme from an enzyme repository dataset.
6. The method as claimed in any of claims 1, 2 or 5, further comprising determining total availability of each peptide of the plurality of peptide(s) based on a function of the composition of the protein(s) and the probability of formation of the said peptide(s).
7. The method as claimed in claim 1, further comprising ranking the one or more enzyme(s) with respect to their suitability for facilitating protein solubilization based on the determined solubility score of the plurality of peptides.
8. The method as claimed in claims 1 and 6, further comprising predicting a group of enzyme(s) for facilitating solubilization of the protein, and optionally, predicting ratio of each enzyme in the group of enzymes, based on the solubility score of each peptide of the plurality of peptides and the total availability of each peptide of the plurality of peptide(s), using an enzyme grouping model.
9. The method as claimed in claim 1, further comprising determining an order of the one or more enzyme(s) for the protein solubilization using information on digestion site and associated cleavage potency of each enzyme from an enzyme repository dataset.
10. The method as claimed in claim 7, further comprising re-training of the protein solvation identification model based on the determined enzyme suitability and the ranking.
11. The method as claimed in claims 4 and 7, comprising:
receiving, by the solubility determination system, the data on a protein;
determining, by the solubility determination system, a solubility score of the protein based on feature(s) of the protein extracted from the data using the protein solvation identification model;
determining, by the solubility determination system, a solubility score of each peptide of a plurality of peptides generated by in-silico digestion of the protein by one or more enzyme(s) using the protein solvation identification model, wherein the one or more enzyme(s) are selected based on the ranking with respect to their suitability for facilitating solubilization of previously encountered protein(s) characterized by feature(s) same or similar to the extracted feature(s); and
determining, by the solubility determination system, the suitability of the one or more enzyme(s) for protein solubilization based on a comparison of the solubility score of the plurality of peptides with the solubility score of the protein.
12. The method as claimed in claims 1, 5, 6 and 8 comprising:
receiving, by a solubility determination system, data on a protein;
determining, by the solubility determination system, a solubility score of the protein based on feature(s) of the protein extracted from the data using a protein solvation identification model, wherein the protein solvation identification model is trained by correlating training feature(s) derived from a protein repository dataset with a corresponding solubility score;
determining, by the solubility determination system, a solubility score of each peptide of a plurality of peptides generated by in-silico digestion of the protein by one or more enzyme(s) using the protein solvation identification model;
determining, by the solubility determination system, the suitability of the one or more enzyme(s) for protein solubilization based on a comparison of the solubility score of the plurality of peptides with the solubility score of the protein; and additionally,
predicting probability of formation of each peptide of the plurality of peptides during the in-silico digestion, based on the data on the protein and the one or more enzyme(s) using a peptide prediction model; wherein the peptide prediction model is trained using information on digestion site(s) and associated cleavage potency of each enzyme at the said digestion site(s) from an enzyme repository dataset;
determining total availability of each peptide of the plurality of peptide(s) based on a function of the composition of the protein(s) and the probability of formation of the said peptide(s); and
predicting a group of enzyme(s) for facilitating solubilization of the protein(s), and optionally, predicting ratio of each enzyme in the group of enzymes, based on the solubility score of each peptide of the plurality of peptides and the total availability of each peptide of the plurality of peptide(s), using an enzyme grouping model.
13. A solubility determination system (201) for determining enzyme suitability for facilitating protein solubilization, comprising:
a processor (211); and
a memory (209) communicatively coupled to the processor, wherein the memory stores processor instructions, which, on execution, cause the processor to:
receive, by the solubility determination system, data on a protein;
determine a solubility score of the protein based on feature(s) of the protein extracted from the data using a protein solvation identification model (213), wherein the protein solvation identification model is trained by correlating training feature(s) derived from a protein repository dataset (203) with a corresponding solubility score;
determine a solubility score of each peptide of a plurality of peptides generated by in-silico digestion of the protein by one or more enzyme(s) using the protein solvation identification model; and
determine the suitability of the one or more enzyme(s) for protein solubilization based on a comparison of the solubility score of the peptide(s) with the solubility score of the protein.
14. The solubility determination system as claimed in claim 13, wherein the processor (211) receives the data on the protein selected from a group comprising primary structure, secondary structure, tertiary structure and quaternary structure of the protein or any combination thereof.
15. The solubility determination system as claimed in claim 13, wherein the feature(s) of the protein comprise one-dimensional, two-dimensional or three-dimensional properties of the protein or any part thereof, or any combination thereof.
16. The solubility determination system as claimed in claim 13, wherein the processor (211) determines suitability of the one or more enzyme(s) for protein solubilization when the solubility score of the peptide(s) is higher than the solubility score of the protein.
17. The solubility determination system as claimed in claim 13, wherein the processor (211) predicts probability of formation of each peptide of the plurality of peptides during the in-silico digestion, based on the data on the protein and the one or more enzyme(s) using a peptide prediction model, wherein the peptide prediction model is trained using information on digestion site(s) and associated cleavage potency of each enzyme from an enzyme repository dataset.
18. The solubility determination system as claimed in claim 13, wherein the processor (211) determines total availability of each peptide of the plurality of peptide(s) based on a function of the composition of the protein(s) and the probability of formation of the said peptide(s).
19. The solubility determination system as claimed in claim 13, wherein the processor (211) ranks the one or more enzyme(s) with respect to their suitability for facilitating protein solubilization based on the determined solubility score of the plurality of peptides.
20. The solubility determination system as claimed in claim 13, wherein the processor (211) predicts a group of enzyme(s) for facilitating solubilization of the protein, and optionally predicts the ratio of each enzyme in the group of enzymes, based on the solubility score of each peptide of the plurality of peptides and the total availability of each peptide of the plurality of peptide(s), using an enzyme grouping model.
21. The solubility determination system as claimed in claim 13, wherein the processor (211) retrains the protein solvation identification model (213) based on the determined enzyme suitability and the ranking.
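
By way of a non-limiting illustration, the flow recited in claims 13, 16 and 19 (in-silico digestion, scoring of the resulting peptides, comparison with the score of the intact protein, and ranking of the enzymes) may be sketched in Python as follows. The cleavage rules are simplified textbook specificities. The function `toy_solubility_score` is a hypothetical placeholder (fraction of charged residues) standing in for the trained protein solvation identification model. The unweighted mean used to aggregate peptide scores is likewise an assumption, since the claims do not fix an aggregation.

```python
from statistics import mean

# Simplified specificities (assumption): cleave after these residues,
# except when the next residue is proline; no missed cleavages are modelled.
CLEAVAGE_AFTER = {
    "trypsin": set("KR"),
    "chymotrypsin": set("FWY"),
}

CHARGED = set("DEKR")


def digest(sequence, enzyme):
    """In-silico digestion: split the sequence at every cleavage site."""
    residues = CLEAVAGE_AFTER[enzyme]
    peptides, start = [], 0
    for i, aa in enumerate(sequence[:-1]):  # never cut after the last residue
        if aa in residues and sequence[i + 1] != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    peptides.append(sequence[start:])
    return peptides


def toy_solubility_score(sequence):
    """Hypothetical stand-in for the trained model: fraction of charged residues."""
    return sum(aa in CHARGED for aa in sequence) / len(sequence)


def evaluate_enzymes(sequence, enzymes):
    """Return (enzyme, peptide_score, suitable) tuples, best enzyme first."""
    protein_score = toy_solubility_score(sequence)
    results = []
    for enzyme in enzymes:
        peptides = digest(sequence, enzyme)
        # Assumption: aggregate peptide scores with an unweighted mean.
        peptide_score = mean(toy_solubility_score(p) for p in peptides)
        # An enzyme is deemed suitable when the peptides score higher
        # than the intact protein.
        results.append((enzyme, peptide_score, peptide_score > protein_score))
    return sorted(results, key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    for enzyme, score, suitable in evaluate_enzymes("AAAAKDE", ["chymotrypsin", "trypsin"]):
        print(f"{enzyme}: peptide score {score:.3f}, suitable={suitable}")
```

In an actual embodiment, the placeholder scoring function would be replaced by the protein solvation identification model (213), and the peptide scores could be weighted by the total availability of each peptide.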
| # | Name | Date |
|---|---|---|
| 1 | 202321032956-STATEMENT OF UNDERTAKING (FORM 3) [10-05-2023(online)].pdf | 2023-05-10 |
| 2 | 202321032956-REQUEST FOR EXAMINATION (FORM-18) [10-05-2023(online)].pdf | 2023-05-10 |
| 3 | 202321032956-PROOF OF RIGHT [10-05-2023(online)].pdf | 2023-05-10 |
| 4 | 202321032956-FORM 18 [10-05-2023(online)].pdf | 2023-05-10 |
| 5 | 202321032956-FORM 1 [10-05-2023(online)].pdf | 2023-05-10 |
| 6 | 202321032956-DRAWINGS [10-05-2023(online)].pdf | 2023-05-10 |
| 7 | 202321032956-DECLARATION OF INVENTORSHIP (FORM 5) [10-05-2023(online)].pdf | 2023-05-10 |
| 8 | 202321032956-COMPLETE SPECIFICATION [10-05-2023(online)].pdf | 2023-05-10 |
| 9 | 202321032956-FORM-26 [10-07-2023(online)].pdf | 2023-07-10 |
| 10 | 202321032956-Power of Attorney [07-08-2024(online)].pdf | 2024-08-07 |
| 11 | 202321032956-Form 1 (Submitted on date of filing) [07-08-2024(online)].pdf | 2024-08-07 |
| 12 | 202321032956-Covering Letter [07-08-2024(online)].pdf | 2024-08-07 |
| 13 | 202321032956-Power of Attorney [13-08-2024(online)].pdf | 2024-08-13 |
| 14 | 202321032956-Form 1 (Submitted on date of filing) [13-08-2024(online)].pdf | 2024-08-13 |
| 15 | 202321032956-Covering Letter [13-08-2024(online)].pdf | 2024-08-13 |
| 16 | 202321032956-CORRESPONDENCE(IPO)-(WIPO DAS)-13-08-2024.pdf | 2024-08-13 |
| 17 | 202321032956-FORM 3 [08-11-2024(online)].pdf | 2024-11-08 |