Method And System For Assigning Unique Labels To Atoms In A Chemical Compound

Abstract: To assign a unique and correct identity to atoms in chemical compounds, chemists follow a tedious application of IUPAC rules. Nowadays, when chemical companies and even individuals establish their own databases of real or virtual compounds and reactions, the problem of identifying compounds has become more critical than ever. The problem can be described as that of compound canonicalization. It involves applying a set of rules to a compound in a standard representation to obtain a unique character string that is easily compared, by computer or manually, with the corresponding strings of other compounds. This is equivalent to producing a unique renumbering of the atoms in a molecule, a canonical numbering. Each cheminformatics toolkit or method provides its own version of a canonical ordering, mostly based on unpublished methods, which also complicates the generation of a universal unique identifier for molecules. We present an alternative canonicalization system that uses a novel topology-based renumbering method instead of previous methods. This new method is able to generate a canonical order of the atoms of chemical compounds within a few milliseconds with zero error rate.


Patent Information

Application #

Filing Date

07 May 2019

Publication Number

21/2019

Publication Type

INA

Invention Field

MECHANICAL ENGINEERING

Status

kulharia@gmail.com

Parent Application

Applicants

Central University of Punjab

City Campus Mansa Road Bathinda Punjab India

Inventors

1. Mahesh Kulharia

Sri Sai Sadan, VPO Harita, Hisar, Haryana, India (125001)

2. Suchismita Mahato

Zone No. 1B, Birsanagar, Near Loyola B.Ed College, Jamshedpur, Jharkhand, India (831019)

3. Saurav Kumar JIndal

Near Old Nagar Palika, Badrinath Bazar, Gangapur City, Rajasthan, India (322201)

4. Surinder Singh Khurana

s/o Jagdish Singh, Main Bazaar, Talwandi Sabo, Dist-Bathinda, Punjab, India (151302)

5. Vicky Kumar

H.No. 315 W.No. 19, Indira Marg,Near Baba Ramdev Mandir Sunam, Punjab India (148028)

6. Kousik Giri

Village and PO Bakra, Jhargram, West Bengal

Specification

TECHNICAL FIELD
The present disclosure is related to assigning unique labels to atoms in a chemical compound and more specifically, to a method and system for assigning a unique labelling of atoms which can also be easily adapted to allow for automated chemical compound identification, chemical reaction search, minimization of redundancy in chemical databases, inventory management etc.
BACKGROUND
Rigorous characterization of chemical compounds in terms of their structural and biological properties is vital for chemical research. The three-dimensional structure of a compound is an inefficient handle for searching for similar compounds. Instead, compound identifiers play a key role in digital archiving of chemical compounds. Unique and reproducible atom identifiers are required to ensure the correct cross-referencing of properties associated with chemical compounds archived in databases. All currently available approaches fail to provide unique and unambiguous atom nomenclature. The use of imprecise and ambiguous atom nomenclature has resulted in the proliferation of compound datasets with questionable data. It is very important to construct a method and system that derives the atom labels purely on the basis of chemical structure in an unambiguous manner so that a precise and unique label can be assigned to a chemical compound.
FIELD OF INVENTION
In this millennial era, information is the vital element for any advancement in science. Generating, classifying and utilizing information is imperative for any progression of technology. Therefore, databases have increasingly become a valuable asset in information utilization. A primary concern in maintaining such databases is to ensure non-redundancy among the stored information. Among chemical databases, there are many existing ones such as PubChem, ZINC etc. To maintain non-redundancy across these databases, the unique labelling of chemical compounds is the main concern, and this in turn depends upon the methods used to assign unequivocal atom nomenclature. Existing methods suffer from multiple disadvantages (as mentioned in prior art) which result in massive duplication of stored information, especially in chemical compound databases.

DESCRIPTION OF RELATED ART
The experience gained by organic chemists in numbering and ordering atoms and/or groups of atoms of molecular structures has led to the evolution of several different systems for labelling atoms of chemical compounds. The most widely used nomenclature rules are given by "The International Union of Pure and Applied Chemistry" (IUPAC). However, these rules also come with their own shortcomings. They are conflicting, difficult to understand and sometimes ambiguous.
In assigning unique labels to atoms in a chemical compound, one can represent the molecule as a graph. Two molecules are then the same if their graph representations are isomorphic. Several methods have been developed to check whether two graphs are isomorphic. However, if the labelling of atoms in two different conformations of the same molecule is not identical, determining whether these are indeed the same becomes a non-trivial problem.
The relationship between atoms and their bonds can be written in the form of 0's and 1's: if two atoms are connected by a bond, a 1 denotes the connection, and a 0 denotes its absence. Prokurowski (1974) studied canonical orderings of molecules using the incidence matrix generated in this way. Tinhofer and coworkers (1995) developed a method to generate canonical numbers for a labelled graph using the adjacency matrix. The method considers all adjacency matrices of the graph that belong to the isomorphism class. Each matrix is read row by row as a binary number, and the matrix with the smallest number is selected. The graph with this numbering is considered to be canonically numbered. The disadvantage of this approach is its reliability. While two molecules of the same isomorphism class will give rise to the same adjacency matrix, it is also possible for two non-isomorphic molecules to give rise to the same adjacency matrix. Hence, this canonical labelling cannot be relied upon alone. Using graph theory to solve such problems was proposed decades ago, but it has been applied only relatively recently. Brendan McKay (1981) developed a method to generate a canonical labelling map of a labelled graph. This method has been implemented in the program called "nauty". However, since nauty does not work with labelled graphs, this method is ill-suited to chemical compound canonicalization.
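The smallest-binary-number selection described above can be illustrated with a minimal brute-force sketch. It is not the published method: the function name `canonical_matrix_code` is hypothetical, every vertex ordering is enumerated (factorial cost, so only tiny graphs are practical), and element names are deliberately ignored, which shows how two different molecules with the same connectivity end up with the same matrix.

```python
from itertools import permutations

def canonical_matrix_code(adj):
    """Return (smallest row-by-row bit string, vertex ordering that produces it)."""
    n = len(adj)
    best = None
    for perm in permutations(range(n)):
        # Read the permuted 0/1 matrix row by row as one binary string.
        # Equal-length bit strings sort the same way as the numbers they encode.
        bits = "".join(str(adj[perm[i]][perm[j]])
                       for i in range(n) for j in range(n))
        if best is None or bits < best[0]:
            best = (bits, perm)
    return best

# The same three-atom chain, numbered in two different ways.
chain_a = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]   # bonds 0-1 and 1-2
chain_b = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]   # bonds 0-1 and 0-2
# A three-membered ring for comparison.
ring = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]

code_a = canonical_matrix_code(chain_a)[0]
code_b = canonical_matrix_code(chain_b)[0]
code_ring = canonical_matrix_code(ring)[0]
```

Both chain numberings yield the same code, while the ring yields a different one. Because the matrix holds only connectivity, a C-C-C chain and a C-O-C chain would also share a code in this sketch.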

D. Weininger (1989) developed a chemical notation language, SMILES (Simplified molecular-input line-entry system), for the conversion of a chemical structure to a single notation. The underlying process, named CANGEN, involves two stages. The first stage involves CANonicalization of the structure, whereby each atom is canonically ordered and labelled. The second stage involves GENerating a molecular graph which starts with the lowest labelled atom. The disadvantage of SMILES is the arbitrariness of its starting atom (D Hutchison (2005)). The notation is ambiguous, as there can be more than one valid SMILES string for the same molecular structure. For example, vanillin can be written as O=Cc1ccc(O)c(OC)c1 as well as COc1cc(C=O)ccc1O. This ambiguity arises from its method of canonicalization, as mentioned by D Hutchison (2005).
Robert Grossman (2003) developed a method called the UCK (Universal Chemical Key), which represents the different types of covalent bonds, i.e. single, double or triple, by just a single bond between atoms of the molecule. The next step in the UCK method is to replace the labels with new labels that capture some of the local connectivity and chemical environment around each atom. Atoms with similar connectivity end up having similar labels. Subsequently, the lengths of the shortest paths between each pair of vertices of the vertex set V of the graph G are generated. The path labels are produced by concatenating the source label, the path length, and the destination label. At every stage of labelling, a rule-based lexicographical ordering is followed so that the whole procedure is invariant to changes in the ordering of the vertex set V. The labels of all pairs of shortest paths are then ordered lexicographically. The labels are concatenated to form a string, which is prefixed by the molecular formula of the molecule and is called the UCK of the molecule. The disadvantage of this method is that two non-isomorphic molecules may result in the same UCK. This is because of its inability to provide a unique labelling of atoms and to prioritise the atoms according to their labels.
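The path-label construction described above can be sketched as follows. This is a simplified illustration rather than the actual UCK procedure: the local label used here (element plus sorted neighbour elements), the separators, and the names `bfs_lengths` and `uck_like_key` are all assumptions made for the example.

```python
from collections import Counter, deque

def bfs_lengths(adj, src):
    """Shortest path length (in bonds) from src to every reachable atom."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def uck_like_key(elements, adj):
    """Build a key from sorted 'source-length-destination' path labels."""
    n = len(elements)
    # Assumed local label: element name plus the sorted names of its neighbours.
    labels = [elements[i] + "(" + "".join(sorted(elements[j] for j in adj[i])) + ")"
              for i in range(n)]
    path_labels = []
    for s in range(n):
        for t, d in bfs_lengths(adj, s).items():
            if s != t:
                path_labels.append(f"{labels[s]}-{d}-{labels[t]}")
    # Simple element count as a stand-in for the molecular formula prefix.
    formula = "".join(f"{el}{cnt}" for el, cnt in sorted(Counter(elements).items()))
    # Sorting makes the key independent of the input atom order.
    return formula + "|" + ",".join(sorted(path_labels))

# Heavy atoms of ethanol (C-C-O) under two input numberings, and of dimethyl ether (C-O-C).
ethanol_1 = uck_like_key(["C", "C", "O"], [[1], [0, 2], [1]])
ethanol_2 = uck_like_key(["O", "C", "C"], [[1], [0, 2], [1]])
ether = uck_like_key(["C", "O", "C"], [[1], [0, 2], [1]])
```

The two ethanol numberings give the same key, and the ether differs even though the element counts match.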
International Chemical Identifier (InChI) (2015) is an identifier for molecular structures developed by IUPAC and NIST (National Institute of Standards and Technology). It is non-proprietary and freely usable. It is processed in three stages, namely normalization, canonicalization and serialization. InChI is human-readable. InChI describes chemical substances in the form of six layers: main layer, charge layer, stereochemical layer, isotopic layer, fixed-H layer and reconnected layer. The disadvantages of InChI are that it does not correctly provide the information about the stereochemical and aromatic properties of a molecule and that it cannot renumber the atoms in the molecule in a precise manner. ALATIS (Atom Label Assignment Tool Using InChI String) (2017) is another method, which claims to improve on InChI by rigorously applying a renumbering method to the InChI substring. However, ALATIS too fails to correctly identify the stereochemical information. It also does not produce the same numbering scheme when the molecules are processed with different initial numbering.
OBJECT OF INVENTION
The existing automated methods are unable to accurately deduce the complete stereochemical information from a chemical structure file and provide unique, ever-reproducible renumbering scheme for atoms. This has reduced the confidence in chemical data-management systems. Reproducible unique compound identifier generation must take into account the covalent molecular structure, chirality and complete atom nomenclature. Without the implementation of a robust method for unique and reproducible labelling of atoms, the outcomes of all methods for compound identifier generation would remain ill-suited. Therefore, a method and system for reproducible unique atom label nomenclature would enhance the efficiency of methods for chemical compound identifier assignment. The object of this invention is to assign unique labels to the atoms in a chemical compound.
SUMMARY
The shortcomings in the field of unique labelling of atoms in chemical compounds are overcome and additional advantages are provided through the present disclosure. Additional features and advantages are realized through the techniques of the present disclosure. Other embodiments and aspects of the disclosure are described in detail herein and are considered a part of the present disclosure.
The present disclosure provides a method for assigning unique labels to atoms for every chemical compound. The molecule is considered as an undirected labelled graph G = (V, E) consisting of a finite set of vertices V and a finite set of edges E. The set of atoms in the molecule is the vertex set, V, and the set of covalent bonds between the atoms is the edge set, E. Each edge has a unity weight. The process starts by extracting and storing the basic information, such as bond orders and aromatic regions of the molecule, using interatomic distance based estimation, and by maintaining an adjacency matrix to store the connections between atoms. It then identifies the stereogenic centres and determines their R/S configuration by strict implementation of the Cahn-Ingold-Prelog rules. This is a recursive process of evaluating the priorities of atoms adjacent to the central atom, wherein the neighbourhoods are traversed in an efferent direction so as to find the first point of difference or the end of the neighbourhood, whichever is earlier. The principles of Cartesian geometry and these priorities determine the R/S configuration for each chiral centre.
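As a minimal sketch of the graph representation described above, the connections can be held in a symmetric square matrix. The bond list format `(atom_a, atom_b, bond_order)` and the function names `build_adjacency` and `neighbours` are assumptions for illustration; the disclosure derives bond orders from interatomic distances, which is not reproduced here.

```python
def build_adjacency(n_atoms, bonds):
    """Symmetric matrix: entry [a][b] holds the bond order, 0 means no bond."""
    matrix = [[0] * n_atoms for _ in range(n_atoms)]
    for a, b, order in bonds:
        matrix[a][b] = order
        matrix[b][a] = order   # undirected graph, so mirror the entry
    return matrix

def neighbours(matrix, atom):
    """Indices of all atoms bonded to the given atom."""
    return [j for j, order in enumerate(matrix[atom]) if order > 0]

# Heavy atoms of acetaldehyde: C0-C1 single bond, C1=O2 double bond.
elements = ["C", "C", "O"]
matrix = build_adjacency(3, [(0, 1, 1), (1, 2, 2)])
```

Treating every non-zero entry as one edge gives the unity-weight graph used for traversal, while the stored bond orders remain available for other steps.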
The molecular structure file has the elements labelled and numbered in an arbitrary manner. This numbering system is reordered and the atoms are renumbered based on the structure of the molecule. The process canonicalises the molecules by treating them as graphs, considering each atom as a node and each bond (of any order) as a single edge, with retention of the atom name. To identify one or more putative root atoms, the process samples the nodes with the highest degrees having the "lexicographically least valued" neighbourhood, by calculating the cumulative valency of adjacent neighbours. The root(s) act as the initiation point for iterative traversal, wherein the identities of the series of traversed atoms are recorded as route(s). The routes are then examined one by one, wherein the atoms of every branch are compared at the same level of occurrence across the branches and stored in a renumbering list based on the local key of the atoms. This process is carried out for all the branches of the routes. The final outcome of this process is the list of atoms in the renumbering list, called the traversal sequence, in which the atoms are renumbered in their order of occurrence. This becomes the candidate tree key for the given root atom. The candidate tree key also contains the information of the R/S configuration of the atoms. This process is repeated for all the root atoms. The final renumbered list, called the primary tree key, is the one amongst the candidate tree keys which has the least value lexicographically and which contains the atoms having R or S configuration higher in the order of occurrence in the candidate tree key. The atoms are then labelled according to their order of occurrence in the primary tree key.
The present disclosure provides a system for assigning a unique label of atoms for every chemical compound. The system comprises a processing unit, a memory unit, a storage device, an I/O interface and an I/O device. The structure information for the chemical compound is stored in the storage device through the I/O interface. The storage device is used to store the structural information of the molecule, such as the bond order and the connection table of every atom. The processing unit is then configured to iterate over every stereogenic centre to obtain the priority of every neighbour of the centre. The storage unit then stores the priority order of every stereogenic centre. The principles of Cartesian geometry are applied by the processing unit to the priority order stored in the storage unit to determine the R/S configurations, which are stored in the storage unit.
The processing unit is then configured to proceed with the renumbering of the atoms in the molecule by identifying, with the help of the memory unit, the root atoms which have the highest degrees and have the "lexicographically least valued" neighbourhood. This is done by the processing unit by calculating the cumulative valency of adjacent neighbours. The root(s) act as the initiation point for iterative traversal by the processing unit, wherein the identities of the series of traversed atoms are recorded in the memory unit as route(s). The routes are then examined one by one by the processing unit, wherein the atoms of every branch are compared at the same level of occurrence across the branches and stored in the memory unit in a renumbering list based on the priority key of the atoms. This process is carried out for all the branches of the routes by the processing unit. The final outcome of this process is the list of atoms in the renumbering list stored in the memory unit, called the traversal sequence, in which the atoms are renumbered in their order of occurrence. This information is stored in the memory unit as the candidate tree key for the given root atom. The candidate tree key also contains the information of the R/S configuration of the atoms derived from the storage unit. This process is repeated for all the root atoms. The final renumbered list, called the primary tree key, is the one amongst the candidate tree keys which has the least value lexicographically and which contains the atoms having R or S configuration higher in the order of occurrence. This is determined by the processing unit with the help of the memory unit, which stores the primary tree key. The atoms are then labelled and stored in the storage unit according to their order of occurrence in the primary tree key. The I/O device is used to display the unique labelling of atoms in the molecule. This display of information is carried out with the help of the I/O interface.
BRIEF DESCRIPTION OF THE DRAWINGS
The novel features and characteristics of the disclosure are set forth in the appended claims. The embodiments of the disclosure itself, however, as well as a preferred mode of use, further objectives and advantages thereof, will best be understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings. One or more embodiments are described, by way of example only, with reference to the accompanying drawings.
Figure 1 illustrates a system for assigning unique labelling of atoms in chemical compounds in accordance with an embodiment of the present disclosure;
Figure 2 illustrates the complete method involved in assigning unique labelling of atoms in chemical compounds in accordance with the present disclosure;
The figures depict embodiments of the disclosure for the purpose of illustration only. One familiar with the art would readily understand from the detailed description that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles of the disclosure described herein.
DETAILED DESCRIPTION
The foregoing has broadly outlined the features and technical advantages of the present disclosure in order that the detailed description of the disclosure that follows may be better understood. Additional features and advantages of the disclosure will be described hereinafter which form the subject of the claims of the disclosure. It should be appreciated by those skilled in the art that the conception and specific aspect disclosed may be readily utilized as a basis for modifying or designing other structures for carrying out the same purposes of the present disclosure. It should be also realised by those skilled in the art that such equivalent constructions do not depart from the spirit, scope and principles of the disclosure as set forth in the appended claims.
The present disclosure is related to a method and system for assigning unique labels to atoms in every chemical compound. The molecule is considered as an undirected graph G = (V, E) consisting of a finite set of vertices V and a set of edges E. The set of atoms in the molecule is the vertex set, V, and the set of covalent bonds between the atoms is the edge set, E. The process starts by extracting and storing the basic information, such as bond orders and aromatic regions of the molecule, using interatomic distance based estimation. The connections of each atom with its adjacent atoms are maintained in an adjacency matrix. The process then identifies the stereogenic centres for R/S configuration. The priority of the neighbourhood of the chiral atoms is then decided based on the Cahn-Ingold-Prelog rules. This is a recursive process of comparison, applied when two or more atoms directly connected to the central atom are identical, wherein the neighbourhoods are traversed in an efferent direction so as to find the first point of difference or the end of the neighbourhood, whichever is earlier. After obtaining the priority of the atoms directly connected to the chiral atom, a vector is obtained from the coordinates of the central atom and the atom having the lowest priority using the principles of Cartesian geometry. Subsequently, another vector is created from the three points of the remaining atoms in the order of highest to lowest priority. A dot product is calculated between the vectors obtained from the first and the second step. If the result is negative, the chiral atom is considered to be of R configuration, and if it is positive, the chiral atom is considered to be of S configuration.
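The dot-product test described above can be sketched as follows, assuming the neighbour priorities have already been determined. The text does not fix the direction of the first vector or the construction of the second; this sketch assumes the first vector points from the lowest-priority atom towards the central atom and the second is the cross-product normal of the plane through the three highest-priority atoms, which reproduces the stated sign convention (negative gives R). The function name `rs_configuration` is hypothetical.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def rs_configuration(centre, ranked_neighbours):
    """ranked_neighbours: four (x, y, z) tuples ordered from highest to lowest priority."""
    p1, p2, p3, lowest = ranked_neighbours
    # Assumed direction: from the lowest-priority atom towards the central atom.
    axis = sub(centre, lowest)
    # Normal of the plane through the three highest-priority atoms, taken in priority order.
    normal = cross(sub(p2, p1), sub(p3, p1))
    value = dot(axis, normal)
    if value < 0:
        return "R"
    if value > 0:
        return "S"
    return None  # degenerate (planar) geometry, no assignment

# Idealised centre: lowest priority points down -z, the other three sit above the plane
# and run clockwise (1 -> 2 -> 3) when viewed from +z.
centre = (0.0, 0.0, 0.0)
p1 = (1.0, 0.0, 0.3)
p2 = (-0.5, -0.866, 0.3)
p3 = (-0.5, 0.866, 0.3)
lowest = (0.0, 0.0, -1.0)

clockwise = rs_configuration(centre, [p1, p2, p3, lowest])
mirrored = rs_configuration(centre, [p1, p3, p2, lowest])
```

Swapping two of the ranked neighbours mirrors the centre and flips the sign of the dot product, so the two calls return opposite configurations.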
The numbering system of the molecular structure file is then reordered and the atoms are renumbered based on the structure of the molecule. The renumbering, or canonicalization, is carried out in the following steps. Firstly, an atom is classified as a terminal atom if it has only one atom connected to it. All the terminal atoms of the molecule are identified and excluded from the adjacency matrix. This is done to simplify the process of canonicalization. The connection list now contains only those atoms which have more than one connection to other atoms as per the original structured file. Secondly, those atoms which have the maximum number of connections and the lexicographically least value of the elemental names of the connected atoms are selected as root atoms. For instance, a central carbon (C) connected to three carbon atoms (CCC) will get priority over one connected to two carbon atoms and one nitrogen atom (CCN). If more than one atom has identical neighbours, then all of them are used as initial anchors (roots) for path traversal. A local key is maintained for every atom, to which the elemental names of the adjacent atoms, their R/S status and the R/S status of the atom itself are appended. The root(s) act as the initiation point for iterative traversal, wherein the identities of the series of traversed atoms are recorded as route(s). The local key is used as a basis for prioritising how the neighbours are traversed. The routes are then examined one by one, wherein the atoms of every branch are compared at the same level of occurrence across the other branches and stored, based on the priority key of the atoms, in a renumbering list as the traversal sequence. The priority key comprises the elemental names of the atom and its neighbours, the R/S status of the atom and its neighbours, the occurrence of the previously attached atom in the traversal sequence and the matrix containing the shortest inter-atomic distances for all the atoms in the compound. This process is carried out for all the branches of the routes. The final outcome of this process is the list of atoms in the renumbering list, called the traversal sequence, in which the atoms are renumbered in their order of occurrence. This becomes the candidate tree key for the given root atom. The candidate tree key also contains the information of the R/S configuration of the atoms. This entire process is repeated for every root atom. The final renumbered list, called the primary tree key, is the one amongst the candidate tree keys which has the least value lexicographically and which contains the atoms having R or S configuration higher in the order of occurrence. The atoms are then labelled according to their order of occurrence in the primary tree key.
Figure 1 illustrates a system for assigning unique labelling of atoms in chemical compounds in accordance with an embodiment of the present disclosure. The system comprises a computing unit 101, an I/O device 109 and a storage unit 111. The computing unit 101 comprises a processing unit 103, a memory unit 105, an I/O interface 107 and a bus 113 interconnecting the processing unit 103 with the memory unit 105 and the I/O interface 107. The structured file is stored in the storage unit 111 through the I/O interface 107. The storage unit 111 is used to store the structural information of the molecule, such as the bond order and the connection table of every atom. The processing unit 103 is then configured to iterate over every stereogenic centre to obtain the priority of every neighbour of the centre. The storage unit 111 then stores the priority order of every stereogenic centre. The principles of Cartesian geometry are applied by the processing unit 103 to the priority order stored in the storage unit 111 to determine the R/S configurations, which are stored in the storage unit 111. The processing unit 103 is then configured to proceed with the renumbering of the atoms in the molecule by identifying, with the help of the memory unit 105, the root atoms which have the highest degrees and have the "lexicographically least valued" neighbourhood. This is done by calculating the cumulative valency of adjacent neighbours. The root(s) act as the initiation point for iterative traversal by the processing unit 103, wherein the identities of the series of traversed atoms are recorded in the memory unit 105 as route(s). The routes are then examined one by one by the processing unit 103, wherein the atoms of every branch are compared at the same level of occurrence across the branches and stored in the memory unit 105 in a renumbering list based on the priority key of the atoms. This process is carried out for all the branches of the routes by the processing unit 103. The final outcome of this process is the list of atoms in the renumbering list stored in the memory unit 105, called the traversal sequence, in which the atoms are renumbered in their order of occurrence. This information is stored in the memory unit 105 as the candidate tree key for the given root atom. The candidate tree key also contains the information of the R/S configuration of the atoms derived from the storage unit 111. This process is repeated for all the root atoms. The final renumbered list, called the primary tree key, is the one amongst the candidate tree keys which has the least value lexicographically and which contains the atoms having R or S configuration higher in the order of occurrence. This is determined by the processing unit 103 with the help of the memory unit 105, which stores the primary tree key. The atoms are then labelled and stored in the storage unit 111 according to their order of occurrence in the primary tree key. The I/O device 109 is used to display the unique labelling of atoms in the molecule. This display of information is carried out with the help of the I/O interface 107.
Figure 2 illustrates the complete method involved in assigning unique labelling of atoms in chemical compounds in accordance with the present disclosure. The process starts by extracting and storing the basic information, such as bond orders and aromatic regions of the molecule, using interatomic distance based estimation, and by maintaining an adjacency matrix to store the connections between atoms. It then identifies the stereogenic centres and their R/S configuration by strict implementation of the Cahn-Ingold-Prelog rules. It then canonicalises the molecules by treating them as graphs, considering each atom as a node and each bond (of any order) as a single edge, with retention of the atom name. To identify one or more putative root atoms, the process samples the nodes with the highest degrees having the "lexicographically least valued" neighbourhood, by calculating the cumulative valency of adjacent neighbours. The root(s) act as the initiation point for iterative traversal, wherein the identities of the series of traversed atoms are recorded as route(s). The routes are then examined one by one, wherein the atoms of every branch are compared at the same level of occurrence across the branches and stored in a renumbering list based on the local key of the atoms. This process is carried out for all the branches of the routes. The final outcome of this process is the list of atoms in the renumbering list, called the traversal sequence, in which the atoms are renumbered in their order of occurrence. This becomes the candidate tree key for the given root atom. The candidate tree key also contains the information of the R/S configuration of the atoms. This process is repeated for all the root atoms. The final renumbered list, called the primary tree key, is the one amongst the candidate tree keys which has the least value lexicographically and which contains the atoms having R or S configuration higher in the order of occurrence. As is clear from the figure, the candidate tree keys are: CCNCCN, CCCCNN and NCNCCC. The one which is the lowest lexicographically is CCCCNN. Therefore, that candidate tree key becomes the primary tree key. The atoms are then labelled according to their order of occurrence in the primary tree key.
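The generation of candidate tree keys and the choice of the primary tree key can be sketched roughly as follows. This is a much reduced illustration under stated assumptions: a plain breadth-first traversal stands in for the route and branch comparison, the ordering key uses only elemental names (no R/S status, no distance matrix), ties between equivalent neighbours are not resolved canonically, and the names `candidate_tree_key` and `primary_tree_key` are hypothetical.

```python
from collections import deque

def candidate_tree_key(elements, adj, root):
    """Traverse level by level from root; return (key string, atom visiting order)."""
    def local_key(i):
        # Assumed local key: the atom's element followed by its sorted neighbour elements.
        return elements[i] + "".join(sorted(elements[j] for j in adj[i]))

    order, seen, queue = [], {root}, deque([root])
    while queue:
        u = queue.popleft()
        order.append(u)
        # Visit unvisited neighbours in order of their local key.
        for v in sorted((w for w in adj[u] if w not in seen), key=local_key):
            seen.add(v)
            queue.append(v)
    return "".join(elements[i] for i in order), order

def primary_tree_key(elements, adj, roots):
    """Pick the lexicographically least candidate among all roots."""
    return min(candidate_tree_key(elements, adj, r) for r in roots)

# Heavy atoms of 3-methylbutan-2-amine, trying both branched carbons as roots.
elements = ["C", "C", "C", "C", "C", "N"]
adj = [[1, 2, 3], [0], [0], [0, 4, 5], [3], [3]]
key, order = primary_tree_key(elements, adj, [0, 3])

# New labels: position in the primary traversal order, starting from 1.
new_number = {atom: position + 1 for position, atom in enumerate(order)}

# The three candidate keys quoted for Figure 2.
figure_choice = min(["CCNCCN", "CCCCNN", "NCNCCC"])
```

Plain string comparison is enough to select the least candidate, which for the three keys quoted from the figure gives CCCCNN.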
Referral Numerals:

We claim:
1. A method for providing a unique identification key to a molecule from its structure comprising:
finding the number of atoms in the molecule;
obtaining the adjacency matrix for all atoms in the molecule along with their bond order;
recording the elemental names of all atoms in the molecule;
identifying all of the chiral atoms in the molecule;
calculating the R and S stereo configuration for all chiral atoms in the molecule;
identifying all terminal atoms;
assigning a cognate local key to each non terminal atom in the molecule;
labeling the atoms into roots and non-roots;
generating a traversal tree for each root atom;
generating a traversal sequence for each traversal tree;
generating a candidate tree key for each traversal tree;
identifying the primary tree key amongst the collection of all candidate keys; and
renumbering the atoms in sequential order of their appearance in the primary tree key.
2. The method as claimed in claim 1, wherein the adjacency matrix is a square matrix comprising bond-order information of adjacent atoms in the molecule.

3. The method as claimed in claim 1, wherein an atom in the molecule with only one adjacent atom is identified as a terminal atom.
4. The method as claimed in claim 1, wherein a cognate local key is assigned to each non terminal atom in the molecule by using a partial or complete combination of properties including but not limited to the elemental name of the atom itself along with its R and S stereo configuration status, elemental names of all its directly adjacent atoms with their cognate R and S stereo configuration status, in sorted order.
5. The method as claimed in claim 1, wherein the atoms in the molecule are labeled as root using the sorted collection of largest local keys.
6. The method as claimed in claim 1, wherein the traversal tree is generated by traversing the atoms in the molecule with root atom as traversal initiation point and using the local key.
7. The method as claimed in claim 1, wherein the traversal sequence is generated by concatenating the atom identities in the traversal tree using the priority flag based on a partial or complete combination of properties in a fixed order including but not limited to the elemental name of each atom in the molecule along with its R and S stereo configuration status, elemental name and R and S stereo configuration status of the penultimate atom on the traversal path, location of penultimate atom in the traversal path and sorted series of all inter-node distances.
8. The method as claimed in claim 1, wherein the candidate tree key is generated by concatenating a partial or complete combination of properties in a fixed order including but not limited to the elemental name of each atom in the molecule along with its R and S stereo configuration status, in the traversal sequence.
9. The method as claimed in claim 1, wherein a candidate tree key is labeled as the primary tree key from the sorted collection of all candidate tree keys by fixing any location.
10. A system for providing a unique identification key to a molecule from its structure, comprising:

a storage unit configured to store the structure information of the chemical compound, the bond-order and the connection table of every atom, the priority order of every stereogenic centre, the R/S configuration status of every atom, and the renumbered file of the compound;
an I/O device configured to display the unique labelling of atoms in the molecule and to input the structure information of the chemical compound;
a computing unit communicatively connected to the storage unit and I/O device to obtain the data associated with the structure information of the chemical compound from I/O device; comprising:
an I/O interface configured to pass on the information from the I/O device and the storage device to the processing unit and the memory unit;
a memory unit configured to store data of the local keys for all atoms in the chemical compound, the root atoms of the chemical compound, the traversal tree for each root atom, the traversal sequence for each traversal tree, the candidate tree key for each traversal sequence and the primary tree key amongst the collection of all candidate tree keys of the chemical compound;
a processing unit communicatively connected to the memory unit and I/O interface to obtain the data associated with the structure information of the chemical compound from I/O device, the processing unit being capable of:
finding the number of atoms in the molecule;
obtaining the adjacency matrix for all atoms in the molecule along with their bond order;
recording the elemental names of all atoms in the molecule;
identifying all of the chiral atoms in the molecule;
calculating the R and S stereo configuration for all chiral atoms in the molecule;
identifying all terminal atoms;
assigning a cognate local key to each non-terminal atom in the molecule;

labeling the atoms into roots and non-roots;
generating a traversal tree for each root atom;
generating a traversal sequence for each traversal tree;
generating a candidate tree key for each traversal tree;
identifying the primary tree key amongst the collection of all candidate keys; and
renumbering the atoms in sequential order of their appearance in the primary tree key.
11. The system as claimed in claim 10, wherein the adjacency matrix as stored in the storage unit is a square matrix comprising bond-order information of adjacent atoms in the molecule.
12. The system as claimed in claim 10, wherein the atoms in the molecule with only one adjacent atom are identified as terminal atoms by the processing unit.
13. The system as claimed in claim 10, wherein a local key stored in the memory unit is assigned to each non-terminal atom in the molecule by using a partial or complete combination of properties including but not limited to the elemental name of the atom itself along with its R and S stereo configuration status obtained from the storage unit, and the elemental names of all its directly adjacent atoms obtained from the storage unit with their cognate R and S stereo configuration status obtained from the storage unit, in sorted order.
14. The system as claimed in claim 10, wherein the atoms in the molecule are labelled as root by the processing unit using the sorted collection of largest local keys and stored in the memory unit.
15. The system as claimed in claim 10, wherein the traversal tree is generated by the processing unit and stored in the memory unit by traversing the atoms in the molecule with the root atom as the traversal initiation point and using the local key.

16. The system as claimed in claim 1, wherein the traversal sequence is generated by the processing unit and stored in the memory unit by concatenating the atom identities in the traversal tree using the priority flag based on a partial or complete combination of properties in a fixed order including but not limited to the elemental name of each atom in the molecule along with its R and S stereo configuration status obtained from the storage unit, the elemental name and R and S stereo configuration status of the penultimate atom on the traversal path obtained from the storage unit, the location of the penultimate atom in the traversal path and the sorted series of all inter-node distances.
17. The system as claimed in claim 10, wherein the candidate tree key is generated by the processing unit and stored in the memory unit by concatenating a partial or complete combination of properties in a fixed order including but not limited to the elemental name of each atom in the molecule along with its R and S stereo configuration status obtained from the storage unit, in the traversal sequence.
18. The system as claimed in claim 10, wherein the processing unit labels a candidate tree key as the primary tree key from the sorted collection of all candidate tree keys stored in the memory unit by fixing any location.
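
The steps recited in the claims above (adjacency matrix, terminal atoms, local keys, root selection, traversal, candidate tree keys, primary tree key, renumbering) can be illustrated with a minimal Python sketch. It is a simplified illustration under stated assumptions, not the claimed implementation: the function names, the breadth-first traversal, the neighbour ordering used in place of the priority flag, the "-" separator, and the choice of the first entry of the sorted candidate keys as the primary tree key are all hypothetical. The molecule is assumed to be connected, stereo labels are supplied by the caller rather than computed, and remaining ties are broken by input index, so the sketch is not guaranteed to be canonical for symmetric molecules.

```python
from collections import deque


def adjacency_matrix(n_atoms, bonds):
    """Square matrix whose (i, j) entry is the bond order between atoms i and j (0 if unbonded)."""
    matrix = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j, order in bonds:
        matrix[i][j] = matrix[j][i] = order
    return matrix


def neighbours(matrix, i):
    """Indices of all atoms bonded to atom i."""
    return [j for j, order in enumerate(matrix[i]) if order > 0]


def local_keys(labels, matrix):
    """Local key for each non-terminal atom: its own label plus its sorted neighbour labels."""
    keys = {}
    for i, label in enumerate(labels):
        nbrs = neighbours(matrix, i)
        if len(nbrs) > 1:  # an atom with only one adjacent atom is terminal and gets no key
            keys[i] = (label, tuple(sorted(labels[j] for j in nbrs)))
    return keys


def traversal_sequence(root, labels, matrix, keys):
    """Breadth-first traversal from a root atom.

    Assumption: unvisited neighbours are ordered by label, bond order, local key,
    then input index. This stands in for the priority flag of the claims.
    """
    seen, sequence, queue = {root}, [], deque([root])
    while queue:
        atom = queue.popleft()
        sequence.append(atom)
        nbrs = [j for j in neighbours(matrix, atom) if j not in seen]
        nbrs.sort(key=lambda j: (labels[j], matrix[atom][j], keys.get(j, ("", ())), j))
        for j in nbrs:
            seen.add(j)
            queue.append(j)
    return sequence


def canonical_numbering(elements, bonds, stereo=None):
    """Return (primary tree key, mapping from input atom index to new 1-based number)."""
    stereo = stereo or {}  # optional {atom index: "R" or "S"}, assumed precomputed
    labels = [element + stereo.get(i, "") for i, element in enumerate(elements)]
    matrix = adjacency_matrix(len(labels), bonds)
    keys = local_keys(labels, matrix)
    if keys:
        largest = max(keys.values())
        roots = [i for i, key in keys.items() if key == largest]
    else:
        roots = list(range(len(labels)))  # no non-terminal atoms: try every atom as root
    candidates = []
    for root in roots:
        sequence = traversal_sequence(root, labels, matrix, keys)
        candidates.append(("-".join(labels[i] for i in sequence), sequence))
    primary_key, primary_sequence = min(candidates)  # fixed location: first in sorted order
    return primary_key, {old: new for new, old in enumerate(primary_sequence, start=1)}


# Ethanol heavy atoms entered in two different orders (bonds are (i, j, bond order)).
key_a, map_a = canonical_numbering(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
key_b, map_b = canonical_numbering(["O", "C", "C"], [(0, 1, 1), (1, 2, 1)])
print(key_a, map_a)
print(key_b, map_b)
```

In this example both input orderings produce the same key, and in each case the central carbon is numbered 1, the terminal carbon 2 and the oxygen 3, which is the behaviour the renumbering step is meant to achieve.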

Documents

Application Documents

#	Name	Date
1	201911018211-STATEMENT OF UNDERTAKING (FORM 3) [07-05-2019(online)].pdf	2019-05-07
2	201911018211-COMPLETE SPECIFICATION [07-05-2019(online)].pdf	2019-05-07
3	201911018211-DECLARATION OF INVENTORSHIP (FORM 5) [07-05-2019(online)].pdf	2019-05-07
4	201911018211-DRAWINGS [07-05-2019(online)].pdf	2019-05-07
5	201911018211-FIGURE OF ABSTRACT [07-05-2019(online)].jpg	2019-05-07
6	201911018211-FORM 1 [07-05-2019(online)].pdf	2019-05-07
7	201911018211-FORM-9 [07-05-2019(online)].pdf	2019-05-07
8	201911018211-REQUEST FOR EARLY PUBLICATION(FORM-9) [07-05-2019(online)].pdf	2019-05-07
9	abstract.jpg	2019-06-14