Abstract: A system for representing natural language partial dependency analysis and a method for performing the same has been disclosed. The system receives partial analysis of natural language sentences and places group(s) of words into a bag. The bag allows a group of words to be put together, without having to give their dependency relations beforehand. The system includes a predefined set of rules and corresponding operations to process the elements and saturate the bag. Saturated bags are reduced, so that elements inside can be moved out, as part of a dependency tree representation. In this way, the system aims at progressively completing the partial analysis by accommodating changes in the representation using predetermined operations on elements in the bag. Thus, the system provides a comprehensive framework for representing partial dependency analysis, and accommodating changes in the dependency tree.
FIELD OF THE INVENTION
The present invention relates to the field of natural language processing.
DEFINITIONS OF TERMS USED IN THE SPECIFICATION
The term 'bag' in this specification relates to a data structure which contains one or more elements as its members.
The term 'element' in this specification relates to a token, a local word group, or a bag.
The term 'bag category' in this specification relates to the part of speech category of the element identified as the head of the bag.
The term 'local word group' in this specification relates to a unit formed by performing analysis at a lower level which includes words and their case endings and markers such as postposition and preposition.
The term 'partial dependency analysis' in this specification relates to a representation of a natural language input sentence in which dependency relationship between all the words in the input sentence is not established.
The term 'saturate' in this specification relates to a bag in which the dependency relationships amongst all the elements of the bag are identified.
These definitions are in addition to those expressed in the art.
BACKGROUND OF THE INVENTION AND PRIOR ART
Natural language processing is a field of computer science aimed at providing computing devices with the ability to understand, analyze and generate human languages. Natural language processing is an integral part of search engines for information extraction and retrieval. Natural language processing also finds applications in language translation, summarization, speech recognition, interactive systems, semantic analysis and similar applications.
One of the key steps in natural language processing is parsing, which analyzes a sentence in a human language into a dependency structure or a phrase structure. The dependency structure is usually represented in the form of dependency trees. Dependency trees, typically, consist of nodes corresponding to words in a sentence, together with directed arcs between the nodes for representing their dependency relationship. However, at times a parser is not able to give the complete parse; only some of the dependency relations are identified. Such partial analysis is not useful where semantic analysis is an integral part of an application.
Therefore, there is felt a need for a system for natural language processing to represent the partial analysis and make it progressively complete. Also, there is felt a need for a system which gives a systematic approach and predetermined set of operations for effectively establishing dependency relationships while representing the partial dependency analysis.
OBJECT OF THE INVENTION
It is an object of the present invention to provide a system for natural language processing to represent partial analysis.
It is still another object of the present invention to provide a multilingual natural language processing system for representing partial analysis.
It is yet another object of the present invention to provide a natural language processing system which represents partial analysis in fewer iterations of operations.
SUMMARY OF THE INVENTION
The present invention envisages a system for natural language processing comprising:
• a pre-processing unit adapted to analyze elements present in a natural language input sentence and provide grammatical features of the elements and further adapted to group the elements into local word groups based on said grammatical features and process elements in the local word groups to determine their dependency relationships and still further adapted to provide a partial dependency analysis for the input sentence and collate the local word group(s) in the partial dependency analysis into bag(s) in the event that the dependency relationships between the elements in local word group(s) are not identified;
• a repository adapted to store transformation rules for the natural language and their corresponding operations;
• a processing unit co-operating with the pre-processing unit and the repository adapted to progressively parse and perform predetermined operations on elements in the bag(s) based on the transformation rules and further adapted to saturate the bag(s) in the event that the partial dependency analysis is complete; and
• a post-processing unit co-operating with the processing unit and the repository adapted to reduce and eliminate saturated bag(s) based on the transformation rules and further adapted to move corresponding elements of the eliminated bag(s) as a part of a dependency tree for the natural language input sentence.
Typically, the pre-processing unit includes:
• a morphological analyzer adapted to receive and morphologically analyze elements present in a natural language input sentence and further adapted to provide grammatical features of the elements, wherein each element in the input sentence is associated with a feature structure and the feature structure is updated with information on the grammatical features;
• grouping means co-operating with the morphological analyzer adapted to group the elements into local word groups based on the grammatical features of adjacent elements in the input sentence;
• pre-processing means adapted to process elements in the local word groups to determine their dependency relationships based on predetermined linguistic knowledge and further adapted to provide a partial dependency analysis for the input sentence, wherein the partial dependency analysis consists of local word groups representing nodes on a dependency tree; and
• collation means adapted to process the partial dependency analysis and put the local word group(s) into bag(s) in the event that the dependency relationships between the elements in the group are not identified.
Preferably, the collation means includes sequence generation means adapted to generate a sequential number to identify each element present in the partial dependency analysis based on predetermined rules, and bag definition means adapted to define the category and the feature structure for the bag and its elements.
Further, the processing unit includes:
• first fetching means adapted to fetch transformation rules from the repository; and
• a parsing engine co-operating with the fetching means to perform predetermined operations on the elements of the bag(s) based on transformation rules.
Still further, the post-processing unit includes:
• second fetching means adapted to fetch transformation rules from the repository;
• reduction means adapted to reduce and eliminate saturated bag(s) based on the transformation rules; and
• dependency tree creation means adapted to represent elements of reduced and eliminated bag(s) as a part of a dependency tree.
In accordance with the present invention there is provided a method for natural language processing comprising the following steps:
• creating a repository to store transformation rules for the natural language and its corresponding language processing operations;
• receiving and analyzing elements present in a natural language input sentence and providing grammatical features for each of the elements;
• grouping the elements into local word groups based on the grammatical features of adjacent elements;
• processing elements in the local word groups to determine their dependency relationships and providing a partial dependency analysis for the input sentence;
• collating local word group(s) in the partial dependency analysis into bag(s) in the event that the dependency relationships between the elements in local word group(s) are not identified;
• progressively parsing and performing predetermined operations on elements in the bag(s) based on the transformation rules to saturate the bag in the event that the partial dependency analysis is complete;
• reducing and eliminating saturated bag(s) based on the transformation rules; and
• moving corresponding elements of the eliminated bag(s) as a part of a dependency tree for the natural language input sentence.
In accordance with the present invention, the step of providing grammatical features for each of the elements includes the steps of creating feature structures for each of the elements and storing the grammatical features in the feature structure. And, the step of collating local word group(s) in the partial dependency analysis into bag(s) includes the steps of:
• adding bag markers to the local word groups whose elements' dependency relationship is not identified; and
• creating a feature structure for the bag, wherein the feature structure stores category of bag and feature values of the elements present in the bag.
Further, the step of progressively parsing and performing predetermined operations on elements in the bag(s) based on the transformation rules includes the steps of:
• identifying type of bag;
• fetching the rules from the repository based on type of bag; and
• performing operations on the elements of the bag based on fetched rules.
Typically, the step of reducing and eliminating the bag includes the steps of:
• moving every element of the bag except the head element out of the bag wherein the dependency relationship of the moved elements with the bag is identified;
• removing the dependency relationship of the moved elements with the head node and marking it to the bag;
• marking a dependency relationship between the moved element and the bag;
• moving the head of the bag out of the bag; and
• copying the feature structure of the bag as the feature structure of the head node and removing the bag.
BRIEF DESCRIPTION OF THE ACCOMPANYING DRAWINGS
The present invention will now be described with reference to the accompanying drawings, in which:
FIGURE 1 illustrates a schematic of the system for natural language processing in accordance with the present invention;
FIGURE 2 shows a pictorial representation of a dependency tree for a sample natural language sentence in accordance with the present invention;
FIGURE 3 shows a pictorial representation of a dependency tree with roots as nodes in accordance with the present invention; and
FIGURE 4 is a flowchart showing the steps involved in natural language processing in accordance with the present invention.
DETAILED DESCRIPTION
The present invention will now be described in detail with reference to the accompanying drawings. The description and drawings do not limit the scope and ambit of the invention and are provided purely by way of example and illustration.
The natural language parsers known in the art provide dependency analysis of natural language sentences; however, the analysis is only partially complete as the dependency relationships amongst the elements of the sentence are not fully established. Hence, to overcome this shortcoming the present invention envisages a system for natural language processing which accurately completes the partial dependency analysis. Also, the system provides complete dependency analysis and dependency tree representation for the natural language sentences.
In accordance with the present invention, a natural language input sentence is received by the proposed system. The system analyzes elements present in the natural language input sentence and provides grammatical features of these elements. These grammatical features are updated in feature structures associated with these elements. The proposed system then uses these grammatical features to group the elements into local word groups and further processes the elements in these local word groups to determine their dependency relationships to provide partial dependency analysis for the input sentence.
This partial dependency analysis includes local word groups which can be represented as nodes on a dependency tree and arcs between the elements of the local word groups represent the dependency relationships between the elements of the group.
If, however, the exact connectivity between adjacent elements in a local word group is not identified, these elements cannot be represented on the dependency tree. The present invention overcomes this limitation in the partial dependency analysis by introducing a 'bag' structure. The bag holds the elements which do not have their dependency relationships identified.
In accordance with one aspect of the present invention, a bag contains one or more elements as members with the following properties:
• Partial specification: partial specification indicates that dependency relations among member elements in a bag may or may not be specified. Thus, a bag allows partial specification of a dependency tree;
• Encapsulation: encapsulation indicates that there can be no relations from a member element inside a bag to any element outside of the bag. In other words, relation arcs cannot cross the boundary of the bag. Once the elements are put inside a bag, even though relations among them may not be marked, it is guaranteed that relations can exist only among the member elements; and
• Head: One of the member elements in the bag is the head; however, it may or may not be marked.
In accordance with another aspect of the present invention, a bag has a category and feature structure associated with it. Additionally, bags are permitted to be present inside bags making them nested bags. The present invention defines various types of bags and various operations which can be performed on these bags to complete the partial analysis.
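The three bag properties listed above can be illustrated with a minimal sketch. The class name `Bag`, its fields and the `add_arc` method are hypothetical names chosen for illustration only and are not part of the specification; the sketch merely shows how partial specification (arcs may be missing), encapsulation (arcs may not cross the bag boundary) and an optionally marked head might be modelled.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Bag:
    """Hypothetical bag: unordered members plus optional structure."""
    members: set[str]
    category: Optional[str] = None   # bag category, e.g. "NP"
    head: Optional[str] = None       # the head may or may not be marked
    # Dependency arcs as (parent, child, relation); the list may be incomplete
    # (partial specification).
    arcs: list[tuple[str, str, str]] = field(default_factory=list)

    def add_arc(self, parent: str, child: str, relation: str) -> None:
        # Encapsulation: both ends of an arc must be members of this bag.
        if parent not in self.members or child not in self.members:
            raise ValueError("arc would cross the boundary of the bag")
        self.arcs.append((parent, child, relation))


bag = Bag(members={"with", "silver", "clouds"}, category="NP")
bag.add_arc("clouds", "silver", "r-adj")          # both ends inside the bag
try:
    bag.add_arc("mountains", "silver", "r-adj")   # 'mountains' is outside
    crossed = True
except ValueError:
    crossed = False
```

In this sketch the bag remains usable even though its head is unmarked and only one relation is known, which is the point of partial specification.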
Referring to the accompanying drawings, FIGURE 1 illustrates a schematic of the system for natural language processing. The system comprises a pre-processing unit 100 which performs the pre-processing operations on the natural language sentence. The pre-processing unit 100 includes a morphological analyzer 102 which receives and morphologically analyzes elements present in a natural language input sentence and provides their grammatical features. These grammatical features include information on the part of speech category of the elements, their root, their number and case. Each element in the input sentence is associated with a feature structure and this feature structure is updated with information on the grammatical features for each of the elements. Typically, a feature structure is a data structure, in which a plurality of feature attributes can be defined to represent feature-values indicating the property/characteristic of an element. In accordance with the present invention a plurality of attributes are defined in a feature structure 'fs' including 'root' for head of the group, 'posn' for the position number, 'tam' for tense-aspect-modality, 'cm' for case marking, 'pos' for part of speech category identification and the like.
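As an illustration of the feature structure 'fs' described above, the following sketch models it as a small record with the named attributes. The class name `FeatureStructure` and the root value "watch" are assumptions made for the example; only the attribute names 'root', 'posn', 'tam', 'cm' and 'pos' come from the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FeatureStructure:
    """Hypothetical record for the attributes named in the description."""
    root: Optional[str] = None   # root of the head of the group
    posn: Optional[int] = None   # position number
    tam: Optional[str] = None    # tense-aspect-modality label
    cm: Optional[str] = None     # case marking
    pos: Optional[str] = None    # part of speech category


# Feature structure for 'watching' in 'Children are watching programmes on TV'.
# The root form "watch" is an assumed value, used only for illustration.
fs = FeatureStructure(root="watch", posn=3, pos="VBG")
fs.tam = "be_ing"   # updated once the auxiliary 'are' has been analyzed
```

Attributes that are not yet known simply stay unset, so the structure can be updated progressively as analysis proceeds.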
The pre-processing unit 100 further includes grouping means 104 co-operating with the morphological analyzer 102 to group the elements into local word groups based on the grammatical features of adjacent elements in the input sentence. For instance, the grouping means 104 groups function words like prepositions, pronouns, auxiliary verbs and conjunctions with context words (words providing important information in a sentence) to form local word groups.
These local word groups are provided to pre-processing means 106 which processes elements in the local word groups to determine their dependency relationships based on predetermined linguistic knowledge of the natural language. The analysis is provided in the form of a partial dependency analysis for the input sentence.
In accordance with the present invention, the partial dependency analysis consists of local word groups which can be represented as nodes on a dependency tree. However, the nodes of the partial dependency analysis do not identify interconnectivity of some of the elements in the local word group even after pre-processing. Hence, the pre-processing unit 100 includes collation means 108 to collate the local word group(s) into discrete bag(s) in the event that the dependency relationships between the elements in a local word group are not identified.
In accordance with yet another aspect of the present invention, the elements in a bag do not have any order indicated by the bag itself. The feature structure for the bag includes a position attribute 'posn' which is used to indicate the position of the elements in the bag and position of the elements in the partial analysis. Hence, the collation means 108 includes sequence generation means 110 to generate sequential numbers to identify the order of the elements in the partial dependency analysis. The number generation is determined by a set of predetermined rules including:
• the inner bag may be given a position corresponding to the 'posn' number of its head;
• every element has a 'posn' attribute whose value is defined as follows:
■ for a token, the value of 'posn' attribute is given by its position in the input order of the sentence;
■ for a local word group, the value of 'posn' attribute is given by the position of its head word (which will be the same as the value of attribute 'posn' of its head word if the same attribute is used at the analytic level); and
■ for a bag, the value of 'posn' attribute is given by the value of attribute 'posn' of its head element.
The value of 'posn' attribute of every member element in a bag is a unique number. In other words, no two members of the bag have the same number as the value of the attribute 'posn'. The order defined over member elements of a bag using the value of 'posn' attribute of the member elements gives a linear order.
The collation means 108 includes bag definition means 112 which defines the part of speech category for the bag and its elements. In addition, the bag definition means 112 creates the feature structure for the bag, inserts pre-generated feature-values for the predetermined attributes (for instance, the 'posn' numbers and the category generated for each of the elements present in the bag), and adds the bag markers to the local word groups whose dependency relationship is not identified.
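The 'posn' rules above lend themselves to a short recursive sketch. The names `Token`, `Group`, `posn` and `linear_order` are hypothetical, and a single `Group` type is assumed to stand in for both local word groups and bags, since both take the 'posn' value of their head.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Token:
    word: str
    posn: int   # position of the token in the input sentence


@dataclass
class Group:
    """Hypothetical container used for both local word groups and bags."""
    members: list   # Token or Group objects
    head: int       # index of the head member


def posn(element: Union[Token, Group]) -> int:
    # A token carries its own position; a group or bag takes its head's value.
    if isinstance(element, Token):
        return element.posn
    return posn(element.members[element.head])


def linear_order(bag: Group) -> list:
    # 'posn' values of bag members must be unique, which yields a linear order.
    values = [posn(m) for m in bag.members]
    if len(set(values)) != len(values):
        raise ValueError("'posn' values of bag members must be unique")
    return sorted(bag.members, key=posn)


are_watching = Group([Token("are", 2), Token("watching", 3)], head=1)
on_tv = Group([Token("on", 5), Token("TV", 6)], head=1)
bag = Group([on_tv, Token("Children", 1), are_watching], head=2)
order = [posn(m) for m in linear_order(bag)]
```

Although the members of `bag` are stored in arbitrary order, sorting by 'posn' recovers the sentence order of the members.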
The functionality performed by the pre-processing unit 100 will be described with examples hereinafter. For instance, for the natural language input sentence 'Children are watching programmes on TV', the morphological analyzer 102 analyzes the elements in the sentence and determines their grammatical features as follows:
(Children)  (are   watching)  (programs)  (on    TV)
NN          VAUX   VBG        NN          PREP   NN
where, NN indicates noun, VAUX indicates verb auxiliary, VBG indicates verb and PREP indicates preposition.
Based on these grammatical features, the grouping means 104 groups adjacent function words with context words. The local word groups are marked with a pair of double parentheses as seen below:
((Children))  ((are   watching))  ((programs))  ((on    TV))
NP NN         VG VAUX VBG         NP NN         NP PREP NN
Along with grouping, the grouping means 104 also determines the part of speech category of the local word groups like NP for Noun Phrase and VG for Verb Group. The part of speech category of the local word groups is determined as the part of speech category of the head element of the group. In accordance with this invention, the local word groups may contain one or more elements.
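The grouping step can be sketched as follows for the example sentence. This is a minimal illustration under stated assumptions: the function name `group_local_words`, the set `FUNCTION_TAGS` and the mapping `GROUP_CATEGORY` are hypothetical, and the sketch assumes that function words precede the context word they attach to, which holds for English prepositions and auxiliaries but not for postpositions.

```python
# Tags treated as function words in this sketch (an assumption).
FUNCTION_TAGS = {"VAUX", "PREP"}
# Group category derived from the tag of the head (context) word.
GROUP_CATEGORY = {"NN": "NP", "VBG": "VG"}


def group_local_words(tagged):
    """Attach each run of function words to the next context word."""
    groups, pending = [], []
    for word, tag in tagged:
        pending.append((word, tag))
        if tag not in FUNCTION_TAGS:
            # The context word closes the group and supplies its category.
            groups.append((GROUP_CATEGORY[tag], pending))
            pending = []
    if pending:
        # Trailing function words with no context word stay uncategorized.
        groups.append((None, pending))
    return groups


sentence = [("Children", "NN"), ("are", "VAUX"), ("watching", "VBG"),
            ("programs", "NN"), ("on", "PREP"), ("TV", "NN")]
groups = group_local_words(sentence)
categories = [category for category, _ in groups]
```

For the example sentence this yields four local word groups whose categories follow the category of each group's head element.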
Hereinafter, the notation followed for representing a dependency analysis for a natural language sentence will include:
• marking of local word groups and bags by double parentheses;
• marking the part of speech category below the words; and
• marking the group part of speech category below the opening parenthesis '(('.
After processing the local word groups, the information pertaining to prepositions and auxiliary verbs and the like gets stored in feature-values of the feature structures of the respective elements in accordance with this invention.
After the local word groups are fully processed, function words such as auxiliary verbs, prepositions, post-positions and articles are completely removed as lexical items depending on the linguistic knowledge used by the pre-processing means 106. In the example below, 'are' is removed because the information is already aggregated as 'be' in the value of attribute TAM (Tense-Aspect-Modality label) for the local word group for the verb 'watching'. Similarly, the lexical item 'on' is removed, as 'on' appears as value of attribute 'cm' for case marking for the local word group. The revised dependency analysis is seen as follows:
children  ((are   watching))  programs  ((on    TV))
NN        VG VAUX VBG         NN        NP PREP NN
          TAM=be_ing                    cm=prep__on
Thus, the introduction of local word groups enables creation of nodes, which can keep aggregation of information from lower levels as part of feature structures of nodes.
In the above notation, the sentence is represented horizontally, with categories and features marked below the words. Alternatively, the sentence can be written vertically, with one element per line. The vertical notation is represented as follows:
Children NN
(( VG
are VAUX
watching VBG
))
programmes NNS
(( NP
on PREP
TV NN
))
PUNC
On indicating the information about TAM and CM as feature values and dropping the function words, the vertical notation is represented as follows:
Children NN
(( VG
watching VBG
))
programmes NNS
(( NP
TV NN
))
PUNC
Dependency tree for the sentence may also be shown pictorially as seen in FIGURE 2, where the local word group nodes are marked by square brackets. Alternatively, the root of the head word may be pictorially shown inside the square brackets (and may be replaced by root forms) to show the tree in a more compact form, as seen in FIGURE 3.
In accordance with the present invention, there is provided a repository 114 which stores transformation rules for the natural language and the operations which are to be performed based on the corresponding rules. These transformation rules enable processing unit 116 to progressively parse and perform predetermined operations on elements in the bag. On progressively parsing the individual elements in each of the bags the processing unit 116 identifies the dependency relationships between elements in a bag.
The processing unit 116 includes first fetching means 118 to fetch the rules from the repository 114 and a parsing engine 120 to parse the elements in the bag based on the fetched rules. The parsing engine 120 firstly identifies the type of bag and further based on the type and rules, performs predetermined operations on the bag.
The various types of bags and the operations which can be performed on those types of bags are explained in detail hereinafter.
Anchored Bag: A bag in which one of the member elements is marked as head is called an anchored bag. As per the rules, an anchored bag must have a category. For linguistic consistency, the category of the bag must be compatible with its head.
Saturated Bag: An anchored bag in which there is a dependency tree spanning all the member elements and rooted at the head element is called a saturated bag. The term 'saturated' indicates that no more dependency information or analysis at that level can be added to the bag. As per the rules in the repository 114, if there is a spanning dependency tree rooted at an element other than the head element, the bag is inconsistent and is not permitted.
Unsaturated Bag: A bag which does not have its head marked, or does not have a spanning tree covering its member elements is called an unsaturated bag.
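The distinction between anchored, saturated and unsaturated bags can be checked mechanically. The following is a sketch only: the function names `is_anchored` and `is_saturated` and the representation of arcs as (parent, child) pairs are assumptions, and the element names a1, a2, a3 and h are illustrative.

```python
def is_anchored(members, head):
    # An anchored bag has one of its members marked as head.
    return head is not None and head in members


def is_saturated(members, head, arcs):
    """True if the arcs form a tree spanning all members, rooted at head."""
    if not is_anchored(members, head):
        return False
    parents = {}
    for parent, child in arcs:
        # The head has no parent, and no member may have two parents.
        if child == head or child in parents:
            return False
        parents[child] = parent
    for member in members:
        seen, node = set(), member
        # Follow parent links upward; every member must reach the head.
        while node != head:
            if node in seen or node not in parents:
                return False
            seen.add(node)
            node = parents[node]
    return True


members = {"a1", "a2", "a3", "h"}
full = [("h", "a2"), ("a2", "a1"), ("h", "a3")]
saturated = is_saturated(members, "h", full)
partial = is_saturated(members, "h", [("h", "a2")])
unanchored = is_saturated(members, None, full)
```

A bag for which `is_saturated` returns false is unsaturated in the sense defined above, either because its head is unmarked or because the spanning tree is incomplete.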
The parsing engine 120 can perform the below operations, which are predefined in the repository 114, on these types of bags:
Mark Head Operation: Mark head operation marks the head of an unanchored bag, typically using the natural language linguistic knowledge.
Mark Dependency Relation Operation: Mark dependency relation operation takes a bag and puts one or more links indicating dependency relations among members of a bag.
Mark Full-Subtree Operation: Mark full-subtree operation takes an anchored bag and performs two tasks on it:
■ identifies a child 'c' of the head of the given anchored bag from among the members of the bag, and marks dependency relation from head to 'c'; and
■ identifies all the dependents of the child 'c' among the members of the anchored bag, and builds a spanning tree rooted at 'c' over the identified members.
Saturate Bag Operation: Saturate bag operation takes a bag and typically performs two tasks on it:
■ marks one of the members of the bag as its head, if not already marked; and
■ builds a spanning tree over members of the bag, with the tree rooted at the head.
For performing these operations, the saturate bag operation makes use of operations such as mark head, mark dependency relation, and mark full-subtree.
Thus, the parsing engine 120 receives the information on the elements in the bag(s) and the rules. If, on parsing the elements, the parsing engine 120 does not locate a head for the bag, then based on the fetched rules it checks whether the dependency relationships are identified in the bag. If they are, the parsing engine 120 performs the 'Mark Head' operation followed by the 'Saturate Bag' operation. Alternatively, if the dependency relationships are not identified amongst the elements in the bag, the parsing engine 120 performs the 'Mark Head' operation followed by the 'Mark Dependency Relation' operation to mark the dependency information in the bag, and further saturates the bag using the 'Saturate Bag' operation to indicate that the spanning tree is complete.
After the parsing engine 120 has saturated the bag, the saturated bag is passed to a post-processing unit 122 which reduces the bags. In accordance with the present invention the elements for which dependency information is fully marked can be moved out of the bag and represented as part of a normal dependency tree. The post-processing unit 122 cooperates with the repository 114 and the processing unit 116 and performs operations for reduction of bags. The post-processing unit 122 includes second fetching means 124 to fetch bag reduction and elimination rules from the repository 114, reduction means 126 which facilitates reduction and elimination of saturated bags and dependency tree creation means 128 to represent and merge elements of reduced and eliminated bag(s) as a part of a dependency tree.
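The interplay of the mark head, mark dependency relation and saturate bag operations can be sketched as below. This is an illustrative sketch, not the rule set held in the repository 114: the dictionary-based bag, the function names, and the toy linguistic knowledge (the noun is the head; every other member attaches directly to the head with a relation chosen by its tag) are all assumptions. The mark full-subtree operation is omitted for brevity.

```python
def mark_head(bag, choose_head):
    # Mark the head only if the bag is not already anchored.
    if bag["head"] is None:
        bag["head"] = choose_head(bag)


def mark_dependency(bag, parent, child, relation):
    bag["arcs"].append((parent, child, relation))


def saturate(bag, choose_head, choose_parent):
    """Mark the head if needed, then attach every unattached member."""
    mark_head(bag, choose_head)
    attached = {child for _, child, _ in bag["arcs"]}
    for member in sorted(bag["members"]):
        if member != bag["head"] and member not in attached:
            parent, relation = choose_parent(bag, member)
            mark_dependency(bag, parent, member, relation)


# Toy linguistic knowledge, assumed for illustration only.
TAGS = {"the": "DT", "magnificent": "JJ", "white": "JJ", "mountains": "NN"}
RELATION_BY_TAG = {"DT": "r-det", "JJ": "r-adj"}


def noun_as_head(bag):
    return next(m for m in sorted(bag["members"]) if TAGS[m] == "NN")


def attach_to_head(bag, member):
    return bag["head"], RELATION_BY_TAG[TAGS[member]]


bag = {"members": set(TAGS), "head": None, "arcs": []}
saturate(bag, noun_as_head, attach_to_head)
```

Passing the linguistic choices in as callables keeps the operations themselves language-independent, in the spirit of rules being fetched from a repository rather than hard-coded.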
The second fetching means 124 fetches the transformation rules for reduction and elimination of bags and passes them to the reduction means 126. The reduction means 126 for reducing the saturated bag performs the reduce operation. If dependency relations among elements of a bag are specified, the elements may be moved out of the bag based on the predefined rules. The reduce operation is a combination of the sprout operation and the light anchor operation, which are explained in detail herein below.
Sprout Operation: Sprout operation takes an anchored bag, and picks a dependent child 'c' of the head of the bag, and moves it out of the bag based on the following rules:
■ For the child 'c' of the head moved out of the bag, all its descendants must also be moved out of the bag;
■ the dependency relation between the head and the moved child element 'c' is removed. Instead, dependency relation is marked between the bag and 'c'. As per the property of bags, the dependency arc cannot cross the boundary of the bag; and
■ dependency relations among the elements moved out of the bag are left untouched.
Note that for the sprout operation to be performed, the bag has to be anchored because the head is to be known, for its child to be picked.
In accordance with the rules defined in the repository 114, the head element in a bag cannot be moved out of the bag by performing a sprout operation; and when a saturated bag is sprouted, it remains a saturated bag.
Typically, the sprout operation is performed on saturated bags by the reduction means 126. A sprout operation may also be performed on an unsaturated anchored bag; however this type of sprouting is termed as safe sprouting. The rules for safe sprouting operation are defined as follows:
Safe Sprouting Operation: A sprouting operation on an unsaturated anchored bag is called safe (or a safe sprouting) provided the element that is picked has all its descendants identified and the dependency relations already marked; in other words, no other members of the bag will be added as its descendants during further processing on the bag. In accordance with the rules, when a sprouting operation is performed on a saturated bag, it is guaranteed to be a safe sprouting. For instance, refer to elements a1, a2, h (head) and a3 of the following saturated bag:
((a1 a2 h a3))
The bag may be sprouted by the reduction means 126 by picking a2:
a1 a2 ((h a3))
where a2 and its descendant a1 are moved out, and the relation r2, which is between h and a2, is changed to be between the bag and a2.
Pictorially, before sprouting the tree consists of a single bag with head h:
[h] (with a spanning tree inside the bag, on elements a1, a2, a3 and h, rooted at the head h)
and after sprouting the tree becomes:
[h] (with a spanning tree inside the bag, on elements a3 and h, rooted at the head h)
In the reduced bag ((h a3)), a3 may also be picked and moved out, resulting in a bag (containing head h) which cannot be sprouted any further. A bag which contains only one element, namely its head, is called a fully sprouted bag in accordance with this invention.
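The sprout operation on the example bag can be sketched as follows. The function names `descendants` and `sprout`, the dictionary-based bag, the placeholder node name "<bag>" and the relation labels r1 and r3 are assumptions made for illustration; only r2 (between h and a2) is named in the description.

```python
def descendants(node, arcs):
    # Collect every element below `node` by following parent-to-child arcs.
    found, stack = [], [node]
    while stack:
        current = stack.pop()
        for parent, child, _ in arcs:
            if parent == current:
                found.append(child)
                stack.append(child)
    return found


def sprout(bag, child):
    """Move `child` and all its descendants out of an anchored bag."""
    head = bag["head"]
    if head is None:
        raise ValueError("sprouting requires an anchored bag")
    if child == head:
        raise ValueError("the head cannot be moved out by sprouting")
    link = next((a for a in bag["arcs"] if a[0] == head and a[1] == child), None)
    if link is None:
        raise ValueError("only a child of the head can be picked")
    moved = {child, *descendants(child, bag["arcs"])}
    # Relations among the moved elements are left untouched.
    outside = [a for a in bag["arcs"] if a[0] in moved and a[1] in moved]
    inside = [a for a in bag["arcs"] if a[0] not in moved and a[1] not in moved]
    # The head-to-child relation is replaced by a bag-to-child relation.
    outside.append(("<bag>", child, link[2]))
    reduced = {"members": bag["members"] - moved, "head": head, "arcs": inside}
    return reduced, moved, outside


bag = {"members": {"a1", "a2", "a3", "h"}, "head": "h",
       "arcs": [("a2", "a1", "r1"), ("h", "a2", "r2"), ("h", "a3", "r3")]}
reduced, moved, outside = sprout(bag, "a2")
fully_sprouted, _, _ = sprout(reduced, "a3")
```

After the second call the bag holds only its head, which corresponds to the fully sprouted bag described above.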
In accordance with the rules, bags are permitted to be inside bags as nested bags. Let a bag 'E' contain elements e1 to en, in which ep is another bag:
((e1, e2, ... ep, ... en))
with ep containing the following members:
ep = ((p1, p2, ... pj, ... pm))
As part of the definition of a bag, dependency arcs may occur among the elements e1 to en, or among p1 to pm. No element inside the inner bag ep can have a dependency arc with an element in the outer bag. For example, in the nested bag below:
(( the magnificent white mountains (( with silver clouds))))
NP                       NN        NP             NN
head=m                   name=m    head=c         name=c
'silver' cannot have a dependency arc with any outer elements like 'the', 'magnificent', 'white', or 'mountains'. Once the bags are saturated the representation appears as below:
(( the magnificent white mountains (( with silver clouds))))
NP DT  JJ          JJ    NN        NP PREP JJ     NN
where DT represents determiner, JJ represents adjective, nmod represents noun modifier, r-adj represents an adjective relationship amongst the elements, r-prep represents a preposition relationship and r-det represents a determiner relationship. As seen in the representation, the arc from the head 'mountains' is directed towards the inner bag, and not directly to any of the members inside the inner bag.
The present invention differentiates between nested bags and non-nested bags by providing simple and complex type of bags. These bag types are explained as follows:
Simple Bag: A simple bag is a bag, all of whose elements are either words or local word groups only. In other words, none of the elements of a simple bag is a bag. Simple bags are sometimes loosely referred to as chunks. They simply stand for unexpanded dependency trees.
Complex Bag: A complex bag is a bag, in which one or more of whose members is a bag. In other words, it is a bag containing one or more bags as its member elements. In accordance with the rules, if a bag 'c' is a member of an anchored bag 'b', and is a root of a full sub-tree in the bag 'b', then bag 'b' can be sprouted by picking bag 'c' by a safe sprouting operation. For instance, consider the following bags in which the inner bag is not saturated and not an anchored bag:
((the magnificent white mountains ((with silver clouds))))
NP DT JJ JJ NN* NP PREP JJ NN
The inner bag is part of the full sub-tree of the outer anchored bag, which is known usually based on predefined linguistic information. For example, it is known that prepositional-phrases attach to a noun, and not to any other element in the noun phrase. After safe sprouting, the structure becomes:
((the magnificent white mountains)) ((with silver clouds))
NP DT JJ JJ NN* NP PREP JJ NN
nmod
Note that the edge going into NP (with silver clouds), earlier starting from the head (mountains) is changed to start from the NP bag containing the head, as is done in any sprouting.
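The simple/complex distinction and the sprouting step can be sketched as follows. This is only an illustrative sketch: the Bag class, the arc triples (governor, label, dependent) and the function name sprout are assumptions made for the example, and the check that the picked bag is the root of a full sub-tree (the "safe" condition) is left to the caller.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass(eq=False)  # eq=False: a bag is equal only to itself (identity)
class Bag:
    category: str
    members: List[Union[str, "Bag"]]
    head: Optional[Union[str, "Bag"]] = None  # None models an unanchored bag

    def is_simple(self) -> bool:
        # A simple bag has only words or local word groups as members.
        return not any(isinstance(m, Bag) for m in self.members)

    def is_complex(self) -> bool:
        return not self.is_simple()


Arc = Tuple[object, str, object]  # (governor, relation label, dependent)


def sprout(bag: Bag, picked: Union[str, Bag], arcs: List[Arc]) -> List[Arc]:
    """Move a non-head member out of the bag and re-root its incoming arc at the bag."""
    if bag.head is None:
        raise ValueError("only an anchored bag can be sprouted")
    if picked == bag.head:
        raise ValueError("the head of a bag cannot be picked")
    if picked not in bag.members:
        raise ValueError("the picked element is not a member of the bag")
    bag.members = [m for m in bag.members if m != picked]
    # An arc that started at the head now starts at the bag containing the head.
    return [(bag if (g == bag.head and d == picked) else g, label, d)
            for g, label, d in arcs]


inner = Bag("NP", ["with", "silver", "clouds"])
outer = Bag("NP", ["the", "magnificent", "white", "mountains", inner], head="mountains")
arcs = sprout(outer, inner, [("mountains", "nmod", inner)])
print(outer.is_simple(), arcs[0][0] is outer)
```

After the call, the outer bag no longer contains the inner bag, and the nmod arc starts from the outer bag instead of from 'mountains'.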
The present invention also proposes light-anchor bags and heavy-anchor bags to handle scenarios in which the head of a bag is a word or, alternatively, another bag. These types of bags are explained as follows:
Light-Anchor Bag: An anchored bag whose head is a word or a local word group is called a light-anchor bag. In accordance with the present invention a simple and anchored bag is a light-anchor bag. A simple bag is one which does not have recursive bags, in other words does not have a bag as its member.
Heavy-Anchor Bags: An anchored bag whose head is a bag is called a heavy-anchor bag. For instance, a complex bag is one in which a head element happens to be a bag; therefore it is a heavy-anchor bag:
((((the magnificent white mountains)) with silver clouds))
NP NP DT JJ JJ NN * PREP JJ NN
The head itself is a bag; however, it is unanchored. Therefore, although the head of the outer bag is marked, the head of the head is not marked.
In accordance with the rules, the head bag cannot be picked; therefore, a heavy-anchor bag cannot be made into a light-anchor bag by the sprouting operation. Thus, the present invention proposes the use of the lighten-anchor operation (which will be explained hereinafter) instead of sprouting for heavy-anchor bags.
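The light-anchor and heavy-anchor distinction depends only on what the head of the bag is, which can be sketched as below. The Bag class and the function name anchor_type are hypothetical names introduced for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(eq=False)  # identity-based equality between bags
class Bag:
    category: str
    members: List[Union[str, "Bag"]]
    head: Optional[Union[str, "Bag"]] = None  # None models an unanchored bag


def anchor_type(bag: Bag) -> str:
    """Classify a bag by its head: no head, a word (or local word group), or a bag."""
    if bag.head is None:
        return "unanchored"
    return "heavy-anchor" if isinstance(bag.head, Bag) else "light-anchor"


# The head of the outer bag is itself a bag, and that head bag is unanchored.
head_bag = Bag("NP", ["the", "magnificent", "white", "mountains"])
outer = Bag("NP", [head_bag, "with", "silver", "clouds"], head=head_bag)
print(anchor_type(outer), anchor_type(head_bag))
```

A mark head operation on a simple bag would, in this model, set head to a word and therefore always yield a light-anchor bag.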
Still further in accordance with the rules, every anchored bag must be one of two types: either light-anchor bag or heavy-anchor bag. And, when a mark head operation is performed on a simple bag, it results in a light-anchor bag. For instance, a complex saturated bag whose head is a bag is shown below:
((((the magnificent white mountains)) ((with silver clouds))))
NP NP DT JJ JJ NN * NP PREP JJ NN
In accordance with the rules, the non-head member element can be picked using the sprout operation on the bag. Performing the sprout operation creates a fully sprouted bag, as seen below:
((((the magnificent white mountains)))) ((with silver clouds))
NP NP DT JJ JJ NN * NP PREP JJ NN
Furthermore, in accordance with the rules, a fully sprouted bag containing a single element can be eliminated by the reduction means 126 using the eliminate operation. On using the eliminate operation, the dependency analysis can be represented as below:
((the magnificent white mountains)) ((with silver clouds))
NP DT JJ JJ NN * NP PREP JJ NN
Lighten-anchor Operation: The lighten-anchor operation takes a heavy-anchor bag 'b' whose head 'h' is a saturated bag, and does two things: it eliminates the markers of the bag 'h', and it makes the head of 'h' the head of bag 'b'.
The 'lighten anchor' operation can be explained using the example mentioned as follows wherein a representation consists of two member elements both of which are bags. The first element is the head and a saturated bag:
((((the magnificent white mountains)) ((with silver clouds))))
NP NP DT JJ JJ NN * NP PREP JJ NN
r-det
After the lighten-anchor operation is performed by the reduction means 126, the dependency analysis becomes:
((the magnificent white mountains ((with silver clouds))))
NP DT JJ JJ NN * NP PREP JJ NN
After the lighten-anchor operation, 'mountains', the head of the head bag, becomes the head of the outer bag. And, the arc from the first bag (which is the head) to the second bag is changed to run from 'mountains' to the second bag. Thus, the arc continues from the head ('mountains') of the outer bag to the bag 'with silver clouds' (except that the head has become lighter).
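The lighten-anchor operation can be sketched as follows. This is a sketch under stated assumptions and not the disclosed implementation: the Bag class, the saturated flag, the arc triples (governor, label, dependent) and the function name lighten_anchor are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass(eq=False)  # identity-based equality, so a bag only equals itself
class Bag:
    category: str
    members: List[Union[str, "Bag"]]
    head: Optional[Union[str, "Bag"]] = None
    saturated: bool = False  # assumption: saturation is tracked as a flag


Arc = Tuple[object, str, object]  # (governor, relation label, dependent)


def lighten_anchor(b: Bag, arcs: List[Arc]) -> List[Arc]:
    """Dissolve the head bag h of b and promote the head of h to be the head of b."""
    h = b.head
    if not isinstance(h, Bag) or not h.saturated or h.head is None:
        raise ValueError("needs a heavy-anchor bag whose head is a saturated bag")
    i = b.members.index(h)
    b.members[i:i + 1] = h.members  # eliminate the markers of h
    b.head = h.head                 # the head of h becomes the head of b
    # Arcs that touched the bag h now touch the promoted head instead.
    return [(h.head if g is h else g, label, h.head if d is h else d)
            for g, label, d in arcs]


h = Bag("NP", ["the", "magnificent", "white", "mountains"], head="mountains", saturated=True)
pp = Bag("NP", ["with", "silver", "clouds"], head="clouds", saturated=True)
b = Bag("NP", [h, pp], head=h)
arcs = lighten_anchor(b, [(h, "nmod", pp), ("mountains", "r-det", "the")])
print(b.head, len(b.members))
```

In this model the outer bag ends up as a light-anchor bag headed by 'mountains', with the second bag still attached through the same nmod relation.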
In accordance with this invention, using the starting bag as in the above example, and applying the sprout operation to the head repeatedly until a fully sprouted bag ((mountains)) is formed, gives the same result as the lighten-anchor operation, as can be seen below:
((the magnificent white ((mountains)) ((with silver clouds))))
NP DT JJ JJ NP* NN * NP PREP JJ NN*
As seen above, the structure obtained is similar, except that the head of the outer bag continues to be a bag. Now if the lighten-anchor operation is performed on the outer bag, the same structure is achieved. Thus, in accordance with the rules, for a bag 'b' whose head 'h' is a saturated bag, repeated application of the sprout operation on the head 'h', and the lighten-anchor operation on the bag 'b', irrespective of the order of application of these operations, results in the same final structure.
Thus, the reduction means 126 performs the sprout and the lighten-anchor operations, which are collectively known as the reduce operation. Based on the rules, the sprout operation is performed when the head of a bag is a word or a token, and the lighten-anchor operation is performed when the head of a bag is another bag. However, both the sprout and the lighten-anchor operations are performed to remove the elements from the bag whose dependency relations are identified. Once all the elements and their descendants are removed from the bag and only the head element remains in the bag, the reduction means 126 performs the eliminate bag marker operation, which will be explained hereinafter in detail.
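The choice between the three operations described above can be sketched as a small dispatch function. The Bag class and the function name next_operation are hypothetical, and the returned strings are only labels for the operations named in the text.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(eq=False)  # identity-based equality between bags
class Bag:
    members: List[Union[str, "Bag"]]
    head: Optional[Union[str, "Bag"]] = None  # None models an unanchored bag


def next_operation(bag: Bag) -> str:
    """Pick the reduce step that applies to an anchored bag, based on its head."""
    if bag.head is None:
        raise ValueError("reduce operations apply to anchored bags only")
    if isinstance(bag.head, Bag):
        return "lighten-anchor"        # the head is another bag
    if len(bag.members) > 1:
        return "sprout"                # the head is a word and other members remain
    return "eliminate-bag-marker"      # only the head element remains


print(next_operation(Bag(["the", "white", "mountains"], head="mountains")))
print(next_operation(Bag(["mountains"], head="mountains")))
```

The sketch does not check saturation or which member may safely be picked; it only mirrors the rule that the head type decides between sprouting and lightening.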
Explained herein below are the types of bags created in accordance with the rules on performing the reduce operation.
Fully Reduced Bag: A light-anchor bag which contains only one element, namely its head, is called a fully reduced bag. In accordance with the rules, it is not possible to reduce a fully reduced bag any further; in other words, it is not possible to apply the sprout or the lighten-anchor operation on a fully reduced bag. In accordance with the rules, a fully reduced bag has only one member element, namely, its head, and that head is a word or a local word group.
Bag with Fully Reduced Anchor: An anchored bag whose head is a word, a local word group, or a fully reduced bag is called a bag with a fully reduced anchor.
Additionally, the present invention defines a number of types of bags where the property of the member bags holds recursively on the bag and its members. These bag types are explained herein below in accordance with the rules:
Super anchored bag: A super anchored bag is an anchored bag which satisfies one of the following rules:
■ either it is a simple and anchored bag; or
■ if it is a complex anchored bag each of its member bags is a super anchored bag.
Super Saturated Bag: A super saturated bag is a saturated bag which satisfies one of the following rules:
■ either it is a simple and saturated bag; or
■ if it is a complex saturated bag each of its member bags is a super saturated bag.
Super Light-Anchor Bag: A super light-anchor bag is a light-anchor bag which satisfies one of the following:
■ either it is a simple (and light-anchored) bag; or
■ if it is a complex light-anchored bag each of its member bags is a super light-anchor bag.
In accordance with the rules, a super light-anchor bag is a super anchored bag. And, a super saturated and super light-anchor bag 'b' can be made simple by repeatedly reducing 'b' by sprouting the member bags that are children of head of 'b', and their children and the like. At the end of the operations, the bag 'b' becomes a super saturated bag.
Super Fully-Sprouted Bag: A super fully-sprouted bag is a fully-sprouted bag which satisfies one of the following:
■ either it is a simple (and fully-sprouted) bag; or
■ if it is a complex fully-sprouted bag its head, which is the lone member bag, is a super fully-sprouted bag.
In accordance with the rules, bags recursively inside a super fully-sprouted bag consist of fully sprouted bags. Further, a fully reduced bag is a simple bag with only one element namely its head which in turn is a word or a local word group. Thus, there is no need to define super fully reduced bag as it is a simple bag already.
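The recursive "super" definitions lend themselves to short recursive predicates. The sketch below covers the super anchored and super light-anchor cases only; the Bag class and the function names are hypothetical and chosen for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(eq=False)  # identity-based equality between bags
class Bag:
    members: List[Union[str, "Bag"]]
    head: Optional[Union[str, "Bag"]] = None  # None models an unanchored bag


def member_bags(bag: Bag) -> List[Bag]:
    return [m for m in bag.members if isinstance(m, Bag)]


def is_light_anchor(bag: Bag) -> bool:
    return bag.head is not None and not isinstance(bag.head, Bag)


def is_super_anchored(bag: Bag) -> bool:
    # Anchored, and every member bag is super anchored (vacuously true if simple).
    return bag.head is not None and all(is_super_anchored(m) for m in member_bags(bag))


def is_super_light_anchor(bag: Bag) -> bool:
    # Light-anchored, and every member bag is super light-anchor.
    return is_light_anchor(bag) and all(is_super_light_anchor(m) for m in member_bags(bag))


inner = Bag(["with", "silver", "clouds"], head="clouds")
outer = Bag(["the", "magnificent", "white", "mountains", inner], head="mountains")
print(is_super_anchored(outer), is_super_light_anchor(outer))
```

Because a light-anchor bag is anchored by definition, every bag accepted by is_super_light_anchor is also accepted by is_super_anchored, in line with the rule stated above.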
Eliminate Bag-Marker Operation: Bag markers for a saturated bag 'b' are removed by the eliminate bag marker operation and the following tasks are performed in accordance with the rules:
■ every dependency relation between the bag 'b' and an element external to the bag is changed to be between the head 'h' of the bag 'b' and the element;
■ all the relations of the head 'h' with other elements of 'b' which were earlier inside the bag continue as before, except that they are no longer inside the bag 'b', as the bag markers have been removed; and
■ features of the bag 'b' are copied on the head 'h'. If there is any conflict in values of an attribute, the value of the attribute in the bag 'b' overrides the value of the same attribute in the head 'h'. In accordance with the rules, the feature structure of the bag holds aggregated information computed from the feature structures of the elements of the bag, and hence it takes precedence over that of the head. For instance, a saturated bag containing a2 is given as follows:
a1 ((a2 a3)) a4
After elimination of the bag marker, it becomes:
fs1+fs2
Thus, the elimination operation is provided in the event that the bags are not needed after they are saturated. All the relevant information is copied appropriately, only the marker of the bag is removed. It also can be used when a bag has a single bag as its member.
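The three tasks of the eliminate bag-marker operation can be sketched as below. This is an illustrative sketch only: the Bag class, the arc triples (governor, label, dependent), the dictionary-based feature structures, the relation labels r1 to r3 and the attribute names are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(eq=False)  # identity-based equality keeps a bag usable as a dict key
class Bag:
    members: List[str]
    head: str


Arc = Tuple[object, str, object]          # (governor, relation label, dependent)
Features = Dict[object, Dict[str, str]]   # element -> attribute/value pairs


def eliminate_bag_marker(bag: Bag, arcs: List[Arc], fs: Features):
    """Re-point arcs from the bag to its head and merge the bag features into the head."""
    h = bag.head
    new_arcs = [(h if g is bag else g, label, h if d is bag else d)
                for g, label, d in arcs]
    new_fs = {k: dict(v) for k, v in fs.items() if k is not bag}
    # On conflict the value from the bag overrides the value from the head.
    new_fs[h] = {**fs.get(h, {}), **fs.get(bag, {})}
    return new_arcs, new_fs


bag = Bag(["a2", "a3"], head="a2")
arcs = [("a1", "r1", bag), ("a2", "r2", "a3"), (bag, "r3", "a4")]  # hypothetical arcs
fs = {bag: {"num": "pl", "cat": "NP"}, "a2": {"num": "sg", "gender": "f"}}
new_arcs, new_fs = eliminate_bag_marker(bag, arcs, fs)
print(new_arcs)
print(new_fs["a2"])
```

Relations inside the bag (here the r2 arc) are left untouched, while every arc that touched the bag is moved to its head 'a2'.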
In accordance with this invention, the reduction means 126 provides the sprouted elements and eliminated bag marker data along with their dependency relationships to the dependency tree creation means 128. The dependency tree creation means 128, based on pre-determined rules, appends the sprouted and the eliminated bag elements to the dependency tree for completing the partial analysis.
In accordance with the present invention there is provided a method for natural language processing as seen in FIGURE 4, comprising the following steps:
• creating a repository to store transformation rules for the natural language and its corresponding language processing operations, 1000;
• receiving and analyzing elements present in a natural language input sentence and providing grammatical features for each of the elements, 1002;
• grouping the elements into local word groups based on the grammatical features of adjacent elements, 1004;
• processing elements in the local word groups to determine their dependency relationships and providing a partial dependency analysis for the input sentence, 1006;
• collating local word group(s) in the partial dependency analysis into bag(s) in the event that the dependency relationships between the elements in local word group(s) are not identified, 1008;
• progressively parsing and performing predetermined operations on elements in the bag(s) based on the transformation rules to saturate the bag in the event that the partial dependency analysis is complete, 1010;
• reducing and eliminating saturated bag(s) based on the transformation rules, 1012; and
• moving corresponding elements of the eliminated bag(s) as a part of a dependency tree for the natural language input sentence, 1014.
In accordance with the present invention, the step of providing grammatical features for each of the elements includes the steps of creating feature structures for each of the elements and storing the grammatical features in the feature structure. And, the step of collating local word group(s) in the partial dependency analysis into bag(s) includes the steps of:
• adding bag markers to the local word groups whose elements' dependency relationships are not identified; and
• creating a feature structure for the bag, wherein the feature structure stores category of bag and feature values of the elements present in the bag.
Further, the step of progressively parsing and performing predetermined operations on elements in the bag(s) based on the transformation rules includes the steps of:
• identifying type of bag;
• fetching the rules from the repository based on type of bag; and
• performing operations on the elements of the bag based on fetched rules.
Typically, the step of reducing and eliminating the bag includes the steps of:
• moving every element of the bag except the head element out of the bag wherein the dependency relationship of the moved elements with the bag is identified;
• removing the dependency relationship of the moved elements with the head node and marking it to the bag;
• marking a dependency relationship between the moved element and the bag;
• moving the head of the bag out of the bag; and
• copying the feature structure of the bag as the feature structure of the head node and removing the bag.
TECHNICAL ADVANTAGES
A system for representing natural language partial dependency analysis as described in the present invention has several technical advantages including but not limited to the realization of a system for natural language processing.
The envisaged system introduces 'bags', which are data structures that hold elements of a partial dependency analysis which do not have their dependency relationships identified. Along with the bags, the system introduces various rules and operations. These rules and operations enable the processing of the elements in the bags, first to identify their dependency relationships and later to eliminate the bags and merge the elements and their descendants with a dependency analysis tree.
The advantage of creation of bags is that the bag isolates the elements whose dependency relationship is not identified from other elements of the partial dependency analysis and merges the elements to the dependency tree as the dependency relationships get identified. A bag can also hold multiple bags. The proposed system provides operation and rules which enable expansion and processing of each of the inner bags simultaneously to derive the dependency analysis.
Also, the system proposes a plurality of bag types which are determined based on the group(s) of words and the operations performed on the bag. These different types of bags facilitate the selection of appropriate operations. The system also provides efficient means for handling scenarios including nesting of bags and recursive bags.
Thus, the present invention provides an intelligent system which analyzes the partial analysis / natural language sentence and selectively performs operations on the same to save time and reduce the number of operations.
Moreover, the present invention can be used to represent partial analysis of a sentence in a variety of applications including:
• Information Extraction for extracting information from sentences by giving their dependency analysis / partial dependency analysis;
• Machine translation for automatic translation from one language to another;
• Question-answering systems, in which answering a question based on a collection of natural language documents requires processing of the question as well as the sentences in the documents or their partial analysis;
• Dialogue Systems for building dialogue systems to perform useful tasks; and
• Intelligent information retrieval (IR), semantic web and summarization.
While considerable emphasis has been placed herein on the components and component parts of the preferred embodiments, it will be appreciated that many embodiments can be made and that many changes can be made in the preferred embodiments without departing from the principles of the invention. These and other changes in the preferred embodiment as well as other embodiments of the invention will be apparent to those skilled in the art from the disclosure herein, whereby it is to be distinctly understood that the foregoing descriptive matter is to be interpreted merely as illustrative of the invention and not as a limitation.
We Claim:
1. A system for natural language processing comprising:
• a pre-processing unit adapted to analyze elements present in a natural language input sentence and provide grammatical features of said elements and further adapted to group said elements into local word groups based on said grammatical features and process elements in said local word groups to determine their dependency relationships and still further adapted to provide a partial dependency analysis for said input sentence and collate said local word group(s) in said partial dependency analysis into bag(s) in the event that the dependency relationships between the elements in local word group(s) are not identified;
• a repository adapted to store transformation rules for said natural language and their corresponding operations;
• a processing unit co-operating with said pre-processing unit and said repository adapted to progressively parse and perform predetermined operations on elements in said bag(s) based on said transformation rules and further adapted to saturate the bag(s) in the event that said partial dependency analysis is complete; and
• a post-processing unit co-operating with said processing unit and said repository adapted to reduce and eliminate saturated bag(s) based on said transformation rules and further adapted to move corresponding elements of the eliminated bag(s) as a part of a dependency tree for the natural language input sentence.
2. The system as claimed in claim 1, wherein said pre-processing means includes:
• a morphological analyzer adapted to receive and morphologically analyze elements present in a natural language input sentence and further adapted to provide grammatical features of said elements, wherein each element in said input sentence is associated with a feature structure and said feature structure is updated with information on said grammatical features;
• grouping means co-operating with said morphological analyzer adapted to group said elements into local word groups based on said grammatical features of adjacent elements in said input sentence;
• pre-processing means adapted to process elements in said local word groups to determine their dependency relationships based on predetermined linguistic knowledge and further adapted to provide a partial dependency analysis for said input sentence, wherein said partial dependency analysis consists of local word groups representing nodes on a dependency tree; and
• collation means adapted to process said partial dependency analysis and further adapted to put said local word group(s) into bag(s) in the event that the dependency relationships between the elements in group are not identified.
3. The system as claimed in claim 1, wherein said collation means includes sequence generation means adapted to generate a sequential number to identify each element present in said partial dependency analysis based on predetermined rules.
4. The system as claimed in claim 1, wherein said collation means includes bag definition means adapted to define the category and the feature structure for the bag.
5. The system as claimed in claim 1, wherein said processing unit includes:
• first fetching means adapted to fetch transformation rules from said repository; and
• a parsing engine co-operating with said fetching means to perform predetermined operations on said elements of said bag(s) based on transformation rules.
6. The system as claimed in claim 1, wherein said post-processing unit includes:
• second fetching means adapted to fetch transformation rules from said repository;
• reduction means adapted to reduce and eliminate saturated bag(s) based on said transformation rules; and
• dependency tree creation means adapted to represent elements of said bag(s) after reduction and elimination of said bag(s) as a part of a dependency tree.
7. A method for natural language processing comprising the following steps:
• creating an optional repository to store transformation rules for the natural language and its corresponding language processing operations;
• receiving and analyzing elements present in a natural language input sentence and providing grammatical features for each of said elements;
• grouping said elements into local word groups based on said grammatical features of adjacent elements;
• processing elements in said local word groups to determine their dependency relationships and providing a partial dependency analysis for said input sentence;
• collating local word group(s) in said partial dependency analysis into bag(s) in the event that the dependency relationships between the elements in local word group(s) are not identified;
• progressively parsing and performing predetermined operations on elements in said bag(s) based on said transformation rules to saturate the bag in the event that said partial dependency analysis is complete;
• reducing and eliminating saturated bag(s) based on said transformation rules; and
• moving corresponding elements of the eliminated bag(s) as a part of a dependency tree for the natural language input sentence.
8. The method as claimed in claim 7, wherein the step of providing grammatical features for each of the elements includes the steps of creating feature structures for each of the elements and storing the grammatical features in said feature structure.
9. The method as claimed in claim 7, wherein the step of collating local word group(s) in said partial dependency analysis into bag(s) includes the steps of:
• adding bag markers to the local word groups whose elements dependency relationship is not identified; and
• creating a feature structure for said bag, wherein said feature structure stores category of bag and feature values of the elements present in the bag.
10. The method as claimed in claim 7, wherein the step of progressively parsing and performing predetermined operations on elements in the bag(s) based on the transformation rules includes the steps of:
• identifying type of bag;
• fetching the rules from said repository based on type of bag; and
• performing operations on said elements of the bag based on fetched rules.
11. The method as claimed in claim 7, wherein the step of reducing and eliminating the bag includes the steps of:
• moving every element of the bag except the head element out of the bag wherein the dependency relationship of the moved elements with the bag is identified;
• removing the dependency relationship of the moved elements with the head node and marking it to the bag;
• marking a dependency relationship between the moved element and the bag;
• moving the head of the bag out of the bag; and
• copying the feature structure of the bag as the feature structure of the head node, and removing the bag.
| # | Name | Date |
|---|---|---|
| 1 | 0534-CHE-2010 FORM-5 02-03-2011.pdf | 2011-03-02 |
| 2 | 0534-CHE-2010 FORM-2 02-03-2011.pdf | 2011-03-02 |
| 3 | 0534-CHE-2010 FORM-13 02-03-2011.pdf | 2011-03-02 |
| 4 | 0534-CHE-2010 DRAWINGS 02-03-2011.pdf | 2011-03-02 |
| 5 | 0534-CHE-2010 CORRESPONDENCE OTHERS 02-03-2011.pdf | 2011-03-02 |
| 6 | 0534-CHE-2010 CLAIMS 02-03-2011.pdf | 2011-03-02 |
| 7 | 0534-CHE-2010 DESCRIPTION (COMPLETE) 02-03-2011.pdf | 2011-03-02 |
| 8 | 0534-CHE-2010 POWER OF ATTORNEY 02-03-2011.pdf | 2011-03-02 |
| 9 | 0534-CHE-2010 ABSTRACT 02-03-2011.pdf | 2011-03-02 |
| 10 | Form-1.pdf | 2011-09-03 |
| 11 | 534-CHE-2010 FORM-18 26-02-2014.pdf | 2014-02-26 |
| 12 | 534-CHE-2010 CORRESPONDENCE OTHERS 26-02-2014.pdf | 2014-02-26 |
| 13 | 534-CHE-2010 CORRESPONDENCE OTHERS 12-02-2015.pdf | 2015-02-12 |
| 14 | 534-CHE-2010-FER.pdf | 2020-02-28 |
| 1 | 2020-02-1411-56-54_14-02-2020.pdf |