
Improved XML Accelerator and Method for Hardware Implementation of Tree Automata for XML Processing

Abstract: AN IMPROVED XML ACCELERATOR AND A METHOD FOR HARDWARE IMPLEMENTATION OF TREE AUTOMATA FOR XML PROCESSING. To meet the challenging requirements of fast XML processing, this report describes a hardware-based, high-throughput, area-efficient and robust approach to XML parsing. A key advantage is the use of tree grammars/automata, which provide a powerful, simple and theoretically sound formalism for XML processing that can be extended in a relatively straightforward manner to other XML processing tasks such as query and transform. Further, the developed parser is more efficient than the LL/LR parsers typically used if the CFG (Context Free Grammar) formalism is chosen. The efficiency of the approach is demonstrated by an XML parser implementation which is quite compact and operates at a sustained throughput in excess of 1 Gbps. For the first time, hardware implementation of tree automata has been used for XML processing including parsing, schema validation and query operations. Figure 1


Patent Information

Application #:
Filing Date: 20 September 2012
Publication Number: 44/2015
Publication Type: INA
Invention Field: COMPUTER SCIENCE
Status:
Parent Application:

Applicants

SATYAM COMPUTER SERVICES LIMITED
UNIT N. 12, PLOT NO. 35/35, HI-TECH CITY LAYOUT, SY.NO.64, MADHAPUR - 500 081

Inventors

1. REETINDER PAL SINGH SIDHU
C3-404, WHITEHOUSE APTS., RT NAGAR, BANGALORE - 560 032

Specification

FIELD OF THE INVENTION
The present invention relates to an improved XML accelerator and a method for hardware implementation of tree automata for XML processing, including parsing, schema validation and query operations.
BACKGROUND OF THE INVENTION
1 Introduction
Due to the widespread use of XML, the ability to quickly process XML documents is crucial for a variety of applications. Moreover, since the use of XML is rising rapidly, the importance of high-throughput XML processing is expected to increase significantly in the near future.
Traditionally, XML parsers have been implemented as software running on a single processor core, but their performance is inadequate. As a result, some work has been done on parallel software parsers executing on multiple cores. This has not been very successful, because the coarse-grained parallelism provided by multiple cores is not a natural fit for the fine-grained parallelism of XML processing. Some preliminary work has also been done on FPGA (Field Programmable Gate Array) based XML parsing, but it is limited to regular expressions and cannot be used for proper XML parsing.
This document describes a high-throughput, robust approach to XML parsing using FPGAs that can be extended in a relatively straightforward manner to other XML processing tasks such as query and transform. A key advantage is the use of tree grammars, which provide a powerful, sound and simple formalism for XML processing that can be implemented efficiently. The XML parser implemented operates at a throughput in excess of 1 Gbps, and its logic requirements are quite low. As far as the developers are aware, this is the world's first implementation in hardware of tree automata for XML parsing. The above observations are explained in the following sections.
2 Related Work
2.1 Theory
The two theoretical foundations used for XML processing are those based on Context Free Grammars (CFGs) and regular unranked tree grammars (henceforth called tree grammars). CFGs are the traditionally used formalism [9]. CFGs are typically parsed using LL or LR parsers. In each iteration of an LL parser, multiple push operations may be required. In each iteration of an LR parser, multiple pop operations may be required. Also, for both LL and LR parsers, more than one iteration may be required to process a single token. Therefore, the disadvantages of CFG-based parsing are that it can be slow and that its throughput is variable, depending upon the structure of the input XML document.
Tree automata have been studied since the sixties [3] and used for XML since the nineties [12][14]. Tree grammars provide a powerful, simple formalism for expressing XML schemas. They are not only useful for parsing but can provide a foundation for XML query and transform operations as well. As can be seen from [13], the class of tree languages includes other well-known schema languages such as DTD and XML Schema, so various XML schema languages can be treated in a unified manner using tree grammars. Also, the parsing method for tree grammars is more efficient: each iteration of the method requires at most one stack operation, and just one iteration is required to process a token. Due to the above advantages, the XML parser developed was based on the tree grammar formalism.
2.2 Software Approaches
Traditionally, XML parsers have been implemented as software running on a single processor core, but their performance is not good enough [19]. XML parsers such as libxml, Xerces and XML4C achieve a best throughput of 500 Mbps (estimated assuming a 3 GHz processor and 70 cycles per byte [4]). Commercial XML parsers such as those from IBM (DataPower) and Intel (SOA Expressway) appear to have no performance data available; however, estimates based on available data [6] suggest that parsing throughput is somewhat low. Software XML processing throughput is thus inadequate for wire-speed and other high-throughput applications.
As a result, some research has been done on developing parallel software parsers executing on multiple cores. Typically, the current approach is to split the XML data into chunks which are distributed to threads. It is a complex approach with load-balancing issues, requiring either extra pre-processing [10] or speculative execution [16]. It has not been very successful, because the coarse-grained parallelism provided by multiple cores is not a natural fit for the fine-grained parallelism of XML processing.
2.3 Hardware Approaches
Due to the inadequacy of software solutions, several companies, such as Conformative Systems, DataPower, Dajeil, Tarari, Xambala and Ximpleware, have worked on hardware XML processing, but only Tarari (which was bought by LSI Corp.) seems to have been somewhat successful [8].

But the approach yields only limited acceleration [7], relying on the microprocessor to do much of the work. Throughput is roughly estimated to be 750 Mbps (assuming a 3 GHz processor and the quoted 40 cycles per byte for parsing). Also, performance seems sensitive to XML document structure. Thus, the approach appears to have serious shortcomings.
In academia, some preliminary work [11] has also been done on FPGA (Field Programmable Gate Array) based XML parsing, but it is limited to regular expressions and cannot be used for proper XML parsing. A more recent approach [4], which claims a throughput of 1 Gbps, seems the most competitive.
2.4 Patents
2.4.1 Comparison with Patent no. US 7055093 [15]
• The patent method does not appear to do tag comparison at all.
• The patent method pushes an init state onto the stack for each begin tag.
• Comparison of the number of stack/memory operations per tag:

- The patent method does 1 push for a start tag and 2 pops and 1 push for an end tag, for an average of 2 stack operations per tag.
- Our method does 1 push for a start tag and 1 pop for an end tag. Further, we push and pop each tag as well, to ensure proper start/end tag matching, which the patent method ignores. Also, the tag push/pop can be performed simultaneously with the state push/pop on the same wide stack, as is currently done in our implementation.
- So, compared to the patent method, we need at most one stack operation per tag.
• The patent method is focused specifically on validation only; we focus on query, tree reading and (possibly) transform as well.
2.4.2 Comparison with Patent no. US 2010/0094906 A1 [5]
• The claims seem to talk about union etc. of machines (apparently at runtime) representing automata, while we have a priori fixed automata.
• The model for the automata is essentially claimed to be Visibly Pushdown Automata, which are not the same as the tree automata that we use.

BRIEF DESCRIPTION OF THE ACCOMPANYING DRAWINGS
The invention can now be described in detail with the help of the figures of the accompanying drawings in which
Figure 1 shows block diagram of parser datapath hardware logic for XML processing,
Figure 2 shows block diagram of lexer hardware logic for XML processing,
Figure 3 shows XML Accelerator block diagram,
Figure 4 shows control logic finite state transducer for parser,
Figure 5 shows control logic finite state transducer for parser with error conditions,
Figure 6 shows book schema: (a) lexical grammar, (b) tree grammar,
Figure 7 shows book example XFA (initial state: 0, final state: 1),
Figure 8 shows book example YFA,
Figure 9 shows XML conforming to book schema,
Figure 10 shows XML data tokens,
Figure 11 shows book XML parsing, and
Figure 12 shows example tree grammars for XML querying.
DESCRIPTION OF THE INVENTION
3 Description
3.1 Definitions
3.1.1 Basics
Basics We use the standard mathematical notions and terminology of a set, ordered pair, and sequence¹.
Symbols Let sym be a set of symbols².
¹ A sequence is defined in terms of nested ordered pairs: (a, b, c), for example, is defined to be (a, (b, c)).
² The set sym does not contain the symbols ∅, ε, |, *, ( and ), which are utilized for various other purposes.

String The empty string, denoted by ε, which contains no symbols at all, is defined to be a string over sym. If a is a symbol in sym and s is a string over sym, then so is sa, formed by appending a to s. Nothing else is a string over sym.
Hedges and Trees The empty hedge, denoted by ε, which contains no symbols at all, is defined to be a hedge. If a is a symbol in sym and h is a hedge over sym, then the ordered pair (a, h) is a tree over sym, and a tree over sym is also a hedge over sym. If g and h are hedges over sym, then so is gh, formed by appending h to g. Nothing else is a hedge, or a tree.
3.1.2 Regular Expressions
Syntax Given a set of symbols syms, any s ∈ syms, and the symbols ∅ and ε, are regular expressions over syms. Further, if P and Q are regular expressions over syms, then so are (PQ), (P|Q) and (P*). Nothing else is a regular expression.
Semantics A regular expression is a pattern that matches zero or more strings. ∅ matches no string, ε matches the empty string, and any s ∈ syms matches itself (the string s). Further, if p and q are regular expressions, then (pq) matches any string a prefix of which is matched by p and the rest by q, (p|q) matches any string matched by p or q, and (p*) matches any string composed of a sequence of zero or more strings matched by p. Thus a regular expression denotes the set of strings that it matches.
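These semantics can be sketched by translating the regular-expression constructors into Python's re syntax. The AST encoding below (tuple tags "empty", "eps", "sym", "cat", "alt", "star") is ours, not from the source:

```python
import re

# Sketch of the regular-expression semantics, using an AST encoding of
# our own: ("empty",) matches no string, ("eps",) the empty string,
# ("sym", s) a single symbol, plus ("cat", p, q), ("alt", p, q) and
# ("star", p) for (PQ), (P|Q) and (P*).
def to_python_re(rx):
    kind = rx[0]
    if kind == "empty":
        return r"(?!)"                      # a pattern that never matches
    if kind == "eps":
        return r""                          # matches only the empty string
    if kind == "sym":
        return re.escape(rx[1])
    if kind == "cat":
        return to_python_re(rx[1]) + to_python_re(rx[2])
    if kind == "alt":
        return "(?:%s|%s)" % (to_python_re(rx[1]), to_python_re(rx[2]))
    if kind == "star":
        return "(?:%s)*" % to_python_re(rx[1])
    raise ValueError("unknown constructor: %r" % kind)

def matches(rx, s):
    # a regular expression denotes the set of strings it fully matches
    return re.fullmatch(to_python_re(rx), s) is not None
```

For example, matches(("star", ("alt", ("sym", "a"), ("sym", "b"))), "abba") holds, mirroring (a|b)*.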
3.1.3 Lexical Grammar
A lexical grammar over a set of symbols, syms, is composed of two things: a set of tokens, called tokens, and a set of rules, called rules, each element of which is a pair (t, r) whose first element t belongs to tokens and whose second element r is a regular expression over syms.
3.1.4 Tree Grammar
A tree grammar over a set of symbols, syms, is composed of three things: a set of variables, called vars, a root expression, called root-regex, which is a regular expression over vars, and a set of rules, called rules, each of which is a triple (x, a, r) whose first element x belongs to vars, whose second element a belongs to syms and whose third element r is a regular expression over vars.
Intuitively, each (x, a, r) rule can be thought of as assigning to variable x a tree pattern, with symbol a as the root node of the tree and the regular expression r describing the pattern of child nodes of a.
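As an illustration, a tree grammar can be written down directly as data. The variable and symbol names below are assumptions for the book schema of section 3.6; the exact names in figure 6 may differ:

```python
# The book tree grammar written as plain data. Each rule is a triple
# (x, a, r): variable, symbol, and child regex over vars. The names
# (B, C, P, F, S, T and book/chap/pref/fwd/sec/text) are assumptions;
# figure 6 may use different ones.
book_grammar = {
    "vars": {"B", "C", "P", "F", "S", "T"},
    "root_regex": "B",           # a single tsym here, but any regex is allowed
    "rules": {
        ("B", "book", "(P|F)CC*"),  # pref or foreword, then one or more chapters
        ("C", "chap", "S*"),        # zero or more sections
        ("P", "pref", "TT*"),       # one or more text leaves
        ("F", "fwd",  "TT*"),
        ("S", "sec",  "TT*"),
        ("T", "text", ""),          # leaf rule: empty child regex
    },
}
```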

3.1.5 Finite Automata (FA)
An NFA (Nondeterministic Finite Automaton) over a set of symbols sym is composed of: a set of states, one of which is called the initial state, a set of final states (which is a subset of states) and a set of transitions, each element of which is a triple (p, u, q) where p and q belong to states and u belongs to sym.
A DFA (Deterministic Finite Automaton) is an NFA with the added restrictions that the set of final states contains a single element (called the final state) and that no two transitions can have the same first two elements (if (p, u, q) and (p′, u′, q′) are distinct elements of transitions, then either p ≠ p′ or u ≠ u′).
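A minimal behavioral sketch of the DFA definition, using a dict keyed by (state, symbol) pairs; the representation is ours:

```python
# Minimal sketch of the DFA definition: transitions as a set of
# (p, u, q) triples with no two sharing the same (p, u) pair.
def run_dfa(transitions, initial, final, symbols):
    table = {(p, u): q for (p, u, q) in transitions}
    assert len(table) == len(transitions), "transitions not deterministic"
    state = initial
    for u in symbols:
        state = table[(state, u)]   # exactly one transition applies
    return state == final
```

For instance, with transitions {(0, "a", 1), (1, "a", 0)} and 0 both initial and final, the DFA accepts exactly the strings with an even number of a's.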
3.1.6 Tree Automata
An NTFA (Nondeterministic Tree Finite Automaton) is composed of seven things: a set of symbols syms, a set of states xs, another set of states ys, a set of initial states initial (a subset of xs), a set of final states finals (also a subset of xs), a set of transitions xtrans, each element of which is a triple (p, u, q) where p and q belong to xs and u belongs to ys, and a set of transitions ytrans, each element of which is a triple (p, s, u) where p belongs to xs, s belongs to syms and u belongs to ys.
A DTFA (Deterministic Tree Finite Automaton) is an NTFA with the added restrictions that the set initial of initial states contains just a single element (called the initial state) and that no two transitions in either xtrans or ytrans can have the same first two elements (if (p, u, q) and (p′, u′, q′) are distinct elements of xtrans or ytrans, then either p ≠ p′ or u ≠ u′).
Given an NTFA (xs, ys, initial, finals, xtrans, ytrans), it can be converted into a DTFA (xs′, ys′, initial′, finals′, xtrans′, ytrans′) whose states are subsets of NTFA states. The initial′ state of the DTFA is the set initial of the NTFA. For each (p′, s, u′) in ytrans′, there is a (p, s, u) in ytrans such that p is in p′ and u is in u′. Also, for each (p′, u′, q′) in xtrans′, there is a (p, u, q) in xtrans such that p is in p′, u is in u′ and q is in q′.
It is convenient to think of the tree automaton as being composed of two finite automata: the x finite automaton (XFA) and the y finite automaton (YFA). The XFA is defined, as is usual for a finite automaton, in terms of sets of symbols, states, initial states, final states and transitions, which are ys, xs, initial, finals and xtrans respectively. The YFA is defined in terms of sets of symbols, states and transitions, which are syms, xs ∪ ys and ytrans respectively. The YFA has no initial or final states, and each of its transitions is from a state in xs to a state in ys.

Note that the input to the tree automaton is the input of the YFA, whose states are the input to the XFA, whose final states in turn are also some of the states of the YFA. Thus the tree automaton can be thought of as composed of two closely coupled interacting finite automata, both of which operate to process each input symbol.
3.2 Method Overview
Subsequent methods use three sets of symbols: input XML characters (chars), tokens formed by lexical analysis of the input XML (toks), and regular expression symbols used to specify XML tree patterns (tsyms). So, for our purposes, the sets syms and tokens in the lexical grammar definition correspond to chars and toks, and the sets syms and vars in the tree grammar definition correspond to toks and tsyms.
The following two sections describe methods that translate given tree grammar into tree automaton. The last section describes the operation of the tree automaton for schema validation as well as query.
3.3 Automaton Construction from Grammar
Method 1 performs XFA construction. The input to the method is the tree grammar and its output is the XFA. The tree grammar is represented as a pair of root regular expression and set of rules. The XFA is represented as a triple of transitions, initial state and set of final states. At loop termination, a nondeterministic XFA has been constructed.
Method 2 performs YFA construction. The input to the method is the tree grammar rules and the states of the XFA obtained from method 1. The output is a set of transitions representing the YFA. At the end of the loop, a nondeterministic YFA has been constructed. The state in each triple is a final state of the nondeterministic XFA.
From the above NTFA, a DTFA is obtained (as described in section 3.1.6) by computing the deterministic versions of the XFA and YFA using a variation of the subset construction technique.
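The subset construction can be sketched for an ordinary NFA as follows; the same idea applies to the XFA and YFA transition sets. The representation (sets of triples, frozensets as deterministic states) is ours:

```python
# Sketch of the subset construction for an NFA given as a transition
# set, initial states and final states; deterministic states are
# frozensets of nondeterministic states.
def determinize(transitions, initials, finals):
    start = frozenset(initials)
    dtrans, dfinals = set(), set()
    todo, seen = [start], {start}
    while todo:
        p = todo.pop()
        if p & set(finals):
            dfinals.add(p)               # contains an NFA final state
        by_sym = {}                      # group reachable targets by symbol
        for (s, u, t) in transitions:
            if s in p:
                by_sym.setdefault(u, set()).add(t)
        for u, ts in by_sym.items():
            q = frozenset(ts)
            dtrans.add((p, u, q))
            if q not in seen:
                seen.add(q)
                todo.append(q)
    return dtrans, start, dfinals
```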

3.4 Table Generation from Automaton
For the XFA and YFA constructed above, state assignment is performed and state tables xtab and ytab respectively are generated. For the XFA, an additional table xfinal, used to detect final states, is also generated. The methods used are quite simple.
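A sketch of what state assignment and table generation could look like in software, assuming deterministic states are frozensets as produced by a subset construction; the dict/list encoding stands in for the block-RAM tables of the hardware:

```python
# Sketch of state assignment and table generation: each deterministic
# state (a frozenset) is assigned a small integer code, transitions
# become the xtab lookup table, and xfinal becomes one bit per state.
def make_tables(dtrans, initial, finals):
    states = {initial}
    for (p, u, q) in dtrans:
        states.update((p, q))
    ordered = sorted(states, key=sorted)        # deterministic numbering
    code = {s: i for i, s in enumerate(ordered)}
    xtab = {(code[p], u): code[q] for (p, u, q) in dtrans}
    xfinal = [s in finals for s in ordered]     # final-state bit per code
    return xtab, code[initial], xfinal
```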
3.5 Parser Method
The parsing method and its use for schema validation, query and well-formedness checking is described below.
Method 4 performs schema validation. The tables xtab, ytab and xfinal are generated as described above. The processing shown in figure 11 can help in understanding the method. Each non-leaf node in that figure corresponds to an open and a close token. When an open token is received, it and the XFA state are simply pushed onto the stack. When a close token is received, a token and an XFA state are popped off the stack. After verifying that the tokens match, both the XFA and YFA transitions for the node are performed by lookup of the xtab and ytab state tables, and the XFA state is updated. A leaf token can be thought of as an open token immediately followed by a close token, in which case the stack operations and token comparison are eliminated, leaving only the state table lookups updating the XFA state. After all tokens are processed, a check on whether the XFA state is a final state determines validity.
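A behavioral sketch of method 4, assuming tokens arrive as (kind, name) pairs and the tables are Python dicts; the reset of the XFA state on an open token is implied by the trace in section 3.6.2 rather than stated explicitly:

```python
# Behavioral sketch of method 4 (schema validation). Tokens arrive as
# (kind, name) pairs, kind in {"OPEN", "CLOSE", "LEAF"}; xtab and ytab
# are dicts and xfinal maps XFA states to booleans (encodings ours).
def validate(tokens, xtab, ytab, x_initial, xfinal):
    stack, x = [], x_initial
    for kind, tok in tokens:
        if kind == "OPEN":
            stack.append((tok, x))      # push token and current XFA state
            x = x_initial               # children run restarts the XFA
                                        # (implied by the section 3.6.2 trace)
        elif kind == "CLOSE":
            open_tok, saved_x = stack.pop()
            if open_tok != tok:         # start/end tag mismatch
                return False
            tsym = ytab[(x, tok)]       # YFA transition on the close token
            x = xtab[(saved_x, tsym)]   # XFA transition in the parent context
        else:                           # LEAF: open+close, no stack operations
            tsym = ytab[(x, tok)]
            x = xtab[(x, tsym)]
    return not stack and xfinal.get(x, False)
```

With a toy schema whose root a contains a single leaf t (tsyms T and A; XFA 0 →T 1 and 0 →A 2, with 2 final), the stream OPEN a, LEAF t, CLOSE a validates.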

3.5.1 Query
Remarkably, essentially the same method can perform the query operation as well. To perform a query, one change required is in method 1, where the finals now are the final states of the NFAs corresponding to the query matching rules (explained in section 3.6.3). Also, since a query match can occur multiple times during processing, the xfinal table lookup is now performed at the end of close and leaf token processing. Finally, appropriate action needs to be taken on a query match. If, along with each token, its location is also supplied (which the lexer can easily do), then the locations of the open and close tokens can be output. The stack contents, which describe a path from the root to the matching node, can also be output. For performing multiple queries simultaneously, the xfinal table would need to output multiple bits, each corresponding to a specific query.
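The query variant can be sketched with the same stack discipline, assuming the same (kind, name) token pairs and dict tables as before; the only changes are that xfinal now flags match states and is consulted after every close and leaf transition, collecting matches instead of a single verdict:

```python
# Sketch of the query variant: xfinal flags query-match states and is
# checked after every close/leaf transition. Encodings (dicts and
# (kind, name) token pairs) are ours.
def query(tokens, xtab, ytab, x_initial, xfinal):
    stack, x, hits = [], x_initial, []
    for kind, tok in tokens:
        if kind == "OPEN":
            stack.append((tok, x))
            x = x_initial
        elif kind == "CLOSE":
            open_tok, saved_x = stack.pop()
            x = xtab[(saved_x, ytab[(x, tok)])]
            if xfinal.get(x, False):
                hits.append(tok)        # query matched at this node
        else:                           # LEAF
            x = xtab[(x, ytab[(x, tok)])]
            if xfinal.get(x, False):
                hits.append(tok)
    return hits
```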
3.5.2 Well-formedness Checking
The proposed approach performs basic well-formedness checking during lexical analysis (checking that tags are well-formed) and parsing (checking proper nesting of tags). It may be possible to perform more complete well-formedness checking using the proposed XML processing approach by converting the XML standard grammar rules into a tree grammar. While doing so seems feasible, the performance and resources required are not clear at present.
3.6 Example
Below, a simple example tree grammar, its corresponding tree automaton, schema validation of simple XML data, and example query grammars are described.
The lexical and tree grammars for a very simple book schema are shown in figure 6³. In the lexical grammar, for each tag rule, a pair of regular expressions is derived to match open and close tags. In the tree grammar, for each leaf rule, the regular expression is not specified and is taken to be the empty string. Informally speaking, the rules specify tree patterns such that a book begins with either a foreword or a preface, followed by one or more chapters each containing zero or more sections, with each preface, foreword and section containing one or more lower-case text characters. Note that, although the root-regex in this example is a single tsym, it can be an arbitrary regular expression. Also, nothing special is required for attributes. For example, for a token tok and tsyms A, B, V, W, with the former two specified to be attributes and the latter two their values (all using leaf rules), the rule "T → tok AV(BW)?..." specifies tok having attributes A and B, with B being optional.
3.6.1 Tree Automaton (XFA and YFA)
Figures 7 and 8 show the deterministic versions of the XFA and YFA obtained by using methods 1 and 2 respectively on the above book grammar. Edge labels for the XFA are tsyms and for the YFA are toks. States 0, 4, 5, 6 and 7 are shared between the two automata⁴. The remaining YFA states are tsyms, which are used as input to the XFA.
3.6.2 Schema Validation
XML data conforming to the book schema is shown in figure 9, and the tokens obtained after its lexical analysis (according to the book lexical grammar) are shown in figure 10. To form an idea of how tree automaton based parsing works, the parsing of the above tokens using the book example XFA
³ The XML header and whitespace in the input XML can be handled in a straightforward manner by specifying regular expressions for each in the lexical grammar and including their tokens in appropriate places in the tree grammar, but, for simplicity, both are omitted from the example.
⁴ As a valid tree grammar must include at least one leaf rule, its corresponding NFA for the empty string ε regular expression has a single state which is both initial and final. Due to this, in general, the initial state of the XFA is one of the states shared with the YFA.

and YFA is shown in figure 11. The XML data is shown as a tree. Transitions represented by dashed lines are of the XFA and those represented by dotted lines are of the YFA. Transition labels for the former are YFA tsym-labeled states, while for the latter they are tokens (tree nodes). Consider the operation for the text token which is a child of pref. The YFA, which is in initial state 0, transitions, due to token text, into state T. The XFA, also in initial state 0, transitions due to tsym T into state 5, from which the YFA in turn transitions, due to token pref, into state P. Eventually, at the root node, the XFA transitions from state 0, due to tsym B, into accepting state 1, thus validating the input XML.
3.6.3 Query
Figure 12 shows the tree grammars required for two example queries. The first one conforms to the XPath //chap/sec and matches all sections in the book. In the grammar, star is a special token that matches all tokens. Also, _ is a tsym that matches arbitrary trees. The LHS tsym in the last rule is in bold, indicating that a match occurs when this rule (called a matching rule) is "executed". Given an XPath composed of child and descendant axis nodes, it can be converted in a straightforward manner into a corresponding tree grammar. However, tree grammars enable more powerful queries than the above. The second example matches the second section of the first chapter of a book if the book contains a preface. Such queries cannot be performed using only the child and descendant axes, which are what have so far been implemented efficiently on FPGAs. Multiple queries can also be expressed in a single tree grammar.
3.7 Parse Tree Method
A parse tree of the input XML data can be quickly generated in hardware and sent to the host system for further processing in software. The input is the stream of token data generated by the DLFA from the XML data. For each token, the input data is composed of the token, its type, and its start and end locations in the XML data. The output is one or two arrays representing the parse tree, each array element corresponding to a node in the tree.
Basic Method
The method uses a stack to traverse the input tokens and writes out the parse tree in post-order: data for all subtrees of a node is written out (in the order they are encountered), immediately followed by data for the node. The data written for each node is the end location of the token (of the closing token for non-leaf nodes) and a pointer to the preceding sibling, except for the first sibling (which has no preceding sibling), for which the start location of the token (of the opening token for non-leaf nodes) is written instead. To distinguish between the two, a one-bit flag is also written. A node is the first sibling if the previous stack operation was a push when its opening token is encountered. The previous operation is set to pop for a leaf token, as it can be thought of as an open token followed by the corresponding close token. The method takes a small, constant amount of time per token and requires a constant amount of space, independent of the input XML size, which is a key advantage. The method is thus quite efficient.
The data written out above can be read into an array and used as follows. Given a node's pointer (index in the array), decrementing it by one produces a pointer to the node's last child. From there, by following preceding-sibling pointers, all siblings can be visited. Thus the tree can be traversed as required, including in pre-order and post-order. Each node stores its close token's end location. The node's open token's start location can be obtained by incrementing by one the end location stored in the preceding sibling, except for the first sibling, which stores the start location itself. For leaf nodes, the above provides the start and end location of the token. For non-leaf nodes, decrementing by one the start location in the first child gives the end location of the open token, and incrementing by one the end location in the last child gives the start location of the close token. Thus the start and end locations of both the open and close tokens are obtained. Tree traversal and token location can thus be done using a few array accesses and inexpensive increment/decrement operations in constant time per node, using quite a compact parse tree format.
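The basic method can be sketched as a single pass over the token stream, assuming tokens arrive as (kind, start, end) triples. Each output record is a (first-sibling flag, value, end location) tuple; the packed word layout of the hardware is not modeled:

```python
# Sketch of the basic method: one stack-driven pass emitting nodes in
# post-order. Each output record is (first_sibling_flag, value,
# end_loc): value is the token's start location for a first sibling
# and a preceding-sibling index otherwise.
def build_post_order(tokens):
    out = []              # post-order node records
    stack = []            # (start_loc, first_flag, sibling_ptr) per open node
    prev_push = True      # previous stack operation; treat start as a push
    for kind, start, end in tokens:
        if kind == "OPEN":
            # first sibling iff the previous stack operation was a push;
            # the preceding sibling, if any, is the last node written out
            stack.append((start, prev_push, len(out) - 1))
            prev_push = True
        elif kind == "CLOSE":
            o_start, first, sib = stack.pop()
            out.append((first, o_start if first else sib, end))
            prev_push = False
        else:                             # LEAF acts as open then close
            first = prev_push
            out.append((first, start if first else len(out) - 1, end))
            prev_push = False
    return out
```

For input shaped like &lt;a&gt;&lt;t/&gt;&lt;u/&gt;&lt;/a&gt;, the children come out first and then the parent, so the parent's index minus one is its last child, as the read-back procedure above assumes.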
A disadvantage of the above parse tree is that siblings can be traversed in only one direction, and in reverse order at that, from last to first. The following method produces a parse tree with bidirectional sibling pointers.
Two phase method
A straightforward approach to a parse tree with bidirectional sibling pointers would be to store the parse tree being generated in a random access memory, so that pointer values can be added to previously written node data later, when known. Doing so would require relatively complex control logic and storage of the entire parse tree in hardware before it can be written out. In contrast, the proposed method operates in two phases: the first phase processes the token stream and writes out the node data in pre-order, while the second phase processes the reversed token stream and writes out node data in post-order. Doing so has two advantages: the logic required is simpler, and the storage requirements are lower, since only the token stream needs to be stored in hardware and not the parse tree data, which can be written out as generated.
In phase one, part of each token's data (to be used in the second phase) is pushed onto a separate stack (accessed by push′, pop′ and empty′). Data is written out in pre-order (node data is immediately followed by its subtrees). The data written for each node is the start location of the token (of the opening token for non-leaf nodes) and a pointer to the preceding sibling (or 0 for the last sibling). In phase two, tokens are popped from the separate stack, providing the reversed token stream. Data is now written in post-order and, since the stream is reversed, so is the processing for the open and close tokens. The data written for each node is a pointer to the (preceding in reverse order, hence) succeeding sibling, except for the last sibling (which has no succeeding sibling), for which the end location of the token (of the closing token for non-leaf nodes) is written instead. To distinguish between the two, a one-bit flag is also written.
The data written out can be read into two arrays, one for each phase's output. Processing tokens in pre-order in the first phase and reversed tokens in post-order in the second phase ensures that nodes in the second array occur in exactly the reverse order of the first. This is a simple but crucial fact, since it enables, for any node with data at index i in one array, computation of the pointer to the node's data in the other array simply as s − i + 1, s being the array size in number of elements. Thus all data for a node can be simply accessed even though it is in separate arrays. Each node is now located next to its first child, with bidirectional pointers between siblings, significantly enhancing efficient tree traversal. Location data for non-leaf nodes is accessed in a manner dual to the above method, since now each node stores the start location of its open token, with the last sibling also storing the end location of its close token.
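The reversed-order correspondence is easy to state in code; with 0-based indexing, the text's s − i + 1 (1-based) becomes s − 1 − i:

```python
# The two arrays hold the same nodes in exactly reversed order, so the
# record for the node at index i in one array is at s - 1 - i in the
# other (0-based; the text's s - i + 1 is the 1-based equivalent).
def partner_index(i, s):
    return s - 1 - i
```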
With only a slight increase in hardware complexity, the parse tree can be annotated with other data such as subtree size, node nesting level, number of child nodes (attribute and other), pointers from a parent to its first non-attribute child and last child, and a parent pointer in each node. A highlight of the proposed approach is the representation of the parse tree using two arrays, the data in one occurring in the reversed node order of the other. Doing so significantly simplifies the logic complexity and also the storage required, since the token stream data stored on the stack is quite small compared to the parse tree (especially if it stores additional data fields such as the above). At the same time, all data for each node can be efficiently accessed in constant time in software. Further, parse tree data for each node can be sent to the host as soon as it is computed, instead of waiting for the whole parse tree to be completed. This overlapping of computation with communication reduces latency significantly.
3.8 Implementation
3.8.1 Overview
The XFA and YFA construction and state table generation methods described above are used to generate state table data for the given XML schema. The data is then loaded into state tables in the XML Accelerator hardware logic, whose block diagram is shown in figure 3. Next, the XML data is sent, which forms the input to the lexer (lexical analyzer).

Based on the lexical grammar in its state tables, the lexer converts the input into tokens. The tokens output by the lexer are supplied to the parser, which is essentially a hardware implementation of a tree automaton. The lexer, and the datapath and control logic of the parser, are described in the following sections.
3.8.2 Lexer
To achieve high throughput, the lexer should process multiple characters every clock cycle. While regular expression based lexers that process multiple characters every clock cycle can be implemented (see [1] and [18]), a more efficient and scalable approach is employed here. As can be seen from figure 2, two different types of lexers are implemented, with the splitter sending characters of the appropriate type to each, and the merger combining tokens from both. The splitter adds tag bits to the characters, which are carried through the lexers and used by the merger to output tokens in the correct sequence. Within each lexer, FIFOs are used at the input and output to even out the processing flow. The splitter processes four characters every clock cycle, partitioning them into those forming tags (which are sent to the regex lexer) and the remainder (which are sent to the fixed lexer). Every clock cycle, the fixed lexer also processes four characters, essentially whitespace and text, and outputs the corresponding tokens. As these tokens are schema independent, the fixed lexer logic is hardwired, resulting in a fast and compact implementation. The splitter and lexer are implemented using parallel prefix logic stages so that delays are logarithmic, not linear, in the number of characters processed every cycle, so they can be scaled in a straightforward manner to much higher throughputs. The regex lexer processes tags, one character per clock cycle, using a schema dependent state table stored in block RAMs, which also yields a reasonably fast and compact implementation. Also, using the splitter/merger approach, multiple regex lexers can be utilized, speeding up tag processing as well. Since the current parser processes at most one token each cycle, the merger is implemented to output one token every cycle, but its throughput can be scaled up like that of the splitter.
Thus the current lexer implementation can efficiently process up to five characters every clock cycle, and, if required, can be scaled to much higher throughputs as well.
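The splitter/merger scheme above can be illustrated with a small software sketch. This is purely behavioral and the names (split, merge) and the use of explicit character positions are illustrative assumptions; the actual design is parallel prefix hardware logic processing four characters per cycle, with tag bits rather than positions carried through the lexers.

```python
# Behavioral sketch of the splitter/merger scheme (illustrative only;
# the hardware uses parallel prefix logic, not sequential code).

def split(chars):
    """Partition characters into a tag stream (for the regex lexer)
    and a text/whitespace stream (for the fixed lexer).

    Each character is annotated with its position, playing the role
    of the tag bits the merger uses to restore document order."""
    tag_stream, text_stream = [], []
    in_tag = False
    for pos, ch in enumerate(chars):
        if ch == '<':
            in_tag = True
        (tag_stream if in_tag else text_stream).append((pos, ch))
        if ch == '>':
            in_tag = False
    return tag_stream, text_stream

def merge(tag_tokens, text_tokens):
    """Combine the outputs of both lexers back into document order,
    using the carried position annotations."""
    return [tok for _, tok in sorted(tag_tokens + text_tokens)]
```

The carried positions here do the job of the tag bits in the hardware: they let the merger interleave tokens from the two independently running lexers in the correct sequence.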
3.8.3 Parser Datapath
The datapath used to implement method 4 is shown in figure 1. In spite of the apparent complexity of tree automata operation, the logic required for it turns out to be remarkably simple and efficient. The XFA and YFA state tables and the state register in logic correspond respectively to xtab, ytab and the state variable in the method. The datapath performs the stack operation for an OPEN token in one clock cycle, while the stack operation and XFA table lookup, followed by the YFA table lookup, required for a CLOSE token take two clock cycles, as do the XFA and YFA table lookups for a LEAF token (the muxes are required for the leaf path). Thus each token is processed in one or two clock cycles.
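The per-token behavior of the datapath can be sketched in software as follows. The table encodings and the process_token interface are assumptions for illustration (method 4 itself is not reproduced in this excerpt); in the hardware, each push/pop and each table lookup completes in a single clock cycle.

```python
# Behavioral sketch of the parser datapath: a stack plus the XFA and
# YFA state tables, mirroring xtab, ytab and the state variable of the
# method. Table keys/values below are illustrative assumptions.

def process_token(token, sym, state, stack, xtab, ytab):
    """Process one token and return the new parser state."""
    if token == 'OPEN':            # one cycle: push current state,
        stack.append(state)        # clear state for the new subtree
        return 0
    if token == 'LEAF':            # two cycles: XFA lookup, then YFA lookup
        x = xtab[(0, sym)]
        return ytab[(state, x)]
    if token == 'CLOSE':           # two cycles: pop + XFA lookup, then YFA lookup
        parent = stack.pop()
        x = xtab[(state, sym)]
        return ytab[(parent, x)]
    raise ValueError('unknown token kind: %r' % token)
```

The sketch makes visible why the logic is so compact: every token reduces to at most one stack operation and two table reads.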
The critical paths, as can be seen, are quite short. Not shown in the figure are the connections to an additional read/write port in each state table required for reading and writing state table data. The second port was implemented with no performance overhead using the FPGA's dual port RAM blocks. An earlier version with muxes on address and data out paths of a single port RAM proved significantly slower.
3.8.4 Parser Control Logic
The inputs and outputs of the control logic are boolean signals each of which can take a value of either 0 or 1. The control logic is described below as an FST. The FST description can be converted into a boolean logic circuit realization in a reasonably straightforward manner by a person of ordinary skill in the art of boolean logic circuit design.
The set of input boolean signals is {pause, done, leaf, open, stack_empty}. For the FST, each condition specifies values of a subset of the input signals. The set of output boolean signals is {push, pop, load, clear, done_out, rd_fifo}. For the FST, each action is specified as a subset of output signals to which the value 1 is to be assigned (the value 0 is assigned to the remaining output signals).
A basic realization of the FST is shown in figure 4. The set of states is {i, j, p, q, r, s, t, u, z}. The initial state is i and the final state is z. Each transition is represented by an edge directed from one state to another. Each edge is labeled by an ordered pair, whose first and second elements represent the condition and action respectively for the transition. The condition is represented as an ordered pair, whose first element as well as second element is a set of input signals. The condition is true if and only if all signals in the former and latter are 0 and 1 respectively. The action is represented as a set of output signals which are to be assigned the value 1 (the remaining output signals being assigned the value 0) when the transition occurs. For example, in the transition from u to p, the condition is true if the input signals done and pause have values 0 and 1 respectively, and the action is to assign the value 1 to the load output signal and the value 0 to the remaining output signals. If both sets for a condition are empty, it is always true. If the set for an action is empty, all output signals are assigned the value 0.
States i, j are start states, z is the final state and p, q, r, s, t, u form the core states. Before beginning an iteration, the p state is active as long as pause is asserted; the q state is entered next while asserting rd_fifo (an invariant for entering q). Three possible iterations depend on the values of leaf and open as shown in table 1. After each iteration, p is entered if pause is asserted, else q is entered with rd_fifo asserted.
The above FST is simple to understand but not very practical because it does not detect any error conditions. A more complex and more practical FST is shown in figure 5. The FST's input signal set is {pause, done, leaf, open, sym_eq, zerostate, stack_empty} and its output signal set is {push, pop, load, clear, err, done_out, rd_fifo}. Its set of states is {c, d, e, f, g, h, i, j, p, q, r, s, t, u, z}. The additional states c, d, e, f, g, h are various error states, each of which corresponds to a specific error as described in table 2.
A straightforward realization of the FST as a boolean logic circuit would require only two additional signals, the inputs clk and reset.
References
[1] BRODIE, B. C., TAYLOR, D. E., AND CYTRON, R. K. A scalable architecture for high-throughput regular-expression pattern matching. In Proceedings of the 33rd Annual International Symposium on Computer Architecture (Washington, DC, USA, 2006), ISCA '06, IEEE Computer Society, pp. 191-202.
[2] BRUGGEMANN-KLEIN, A., AND WOOD, D. Redefining the XML language. http://www.cs.ust.hk/~dwood/preprints/xmlRulesRewritten.htm.
*The errors e_init_pause and e_done_pause are present because, for the specific implementation, it is expected that pause=1 at start-up as well as at the end (just before done=1).

[3] COMON, H., DAUCHET, M., GILLERON, R., LODING, C., JACQUEMARD, F., LUGIEZ, D., TISON, S., AND TOMMASI, M. Tree automata techniques and applications. Available on: http://www.grappa.univ-lille3.fr/tata, 2007. Release October 12th, 2007.
[4] DAI, Z., NI, N., AND ZHU, J. A 1 cycle-per-byte XML parsing accelerator. In FPGA '10: Proceedings of the 18th Annual ACM/SIGDA International Symposium on Field Programmable Gate Arrays (New York, NY, USA, 2010), ACM, pp. 199-208.
[5] DELLA-LIBERA, G. M., AND LUCCO, S. E. Modular forest automata. Patent Application, 04 2010. US 2010/0094906 A1.
[6] INTEL. Intel SOA Expressway performance comparison to IBM DataPower XI50. http://www.omg.org/news/whitepapers/sponsors-wp/Intel_SOAE_DataPower_Perf_Compare_White_Paper.pdf.
[7] LEVENTHAL, M., AND LEMOINE, E. The XML chip. http://www.textscience.com/papers/TheXMLChip.pdf.
[8] LEVENTHAL, M., AND LEMOINE, E. The XML chip at 6 years. In Proceedings of the International Symposium on Processing XML Efficiently: Overcoming Limits on Space, Time, or Bandwidth (August 2009), vol. 4. http://www.balisage.net/Proceedings/vol4/html/Leventhal01/BalisageVol4-Leventhal01.html.
[9] LOWE, W. M., NOGA, M. L., AND GAUL, T. S. Foundations of fast communication via XML. Ann. Softw. Eng. 13, 1-4 (2002), 357-379.
[10] LU, W., CHIU, K., AND PAN, Y. A parallel approach to XML parsing. In The 7th IEEE/ACM International Conference on Grid Computing (2006).
[11] MOSCOLA, J., LOCKWOOD, J. W., AND CHO, Y. H. Reconfigurable content-based router using hardware-accelerated language parser. ACM Trans. Des. Autom. Electron. Syst. 13, 2 (2008), 1-25.
[12] MURATA, M. Hedge automata: a formal model for XML schemata. http://www.xml.gr.jp/relax/hedge_nice.html.
[13] MURATA, M., LEE, D., AND MANI, M. Taxonomy of xml schema languages using formal language theory. In Extreme Markup Languages (2001).
[14] NEUMANN, A. Parsing and Querying XML Documents in SML. PhD thesis, University of Trier, December 1999.

[15] TOZAWA, A., AND MURATA, M. Generating automata for validating XML documents, and validating XML documents. Patent, 05 2006. US 7055093.
[16] WU, Y., ZHANG, Q., YU, Z., AND LI, J. A hybrid parallel processing for XML parsing and schema validation. In Proceedings of Balisage: The Markup Conference 2008 (August 2008), vol. 1. http://www.balisage.net/Proceedings/vol1/html/Wu01/BalisageVol1-Wu01.html.
[17] XILINX. Bus master DMA performance demonstration reference design for the Xilinx endpoint PCI Express solutions. http://www.xilinx.com/support/documentation/application_notes/xapp1052.pdf.
[18] YAMAGAKI, N., SIDHU, R. P. S., AND KAMIYA, S. High-speed regular expression matching engine using multi-character NFA. In FPL 2008, International Conference on Field Programmable Logic and Applications (2008), pp. 131-136.
[19] ZHANG, AND KLOVETTB. Ximpleware W3C position paper. http://www.w3.org/2003/08/binary-interchange-workshop/20-ximpleware-positionpaper-updated.htm.

WE CLAIM
1. A hardware implementation method for XML processing using an improved XML accelerator,
wherein,
the XML data characters are sent to a Lexical Analyzer, which processes multiple characters every clock cycle and outputs tokens;
the said tokens are input to and processed by the Parser, whose processing may include schema validation, query, well-formedness checking and parse tree output.
2. A method as claimed in claim 1, wherein in the Lexical Analyzer the characters are split and carried through a plurality of Lexers and merged to output tokens in the correct sequence.
3. A method as claimed in claim 1, wherein in the Parser the tokens are processed by a hardware logic realization of a tree automata based parsing method.
4. A method as claimed in claim 2, wherein the Parser is comprised of a datapath part and a control part;
said datapath part is comprised primarily of two state machine tables, a stack and a comparator.
5. A method as claimed in claim 2, wherein the Parser processes each token in at most two clock cycles.
6. A method as claimed in claim 1, wherein the parse tree output operates in two phases: a first phase processes the token stream and writes out the node data in pre-order, and a second phase processes the reversed token stream and writes out the node data in post-order.
7. A method as claimed in claim 1, wherein the Parser performs one or more of:
schema validation;
query;
well formedness checking;
parse tree output.
