
Method and XQuery Builder for Dynamically Generating an XQuery

Abstract: A method and an XQuery Builder are provided for dynamically generating an XQuery for an XML database (1) storing a plurality of non-XML documents (10), each non-XML document having a corresponding shadow XML document (20) in the XML database (1). The method comprises the steps of:
- providing a plurality of static units (30, 33) of XQuery code, the static units (30, 33) being predefined in accordance with the non-XML documents (10);
- combining one or more of the static units (30, 33) with dynamic input (32) from a user to generate the XQuery.


Patent Information

Filing Date: 22 October 2007
Publication Number: 18/2009
Publication Type: INA
Invention Field: COMPUTER SCIENCE

Applicants

SOFTWARE AG
UHLANDSTR. 12 64297 DARMSTADT

Inventors

1. SAYED ZAINAB GAZIUDDIN
796, DASTUR MEHER ROAD, PUNE-411001
2. BANERJEE ARJUN
FLAT-A2, 5 PURNA MITRA LANE, TOLLYGUNGE, KOLKATA 700033

Specification

Method and XQuery Builder for dynamically generating an XQuery
1. Technical field
The present invention relates to a method and an XQuery Builder for dynamically generating an XQuery for an XML database storing a plurality of non-XML documents.
2. The prior art
XML databases are among the most important technical tools of modern information societies. Their high degree of flexibility allows data to be stored and retrieved in a highly efficient manner. Generally, XML databases are designed for XML documents. However, it is also known in the prior art to extend an XML database so that it is capable of storing other types of documents. For example, the applicant's XML database Tamino is adapted to store non-XML documents such as plain text files, MS Office files, PDF files, images and audio files. To enable the later retrieval of such non-XML documents from the database, it is known to analyze each non-XML document to be stored and to extract metadata for generating a so-called shadow XML document corresponding to the non-XML document. Using XQuery, such shadow XML documents can later be searched and the corresponding non-XML document can be retrieved.
Since XQuery originally provides only limited Text Retrieval (TR) functionality, it is known in the prior art to extend its capabilities with additional TR indexes and dedicated TR query engines. For example, the XQuery implementation in the applicant's Tamino XML database is delivered with a package of common TR functions, such as "contains", "near" and "adjacent".

XQuery Builders are tool applications for databases, typically contained in data administration and inspection packages. XQuery Builders enhance users' productivity by shielding them from the sometimes complicated syntax and semantics of the query language. Instead, the user is given GUI-supported access to a restricted subset of the query language's functionality, which enables the user to do a considerable part of his or her routine work without knowing the syntax and semantics of the query language. An example of such a prior-art XQuery Builder is disclosed in US 2006/0101002.
However, XQuery Builders of the prior art are not suitable for searching through heaps of unknown and uncategorized data, in particular text data, since they still require a high level of skill from the user. One aspect of the present invention is therefore based on the technical problem of facilitating searches among such documents while keeping as much flexibility for the user as possible, so that the user can obtain all desired information from a collection of non-XML documents.
3. Summary of the invention
In one aspect of the present invention, this problem is solved by a method according to claim 1. In one embodiment, a method is provided for dynamically generating an XQuery for an XML database storing a plurality of non-XML documents, each non-XML document having a corresponding shadow XML document in the XML database, wherein the method comprises the steps of:

- providing a plurality of static units of XQuery code, the static units being predefined in accordance with the non-XML documents;
- combining one or more of the static units with dynamic input from a user to generate the XQuery.
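The two steps above can be sketched in Python as simple string assembly. This is a minimal, hypothetical illustration, not the claimed implementation: the document-set keys, the collection and doctype names in the static units, and the function `generate_xquery` are all assumptions made for this example. It shows only that the static units differ per set of non-XML documents, while the user contributes the dynamic part.

```python
# Step a: static XQuery units, predefined per set of non-XML documents.
# The collection ("HR", "Clinic") and doctype ("CV", "PatientRecord") names
# are hypothetical placeholders chosen for this sketch.
STATIC_UNITS = {
    "cv": {
        "for": 'for $d in collection("HR")/CV',
        "return": "return $d",
    },
    "patients": {
        "for": 'for $d in collection("Clinic")/PatientRecord',
        "return": "return $d",
    },
}


def generate_xquery(document_set: str, user_condition: str) -> str:
    """Step b: combine the static units of one document set with dynamic user input."""
    units = STATIC_UNITS[document_set]
    return "\n".join([units["for"], "where " + user_condition, units["return"]])


# The user supplies only the condition; everything else is predefined.
print(generate_xquery("cv", 'contains($d, "Java")'))
```

Choosing a different document set swaps the predefined for and return clauses while the user's condition stays the same.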
As a result, the user can easily generate an XQuery that is specifically adapted to a certain set of non-XML documents (and their corresponding shadow XML documents) by flexibly combining the predefined static XQuery code units, which are themselves adapted to those documents. For example, if the non-XML documents are text documents, the predefined code units may contain TR functions, which the user can then easily combine or adjust as required.
To implement the described method, a "data architect" could first predefine the static code units and then let the user combine the units and, if desired, even add a free-form part to the resulting query. Such a data architect could therefore serve as a mediator between inexperienced users and the data to be retrieved by the XQuery, much as a librarian facilitates access to the books of a library. In one embodiment, the dynamic input from the user is obtained by presenting a GUI to the user, the GUI providing one or more buttons relating to the one or more static units.
In one embodiment, a FLWOR expression is provided in the providing step (step a. of claim 1), comprising a static for clause predefined in accordance with the non-XML documents. The for clause indicates the collection and the doctype that contain the documents on which the query is to be executed. The XQuery is executed on the shadow XML documents.

The FLWOR expression may further comprise a static return clause predefined in accordance with the non-XML documents. The static return clause may contain the relevant information about the result set, i.e. the corresponding ino:id(s), ino:docname(s), etc. In addition, the FLWOR expression may comprise a static let clause for declaring a variable, for example a creation date of the non-XML documents. Finally, the FLWOR expression may comprise a where clause including at least one dynamically defined user criterion for the XQuery. The where clause is not restricted to a single criterion but may comprise a plurality of user criteria combined by Boolean operators.
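A FLWOR expression with static for, let and return clauses and a dynamic where clause could be modelled as follows. This is a hypothetical Python sketch: the `FlworQuery` class, the clause texts and the element names (`CV`, `Skills`, `CreationDate`, `Title`) are assumptions, and the standard XQuery function `contains()` stands in for any product-specific TR function.

```python
from dataclasses import dataclass, field


@dataclass
class FlworQuery:
    """FLWOR expression: for/let/return are static, where is built from user criteria."""

    for_clause: str                                  # static, predefined by a data architect
    return_clause: str                               # static, predefined by a data architect
    let_clauses: list = field(default_factory=list)  # static variable declarations
    criteria: list = field(default_factory=list)     # dynamic (operator, condition) pairs

    def add_criterion(self, condition: str, operator: str = "and") -> None:
        """Add a user criterion; the operator links it to the previous criterion."""
        if operator not in ("and", "or"):
            raise ValueError("operator must be 'and' or 'or'")
        self.criteria.append((operator, condition))

    def render(self) -> str:
        lines = [self.for_clause, *self.let_clauses]
        if self.criteria:
            # The first criterion's operator is ignored. Criteria are joined
            # without extra parentheses, so XQuery's usual precedence applies
            # ("and" binds more tightly than "or").
            where = self.criteria[0][1]
            for operator, condition in self.criteria[1:]:
                where += f" {operator} {condition}"
            lines.append("where " + where)
        lines.append(self.return_clause)
        return "\n".join(lines)


query = FlworQuery(
    for_clause='for $d in collection("HR")/CV',
    let_clauses=["let $date := $d/CreationDate"],
    return_clause="return $d/Title",
)
query.add_criterion('contains($d/Skills, "XQuery")')
query.add_criterion('$date >= "2006-01-01"')
print(query.render())
```

Keeping the static clauses as plain fields means a data architect can fix them once per document set, while `add_criterion` is the only method a GUI would need to call.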
In one embodiment, the non-XML documents are text documents, in particular Microsoft Office documents and/or Adobe PDF documents. Each text document may comprise predefined text parts and free-form text.
According to another aspect, the present invention relates to an XQuery Builder for an XML database storing a plurality of non-XML documents, each non-XML document having a corresponding shadow XML document in the XML database, the XQuery Builder being adapted to perform any of the methods described above. Such an XQuery Builder may be part of a larger database management system (DBMS).
Finally, a computer program is provided comprising instructions adapted to perform any of the described methods.
4. Short description of the drawings
Fig. 1: A schematic representation of an XML database system in which an embodiment of the present invention can be implemented; and

Fig. 2: A schematic representation of an XQuery generated by an embodiment of the present invention.
5. Detailed description of preferred embodiments
In the following, exemplary embodiments of the method of the present invention are described. It will be understood that the functionality described below can be implemented in a number of alternative ways, for example on a single database server, in a distributed arrangement of a plurality of database servers, with integral or external storage, etc. None of these implementation details is essential to the present invention.
Fig. 1 presents an overview of an exemplary XML database system 1. The system 1 generally serves to store and retrieve XML documents (not shown in Fig. 1). However, the XML database system of Fig. 1 is also capable of processing non-XML documents such as the exemplary file 10 shown in Fig. 1. The file 10 can be any type of non-XML document, e.g. a text file in any kind of format (Word, PDF), a video file, an audio file, a combination thereof, an image, or an arbitrary set of binary data such as measurement results.
In one embodiment, the XML database system 1 comprises a document processor 2 for processing the file 10. The document processor 2 drives the process of storing a document. As illustrated by the dotted arrow on the left side of Fig. 1, the file 10 is stored in the storage means 3, for example a RAID array (not shown) or a similar storage device of the XML database system 1. Any volatile or non-volatile storage means known to the person skilled in the art can be used as the storage means 3 of the XML database system 1.
In addition, the file 10 is forwarded to a schema processor 4. The schema processor 4 and the further elements of the XML database system 1 shown on the right side of Fig. 1 serve to process the file 10 so that it can be searched and retrieved like the other XML documents stored in the database. In the exemplary embodiment of Fig. 1, the schema processor 4 provides information about a server extension 5 to be called. It should be noted that the server extension 5 could also be integrated into the standard processing engine of a database server of the overall XML database system and does not have to be provided as a separate entity. However, providing a separate server extension 5 makes it easier to upgrade an existing XML database system with functionality for handling non-XML files such as the file 10.
The server extension 5 processes the file 10 and generates content for a shadow XML document 20. Depending on the type of the file 10, different steps can be performed to generate the shadow XML document 20. For example, image processing may be performed on an image file 10, outputting metadata about the image such as its resolution, colour distribution or any other type of image-related information. Other types of non-XML files may be processed similarly to generate any kind of metadata for the shadow XML document 20. A search can then be performed on the shadow XML document 20, which allows the corresponding non-XML file 10 to be retrieved quickly from the database.
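Generating a shadow XML document from extracted metadata might look like the following minimal sketch, which uses Python's standard `xml.etree.ElementTree`. The element and attribute names (`shadow`, `docname` and the metadata keys) are hypothetical; an actual server extension such as the one described here defines its own shadow document structure.

```python
import xml.etree.ElementTree as ET


def make_shadow_document(doc_name: str, metadata: dict) -> str:
    """Serialize metadata extracted from a non-XML file as a shadow XML document.

    Metadata keys become element names, so they must be valid XML names.
    """
    root = ET.Element("shadow", {"docname": doc_name})
    for key, value in metadata.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Metadata that image processing might have produced for an image file.
print(make_shadow_document("photo.png", {"resolution": "1024x768", "colourDepth": 24}))
# <shadow docname="photo.png"><resolution>1024x768</resolution><colourDepth>24</colourDepth></shadow>
```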
A presently preferred embodiment of the XML database system explained above is available from the applicant under the name Tamino. The server extension of the applicant's Tamino database system is called Tamino Non-XML Indexer. It integrates non-XML files, for example Microsoft Office documents or Adobe PDF documents, into the Tamino database system. When a non-XML file is stored or updated in a Tamino database collection in which the Tamino Non-XML Indexer is active, Tamino stores two objects: the non-XML file itself and its shadow file, which comprises the raw data contained in the file (for example the plain ASCII text of a Microsoft Word file) and the metadata extracted from the file.
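The two-object principle (original file plus searchable shadow) can be illustrated with a toy in-memory collection. This is a hypothetical sketch that says nothing about how Tamino actually stores data; the `Collection` class, its methods and the sample data are assumptions. What it shows is that a search runs against the shadow content but returns the original non-XML file.

```python
from dataclasses import dataclass


@dataclass
class StoredDocument:
    raw: bytes    # the non-XML file itself, stored unchanged
    shadow: dict  # its shadow: extracted plain text plus metadata


class Collection:
    """Toy stand-in for a database collection with an active non-XML indexer."""

    def __init__(self) -> None:
        self._docs = {}  # document name -> StoredDocument

    def store(self, name: str, raw: bytes, plain_text: str, metadata: dict) -> None:
        """Store (or update) a non-XML file together with its shadow."""
        shadow = {"text": plain_text, **metadata}
        self._docs[name] = StoredDocument(raw, shadow)

    def search(self, word: str) -> list:
        """Match against the shadows' text, but return the original files."""
        return [doc.raw for doc in self._docs.values() if word in doc.shadow["text"]]


cvs = Collection()
cvs.store("cv_anna.doc", b"<binary Word data>", "Java and XQuery developer", {"author": "Anna"})
print(cvs.search("XQuery"))  # [b'<binary Word data>']
```

Storing again under the same name replaces both objects at once, which mirrors the store-or-update behaviour described above.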
An XQuery for retrieving a shadow XML document and the corresponding non-XML document will typically contain a FLWOR expression. In fact, FLWOR expressions are at the heart of XQuery, because they allow a query to be structured logically. A FLWOR expression contains clauses introduced by the keywords for, let, where and return. A FLWOR expression begins with at least one for or let clause, which may be followed by a where clause, and ends with the return clause. An example of a generated XQuery comprising all four clauses is shown in Fig. 2.
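The clause-order rule stated above can be checked mechanically. The following hypothetical helper accepts a list of clause keywords; besides the clauses named in the text, it also allows the optional `order by` clause of XQuery 1.0 (written here as the single keyword `order` for simplicity), which may appear between where and return.

```python
import re

# XQuery 1.0 FLWOR clause order: (for | let)+ where? (order by)? return
_FLWOR_ORDER = re.compile(r"(?:(?:for|let) )+(?:where )?(?:order )?return")


def is_valid_flwor(clauses: list) -> bool:
    """Return True if the clause keywords form a valid FLWOR clause sequence."""
    return _FLWOR_ORDER.fullmatch(" ".join(clauses)) is not None


print(is_valid_flwor(["for", "let", "where", "return"]))  # True
print(is_valid_flwor(["where", "return"]))                # False: no for/let clause
```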
Generating such an XQuery becomes substantially easier for a user who is not familiar with the syntax and semantics of the language if those clauses of the FLWOR expression that are very likely to be reused across different XQueries are static, i.e. predefined. In the example of Fig. 2, the for clause 30 is static and indicates the collection and doctype that contain the documents on which the query is to be executed. The clause 30 depends on a given set of non-XML documents and will be the same for many different XQueries performed on this set. The let clause 31 is also used in a static way; it serves for variable declaration, e.g. of CreationDate.
The where clause 32 dynamically aggregates XQuery fragments corresponding to different user-defined criteria. The XQuery fragments may be created dynamically by the user, or they may be predefined and merely selected by the user. There may be more than one where clause 32 in an XQuery (not shown in Fig. 2). Further, several conditions can be combined within a where clause using Boolean operators, thereby providing a high degree of flexibility for the user. Using the where clause 32, conditions can be defined that the tuples generated by the preceding for and let clauses must satisfy. If the condition is met, the tuple is retained; if not, the tuple is discarded.
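The tuple semantics of the for, let, where and return clauses can be imitated in plain Python to make the retain-or-discard behaviour concrete. This is only an illustrative sketch: the shadow documents are hypothetical dictionaries, and `evaluate_flwor` is a helper invented for this example, not part of any XQuery engine.

```python
# Hypothetical shadow documents, represented as dictionaries.
SHADOWS = [
    {"docname": "cv_anna.pdf", "skills": "XQuery Java", "creation_date": "2007-03-01"},
    {"docname": "cv_ben.doc", "skills": "COBOL", "creation_date": "2005-11-20"},
    {"docname": "cv_chen.doc", "skills": "Java", "creation_date": "2007-06-15"},
]


def evaluate_flwor(shadows, condition):
    """Imitate: for $d in ... let $date := ... where condition return $d/docname."""
    results = []
    for d in shadows:                 # for: one tuple per shadow document
        date = d["creation_date"]     # let: bind a variable within the tuple
        if not condition(d, date):    # where: discard tuples that fail
            continue
        results.append(d["docname"])  # return: evaluated once per retained tuple
    return results


# ISO dates compare correctly as strings.
print(evaluate_flwor(SHADOWS, lambda d, date: "Java" in d["skills"] and date >= "2007-01-01"))
# ['cv_anna.pdf', 'cv_chen.doc']
```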
The static return clause 33 contains the relevant information about the result set, i.e. the corresponding ino:id(s), ino:docname(s), etc. It determines the result of the whole FLWOR expression and is invoked for every tuple retained after the where clause 32 has been evaluated. The return value can be formatted. Again, for many XQueries on a given set of data, the return clause 33 will be identical, so that this clause can also be static.
A preferred field of use for the described mechanism relates to documents containing a considerable amount of text stored in Tamino (or a similar XML database system). Here, the mechanism makes it easier to make full use of Tamino's Text Retrieval capabilities in combination with the "normal" XQuery features.
The mechanism may, for example, be used for documents that combine free-form text with predefined fixed-form fields, or with predefined standard text inside the free-form text. An important example is a search in curriculum vitae (CV) documents by the human resources department of a company. A CV is typically an MS Word or PDF document describing a person's education, skills and career in terms of previous projects. Another relevant example is a search in patient data records within any kind of healthcare system. In both situations, there are likely to be tens of thousands of semi-structured text documents to be managed.
To implement the described method in an embodiment of an XQuery Builder according to the invention, a "data architect" could initially prepare a set of fixed-form XQueries that are most suitable for the respective set of non-XML documents and make them accessible to a user. For example, a graphical user interface (GUI) can be provided with corresponding selection buttons. In addition to selecting one of the fixed-form XQueries, the user can generate new XQueries with FLWOR expressions as described above, e.g. by arbitrarily combining selections of the GUI buttons and/or by adding free-form queries.
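A GUI of this kind essentially maps each button to a predefined query fragment. The sketch below shows one hypothetical way to turn pressed buttons and an optional free-form condition into a complete XQuery; the button labels, the fragments, the static clauses and the function `query_from_gui` are assumptions for a CV collection, not part of the described product. Each fragment is wrapped in parentheses so that combining fragments with `and` or `or` cannot change the meaning of an individual fragment.

```python
# Hypothetical button labels mapped to where-clause fragments prepared
# by a data architect for a CV collection.
BUTTONS = {
    "Knows Java": 'contains($d/Skills, "Java")',
    "Knows XQuery": 'contains($d/Skills, "XQuery")',
    "Recent CV": '$date >= "2006-01-01"',
}

# Static clauses, also predefined by the data architect.
FOR_CLAUSE = 'for $d in collection("HR")/CV'
LET_CLAUSE = "let $date := $d/CreationDate"
RETURN_CLAUSE = "return $d"


def query_from_gui(pressed: list, operator: str = "and", free_form: str = "") -> str:
    """Build an XQuery from the pressed buttons plus an optional free-form condition."""
    if operator not in ("and", "or"):
        raise ValueError("operator must be 'and' or 'or'")
    conditions = [f"({BUTTONS[label]})" for label in pressed]
    if free_form:
        conditions.append(f"({free_form})")
    lines = [FOR_CLAUSE, LET_CLAUSE]
    if conditions:
        lines.append("where " + f" {operator} ".join(conditions))
    lines.append(RETURN_CLAUSE)
    return "\n".join(lines)


print(query_from_gui(["Knows Java", "Recent CV"]))
```

With no button pressed and no free-form text, the sketch simply omits the where clause and returns every document of the predefined collection.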
The user does not have to inspect the non-XML documents in order to generate meaningful XQueries as explained above. If the data architect has some knowledge of the internal structure of the non-XML documents, this will be helpful for predefining parts of the FLWOR expression; however, it is not necessary. In general, the invention allows all of the non-XML documents to be stored "as-is" in the XML database system, e.g. Tamino, while the described mechanism allows searches to be performed on them.

Claims
1. Method for dynamically generating an XQuery for an XML database (1)
storing a plurality of non-XML documents (10), each non-XML document
having a corresponding shadow XML document (20) in the XML database
(1), the method comprising the steps of:
a. providing a plurality of static units (30, 33) of XQuery code, the static
units (30, 33) being predefined in accordance with the non-XML
documents (10);
b. combining the one or more static units (30, 33) with dynamic input
(32) from a user to generate the XQuery.
2. Method according to claim 1, wherein a FLWOR expression is provided in
step a. comprising a static for clause (30) predefined in accordance with the
non-XML documents.
3. Method according to claim 2, wherein the FLWOR expression further comprises a static return clause (33) predefined in accordance with the non-XML documents.
4. Method according to claim 2 or 3, wherein the FLWOR expression comprises a where clause (32) including at least one dynamically defined user criterion for the XQuery.
5. Method according to claim 4, wherein the where clause comprises a plurality of user criteria combined by logical operators.
6. Method according to one of the preceding claims, wherein the non-XML documents (10) are text documents, in particular Microsoft Office documents and/or Adobe PDF documents.
7. Method according to claim 6, wherein each text document (10) comprises
free-form text and possibly predefined text parts.
8. Method according to any of the preceding claims, wherein the dynamic input from the user is obtained by presenting a GUI to the user, the GUI providing one or more buttons relating to the one or more static units.
9. XQuery Builder for an XML database (1) storing a plurality of non-XML
documents (10), each non-XML document (10) having a corresponding
shadow XML document (20) in the XML database (1), the XQuery Builder
being adapted to perform a method of any of the preceding claims 1-8.
10. Database management system including an XQuery Builder according to
claim 9.
11. Computer program comprising instructions adapted to perform a method of
any of the preceding claims 1-8.

Documents

Application Documents

# Name Date
1 01435-kol-2007-abstract.pdf 2011-10-07
2 abstract-01435-kol-2007.jpg 2011-10-07
3 1435-KOL-2007-PA.pdf 2011-10-07
4 01435-kol-2007-claims.pdf 2011-10-07
5 1435-KOL-2007-FORM 3-1.1.pdf 2011-10-07
6 01435-kol-2007-correspondence others.pdf 2011-10-07
7 01435-kol-2007-description complete.pdf 2011-10-07
8 1435-KOL-2007-CORRESPONDENCE-1.1.pdf 2011-10-07
9 1435-KOL-2007-CORRESPONDENCE OTHERS 1.1.pdf 2011-10-07
10 01435-kol-2007-drawings.pdf 2011-10-07
11 1435-KOL-2007-ASSIGNMENT.pdf 2011-10-07
12 01435-kol-2007-form 1.pdf 2011-10-07
13 01435-kol-2007-form 3.pdf 2011-10-07
14 01435-kol-2007-form 2.pdf 2011-10-07