SERIALIZING DOCUMENT EDITING COMMANDS
BACKGROUND
[0001] World Wide Web ("Web") applications have been developed that allow the
creation and editing of rich documents. For instance, Web applications are available for
creating and editing word processing documents, spreadsheets, presentations, and other
types of documents. These documents might also be created and edited in a compatible
client application. For instance, a word processing client application might be executed on
a desktop or laptop computer and utilized to create a word processing document. The
word processing document might then be edited utilizing a suitable Web application.
[0002] One problem with current Web applications occurs when a user of the Web
application edits a document simultaneously with the editing of the document by another
user utilizing a client application. In this scenario, two versions of the document are
generated. One version of the document contains the edits made using the Web
application and a second version of the document contains the edits made using the client
application. It can be difficult to reconcile the changes between the two versions of the
document.
[0003] Another problem with current Web applications occurs when a client
application, such as a Web browser application, becomes disconnected from a server
hosting the Web application. In this scenario, it can be difficult to revert an edited
document to its previous state when a connection is reestablished. Consequently, edits to
a document can be lost when a disconnection occurs.
[0004] Other problems with current Web applications can occur because it can be
difficult to migrate in-progress editing sessions between server computers. For instance, if
a Web server that implements the Web application and hosts editing sessions becomes
overloaded, it can be difficult to migrate in-progress editing sessions to another server to
balance the load. Similarly, it can be difficult to upgrade the Web application on a server
computer that has in-progress editing sessions.
[0005] It is with respect to these and other considerations that the disclosure made
herein is presented.
SUMMARY
[0006] Technologies are described herein for serializing document editing commands.
Through an implementation of the concepts and technologies presented herein, a single
document can be generated that contains modifications to a document made using both a
Web application and a client application. Through an implementation of the concepts and
technologies presented herein, the edited state of a document can also be recreated
following the disconnection from a Web application. Additionally, servers hosting Web
applications can be load balanced and upgraded even while editing sessions are in-progress.
[0007] According to one aspect presented herein, a Web application is provided for
creating and editing documents. For instance, in one implementation, the Web application
provides functionality for creating and editing a presentation document using a
conventional Web browser application program. The Web application stores the
document or has access to a network location storing the document.
[0008] Commands for modifying the document are generated through the Web
browser application program and transmitted to the Web application executing on a server
computer. The Web application receives the commands and serializes the commands.
This might include, for instance, adding data to the commands indicating the time at which
the commands were received and arranging the commands in time order. The serialized
commands are then stored in a command stream. The command stream is stored
separately from the document. It should be appreciated that the command stream
represents the difference, which may be referred to herein as a "delta", between the
original document and its current state. Application of the commands stored in the
command stream to the document will result in the current state of the document.
[0009] According to another aspect, the command stream may be applied to the
document when a request is received via the Web application to save the document. For
instance, when a request is received to save the document, the commands in the command
stream may be applied to the document in serial order (i.e. the order in which the
commands were originally made). The document may then be saved once the commands
have been applied to the document.
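By way of illustration only, the save path described above might be sketched as follows; the dictionary-based command representation, the field names, and the text-append edits are illustrative assumptions rather than part of the disclosure:

```python
def save_document(original: str, command_stream: list) -> str:
    """Apply the serialized commands to the document in serial order
    (the order in which they were originally made) and return the
    document's current state, which may then be persisted."""
    current = original
    # Sort by the serialization key so replay follows the original order.
    for command in sorted(command_stream, key=lambda c: c["seq"]):
        current = command["apply"](current)
    return current

# Two illustrative commands, each recording its serial position.
stream = [
    {"seq": 1, "apply": lambda doc: doc + " Hello"},
    {"seq": 2, "apply": lambda doc: doc + " World"},
]
print(save_document("Slide 1:", stream))  # prints "Slide 1: Hello World"
```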
[0010] According to another aspect, the command stream described above may be
utilized to enable co-authoring. For instance, in one example, a client application might
modify a document to create a modified document. The Web application might be utilized
to edit the same document, resulting in the creation of a command stream. In order to
reconcile the changes between the two versions of the document, the commands in the
command stream may be applied to the modified document. In this way, the resulting
document includes edits applied to the document by way of the client application and edits
applied to the document by way of the Web application.
[0011] According to another aspect, the command stream described above can be
utilized to improve the performance of a Web application. For instance, a Web
application may be configured to maintain a command stream in a volatile memory, such
as a Random Access Memory ("RAM"), for documents as described above. In order to
free memory, the document and the command stream may be stored to a mass storage
device, such as a hard disk drive, and unloaded from the volatile memory. When
additional commands are received for the document, the document may be returned to its
current state by applying the stored command stream to the document. The additional
commands may then be serialized into a command stream in the manner described above.
[0012] According to another aspect, the command stream may be utilized to perform
dynamic load balancing on the server computers that provide the Web application. In this
implementation, one or more highly loaded server computers are identified. In-progress
document editing sessions are then identified on the highly loaded server computers. For
each of the identified editing sessions, the command stream for a document is applied to
the document. The document is then moved to a non-highly loaded server computer. In
other embodiments, the command stream and the document might be moved to the
non-highly loaded server computer without applying the command stream to the document.
The server computer to which the document is moved then takes over responsibility for
handling the editing session.
[0013] According to another aspect, the command stream may be utilized to perform
an uninterrupted upgrade on a server computer that hosts the Web application. In
particular, an in-progress editing session is identified on a server computer that is
executing a down level version of the Web application. The document and command
stream associated with the identified in-progress editing session are then moved to a server
computer executing an up level version of the Web application. The editing session is
then resumed at the server computer to which the document and command stream have
been moved. Once all of the in-progress editing sessions on a down level server have been
moved in this manner, the Web application on the server can be upgraded. In one
implementation, the commands in the command stream are applied to the document prior
to moving the document to the server computer executing the up level Web application.
[0014] It should be appreciated that the command stream described herein might also
be utilized for other purposes, such as undo/redo, document recovery, and others. It
should also be appreciated that this Summary is provided to introduce a selection of
concepts in a simplified form that are further described below in the Detailed Description.
This Summary is not intended to identify key features or essential features of the claimed
subject matter, nor is it intended that this Summary be used to limit the scope of the
claimed subject matter. Furthermore, the claimed subject matter is not limited to
implementations that solve any or all disadvantages noted in any part of this disclosure.
BRIEF DESCRIPTION OF THE DRAWINGS
[0015] FIGURE 1 is a software and network architecture diagram showing one
illustrative operating environment for the embodiments disclosed herein;
[0016] FIGURE 2 is a software architecture diagram showing aspects of various
components disclosed herein for serializing document editing commands in one
embodiment disclosed herein;
[0017] FIGURE 3 is a data structure diagram showing aspects of a command stream
generated and utilized in embodiments disclosed herein;
[0018] FIGURE 4 is a flow diagram showing one illustrative process for serializing a
command stream according to one embodiment disclosed herein;
[0019] FIGURE 5 is a data structure diagram showing aspects of one process for
generating a modified document that includes edits made at both a Web application and a
client application in one embodiment disclosed herein;
[0020] FIGURE 6 is a flow diagram showing one illustrative process for optimizing
the performance of a Web application using a command stream in one embodiment
disclosed herein;
[0021] FIGURE 7 is a flow diagram showing one illustrative process for dynamically
load balancing a server computer hosting a Web application using a command stream in
one embodiment disclosed herein;
[0022] FIGURE 8 is a flow diagram showing one illustrative process for upgrading a
Web application using a command stream in one embodiment disclosed herein; and
[0023] FIGURE 9 is a computer architecture diagram showing an illustrative computer
hardware and software architecture for a computing system capable of implementing the
various embodiments presented herein.
DETAILED DESCRIPTION
[0024] The following detailed description is directed to technologies for serializing
document editing commands. As discussed briefly above, a command stream may be
generated using the technologies described herein that includes serialized commands for
editing a document. The command stream can be applied to a modified document to
generate a single document that contains modifications to the document made using both a
Web application and a client application. The command stream can also be utilized to
recreate the edited state of a document following the disconnection from a Web
application for editing the document, to load balance a server computer hosting the Web
application even while editing sessions are in-progress, to perform an upgrade of a server
hosting the Web application while editing sessions are in-progress, and for other purposes.
[0025] While the subject matter described herein is presented in the general context of
program modules that execute in conjunction with the execution of an operating system
and application programs on a computer system, those skilled in the art will recognize that
other implementations may be performed in combination with other types of program
modules. Generally, program modules include routines, programs, components, data
structures, and other types of structures that perform particular tasks or implement
particular abstract data types. Moreover, those skilled in the art will appreciate that the
subject matter described herein may be practiced with other computer system
configurations, including hand-held devices, multiprocessor systems, microprocessor-based
or programmable consumer electronics, minicomputers, mainframe computers, and
the like.
[0026] In the following detailed description, references are made to the accompanying
drawings that form a part hereof, and in which are shown by way of illustration specific
embodiments or examples. Referring now to the drawings, in which like numerals
represent like elements throughout the several figures, aspects of a computing system and
methodology for serializing document editing commands into a command stream and for
utilizing the command stream will be described.
[0027] FIGURE 1 is a software and network architecture diagram showing one
illustrative operating environment for the embodiments disclosed herein. The operating
environment 100 illustrated in FIGURE 1 is configured for providing a Web application
114 to a client computer 104 executing a Web browser application program 102. It should
be appreciated that the term "Web application" as utilized herein is intended to encompass
an application that can be accessed and utilized through standard protocols and
technologies such as HTTP, SOAP, asynchronous JAVASCRIPT, and others. The term
"Web application" should not be limited only to applications that are available via the
World Wide Web. Rather, a Web application 114 may be accessible through virtually any
type of network 108 including, but not limited to, wide area networks, local area networks,
wireless networks, and other types of networks.
[0028] In the operating environment 100 shown in FIGURE 1, a number of front end
servers 106A-106C are provided to execute a front end component 110. Requests for the
Web application 114 received from the Web browser application program 102 are load
balanced to the front end servers 106A-106C. In this way, a front end server 106A-106C
may be assigned for a particular document editing session. Commands generated by the
Web browser application program 102 for a particular editing session are received by a
front end server component 110 on the front end server 106A-106C assigned to the editing
session. These commands are then forwarded to an instance of the Web application 114
executing on one of the back end server computers 112A-112C. The back end server
computers 112A-112C might also be load balanced in order to ensure that the Web
application 114 operates in a performant manner.
[0029] As also illustrated in FIGURE 1, each of the back end servers 112A-112C
maintains one or more disks 116A-116C for storing executable program code, such as an
operating system and the Web application 114. The disks 116A-116C might also be
utilized to store documents 118A-118C. The documents 118A-118C might also be stored
on another location accessible via the network 108 or another network.
[0030] According to one implementation, the Web application 114 provides
functionality for creating and editing one or more document types. For instance, the Web
application 114 may be configured for creating and editing a word processing document, a
spreadsheet document, a presentation document, or another type of document. As will be
described in greater detail below, a client application executing on the client computer 104
might also be configured to create and edit document types that are compatible with the
documents 118A-118C generated by the Web application 114. For instance, a document
might be created at the client computer 104 utilizing a client application and then edited by
the Web application 114. Similarly, a document might be created at the Web application
114 and then edited utilizing a client application executing on the client computer 104.
[0031] It should be appreciated that the operating environment 100 shown in FIGURE
1 is merely illustrative and other types of operating environments might also be utilized.
For instance, in other embodiments, the front end servers 106A-106C may not be utilized.
Additionally, in other embodiments, more or fewer back end servers 112A-112C might
also be utilized. Moreover, although a single client computer 104 is illustrated in FIGURE
1, it should be appreciated that the operating environment 100 shown in FIGURE 1 is
capable of supporting many more client computers 104 simultaneously. Other types of
operating environments capable of supporting the concepts and technologies described
herein may be apparent to those skilled in the art.
[0032] FIGURE 2 is a software architecture diagram showing aspects of various
components disclosed herein for serializing document editing commands in one
embodiment disclosed herein. As shown in FIGURE 2, and described briefly above, a
user of the client computer 104 can utilize the Web browser application 102 to interact
with the Web application 114. In particular, a command 202 for modifying a document
118 can be generated at the client computer 104 by a user. For instance, if the document
118 is a presentation document, the command 202 might be for adding a new slide to the
presentation, adding a graphical element to the presentation, adding or modifying text in
the presentation, or performing any other type of editing task. When the document 118 is
a word processing document, the command 202 may be for adding text to the document,
formatting text, adding graphics, or performing other edits to the document. It should be
appreciated, therefore, that the term command as utilized herein refers to any type of
command for modifying a document.
[0033] Each command 202 generated at the client computer 104 includes data
identifying how the edit should be made to the document 118. The data may be specified
utilizing extensible markup language ("XML"), binary encoding, or in another format.
For instance, if the command 202 is for editing text in a document 118, the data stored in
the command 202 may describe the location within the document at which the edit should
occur and how the edit should be performed. If the command 202 is for adding a slide to a
presentation, the command 202 might include data indicating the position at which the
new slide is to be added, the title of the new slide, and other information. Other types of
commands might also be represented similarly.
[0034] As discussed briefly above, a command 202 is generated at the client computer
104 and transmitted to a front end server, such as the front end server 106A. In turn, the
front end server 106A transmits the command 202 to the appropriate back end server 112,
such as the back end server 112A. As discussed briefly above, each back end server 112
executes an instance of the Web application 114. As also discussed briefly above, each
back end server 112 maintains, or has access to, a disk storage device 116 storing the
document 118 to which the command 202 should be applied. Rather than applying the
command 202 directly to a document 118, however, the Web application 114 maintains a
command stream 206.
[0035] As will be discussed in greater detail below, the command stream 206 includes
a serialized sequence of commands 202A-202N. In order to serialize the commands 202,
the Web application 114 may add data to the commands 202A-202N indicating the
absolute or relative time at which the commands were generated. Other types of data,
such as sequence number, might also be used to serialize the commands 202A-202N. The
commands 202A-202N are then stored in the command stream 206 in sequential order. In
the example shown in FIGURE 2, the command stream 206 is stored in a volatile memory
204 of a back end server 112. It should be appreciated that, in other embodiments, the
command stream 206 may be stored on a disk 116.
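By way of example, and not limitation, the serialization step might be sketched as follows; the class name, the field names, and the use of a wall-clock timestamp plus a sequence number are illustrative assumptions:

```python
import time

class CommandStream:
    """Minimal sketch of the command stream 206: each command is stamped
    on receipt, and the stream preserves the order in which the edits
    arrived, forming the delta between the original document and its
    current state."""

    def __init__(self):
        self._commands = []
        self._next_seq = 0

    def serialize(self, command: dict) -> dict:
        # Add data indicating when the command was received, plus a
        # sequence number so relative order survives timestamp ties.
        self._next_seq += 1
        stamped = dict(command, received_at=time.time(), seq=self._next_seq)
        self._commands.append(stamped)
        return stamped

    def in_serial_order(self) -> list:
        # Commands come back in the order in which they were made.
        return sorted(self._commands, key=lambda c: c["seq"])
```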
[0036] FIGURE 3 is a data structure diagram showing aspects of a command stream
206 generated and utilized in embodiments disclosed herein. In particular, FIGURE 3
shows the commands 202A-202N which have been serialized and placed in sequential
order according to the order in which the commands 202A-202N were generated. It
should be appreciated, therefore, that the command stream 206 storing the commands
202A-202N represents a delta between the document 118 prior to modification and its
current state. As will be described in greater detail below, the Web application 114 can
apply the commands 202A-202N in the command stream 206 to the document 118 in
serial order to generate the current state of the document 118. Details regarding this process and several
applications of this process will be described below with reference to FIGURES 4-8.
[0037] FIGURE 4 is a flow diagram showing one illustrative routine 400 for
serializing a command stream according to one embodiment disclosed herein. It should be
appreciated that the logical operations described herein with respect to FIGURE 4 and the
other FIGURES are implemented (1) as a sequence of computer implemented acts or
program modules running on a computing system and/or (2) as interconnected machine
logic circuits or circuit modules within the computing system. The implementation is a
matter of choice dependent on the performance and other requirements of the computing
system. Accordingly, the logical operations described herein are referred to variously as
operations, structural devices, acts, or modules. These operations, structural devices, acts
and modules may be implemented in software, in firmware, in special purpose digital
logic, and any combination thereof. It should also be appreciated that more or fewer
operations may be performed than shown in the figures and described herein. These
operations may also be performed in a different order than those described herein.
[0038] The routine 400 begins at operation 402, where the Web application 114
receives a command 202. In response to receiving a command, the routine 400 proceeds
to operation 404 where the Web application 114 serializes the command 202. This might
include, for instance, adding data to the command 202 indicating the absolute or relative
time at which the command 202 was received. Other types of mechanisms for serializing
the command 202 might also be utilized. Once the command 202 has been serialized, the
routine 400 proceeds from operation 404 to operation 406.
[0039] At operation 406, the serialized command 202 is stored in the command stream
206. The routine 400 then proceeds to operation 408 where the Web application 114
determines whether a request has been received to save the document 118 corresponding
to the command stream. If not, the routine 400 proceeds to operation 402, described
above, where additional commands 202 are received and serialized in the manner
described above. If a request is received at operation 408 to save the document 118, the
routine 400 proceeds to operation 410.
[0040] At operation 410, the commands 202A-202N in the command stream 206 for
the current document 118 are applied to the document 118 in serial order. In this manner,
the commands 202A-202N stored in the command stream 206 are applied to the document
118 in the order in which they were generated. The document 118 following application
of the command stream 206 represents the current state of the document 118. Once the
command stream 206 has been applied to the document 118, the routine 400 proceeds to
operation 412 where the document 118 is persisted to disk. The routine 400 then proceeds
to operation 402, where additional commands 202 are received, serialized, and stored in
the command stream 206.
[0041] FIGURE 5 is a data structure diagram showing aspects of one process for
generating a modified document that includes edits made at both a Web application and a
client application in one embodiment disclosed herein. As discussed briefly above, a
desktop client application 502 might be utilized on the client computer 104 that is capable
of editing the documents generated by the Web application 114. For instance, as
discussed briefly above, a word processing desktop client application 502 might be
utilized to edit a document 118A generated by the Web application 114. Similarly, the
Web application 114 might be utilized to edit a document 118A created by the desktop
client application 502. In the example shown in FIGURE 5, the desktop client application
502 has been utilized to make modifications 504 to an original document 118A. The
resulting document is a modified document 118D.
[0042] In one scenario, the Web browser application program 102 may utilize the Web
application 114 to also make modifications to the original document 118A. As discussed
above, however, the modifications to the original document 118A made by way of the
Web application 114 are represented in a command stream 206. For instance, in the
example shown in FIGURE 5, a command stream 206 has been generated that includes
two commands 202A-202B.
[0043] In order to reconcile the changes between the version of the document
generated by the Web application 114 and the version of the document generated by the
desktop client application 502, the Web application 114 may be configured to apply the
commands 202A-202B in the command stream 206 to the modified document 118D. In
this way, an updated document 118E is generated that includes the modifications 504
made to the document 118A by the desktop client application 502 and that also includes
the modifications made to the document by way of the Web application 114. By
generating an updated document 118E in this manner, the concepts and technologies
disclosed herein permit concurrent editing ("co-editing") utilizing a desktop client
application 502 and a Web application 114.
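This reconciliation might be sketched as follows, under the simplifying assumption that each command in the stream can be replayed as a function over the document text (real commands would carry structured edit data, as described above):

```python
def reconcile(modified_document: str, command_stream: list) -> str:
    """Replay the Web application's command stream on top of the document
    already modified by the desktop client, yielding a single updated
    document that contains both sets of edits."""
    updated = modified_document
    for command in command_stream:
        updated = command(updated)
    return updated

# The client's edits are already present in the modified document; only
# the Web application's edits exist as commands (here, plain functions).
client_modified = "Title slide. Client-added notes."
web_stream = [
    lambda doc: doc + " Web-added bullet.",
    lambda doc: doc.replace("Title slide.", "Title slide (final)."),
]
updated = reconcile(client_modified, web_stream)
```

Because the client's edits are already baked into the modified document 118D, only the Web application's delta needs to be replayed to produce the updated document 118E.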
[0044] It should be appreciated that conflicts might exist in the updated document
118E. For instance, the desktop client application 502 might be utilized to delete a portion
of text in the document 118A. Concurrently, the Web application 114 might be utilized to
edit the text deleted by way of the desktop client application 502. In this example, a
conflict will exist when the command stream 206 is applied to the modified document
118D. It should be appreciated that various mechanisms might be utilized to resolve the
conflict. For instance, a user may be asked to choose between the conflicting edits. Other
mechanisms might also be used to resolve a conflict between modifications made to a
document at a client application 502 and at a Web application 114.
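One possible conflict check, offered purely as an illustration (the `target` field and the skip-and-report policy are assumptions, not the disclosed mechanism), verifies that the text a command edits still exists before replaying it:

```python
def apply_with_conflict_check(document: str, commands: list):
    """Replay commands, collecting a conflict wherever a command's target
    text was removed by the other author. The `target`/`apply` fields
    are illustrative assumptions about the command format."""
    conflicts = []
    for command in commands:
        target = command.get("target")
        if target is not None and target not in document:
            # The other author deleted the text this command edits;
            # record a conflict instead of applying the edit blindly.
            conflicts.append(command)
            continue
        document = command["apply"](document)
    return document, conflicts

# The desktop client deleted the "Beta" paragraph; the Web application
# tried to edit it, producing a conflict a user could be asked to resolve.
doc = "Alpha paragraph."
commands = [
    {"target": "Beta", "apply": lambda d: d.replace("Beta", "Gamma")},
    {"target": "Alpha", "apply": lambda d: d.replace("Alpha", "Alpha (edited)")},
]
result, conflicts = apply_with_conflict_check(doc, commands)
```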
[0045] FIGURE 6 is a flow diagram showing one illustrative routine 600 for
optimizing the performance of a Web application 114 using a command stream 206 in one
embodiment disclosed herein. The routine 600 begins at operation 602, where the
commands 202 received at the Web application 114 are serialized into the command
stream 206. The routine 600 then proceeds from operation 602 to operation 604 where the
document 118 and its associated command stream 206 are saved to a disk 116. Once the
document 118 and the command stream 206 have been saved, the routine 600 proceeds to
operation 606 where the command stream 206 is unloaded from the memory 204. As
illustrated in FIGURE 2, the command stream 206 might be stored in a volatile memory
204 of a back end server 112. By unloading the command stream 206 from the volatile
memory 204, the memory 204 may be freed for other uses.
[0046] From operation 606, the routine 600 proceeds to operation 608 where the Web
application 114 determines whether an additional command 202 has been received for the
saved document 118. If not, the routine 600 proceeds to operation 608 where another such
determination is made. If a command is received, the routine 600 proceeds to operation
610 where the document 118 is loaded from disk. The command stream stored on disk
may also be loaded into a volatile memory 204 of the back end server 112.
[0047] The routine 600 then proceeds to operation 612 where the stored command
stream 206 is applied to the document 118 in the manner described above. As discussed
above, this results in a document 118 that represents the current state of the document
following application of all the commands in the command stream 206. The routine 600
then proceeds to operation 614 where the newly received command is serialized in the
command stream 206 in the manner described above. From operation 614, the routine 600
proceeds to operation 616, where it ends.
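Routine 600 might be sketched as follows; the JSON on-disk format and the append-only command encoding are illustrative assumptions made solely to keep the example self-contained:

```python
import json
import os

def hibernate(session_id: str, document: str, stream: list, directory: str) -> str:
    """Persist the document together with its command stream so both can
    be unloaded from volatile memory (operations 604 and 606)."""
    path = os.path.join(directory, session_id + ".json")
    with open(path, "w") as f:
        json.dump({"document": document, "stream": stream}, f)
    return path

def resume(path: str) -> str:
    """Reload the saved state and replay the stored command stream to
    return the document to its current state (operations 610 and 612).
    Commands are encoded here as simple text appends, an illustrative
    choice; real commands would carry richer edit data."""
    with open(path) as f:
        state = json.load(f)
    document = state["document"]
    for command in state["stream"]:
        document = document + command["append"]
    return document
```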
[0048] FIGURE 7 is a flow diagram showing one illustrative routine 700 for
dynamically load balancing a server computer 112 hosting a Web application using a
command stream in one embodiment disclosed herein. The routine 700 begins at
operation 702, where a highly loaded back end server 112A-112C is identified. A highly
loaded server computer is a server computer that is experiencing a relatively high
utilization of its resources, such as CPU cycles, memory utilization, mass storage
utilization, and/or high utilization of other types of resources. Once a highly loaded back
end server 112A-112C has been identified, the routine 700 proceeds to operation 704.
[0049] At operation 704, one or more editing sessions on the identified highly loaded
back end server 112A-112C to be moved to another back end server are identified. The
in-progress editing sessions to be moved to another server 112 may be identified based
upon the resources utilized by the editing session, randomly, or in another fashion. Once
one or more in-progress editing sessions to be moved to another server 112 have been
identified, the routine 700 proceeds to operation 706.
[0050] At operation 706, some or all of the commands in the command stream 206 for
the identified editing sessions may be applied to the associated document. In this manner,
each document may be brought to its current state prior to moving the document to another
back end server 112. It should be appreciated that this process is optional and that the
command stream 206 may not be applied to a document associated with an in-progress
editing session prior to moving the editing session to another back end server 112.
[0051] From operation 706, the routine 700 proceeds to operation 708 where the
documents 118 and command streams 206 for the identified in-progress editing sessions
are moved to a non-highly loaded back end server 112A-112C. The back end server
112A-112C to which the in-progress editing sessions are moved may be identified based
upon the utilization of resources by the destination back end server, such as CPU
utilization, memory utilization, disk utilization, and/or utilization of other types of
resources. The back end server 112A-112C to which the in-progress editing sessions have
been moved then takes over responsibility for handling the in-progress editing sessions. In
this manner, any new commands received for the in-progress editing sessions will be
handled by the destination back end server 112A-112C. It should be appreciated,
therefore, that the back end servers 112A-112C may be dynamically load balanced without
interrupting in-progress editing sessions. From operation 708, the routine 700 proceeds to
operation 710, where it ends.
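By way of illustration only, the balancing decision of routine 700 might be sketched as follows, with each server's CPU, memory, and disk utilization collapsed into a single load number (a simplifying assumption):

```python
def rebalance(servers: dict, threshold: float) -> dict:
    """Move editing sessions off highly loaded servers onto the least
    loaded server. `servers` maps a server name to its load number and
    its list of in-progress session identifiers."""
    for name, info in servers.items():
        while info["load"] > threshold and info["sessions"]:
            # Choose the destination by lowest current load.
            dest = min(servers, key=lambda n: servers[n]["load"])
            # Stop if moving a session would not actually improve balance.
            if dest == name or servers[dest]["load"] + 1 >= info["load"]:
                break
            session = info["sessions"].pop()
            servers[dest]["sessions"].append(session)
            # Moving a session shifts one unit of load, illustratively.
            info["load"] -= 1
            servers[dest]["load"] += 1
    return servers
```

The destination server then takes over responsibility for the moved sessions, so any new commands for those sessions are handled there.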
[0052] FIGURE 8 is a flow diagram showing one illustrative routine 800 for
upgrading a Web application using a command stream in one embodiment disclosed
herein. The routine 800 begins at operation 802, where an up-level version of the Web
application 114 is deployed on back end servers 112 not currently supporting any editing
sessions. The routine 800 then proceeds to operation 804 where the upgraded servers 112
are enabled to begin hosting editing sessions. Once the up level servers 112 have been
enabled for hosting editing sessions, the routine 800 proceeds to operation 806.
[0053] At operation 806, the in-progress editing sessions on a back end server 112
executing a down level Web application 114 are identified. For each identified
in-progress editing session, the commands 202 in the command stream 206 are applied to the
associated document. The routine 800 then proceeds to operation 808 where the
documents for the in-progress editing sessions are moved to the upgraded servers 112
executing the up level version of the Web application 114. The server computers to which
the documents are moved then take over responsibility for hosting the in-progress editing
session.
[0054] Once all of the in-progress editing sessions have been moved off of a down-level
back end server 112, the routine 800 proceeds to operation 810 where the down-level
server computers may be upgraded with an up-level version of the Web application 114.
The routine 800 then proceeds from operation 810 to operation 812, where it ends. In
view of the above, it should be appreciated that the Web application 114 may be upgraded
without disturbing in-progress editing sessions.
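The drain step described above, applying a session's serialized commands to its document before handing the session to an up-level server, can be sketched as follows. The append/replace command model is a hypothetical simplification; the disclosure does not prescribe a specific command format.

```python
# Sketch: draining an in-progress editing session from a down-level
# server by applying its serialized command stream in order, yielding
# the document state that is moved to the up-level server.
def apply_command(doc: str, cmd: dict) -> str:
    if cmd["op"] == "append":
        return doc + cmd["text"]
    if cmd["op"] == "replace":
        return doc.replace(cmd["old"], cmd["new"])
    raise ValueError(f"unknown op {cmd['op']!r}")

def drain_session(doc: str, command_stream: list[dict]) -> str:
    """Apply pending commands in serial order before the move."""
    for cmd in command_stream:
        doc = apply_command(doc, cmd)
    return doc

stream = [
    {"op": "append", "text": " world"},
    {"op": "replace", "old": "world", "new": "World"},
]
print(drain_session("hello", stream))  # -> hello World
```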
[0055] FIGURE 9 is a computer architecture diagram showing an illustrative computer
hardware and software architecture for a computing system capable of implementing the
various embodiments presented herein. The computer architecture shown in FIGURE 9
illustrates a conventional desktop, laptop computer, or server computer and may be
utilized to execute the various software components described herein.
[0056] The computer architecture shown in FIGURE 9 includes a central processing
unit 902 ("CPU"), a system memory 908, including a random access memory 914
("RAM") and a read-only memory ("ROM") 916, and a system bus 904 that couples the
memory to the CPU 902. A basic input/output system ("BIOS") containing the basic
routines that help to transfer information between elements within the computer 900, such
as during startup, is stored in the ROM 916. The computer 900 further includes a mass
storage device 910 for storing an operating system 918, application programs, and other
program modules, which will be described in greater detail below.
[0057] The mass storage device 910 is connected to the CPU 902 through a mass
storage controller (not shown) connected to the bus 904. The mass storage device 910 and
its associated computer-readable storage media provide non-volatile storage for the
computer 900. Although the description of computer-readable media contained herein
refers to a mass storage device, such as a hard disk or CD-ROM drive, it should be
appreciated by those skilled in the art that computer-readable storage media can be any
available computer storage media that can be accessed by the computer 900.
[0058] By way of example, and not limitation, computer-readable storage media may
include volatile and non-volatile, removable and non-removable media implemented in
any method or technology for storage of information such as computer-readable
instructions, data structures, program modules or other data. For example, computer-readable
storage media includes, but is not limited to, RAM, ROM, EPROM, EEPROM,
flash memory or other solid state memory technology, CD-ROM, digital versatile disks
("DVD"), HD-DVD, BLU-RAY, or other optical storage, magnetic cassettes, magnetic
tape, magnetic disk storage or other magnetic storage devices, or any other non-transitory
medium which can be used to store the desired information and which can be accessed by
the computer 900.
[0059] It should be appreciated that the computer-readable media disclosed herein also
encompasses communication media. Communication media typically embodies computer
readable instructions, data structures, program modules or other data in a modulated data
signal such as a carrier wave or other transport mechanism and includes any information
delivery media. The term "modulated data signal" means a signal that has one or more of
its characteristics set or changed in such a manner as to encode information in the signal.
By way of example, and not limitation, communication media includes wired media such
as a wired network or direct-wired connection, and wireless media such as acoustic, RF,
infrared and other wireless media. Combinations of any of the above should also be
included within the scope of computer-readable media. Computer-readable storage media
does not encompass communication media.
[0060] According to various embodiments, the computer 900 may operate in a
networked environment using logical connections to remote computers through a network
such as the network 920. The computer 900 may connect to the network 920 through a
network interface unit 906 connected to the bus 904. It should be appreciated that the
network interface unit 906 may also be utilized to connect to other types of networks and
remote computer systems. The computer 900 may also include an input/output controller
912 for receiving and processing input from a number of other devices, including a
keyboard, mouse, or electronic stylus (not shown in FIGURE 9). Similarly, an
input/output controller may provide output to a display screen, a printer, or other type of
output device (also not shown in FIGURE 9).
[0061] As mentioned briefly above, a number of program modules and data files may
be stored in the mass storage device 910 and RAM 914 of the computer 900, including an
operating system 918 suitable for controlling the operation of a networked desktop, laptop,
or server computer. The mass storage device 910 and RAM 914 may also store one or
more program modules. In particular, the mass storage device 910 and the RAM 914 may
store the Web browser application program 102 and/or the Web application 114, and the
other software components described above. The mass storage device 910 and RAM 914
may also store other program modules and data, such as the command stream 206.
[0062] In general, software applications or modules may, when loaded into the CPU
902 and executed, transform the CPU 902 and the overall computer 900 from a general-purpose
computing system into a special-purpose computing system customized to
perform the functionality presented herein. The CPU 902 may be constructed from any
number of transistors or other discrete circuit elements, which may individually or
collectively assume any number of states. More specifically, the CPU 902 may operate as
one or more finite-state machines, in response to executable instructions contained within
the software or modules. These computer-executable instructions may transform the CPU
902 by specifying how the CPU 902 transitions between states, thereby physically
transforming the transistors or other discrete hardware elements constituting the CPU 902.
[0063] Encoding the software or modules onto a mass storage device may also
transform the physical structure of the mass storage device or associated computer
readable storage media. The specific transformation of physical structure may depend on
various factors, in different implementations of this description. Examples of such factors
may include, but are not limited to: the technology used to implement the computer
readable storage media, whether the computer readable storage media are characterized as
primary or secondary storage, and the like. For example, if the computer readable storage
media is implemented as semiconductor-based memory, the software or modules may
transform the physical state of the semiconductor memory, when the software is encoded
therein. For example, the software may transform the states of transistors, capacitors, or
other discrete circuit elements constituting the semiconductor memory.
[0064] As another example, the computer readable storage media may be implemented
using magnetic or optical technology. In such implementations, the software or modules
may transform the physical state of magnetic or optical media, when the software is
encoded therein. These transformations may include altering the magnetic characteristics
of particular locations within given magnetic media. These transformations may also
include altering the physical features or characteristics of particular locations within given
optical media, to change the optical characteristics of those locations. Other
transformations of physical media are possible without departing from the scope and spirit
of the present description, with the foregoing examples provided only to facilitate this
discussion.
[0065] Based on the foregoing, it should be appreciated that technologies for
serializing document editing commands into a command stream and for utilizing the
command stream have been presented herein. Although the subject matter presented
herein has been described in language specific to computer structural features,
methodological acts, and computer readable media, it is to be understood that the
invention defined in the appended claims is not necessarily limited to the specific features,
acts, or media described herein. Rather, the specific features, acts and mediums are
disclosed as example forms of implementing the claims.
[0066] The subject matter described above is provided by way of illustration only and
should not be construed as limiting. Various modifications and changes may be made to
the subject matter described herein without following the example embodiments and
applications illustrated and described, and without departing from the true spirit and scope
of the present invention, which is set forth in the following claims.
What is claimed is:
1. A computer-implemented method comprising performing computer-implemented
operations for:
storing a document;
receiving a command to modify the document at a first application;
serializing the received command by way of the first application;
storing the serialized command in a command stream separate from the document;
modifying the document by way of a second application to create a modified
document; and
applying the serialized commands in the command stream to the modified
document by way of the first application.
2. The computer-implemented method of claim 1, further comprising:
receiving a request at the first application to save the document; and
in response to receiving the request, applying the commands in the command
stream to the document in serial order and saving the document.
3. The computer-implemented method of claim 2, further comprising:
saving the document and the command stream to a mass storage device;
unloading the command stream from a volatile memory;
receiving a second command;
in response to receiving the second command, loading the document from the mass
storage device, applying the command stream to the document, serializing the second
command, and storing the serialized second command in the command stream separate
from the document.
4. The computer-implemented method of claim 2, further comprising:
identifying one or more highly loaded server computers;
identifying one or more document editing sessions on each of the highly loaded
server computers; and
for each of the identified document editing sessions, applying a command stream
to a document associated with the document editing session and moving the document to a
non-highly loaded server computer.
5. The computer-implemented method of claim 4, further comprising:
identifying an editing session on a server computer executing a down-level
application program for editing the document;
applying the commands in the command stream to a document associated with the
editing session;
moving the document associated with the editing session to a server computer
executing an up-level application program for editing the document; and
resuming the editing session on the server computer executing the up-level
application program for editing the document.
6. A computer-readable storage medium having computer-executable
instructions stored thereupon which, when executed by a computer, cause the computer to:
store a document;
receive a command to modify the document;
serialize the received command;
store the serialized command in a command stream separate from the document;
receive a request to save the document; and
in response to receiving the request, apply the commands in the command
stream to the document in serial order and save the document.
7. The computer-readable storage medium of claim 6, wherein a first
application modifies the document to generate a modified document, and wherein a second
application applies the commands in the serialized command stream to the modified
document.
8. The computer-readable storage medium of claim 7, wherein the first
application comprises a desktop client application, and wherein the second application
comprises a web application.
9. The computer-readable storage medium of claim 8, having further
computer-executable instructions stored thereupon which, when executed by the computer,
cause the computer to:
save the document and the command stream to a mass storage device;
unload the command stream from a volatile memory of the computer;
receive a second command;
in response to receiving the second command, load the document from the mass
storage device, apply the command stream to the document, serialize the second
command, and store the serialized second command in the command stream separate from
the document.
10. The computer-readable storage medium of claim 7, having further
computer-executable instructions stored thereupon which, when executed by the computer,
cause the computer to:
identify one or more highly loaded server computers;
identify one or more document editing sessions on each of the highly loaded server
computers; and
for each of the identified document editing sessions, apply a command stream to a
document associated with the document editing session and move the document to a non-highly
loaded server computer.
(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)
(19) World Intellectual Property Organization, International Bureau
(10) International Publication Number: WO 2012/061275 A1
(43) International Publication Date: 10 May 2012 (10.05.2012)
(51) International Patent Classification: G06F 17/30 (2006.01)
(21) International Application Number: PCT/US2011/058541
(22) International Filing Date: 31 October 2011 (31.10.2011)
(25) Filing Language: English
(26) Publication Language: English
(30) Priority Data: 12/940,538, 5 November 2010 (05.11.2010), US
(71) Applicant (for all designated States except US): MICROSOFT CORPORATION [US/US]; One Microsoft Way, Redmond, Washington 98052-6399 (US).
(72) Inventors: LIU, Jiyang; SUN, Jian; SHUM, Heung-Yeung; YANG, Xiaosong; KUO, Yu-Ting; LI, Yi; KE, Qifa; LIU, Ce; ZHANG, Lei; each c/o Microsoft Corporation, LCA - International Patents, One Microsoft Way, Redmond, Washington 98052-6399 (US).
(81) Designated States (unless otherwise indicated, for every kind of national protection available): AE, AG, AL, AM, AO, AT, AU, AZ, BA, BB, BG, BH, BR, BW, BY, BZ, CA, CH, CL, CN, CO, CR, CU, CZ, DE, DK, DM, DO, DZ, EC, EE, EG, ES, FI, GB, GD, GE, GH, GM, GT, HN, HR, HU, ID, IL, IN, IS, JP, KE, KG, KM, KN, KP, KR, KZ, LA, LC, LK, LR, LS, LT, LU, LY, MA, MD, ME, MG, MK, MN, MW, MX, MY, MZ, NA, NG, NI, NO, NZ, OM, PE, PG, PH, PL, PT, QA, RO, RS, RU, RW, SC, SD, SE, SG, SK, SL, SM, ST, SV, SY, TH, TJ, TM, TN, TR, TT, TZ, UA, UG, US, UZ, VC, VN, ZA, ZM, ZW.
(84) Designated States (unless otherwise indicated, for every kind of regional protection available): ARIPO (BW, GH, GM, KE, LR, LS, MW, MZ, NA, RW, SD, SL, SZ, TZ, UG, ZM, ZW), Eurasian (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), European (AL, AT, BE, BG, CH, CY, CZ, DE, DK, EE, ES, FI, FR, GB, GR, HR, HU, IE, IS, IT, LT, LU, LV, MC, MK, MT, NL, NO, PL, PT, RO, RS, SE, SI, SK, SM, TR), OAPI (BF, BJ, CF, CG, CI, CM, GA, GN, GQ, GW, ML, MR, NE, SN, TD, TG).
Declarations under Rule 4.17: as to applicant's entitlement to apply for and be granted a patent (Rule 4.17(ii)); as to the applicant's entitlement to claim the priority of the earlier application (Rule 4.17(iii)).
Published: with international search report (Art. 21(3)); before the expiration of the time limit for amending the claims and to be republished in the event of receipt of amendments (Rule 48.2(h)).
(54) Title: MULTI-MODAL APPROACH TO SEARCH QUERY INPUT
[Front-page figure, FIG. 4: Answer 444; Combined Ranked Results 446]
(57) Abstract: Search queries containing multiple modes of query input are used to identify responsive results. The search queries can be composed of combinations of keyword or text input, image input, video input, audio input, or other modes of input. The multiple modes of query input can be present in an initial search request, or an initial request containing a single type of query input can be supplemented with a second type of input. In addition to providing responsive results, in some embodiments additional query refinements or suggestions can be made based on the content of the query or the initially responsive results.
MULTI-MODAL APPROACH TO SEARCH QUERY INPUT
BACKGROUND
[0001] Various methods for search and retrieval of information, such as by a search
engine over a wide area network, are known in the art. Such methods typically employ
text-based searching. Text-based searching employs a search query that comprises one or
more textual elements such as words or phrases. The textual elements are compared to an
index or other data structure to identify documents such as web pages that include
matching or semantically similar textual content, metadata, file names, or other textual
representations.
[0002] The known methods of text-based searching work relatively well for text-based
documents; however, they are difficult to apply to image files and data. In order to search
image files via a text-based query, the image file must be associated with one or more
textual elements, such as a title, file name, or other metadata or tags. The search engines
and algorithms employed for text-based searching cannot search image files based on the
content of the image and, thus, are limited to identifying search result images based only
on the data associated with the images.
[0003] Methods for content-based searching of images have been developed that
analyze the content of an image to identify visually similar images. However, such
methods can be limited with respect to identifying text-based documents that are relevant
to the input of the image search.
SUMMARY
[0004] In various embodiments, methods are provided for using multiple modes of
input as part of a search query. The methods allow for search queries composed of
combinations of keyword or text input, image input, video input, audio input, or other
modes of input. A search for responsive documents can then be performed based on
features extracted from the various modes of query input. The multiple modes of query
input can be present in an initial search request, or an initial request containing a single
type of query input can be supplemented with a second type of input. In addition to
providing responsive results, in some embodiments additional query refinements or
suggestions can be made based on the content of the query or the initially responsive
results.
[0005] This Summary is provided to introduce a selection of concepts in a
simplified form that are further described below in the Detailed Description. This
Summary is not intended to identify key features or essential features of the claimed
subject matter, nor is it intended to be used as an aid, in isolation, in determining the scope
of the claimed subject matter.
BRIEF DESCRIPTION OF THE DRAWINGS
[0006] The invention is described in detail below with reference to the attached
drawing figures, wherein:
[0007] FIG. 1 is a block diagram of an exemplary computing environment suitable
for use in implementing embodiments of the present invention.
[0008] FIG. 2 schematically shows a network environment suitable for performing
embodiments of the invention.
[0009] FIG. 3 schematically shows an example of the components of a user
interface according to an embodiment of the invention.
[0010] FIG. 4 shows the relationship between various components and processes
involved in performing an embodiment of the invention.
[0011] FIGS. 5 - 9 show an example of extraction of image features from an
image according to an embodiment of the invention.
[0012] FIGS. 10 - 12 show examples of methods according to various
embodiments of the invention.
DETAILED DESCRIPTION
Overview
[0013] In various embodiments, systems and methods are provided for integrating
keyword or text-based search input with other modes of search input. Examples of other
modes of search input can include image input, video input, and audio input. More
generally, the systems and methods can allow for performance of searches based on
multiple modes of input in the query. The resulting embodiments of multi-modal search
systems and methods can provide a user greater flexibility in providing input to a search
engine. Additionally, when a user initiates a search with one type of input, such as image
input, a second type of input (or multiple other types of input) can then be used to refine or
otherwise modify the responsive search results. For example, a user can enter one or more
keywords to associate with an image input. In many situations, the association of
additional keywords with an image input can provide a clearer indication of user intent
than either an image input or keyword input alone.
[0014] In some embodiments, searching for responsive results based on a multi-modal
search input is performed by using an index that includes terms related to more than
one type of data, such as an index that includes text-based keywords, image-based
"keywords", video-based "keywords", and audio-based "keywords". One option for
incorporating "keywords" for input modes other than text-based searching can be to
correlate the multi-modal features with artificial keywords. These artificial keywords can
be referred to as descriptor keywords. For example, image features used for image-based
searching can be correlated with descriptor keywords, so that the image-based searching
features appear in the same inverted index as traditional text-based keywords. For
example, an image of the "Space Needle" building in Seattle may contain a plurality of
image features. These image features can be extracted from the image, and then correlated
with descriptor "keywords" for incorporation into an inverted index with other text-based
keyword terms.
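The correlation described above can be sketched as follows. Hashing each raw feature into a fixed vocabulary of synthetic `img_NNNN` tokens is one assumed way to produce descriptor keywords; the disclosure does not prescribe a specific correlation scheme, and the feature bytes and document IDs below are illustrative.

```python
# Sketch: mapping extracted image features to artificial "descriptor
# keywords" so they share one inverted index with text keywords.
import hashlib
from collections import defaultdict

VOCAB_SIZE = 10_000  # assumed size of the descriptor-keyword vocabulary

def descriptor_keyword(feature: bytes) -> str:
    """Map a raw image feature to a synthetic token like 'img_1234'."""
    h = int(hashlib.sha1(feature).hexdigest(), 16) % VOCAB_SIZE
    return f"img_{h:04d}"

inverted_index: dict[str, set[str]] = defaultdict(set)

def index_document(doc_id: str, text_terms: list[str],
                   image_features: list[bytes]) -> None:
    for term in text_terms:
        inverted_index[term].add(doc_id)
    for feat in image_features:
        inverted_index[descriptor_keyword(feat)].add(doc_id)

index_document("doc1", ["space", "needle"], [b"feat-a", b"feat-b"])
index_document("doc2", ["seattle"], [b"feat-a"])

# Text keywords and descriptor keywords resolve through the same index:
print(sorted(inverted_index["space"]))                        # -> ['doc1']
print(sorted(inverted_index[descriptor_keyword(b"feat-a")]))  # -> ['doc1', 'doc2']
```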
[0015] In addition to incorporating descriptor keywords into a text-based keyword
index, descriptor keywords from an image (or another type of non-text input) can also be
associated with the traditional keyword terms. In the example above, the term "space
needle" can be correlated with one or more descriptor keywords from an image of the
Space Needle. This can allow for suggested or revised queries that include the descriptor
keywords, and therefore are better suited to perform an image based search for other
images similar to the Space Needle image. Such suggested queries can be provided to the
user to allow for improved searching for other images related to the Space Needle image,
or the suggested queries can be used automatically to identify such related images.
[0016] In the discussion below, the following definitions are used to describe
aspects of performing a multi-modal search. A feature refers to any type of information
that can be used as part of selection and/or ranking of a document as being responsive to a
search query. Features from a text-based query typically include keywords. Features from
an image-based query can include portions of an image identified as being distinctive,
such as portions of an image that have contrasting intensity or portions of an image that
correspond to a person's face for facial recognition. Features from an audio-based query
can include variations in the volume level of the audio or other detectable audio patterns.
A keyword refers to a conventional text-based search term. A keyword can refer to one or
more words that are used as a single term for identifying a document responsive to a
query. A descriptor keyword refers to a keyword that has been associated with a non-text
based feature. Thus, a descriptor keyword can be used to identify an image-based feature,
a video-based feature, an audio-based feature, or other non-text features. A responsive
result refers to any document that is identified as relevant to a search query based on
selection and/or ranking performed by a search engine. When a responsive result is
displayed, the responsive result can be displayed by displaying the document itself, or an
identifier of the document can be displayed. For example, the conventional hyperlinks,
also known as the "blue links" returned by a text-based search engine represent identifiers
for, or links to, other documents. By clicking on a link, the represented document can be
accessed. Identifiers for a document may or may not provide further information about the
corresponding document.
Receiving a Multi-Modal Search Query
[0017] Features from multiple search modes can be extracted from a query and
used to identify results that are responsive to the query. In an embodiment, multiple
modes of query input can be provided by any convenient method. For example, a user
interface for receiving query input can include a dialog box for receiving keyword query
input. The user interface can also include a location for receiving an image selected by the
user, such as an image query box that allows a user to "drop" a desired input image into
the user interface. Alternatively, the image query box can receive a file location or
network address as the source of the image input. A similar box or location can be
provided for identifying an audio file, video file, or another type of non-text input for use
as a query input.
[0018] The multiple modes of query input do not need to be received at the same
time. Instead, one type of query input can be provided first, and then a second mode of
input can be provided to refine the query. For example, an image of a movie star can be
submitted as a query input. This will return a series of matching results that likely include
images. The word "actor" can then be typed into a search query box as a keyword, in
order to refine the search results based on the user's desire to know the name of the movie
star.
[0019] After receiving multi-modal search information, the multi-modal
information can be used as a search query to identify responsive results. The responsive
results can be any type of document determined to be relevant by a search engine,
regardless of the input mode of the search query. Thus, image items can be identified as
responsive documents to a text-based query, or text-based items can be responsive
documents to an audio-based query. Additionally, a query including more than one mode
of input can also be used to identify responsive results of any available type. The
responsive results displayed to a user can be in the form of the documents themselves, or
in the form of identifiers for responsive documents.
[0020] One or more indexes can be used to facilitate identification of responsive
results. In an embodiment, a single index, such as an inverted index, can be used to store
keywords and descriptor keywords based on all types of search modes. Alternatively, a
single ranking system can use multiple indexes to store terms or features. Regardless of
the number or form of the indexes, the one or more indexes can be used as part of an
integrated selection and/or ranking method for identifying documents that are responsive
to a query. The selection method and/or ranking method can incorporate features based on
any available mode of query input.
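One integrated selection-and-ranking pass over features from every query mode might look like the following sketch. Scoring by total feature overlap with equal per-mode weight is an assumption for illustration; a production ranker would use a learned scoring function.

```python
# Sketch: ranking candidate documents against a multi-modal query by
# summing feature overlap across all available modes.
def rank(documents: dict[str, dict[str, set[str]]],
         query_features: dict[str, set[str]]) -> list[str]:
    """documents: doc_id -> {mode: feature set};
    query_features: mode -> feature set.
    Score = total overlap; zero-score documents are not responsive."""
    def score(doc_feats: dict[str, set[str]]) -> int:
        return sum(len(doc_feats.get(mode, set()) & feats)
                   for mode, feats in query_features.items())
    scored = [(score(f), doc_id) for doc_id, f in documents.items()]
    return [d for s, d in sorted(scored, key=lambda t: (-t[0], t[1])) if s > 0]

docs = {
    "page1": {"text": {"space", "needle"}, "image": {"img_0017"}},
    "page2": {"text": {"seattle"}},
}
query = {"text": {"needle"}, "image": {"img_0017"}}
print(rank(docs, query))  # -> ['page1']
```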
[0021] Text-based keywords that are associated with other types of input can also
be extracted for use. One option for incorporating multiple modes of information can be
to use text information associated with another mode of query input. An image, video, or
audio file will often have metadata associated with the file. This can include the title of
the file, a subject of the file, or other text associated with the file. The other text can
include text that is part of a document where the media file appears as a link, such as a
web page, or other text describing the media file. The metadata associated with an image,
video, or audio file can be used to supplement a query input in a variety of ways. The text
metadata can be used to form additional query suggestions that are provided to a user. The
text can also be used automatically to supplement an existing search query, in order to
modify the ranking of responsive results.
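Supplementing a keyword query with a media file's text metadata, as described above, can be sketched as follows. The flat metadata dictionary is an assumed simplification; real files would carry EXIF, ID3, or surrounding-page text.

```python
# Sketch: extending a keyword query with terms drawn from the title
# and subject metadata of a media file submitted as query input.
def supplement_query(keywords: list[str], media_metadata: dict[str, str]) -> list[str]:
    extra = []
    for field in ("title", "subject"):
        value = media_metadata.get(field, "")
        extra.extend(w.lower() for w in value.split())
    # De-duplicate while preserving the original keyword order.
    seen: set[str] = set()
    out: list[str] = []
    for term in keywords + extra:
        if term not in seen:
            seen.add(term)
            out.append(term)
    return out

meta = {"title": "Eiffel Tower", "subject": "Paris"}
print(supplement_query(["tower"], meta))  # -> ['tower', 'eiffel', 'paris']
```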
[0022] In addition to using metadata associated with an input query, the metadata
associated with a responsive result can be used to modify a search query. For example, a
search query based on an image may result in a known image of the Eiffel Tower as a
responsive result. The metadata from the responsive result may indicate that the Eiffel
Tower is the subject of the responsive image result. This metadata can be used to suggest
additional queries to a user, or to automatically supplement the search query.
[0023] There are multiple ways to extract metadata. The metadata extraction
technique may be predetermined or it may be selected dynamically either by a person or
an automated process. Metadata extraction techniques can include, but are not limited to:
(1) parsing the filename for embedded metadata; (2) extracting metadata from the near-duplicate
digital object; (3) extracting the surrounding text in a web page where the near-duplicate
digital object is hosted; (4) extracting annotations and commentary associated
with the near-duplicate from a web site supporting annotations and commentary where the
near-duplicate digital media object is stored; and (5) extracting query keywords that were
associated with the near-duplicate when a user selected the near-duplicate after a text
query. In other embodiments, metadata extraction techniques may involve other
operations.
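Technique (1) above, parsing the filename for embedded metadata, can be sketched as follows. The tokenization rules and the list of camera-style prefixes to discard are illustrative assumptions.

```python
# Sketch: splitting a filename into candidate metadata terms, dropping
# camera prefixes, numeric counters, and the file extension.
import re

STOP_TOKENS = {"img", "dsc", "photo", "final", "copy"}  # assumed noise tokens

def metadata_from_filename(path: str) -> list[str]:
    name = path.rsplit("/", 1)[-1].rsplit(".", 1)[0]
    tokens = re.split(r"[-_\s]+", name.lower())
    return [t for t in tokens
            if t and not t.isdigit() and t not in STOP_TOKENS]

print(metadata_from_filename("/photos/IMG_1042_space-needle_final.jpg"))
# -> ['space', 'needle']
```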
[0024] Some of the metadata extraction techniques start with a body of text and
sift out the most concise metadata. Accordingly, techniques such as parsing against a
grammar and other token-based analysis may be utilized. For example, surrounding text
for an image may include a caption or a lengthy paragraph. At least in the latter case, the
lengthy paragraph may be parsed to extract terms of interest. By way of another example,
annotations and commentary data are notorious for containing text abbreviations (e.g.
IMHO for "in my humble opinion") and emotive particles (e.g. smileys and repeated
exclamation points). IMHO, despite its seeming emphasis in annotations and
commentary, is likely to be a candidate for filtering out when searching for metadata.
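The token-based filtering described above can be sketched as follows. The small abbreviation list is an assumed sample; a real filter would use a much larger lexicon.

```python
# Sketch: dropping chat abbreviations and emotive particles from
# annotation text before using it as candidate metadata.
import re

ABBREVIATIONS = {"imho", "lol", "fwiw", "btw"}  # assumed sample lexicon

def clean_annotation(text: str) -> list[str]:
    # Smileys and repeated punctuation carry no letters, so the token
    # pattern discards them automatically.
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in ABBREVIATIONS]

print(clean_annotation("IMHO the Space Needle at sunset!!! :-)"))
# -> ['the', 'space', 'needle', 'at', 'sunset']
```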
[0025] In the event multiple metadata extraction techniques are chosen, a
reconciliation method can provide a way to reconcile potentially conflicting candidate
metadata results. Reconciliation may be performed, for example, using statistical analysis
and machine learning or alternatively via rules engines.
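A rules-engine style of reconciliation can be as simple as majority voting over the candidates produced by the different extraction techniques, as in this sketch; the alphabetical tie-break is an assumption for determinism, and a statistical or learned reconciler would replace the voting rule.

```python
# Sketch: reconciling conflicting candidate metadata by majority vote
# across extraction techniques.
from collections import Counter

def reconcile(candidates: list[str]) -> str:
    """Pick the value proposed by the most techniques;
    ties are broken alphabetically for determinism."""
    counts = Counter(c.lower() for c in candidates)
    top = max(counts.values())
    return sorted(c for c, n in counts.items() if n == top)[0]

votes = ["Eiffel Tower", "eiffel tower", "paris tower"]
print(reconcile(votes))  # -> eiffel tower
```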
[0026] FIG. 3 provides an example of a user interface suitable for receiving multi-modal
search input and displaying responsive results according to an embodiment of the
invention. In FIG. 3, the user interface provides input locations for three types of query
input. Input box 311 can receive keyword input, such as the text-based input typically
used by a conventional search engine. Input box 313 can receive an image and/or video
file as input. An image or video file that is pasted or otherwise "dropped" into input box
313 can be analyzed using image analysis techniques to identify features that can be
extracted for searching. Similarly, input box 315 can receive an audio file as input.
[0027] Area 320 contains a listing of responsive results. In the embodiment shown
in FIG. 3, responsive results 332 and 342 are currently shown. Responsive result 332 is an
identifier, such as a thumbnail, for an image document identified as responsive to a search.
In addition to image result 332, a link or icon 334 is also provided to allow for a revised
search that incorporates the image result 332 (or the descriptor keywords associated with
image result 332) as part of the revised query. Responsive result 342 corresponds to an
identifier for a text-based document.
[0028] Area 340 contains a listing of suggested queries 347 based on the initial
query. The suggested queries 347 can be generated using conventional query suggestion
algorithms. Suggested queries 347 can also be based on metadata associated with input
submitted in image/video input 313 or audio input 315. Still other suggested queries 347
can be based on metadata associated with a responsive result, such as responsive result
332.
[0029] FIG. 4 schematically shows the interaction of various systems and/or
processes for performing a multi-modal search according to an embodiment of the
invention. In the embodiment shown in FIG. 4, the multi-modal search corresponds to a
search based on both keyword query input and image query input. In FIG. 4, a search is
started based on receiving a query. The query includes query keywords 405 and query
image 407. To process query image 407, an image understanding component 412 can be
used to identify features within the image. The features extracted from the query image
407 by image understanding component 412 can be assigned descriptor keywords by
image text feature and image visual feature component 422. An example of methods that
can be used by an image understanding component 412 is described below in conjunction
with FIGS. 5-9. Image understanding component 412 can also include other types of
image understanding methods, such as facial recognition methods, or methods for
analyzing color similarity in an image. Metadata analysis component 414 can identify
metadata associated with the query image 407. This can include information embedded
within the image file and/or stored with the file by the operating system, such as a title for
the image or annotations stored within the file. This can also include other text associated
with the image, such as text in a URL pathway that is entered to identify the image for use
in the search, or text located near the image for an image located on or embedded in a web
page or other text-based document. Image text feature and image visual feature
component 422 can identify keyword features based on the output from metadata analysis
component 414.
[0030] After identifying query terms 405 and any additional features in image text
feature and image visual feature component 422, the resulting query can optionally be
altered or expanded in component 432. The query alteration or expansion can be based on
features derived from metadata in metadata analysis component 414 and image text
feature / image visual feature component 422. Another source for query alteration or
expansion can be feedback from the UI Interactive Component 462. This can include
additional query information provided by a user, as well as query suggestions 442 based
on the responsive results from the current or prior queries. The optionally expanded or
altered query can then be used to generate responsive results 452. In FIG. 4, result
generation 452 involves using the query to identify responsive documents in a database
475, which includes both text and image features for the documents in the database.
Database 475 can represent an inverted index or any other convenient type of storage
format for identifying responsive results based on a query.
[0031] Depending on the embodiment, result generation 452 can provide one or
more types of results. In some situations, an identification of a most likely match can be
desirable, such as one or a few highly ranked responsive results. This can be provided as
an answer 444. Alternatively, a listing of responsive results in a ranked order may be
desirable. This can be provided as combined ranked results 446. In addition to an answer
or ranked results, one or more query suggestions 442 can also be provided to a user. The
interaction with a user, including display of results and receipt of queries, can be handled
by a UI interactive component 462.
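A sketch of result generation 452 follows: the effective query is the union of the user's text keywords (405) and the descriptor keywords derived from the query image via components 412 and 422, and `index` stands in for database 475 as a mapping from keyword to document ids. The count-based scoring and tie-breaking are illustrative assumptions:

```python
def generate_results(query_keywords, descriptor_keywords, index):
    """Rank documents against a multi-modal query: documents are scored
    by the number of query terms (text or image-derived) they match."""
    terms = set(query_keywords) | set(descriptor_keywords)
    scores = {}
    for term in terms:
        for doc in index.get(term, ()):
            scores[doc] = scores.get(doc, 0) + 1
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))
```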
Multimedia-Based Searching Methods
[0032] FIGS. 5-9 schematically show the processing of an exemplary image 500 in
accordance with an embodiment of the invention. In FIG. 5, an image 500 is processed
using an operator algorithm to identify a plurality of interest points 502. The operator
algorithm includes any available algorithm that is useable to identify interest points 502 in
the image 500. In an embodiment, the operator algorithm can be a difference of Gaussians
algorithm or a Laplacian algorithm as are known in the art. In an embodiment, the
operator algorithm is configured to analyze the image 500 in two dimensions. Optionally,
when the image 500 is a color image, the image 500 can be converted to grayscale.
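The difference-of-Gaussians operator mentioned above can be sketched as follows: blur the grayscale image at two scales, subtract, and keep the strongest responses. The sigma values, the kernel radius of three sigma, and the edge padding are all illustrative assumptions:

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    # Normalized 1-D Gaussian kernel.
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def blur(image, sigma):
    # Separable Gaussian blur with edge padding; output matches input size.
    k = gaussian_kernel(sigma, radius=int(3 * sigma))
    pad = len(k) // 2
    padded = np.pad(image, pad, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, "valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, "valid"), 0, rows)

def dog_interest_points(gray, sigma1=1.0, sigma2=2.0, top_k=100):
    # Difference of Gaussians: blur at two scales, subtract, and keep the
    # points with the strongest absolute response (cf. paragraph [0034],
    # where interest points are ranked by a signal-strength metric).
    dog = blur(gray, sigma2) - blur(gray, sigma1)
    order = np.argsort(np.abs(dog), axis=None)[::-1][:top_k]
    return [tuple(map(int, np.unravel_index(i, gray.shape))) for i in order]
```

A single bright pixel on a flat background produces its strongest response at that pixel, matching the intuition that interest points land on distinct, high-contrast features.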
[0033] An interest point 502 can include any point in the image 500 as depicted in
FIG. 5, as well as a region 602, area, group of pixels, or feature in the image 500 as
depicted in FIG. 6. The interest points 502 and regions 602 are referred to hereinafter as
interest points 502 for the sake of clarity and brevity; however, reference to the interest points
502 is intended to be inclusive of both interest points 502 and the regions 602. In an
embodiment, an interest point 502 is located on an area in the image 500 that is stable and
includes a distinct or identifiable feature in the image 500. For example, an interest point
502 is located on an area of an image having sharp features with high contrast between the
features such as depicted at 502a and 602a. Conversely, an interest point is not located in
an area with no distinct features or contrast, such as a region of constant color or grayscale
as indicated by 504.
[0034] The operator algorithm identifies any number of interest points 502 in the
image 500, such as, for example, thousands of interest points. The interest points 502 may
be a combination of points 502 and regions 602 in the image 500 and the number thereof
may be based on the size of the image 500. The image processing component 412
computes a metric for each of the interest points 502 and ranks the interest points 502
according to the metric. The metric might include a measure of the signal strength or the
signal-to-noise ratio of the image 500 at the interest point 502. The image processing
component 412 selects a subset of the interest points 502 for further processing based on
the ranking. In an embodiment, the one hundred most salient interest points 502 having
the highest signal-to-noise ratio are selected; however, any desired number of interest
points 502 may be selected. In another embodiment, a subset is not selected and all of the
interest points are included in further processing.
[0035] As depicted in FIG. 7, a set of patches 700 can be identified that correspond to
the selected interest points 502. Each patch 702 corresponds to a single selected interest
point 502. The patches 702 include an area of the image 500 that includes the respective
interest point 502. The size of each patch 702 to be taken from the image 500 is
determined based on an output from the operator algorithm for each of the selected interest
points 502. Each of the patches 702 may be of a different size and the areas of the image
500 to be included in the patches 702 may overlap. Additionally, the shape of the patches
702 is any desired shape including a square, rectangle, triangle, circle, oval, or the like. In
the illustrated embodiment, the patches 702 are square in shape.
[0036] The patches 702 can be normalized as depicted in FIG. 7. In an embodiment,
the patches 702 are normalized to conform each of the patches 702 to an equal size, such
as an X pixel by X pixel square patch. Normalizing the patches 702 to an equal size may
include increasing or decreasing the size and/or resolution of a patch 702, among other
operations. The patches 702 may also be normalized via one or more other operations
such as applying contrast enhancement, despeckling, sharpening, and applying a
grayscale, among others.
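The normalization step above can be sketched as nearest-neighbor resampling to an equal size plus a simple contrast stretch. The target size of 16 and the specific choice of operations are illustrative assumptions:

```python
import numpy as np

def normalize_patch(patch, size=16):
    """Normalize a patch to an equal size and value range, per [0036]."""
    h, w = patch.shape
    rows = np.arange(size) * h // size  # nearest-neighbor source rows
    cols = np.arange(size) * w // size  # nearest-neighbor source columns
    resized = patch[np.ix_(rows, cols)].astype(float)
    lo, hi = resized.min(), resized.max()
    if hi > lo:
        resized = (resized - lo) / (hi - lo)  # stretch contrast to [0, 1]
    return resized
```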
[0037] A descriptor can also be determined for each normalized patch. A descriptor
can be a description of a patch that can be incorporated as a feature for use in an image
search. A descriptor can be determined by calculating statistics of the pixels in a patch
702. In an embodiment, a descriptor is determined based on the statistics of the grayscale
gradients of the pixels in a patch 702. The descriptor might be visually represented as a
histogram for each patch, such as a descriptor 802 depicted in FIG. 8 (wherein the patches
702 of FIG. 7 correspond with similarly located descriptors 802 in FIG. 8). The descriptor
might also be described as a multi-dimensional vector such as, for example and not
limitation, a multi-dimensional vector that is representative of pixel grayscale statistics for
the pixels in a patch. A T2S2 36-dimensional vector is an example of a vector that is
representative of pixel grayscale statistics.
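A simplified gradient-statistics descriptor in the spirit of this paragraph can be sketched as follows. It pools grayscale gradient magnitudes over a 3x3 spatial grid with 4 orientation bins, giving 36 dimensions; this matches the dimensionality, though not the exact layout, of the T2S2 vector, and the grid/bin split is an illustrative assumption:

```python
import numpy as np

def patch_descriptor(patch, grid=3, bins=4):
    # Pool gradient magnitudes by spatial cell and orientation bin,
    # then L2-normalize, yielding a grid*grid*bins-dimensional vector.
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)
    bin_idx = np.minimum((ang / (2 * np.pi) * bins).astype(int), bins - 1)
    h, w = patch.shape
    vec = np.zeros(grid * grid * bins)
    for i in range(h):
        for j in range(w):
            cell = (i * grid // h) * grid + (j * grid // w)
            vec[cell * bins + bin_idx[i, j]] += mag[i, j]
    n = np.linalg.norm(vec)
    return vec / n if n > 0 else vec
```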
[0038] As depicted in FIG. 9, a quantization table 900 can be employed to correlate a
descriptor keyword 902 with each descriptor 802. The quantization table 900 can include
any table, index, chart, or other data structure useable to map the descriptors 802 to the
descriptor keyword 902. Various forms of quantization tables 900 are known in the art
and are useable in embodiments of the invention. In an embodiment, the quantization
table 900 is generated by first processing a large quantity of images (e.g. image 500), for
example a million images, to identify descriptors 802 for each image. The descriptors 802
identified therefrom are then statistically analyzed to identify clusters or groups of
descriptors 802 having similar, or statistically similar, values. For example, the values of
variables in T2S2 vectors are similar. A representative descriptor 904 of each cluster is
selected and assigned a location in the quantization table 900 as well as a corresponding
descriptor keyword 902. The descriptor keywords 902 can include any desired indicator
that identifies a corresponding representative descriptor 904. For example, the descriptor
keywords 902 can include integer values as depicted in FIG. 9, or alpha-numeric values,
numeric values, symbols, text, or a combination thereof. In some embodiments, descriptor
keywords 902 can include a sequence of characters that identify the descriptor keyword as
being associated with a non-text-based search mode. For example, all descriptor keywords
can include a series of three integers followed by an underscore character as the first four
characters in the keyword. This initial sequence could then be used to identify the
descriptor keyword as being associated with an image.
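The lookup against the quantization table can be sketched as a nearest-neighbor match. Here `quantization_table` stands in for table 900 as a list of (vector, keyword) pairs, with each keyword beginning with three integers and an underscore (e.g. "007_") to mark it as image-derived, as described above; the data layout is an illustrative assumption:

```python
import numpy as np

def descriptor_keyword(descriptor, quantization_table):
    """Return the keyword of the representative descriptor nearest
    (by Euclidean distance) to the given descriptor, as in FIG. 9."""
    vecs = np.array([v for v, _ in quantization_table])
    dists = np.linalg.norm(vecs - np.asarray(descriptor, dtype=float), axis=1)
    return quantization_table[int(np.argmin(dists))][1]
```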
[0039] For each descriptor 802, a most closely matching representative descriptor 904
can be identified in the quantization table 900. For example, a descriptor 802a depicted in
FIG. 8 most closely corresponds with a representative descriptor 904a of the quantization
table 900 in FIG. 9. The descriptor keywords 902 for each of the descriptors 802 are
thereby associated with the image 500 (e.g. the descriptor 802a corresponds with the
descriptor keyword 902 "1"). The descriptor keywords 902 associated with the image 500
may each be different from one another or one or more of the descriptor keywords 902
may be associated with the image 500 multiple times (e.g. the image 500 might have
descriptor keywords 902 of "1, 2, 3, 4" or "1, 2, 2, 3"). In an embodiment, to take into
account characteristics, such as image variations, a descriptor 802 may be mapped to more
than one descriptor keyword 902 by identifying more than one representative descriptor
904 that most nearly matches the descriptor 802 and the respective descriptor keyword 902
therefor. Based on the above, the content of an image 500 having a set of identified
interest points 502 can be represented by a set of descriptor keywords 902.
[0040] In another embodiment, other types of image-based searching can be
integrated into a search scheme. For example, facial recognition methods can provide
another type of image search. In addition to and/or in place of identifying descriptor
keywords as described above, facial recognition methods can be used to determine the
identities of people in an image. The identity of a person in an image can be used to
supplement a search query. Another option can be to have a library of people for
matching with facial recognition technology. Metadata can be included in the library for
various people, and this stored metadata can be used to supplement a search query.
[0041] The above provides a description for adapting image-based search schemes
to a text-based search scheme. A similar adaptation can be made for other modes of
search, such as an audio-based search scheme. In an embodiment, any convenient type of
audio-based searching can be used. The method for audio-based searching can have one
or more types of features that are used to identify audio files that have similar
characteristics. As described above, the audio features can be correlated with descriptor
keywords. The descriptor keywords can have a format that indicates the keyword is
related to an audio search, such as having the keyword end with a hyphen followed by
four digits.
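The audio keyword convention described above can be sketched as a format check; the hyphen-plus-four-digits suffix is the example convention from this paragraph, not a fixed API:

```python
import re

def is_audio_keyword(keyword):
    """Return True if the keyword follows the illustrative audio
    descriptor-keyword format: a hyphen followed by four digits at
    the end of the keyword."""
    return re.search(r"-\d{4}$", keyword) is not None
```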
Examples of Searching Based on Multi-Modal Queries
[0042] Search Example 1 - Adding image information to a text based query. One
difficulty with conventional search methods is identifying desired results for common
query terms. One type of search that can involve common query terms is a search for a
person with a common name, such as "Steve Smith". If a keyword query of "steve smith"
is submitted to a search engine, a large number of results will likely be identified as
responsive, and these results will likely correspond to a large number of different people
sharing the same or a similar name.
[0043] In an embodiment, a search for a named entity can be improved by
submitting a picture of the entity as part of a search query. For example, in addition to
entering "steve smith" in a keyword text box, an image or video of the particular Mr.
Smith of interest can be dropped into a location for receiving image based query
information. Facial recognition software can then be used to match the correct "Steve
Smith" with the search query. Additionally, if the image or video contains other people,
results based on the additional people can be assigned a lower ranking due to the keyword
query indicating the person of interest. As a result, the combination of keywords and
image or video can be used to efficiently identify results corresponding to a person (or
other entity) with a common name.
[0044] As a variation on the above, consider a situation where a user has an image
or video of a person, but does not know the name of the person. The person could be a
politician, an actor or actress, a sports figure, or any other person or other entity that can
be recognized by facial recognition or image matching technology. In this situation, the
image or video containing the entity can be submitted with one or more keywords as a
multi-modal search query. In this situation, the one or more keywords can represent the
information the user possesses regarding the entity, such as "politician" or "actress". The
additional keywords can assist the image search in various ways. One benefit of having
both an image or video and keywords is that results of interest to the user can be given a
higher ranking. Submitting the keyword "actress" with an image indicates a user intent to
know the name of the person in the image, and would lead to the name of the actress as a
higher ranked result than a result for a movie listing the actress in the credits.
Additionally, for facial recognition or other image analysis technology where an exact
match is not achieved, the keywords can help in ranking potentially responsive search
results. If the facial recognition method identifies both a state senator and an author as
potential matches, the keyword "politician" can be used to provide information about the
state senator as the highest ranked results.
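The ranking behavior in this example can be sketched as keyword-based re-ranking of facial-recognition candidates. Each candidate is a hypothetical (name, match_score, profile_terms) triple, and both this layout and the fixed boost value are illustrative assumptions:

```python
def rerank_candidates(candidates, query_keywords, boost=1.0):
    """Re-rank facial-recognition candidates using the text keywords, as
    in the "politician" example above: a candidate whose profile terms
    contain a query keyword receives a fixed score boost."""
    def score(candidate):
        _, match_score, profile_terms = candidate
        hits = sum(1 for k in query_keywords if k in profile_terms)
        return match_score + boost * hits
    return sorted(candidates, key=score, reverse=True)
```

With the keyword "politician", a slightly weaker facial match tagged as a senator outranks a stronger match tagged as an author.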
[0045] Search Example 2 - Query refinement for multi-modal queries. In this
example, a user desires to obtain more information about a product found in a store, such
as a music CD or a movie DVD. As a precursor to the search process, the user can take a
picture of the cover of a music CD that is of interest. This picture can then be submitted
as a search query. Using image recognition and/or matching, the CD cover can be
matched to a stored image of the CD cover that includes additional metadata. This
metadata can optionally include the name of the artist, the title of the CD, the names of the
individual songs on the CD, or any other data regarding the CD.
[0046] A stored image of the CD cover can be returned as a responsive result, and
possibly as the highest ranked result. Depending on the embodiment, the user may be
offered potential query modifications on the initial results page, or the user may click on a
link in order to access the potential query modifications. The query modifications can
include suggestions based on the metadata, such as the name of the artist, title of the CD,
or the name of one of the popular songs on the CD. These query modifications can be
offered as links to the user. Alternatively, the user can be provided with an option to add
some or all of the query metadata to a keyword search box. The user can also supplement
the suggested modifications with additional search terms. For example, the user could
select the name of the artist and then add the word "concert" to the query box. The
additional word "concert" can be associated with the image for use as part of the search
query. This could, for example, produce responsive results indicating future concert dates
for the artist. Other options for query suggestions or modifications could include price
information, news related to the artist, lyrics for a song on the CD, or other types of
suggestions. Optionally, some query modifications can be automatically submitted for
search to generate responsive results for the modified query without further action from
the user. For example, adding the keyword "price" to the query based on the CD cover
could be an automatic query modification, so that pricing at various on-line retailers is
returned with the initial search results page.
[0047] Note that in the above example, a query image was submitted first, and then
keywords were associated with the query as a refinement. Similar refinements can be
performed by starting with a text keyword search, and then refining based on an image,
video, or audio file.
[0048] Search Example 3 - Improved mobile searching. In this example, a user
may know generally what to ask for, but may be uncertain how to phrase a search query.
This type of mobile searching could be used for searching on any type of location, person,
object, or other entity. The addition of one or more keywords allows the user to receive
responsive results based on a user intent, rather than based on the best image match. The
keywords can be added, for example, in a search text box prior to submitting the image as
a search query. The keywords can optionally supplement any keywords that can be
derived from metadata associated with an image, video, or audio file. For example, a user
could take a picture of a restaurant and submit the picture as a search query along with the
keyword "menu". This would increase the ranking of results involving the menu for that
restaurant. Alternatively, a user could take a video of a type of cat and submit the search
query with the word "species". This would increase the relevance of results identifying
the type of cat, as opposed to returning image or video results of other animals performing
similar activities. Still another option could be to submit an image of the poster for a
movie along with the keyword "soundtrack", in order to identify the songs played in the
movie.
[0049] As still another example, a user traveling in a city may want information
regarding the schedule for the local mass transit system. Unfortunately, the user does not
know the name of the system. The user starts by typing in a keyword query of the city
name and "mass transit". This returns a large number of results, and the user is not
confident regarding which result will be most helpful. The user then notices a logo for the
transit system at a nearby bus stop. The user takes a picture of the logo, and refines the
search using the logo as part of the query. The bus system associated with the logo is then
returned as the highest ranked result, providing the user with confidence that the correct
transit schedule has been identified.
[0050] Search Example 4 - Multi-modal searching involving audio files. In
addition to video or images, other types of input modes can be used for searching. Audio
files represent another example of a suitable query input. As described above for images
or videos, an audio file can be submitted as a search query in conjunction with keywords.
Alternatively, the audio file can be submitted either prior to or after the submission of
another type of query input, as part of query refinement. Note that in some embodiments,
a multi-modal search query may include multiple types of query input without a user
providing any keyword input. Thus, a user could provide an image and a video or a video
and an audio file. Still another option could be to include multiple images, videos, and/or
audio files along with keywords as query inputs.
[0051] Having briefly described an overview of various embodiments of the
invention, an exemplary operating environment suitable for performing the invention is
now described. Referring to the drawings in general, and initially to FIG. 1 in particular,
an exemplary operating environment for implementing embodiments of the present
invention is shown and designated generally as computing device 100. Computing device
100 is but one example of a suitable computing environment and is not intended to suggest
any limitation as to the scope of use or functionality of the invention. Neither should the
computing device 100 be interpreted as having any dependency or requirement relating to
any one or combination of components illustrated.
[0052] Embodiments of the invention may be described in the general context of
computer code or machine-useable instructions, including computer-executable
instructions such as program modules, being executed by a computer or other machine,
such as a personal data assistant or other handheld device. Generally, program modules,
including routines, programs, objects, components, data structures, etc., refer to code that
performs particular tasks or implements particular abstract data types. The invention may be
practiced in a variety of system configurations, including hand-held devices, consumer
electronics, general-purpose computers, more specialty computing devices, and the like.
The invention may also be practiced in distributed computing environments where tasks
are performed by remote-processing devices that are linked through a communications
network.
[0053] With continued reference to FIG. 1, computing device 100 includes a bus
110 that directly or indirectly couples the following devices: memory 112, one or more
processors 114, one or more presentation components 116, input/output (I/O) ports 118,
I/O components 120, and an illustrative power supply 122. Bus 110 represents what may
be one or more busses (such as an address bus, data bus, or combination thereof).
Although the various blocks of FIG. 1 are shown with lines for the sake of clarity, in
reality, delineating various components is not so clear, and metaphorically, the lines would
more accurately be grey and fuzzy. For example, one may consider a presentation
component such as a display device to be an I/O component. Additionally, many
processors have memory. The inventors hereof recognize that such is the nature of the art,
and reiterate that the diagram of FIG. 1 is merely illustrative of an exemplary computing
device that can be used in connection with one or more embodiments of the present
invention. Distinction is not made between such categories as "workstation," "server,"
"laptop," "hand-held device," etc., as all are contemplated within the scope of FIG. 1 and
reference to "computing device."
[0054] The computing device 100 typically includes a variety of computer-readable
media. Computer-readable media can be any available media that can be
accessed by computing device 100 and includes both volatile and nonvolatile media,
removable and non-removable media. By way of example, and not limitation, computer-readable
media may comprise computer storage media and communication media.
Computer storage media includes volatile and nonvolatile, removable and non-removable
media implemented in any method or technology for storage of information such as
computer-readable instructions, data structures, program modules or other data. Computer
storage media includes, but is not limited to, Random Access Memory (RAM), Read Only
Memory (ROM), Electronically Erasable Programmable Read Only Memory (EEPROM),
flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or
other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or
other magnetic storage devices, carrier wave, or any other medium that can be used to
encode desired information and which can be accessed by the computing device 100. In
an embodiment, the computer storage media can be selected from tangible computer
storage media. In another embodiment, the computer storage media can be selected from
non-transitory computer storage media.
[0055] The memory 112 includes computer-storage media in the form of volatile
and/or nonvolatile memory. The memory may be removable, non-removable, or a
combination thereof. Exemplary hardware devices include solid-state memory, hard
drives, optical-disc drives, etc. The computing device 100 includes one or more
processors that read data from various entities such as the memory 112 or the I/O
components 120. The presentation component(s) 116 present data indications to a user or
other device. Exemplary presentation components include a display device, speaker,
printing component, vibrating component, and the like.
[0056] The I/O ports 118 allow the computing device 100 to be logically coupled
to other devices including the I/O components 120, some of which may be built in.
Illustrative components include a microphone, joystick, game pad, satellite dish, scanner,
printer, wireless device, etc.
[0057] With additional reference to FIG. 2, a block diagram depicting an
exemplary network environment 200 suitable for use in embodiments of the invention is
described. The environment 200 is but one example of an environment that can be used in
embodiments of the invention and may include any number of components in a wide
variety of configurations. The description of the environment 200 provided herein is for
illustrative purposes and is not intended to limit configurations of environments in which
embodiments of the invention can be implemented.
[0058] The environment 200 includes a network 202, a query input device 204, and a
search engine server 206. The network 202 includes any computer network such as, for
example and not limitation, the Internet, an intranet, private and public local networks, and
wireless data or telephone networks. The query input device 204 is any computing device,
such as the computing device 100, from which a search query can be provided. For
example, the query input device 204 might be a personal computer, a laptop, a server
computer, a wireless phone or device, a personal digital assistant (PDA), or a digital
camera, among others. In an embodiment, a plurality of query input devices 204, such as
thousands or millions of query input devices 204, are connected to the network 202.
[0059] The search engine server 206 includes any computing device, such as the
computing device 100, and provides at least a portion of the functionalities for providing a
content-based search engine. In an embodiment, a group of search engine servers 206
share or distribute the functionalities required to provide search engine operations to a user
population.
[0060] An image processing server 208 is also provided in the environment 200. The
image processing server 208 includes any computing device, such as computing device
100, and is configured to analyze, represent, and index the content of an image as
described more fully above. The image processing server 208 includes a quantization
table 210 that is stored in a memory of the image processing server 208 or is remotely
accessible by the image processing server 208. The quantization table 210 is used by the
image processing server 208 to inform a mapping of the content of images to allow
searching and indexing of image features.
[0061] The search engine server 206 and the image processing server 208 are
communicatively coupled to an image store 212 and an index 214. The image store 212
and the index 214 include any available computer storage device, or a plurality thereof,
such as a hard disk drive, flash memory, optical memory devices, and the like. The image
store 212 provides data storage for image files that may be provided in response to a
content-based search of an embodiment of the invention. The index 214 provides a search
index for content-based searching of documents available via network 202, including the
images stored in the image store 212. The index 214 may utilize any indexing data
structure or format, and preferably employs an inverted index format. Note that in some
embodiments, image store 212 can be optional.
[0062] An inverted index provides a mapping depicting the locations of content in a
data structure. For example, when searching a document for a particular keyword
(including a descriptor keyword), the keyword is found in the inverted index, which
identifies the location of the word in the document and/or the presence of a feature in an
image document, rather than searching the document to find locations of the word or
feature.
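A minimal sketch of such an inverted index follows; the dict-of-dicts layout with position lists is an illustrative assumption rather than a production format:

```python
def build_inverted_index(docs):
    """Build the mapping described above: each keyword (including
    descriptor keywords such as "017_edge") points to the documents and
    positions where it occurs, so a query term is looked up directly
    instead of scanning every document."""
    index = {}
    for doc_id, words in docs.items():
        for pos, word in enumerate(words):
            index.setdefault(word, {}).setdefault(doc_id, []).append(pos)
    return index
```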
[0063] In an embodiment, one or more of the search engine server 206, image
processing server 208, image store 212, and index 214 are integrated in a single computing
device or are directly communicatively coupled so as to allow direct communication
between the devices without traversing the network 202.
[0064] FIG. 10 depicts a method according to an embodiment of the invention, or
alternatively executable instructions for a method embodied on computer storage media
according to an embodiment of the invention. In FIG. 10, an image, a video, or an audio
file is acquired 1010 that includes a plurality of relevance features that can be extracted.
The image, video, or audio file is associated 1020 with at least one keyword. The image,
video, or audio file and associated keyword are submitted 1030 as a query to a search
engine. At least one responsive result is received 1040 that is responsive to both the
plurality of relevance features and the associated keyword. The at least one responsive
result is then displayed 1050.
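The FIG. 10 flow can be sketched as a simple client-side sequence. The `extract_features` and `toy_engine` functions below are hypothetical stand-ins for components the specification does not detail; the feature scheme and matching rule are illustrative assumptions:

```python
# Hedged sketch of the FIG. 10 flow (steps 1010-1050): acquire media,
# associate a keyword, submit both as one query, receive and display
# results responsive to both the features and the keyword.

def extract_features(media_bytes):
    # Stand-in for real image/video/audio analysis: derive toy
    # "relevance features" from the first bytes of the file.
    return {f"FEAT:{b}" for b in media_bytes[:3]}

def multimodal_search(search_engine, media_bytes, keyword):
    features = extract_features(media_bytes)            # acquire (1010)
    query = {"features": features, "keyword": keyword}  # associate (1020)
    results = search_engine(query)                      # submit (1030), receive (1040)
    for result in results:                              # display (1050)
        print(result)
    return results

# Toy engine requiring a hit on both the keyword and at least one feature.
def toy_engine(query):
    corpus = {
        "beach.jpg": {"keyword": "sunset", "features": {"FEAT:1", "FEAT:2"}},
        "city.jpg": {"keyword": "sunset", "features": {"FEAT:9"}},
    }
    return [doc for doc, meta in corpus.items()
            if meta["keyword"] == query["keyword"]
            and meta["features"] & query["features"]]
```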
[0065] FIG. 11 depicts another method according to an embodiment of the invention,
or alternatively executable instructions for a method embodied on computer storage media
according to an embodiment of the invention. In FIG. 11, a query is received 1110 that
includes at least two query modes. Relevance features are extracted 1120 corresponding
to the at least two query modes from the query. A plurality of responsive results are
selected 1130 based on the extracted relevance features. The plurality of responsive
results are also ranked 1140 based on the extracted relevance features. One or more of the
ranked responsive results are then displayed 1150.
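The selection and ranking steps of FIG. 11 can be sketched as follows. The overlap-count scoring is an illustrative assumption, not the specification's ranking function:

```python
# Sketch of FIG. 11 (steps 1110-1150): features extracted from two
# query modes are pooled; candidates matching any feature are selected,
# then ranked by how many extracted features they match.

def select_and_rank(corpus, keyword_features, media_features):
    query_features = set(keyword_features) | set(media_features)  # extract (1120)
    scored = []
    for doc_id, doc_terms in corpus.items():
        matches = len(doc_terms & query_features)  # select (1130)
        if matches:
            scored.append((matches, doc_id))
    scored.sort(key=lambda s: (-s[0], s[1]))       # rank (1140)
    return [doc_id for _, doc_id in scored]        # ready to display (1150)
```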
[0066] FIG. 12 depicts another method according to an embodiment of the invention,
or alternatively executable instructions for a method embodied on computer storage media
according to an embodiment of the invention. In FIG. 12, a query is received 1210
comprising at least one keyword. A plurality of responsive results is displayed 1220 based
on the received query. Supplemental query input is received 1230 comprising at least one
of an image, a video, or an audio file. A ranking of the plurality of responsive results is
modified 1240 based on the supplemental query input. One or more of the responsive
results are displayed 1250 based on the modified ranking.
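The re-ranking step of FIG. 12 can be sketched by modelling the supplemental media input as extra descriptor keywords. The base/boost weighting here is an illustrative assumption:

```python
# Sketch of FIG. 12 (steps 1210-1250): an initial keyword query yields
# a ranking; supplemental media input boosts results that also match
# the newly extracted features, modifying the ranking (1240).

def rerank(initial_ranking, doc_terms, supplemental_features, boost=10):
    # Earlier initial rank -> higher base score.
    base = {doc: len(initial_ranking) - i for i, doc in enumerate(initial_ranking)}
    def score(doc):
        return base[doc] + boost * len(doc_terms[doc] & supplemental_features)
    return sorted(initial_ranking, key=score, reverse=True)
```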
Additional Embodiments
[0067] A first contemplated embodiment includes a method for performing a multi-modal
search. The method includes receiving (1110) a query including at least two query modes;
extracting (1120) relevance features corresponding to the at least two query modes from
the query; selecting (1130) a plurality of responsive results based on the extracted
relevance features; ranking (1140) the plurality of responsive results based on the
extracted relevance features; and displaying (1150) one or more of the ranked responsive
results.
[0068] A second embodiment includes the method of the first embodiment, wherein
the query modes in the received query include two or more of a keyword, an image, a
video, or an audio file.
[0069] A third embodiment includes any of the above embodiments, wherein the
plurality of responsive documents are selected using an inverted index incorporating
relevance features from the at least two query modes.
[0070] A fourth embodiment includes the third embodiment, wherein relevance
features extracted from the image, video, or audio file are incorporated into the inverted
index as descriptor keywords.
[0071] In a fifth embodiment, a method for performing a multi-modal search is
provided. The method includes acquiring (1010) an image, a video, or an audio file that
includes a plurality of relevance features that can be extracted; associating (1020) the
image, video, or audio file with at least one keyword; submitting (1030) the image, video,
or audio file and the associated keyword as a query to a search engine; receiving (1040) at
least one responsive result that is responsive to both the plurality of relevance features and
the associated keyword; and displaying (1050) the at least one responsive result.
[0072] A sixth embodiment includes any of the above embodiments, wherein the
extracted relevance features correspond to a keyword and an image.
[0073] A seventh embodiment includes any of the above embodiments, further
comprising: extracting metadata from an image, a video, or an audio file; identifying one
or more keywords from the extracted metadata; and forming a second query including at
least the extracted relevance features from the received query and the keywords identified
from the extracted metadata.
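The second-query formation of the seventh embodiment can be sketched as merging metadata-derived keywords with the original query's features. The metadata field names below are hypothetical:

```python
# Sketch of the seventh embodiment: keywords identified from media
# metadata (e.g. a title string or a tag list) are merged with the
# original query's relevance features to form a second, expanded query.

def form_second_query(query_features, metadata):
    meta_keywords = set()
    for value in metadata.values():
        if isinstance(value, str):
            meta_keywords.update(value.lower().split())
        else:  # e.g. a list of tags
            meta_keywords.update(str(v).lower() for v in value)
    return set(query_features) | meta_keywords
```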
[0074] An eighth embodiment includes the seventh embodiment, wherein ranking the
plurality of responsive documents based on the extracted relevance features comprises
ranking the plurality of responsive documents based on the second query.
[0075] A ninth embodiment includes the seventh or eighth embodiment, wherein the
second query is displayed in association with the displayed responsive results.
[0076] A tenth embodiment includes any of the seventh through ninth embodiments,
further comprising: automatically selecting a second plurality of responsive documents
based on the second query; ranking the second plurality of responsive documents based on
the second query; and displaying at least one document from the second plurality of
responsive documents.
[0077] An eleventh embodiment includes any of the above embodiments, wherein an
image or a video is acquired as an image or a video from a camera associated with an
acquiring device.
[0078] A twelfth embodiment includes any of the above embodiments, wherein an
image, a video, or an audio file is acquired by accessing a stored image, video, or audio
file via a network.
[0079] A thirteenth embodiment includes any of the above embodiments, wherein the
at least one responsive result comprises a text document, an image, a video, an audio file,
an identity of a text document, an identity of an image, an identity of a video, an identity
of an audio file, or a combination thereof.
[0080] A fourteenth embodiment includes any of the above embodiments, wherein the
method further comprises displaying one or more query suggestions based on the
submitted query and metadata corresponding to at least one responsive result.
[0081] In a fifteenth embodiment, a method for performing a multi-modal search is
provided, including receiving (1210) a query comprising at least one keyword; displaying
(1220) a plurality of responsive results based on the received query; receiving (1230)
supplemental query input comprising at least one of an image, a video, or an audio file;
modifying (1240) a ranking of the plurality of responsive results based on the
supplemental query input; and displaying (1250) one or more of the responsive results
based on the modified ranking.
[0082] The present invention has been described in relation to particular embodiments,
which are intended in all respects to be illustrative rather than restrictive. Alternative
embodiments will become apparent to those of ordinary skill in the art to which the
present invention pertains without departing from its scope.
[0083] From the foregoing, it will be seen that this invention is one well adapted to
attain all the ends and objects hereinabove set forth together with other advantages which
are obvious and which are inherent to the structure.
[0084] It will be understood that certain features and subcombinations are of utility
and may be employed without reference to other features and subcombinations. This is
contemplated by and is within the scope of the claims.
CLAIMS
What is claimed is:
1. A method for performing a multi-modal search, comprising:
receiving a query including at least two query modes;
extracting relevance features corresponding to the at least two query modes from
the query;
selecting a plurality of responsive results based on the extracted relevance features;
ranking the plurality of responsive results based on the extracted relevance
features; and
displaying one or more of the ranked responsive results.
2. The method of claim 1, wherein the query modes in the received query
include two or more of a keyword, an image, a video, or an audio file.
3. The method of any of the above claims, wherein the plurality of responsive
documents are selected using an inverted index incorporating relevance features from the
at least two query modes.
4. The method of claim 3, wherein relevance features extracted from the
image, video, or audio file are incorporated into the inverted index as descriptor keywords.
5. A method for performing a multi-modal search, comprising:
acquiring an image, a video, or an audio file that includes a plurality of relevance
features that can be extracted;
associating the image, video, or audio file with at least one keyword;
submitting the image, video, or audio file and the associated keyword as a query to
a search engine;
receiving at least one responsive result that is responsive to both the plurality of
relevance features and the associated keyword; and
displaying the at least one responsive result.
6. The method of any of the above claims, wherein the extracted relevance
features correspond to a keyword and an image.
7. The method of any of the above claims, further comprising:
extracting metadata from an image, a video, or an audio file;
identifying one or more keywords from the extracted metadata; and
forming a second query including at least the extracted relevance features from the
received query and the keywords identified from the extracted metadata.
8. The method of claim 7, wherein ranking the plurality of responsive
documents based on the extracted relevance features comprises ranking the plurality of
responsive documents based on the second query.
9. The method of claim 7 or 8, wherein the second query is displayed in
association with the displayed responsive results.
10. The method of any of claims 7 - 9, further comprising:
automatically selecting a second plurality of responsive documents based on the
second query;
ranking the second plurality of responsive documents based on the second query;
and
displaying at least one document from the second plurality of responsive
documents.
11. The method of any of the above claims, wherein an image or a video is
acquired as an image or a video from a camera associated with an acquiring device.
12. The method of any of the above claims, wherein an image, a video, or an
audio file is acquired by accessing a stored image, video, or audio file via a network.
13. The method of any of the above claims, wherein the at least one responsive
result comprises a text document, an image, a video, an audio file, an identity of a text
document, an identity of an image, an identity of a video, an identity of an audio file, or a
combination thereof.
14. The method of any of the above claims, wherein the method further
comprises displaying one or more query suggestions based on the submitted query and
metadata corresponding to at least one responsive result.
15. A method for performing a multi-modal search, comprising:
receiving a query comprising at least one keyword;
displaying a plurality of responsive results based on the received query;
receiving supplemental query input comprising at least one of an image, a video, or
an audio file;
modifying a ranking of the plurality of responsive results based on the
supplemental query input; and
displaying one or more of the responsive results based on the modified ranking.