Abstract: Methods and systems for information fusion are described herein. In one implementation, the method includes posting a fact on a layered memory structure (126) and indexing the posted fact to a group, such that facts with a similar property measure are indexed to the same group. Further, one or more knowledge modules (122) are applied on the posted fact and one or more input facts indexed to the group. Based on the application of the one or more knowledge modules (122), an output fact is provided and it is determined whether the output fact is a deduced fact based on a probability estimate.
FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENTS RULES, 2003
COMPLETE SPECIFICATION
(See section 10, rule 13)
1. Title of the invention: SYSTEMS AND METHODS FOR INFORMATION FUSION
2. Applicant(s)
NAME NATIONALITY ADDRESS
TATA CONSULTANCY Indian Nirmal Building, 9th Floor, Nariman Point,
SERVICES LIMITED 400021 Maharashtra, India
3. Preamble to the description
COMPLETE SPECIFICATION
The following specification particularly describes the invention and the manner in which it is to be performed.
TECHNICAL FIELD
The present subject matter relates, in general, to artificial intelligence and, in
particular, to systems and methods for information fusion.
BACKGROUND
With information sources becoming ubiquitous, information regarding various
day to day events can be easily obtained and shared, thereby creating large amounts of data.
Various techniques are being developed to analyze the data and obtain some form of
intelligence for decision making in various domains, for example, business intelligence,
marketing, and strategy. Such intelligence is typically based on identification of an inherent
pattern underlying the data. However, pattern identification often becomes a challenge as the
input data is random and generally not restricted to the type of intelligence required. For
example, information gathered from various news agencies or other sources in order to
determine the effects of a natural disaster on a supply chain, may not be related only to the
issue under consideration. Also, the information may be received dynamically even during
data analysis, making it difficult to correlate and analyze the ever increasing amount of data.
When correlating information from many input sources, issues regarding
information fusion, for example, issues related to merging of information from different
sources with differing conceptual, contextual and typographical representations, are often
encountered. The information fusion issues can be modeled as one of hierarchical pattern
discovery. Typically, blackboard based architecture is used for hierarchical pattern discovery.
The blackboard based architecture includes a layered memory structure in which programs
called knowledge sources can read data from and write data into various layers. However,
applications of the blackboard architecture for such information fusion issues have usually
focused on small subsets of data and have not been amenable to scaling for large datasets.
SUMMARY
This summary is provided to introduce concepts related to information fusion
and the concepts are further described below in the detailed description. This summary is
neither intended to identify essential features of the claimed subject matter nor is it intended
for use in determining or limiting the scope of the claimed subject matter.
The present subject matter relates to systems and methods for information
fusion. In one implementation, a fact is posted on a layered memory structure, such as a blackboard architecture. The posted fact is indexed to a group having one or more input facts with properties similar to the posted fact. Further, one or more knowledge modules are applied on the posted fact and the input facts in the group. Based on the application, an output fact is generated and it may be determined whether the output fact is a deduced fact based on a probability estimate. Further, the output fact may be posted on the layered memory structure on a positive determination.
BRIEF DESCRIPTION OF THE DRAWINGS
The detailed description is described with reference to the accompanying
figures. In the figures, the left-most digit(s) of a reference number identifies the figure in
which the reference number first appears. The same numbers are used throughout the
drawings to reference like features and components:
Fig. 1 illustrates a network environment implementing an information fusion
system, in accordance with an implementation of the present subject matter.
Fig. 2a illustrates a layered memory structure, in accordance with an
implementation of the present subject matter.
Fig. 2b illustrates a Bayesian network fragment for deducing a higher level
fact, in accordance with an implementation of the present subject matter.
Fig. 2c illustrates a Bayesian network fragment for discovering a fact which is
similar to an existing fact, in accordance with an implementation of the present subject matter.
Fig. 3 illustrates a method for information fusion, in accordance with an
implementation of the present subject matter.
DETAILED DESCRIPTION
The subject matter described herein relates to systems and methods for
information fusion. Information fusion may be understood as the process of gathering
intelligence or high level information from some given facts. The intelligence garnered from
the facts can be used as a means to achieve higher level situational awareness, which in turn
may assist in decision making by a user. The facts may be obtained from a variety of
information sources, such as data sources available within an organization and public
information available over World Wide Web (www). Further, the facts may pertain to, for
example, events, incidents, physical objects, real or virtual games, artistic works, and the like.
Typically, in a layered memory structure, such as blackboard based
architecture, the intelligence is gathered by implementing a plurality of knowledge sources, which can read facts from and write facts into various layers of the layered memory structure. A knowledge source, usually, looks for and then reads a set of facts from one or more layers, performs computations using the facts, and writes the results of the computations into one or more layers in the layered memory structure. Generally, a knowledge source needs to scan through the facts available on the layered memory structure to identify those facts that the knowledge source can work on. This can be done easily when the dataset of facts is small in size.
However, when the volume of data as in number of facts is large, information
fusion may require considerable computational time and resources. For example, intelligence and law enforcement organizations regularly need to scan large volumes of data, such as, phone call records, financial transactions, and emails. In another example, an organization may need to closely monitor data available on social networking sites, such as, Twitter™, to gather information regarding products and services offered by the organization. It would be appreciated by a person skilled in the art that the present examples are indicated for the sake of illustration only, and should not be construed to be a limitation of the present subject matter.
Considering that there are n facts available on the layered memory structure
and a knowledge source needs to operate on only two facts, then there are O(n²) input combinations to choose from. Further, the computation may become more challenging in case of complex correlations involving a large number of facts. It can be observed that in case n is large, the large number of combinations to choose from at each step results in a combinatorial explosion, which in turn may lead to computational intractability. Thus, information fusion in such cases may become a time and resource consuming task.
According to an embodiment of the present subject matter, systems and
methods for information fusion are described. For the purpose of explanation, a fact on which a knowledge source is applied is referred to as an input fact and a fact generated upon application of the knowledge source may be referred to as an output fact. It will be understood that an output fact of one knowledge source can be an input fact for some other knowledge source. In one implementation, an input fact posted on the layered memory structure is indexed to a group based in part on one or more fact properties associated with the input fact. The input fact may be indexed using a fact indexing technique, such as, locality sensitive hashing (LSH).
Further, based on the fact properties associated with the input fact, similarity or
nearness of the input fact and other input facts, which are already posted on the layered memory structure, can be determined. Accordingly, the input fact can be indexed to a group having similar input facts or to a new group in case the input fact is not similar to any of the input facts already posted on the layered memory structure. Thus, the input fact is indexed to a group based on a similarity property measure. The group of facts thus includes those facts that are pair-wise close to each other with respect to at least one of their property dimensions. In one implementation, the input fact is indexed using Locality-Sensitive Hashing (LSH). In addition to fact properties, other fact parameters may also be associated with an input fact. Examples of the fact parameters include, but are not limited to, fact type, fact level, a confidence probability, an availability indicator, and one or more properties p associated with the input fact.
Once the input fact is indexed, one or more knowledge sources are applied on
the input fact to determine an output fact, which can be a new fact discovered from the input facts, to gain intelligence. The knowledge sources may be interchangeably referred to as knowledge modules. The knowledge modules can be selected based on a control criterion. In one example, the control criterion is a randomized control strategy, which selects knowledge sources at random. The randomized control strategy may be based on parallel terraced scan (PTS) strategy. However, it will be understood that other control criteria may also be implemented, for example, an exhaustive strategy, where lower level knowledge sources are exhausted first and then the higher level knowledge sources are considered. Upon application of a knowledge source, the input fact may be analyzed with respect to one or more input facts
which are indexed to the same group. Thus, instead of considering all possible combinations
of the facts, only those combinations are considered that appear to be similar to the input fact
under consideration. Further, in one implementation, analysis includes identifying the input
facts on which the knowledge source is to be applied based on a fact type associated with an
input fact. For example, a knowledge source may be configured to perform computations for a
particular type of input facts, say, type x. In said example, the group on which the knowledge
source is to be applied may include input facts with type x and type y. In such a case, the
knowledge source is only applied on those input facts that have a fact type x.
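The type-based selection within a group described above can be sketched as follows; the dictionary representation of facts, the type labels, and the toy doubling knowledge source are assumptions made for illustration only and do not form part of the present subject matter.

```python
# Illustrative sketch: a knowledge source configured for fact type "x" is
# applied only to the facts within a group that carry the matching type.

def apply_knowledge_source(knowledge_source, group, required_type="x"):
    """Apply a knowledge source only to facts in the group of the required type."""
    eligible = [fact for fact in group if fact["type"] == required_type]
    return [knowledge_source(fact) for fact in eligible]

group = [
    {"type": "x", "value": 3},
    {"type": "y", "value": 5},   # ignored: wrong fact type
    {"type": "x", "value": 7},
]

# A toy knowledge source that simply doubles the value of an input fact.
double = lambda fact: {"type": "x", "value": fact["value"] * 2}

outputs = apply_knowledge_source(double, group)
print([f["value"] for f in outputs])  # prints [6, 14]
```

In this sketch, the type-y fact remains in the group untouched and stays available for a knowledge source configured for its type.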
Upon application of the knowledge source, an output fact is generated using,
for example, a probabilistic estimate and computation functions specific to the knowledge
source. Thus, the knowledge source may implement a reasoning logic to generate an output
fact. In one example, the output fact is generated based on Bayesian reasoning. Further, it is
determined if the output fact is a deduced fact or not. For the purpose of explanation, a
deduced fact can be understood as a fact, which provides a higher level information using one
or more input facts, which are consumed for generating the deduced fact. For example,
consider the case of Twitter™, where multiple input facts based on tweets posted by users
are available on the layered memory structure. In said example, the deduced fact can be
understood as a new extension to an already built story or a new story altogether.
In one implementation, the determination whether the output fact is a deduced
fact or not is made based on an a-posteriori probability associated with the output fact. In case the output fact is not a deduced fact, the output fact is discarded and the input facts are kept available for further analysis. Otherwise, the deduced fact is posted on the layered memory structure and is treated as a new input fact, i.e., the deduced fact is indexed and further analyzed by one or more knowledge sources as discussed above. The deduced fact may correspond to the same level as the input facts or to a next higher level. In one implementation, a deduced fact has at least one fact property same as that of the input fact from which it has been deduced. Also, the deduced fact may or may not lie in the same group as that of this input fact. Further, the input facts that were used to generate the deduced fact are marked as "consumed" on the layered memory structure to indicate that those input facts have been used for analysis and cannot be used further.
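The post-or-discard decision described in this paragraph may be sketched as follows; the dictionary fields, the list-based blackboard, and the fixed probability threshold are hypothetical and chosen only to illustrate the control flow.

```python
# Illustrative sketch of the deduced-fact decision: post the output fact and
# consume its inputs if its a-posteriori probability crosses a threshold,
# otherwise discard it and keep the inputs available.

THRESHOLD = 0.7  # assumed a-posteriori probability threshold

def handle_output_fact(output_fact, input_facts, blackboard):
    """Return True if the output fact is posted as a deduced fact."""
    if output_fact["posterior"] >= THRESHOLD:
        blackboard.append(output_fact)      # deduced fact becomes a new input fact
        for fact in input_facts:
            fact["available"] = False       # mark the used inputs as consumed
        return True
    return False                            # inputs remain available for analysis

blackboard = []
inputs = [{"available": True}, {"available": True}]
posted = handle_output_fact({"posterior": 0.9}, inputs, blackboard)
print(posted, len(blackboard))  # prints True 1
```

A posted deduced fact would then be indexed like any other input fact, while a discarded output fact leaves its inputs marked as available.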
While aspects of systems and methods for information fusion can be
implemented in any number of different computing systems, environments, and/or configurations, the embodiments are described in the context of the following exemplary system architecture(s).
Fig. 1 illustrates a network environment 100 implementing an information
fusion system 102, in accordance with an embodiment of the present subject matter. In one implementation, the network environment 100 can be a public network environment, including thousands of personal computers, laptops, various servers, such as blade servers, and other computing devices. In another implementation, the network environment 100 can be a private network environment with a limited number of personal computers, servers, laptops and other computing devices.
The information fusion system 102 (hereinafter referred to as IF system 102) is
communicatively connected to a plurality of user devices 104-1, 104-2,...104-N, collectively referred to as the user devices 104 and individually referred to as a user device 104, through a network 106. The IF system 102 and the user devices 104 may be implemented as any of a variety of conventional computing devices, including servers, a desktop personal computer, a notebook or portable computer, a workstation, a mainframe computer, a mobile computing device, and a laptop. Further, in one implementation, the IF system 102 may itself be a distributed or centralized network system in which different computing devices may host one or more of the hardware or software components of the IF system 102. In another implementation, the various components of the IF system 102 may be implemented as a part of the same computing device.
The IF system 102 is connected to the user devices 104 over the network 106
through one or more communication links. The communication links between the IF system 102 and the user devices 104 are enabled through a desired form of communication, for example, via dial-up modem connections, cable links, digital subscriber lines (DSL), wireless or satellite links, or any other suitable form of communication.
The network 106 may be a wireless network, a wired network, or a
combination thereof. The network 106 can also be an individual network or a collection of many such individual networks, interconnected with each other and functioning as a single large network, e.g., the Internet or an intranet. The network 106 can be implemented as one of
the different types of networks, such as intranet, local area network (LAN), wide area network (WAN), the Internet, and such. The network 106 may either be a dedicated network or a shared network, which represents an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), etc., to communicate with each other. Further, the network 106 may include network devices, such as network switches, hubs, routers, for providing a link between the IF system 102 and the user devices 104. The network devices within the network 106 may interact with the IF system 102 and the user devices 104 through the communication links.
In operation, the IF system 102 can receive information, referred to as input
facts hereinafter, from one or more of the user devices 104 over the network 106. The input facts may pertain to, for example, events, incidents, physical objects, real or virtual games, artistic works, and the like. In one example, if the input facts pertain to an event that has already occurred, the input facts may include information about the place of occurrence of the event, the time at which the event had occurred, event details indicating what had happened, and details about the sources of the information. In another example, if the input facts pertain to an image, the input facts may include information regarding the composition of the image, the basic geometrical shapes in the image, the colors in the image, and the like. Further, the input facts may be received from various information sources including internal information sources, i.e., data sources within an organization, and external information sources, such as public information available on the World Wide Web (www). The input facts may be received either directly from the user devices 104 or as a result of operations of a software application, such as a search engine, a web crawler, or a data tracking program.
On receiving the input facts, the IF system 102 performs operations to garner
intelligence from the input facts. For this purpose, the IF system 102 includes one or more processor(s) 108, a system memory 112 coupled to the processor(s) 108, and interface(s) 110. The processor(s) 108 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the processor(s) 108 are configured to fetch and execute computer-readable instructions and data stored in the system memory 112.
The interface(s) 110 may include a variety of software and hardware
interfaces, for example, interface for peripheral device(s) such as a keyboard, a mouse, an external memory, a printer, etc. Further, the interface(s) 110 may enable the IF system 102 to communicate over the network 106, and may include one or more ports for connecting the IF system 102 with other computing devices, such as web servers and external databases. The interface(s) 110 may facilitate multiple communications within a wide variety of protocols and networks, such as a network, including wired networks, e.g., LAN, cable, etc., and wireless networks, e.g., WLAN, cellular, satellite, etc.
The system memory 112 may include any computer-readable medium known
in the art including, for example, volatile memory such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. The system memory 112 also includes program modules 114 and program data 116.
The program modules 114 include routines, programs, objects, components,
data structures, etc., which perform particular tasks or implement particular abstract data types. The program modules 114 further include a control module 118, an indexing module 120, knowledge modules 122, and other modules 124. The other modules 124 may include programs or coded instructions that supplement applications and functions on the information fusion system 102, for example, programs in the operating system.
The program data 116, amongst other things, serves as a repository for storing
data processed, received, and generated by one or more of the program modules 114. The program data 116 includes a layered memory structure 126, facts 128 and other data 130. The other data 130 may include data generated as a result of the execution of one or more modules in the other modules 124.
It will be understood that the program data 116 has been shown to include the
layered memory structure 126 in one embodiment of the present subject matter, for the purpose of discussion only. However, in alternate embodiments, the layered memory structure 126 may be hosted on external memory device(s) or external computing device(s) communicatively connected to the IF system 102. The layered memory structure 126 can be any hierarchical memory structure known in the art, for example, a blackboard memory
structure, in which software programs, such as some of the program modules 114, can read data from and write data into the various layers. It will also be understood that the layered memory structure 126 may not be physically partitioned into layers, and that the term level or layer refers to the level of abstraction of a data point or fact present in the layered memory structure 126.
In operation, once the input facts are received by the IF system 102, the control
module 118 may store the input facts as a part of the facts 128 in the layered memory structure 126, as lowest level facts at the entry level layer, or at any other level. The control module 118 may then indicate to the indexing module 120 that new facts have been stored in the facts 128. In another implementation, the indexing module 120 may continually scan the facts 128 for new facts.
The indexing module 120 then indexes the input facts by associating various
fact parameters with the input facts. For example, the indexing module 120 may associate fact
parameters, such as fact type, fact level, a confidence probability, an availability indicator and
one or more properties p, with the input facts. Generally, using the availability indicator, the
input facts are marked as available. The properties pj can be chosen from any dimension j on
which a distance can exist, for example, numbers, strings, etc. Thus, each fact fx is associated
with a set of properties {pxj}. Further, for each j there is a distance measure, so that distances
dj(pxj, pyj) between pxj and pyj, corresponding to facts fx and fy, can be computed. The distances
dj thus determined would be indicative of the similarity or nearness between the facts fx and fy.
Thus, an input fact can be considered to be a part of group including other facts, which are
potentially similar to the input fact, if the pair-wise distance between the facts in the group
and the input fact is less than a threshold. It will be understood that in case none of the input
facts are similar to an input fact under analysis, this input fact may be indexed to a new group.
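A minimal sketch of this threshold-based grouping on a single numeric property dimension is given below; the brute-force pairwise check and the threshold value are illustrative assumptions, and an actual implementation would use a probabilistic technique such as LSH rather than comparing every pair.

```python
# Illustrative sketch: a fact joins a group only if it is within the distance
# threshold of every member on the shared property dimension; otherwise it
# starts a new group.

DELTA = 2.0  # assumed distance threshold

def index_fact(new_fact, groups):
    """Index a fact to the first sufficiently close group, or open a new one."""
    for group in groups:
        if all(abs(new_fact - member) < DELTA for member in group):
            group.append(new_fact)
            return group
    groups.append([new_fact])
    return groups[-1]

groups = []
for value in [1.0, 1.5, 9.0, 9.4]:
    index_fact(value, groups)
print(groups)  # prints [[1.0, 1.5], [9.0, 9.4]]
```

The two facts near 1 form one group and the two facts near 9 form another, since no cross-group pair is within the threshold.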
In one implementation, the indexing module 120 uses Locality-Sensitive
Hashing (LSH) for associating the properties with the input facts. Typically, LSH is a probabilistic technique to identify the similar facts and map the similar facts to the same group. The use of LSH, as per one implementation of the present subject matter, will be explained in detail later with reference to Fig. 2a. However, other probabilistic techniques known for indexing to identify nearest neighbors or similarities between data points can also be used in other implementations.
Once the input facts have been indexed, the control module 118 applies one or
more of the knowledge modules 122 on the facts 128 for discovering new facts from the input
facts, and thereby gaining intelligence. The control module 118 can use any known strategy
for the application of the knowledge modules 122 to discover new facts. For example, the
control module 118 may implement a randomized strategy, or an exhaustive strategy, or a
randomized strategy that implements a parallel terraced scan (PTS) strategy.
Each knowledge module s is a function or program that takes a set of facts, say
Fs = {f1, ..., fk}, from the facts 128 in the layered memory structure 126 as an input in order to produce an output fact. In one implementation, the knowledge modules 122 are configured to produce an output fact only if the distance between any facts fx and fy in Fs is less than a given threshold, i.e., if the condition given by equation (1) is satisfied:

dj(pxj, pyj) < δ for some j    ... (1)

where j is a pre-specified dimension and δ is a pre-specified threshold. Thus, the knowledge modules 122 act in a locally selective manner and consider only those subsets of input facts which include input facts that are pair-wise close to each other with respect to at least one of their property dimensions, i.e., input facts that can be considered to be indexed to the same group. As a result, the knowledge modules 122 can discover new facts in linear time as opposed to quadratic time, which is generally taken up if the input facts are compared pair-wise.
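The locally selective condition of equation (1) can be checked directly, as sketched below; the dictionary-based facts and the threshold value are assumptions for illustration only.

```python
# Illustrative check of the locally selective condition: a knowledge module
# fires only if every pair of facts in Fs lies within the threshold on the
# chosen property dimension j.

from itertools import combinations

DELTA = 1.0  # the pre-specified threshold (delta)

def locally_selective(facts, dimension):
    """Return True if all pairs of facts are within DELTA on the dimension."""
    return all(
        abs(fx[dimension] - fy[dimension]) < DELTA
        for fx, fy in combinations(facts, 2)
    )

fs = [{"j": 0.2}, {"j": 0.5}, {"j": 0.9}]
print(locally_selective(fs, "j"))  # prints True: all pairwise distances < 1.0
```

Because grouping has already placed mutually close facts together, in practice this check only needs to run within a group rather than over all posted facts.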
Further, to produce the output facts and discover new facts in the process, the
knowledge modules 122 include modules configured for probabilistically estimating new facts. For example, some of the knowledge modules 122 may implement Bayesian reasoning as will be explained in detail with reference to Figs. 2b and 2c. It will be understood that not all knowledge modules 122 need to implement Bayesian reasoning. The knowledge modules 122 can implement different reasoning logics and functions based on the type and level of the input facts that the knowledge modules 122 operate on and the type of the input facts that the knowledge modules 122 are configured to discover.
Once one or more of the knowledge modules 122 generate the output facts, the
output facts are sent to the control module 118. The control module 118 then determines whether the output facts correspond to deduced facts or relate to existing facts itself, based on
the content of the output facts and a probability and a confidence level with which the
knowledge modules 122 assert the output facts. The deduced facts can be understood as either
completely new facts or new extensions to existing facts, thus belonging to either a next
higher level of facts or to the same level of facts as the input facts. Based on the
determination, the control module 118 then either selects the output facts for posting in the
layered memory structure 126 as a part of the facts 128 or decides to not post the output facts.
In case an output fact is found to be a deduced fact and is posted in the layered
memory structure 126, the input facts that were used to generate the output fact are then
marked as consumed. Otherwise, if an output fact is not found to be a deduced fact, the output
fact is discarded and the input facts that were used to generate the output fact are kept as
available. Thus, input facts that have been used to generate deduced facts are excluded from
further analysis, while ones that did not generate deduced facts are kept for further analysis.
The posted output facts are then treated as input facts and are indexed by the
indexing module 120 and marked as available as described above. Further, the control module
118 applies the knowledge modules 122 for discovering newer facts as discussed above.
Thus, the IF system 102 can effectively handle large volumes of data and
deduce new facts continually in linear time. For example, the IF system 102 can extract information pertaining to various events from social media networks, such as Twitter™, blogs, discussion forums, websites, etc., and garner intelligence regarding popular public perception of the events. In another example, the IF system 102 can extract customer feedback from data available within the organization, such as direct email communications containing consumer feedback, complaints and suggestions, as well as transaction logs generated as a result of direct customer communications. The IF system 102 can therefore mine both external sources and internal sources of information to generate intelligence, for example, provide information on what is the latest topic of discussion on Twitter™, or what is the most common reason for customer complaints, etc.
Fig. 2a illustrates a layered memory structure corresponding to a representation
of the layered memory structure 126, in accordance with an embodiment of the present
subject matter. The layered memory structure 126 can be considered to include a plurality
of input facts 128 in various layers or levels 201-1, 201-2, ..., 201-N, and 201-N+1. As
mentioned earlier, the layers correspond to levels of abstraction of the facts and may not correspond to physical or virtual layers in the memory.
Generally, an input fact received by the control module 118 from an
information source as described above, can be posted on or written into the layered memory structure 126 at a level, for example, level 0 201-1, based on the fact type of the input fact. Once the input facts, such as fact a 202 and fact b 203, have been posted on the layered memory structure 126, the indexing module 120 indexes these input facts. As mentioned above, the indexing module 120 associates various parameters including different fact types, fact levels, and fact properties p, with the input facts. The properties {pxj} assigned to a fact can be of any basic type, for example, number, string, etc., from any dimension j on which a distance measure exists so that the distance dj(pxj, pyj) between pxj and pyj, corresponding to the facts fx and fy, can be computed.
The control module 118 then applies one or more of the knowledge modules
122 on the input facts 202, 203 to generate deduced fact 204. In one implementation, the knowledge modules 122 operate in a locally selective knowledge manner, i.e.,. a knowledge module s produces a new or deduced fact if and only if for any two facts fx and fy in Fa,
dj(pxi- pyj) < d for some j with δ being a fixed threshold.
The deduced fact 204 is then posted, for example, on the next higher level,
i.e., level 1 201-2. It will be understood that, based on the computation, the deduced fact may be posted at the same level, i.e., level 0 201-1, as well. The deduced fact 204 posted on level 1 is now treated as an input fact and indexed by the indexing module 120 as described above. Further, the set of input facts 202 and 203 used by the knowledge modules 122 are declared to be consumed and the remaining input facts, for example fact e 208, on the blackboard remain available. For example, in fig. 2a, fact a 202 and fact b 203, which are stored in level 0 201-1, are used by the knowledge source to produce a new fact c 204 in level 1 201-2. Thus, fact a 202 and fact b 203 are the consumed facts and fact c 204 is a deduced fact which is available. Similarly, fact d 209 and fact e 208 can be available facts.
The control module 118 then applies one or more of the knowledge sources
122 as discussed, to further discover new facts and thus obtain higher levels of facts, for example, fact f 205, fact g 206 and fact h 207 in level N and level N+1. Thus, the layered
memory structure 126 supports hierarchical pattern detection, wherein different knowledge
modules 122 are invoked to produce higher-level facts from lower-level ones, starting with
input facts that are initially posted at the lowest level, say, level 0 201-1.
An exemplary implementation of the IF system 102 and indexing module 120
can be understood using information received from social media, such as Twitter™. Each tweet is treated as a bag of words: say there are m distinct words in a set of n tweets. Consider a random numbering of words, f, that maps each word to an integer between 0 and m-1. The minhash of a tweet t under f is the lowest value of f(w) among all words w contained in t. It
turns out that the probability of two tweets (treated as sets of words) sharing the same minhash value is exactly the set (Jaccard) similarity between them, i.e., the size of their intersection divided by the size of their union; let us call this similarity d. Now consider r such random numberings of the m words, f1, ..., fr. The probability that two tweets share the same minhash under all r functions is d^r. Now we take b such groups of r hash functions, i.e., a total of br functions. The probability that the minhash of two tweets will match in at least one group of r functions out of the b computed is 1 - (1 - d^r)^b. The shape of this function, which is a sigmoid function, is the key concept that is used here, i.e., by choosing r and b appropriately, any tweets with similarity at least d are mapped to the same group with arbitrarily high probability, and tweets less similar than d will not map to the same group with an equally high probability. Further, computing br minhashes for each tweet is independent of the total number of tweets n, i.e., the computation function computes each tweet's LSH independently, in constant time, and tweets that are close together are mapped to the same group.
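The banding scheme described above may be sketched as follows; seeded cryptographic hashes stand in for the random word numberings, and the values of r and b are illustrative choices rather than values prescribed by the present subject matter.

```python
# Illustrative minhash-based LSH: b bands of r hash functions each; two
# tweets are indexed to the same group if all r minhashes agree in at
# least one band.

import hashlib

R, B = 4, 8  # r hashes per band, b bands (illustrative)

def minhash(words, seed):
    """Lowest hash value over the tweet's words under one seeded function."""
    return min(
        int(hashlib.md5(f"{seed}:{w}".encode()).hexdigest(), 16) for w in words
    )

def lsh_keys(tweet):
    """One bucket key per band; similar tweets share a key with high probability."""
    words = set(tweet.split())
    return [
        tuple(minhash(words, band * R + i) for i in range(R))
        for band in range(B)
    ]

t1 = "flood reported in the city center today"
t2 = "flood reported in the city center tonight"
# Jaccard similarity of t1 and t2 is 6/8, so the chance of sharing at least
# one band key is 1 - (1 - 0.75**4)**8, roughly 0.95.
shared = any(k1 == k2 for k1, k2 in zip(lsh_keys(t1), lsh_keys(t2)))
```

Each tweet's keys are computed independently of all other tweets, which is what gives the constant per-tweet indexing cost noted above.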
Further, to obtain the deduced facts, the knowledge modules 122 use various
logic frameworks, for example, Bayesian network fragments. A Bayesian network fragment is made up of a set of nodes and directed edges annotated with conditional probability tables, which together form a connected Bayesian network with one root node. The root node corresponds to the type of fact that can be produced by a knowledge source or module for placement on the layered memory structure 126. Further, certain nodes in a Bayesian network are chosen as input nodes, based on the properties of the input fact, which are to be matched with input facts of the same type present on the layered memory structure 126.
Furthermore, certain nodes in the Bayesian network are chosen as evidence
nodes, whose values are observed during evaluation by the knowledge module 122. In the Bayesian network, these observations are modeled as soft evidence or noisy sensors. The conditional probabilities of this soft evidence are computed from the properties of the matched input facts during the matching process, using a set of computation functions specific to the knowledge source. Likewise, computation functions are also defined to compute the properties of the output fact or deduced fact, i.e., the fact corresponding to the root node. The Bayesian knowledge source places its output fact, i.e., the fact corresponding to the root node, on the blackboard if the a posteriori probability φ of the root node crosses a
threshold T as a result of the matching process.
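The threshold test above can be illustrated with a deliberately simplified fragment: a binary root node with independent soft-evidence nodes, so the posterior follows from Bayes' rule directly. The prior, likelihood pairs, and threshold T below are all hypothetical values, and a real fragment would use full conditional probability tables rather than this naive-Bayes shortcut.

```python
def posterior_root(prior, likelihoods):
    """Posterior of a binary root node given independent soft evidence.
    Each entry of `likelihoods` is (P(e | root=True), P(e | root=False))."""
    p_true, p_false = prior, 1.0 - prior
    for l_t, l_f in likelihoods:
        p_true *= l_t
        p_false *= l_f
    return p_true / (p_true + p_false)

# Hypothetical fragment: post the output fact only if phi crosses T.
T = 0.9
evidence = [(0.8, 0.1), (0.9, 0.3)]  # soft evidence from matched input facts
phi = posterior_root(prior=0.5, likelihoods=evidence)
post_fact = phi > T  # phi is 0.96 here, so the fact is posted
```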
The generation of output facts corresponding to deduced facts or to new
extensions of existing facts is explained further with reference to figs. 2b and 2c.
Fig. 2b illustrates a Bayesian network fragment producing a higher level fact
on the layered memory structure 126, in accordance with an implementation of the present subject matter. The functionality is described with the help of a real-life application of social media, such as Twitter™, where tweets arrive rapidly, say, hundreds each second. Each tweet needs to be analyzed and a decision taken as to whether it represents a new event or one that has already been reported. Each tweet can be viewed as a vector t in a high-dimensional space, i.e., that of words, with t_i being one if the i-th word is present in that tweet.
A pair of tweets can be compared using set similarity (Jaccard distance) or cosine similarity. If a new tweet is sufficiently close to an existing one, it can be considered as
describing the same story; otherwise, it creates a 'building story' defined by it alone.
Once a building story acquires sufficient strength, and with sufficient speed, it
is analyzed to determine whether it is indeed a 'story-of-interest', based on the actual words contained in its constituent set of tweets. Further nuances include detecting when a new tweet adds sufficient additional insight to a breaking story to warrant its being tagged as a new branch discussing a novel aspect of that story, and so on. Thus, breaking-story, story-of-interest, new-branch, etc., are higher-level concepts that need to be detected in real time from a large and rapidly arriving stream of data.
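The two pairwise comparisons mentioned above can be sketched on the set-of-words view of a tweet; for binary word vectors, cosine similarity reduces to the set expression used below. The closeness threshold of 0.5 is a hypothetical value.

```python
import math

def jaccard(a, b):
    """Set similarity: |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b)

def cosine(a, b):
    """Cosine similarity of binary word vectors, computed on the word sets."""
    return len(a & b) / math.sqrt(len(a) * len(b))

t1 = {"earthquake", "reported", "near", "coast"}
t2 = {"earthquake", "near", "coast", "today"}
same_story = jaccard(t1, t2) > 0.5  # hypothetical closeness threshold
```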
Now referring to Fig. 2b, consider new input facts that are not close to any fact
already posted on the blackboard. The newly arrived input facts, i.e., new story facts are
consumed by an appropriate knowledge source and posted on the layered memory structure
126. Subsequent new input facts are matched with the new story facts as shown in the fig. 2b.
As discussed above, only facts that are close together with respect to one of
their properties, such as cosine similarity, are compared by the knowledge modules 122. As
shown in fig.2b, a knowledge module 122 corresponding to the matching node 213, with
which the computation function is associated, verifies that the selected pair is actually
comparable. For example, during verification, the knowledge module 122 may check at the
matching node 213 that the meanings of the words in the new fact 212 and new story/building
story/ new branch are not different due to, say, a difference in the order of words used. The
controller module 118 can also use the knowledge modules 122 to add new input facts 212 to
existing stories, by replacing building story nodes 211 that may already be on the layered
memory structure 126, while incrementing the existing story's strength.
Fig. 2c illustrates a Bayesian network fragment discovering a higher level fact
which is similar to an existing fact on the blackboard but not close enough, and so may
represent a 'new approach' to the existing fact. The network shown in fig. 2c identifies an
input fact 216 that is similar to an existing fact 215 but is found to be not sufficiently near on
more careful assessment. Such situations therefore represent a new perspective on the
existing information. Such situations are detected by the computation function associated with
the new features node 217, as shown in fig. 2c. The new features node 217 checks the salient
parts of the information contained in the existing fact 215 to determine whether they are likely
to constitute a new branch 214 while remaining linked to the existing information.
Fig. 3 illustrates a method 300 for information fusion in a scalable memory
architecture, in accordance with an implementation of the present subject matter. The exemplary method may be described in the general context of computer executable instructions. Generally, computer executable instructions can include routines, programs, objects, components, data structures, procedures, modules, functions, etc., that perform particular functions or implement particular abstract data types. The method may also be practiced in a distributed computing environment where functions are performed by remote processing devices that are linked through a communications network. In a distributed
computing environment, computer executable instructions may be located in both local and remote computer storage media, including memory storage devices.
The order in which the method is described is not intended to be construed as a
limitation, and any number of the described method blocks can be combined in any order to implement the method, or an alternative method. Additionally, individual blocks may be deleted from the method without departing from the spirit and scope of the subject matter described herein. Furthermore, the method can be implemented in any suitable hardware, software, firmware, or combination thereof.
At block 302, a fact is posted on a layered memory structure, such as the
layered memory structure 126. The fact can be understood as a piece of information
corresponding to an event or an object. The fact may indicate, for example, event details
indicating what has happened, place of occurrence of the event, time at which the event has
occurred, etc. Further, the fact can be provided by a user through internal sources, such as
facts available over an intranet, or through external sources, such as facts available over the Internet.
At block 304, the posted fact is indexed by associating the fact with various
parameters, such as fact type, fact level, availability indicator, confidence probability, and one or more fact properties. In one example, the posted fact can be mapped to a group including other facts, which are potentially similar to the posted fact, if the pair-wise distance between the facts in the group and the posted fact is less than a threshold. In one implementation, the indexing module 120 indexes the posted fact. Further, the indexing module 120 implements a fact indexing technique, such as locality-sensitive hashing, to index the facts on the layered memory structure.
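The fact parameters and the group mapping of block 304 can be sketched as below. This is an illustrative data structure under stated assumptions, not the indexing module 120 itself: the `Fact` fields mirror the parameters listed above, the distance function is pluggable, and the threshold of 0.4 is a hypothetical value.

```python
from dataclasses import dataclass, field

@dataclass
class Fact:
    """Hypothetical fact record carrying the parameters listed above."""
    fact_type: str
    level: int
    available: bool = True       # availability indicator
    confidence: float = 1.0      # confidence probability
    properties: dict = field(default_factory=dict)

def index_fact(fact, groups, distance, threshold=0.4):
    """Map a posted fact to a group whose members are all within `threshold`
    of it; otherwise open a new group."""
    for group in groups:
        if all(distance(fact, other) < threshold for other in group):
            group.append(fact)
            return group
    new_group = [fact]
    groups.append(new_group)
    return new_group

# Toy usage: group facts whose scalar property "x" lies within the threshold.
groups = []
dist = lambda a, b: abs(a.properties["x"] - b.properties["x"])
for x in (0.0, 0.1, 5.0):
    index_fact(Fact("story", level=0, properties={"x": x}), groups, dist)
```

In the described system the pairwise comparison would be replaced by the LSH bucketing, which avoids scanning existing groups; the linear scan here is only for clarity.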
At block 306, one or more knowledge modules are applied on the posted fact
based on the indexing. In an example, a knowledge module is applied on the posted fact to analyze the posted fact with respect to one or more other facts that are indexed to the same group as the posted fact. Further, the knowledge modules that are applied to the posted fact are selected based on a control criterion. In one example, the control criterion is based on a randomized control strategy, which implements a parallel terraced scan (PTS). In one implementation, the control module 118 selects the one or more knowledge modules 122 that are to be applied on the posted fact.
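A randomized selection in the spirit of a parallel terraced scan can be sketched as weighted sampling without replacement: modules with higher urgency are explored more often, without the low-urgency ones being starved. The urgency scores and the module names are hypothetical; a full PTS would also vary the depth of exploration per module.

```python
import random

def select_modules(modules, urgencies, k=2, rng=None):
    """Choose k modules stochastically, with probability proportional to an
    urgency score (weighted sampling without replacement)."""
    rng = rng or random.Random(0)
    chosen, pool = [], list(zip(modules, urgencies))
    for _ in range(min(k, len(pool))):
        pick, acc = rng.uniform(0, sum(u for _, u in pool)), 0.0
        for i, (m, u) in enumerate(pool):
            acc += u
            if pick <= acc:
                chosen.append(m)
                pool.pop(i)
                break
    return chosen

# "C" is urgent, so it is usually (not always) among the two selected.
picked = select_modules(["A", "B", "C"], [1.0, 1.0, 8.0], k=2)
```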
At block 308, upon application of one or more of the knowledge modules, an
output fact is received. In one implementation, the output fact is generated using Bayesian
reasoning by the applied knowledge modules 122.
At block 310, it is determined whether the output fact is a deduced fact based
on a probability estimate. A deduced fact can be understood as a fact that provides higher-level
information using one or more input facts indexed to the group of the posted fact. The
probability estimate may be understood as a comparison of a posteriori probability associated
with the output fact with a threshold probability. If it is determined that the output fact is not a
deduced fact ("No" branch of block 310), the method 300 proceeds to block 312.
At block 312, the output fact is discarded and the posted fact is kept as
available on the layered memory structure 126. Thus, the posted fact remains available for
further analysis by other knowledge modules.
However, if on the application of the knowledge modules on the posted fact, it
is determined that the output fact is a deduced fact ("Yes" branch of block 310), the method
300 proceeds to block 314. For example, if it is ascertained that a posteriori probability
associated with the output fact is greater than a threshold probability, the output fact is
identified as the deduced fact.
At block 314, the posted fact is indicated as "consumed" on the layered
memory structure 126 to indicate that this fact has been processed and would not be available
for further processing. Further, the input facts from which the deduced fact is generated are
also marked as "consumed".
At block 316, the deduced fact is posted in the layered memory structure 126.
Thus, a new fact or piece of information is discovered by the IF system 102. The deduced fact
may correspond to the same or a next higher level of abstraction as the input facts from which
it was deduced.
As illustrated in the figure, from block 316, the method 300 proceeds to block
304, where the deduced fact, after being posted on the layered memory structure 126, is
indexed as described above. It will be understood that in case the posted fact is substantially
different from the facts that have already been posted on the layered memory structure 126, the
posted fact is indexed so as to map to a new bucket. Thus, the method 300 can continue
iteratively to continually discover new facts.
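The flow of blocks 302 through 316 can be summarized in one control loop. This is a structural sketch under stated assumptions: `index` and the module callables are hypothetical stand-ins for the indexing module 120 and knowledge modules 122, each module is assumed to return an output fact together with its a posteriori probability, and T = 0.9 is an arbitrary threshold.

```python
from types import SimpleNamespace

def fuse(fact, blackboard, index, modules, T=0.9):
    """One pass of the method 300: post and index a fact, apply knowledge
    modules, then discard the output or post it as a deduced fact."""
    blackboard.append(fact)                 # block 302: post the fact
    group = index(fact)                     # block 304: index to a group
    for module in modules:                  # block 306: apply knowledge modules
        output, phi = module(fact, group)   # block 308: output fact + posterior
        if phi > T:                         # block 310: probability estimate
            for used in [fact] + group:     # block 314: mark inputs "consumed"
                used.available = False
            blackboard.append(output)       # block 316: post the deduced fact
            return output
    return None                             # block 312: discard; fact stays available

# Toy run with one always-confident module and an empty group.
bb, f = [], SimpleNamespace(available=True)
deduced = fuse(f, bb, index=lambda _: [],
               modules=[lambda fact, g: (SimpleNamespace(available=True), 0.95)])
```

In the full system the returned deduced fact would itself be re-indexed at block 304, closing the iterative loop described above.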
Although embodiments for systems and methods for information fusion have
been described in language specific to structural features and/or methods, it is to be understood that the invention is not necessarily limited to the specific features or methods described. Rather, the specific features and methods are disclosed as exemplary implementations for the information fusion systems and methods.
I/We claim:
1. A method for information fusion comprising:
posting a fact on a layered memory structure;
indexing the posted fact to a group of facts, such that facts with a similar property measure are indexed to a group;
applying one or more knowledge modules (122) on the posted fact and one or more input facts indexed to the group to provide an output fact; and
determining whether the output fact is a deduced fact based on a probability estimate.
2. The method as claimed in claim 1, wherein the indexing comprises associating one or more fact parameters with the posted fact.
3. The method as claimed in claim 2, wherein the one or more fact parameters comprise a fact type, a fact level, a confidence probability, an availability indicator, and a fact property.
4. The method as claimed in claim 1, wherein the applying comprises analyzing the posted fact with respect to the one or more input facts indexed to the group to determine a posteriori probability associated with the output fact.
5. The method as claimed in claim 4, wherein the analyzing comprises generating the output fact based on Bayesian reasoning.
6. The method as claimed in claim 1, wherein the determining comprises:
ascertaining whether a posteriori probability associated with the output fact is greater than a threshold probability; and
identifying the output fact as the deduced fact, when the posteriori probability is greater than the threshold probability.
7. The method as claimed in claim 1, wherein the method further comprises selecting the one or more knowledge modules (122) based on one of a random control strategy based on parallel terraced scan (PTS) and an exhaustive strategy.
8. The method as claimed in claim 1, wherein the method further comprises posting the deduced fact on the layered memory architecture, when the output fact is determined to be the deduced fact.
9. The method as claimed in claim 1, wherein the method further comprises indicating the one or more input facts as "consumed", when the output fact is determined to be the deduced fact.
10. The method as claimed in claim 1, wherein the method further comprises indicating the one or more input facts as "available", when the output fact is not determined to be the deduced fact.
11. An information fusion system (102) comprising:
a processor (108); and
a system memory (112) coupled to the processor (108), the system memory (112) comprising:
an indexing module (120) configured to index each of a plurality of input facts (128) based in part on a fact property associated with each of the plurality of input facts (128), wherein the plurality of input facts (128) are grouped into a plurality of groups based on a similarity property measure; and
at least one knowledge module (122) configured to:
generate an output fact based on analysis of one or more input facts indexed to a group from the plurality of groups; and
determine whether the output fact is a deduced fact based on a posteriori probability associated with the output fact.
12. The information fusion system (102) as claimed in claim 11, wherein the indexing module (120) is further configured to index the plurality of input facts (128) based on locality sensitive hashing (LSH).
13. The information fusion system (102) as claimed in claim 11, wherein the indexing module (120) is further configured to associate one or more fact parameters corresponding to the deduced fact for indexing the deduced fact.
14. The information fusion system (102) as claimed in claim 11, wherein the information fusion system (102) further comprises a control module (118) configured to post the deduced fact on the layered memory structure (126).
15. The information fusion system (102) as claimed in claim 11, wherein the information fusion system (102) further comprises a control module (118) configured to select the at least one knowledge module (122) based on one of a random control strategy and an exhaustive strategy.
16. The information fusion system (102) as claimed in claim 11, wherein the at least one knowledge module (122) is configured to analyze the one or more input facts based in part on Bayesian reasoning.
17. A computer-readable medium having embodied thereon a computer program, for executing a method comprising:
indexing a fact posted on a layered memory structure (126) to a group of facts such that facts with a similar property measure are indexed to a group;
applying one or more knowledge modules (122) on the posted fact and one or more input facts indexed to the group;
obtaining an output fact based in part on reasoning logic applied by the one or more knowledge modules (122); and
determining whether the output fact is a deduced fact based on a probability estimate.