Abstract: The present invention proposes a system and method for segmenting a multimedia document into physical or logical, spatial and temporal segments. Accordingly, the proposed method creates a content based description of each multimedia document segment by automated, semi-automated or manual means. The description can be in terms of textual annotations, either provided by a user or machine generated, extracted media features, or a combination of the above. Also, the present invention integrates the retrieved multimedia document segments, and other document segments that are semantically linked to these document segments, into a complete presentation that can be interactively and non-linearly navigated.
FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003
COMPLETE SPECIFICATION
(See Section 10 and Rule 13)
Title of invention: A SYSTEM AND METHOD FOR NON-LINEAR ACCESS OF MULTIMEDIA DATA
FROM A MULTIMEDIA COLLECTION
APPLICANT:
TATA Consultancy Services Limited, a company incorporated in India under The Companies Act, 1956
Having address:
Nirmal Building, 9th Floor,
Nariman Point, Mumbai 400021,
Maharashtra, India
FIELD OF THE INVENTION:
The present invention relates to the field of multimedia data management. Particularly, the present invention relates to the accessing of multimedia data from multimedia documents in a collection in a non-linear manner.
PRIOR-ART REFERENCES
1. Sujal S Wattamwar, Surjeet Mishra, and Hiranmay Ghosh, "Multimedia Explorer: Content based Multimedia Exploration" IEEE Tencon, Hyderabad (India) November 2008.
2. Ritendra Datta, Dhiraj Joshi, Jia Li and James Z. Wang, "Image Retrieval: Ideas, Influences, and Trends of the New Age", ACM Computing Surveys, 40(2), pp. 1-60, 2008.
3. Ying Liu, Dengsheng Zhang, Guojun Lu, Wei-Ying Ma. "A survey of content-based image retrieval with high-level semantics". Pattern Recognition. 40(1), pp. 262-282, 2007.
4. Arnold W.M. Smeulders, Marcel Worring, Simone Santini, Amarnath Gupta and Ramesh Jain. "Content-Based Image Retrieval at the End of the Early Years". IEEE Transactions on Pattern Analysis and Machine Intelligence. 22(12), pp. 1349-1380. 2000.
5. Cees G. M. Snoek and Marcel Worring. "Multimodal Video indexing: A Review of the State-of-the-art". Multimedia Tools and Applications. 25(1), pp. 5-35, 2005.
6. M. Davy and S.J. Godsill. "Audio Information Retrieval: A Bibliographical Study". Technical Report. University of Cambridge: Department of Engineering, Cambridge, UK. 2002.
7. H. Ghosh, S. Chaudhury, K. Kashyap and B. Maiti. "Ontology Specification and Integration for Multimedia Applications". In Ontologies: A Handbook of Principles, Concepts and Applications in Information Systems, Ed. R. Sharman, R. Kishore and R. Ramesh. Springer, 2007, pp. 265-296.
8. Marco Bertini, Alberto Del Bimbo, Carlo Torniai, Constantino Grana, Rita Cucchiara. "Dynamic pictorial ontologies for video digital libraries annotation". Workshop on multimedia information retrieval on the many faces of multimedia semantics. ACM Multimedia Conference, 2007.
9. Sujal Wattamwar and Hiranmay Ghosh. "Spatio-Temporal Query for Multimedia Database". Workshop on Multimedia Semantics. ACM Multimedia Conference, 2008.
10. Gaurav Harit, Santanu Chaudhury and Hiranmay Ghosh. "Using Multimedia Ontology for Generating Conceptual Annotations and Hyperlinks in Video Collections". Web Intelligence 2006: 211-217.
BACKGROUND OF THE INVENTION:
The inventors of the present invention have found that the broader area of content based information retrieval (CBR), and specifically multimedia information processing and retrieval, are highly researched subjects in both science and technology domains. However, even though research in this field is intense, technological implementation for desired results is still a challenge. Several research attempts have been made in this domain; some of them known to us are mentioned above and described below.
Affordability and ubiquity of recording devices, such as digital cameras, web cameras and microphones and advances in Internet technology have resulted in huge collections of networked multimedia data. However, browsing through such collections in search of specific information is not an easy task.
There has been significant research in content based retrieval of image, video and audio documents. Reference [1] describes a generic architecture of a video repository that supports content based exploration.
References [2], [3] and [4] provide the state of the art in content based image retrieval.
Reference [5] summarizes the state of the art in video indexing and reference [6] summarizes the state of the art in audio indexing and retrieval.
References [7] and [8] describe methods to enrich an ontology with audio-visual properties and to use them for multimedia data retrieval.
Reference [9] describes a method for querying multimedia repository with spatial and temporal constraints between objects and events.
Reference [10] describes a method to create semantic hyperlinks in a video collection.
Several inventions have been made in this domain; some of them known to us are described below:
US Patent 7124149 describes a method and apparatus for extracting a model vector representation from multimedia documents. A model vector provides a multidimensional representation of the confidence with which multimedia documents belong to a set of categories or with which a set of semantic concepts relate to the documents. A model vector can be associated with multimedia documents to provide an index of their content or categorization and can be used for comparing, searching, classifying, or clustering multimedia documents. A model vector can be used for purposes of information discovery, personalizing multimedia content, and querying a multimedia information repository. However, the '149 patent indexes multimedia documents with a number of concept terms derived through machine processing of media contents. The model vector referred to in that invention represents an array of semantic concepts with their respective confidence values, e.g. a tree is likely in this picture with 80% confidence, a human being with 70% confidence, etc. This vector is created by machine interpretation of low-level media features using machine learning principles. The present inventors do not attempt to create any such model vector as metadata for the collection, because such interpretation needs to be contextual. Instead, the inventors do the interpretation during the search phase (50), using the ontology (60), in the context of a specific query. However, there is a provision for manual labeling of multimedia artifacts and the contents therein during the metadata generation phase (30).
US patent 7184959 provides a system and method for automatically indexing and retrieving multimedia content. The method may include separating a multimedia data stream into audio, visual and text components, segmenting the audio, visual and text components based on semantic differences, identifying at least one target speaker using the audio and visual components, identifying a topic of the multimedia event using the segmented text and topic category models, generating a summary of the multimedia event based on the audio, visual and text components, the identified topic and the identified target speaker, and generating a multimedia description of the multimedia event based on the identified target speaker, the identified topic, and the generated summary. However, the technology described in the patent '959 does not really address an arbitrary combination of different media forms. In contrast, the proposed invention caters to a variety of multimedia data, such as still images, 2D/3D graphics, animations, video, speech and non-speech audio. It does not depend on any particular modality of information.
US patent application 20080208872 teaches an approach to accessing audio or multimedia content that uses associated text sources to segment the content and/or to locate entities in the content. A user interface then provides a user with a way to navigate the content in a non-linear manner based on the segmentation or on the linking of text entities with locations in the content. The user interface can also provide a way to edit segment-specific content and to publish individual segments of the content. The output of the system, for instance the individual segments of annotated content, can be used for syndication and/or to improve discoverability of the content. However, the '872 application's approach is to use a textual transcript of speech or associated synchronized text (e.g. closed-caption text with a news telecast) to index an audio/video stream (e.g. broadcast/telecast news) and to support retrieval of media segments using the index. Though the term 'multimedia' has been used in the application text, the technology does not really address an arbitrary combination of different media forms.
Thus, the prior art failed to recognize the significance of creating a repository and content-based retrieval system that deals with multiple media types so as to qualify as a true multimedia repository, of integrating multiple content types into a coherent multimedia presentation, and of using a human-computer collaborative environment, where both computers and humans synergize in annotating the multimedia contents.
In order to solve the above-mentioned problems, the present invention proposes a system and method for segmenting a multimedia document into physical or logical, spatial and temporal segments. Accordingly, the method creates a content based description of each multimedia document segment by automated, semi-automated or manual means. The description can be in terms of textual annotations, either provided by a user or machine generated, extracted media features, or a combination of the above.
Other features and advantages of the present invention will be explained in the following description of the invention having reference to the appended drawings.
OBJECTS OF THE INVENTION:
The primary object of the invention is to provide a non-linear access of multimedia data from a multimedia collection.
Another object of the invention is to provide a system and method for segmenting a multimedia document into physical or logical, spatial and temporal segments.
Another object of the invention is to provide a system and method for creating content based description of each multimedia document segment by automated, semi-automated or manual means. The description can be in terms of textual annotations, either provided by a user or machine generated, extracted media features, or a combination of the above.
Another object of the invention is to provide a system and method for semantic interpretation of annotations and media feature descriptors associated with multimedia documents and document segments.
Another object of the invention is to provide a system and method for creating hyper-media documents by creating links across multimedia document segments based on their semantic similarity.
Another object of the invention is to provide a system and method for searching of specific media segments in different ways, e.g. by textual keywords, by audio-visual features of multimedia objects, scenes and events, by audio-visual examples of multimedia objects, scenes and events, and by semantic specification.
Another object of the invention is to provide a system and method for integrating the retrieved multimedia document segments, and other document segments that are semantically linked to these document segments, into a complete and coherent presentation that can be interactively and non-linearly navigated.
SUMMARY OF THE INVENTION:
The present invention provides a multimedia repository system comprising a multimedia storage system which comprises physical storage device, processing device, input-output devices, filing system and database system to store the multimedia data and metadata. The present invention proposes a system and method for segmenting a multimedia document into physical or logical, spatial and temporal segments. Also, the proposed method creates a content based description of each multimedia document segment by automated, semi-automated or manual means. The description can be in terms of textual annotations, either provided by a user or machine generated, extracted media features, or a combination of the above.
BRIEF DESCRIPTION OF THE ACCOMPANYING DRAWINGS
The foregoing summary, as well as the following detailed description of preferred embodiments, is better understood when read in conjunction with the appended drawings. For the purpose of illustrating the invention, there is shown in the drawings example constructions of the invention; however, the invention is not limited to the specific system and method disclosed in the drawings:
Figure 1 shows a block diagram of the present invention according to one embodiment of the invention.
Figure 2 shows a block diagram of creation of coherent multimedia presentation.
Figure 3 shows a block diagram for the working example of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
Some embodiments of this invention, illustrating its features, will now be discussed in detail. The words "comprising," "having," "containing," and "including," and other forms thereof, are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items.
It must also be noted that as used herein and in the appended claims, the singular forms "a," "an," and "the" include plural references unless the context clearly dictates otherwise. Although any methods, and systems similar or equivalent to those described herein can be used in the practice or testing of embodiments of the present invention, the preferred methods, and systems are now described.
Accordingly, the present invention provides a system for non-linear access of multimedia data from a multimedia collection, the said system comprising of:
storage means having a database to store multimedia data, metadata, semantic descriptors and extracted media features;
user interface means:
a. to enter multimedia data and metadata in the storage means;
b. to provide for semantic annotation to media objects and segments of stored multimedia data created by computers, by way of user annotation;
user-input interface means to present a query in the form of textual keywords or descriptors to the system by the user;
a processing module means having program instructions stored in memory of the system that are configured to cause the processor, according to the query entered by the user, to execute the following steps:
a. segmenting the stored multimedia data into physical and logical segments and creating the metadata for each such segment;
b. indexing of multimedia segments and media objects with keywords and media descriptors;
c. retrieving of the multimedia segments and media objects with textual keywords or media descriptors;
d. resolving a semantic description to a textual and media-feature-based description for semantic retrieval of multimedia content;
e. integration of retrieved multimedia document segments of one or more format types into a single coherent presentation document; and
user-output interface means to present the user with the integrated single coherent presentation document.
The disclosed embodiments are merely exemplary of the invention, which may be embodied in various forms.
Figure 1 shows a block diagram of this invention according to one embodiment of the invention wherein the Ingestion (10) module enables a user to ingest a multimedia document into the multimedia repository system. It supports a user interface for selecting different types of multimedia data, for example still images, animation, 2D and 3D graphics, speech and non-speech audio, video, multimedia presentations, screen-captures, etc. A user can define arbitrary media segments and can associate arbitrary metadata and annotations with each media segment or with the complete multimedia document. This is done in conjunction with the segmentation and metadata extraction module, which facilitates the process.
Segmentation and metadata extraction (20) module complements the ingestion module by automatically segmenting media data and by automatically creating metadata. The media data can be segmented either spatially or temporally by using different segmentation criteria, such as segments of fixed sizes
and/or durations, identified image regions or video shots, silence or speaker transition in speech, etc. depending on specific embodiment of the invention. The media instances can either be physically segmented, i.e. new media instances created or logically segmented, i.e. segments are marked on the original media instance depending on the specific embodiment of the invention. The generated descriptors can be of different types, for example spatial and temporal segment description, extracted media features for the media instance or its segments, annotations for the media instance or its segments created interactively by the users, geo-tagging data, or time and means of creation of the media, etc. or a combination thereof. The module can be realized as one processing element or several processing elements with different audio-visual media processing capabilities depending on the specific embodiment of the invention.
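For illustration, the fixed-duration criterion for logical temporal segmentation could be realized as in the following minimal sketch; the names `LogicalSegment` and `segment_by_duration` and the sample URI are illustrative, not part of the specification.

```python
from dataclasses import dataclass, field

@dataclass
class LogicalSegment:
    # A logical segment: marked on the original media instance;
    # no new media file is created (physical segmentation would write one).
    media_uri: str
    start_s: float
    end_s: float
    annotations: list = field(default_factory=list)

def segment_by_duration(media_uri, total_s, window_s):
    """Mark fixed-duration logical segments on one media instance."""
    segments, t = [], 0.0
    while t < total_s:
        segments.append(LogicalSegment(media_uri, t, min(t + window_s, total_s)))
        t += window_s
    return segments

# Example: a 95-second video marked into 30-second logical segments.
for seg in segment_by_duration("http://example.org/video.mp4", 95.0, 30.0):
    print(f"{seg.media_uri} [{seg.start_s:.0f}s-{seg.end_s:.0f}s]")
```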
Object and event detection (30) module detects specific objects, like human faces, and events, like the entry of a person into the scene, depending on the specific embodiment of the invention. The extracted objects and events can be annotated by the user interactively, for example, a name assigned to a face or the make and model for an automobile. The metadata so generated comprises high level semantic descriptions and complements the metadata generated in module (20). The module can be realized as one processing element or several processing elements with specialized functions and domain-specific media processing capabilities depending on the specific embodiment of the invention.
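As an illustration of how a machine detection can be paired with an interactive label, a minimal sketch follows; the `Detection` structure and sample values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    # Output of a detector plus an optional user-supplied semantic label.
    kind: str      # e.g. "face" or "vehicle"
    bbox: tuple    # (x, y, width, height) within the frame
    label: str = ""

# A detector (not shown) emits a face; the user labels it interactively,
# turning a low-level detection into high-level semantic metadata.
d = Detection(kind="face", bbox=(40, 32, 64, 64))
d.label = "J. Smith"
```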
Media and metadata storage (40) module stores the media instance as well as the metadata containing information about media segments, media features and annotations, which are either manually or automatically generated. Different embodiments of the invention can employ different techniques, such as MPEG-7 compliant files, XML or other semi-structured databases, relational databases, inverted files, R-Tree and its variants, or combinations thereof for efficient access to the data.
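One of the access structures named above, the inverted file, can be sketched as follows; the segment identifiers and keywords are placeholders.

```python
from collections import defaultdict

def build_inverted_index(segments):
    """segments: iterable of (segment_id, keyword_list) pairs.
    Returns a map from keyword to the set of segments carrying it."""
    index = defaultdict(set)
    for seg_id, keywords in segments:
        for kw in keywords:
            index[kw.lower()].add(seg_id)
    return index

index = build_inverted_index([("v1#s0", ["rocket", "launch"]),
                              ("v1#s1", ["crew"])])
print(sorted(index["rocket"]))   # -> ['v1#s0']
```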
The key elements in the central storage of Multimedia Explorer are the video descriptions, with which all the executable modules interact. The actual videos are referred to by their URLs and can be present anywhere in the network. However, they are temporarily downloaded from external sources and can be optionally stored on the local host. The content descriptions are stored in MPEG-7 format. MPEG-7 provides a rich set of standardized tools to describe multimedia content for human users and for automatic systems that process audiovisual information.
Search (50) module encodes algorithms for text based and content based search for the multimedia documents and document segments available with the multimedia repository. The algorithms optimize search with available metadata, media descriptors, databases and index tables. When required media descriptors are not available, the algorithms analyze the media contents to generate the descriptors dynamically. The module can be realized as one processing element or several processing elements with specialized functions and media processing capabilities depending on the specific embodiment of the invention.
Multimedia Ontology (60) module encodes the media properties of semantic concepts and resolves a semantic query to a search specification comprising media property descriptions as well as textual keywords. The media property description can either be simple, e.g. the color property of an object, or can be more complex with spatial and temporal restrictions, such as two persons entering a scene within some finite bound of temporal difference to signify them "entering together". The ontology can be encoded in different ways, for example, using Topic Maps, Resource Description Framework (RDF), Web Ontology Language (OWL), Multimedia Web Ontology Language (MOWL), or any other representation. The semantic query so expanded is actionable by the search module and is used to search the multimedia repository.
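A minimal sketch of this query resolution step follows, assuming a toy in-memory dictionary in place of a Topic Maps/RDF/OWL/MOWL encoding; the concept entry and descriptor fields are illustrative.

```python
# Toy ontology: each concept carries related keywords and media-property
# descriptors, including a spatio-temporal restriction for the
# "entering together" example from the text.
ONTOLOGY = {
    "entering together": {
        "keywords": ["enter", "arrival", "together"],
        "media_properties": [
            {"event": "person_enters", "count": 2, "max_time_gap_s": 2.0}
        ],
    },
}

def expand_query(concept):
    """Resolve a semantic concept to keywords plus media-property descriptors."""
    entry = ONTOLOGY.get(concept.lower())
    if entry is None:            # unknown concept: fall back to plain text search
        return [concept], []
    return entry["keywords"], entry["media_properties"]

keywords, media_props = expand_query("entering together")
```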
Integration (70) module encodes the algorithm for creating a coherent and interactive hypermedia presentation by linking the retrieved document segments and other document segments that are semantically similar to these document segments. The semantic similarity can be computed in different ways, such as based on similarity of concepts or events presented in the media segments, use of similar keywords in annotations, similarity of the objects contained, similarity of place or time of occurrence of the events, means of creation of the media (e.g. through a specific model of a camera), etc., or a combination thereof, based on specific embodiment of the invention. The hypermedia presentation can either be a linear temporal concatenation of retrieved media elements (e.g. for video segments), or a spatial organization on the screen (e.g. for still images) depending on the specific embodiment of this invention and the media type(s) retrieved. For temporal presentations, a "table of contents" can be automatically created with thumbnail images and/or text representing the constituent components to provide a quick overview and to provide an interactive means to reach a specific media segment faster in a non-linear manner. The techniques of integration can be creation of a SMIL script or an interactive Flash presentation based on the specific embodiment of this invention.
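For the SMIL realization named above, a minimal sketch follows; it emits a linear `<seq>` of video clips using standard SMIL `clipBegin`/`clipEnd` attributes, with placeholder URIs. Table-of-contents generation and Flash output are omitted.

```python
def make_smil(clips):
    """Build a linear SMIL presentation from (uri, begin_s, end_s) tuples."""
    body = "\n".join(
        f'      <video src="{uri}" clipBegin="{b}s" clipEnd="{e}s"/>'
        for uri, b, e in clips
    )
    return ('<smil xmlns="http://www.w3.org/ns/SMIL">\n'
            "  <body>\n    <seq>\n" + body + "\n    </seq>\n  </body>\n</smil>")

print(make_smil([("http://example.org/a.mp4", 0, 30),
                 ("http://example.org/b.mp4", 12, 45)]))
```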
User interface (80) module provides for human-machine interaction. It enables a user to submit different forms of queries, using semantic concept or event descriptions, textual keywords, media feature specifications, or a combination thereof. It also enables the user to browse through the hypermedia presentation created by the system in a non-linear interactive manner.
Figure 2 shows a block diagram of the creation of a coherent multimedia presentation (220) according to one embodiment of the invention, wherein the coherent multimedia presentation is created by integrating the retrieved media elements (205) using a presentation schema. A user can define a new schema or may use one or more schemas from a schema library (210). The integration module (215) populates the place-holders in the selected schema with the retrieved media elements. Different policies for fitment of the media elements in the schema, e.g. most-relevant element first or most recent element first, can be chosen depending on the specific embodiment of the invention.
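A minimal sketch of place-holder population under the two fitment policies named above; the schema (a list of named place-holders) and the element records are illustrative.

```python
def populate_schema(placeholders, elements, policy="most_relevant_first"):
    """Fill schema place-holders with retrieved elements under a fitment policy."""
    if policy == "most_relevant_first":
        ordered = sorted(elements, key=lambda e: e["score"], reverse=True)
    elif policy == "most_recent_first":
        ordered = sorted(elements, key=lambda e: e["timestamp"], reverse=True)
    else:
        ordered = list(elements)
    # Pair each place-holder with the next element; surplus elements are dropped.
    return dict(zip(placeholders, ordered))

layout = populate_schema(
    ["hero", "side1"],
    [{"id": "img7", "score": 0.9, "timestamp": 1},
     {"id": "img2", "score": 0.7, "timestamp": 5}],
)
```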
According to one embodiment of the invention, a human-computer collaboration environment is adapted to provide for semantic annotation to the segments created by computers, synergizing human efforts in annotating the multimedia contents. In spite of the advances of computer vision technology, it is not possible to generate complete semantic annotations automatically for unrestricted types of videos. Therefore, the inventors of the present invention have made a provision for manual content annotation in this human-in-the-loop system. In this phase, the video is described at different levels, which helps in interacting with it.
According to another embodiment of the invention, an input query from the user can be taken in different forms: text, multimedia data, or a combination thereof.
The search engine creates a ranked list of multimedia artifacts with respect to a query. A multimedia artifact consists of several media elements, each of which contains several media objects. The search engine ranks a multimedia artifact in three steps. As a first step, the search engine computes the similarity of the query objects with the objects contained in the media elements. In the second step, the search engine evaluates the similarity of the media elements with the query as a function of object similarities and their spatio-temporal relations. Finally, the similarity of a multimedia artifact is computed as a function of the similarity scores of the constituent media elements. The multimedia artifacts are ranked on the basis of this final similarity score.
In the textual query case, the user types some text in a query box provided with the system. The query can either be one or more keywords, a phrase, or a natural language sentence. This input query is used to find its relevance with the semantic annotations of the media objects, which are stored in the MPEG-7 files. In finding the relevance of the object descriptors with respect to the input query, only the keywords, and not stop words, are taken into consideration. Stop words are words in the text that are of very little significance in the context of the text. In general, stop words include articles, prepositions and pronouns, e.g. a, an, the, and, etc. Further, the list of stop words can be adapted to suit the specific requirements of an embodiment of the invention.
In this search based on a textual query, the query is first preprocessed for stop-word removal. Then the word-roots of the keywords are derived by using Porter's stemming algorithm. Stemming is appropriate for search because, mostly, the morphological variants of words have similar semantic interpretations. The word-roots are then used for searching the object descriptors in the MPEG-7 files. The relevance is calculated by using the classical probabilistic model. The inventors have explored the possibilities of using various retrieval models, namely the Boolean model and the vector space model. The Boolean model gives a binary score, which is not desirable as the retrieval is based on a binary decision criterion with no notion of partial matching. The vector space model assumes the independence of index terms, which is not desirable. The probabilistic model gives the probability that a document/textual description is relevant to the user query and ranks the documents according to their relevance. Thus the probabilistic model is suited for this framework. The Binary Independence Retrieval (BIR) model estimates the probability that a specific document dm will be judged relevant with respect to a specific query qk. The probabilistic approach is used because the user need cannot be expressed precisely.
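The pipeline just described (stop-word removal, Porter stemming, BIR-style scoring) can be sketched as follows; the stop list is illustrative, NLTK's `PorterStemmer` stands in for any Porter implementation, and the idf-style term weight shown (the Robertson-Sparck Jones weight with no relevance feedback) is one common instantiation of the BIR model, not necessarily the exact one used.

```python
import math
from nltk.stem import PorterStemmer   # assumed available

STOP_WORDS = {"a", "an", "the", "and", "of", "in", "to", "is"}  # illustrative
stemmer = PorterStemmer()

def preprocess(text):
    """Drop stop words, then reduce the remaining keywords to word-roots."""
    return {stemmer.stem(w) for w in text.lower().split() if w not in STOP_WORDS}

def bir_score(query, doc_roots, all_docs):
    """BIR-style relevance of one document (a set of word-roots) to the
    query: sum of idf-like weights over matched query roots."""
    n = len(all_docs)
    score = 0.0
    for t in preprocess(query):
        if t in doc_roots:
            df = sum(1 for d in all_docs if t in d)
            score += math.log((n - df + 0.5) / (df + 0.5))
    return score
```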
Moreover, the textual query is semantically interpreted by a multimedia ontology to generate a set of other conceptual or media based descriptors, which are used for text-based and content based retrieval. These conceptual descriptors are used in conjunction with user specified text based descriptors and the media descriptors are used in conjunction with any additional user specified media in the query.
Media feature based queries are used when the user wants to find multimedia data that contain objects similar to the objects present in the media instance supplied as the query, bearing similar spatio-temporal relations as in the query media instance. The user can submit a complete multimedia artifact, in which case all objects in the artifact become objects for search. Alternatively, a user can mark specific data objects in a multimedia data stream for searching. Different user interfaces and methods of marking the objects in the multimedia data stream can be deployed depending on the specific embodiment of the invention. Media feature based search is also used for searching with the media descriptors that are generated by the multimedia ontology as a result of semantic query interpretation. The search engine extracts various media features from the query objects, such as color, texture and shape primitives, and uses them for content based search. Multiple visual features are generally preferred over a single feature, since no single media feature can characterize and differentiate real-world objects. A weighted sum of scores for individual audio-visual features is used as the overall visual similarity as follows:
Svisual = k1*Scolor + k2*Stexture + k3*Ssift + ...

where Svisual is the visual score and k1, k2, k3, ... are the weights assigned to the features, with k1 + k2 + k3 + ... = 1.
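A direct transcription of this weighted combination, with illustrative feature scores and weights:

```python
def visual_score(feature_scores, weights):
    """Svisual = k1*Scolor + k2*Stexture + k3*Ssift + ...; weights sum to 1."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(weights[f] * feature_scores[f] for f in weights)

s_visual = visual_score({"color": 0.8, "texture": 0.6, "sift": 0.4},
                        {"color": 0.5, "texture": 0.3, "sift": 0.2})
```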
In general, both textual and media features are used while searching for desired multimedia objects, and the weighted score for a media object (Sobj) is calculated by combining the textual and media features:

Sobj = k*Stextual + (1-k)*Svisual ... (1)

where Stextual is the probabilistic textual score, Svisual is the visual score, and k and (1-k) are the weights given to the textual and visual features respectively.
The scores for the contained media objects are combined to compute the overall score of a media element. The spatio-temporal relations between the objects are also considered during the process. The spatio-temporal relations are determined in a three-dimensional space (x, y and t) and fuzzy algebra is deployed. Thus, the element score (Selement) is obtained as

Selement = F(S, R)

where S represents the set of object scores, R represents the set of spatio-temporal relations between the objects and F represents a fuzzy function.

Finally, the score for the multimedia artifact is computed as the maximum of the scores of its constituent elements:

Sartifact = max(Selement) ... (2)
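Putting equations (1) and (2) together with the fuzzy element score, a minimal sketch of the three-step ranking follows; the choice of F (averaged object scores bounded by the weakest relation degree, a min t-norm) is one possible fuzzy combination, not one mandated by the specification.

```python
def object_score(s_textual, s_visual, k=0.5):
    """Eq. (1): weighted combination of textual and visual scores."""
    return k * s_textual + (1 - k) * s_visual

def element_score(object_scores, relation_scores):
    """Selement = F(S, R): here F averages the object scores and lets the
    weakest spatio-temporal relation degree bound the match (min t-norm)."""
    if not object_scores:
        return 0.0
    s = sum(object_scores) / len(object_scores)
    r = min(relation_scores) if relation_scores else 1.0
    return min(s, r)

def artifact_score(element_scores):
    """Eq. (2): an artifact scores as its best-matching element."""
    return max(element_scores) if element_scores else 0.0

# Two elements, each with matched objects and one relation degree.
e1 = element_score([object_score(0.8, 0.6), object_score(0.5, 0.9)], [0.7])
e2 = element_score([object_score(0.4, 0.3)], [0.9])
print(artifact_score([e1, e2]))
```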
BEST MODE / EXAMPLE OF WORKING OF THE INVENTION:
The invention is described in the example given below which is provided only to illustrate the invention and therefore should not be construed to limit the scope of the invention.
FIG 3 provides a block diagram for the example. The Ingestion Interface (310) allows a user to ingest a media artifact, typically a video or a still image, into a library (320). The library hosts multimedia artifacts in a specific domain, e.g. the scientific world of Space Exploration. The user can annotate the various media objects in the media artifact during ingestion. The media artifact is also decomposed into several media elements, and media-feature based descriptors, typically color histograms and texture patterns, are computed. The semantic and media descriptors are stored as MPEG-7 files together with the media artifacts in the library. Index files for the keywords and the media feature descriptors are created for faster search.
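As an illustration of the color-histogram descriptor mentioned above, a minimal sketch follows; Pillow is assumed for image loading, and the 4-bins-per-channel quantization is an arbitrary choice.

```python
from PIL import Image   # Pillow is assumed for image loading

def color_histogram(path, bins_per_channel=4):
    """Quantized, normalized RGB histogram, usable as a media descriptor."""
    img = Image.open(path).convert("RGB")
    step = 256 // bins_per_channel
    hist = [0] * bins_per_channel ** 3
    pixels = list(img.getdata())
    for r, g, b in pixels:
        # Map each quantized (r, g, b) triple to one histogram bin.
        idx = ((r // step) * bins_per_channel + (g // step)) * bins_per_channel + (b // step)
        hist[idx] += 1
    return [h / len(pixels) for h in hist]
```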
The Search Interface (330) allows a user to submit a query by typing some text in a text-box, or by uploading an example image, or both. The textual part of the query is interpreted by the ontology (340) to generate related conceptual descriptors (keywords) and more example images. The ontology (340) encodes knowledge about Space Exploration using a Topic Maps schema, which allows establishing relations between the different concepts in the domain and associating them with example images. The keywords supplied by the user as well as the additional keywords generated by the ontology are used by the Search Engine (350) to search the semantic descriptors in the MPEG-7 files in the library using the index files. In addition, the Search Engine (350) extracts media features of the user supplied and ontology generated images and uses these features to search the media descriptors in the MPEG-7 files in the library using the index files. Further, the Search Engine (350) integrates the match scores obtained from the text-based and media-based comparators and computes an overall score for the available media artifacts. The top results are then integrated using an integration schema to create an interactive multimedia presentation that plays on the Search Interface (330).
CLAIMS:
1. A system for non-linear access of multimedia data from a multimedia collection, the said system comprising of:
storage means having a database to store multimedia data, metadata, semantic descriptors and media feature descriptors;
user interface means:
a. to enter multimedia data and metadata in the storage means;
b. to provide for semantic annotation to the segments of stored multimedia data created by computers, by way of user annotation;
user-input interface means to present a query in the form of textual keywords or descriptors to the system by the user;
a processing module means having program instructions stored in memory of the system that are configured to cause the processor, according to the query entered by the user, to execute the following steps:
a. segmenting the stored multimedia data into physical and logical segments and creating the metadata for each such segment;
b. indexing media objects and multimedia segments with textual keywords or media feature descriptors;
c. retrieving of media objects and multimedia segments with textual keywords or media feature descriptors;
d. resolving a semantic description to a textual and media-feature-based description for semantic retrieval of multimedia content;
e. integration of retrieved multimedia document segments of one or more format types into a single coherent presentation document; and
user-output interface means to present the user with the integrated single coherent presentation document.
2. A system as claimed in claim 1, wherein the said processing module employs fuzzy logic to capture at least one temporal segment of at least one multimedia data item, wherein each temporal segment is presented for human annotation.
3. A system as claimed in Claim 1, wherein the single coherent presentation document is created by integrating the retrieved multimedia segments using at least one schema selected from the group of schemas stored in a schema library.
4. A system as claimed in Claim 3, wherein the single coherent presentation document is created
by integrating the retrieved multimedia segments using at least one schema selected by the
user.
5. A system as claimed in claim 1, wherein each multimedia document segment of a multimedia document in the coherent presentation document is semantically linked to form a complete presentation which can be interactively and non-linearly navigated.
6. A system as claimed in Claim 1, wherein the coherent and interactive multimedia presentation format is selected from at least one of SMIL and Flash.
7. A system as claimed in Claim 1, wherein at the user interface means the computer means are used to determine the temporal segments in the multimedia data and the user annotates the multimedia data.
8. A method for non-linear access of multimedia data from a multimedia collection, wherein the said method comprises the steps of:
storing multimedia data and metadata in a database;
providing a user interface to enter multimedia data and metadata in the said storage and for semantic annotation to the segments of stored multimedia data created by computers, by way of user annotation;
providing a user-input interface to present a query in the form of textual keywords or descriptors to the system by the user;
providing a processing module means having program instructions stored in memory of the system that are configured to cause the processor, according to the query entered by the user, to execute the following steps:
a. segmenting the stored multimedia data into physical and logical segments and creating the metadata for each such segment;
b. retrieving of the multimedia segments with textual keywords or descriptors;
c. resolving a semantic description to a textual and media-feature-based description for semantic retrieval of multimedia content;
d. integration of retrieved multimedia document segments of one or more format types into a single coherent presentation document; and
providing a user-output interface to present the user with the integrated single coherent presentation document.
9. A method as claimed in claim 8, wherein the said processing module employs fuzzy logic to capture at least one temporal segment of at least one multimedia data item, wherein each temporal segment is presented for human annotation.
10. A method as claimed in claim 8, wherein the single coherent presentation document is created by integrating the retrieved multimedia segments using at least one schema selected from the group of schemas stored in a schema library.
11. A method as claimed in Claim 8, wherein the single coherent presentation document is created by integrating the retrieved multimedia segments using at least one schema selected by the user.
12. A method as claimed in claim 8, wherein each multimedia document segment of a multimedia document in the coherent presentation document is semantically linked to form a complete presentation which can be interactively and non-linearly navigated.
13. A method as claimed in Claim 8, wherein the coherent and interactive multimedia presentation format is selected from at least one of SMIL and Flash.
14. A method as claimed in Claim 8, wherein at the user interface means the computer means are used to determine the temporal segments in the multimedia data and the user annotates the multimedia data.
15. A system and method substantially as herein described with reference to and as illustrated by the accompanying drawings.