Abstract: A system and method for embedding a video summarization in the user data of a source stream is disclosed. Based on user inputs such as "entry and exit" points and "rankings" of specific genre video sequences, the Video (V)/Audio (A) summary embedding system summarizes the source content (video) using standard summarization algorithms. Further, audio content is summarized based on predefined parameters such as the frequencies present, the histogram of the audio spectrum and amplitude, male or female voices, background noise and so on. The V/A summary embedding system then embeds the summarized video and audio content into the source stream using the "Group Of Pictures (GOP) user data start code". Furthermore, when a user queries the V/A summary embedding system for particular genre content, appropriate content is identified by considering the user's threshold time, and the summary results are displayed to the user. FIG. 1
FIELD OF INVENTION
[001] The embodiments herein relate to video summarization and, more particularly, to embedding a video summary in the user data of a source stream.
BACKGROUND
[002] In recent years, as the number of transmitted channels and video recordings increases, it becomes increasingly difficult for the end-user to keep track of which programs have been watched and which have not. 'Movie summarization' provides a convenient way for the end-user to determine whether a movie is worth watching, or whether he/she has already watched it.
[003] In most cases, video (for example 'movie') content is of mixed genres, usually at most three or four different genres and rarely five or more. Hence, the user will very easily recall his most liked genre from the different genre-based scenes of the same movie. For example, in a thriller movie, if the user's recall profile is romance, then the romance/intimate scenes will be very easily recallable by the user. There is a high probability that such scenes will be ignored by current movie summarization algorithms, since such algorithms work on relative video attributes such as brightness, contrast and so on. Thus, it is currently impossible for movie summarization algorithms to cater to the multitude of users. Further, there is no guarantee that these algorithms will work for all genres of video broadcasts, such as news, sports and the various genres of movies.
[004] Furthermore, it is not possible to easily extend the present summarization algorithms to various genres of movies (suspense, action, romance, thriller, horror, drama) where it is crucial that the movie summarization not reveal the secretive plot. Hence, some form of manual intervention is always required to verify the correctness of the summarized video output and to edit it if required. Moreover, the time taken is excessive, especially when partial video decoding, analysis and comparison with other video frames are required. Further, the current technology of video summarization is largely heuristic in nature, with a variety of thresholds and analysis techniques used to extract summary information. The same is true for audio content, where much content is untagged, and searching through a bank of audio content is a time-consuming and expensive process.
[005] What is therefore required is a system and method which proposes standard video summarization information which is embedded into the source stream and uses corresponding audio tracks to make the decision on genre more accurately.
SUMMARY
[006] In view of the foregoing, an embodiment herein provides a method for source content based video summarization. The method comprises fetching an input source stream, creating a video summary corresponding to the fetched input source stream, and embedding the video summary to the source stream.
[007] Embodiments further disclose a system for source content based video summarization using a Video/Audio summary embedding system. The Video/Audio summary embedding system is further configured for fetching an input source stream using an Input/Output (I/O) module, creating a video summary corresponding to the fetched input source stream using a summary creation and genre content identification module, and embedding the video summary to the source stream using an embedding module.
[008] These and other aspects of the embodiments herein will be better appreciated and understood when considered in conjunction with the following description and the accompanying drawings.
BRIEF DESCRIPTION OF THE FIGURES
[009] The embodiments herein will be better understood from the following detailed description with reference to the drawings, in which:
[0010] FIG. 1 illustrates a general block diagram of the Video (V)/Audio (A) summary embedding system environment, as disclosed in the embodiments herein;
[0011] FIG. 2 illustrates a block diagram that shows various components of V/A summary embedding system, as disclosed in the embodiments herein;
[0012] FIG. 3 is a flow diagram which shows various steps involved in the process of source video based video summarization, as disclosed in the embodiments herein; and
[0013] FIG. 4 is a flow diagram which shows various steps involved in the process of embedding V/A summary to source stream, as disclosed in the embodiments herein.
DETAILED DESCRIPTION OF INVENTION
[0014] The embodiments herein and the various features and advantageous details thereof are explained more fully with reference to the non-limiting embodiments that are illustrated in the accompanying drawings and detailed in the following description. Descriptions of well-known components and processing techniques are omitted so as to not unnecessarily obscure the embodiments herein. The examples used herein are intended merely to facilitate an understanding of ways in which the embodiments herein may be practiced and to further enable those of skill in the art to practice the embodiments herein. Accordingly, the examples should not be construed as limiting the scope of the embodiments herein.
[0015] The embodiments herein disclose a system and method to enable embedding video/audio summarization information into the source stream. Referring now to the drawings, and more particularly to FIGS. 1 through 4, where similar reference characters denote corresponding features consistently throughout the figures, there are shown embodiments.
[0016] FIG. 1 illustrates a general block diagram of the Video (V)/Audio (A) summary embedding system environment, as disclosed in the embodiments herein. The environment consists of a V/A summary embedding system 101 which has suitable interfaces to fetch the source stream and the required input parameters. Based on the inputs fetched, the V/A summary embedding system 101 summarizes the source content and embeds the summary in the user data fields of the given source stream, which may be video or audio content.
[0017] FIG. 2 illustrates a block diagram that shows various components of the V/A summary embedding system, as disclosed in the embodiments herein. The system comprises an Input (I)/Output (O) module 101.a, an embedding module 101.b, a summary creation and genre content identification module 101.c and a memory module 101.d. The I/O module 101.a fetches the required input parameters, such as the 'entry and exit' points of a specific genre video sequence, 'rankings' for each available genre and a 'threshold time', from a user by providing suitable interfaces. Further, when the user queries for particular genre content, the I/O module 101.a fetches the appropriate content from the memory module 101.d and displays the resulting outputs to the user through any suitable interface, such as a touch screen, display screen and so on.
[0018] Further, the summary creation and genre content identification module 101.c fetches the configured input parameters from the I/O module 101.a. Based on the fetched input parameters, the summary creation and genre content identification module 101.c creates the corresponding V/A summary of the given V/A source stream by using any suitable standard video summarization technique. The created V/A summary is then embedded into the user data fields of the given source stream. Later, when a user queries the V/A summary embedding system 101 for particular genre content, the summary creation and genre content identification module 101.c identifies the appropriate genre content based on user requests, such as the genre type, and finally displays the identified summary results to the user through the I/O module 101.a.
[0019] FIG. 3 is a flow diagram which shows various steps involved in the process of source video based video summarization, as disclosed in the embodiments herein. In order to embed the V/A summary of particular V/A content into its source stream, the user initially has to configure the source stream and the corresponding input parameters, such as 'entry and exit' points of a specific video sequence, 'rankings' for each available genre, a 'threshold time' specifying the summary length and so on, through the I/O module 101.a. The configured input parameters specify the user's interest in the source stream. Further, the 'various genre entry points' act as a gateway to insert the 'genre' information into the multimedia stream. Generally, these 'genre entry points' refer to the user data hooks that are used to store the 'genre category'. For example, 'genre' may refer to any specific category such as comedy, suspense, action, romance, thriller and so on. For the convenience of the user, the genre categories within the video are limited to the sub-genres present in the video. Furthermore, when a user queries for particular genre content, the selected genre content will be assembled and played back.
[0020] The inputs 'entry and exit' points define the starting and ending points of the video clip that is to be summarized within the entire source stream. These end points (entry and exit points) are marked by data points in the user data fields of each video frame. Basically, the end points are indicated by a flag which indicates whether the current set of video frames (normally referred to as a Group Of Pictures (GOP)) should be included as part of the video summary or not. This provides an efficient mechanism to parse the video to extract the summary information. Thus, the V/A summary embedding system 101 can offer a mechanism, via a user interface, to accept input from the user to mark a set of video frames as being part of the summary. 'Rankings' are defined within the video content for each available genre. These rankings are intended to rank the chosen video frames based on their relevance: a higher ranking indicates a higher relevance of the scene as a representative of the movie. Further, the same set of video frames cannot have two rankings at the same time. The input 'threshold time' defines the maximum time a user is interested in watching particular genre content, i.e., the length of the summary. Further, the genre contents are selected based on the rankings assigned to that particular genre content. Based on the threshold time given by the user, contents are then selected one by one, from the highest ranking downwards, as long as the sum of the display times of the selected contents remains less than or equal to the total threshold time. For example, one broadcaster may want the summary to be 60 seconds long; another might want it to be 120 seconds long, and so on. For the 60-second requirement, the top N rankings are chosen such that the length of the movie summary is of the order of 60 seconds.
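The rank-then-threshold selection described above can be sketched as follows. The segment representation, field names and the greedy strategy are illustrative assumptions for this sketch, not part of the disclosed stream format.

```python
# Sketch of the ranking-based selection: segments marked for the
# requested genre are taken in decreasing rank order, skipping any
# segment that would push the total past the user's threshold time.

def select_summary_segments(segments, genre, threshold_seconds):
    """segments: list of dicts with 'genre', 'rank' (1-5, higher is more
    relevant) and 'duration' in seconds. Returns the chosen segments."""
    candidates = [s for s in segments if s["genre"] == genre]
    # Higher rank first; a given set of frames has exactly one rank.
    candidates.sort(key=lambda s: s["rank"], reverse=True)
    chosen, total = [], 0.0
    for seg in candidates:
        if total + seg["duration"] <= threshold_seconds:
            chosen.append(seg)
            total += seg["duration"]
    return chosen

segments = [
    {"genre": "action", "rank": 5, "duration": 30},
    {"genre": "action", "rank": 4, "duration": 40},
    {"genre": "romance", "rank": 5, "duration": 20},
    {"genre": "action", "rank": 3, "duration": 25},
]
picked = select_summary_segments(segments, "action", 60)
```

With a 60-second threshold, the rank-5 action segment (30 s) is taken first; the rank-4 segment would exceed the threshold and is skipped, so the rank-3 segment (25 s) completes the 55-second summary.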
[0021] Further, the summary creation and genre content identification module 101.c fetches (302) these input parameters from the I/O module 101.a and creates (304) a V/A summary of the given source stream by using suitable standard video summarization techniques stored in the memory module 101.d. In an embodiment, the video frames can also be chosen and summarized by using an automated summarization algorithm. Each video frame or set of video frames (GOP) of any video stream is capable of storing some user data in its user data fields. Further, the embedding module 101.b embeds (306) the encoded video and audio summary in this user data field of the corresponding source stream. Furthermore, the created summary can be changed (video frames added or deleted) based on the requirements of the user.
[0022] In the case of the audio summary, certain kinds of sounds are marked as romance or action or sports or voice based on the frequencies present, the histogram of the audio spectrum and amplitude, male or female voices, background noise and so on. Depending on these parameters, inferences are drawn, or tagging is done at the source stream, marking various genres for the corresponding video content. The audio summary thus formed serves as an additional input for the decision-making process for the genre. When an audio summary is present along with the video summary, it helps the user as follows. For example, consider that the user searches for 'action' content. Let VS1 and AS1 be the video and audio summaries respectively for content 1. Similarly, VS2 and AS2 are the video and audio summaries respectively for content 2. VS1 is tagged with genre 'action', AS1 with genre 'romance', VS2 is tagged with genre 'action' and AS2 with genre 'action'. When the user queries for 'action' content, content 2 will be shown first and then content 1, in decreasing order of search relevance, since for content 2 both video and audio are tagged as 'action'. Thus, the audio summary can also be rendered in addition to the video summary, which helps the user make better choices about the specific genre content that is queried. In an embodiment, the V/A summary embedding system 101 can summarize audio files alone, and 'audio ancillary data' can be used to store the genre information of the corresponding audio files.
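The combined video-and-audio ranking described above can be sketched as follows; the tuple representation and scoring rule are illustrative assumptions, since the disclosure only specifies that a double tag match ranks ahead of a single one.

```python
# Sketch of the combined ranking: content whose video AND audio
# summaries are tagged with the queried genre ranks ahead of content
# where only one summary matches.

def rank_by_genre(contents, query):
    """contents: list of (name, video_tag, audio_tag) tuples. Returns
    names in decreasing order of how many summary tags match the query,
    dropping content with no matching tag at all."""
    scored = [(name, (v == query) + (a == query)) for name, v, a in contents]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [name for name, score in scored if score > 0]

results = rank_by_genre(
    [("content 1", "action", "romance"),
     ("content 2", "action", "action")],
    "action",
)
```

This reproduces the example in the text: content 2 (both tags 'action') is listed before content 1 (video tag only).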
[0023] Furthermore, the V/A summary embedding system 101 stores the created V/A summary in the user data fields of source stream in a standard format so that while porting the source stream from one device to another, the V/A summary will also be ported automatically along with the source stream. When a user wants to watch a video of specific genre such as comedy or action or romance, the user can query the V/A summary embedding system 101 via the I/O module 101.a by entering the specific genre type such as comedy or action or romance.
[0024] Based on the user query, the summary creation and genre content identification module 101.c identifies (308) the corresponding genre type video contents from the list of summarized videos present at the user's end by using any suitable video genre identification method. The associated audio files will automatically be played back along with the identified videos. Finally, the summarized results are displayed (310) in a small window, in which the user can make further modifications if required. In an embodiment, the user can create or edit any existing V/A summary while watching a video stream, as the system offers a mechanism via the user interface to accept input from the user to mark a set of video frames as being part of the summary. Further, at any point in time, the user can choose to play back the V/A summary to verify its accuracy. Accuracy is determined by deciding whether the chosen video frames are representative of the entire video sequence.
[0025] For example, consider a scenario where the user owns a large amount of content split across multiple devices, both at home and possibly on the cloud. The user queries the system for 'action' movies. Let VS1 and AS1 be the video and audio summaries respectively for content 1, and VS2 and AS2 the video and audio summaries respectively for content 2. VS1 is tagged with genre 'action', AS1 with genre 'romance', VS2 is tagged with genre 'action' and AS2 with genre 'action'. When the user queries for 'action' content, content 2 will be shown first and then content 1, in decreasing order of search relevance, since for content 2 both video and audio are tagged as 'action'. Now, the user may decide that content 1 is not really 'action' but, in reality, 'romance'. He/she can then choose to modify the tag for content 1 (VS1) and re-tag it as 'romance'. If this change is synchronized with content 1, future search results will correctly show content 1 as 'romance'.
[0026] The various actions in method 300 may be performed in the order presented, in a different order or simultaneously. Further, in some embodiments, some actions listed in FIG. 3 may be omitted.
[0027] FIG. 4 is a flow diagram which shows various steps involved in the process of embedding the V/A summary into the source stream, as disclosed in the embodiments herein. Initially, the source stream is parsed (402) to extract the 'GOP user data start code' present in the corresponding source stream. A 'GOP user data start code' is a code defined in the Moving Picture Experts Group (MPEG) standard; it indicates that the data following the start code is the user data for that particular GOP. Further, the embedding module 101.b checks (404) whether the 'GOP user data start code' is available or not. If the user data start code is present, the details of the V/A summary information are added (406) to the source stream directly by using suitable data structures. Otherwise, if the 'GOP user data start code' is not available in the source stream, the required start code is added (408) first and then the video/audio summarization related data structures are added (406) to the source stream. Thus, summarizing the video/audio is easy and consumes no extra bandwidth apart from the normal video decoding bandwidth. The various actions in method 400 may be performed in the order presented, in a different order or simultaneously. Further, in some embodiments, some actions listed in FIG. 4 may be omitted.
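The embedding flow of FIG. 4 can be sketched as follows for an MPEG-2 video elementary stream, where the user_data start code is 0x000001B2 and the GOP (group) start code is 0x000001B8. This naive byte scan is for illustration only; a real implementation needs full header-aware parsing and escape handling, and the assumption of a 4-byte GOP header after the start code is a simplification.

```python
# Steps 402-408 sketched: parse for the GOP start code, check whether a
# user_data start code follows, insert one if absent (408), then append
# the summary payload (406).
GOP_START_CODE = b"\x00\x00\x01\xb8"        # MPEG-2 group_start_code
USER_DATA_START_CODE = b"\x00\x00\x01\xb2"  # MPEG-2 user_data_start_code

def embed_after_gop(stream: bytes, summary_payload: bytes) -> bytes:
    out = bytearray()
    i = 0
    while True:
        j = stream.find(GOP_START_CODE, i)          # step 402: parse
        if j < 0:
            out += stream[i:]
            return bytes(out)
        # Copy up to and including the GOP header (4-byte start code
        # plus, in this simplified sketch, a 4-byte GOP header).
        end = j + 4 + 4
        out += stream[i:end]
        if stream[end:end + 4] != USER_DATA_START_CODE:
            out += USER_DATA_START_CODE             # step 408: add code
        else:                                       # step 404: present
            out += stream[end:end + 4]
            end += 4
        out += summary_payload                      # step 406: add data
        i = end
```

A usage example: given a stream containing one GOP header followed by a picture start code, the function inserts the user_data start code and the summary payload directly after the GOP header.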
[0028] Depicted below is an example data structure which indicates the byte structure within the user data block of MPEG-2 video frames:
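The data-structure figure referenced here appears to have been lost in extraction. The following sketch reconstructs a plausible layout from the field descriptions in [0030] through [0035]; the exact field widths are assumptions, chosen so the total matches the approximately 105 bits (112 bits byte-aligned) stated in [0036], with the PTS link fields taken as 33 bits as in MPEG-2 systems.

```python
# Plausible per-GOP user-data layout (widths are assumptions).
GOP_USER_DATA_FIELDS = [
    ("MOV_SUMM",           2),  # 0/1: GOP used for movie summarization
    ("RANK",               3),  # 1-5: relevance ranking of this GOP
    ("GENRE",             32),  # broadcaster- or user-defined genre bits
    ("FWD_DIR_EXIT_COND",  1),  # 1 = jump to NXT_GOP_FWD_LNK
    ("BWD_DIR_EXIT_COND",  1),  # 1 = jump to NXT_GOP_BWD_LNK
    ("NXT_GOP_FWD_LNK",   33),  # PTS of next GOP, forward direction
    ("NXT_GOP_BWD_LNK",   33),  # PTS of next GOP, backward direction
]

def pack_fields(values):
    """Pack the fields MSB-first into bytes, padded to a byte boundary."""
    bits = total = 0
    for name, width in GOP_USER_DATA_FIELDS:
        v = values[name]
        assert 0 <= v < (1 << width), f"{name} out of range"
        bits = (bits << width) | v
        total += width
    pad = (-total) % 8
    return (bits << pad).to_bytes((total + pad) // 8, "big")
```

Packing any legal set of values yields 14 bytes (112 bits), consistent with the byte-boundary rounding described in [0036].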
[0029] The logical interpretation of each of the bytes is explained below:
[0030] MOV_SUMM: Values 0 or 1. Indicates whether this GOP will be used for movie summarization or not.
[0031] RANK: (Values range from 1 to 5.) Ranking of the current GOP, from 1 to 5, based on its relevance for the movie summarization. Between two GOPs A and B, if A is ranked 4 and B is ranked 5 and the length of the summarized movie is required to be 60 seconds, then B is chosen first; only if there are no other GOPs with ranking 5 and the total length of the summarized movie is still less than 60 seconds is A chosen.
[0032] GENRE: 32 bits defined by the broadcaster or an option chosen by the user.
[0033] FWD_DIR_EXIT_COND and BWD_DIR_EXIT_COND: (Values 0 or 1.) The value '1' indicates "Exit to the next GOP pointed to by NXT_GOP_FWD_LNK or NXT_GOP_BWD_LNK". The value '0' indicates "Continue to the consecutive frame".
[0034] NXT_GOP_FWD_LNK: Indicates the PTS (Presentation Time Stamp) of the next GOP to jump to in the forward direction.
[0035] NXT_GOP_BWD_LNK: Indicates the PTS (Presentation Time Stamp) of the next GOP to jump to in the backward direction.
[0036] Bit-rate considerations: The total number of bits per representative GOP is about 105 bits (112 bits after rounding to a byte boundary) of user data. A worst-case analysis of 3 GOPs per second, with each GOP carrying 112 bits of extra user data, yields an overhead of (3 × 112 × 100)/(6,000,000) = 0.0056% for a 6 Mbit/s video stream. Any video decoder can easily handle this extra payload.
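The overhead arithmetic above can be checked directly; the 6 Mbit/s bitrate is an assumption implied by the denominator in the text, not stated explicitly elsewhere in the disclosure.

```python
# Worst-case user-data overhead from [0036]: 3 GOPs per second, 112 bits
# of user data each, against an assumed 6 Mbit/s video stream.
gops_per_second = 3
user_data_bits = 112        # 105 bits rounded up to a byte boundary
video_bitrate = 6_000_000   # bits per second (assumed)

overhead_pct = gops_per_second * user_data_bits * 100 / video_bitrate
print(f"overhead: {overhead_pct:.4f}%")
```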
[0037] The embodiments disclosed herein can be implemented through at least one software program running on at least one hardware device and performing network management functions to control the network elements. The network elements shown in FIG. 1 and FIG. 2 include blocks which can be at least one of a hardware device, or a combination of a hardware device and a software module.
[0038] The embodiment disclosed herein specifies a system for source video based video summarization (ViSum). The mechanism allows embedding video/audio summarization information into the source stream itself, and provides a system thereof. Therefore, it is understood that the scope of the protection is extended to such a program and, in addition, to a computer readable means having a message therein; such computer readable storage means contain program code means for implementation of one or more steps of the method when the program runs on a server, a mobile device or any suitable programmable device. The method is implemented in a preferred embodiment through or together with a software program written in, e.g., Very high speed integrated circuit Hardware Description Language (VHDL) or another programming language, or implemented by one or more hardware modules or several software modules being executed on at least one hardware device. The hardware device can be any kind of device which can be programmed, including, e.g., any kind of computer such as a server or a personal computer, or the like, or any combination thereof, e.g. one processor and two FPGAs. The device may also include means which could be, e.g., hardware means such as an ASIC, or a combination of hardware and software means, e.g. an ASIC and an FPGA, or at least one microprocessor and at least one memory with software modules located therein. Thus, the means are at least one hardware means and/or at least one software means. The method embodiments described herein could be implemented in pure hardware, or partly in hardware and partly in software. The device may also include only software means. Alternatively, the invention may be implemented on different hardware devices, e.g. using a plurality of CPUs.
[0039] The foregoing description of the specific embodiments will so fully reveal the general nature of the embodiments herein that others can, by applying current knowledge, readily modify and/or adapt for various applications such specific embodiments without departing from the generic concept, and, therefore, such adaptations and modifications should and are intended to be comprehended within the meaning and range of equivalents of the disclosed embodiments. It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation. Therefore, while the embodiments herein have been described in terms of preferred embodiments, those skilled in the art will recognize that the embodiments herein can be practiced with modification within the spirit and scope of the claims as described herein.
CLAIMS
1. A method for source content based video summarization, said method comprises:
fetching an input source stream;
creating a video summary corresponding to said fetched input source stream; and
embedding said video summary to said source stream.
2. The method as in claim 1, wherein said video summary is created based on at least one of a ranking, threshold time and entry and exit points.
3. The method as in claim 2, wherein said at least one of ranking, threshold time and entry and exit points are preconfigured.
4. The method as in claim 1, wherein embedding said video summary to said input source stream further comprises:
parsing said input source stream;
identifying if a user data start code is associated with said parsed input source stream;
adding a user data start code to said input source stream if said user data start code is not associated with said input source stream; and
adding said video summary to said user data start code.
5. A system for source content based video summarization using a Video/Audio summary embedding system, said Video/Audio summary embedding system configured for:
fetching an input source stream using an Input/Output (I/O) module;
creating a video summary corresponding to said fetched input source stream using a summary creation and genre content identification module; and
embedding said video summary to said source stream using an embedding module.
6. The system as in claim 5, wherein said summary creation and genre content identification module is further configured to create said video summary based on at least one of a ranking, threshold time and entry and exit points.
7. The system as in claim 5, wherein said Video/Audio summary embedding system is further configured to provide option for preconfiguring at least one of a ranking, threshold time and entry and exit points using said I/O module.
8. The system as in claim 5, wherein said Video/Audio summary embedding system is further configured to embed said video summary to said input source stream using said embedding module by:
parsing said input source stream;
identifying if a user data start code is associated with said parsed input source stream;
adding a user data start code to said input source stream if said user data start code is not associated with said input source stream; and
adding said video summary to said user data start code.
| # | Name | Date |
|---|---|---|
| 1 | 3322-CHE-2013 POWER OF ATTORNEY 25-07-2013.pdf | 2013-07-25 |
| 2 | 3322-CHE-2013 FORM-1 25-07-2013.pdf | 2013-07-25 |
| 3 | 3322-CHE-2013 CORRESPONDENCE OTHERS 25-07-2013..pdf | 2013-07-25 |
| 4 | 3322-CHE-2013 FORM-2 25-07-2013..pdf | 2013-07-25 |
| 5 | 3322-CHE-2013 FORM-3 25-07-2013..pdf | 2013-07-25 |
| 6 | 3322-CHE-2013 FORM-5 25-07-2013..pdf | 2013-07-25 |
| 7 | 3322-CHE-2013 DESCRIPTION (COMPLETE) 25-07-2013..pdf | 2013-07-25 |
| 8 | 3322-CHE-2013 CLAIMS 25-07-2013..pdf | 2013-07-25 |
| 9 | 3322-CHE-2013 DRAWINGS 25-07-2013..pdf | 2013-07-25 |
| 10 | 3322-CHE-2013 ABSTRACT 25-07-2013..pdf | 2013-07-25 |
| 11 | 3322-CHE-2013 FORM-9 07-08-2013.pdf | 2013-08-07 |
| 12 | 3322-CHE-2013 FORM-18 07-08-2013.pdf | 2013-08-07 |
| 13 | abstract3322-CHE-2013.jpg | 2013-08-22 |
| 14 | 3322che2013searchstrategy_24-06-2019.pdf | |
| 15 | 3322-CHE-2013-FER.pdf | 2019-07-02 |
| 16 | 3322-CHE-2013-AbandonedLetter.pdf | 2020-01-06 |