Abstract: A method for shot boundary detection from compressed H.264 input video data using inter, intra and bi-directional prediction is disclosed. The method in accordance with this invention is based on three features which can be directly obtained from the compressed domain H.264 stream by partial decoding. Particularly, a method is envisaged where the entire video is split into several smaller logical units, called shots, using compressed domain features of H.264. The method is designed to detect the type of cut and also locate the position of the cut in real time.
FORM -2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENTS RULES, 2003
PROVISIONAL
Specification
(See Section 10 and rule 13)
METHOD AND APPARATUS FOR AUTOMATIC SHOT BOUNDARY DETECTION USING COMPRESSED DOMAIN INFORMATION
TATA CONSULTANCY SERVICES LTD.,
an Indian Company of Nirmal Building, 9th floor, Nariman Point, Mumbai 400 021,
Maharashtra. India
THE FOLLOWING SPECIFICATION DESCRIBES THE INVENTION
This invention relates to detection of cuts (shot boundaries) of video sequences.
In particular, this invention envisages a novel way of estimating the motion of frames of input videos using the concept of motion histogram.
In particular, this invention envisages a novel way of shot boundary detection (cut) of a video using compressed domain information.
In particular, this invention relates to automatic detection of cuts from an input video sequence.
In particular, this invention relates to a non-threshold based technique for detection of cuts in an input sequence.
Background of the Invention:
Browsing a digital video library can be very tedious especially with an ever-expanding collection of multimedia materials. Locating intended information effectively and efficiently presents a great challenge to the researchers in the field of video retrieval. The very first step in video retrieval is to identify the shot boundaries. Shot boundary detection on an embedded platform is an additional challenge because of the speed and memory constraints of the hardware.
With increased availability and an exponential growth in the number of image and video applications, the issue of analyzing, automatically indexing
and summarizing video content by retrieving highly representative information is of paramount importance and an active area of research.
A shot in a video sequence is a contiguous sequence of one or more video frames recorded by a single camera. The nature of transitions or boundaries between shots varies. Depending on this nature, shot boundaries may be classified as a cut, a fade, a wipe or a dissolve. A cut is an abrupt shot change that occurs in a single frame. Shot boundaries can also be categorized in two ways: a hard cut is a drastic change between two frames, whereas a soft cut includes fade-in, fade-out, wipe, dissolve, etc.
A fade is a slow change in brightness usually resulting in or starting with a black frame. A dissolve occurs when the frames of the first shot get dimmer and the frames of the second get brighter, with frames within the transition showing one image superimposed on the other. A wipe occurs when pixels from the second shot replace those of the first shot in a regular pattern such as in a line from the left edge of the frames. Of course, other types of gradual transitions are possible.
Video shot boundary detection techniques have been an active area of research in recent years and must contend with the difficulty of finding shot boundaries in the presence of camera motion, object motion and illumination variations.
Several approaches have been reported in the literature for automatic shot boundary detection. The major automatic shot boundary detection techniques can be classified into five categories: pixel based, statistics based, histogram based, feature based and transform based.
Of these techniques, one class exploits color intensity variations, another class exploits classification based on mathematical and statistical models; still other methods exploit edge differences and edge comparisons between successive frames. Most cut detection schemes are threshold based: they identify a cut if the frame difference between consecutive frames exceeds a certain threshold. Similarity between frames is measured by comparing features extracted from the frames. Various types of features have been reported in the literature. The features are based on:
Luminance/color - The average grayscale luminance is used as a feature but it is susceptible to illumination changes.
Color Histogram - Because of its insensitivity to camera motion, this technique is widely used.
Transform Coefficients- They can be used to describe texture which can be used as a feature.
Edges- Edge features have been used to detect cuts as they are invariant to illumination and motion.
The size of the region from which features are extracted plays a significant role in the performance of any shot detection algorithm. If the ROI is large, there is a tendency to miss transitions between similar shots. The spatial features used in the literature include the following:
Single pixel - Per-pixel values of a single frame, e.g. luminance or edge strength, are used directly.
Arbitrary shaped region - In order to extract discontinuous regions better, arbitrarily shaped regions have been made the ROI for each frame to exploit homogeneity.
Whole frame - Whole-frame features have the advantage of being resistant to motion, but the performance may be poor in detecting transitions between similar shots.
Different approaches have their particular idiosyncrasies. In the pixel based approach, a pixel wise difference is computed for two or more frames. The number of pixels (c) having a difference in intensity value greater than some threshold is computed, and a shot boundary is concluded to exist if c is greater than some second threshold. Zhang et al. use a 3x3 averaging filter to reduce noise and camera motion effects. Hampapur et al. computed chromatic images by dividing the change in gray level of each pixel between frames by the gray level of that pixel in the second frame. Their method can detect gradual transitions like wipes, dissolves and fades. However, pixel based approaches are highly sensitive to camera motion and noise. Moreover, the automatic threshold selection method is slow and manually adjusting the threshold is unlikely to be practical.
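The pixel based approach described above can be sketched as follows. This is a minimal illustration, not part of the disclosed invention; the two threshold values are hypothetical tuning parameters, and numpy is assumed for array arithmetic.

```python
import numpy as np

def is_cut(frame_a, frame_b, pixel_thresh=20, count_thresh=0.5):
    """Declare a shot boundary if the fraction of pixels whose
    intensity changes by more than pixel_thresh exceeds count_thresh.
    Both thresholds are illustrative values, not from the source."""
    diff = np.abs(frame_a.astype(int) - frame_b.astype(int))
    changed = np.count_nonzero(diff > pixel_thresh)
    return changed / diff.size > count_thresh
```

As the text notes, such per-pixel counting reacts to camera motion and noise just as strongly as to true cuts, which motivates the region, histogram and compressed domain alternatives discussed next.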
In the statistics based approach, the image is split into some regions and some statistical information like mean and standard deviation is obtained for each such region. This approach can overcome the problem of noise sensitivity, but is slow due to the complexity of the statistical formulas and generates many false alarms.
In the histogram based approach, some statistics on color and gray level histograms are used to detect the shot boundary. This method gives a good tradeoff between accuracy and speed.
Zabih et al. proposed a feature based approach in which the shot boundary is detected by detecting sudden changes in edges. Gradual transitions can be identified by looking at the values of the entering and exiting edge percentages.
Some recent works use multiple cues and a classifier such as an SVM to detect shot boundaries.
From a survey of the state of the art it is found that region based comparisons, motion vectors and running differences produce good results. But the problem with these methods is the complexity of the algorithms. Additionally, all the above mentioned approaches are slow and thus not capable of meeting real time criteria. To achieve a fast algorithm, Arman et al. proposed a DCT based approach. Meng et al. used the motion vectors of an MPEG stream. Naphade et al. proposed a clustering and post filtering based approach which gives fairly high accuracy without producing many false positives. But this method uses complex algorithms like clustering and thus reduces the performance in terms of speed.
The present invention uses a compressed domain approach.
The main advantage of the compressed domain approach is that some features computed during encoding can be readily reused, thus saving time.
Some methods for detecting special effects like wipe and dissolve from an MPEG stream can be found in the patents US7050T15B2, US6940910B2, US20020027616A1, EP1132812B1 and EP1132812A1. Patent US20070030584A1 claims the detection of commercials in the compressed domain.
In accordance with this invention a system is envisaged where the entire video is split into several smaller logical units, called shots using compressed domain features of H.264.
In accordance with this invention therefore the system is designed to detect the type of cut and also locate the position of the cut in real time.
In accordance with this invention there is provided a method for shot boundary detection from a compressed H.264 input video data using inter, intra and bi-directional prediction comprising the following steps:
(a) calculating the first order difference of DC coefficients of integer transformed residual part of luma and chroma components in the P frames/fields;
(b) calculating the number of macroblocks coded as intra in a P frame/field;
(c) calculating the edge strengths computed from de-blocking filter; and
(d) detecting a shot boundary by evaluating the parameters calculated at steps (a), (b) and (c).
Typically, the step (a) calculates the first order difference of DC coefficients of Luma and chroma value for each 4x4 sub block.
Particularly, the step (a) includes the steps of:
(a) obtaining a region at which a change is found in the DC values of the integer transformed coefficients of the luma and chroma components;
(b) applying a prescribed low pass filter to smoothen the data; and
(c) defining a threshold value based on which the candidate shot boundary can be detected.
Typically, the step (b) detects the shot boundary by computing the number of macroblocks coded as intra in a P frame/field.
Particularly, the step (b) includes the step of:
(a) computing the threshold from the local peaks by finding the mean of all P frames having a non-zero count of I coded macroblocks; and
(b) finding the candidate shot boundary region by looking at the local peaks having a value greater than the threshold defined in step (a).
Typically, the step (c) includes the step of: using the de-blocking filter based features in shot boundary detection.
Particularly, the step (c) includes the step of: using the edge strength of the de-blocking filter.
Still particularly, the step (c) includes the step of: using the horizontal and vertical edges having an edge strength greater than or equal to 2, the edges being defined as strong edges.
Further particularly, the step (c) includes the step of: calculating the number of macroblocks having a strong edge.
Again particularly, the step (c) includes the step of: computing the first order difference of the count of strong edges for two consecutive frames.
Yet again particularly, the step (c) includes the step of: computing the mean and standard deviation as a method for computing a threshold for identifying a candidate shot boundary.
Still in another particular embodiment, the step (c) includes the step of: computing the candidate shot boundary if the strong edge count of any frame crosses the threshold defined hereinabove.
The method of the invention may further comprise the steps of: estimating a candidate shot boundary region using the methods described hereinabove.
The method may further comprise the steps of: not considering any shot boundary occurring within a specific time interval from the last shot boundary.
Typically, the step (d) includes the step of: computing a membership value for candidate shot boundary using the methods described in steps (a) to (c) hereinabove.
Particularly, the step (d) includes the step of: detecting a shot boundary using the membership value defined hereinabove.
In accordance with one preferred embodiment of the invention the method may further comprise the step of: judging the type of shot boundary using the gradient of the peaks obtained by the methods described hereinabove.
Particularly, in accordance with the method of the invention, any shot boundary is marked as a gradual transition if the gradient of the peak is less than a threshold; and the shot boundary is marked as a sharp cut if the gradient of the peak is greater than the threshold.
Brief Description of the Accompanying Drawings:
The invention will now be described with reference to the accompanying drawings, in which
Figure 1 shows: An overview of the proposed algorithm;
Detailed Description of the Invention:
The method in accordance with this invention is based on three features which can be directly obtained from compressed domain H.264 stream by partial decoding.
The overview of the process is described in Figure 1.
Prediction mode based feature:
In H.264, like in any other hybrid video codec, any MB or block is predicted from video samples that have already been encoded - either in the same slice or in a previously encoded slice. If the current MB is predicted from a previously encoded slice, the prediction mode is said to be of P type. If the prediction is done from previously encoded MBs of the same slice, the prediction mode is said to be of I type. Moreover, the H.264 video codec supports different partitionings of the MB, such as 16x16, 16x8, 8x16, 8x8, 4x8, 8x4 and 4x4 for P mode and 16x16 and 4x4 for I mode. So, on the encoder side, the Mean of Absolute Difference (MAD) is computed for each MB and sub-block for each type of prediction mode, and the mode with the minimum cost is selected as the prediction mode. It is observed that in P frames, P mode is selected if there is temporal redundancy between two frames and I mode is selected when there is no correlation between two consecutive frames. So the number of MBs selected as I predicted mode gives an indication of a frame's correlation with the previous frame. The pseudo code of the algorithm in the proposed method is given below:
1. Count the I predicted MBs in each P frame.
2. Store these counts in an array a.
3. Compute the mean (μ) of the non-zero values a_i of a, i.e. μ = (1/N) Σ a_i over the N non-zero entries of a (Equation 1).
4. Find all local peaks of a with a_i ≥ μ and store them.
5. Find the width of each peak.
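The steps above can be sketched as follows. This is a minimal illustration only, assuming the per-P-frame intra-MB counts have already been extracted from the stream; the peak-width computation of step 5 is omitted, and numpy is assumed.

```python
import numpy as np

def candidate_peaks(i_mb_counts):
    """Local peaks in the per-P-frame count of intra-coded MBs whose
    height is at least the mean of the non-zero counts (steps 1-4)."""
    a = np.asarray(i_mb_counts, dtype=float)
    nz = a[a > 0]
    if nz.size == 0:
        return []
    mu = nz.mean()  # mean over the non-zero entries (Equation 1)
    peaks = []
    for i in range(1, len(a) - 1):
        # a local peak is at least as large as both neighbours
        if a[i] >= a[i - 1] and a[i] >= a[i + 1] and a[i] >= mu:
            peaks.append(i)
    return peaks
```

A sudden loss of temporal correlation at a cut shows up as a spike of intra-coded MBs, so the peaks of this signal mark candidate shot boundaries.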
De-blocking filter based feature:
Edge histogram is one of the features used by Zabih et al. for shot boundary detection. But in that approach, additional time complexity is required for detecting edges. One of the new features of H.264 is the de-blocking (DB) filter. In accordance with this invention the edge information obtained during the process of decoding is used without computing the edges explicitly.
A conditional filtering is applied to all 4x4 luma block edges of a picture, except for edges at the boundary of the picture and any edges for which the DB filter is disabled by the parameter specified in the standard as disable_deblocking_filter_idc, as specified in the slice header. This filtering process is performed on a macroblock after the picture construction process, and prior to the DB filter process for the entire decoded picture. All macroblocks in a picture are processed in order of increasing macroblock address. For each macroblock and each component, vertical edges are filtered first. The process starts with the edge on the left-hand side of the macroblock and proceeds through the edges towards the right-hand side of the macroblock in their geometrical order. Then horizontal edges are filtered, starting with the edge at the top of the macroblock and proceeding through the edges towards the bottom of the macroblock in their geometrical order.
This invention only uses the value of edge strength.
The pseudo code for selecting candidate frames using this feature is given below:
1. Get the strength of the DB filter for each MB.
2. If it is a strong horizontal edge or a strong vertical edge, increase the candidate MB count for this frame by one and store these counts for each frame in an array (a_db).
3. Compute the first order difference (d(a_db)) of a_db for two consecutive frames.
4. Find the mean (μ_db) and standard deviation (σ_db) of the non-zero d(a_db) values.
5. Compute the threshold (t_db) from μ_db and σ_db (Equation 2).
6. If d(a_db) for a frame is greater than t_db, mark that frame as a candidate.
7. Apply median filtering on a_db.
8. Find the local peaks from a_db; the frame number corresponding to each peak gives a possible candidate frame for a shot boundary.
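The thresholding steps above can be sketched as follows. Since Equation 2 is not reproduced in this text, the sketch assumes the common choice t_db = μ_db + σ_db; the median-filtering and peak-finding steps are omitted for brevity, and numpy is assumed.

```python
import numpy as np

def db_filter_candidates(strong_edge_counts):
    """strong_edge_counts: per-frame counts of MBs with strong
    (boundary strength >= 2) de-blocking edges. Returns the frames
    whose first-order difference exceeds the threshold (steps 3-6).
    The threshold formula is an assumption: mean + std of the
    non-zero differences."""
    a = np.asarray(strong_edge_counts, dtype=float)
    d = np.abs(np.diff(a))       # step 3: first order difference
    nz = d[d > 0]
    if nz.size == 0:
        return []
    t = nz.mean() + nz.std()     # steps 4-5 (assumed form of Equation 2)
    # step 6: frame i+1 is a candidate if the jump from frame i exceeds t
    return [i + 1 for i, v in enumerate(d) if v > t]
```

A cut changes the edge structure of the picture abruptly, so the count of strong de-blocking edges jumps between the two frames straddling the boundary.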
DC component of transformed Luma coefficient based features:
Arman et al. proposed a method where they compute the DCT of each frame and take it as a feature for detecting shot boundaries. In H.264 a 4x4 integer transformation is used, which is different from the 8x8 DCT transformation of the MPEG series of video codecs. The integer transformation reduces the problems of round-off error and floating point realization on a fixed point DSP.
The pseudo code of the algorithm in the proposed method is given below:
1. Get the luma DC value (dc_i) for each 4x4 sub block from the decoder.
2. Compute the first order difference (d(dc_i)) of dc_i with the neighbouring sub blocks in the x and y directions.
3. If d(dc_i) is greater than an experimentally obtained threshold value, increase the candidate MB count for this frame by one and store these counts for each frame in an array (a_dc).
4. Apply median filtering on a_dc.
5. Find the local peaks from a_dc; the frame number corresponding to each peak gives a possible candidate frame for a shot boundary.
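The steps above can be sketched as follows, assuming each frame's 4x4-block luma DC values are available as a 2-D array from the decoder. The threshold value is a hypothetical, experimentally tuned parameter, the median-filtering step is omitted, and numpy is assumed.

```python
import numpy as np

def dc_diff_candidates(dc_grids, thresh=50):
    """dc_grids: per-frame 2-D arrays of 4x4-block luma DC values.
    For each frame, count the blocks whose DC value differs from the
    neighbouring block in the x or y direction by more than thresh
    (an assumed value), then return the local peaks of that count."""
    counts = []
    for g in dc_grids:
        g = np.asarray(g, dtype=float)
        dx = np.abs(np.diff(g, axis=1)) > thresh  # x-direction neighbours
        dy = np.abs(np.diff(g, axis=0)) > thresh  # y-direction neighbours
        counts.append(int(dx.sum() + dy.sum()))
    # local peaks of the per-frame candidate counts (step 5)
    return [i for i in range(1, len(counts) - 1)
            if counts[i] > counts[i - 1] and counts[i] > counts[i + 1]]
```

Because the DC coefficient summarizes the average luma of its block, large inter-block DC differences trace the coarse spatial structure of the frame, and a peak in their count flags a candidate boundary.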
Decision making process:
Each of the above mentioned features identifies some candidate frames
which possibly mark the beginning of a new shot.
Here is the pseudo code for the decision making process:
1. All the candidate frames selected by the three features are stored in a buffer. Assume that no shot boundary exists within a time interval of 0.5 second.
2. Remove the duplicate frames and the frames within this range of 0.5 second.
3. If multiple peaks are found in a short interval of time and the peak gradient is low, it represents a gradual transition (GT). Otherwise it is a hard cut.
4. In case of a GT, we mark the start and end frames of the transition.
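The decision making steps above can be sketched as follows. This is a simplified illustration: the frame rate and gradient threshold are hypothetical parameters not given in the source, and the GT start/end marking of step 4 is collapsed into a single label.

```python
def merge_candidates(candidates, fps=25.0, min_gap_s=0.5, grad_thresh=10.0):
    """candidates: list of (frame_number, peak_gradient) tuples pooled
    from the three features. Duplicates and candidates closer than
    min_gap_s to the previously accepted boundary are dropped (steps
    1-2); a low peak gradient marks a gradual transition, a high one a
    hard cut (step 3). grad_thresh is an illustrative value."""
    min_gap = int(fps * min_gap_s)
    boundaries = []
    last = None
    for frame, grad in sorted(set(candidates)):  # dedupe, sort by frame
        if last is not None and frame - last < min_gap:
            continue  # assume no two boundaries within 0.5 s
        kind = "gradual" if grad < grad_thresh else "cut"
        boundaries.append((frame, kind))
        last = frame
    return boundaries
```

Pooling the three feature streams before thresholding is what lets the method avoid relying on any single fixed threshold.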
In accordance with this invention, therefore a novel method and apparatus is presented for automatic shot detection using compressed domain information.
In this method, the H.264 video stream is used as input, and some compressed domain features obtained during the parsing and decoding process of the compressed video stream are used to identify the shot boundaries.
In accordance with this invention, similar features obtained from the compressed domain are used, thus avoiding the complex operations needed to find those features. On the encoder side, two consecutive frames are compared at the block level. The DC coefficient of the transformed luma block contains the major information about the luma component of that block. Moreover, if an object contains high motion, it is encoded as an I MB. So the DC values of the transformed luma coefficients and the number of I MBs in a P frame give region based information. Since these features can be obtained during the process of decoding, no additional time is required to obtain them. Similarly, as the edge strength computed during de-blocking filtering gives information about edges, no additional time for finding edges is required. Finally, these cues are used to conclude whether a shot boundary occurs or not.
The features used in this approach are
• Number of I Macroblocks (MBs) in a P frame
• Number of sub-blocks containing strong de-blocking filter edges, and
• DC component of integer transformed luma coefficients
This invention uses a decision making process based on these features to determine whether a shot boundary has occurred or not.
By studying the behavior of the features, the decision about the type of scene transition is taken.
Although the invention has been described in terms of particular embodiments and applications, one of ordinary skill in the art, in light of this teaching, can generate additional embodiments and modifications without departing from the spirit of or exceeding the scope of the claimed invention. Accordingly, it is to be understood that the drawings and descriptions herein are offered by way of example to facilitate comprehension of the invention and should not be construed to limit the scope thereof.