"A Method And System For Extracting A Scaled Image From A Video

"A Method And System For Extracting A Scaled Image From A Video Stream"

Abstract: A method for extracting a scaled image from a video stream, comprising the steps of: defining the scaling factor to be applied; extracting the principal frame from a video bit stream; checking the frame for perceptual information and thereafter selecting the I-frame comprising perceptual information; down sampling the macro block of the selected perceptual frame, followed by an inverse discrete cosine transform of the down sampled macro block; and deinterlacing the down sampled macro block so as to get the final scaled image.

Patent Information

Application #:
Filing Date: 20 October 2006
Publication Number: 46/2009
Publication Type: INA
Invention Field: COMPUTER SCIENCE
Status:
Parent Application:

Applicants

SAMSUNG INDIA ELECTRONICS PVT

B-1 SECTOR 81 PHASE 11,NOIDA 201 305 UP INDIA

Inventors

1. RAJNEESH KHANNA

C/O B-1 SECTOR -81 PHASE 11,NOIDA 201 305, UP INDIA

2. SUNIL KUMAR MORYA

C/O B-1 SECTOR -81 PHASE 11,NOIDA 201 305, UP INDIA

3. PARTH DIXIT

C/O B-1 SECTOR -81 PHASE 11,NOIDA 201 305, UP INDIA

Specification

Field Of Invention
The instant invention relates to the field of processing of video streams, and more particularly to the fast extraction of spatially reduced frames from intra-coded frames of a video bitstream so that the extracted frames can be presented as thumbnails.
Background of the Invention
Thumbnails are reduced-size versions of images or of selected frames in a video bit stream. They let a user identify a picture or video without opening the whole file, and so avoid the time spent downloading an image or video that turns out not to be useful. Thumbnails make images easier to scan and identify, serving the same role for images as a normal text index does for words. As a design element, the thumbnail enables a user to see many images without the overhead of full-sized versions.
Thumbnails have been used for various purposes, including the creation of photo galleries and online galleries. Here a series of thumbnails may be displayed to help the viewer select the images of his choice from a photo gallery, online gallery or video gallery.
A number of thumbnail-producing systems and methods are available, but merely reducing the size of the picture does not result in a quality thumbnail. Some conventional thumbnailers just reduce the dimensions of a large image in pixels rather than producing a smaller version of the image. Thumbnails can also be used as links to the content they represent.
A number of conventional methods and systems produce thumbnails of a video but do not result in good thumbnails, either due to inefficient reduction of size or due to not checking for the most important information in the video when producing the thumbnails. Thus the thumbnails do not give a fair idea of the video. Some methods or systems available for thumbnail creation are either made for a particular application or require a lot of time.
Description of the invention
In order to obviate the above drawbacks, the instant invention provides a method and system for extracting the most relevant reduced-size images so that they can be presented as thumbnails.
Advantageously, the instant invention provides a method and system to check the perceptual information of the frames before extraction of the relevant frame.
Further, the instant invention provides a fast method and system to extract the relevant frame as a thumbnail, since the video stream is decoded for the particular I-frame only.
A method for extracting a scaled image from a video stream, comprising the steps of: defining the scaling factor to be applied; extracting the principal frame from a video bit stream; checking the frame for perceptual information and thereafter selecting the I-frame comprising perceptual information; down sampling the macro block of the selected perceptual frame, followed by an inverse discrete cosine transform of the down sampled macro block; and deinterlacing the down sampled macro block so as to get the final scaled image.
A system for extracting a scaled image from a video stream, comprising: means connected to the user system for defining the scaling factor to be applied; means for extracting the principal frame from a video bit stream; means for checking the frame for perceptual information and thereafter selecting the I-frame comprising perceptual information; means for down sampling the macro block of the selected perceptual frame, followed by an inverse discrete cosine transform of the down sampled macro block; and means for deinterlacing the down sampled macro block so as to get the final scaled image.
The method of extracting scaled images by reducing the size of the images in the video stream may be divided into three steps: perceptual information checking, partial inverse discrete cosine transform, and de-interlacing.
1. Checking of Perceptual Information:
The desired intra-coded frame, which is to be displayed as the thumbnail, needs to be checked for perceptual information, that is, for information which can describe the video stream.
The frame is checked at the following points:
(w/2, h/2); (1/4w, 1/4h); (3/4w, 3/4h); (1/4w, 3/4h) and (3/4w, 1/4h). Herein w refers to the width of the image and h refers to the height of the image. The following procedure is used to check the perceptual information in the frame.
1. Determine the DC values at (w/2, h/2); (1/4w, 1/4h); (3/4w, 3/4h); (1/4w, 3/4h) and (3/4w, 1/4h). Starting with the first point, if the absolute DC difference at the point is greater than the threshold value, then the image is perceptible.
2. If the DC difference at the point is less than the threshold value, then the absolute DC difference is determined for the next point.
3. The procedure continues until a point has been found at which the DC difference is greater than the threshold, or until all the points have been checked. If all the points have been checked and none exceeds the threshold, then the image is not perceptible. If the absolute difference at any point is greater than the threshold, then the image is perceptible.
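The three steps above can be sketched as follows. This is a minimal illustration rather than the applicants' implementation: it assumes the "DC difference" is the differential DC value of the 8x8 block that contains each probe point, and the names `sample_points`, `is_perceptible` and the `dc_diff` grid layout are hypothetical.

```python
def sample_points(w, h):
    """Return the five probe positions named in the text, in pixel coordinates."""
    return [
        (w // 2, h // 2),
        (w // 4, h // 4),
        (3 * w // 4, 3 * h // 4),
        (w // 4, 3 * h // 4),
        (3 * w // 4, h // 4),
    ]


def is_perceptible(dc_diff, w, h, threshold, block=8):
    """Check the probe points one after another and stop at the first hit.

    dc_diff[row][col] is assumed to hold the differential DC value of the
    8x8 block at that block position (an assumption, not from the source).
    """
    rows, cols = len(dc_diff), len(dc_diff[0])
    for x, y in sample_points(w, h):
        # Map the pixel position to the block that contains it.
        r = min(y // block, rows - 1)
        c = min(x // block, cols - 1)
        if abs(dc_diff[r][c]) > threshold:
            return True   # one point above the threshold is enough
    return False          # every point was checked and none exceeded it


if __name__ == "__main__":
    grid = [[0] * 8 for _ in range(8)]   # 64x64 image, 8x8 blocks
    grid[6][2] = 50                      # block under the point (1/4w, 3/4h)
    print(is_perceptible(grid, 64, 64, threshold=20))
```

A frame whose only activity lies away from the five probe points is rejected by this check, which is the trade-off the text accepts in exchange for speed.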

2. Partial IDCT and Down Sampling:
The selected I-frame having the perceptual information is partially decoded and down sampled in this stage. Full decompression is not required: the partially decoded I-frame preserves its content very well and achieves reasonable perceptual quality on visual inspection. Full decompression of an I-frame to the spatial domain in order to extract the thumbnail takes time. Because a partial decoding technique is used, the decoding process does not spend the extra time needed for a full inverse discrete cosine transform (IDCT).
The down sampling scheme in the DCT domain (compressed data) is explained below by taking a DCT-coded macroblock of 16x16 pixels, i.e. four 8x8 pixel blocks. Herein, each discrete cosine transformed (DCT) block is an 8x8 array, and "M x M" is the reduced size of the DCT block (M < 8). In MPEG-2, macroblocks are encoded via a two-dimensional discrete cosine transform (DCT). A two-dimensional (2D) DCT is essentially a concatenation of horizontal and vertical 1D DCTs. A DCT produces an array of coefficients equal in dimension to the array of pixels that it transforms. Conversely, the inverse DCT process produces pixels equal in dimension to the DCT block upon which it operates. To perform a two-dimensional IDCT on a DCT block that has had coefficients eliminated, it is necessary to make a compensating adjustment in the IDCT to account for the difference in the transform length. A 2D IDCT can be expressed as follows:
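The expression itself did not survive in this copy. The standard orthonormal M x M form is given here as the likely intent; the symbols F (DCT coefficients), f (pixels), C and the indices u, v, x, y are assumptions, not taken from the source.

```latex
f(x,y) = \frac{2}{M} \sum_{u=0}^{M-1} \sum_{v=0}^{M-1}
         C(u)\, C(v)\, F(u,v)\,
         \cos\!\left[\frac{(2x+1)\,u\,\pi}{2M}\right]
         \cos\!\left[\frac{(2y+1)\,v\,\pi}{2M}\right],
\qquad
C(k) = \begin{cases} 1/\sqrt{2}, & k = 0 \\ 1, & k > 0 \end{cases}
```

With M = 8 this is the full-size inverse transform. When only the M x M low frequency coefficients of an 8x8 block are kept, one common form of the compensating adjustment is to multiply each retained coefficient by M/8 before applying the M-point transform, which preserves the mean intensity of the block.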

Low Frequency Component Selection Process: Each DCT block is an 8x8 array and is represented by a two-dimensional array. The low frequency components for each reduced-size image are selected as per the scaling requirement of the final thumbnail image.
a) Sampling of image to 1/2 the size of the original image: Each macroblock of 16x16 pixels in a video frame is made up of four 8x8 DCT blocks. In order to sample an image to half of its original size, the 4x4 low frequency components of each DCT block are chosen to get a reduced-size 4x4 DCT block. The 8x8 DCT blocks are represented by B1, B2, B3 and B4, and C1, C2, C3 and C4 represent the 4x4 low frequency DCT blocks. D1, D2, D3 and D4 are the pixel blocks obtained after applying the IDCT to each of the 4x4 C blocks. Z1 is the 8x8 block that represents the down sampled macroblock.

B1 =

Y00 Y01 Y02 Y03 Y04 Y05 Y06 Y07
Y10 Y11 Y12 Y13 Y14 Y15 Y16 Y17
Y20 Y21 Y22 Y23 Y24 Y25 Y26 Y27
Y30 Y31 Y32 Y33 Y34 Y35 Y36 Y37
Y40 Y41 Y42 Y43 Y44 Y45 Y46 Y47
Y50 Y51 Y52 Y53 Y54 Y55 Y56 Y57
Y60 Y61 Y62 Y63 Y64 Y65 Y66 Y67
Y70 Y71 Y72 Y73 Y74 Y75 Y76 Y77

The C1 (4x4 low frequency DCT block) extracted from B1 is as shown below:

C1 =

Y00 Y01 Y02 Y03
Y10 Y11 Y12 Y13
Y20 Y21 Y22 Y23
Y30 Y31 Y32 Y33

The 2D inverse discrete cosine transform is then applied to C1 so as to get an inverse discrete cosine transformed block, i.e. D1.

D1 =

P00 P01 P02 P03
P10 P11 P12 P13
P20 P21 P22 P23
P30 P31 P32 P33

Thus, by merging the inverse discrete cosine transformed matrices D1, D2, D3 and D4, a final down sampled 8x8 pixel block Z1 is obtained.

Z1 =

D1 D2
D3 D4
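The B, C, D and Z blocks can be traced in a short pure-Python sketch. It rests on assumptions not stated in the source: an orthonormal DCT (MPEG-2 dequantisation is ignored), an M/8 rescaling of the retained coefficients as the compensating adjustment, and a Z1 layout with D1 and D2 on top of D3 and D4. All function names are hypothetical.

```python
import math


def _c(k):
    """Normalisation factor of the orthonormal DCT."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def dct2(block):
    """Orthonormal 2D DCT of an n x n pixel block (used here to build test data)."""
    n = len(block)
    return [[(2 / n) * _c(u) * _c(v) * sum(
                block[x][y]
                * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                for x in range(n) for y in range(n))
             for v in range(n)] for u in range(n)]


def idct2(coeffs):
    """Orthonormal 2D inverse DCT of an m x m coefficient block."""
    m = len(coeffs)
    return [[(2 / m) * sum(
                _c(u) * _c(v) * coeffs[u][v]
                * math.cos((2 * x + 1) * u * math.pi / (2 * m))
                * math.cos((2 * y + 1) * v * math.pi / (2 * m))
                for u in range(m) for v in range(m))
             for y in range(m)] for x in range(m)]


def reduce_block(b, m):
    """B -> C -> D: keep the top-left m x m coefficients, rescale, invert."""
    n = len(b)
    c = [[b[u][v] * m / n for v in range(m)] for u in range(m)]
    return idct2(c)


def downsample_macroblock(b1, b2, b3, b4, m=4):
    """Merge the four reduced blocks into Z1 (assumed layout: D1 D2 / D3 D4)."""
    d1, d2, d3, d4 = (reduce_block(b, m) for b in (b1, b2, b3, b4))
    top = [left + right for left, right in zip(d1, d2)]
    bottom = [left + right for left, right in zip(d3, d4)]
    return top + bottom


if __name__ == "__main__":
    # Four flat 8x8 blocks with different grey levels stand in for B1..B4.
    blocks = [dct2([[level] * 8 for _ in range(8)]) for level in (10, 20, 30, 40)]
    z1 = downsample_macroblock(*blocks, m=4)
    print(len(z1), len(z1[0]), round(z1[0][0], 6), round(z1[7][7], 6))
```

Passing `m=2` in the same sketch yields a 4x4 Z1, which corresponds to the quarter-size case.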

b) Down sampling of the image to 1/4 of the original image size:
As in the method used to get 1/2 of the original image, low frequency components are selected from each DCT block, but here 2x2 components are selected rather than 4x4. In order to get 1/4 of the original image size, the 2x2 low frequency components are selected from each 8x8 DCT block to get a reduced-size 2x2 DCT block. In this case Z1, i.e. the down sampled macroblock, is a 4x4 block rather than an 8x8 block. The 2x2 pixel matrices D1, D2, D3 and D4 are derived by applying the IDCT to the low frequency DCT blocks C1, C2, C3 and C4.

C1 =

Y00 Y01
Y10 Y11

D1 =

P00 P01
P10 P11

Z1 =

D1 D2
D3 D4

4x4 scaled block (IDCT pixel block)

c) Down sampled to 1/8 of the image size: To get 1/8 of the image size, only one low frequency component is taken from each 8x8 DCT block, and a 1x1 IDCT matrix is derived from each DCT block. Here the C and D blocks have only one component each and the Z block is a 2x2 pixel block.

C1 = | Y00 | (1x1 matrix)
D1 = | P00 | (1x1 matrix)

Z1 =

P00 P01
P10 P11
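In the 1/8 case the 1x1 inverse transform reduces to a scaling of the DC coefficient. The sketch below assumes an orthonormal 8x8 DCT, for which the DC coefficient equals eight times the block mean. A real MPEG-2 decoder would also dequantise the coefficient first, which is omitted here. The function name `dc_thumbnail` is hypothetical.

```python
def dc_thumbnail(dc):
    """Build the 1/8-size image from the per-block DC coefficients.

    dc[r][c] is the DC coefficient of the 8x8 block at block row r and
    block column c. Under the assumed orthonormal DCT, DC / 8 is the
    block mean, so each block contributes exactly one output pixel.
    """
    return [[value / 8 for value in row] for row in dc]


if __name__ == "__main__":
    # DC values of the four blocks of one macroblock give the 2x2 block Z1.
    print(dc_thumbnail([[800, 160], [80, 0]]))
```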

d) Down sampled to 1/16th of the image size: To get 1/16th of the image size, 1/8th of the original image is first obtained as discussed above, and interpolation is then performed on that 1/8th-size image. The interpolation considers the closest 2x2 pixels surrounding the unknown pixel and takes a weighted average of these four pixels to arrive at the final interpolated value.
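A weighted average over the closest 2x2 neighbourhood is what is usually called bilinear interpolation. The sketch below assumes that this is the intended scheme and that output pixel centres are mapped back onto the source grid. The function name `bilinear_resize` is hypothetical.

```python
import math


def bilinear_resize(img, out_h, out_w):
    """Resize a 2D list of pixel values using the closest 2x2 source pixels."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for i in range(out_h):
        # Map the centre of output row i back to source coordinates.
        sy = (i + 0.5) * in_h / out_h - 0.5
        y0 = max(0, min(in_h - 1, math.floor(sy)))
        y1 = min(in_h - 1, y0 + 1)
        fy = min(1.0, max(0.0, sy - y0))      # vertical weight of row y1
        row = []
        for j in range(out_w):
            sx = (j + 0.5) * in_w / out_w - 0.5
            x0 = max(0, min(in_w - 1, math.floor(sx)))
            x1 = min(in_w - 1, x0 + 1)
            fx = min(1.0, max(0.0, sx - x0))  # horizontal weight of column x1
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


if __name__ == "__main__":
    eighth = [[0, 0, 8, 8], [0, 0, 8, 8], [4, 4, 12, 12], [4, 4, 12, 12]]
    print(bilinear_resize(eighth, 2, 2))   # halves the 1/8 image to 1/16
```

When the image is exactly halved, each output pixel falls midway between four source pixels, so the four weights are equal and the result is their plain average.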

3. De-interlacing:
The reduced-size field images produced by the partial IDCT method are sufficient for producing thumbnail images for video indexing. Since only intra-coded frames are used to produce thumbnail images from video, there is no need to consider the motion artifacts produced in a sequence of images by the time difference between the frames of the video sequence. Using one field and interpolating its top and bottom pixels to get the thumbnail in the proper format can produce a good quality thumbnail.
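One way to read "using one field and interpolating its top and bottom pixels" is sketched below. Each line missing from the chosen field is rebuilt as the mean of the field lines directly above and below it. The source does not give the weights or say which field is used, so both are assumptions, and the function name `deinterlace_from_field` is hypothetical.

```python
def deinterlace_from_field(field):
    """Rebuild a full-height image from the rows of a single field.

    Every field row is copied, and the missing row after it is the mean of
    that row and the next field row. The last missing row repeats the final
    field row because it has no lower neighbour.
    """
    out = []
    for i, row in enumerate(field):
        out.append(list(row))
        below = field[i + 1] if i + 1 < len(field) else row
        out.append([(a + b) / 2 for a, b in zip(row, below)])
    return out


if __name__ == "__main__":
    top_field = [[0, 10], [20, 30]]
    for line in deinterlace_from_field(top_field):
        print(line)
```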
Description Of Drawings
The present invention is not intended to be restricted to any particular form or arrangement, or any specific embodiment, or any specific use, disclosed herein, since the same may be modified in various particulars or relations without departing from the spirit or scope of the claimed invention hereinabove shown and described, of which the apparatus or method shown is intended only for illustration and disclosure of an operative embodiment and not to show all of the various forms or modifications in which this invention might be embodied or operated.
Figure 1 illustrates the process of the invention. In step 1 the headers of the frames are parsed so as to detect the key frame, followed by the detection of perceptual information in the key frame. Herein the compressed video stream is parsed to read the header contents. The discrete cosine transform coefficients of the blocks of the encoded image are variable-length decoded and saved for the perceptual information check. The perceptual checking method uses the decoded differential coefficients to check the availability of perceptible data. If the frame contains perceptual information, then the coefficients go through the partial decoding to get the reduced-size image; otherwise the frame is rejected and another key frame from the video stream is selected. Thus, the key frames comprising perceptual information are inverse quantized, decoded and stored in the storage area.
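The control flow described for Figure 1 can be summarised in a few lines. The callables `is_perceptible` and `partial_decode` and the function name are hypothetical stand-ins for the stages described above. This is a sketch of the selection loop only, not of the bitstream parsing.

```python
def first_perceptible_thumbnail(key_frames, is_perceptible, partial_decode):
    """Return the thumbnail of the first key frame that passes the check.

    key_frames     -- iterable of key frames in stream order
    is_perceptible -- callable implementing the perceptual check
    partial_decode -- callable producing the reduced-size image
    """
    for frame in key_frames:
        if is_perceptible(frame):
            return partial_decode(frame)   # only this frame is decoded
        # Frame rejected: move on to the next key frame in the stream.
    return None                            # no key frame was perceptible


if __name__ == "__main__":
    frames = [1, 2, 3]                     # toy stand-ins for key frames
    print(first_perceptible_thumbnail(frames, lambda f: f >= 2, lambda f: f * 10))
```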
Figure 2A illustrates a flow chart for checking the perceptual information. The perceptual information describes whether the frame has enough information to describe the video stream. The DC values are calculated at (w/2, h/2), (w/4, h/4), (3/4w, 3/4h), (3/4w, h/4) and (1/4w, 3/4h) in step (201). The key frames are checked for perceptual information by comparing the absolute DC difference with the threshold value in step (202). If the absolute DC difference is greater than the threshold value, then the image is declared perceptible in step (204); otherwise it is declared non-perceptible in step (205). If the image is non-perceptible at one position, then it is checked at the next position; otherwise it is declared perceptible. The image is checked for perceptible information in a spiral manner as shown in Figure 2B. In Figure 2B there is an image of 8x8 blocks, which are scanned in a spiral manner. The calculation of the DC value starts from the middle block of the image and continues in a spiral manner towards the other blocks. If the absolute DC value at a particular block of the image is greater than the threshold value, then the image is declared perceptible in step (204); otherwise the procedure checks in step (207) whether all the blocks have been checked for perceptible information and, if not, moves to the next block in step (203). If all the blocks in the image have been checked and none of them contains perceptible information, then the image is declared non-perceptible in step (208).
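The spiral scan of Figure 2B might look like the sketch below. Figure 2B is not reproduced here, so the starting block (rows // 2, cols // 2) and the winding direction (right, down, left, up) are assumptions. The names `spiral_order` and `is_perceptible_spiral` are hypothetical.

```python
def spiral_order(rows, cols):
    """List block positions in an outward square spiral from the middle block."""
    r, c = rows // 2, cols // 2
    order = [(r, c)]
    dr, dc = 0, 1                  # start by moving right
    step = 1
    while len(order) < rows * cols:
        for _ in range(2):         # two legs share each step length
            for _ in range(step):
                r, c = r + dr, c + dc
                if 0 <= r < rows and 0 <= c < cols:
                    order.append((r, c))   # skip positions outside the image
            dr, dc = dc, -dr       # turn: right -> down -> left -> up
        step += 1
    return order


def is_perceptible_spiral(dc_diff, threshold):
    """Scan blocks in spiral order and stop at the first one above the threshold."""
    for r, c in spiral_order(len(dc_diff), len(dc_diff[0])):
        if abs(dc_diff[r][c]) > threshold:
            return True            # corresponds to step (204)
    return False                   # all blocks checked: step (208)


if __name__ == "__main__":
    print(spiral_order(3, 3))
```

Starting at the centre reaches a decision early for typical frames, where the subject tends to sit near the middle of the picture.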
Herein only intra-coded frames are used to produce the thumbnail images from video; thus the process is not affected by the motion artifacts produced in a sequence of images by the time difference between frames of the video sequence. Using one field and interpolating its top and bottom pixels to get the thumbnail in the proper format produces a good quality thumbnail.

Figures 3A and 3B illustrate the overall de-interlacing procedure for the perceptible image. The field pictures have half the height of the full frame image. [illegible in source] with IDCT. A coefficient selection process eliminates high frequency coefficients from the 8x8 DCT blocks of a macroblock of 16x16 pixels. The remaining coefficients are then passed on to an IDCT block. The number of low frequency coefficients retained depends on the size of the reduced image.
Fig 4-B is the diagram of a technique for discarding high frequency components. Herein an 8x8 DCT block is converted into an MxM low frequency block by deleting the high frequency components. Then the IDCT is performed on the MxM DCT block so as to get the final inverse discrete cosine transformed block.
Figure 4C illustrates an embodiment wherein a 16x16 macroblock is taken to get a reduced or scaled image as per the scaling factor. In step 1 a 16x16 macroblock is taken, and in step 2 it is divided into four 8x8 blocks named B1, B2, B3 and B4. Thereafter, in step 3, each 8x8 DCT block is converted into a 4x4 low frequency block by selecting the low frequency components. In step 4 the inverse discrete cosine transform is applied to each of the low frequency blocks, i.e. C1, C2, C3 and C4, so as to result in D1, D2, D3 and D4. Thereafter a final block Z1 of 8x8 pixels is obtained in step 5.
Figure 4D depicts the pixels at the same level whose values are known and the pixels which are determined by taking a weighted average of the neighbourhood pixels.
It will readily be appreciated by those skilled in the art that the present invention is not limited to the specific embodiments shown herein. Thus, variations may be made within the scope and spirit of the accompanying claims without sacrificing the principal advantages of the invention.

We Claim:
1. A method for extracting a scaled image from a video stream comprising the
steps of:
- defining the scaling factor to be applied;
- extracting the principal frame from a video bit stream;
- checking the frame for perceptual information and thereafter selection of the
I - frame comprising perceptual information;
- down sampling of the macro block of the selected perceptual frame followed by
inverse discrete cosine transform of the down sampled macro block; and
- deinterlacing of the down sampled macro block so as to get the final scaled
image.
2. A method as claimed in claim 1 wherein the principal frame is extracted by
processing the headers of the bit stream frames.
3. A method as claimed in claim 1 wherein perceptual information check
comprises the following steps:
- determining the dc value at various points and determining the absolute dc
difference for a particular point;
- comparing the absolute dc difference of a point with the threshold value;
- selection of the frame as a perceptual frame when the absolute difference is
greater than the threshold value; and
- rejection of a frame when the absolute dc difference is less than the threshold
value for all the points.
4. A method as claimed in claim 1 wherein the Inverse Discrete Cosine Transformation is performed on a macro block of the image frame selected as perceptual frame.

5. A method as claimed in claims 1 and 4 wherein the IDCT performed on a
macro block is partial.
6. A method as claimed in claims 1 and 4 wherein the down sampling is done on
the basis of the sampling factor defined by the user.
7. A method as claimed in claim 1 wherein the macroblock of NxN pixels is
divided in four N/2xN/2 blocks followed by selection of N/4xN/4 low frequency
components from each N/2xN/2 block.
8. A method as claimed in claim 7 wherein Inverse Discrete Cosine Transformation is performed on each of the N/2xN/2 blocks.
9. A method as claimed in claim 8 wherein all the Inverse Discrete Cosine Transformed
N/2xN/2 blocks are merged to form a final N/2xN/2 macroblock.
10. A method as claimed in any of the preceding claims wherein deinterlacing is
performed by using one field and interpolating the top and bottom pixels.
11. A system for extracting a scaled image from a video stream comprising:
- means connected to the user system for defining the scaling factor to be
applied;
- means for extracting the principal frame from a video bit stream;
- means for checking the frame for perceptual information and thereafter
selection of the I - frame comprising perceptual information;
- means for down sampling of the macro block of the selected perceptual frame
followed by inverse discrete cosine transform of the down sampled macro block; and
- means for deinterlacing of the down sampled macro block so as to get the final
scaled image.

12. A system as claimed in claim 11 wherein the principal frame is extracted by processing the headers of the bit stream frames.
13. A system as claimed in claim 11 wherein perceptual information check
comprises the following steps:
- first processing means for determining the dc value at
various points and determining the absolute dc difference for a particular point;
- second processing means for comparing the absolute dc difference of a point
with the threshold value;
- means for selection of the frame as a perceptual frame when the absolute
difference is greater than the threshold value; and
- means for rejection of a frame when the absolute dc difference is less than the
threshold value for all the points.

14. A system as claimed in claim 11 wherein the Inverse Discrete Cosine
Transformation is performed on a macro block of the image frame selected as
perceptual frame.
15. A system as claimed in claims 11 and 14 wherein the IDCT performed on a
macro block is partial.
16. A system as claimed in claims 11 and 14 wherein the down sampling is done
on the basis of the sampling factor defined by the user.
17. A system as claimed in claim 11 wherein the macroblock of NxN pixels is
divided in four N/2xN/2 blocks followed by selection of N/4xN/4 low frequency
components from each N/2XN/2 block.
18. A system as claimed in claim 17 wherein Inverse Discrete Cosine
Transformation is performed on each of the N/2XN/2 block.

19. A system as claimed in claim 18 wherein all the Inverse Discrete Cosine Transformed N/2xN/2 blocks are merged to form a final N/2xN/2 macroblock.
20. A method for extracting a scaled image from a video stream as herein described substantially with reference to the accompanying drawings.
21. A system for extracting a scaled image from a video stream as herein described substantially with reference to the accompanying drawings.

Documents

Application Documents

#	Name	Date
1	2305-del-2006-Form-18-(24-11-2008).pdf	2008-11-24
2	2305-del-2006-Correspondence Others-(24-11-2008).pdf	2008-11-24
3	2305-del-2006-gpa.pdf	2011-08-21
4	2305-del-2006-form-3.pdf	2011-08-21
5	2305-del-2006-form-1.pdf	2011-08-21
6	2305-del-2006-correspondence-others.pdf	2011-08-21
7	2305-del-2006-abstract.pdf	2011-08-21
8	2305-del- 2006- form-2.pdf	2011-08-21
9	2305-del- 2006- drawings.pdf	2011-08-21
10	2305-del- 2006- description (complete).pdf	2011-08-21
11	2305-del- 2006- claims.pdf	2011-08-21
12	Relevant Documents.pdf	2014-04-28
13	Form 13_Address for service.pdf	2014-04-28
14	Amended Form 1.pdf	2014-04-28
15	2305-DEL-2006_EXAMREPORT.pdf	2016-06-30