An improved coding efficiency is achieved by giving the encoder the opportunity to change the field/frame-wise treatment of individual picture portions between the first precision-encoded data and the second precision-encoded data, with the second precision being higher than the first precision.
Coding scheme enabling precision-scalability
Description
The present invention relates to picture or video coding supporting quality-, precision- or SNR-scalability.
A current project of the Joint Video Team (JVT) of the ISO/IEC Moving Pictures Experts Group (MPEG) and the ITU-T Video Coding Experts Group (VCEG) is the development of a scalable extension of the state-of-the-art video coding standard H.264/MPEG4-AVC defined in ITU-T Rec. & ISO/IEC 14496-10 AVC, "Advanced Video Coding for Generic Audiovisual Services," version 3, 2005. The current working draft as described in J. Reichel, H. Schwarz and M. Wien, eds., "Scalable Video Coding - Joint Draft 4," Joint Video Team, Doc. JVT-Q201, Nice, France, October 2005, and J. Reichel, H. Schwarz and M. Wien, eds., "Joint Scalable Video Model JSVM-4," Joint Video Team, Doc. JVT-Q202, Nice, France, October 2005, supports temporal, spatial and SNR scalable coding of video sequences or any combination thereof.
H.264/MPEG4-AVC as described in ITU-T Rec. & ISO/IEC 14496-10 AVC, "Advanced Video Coding for Generic Audiovisual Services," version 3, 2005, specifies a hybrid video codec in which macroblock prediction signals are either generated by motion-compensated prediction or intra-prediction and both predictions are followed by residual coding. H.264/MPEG4-AVC coding without the scalability extension is referred to as single-layer H.264/MPEG4-AVC coding. Rate-distortion performance comparable to single-layer H.264/MPEG4-AVC means that the same visual reproduction quality is typically achieved at a 10% bit-rate increase. Given the above, scalability is considered as a functionality for removal of parts of the bit-stream while achieving an R-D performance at any supported spatial, temporal or SNR resolution that is comparable to single-layer H.264/MPEG4-AVC coding at that particular resolution.
The basic design of the scalable video coding (SVC) can be classified as a layered video codec. In each layer, the basic concepts of motion-compensated prediction and intra-prediction are employed as in H.264/MPEG4-AVC. However, additional inter-layer prediction mechanisms have been integrated in order to exploit the redundancy between several spatial or SNR layers. SNR scalability is basically achieved by residual quantization, while for spatial scalability, a combination of motion-compensated prediction and oversampled pyramid decomposition is employed. The temporal scalability approach of H.264/MPEG4-AVC is maintained.
In general, the coder structure depends on the scalability space that is required by an application. For illustration, Fig. 5 shows a typical coder structure 900 with two spatial layers 902a, 902b. In each layer, an independent hierarchical motion-compensated prediction structure 904a,b with layer-specific motion parameters 906a,b is employed. The redundancy between consecutive layers 902a,b is exploited by inter-layer prediction concepts 908 that include prediction mechanisms for motion parameters 906a,b as well as texture data 910a,b. A base representation 912a,b of the input pictures 914a,b of each layer 902a,b is obtained by transform coding 916a,b similar to that of H.264/MPEG4-AVC; the corresponding NAL units (NAL - Network Abstraction Layer) contain motion information and texture data, and the NAL units of the base representation of the lowest layer, i.e. 912a, are compatible with single-layer H.264/MPEG4-AVC. The reconstruction quality of the base representations can be improved by an additional coding 918a,b of so-called progressive refinement slices; the corresponding NAL units can be arbitrarily truncated in order to support fine granular quality scalability (FGS) or flexible bit-rate adaptation.
The resulting bit-streams output by the base layer coding 916a,b and the progressive SNR refinement texture coding 918a,b of the respective layers 902a,b are multiplexed by a multiplexer 920 in order to result in the scalable bit-stream 922. This bit-stream 922 is scalable in time, space and SNR quality.
Summarizing, in accordance with the above scalable extension of the video coding standard H.264/MPEG4-AVC, the temporal scalability is provided by using a hierarchical prediction structure. For this hierarchical prediction structure, that of single-layer H.264/MPEG4-AVC may be used without any changes. For spatial and SNR scalability, additional tools have to be added to single-layer H.264/MPEG4-AVC. All three scalability types can be combined in order to generate a bit-stream that supports a large degree of combined scalability.
For SNR scalability, coarse-grain scalability (CGS) and fine-granular scalability (FGS) are distinguished. With CGS, only selected SNR scalability layers are supported and the coding efficiency is optimized for coarse rate graduations, such as a factor of 1.5-2 from one layer to the next. FGS enables the truncation of NAL units at any arbitrary and eventually byte-aligned point. NAL units represent bit packets, which are serially aligned in order to represent the scalable bit-stream 922 output by multiplexer 920.
In order to support fine-granular SNR scalability, so-called progressive refinement (PR) slices have been introduced. Progressive refinement slices contain refinement information for refining the reconstruction quality available for that slice from the base layer bit-stream 912a,b, respectively. Even more precisely, each NAL unit for a PR slice represents a refinement signal that corresponds to a bisection of a quantization step size (QP increase of 6). These signals are represented in a way that only a single inverse transform has to be performed for each transform block at the decoder side. In other words, the refinement signal represented by a PR NAL unit refines the transformation coefficients of transform blocks into which a current picture of the video has been separated. At the decoder side, this refinement signal may be used to refine the transformation coefficients within the base layer bit-stream before performing the inverse transform in order to reconstruct the texture or prediction residual used for reconstructing the actual picture by use of a spatial and/or temporal prediction, such as by means of motion compensation.
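The step-size relation behind this bisection can be sketched as follows. This is an illustrative sketch only, not text from the standard: in H.264/MPEG4-AVC the quantization step size roughly doubles for every QP increase of 6, so a PR-slice refinement at a QP reduced by 6 halves the step. The function names and the nominal step value are assumptions for illustration.

```python
def qstep(qp, qstep0=0.625):
    """Approximate quantization step size for a given QP.

    qstep0 is an assumed nominal step at QP 0; the standard uses an
    integerized table, but the 2**(QP/6) doubling rule is the point here."""
    return qstep0 * 2.0 ** (qp / 6.0)


def refine(coeff, qp):
    """Quantize `coeff` at `qp`, then refine the error at `qp - 6`.

    Returns (coarse reconstruction, refined reconstruction); the refinement
    corresponds to one PR refinement signal with a bisected step size."""
    coarse_step, fine_step = qstep(qp), qstep(qp - 6)
    coarse = round(coeff / coarse_step) * coarse_step
    # The refinement codes the remaining quantization error with half the step.
    refinement = round((coeff - coarse) / fine_step) * fine_step
    return coarse, coarse + refinement
```

Because the refinement is expressed in the transform-coefficient domain, the decoder can add it to the base layer coefficients and still perform only a single inverse transform per block.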
The progressive refinement NAL units can be truncated at any arbitrary point, so that the quality of the SNR base layer can be improved in a fine granular way. Therefore, the coding order of transform coefficient levels has been modified. Instead of scanning the transform coefficients macroblock-by-macroblock, as is done in (normal) slices, the transform coefficient blocks are scanned in separate paths and in each path, only a few coding symbols for a transform coefficient block are coded. With the exception of the modified coding order, the CABAC entropy coding as specified in H.264/MPEG4-AVC is re-used.
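The path-wise coding order described above can be sketched as follows. This is a hypothetical simplification: real PR slices emit CABAC-coded syntax elements, not raw coefficient indices, and the function name is an assumption. The point is that each path visits every block and emits only the next not-yet-coded symbol, so a truncated stream still carries partial refinement for all blocks rather than full refinement for the first few blocks only.

```python
def pr_coding_order(blocks):
    """blocks: list of per-block coefficient lists.

    Returns the emission order as (block_index, coeff_index) pairs,
    interleaved path by path instead of block by block."""
    order = []
    max_len = max(len(b) for b in blocks)
    for path in range(max_len):            # one scan path per coefficient slot
        for bi, block in enumerate(blocks):
            if path < len(block):
                order.append((bi, path))   # emit one symbol of this block
    return order
```

With macroblock-by-macroblock order, truncating after a few symbols would refine only the first block; with this interleaved order, truncation degrades all blocks roughly uniformly.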
An improvement of the coder structure shown in Fig. 5 has been described in M. Winken, H. Schwarz, D. Marpe, and T. Wiegand, "Adaptive motion refinement for FGS slices," Joint Video Team, Doc. JVT-Q031, Nice, France, October 2005. In particular, as described there, a concept for fine-granular SNR scalable coding of video sequences with an adaptive refinement of motion/prediction information is added to the coding structure of Fig. 5. The approach of adaptive motion information refinement for SNR scalable video coding gives the video encoder of Fig. 5 the choice to select a, in rate-distortion (RD) sense, better tradeoff between the bit rate for coding of residual and motion data. In particular, as indicated by the dashed lines 924a and 924b in Fig. 5,
the refinement coding blocks 918a and 918b additionally decide, for each macroblock in a progressive refinement slice which corresponds to a base layer slice that supports motion-compensated prediction (so-called P- and B-slices), which of the two following possible coding modes is to be used. In particular, according to a first mode, coding block 918a,b uses the same motion information as the SNR base layer and thus transmits only a refinement of the residual data. This mode is equal to the foregoing description of the functionality of the coding structure of Fig. 5. However, in the alternative coding mode, coding block 918a,b transmits new motion information together with a new residual within the refinement slice information. Both the new motion and residual data can be predicted from the SNR subordinate layer to achieve a better RD-performance. The possible motion modes are the same as supported by the video coding standard H.264/MPEG4-AVC, which means that by subdivision of the macroblocks into smaller blocks for motion-compensated prediction up to 16 motion vectors for P-slices and up to 32 motion vectors for B-slices can be signalled.
The decision between the two coding modes with respect to the motion information performed by blocks 918a,b is made using a Lagrangian approach where a Lagrangian cost functional J = D + λ·R is minimized for a given λ. Here, D stands for the distortion between original and reconstructed (decoded) signal and R gives the bit rate needed for coding of the macroblock. If the cost for refining only the residual data is higher than the cost for one of the possible motion refinement modes, it is in rate-distortion sense obviously better to transmit a new set of motion information for this macroblock. Consequently, using adaptive motion information refinement, it is possible to achieve a higher picture quality at the same bit rate.
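The Lagrangian mode decision just described can be sketched as follows. The distortion and rate numbers below are assumed placeholders, not values from any real codec, and the function name is hypothetical; the sketch only shows how minimizing J = D + λ·R trades residual-only refinement against transmitting new motion information.

```python
def best_mode(candidates, lam):
    """candidates: dict mode_name -> (distortion, rate).

    Returns the mode minimizing the Lagrangian cost J = D + lam * R."""
    return min(candidates,
               key=lambda m: candidates[m][0] + lam * candidates[m][1])


# Assumed example values: refining only the residual costs fewer bits but
# leaves more distortion; new motion information costs extra bits.
modes = {
    "refine_residual_only": (10.0, 100.0),
    "new_motion_info":      (4.0, 140.0),
}
```

For a small λ (distortion dominates) the new motion information wins; for a large λ (rate dominates) the pure residual refinement wins, which is exactly the adaptive behaviour described above.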
The above-explained scalable extensions of the video coding standard H.264/MPEG4-AVC work well with progressive source material, i.e. videos in which the pictures may be effectively handled picture- or frame-wise, i.e. irrespective of their composition from a top and a bottom field. However, it would be desirable to have a coding structure that enables precision-scalability with a better RD-performance for interlaced source material, i.e. videos in which each frame is composed of two interleaved fields, with the fields being individually handled like frames (field-coded) or with a macroblock pair-wise decision as to whether the respective macroblock portion is divided up into two macroblocks in accordance with the membership to the top or bottom field or the membership to the top or bottom half of the macroblock pair area within the frame.
Thus, it is an object of the present application to provide a coding scheme providing precision scalability allowing for an improved coding efficiency especially in interlaced video material.
This object is achieved by a decoder according to claim 1, an encoder according to claim 13, a method according to claim 22 or 23, and a precision-scalable bit-stream according to claim 21.
The basic idea underlying the present invention is that an improved coding efficiency may be achieved by giving the encoder the opportunity to change the field/frame-wise treatment of individual picture portions between the first precision-encoded data and the second precision-encoded data, with the second precision being higher than the first precision.
In accordance with a preferred embodiment of the present invention, a concept for fine-granular SNR scalable coding of interlaced frames is achieved by making and coding the frame/field decision in a progressive refinement slice independently of the frame/field decision of the corresponding base quality slice. Compared thereto, the above-described scalable extensions of the H.264/MPEG4-AVC standard not supporting motion information refinement merely code a refinement of the transform coefficients. The motion and prediction information is copied from the corresponding base layer slice. Furthermore, the tools for supporting SNR and spatial scalability have only been designed for progressive source material. Special tools for increasing the coding efficiency for interlaced source material have not been incorporated. According to the aforementioned scalable extension including motion information refinement, the FGS coding scheme allows the adaptive refinement of motion and prediction information for improving the coding efficiency of the fine-granular SNR scalable coding especially for large bit-rate intervals. However, also the latter FGS coding scheme has only been designed for progressive source material.
The below-explained FGS coding scheme embodiment extends the above-described motion information refinement scalable extension in a way that it also supports a revision of the frame/field decision of the co-located macroblock pair in the base quality slice, thereby enabling achieving a precision-scalable data stream with an improved R/D ratio.
In the following, preferred embodiments of the present application are described with reference to the figures. In particular, it is shown in
Fig. 1 a block diagram of a video encoder according to an embodiment of the present invention;
Fig. 2 a schematic illustrating the subdivision of a picture into macroblock pairs as well as a macroblock scan of a progressive refinement slice in case of a slice of a coded frame with macroblock-adaptive frame/field decision being activated;
Fig. 3a a schematic block diagram illustrating the mode of operation of the encoder of Fig. 1 with respect to the creation of the base layer data stream;
Fig. 3b a schematic block diagram illustrating the mode of operation of the encoder of Fig. 1 with respect to the creation of the first enhancement layer;
Fig. 4 a flow chart showing the steps performed at decoder side in accordance with an embodiment of the present invention; and
Fig. 5 a conventional coder structure for scalable video coding.
The present invention is described in the following by means of an embodiment with a similar structure to the conventional coder structure of Fig. 5. However, in order to more clearly indicate the improvements in accordance with the present invention, the video encoder of Fig. 1 representing an embodiment of the present invention is firstly described as operating in accordance with the scalable extensions of the H.264/MPEG4-AVC standard having been presented in the introductory portion of this specification with respect to Fig. 5. Thereafter, the actual operation of the encoder of Fig. 1 is illustrated by emphasizing the differences to the mode of operation in accordance with the coder structure of Fig. 5. As will turn out from this discussion, the differences reside in the refinement coding means.
The video coder of Fig. 1 operating as defined in the above-mentioned Joint Drafts supports two spatial layers. To this end, the encoder of Fig. 1, which is generally indicated by 100, comprises two layer portions or layers 102a and 102b, among which layer 102b is dedicated for generating that part of the desired scalable bit-stream concerning a coarser spatial resolution, while the other layer 102a is dedicated for supplementing the bit-stream output by layer 102b with information concerning a higher resolution representation of an input video signal 104. Therefore, the video signal 104 to be encoded by encoder 100 is directly input into layer 102a, whereas encoder 100 comprises a spatial decimator 106 for spatially decimating the video signal 104 before inputting the resulting spatially decimated video signal 108 into layer 102b.
The decimation performed in spatial decimator 106 comprises, for example, decimating the number of pixels for each picture 104a of the original video signal 104 by a factor of 4 by means of discarding every second pixel in column and row directions.
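The decimation just described can be sketched as follows. Note that plain discarding of pixels is only the simplified behaviour the text names as an example; practical downsampling would normally apply a low-pass filter first. The function name and the toy picture are assumptions for illustration.

```python
def decimate(picture):
    """picture: 2D list of pixel rows.

    Keeps every second pixel in row and column direction, reducing the
    pixel count by a factor of 4."""
    return [row[::2] for row in picture[::2]]


# Assumed 8x8 test picture with pixel value r*10 + c.
pic = [[r * 10 + c for c in range(8)] for r in range(8)]
small = decimate(pic)  # 4x4 result
```

The 8×8 input yields a 4×4 output, i.e. a quarter of the original pixel count, matching the factor-of-4 decimation described for block 106.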
The low-resolution layer 102b comprises a motion-compensated prediction block 110b, a base layer coding block 112b and a refinement coding block 114b. The prediction block 110b performs a motion-compensated prediction on pictures 108a of the decimated video signal 108 in order to predict pictures 108a of the decimated video signal 108 from other reference pictures 108a of the decimated video signal 108. For example, for a specific picture 108a, the prediction block 110b generates motion information that indicates as to how this picture may be predicted from other pictures of the video signal 108, i.e. from reference pictures. In particular, to this end, the motion information may comprise pairs of motion vectors and associated reference picture indices, each pair indicating, for example, how a specific part or macroblock of the current picture is predicted from an indexed reference picture by displacing the respective reference picture by the respective motion vector. Each macroblock may be assigned one or more pairs of motion vectors and reference picture indices. Moreover, some of the macroblocks of a picture may be intra-predicted, i.e. predicted by use of the information of the current picture. In particular, the prediction block 110b may perform a hierarchical motion-compensated prediction on the decimated video signal 108.
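The per-block motion-compensated prediction described above can be sketched as follows. This is a minimal illustration with assumed function names and toy block sizes: a block of the current picture is predicted by displacing a block of an indexed reference picture by a motion vector, leaving only the residual for texture coding.

```python
def predict_block(reference, top, left, mv, size=4):
    """Copy a size x size predictor from `reference`, displaced by mv=(dy, dx),
    for the current block whose top-left corner is (top, left)."""
    dy, dx = mv
    return [[reference[top + dy + r][left + dx + c] for c in range(size)]
            for r in range(size)]


def residual(current_block, predictor):
    """Difference between the current block and its motion-compensated
    predictor; this is what remains for transform coding."""
    return [[cv - pv for cv, pv in zip(cur_row, pred_row)]
            for cur_row, pred_row in zip(current_block, predictor)]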
The prediction block 110b outputs the motion information 116b as well as the prediction residuals of the video texture information 118b representing the differences between the predictors and the actual decimated pictures 108a. In particular, the determination of the motion information and the texture information 116b and 118b in prediction block 110b is performed such that the resulting encoding of this information by means of the subsequent base layer coding 112b results in a base-representation bit-stream with, preferably, optimum rate-distortion performance.
As already described above, the base layer coding block 112b receives the first motion information 116b and the texture information 118b from block 110b and encodes the information to a base-representation bit-stream 120b. The encoding performed by block 112b comprises a transformation and a quantization of the texture information 118b. In particular, the quantization used by block 112b is relatively coarse. Thus, in order to enable quality- or precision-upscaling of the bit-stream 120b, the refinement coding block 114b supports the bit-stream 120b with additional bit-streams for various refinement layers containing information for refining the coarsely quantized transform coefficients representing the texture information in bit-stream 120b. As discussed later in more detail, refinement coding block 114b - for example, in co-operation with the prediction block 110b - is also able to decide that a specific refinement layer bit-stream 122b should be accompanied by refined motion information 116b, a functionality that has also been described in the above-mentioned scalable extension. However, this functionality is, according to the embodiment of the present invention, related to the functionality of newly coding the frame/field decision, and therefore these functionalities shall collectively be described hereinafter. The refinement of the residual texture information relative to the base representation 120b or the formerly-output lower refinement layer bit-stream 122b comprises, for example, the encoding of the current quantization error of the transform coefficients, thereby representing the texture information 118b with a finer quantization precision.
Both bit-streams 120b and 122b are multiplexed by a multiplexer 124 comprised by encoder 100 in order to insert both bit-streams into the final scalable bit-stream 126 representing the output of encoder 100.
Layer 102a substantially operates the same as layer 102b. Accordingly, layer 102a comprises a motion-compensated prediction block 110a, a base layer coding block 112a and a refinement coding block 114a. In conformity with layer 102b, the prediction block 110a receives the video signal 104 and performs a motion-compensated prediction thereon in order to obtain motion information 116a and texture information 118a. The output motion and texture information 116a and 118a are received by coding block 112a, which encodes this information to obtain the base representation bit-stream 120a. The refinement coding block 114a codes refinements of the quantization error manifesting itself on the base representation 120a by comparing a transformation coefficient of bit-stream 120a and the actual transformation coefficient resulting from the original texture information 118a and, accordingly, outputs refinement-layer bit-streams 122a for various refinement layers.
The only difference between layers 102a and 102b is that layer 102a is inter-layer predicted. That is, the prediction block 110a uses information derivable from layer 102b, such as residual texture information, motion information or a reconstructed video signal, as derived from one or more of the bit-streams 120b and 122b, in order to pre-predict the high-resolution pictures 104a of the video signal 104, thereafter performing the motion-compensated prediction on the pre-prediction residuals, as mentioned above with respect to prediction block 110b relative to the decimated video signal 108. Alternatively, the prediction block 110a uses the information derivable from layer 102b for predicting the motion-compensated residual 118a. In this case, for intra blocks, picture content 104a may be predicted by means of the reconstructed base layer picture. For inter blocks 104a, the motion vector(s) 116a output from 110a may be predicted from the corresponding reconstructed base layer motion vector. Moreover, after the motion-compensated residual 118a of layer 102a has been determined, same may be predicted from the reconstructed base layer residual for the corresponding picture, which residual is then further processed in blocks 112a, 114a.
So far, the description of the mode of operation of the encoder of Fig. 1 has concentrated on the treatment of the residual information by refinement coding means 114a,b. In particular, the residual information or texture information output by blocks 110a,b and encoded with a base layer precision in coding means 112a,b is refined in the refinement coding means 114a,b. However, refinement coding means 114a,b also enables a refinement or change of the motion information from one layer to the next as well as a change in the frame/field decision made by blocks 110a,b.
The functionality of the encoder of Fig. 1 as described up to here fits well to cases of progressive video source material or to cases where the base layer coding means 112a,b uses frame_mbs_only_flag being equal to one, which means that the picture sequence representing the video consists of coded frames only, so that a decomposition of the frames into fields is neglected. However, the SNR and spatial scalability provided by the encoder of Fig. 1 in accordance with the functionality described so far is not ideal for interlaced source material. For this reason, the encoder of Fig. 1 operating in accordance with an embodiment of the present invention not only enables refinement of the texture information but also of the motion information and, primarily, of the frame/field decision, thereby forming a kind of extension to interlaced sources.
However, before describing the different behavior of the encoder of Fig. 1, reference is made to the H.264/MPEG4-AVC standard, in which several interlaced tools have been incorporated. In the first tool, a frame can either be coded as a coded frame or as two coded fields. This is referred to as picture-adaptive frame/field coding. In other words, a frame of video may be considered to contain two interleaved fields, a top and a bottom field. The top field contains even-numbered rows 0, 2, ..., H/2-1, with H being the number of rows of the frame, whereas the bottom field contains the odd-numbered rows starting with the second line of the frame. If the two fields of a frame are captured at different time instances, the frame may be referred to as an interlaced frame; otherwise it may be referred to as a progressive frame. The coding representation in H.264/MPEG4-AVC is primarily agnostic with respect to this video characteristic, i.e. the underlying interlaced or progressive timing of the original captured pictures. Instead, its coding specifies a representation primarily based on geometric concepts rather than being based on timing. The above-mentioned concept of picture-adaptive frame/field coding is also extended to macroblock-adaptive frame/field coding. When a frame is coded as a single frame and the flag mb_adaptive_frame_field_flag, which is transmitted in the sequence parameter set, is equal to 1, the scanning of macroblocks inside a slice is modified, as depicted in Fig. 2. Fig. 2 shows an exemplary portion of a picture 200. The picture is subdivided into macroblocks 202. Moreover, with a macroblock-adaptive frame/field coding being activated,
each pair of vertically adjacent macroblocks 202 is grouped into a macroblock pair 204. As will become clearer from the following discussion, the subdivision of the picture 200 into macroblocks 202 rather serves as a provision of a quantum unity in which the encoder may decide about coding parameters that have to be adapted to the video content in the respective picture area in order to result in high coding efficiency. The macroblock pairs 204, in turn, subdivide the picture 200 spatially into a rectangular array of macroblock pairs 204. The two macroblocks 202a and 202b of one macroblock pair 204 spatially occupy either substantially the whole macroblock pair portion of the picture 200 with a vertical resolution being half the vertical resolution of picture 200, or divide the area of the macroblock pair 204 spatially into an upper half and a lower half. In any case, the macroblock containing the first, third, ... lines or occupying the upper half is called the top macroblock 202a, whereas the other is called the bottom macroblock. In other words, two such vertically adjacent macroblocks are referred to as a macroblock pair, which may also be arranged in a rectangular array as shown in Fig. 2. For each macroblock pair, a syntax element mb_field_decoding_flag is transmitted or inferred. When mb_field_decoding_flag is equal to 0, the macroblock pair is coded as a frame macroblock pair with the top macroblock representing the top half of the macroblock pair and the bottom macroblock representing the bottom half of the macroblock pair in the geometrical sense. The motion-compensated prediction and transform coding for both the top and the bottom macroblock is applied as for macroblocks of frames with mb_adaptive_frame_field_flag equal to 0, indicating that macroblock-adaptive frame/field coding is deactivated and merely frame macroblocks exist.
When mb_field_decoding_flag is equal to 1, the macroblock pair represents a field macroblock pair with the top macroblock representing the top field lines of the macroblock pair and the bottom macroblock representing the bottom field lines of the macroblock pair. Thus, in this case, the top and the bottom macroblock substantially cover the same area of the picture, namely the macroblock pair area. However, in these macroblocks, the vertical resolution is twice the horizontal resolution. In the case of the latter field macroblock pairs, the motion-compensated prediction and the transform coding is performed on a field basis. The coding of the picture content within the base and refinement layers is performed in slices, i.e. groups of macroblocks or macroblock pairs. One picture or frame may be composed of one or more slices. In Fig. 2, the macroblock pairs are assumed to belong to the same slice, and the arrows in Fig. 2 indicate the order in which the macroblocks are coded in the respective layers. As can be seen, the macroblocks are scanned pair-wise, with the top macroblock first followed by the respective bottom macroblock, whereafter the next macroblock pair is visited.
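The pair-wise scan indicated by the arrows in Fig. 2 can be sketched as follows; the function name and the address convention (macroblock rows counted two per pair row) are assumptions made for this illustration.

```python
def mb_pair_scan(pair_rows, pair_cols):
    """Yield (row, col) macroblock addresses in macroblock-pair scan order:
    top macroblock of a pair first, then its bottom macroblock, then the
    next pair to the right, row of pairs by row of pairs."""
    for pr in range(pair_rows):            # each row of macroblock pairs
        for pc in range(pair_cols):        # left to right within the slice
            yield (2 * pr, pc)             # top macroblock of the pair
            yield (2 * pr + 1, pc)         # bottom macroblock of the pair
```

For a slice one pair-row high and two pairs wide, the scan visits top and bottom of the first pair before moving to the second pair, matching the arrows of Fig. 2.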
Macroblocks of coded fields or macroblocks with mb_field_decoding_flag equal to 1 of coded frames are referred to as field macroblocks. Since each transform block of a field macroblock represents an image area with a vertical resolution that is equal to twice the horizontal resolution, it is likely that the distribution of non-zero transform coefficient levels is shifted towards horizontal low frequencies, and for a rate-distortion optimized coding, the scanning of transform coefficients inside a transform block is modified for field macroblocks relative to frame macroblocks.
The following description of the encoder of Fig. 1 focuses on the refinement of the motion information as well as the renewal of the frame/field decision performed for the respective macroblock pairs. However, before describing the refinement and renewal of this data, reference is made to Fig. 3a showing schematically the steps performed by blocks 110a,b and 112a,b to obtain the base layer bit-stream 912a,b. Again, as a starting point, Fig. 3a shows a current picture 200 to be coded, the picture 200 being subdivided into macroblocks 202, the macroblocks 202 being grouped into macroblock pairs 204, so that the macroblock pairs 204 spatially subdivide the picture 200 into a rectangular array. In encoding the picture 200, block 110a,b decides, for each macroblock pair 204, as to whether the macroblocks of this macroblock pair shall be macroblocks of coded fields or macroblocks of coded frames. In other words, block 110a,b decides for each macroblock pair as to whether same shall be coded in the field or frame mode, this decision being indicated in Fig. 3a at 206. The macroblock pair-wise performance of the decision 206 is indicated by exemplarily highlighting one of the macroblock pairs 204 by encircling same with a circle 208. The consequence of the decision 206 is indicated at 210a and b. As can be seen, in case of frame-coded macroblocks 202a and 202b constituting a macroblock pair 204, same spatially subdivide the picture area occupied by the macroblock pair 204 into an upper half and a lower half. Therefore, both macroblocks 202a and 202b comprise the picture information contained in both odd-numbered and even-numbered lines of the picture, the odd-numbered lines being indicated by white rectangles, whereas the even-numbered lines are hatched. By contrast, in case of field mode, the top macroblock 202a merely comprises the picture information within the macroblock pair area as contained in the odd-numbered lines, i.e. the top field, whereas the bottom macroblock contains the picture information within the macroblock pair area contained in the even-numbered lines. This becomes clear by comparing 210a and 210b. The picture resolution in the vertical direction is reduced by a factor of 2 in the case of field mode.
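The consequence of decision 206 on the macroblock pair area can be sketched as follows. The function name is an assumption, and lines are modelled as plain row indices (index 0 being the first, i.e. an odd-numbered, line in the counting used above): frame mode splits the pair area into upper and lower halves, while field mode interleaves it into top-field and bottom-field lines.

```python
def split_pair(pair_rows, field_mode):
    """pair_rows: the rows of one macroblock pair area (e.g. 32 luma lines).

    Returns (top_macroblock_rows, bottom_macroblock_rows).  In field mode
    the top macroblock takes the top-field (1st, 3rd, ...) lines of the
    whole pair area and the bottom macroblock the bottom-field lines; in
    frame mode the area is simply cut into an upper and a lower half."""
    if field_mode:
        return pair_rows[0::2], pair_rows[1::2]   # interleaved field lines
    half = len(pair_rows) // 2
    return pair_rows[:half], pair_rows[half:]     # upper half / lower half
```

Both modes yield two macroblocks of equal line count, but in field mode each macroblock spans the full vertical extent of the pair area at half the vertical sampling, which is exactly the factor-of-2 vertical resolution reduction noted above.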
The frame/field mode decision 206 made by block 110a,b is reflected in the base layer bit-stream 120a,b such that, at the decoder side, the decisions 206 may be extracted from the scalable bit-stream 126 and, especially, from the base layer data stream in the scalable bit-stream 126, as is indicated in Fig. 3a by arrow 212 pointing from decision 206 to a block 214 contained in the
base layer data stream 216. As a precautionary measure only, it is noted that the frame/field mode decisions do not necessarily need to be arranged or encoded into a continuous block within the base layer data stream 216. The decisions with respect to the respective macroblock pairs 204 may be distributed over the base layer data stream 216 in a parsable way. For more details, reference is made to the H.264/MPEG4-AVC standard.
However, the frame/field mode decisions 206 are not the only decisions to be made by blocks 110a,b. Rather, as indicated by 218, blocks 110a,b also determine the motion parameters for each macroblock. These motion parameters define, for example, at which spatial resolution motion vectors are determined for a respective macroblock. As is shown in Fig. 3a at 220a for example, the top macroblock 202a has been further subdivided into four partitions 222, wherein for each partition 222 a motion vector 224 is defined. Compared thereto, the bottom macroblock 202b is left as one partition, so that merely one motion vector 224 has been determined for this macroblock. Of course, the decision 218 with respect to the motion parameters is, in the rate/distortion optimization sense, not independent of the frame/field decision 206. This is indicated by 220b indicating an exemplary partitioning for the macroblocks 202a and 202b in case of field-coded macroblocks, whereas the earlier described case of 220a shall reflect the case of frame-coded macroblocks. Although the partitioning is exemplarily shown to be the same, it is clear that the partitioning may be different depending on the frame/field decision 206. A further motion parameter may define the number of reference pictures used for motion-compensatedly predicting the respective macroblock. This decision may be made on a partition basis, macroblock basis or picture basis as well as a slice basis. However, for simplifying Fig. 3a, just one motion vector is shown for each partition 222. Beside this, the motion parameters 218 of course define the motion vectors themselves, such as the direction
and length thereof. The motion vectors define the displacement of the reconstructed reference picture having to be performed before taking the picture content of the reconstructed reference picture as a prediction for the picture information contained in macroblock 202a,b. In determining 226 the residual or prediction error, the picture content taken from the reconstructed reference picture displaced as defined by the motion vectors 224 is of course different when considering field-coded macroblocks and frame-coded macroblocks. In case of frame-coded macroblocks, the picture information used out of the displaced and reconstructed reference picture represents a continuous spatial sub-area. However, in case of a field-coded macroblock, the picture information used out of the displaced and reconstructed reference picture relates to an area twice as high. The residual thus obtained for a specific partition 222 is indicated at 228a for a frame-coded macroblock and at 228b for a field-coded macroblock. The residual samples contained in this partition 228a,b are not directly coded into the base layer bit-stream. Rather, a transformation, such as a DCT or some other spectral decomposition, is performed on the residual samples in order to obtain a transformation coefficient matrix for representing the residual information contained in 228a,b. The transformation 230 may be performed on the whole partition or macroblock 202a,b. However, the transformation 230 may also be performed on sub-portions of the macroblock 202a,b or the partition 228a,b, as exemplarily indicated by dashed lines 232 in partition 228a. Accordingly, one or more transformation coefficient matrices 234 may be obtained from one macroblock or partition.
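The residual determination 226 and transformation 230 can be sketched as follows, assuming NumPy and an orthonormal DCT-II in place of the integer transform actually specified by H.264/MPEG4-AVC (all helper names are hypothetical):

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis; a stand-in for the integer transform
    # actually specified by H.264/MPEG4-AVC.
    k, i = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

def transform_residual(current, reference, mv, bs=4):
    """Determine the prediction error (cf. 226) against the
    motion-displaced reference and transform (cf. 230) it in bs x bs
    sub-blocks, yielding one coefficient matrix (cf. 234) per block."""
    dy, dx = mv
    h, w = current.shape
    residual = current.astype(float) - reference[dy:dy + h, dx:dx + w]
    d = dct_matrix(bs)
    return [d @ residual[y:y + bs, x:x + bs] @ d.T
            for y in range(0, h, bs) for x in range(0, w, bs)]

# A block copied verbatim from the displaced reference has zero
# residual, so every coefficient matrix is zero.
ref = np.arange(400, dtype=float).reshape(20, 20)
coeffs = transform_residual(ref[2:10, 3:11], ref, mv=(2, 3))
```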
The motion parameters 218 as well as the transformation coefficients in matrices 234 - the latter in relatively coarsely quantized form, as already mentioned above - are incorporated by base layer coding means 112a,b into the base layer data stream 216 or 120a,b, as shown by arrows
236 and 238, thereby obtaining motion information 240 and residual information 242 in the base layer data stream 120a,b.
A Lagrangian approach may be used for determining the frame/field mode decisions and the motion parameters 218 such that the rate/distortion ratio is optimized. Although the decisions 206 and 218 may, in the rate/distortion sense, be optimal for the quality associated with the base layer data stream, different decisions 206 and 218 may be optimal when considering a higher quality. This consideration results in the mode of operation of the encoder of Fig. 1 in accordance with an embodiment of the present application, according to which the frame/field mode decision 206 does not necessarily have to be maintained by the encoder. Rather, encoder and decoder are enabled to change the frame/field mode decision with respect to individual macroblock pairs in the refinement layers. In accordance with the embodiment of Fig. 1, a change of the frame/field mode decision is always accompanied by a renewal of the motion parameters and the residual transform coefficients, too. However, as will be described afterwards, this does not necessarily have to be the case.
Fig. 3b schematically shows the mode of operation of the refinement coding means 114a,b in accordance with an embodiment of the present invention. Fig. 3b focuses on the refinement of one exemplary macroblock pair 204, which is exemplarily composed of two frame-coded macroblocks 202a and 202b, with the top macroblock 202a being partitioned into four partitions 222, whereas the bottom macroblock 202b is composed of merely one partition. The field/frame mode decision and the motion parameters thus defined for the representative macroblock pair 204 correspond to the ones shown at 220a in Fig. 3a. As has also already been described with respect to Fig. 3a, the residual information with respect to the macroblock pair 204 is transmitted by use of transform coefficients arranged in a transform
coefficient matrix 234. The transform coefficients in the transform coefficient matrix 234 correspond to different frequencies in the horizontal direction 244 and the vertical direction 246. In Fig. 3b, the upper left transform coefficient, for example, corresponds to the DC component, this transform coefficient being indicated by 248a.
Now, considering the refinement or quality or precision enhancement for the macroblock pair 204, refinement coding means 114a,b makes 250 a decision as to whether to keep or to change the frame/field mode decision relative to the decision made by block 110a,b with respect to the base layer.
Firstly, the case of keeping the frame/field mode decision is considered. In this case, the macroblock pair 204 is still treated as frame-coded in the refinement layer. However, refinement coding means 114a,b considers whether it is, in the rate-distortion sense, better to keep the motion information, i.e. to adopt the motion information from the subordinate layer, i.e. the base layer, and just refine the residual information, or whether it is better to change the motion information and residual information compared to the base layer. This decision is indicated by 252 in Fig. 3b. If refinement coding means 114a,b decides, for a specific macroblock pair 204, to keep both the frame/field mode decision and the motion information, refinement coding means 114a,b incorporates the results of the decisions 250 and 252 into the first enhancement layer data stream 122a,b. The result of decision 250 is incorporated into data stream 122a,b in form of mode change indicators 256, as indicated by the dashed line 258. Accordingly, the result of decision 252 is incorporated into data stream 122a,b as a motion precision-enhancement on/off indicator 260, as indicated by a dashed line 262. Moreover, refinement coding means 114a,b incorporates into the data stream 122a,b residual precision enhancement information 266, this incorporation being indicated by dashed arrow 263. In the
current preferred embodiment, the residual precision enhancement information 266 incorporated at 263 shall represent residual transform coefficient levels representing a residual of the respective transform coefficient levels, as defined so far by the subordinate layer, i.e. the subordinate refinement layer or base layer, relative to the real transform coefficients at a reduced quantization step size, such as divided by two relative to the subordinate layer. However, as indicated below, a further flag/indicator within stream 122a,b may be used to indicate that, for a specific macroblock, the residual precision enhancement information 266 is to be interpreted at the decoder side as new transform coefficient levels representing the transform coefficient levels independent of the current transform coefficient levels as derivable up to the subordinate layer.
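As a rough model of such refinement levels (an idealized uniform quantizer, not the standard's exact scaling), each refinement layer can be seen as adding levels at half the subordinate layer's step size:

```python
def refine_coefficient(base_level, refinement_levels, base_step):
    """Reconstruct one transform coefficient from its base-layer level
    plus a chain of refinement-layer levels, each layer halving the
    quantization step size of its subordinate layer."""
    value, step = base_level * base_step, float(base_step)
    for level in refinement_levels:
        step /= 2.0
        value += level * step
    return value

# A coefficient quantized to level 1 at step size 8 reconstructs to 8;
# one refinement level of 1 at step size 4 moves the reconstruction
# to 12, narrowing the quantization error.
assert refine_coefficient(1, [], 8) == 8
assert refine_coefficient(1, [1], 8) == 12
```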
The refinement coding means 114a,b may decide not to keep the motion information for a specific macroblock but to refine same relative to the base layer. In this case, the refinement coding means 114a,b indicates the result of this alternative decision 252 by a respective indicator 269 in the first enhancement layer data stream 122a,b. Moreover, refinement coding means 114a,b incorporates into the data stream 122a,b motion precision enhancement information 264 as well as residual precision enhancement information 266, as is indicated by dashed arrows 268 and 270. The motion precision enhancement information 264 and/or the residual precision enhancement information 266 may either represent completely new motion information/residual information or refinement information for refining the motion information and residual information of the subordinate layer, respectively, i.e. the base layer in the case illustrated in Fig. 3b. Completely new enhancement information 264 or 266 shall indicate, as already noted above with respect to the residual data, enhancement information that completely replaces the respective enhancement information of the subordinate enhancement layer, i.e. the base layer.
Contrary thereto, if enhancement information 264 and 266 is for refining the motion/residual information of the subordinate layer, the motion/residual information of the current refinement layer, i.e. the first enhancement layer in case of Fig. 3b, is derivable merely by combining both the current enhancement information 264, 266 as well as the motion/residual information of the subordinate layer, such as by adding corresponding transform coefficient levels or motion vector component levels of the two consecutive refinement levels.
To illustrate the effect of changing the motion information in the first enhancement layer, the effect of keeping the frame/field mode decision but changing the motion information is indicated in Fig. 3b at 272. As shown there, the motion information associated with macroblock pair 204 in the first enhancement layer differs from the motion information associated with that macroblock pair 204 in the base layer in that two reference pictures are used for predicting the picture content within the macroblock pair. Accordingly, each partition 222 is associated with two motion vectors 224a and 224b. Moreover, the motion information of the first refinement layer changes the partitioning of the bottom macroblock 202b in that same is partitioned into four partitions instead of forming merely one partition, as is the case in the base layer. The motion information of the first refinement layer, i.e. the reference picture numbers, the motion vectors 224a and 224b as well as the partitionings of macroblocks 202a and 202b, may be either coded completely anew in the first enhancement layer data stream 122a,b or by taking the motion information of the base layer as a predictor. For example, if the motion vectors 224a correspond to the same reference picture, merely the offset of the motion vectors 224a relative to the motion vectors 224 of the base layer may be coded into the motion precision enhancement information 264. By assuming a temporally linear motion, the motion vectors 224 may also serve as the basis for a prediction of the new
motion vectors 224b relating to a different reference picture. Beside this, the single motion vector 224 of the single partition of the bottom macroblock 202b may serve as a predictor for the motion vectors of each partition of the bottom macroblock 202b in the first enhancement layer.
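The offset coding of motion vectors against a base layer predictor might be sketched as follows (hypothetical helper names; vectors are modelled as (dy, dx) pairs, and the single base layer vector predicts all enhancement layer partitions, as for the bottom macroblock 202b above):

```python
def code_mv_offsets(enh_mvs, base_mv):
    """Code the enhancement-layer motion vectors of a macroblock's
    partitions as offsets against a single base-layer predictor."""
    return [(dy - base_mv[0], dx - base_mv[1]) for dy, dx in enh_mvs]

def decode_mvs(offsets, base_mv):
    """Invert the offset coding at the decoder side."""
    return [(base_mv[0] + dy, base_mv[1] + dx) for dy, dx in offsets]

base_mv = (3, -1)                                # base layer: one partition
enh_mvs = [(3, -1), (4, -1), (3, 0), (2, -1)]    # enhancement: four partitions
offsets = code_mv_offsets(enh_mvs, base_mv)
# The offsets are small and therefore cheap to code.
assert offsets == [(0, 0), (1, 0), (0, 1), (-1, 0)]
assert decode_mvs(offsets, base_mv) == enh_mvs
```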
Similarly, the transform coefficient levels for the transform coefficients of the transform coefficient matrix 234 transmitted in the first enhancement layer data stream 122a,b may either represent merely residuals or offsets relative to the transform coefficient levels of the base layer, quantized with a finer quantization step size, or represent the transform coefficients of the transform coefficient matrix 234 completely anew without use of the transform coefficients of the base layer as a prediction.
Up to now, the case has been described in which the refinement coding means 114a,b decides to maintain the frame/field mode decision with respect to macroblock pair 204. However, if the result of decision 250 is to change the frame/field mode in the first enhancement layer, this is indicated by a respective mode change indicator 256, and new motion information along with new residual information is inserted in form of motion precision enhancement information 264 and residual precision enhancement information 266 into the first enhancement layer data stream 122a,b, as is indicated by dashed arrows 274 and 276. In particular, according to the example of Fig. 3b, the motion information of macroblock pair 204 is changed from the base layer to the first enhancement layer such that new motion vectors 224 for the partitions 222 of the top macroblock 202a are defined, and the bottom macroblock 202b is partitioned into four partitions 222 with one motion vector 224 for each partition 222. As is indicated at 278, the macroblocks 202a and 202b are now field-coded, with the top macroblock 202a, for example, merely including odd-numbered lines. The residual information is coded by means of transform coefficient levels of transform coefficients in respective transform coefficient matrices 234, with the levels being coded without using the transform coefficient levels of the matrices 234 of the base layer as a prediction.
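Summarizing the cases described so far, the per-macroblock-pair refinement syntax can be sketched as follows (the token names are purely illustrative stand-ins for indicators 256/260 and information 264/266, not standardized syntax):

```python
def write_mb_pair_refinement(keep_mode, keep_motion=None,
                             motion_info=None, residual_info=None):
    """Emit the refinement-layer tokens for one macroblock pair.

    keep_mode   -> result of decision 250 (frame/field decision kept?)
    keep_motion -> result of decision 252 (only relevant if mode kept)."""
    tokens = [("mode_change_indicator", 0 if keep_mode else 1)]
    if keep_mode:
        tokens.append(("motion_enhancement_indicator",
                       0 if keep_motion else 1))
        if not keep_motion:
            tokens.append(("motion_precision_enhancement", motion_info))
    else:
        # Changed frame/field mode: new motion information is always sent.
        tokens.append(("motion_precision_enhancement", motion_info))
    # Residual precision enhancement information is sent in every case.
    tokens.append(("residual_precision_enhancement", residual_info))
    return tokens

toks = write_mb_pair_refinement(keep_mode=True, keep_motion=True,
                                residual_info="levels")
assert toks[0] == ("mode_change_indicator", 0)
assert toks[1] == ("motion_enhancement_indicator", 0)
assert toks[-1] == ("residual_precision_enhancement", "levels")
```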
However, although in accordance with the present embodiment the motion and residual information is coded completely anew in the case of not keeping the frame/field mode decision, alternatively, the motion information and residual information of the base layer, defined for a different frame/field mode, may be used as a predictor. Consider, for example, the transform coefficients. The vertical resolution of the residual samples in the base layer is twice the vertical resolution of the residual samples of the first enhancement layer. Due to this, the highest-frequency component in the vertical direction 246 for which the matrix 234 of the base layer comprises transform coefficients is twice the highest-frequency component in the vertical direction 246 for which the matrix 234 of the first enhancement layer comprises transform coefficients. Thus, at least a part of the matrices 234 of the base layer may be used as a predictor for the transform coefficients of the matrices 234 of the first enhancement layer. To be more precise, the transform coefficient level of the transform coefficient 248a representing the DC component and transmitted 276 within the residual precision enhancement information 266 in the first enhancement layer data stream 122a,b may represent an offset relative to the transform coefficient level for the corresponding transform coefficient 248a transmitted in the base layer data stream 120a,b. The same applies for the higher-frequency horizontal components. Moreover, the transform coefficient levels of the first enhancement layer transmitted for the next but one higher vertical frequency component 280 may be coded as prediction errors relative to the next vertical frequency components in the base layer indicated by 282. Similarly, the motion vectors of the frame-coded macroblocks of the base layer may be used as predictors for the motion vectors of the first enhancement layer.
Of course, the above example of changing the frame-coded macroblock pair from the base layer to a field-coded macroblock pair in the first enhancement layer was just one possible example. Likewise, a field-coded macroblock pair in the base layer may be changed into a frame-coded macroblock pair in the first enhancement layer. Moreover, it is possible that no change in the frame/field mode decision with respect to a specific macroblock pair occurs in the first enhancement layer but in the second or a following enhancement layer. The quality or precision of the pictures of the video may be increased and the distortion of the pictures decreased from one layer to the next by, for example, decreasing the quantization step size for transmitting the transform coefficient levels, increasing the resolution by which the motion vectors are defined and/or using a finer partitioning and a greater number of reference pictures for the motion compensation. Moreover, apart from the indicators 256 and 260, other indicators may also be transmitted within the first enhancement layer data stream 122a,b. For example, indicators may be transmitted within the first enhancement layer data stream 122a,b in order to indicate as to whether merely the motion information or the residual information or both are replaced or refined by the first enhancement layer data stream 122a,b with respect to a specific macroblock. Similarly, further indicators may be used in order to define as to whether motion precision enhancement information or residual precision enhancement information with respect to a specific macroblock is to replace or refine the respective motion/residual information of the subordinate layer.
It may be noted that, in accordance with a preferred embodiment of the present invention, the order in which the transform coefficient levels of the first enhancement layer
are inserted in the current enhancement layer data stream 122a,b is dependent on the result of decision 250. For example, if, in accordance with a current enhancement layer, a specific macroblock is a frame-coded macroblock, a scan path 284 used for defining the order in which the transform coefficient levels of the first enhancement layer are inserted into the residual precision enhancement information 266 is different from a scan path 286 used for the transform coefficient levels of the respective field-coded macroblock in the subordinate enhancement layer. The difference in the scan paths for field- and frame-coded macroblocks reflects the existence of higher-frequency vertical components in the transform coefficient matrices 234 of frame-coded macroblocks relative to field-coded macroblocks. In particular, preferably the transform coefficients are transmitted within the residual precision enhancement information 266 with first transmitting the transform coefficient levels of the non-significant transform coefficients, i.e. those transform coefficients for which the transform coefficient level is 0 according to the subordinate layer. The transform coefficient levels of the non-significant transform coefficients are coded in a so-called significance path. The coding of the transform coefficient levels of the significant transform coefficients following thereto is called a refinement path. The significance path is performed in several cycles. In the first cycle, for example, the first non-significant transform coefficient along the scan path 284 or 286 in the first transform block (see 232 in Fig. 3a) in the first macroblock is coded.
Eventually, further transform coefficient levels of following non-significant transform coefficients in scan path direction 284 and 286 within the current transform block are coded immediately thereafter, depending on the transformation block size. Then, the next transform block in a transform block scan order within the current macroblock is visited until all transform blocks within the current macroblock have been visited. Thereafter, the next macroblock in macroblock scan order
within the current slice is visited, wherein the procedure is performed again within this macroblock, the macroblock scan order being indicated in Fig. 2 by 288. Further cycles are performed after having visited the last transform block in the last macroblock of the current slice. After having coded the transform coefficient levels of the non-significant transform coefficients, the transform coefficient levels of the significant transform coefficients are coded in the refinement path. The refinement path may, depending on the encoding scheme used for coding the syntax elements into the bit-stream 126, for example variable length coding or arithmetic coding, be performed by scanning the macroblocks within a slice merely once, or by scanning them in a fixed number of cycles, each cycle being dedicated to a specific transform coefficient position in scan order 284 or 286, with a respective transform coefficient level for a specific transform coefficient position merely being coded if the transform coefficient is significant.
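The split into significance-path and refinement-path symbols can be sketched for a single transform block as follows (a strong simplification of the cycle structure; the helper names and the flat scan-order representation are assumptions):

```python
def fgs_passes(base_levels, enh_levels, scan):
    """Split an enhancement layer's coefficient levels into the
    significance path (positions that were non-significant, i.e. level
    0, in the subordinate layer) and the refinement path (positions
    that were already significant), visiting positions in scan order."""
    significance, refinement = [], []
    for pos in scan:
        if base_levels[pos] == 0:
            significance.append((pos, enh_levels[pos]))
        else:
            refinement.append((pos, enh_levels[pos]))
    return significance, refinement

base = [3, 0, 1, 0]   # levels of the subordinate layer
enh = [1, 2, 0, 1]    # levels transmitted in the refinement layer
sig, ref = fgs_passes(base, enh, scan=[0, 1, 2, 3])
assert sig == [(1, 2), (3, 1)]   # previously non-significant positions
assert ref == [(0, 1), (2, 0)]   # refinement of significant positions
```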
In the significance path as well as the refinement path, the scan path used for determining the visiting order among the transform coefficients within the respective transform block depends on the frame/field mode of the corresponding macroblock pair according to the current refinement layer. That is, the ordering of the transform coefficients in the first enhancement layer data stream 122a,b may have an impact on the rate/distortion ratio of the resulting first enhancement layer data stream 122a,b since, if a context-adaptive coding scheme is used, an ordering of the transform coefficient levels in the first enhancement layer such that transform coefficient levels having a similar probability distribution are arranged in juxtaposed positions within the first enhancement layer data stream 122a,b may enable a better adaptation of the probability estimation used for encoding. Therefore, the decisions 250 and 252 may also depend on the influence of those decisions on the coding efficiency or the quality of the probability
estimation used for encoding the syntax elements and, in particular, the transform coefficient levels in the first enhancement layer.
The way refinement coding means 114a,b makes decisions 250 and 252 may be similar to the way by which blocks 110a,b along with base layer coding blocks 112a,b create the base layer bit-stream 120a,b. To be more precise, a Lagrangian approach may be used in order to optimize the decisions in the rate/distortion sense.
After having described the functionality of the refinement coding means 114a,b with respect to Fig. 3b, the mode of operation of the encoder of Fig. 1 is described again with respect to Fig. 1 to Fig. 3b with more specific reference to the H.264/MPEG4-AVC standard. In other words, the functionality of the encoder of Fig. 1 is described more precisely in the context of creating a scalable bit-stream 126 as a scalable extension of the H.264/MPEG4-AVC standard. In the above-described SVC working drafts of October 2005, the scalability tools were especially dedicated to frame_mbs_only_flag equal to 1. In other words, in accordance with these drafts, the macroblocks were frame macroblocks only. The concepts of supporting SNR and spatial scalability have only been designed for progressive source material. However, the encoder of Fig. 1 forms an extension to interlaced sources by considering the properties of interlaced source material. In particular, the encoder of Fig. 1 optimizes the coding of progressive refinement slices with adaptive motion refinement as described in the working draft JVT-Q031 for interlaced source material. In addition to the motion and residual refinement, a revision of the macroblock-based frame/field decision of the base quality layer can be transmitted in an FGS enhancement layer.
In particular, the encoder of Fig. 1 extends the coding of progressive refinement slices with adaptive motion
refinement for interlaced frames with macroblock-adaptive frame/field decisions in that, when macroblock-adaptive frame/field coding is enabled, then, for all macroblock pairs or a subset of the macroblock pairs of a progressive refinement slice of a coded frame, a syntax element is transmitted that signals whether the macroblock pairs are coded as a pair of field or frame macroblocks. Depending on the frame/field mode of the macroblock pair in the progressive refinement slice and the frame/field mode of the co-located macroblock pair in the subordinate SNR layer, the following applies: (1) If the current macroblock pair 204 (Fig. 3b) is coded in the same field/frame mode as the co-located macroblock pair in the subordinate SNR layer (in Fig. 3b, the base layer) (see the yes path starting from decision 250 in Fig. 3b), the field/frame decision of the subordinate SNR layer macroblock pair is used. The motion and prediction information can be refined independently of the field/frame decision, as transmitted by additional indicators or syntax elements 262, 268 and 270, wherein reference is made to PCT/EP2005/010972 for further details in this regard, the content of which is incorporated herein by reference with respect to the refinement of the motion information and residual information in case of keeping the frame/field mode decision unchanged. (2) Otherwise, if the field/frame decision in the current slice is different from the field/frame decision in the subordinate SNR layer (see the no branch from 250), for both macroblocks in the macroblock pair, a new macroblock mode (260) together with corresponding motion and prediction information (264) is transmitted in addition to the refinement (266) of the residual signal. The possible macroblock modes are the same as supported by the coding standard H.264/MPEG4-AVC, which means that, by subdivision of the macroblock into smaller blocks or partitions for motion-compensated prediction, up to 16 motion vectors for P-slices and up to 32 motion vectors for B-slices can be signalled.
One way to make this frame/field decision in a progressive refinement slice is to use a Lagrangian approach, where a Lagrangian cost functional J = D + λ·R is minimized for a given λ. Here, D stands for the distortion between the original and the reconstructed (decoded) signal and R represents the bit rate needed for coding the macroblock pair. If the cost for reversing the frame/field decision of the subordinate SNR layer is lower than the cost for keeping the frame/field decision of the subordinate SNR layer, it is in the rate-distortion sense obviously better to reverse the frame/field decision of the macroblock pair and transmit a new set of motion and/or prediction information (see the no path of decision 250). Consequently, using the adaptive frame/field refinement, it is possible to achieve a higher picture quality at the same bit rate.
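The Lagrangian test itself is straightforward to sketch (the distortion and rate values per candidate are assumed to be measured by the encoder; the numbers below are purely illustrative):

```python
def choose_field_frame(candidates, lam):
    """Pick the coding option with minimal Lagrangian cost
    J = D + lam * R.

    candidates: list of (name, distortion, rate) tuples, e.g. keeping
    versus reversing the subordinate layer's frame/field decision."""
    return min(candidates, key=lambda c: c[1] + lam * c[2])[0]

options = [
    ("keep_frame_field", 40.0, 100.0),    # keep decision, refine only
    ("reverse_frame_field", 25.0, 130.0)  # new mode + motion information
]
# A small lambda weights distortion heavily, favouring the reversal;
# a large lambda weights rate heavily, favouring the cheaper option.
assert choose_field_frame(options, lam=0.2) == "reverse_frame_field"
assert choose_field_frame(options, lam=1.0) == "keep_frame_field"
```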
An advantage of the FGS coding scheme presented here with respect to Figs. 1 and 3b is that the inverse transform at the decoder side has to be performed only once for each transform block. The scaled transform coefficients of the base quality layer and of all associated progressive refinement slices are, as far as macroblock pairs with maintained frame/field coding mode are concerned, added up, and merely the obtained transform coefficients, which represent the highest available quality, have to be transformed. This concept is, in accordance with the FGS coding scheme of Figs. 1 and 3b, also followed with respect to the adaptive motion refinement. In order not to increase the decoder complexity for the FGS coding scheme with adaptive frame/field decisions, preferably a special restriction is introduced for the case that the frame/field decision of the subordinate SNR layer is changed. When a new macroblock mode is transmitted in the FGS coding scheme with adaptive motion refinement at a certain refinement layer, a further syntax element residual_prediction_flag signals whether the residual signal of the SNR base layer (or the subordinate refinement layer) is used for reconstruction. If this flag is equal to 1, the transform
coefficients that have been transmitted in the SNR base layer are used for reconstructing the residual of the enhancement layer representation. Otherwise, if this flag is equal to 0, the residual signal of the enhancement layer representation is reconstructed by using only the transform coefficient levels 266 that are transmitted in the FGS enhancement layer 122a,b. Since the transforms that are performed for field macroblock pairs use different sets of samples than the transforms that are performed for frame macroblock pairs, it is advantageous to avoid multiple transforms by forbidding the residual prediction when a frame/field decision is changed. Thus, in a preferred embodiment of the present invention, the syntax element that specifies the above-described usage of a residual from the SNR base layer, i.e. the syntax element residual_prediction_flag, is only transmitted when the frame/field decision of the SNR base layer is not modified in the SNR enhancement layer. Otherwise, the syntax element residual_prediction_flag is inferred to be equal to 0 at the decoder side.
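The single-inverse-transform reconstruction controlled by residual_prediction_flag might be sketched as follows (NumPy arrays model the scaled coefficients; the identity function stands in for the real inverse transform, and all helper names are hypothetical):

```python
import numpy as np

def reconstruct_residual(base_coeffs, enh_coeffs,
                         residual_prediction_flag, inverse_transform):
    """Accumulate the scaled coefficients of the layers and apply the
    inverse transform only once, per the FGS scheme described above.

    With residual_prediction_flag == 0 (inferred whenever the
    frame/field decision was changed, since the layers' transforms then
    use different sample sets) only the enhancement-layer coefficients
    are used."""
    if residual_prediction_flag:
        coeffs = base_coeffs + enh_coeffs
    else:
        coeffs = enh_coeffs
    return inverse_transform(coeffs)   # exactly one inverse transform

identity = lambda c: c   # placeholder for the real inverse transform
base = np.full((4, 4), 2.0)
enh = np.ones((4, 4))
assert reconstruct_residual(base, enh, 1, identity)[0, 0] == 3.0
assert reconstruct_residual(base, enh, 0, identity)[0, 0] == 1.0
```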
According to an embodiment of the present invention, the syntax specifying the frame/field decision and the macroblock mode for the FGS enhancement layer can be expressed by the following pseudo-code. Insofar, the following code defines the steps performed by blocks 114a,b to code the syntax elements mentioned above into the refinement layer data stream 122a,b.
(10) ...
(12) if( !field_pic_flag && mb_adaptive_frame_field_flag ) {
(14)
(16)   mb_field_decoding_flag_EL   // frame/field decision in
(18)                               // enhancement layer
(20)   if( mb_field_decoding_flag_EL == mb_field_decoding_flag ) {
(22)     // frame/field decision is not modified
(24)     // top macroblock
(26)     change_top_pred_info_flag // modified
(28)                               // motion/prediction
(30)     if( change_top_pred_info_flag ) {
(32)       transmission of macroblock mode, motion and
(34)       prediction data
(36)       transmission of residual_prediction_flag
(38)     }
(40)     start transmission of transform coefficient
(42)     levels for the top macroblock
(44)     // bottom macroblock
(46)     change_bot_pred_info_flag // modified
(48)                               // motion/prediction
(50)     if( change_bot_pred_info_flag ) {
(52)       transmission of macroblock mode, motion and
(54)       prediction data
(56)       transmission of residual_prediction_flag
(58)     }
(60)     start transmission of transform coefficient
(62)     levels for the bottom macroblock
(64)   } else {
(66)     // frame/field decision is modified
(68)     // top macroblock
(70)     transmission of macroblock mode, motion and
(72)     prediction data
(74)     residual_prediction_flag is inferred to be equal
(76)     to 0
(78)     // bottom macroblock
(80)     transmission of macroblock mode, motion and
(82)     prediction data
(84)     residual_prediction_flag is inferred to be equal
(86)     to 0
(88)     // coding of transform coefficients
(90)     start transmission of transform coefficient
(92)     levels for the macroblock pair
(94)   }
(96)
(98) }
The first if-clause (line 12) checks as to whether the video source material has been coded by the base layer coding blocks 112a,b such that a macroblock-adaptive frame/field decision is activated. If this is the case, a syntax element mb_field_decoding_flag_EL is transmitted in the enhancement layer for a current macroblock pair or several macroblock pairs (line 16) in order to define its frame/field decision in that enhancement layer. The second if-clause (line 20) checks as to whether the frame/field decision has changed in the enhancement layer relative to the base layer, where the frame/field decision is coded into mb_field_decoding_flag.
The next lines (lines 22-62) define the information transmitted when the frame/field decision has not been modified. In this case, firstly, a syntax element change_top_pred_info_flag is transmitted and coded (line 26) indicating as to whether the motion/prediction information for the current top macroblock is modified relative to the subordinate layer. This syntax element therefore represents an indicator 260 as shown in Fig. 3b. If this is the case (third if-clause in line 30), a new macroblock mode, new motion vectors and reference picture numbers are transmitted (lines 32 and 34). Then, a transmission (line 36) of syntax element residual_prediction_flag is performed for signalling as to whether the transform coefficient levels for the current top macroblock to follow are transmitted as self-contained new transform coefficients or as refinement information for refining the current coarser quantized transform coefficients. Then, i.e. if the motion information is indicated to be adopted from the subordinate layer (no path of the if-clause at line 30) or the new motion information along with the residual_prediction_flag has been transmitted (lines 32-36), the transmission of the transform coefficient levels is performed (lines 40, 42), with the transform coefficient levels representing, in the case of change_top_pred_info_flag being set, new transform
coefficient level information or differentially coded or residual transform coefficient levels, depending on residual_prediction_flag transferred in line 36. In the other case, i.e. change_top_pred_info_flag not being set, the transform coefficient levels represent residual transform coefficient levels, i.e. residual_prediction_flag is inferred to indicate differential coding. This is repeated for the bottom macroblock (lines 44-60).
In other words, in accordance with the present embodiment, in case of the frame/field decision being not modified, in any case, a "refinement" of the residual information takes place. Of course, this refinement may be zero, or "refinement" may mean that the bit-stream transmitted so far is not used but that a completely new signal is transmitted that is not differentially coded. The first flag, i.e. change_top/bot_pred_info_flag, indicates as to whether the refinement of the residual is conducted in the "normal mode", i.e. the same motion parameters are used as in the subordinate layer, and the refinement of the residual is coded as a difference to the transform coefficients transmitted so far in the base layer and subordinate refinement layers, if any. In case change_top/bot_pred_info_flag is not set, new motion parameters are transmitted - in the present case without differential coding, but the latter is also possible as indicated above - and a further flag is transmitted, i.e. residual_prediction_flag, this flag indicating as to whether the residual being valid so far is used. If the latter flag is set, then the refinement is coded as a difference/residual/refinement, otherwise the residual signal is coded completely anew.
However, otherwise, if the frame/field decision has been modified relative to the base layer, a new macroblock partitioning mode, motion vectors and reference picture numbers are transmitted (lines 70, 72) for the current top macroblock without signalling syntax element residual_prediction_flag, which is, instead, at the decoder side, to be inferred to be equal to 0 (lines 74, 76). This is repeated for the bottom macroblock (lines 78-86). The transmission of the transform coefficient levels for the current macroblock pair then starts (lines 90 and 92) after having transmitted the motion information for the top and bottom macroblocks for the whole macroblock pair. Of course, the steps 10-92 are performed for further macroblock pairs as well.
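The per-macroblock-pair coding decision described above can be sketched as follows; a minimal Python rendering of the pseudo-code, where `write` collects symbolic syntax elements and the helper names are illustrative, not the standard's:

```python
def code_macroblock_pair(enh_field_flag, base_field_flag,
                         change_top, change_bot, write):
    # Frame/field decision of the pair in the enhancement layer (line 16).
    write("mb_field_decoding_flag_EL", enh_field_flag)
    if enh_field_flag == base_field_flag:
        # Decision kept (lines 22-62): per-macroblock change flags.
        for part, changed in (("top", change_top), ("bot", change_bot)):
            write("change_%s_pred_info_flag" % part, changed)
            if changed:
                write("mb_mode_motion_prediction", part)
                write("residual_prediction_flag", part)
            write("transform_coefficient_levels", part)
    else:
        # Decision modified (lines 66-92): new motion data for both
        # macroblocks; residual_prediction_flag is inferred (not written).
        for part in ("top", "bot"):
            write("mb_mode_motion_prediction", part)
        write("transform_coefficient_levels", "pair")
```

Collecting the written elements into a list shows that residual_prediction_flag never appears in the modified-decision branch, mirroring the inference rule of lines 74-86.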
With respect to the above pseudo-code embodiment, it is emphasized that the modified syntax only applies when a coded frame is transmitted, i.e. field_pic_flag is equal to 0, and macroblock-adaptive frame/field coding is enabled, i.e. mb_adaptive_frame_field_flag is equal to 1 (line 12). Further, the frame/field decision is only transmitted (lines 16, 18) when the macroblock pair is visited the first time during the coding of a progressive refinement slice. When the syntax element is different from the corresponding syntax element of the base SNR layer, a new set of macroblock modes, motion and/or prediction information is transmitted (lines 70, 72, 80, 82) for both macroblocks of the macroblock pair, and the residual_prediction_flag is inferred to be equal to 0 for both macroblocks of the macroblock pair (lines 74, 76, 84, 86). Additionally, a syntax element specifying the transform size could be transmitted. The coding proceeds with a first transform coefficient level of the top macroblock in the significance path described above (lines 90, 92). When the value of the syntax element specifying the frame/field decision is identical to its value in the base quality slice, the FGS coding follows the concept in the above-referenced PCT application or the concept of JVT-Q031. The coding proceeds with the top macroblock, and here first a syntax element, which specifies a change of the macroblock mode and associated motion and prediction data, change_top_pred_info_flag, is transmitted (line 26). If this syntax element is equal to 1, a new macroblock mode and
associated motion and prediction data as well as a flag specifying the usage of residual prediction from the base layer are transmitted (lines 32-36). The coding then proceeds with the first transform coefficient level of the top macroblock in the significance path (lines 40, 42).
In all following visits of a macroblock pair or macroblock, i.e. when mb_field_decoding_flag_EL and change_top_pred_info_flag or change_bot_pred_info_flag (when applicable) and the corresponding syntax elements specifying a modified macroblock prediction mode have already been transmitted, only further transform coefficient levels are coded in the order mentioned above. That means, the syntax element mb_field_decoding_flag_EL (and a possible modification of the macroblock prediction information for the corresponding macroblock pair) is only transmitted when a macroblock pair is visited the first time and no transform coefficient level for this macroblock pair has been transmitted in the current progressive refinement slice. Similarly, the syntax element change_top_pred_info_flag or change_bot_pred_info_flag as well as a possible modification of the macroblock prediction information is only transmitted when mb_field_decoding_flag_EL is equal to mb_field_decoding_flag of the co-located macroblock pair in the SNR base layer, and when the macroblock is visited the first time and no transform coefficient level has been transmitted for this macroblock.
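This first-visit rule amounts to simple bookkeeping per refinement slice, which may be sketched as follows (an illustrative helper, not standard syntax):

```python
class RefinementSliceState:
    """Remembers which macroblock pairs were already visited inside the
    current progressive refinement slice."""
    def __init__(self):
        self.visited = set()

    def first_visit(self, pair_id):
        # mb_field_decoding_flag_EL and the change_*_pred_info_flag syntax
        # are only transmitted before any transform coefficient level of the
        # pair has been coded, i.e. on the first visit of the pair.
        if pair_id in self.visited:
            return False
        self.visited.add(pair_id)
        return True
```

On every later visit of the same pair, `first_visit` returns False and only further transform coefficient levels are coded.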
With respect to Fig. 4, the steps to be performed by a decoder for decoding the scalable bit-stream 126 are described. The decoder starts with parsing the base layer bit-streams 120a and 120b contained in the scalable bit-stream 126 in step 800. As a result of step 800, the decoder knows the field/frame mode for each macroblock pair as well as the motion parameters for each macroblock as well as the existence of the residual information. In other words, in step 800, the decoder extracts the information
214, 240 and 242 from the base layer data stream 120a,b. In the next step, step 802, the decoder checks as to whether further refinement or quality enhancement is desired/required. If not, the decoder immediately decodes the base layer data stream 120a,b in a decoding step 804. Depending on the spatial resolution desired/required, the decoding 804 is performed by merely decoding the base layer bit-stream 120b in accordance with the H.264/MPEG4-AVC standard, or both base layer bit-streams 120a,b are decoded in accordance with that standard and then the coarsely reconstructed pictures are refined by the finely reconstructed ones.
If a further refinement is desired/required, the decoder steps to step 806 in which the frame/field mode change indication (mb_field_decoding_flag_EL) and, if no change is indicated, the motion enhancement on/off indication (change_*_pred_info_flag) is extracted from the next higher-order refinement layer bit-stream 122a,b. Upon step 806, the decoder is able to reconstruct, from the frame/field mode of the macroblock pairs in the current refinement layer and the significance of the transform coefficient levels in the base layer or subordinate layer, the significance path and the refinement path used at the encoder side for the current refinement layer. In the next step, step 808, the decoder parses the refinement layer accordingly in order to extract the motion information for all macroblocks with motion enhancement on/off indication indicating a replacement of the current motion information and for all macroblocks with changed frame/field mode decision, as well as the residual information representing differentially coded residual information or self-contained residual information depending on residual_prediction_flag being parsed from the refinement data stream in case of change_*_pred_info_flag being set, and inferred to indicate differential coding in case of change_*_pred_info_flag being not set. Next, in step 810, the decoder checks for each macroblock pair as to whether the frame/field mode has
changed relative to the subordinate layer. If yes, the decoder steps to step 812 and replaces, since the residual_prediction_flag is inferred to be equal to 0, the current encoding data, i.e. the current motion/residual data, with the motion/refinement information 264 and 266 extracted from the enhancement layer data stream of the current enhancement layer. However, for all macroblock pairs where the frame/field mode has not been modified, the decoder checks the motion enhancement on/off indicator, i.e. the syntax element change_top/bot_pred_info_flag, as to whether motion enhancement information 264 or 266 exists for the respective macroblocks of the macroblock pair. If this is the case, the decoder replaces - in an alternative embodiment refines - the current motion data for this macroblock, i.e. the motion information, and replaces or refines the residual data for this macroblock depending on the respective flag transmitted in the incoming data stream, i.e. residual_prediction_flag. To be more precise, in the case of decoding the enhancement layer data stream in accordance with the above pseudo-code, the motion information is always replaced, whereas, in case of the frame/field decision being not modified, the residual information is replaced or refined depending on some indicator, namely residual_prediction_flag in the case of the above pseudo-code enhancement layer data stream. In case of replacement, the motion information for a specific macroblock contained in the enhancement layer completely replaces the motion information of the subordinate layer. In case of refinement, the information of the subordinate layer is combined with the respective information in the enhancement layer.
Especially, the transform coefficient levels of the enhancement layer are dequantized and added to the already dequantized or scaled (and eventually summed up) transform coefficient levels of the corresponding transform coefficients of the subordinate layer.
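This accumulation step can be sketched in a few lines; a minimal sketch in which the quantization step sizes are plain scalars for illustration:

```python
def reconstruct_coefficients(base_levels, enh_levels, base_qstep, enh_qstep):
    # Each layer's transform coefficient levels are scaled (dequantized)
    # with that layer's quantization step size and accumulated, so the
    # inverse transform has to run only once on the summed coefficients.
    return [b * base_qstep + e * enh_qstep
            for b, e in zip(base_levels, enh_levels)]
```

For example, base levels [2, 0] at step size 8 refined by enhancement levels [1, -1] at step size 4 yield the summed coefficients [20, -4], which are then inverse-transformed once.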
Otherwise, i.e. if the motion enhancement on/off indicator shows that the enhancement layer has no motion enhancement information for the respective macroblock, nothing is changed with respect to the motion data for this macroblock, but the decoder refines, in step 818, the residual data by means of combining the current transform coefficients - gained from the incoming data stream so far and via de-quantization - with the refinement information of the current refinement layer for refining the residual data, i.e. the transform coefficient levels defined for a reduced quantization step size.
Thereafter, i.e. after having performed any of steps 812, 816, and 818 for all macroblocks of the current picture, the procedure returns to step 802 in order to check as to whether further refinement is desired/required. If yes, steps 806 to 818 are performed again for the next refinement layer. Otherwise, the procedure steps forward to step 804, where the current encoding data is decoded, i.e. the re-transformation, such as an inverse spectral decomposition, is performed, the picture content of the macroblocks is predicted by use of the current motion information and based on already reconstructed reference pictures, and the residual information obtained by the re-transformations is combined with the prediction thus obtained in order to yield the current picture in its reconstructed form.
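The per-macroblock-pair branching of steps 810 to 818 may be sketched as follows; the dictionary keys and action strings are illustrative, not standard syntax:

```python
def decode_refinement_layer(pairs):
    """Control flow of steps 810-818 for one refinement layer. Each entry of
    `pairs` holds the indications parsed from the refinement data stream."""
    actions = {}
    for pid, p in pairs.items():
        if p["field_mode_changed"]:                  # step 810 -> step 812
            actions[pid] = "replace motion and residual"
        elif p["change_pred_info_flag"]:             # step 814 -> step 816
            actions[pid] = ("refine residual" if p["residual_prediction_flag"]
                            else "replace residual") + ", replace motion"
        else:                                        # step 818
            actions[pid] = "refine residual only"
    return actions
```

A pair with a changed frame/field mode always falls into the full-replacement branch (residual_prediction_flag inferred 0), while an unchanged pair either keeps its motion data and only refines the residual, or, if the change flag is set, replaces the motion data and replaces or refines the residual depending on residual_prediction_flag.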
Summarizing the above embodiments, they represent an FGS coding scheme with the following properties. Firstly, the coding of refinement signals for frames with macroblock-adaptive frame/field decision, in which a pair of vertically adjacent macroblocks is either coded as a pair of frame or a pair of field macroblocks, is supported. Further, the frame/field decision for macroblock pairs of the base SNR layer is allowed to be adaptively modified in the FGS enhancement layer. It is possible that the frame/field decision for an FGS enhancement layer is signaled by a syntax element for each macroblock pair or for a subset of macroblock pairs in the FGS enhancement layer. For the macroblock pairs for which the frame/field decision is not signaled, the frame/field decision is inferred by using already transmitted syntax elements. In one embodiment, a complete set of macroblock motion and prediction information is transmitted when the frame/field decision in the enhancement layer is different from the frame/field decision of the SNR base layer. A syntax element specifying the usage of a residual prediction from the SNR base layer may be inferred to be equal to X, when the frame/field decision in the enhancement layer is different from the frame/field decision of the SNR base layer. At this, a value of X specifies that no residual prediction is applied and that the reconstructed residual signal is obtained by using only the transform coefficient levels of the current FGS enhancement layer. Alternatively, for both macroblocks of a macroblock pair, a syntax element may be transmitted when their frame/field decision in the enhancement layer is identical to the frame/field decision of the SNR base layer. This syntax element could specify whether new macroblock motion and/or prediction information is transmitted in the FGS enhancement layer or whether the motion and/or prediction information of the co-located macroblock in the SNR base layer is used. The motion compensation for field macroblocks is performed on a field basis, whereas the motion compensation for frame macroblocks is performed on a frame basis. Similarly, the inverse transform for field macroblocks may be performed on a field basis, whereas the inverse transform for frame macroblocks may be performed on a frame basis. Further, similarly, the scan order of transform coefficients inside a transform block may be dependent on whether the macroblock is a field or a frame macroblock.
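The mode-dependent scan order can be illustrated for a 4x4 transform block; the tables below follow the familiar H.264-style zig-zag (frame) and vertically biased field scans, but are shown here purely for illustration of the selection mechanism:

```python
# Scan orders for a 4x4 transform block, given as raster-order indices.
# The frame scan is the zig-zag scan; the field scan visits vertical
# neighbors first, matching the vertically subsampled field statistics.
FRAME_SCAN = [0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15]
FIELD_SCAN = [0, 4, 1, 8, 12, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15]

def scan_coefficients(block_4x4_flat, field_macroblock):
    """Order the 16 coefficients of a flat 4x4 block according to the
    macroblock's frame/field mode, as in the scan-order dependence above."""
    scan = FIELD_SCAN if field_macroblock else FRAME_SCAN
    return [block_4x4_flat[i] for i in scan]
```

Feeding the block with its own raster indices returns the scan table itself, which makes the mode dependence directly visible.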
Lastly, it is noted that the syntax element for specifying the frame/field mode of a macroblock pair may be transmitted using conditioned entropy codes, where the condition is dependent on the frame/field mode of the co-located macroblock pair in the SNR base layer. For example, the syntax element 258 could be transmitted by means of an entropy code using a probability estimation that is dependent on the field/frame mode decision 212 in the base layer.
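Such a conditioned probability estimation may be sketched with a toy frequency-count estimator, one adaptive estimate per base-layer mode (this is an illustration of the conditioning idea, not the standard's arithmetic coder):

```python
class ConditionedFlagModel:
    """Two adaptive probability estimates for the frame/field syntax element,
    selected by the frame/field mode of the co-located base-layer pair."""
    def __init__(self):
        # Laplace-smoothed counts [n_zero, n_one] per base-layer mode.
        self.counts = {False: [1, 1], True: [1, 1]}

    def p_one(self, base_field_mode):
        n0, n1 = self.counts[base_field_mode]
        return n1 / (n0 + n1)

    def update(self, base_field_mode, flag):
        self.counts[base_field_mode][flag] += 1
```

Because the two contexts adapt independently, a strong correlation between the base-layer and enhancement-layer frame/field decisions translates into skewed probabilities and hence shorter entropy codes.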
Finally, it is noted that the above embodiments were especially related to the H.264/MPEG4-AVC standard. However, the present invention is also applicable to other coding schemes.
Depending on an actual implementation, the inventive coding scheme can be implemented in hardware or in software. Therefore, the present invention also relates to a computer program, which can be stored on a computer-readable medium such as a CD, a disc or any other data carrier. The present invention is, therefore, also a computer program having a program code which, when executed on a computer, performs the inventive method described in connection with the above figures.
Furthermore, it is noted that all steps indicated in the flow diagrams could be implemented by respective means, and the implementations may comprise sub-routines running on a CPU, circuit parts of an ASIC or the like.
While the foregoing has been particularly shown and described with reference to particular embodiments thereof, it will be understood by those skilled in the art that various other changes in form and details may be made without departing from the spirit and scope thereof. It is to be understood that various changes may be made in adapting to different embodiments without departing from the broader concepts disclosed herein and comprehended by the claims that follow.
WE CLAIM :
1. Decoder for decoding an encoded precision-scalable data stream (126) encoding a predetermined picture (200), the encoded precision-scalable data stream comprising :
- first precision-encoded data (120a,b) into which the predetermined picture is encoded with a first precision with using one of a frame coding mode and a field coding mode for a predetermined portion (202a,b) of the predetermined picture;
- higher precision information (122a,b) representing second precision encoded data into which the predetermined portion (202a,b) is encoded with a second precision higher than the first precision with using the other of the frame coding mode and the field coding mode for the predetermined portion (202a,b), or representing refinement information refining the first precision-encoded data to obtain the second precision-encoded data; and
- indication information (256) indicating an existence of a change in the frame and field coding modes used for the predetermined portion, between the first precision-encoded data and the second precision-encoded data;
the decoder comprising:
checking means (810) for checking the indication information as to whether same indicates the existence or an absence of a change in the frame or field coding modes used for the predetermined portion, between the first precision-encoded data and the second precision-encoded data;
arranging means (810-816) for, if the indication information indicates the existence of the change in the frame and field coding modes, disregarding, at least partially, the first precision-encoded data with respect to the predetermined portion and arranging, instead, the second precision-encoded data as data for decoding, or, based on the higher precision information, refining the first precision-encoded data with respect to the predetermined portion to obtain the second precision-encoded data and arranging the obtained second precision-encoded data as data for decoding; and
decoding means (804) for decoding the arranged data with using the other of the frame and field coding modes for the predetermined portion of the predetermined picture to reconstruct the predetermined picture with the second precision.
2. Decoder as claimed in claim 1, further comprising parsing means (800-808) for parsing the encoded precision-scalable data stream to realize the first precision-encoded data and the higher precision information (122a,b).
3. Decoder as claimed in claim 2, wherein the parsing means (800-808) is configured to perform the parsing of the higher precision information (122a,b) depending on the indication information.
4. Decoder as claimed in any of the preceding claims, wherein the predetermined picture is part of a video picture sequence (104) and the decoding means is configured to extract motion information and respective residual information for the predetermined portion from the data for decoding, apply the motion information to reconstructed reference pictures to obtain a motion-compensated prediction for the predetermined portion, and reconstruct the predetermined portion based on the motion-compensated prediction and the residual information.
5. Decoder as claimed in claim 4, wherein the decoding means (804) is configured to perform an inverse spectral decomposition to extract the residual information.
6. Decoder as claimed in any of claims 4 and 5, wherein the decoding means (804) is configured to perform the application of the motion information and the reconstruction of the predetermined portion dependent on the indication information.
7. Decoder as claimed in any of the preceding claims, wherein the arranging means (810-816) is configured to disregard the second precision-encoded data and arrange the first precision-encoded data as data for decoding if an instruction to the decoder signals that the predetermined picture is to be reconstructed merely in the first precision.
8. Decoder as claimed in any of the preceding claims, wherein the arranging means (810-816) is configured to, if the indication information indicates the absence of the change in the frame and field coding decisions for the predetermined portion between the first precision-encoded data and the
second precision-encoded data, check (814) refinement change information (260) in the encoded precision-scalable data stream as to whether the first precision-encoded data is to be refined with respect to the predetermined portion or not, and, depending on the check result, keep the first precision-encoded data as the data to be decoded with respect to the predetermined portion or refine (816), based on the high precision information, the first precision encoded data to obtain the second precision-encoded data and arrange the obtained second precision-encoded data as the data to be decoded.
9. Decoder as claimed in any of the preceding claims, wherein the indication information is signaled by a first syntax element associated with said predetermined portion and the predetermined picture comprises another predetermined portion, wherein the higher precision information (122a,b) lacks any second syntax element associated with said other portion for signaling an absence or an existence of a change in the frame and field coding modes with respect to the other predetermined portion between the first precision-encoded data and the second precision-encoded data, wherein the checking means (810) is configured to infer a value of the second syntax element by use of already transmitted syntax elements.
10. Decoder as claimed in any of the preceding claims, further comprising second checking means (814) for, if the indication information indicates the absence of the change in the frame and field coding modes with respect to the predetermined portion between the first precision-encoded data and the second precision-encoded data, checking a subordinate information (260) comprised by the higher precision information (122a,b) as to whether the second precision-encoded data includes motion
information and/or residual information, and as to whether the second precision-encoded data is to replace the first precision-encoded data with respect to the predetermined portion or the second precision-encoded data is dedicated for refining the first precision-encoded data to obtain the second precision-encoded data, to obtain a check result, wherein the arranging means (810-816) is configured to perform the disregarding and arranging or the refining and arranging with respect to the motion or residual information dependent on the check result.
11. Decoder as claimed in any of the preceding claims, wherein the second precision-encoded data comprises transform coefficient levels of transform coefficients of a transform coefficient matrix (234) representing a motion-compensated residual of at least a portion of the predetermined portion, and wherein the parsing means is arranged to use a scan order among the transform coefficients, which is equal to one of a first and a second scan order (286) different to the first scan order (284), dependent on the indication information.
12. Decoder as claimed in claim 11, wherein the arranging means (810-816) and decoding means (804) are configured to, if the indication information (256) indicates the presence of the change in the frame and field coding modes for the predetermined portion between the first precision-encoded data and the second precision-encoded data, apply an inverse transform to the transform matrix (234) to obtain the motion-compensated residual, combine the motion-compensated residual with a portion of a reconstructed reference picture encoded using a field or frame coding mode, displaced from the predetermined portion by motion information indicated in the higher precision information (122a,b) or the first
precision-encoded data to obtain a candidate reconstructed picture that is equal to the reconstructed picture in case of the other one of the frame and field coding mode being the frame coding mode, and, if the other one of the frame and field coding mode is the field coding mode, convert the candidate reconstructed picture from a frame representation into a field representation to obtain the reconstructed picture.
13. Encoder for encoding a predetermined picture, comprising:
base encoding means (110a,b, 112a,b) for encoding the predetermined picture with a first precision with using one of a frame coding mode and a field coding mode for a predetermined portion (202a,b) of the predetermined picture to obtain first precision-encoded data (120a,b);
determination means (114a,b) for determining higher precision information (120a,b) representing second precision-encoded data into which the predetermined portion is encoded with a second precision being higher than the first precision using the other of the frame coding mode and field coding mode, or representing refinement information refining the first precision-encoded data to obtain the second precision-encoded data; and
construction means (124) for constructing an encoded precision-scalable data stream (126) encoding the predetermined picture to include the first precision-encoded data (120a,b), the higher precision information (122a,b) and indication information (256) indicating a change in the frame and field coding modes used for the predetermined portion, between the first precision-encoded data and the second precision-encoded data.
14. Encoder as claimed in claim 13, wherein the predetermined picture further comprises another predetermined portion, and the higher precision information (122a,b) also represents other second precision-encoded data into which the other predetermined portion is encoded with the second precision and using the one of the frame and field coding modes or also representing respective other refinement information refining the other first-precision encoded data into which the other predetermined portion is encoded with the first precision, wherein the determination means (114a,b) is configured to determine the second-precision encoded data such that same comprise first transform coefficient levels of transform coefficients of a first transform coefficient matrix (234) representing a motion-compensated residual of the predetermined portion, and the other second precision-encoded data so that same comprise second transform coefficient levels of transform coefficients of a second transform coefficient matrix (234) representing a motion-compensated residual of the other predetermined portion, and the construction means (114a,b) being configured to code the first transform coefficient levels into the encoded precision-scalable data stream in accordance with a first scan order (284) among the transform coefficients of the first transform coefficient matrix, and the second transform coefficient levels into the encoded precision-scalable data stream in accordance with a second scan order (286) among the transform coefficients of the second transform coefficient matrix (234) being different from the first scan order (284).
15. Encoder as claimed in claim 13 or 14, wherein the construction means (124) is configured to perform the construction such that a correct parsing of the higher precision information (122a,b) depends on the indication information.
16. Encoder as claimed in any claims 13 to 15, wherein the predetermined picture is part of a video picture sequence (104) and the base encoding means and the determination means (114a,b) are designed such that the second-precision encoded data enables obtaining motion information and/or respective residual information for the predetermined portion from the encoded precision-scalable data stream, applying the motion information to already encoded and reconstructed reference pictures to obtain a motion-compensated prediction for the predetermined portion, and reconstructing the predetermined portion based on the motion-compensated prediction and the residual information.
17. Encoder as claimed in claim 16, wherein the base encoding means (110a,b, 112a,b) and the determination means (114a,b) are configured such that an inverse spectral decomposition has to be performed to extract the residual information.
18. Encoder as claimed in claim 16 or 17, wherein the base encoding means and the determination means are configured such that the application of the motion information and the reconstruction of the predetermined portion have to be performed dependent on the indication information.
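The decoding steps recited in claims 16 to 18 — obtaining motion information and residual information, applying the motion information to an already reconstructed reference picture, and reconstructing the portion from prediction plus residual — can be sketched as follows. This assumes plain translational motion compensation on integer sample positions; the function names and the 8-bit clipping range are illustrative:

```python
def motion_compensated_block(ref, mv, x, y, bw, bh):
    """Fetch the bw x bh prediction block at position (x, y), displaced
    by the motion vector mv = (dx, dy), from an already reconstructed
    reference picture given as a 2-D list of samples."""
    dx, dy = mv
    return [[ref[y + dy + r][x + dx + c] for c in range(bw)]
            for r in range(bh)]

def reconstruct_block(prediction, residual, max_val=255):
    """Add the decoded residual to the motion-compensated prediction
    and clip to the sample range (reconstruction step of claim 16)."""
    return [[min(max(p + q, 0), max_val)
             for p, q in zip(prow, qrow)]
            for prow, qrow in zip(prediction, residual)]
```

Per claim 17, the residual itself would first be recovered by an inverse spectral decomposition (inverse transform) of the coded coefficient levels before being added here.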
19. Encoder as claimed in any of claims 13 to 18, wherein the construction means (114a,b) is configured such that the indication information (256) indicates the absence of the change in the frame and field coding modes of another predetermined portion of the predetermined picture between the first precision-encoded data and the second precision-encoded data, and the construction means (114a,b) and the determination means
(114a,b) are configured such that the encoded precision-scalable data stream comprises refinement change information (260) indicating as to whether the first-precision encoded data is to be refined with respect to the other predetermined portion or not, and the higher precision information additionally represents further refinement information refining the first precision-encoded data with respect to the other predetermined portion to obtain other second-precision encoded data encoding the other predetermined portion with the second precision.
20. Encoder as claimed in any of claims 13 to 19, wherein the construction means (114a,b) is configured such that the indication information indicates the absence of the change in the frame and field coding modes of another predetermined portion of the predetermined picture between the first precision-encoded data and the second precision-encoded data, the construction means (114a,b) and the determination means (114a,b) are configured such that the encoded precision-scalable data stream comprises subordinate information (260) in the higher precision information (122a,b), indicating as to whether the higher precision information (122a,b) includes other second precision-encoded data including motion information and/or residual information for the other predetermined portion, and indicating as to whether the other second precision-encoded data is to replace the first precision-encoded data with respect to the other predetermined portion or the other second-precision encoded data is dedicated to refining the first precision-encoded data with respect to the other predetermined portion to obtain the other second-precision encoded data.
21. Method for decoding an encoded precision-scalable data stream (126) encoding a predetermined picture (200), the encoded precision-scalable data stream comprising first precision-encoded data (120a,b) into which the predetermined picture is encoded with a first precision using one of a frame coding mode and a field coding mode for a predetermined portion (202a,b) of the predetermined picture, higher precision information (122a,b) representing second precision-encoded data into which the predetermined portion (202a,b) is encoded with a second precision higher than the first precision using the other of the frame coding mode and the field coding mode for the predetermined portion (202a,b), or representing refinement information refining the first precision-encoded data to obtain the second precision-encoded data, and indication information (256) indicating an existence of a change in the frame and field coding modes used for the predetermined portion between the first precision-encoded data and the second precision-encoded data, the method comprising the following steps, performed in hardware:
checking (810) the indication information as to whether same indicates the existence or an absence of a change in the frame or field coding modes used for the predetermined portion between the first precision-encoded data and the second precision-encoded data;
if the indication information indicates the existence of the change in the frame and field coding modes, disregarding, at least partially, the first precision-encoded data with respect to the predetermined portion and arranging, instead, the second precision-encoded data as data for decoding, or, based on the higher precision information, refining the first precision-encoded data with respect to the predetermined portion to obtain the second precision-encoded data and arranging the obtained second precision-encoded data as data for decoding; and
decoding (804) the arranged data using the other of the frame and field coding modes for the predetermined portion of the predetermined picture to reconstruct the predetermined picture with the second precision.
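The control flow of the decoding method of claim 21 — check the indication information, then either substitute the second precision-encoded data for the base data or refine the base data, and decode with the resulting coding mode — can be sketched as follows. The dictionary layout and field names are illustrative assumptions, not the claimed syntax:

```python
def decode_portion(stream, portion_id):
    """Sketch of the decoding steps: check the indication flag, then
    either replace the first precision-encoded data with the second
    precision-encoded data (mode change) or refine it (same mode),
    and return the data together with the mode to decode it with."""
    base = stream["base"][portion_id]         # first precision-encoded data
    enh = stream["enhancement"][portion_id]   # higher precision information
    mode_changed = stream["indication"][portion_id]

    if mode_changed:
        # Frame/field mode changed: disregard the base data for this
        # portion and decode the enhancement data with the other mode.
        data = enh["data"]
        mode = "field" if base["mode"] == "frame" else "frame"
    else:
        # Same mode: refine the base-layer levels with the refinement.
        data = [b + r for b, r in zip(base["data"], enh["refinement"])]
        mode = base["mode"]
    return {"data": data, "mode": mode}
```

In the mode-change branch the base data is disregarded "at least partially" in the claim's words; this sketch simplifies that to a full replacement.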
22. Method for encoding a predetermined picture, comprising the following steps, performed in hardware:
encoding a predetermined portion (202a,b) of a predetermined picture (200) with a first precision using one of a frame coding mode and a field coding mode to obtain first precision-encoded data (120a,b);
determining higher precision information representing second precision-encoded data into which the predetermined portion is encoded with a second precision higher than the first precision using the other of the frame coding mode and the field coding mode for the predetermined portion, or representing refinement information refining the first precision-encoded data to obtain the second precision-encoded data; and
constructing an encoded precision-scalable data stream (126) encoding the predetermined picture to include the first precision-encoded data (120a,b), the higher precision information (122a,b) and indication information (256) indicating an existence of a change in the frame and field coding modes used for the predetermined portion between the first precision-encoded data and the second precision-encoded data.
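The encoder-side counterpart of claim 22 can be sketched in the same illustrative style: per portion, the base layer is coded in one frame/field mode, the enhancement layer either switches the mode (carrying second precision-encoded data) or carries refinement levels, and an indication flag per portion records whether the mode changed. All field names and the stream structure are hypothetical:

```python
def construct_stream(portions):
    """Sketch of the constructing step of the encoding method: build a
    stream holding the first precision-encoded data, the higher
    precision information, and the per-portion indication information."""
    stream = {"base": [], "enhancement": [], "indication": []}
    for p in portions:
        stream["base"].append({"mode": p["base_mode"], "data": p["base_data"]})
        changed = p["enh_mode"] != p["base_mode"]   # frame/field mode change?
        stream["indication"].append(changed)
        if changed:
            # Mode change: the enhancement carries full second-precision data.
            stream["enhancement"].append({"data": p["enh_data"]})
        else:
            # Same mode: the enhancement carries only the refinement
            # (difference between second- and first-precision levels).
            stream["enhancement"].append(
                {"refinement": [e - b for e, b in
                                zip(p["enh_data"], p["base_data"])]})
    return stream
```

This is the mirror of the decoding sketch: the indication flag decides per portion whether the enhancement layer replaces or merely refines the base layer.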
ABSTRACT
TITLE "DECODER, ENCODER AND METHODS FOR ENCODING/DECODING PRECISION-SCALABLE BIT STREAM WITH ENCODED PREDETERMINED PICTURE"
According to the invention, an improved coding efficiency is achieved by giving the encoder the opportunity to change the field-/frame-wise treatment of individual picture portions between the first precision-encoded data and the second precision-encoded data, with the second precision being higher than the first precision.