"Reversible Transform For Lossy And Lossless 2 D Data Compression"
Abstract:
A 2D transform and its inverse have an implementation as a sequence of lifting steps arranged for reduced computational complexity (i.e., reducing a number of non-trivial operations). This transform pair has energy compaction properties similar to the discrete cosine transform (DCT), and is also lossless and scale-free. As compared to a separable DCT transform implemented as ID DCT transforms applied separably to rows and columns of a 2D data block, the transforms operations are re-arranged into a cascade of elementary transforms, including the 2x2 Hadamard transform", and 2x2 transforms incorporating lifting rotations. These elementary transforms have implementations as a sequence of lifting operations.
Get Free WhatsApp Updates!
Notices, Deadlines & Correspondence
ONE MICROSOFT WAY, REDMOND, WASHINGTON 98052, U.S.A
Inventors
1. SRIDHAR SRNIVASAN
ONE MICROSOFT WAY, REDMOND, WASHINGTON 98052, U.S.A
Specification
Reversible Transform For Lossy And Lossless 2-D Data
Compression
Technical Field
The invention relates generally to block transform-based digital media (e g, video and image) compression
Background
Block Transform-Based Coding
Transform coding is a compression technique used in many audio, image and video compression systems Uncompressed digital image and video is typically represented or captured as samples of picture elements or colors at locations in an image or video frame arranged in a two-dimensional (2D) grid This is referred to as a spatial-domain representation of the image or video For example, a typical format for images consists of a stream of 24-bit color picture element samples arranged as a grid Each sample is a number representing color components at a pixel location in the grid within a color space, such as RGB, or YIQ, among others Various image and video systems may use various different color, spatial and time resolutions of sampling Similarly, digital audio is typically represented as time-sampled audio signal stream For example, a typical audio format consists of a stream of 16-bit amplitude samples of an audio signal taken at regular time intervals
Uncompressed digital audio, image and video signals can consume considerable storage and transmission capacity Transform coding reduces the size of digital audio, images and video by transforming the spatial-domain representation of the signal mto a frequency-domain (or other like transform domain) representation, and then reducing resolution of certain generally less perceptible frequency components of the transform-domain representation This generally produces much less perceptible degradation of the
digital signal compared to reducing color or spatial resolution of images or video in the spatial domain, or of audio in the time domain
More specifically, a typical block transform-based codec 100 shown m Figure 1 divides the uncompressed digital image's pixels into fixed-size two dimensional blocks (X1, Xn), ea0ch block possibly overlapping with other blocks A linear transform 120-121 that does spatial-frequency analysis is applied to each block, which converts the spaced samples within the block to a set of frequency (or transform) coefficients generally representing the strength of the digital signal in corresponding frequency bands over the block interval For compression, the transform coefficients may be selectively quantized 130 (l e, reduced in resolution, such as by dropping least significant bits of the coefficient values or otherwise mapping values in a higher resolution number set to a lower resolution), and also entropy or vanable-length coded 130 into a compressed data stream At decoding, the transform coefficients will inversely transform 170-171 to nearly reconstruct the original color/spatial sampled image/video signal (reconstructed
blocks X1 Xn)
The block transform 120-121 can be defined as a mathematical operation on a vector x of size N Most often, the operation is a linear multiplication, producing the transform domain output y=Mx,M being the transform matrix When the input data is arbitrarily long, it is segmented into N sized vectors and a block transform is applied to each segment For the purpose of data compression, reversible block transforms are chosen In other words, the matrix M is invertible In multiple dimensions (e g, for image and video), block transforms are typically implemented as separable operations The matrix multiplication is applied separably along each dimension of the data (I e, both rows and columns)
For compression, the transform coefficients (components of vector y) may be selectively quantized (l e, reduced in resolution, such as by dropping least significant bits of the coefficient values or otherwise mapping values in a higher resolution number set to a lower resolution), and also entropy or vanable-length coded into a compressed data stream
At decoding in the decoder 150, the inverse of these operations (dequantization/entropy decoding 160 and inverse block transform 170-171) are applied on the decoder 150 side, as show in Fig 1 While reconstructing the data, the inverse matrix M-1 (inverse transform 170-171) is applied as a multiplier to the transform domain data When applied to the transform domain data, the inverse transform nearly reconstructs the original time-domain or spatial-domain digital media.
In many block transform-based coding applications, the transform is desirably reversible to support both lossy and lossless compression depending on the quantization factor With no quantization (generally represented as a quantization factor of 1) for example, a codec utilizing a reversible transform can exactly reproduce the input data at decoding However, the requirement of reversibility in these applications constrains the choice of transforms upon which the codec can be designed
Many image and video compression systems, such as MPEG and Windows Media, among others, utilize transforms based on the Discrete Cosme Transform (DCT) The DCT is known to have favorable energy compaction properties that result in near-optimal data compression In these compression systems, the inverse DCT (IDCT) is employed in the reconstruction loops in both the encoder and the decoder of the compression system for reconstructing individual image blocks The DCT is descnbed by N Ahmed, T Natarajan, and K R Rao, "Discrete Cosme Transform," IEEE Transactions on Computers, C-23 (January 1974), pp 90-93 An exemplary implementation of the IDCT is descnbed in "IEEE Standard Specification for the Implementations of 8x8 Inverse Discrete Cosme Transform," IEEE Std 1180-1990, December 6,1990
Conventional data transforms used to implement reversible 2D data compressor have generally suffered one or more of the following primary disadvantages,
1 Unequal norms between transform coefficients, requiring complicated entropy coding schemes,
2 Poor approximations to optimal transforms, such as the DCT, and
3 High computational complexity
Conventional Implementation of 2D Transform
A separable 2D transform is typically implemented by performing ID transforms on the rows of the data, followed by ID transform on its columns of data (or vice versa) See, A K Jain, Fundamentals of Digital Image Processing, Prentice Hall, 1989 In matrix notation, let T represent the transform matrix and X be the 2D data. The separable 2D transform with T is defined by Yin the following equation
(Equation Removed)
Indeed, the row-wise and column-wise transforms may be distinct For instance, the data matrix could be non-square (say of size 4x8), or the row-wise and column-wise transforms could be the DCT and discrete sine transform (DST) respectively In this
case, the pre and post multipliers are different (say T1 and T2) and the transform Y is given by
(Equation Removed)
For example, Figure 2 shows a 2D 4*4 DCT implemented in two stages In the first stage, the columns of the data matrix are transformed using a 4 point ID DCT In the second stage, 4 point 1D DCTs are applied along the rows With infinite arithmetic accuracy, this ordering may be switched with no change in the output
The 4 pomt ID DCT can be implemented as a sequence of multiplication and addition operations on the 4 input data values, as represented in the signal flow graph shown in Figure 3 The values c and s in this diagram are respectively cosine and sine of /z/8 The separable transform approach works well for a lossy codec Lossless codecs are more challenging to realize Even with unit quantization, the separable 2D DCT described above in conjunction with its separable inverse DCT or IDCT is not guaranteed to produce a bit exact match to the original input This is because the divisors in Figure 3 give nse to rounding errors that may not cancel out between the encoder and decoder
Lifting
In order to achieve lossless compression with a block transform-based codec, it is necessary to replace the above-descnbed 4x4 2D DCT with a lossless transform A
separable transform may be used only if each 1D transform is lossless or reversible Although multiple choices exist for reversible 1D transforms, those based on "lifting" are by far the most desirable Lifting is a process of performing a matrix-vector multiplication using successive "shears " A shear is defined as a multiplication of the operand vector with a matrix which is an identity matrix plus one non-zero off-diagonal element Sign inversion of one or more vector coefficients may occur anywhere during this process, without loss of generality
Lifting has been implemented through ladder or lattice filter structures in the past Lifting or successive shear based techniques have been used in graphics See, A Tanaka, M Kameyama, S Kazama, and O Watanabe, "A rotation method for raster image using skew transformation," Proc IEEE Con/on Computer Vision and Pattern Recognition, pages 272-277, June 1986, and A W Paeth, "A fast algorithm for general raster rotation," Proceedings of Graphics Interface '86, pages 77-81, May 1986 In fact, it can be argued that Gauss-Jordan elimination is a manifestation of lifting
One simple 2 pomt operation is the Hadamard transform, given by the transform
Matrix
(Equation Removed)
Two approaches are commonly employed for implementing a
lifting-based (reversible) ID Hadamard transform The first is to implement the normalized or scale-free Hadamard transform m lifting steps, as shown m Figure 4 The second approach is to allow the scales to differ between the two transform coefficients, as shown in Figure 5
Problems with Lifting
Lifting is not without its problems In the first Hadamard transform approach shown in Figure 4, the two transform coefficients are normalized This is desirable for realizing multi-stage transforms, such as the 4 or 8 point DCT However, this implementation suffers from two major drawbacks - first, each 2 point Hadamard transform requires three non-tnvial (I e, computationally expensive) lifting steps, and second, rounding errors in the lifting steps cause low pass energy to "leak" into the high frequency term leading to reduced compression efficiency In this first approach, using
the approximation
(Equation Removed)
results in the AC basis function [0 75 -
0 7188] While the discrepancy from the required [0 7071 0 7071] does not seem overly large, a DC signal of amplitude 64 produces an AC response of 2 units, which leaks into the expensive-to-encode high frequency band
The second approach (Figure 5) uses trivial lifting steps However, the low pass
term is scaled up by a factor of v2 , whereas the high pass term is scaled down by l/√2
(or vice versa) The resolution of the two coefficients differs by one bit In two dimensions, the high-high term is less in resolution by 2 bits compared to the low-low term Cascaded transform stages only increase this discrepancy Entropy coding is more difficult to implement due to the differing ranges of the coefficients
In summary, the problems with lifting based lossless transforms are
1 Possible unequal scaling between transform coefficients, making for more complex entropy coding mechanisms
2 Poor approximations to desired transform basis functions that may cause undesirable effects such as the leakage of DC into AC bands
3 Potentially high computational complexity, especially so if the lifting based implementation is designed to well approximate the desired transform
Summary
A digital media encoder/decoder system is based on a separable 2D block transform which has various implementations described herein that address the above-indicated problems and drawbacks of prior art transforms In particular, a described implementation of the pair of a separable 2D transform and its inverse has a sequence of lifting steps arranged for reduced computational complexity (1 e, reducing a number of non-tnvial operations) This transform pair has energy compaction properties similar to the DCT, and is also lossless and scale-free The term "lossless" means that the original integer input to the transform can be recovered without error by the inverse transform, from its integer transform coefficients, assuming no quantization "Scale-free" refers to
the transform pair's basis functions being equally scaled, which also means that the resulting transform matrix is orthogonal
One described implementation of this transform pair is as a 4x4 transform, but can be extended to other sizes as well (e g, 8x8, etc) Furthermore, cascades of the transform pair may be used to realize hierarchical pyramids and larger transforms For instance, one described implementation uses a two level cascade of the transform In the second transform stage, the transform is applied to the 16 DC coefficients generated within a macroblock Smce the transform is similar to the DCT, it can be used to realize a lossless-to-lossy digital media codec (1 e, a codec whose quantization parameter can be varied from a lossless setting to lossy settings) with superior rate-distortion performance and compression efficiency
Additional features and advantages of the invention will be made apparent from the following detailed description of embodiments that proceeds with reference to the accompanying drawings
Brief Description Of The Drawings
Figure 1 is a block diagram of a conventional block transform-based codec in the prior art
Figure 2 is a block diagram of a 2D 4x4 DCT implemented m two stages, also in the pnor art
Figure 3 is a signal flow graph of a ID 4x4 DCT, also in the pnor art
Figure 4 is a signal flow graph of a normalized 2-point Hadamard transform using lifting, also in the pnor art
Figure 5 is a signal flow graph of a trivial 2-point Hadamard transform, also in the pnor art
Figure 6 is a flow diagram of an encoder based on an unproved reversible 2D transform
Figure 7 is a flow diagram of an decoder based on the unproved reversible 2D transform
Figure 8 is a signal flow graph of a normalized lifting-based implementation of the reversible 2x2 Hadamard transform
Figure 9 is a program listing in the C programming language for realizing the normalized reversible 2x2 Hadamard transform of Figure 8
Figure 10 is a signal flow graph of an inverse of the normalized reversible 2x2 Hadamard transform of Figure 8
Figure 11 is a signal flow graph of a normalized lifting-based implementation of the Jadd transform
Figure 12 is a program listing in the C programming language for realizing the normalized Jadd transform of Figure 11
Figure 13 is a signal flow graph of a normalized lifhng-based version of the inverse of the Jadd transform of Figure 11
Figure 14 is a signal flow graph of a normalized lifhng-based implementation of the Jadd transform
Figure 15 is a program listing in the C programming language for realizing the normalized Jadd ads transform of Figure 14
Figure 16 is a signal flow graph of a normalized lifting-based version of the inverse of the Jadd transform of Figure 14
Figure 17 is a diagram illustrating the ordering of 2x2 data in the depictions herein of transforms and inverse transform operations
Figure 18 is a signal flow graph illustrating a 2D DCT implemented separably as a ID vertical DCT and a ID horizontal DCT applied to columns and rows, respectively, of a 4x4 data input
Figure 19 is a signal flow graph illustrating the reversible, scale-free 2D transform implemented by interleaving horizontal and vertical transform operations in two stages
Figure 20 is a diagram illustrating the points of a 4x4 data block to which the 2x2 Hadamard transform of Figure 8 is applied in a first stage of an implementation of the improved, reversible 2D transform in the encoder of Figure 6
Figure 21 is a diagram illustrating the points of the 4x4 data block to which the 2x2 Hadamard transform of Figure 8, the Jadd transform of Figure 11, and the
Jadd transform of Figure 14 are applied in a second stage of the implementation of the
improved, reversible 2D transform in the encoder of Figure 6
Figure 22 is a diagram illustrating the points of the 4x4 transform coefficients block to which the 2x2 Hadamard transform of Figure 8, the Jadd transform of Figure 11,
and the Jadd transform of Figure 14 are applied in a first stage of the implementation of
the inverse 2D transform in the decoder of Figure 7
Figure 23 is a diagram illustrating the ordering of transform coefficients for the forward and inverse 2D transform in the encoder of Figure 6 and decoder of Figure 7
Figure 24 is a block diagram of a suitable computing environment for implementing the block transform-based codec with improved spatial-domain lapped transform of Figures 6 and 7
Figure 25 is a signal flow graph of a structure for the normalized hfung-based implementation of the reversible 2x2 transforms shown in Figures 11 and 14
Detailed Description
The following description relates to a digital media compression system or codec, which utilizes an unproved, reversible scale-free 2D transform For purposes of illustration, an embodiment of a compression system incorporating the improved transform is an image or video compression system Alternatively, the improved transform also can be incorporated mto compression systems or codecs for other 2D data The transform does not require that the digital media compression system encodes the compressed digital media data in a particular codmg format
1. Encoder/Decoder
Figures 6 and 7 are a generalized diagram of the processes employed in a representative 2-dimensional (2D) data encoder 600 and decoder 700 based on the improved, reversible scale-free 2D transform 650 detailed below The diagrams present a generalized or simplified illustration of the use and application of this transform m a
compression system incorporating the 2D data encoder and decoder In alternative encoders based on this transform, additional or fewer processes than those illustrated in this representative encoder and decoder can be used for the 2D data compression For example, some encoders/decoders may also include color conversion, color formats, scalable coding, lossless coding, macroblock modes, etc The unproved 2D transform permits the compression system (encoder and decoder) to provide lossless and/or lossy compression of the 2D data, depending on the quantization which may be based on a quantization parameter varying from lossless to lossy
The 2D data encoder 600 produces a compressed bitstream 620 that is a more compact representation (for typical input) of 2D data 610 presented as input to the encoder For example, the 2D data input can be an image, a frame of a video sequence, or other data having two dimensions The 2D data encoder tiles 630 the input data into macroblocks, which are 16x 16 pixels in size in this representative encoder The 2D data encoder further tiles each macroblock into 4x4 blocks 632 A "forward overlap" operator 640 is applied to each edge between blocks, after which each 4*4 block is transformed using the reversible scale-free transform 650 Subsequently, the DC coefficient 660 of each 4x4 transform block is subject to a similar processing chain (tiling, forward overlap, followed by 4x4 block transform) The resulting DC transform coefficients and the AC transform coefficients are quantized 670, entropy coded 680 and packetized 690
The decoder performs the reverse process On the decoder side, the transform coefficient bits are extracted 710 from their respective packets, from which the coefficients are themselves decoded 720 and dequantized 730 The DC coefficients 740 are regenerated by applying an inverse transform, and the plane of DC coefficients is "inverse overlapped" using a suitable smoothing operator applied across the DC block edges Subsequently, the entire data is regenerated by applying the 4*4 inverse transform 750 to the DC coefficients, and the AC coefficients 742 decoded from the bitstream Finally, the block edges in the resulting image planes are inverse overlap filtered 760 This produces a reconstructed 2D data output
2. Implementation of the Improved, Reversible Scale-Free Transform
As described by, e g, A K Jain, Fundamentals of Digital Image Processing,
Prentice Hall, 1989, a separable 2D transform can be implemented as a ID transform
operating on the data ordered m 1D, producing a likewise ordered vector result The
equivalent transform matrix is generated by the Kronecker product of the pre- and post-
multipliers used in the separable case If a. and y. denote the data and transform vectors
reordered from their 2D representation in (2), their relationship is given by the following
(Equation Removed)
where J=Kron(T1,T2)
Although the separable implementation of a 2D transform shown in equation (2) is computationally more efficient (in an asymptotic sense) than in equation (3), there are certain cases where the latter representation leads to desirable properties For instance, an implementation based on equation (3) has lower latency than in equation (2), due to a single stage matrix multiply (which is an operation supported natively on several digital signal processors (DSPs)) For the unproved, reversible scale-free transform described herein, the ID representation of 2*2 steps leads to a scale-free reversible structure
Moreover, a separable 2D transform can be implemented as a cascade of simpler ID transforms Assume that the transform matrices T1 and T2 can be decomposed as follows
(Equation Removed)
Associativity of the matrix multiply operation can be used to reorder the 2D
transform (2) as follows
(Equation Removed)
leading to the cascaded ID implementation
(Equation Removed)
Transforms such as the DCT can be formulated as a cascade of elementary 2 point rotation operations The 2D DCT can be formulated using the structure of (6) to possess certain desirable properties, which will be descnbed in detail further ahead
A. 2D Hadamard Transform
The 2D Hadamard transform, implemented as a ID operation is generated by the Kronecker product,
(Equation Removed)
Interestingly, it is possible to realize a scale-free reversible transform corresponding to equation (7), using only trivial lifting steps An implementation of this form is shown as the signal flow graph 800 in Figure 8 Corresponding C++ code eliminating some redundant operations is shown in Figure 9 In this code listing 900, "swap(x,y)" is a function that exchanges the values of its arguments
From the foregomg, it can be seen that the normalized reversible 2D Hadamard transform can be formulated usmg only trivial lifting steps, although this is not possible for the arguably "simpler" ID Hadamard case' Although the transform matrix itself is involutory (l e, 5H is its own inverse), a lossless reconstruction requires that the lifting steps be carefully reversed so as to precisely reproduce any rounding effects The inverse 1000 of the structure 800 in Figure 8 is presented m Figure 10 - the structure 1000 is identical to the forward transform in this case Note that the transform coefficients B and C are permuted in the signal flow graphs
The reversible, scale-free 2D transform 650 in the encoder 600 of Figure 6 uses
an approximation to the 4*4 DCT The following description demonstrates the entire
transform process of the transform 650 can be realized as the cascade of three elementary
2*2 transform operations, which are the 2x2 Hadamard transform, and me following
(Equation Removed)
where the two point rotation matrix TR is given by
(Equation Removed)
ID implementations of equation (8) are obtained by computing the Kronecker product of the pre and post transform matrices (approximated to four decimal places)
(Equation Removed)
The carat indicates the desired transform matrix The approximations resulting from actual implementations do not carry the carat For the 2*2 Hadamard transform, the desired transform matrix and its approximation are identical Therefore, JH is used to denote the 2*2 Hadamard transform implemented in ID, without any ambiguity Next, we look at the lifting implementation of Jadd and Jadd odd
B. Implementation of Jadd
A scale-free, lifting based implementation of the Jadd transform 1100 is shown as
a signal-flow graph m Figure 11, and in a C++ code program listing 1200 in Figure 12 It can be seen that the first and last lifting stages are identical to the Hadamard transform case Besides trivial shears, two non-tnvial lifting rotations are applied in the intermediate stages Each non-tnvial rotation is implemented m three steps, with a multiply by 3 and a bit shift by 3 or 4 bits Therefore, Jadd can be realized in a reversible, scale-free manner by using 6 non-tnvial lifting steps
The resulting ID transform matrix Jadd is shown below (12), which is in close
conformance with the original formulation of Jadd in (10) It can be seen that the second
and fourth rows of the resulting transform matrix sum to zero, which means that there is no leakage of DC into the AC bands This desirable property is achieved although the required 2D rotations are merely approximated in the structure
(Equation Removed)
Although the transform matrix Jadd is involutory (1 e , is its own inverse), rounding errors do not cancel out in two successive applications of the signal flow graph or code The lossless inverse of Jadd is derived by reversing the lifting steps, either in the signal flow graph or in the C++ code, to replicate forward transform side rounding errors The signal flow graph of the inverse 1300 of J^ is shown m Figure 13 - code can be derived likewise
C. Implementation of Jadd odd
The Jadd odd transform 1400 is composed of two rotations, neither of which is a Hadamard transform Interestingly, Jadd can be realized with fewer non-tnvial lifting steps than Jadd This is due to the symmetry properties of the Kronecker product of TR with itself The signal flow graph of the Jadd odd transform 1400 and program listing 1500 of its C++ code realization are shown in Figures 14 and 15, respectively
It can be seen that only one non-tnvial rotation, implemented by means of three non-tnvial lifting steps is required to realize Jadd odd This rotation corresponds to the scale-free ID 2-point Hadamard transform
(Equation Removed)
As with the other transforms considered here, Jadd odd as represented in equation (13) is involutory, though not a bit exact inverse of itself The lossless inverse 1600 of
Jadd odd is obtained by reversing the signal flow graph used for the forward transform, as shown in Figure 16
D. Notation For and Derivation of Above 2x2 Transform Implementations
In the depictions herein of the reversible, scale-free 2D transform using these three reversible scale-free transforms, the following points apply First, the ordering 1700 of the 2x2 data resulting in the above signal flow graphs and C++ code is as shown m Figure 17 The spatial domain points are shown on the left, and corresponding frequency domain points on the right Color coding using four gray levels to indicate the four data points is introduced here, to facilitate the reversible, scale-free 2D transform description that follows
Often, 2 point transforms or rotations are defined as the following operation
(Equation Removed)
instead of the involutory form
(Equation Removed)
These two forms are essentially identical, since they differ only m sign of the second transform coefficient The latter representation (IS) is used here, although the entire derivation in this document is equally applicable to the former form (14)
The structure of the basic 2x2 transforms, JH, ,Jadd and Jadd odd, defined above are constructed by noting each two-point transform is a rotation Further, the Kronecker product of two 2-point rotations is given as follows
(Equation Removed)
We then define an operator H as follows
(Equation Removed)
H represents a non-normalized double butterfly operation and can be efficiently implemented using lifting
The following factorization holds
(Equation Removed)
Based on this, a Kronecker product of the type 7" can be implemented as a cascade of 3 stages
A A double butterfly operation defined by H using lifting steps
B 2 point rotations between the first pair of components, and between the
second pair of components, and C The reverse of the double butterfly performed in step a. For the special case SH, an even simpler decomposition exists, which is shown as the signal flow graph 800 in Figure 8 and described above For the other cases (e g, Jadd and Jadd, the resulting structure can be generalized as the flow graph 2500 shown in
Figure 25
Looking at the three signal flow graphs of the transforms described above (and also their inverses), observe a fundamental similarity in their structure The first stage of the transform is a lifting operation between the a-d and b-c coefficients Likewise, the last stage is the inverse lifting process (discounting sign and coefficient exchanges) Correspondingly, the first stage of the inverse transform is a lifting operation between A and D, and also between B and C, with the reverse operation in the last stage The lifting steps between the diagonal elements is a distinguishing feature of the combined 2D 2*2 transform presented here
The next section discusses the construction of a lossless, scale-free transform, which approximates the 4x4 DCT/IDCT Although an exemplary embodiment of the transform is presented in this detailed technical discussion, the same procedure, with the additional definition of other 2x2 elementary reversible lifting based transforms, can be used to generate higher dimensional reversible transform embodiments with desirable properties
E. Lossless, Scale-Free Transform
The 4-point DCT can be reduced to a sequence of four butterfly operations as shown in the signal flow graph of Figure 3 The first stage consists of two butterflies performing 2-point Hadamard operations on the input data (l e, one 2-point Hadamard of mput data indices 0 and 3, and a second of input indices 1 and 2) The second stage comprises a 2 point Hadamard operation on the low pass results of the first stage to generate the even frequency components (indices 0 and 2), and a 2 pomt rotation by
n/. to generate the odd frequency components (indices 1 and 3)
In two dimensions, the DCT can be implemented separably a vertical ID 4-point DCT of each column of the 4x4 mput data, followed by a horizontal ID 4-point DCT of the rows (or vice versa) This is depicted as a separable DCT implementation 1800 in Figure 18 Alternatively, the two 1D DCT stages described above can be interleaved between the horizontal and vertical, using the theory of equation (S), as shown as an interleaved DCT implementation 1900 m Figure 19
Moreover, when the above approach is followed, the corresponding horizontal and vertical stages can be further combined For instance, the first stage is a 2 pomt Hadamard transform on the "inner" and "outer" input elements The horizontal and vertical stages may be merged into 4 applications of the 2x2 2D Hadamard transform on the 16 input data elements, each transform being applied to a symmetric set of mput points Likewise, the second stage horizontal and vertical steps can be coalesced mto a 2x2 Hadamard transform and three 2x2 transforms, two of which are transposes Observe mat the latter three 2x2 transforms are indeed 2D remappings of 50ii and 3^ > k, where x is the operand, r and k are constants determining rounding and bit shift k is either 2,3 or 4 Likewise, there are 17 single-place nght shifts (i e x » 1) per block Additions, subtractions and negations are not counted in this overview
In comparison, consider the separable implementation 1800 of the 2D DCT illustrated in Figure 18 Assume that each 4 point DCT is implemented using three 2
point normalized Hadamard operations as shown in Figure 3, and the rotation by nZ is
implemented using three nontnvial lifting steps The total number of nontnvial lifting operations per 4x4 block for either the forward or the inverse transform is 2x4x3 = 24 The total number of single-place right shifts is also 24 These numbers are about 50% higher than the improved forward transform 650 and inverse transform 750 implementations, not counting the fact that the resulting transform produces basis functions with norms in the range lA through 2 (or through 4 if irrational range basis functions are avoided) In contrast, all basis functions of the improved transform 650 are unit norm
F. Improved Transform for 4.2:0 Colorspace
In one example implementation of the encoder 600 (Figure 6) and decoder 700 (Figure 7), the YUV 4 2 0 colorspace is used to represent the color of pixels m an image (or video frame) In this example codec, a macroblock in the YUV 4 2 0 colorspace is defined as a 16* 16 tile of pixels in the luminance (Y) channel, and 8*8 tiles in the chrominance (U and V) channels These are further divided into 4*4 blocks that are transform coded using the above-descnbed transform 650 The 4*4 transform 650 is applied to the DC coefficients of the luminance channels However, only 2*2 samples of chrominance are available within a macroblock The example codec then applies 5H, which as described earlier is a reversible scale-free 2x2 Hadamard transform, to the DC chrominance values within each macroblock Thus, the macroblock structure of the example codec format is preserved, and no additional transforms need to be introduced in the codec for handling the 4 2 0 format
G. Minimizing Rounding Errors
Rounding errors are introduced in the lifting steps of the SH, J^ and 3oi&^M transforms that involve a right bit-shift These rounding errors have a known bias, and can build up over the course of the transform For instance, it is known diat a step of the
form x+ = (j>>l) leads to a bias of -½ in the value of x, compared to its mathematical
equivalent x = x+y/2. This is because (y >>1) is a division by 2 rounded down, which is
exact if y is even, and off by lA ify is odd Probabilistically therefore, it is biased by -½ Rounding errors are unavoidable in integer-to-integer transforms involving lifting, but it is desirable to minimize the bias in the overall system
The formulations of SH, Jadd and Jadd odd shown earlier as C++ code snippets add varying factors to operands being divided or right bit-shifted These factors are chosen so as to minimize the bias In particular, the bias in the four transform coefficients after the first stage operation of 3H (to an unbiased input) using the C++ code listing 900 in Figure 9 can be shown to be [V* -V* -V* -'/«] The second stage application of 3H in the improved 2D transform 650 (Figure 6) operates on the DC values of the first stage, l e on coefficients which are already biased to lA The result of the second stage operation produces a bias of [V* -XA -V* ~V*\ Since the first coefficient is the DC of DC, it is expected to be large and the relatively high bias of3/* does not affect codmg performance
The nontnvial lifting steps in J^ and S^^ provide freedom to choose rounding factors for minimizing transform bias The C++ code listing 1500 (Figure 15) for Jo*,.^ shows that sometimes off-center rounding rules (such as o+ = (3 6+5)» 3) lead to smaller overall bias, especially when the mput data is itself biased For the improved 2D transform steps J^ and 5