2200 MISSION COLLEGE BOULEVARD, SANTA CLARA, CALIFORNIA 95052, U.S.A.
Inventors
1. DORON ORENTSTIEN AND OTHERS
ALL ARE U.S.A.
2200 MISSION COLLEGE BOULEVARD, SANTA CLARA, CALIFORNIA 95052, U.S.A.
Specification
FORM 2
THE PATENTS ACT, 1970
[39 OF 1970]
COMPLETE SPECIFICATION [See Section 10; Rule 13]
"METHOD FOR MODIFYING ROOT GROWTH"
INTEL CORPORATION, a corporation incorporated in the State of Delaware, of 2200 Mission College Boulevard, Santa Clara, California 95052, United States of America,
The following specification particularly describes the nature of the invention and the manner in which it is to be performed:-
The present invention relates to a method for processing data.
Technical Field. The present invention relates to graphics systems and, in particular, to mechanisms for processing depth or z-data for 3-dimensional (3D) graphics in a manner that is transparent to the user.
Background Art. Available computer systems typically include dedicated graphics resources to support the graphics-intensive applications that are prevalent today. Graphics applications, particularly those providing 3D effects, require rapid access to large amounts of graphics data.
A standard method for generating a 3D image begins with sets of primitives that represent the surfaces of each object in the image. Primitives are typically polygons such as triangles or rectangles that may be tiled to form a surface. An object that has a moderately complex shape may require thousands of primitives to represent its surface, and an image that includes multiple objects may require tens or even hundreds of thousands of primitives. Depth, color, texture, illumination and orientation data for each of these primitives must be processed and converted to pixel level data to generate a 3D image on a display device.
Image processing is often implemented through a 3D pipeline that includes a geometry or set-up phase and a rendering or scan-conversion phase. In the geometry phase, the orientation of each primitive and the location of any light sources that illuminate the primitive are determined with respect to a reference coordinate system and specified by vectors associated with the primitive's vertices. This vertex data is then transformed to a viewing or camera coordinate system and rotated to a desired orientation.
In the scan conversion phase, the graphics primitives for each object in an image are converted into a single set of pixel values that provide a 2D representation of the 3D image. The pixels that make up the 2D image are typically stored in the entries of a frame buffer from which the display is generated. A well-known mechanism for populating the frame buffer generates color values for each location of a primitive by interpolating the transformed vertex data for the primitive. Since primitive locations are specified in 3D space, multiple primitive locations may map to the same frame buffer entry (pixel) of the 2D display surface. The generated color value for a primitive location is stored in the frame buffer entry to which it maps or discarded, according to whether or not it is visible in the final image. During this phase, texture data may also be determined for the primitives.
One technique for determining which locations of each primitive are visible in the final image employs a z-buffer. The z-buffer includes an entry for each pixel in the frame buffer. Each z-buffer entry is initialized to zero or other reference value. Often, the reference value represents a back clipping plane of the image. During scan conversion, a z-value is determined for each location within the primitive and compared with the entry in the z-buffer to which the primitive location maps. If the value in the z-buffer is closer to the viewer than the z-value determined for the corresponding primitive location, the primitive location is not visible in the final image, and its color value is discarded. If the value in the z-buffer is further from the viewer than the z-value determined for the corresponding primitive location, the color value for the location is stored in the appropriate entry of the frame buffer. If the color value is not replaced before scan conversion completes, it is displayed in the final image.
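The z-buffer comparison described above may be sketched as follows. This is a minimal illustration and not part of the disclosed apparatus: the function name `z_test_and_update`, the list-of-lists buffers, and the convention that a larger z-value is closer to the viewer (so that an entry initialized to zero represents the back clipping plane) are all assumptions made for the example. How a tie between equal depths is resolved is also an assumption, since the description does not specify it.

```python
def z_test_and_update(frame, zbuf, x, y, z, color):
    """Store `color` at pixel (x, y) only if depth `z` is closer than the stored depth.

    Assumption: larger z means closer to the viewer, so a z-buffer
    initialized to 0 represents the back clipping plane.
    Assumption: a location at exactly the stored depth is discarded.
    """
    if z > zbuf[y][x]:
        zbuf[y][x] = z        # remember the new nearest depth
        frame[y][x] = color   # the location is visible so far
        return True
    return False              # hidden by something nearer: discard


WIDTH, HEIGHT = 4, 2
zbuf = [[0] * WIDTH for _ in range(HEIGHT)]       # initialized to the reference value
frame = [[None] * WIDTH for _ in range(HEIGHT)]

# Three primitive locations map to the same frame buffer entry (1, 0).
z_test_and_update(frame, zbuf, 1, 0, 5, "red")    # visible, stored
z_test_and_update(frame, zbuf, 1, 0, 3, "blue")   # behind red, discarded
z_test_and_update(frame, zbuf, 1, 0, 9, "green")  # in front of red, replaces it
```

Every call reads one z-value and may write one back, which is the per-pixel memory traffic that the remainder of the specification seeks to reduce.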
Significant amounts of texture, color and z-data are transferred between memory and the graphics resources during the rendering stage. Since there may be tens to hundreds of pixels per primitive, these data transfers can place significant burdens on the bandwidth of the memory channel. The consequent reduction in memory bandwidth can reduce the performance of the graphics system. This is particularly true if the graphics system is implemented in a computer system that employs a unified memory architecture (UMA). For UMA-based computer systems, the central processor unit(s) (CPU) and graphics engine have equal access to main memory. Memory demands by the graphics engine can reduce CPU performance. In addition, memory demands by one unit of the graphics engine can reduce the performance of other units. For example, any bandwidth used to transfer z-data for z-testing is unavailable to the unit that determines pixel textures, and the loss in bandwidth can reduce its performance.
The present invention addresses these and other issues associated with memory bandwidth in graphics systems.
According to the present invention there is provided a method for processing data comprising:
storing blocks of data in entries of a first memory location, each of the blocks of data being stored in one of a cleared, compressed, or uncompressed data states;
monitoring operations to the first memory location for a selected operation; and if the selected operation is detected, implementing a modified version of the selected operation that masks differences between the data states of the data blocks, wherein implementing the modified version of the selected operation includes saving a current processor state; reading one or more data blocks according to their data states in response to the selected operation; and writing the read data back into the entries of the first memory location in an uncompressed state before performing the selected operation on the data in the uncompressed state.
Brief Description of the Drawings
The present invention may be understood with reference to the following drawings, in which like elements are indicated by like numbers. These drawings are provided to illustrate selected embodiments of the present invention and are not intended to limit the scope of the invention.
Fig. 1 is a diagram representing one mapping between the locations of a primitive and blocks of pixel-level data.
Fig. 2 is a schematic representation of a graphics pipeline suitable for scan converting primitive data into pixel data.
Fig. 3 is a block diagram of one embodiment of a computer system that implements a z-compression mechanism in accordance with the present invention.
Fig. 4 is a block diagram of one embodiment of a z-compression system in which blocks of z-data and their associated status data are distributed between a local cache and a main memory.
Fig. 5A is a block diagram of one embodiment of a local cache system to store both z-data values and associated status values.
Fig. 5B is a block diagram representing a mechanism for updating the local cache system of Fig. 5A on a TLB miss.
Fig. 6A is a schematic representation of another embodiment of a local cache system to store both z-data values and associated status values.
Fig. 6B is a state machine representing the state changes for the entries of the local cache of Fig. 6A.
Fig. 6C is a schematic representation of a mechanism for updating the local cache system of Fig. 6A on a TLB miss.
Fig. 7 is one embodiment of a memory map that is suitable for storing status values for data blocks in a linear memory region.
Figs. 8A and 8B represent embodiments of 16-bit and 32-bit z-data formats that may be compressed using a mechanism in accordance with the present invention.
Fig. 9 represents one embodiment of compressed format for z-data that may be used by a system implementing the present invention.
Figs. 10A-10C are flowcharts representing embodiments of methods for implementing memory reads, memory writes, and status updates for blocks of z-data.
Fig. 11 is a flowchart representing one embodiment of a method for implementing z-compression transparently.
Fig. 12 is a flowchart representing one embodiment of a method for implementing accesses to the z-buffer transparently.
Figs. 13A-13C are flowcharts representing embodiments of different methods for clearing the z-buffer transparently.
Detailed Description of the Invention
The following discussion sets forth numerous specific details to provide a thorough understanding of the invention. However, those of ordinary skill in the art, having the benefit of this disclosure, will appreciate that the invention may be practiced without these specific details. In addition, various well-known methods, procedures, components, and circuits have not been described in detail in order to focus attention on the features of the present invention.
Fig. 1 is a schematic representation of a graphics primitive 100 and a subset of data blocks 110(a), 110(b) (generically, "data blocks 110") to which corresponding locations (x, y) in primitive 100 are mapped in a viewing coordinate system. Multiple graphics primitives 100 are used to approximate the surface of an object that is to be represented in an image. While graphics primitive 100 is shown as a triangle, it is well known that any type of polygon may be used to represent the surface of an object. Similarly, embodiments of the present invention are illustrated with reference to data blocks 110 comprising 4x4 arrays of pixels (spans), but other data block configurations may also be used.
Colors, texture coordinates, and depths (c, t, z) are associated with vertices 120(a), 120(b), 120(c) of primitive 100. Other attributes, such as fog and alpha (not shown) may also be assigned to vertices. These vertex properties are then interpolated to provide values for all primitive locations (x, y), which may be mapped to the pixels of data blocks 110. For the disclosed representation, data blocks 110(b) are spans for which all component pixels are mapped from locations within primitive 100. Data blocks 110(a) are spans for which pixel values are mapped from locations that straddle one or more boundaries of primitive 100. That is, not all pixels of data blocks 110(a) correspond to locations within primitive 100.
Fig. 2 represents one embodiment of a graphics processing pipeline 200 to implement scan conversion. Z-data is read 210 from the entries of a z-buffer to which a given primitive maps. Vertex data for the primitive is interpolated 220 to generate, e.g., color, texture, and z-data for each primitive location. For example, z-data for each location (x, y) of a primitive may be generated from the primitive's vertex data, using a surface function of the form z = C0 + Cx·x + Cy·y, as discussed below. Color values and texture coordinates may be generated for each location during this stage as well.
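By way of illustration only, the coefficients of a planar surface function of the form z = C0 + Cx·x + Cy·y can be obtained from the three vertices of a triangular primitive by solving a 2x2 linear system. The sketch below is an assumption about how such coefficients might be derived and is not taken from the specification; the names `plane_coefficients` and `z_at` are hypothetical, and exact rational arithmetic is used only to keep the example free of rounding.

```python
from fractions import Fraction


def plane_coefficients(v0, v1, v2):
    """Return (C0, Cx, Cy) so that z = C0 + Cx*x + Cy*y passes through
    three vertices given as (x, y, z) tuples with integer coordinates."""
    (x0, y0, z0), (x1, y1, z1), (x2, y2, z2) = v0, v1, v2
    # Determinant of the edge vectors; zero means the vertices are collinear.
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    if det == 0:
        raise ValueError("degenerate primitive: vertices are collinear in (x, y)")
    # Cramer's rule on  z - z0 = Cx*(x - x0) + Cy*(y - y0)
    cx = Fraction((z1 - z0) * (y2 - y0) - (z2 - z0) * (y1 - y0), det)
    cy = Fraction((x1 - x0) * (z2 - z0) - (x2 - x0) * (z1 - z0), det)
    c0 = z0 - cx * x0 - cy * y0
    return c0, cx, cy


def z_at(coeffs, x, y):
    """Evaluate the surface function at a primitive location (x, y)."""
    c0, cx, cy = coeffs
    return c0 + cx * x + cy * y


coeffs = plane_coefficients((0, 0, 10), (4, 0, 18), (0, 4, 30))
depth = z_at(coeffs, 1, 2)
```

Because three coefficients describe the depth of every location inside the primitive, the same function is a natural candidate for representing a whole span of z-values compactly.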
In subsequent stages of pipeline 200, image-refining techniques, such as texture mapping, bump mapping, alpha-blending and the like, may be executed 230. A z-test 240 determines which locations of the primitive, if any, contribute their color values to the frame buffer, i.e. which portions of the primitive will be visible in the 2D image. If the z-value determined for a location passes the z-test, the appropriate entries in the frame and z-buffers are updated with the color and z-values, respectively, of the primitive location. Otherwise, the values are discarded.
The transfer of graphics data between the graphics engine and memory locations in the frame and z-buffers reduces the available memory bandwidth. For memory architectures like UMA, this reduction can have a detrimental effect on a computer system's overall performance. Various methods have been proposed for reducing the bandwidth impact of texture data transfers. The present invention provides a mechanism for reducing the impact of depth-buffering and its associated data transfers on system performance.
Fig. 3 is a block level diagram of one embodiment of a computer system 300 that implements z-compression in accordance with the present invention. Computer system 300 includes a processor core 310, a graphics core 320 and a memory system 330. Processor core 310 and graphics core 320 are coupled to a bus or memory channel 340 to transfer data to and from memory system 330. The dashed line indicates a boundary of an integrated circuit die 370 for an embodiment of computer system 300 in which processor core 310 and graphics core 320 are integrated on a single chip. This embodiment of computer system 300 is likely to implement a unified memory architecture (UMA), for which the features of the present invention may provide significant advantages. The present invention is not, however, limited to computer systems that employ integrated graphics and processor cores or UMA.
For the disclosed embodiment of computer system 300, memory system 330 is shown straddling a boundary of die 370 to indicate that it may include on-chip and off-chip components. For example, memory system 330 typically includes one or more caches located on circuit die 370 and a main memory that is located on a separate circuit die. Memory system 330 further comprises a z-buffer 350 and a z-status table (ZST) 360, portions of which may be distributed between on-chip and off-chip memory structures (Fig. 4). As discussed below in greater detail, ZST 360 provides status information for associated entries in z-buffer 350. This status information may be used to reduce or eliminate data transfers on memory channel 340.
One embodiment of ZST 360 includes entries to track a current status for each block of z-data stored in z-buffer 350. The status indicates how the corresponding z-data block is stored and may be used to manage the transfer of data between graphics core 320 and memory 330. The status may indicate, for example, whether z-data for a particular span is in a compressed format or an uncompressed format, or whether it has a reference value that may be provided from a local storage location, such as a register. Compressed z-data may be transferred with significantly lower impact on the bandwidth of memory channel 340 than uncompressed data. Further, z-data that is available in, e.g., a local register, need not consume any memory bandwidth at all. One or more components of graphics core 320 use ZST 360 to manage z-data transfers more efficiently and with lower impact on the bandwidth of the memory channel.
For one embodiment of ZST 360, each entry stores a 2-bit status code to indicate a data status for a corresponding data block. Table 1 summarizes one set of 2-bit status codes that may be used.
Table 1
For example, each image may be initialized with all entries of z-buffer 350 in a cleared state (00). The status values in ZST 360 may be adjusted as the initialized values in the z-buffer are updated during scan conversion. Depending on the status code, a z-buffer access may be executed normally, a compressed z-buffer access may be implemented or the z-buffer access may be avoided altogether. The last two options reduce the impact of z-data accesses on memory channel bandwidth.
In the following discussion, a block of z-data in which each z-value represents a constant reference depth is referred to as "cleared". This depth may correspond to a back clipping plane in the image space. Since this value is a constant, it may be stored in a register that is local to graphics core 320. When an access targets a span having a cleared data status (00), the cleared value can be read from the local register, eliminating the z-buffer access and preserving memory channel bandwidth. If an access targets a data block that is designated as compressed (10), the targeted z-data may be retrieved in a compressed format and decompressed for use. As discussed below, compression reduces the size of the data block transferred for, e.g., z-testing, which saves memory channel bandwidth. If an access targets a data block that is designated as uncompressed (01), the access transfers an uncompressed block of z-data and no decompression is implemented.
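The status-directed read described above may be sketched as follows. The status codes 00, 01 and 10 are those given in the text; everything else is an assumption made for illustration, including the names `ZStatus` and `read_span`, the dictionary standing in for the z-buffer, and in particular `toy_decompress`, which is a stand-in and does not represent the compressed format of Fig. 9.

```python
from enum import Enum


class ZStatus(Enum):
    CLEARED = 0b00       # constant reference depth, held in a local register
    UNCOMPRESSED = 0b01  # full block stored in the z-buffer
    COMPRESSED = 0b10    # reduced-size block stored in the z-buffer


def read_span(status, address, memory, clear_register, decompress):
    """Return the 16 z-values of a 4x4 span, choosing the transfer by status."""
    if status is ZStatus.CLEARED:
        # No z-buffer access at all: the value comes from the local register.
        return [clear_register] * 16
    stored = memory[address]          # models a transfer on the memory channel
    if status is ZStatus.COMPRESSED:
        return decompress(stored)     # smaller transfer, then expand
    return list(stored)               # full-size transfer, used as-is


def toy_decompress(block):
    """Hypothetical stand-in: expand (c0, cx, cy) over a 4x4 span."""
    c0, cx, cy = block
    return [c0 + cx * (i % 4) + cy * (i // 4) for i in range(16)]


memory = {8: (10, 2, 5), 16: list(range(100, 116))}
cleared = read_span(ZStatus.CLEARED, 0, memory, 0, toy_decompress)
expanded = read_span(ZStatus.COMPRESSED, 8, memory, 0, toy_decompress)
full = read_span(ZStatus.UNCOMPRESSED, 16, memory, 0, toy_decompress)
```

Note that the cleared path never indexes `memory`, which mirrors the statement that a cleared span generates no traffic on the memory channel.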
Z-compression need not apply uniformly to all data stored in z-buffer 350. For example, a determination to write data to z-buffer 350 in compressed or uncompressed format may be made, in part, by reference to the relationship between the data block to be written and the primitive locations that map to the data block. A data block 110(b) that represents locations within the boundaries of primitive 100 can usually be compressed. As discussed below, exceptions may arise if the z-value also includes a stencil field or if certain clipping or saturation conditions prevail. A data block 110(a) to which locations straddling a primitive boundary are mapped is usually not compressed. Where compression is implemented through a surface function (Eq. 1), the z-values for locations on different sides of the primitive's boundaries may be governed by different surface functions. This z-compression scheme can generate erroneous results if a location outside the primitive is compressed using a surface equation that is only suitable for locations within the primitive.
Fig. 4 represents one embodiment of a z-compression system 400 that may be used to implement the present invention. Compression system 400 includes a read/write unit 410, a local cache 430, a main memory 440, and a local register 490. Main memory 440 and local cache 430 represent, for example, off-chip and on-chip components, respectively, for one embodiment of memory system 330. Read/write unit 410 implements memory access requests that originate from various units of graphics core 320, according to the status information associated with the data block(s) targeted by the access. Local cache 430 includes local copies of the status and z-data blocks for processing memory accesses. Requests that cannot be satisfied from local cache 430 are satisfied from main memory 440.
For a memory read access, read/write unit 410 determines from the status of a targeted data block whether the data block is in a compressed, uncompressed, or cleared state, and retrieves the targeted data from local cache 430, main memory 440 or local register 490 through a transfer appropriate to the indicated status. For a memory write access, read/write unit 410 uses information on the targeted data to determine whether to store it in a compressed, uncompressed, or cleared state, and it updates an associated data status accordingly.
Also shown in Fig. 4 are a color calculation unit (CCU) 450 and an interpolation unit (ITU) 460 that may provide input to embodiments of z-compression system 400 to implement data accesses. For example, CCU 450 determines color values from vertex data, and indicates to read/write unit 410 whether a data block may be compressed. ITU 460 determines pixel level z-values from primitive vertex data and provides read/write unit 410 with parameters that may be used to compress/decompress data blocks.
Fig. 5A is a block diagram showing one embodiment of local cache system 500 that stores both z-data and data status information for cached data block entries. Storing both z-data and data status for data blocks in the same cache allows memory accesses, the form of which depends on data status information, to be processed more efficiently.
The disclosed embodiment of local cache system 500 includes read/write unit 410, local cache 430 and a translation unit 510. Translation unit 510 includes a z-status cache (ZSTC) 520 and a z-translation-lookaside buffer (ZTLB) 530. ZTLB 530 stores logical-to-physical memory address translations for z-data. ZSTC 520 stores status information for the z-data to which ZTLB 530 points. For one embodiment of cache system 500, each entry of ZTLB 530 stores a translation for a page of physical memory allocated to the z-buffer and ZSTC 520 stores the status data for the z-entries stored on the page. As discussed below in greater detail, status information from ZSTC 520 is used to control the size of z-data reads and writes to main memory 440.
The disclosed embodiment of local cache 430 includes a tag array 564, a data array 568 and hit/miss unit 570. Each entry 560 includes a tag field (TAG) and a status field (STATUS) stored in tag array 564, and a data field (DATA) stored in data array 568. TAG stores a logical address (or portion thereof) which may be used to implement look-ups to local cache 430. STATUS stores status bits for the data block that is indexed by TAG, and DATA stores the block of z-data values. The disclosed embodiment of read/write unit 410 includes a read unit 540 and a write unit 550. Main memory 440 and memory channel 340 are also shown in Fig. 5A.
For one embodiment of system 500, a look-up of local cache 430 is triggered in response to a memory access. For example, hit/miss unit 570 compares a logical address (or portion thereof) specified by a read access with the tag fields of entries 560. If the access hits, the value in STATUS is provided to read unit 540, which determines an appropriate data retrieval flow. For compressed data (STATUS = 10) and uncompressed data (STATUS = 01), read unit 540 retrieves the data from the hit entry, using an appropriately sized transfer. Compressed (CMP) data is decompressed and forwarded to the requestor, which may be CCU 450 for the disclosed embodiment of system 500. Uncompressed (UNC) data is forwarded to the requestor without decompression. For cleared data (STATUS = 00), read unit 540 provides the cleared data to the requestor from local register 490.
For one embodiment of local cache system 500, hit/miss unit 570 considers both STATUS and TAG to determine whether an access "hits" or "misses" in local cache 430. For example, an access targeting uncompressed data may hit wholly or partially in local cache 430 according to the following criteria:
Hit = Tag_Match & No_Blocking & [(UNC & QW_Match) | CMP | CLEAR]
Partial_Hit = Tag_Match & No_Blocking & (UNC & !QW_Match)
Miss = !Tag_Match
Here, Tag_Match indicates whether the tag identifying the address to be accessed matches a tag in the cache, No_Blocking indicates that no other access stalls the current access, and QW_Match indicates whether the quadword (QW) being sought is present in the cache line or data block identified by the tag. QW_Match may be used for embodiments of local cache 430 that allow the QWs of a data block to be accessed separately. A partial hit occurs when a line is allocated for the tag in the cache (Tag_Match) but the particular quadword sought is not available in the cache.
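The hit, partial-hit and miss criteria above translate directly into Boolean logic. The following sketch is illustrative only: the function name `classify_access` and the string labels are hypothetical, and the "blocked" outcome for a matching tag with a stalled access is an assumption, since the criteria above do not name that case.

```python
def classify_access(tag_match, no_blocking, status, qw_match):
    """Classify a cache access; `status` is one of "UNC", "CMP", "CLEAR"."""
    unc = status == "UNC"
    cmp_or_clear = status in ("CMP", "CLEAR")

    hit = tag_match and no_blocking and ((unc and qw_match) or cmp_or_clear)
    partial_hit = tag_match and no_blocking and (unc and not qw_match)
    miss = not tag_match

    if hit:
        return "hit"
    if partial_hit:
        return "partial_hit"
    if miss:
        return "miss"
    # Assumption: tag matched but another access stalls this one.
    return "blocked"


results = [
    classify_access(True, True, "UNC", True),     # all conditions met
    classify_access(True, True, "UNC", False),    # line present, quadword absent
    classify_access(True, True, "CMP", False),    # QW_Match ignored for CMP
    classify_access(False, True, "UNC", True),    # no tag match
]
```

Compressed and cleared blocks never produce a partial hit in this logic, consistent with the later statement that only full data blocks may be in those states.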
If hit/miss unit 570 determines that a read access missed in local cache 430, a look-up is initiated to translation unit 510. For the disclosed embodiment of translation unit 510, entries of ZTLB 530 include logical-to-physical address translations that are indexed by a logical address tag field, and ZSTC 520 stores status bits for each data block tracked in ZTLB 530. If the look-up hits in translation unit 510, the status bits indicate the state of the data block(s) at the indicated physical address in main memory 440. If STATUS = cleared, the z-values of the "cleared" data block are provided from local register 490, and no traffic is generated on memory channel 340. If STATUS = compressed or uncompressed, the data block is retrieved from main memory 440 by executing a partial fetch or a full fetch, respectively, to the indicated physical address. Depending on the z-data format, e.g. 32-bit or 16-bit, the partial fetch uses 1/2 to 1/4 of the bandwidth used by the full fetch.
Fig. 5B represents a mechanism for updating translation unit 510 in the event that the look-up does not hit in ZTLB 530 ("TLB-miss"). For the disclosed embodiment, a graphics translation table (GTT) 574 is used to translate the, e.g., 4 Kbyte pages of an Accelerated Graphics Port (AGP) memory to physical addresses. GTT 574 includes entries for z-buffer 350 and for ZST 360. A ZSTC Pointer Table (ZPT) 578 stores pointers to locations in ZST 360. That is, ZPT 578 operates like a TLB for ZST 360.
On an initial TLB-miss, GTT 574 provides the missed TLB translation to ZTLB 530. Pointers from GTT 574 are also read into ZPT 578, and the pointer associated with the missed TLB entry is used to retrieve the corresponding status data from ZST 360. The updated translation is used to retrieve the targeted data block in main memory 440 according to the updated status data. Data array 568 and tag array 564 are updated with the retrieved z-data and its status, respectively. In general, ZSTC 520 is updated whenever the status of a data block is changed. When an entry in ZTLB 530 is replaced, the corresponding entry in ZSTC 520 is written back to memory.
Fig. 6A is a block diagram of an embodiment of a local cache system 600 that includes a physically addressed local cache 430. The disclosed embodiment of local cache 430 includes translation unit 610, tag array 620, data array 624, hit/miss unit 630, replacement unit 634, and output selection unit 638. Read/write unit 410 moves data in and out of local cache 430 and register 490. CMPRS and DCMPRS compress and decompress data, respectively, for transfer to and from data array 624. Data array 624 stores data blocks that are indexed by physical addresses (or portions thereof), which are stored in tag array 620. The data block targeted by a memory access is specified through a logical address, such as the primitive or span coordinates (x, y) of the data block.
Translation unit 610 provides logical-to-physical address translations that allow local cache 430 to be searched for data targeted by a memory access. Translation unit 610 includes a ZSTC 614 and a ZTLB 618, which provide functions similar to those provided by ZSTC 520 and ZTLB 530. When an address hits in ZTLB 618, the hit entry provides the physical address to which the logical address is mapped and a corresponding entry of ZSTC 614 provides a status for the data. Embodiments of cache system 600 may update tag array 620 with status data for the entry. Hit/miss unit 630 compares the physical address with the entries in tag array 620.
For accesses that miss in tag array 620, replacement unit 634 determines which of the current entries will be allocated to receive the data returned from a higher memory structure. For accesses that hit in tag array 620, output selection unit 638 indicates the hit entry to data array 624 and the status information from tag array 620 determines how the data is retrieved. For an embodiment that stores one span per cache line, if the targeted data is compressed, half a cache line is retrieved from data array 624, decompressed, and forwarded to the requestor. If the targeted data is uncompressed, a full cache line is retrieved from data array 624 and forwarded to the requestor without decompression. If the targeted data is cleared, the data is retrieved from local register 490 and forwarded to the requestor.
Read/write unit 410 includes a MUX 644 in read unit 640 to provide data responsive to its associated status. A MUX 648 in write unit 650 provides similar support for data being written to local cache 430. Status information is coupled to read/write unit 410 to indicate how the data being transferred should be handled.
Fig. 6B shows one embodiment of a state machine representing the state changes possible for an entry of data array 624. Before the entry is allocated, it is in an invalid state 654. When the entry is allocated to a data block, its status is updated to cleared (CLR 658), compressed (CMP 660), uncompressed (UNC 664) or uncompressed_all (UNC_A 668), according to the status of the data block to which it is allocated. For the disclosed embodiment, the status is indicated by a corresponding entry in ZSTC 614. For caches that allow less than a full line of data to be loaded, UNC and UNC_A distinguish between cache lines that are partially and fully populated, respectively, with data. Entries that store data blocks in CLR 658 or CMP 660 may transition to UNC_A 668 if the previously cleared or compressed data blocks are written back to the cache in an uncompressed state. Since only full data blocks may be CLR 658 or CMP 660, no transition is provided between these states and UNC 664.
If a data block that has been altered is evicted from local cache 430, the status of the evicted data block indicates the operations to be implemented. For example, if a data block in state CLR 658 is evicted, nothing is written back to a higher memory level, and the status of the entry from which the data block is evicted is updated to indicate the status of the new data block. If a data block in state CMP 660 is evicted, the portion of the cache line storing the compressed data, e.g. half the cache line, is written back to memory. If a data block in state UNC 664 is evicted, the altered bytes are written back to memory, and if a data block in state UNC_A 668 is evicted, the full cache line is written back to memory. In each case, the entry state is updated in the ZSTC to track the status of the evicted data block. When an entry is evicted from ZTLB 618, the corresponding bits of ZSTC 614 are written back to ZST 360.
Fig. 6C represents a mechanism for updating translation unit 610 if a logical address misses in ZTLB 618. For the disclosed embodiment, a miss triggers a read to a page table 690, which provides a TLB entry indicated by the access. ZTLB 618 is updated with the new TLB entry and ZSTC 614 is updated with the status data for all data blocks on the page. The targeted cache line is loaded into data array 624 from the indicated address in z-buffer 350, and the state in tag array 620 is updated to reflect the (storage) state of the data block.
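The eviction behavior described for the states of Fig. 6B may be summarized by the amount of data written back for each state. The sketch below is a simplified model under stated assumptions: the function name `writeback_bytes` and its parameters are hypothetical, and the one-half figure for compressed data is the example given in the text rather than a fixed property of every embodiment.

```python
def writeback_bytes(state, line_bytes, altered_bytes=0):
    """Return how many bytes are written to memory when an altered block is evicted.

    state: "CLR", "CMP", "UNC" or "UNC_A" (the allocated states of Fig. 6B).
    line_bytes: size of a full cache line.
    altered_bytes: number of bytes changed, used only for the UNC state.
    """
    if state == "CLR":
        return 0                   # cleared: nothing is written back
    if state == "CMP":
        return line_bytes // 2     # e.g. half the line holds the compressed data
    if state == "UNC":
        return altered_bytes       # partially populated: only the altered bytes
    if state == "UNC_A":
        return line_bytes          # fully populated: the whole line
    raise ValueError(f"unexpected state: {state!r}")


LINE = 64  # assumed line size in bytes, chosen only for the example
traffic = {s: writeback_bytes(s, LINE, altered_bytes=8)
           for s in ("CLR", "CMP", "UNC", "UNC_A")}
```

In each case the status recorded in the ZSTC, rather than the data itself, determines the size of the transfer.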
Figs. 5A, 5B and 6A-6C illustrate various features of two embodiments of a cache system that is suitable for handling mixed status and data information. For both embodiments, the data arrays may store data in compressed and uncompressed formats (or not at all, in the case of cleared entries). Cache management logic moves the data in and out of the data array according to its associated status, and controls operations to an external memory according to the status. The TLBs provide access to both the z-data blocks and their associated status. Status is stored per data block in a tag array and per memory page in the translation unit (ZSTC). While these cache systems have been described as part of a z-compression mechanism, they may also be employed in other systems that need to track different types of data.
The various functional units of graphics core 320 operate on data in its UNC (or UNC_A) format. Accordingly, a data block may be written back to local cache 430 or memory 440 in a compressed (or recompressed) or an uncompressed state, or the write may be avoided altogether if the data status is cleared. For one embodiment of local cache system 500, write unit 550 handles the write access according to a status determined for the data block targeted by the write. Different criteria for whether or not a data block may be compressed or represented by a cleared value are discussed below in greater detail.
Fig. 7 is a block diagram representing one embodiment of a memory map 700 to store status data in a linear portion of memory 330. For map 700, each status entry, Sx,y, is associated with a data block that represents z-data for a 4x4 array of pixels (span). Here, Sx,y represents the status data for the data block at a span address (x, y). One possible representation of the span address for 16-bit and 32-bit Z-modes is indicated at the bottom of Fig. 7.
For this data block definition, a 2048 x 1024 pixel frame buffer may be represented by a 512 x 256 grid of spans. Memory map 700 organizes the spans into groups of 512 bytes each, and each byte stores status bits for a column of 4 spans. When an access targeting a data block at span coordinates (x, y) misses in local cache 430, its status bits, Sx,y, may be accessed at bits Y[1:0] of byte(Byte_Index) at the memory address:
Status_Bits_Base_Address + PageY * 512 + PageX * Entry_Size,
where
Entry_Size = 16-bit Z-mode ? 32 : 16
PageX = 16-bit Z-mode ? X[8:5] : X[8:4]
PageY = Y[8:2]
Byte_Index = 16-bit Z-mode ? X[4:0] : X[3:0]
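The status address computation above may be illustrated with a short sketch. It rests on two readings that are assumptions and not statements of the specification: that byte(Byte_Index) denotes the byte at offset Byte_Index from the computed entry address, and that the 2-bit status selected by Y[1:0] occupies bits [2k+1:2k] of that byte for k = Y[1:0]. The helper names `bits`, `status_location` and `read_status` are hypothetical.

```python
def bits(value, hi, lo):
    """Extract the bit field value[hi:lo], inclusive, as an integer."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def status_location(x, y, z16_mode, base=0):
    """Return (byte_address, pair_index) for the status of the span at (x, y)."""
    entry_size = 32 if z16_mode else 16
    page_x = bits(x, 8, 5) if z16_mode else bits(x, 8, 4)
    page_y = bits(y, 8, 2)
    byte_index = bits(x, 4, 0) if z16_mode else bits(x, 3, 0)
    entry_address = base + page_y * 512 + page_x * entry_size
    # Assumption: Byte_Index is an offset from the entry address.
    return entry_address + byte_index, bits(y, 1, 0)


def read_status(table, x, y, z16_mode):
    """Read the 2-bit status code for span (x, y) from a status table."""
    byte_address, pair = status_location(x, y, z16_mode)
    # Assumption: pair k occupies bits [2k+1:2k] of the status byte.
    return (table[byte_address] >> (2 * pair)) & 0b11


# 512 x 256 spans at 2 bits each occupy 32768 bytes.
TABLE_BYTES = 512 * 256 * 2 // 8
table = bytearray(TABLE_BYTES)
table[2917] = 0b00100000          # mark one span as compressed (10)
loc16 = status_location(357, 22, z16_mode=True)
loc32 = status_location(357, 22, z16_mode=False)
code = read_status(table, 357, 22, z16_mode=True)
```

Under these readings the 16-bit and 32-bit modes resolve the same span to the same byte, because PageX * Entry_Size + Byte_Index reduces to X[8:0] in both cases.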
One factor that complicates z-compression is that compression may not be desirable or feasible for certain spans. ZST 360 provides a convenient tracking mechanism for determining whether a span to be read is stored in a compressed state and whether a span to be written may be compressed before it is written. As noted above, various criteria may be applied to determine whether to compress a particular span for storage in the memory system. These criteria include, for example, whether the span falls fully within the primitive, i.e. whether all pixels of a span are written by the particular operation, and, if the span includes a stencil value, whether all spans in the primitive have the same stencil value. For example, if a span is in a cleared state, the cleared value is a constant for all pixels in the frame and may be stored in a more readily accessible register. The other criteria may be better appreciated in view of the different uncompressed and compressed formats in which the data blocks may be stored.
Figs. 8A and 8B are block diagrams representing uncompressed formats 810 and 850 for 16-bit and 32-bit z-data, respectively, when it is stored as spans. For 16-bit format 810, each row corresponds to one quad word (QW) of data (4 x 16 bits), and for 32-bit format 850, each row corresponds to one double quad word (DQW) of data (4 x 32 bits).
The z-values of the 16 pixels in the span are labeled Z00 - Z33. For one embodiment of 32-bit format 850, each 32-bit value may include a 24-bit z-value and an 8-bit stencil value. Stencil values are used to indicate a portion of the screen for which drawing updates are not necessary. For example, a pixel that is obscured by a window border may include a stencil value that is to be written instead of the pixel value. For one embodiment of the invention, a span whose pixels are associated with different stencil values may not be compressed.
Fig. 9 represents one embodiment of a compressed data block 900, which may be generated from uncompressed formats 810, 850 or other uncompressed formats. Compressed data block 900 is one DQW, which is 50% of the size of 16-bit format 810 and 25% of the size of 32-bit format 850. The disclosed embodiment of compressed format 900 may be generated through a lossless compression method. One method is based on a functional representation of the z-values for a primitive such as that used by interpolator 460 to determine z-values for primitive locations from vertex values.
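The size ratios stated above follow from the span and word sizes already given: a span holds 16 z-values, and a DQW is 4 x 32 = 128 bits. A brief arithmetic check, in which the constant and function names are chosen only for this example:

```python
PIXELS_PER_SPAN = 16      # 4x4 array of pixels
DQW_BITS = 4 * 32         # one double quad word


def compressed_fraction(bits_per_z):
    """Size of one compressed DQW relative to an uncompressed span."""
    uncompressed_bits = PIXELS_PER_SPAN * bits_per_z
    return DQW_BITS / uncompressed_bits


fraction_16 = compressed_fraction(16)   # 128 / 256
fraction_32 = compressed_fraction(32)   # 128 / 512
```

These are the same one-half and one-quarter ratios that distinguish a partial fetch from a full fetch for the two z-data formats.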
For one compression method, z-values for a given primitive are represented as: