Abstract: The present invention provides a computer system for providing cache memory management, the computer system comprising: a main memory having a plurality of main memory addresses each having a corresponding data entry; a processor coupled to said main memory; and at least one cache memory coupled to said processor, said at least one cache memory having a cache directory with a plurality of addresses and a cache controller having a plurality of data entries corresponding to said plurality of addresses, each data entry having a non-temporal locality, said processor receiving a non-temporal instruction having an operand address, determining if said operand address matches one of said plurality of addresses in said cache directory, if so, said processor updating a cache entry in said cache controller corresponding to said matched address, otherwise updating a memory entry corresponding to said operand address in said main memory and not performing a cache fill; wherein, a comparator is coupled to said processor, said comparator comparing the operand address with said plurality of addresses in said cache directory and providing an output signal indicative of a status of the comparison operation.
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates in general to the field of computer systems, and in particular, to an apparatus and method for providing instructions which facilitate the control of cache accesses while minimizing cache pollution.
2. Description of the Related Art
The use of a cache memory with a processor facilitates the reduction of memory access time. The fundamental idea of cache organization is that by keeping the most frequently accessed instructions and data in the fast cache memory, the average memory access time will approach the access time of the cache. To achieve the optimal tradeoffs between cache size and performance, typical processors implement a cache hierarchy, that is, different levels of cache memory. The different levels of cache correspond to different distances from the processor core. The closer the cache is to the processor, the faster the data access. However, the closer the cache is to the processor, the more costly it is to implement. As a result, the closer the cache level, the faster and smaller the cache.
The performance of cache memory is frequently measured in terms of its hit ratio. When the processor refers to memory and finds the data in its cache, it is said to produce a hit. If the data is not found in cache, then it is in main memory and is counted as a miss. If a miss occurs, then an allocation is made at the entry indexed by the address of the access. The access can be for loading data to the processor or storing data from the processor to memory. The cached information is retained by the cache memory until it is no longer needed, made invalid or replaced by other data, in which instances the cache entry is de-allocated.
A computer system may utilize one or more levels of cache memory. Allocation and de-allocation schemes implemented for the cache for various known computer systems are generally similar in practice. General practice has been to allocate an entry in the cache for all accesses required by the processor. Accordingly, system architectures specify re-use of accessed data without notion of relevant cache hierarchy level. That is, all accesses are allocated in cache. A disadvantage of this approach is that in certain instances, certain data is referenced once and not reused in the immediate future. It is not desirable for such data, typically termed as non-temporal data, to overwrite data that is used frequently (i.e., temporal data), such as application-cached code and data, when cache memory is full.
BRIEF SUMMARY OF THE INVENTION
A computer system and method for providing cache memory management is disclosed. The computer system comprises a main memory having a plurality of main memory addresses each having a corresponding data entry, and a processor coupled to the main memory. At least one cache memory is coupled to the processor. The at least one cache memory has a cache directory with a plurality of addresses and a cache controller having a plurality of data entries corresponding to the plurality of addresses. The processor receives an instruction having an operand address and determines if the operand address matches one of the plurality of addresses in the cache directory. If so, the processor updates a data entry in the cache controller corresponding to the matched address. Otherwise, a data entry corresponding to the operand address in the main memory is updated.
BRIEF DESCRIPTION OF THE DRAWINGS
The invention is illustrated by way of example, and not limitation, in the figures. Like references indicate similar elements.
Figure 1 illustrates an exemplary computer system in accordance with one embodiment of the invention.
Figure 2 illustrates one embodiment of the organizational structure of a cache memory, e.g., L0, L1 and/or L2, according to one embodiment of the invention.
Figure 3 illustrates one embodiment of the format of the cache pollution avoidance instruction 165 provided according to one embodiment of the invention.
Figures 4A and 4B illustrate the operation of the cache pollution avoidance instruction according to one embodiment of the invention.
Figures 5A and 5B illustrate the operation of the cache pollution avoidance instruction according to an alternate embodiment of the invention.
Figure 6 is a flowchart illustrating one embodiment of the process of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
In the following description, numerous specific details are set forth to provide a thorough understanding of the invention. However, it is understood that the invention may be practiced without these specific details. In other instances, well-known circuits, structures and techniques have not been shown in detail in order not to obscure the invention.
Figure 1 illustrates one embodiment of a computer system 100 which implements the principles of the present invention. Computer system 100 comprises a processor 105, a storage device 110, and a bus 115. The processor 105 is coupled to the storage device 110 by the bus 115. The storage device 110 represents one or more mechanisms for storing data. For example, the storage device 110 may include read only memory (ROM), random access memory (RAM), magnetic disk storage mediums, optical storage mediums, flash memory devices and/or other machine readable mediums. In addition, a number of user input/output devices, such as a keyboard 120 and a display 125, are also coupled to the bus 115. The processor 105 represents a central processing unit of any type of architecture, such as multi-threaded, CISC, RISC, VLIW, or hybrid architecture. In addition, the processor 105 could be implemented on one or more chips. The bus 115 represents one or more buses (e.g., AGP, PCI, ISA, X-Bus, VESA, etc.) and bridges (also termed as bus controllers). While this embodiment is described in relation to a single processor computer system, the invention could be implemented in a multi-processor computer system.
In addition to other devices, one or more of a network 130, a TV broadcast signal receiver 131, a fax/modem 132, a digitizing unit 133, a sound unit 134, and a graphics unit 135 may optionally be coupled to bus 115. The network 130 and fax/modem 132 represent one or more network connections for transmitting data over a machine readable media (e.g., carrier waves). The digitizing unit 133 represents one or more devices for digitizing images (i.e., a scanner, camera, etc.). The sound unit 134 represents one or more devices for inputting and/or outputting sound (e.g., microphones, speakers, magnetic main memories, etc.). The graphics unit 135 represents one or more devices for generating 3-D images (e.g., graphics card). Figure 1 also illustrates that the storage device 110 has stored therein data 136 and software 137. Data 136 represents data stored in one or more of the formats described herein. Software 137 represents the necessary code for performing any and/or all of the techniques described with reference to Figures 2 and 4-6. Of course, the storage device 110 preferably contains additional software (not shown), which is not necessary to understanding the invention.
Figure 1 additionally illustrates that the processor 105 includes a decode unit 140, a set of registers 141, an execution unit 142, and an internal bus 143 for executing instructions. The processor 105 further includes two internal cache memories, a level 0 (L0) cache memory which is coupled to the execution unit 142, and a level 1 (L1) cache memory, which is coupled to the L0 cache. An external cache memory, i.e., a level 2 (L2) cache memory 172, is coupled to bus 115 via a cache controller 170. The actual placement of the various cache memories is a design choice or may be dictated by the processor architecture. Thus, it is appreciated that the L1 cache could be placed external to the processor 105. In alternate embodiments, more or fewer levels of cache (other than L1 and L2) may be implemented.
Of course, the processor 105 contains additional circuitry, which is not necessary to understanding the invention. The decode unit 140, registers 141 and execution unit 142 are coupled together by internal bus 143. The decode unit 140 is used for decoding instructions received by processor 105 into control signals and/or microcode entry points. In response to these control signals and/or microcode entry points, the execution unit 142 performs the appropriate operations. The decode unit 140 may be implemented using any number of different mechanisms (e.g., a look-up table, a hardware implementation, a PLA, etc.). While the decoding of the various instructions is represented herein by a series of if/then statements, it is understood that the execution of an instruction does not require a serial processing of these if/then statements. Rather, any mechanism for logically performing this if/then processing is considered to be within the scope of the implementation of the invention.
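The serial if/then decode described above can be sketched in software as follows. This is only an illustrative model: the opcode values and the returned control-signal identifiers are assumptions for the sketch, not the actual instruction encodings.

```c
#include <stdint.h>

/* Illustrative opcode values -- assumptions, not real encodings. */
enum opcode { OP_MASKMOVQ = 0x01, OP_MOVNTQ = 0x02, OP_MOVNTPS = 0x03 };

/* Map an opcode to a control-signal identifier, mirroring the
 * if/then description in the text.  A real decoder could equally
 * use a look-up table, hardwired logic, or a PLA. */
int decode(uint8_t op)
{
    if (op == OP_MASKMOVQ) return 1;   /* byte-mask non-temporal store */
    if (op == OP_MOVNTQ)   return 2;   /* 64-bit integer non-temporal store */
    if (op == OP_MOVNTPS)  return 3;   /* packed-FP non-temporal store */
    return 0;                          /* not a cache pollution avoidance op */
}
```

Any mechanism producing the same mapping (e.g., a table indexed by opcode) is an equivalent implementation of this if/then processing.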
The decode unit 140 is shown including a fetching unit 150 which fetches instructions, and an instruction set 160 for performing operations on data. In one embodiment, the instruction set 160 includes a cache pollution avoidance instruction 165 provided in accordance with the present invention. Examples of the cache pollution avoidance instruction 165 include the following instructions: a byte mask write instruction(s) (such as MASKMOVQ), a move 64 bits non-temporal instruction(s) (such as MOVNTQ), and a move aligned four packed single-floating point (FP) non-temporal instruction(s) (such as MOVNTPS). A byte mask instruction typically performs a write using a mask pattern. A mask pattern indicates the bit position at which a write can be performed. In one embodiment, a '0' corresponds to no write (mask) and a '1' corresponds to a write (unmask). Therefore, a mask pattern of 00100100 allows writing to the bit positions 2 and 5 while other bit positions are masked off. The MASKMOVQ and MOVNTQ instructions are applicable to integer data. In particular, the MASKMOVQ instruction moves 64 bits representing integer data from a register to the memory location specified, and the MOVNTQ instruction moves 64 bits representing integer operands from a register to memory. The MOVNTPS instruction is applicable to packed floating point data, in which the results of an operation between two sets of numbers having a predetermined number of bits are stored in a register having the same predetermined number of bits, i.e., the size or configuration of the operand is the same as that of the result register. While certain cache pollution avoidance instructions are described for use with a particular type and amount of data, alternate embodiments can support instructions that operate on different types and amounts of data.
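The mask-write semantics described above can be sketched in software. This is a behavioral sketch only, not the MASKMOVQ microarchitecture: the function name and byte-granular mask interpretation are illustrative assumptions.

```c
#include <stdint.h>
#include <stddef.h>

/* Simulated mask write over a 64-bit (8-byte) quantity: position i of
 * 'src' is written to 'dst' only if bit i of 'mask' is 1 (unmask);
 * a 0 bit masks the write off and dst keeps its previous contents. */
void mask_write64(uint8_t dst[8], const uint8_t src[8], uint8_t mask)
{
    for (size_t i = 0; i < 8; i++) {
        if (mask & (1u << i))
            dst[i] = src[i];   /* '1': write (unmask) */
        /* '0': no write (mask) */
    }
}
```

With the mask 00100100 (0x24) from the text, only positions 2 and 5 are written; all other positions are left untouched.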
In addition to the cache pollution avoidance instruction(s) 165, processor 105 can include new instructions and/or instructions similar to or the same as those found in existing general purpose processors. For example, in one embodiment the processor 105 supports an instruction set which is compatible with the Intel® Architecture instruction set used by existing processors, such as the Pentium® II processor. Alternative embodiments of the invention may contain more or less, as well as different, instructions and still utilize the teachings of the invention.
The registers 141 represent a storage area on processor 105 for storing information, such as control/status information, scalar and/or packed integer data, floating point data, etc. It is understood that one aspect of the invention is the described instruction set. According to this aspect of the invention, the storage area used for storing the data is not critical. The term data processing system is used herein to refer to any machine for processing data, including the computer system(s) described with reference to Figure 1.
Figure 2 illustrates one embodiment of the organizational structure of a cache memory, e.g., L0, L1 and/or L2. For present discussion purposes, the cache memory L0, L1 and/or L2 will be referred to as cache memory 200. Cache memory 200 comprises a cache directory 210 and a plurality of cache entries 2201-220n (collectively referred to as cache entries 220). The cache directory 210 stores a plurality of addresses 2121-212n of the memory locations in storage device 110, corresponding to the cache entries 2201-220n. These memory locations in storage device 110 have addresses 2301-230n.
Figure 3 illustrates one embodiment of the format of the cache pollution avoidance instruction 165 provided in accordance with the present invention. The cache pollution avoidance instruction 165 comprises an operational code (OP CODE) 310 which identifies the operation of the cache pollution avoidance instruction 165 and an operand 312 which specifies the address of the data object that the instruction 165 will be operating on. In order to understand the present invention, certain terminology must be understood in reference to the cache pollution avoidance technique. The present invention operates within the framework wherein the particular data being accessed will have (or not have) temporal locality. Temporal locality is an attribute associated with data and determined by how soon in the future a program will access the data. Thus, for each data pattern, the data may have temporal locality (T) or not have temporal locality (NT) with respect to a cache memory or a cache level in the cache hierarchy.
In Figure 1, three levels of cache hierarchy are shown (representing levels L0, L1 and L2), wherein at each level there are two possibilities for classifying the data pattern being accessed. The classifications are noted as 1) temporal (T); and 2) non-temporal (NT). The classifications are based on the attributes associated with the data access for a computer system. It is appreciated that three levels of cache hierarchy are shown in Figure 1, but there could be more or fewer cache levels. For example, the present invention could be practiced where there is only one cache level (L0 only), or where there are only two cache levels (L0 and L1), or where there are four or more cache levels.
In the practice of the invention, the temporal property is associated with how close to the processor 105 the data is stored or saved. Accordingly, the temporal property is associated with the use or re-use of data at a given level. For example, if a particular data pattern in the program is identified to be T with respect to L1, but NT with respect to L2, then this data will be used in the near future in the L1 cache, but not so near in the future in the L2 cache. The temporal distance of how soon the data will be used or reused is application dependent for a particular computer system and software. When data access is regarded as T at a given cache level, it will likely be re-used (assuming no branches taken, etc.) within a certain time frame (for example, within x number of instructions) in the near future. Where data access is regarded as NT at a given level, it will likely not be re-used within the specified time frame.
Thus, the noted temporal criteria can be set based on the action required of the data. Subsequently, the present invention provides for a scheme in which cache pollution avoidance can be implemented at the cache memory or at each cache hierarchy level depending on the attributes associated with the data at a particular level. In particular, by using the cache pollution avoidance instruction 165 to process data, the programmer is indicating that the associated data does not have temporal locality, i.e., the associated data is non-temporal. The associated data is then processed according to the cache pollution avoidance instruction 165 used. Thus, cache pollution avoidance can be based on none, one, or more than one, of the categories available at each of the cache levels.
In Figure 1, only L0, L1 and L2 levels are shown, but it is appreciated that more or fewer levels (e.g., one level) can be readily implemented. The embodiment shown in Figures 4-6 describes the use of the invention with respect to one cache level.
The cache pollution avoidance technique of the invention will now be described. Upon receiving the cache pollution avoidance instruction 165, the cache directory 210 of cache memory 200 is first polled to determine if the address specified in the operand 312 matches one of the addresses stored in the cache directory 210. If so, there is a cache hit, and the corresponding data object (i.e., the contents of the cache memory corresponding to the address specified in the operand) is updated. In one embodiment where different levels of cache memories are used, for example, L0, L1, L2, any one cache level or any combination of the cache memories is updated. In an alternate embodiment, both the corresponding data object in cache memory and in main memory are updated.
If the address specified in the operand 312 does not match any of the addresses stored in the cache directory 210, there is a cache miss. In this case, there is no cache line fill, i.e., the corresponding data object in storage device 110 will not be transferred into the cache memory 200. By not performing the cache line fill, the processor 105 avoids polluting the cache.
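The polling behavior just described can be modeled with a small software sketch. This is an illustrative model of the scheme only, not the hardware: the structure layout, sizes, and function name are assumptions. On a directory hit the cached entry is updated; on a miss the store goes straight to main memory and no cache line is allocated.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

#define N_ENTRIES 4   /* illustrative directory size */
#define MEM_SIZE  16  /* illustrative main-memory size */

struct cache {
    uint32_t dir[N_ENTRIES];   /* cache directory 210: stored addresses */
    uint32_t data[N_ENTRIES];  /* corresponding cache entries 220 */
    bool     valid[N_ENTRIES];
};

/* Non-temporal store: update the cache entry only on a hit; on a
 * miss, update main memory directly and do NOT perform a line fill,
 * so temporal data already in the cache is never evicted. */
void nt_store(struct cache *c, uint32_t mem[MEM_SIZE],
              uint32_t addr, uint32_t value)
{
    for (size_t i = 0; i < N_ENTRIES; i++) {
        if (c->valid[i] && c->dir[i] == addr) {  /* cache hit */
            c->data[i] = value;                  /* update cache entry */
            mem[addr] = value;                   /* embodiment that also updates memory */
            return;
        }
    }
    mem[addr] = value;  /* cache miss: no cache line fill */
}
```

A conventional allocating store would, on the miss path, evict an entry and install the new address; the sketch's miss path deliberately leaves the directory untouched.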
Figures 4A and 4B illustrate the operation of the cache pollution avoidance instruction according to one embodiment of the invention, in which there is a cache hit. Upon receiving the cache pollution avoidance instruction 165', the cache directory 210 of cache memory 200 is polled to determine if the address, e.g., address X, specified in the operand 312' matches one of the addresses stored in the cache directory 210. If the cache memory 200 is an internal cache memory, i.e., a L0 or L1 cache memory, the processor 105 conducts the polling. If the cache memory 200 is an external cache memory, e.g., L2 cache memory, polling is conducted by the cache controller. In one embodiment, a comparator 410 compares the address X specified in the operand 312' to determine if it matches one of the addresses 2121-212n stored in cache directory 210 (Figure 4A). In the present case, there is an address match, indicating that there is a cache hit, and the comparator 410 issues a first signal to the cache line fill logic 420, which determines if a signal should be sent to either the data object 2124 corresponding to the address X in cache directory 210 or to both the data object 2124 and the corresponding data object 1104 in storage device 110, to update the data object 2124 and/or 1104. The required signal(s) are then issued and updating of the corresponding data object 2124 and/or 1104 is performed (Figure 4B).
Figures 5A and 5B illustrate the operation of the cache pollution avoidance instruction according to an alternate embodiment of the invention, in which there is a cache miss. Upon receiving the cache pollution avoidance instruction 165", the cache directory 210 of cache memory 200 is polled to determine if the address, e.g., address Y, specified in the operand 312" matches one of the addresses stored in the cache directory 210. In particular, the comparator 410 compares the address Y specified in the operand 312" to determine if it matches one of the addresses 2121-212n stored in cache directory 210 (Figure 5A). In this case, the address Y specified in the operand 312" does not match any one of the addresses 2121-212n stored in the cache directory 210. As a result, the comparator 410 issues a second signal to the cache line fill logic 420, indicating that there is a cache miss. In response, the cache line fill logic issues only a signal to the storage device 110 to update the contents of address Y. However, a cache line fill is not performed, i.e., the corresponding data object 1105 in storage device 110 will not be transferred into the cache memory 200. The cache memory 200 therefore remains intact and is not polluted by the non-temporal data.
In one embodiment, the cache pollution avoidance instruction 165 is a weakly ordered instruction. That is, when processing a plurality of the cache pollution avoidance instructions 165, the execution of the plurality of cache pollution avoidance instructions is not performed in the same order as they are fetched by the fetching unit in the decoder. In an alternate embodiment, the cache pollution avoidance instruction 165 has the property of write collapsing, whereby when executing successive stores to the same address, only the last store will be visible on the bus 115. In a further alternate embodiment, the cache pollution avoidance instruction 165 has the property of write combining, where cacheable store misses to the same cache line are combined before the write is issued to storage device 110.
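The write-collapsing property mentioned above can be illustrated with a small buffer model. This is purely an illustrative sketch with assumed names and sizes, not the processor's actual store buffer: a store to an address already buffered collapses into the existing slot, so only the last value for each address becomes visible when the buffer is drained to the bus.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

#define WC_SLOTS 4   /* illustrative buffer depth */

struct wc_buffer {
    uint32_t addr[WC_SLOTS];
    uint32_t val[WC_SLOTS];
    bool     used[WC_SLOTS];
};

/* Post a store into the buffer.  A store to an already-buffered
 * address overwrites that slot (write collapsing); otherwise a free
 * slot is allocated.  Returns false when the buffer is full and the
 * caller must drain it to the bus first. */
bool wc_store(struct wc_buffer *b, uint32_t addr, uint32_t value)
{
    for (size_t i = 0; i < WC_SLOTS; i++) {
        if (b->used[i] && b->addr[i] == addr) {
            b->val[i] = value;      /* collapse: only last store survives */
            return true;
        }
    }
    for (size_t i = 0; i < WC_SLOTS; i++) {
        if (!b->used[i]) {          /* allocate a fresh slot */
            b->used[i] = true;
            b->addr[i] = addr;
            b->val[i] = value;
            return true;
        }
    }
    return false;                   /* buffer full */
}
```

Write combining extends the same idea to a cache-line granularity: misses falling in the same line share one slot until the combined write is issued.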
Figure 6 is a flowchart illustrating one embodiment of the process of the present invention. Beginning from a start state, the process 600 proceeds to a decision block 610, where it determines if the address in the cache pollution avoidance instruction 165 matches one of the addresses in the cache directory 210. If so, the process 600 advances to decision block 612, where it determines if only the corresponding cache entry is to be updated or if both the corresponding cache entry and the corresponding entry in storage device 110 have to be updated. If only the corresponding cache entry is to be updated, the process 600 proceeds to process block 614, where it updates the corresponding entry in cache memory. Otherwise, it updates both the corresponding cache entry and the corresponding entry in storage device 110. In either case, the process 600 proceeds to process block 618, where the process terminates. If it is determined, at decision block 610, that the address in the cache pollution avoidance instruction 165 does not match one of the addresses in the cache directory 210, the process 600 does not perform a cache line fill, and instead, only updates the corresponding entry in storage device 110 (process block 620).
The use of the present invention thus enhances system performance by not performing a cache line fill during non-temporal accesses. In particular, since no cache line fill is performed during non-temporal accesses, system performance is improved by reducing contention for the cache, as frequently used data (i.e., temporal data) is not evicted from the cache by non-temporal data. As a result, cache pollution or disturbance during non-temporal accesses is minimized.
While a preferred embodiment has been described, it is to be understood that the invention is not limited to such use. In addition, while the invention has been described in terms of several embodiments, those skilled in the art will recognize that the invention is not limited to the embodiments described. The method and apparatus of the invention can be practiced with modification and alteration within the spirit and scope of the appended claims. The description is thus to be regarded as illustrative instead of limiting on the invention.
We Claim :
1. A processor comprising:
a decode unit to decode a non-temporal instruction, wherein the non-temporal
instruction has an opcode to identify the non-temporal instruction and an operand to specify a
memory address of a data object on which the non-temporal instruction is to operate, and
wherein use of the non-temporal instruction is to indicate that the data object is non-temporal;
and
an execution unit to execute the non-temporal instruction, wherein execution of the non-temporal instruction includes accessing the data object at the memory address.
2. A system comprising:
random access memory;
non-volatile memory;
a display;
a network connector;
a camera;
a microphone;
a speaker;
a graphics unit; and
a processor including
a decode unit to decode a non-temporal instruction, wherein the non-temporal
instruction has an opcode to identify the non-temporal instruction and an operand to specify a
memory address of a data object on which the non-temporal instruction is to operate, and
wherein use of the non-temporal instruction is to indicate that the data object is non-temporal;
and
an execution unit to execute the non-temporal instruction, wherein execution of the non-temporal instruction includes accessing the data object at the memory address.
3. The system of claim 2, wherein the non-volatile memory includes flash memory.
Dated this the 3rd day of February 2014
[V. MANOJ]
Of SUBRAMANIAM & ASSOCIATES
Attorney for the Applicants