Wikipedia: Cache

A cache in computer science is a short term memory in a computer with quick access. A cache is intended to speed up access to a set of data. The cache will be a piece of memory that is faster (hence more expensive, hence smaller) than the principal data storage area for the data in question. The cache operates by storing a part of the data, allowing that part to be accessed more quickly. A speed-up is acheived if many accesses to the data can access the data in the cache. The reason caches work at all is because many access patterns in typical computer applications have "locality". There are several sorts of locality, but we mainly mean that often the same data is accessed frequently or with accesses that are close together in time, or that data near to each other are accessed close together in time.

One example is a RAM that caches the accesses to the hard drive, so that when two subsequent reads of the same file from the hard disk occur, ony one physical read from the disk is performed. Modern disk drives have such a cache built in to the electronics of the drive. Most modern [operating systems]? also operate some sort of disk cache using the computer's main memory.

Caches are also used between the CPU and the (slower) main RAM in modern PC. There are often several levels of cache, with different sizes and access speed. Generally the caches get smaller, faster, and more expensive, the closer to the CPU they are.

Caches do not have to be thought about in purely low level/OS terms, software can cache return values that are likely to be re-used in the near future. A good example of this is the BIND DNS daemon which caches domain name replies for a period before going back to the authorative source of the name to IP mapping. Resolver libraries also perform almost exactly the same caching.

Web browsers also use caches. Some browses implement their own cache of recently visited web pages; this saves having to download them again when you revisit. Most browsers can be configured to use an external proxy [web cache]?, a server program through which all web requests are routed so that it can cache frequently accessed pages for everyone in an organization.

When designing a cache there are a number of points to consider:

What data should be cached?
What are the typical access patterns to this data?
How big should I make the cache?
How should data be brought into the cache?
When the cache is full, what data should be removed (evicted) from the cache?

A number of different algorithms are used for optimizing the contents of the cache, including to remove the "least recently used" - LRU? - contents when the cache is full.

CPU memory cache

This section discusses in more detail the typical caches used by a CPU to cache data from main memory.

Typically the fastest piece of memory on a CPU is internal flip-flops and buffers. These are not generally accessible to the programmer. Next fastest is the architecture registers. These are the registers that are accessible to the programmer (if she is programming in assembler). Next there is usually at least one cache between those registers and main memory (RAM); often there are two caches, one on-chip (on the same piece of silicon as the rest of the CPU) and one off-chip. When there is more than one cache they are usually called level-1 (for the one nearest the CPU), level-2, level-3 and so on.

The level-1 cache might be between 32 kibibits and 512 kibibits say. In a 32 bit architecture that would be between 1024 and 16384 words (each word being 32 bits).

A cache will usually arrange data in "lines" (sometimes called also called blocks), a line is a contiguous sequence of words starting at an address that is a multiple of the line size. A line might be 8 words say. So we might have 1024 lines in our cache. A "line" in main memory when accessed will end up as a line in the cache. Usually a cache will transfer data to and from memory in chunks of a whole line, this is more efficient than transferring each word individually. This is both a source of efficiency and inefficiency: if we end up accessing all of the data in a line then bring the line into cache all at once will have been more efficient than bring each word into the cache; on the other hand, if we only access one word from the line then we paid the cost of having to bring the whole line in which will be slower.

When a line from main memory is stored in the cache the cache has to associate the address of the data (where it is in main memory) with the data, so that it can correctly process accesses to the data later on. Most caches are arranged so that a particular line in main memory can end up in only a few places in the cache. If any line in main memory can go in any line in the cache then the cache is said to be "fully associative". If any line in main memory can go in only one particular line in the cache then the cache is said to be "direct mapped". Usually the situation is somewhere in between, and we talk of "ways". A cache in "8-way set associative" if a particular line in main memory can go in one of up to eight lines in the cache.

A direct mapped cache is easier to implement: let's say we have 1024 lines of 32 bytes each on a 32-bit byte addressed architecture. If we say that any line in main memory must go in the line in the cache that corresponds to bits 5-14 of the address of its lowest byte (that is the address modulo 2^15, divided by 2^5), so the correspondence between main memory addresses and cache lines looks like this:

main memory address           cache line
0-31                          0
32-63                         1
64-95                         2
...                           ...
32736-32767                   1023
32768-32799                   0
32780-32811                   1

and so the pattern repeats through memory.

In a direct mapped cache when we store a line in the cache all we need to associate with it are bits 15-31 of its address. The low order bits are determined by where it is in the cache. The real saving comes when an access is performed. When a location in main memory is accessed we need to check the contents of the cache to see whether we have that data cached; in a direct mapped cache we only have one line to check, because there is only one place in the cache that it can be; we simply check the high order address bits that we have associated with that line in the cache to see if they match the address of the data being accessed.

The more ways associative a cache is the more address bits we need to associate with each line, but more importantly, the more lines we need to check when an access is performed. In a eight-way set associative cache we need to check eight different lines in the cache to see if the data is stored. Because of the speeds at which caches operate the cost in power to drive the electronics of each line to check its contents becomes significant.

Obviously more ways associative is generally better (for the application).