Implementing a Cache Controller using Content Addressable Memory
Cache controller introduction: All modern processors incorporate a small, high-speed cache on the chip to hold recently used data and instructions from memory. A computer science principle called locality of reference states that if the processor recently referred to a location in memory, it is likely to refer to it again in the near future. Using a cache to hold recently used memory values saves the processor from going to memory each time to reload them. This provides a significant performance boost, because main memory is many times slower than the processor’s cache.
The cache memory system is managed by an intelligent circuit called the cache memory controller. When a cache memory controller retrieves an instruction from RAM, it also brings the next several instructions into the cache. It does so because there is a high probability that the adjacent instructions will also be needed.
Cache memories are very expensive, which restricts their size. A bigger cache also means a longer search and therefore longer processing, whereas the purpose of the cache is to speed up processing.
Write back and write through caches: A write back cache holds off writing to the hard disk until there’s a lull in CPU activity. This gives an advantage in speed, but there is a danger that data can be lost if the power fails. A write through cache, on the other hand, interrupts the microprocessor to update the hard disk.
Cache Speed and RAM speed: In Pentium systems, 20 ns cache SRAM is generally used for 50-60 MHz system boards and 15 ns cache SRAM is normally used for 66 MHz system boards. Cache SRAM as fast as 8 ns has recently become available, although it is rare and expensive.
Cache Controller Block Diagram: The block diagram of the cache controller implemented is shown in figure 1.
The cache used is a 2 way set associative cache. Hence there are 2 caches, cache way 0 and cache way 1. Here we assume that the external memory is organized as pages. Each page contains 32 lines of data, each line containing 8 bits of data. Hence each cache contains 32 lines and each cache line contains 8 bits of data.
There are 2 directories, directory 0 and directory 1, associated with cache way 0 and cache way 1 respectively. Each directory entry identifies the page in memory that the cache line was copied from. Each directory entry is 16 bits wide. Hence the maximum number of pages in external memory is 65536. In addition there are 32 LRU (least recently used) bits, one for each of the 32 line numbers, shared by both cache ways. Each bit identifies the cache way that holds the least recently used entry for that line. A ‘0’ indicates that the entry in way 0 is least recently used and a ‘1’ indicates that the entry in way 1 is least recently used. In addition there is a 64 X 1 buffer containing the dirty bits, which indicate whether a cache entry is valid or not. The first 32 bits in the buffer correspond to the 32 lines in way 0 and the next 32 bits correspond to the 32 lines in way 1. A ‘0’ indicates that the entry is valid and a ‘1’ indicates that the entry is invalid.
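The storage described above can be summarized in a small software model. This is only a sketch in Python, not the VHDL design itself. The names `CacheState`, `directory`, `data`, `lru`, `dirty` and `dirty_index` are illustrative assumptions, and starting with every entry invalid is also an assumption (the design sets the initial dirty bits externally).

```python
from dataclasses import dataclass, field

LINES = 32                    # lines per cache way (and per memory page)
WAYS = 2                      # 2 way set associative
TAG_BITS = 16                 # width of one directory entry (page number)
MAX_PAGES = 2 ** TAG_BITS     # 65536 pages can be distinguished


def _per_way(fill):
    """Build one list of LINES entries for each way."""
    return [[fill] * LINES for _ in range(WAYS)]


@dataclass
class CacheState:
    # directory[way][line]: 16-bit page number the line was copied from
    directory: list = field(default_factory=lambda: _per_way(0))
    # data[way][line]: the 8-bit cache line
    data: list = field(default_factory=lambda: _per_way(0))
    # lru[line]: 0 -> way 0 is least recently used, 1 -> way 1 is
    lru: list = field(default_factory=lambda: [0] * LINES)
    # dirty[0..31] belong to way 0, dirty[32..63] to way 1;
    # 0 = valid, 1 = invalid (assumed initial state: all invalid)
    dirty: list = field(default_factory=lambda: [1] * (WAYS * LINES))

    def dirty_index(self, way, line):
        """Position of a (way, line) pair in the 64 x 1 dirty buffer."""
        return way * LINES + line
```

With this layout, line 5 of way 1 maps to position 37 of the dirty buffer.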

Figure 1: Block diagram of the cache controller. Blocks: CAM DIR 0 (32 X 16), CAM DIR 1 (32 X 16), CACHE WAY 0 (32 X 8), CACHE WAY 1 (32 X 8), LRU BITS (32 X 1) and DIRTY BITS (64 X 1). Buses: A/D bus and system bus.
Though it was originally proposed to implement 128 lines in each cache, the actual design implemented contains only 32 lines in each cache.
Content addressable Memory Introduction: CAM is designed to enhance data retrieval speed from a particular location in a storage array. Instead of using an address to read the data, as in a RAM, the data is supplied as an input to locate the address. CAM determines if the data is found within a storage array. When a match is found, the responder bit associated with the corresponding memory word is set.
Associative processing has its modern antecedent in a thought experiment proposed by Vannevar Bush [7] in 1945. Bush envisioned a database machine called MEMEX that would help people cope with the ever-growing body of general knowledge by storing textual information and retrieving it. Twelve years later, Slade and McMahon [7] succeeded in taking the first step toward creating an electronic digital associative memory. In their system, called the Cryotron Catalog Memory, a pattern of bits was input in parallel to the memory and an output line indicated whether a match was found. The Cryotron Catalog Memory could tell the user whether a pattern was found in memory but could not return associated information. In the sixties and seventies, interest in associative memory (also called content addressable memory) grew rapidly, with many designs being proposed and built. All of the designs in this period let users query the memory for an exact match to a broadcast value called the comparand.
A basic CAM element consists of input data, a storage location, and a comparator. There are several CAM modes: single match, multiple match and ternary CAM. In single match mode, only one location in the storage array contains the input data. If the input data can be found in multiple locations within the storage array, it is called multiple match mode. Ternary CAM supports “don’t care” bits, so that one stored entry can match multiple input values. The block diagram of a 16 X 8 CAM is shown in Figure 2.
A 16 X 8 CAM has 2 ports, one for reading and one for writing. The 8-bit data input line goes into both the read and the write ports. During the write cycle, it carries the data to be written into the CAM at the specified address. During the read cycle, it carries the 8-bit data to be searched for in the CAM. The read and write ports have separate clock inputs. The write port has a write enable signal, and data is written into the CAM only when that signal is high. Similarly, the read port has a match enable signal, and the search for the specified data takes place only when that signal is high. Each of the 16 bits of the match (15:0) output line corresponds to one of the 16 data locations in the CAM. The match output bit corresponding to a particular location is set if a match is found at that location.
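The read and write behaviour of the 16 X 8 CAM can be illustrated with a short behavioural model. This is a sketch in Python under simplifying assumptions: the two clocks are not modelled, locations that were never written are assumed to match nothing, and the class and method names (`Cam16x8`, `write`, `match`) are hypothetical.

```python
class Cam16x8:
    """Behavioural model of a 16-word x 8-bit CAM (clocks not modelled)."""

    WORDS = 16
    MASK = 0xFF          # 8-bit data width

    def __init__(self):
        # None marks a location that was never written (assumption).
        self.words = [None] * self.WORDS

    def write(self, addr, data_in, write_enable):
        """Write port: store data_in at addr only when write enable is high."""
        if write_enable:
            self.words[addr] = data_in & self.MASK

    def match(self, data_in, match_enable):
        """Read port: return the 16-bit match vector for data_in.

        Bit i of the result is 1 if location i holds data_in.
        With match enable low, no search takes place and no bit is set.
        """
        if not match_enable:
            return 0
        vector = 0
        for i, word in enumerate(self.words):
            if word == (data_in & self.MASK):
                vector |= 1 << i
        return vector


cam = Cam16x8()
cam.write(3, 0xA5, write_enable=True)
cam.write(9, 0xA5, write_enable=True)     # same data at a second location
print(hex(cam.match(0xA5, match_enable=True)))   # 0x208: bits 3 and 9 set
```

Because the same value is stored at two locations, two match bits are set, which corresponds to multiple match mode.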
Advantages of CAM over RAM: RAMs are fast, storage efficient and secure. But they are also passive devices and require complex and large data structures to be broken down into small pieces, thus losing intrinsic links. Moreover, RAMs lack the key feature of human memory, i.e. the ability to form associations among memory items. CAMs, on the other hand, are expensive but active devices. They allow parallelism to be implemented at the basic level of computer hardware. With the advancements in VLSI technology, it is now possible to fabricate large capacity CAMs for applications such as database searching and dictionary retrieval.
Figure 2: Block diagram of a 16 X 8 CAM. Write port: Data In (7:0), Addr (3:0), Write Enable, Write Clock. Read port: Data In (7:0), Match Enable, Read Clock, Match (15:0).
CAM Limitations: There are a number of problems with CAM. They are:
1. Functional and design complexity of the memories
2. Relatively high cost for reasonable storage capacity
3. Poor storage density compared to conventional memory
4. Slow access time due to available methods of implementation
5. A lack of software to properly use the associative power of the memory
Implementing a Cache controller using Content addressable memory: The 2 cache directories corresponding to cache ways 0 and 1 are implemented using CAM. Thus there are 2 CAMs, each containing 32 entries, each entry being 16 bits. The block diagram for the 2 cache directories implemented using CAM is shown below in Figure 3.
Figure 3: Block diagram of the 2 cache directories, 32 X 16 CAM(0) and 32 X 16 CAM(1). Inputs: DIN(15:0), ADD(4:0), WE, Match Enable, CLKR/CLKW. Outputs: Match (31:0) from CAM(0) and Match (63:32) from CAM(1).
The CAM cache controller is in multi match mode and has write through caches.
Match enable is the master cache enable. Disabling it always results in a cache miss.
Data can be written into the cache only when Write Enable (WE) is 1. During a cache write, the address (line number) at which the page number is to be stored in the directory is placed on the ADDR bus. During a cache read, DIN contains the page number. Match bits 0-31 correspond to the 32 lines of cache way 0 and match bits 32-63 correspond to the 32 lines of way 1.
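The way the two directory CAMs produce a single 64-bit match vector can be sketched as follows. This is an illustrative Python model, not the VHDL code; the function names `directory_write` and `directory_match` and the use of `None` for unwritten entries are assumptions.

```python
LINES = 32            # entries per directory CAM
PAGE_MASK = 0xFFFF    # 16-bit page number


def directory_write(directory, addr, page, we):
    """Store a page number at line `addr` of one directory when WE is 1."""
    if we:
        directory[addr] = page & PAGE_MASK


def directory_match(dir0, dir1, page, match_enable=True):
    """Search both directories for `page` and return the 64-bit match vector.

    Bits 0-31 report matches in directory 0 (way 0),
    bits 32-63 report matches in directory 1 (way 1).
    With match enable low nothing matches, so every access is a miss.
    """
    if not match_enable:
        return 0
    match = 0
    for line in range(LINES):
        if dir0[line] == (page & PAGE_MASK):
            match |= 1 << line
        if dir1[line] == (page & PAGE_MASK):
            match |= 1 << (LINES + line)
    return match


dir0 = [None] * LINES       # None = entry never written (assumption)
dir1 = [None] * LINES
directory_write(dir0, 5, 0x1234, we=1)
directory_write(dir1, 7, 0x1234, we=1)
match = directory_match(dir0, dir1, 0x1234)
print(match == (1 << 5) | (1 << 39))    # True: line 5 of way 0, line 7 of way 1
```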
Cache Controller Modes Implemented:
CACHE HIT ON MEMORY READ: When the processor addresses a certain line in a certain memory page, the page number is given as input to the CAM cache directory. If the page number is present in a directory, the corresponding match bits go high. Several match bits can go high at once, since the same page number can be stored at many lines in the directories. If the page number is not present at all, a cache miss occurs. If one or more match bits go high, a check is made to see whether the match bit for the line number requested by the processor is 1. If it is not, a cache miss occurs. If it is, the corresponding bit in the dirty bits buffer is checked to see whether the cache entry is valid. If the entry is valid, a cache hit occurs; otherwise a cache miss occurs. On a cache hit, the data in the corresponding cache line is placed on the A/D bus and given to the processor.
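The hit decision described above combines the match vector with the dirty bits. A minimal Python sketch follows. The function name `read_lookup` is hypothetical, and returning way 0 when both ways hit is an assumption that the text does not address.

```python
LINES = 32


def read_lookup(match, dirty, line):
    """Decide hit or miss for a read of `line`.

    match: 64-bit match vector from the two directory CAMs
    dirty: list of 64 bits, 0 = valid, 1 = invalid
    Returns the hitting way (0 or 1), or None on a cache miss.
    """
    for way in (0, 1):
        idx = way * LINES + line
        match_bit = (match >> idx) & 1
        # A hit needs the requested line's match bit AND a valid entry.
        if match_bit == 1 and dirty[idx] == 0:
            return way
    return None


# Page found at line 4 in both ways, but the way 0 copy is invalid.
match = (1 << 4) | (1 << (LINES + 4))
dirty = [0] * 64
dirty[4] = 1
print(read_lookup(match, dirty, 4))   # 1    (hit in way 1)
print(read_lookup(match, dirty, 5))   # None (match bit for line 5 is 0)
```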
CACHE MISS ON MEMORY READ (THEN CACHE WRITE): If a cache miss occurs, i.e. a line requested by the processor from a certain page is not in the cache, that line is fetched from external memory. The line can be placed at the corresponding line number in cache way 0 and directory 0, or in cache way 1 and directory 1. First a check is made to see whether either of the 2 candidate lines, in way 0 or way 1, has its dirty bit in the invalid state. If so, that line is overwritten. If both lines have dirty bits in the valid state, the LRU bit is examined to find which of the two lines is least recently used. The line fetched from memory is placed in that cache line, with the page number stored in the directory and the data in the cache way. To write the page number into the CAM cache directory, WE should be high. The line number is placed on the ADDR bus and the page number is placed on the DIN bus. And on the
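The choice of which way receives the line fetched from memory can be sketched in Python as below. The names `choose_way` and `mark_used` are hypothetical. Preferring way 0 when both entries are invalid is an assumption, and so is the LRU update rule in `mark_used`, which is the usual rule for a 2 way cache but is not spelled out in this report.

```python
LINES = 32


def choose_way(dirty, lru, line):
    """Pick the way that will be overwritten for `line` after a miss.

    dirty: list of 64 bits, 0 = valid, 1 = invalid
    lru:   list of 32 bits, 0 = way 0 is LRU, 1 = way 1 is LRU
    """
    for way in (0, 1):
        if dirty[way * LINES + line] == 1:
            return way            # an invalid entry is overwritten first
    return lru[line]              # both valid: replace the LRU way


def mark_used(lru, line, way):
    """Assumed LRU update: after using `way`, the other way becomes LRU."""
    lru[line] = 1 - way


dirty = [0] * 64
lru = [0] * LINES
lru[3] = 1                        # way 1 is least recently used for line 3
print(choose_way(dirty, lru, 3))  # 1 (both valid, so LRU decides)
dirty[3] = 1                      # now the way 0 entry is invalid
print(choose_way(dirty, lru, 3))  # 0 (invalid entry wins over LRU)
```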
CACHE WRITE FROM PROCESSOR (WRITTEN TO MEMORY IMMEDIATELY): If the processor writes data into the cache, it is written in the same manner as described for a cache miss above. Since the cache is a write through cache, the data is also immediately written to external memory. Hence the corresponding lines in the cache and external memory hold the same data.
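The write through behaviour can be illustrated with a few lines of Python. This is a sketch only: external memory is modelled as a dictionary keyed by (page, line), the way is assumed to have been selected already as in the cache miss case, and the function name `write_through` is hypothetical.

```python
LINES = 32


def write_through(cache_way, directory, dirty, memory, way, page, line, value):
    """Write `value` to the chosen cache line and to external memory at once."""
    cache_way[line] = value & 0xFF          # 8-bit cache line
    directory[line] = page & 0xFFFF         # remember which page it belongs to
    dirty[way * LINES + line] = 0           # entry is now valid
    memory[(page, line)] = value & 0xFF     # write through: memory updated too


cache_way0 = [0] * LINES
directory0 = [None] * LINES
dirty = [1] * 64
memory = {}                                  # stand-in for external memory
write_through(cache_way0, directory0, dirty, memory,
              way=0, page=0x0042, line=10, value=0x7E)
print(cache_way0[10] == memory[(0x0042, 10)])   # True: both copies agree
```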
CACHE ERASE (CACHE INVALIDATE): This mode is used when a cache line needs to be specifically invalidated. With match enable at 1, the page number is placed on the DIN bus. If any match bit goes high, a check is made to see whether the match bit corresponding to the line number to be invalidated is high. If so, a cache hit has occurred (the cache contains the line from that page in external memory) and the dirty bit corresponding to that line is set to the invalid state. This is mainly done when another bus master writes to external memory. The cache controller snoops to see whether the corresponding line is present in the cache. If it is present, the controller sets the corresponding dirty bit to the invalid state.
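The invalidate step can be sketched in Python as follows. The function name `invalidate` is hypothetical, and checking both ways in one call is an assumption about how the snoop is applied.

```python
LINES = 32


def invalidate(match, dirty, line, match_enable=True):
    """Invalidate `line` in every way whose directory matched the page.

    match: 64-bit match vector for the snooped page number
    dirty: list of 64 bits, 0 = valid, 1 = invalid (modified in place)
    Returns True if at least one entry was found and invalidated.
    """
    if not match_enable:
        return False                  # cache disabled: nothing to invalidate
    found = False
    for way in (0, 1):
        idx = way * LINES + line
        if (match >> idx) & 1:
            dirty[idx] = 1            # mark the entry invalid
            found = True
    return found


dirty = [0] * 64
match = 1 << (LINES + 12)             # snooped page is at line 12 of way 1
print(invalidate(match, dirty, 12))   # True
print(dirty[LINES + 12])              # 1 (entry is now invalid)
```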
VHDL code and simulation script for the cache controller modes:
Signals and ports used in the VHDL code:
C: When ‘0’ write into the cache and when ‘1’ read from the cache
We: When ‘1’ enable write to take place in the cache
Din: 16 bit Bus containing the page number
Addr: 5 bit bus containing the address/line number
Clkw: Clock write, used for writing into cache
Clkr: Clock read, used for reading from cache
St1, St2: External Bits used for setting the dirty bit.
En: When ‘0’, cache read and write take place. When ‘1’, the dirty bit buffer, dir, is updated with the value of Int, Ext or Diras.
Dir: 64 bit buffer. A ‘1’ signifies that the line is invalid and a ‘0’ signifies that the line is valid. Depending on the state of dir, a cache write takes place into the appropriate line in the appropriate cache way. During a cache line invalidate, the dir bit for the line to be invalidated is set to 1.
Diras: The initial state of the dirty bits, dir, assigned externally in the program.
Int: 64 bit. When a line in a cache way is overwritten by another line from memory because its dirty bit was invalid, the corresponding int bit is set to 1. Then, when en = ‘1’, the corresponding dir bit is updated depending on the state of int. Int is thus a temporary signal that records which lines have been overwritten in this way; setting an int bit to 1 indicates that the line is now valid.
Ext: 64 bit. When a line in a cache way is invalidated, the corresponding dir has to be set to 1. Whenever a line is invalidated, the corresponding ext bit is set to 1. Then when en= ‘1’ the corresponding dir is updated depending on the state of ext. Ext is thus used as a temporary signal to store the state of the dirty bits, dir when a cache line is invalidated specifically.
Inv: A value of ‘1’ indicates that a cache line has to be invalidated.
Sbus: System Bus
Pbus: Processor Bus
Bussel: used to select pbus or sbus
Lru: 32 bit. Least recently used bits.
Line: 5 bit address or line number
Hit: 64 bit. Each bit corresponds to a cache line. Goes high if a cache hit occurs.
Future Work: The VHDL code for the address implementation has been written using a brute force method. It could be implemented using a more general method. This was not done due to time constraints: many factors, such as setting the dirty bits appropriately, had to be taken care of, synthesis of the code for Virtex took around 2 hours, and the code had to be synthesized and re-synthesized many times to correct errors. Also, this code has been implemented for 32 lines in each cache way. It could be extended to a larger number of lines in each cache.
References:
1. Xilinx Virtex Tech Topic, Content Addressable Memory, http://www.xilinx.com/products/virtex/techtopic/vtt001.pdf, 24 July 2000.
2. J. L. Brelet, Using Block RAM for High Performance Read/Write CAMs, http://www.xilinx.com/xapp/xapp204.pdf, May 2, 2000.
3. Altera, Apex CAM as Cache for External CAM, www.altera.com/literature/wp/camcachewp.pdf, 2002.
4. T. Shanley, D. Anderson, Pentium Processor System Architecture, Addison-Wesley Publishers, 1995.
5. The PC Guide, Primary (Level 1) Cache and Cache Controller, http://www.pcguide.com/ref/cpu/arch/int/compCache-c.html, May 5, 2002.
6. Cache, Cache Memory, http://www.dis.unimelb.edu.au/mm/hwtute/inside_the_computer_case/cache.htm, May 5, 2002.
7. C. C. Weems, Associative Processing and Processors, Department of Computer Science, University of Massachusetts, Amherst.
8. T. Jamil, RAM vs. CAM, IEEE Potentials, Volume 16, Issue 2, April-May 1997, pages 26-29.