Gecko's CPU Library

AMD 29000 processors

Introduction: 1988

The AMD 29000, often called simply the 29k, was a popular family of RISC-based 32-bit microprocessors and microcontrollers from Advanced Micro Devices. For a time they were the most popular RISC chips on the market, widely used in laser printers from a variety of manufacturers. In late 1995 AMD dropped development of the 29k because the design team was transferred to support the PC side of the business. What remained of AMD's embedded business was realigned towards the embedded 186 family of 80186 derivatives. The majority of AMD's resources were then concentrated on its high-performance desktop x86 clones, using many of the ideas and individual parts of the latest 29k to produce the AMD K5.

The 29000 evolved from the same Berkeley RISC design that also led to the Sun SPARC and Intel i960. One "trick" used in all of the Berkeley-derived designs is the concept of register windows, a technique that speeds up procedure calls significantly. The basic idea is to use a large set of registers as a stack, loading local data into a set of registers during a call and marking them "dead" when the procedure returns. Values returned from a routine are placed in the "global page", which on the SPARC, for instance, is the top eight registers. The competing early RISC design from Stanford University looked at this concept but decided that improved compilers could make more efficient use of general-purpose registers than a hard-wired window, something that has proven true over the years.

In the original Berkeley design, SPARC, and i960, the windows were fixed in size. A routine using only one local variable would still use up eight registers on the SPARC, wasting this expensive resource. It was here that the 29000 differed from these earlier designs: it used a variable window size to improve register usage. In this example only two registers would be used, one for the local variable and another for the return address. It also added more registers, keeping the same 128 registers for the procedure stack but adding another 64 for global access. In comparison, the SPARC had 128 registers in total, and the global set was a standard window of eight. These changes, combined with a "halfway smart" compiler, gave the best of both worlds: fast procedure calls, while still having plenty of global registers for general-purpose work. The 29000 also "extended" the register window stack with an in-memory (and, in theory, in-cache) stack. When the window filled, calls would be pushed off the end of the register stack into memory and restored as required when the routine returned. In general, the 29000's register usage was considerably more advanced than that of competing designs based on the Berkeley concepts.
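
The effect of variable-size windows can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not a description of the 29000's actual hardware: the class RegisterWindowStack, the function spills_for_call_chain, and the policy of spilling whole frames, oldest first, are all hypothetical. Only the 128-register stack and the eight-versus-two register comparison come from the text above.

```python
class RegisterWindowStack:
    """Toy model of a register-window stack that spills to memory.

    Assumption: each call claims `frame_size` registers, and when the
    register file is full the oldest frame is pushed out to memory.
    """

    def __init__(self, num_registers=128):
        self.num_registers = num_registers
        self.frames = []       # frame sizes held in registers, oldest first
        self.spilled = []      # frame sizes pushed out to memory, oldest first
        self.spill_count = 0   # frames written to memory so far
        self.fill_count = 0    # frames restored from memory so far

    def in_use(self):
        return sum(self.frames)

    def call(self, frame_size):
        if frame_size > self.num_registers:
            raise ValueError("frame does not fit in the register file")
        # Push the oldest frames out to memory until the new one fits.
        while self.in_use() + frame_size > self.num_registers:
            self.spilled.append(self.frames.pop(0))
            self.spill_count += 1
        self.frames.append(frame_size)

    def ret(self):
        # The returning routine's registers become "dead".
        self.frames.pop()
        # If the caller's frame had been spilled, restore it.
        if not self.frames and self.spilled:
            self.frames.append(self.spilled.pop())
            self.fill_count += 1


def spills_for_call_chain(frame_sizes, num_registers=128):
    """Count spills caused by a chain of nested calls."""
    stack = RegisterWindowStack(num_registers)
    for size in frame_sizes:
        stack.call(size)
    return stack.spill_count


# Twenty nested calls, each needing one local variable and a return address.
variable_windows = spills_for_call_chain([2] * 20)  # two registers per call
fixed_windows = spills_for_call_chain([8] * 20)     # eight registers per call
print(variable_windows, fixed_windows)
```

In this model the variable-size frames occupy only 40 of the 128 registers, so nothing spills, while the fixed eight-register windows would need 160 registers and have to push the oldest frames out to memory.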

Another, less unusual, difference is that the 29000 included no special-purpose condition code register. Any register could be used for this purpose, allowing conditions to be saved easily at the expense of complicating some code. An instruction prefetch buffer that stored up to 16 instructions was used to improve performance during branches. The 29000 did not include any branch prediction system, so there was a delay if a branch was taken (nor was it originally superscalar, so it could not "do both sides" as is common in some designs). The buffer mitigated this by storing four instructions from the "other side" of the branch, which could be run instantly while the buffer was refilled with new instructions from memory.
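
The idea of keeping condition results in ordinary registers can be sketched in Python as follows. This is only an illustration: the register names, the helper compare_lt, and the 1/0 encoding of true and false are assumptions made for the example, not the 29000's real instruction set or encoding.

```python
def compare_lt(regs, dest, a, b):
    """Write the outcome of regs[a] < regs[b] into regs[dest] as 1 or 0.

    The destination is an ordinary register chosen by the programmer,
    not a dedicated flags register.
    """
    regs[dest] = 1 if regs[a] < regs[b] else 0


def clamp(value, low, high):
    """Clamp `value` to [low, high] using two saved comparison results."""
    regs = {"value": value, "low": low, "high": high, "c0": 0, "c1": 0}

    compare_lt(regs, "c0", "value", "low")   # c0 = (value < low)
    compare_lt(regs, "c1", "high", "value")  # c1 = (value > high)

    # Both results are still available here: the second comparison did
    # not overwrite the first, as it would with a single flags register.
    if regs["c0"]:
        return regs["low"]
    if regs["c1"]:
        return regs["high"]
    return regs["value"]


print(clamp(5, 0, 10), clamp(-3, 0, 10), clamp(42, 0, 10))
```

The cost mentioned above shows up in the sketch as well: each comparison has to name a destination register, and that register is unavailable for other data until the branch has consumed it.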

The first 29000 was released in 1988. It included a built-in MMU, but floating-point support was offloaded to the 29027 FPU. The 29005 was a cut-down version. The line was upgraded with the 29030/29035, which included 8KB/4KB of instruction cache respectively. Another update moved the FPU on-die and added 4KB of data cache to produce the 29040. The final general-purpose version was the 29050, a superscalar design that could issue four instructions per clock and included out-of-order and speculative execution, as well as a much faster FPU.

Several portions of the 29050 design were used as the basis for the K5 series of x86-compatible processors. The FPU was used without change, while the rest of the core design was used along with complex microcode to translate x86 instructions to 29k-like code on the fly.

Source: Wikipedia, the free encyclopedia.