Gecko's CPU Library

AMD Athlon (Argon, Pluton, Orion) processors

Introduction: June 1999

Overview

Athlon is the brand name AMD applied to a series of x86 processors that it designed and manufactured. The original Athlon, or Athlon Classic, was the first seventh-generation x86 processor and, in a first for AMD, retained its initial performance lead over Intel's competing processors for a significant period of time. AMD continued the Athlon name with the Athlon 64, an eighth-generation processor featuring x86-64 (later renamed AMD64) technology.

The name Athlon comes from the ancient Greek word for "champion/trophy of the games". The processor made its debut on June 23, 1999.

The Argon, Pluton and Orion cores

The original Athlon core revision, code-named "K7" (in homage to its predecessor, the K6), was available in speeds of 500 to 700MHz at its introduction and was later sold at speeds up to 1000MHz (K75). The processor was compatible with the industry-standard x86 instruction set and plugged into a motherboard slot (Slot-A) mechanically similar to (but not pin-compatible with) the Pentium II's Slot-1.

Internally, the Athlon was a true seventh-generation x86 processor, the first of its kind. The CPU was designed by a team of AMD engineers and newly hired ex-DEC engineers, and the result merged technologies from AMD's earlier CPUs with those of the DEC Alpha 21264. Like the AMD K5 and K6, the Athlon was internally a RISC-style microprocessor that decoded x86 instructions into its own internal instructions at runtime. It was again an out-of-order design, like AMD's previous post-486 CPUs. The Athlon used the DEC Alpha EV6 bus architecture with double data rate (DDR) technology. Although the bus was initially clocked at 100MHz, transferring data on both clock edges allowed it to provide significantly higher bandwidth than the Intel GTL+ bus used by the Pentium III and its derivatives.
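The bandwidth advantage of a double data rate bus can be illustrated with a little arithmetic. The sketch below is only a rough peak-rate estimate: the function name and the 64-bit data width are assumptions made for the example (the text above gives the 100MHz clock and the DDR signalling, but not the bus width), and sustained real-world throughput would be lower.

```python
def peak_bandwidth_mb_s(clock_mhz, transfers_per_clock, bus_width_bits=64):
    """Peak transfer rate in MB/s (1 MB = 10**6 bytes).

    bus_width_bits=64 is an assumption; the article does not state the width.
    """
    bytes_per_transfer = bus_width_bits // 8
    # MHz * transfers per clock gives millions of transfers per second;
    # multiplying by bytes per transfer yields MB/s.
    return clock_mhz * transfers_per_clock * bytes_per_transfer


# Athlon's EV6 bus: 100 MHz clock, two transfers per clock (DDR).
ev6_mb_s = peak_bandwidth_mb_s(100, transfers_per_clock=2)
# A single-data-rate bus at the same 100 MHz clock, for comparison.
sdr_mb_s = peak_bandwidth_mb_s(100, transfers_per_clock=1)

print(f"EV6 (DDR) at 100MHz: {ev6_mb_s} MB/s")
print(f"SDR bus at 100MHz:   {sdr_mb_s} MB/s")
print(f"Ratio:               {ev6_mb_s / sdr_mb_s:.1f}x")
```

Under these assumptions the DDR bus has twice the peak rate of a single-data-rate bus running at the same clock, which is the effect the paragraph above describes.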

AMD designed the CPU with more robust x86 instruction decoding capabilities so that it could keep more instructions in flight at once. Athlon's triple CISC-to-RISC decoder could potentially decode 6 x86 operations per clock, although this was somewhat unlikely in real-world use. The critical branch predictor unit was enhanced compared to the one in the K6, because Athlon's longer pipeline required highly accurate prediction to avoid costly pipeline stalls. The deeper pipeline, with more stages, allowed higher clock speeds to be attained. Whereas the AMD K6-III+ topped out at 570MHz due to its short pipeline, even when built on the 180 nm process, the Athlon was designed to go much higher.

AMD ended its long-standing weakness in floating point performance by designing an impressive super-pipelined, out-of-order, triple-issue floating point unit. Each of its three units was tailored to handle particular types of instructions, with some redundancy to cover the most commonly used code. Having separate units made it possible to operate on more than one floating point instruction at once. This FPU was a huge step forward for AMD. While the K6 FPU had looked positively anemic compared to the Intel P6 FPU, the new Athlon put even the Pentium III to shame. Athlon also gained a revised version of 3DNow!, called "Enhanced 3DNow!", with added DSP instructions and an implementation of the extended-MMX subset of Intel SSE.

Caching onboard Athlon consisted of the typical two levels of cache. First came the largest level 1 cache in x86 history at the time, a split 2-way associative cache of 128KB, half for data and half for instructions (Harvard architecture). This cache was double the size of K6's already large cache, and quadruple the size of the Pentium II and III's L1 cache.

Like Intel's Pentium II and "Katmai" Pentium III, the Athlon also had a 512KB secondary cache, mounted outside the CPU itself and running at a lower speed than the core. The cache was placed on its own 64-bit bus, called a "backside bus", similar to AMD's own K6-III and Intel's Pentium Pro and later CPUs. A backside bus allows concurrent cache and main RAM accesses, dramatically improving efficiency and bandwidth. This alone was a major improvement over the L2 cache architecture of the AMD K6-2 and earlier processors, in which the L2 cache and main RAM shared the front side bus.

Initially the L2 cache ran at half the CPU clock speed, on Athlons up to 700MHz, but later Slot-A processors ran the cache at 2/5 (up to 850MHz) or 1/3 (up to 1GHz) of the core speed. With a 1/2 divider, a 1GHz Slot-A Athlon would have required its external cache chips to run at 500MHz. The SRAM available at the time was simply incapable of reaching this speed, due both to cache chip technology limitations and to the electrical and latency complications of running an external cache at such a high speed. Later Athlon processors would, like Intel's Pentium III Coppermine, move to an on-die L2 cache to allow higher cache clock speeds.

Athlon cores before "Thunderbird" used an inclusive caching scheme that duplicated L1 cache data in the L2 cache. This was the same as Intel's processors, but unlike later AMD processors, which used exclusive designs.
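The cache divider arithmetic described above can be checked with a short sketch. The core speeds and dividers are the ones quoted in the text; the function name and the table layout are illustrative choices, not anything defined by AMD.

```python
from fractions import Fraction


def l2_clock_mhz(core_mhz, divider):
    """External L2 cache clock for a given core clock and cache divider."""
    return core_mhz * divider


# (top core speed in MHz, L2 divider) pairs quoted in the text.
shipped = [
    (700, Fraction(1, 2)),
    (850, Fraction(2, 5)),
    (1000, Fraction(1, 3)),
]
# The configuration the text says the SRAM of the day could not reach.
hypothetical = (1000, Fraction(1, 2))

for core, divider in shipped + [hypothetical]:
    l2 = l2_clock_mhz(core, divider)
    print(f"{core}MHz core x {divider} -> {float(l2):.0f}MHz L2")
```

With these figures the cache chips never have to exceed 350MHz in the shipped configurations, while keeping the 1/2 divider at 1GHz would have demanded 500MHz, which is consistent with the move to smaller dividers as core speeds rose.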

The Slot-A Athlons were the first multiplier-locked CPUs from AMD. This was done partly to combat the CPU remarking practiced by questionable resellers around the globe. AMD's older CPUs could simply be set to run at whatever speed the user chose on the motherboard, making it trivial to relabel a CPU and sell it as a faster grade than it originally was. These relabeled CPUs were not always stable, being overclocked and not properly tested, and this was damaging to AMD's reputation. Although the Athlon was multiplier-locked, crafty enthusiasts eventually discovered that a connector on the PCB of the cartridge could control the multiplier. Eventually a device was created that could unlock the CPU; it was called the "Goldfingers device", after the gold connector pads it attached to. It was basically a module that attached to the CPU board connector and had a set of switches that opened or shorted the circuits the board connector controlled.
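For readers unfamiliar with clock multipliers, the relationship at stake is simply core clock = bus clock x multiplier. The sketch below assumes the 100MHz base bus clock mentioned earlier; the function names are hypothetical, and the sketch makes no attempt to model the actual switch settings of the Goldfingers device, which are not described here.

```python
BUS_CLOCK_MHZ = 100  # base bus clock given earlier in the article


def core_clock_mhz(multiplier, bus_clock_mhz=BUS_CLOCK_MHZ):
    """Core clock that results from a given multiplier setting."""
    return bus_clock_mhz * multiplier


def multiplier_for(core_mhz, bus_clock_mhz=BUS_CLOCK_MHZ):
    """Multiplier needed to reach a target core clock."""
    return core_mhz / bus_clock_mhz


# Speed grades named in the article and the multipliers they imply.
for grade in (500, 700, 1000):
    print(f"{grade}MHz grade -> multiplier {multiplier_for(grade):.1f}")

# Raising the multiplier on an unlocked part raises the core clock.
print(f"Multiplier 6.5 -> {core_clock_mhz(6.5):.0f}MHz core")
```

Locking the multiplier fixes the first factor, which is why relabeling a locked CPU as a faster grade was no longer a matter of changing a motherboard setting.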

Upon release, the Athlon was the fastest x86 CPU in the world, and various versions of the CPU held this distinction continuously from August 1999 until January 2002. Athlon outclassed Intel's Pentium III processors in nearly every way and, years later, still compared quite favorably to the best that the Pentium 4's NetBurst architecture could offer.

In commercial terms, the Athlon Classic was an enormous success, not just on its own merits but also because the normally dependable Intel endured a series of major production, design, and quality control issues at this time. In particular, Intel's transition to a 0.18 µm production process, starting in late 1999 and running through to mid-2000, was chaotic, and there was a severe shortage of Pentium III parts. In contrast, AMD enjoyed a remarkably smooth process transition and had ample supplies available, and Athlon sales became quite strong. Many long-time Intel-only PC dealers found the combination of the Athlon's excellent performance and reasonable pricing tempting, and the prospect of being able to get stock in commercial volumes impossible to resist. The resulting demand in fact led AMD to stop producing the K6-III.

Source: Wikipedia, the free encyclopedia.