Gecko's CPU Library

DEC Alpha 21064 (EV4) and Alpha 21066 (LCA4) processors

Introduction: November 1992 (Alpha 21064) and September 1993 (Alpha 21066)

The first processor of the Alpha family was called 21064 ("21" implied that Alpha was an architecture of the 21st century, "0" denoted the processor generation, and "64" the computational capability in bits). It was also code-named EV4: "EV" was [supposedly] an abbreviation of "Extended VAX", and "4" denoted the generation of the manufacturing process, CMOS4 (CMOS stands for Complementary Metal Oxide Semiconductor). A prototype of EV4 was ready in 1991; it was built on the coarser CMOS3 process and therefore had reduced cache sizes and no floating-point unit. Nevertheless, it was an important milestone for tuning and polishing the architecture and software. EV4 was officially introduced in November 1992 at COMDEX in Las Vegas (Nevada, USA). It was manufactured with a proprietary 3-layer 0.75µm process (later modified to the 0.675µm CMOS4S, an optical shrink of CMOS4). The chip consisted of 1.68M transistors, had a die size of 233mm² and required a 3.3V power supply. Core frequencies of the 21064 ranged from 150MHz to 200MHz (TDP from 21W to 27W). Multiprocessing was supported as one of the architecture's key features. Form factor: PGA-431 (Pin Grid Array).

The L1 cache was integrated: 8KB for instructions (I-cache, instruction cache), direct-mapped, and 8KB for data (D-cache, data cache), direct-mapped and write-through. The read latency of the D-cache was 3 cycles. Every line of the I-cache consisted of 32 instruction bytes, a 21-bit tag record, an 8-bit branch history field and several auxiliary fields. Every line of the D-cache consisted of 32 data bytes and a 21-bit tag record.

The L2 cache (B-cache, back-up cache) was a recommended option implemented with external synchronous or asynchronous SRAM chips; it was direct-mapped, write-back, write-ahead and sized up to 16MB (usually from 512KB to 2MB). Every line consisted of 32 data or instruction bytes with a 1-bit long-word parity or 7-bit long-word ECC field, a tag record of up to 17 bits with additional 1-bit long-word parity protection, and a 3-bit condition flag with an additional parity bit. B-cache read and write timings were programmable in units of processor cycles. The system data bus was either 64 or 128 bits wide (programmable, with a 1-bit long-word parity or 7-bit long-word ECC field) and was multiplexed with the B-cache data bus, so the physical bus lines were switched between these two logical buses as needed. The system address bus was 34 bits wide. The B-cache was organised to be inclusive of the D-cache, i.e. it contained a full copy of the latter. Only the processor could perform read and write operations on the B-cache, although the system logic was permitted to read B-tags (tags of the B-cache) because this was convenient for cache coherence mechanisms. In other words, the system logic was able to perform so-called snoop operations on the B-cache without involving the processor.
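
The cache geometry above can be checked with a little arithmetic. The following Python sketch splits an address into tag, index and byte offset for a direct-mapped cache with 32-byte lines. It assumes the tag is simply the remaining high-order bits of a 34-bit physical address; the text does not state this explicitly, but it is consistent with the 21-bit tag quoted for the 8KB L1 caches. The names field_widths and split_address are hypothetical helpers, not part of any DEC documentation.

```python
def field_widths(cache_bytes, line_bytes=32, addr_bits=34):
    """Return (tag_bits, index_bits, offset_bits) for a direct-mapped cache.

    Assumption: the tag holds every address bit above index and offset.
    """
    lines = cache_bytes // line_bytes
    # Both sizes must be powers of two for a plain bit-field split.
    if line_bytes & (line_bytes - 1) or lines & (lines - 1) or lines == 0:
        raise ValueError("cache and line sizes must be powers of two")
    offset_bits = line_bytes.bit_length() - 1
    index_bits = lines.bit_length() - 1
    tag_bits = addr_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits


def split_address(addr, cache_bytes=8 * 1024, line_bytes=32, addr_bits=34):
    """Split an address into (tag, index, offset) for the given cache."""
    if not 0 <= addr < (1 << addr_bits):
        raise ValueError("address does not fit in the address bus width")
    _, index_bits, offset_bits = field_widths(cache_bytes, line_bytes, addr_bits)
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


if __name__ == "__main__":
    # 8KB L1 cache: 256 lines of 32 bytes each.
    print(field_widths(8 * 1024))      # (21, 8, 5)
    # A 512KB B-cache under the same assumption.
    print(field_widths(512 * 1024))    # (15, 14, 5)
    print(split_address(0x12345678))   # (37282, 179, 24)
```

Under this assumption an 8KB cache needs 5 offset bits and 8 index bits, leaving 21 bits for the tag, which matches the figure given for both L1 caches. A larger cache leaves fewer bits for the tag, which is why the B-cache tag width is quoted as a maximum.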

EV4 had one integer pipeline (E-box, 7 stages) and one floating-point pipeline (F-box, 10 stages). The instruction decoder and scheduler (I-box) could issue up to 2 instructions per clock, in order, to the functional units, namely the E-box, the F-box and the load/store unit (A-box). The cache memory and system bus controller (C-box) worked in cooperation with the A-box and managed the integrated I-cache and D-cache as well as the external B-cache. Virtual address calculations were handled by the E-box. The branch prediction unit maintained a 4096-entry branch prediction table with 2 bits per entry. There was an I-TLB (instruction TLB) of 8 entries for 8KB pages and 4 entries for 4MB pages, and a D-TLB (data TLB) of 32 entries for pages sized from 8KB to 4MB. Both the I-TLB and the D-TLB were fully associative.
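
To illustrate what a 4096-entry table with 2 bits per entry can do, here is a minimal Python sketch of a generic 2-bit saturating-counter predictor. It is not DEC's documented implementation: the counter encoding, the initial state and the indexing by low-order instruction address bits are all assumptions made for illustration, and the names TwoBitPredictor and count_mispredictions are hypothetical. The only Alpha-specific fact used is that instructions are 4 bytes long, so the two lowest address bits carry no information.

```python
class TwoBitPredictor:
    """Generic table of 2-bit saturating counters (illustrative sketch).

    Counter values 0-1 predict "not taken", values 2-3 predict "taken".
    """

    def __init__(self, entries=4096, initial=1):
        if entries <= 0 or entries & (entries - 1):
            raise ValueError("entries must be a power of two")
        self.mask = entries - 1
        self.table = [initial] * entries  # assumed start: weakly not taken

    def _index(self, pc):
        # Assumption: index by low-order bits of the instruction address,
        # skipping the 2 bits that are always zero for 4-byte instructions.
        return (pc >> 2) & self.mask

    def predict(self, pc):
        return self.table[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


def count_mispredictions(predictor, pc, outcomes):
    """Feed a sequence of branch outcomes and count wrong predictions."""
    wrong = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            wrong += 1
        predictor.update(pc, taken)
    return wrong


if __name__ == "__main__":
    # A loop-closing branch: taken 9 times, then falls through once.
    loop = [True] * 9 + [False]
    p = TwoBitPredictor()
    print(count_mispredictions(p, 0x1000, loop))  # 2 (cold start + loop exit)
    print(count_mispredictions(p, 0x1000, loop))  # 1 (loop exit only)
```

The point of keeping two bits rather than one is visible in the second run: a single loop exit does not flip the prediction, so re-entering the loop costs no extra misprediction.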

The first workstation of the Alpha architecture, the DEC 3000 Model 500 AXP (code-named Flamingo), was introduced in November 1992. It carried a 150MHz 21064, 512KB of B-cache, 32MB of main memory, an integrated 8-bit video controller with 2MB of VRAM, a 1GB SCSI HDD, a SCSI CD-ROM drive, a built-in 10Mbit/s Ethernet controller (thick coaxial and twisted pair), built-in sound and ISDN controllers, and a 19" monitor (1280x1024 at 72Hz). All peripherals were served by the proprietary TURBOchannel bus. The price was impressive: 39,000 USD. Although there was a less expensive workstation, the DEC 3000 Model 400 AXP with a 21064 at 133MHz, a much more affordable machine was still needed.

DEC had been trying to design a 21064-powered personal computer supporting the ISA or EISA peripheral bus since February 1991. Thirty-five systems of the Beta project were engineered and built successfully, each using a 100MHz EV4 prototype, an Intel 82380 ISA system logic set and other proprietary and third-party hardware. The subsequent Theta project, however, largely failed because of engineering mistakes that crept into a mainboard built around the Intel 82350DT EISA system logic set. Nevertheless, two design teams located in Maynard (Massachusetts, USA) and Ayr (Scotland, UK) worked around all the issues and released the DECpc AXP 150 (code-named Jensen) in August 1992. It contained a 150MHz EV4, 512KB of B-cache, an AT form-factor mainboard, industry-standard 72-pin FPM parity SIMMs and EISA peripherals. Although this machine ran DEC OSF/1 and OpenVMS, its future was tied to Windows NT. Accordingly, the DECpc AXP 150 was introduced on 28 October 1992 in New York (New York, USA) at the Windows on Wall Street presentation, where Bill Gates demonstrated this OS for the first time.

There were also three 21064-powered server families: the 2-processor DEC 4000, the 6-processor DEC 7000 (with 182MHz processors) and DEC 10000 (with 200MHz processors). The DEC 7000 and DEC 10000 were modular designs; they featured 4MB of B-cache per processor and could accommodate up to 14GB of main memory (with 7 memory modules of 2GB each installed). While the DEC 4000 was designed to support the FutureBus+ peripheral bus, the DEC 7000 and DEC 10000 could also be configured for the XMI peripheral bus given an appropriate module (or several). The DEC 7000 and DEC 10000 could also be powered by NVAX+ processors, in which case they were called VAX 7000 and VAX 10000 (reconfiguration was possible simply by replacing the processor modules).

Despite its excellent performance, the 21064 was too expensive for most potential customers, so a low-priced sibling, the 21066 (LCA4 or LCA4S), was released in September 1993. It was based on the EV4 core but additionally integrated main memory and PCI controllers as well as several secondary functional units. On the other hand, the system data bus width was reduced to 64 bits, which had a negative impact on performance. LCA4 was manufactured using the 0.675µm CMOS4S process, resulting in a die even smaller than that of the original EV4 (209mm² compared to 233mm²). However, its clock frequencies were lowered to range from 100MHz to 166MHz, presumably to avoid overheating in the poorly ventilated desktop cases of those days and to avoid creating an additional competitor to EV4. The chip consisted of 1.75M transistors and required a 3.3V power supply. The design of this processor was licensed to Mitsubishi, which manufactured LCA4 as well, including a 200MHz version.

Source: www.alasir.com