Gecko's CPU Library

Hewlett Packard PA-7100 and PA-7150 (Thunderbird) processors

Introduced: 1992 (PA-7100) and 1994 (PA-7150)

The PA-7100 is a superscalar processor, able to issue more than one instruction per cycle. It is the first PA-RISC CPU to integrate the ALU and FPU on a single die, which saves board space and lowers production cost. The path between the PA-7100 and its instruction cache has been doubled, which makes this instruction-level parallelism possible: multiple consecutive instructions are fetched by the CPU and dispatched simultaneously to the independent integer and floating-point units.

Connection to memory and I/O is provided by the external Processor-Memory Interface (PMI) chip, to which the PA-7100 attaches via the P-bus. The PA-7100 is multiprocessing-capable, with two alternative strategies: either two PA-7100s share the same P-bus to a single shared PMI, or each PA-7100 is attached to its own PMI, which shares the memory and I/O bus with the other PA-7100/PMI pairs.
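The dual-issue idea described above can be illustrated with a small model. This is a minimal sketch, not the PA-7100's actual issue logic: it assumes only that two consecutive instructions can issue together when one targets the integer unit and the other the floating-point unit, and it ignores data dependencies, alignment and every other real pairing rule. The function name `issue_bundles` and the `"int"`/`"fp"` labels are made up for the illustration.

```python
from typing import List, Tuple

VALID_KINDS = {"int", "fp"}  # hypothetical labels for the two functional units


def issue_bundles(stream: List[str]) -> List[Tuple[str, ...]]:
    """Group an in-order instruction stream into per-cycle issue bundles.

    Simplifying assumption: two consecutive instructions issue in the same
    cycle only if they need different functional units. Dependencies and
    all other hardware pairing rules are ignored.
    """
    for kind in stream:
        if kind not in VALID_KINDS:
            raise ValueError(f"unknown instruction kind: {kind!r}")

    bundles: List[Tuple[str, ...]] = []
    i = 0
    while i < len(stream):
        first = stream[i]
        # Pair with the next instruction if it uses the other unit.
        if i + 1 < len(stream) and stream[i + 1] != first:
            bundles.append((first, stream[i + 1]))
            i += 2
        else:
            bundles.append((first,))
            i += 1
    return bundles


if __name__ == "__main__":
    program = ["int", "fp", "int", "int", "fp"]
    for cycle, bundle in enumerate(issue_bundles(program)):
        print(cycle, bundle)
```

In this model a stream that alternates integer and floating-point instructions needs half as many issue cycles as a stream of integer instructions only, which is the best case for a design with one unit of each kind.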

The PA-7150 is a PA-7100 with tweaks to the core and cache subsystem to allow clock frequencies up to 125MHz.

PA-7100 and PA-7150 were used in 715/{33,50,75}, 725/{50,75}, 735/{99,125}, 742i/50, 745i/{50,100}, 747i/{50,100}, 755/{99,125}, G50, G60, G70, H50, H60, H70, I50, I60, I70, T500, T520, Convex SPP1000/{CD,XA} and Stratus Continuum 610S, 610, 615S, 615, 620, 625, 1220, 1225, 1245.

- PA-RISC version 1.1b 32-bit
- Two functional units: 1 integer ALU, 1 Floating Point unit
- 2-way superscalar
- CPU, FPU, MMU and cache controller on one VLSI chip
- Five-stage pipeline
- Pipelined store technique to reduce the penalty for executing stores to the data cache
- Stall-on-use mechanism for parallel processing of instruction streams and cache misses
- 3-instruction queue
- Hardware TLB miss handler
- Hardware static branch support
- I/D cache bypass (7150)
- Off-chip L1 caches up to 1MB I and 2MB D realized in asynchronous standard SRAMs
- I/D caches are both 64-bit per access, direct mapped, parity protected and cycled at CPU clock
- Caches are software accessible
- Caches are virtually indexed and physically tagged to minimize latency
- 120-entry fully associative TLB
- 16-entry BTLB with programmable page sizes up to 64MB
- P-bus system bus, with interface speed programmable to 1.0, 0.67 and 0.50 of processor speed
- Double-precision floating-point latency: 2 cycles at 100MHz; load-use penalty: 1 cycle; branch penalty: 0 cycles (predicted) and 1 cycle (mispredicted)
- Two different multiprocessing connection strategies supported (shared PMI and dedicated PMIs)
- MP cache coherency support
- Up to 100MHz frequency (PA-7100) with 5.0V core voltage
- Up to 125MHz frequency (PA-7150) with 5.0V core voltage
- 14.0×14.0 mm² die, 850,000 FETs, 0.8 micron, 3-layer metal CMOS (CMOS26B process) packaged in a 504-pin ceramic PGA package
- Power dissipation of 30W at 100MHz
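
The clock-related figures in the list above can be turned into concrete numbers with a little arithmetic. The following is a rough sketch under stated assumptions: the 0.67 ratio is treated as exactly 2/3, which the list does not confirm, and the helper names `pbus_clocks_mhz` and `cycles_to_ns` are invented for this example.

```python
from fractions import Fraction
from typing import List

# P-bus ratios from the list above; 0.67 is ASSUMED to mean exactly 2/3.
PBUS_RATIOS = (Fraction(1, 1), Fraction(2, 3), Fraction(1, 2))


def pbus_clocks_mhz(cpu_mhz: int) -> List[float]:
    """Return the possible P-bus clocks (MHz) for a given CPU clock."""
    return [float(cpu_mhz * ratio) for ratio in PBUS_RATIOS]


def cycles_to_ns(cycles: int, cpu_mhz: int) -> float:
    """Convert a latency in CPU cycles to nanoseconds."""
    # One cycle at f MHz lasts 1000 / f nanoseconds.
    return cycles * 1000.0 / cpu_mhz


if __name__ == "__main__":
    for mhz in (100, 125):
        clocks = ", ".join(f"{c:.1f}" for c in pbus_clocks_mhz(mhz))
        print(f"{mhz} MHz CPU -> P-bus at {clocks} MHz")
    # Double-precision FP latency of 2 cycles at 100 MHz
    print(cycles_to_ns(2, 100), "ns")
```

Under these assumptions, a 100MHz PA-7100 could run its P-bus at 100, about 66.7 or 50MHz, and the 2-cycle double-precision latency corresponds to 20ns.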

Source: www.openpa.net