Shortly after the introduction of the PA-8000, the design team identified several aspects of the chip to improve in its successor: branch prediction, TLB miss rates and cache sizes.
The new chip was to offer improved performance, compatibility with existing applications and a short time to market. To achieve this, the whole design was to be heavily leveraged from the existing PA-8000 design foundation.
The availability of new 4Mb SRAMs with faster access times allowed a higher CPU clock speed and larger caches. Furthermore, the team's analysis showed that performance could be improved significantly by reducing the cycles wasted while waiting for instructions and data. The team therefore concluded that enlarging the BHT, TLB and caches were high-benefit, low-risk improvements.
PA-8200 was used in C200, C240, D390, J2240, K370, K380, K570, K580, R390, V2200 and V2250.
- PA-RISC version 2.0 64-bit
- Ten functional units: 2 integer ALUs, 2 shift/merge units, 2 complete load/store pipelines, 2 Floating Point multiply/accumulate units, 2 Floating Point divide/square root units
- 4-way superscalar
- Two address adders
- 120-entry fully-associative dual-ported TLB
- 42-entry BTAC (Branch Target Address Cache)
- 1024-entry BHT (Branch History Table)
- Dynamic and static branch prediction modes
- Off-chip L1 caches up to 2MB I and 2MB D, realized in synchronous 5ns (200MHz) late-write 4Mb SRAMs, one cycle latency
- Caches are direct-mapped and dual-ported
- 56-entry instruction queue/reorder buffer (IRB)
- Each instruction includes five predecode bits
- MAX-2 multimedia extensions (subword arithmetic) for multimedia applications, e.g. MPEG decoding
- Bi-endian support
- Runway system/memory bus, 120MHz, 64-bit wide, featuring split transactions and glueless multiprocessing. Max. throughput of 960MB/s
- Up to 300MHz frequency with 3.3V core voltage
- 17.7×19.6 mm² die, 4,500,000 FETs, 0.5 micron, 5-layer metal CMOS, packaged in a 1,085-pin flip-chip LGA package
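
Two of the figures in the list above follow from simple arithmetic, which the short Python sketch below reproduces: the Runway peak throughput from its clock and width, and the SRAM frequency from its 5ns cycle time. The function names are illustrative only, and the sketch assumes one bus transfer per clock cycle and 1MB = 10^6 bytes. It shows the theoretical peak, not measured bandwidth.

```python
def bus_throughput_mb_s(clock_mhz: float, width_bits: int) -> float:
    """Peak throughput in MB/s, assuming one full-width transfer per clock."""
    bytes_per_transfer = width_bits / 8
    return clock_mhz * bytes_per_transfer


def frequency_mhz(cycle_time_ns: float) -> float:
    """Convert a cycle time in nanoseconds to a frequency in MHz."""
    return 1000.0 / cycle_time_ns


# Runway bus: 120MHz, 64 bits wide
print(bus_throughput_mb_s(120, 64))  # 960.0

# Cache SRAMs: 5ns cycle time
print(frequency_mhz(5))  # 200.0
```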