Sun Microsystems' UltraSPARC T2 Processor

The UltraSPARC T2 (code-named "Niagara 2") contains up to eight processor cores, each of which can execute 8 threads simultaneously. Thus, within a single processor chip, 64 threads can operate on eight 8-stage integer pipelines and eight 12-stage floating-point pipelines. The objective of this design is to overlap computation with waiting for memory, or to have multiple threads wait for memory simultaneously. Whereas traditional processors ran at only some 5 percent of their peak performance when executing memory-hungry programs, the UltraSPARC T2 processor promises to operate at a higher percentage of its theoretical peak performance.
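The latency-hiding idea can be illustrated with a deliberately simple model. The sketch below is not taken from Sun's documentation: it assumes every thread alternates between a fixed number of compute cycles and a fixed memory stall, and that the pipeline can switch to another ready thread at no cost. The function name and the 5/95 cycle split are hypothetical, chosen only so that a single thread reproduces the "some 5 percent" figure mentioned above.

```python
def pipeline_utilization(threads, compute_cycles, stall_cycles):
    """Idealized fraction of cycles in which one pipeline does useful work.

    Assumption (illustrative only): each thread repeats `compute_cycles`
    of work followed by `stall_cycles` of waiting for memory, and the
    stalls of different threads overlap perfectly.
    """
    if threads < 1 or compute_cycles <= 0 or stall_cycles < 0:
        raise ValueError("invalid model parameters")
    # Utilization achieved by one thread running alone.
    single = compute_cycles / (compute_cycles + stall_cycles)
    # Additional threads fill the stall slots until the pipeline saturates.
    return min(1.0, threads * single)


if __name__ == "__main__":
    # Hypothetical memory-bound workload: 5 cycles of work, 95 cycles of stall.
    for n in (1, 2, 4, 8):
        print(f"{n} thread(s): {pipeline_utilization(n, 5, 95):.0%} utilization")
```

In this model, eight threads sharing one pipeline raise its utilization roughly eightfold as long as the pipeline is not yet saturated. Real workloads will deviate, because stalls vary in length and threads compete for caches and memory bandwidth.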

Each of the processor cores has separate instruction and data caches and accesses the shared L2 cache and the shared main memory via an internal crossbar. Thus, from the programmer's perspective, the UltraSPARC T2 processor is a shared-memory machine on a single chip with a flat memory (UMA = uniform memory access).

The above Niagara 2 diagram is from Robert Golla's presentation "A Highly Threaded Server-on-a-Chip"

A single process achieves up to 1.4 GFlop/s, because one core can execute only one floating-point operation per cycle. Therefore the peak performance of the whole chip is quite moderate by today's standards: 11.2 GFlop/s. The high potential of the Niagara 2 is revealed when many threads are active and the high memory bandwidth of some 60 GB/s (in theory) can be exploited; memory bandwidth is a frequent bottleneck of standard architectures when executing HPC applications. Furthermore, the UltraSPARC T2 processor contains two 10/1 Gbit Ethernet ports (up to 3.125 Gb/s) and one PCI-Express x8 1.0A port (2.5 Gb/s) "on chip".
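The peak-performance arithmetic above can be reproduced in a few lines. The sketch uses only figures stated in the text (8 cores, 1.4 GHz, one floating-point operation per cycle and core, about 60 GB/s theoretical memory bandwidth). The function names are illustrative, and the bytes-per-flop ratio is a derived quantity that the original text does not state.

```python
CORES = 8                      # processor cores per chip
CLOCK_GHZ = 1.4                # highest clock rate quoted in the text
FLOPS_PER_CYCLE_PER_CORE = 1   # one floating-point operation per cycle and core
MEMORY_BANDWIDTH_GB_S = 60.0   # theoretical memory bandwidth


def peak_gflops(cores=CORES, clock_ghz=CLOCK_GHZ,
                flops_per_cycle=FLOPS_PER_CYCLE_PER_CORE):
    """Theoretical peak floating-point rate in GFlop/s."""
    return cores * clock_ghz * flops_per_cycle


def bytes_per_flop(bandwidth_gb_s=MEMORY_BANDWIDTH_GB_S, gflops=None):
    """Memory bytes available per floating-point operation at peak rates."""
    if gflops is None:
        gflops = peak_gflops()
    return bandwidth_gb_s / gflops


if __name__ == "__main__":
    print(f"per core: {peak_gflops(cores=1):.1f} GFlop/s")
    print(f"per chip: {peak_gflops():.1f} GFlop/s")
    print(f"balance : {bytes_per_flop():.2f} bytes per flop")
```

Under these theoretical figures the chip could supply roughly five bytes of memory traffic per floating-point operation, which is why the design is attractive for memory-bound codes despite its moderate peak rate.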



UltraSPARC T2 (Niagara 2)


Texas Instruments



Address space

48-bit virtual, 40-bit physical


8 cores with 8 threads each


2 instruction pipelines,
8 8-stage integer pipelines, 8 12-stage floating-point pipelines with 8 threads using one pipeline simultaneously,
1 crypto unit.

Clock rate

0.9 GHz - 1.4 GHz

L1 cache
(per core):

16 KByte instruction cache, 8 KByte data cache
(8-way set-associative)

L2 cache

4 MByte on chip
16-way associative, 8 banks of 512 KByte each

Memory Controller

up to 64 FB-DIMMs, 4 dual-channel FB-DIMM Memory Controllers on chip
bandwidth 60 GB/s


8x9 non-blocking,
approximately 90 GB/s write and 180 GB/s read per channel


CMOS, 65 nm
Die size: 342 mm²
Transistors: 503 Million
Pins: 1831

Power consumption

95 Watt nominal, 123 Watt max. at 1.4 GHz
Voltage: 1.2 V (core), 1.5 V (analog)


on chip:
two 10/1 Gbit Ethernet ports (up to 3.125 Gb/s)
one PCI-Express x8 1.0A port (2.5 Gb/s)
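Some of the figures in the table above follow from one another, and a short calculation makes the relationships explicit. This is only a sanity-check sketch: the helper names are illustrative, and it assumes binary units (1 KByte = 1024 bytes), which the table does not state explicitly.

```python
KIB = 1 << 10   # assumed binary units: 1 KByte = 1024 bytes
MIB = 1 << 20
TIB = 1 << 40


def address_space_bytes(bits):
    """Number of bytes addressable with an address of the given width."""
    return 1 << bits


def hardware_threads(cores=8, threads_per_core=8):
    """Total hardware threads on the chip."""
    return cores * threads_per_core


def l2_total_bytes(banks=8, bank_kbyte=512):
    """Total L2 capacity from the per-bank size."""
    return banks * bank_kbyte * KIB


if __name__ == "__main__":
    print("hardware threads:", hardware_threads())
    print("virtual address space :", address_space_bytes(48) // TIB, "TiB")
    print("physical address space:", address_space_bytes(40) // TIB, "TiB")
    print("L2 cache:", l2_total_bytes() // MIB, "MByte")
```

The eight 512 KByte banks add up to the 4 MByte L2 cache listed in the table, and the 40-bit physical address is what bounds the amount of main memory the chip can address.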


