Opteron Processor models, families, and revisions/steppings
Opteron naming is not that confusing, but AMD seems intent on making it difficult by rearranging their web site in mysterious ways….
I am creating this blog entry to make it easier for me to find my own notes on the topic!
The Wikipedia page has a pretty good listing:
List of AMD Opteron microprocessors
AMD has useful product comparison reference pages at:
AMD Opteron Processor Solutions
AMD Desktop Processor Solutions
AMD Opteron First Generation Reference (pdf)
Borrowing from those pages, a simple summary is:
First Generation Opteron: models 1xx, 2xx, 8xx
- These are all K8 (Family 0Fh) processors, and are described in AMD pub 26094.
- They are usually referred to as “Rev E” or “K8, Rev E” processors. This is usually OK since most of the 130 nm parts are gone, but there is also a newer Family 10h Revision E (below).
- They are characterized by DDR DRAM interfaces, supporting DDR-266, DDR-333, and (Revision E) DDR-400.
- This also includes Athlon 64 and Athlon 64 X2 in sockets 754 and 939.
- Versions:
- Single core, 130 nm process: K8 revisions B3, C0, CG
- Single core, 90 nm process: K8 revisions E4, E6
- Dual core, 90 nm process: K8 revisions E1, E6
Second Generation Opteron: models 12xx, 22xx, 82xx
- These are upgraded Family K8 cores, with a DDR2 memory controller.
- They are usually referred to as “Revision F” or “K8, Rev F”, and are described in AMD pub 32559, where they are called “Family NPT 0Fh”. NPT stands for “New Platform Technology” and refers to the infrastructure for socket F (aka socket 1207) and socket AM2.
- This also includes socket AM2 models of Athlon and most Athlon X2 processors (some are Family 11h, described below).
- There is only one server version, with two steppings:
- Dual core, 90 nm process: K8 revisions F2, F3
Upgraded Second Generation Opteron: Athlon X2, Sempron, Turion, Turion X2
- These are Family 11h processors, very similar to Family 0Fh revision G (not used in server parts), and are described in AMD document 41256.
- The memory controller has less functionality.
- The HyperTransport interface is upgraded to support HyperTransport generation 3.
This allows a higher frequency connection between the processor chip and the external PCIe controller, so that PCIe gen2 speeds can be supported.
Third Generation Opteron: models 13xx, 23xx, 83xx
- These are Family 10h cores with an enhanced DDR2 memory controller and are described in AMD publication 41322.
- All server and most desktop versions have a shared L3 cache.
- This also includes Phenom X2, X3, and X4 (Rev B3) and Phenom II X2, X3, and X4 (Rev C).
- Versions:
- Barcelona: Dual core & Quad core, 65 nm process: Family 10h revisions B0, B2, B3, BA
- Shanghai: Dual core & Quad core, 45 nm process: Family 10h revision C2
- Istanbul: Up to 6 cores, 45 nm process: Family 10h revision D0
- Revision D (“Istanbul”) introduced the “HT Assist” probe filter feature to improve scalability in 4-socket and 8-socket systems.
Upgraded Third Generation Opteron: models 41xx & 61xx
- These are Family 10h cores with an enhanced DDR3-capable memory controller and are also described in AMD publication 41322.
- All server and most desktop versions have a shared L3 cache.
- None of the desktop parts appear to use the same stepping as these server parts (D1).
- There are two versions, both manufactured in a 45 nm process:
- Lisbon: 41xx series have one Family 10h revision D1 die per package (socket C32).
- Magny-Cours: 61xx series have two Family 10h revision D1 dice per package (socket G34).
- Family 10h, Revision E0 is used in the Phenom II X6 products.
- This revision is the first to offer the “Core Performance Boost” feature.
- It is also the first to generate confusion about the label “Rev E”.
- It should be referred to as “Family 10h, Revision E” to avoid ambiguity.
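Because “Rev E” alone is ambiguous, it helps to start from the family number the processor itself reports. The sketch below is my own Python illustration, not code from any AMD document. It decodes the EAX value returned by CPUID function 0000_0001h into family, model, and stepping, using the rule in AMD's CPUID documentation as I understand it: the extended family and extended model fields are only added in when the base family is 0Fh. The three signatures in the usage block are synthetic values built by hand to land on Family 0Fh (K8), 10h, and 15h (Bulldozer). Translating a model/stepping pair into a revision letter such as B3 or E0 still requires the revision guide for that family.

```python
from __future__ import annotations


def decode_amd_signature(eax: int) -> tuple[int, int, int]:
    """Split a CPUID Fn0000_0001 EAX value into (family, model, stepping).

    AMD rule (as assumed here): extended family/model are used only when
    the base family field is 0Fh; otherwise they are ignored.
    """
    stepping = eax & 0xF               # bits 3:0
    base_model = (eax >> 4) & 0xF      # bits 7:4
    base_family = (eax >> 8) & 0xF     # bits 11:8
    ext_model = (eax >> 16) & 0xF      # bits 19:16
    ext_family = (eax >> 20) & 0xFF    # bits 27:20

    if base_family == 0xF:
        family = base_family + ext_family       # e.g. 0Fh + 01h = 10h
        model = (ext_model << 4) | base_model
    else:
        family = base_family
        model = base_model
    return family, model, stepping


if __name__ == "__main__":
    # Synthetic signatures chosen to decode to Family 0Fh, 10h, and 15h.
    for eax in (0x00020F12, 0x00100F23, 0x00600F12):
        fam, mod, step = decode_amd_signature(eax)
        print(f"{eax:#010x}: family {fam:02X}h, model {mod:02X}h, stepping {step}")
```

With this in hand, “Family 0Fh, Rev E” and “Family 10h, Rev E” can never be confused: the family field differs before the revision letter is even considered.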
Fourth Generation Opteron: server processor models 42xx & 62xx, and “AMD FX” desktop processors
- These are socket-compatible with the 41xx and 61xx series, but with the “Bulldozer” core rather than the Family 10h core.
- The Bulldozer core adds support for:
- AVX: the extension of SSE from 128 bits wide to 256 bits wide, plus many other improvements. (First introduced in Intel “Sandy Bridge” processors.)
- AES: additional instructions to dramatically improve performance of AES encryption/decryption. (First introduced in Intel “Westmere” processors.)
- FMA4: AMD’s 4-operand fused multiply-add instructions. (32-bit & 64-bit arithmetic, on scalar, 128b, or 256b operands.)
- XOP: AMD’s set of extra integer instructions not included in AVX (multiply/accumulate, shift/rotate/permute, etc.).
- All current parts are produced in a 32 nm semiconductor process.
- Valencia: 42xx series have one Bulldozer revision B2 die per package (socket C32)
- Interlagos: 62xx series have two Bulldozer revision B2 dice per package (socket G34)
- “AMD FX”: desktop processors have one Bulldozer revision B2 die per package (socket AM3+)
- Counting cores and chips is getting more confusing…
- Each die has 1, 2, 3, or 4 “Bulldozer modules”.
- Each “Bulldozer module” has two processor cores.
- The two processor cores in a module share the instruction cache (64 kB), some of the instruction fetch logic, the pair of floating-point units, and the 2 MB L2 cache.
- The two processor cores in a module each have a private data cache (16 kB), private fixed-point functional units and address generation units, and private schedulers.
- All modules on a die share an 8 MB L3 cache and the dual-channel DDR3 memory controller.
- Bulldozer-based systems are characterized by a much larger “turbo” boost frequency increase than previous processors, with almost all models supporting an automatic frequency boost of over 20% when not all cores are in use, and some models supporting frequency boosts of more than 30%.
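The module/core/cache counting rules above multiply out simply. This is a rough Python sketch; the function and field names are hypothetical. It assumes every die in a package is configured identically and uses the per-module 2 MB L2 and per-die 8 MB L3 sizes quoted above.

```python
from dataclasses import dataclass

CORES_PER_MODULE = 2   # each Bulldozer module has two processor cores
L2_MB_PER_MODULE = 2   # the two cores in a module share a 2 MB L2
L3_MB_PER_DIE = 8      # all modules on a die share an 8 MB L3


@dataclass(frozen=True)
class PackageTotals:
    modules: int
    cores: int
    l2_mb: int
    l3_mb: int


def bulldozer_package(dice: int, modules_per_die: int) -> PackageTotals:
    """Per-package totals, assuming identically configured dice."""
    if dice not in (1, 2):
        raise ValueError("the packages described above have 1 or 2 dice")
    if not 1 <= modules_per_die <= 4:
        raise ValueError("each die has 1 to 4 Bulldozer modules")
    modules = dice * modules_per_die
    return PackageTotals(
        modules=modules,
        cores=modules * CORES_PER_MODULE,
        l2_mb=modules * L2_MB_PER_MODULE,
        l3_mb=dice * L3_MB_PER_DIE,
    )


if __name__ == "__main__":
    print(bulldozer_package(2, 4))  # two-die package (Interlagos-style)
    print(bulldozer_package(1, 4))  # one-die package (Valencia/AMD FX-style)
```

For a two-die package with four modules per die this gives 8 modules, 16 cores, 16 MB of L2, and 16 MB of L3. A single fully enabled die gives half of each.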
Kshitij Sudan says
This is a great compilation! Thanks …
John Neville says
Apparently the Bulldozer has had reported BIOS problems with some motherboards?
John D. McCalpin, Ph.D. says
I don’t know about any specific BIOS issues with the Bulldozer processors, but I was not the one who set up our systems (all 4-socket servers).
Just starting to try to reproduce some of AMD’s published performance numbers (http://www.amd.com/us/products/server/benchmarks/Pages/benchmarks-filter.aspx), but did not have much luck on the first try so I will hold off posting until I figger out what I am doing wrong….
John D. McCalpin, Ph.D. says
Again, I don’t know about any BIOS issues with respect to the AMD Family 15h (“Bulldozer”) processors, but there was a lot of discussion about performance issues on the versions of Linux that were current when Bulldozer was first being tested. AMD has produced a very clear description and discussion in this white paper.
My Summary:
The problem is due to a combination of:
1. The instruction cache cannot hold two copies of the same physical address if that physical address is mapped to virtual addresses that differ in bits 12:14.
2. Starting with Linux 2.6.12 kernels, the Address Space Layout Randomization (ASLR) feature maps the text of shared libraries to a different virtual address in every process.
So when both cores of a module are simultaneously running code from the same dynamically linked library in two different processes, 7 of 8 cases will generate this conflict: bits 12:14 can take 8 values, and only one of them matches. Note that address spaces are not randomized for code using pthreads or fork (without exec), so there is no problem for pthreads/OpenMP or most user-generated parallel code using fork().
The problem is fixed in Linux kernels starting with 3.2-rc1 (by aligning all text sections on 32 kB boundaries, so bits 12:14 always match). The problem can also be fixed using a “prelink” tool, by using statically linked libraries, or by disabling ASLR.
So this is something to watch out for, but is not a big deal.
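The “7 of 8” figure and the effect of 32 kB alignment can be checked with a few lines of Python. This is a simplified model, not the kernel’s actual ASLR code. Each process is assumed to load the library text at a uniformly random multiple of some alignment, and two mappings of the same instruction conflict when their virtual addresses differ in bits 12:14 (mask 0x7000). With 4 kB page alignment, roughly 7/8 of process pairs conflict. With 32 kB alignment, the low 15 bits of every load address are zero, so bits 12:14 come only from the offset within the library and always match.

```python
import random

PAGE = 1 << 12          # 4 kB page granularity
ALIGN_32K = 1 << 15     # 32 kB alignment, as in the fix described above
BITS_12_14 = 0x7 << 12  # mask 0x7000


def icache_alias_conflict(va_a: int, va_b: int) -> bool:
    """True if two virtual aliases of one physical address differ in bits 12:14."""
    return ((va_a ^ va_b) & BITS_12_14) != 0


def conflict_fraction(alignment: int, trials: int = 10_000, seed: int = 0) -> float:
    """Fraction of process pairs whose mappings of one instruction conflict.

    Load addresses are random multiples of `alignment`: a stand-in for
    ASLR, not the kernel's real randomization algorithm.
    """
    rng = random.Random(seed)
    offset = 0x1234  # hypothetical offset of an instruction within the library
    hits = 0
    for _ in range(trials):
        base_a = rng.randrange(0, 1 << 47, alignment)
        base_b = rng.randrange(0, 1 << 47, alignment)
        hits += icache_alias_conflict(base_a + offset, base_b + offset)
    return hits / trials


if __name__ == "__main__":
    print(f"4 kB-aligned bases:  {conflict_fraction(PAGE):.3f}")
    print(f"32 kB-aligned bases: {conflict_fraction(ALIGN_32K):.3f}")
```

The first fraction should come out close to 0.875 and the second is exactly zero, which is why aligning text on 32 kB boundaries removes the conflict without giving up randomization of the higher address bits.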