• AMD Kaveri engineering sample sighted in the wild

    My occasional search for specific CPU model strings finally turned up what seems to be the first sign of a working AMD Kaveri ES in the wild. It is reported as CPU family 21 (15h, Bulldozer), model 48 (30h). The measured performance numbers (BOINC Whetstone/Dhrystone) are rather low, which points to a low clock frequency during that measurement run, possibly caused by missing power management or other CPU drivers.

    The ES code 2M186092H4467_23/18/12/05_1304 tells us even more. According to earlier observations (here and here), the four numbers in the middle part hint at clock speeds. If the first one is not 00 (no turbo, see Kabini ES), it gives the turbo clock, here 2.3 GHz. The "18" stands for a 1.8 GHz nominal frequency. I'm not so sure about the "12"; it could stand for a 1.2 GHz North Bridge clock. Finally, the "05" indicates a 500 MHz GPU clock. The right part, "1304", is the GPU code, which - thanks to earlier revelations - can be identified as AMD1304.1 = "KV SPECTRE MOBILE 35W (1304)" (source).
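    The reading of the ES code described above can be sketched in a few lines of Python. This is only an illustration of my guesses, not an official AMD format: the function name, the dictionary keys and the regular expression are made up for this example, and the third field is kept neutral because its meaning is uncertain. It only handles strings with four clock fields.

```python
import re

# Hypothetical pattern: <OPN>_<aa>/<bb>/<cc>/<dd>_<GPU device ID>
ES_PATTERN = re.compile(
    r"^(?P<opn>[0-9A-Z]+)_"
    r"(?P<f1>\d{2})/(?P<f2>\d{2})/(?P<f3>\d{2})/(?P<f4>\d{2})_"
    r"(?P<gpu_id>[0-9A-Fa-f]{4})$"
)


def parse_es_string(es):
    """Split an engineering sample string into the guessed clock fields."""
    match = ES_PATTERN.match(es.strip())
    if match is None:
        raise ValueError("unexpected ES string format: %r" % es)
    turbo = int(match.group("f1"))
    return {
        "opn": match.group("opn"),
        # 00 in the first field is read as "no turbo"
        "turbo_mhz": turbo * 100 if turbo else None,
        "base_mhz": int(match.group("f2")) * 100,
        # meaning uncertain (possibly the North Bridge clock)
        "third_field_mhz": int(match.group("f3")) * 100,
        "gpu_mhz": int(match.group("f4")) * 100,
        "gpu_device_id": match.group("gpu_id"),
    }


if __name__ == "__main__":
    print(parse_es_string("2M186092H4467_23/18/12/05_1304"))
```

    Working in MHz as integers avoids floating point noise when comparing fields between samples.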



  • 2 GHz AMD Jaguar benchmarks

    As a detailed system report suggests, there is a lonely quad core engineering sample sporting four Jaguar cores running in a rack slot somewhere at OSADL (Open Source Automation Development Lab). The CPU is identified as family 22 (16h), model 0, stepping 1. This translates to stepping A1. The OPN is "2M201079J4461_00/20/08/06_9830", which suggests a mobile chip ("M"), running between 0.8 and 2.0GHz core clock and likely with a 600MHz GPU clock ("20/08/06").

    The GPU device ID is 9830, which already appeared in the device string AMD9830.1 = "KB 4C 25W (9830)". So this engineering sample might actually be a Kabini part with a TDP of 25 W. This also means that the eight Jaguar cores in the PlayStation 4 APU could be clocked at levels like 2 GHz, since the total TDP listed above includes the GPU and FCH, leaving something around 10 to 15 W for the compute unit. Two of them would need about 20 to 30 W.
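    The power estimate above is simple arithmetic; here is a minimal sketch of it. The function names are made up for this example, and the 10 to 15 W per compute unit is my rough guess from the text, not a measured value.

```python
def scale_cu_power(cu_range_w, num_cus):
    """Scale a (low, high) power range of one compute unit to several units."""
    low, high = cu_range_w
    return (low * num_cus, high * num_cus)


def non_cpu_share(apu_tdp_w, cu_range_w):
    """Power left for GPU and FCH if the compute unit takes cu_range_w."""
    low, high = cu_range_w
    return (apu_tdp_w - high, apu_tdp_w - low)


if __name__ == "__main__":
    cu_guess_w = (10, 15)  # assumed share of one 4-core Jaguar compute unit
    print(scale_cu_power(cu_guess_w, 2))   # (20, 30)
    print(non_cpu_share(25, cu_guess_w))   # (10, 15)
```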

    Here's the detailed report of core #1 (running at maximum clock frequency):

    vendor_id	: AuthenticAMD
    cpu family	: 22
    model		: 0
    model name	: AMD Eng Sample: 2M201079J4461_00/20/08/06_9830 
    stepping	: 1
    microcode	: 0x7000105
    cpu MHz		: 2000.000
    cache size	: 2048 KB
    physical id	: 0
    siblings	: 4
    core id		: 1
    cpu cores	: 4
    apicid		: 1
    initial apicid	: 1
    fpu		: yes
    fpu_exception	: yes
    cpuid level	: 13
    wp		: yes
    flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr
                      sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl
                      nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 cx16 sse4_1
                      sse4_2 movbe popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm
                      sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt topoext arat xsaveopt hw_pstate npt
                      lbrv svm_lock nrip_save tsc_scale flushbyasid decodeassists pausefilter pfthreshold bmi1
    bogomips	: 3992.32
    TLB size	: 1024 4K pages
    clflush size	: 64
    cache_alignment	: 64
    address sizes	: 40 bits physical, 48 bits virtual
    power management: ts ttp tm 100mhzsteps hwpstate [11]
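
    To check such a report for specific features, a small parser helps. The following is a minimal sketch, assuming the "key : value" layout shown above with wrapped flag lines; the function name and the embedded sample (an abbreviated copy of the report) are only for illustration.

```python
def parse_cpuinfo(text):
    """Parse one /proc/cpuinfo style block into a dict of strings.

    Lines without a colon are treated as continuations of the previous
    value, which covers the wrapped "flags" lines in the report.
    """
    info = {}
    last_key = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if ":" in line:
            key, _, value = line.partition(":")
            last_key = key.strip()
            info[last_key] = value.strip()
        elif last_key is not None:
            info[last_key] = (info[last_key] + " " + line).strip()
    return info


# Abbreviated sample taken from the report above
SAMPLE = """\
cpu family\t: 22
model\t\t: 0
stepping\t: 1
cpu MHz\t\t: 2000.000
flags\t\t: fpu vme de pse
                  sse4_2 movbe popcnt aes xsave avx f16c
                  bmi1
bogomips\t: 3992.32
"""

if __name__ == "__main__":
    info = parse_cpuinfo(SAMPLE)
    flags = set(info["flags"].split())
    print("family: %xh" % int(info["cpu family"]))
    for feature in ("avx", "bmi1", "avx2"):
        print(feature, feature in flags)
```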


    Further down, the DMI info block reveals 1.2V core voltage at 1.8GHz. The somewhat lower clocked 40nm Brazos cores run at 1.35 V.

    The benchmark page contains some single/multi core UnixBench results. To find the Jaguar results, look for rack #9, slot #1, or "r9s1". It is also possible to sort the columns. There are other interesting CPUs like VIA, PPC, ARM.

    A link on the report page leads to a detailed cache/memory bandwidth plot:


    Some AMD Richland benchmarks

    But these were not the only new benchmark results. There is a new Geekbench result of a Richland-based HP notebook:
    Hewlett-Packard HP ProBook 455 G1
    Compare that to the older result of AMD's Bantry platform here:

  • ISSCC 2013 and Next Gen Consoles

    Small cores are getting a lot of attention at ISSCC 2013, which is still going on. Some first bits of information have already made their way out, for example about the voltage regulation and power management in Intel's Haswell MPU. Another presentation gave us many details of AMD's Compute Unit (CU) based on four Jaguar cores. This is especially interesting, as Jaguar cores seem to be an important component of the next gen Xbox and PlayStation rumours. 3DCenter once made a nice overview of these rumours (see link below), which still seem to be changing or popping up on a weekly basis. So if Jaguar is meant to be included in one of the next gen consoles' processing units, this might happen in the form of such a compute unit. Let's have a look at it.

    To put that into perspective, I show you these two pictures at original scale (as long as your browser is at 100% zoom and has the correct info about the screen's DPI):

    The left image shows a collection of several well known cores (made by Hans de Vries):

    The second image is a photoshopped version of the 4-core Jaguar Compute Unit as depicted in AMD's ISSCC presentation:

    Update: This CU with four cores, 2 MB L2 and additional logic measures 26.2 mm² (excl. the logic in the upper left), which is less than the area of a 32 nm Piledriver module. Judging by leaked quad core Kabini models with TDP ratings of 15 to 25 W (which might result in 8 to 15 W SDP according to Intel's definition), these CUs might consume around half of this power with no turbo mode or power distribution engaged. A rumoured next gen console with 8 Jaguar cores would have to include two of these CUs. This could go with additional memory channels to remove any potential bottleneck. The CUs might be memory channel agnostic to allow their use beyond the planned Kabini/Temash (not Tamesh!) SKUs.


    AMD "Jaguar" Micro-architecture Takes the Fight to Atom with AVX, SSE4, Quad-Core

    Jaguar - The New Low Power CPU-Core From AMD (translated)

    AMD presents Jaguar Quad Module at ISSCC (translated, with galleries containing ALL slides)

    A nice collection of Kabini/Temash information by user vain at SemiAccurateForums

    At the Sony PlayStation event tomorrow, we might hear a bit more about what they will actually include into their next console. To be prepared, there are some links covering these rumours:

    More rumoured details by Xbitlabs

    Raw GPU performance (submetrics) of current and next gen consoles

    Overview of many of the recent next gen console rumours at 3DCenter

    A lot of background information about stacked memory, AMD, etc. at Neogaf forums

    More on that regarding AMD, PS4 by the same poster

    BTW, in my personal opinion it is possible to get enough gaming performance out of 8 Jaguar cores if they are supported by enough GPU compute power and memory bandwidth (stacking and/or huge caches). An unchanged hardware spec over the life cycle of a console also helps developers write optimized code for a fixed platform. This has worked in the past. But if there is a need for high single-thread or low-thread-count performance, other CPU cores might fit better here, even Steamroller. But this should be the topic of a different blog posting.

  • Trinity/Piledriver Performance

    Since February I have been regularly searching for new "family 21 model 16" BOINC results, which belong to AMD's Trinity APU. As I noticed, I'm not the only one doing that. ;-) Some early results of an engineering sample (ZD372058A4451_41/37/16_9901_800, which should clock at 3.7 GHz base and 4.1 GHz turbo according to the string) didn't look bad (one day it reached an integer score of over 13K on 64-bit Linux). But to do some halfway accurate (or semiaccurate ;-)) analysis it is important to look at results achieved on the same OS (here: Win 7, 64 bit) and BOINC client version (6.12.34 here, except for the ES, which ran a 6.12.43 client).

    The FP benchmark, which is a Whetstone benchmark, seems to run multithreaded according to "informal". At least it fills up all available cores while running. The integer benchmark, a good old Dhrystone benchmark, seems to be single threaded. It is also important to know that both benchmarks have a rather small memory footprint.

    Since we don't know the exact clock frequencies of the benchmark runs, it is difficult to find the correct value for calculating per GHz results. I estimated those based on turbo clocks, which might lead to skewed results. At least in the case of comparing Trinity with its Piledriver cores to the FX models, I hope that rather similar turbo mode behaviour should reduce the error margin.
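
    To show how much the unknown clock matters, here is a minimal sketch of the per GHz calculation. The helper names are made up for this example; the score of 13000 only stands in for the "over 13K" integer result of the ES mentioned above, and 3.7/4.1 GHz are the base and turbo clocks read from its string.

```python
def per_ghz(score, assumed_clock_ghz):
    """Normalize a benchmark score by an assumed clock frequency."""
    if assumed_clock_ghz <= 0:
        raise ValueError("clock must be positive")
    return score / assumed_clock_ghz


def per_ghz_bounds(score, base_ghz, turbo_ghz):
    """Return (lower, upper) per GHz values.

    The lower bound assumes the run happened at full turbo clock,
    the upper bound assumes it happened at base clock.
    """
    return (per_ghz(score, turbo_ghz), per_ghz(score, base_ghz))


if __name__ == "__main__":
    low, high = per_ghz_bounds(13000, base_ghz=3.7, turbo_ghz=4.1)
    print("per GHz score between %.0f and %.0f" % (low, high))
    print("spread: %.1f%%" % ((high / low - 1) * 100))
```

    The spread between the two bounds equals the ratio of turbo to base clock, so the estimate gets less reliable the larger the turbo headroom is.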

    OK, here comes the table comparing several values I filtered out of my collected BOINC results to have OS and client version the same. As you can see, Piledriver w/o L3 cache seems to perform a bit better than BDver1 based FX models:

    Trinity BOINC Performance Comparison

    Note: I used "Trinity vs. Bulldozer" to denote the difference between a L3-less Piledriver core and a Bulldozer core, which always had L3 available.

    Another note (as of 04/10): In the Piledriver vs. Bulldozer columns I divided the Trinity value by the maximum of all FX values. Furthermore, the FP benchmark likely ran at base clock frequency. I'll add more on that in a follow up article.
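
    The ratio described in the note above can be written down as follows. This is only a sketch: the function name is made up, and the numbers are placeholders for illustration, not values from my collected results.

```python
def relative_to_best(trinity_value, fx_values):
    """Divide the Trinity value by the best (maximum) FX value."""
    if not fx_values:
        raise ValueError("need at least one FX value")
    return trinity_value / max(fx_values)


if __name__ == "__main__":
    # Placeholder per GHz values, NOT real measurements
    fx_per_ghz = [95.0, 100.0, 98.0]
    trinity_per_ghz = 105.0
    ratio = relative_to_best(trinity_per_ghz, fx_per_ghz)
    print("Trinity vs. best FX: %+.1f%%" % ((ratio - 1) * 100))
```

    Comparing against the maximum instead of the average of the FX values is the conservative choice, since it makes Trinity look no better than it is.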

  • AMD FX Processor Launch

    Today the NDAs for AMD FX processor reviews were lifted. Here is a quick list (updated as time permits):

    Planet3DNow #1 (Googlish, original article is here. There's also a clock to clock comparison. Stay tuned for my articles looking at certain aspects of the architecture and why there are the performance differences we see today.)

    Planet3DNow #2 (Googlish, my first Bulldozer performance analysis article, where I have a look at measured instruction latencies/throughput, original article is here)


    Hot Hardware

    Tom's Hardware

    Tech Spot


    DGLee@XS part 1 part 2 part 3 part 4 (the 4th part is interesting, where he tests the performance of 4 modules with 1 core disabled each)


    Maximum PC (has a nice result table)


    The Tech Report

    Anand Tech

    XBit Labs

    Legit Reviews


    Benchmark Reviews

    VR Zone (test of memory scaling with latest BIOS, BTW I think, each module has a 64b read+write interface, so it's limited at 17.6GB/s per module w/ NB running at 2.2GHz)

    Also at XS is this long list of reviews.
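
    The per module limit mentioned in the VR Zone note above follows from interface width times clock. A minimal sketch, assuming one 64 bit transfer per North Bridge clock cycle (my assumption, as stated in the note); the function name is made up for this example.

```python
def peak_bandwidth_gb_s(width_bits, clock_ghz, transfers_per_cycle=1):
    """Peak bandwidth in GB/s for an interface of the given width and clock."""
    bytes_per_transfer = width_bits / 8
    return bytes_per_transfer * clock_ghz * transfers_per_cycle


if __name__ == "__main__":
    # 64 bit interface, North Bridge at 2.2 GHz
    print("%.1f GB/s per module" % peak_bandwidth_gb_s(64, 2.2))
```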

    Since this microarchitecture is a clean break from any previous x86 microarchitecture, it won't be perfectly suited for legacy software. Software-wise, the situation resembles the early days of Intel's Pentium 4. Furthermore, rumours indicate that there are some things to be fixed (think of the Linux kernel patch to avoid unnecessary cache line thrashing in the instruction cache).
