An anonymous poster posted the following link to a source code file (related to the Open64 compiler) in Real World Tech's forum:
So far the file contains some details about Bulldozer's cache sizes and associativities:
case TARGET_orochi:
  L = MHD_LEVEL(MHD_TYPE_CACHE, // Type
                16*1024,        // Size
                64,             // Line Size
                18,             // Clean Miss Penalty
                18,             // Dirty Miss Penalty
                4,              // Associativity
                ...
  break;
// TODO: this might be too generous: in multiple processor situations,
// there is a cost to loading the shared bus/memory.
L = MHD_LEVEL(MHD_TYPE_CACHE,
// cache size
// cache line size
200, // ?
So it looks like each core in a Bulldozer module will have a 4-way set-associative 16 kB L1 data cache, and the module itself might contain a shared 2 MB L2 cache with 16-way set associativity, as known from current designs. The L1 miss penalty numbers indicate a higher L2 cache latency of 18 cycles.
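To make the L1 numbers a bit more concrete, here is a minimal sketch in C that derives the cache geometry (number of sets, offset bits, index bits) from the size, line size and associativity quoted above. The helper names `cache_num_sets` and `log2_pow2` are my own and hypothetical; they are not part of Open64, and the sketch assumes that all parameters are powers of two.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from Open64): number of sets in a
   set-associative cache = size / (line size * associativity). */
static size_t cache_num_sets(size_t size_bytes, size_t line_bytes, size_t ways)
{
    assert(line_bytes > 0 && ways > 0);
    return size_bytes / (line_bytes * ways);
}

/* Hypothetical helper: base-2 logarithm, assuming x is a power of two. */
static unsigned log2_pow2(size_t x)
{
    unsigned n = 0;
    while (x > 1) {
        x >>= 1;
        n++;
    }
    return n;
}
```

With the values from the file, `cache_num_sets(16*1024, 64, 4)` gives 64 sets, so an address would use `log2_pow2(64)` = 6 offset bits and another 6 index bits, and each of the four ways covers 4 kB. I left the L2 out on purpose, since its size and line size did not survive in the excerpt above.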
The small L1 cache reminds me of the small L1 caches of Prescott, which had the same size but twice the associativity. This fact and a lot of other indications lead me to believe that Bulldozer will be a very different design, where the designers might have saved area and static power consumption at the cost of higher dynamic power, caused by shorter clock cycle times as a design goal. More on that later.