An anonymous poster posted the following link to a source code file from the Open64 compiler in Real World Tech's forum:

So far, the file reveals some details about Bulldozer's cache sizes and associativities (Orochi being the codename of the Bulldozer-based die):

    case TARGET_orochi:
      L[0] = MHD_LEVEL(MHD_TYPE_CACHE,    // Type
                       16*1024,           // Size
                       64,                // Line Size
                       18,                // Clean Miss Penalty
                       18,                // Dirty Miss Penalty
                       4,                 // Associativity
                       ...

    case TARGET_orochi:
      // TODO: this might be too generous: in multiple processor situations,
      // there is a cost to loading the shared bus/memory.
      L[1] = MHD_LEVEL(MHD_TYPE_CACHE,
                       2*1024*1024,       // cache size
                       64,                // cache line size
                       150, 200,          // ?
                       16,                // associativity
                       ...
      break;

So it looks like each core in a Bulldozer module will have a 4-way set-associative 16 kB L1 data cache, while the module itself might contain a shared 2 MB L2 cache with 16-way set associativity, as known from current designs. The L1 miss penalty of 18 cycles indicates a higher L2 latency than in current designs, about 18 cycles.

The small L1 cache reminds me of Prescott's small L1 data cache, which had the same size but twice the associativity. This fact, along with many other indications, leads me to believe that Bulldozer will be a very different design, one where the designers might have traded area and static power consumption for higher dynamic power, the consequence of targeting shorter clock cycle times as a design goal. More on that later.