Intel presented details about its upcoming microarchitecture including the changes related to AVX.
You can find the relevant presentation slides here:
For some condensed information and related discussion about the changes, have a look at these two forum threads:
Most of the posts are worth reading, especially Mark Buxton's responses.
What we know so far: the AVX execution units have been widened to 256 bits, and the L1D bandwidth has changed significantly. The slides also present some example speedups for standard codes and loops.
Regarding Bulldozer, I think it is likely that we'll see 128 bit wide execution of 256 bit wide instructions in the first iteration of the core. Although I've included full 256 bit (or 4x64 bit) wide FP units in my core diagram, there is not much to back this decision. The possibly related reconfigurable FPU patents mention different variants of instruction splitting, e.g. splitting a 256 bit instruction into two 128 bit operations or eight 32 bit operations. There is, however, one reference to a full bit width mode of 256 bits in patent no. 7,565,513, but the way it is mentioned there doesn't make me feel sure about it. I will continue to dig into this. Maybe AMD has found a better way of doing energy efficient FP calculations while still delivering enough throughput despite a 128 bit width. Could one higher clocked 128 bit FPU be more power efficient than a lower clocked 256 bit wide FPU?
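To make the splitting variants a bit more concrete, here is a toy Python model. It is only a sketch and not a description of how Bulldozer or any real FPU works: the function names `split_add` and `peak_lanes_per_second`, the 32 bit lane size, and the assumption of one fully pipelined unit issuing one operation per cycle are all mine. It shows that a 256 bit add gives the same result whether it runs as one 256 bit operation, two 128 bit operations, or eight 32 bit operations, and that a 128 bit unit has to run at twice the clock of a 256 bit unit to reach the same peak throughput.

```python
# Toy model only: lane counts follow from the bit widths; the names,
# the 32 bit lane size and the one-op-per-cycle pipeline are assumptions.

LANE_BITS = 32  # assumed single-precision lanes


def split_add(a, b, unit_bits=128, lane_bits=LANE_BITS):
    """Add two equally long vectors in chunks that fit a unit of unit_bits.

    Returns (result, operations), where operations counts how many
    unit-wide operations were needed to cover the whole vector.
    """
    if len(a) != len(b):
        raise ValueError("operands must have the same number of lanes")
    lanes_per_op = unit_bits // lane_bits
    if lanes_per_op < 1:
        raise ValueError("unit must be at least one lane wide")

    result = []
    operations = 0
    for start in range(0, len(a), lanes_per_op):
        chunk_a = a[start:start + lanes_per_op]
        chunk_b = b[start:start + lanes_per_op]
        # Each chunk stands for one operation on the narrower unit.
        result.extend(x + y for x, y in zip(chunk_a, chunk_b))
        operations += 1
    return result, operations


def peak_lanes_per_second(unit_bits, clock_hz, lane_bits=LANE_BITS):
    """Peak lane results per second, assuming one operation per cycle."""
    return (unit_bits // lane_bits) * clock_hz
```

With eight 32 bit lanes per operand (one 256 bit vector), `split_add(a, b, unit_bits=128)` reports two operations and `split_add(a, b, unit_bits=32)` reports eight, matching the two splitting variants named in the patents. Peak throughput alone says nothing about power, though: whether the faster, narrower unit wins depends on the voltage needed to reach the higher clock, which this sketch does not model.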