The problem with HTT was that letting two threads loose concurrently increased cache contention, everything else (including cache size) being equal, and thereby slowed down memory accesses.
This meant that two threads would, more likely than not, run slower on an HTT CPU than on the previous-generation CPU with just multiple instruction issue.
This was not what the HW guys had intended, but it turned out that HTT required software to be cache-impact aware, which it isn't, because very few programmers are.
That may be why many game engines force a sequential structure on the critical real-time parts of the application.
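To make the cache contention argument a bit more concrete, here is a toy simulation. It is only a sketch under loud assumptions: the cache is modelled as a single fully associative LRU cache, and the capacity, working-set size and the helper names (`hit_rate`, `stream`) are made up for illustration, not taken from any real CPU. Each thread's working set fits in the cache on its own, but the two together do not.

```python
from collections import OrderedDict

def hit_rate(accesses, capacity):
    """Fraction of accesses that hit in a fully associative LRU cache (toy model)."""
    cache = OrderedDict()
    hits = 0
    for line in accesses:
        if line in cache:
            hits += 1
            cache.move_to_end(line)        # mark as most recently used
        else:
            cache[line] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used line
    return hits / len(accesses)

def stream(thread_id, working_set, rounds):
    """One thread looping over its own private working set, round after round."""
    return [(thread_id, i) for _ in range(rounds) for i in range(working_set)]

# Hypothetical sizes: each working set fits alone (48 <= 64), both together do not (96 > 64).
CAPACITY = 64
WORKING_SET = 48
ROUNDS = 100

a = stream(0, WORKING_SET, ROUNDS)
b = stream(1, WORKING_SET, ROUNDS)

# One thread has the cache to itself.
alone_rate = hit_rate(a, CAPACITY)

# Two threads share the same cache, with their accesses interleaved one by one.
shared = [line for pair in zip(a, b) for line in pair]
shared_rate = hit_rate(shared, CAPACITY)

print(f"alone:  {alone_rate:.2f}")
print(f"shared: {shared_rate:.2f}")
```

In this model the lone thread only misses during its first pass, while the two interleaved threads keep evicting each other's lines and never get a hit. Real caches are set-associative and real access patterns are far less regular, so the effect is much less extreme in practice, but the direction is the same.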
It would be interesting to gather some statistics from Trainz users about CPU architecture and CMP performance.
nismit