It's acronym overkill.
*******
Hyperthreading is like your CPU having two banks of registers. The
registers keep track of a thread of execution. (For example, one
of the registers would be the Program Counter or PC, and that
has the address of the currently executing instruction.)
Now, in this simplified picture, only one of those two banks of registers is in use at a time.
Say the first bank is set up to run Firefox, and the second bank
is set up for Windows Media Player (WMP). We're using the
Firefox bank of registers. Then, the processor makes an attempt
to read a cache-line-sized chunk of info from main memory. The processor
is now "stalled", waiting for the memory fetch to complete.
So while it is twiddling its thumbs, the core switches to the second bank
of registers, the ones for WMP. It works on the WMP problem for a bit
of time. Now, a few microseconds later, WMP attempts to fetch something
from memory, and main memory is slow to come back. The bank flipper goes
the other way, and the Firefox thread can now run. And since the memory
fetch for Firefox has been sitting there for a microsecond or so,
we know that the Firefox thread can do a bit of work.
There is still only one processor core in this case, but by using the
two sets of registers, we can squeeze around 10% more performance from
the CPU. So compared with a 4 core, 4 thread (non-HT) processor,
the equivalent 4 core, 8 thread (HT) processor can be perhaps 10%
faster. Even though Task Manager might show eight graphs, the work
behind each graph is limited by the fact that pairs of them are really
sharing an actual core, and are not completely independent cores.
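As a rough illustration of the bank-flipping idea above, here is a toy Python model. It is only a sketch: the function name simulate, the 90 cycles of work between memory fetches, and the 10 cycle stall are made-up numbers of mine, picked so the gain lands in the same ballpark as the 10% figure. A real core is far more complicated than this.

```python
def simulate(num_banks, work=90, stall=10, total_cycles=100_000):
    """Toy model: return the fraction of cycles the core does useful work.

    Each register bank holds one thread. A thread computes for `work`
    cycles, then waits `stall` cycles on a memory fetch. All numbers are
    hypothetical, chosen only for illustration.
    """
    work_left = [work] * num_banks   # cycles until the next memory fetch
    stall_left = [0] * num_banks     # cycles until an outstanding fetch returns
    current = 0                      # bank the core is using right now
    busy = 0

    for _ in range(total_cycles):
        if stall_left[current] > 0:
            # Current thread is waiting on memory: flip to a ready bank, if any.
            for b in range(num_banks):
                if stall_left[b] == 0:
                    current = b
                    break
        can_run = stall_left[current] == 0

        # Outstanding memory fetches make progress every cycle.
        for b in range(num_banks):
            if stall_left[b] > 0:
                stall_left[b] -= 1

        if can_run:
            busy += 1
            work_left[current] -= 1
            if work_left[current] == 0:
                # Thread issues a memory fetch and stalls.
                work_left[current] = work
                stall_left[current] = stall

    return busy / total_cycles


one_bank = simulate(1)
two_banks = simulate(2)
print(f"one bank:  core busy {one_bank:.0%} of the time")
print(f"two banks: core busy {two_banks:.0%} of the time")
print(f"gain from the second bank: {two_banks / one_bank - 1:.0%}")
```

With one bank the core sits idle during every fetch. With two banks the second thread fills those gaps, which is where the modest gain comes from (roughly 11% with these made-up numbers).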
*******
The latest AMD processors have taken to telling their own kind of lies.
They are advertised as "eight core" machines, but in fact there
is some sharing going on. If you look at a die shot, you can see
there are four cores present.
AMD sorta does Hyperthreading, but instead of a "lightweight" pair
of register banks, AMD duplicates functional units. So more of the
core contains duplicated hardware. But as far as I know, there is only
one fetch and decode stage per big core. AMD has no interest in
calling this Hyperthreading, but the effect is similar: Task Manager
may show eight graphs, yet again the graphs are not for perfectly
independent things. AMD does get some speedup from this scheme,
but it's not a doubling or anything. So when you see them advertise
eight cores or six cores, think four cores or three cores instead.
*******
Even the Task Manager graphs aren't that useful, in the sense
that the "Firefox" and "WMP" threads in my example above are bouncing
from core to core, many times per second. The OS may make some
attempt to "load balance" the cores, but not in any precise sense.
The graphs then, might not be that meaningful, except if you
add all the contributions together and see that X percent
of the entire thing is in use.
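Adding the contributions together is just an average over the graphs. A minimal Python sketch, where the function name total_utilization and the eight readings are hypothetical values I made up for a 4C-8T machine:

```python
def total_utilization(per_cpu_percent):
    """Combine per-graph load figures into one overall percentage."""
    if not per_cpu_percent:
        raise ValueError("need at least one logical CPU reading")
    return sum(per_cpu_percent) / len(per_cpu_percent)


# Hypothetical readings from eight Task Manager graphs, in percent.
graphs = [40, 5, 10, 25, 0, 15, 5, 20]
print(f"{total_utilization(graphs):.1f}% of the entire CPU is in use")
```

Which graph shows the 40% can change from moment to moment as threads bounce around, but the combined figure stays meaningful.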
The diagram below is a four-core, eight-thread (4C-8T) Intel processor.
PC#1 and PC#2 stand for the two register banks of each core.
All of the cores sit inside one IC package, and plug into
a single big socket on the motherboard. The cores all "feed"
off the dual channel memory DIMMs, in my example. They have
to take turns to get at the memory. The cache hierarchy (L1-L2-L3)
helps answer most of the requests, but occasionally a real
memory request needs actual info from the DIMMs, and that
tends to slow things down.
+--------- CPU socket --------+
|                             |---- Dual Channel
|   PC#1 PC#2     PC#1 PC#2   |
|    |    |        |    |     |---- Memory DIMMs
|  +---------+   +---------+  |
|  | Core #1 |   | Core #2 |  |
|  +---------+   +---------+  |
|                             |
|   PC#1 PC#2     PC#1 PC#2   |
|    |    |        |    |     |
|  +---------+   +---------+  |
|  | Core #3 |   | Core #4 |  |
|  +---------+   +---------+  |
|                             |
+-----------------------------+
When you get six cores in there, as in LGA2011, there is a
bit of "starvation" going on. You'd think a 6C-12T processor
would be 50% faster than a 4C-8T processor, but in fact
it's only about 35% faster. And this is one of the snags
in this core nonsense: eventually, you hit diminishing
returns. The bottleneck is in the cache
hierarchy and memory ("dual channel") part. On LGA2011,
they use quad channels, but even that wasn't enough to
prevent starvation, for whatever reason. I think something
similar was seen on the AMD six core (Phenom), although its
internal organization is a bit different. That one was
six "real" cores.
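To put a number on the 4C-8T versus 6C-12T comparison above, here is a small Python sketch. The function name scaling_efficiency is my own, and the 1.35 speedup is just the rough "about 35% faster" figure from above, not a measured benchmark.

```python
def scaling_efficiency(cores_before, cores_after, measured_speedup):
    """Return measured speedup as a fraction of the ideal (linear) speedup."""
    ideal_speedup = cores_after / cores_before
    return measured_speedup / ideal_speedup


# Going from 4 to 6 cores would ideally give 1.5x; the rough figure is 1.35x.
eff = scaling_efficiency(4, 6, 1.35)
print(f"ideal speedup:      {6 / 4:.2f}x")
print(f"scaling efficiency: {eff:.0%}")
```

So by this rough figure, about a tenth of what the extra cores could ideally deliver is lost to starvation.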
There are processors with more cores. AMD has placed two
silicon dies in a single G34 MCM (fits into a single socket),
and it has a total of 16 (really 8) cores. And it doesn't
seem to scale that nicely either. The two silicon dies talk to
each other over a high speed bus inside the MCM.
I've never seen any benchmarks for Larrabee, which is the
"king of cores". At least they're not selling that for
desktops.
(Example of the Larrabee approach.)
http://download.intel.com/pressroom/images/Aubrey_Isle_die.jpg
http://en.wikipedia.org/wiki/Intel_Larrabee
Apparently, some form of Larrabee lives on inside a
supercomputer somewhere. I don't know the details.
HTH,
Paul