(Did you know that your posts take the form of a text attachment?)
Richard said:
There seem to be several misconceptions in this thread...
I've been thinking that!
* 64-bit operating systems do not inherently run applications 10% slower
than 32-bit operating systems.
No, that figure of 10% was puzzling me too. If run on purely 32-bit
_hardware_, a 64-bit OS would have to do two fetches per instruction,
but that would only make it slower (and then by something like 50%, not
10%) if the code wasn't optimised for 64-bit. (Which a lot of it isn't;
on the whole, I wouldn't run a 64-bit OS on 32-bit hardware.)
* A 32-bit operating system need not be limited to 4GB of RAM, though in
practice some are. Still, a 32-bit OS on 64-bit hardware would be a
bizarre choice for most use cases today.
Mostly agreed. If the 32-bit OS had _knowledge_ of the 64-bit hardware,
and arranged its memory usage and the like in aligned pairs, it might
not lose much, but it would still be an odd choice. (Though I know of
one piece of software that _won't work_ with 64-bit Windows [XP, Vista,
7, or 8] - because it is closely tied to the Explorer shell - but will
with the 32-bit versions. [Right up to the pre-releases it _would_ work
on the 64-bit versions, since MS still shipped the 32-bit shell; only in
the final release of 7 did they break access to that 32-bit shell.] Of
the people who want to use that software under 7, about half run 32-bit
7 and the rest run it in a virtual machine, usually under XP, where it's
a bit happier anyway. There are probably other 32-bit apps - and
certainly 16-bit and older ones - that won't run on a 64-bit system.)
* A single 32-bit application can only address a maximum of 4GB of
virtual memory (even if running under a 64-bit OS). A given OS may
impose lower limits.
Well, 2^32 is 4G, so it can't address more than that with single-word
addresses; but that never stopped 8-bit processors from addressing 64K
(via two-byte addresses) rather than only 256 locations. However, you
are right that most 32-bit apps don't use multi-word addresses, at least
in part because neither the processor hardware nor the OSs of the day
considered multi-word addresses necessary when 32 bits came in - 4G
seemed big enough at the time (unlike 256 in the 8-bit days).
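To put numbers on those powers of two, here's a trivial C sketch
(nothing in it is from the thread; it's just the arithmetic):

#include <stdio.h>

int main(void)
{
    /* An N-bit single-word address reaches 2^N locations. */
    printf("pointer width on this build: %u bits\n",
           (unsigned)(sizeof(void *) * 8));
    printf("16-bit addresses (8-bit era): %llu bytes (64K)\n", 1ULL << 16);
    printf("32-bit addresses:             %llu bytes (4G)\n", 1ULL << 32);
    /* 1ULL << 64 would overflow unsigned long long, so just state it. */
    printf("64-bit addresses:             2^64 bytes (16 exabytes)\n");
    return 0;
}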
* 32-bit code does not "waste" half of every memory fetch on a 64-bit
system. Modern computers are replete with buffers and caches which
exploit locality of reference for code and data fetches. Even the
original Pentium (P5) had a 64-bit data bus, despite its 32-bit
instruction set.
Unless these caches etc. are only 32 bits wide, though, I don't see how
the fetches _aren't_ wasting capacity - _unless_ there is
packing/unpacking hardware at each interface. (And in terms of
instructions, the processor would also have to be able to process them
in pairs, which isn't always possible when one instruction depends on
the result of the previous one.)
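Here's a small C sketch of that dependency point (my own illustration,
not from the thread - whether a given core actually pairs the
independent version is down to the hardware):

#include <stdint.h>
#include <stddef.h>

/* Serial chain: each addition needs the previous result, so the
   core cannot overlap them, however wide its fetch hardware. */
uint32_t sum_serial(const uint32_t *a, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Two independent chains: adjacent additions no longer depend on
   each other, so a superscalar core is free to pair them up. */
uint32_t sum_paired(const uint32_t *a, size_t n)
{
    uint32_t s0 = 0, s1 = 0;
    for (size_t i = 0; i + 1 < n; i += 2) {
        s0 += a[i];
        s1 += a[i + 1];
    }
    if (n & 1)
        s0 += a[n - 1];
    return s0 + s1;
}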
On Intel/AMD hardware, the 64-bit instruction set brings a couple of
performance advantages:
* There are twice as many general-purpose registers. That substantially
reduces the chance that a performance-critical fragment of code will
need to spill values to memory (which is still slower than using
registers despite the extensive caching mentioned above).
Yes, that is certainly an advantage. Having twice as many registers
doesn't of itself fall out of the use of 64 rather than 32 bits, but I
am not surprised that Intel/AMD decided to provide it at the same time.
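To show what "spilling" means in code terms, here's a made-up C fragment
with more simultaneously live values than the eight general-purpose
registers of 32-bit x86 (whether a compiler actually spills is, of
course, up to the compiler):

/* Eight accumulators, plus i, p and n, are all live across the
   loop body - more than a 32-bit x86 register file holds, so some
   get parked on the stack; with sixteen registers, usually none do. */
unsigned mix(const unsigned *p, unsigned n)
{
    unsigned a = 1, b = 2, c = 3, d = 5, e = 7, f = 11, g = 13, h = 17;
    for (unsigned i = 0; i < n; i++) {
        a += p[i];  b ^= a;  c += b;  d ^= c;
        e += d;     f ^= e;  g += f;  h ^= g;
    }
    return a ^ b ^ c ^ d ^ e ^ f ^ g ^ h;
}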
* The general-purpose registers are (obviously!) 64 bits wide instead of
32 bits. Whether this is an advantage in practice depends on the code
in question.
Indeed. Unlikely to be that significant for maths operations except in
extreme cases, but for parallel processing of shorter values - assuming
there is the necessary packing/unpacking hardware (and the code knows
about it) - there is a potential halving of the work.
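That packing trick can even be done in plain software, with no special
hardware at all - the so-called SWAR style (SIMD within a register). A
minimal C sketch of my own, assuming 64-bit unsigned arithmetic: eight
byte-wide additions in one register-wide operation, with masking so the
carries can't bleed between bytes.

#include <stdio.h>
#include <stdint.h>

/* Add eight packed bytes to eight packed bytes in one 64-bit
   operation; H masks off the top bit of each byte so that no
   carry can propagate across a byte boundary. */
static uint64_t add8x8(uint64_t x, uint64_t y)
{
    const uint64_t H = 0x8080808080808080ULL;
    return ((x & ~H) + (y & ~H)) ^ ((x ^ y) & H);
}

int main(void)
{
    uint64_t a = 0x0807060504030201ULL; /* bytes 1..8 */
    uint64_t b = 0x1010101010101010ULL; /* 0x10 added to each */
    printf("%016llx\n", (unsigned long long)add8x8(a, b));
    /* prints 1817161514131211 - each byte summed independently */
    return 0;
}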
Furthermore, ABI designers have taken advantage of the instruction set
change to introduce more efficient function call mechanisms
(i.e. passing parameters in registers), which can also reduce the number
of memory accesses required for performance-critical code.
Indeed. (More of what I'm calling packing/unpacking hardware.)
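To make that concrete, take a trivial function like the one below (the
name is mine). Compiled for 32-bit x86 with the usual cdecl convention,
all four arguments are written to the stack at every call; compiled for
x86-64, the System V ABI hands them over in rdi, rsi, rdx and rcx, and
the Windows x64 convention in rcx, rdx, r8 and r9, so the arguments
needn't touch memory at all.

/* Stack-passed on 32-bit x86 (cdecl); register-passed under
   both of the common 64-bit calling conventions. */
long combine(long a, long b, long c, long d)
{
    return a + b * c - d;
}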
Finally: there *is* a performance regression that may affect some code.
Pointers (addresses) are twice as large, so code using pointer-heavy
data structures occupies more memory, resulting in less efficient use of
cache (including RAM, interpreted as cached virtual memory). The impact
will depend on the particular program; usually it will be much too small
to notice.
Agreed - certainly on the first part about the _theoretical_ penalty;
I'm not qualified to comment on the "usually", so I yield on that.
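To put a number on that pointer-doubling, one last C sketch (the sizes
shown are typical of mainstream compilers; padding is formally
implementation-defined):

#include <stdio.h>

/* A classic pointer-heavy node: typically 8 bytes on a 32-bit
   build, but 16 on a 64-bit one - the pointer doubles and the
   int is then padded - so the same list eats twice the cache. */
struct node {
    struct node *next;
    int          value;
};

int main(void)
{
    printf("sizeof(struct node) = %zu bytes\n", sizeof(struct node));
    return 0;
}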