The "New" Intel Architecture 64
Re-hash for more Cash?


Introduction

So, what do we know about this new-fangled Intel 64 (code-name Merced) Architecture? Very little that isn't extremely vague and nebulous. A few terms like EPIC, predication, RISC and Explicit Parallelism are being muttered quietly, but there seem to be few concrete facts available. Apparently the new Merced family "will extend the Intel Architecture with new levels of performance and features for servers and workstations". Prices for a Merced chip with 2 or 4 MB of secondary cache (gulp -- I remember a time when people were impressed if you had 1MB of RAM in your computer) have been estimated at around $5000, so your average Joe in the street isn't going to have one in his desktop computer (which he basically uses as a glorified games machine anyway) for quite some time after they hit the market. There will be a slightly cheaper, less powerful version at $1000 with the secondary cache separate from the CPU (which halves the speed of cached-data delivery), if you really feel the need to keep up with the Joneses.

We do know that this is going to be a full 64-bit architecture throughout (apart from the wee section reserved for backwards IA-32 compatibility), but still compatible with existing IA-32 software. Ron Curry, a marketing man at Intel, has said that this will not be achieved by putting a separate wee Pentium II section on the chip, but by using the 64-bit components to execute IA-32 instructions with some sort of hardware "translation mechanism". It sounds a bit like the way RISC-based chips from AMD implement the IA-32 instruction set.

We also know that Hewlett-Packard are involved with the design and are attempting to make the architecture compatible with PA-RISC (one of their own particular babies) too. This wondrous new chip will be "produced on Intel’s 0.18 micron process technology, which is currently under development". An expensive high-end Pentium II would be manufactured on a 0.25 micron process right now, so that's a considerable and rather impressive size reduction (features only 72% the size of the current state of the art at Intel). It all sounds wonderfully impressive.

Is it going to be the best thing since sliced bread or are we just going to be swearing at 64-bit instead of 32-bit machines in the future? (That is a rather silly question, as it partly depends on who writes the operating system it runs with, and we've all sworn at our PCs thanks to a certain OS which will not be named by me for fear of being hexed and precipitating a crash on this machine, which is running under it right now -- BTW it did crash anyway. Thankfully, it seems that there is going to be a Unix-style operating system for this powerful beast, developed by Hewlett-Packard and SCO and called Summit 3D.) The Merced family of chips is "scheduled for production" in 1999 -- just in time to greet the Millennium, in fact. Will we be making jokes about it as the British public have done about the infamous Millennium Dome?

OK Then, Enough Nonsense, Where Will This Performance Come From?

Well, for one thing Intel are talking about a RISC-style (i.e. Relegate the Important Stuff to the Compiler) instruction set, made up of fast, simple instructions, with the compiler left to optimise the code so that it runs best under the particular architecture. It may, for example, rearrange instructions to minimise the effect of any possible delays.
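Just to make the idea concrete, here is a wee sketch (plain C, purely illustrative -- it has nothing to do with Intel's actual compiler) of the sort of rearrangement meant here. In the naive ordering the multiply needs the freshly loaded value straight away, so the pipeline would sit and wait; in the rescheduled ordering an independent calculation is slotted in between, hiding the load's latency.

    int naive(const int *p, int a, int b)
    {
        int x = *p;        /* load from memory...                   */
        int y = x * 3;     /* ...and use it on the very next line   */
        int z = a + b;     /* independent work done too late        */
        return y + z;
    }

    int rescheduled(const int *p, int a, int b)
    {
        int x = *p;        /* start the load first                  */
        int z = a + b;     /* do independent work while it arrives  */
        int y = x * 3;     /* use the loaded value once it's ready  */
        return y + z;
    }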

The buzzword at the moment with Intel-64 is EPIC (or Explicitly Parallel Instruction Computing), which they claim is the next-generation technology after CISC, RISC etc. Instructions sent to the processor will be explicitly parallel. The compiler will take a program and generate machine code which explicitly shows its parallelism. This allows pipelining and multiprocessor systems to be exploited to their full potential, letting more instructions be executed at the same time and making the system run faster.

Another buzzword with this architecture is scalability -- keeping the same basic design across different implementations, but using functional blocks of different performance to give chips at different price/performance levels. This is to be a family of chips.

So what are the features of this new processor family?

High Number of Registers
We're talking 128 General Purpose Registers and 128 Floating Point Registers. It's much faster to operate upon operands already in the CPU than to have to fetch them from memory. A large number of registers also means that one can perform more optimisations, such as "un-rolling" short loops, which is made possible by keeping copies of the local variables for each iteration in registers (see the sketch below). There are also 64 Predicate Registers; these will be discussed more fully in the Predication section. Having loads of registers reduces the risk of coming to a stop because you've run out of registers to put things in.
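To give a rough (and purely illustrative) idea of what that un-rolling looks like, here is the transformation written out in plain C rather than IA-64 code. With 128 registers to play with, the compiler can afford to keep a separate running total for each copy of the loop body, so the copies don't fight over the same register and can execute in parallel.

    int sum_rolled(const int *a, int n)       /* n assumed to be a multiple of 4 */
    {
        int sum = 0;
        for (int i = 0; i < n; i++)
            sum += a[i];
        return sum;
    }

    int sum_unrolled(const int *a, int n)     /* n assumed to be a multiple of 4 */
    {
        int s0 = 0, s1 = 0, s2 = 0, s3 = 0;   /* four partial sums, each in its own register */
        for (int i = 0; i < n; i += 4) {
            s0 += a[i];                       /* the four additions are independent of  */
            s1 += a[i + 1];                   /* one another, so a wide CPU can work on */
            s2 += a[i + 2];                   /* them at the same time                  */
            s3 += a[i + 3];
        }
        return s0 + s1 + s2 + s3;
    }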
Instruction Format
The instruction format is the major feature of EPIC. Instructions are going to be issued to the processor in "bundles" of 3 instructions, each 40 bits long, plus an 8-bit template giving information about their relative dependencies and parallelism. Each instruction will contain:
  • Opcode
  • Predicate Register (6 bits)
  • Source 1 (7 bits)
  • Source 2 (7 bits)
  • Destination Register (7 bits)
  • Opcode Extension/Branch Target/Misc ?
This seems to suggest a fixed width instruction architecture, which is a common feature of RISC-style architectures.

It is interesting to note that the sources and destination of an instruction are all registers (nice little load-store architecture). This does away with Intel's previous complicated set of addressing modes.
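For what it's worth, here is a back-of-an-envelope sketch in C of the rumoured bundle layout, using nothing but the field widths listed above. No width has been given for the opcode or the extension field, so splitting the remaining 13 bits as 6 and 7 is pure guesswork on our part, and a real compiler would of course pack the bits itself rather than trust a C struct's layout.

    #include <stdint.h>

    struct ia64_instruction {            /* 40 bits of fields in total                    */
        unsigned opcode    : 6;          /* guessed width -- not given in any source      */
        unsigned predicate : 6;          /* which of the 64 predicate registers applies   */
        unsigned src1      : 7;          /* first source register (one of 128)            */
        unsigned src2      : 7;          /* second source register                        */
        unsigned dest      : 7;          /* destination register                          */
        unsigned extension : 7;          /* opcode extension / branch target / misc       */
    };

    struct ia64_bundle {                 /* 3 x 40 + 8 = 128 bits                         */
        struct ia64_instruction slot[3]; /* three instructions issued together            */
        uint8_t tmpl;                    /* 8-bit template: dependency / parallelism info */
    };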
Special Multimedia Instructions
Rumour has it that IA64 will have special multimedia instructions to enhance the speed of numerical operations upon graphics and sound à la MMX.
Wide CPU
There will be many execution units within the CPU, so that more than one instruction can be executed at once, speeding up throughput.
Superscalar
That is, more than one instruction is issued to the pipeline at once, meaning that the rate at which instructions are completed can be faster than one per cycle.
Out of Order Execution
As I've explained earlier, the compiler is able to rearrange the order of instructions to optimise parallelism and reduce any stalls in the pipeline etc. This gives the compiler writers a lot of responsibility for system performance.
Predication
As explained above, the IA-64 architecture will have 64 Predicate Registers, each 1 bit wide and holding a value of TRUE or FALSE. As you can see from the instruction format above, there is a predicate register field in most instructions which specifies the predicate register attached to that instruction. If the predicate register holds TRUE, the decoding and executing of the instruction goes ahead; if its predicate is FALSE, the instruction is discarded and replaced with a NOP. This is used in conditional branching and allows both branches to be fetched from memory and put in the pipeline. Each branch will be assigned its own predicate. If the condition for the branch holds, its predicate will be set to TRUE; if the branch goes the other way, it's set to FALSE. We can see that branches not taken therefore get transformed into a series of NOPs in the pipeline. Predication gives the compiler more freedom to rearrange instructions for optimum performance, as it is in charge of assigning predicates and of the compare instructions that set them. It should speed up the execution of programs, as a few NOPs will execute faster than stalling the pipeline until it's known which branch is to be fetched and executed.
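Here's a tiny illustration (ordinary C, not IA-64 assembly) of what predication does to an if/else. In the branchy version the CPU has to guess which arm to fetch; in the predicated version the compiler turns both arms into instructions tagged with opposite predicates, both go down the pipeline, and the "wrong" arm simply becomes NOPs.

    int abs_branchy(int x)
    {
        if (x < 0)                 /* conditional branch: fetch one arm or the other */
            return -x;
        else
            return x;
    }

    int abs_predicated(int x)
    {
        int p = (x < 0);           /* compare instruction sets the predicate         */
        int neg = -x;              /* "then" arm, conceptually tagged with  p        */
        int pos =  x;              /* "else" arm, conceptually tagged with !p        */
        return p ? neg : pos;      /* both arms were executed; keep the right result */
    }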

This is as opposed to branch prediction, where the compiler guesses which way the branch is likely to go and fetches instructions for that branch only. That's fine if the compiler guesses right, but carries a rather high penalty if it's wrong.

Apparently, IA64 is only going to use predication in a loop if circumstances are suitable and likely to reduce the misprediction penalty, and to use more traditional branch prediction when they aren't.
Speculation
This is a method of minimising unnecessary stalls. Basically, data is loaded into registers in the hope that it will be needed soon. This means that when the data is used, it doesn't have to be loaded into a register there and then, thus saving any possible delay in loading from memory at that time. An extra bit has to be added to each register because of this, to specify whether the data in it is valid or not. The compiler controls the speculation, giving it much more flexibility to control the run-time efficiency of programs.
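A hand-waving sketch in plain C of what that buys you: in the first function the load can't even start until the test has been resolved, so its latency sits right on the critical path; in the second the load has been hoisted above the test and the value is already sitting in a register by the time it's wanted. (We're assuming p always points at readable memory here; on the real chip the register's extra "valid" bit is what covers the case where the early load goes wrong.)

    int maybe_get(const int *p, int have_data)
    {
        int v = 0;
        if (have_data)
            v = *p;                /* load only starts once the branch resolves      */
        return v;
    }

    int maybe_get_speculative(const int *p, int have_data)
    {
        int early = *p;            /* load hoisted above the test, issued early      */
        int v = 0;
        if (have_data)
            v = early;             /* the value is already in a register when needed */
        return v;
    }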

Is This Actually a New and Revolutionary Architecture Then?

Well, not really, no. All these ideas have been around for a while and have already been implemented on different chips. Intel are not really doing anything new. The main point is that Intel, being the market leader in chips for desktop machines, have now adopted these principles, after years of sticking to their old-fashioned Complex Instruction Set architecture. It's interesting that Intel are now taking the RISC-style approach of their competitors.

Many of the features of the chip are typical RISC things, e.g. fixed-length instructions, relying on the compiler to implement complex functionality, a load-store architecture with a high number of registers, and general-purpose registers as opposed to those nasty dedicated registers of the x86 family (programming in x86 assembly is a nightmare -- give me 68000 assembly any day). The thing is, Intel do not like to call it a RISC-style architecture for some reason.

They don't like the term LIW (Long Instruction Word) architecture either, which is why they came up with the acronym EPIC to describe their instruction set architecture. Apparently, they don't like LIW because it has "negative connotations". The thing is, EPIC does have features of LIW, such as bundling instructions together when sending them to the processor. I have a sneaking suspicion that Intel just like to rename the principles behind their architecture so it sounds like it was all their own idea and something new and innovative. OK, I'm a cynical little soul, but Gothy-metallers are like that.

There are other LIW architectures out there, such as the Alpha 21264, which has an LIW of 4 instructions (compared to IA64's 3 -- then again, it all depends on what the execution of 1 instruction actually achieves) issued to the pipeline per clock cycle, as does HP's PA-8000. The PA-8000 also has "advanced multimedia features", which the IA-64 is reported to have. The 21264 also has "specially developed" Motion Video Instructions (MVI) which allow the compression of MPEG2 video and Dolby AC3 audio. Wow, Intel had better be working hard to top that.

Out-of-order execution is no new idea. Current 32-bit chips, such as Pentium IIs, already implement it. They also already implement speculation.

Many of the ideas behind the new IA64 architecture seem to be inherited from Hewlett-Packard's super-scalar PA-RISC architecture, which already has OOO execution, speculative execution, a "wide" CPU with 10 execution units in the PA-8000, and special multimedia instructions (as I've said above). The only feature of the IA64 architecture really different from current 64- and 32-bit architectures is the concept of predication as opposed to branch prediction. As this concept could seriously reduce branch mispredictions if used well by the compiler, it could give the IA64 an edge over today's 64-bit chips.

Conclusion

The really cool thing about IA64 is that it takes lots of ideas which increase the efficiency of the chip's performance, and puts them all together on a chip manufactured with the latest manufacturing technology, with lots of functional units to give a "superwide" CPU. They are re-designing for optimisation and cranking up the technology at the same time, instead of their usual mucking around with the existing architecture design, making it more and more complex and then having to rely on improving the underlying technology for performance. This time we're getting both style and speed (apart from all the wasteful decoding circuitry to make it IA32 compatible).

Like the ingredients of a cake (flour, eggs, butter, sugar) each component idea in IA64 isn't particularly exciting or interesting, but put them together and you get a rather delicious cake (or a rather powerful and very interesting microprocessor -- hopefully).

Many of the features of this "new" architecture are already available in existing CPUs. If we have to wait until late 1999 for Merced, it had better have incredibly impressive performance compared to the high-end chips already on the market.

 


Disclaimer: All opinions on this page are merely those of three weird-looking long-haired overworked CES students who probably know more about Black Sabbath's discography, nail-painting technique and how to deal with troublesome neds (wear weird clothes, fishnets, chains, handcuffs and makeup and they're more scared of you than you are of them) than they know about CPU Architecture and designing it.

We are proud not to claim any of Intel's trademarks as anything to do with us. We are sorry if we upset anyone. We're just trying to pass a class. Please don't sue us -- we have no money anyway.

Relevant Links