Wikipedia: Computer architecture

Computer architecture refers to the theory behind the actual design of a computer. In the same way that a building architect sets the principles and goals of a building project as the basis for the draftsman's plans, a computer architect sets out the computer architecture as the basis for the actual design specifications.

There are two customary usages of the term:

The more academic usage refers to the computer's underlying language, its "instruction set". An architecture set out this way will include information such as whether the computer's processor can compute the product of two numbers without resorting to external software. It will also specify the precision of the computer's computations.

The less formal usage refers to a description of the gross requirements for the various parts of a computer, especially speeds and interconnection requirements.

Goals

The most common goals of a Computer Architecture include:

1. Cost

Generally, cost is held constant, determined by either system or commercial requirements, and speed and storage capacity are adjusted to meet the cost target.

2. Performance (speed)

Computer retailers describe the performance of their machines in terms of CPU speed (in MHz or GHz). Strictly, this is the clock frequency: the number of clock cycles the Central Processing Unit (CPU) completes each second (in millions or billions respectively). The number of instructions executed per second also depends on how many clock cycles each instruction takes, so clock speed is only one of a number of factors that affect the performance of a machine.
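The gap between clock rate and instruction rate can be shown with a little arithmetic. In this rough sketch, `mips` is a hypothetical helper and the cycles-per-instruction (CPI) values are made-up figures for illustration. Instructions per second is the clock frequency divided by the average number of cycles each instruction takes.

```python
def mips(clock_hz: float, cycles_per_instruction: float) -> float:
    """Millions of instructions per second for a clock rate and an average CPI."""
    # Instructions per second = clock cycles per second / cycles per instruction.
    return clock_hz / cycles_per_instruction / 1e6


# Hypothetical machines sharing the same 1 GHz clock but differing in average CPI.
print(mips(1e9, 4.0))  # 250.0: each instruction takes four cycles on average
print(mips(1e9, 1.0))  # 1000.0: one instruction completes per cycle
print(mips(1e9, 0.5))  # 2000.0: two instructions complete per cycle
```

Two machines advertised at the same clock speed can therefore differ several-fold in the work they actually do.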

Throughput is the amount of work the computer system completes per unit of time. In most computer systems, throughput is limited by the speed of the slowest piece of hardware being used at a given time. This may be the input and output (I/O), the CPU, the memory chips themselves, or the connection (or "bus") between the memory, the CPU and the I/O. The limiting factor most acceptable to users is the speed of their own input, because the computer then seems infinitely fast. General-purpose computers like PCs usually maximize throughput in an attempt to increase user satisfaction.
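The "slowest piece of hardware" rule can be sketched as a minimum over component rates. This simplified model assumes that every unit of work passes through each component in turn. The helper name and the rates are hypothetical.

```python
def bottleneck(rates_mb_per_s: dict[str, float]) -> tuple[str, float]:
    """Return the slowest component and its rate, which caps overall throughput."""
    # Every unit of work passes through each component, so the minimum rate wins.
    slowest = min(rates_mb_per_s, key=rates_mb_per_s.get)
    return slowest, rates_mb_per_s[slowest]


# Hypothetical component rates (megabytes per second) for one workload.
rates = {"I/O": 50.0, "bus": 400.0, "memory": 1600.0, "CPU": 800.0}
print(bottleneck(rates))  # ('I/O', 50.0)

# Doubling CPU speed leaves system throughput unchanged while I/O is the limit.
faster_cpu = {**rates, "CPU": 1600.0}
print(bottleneck(faster_cpu))  # ('I/O', 50.0)
```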

"Interrupt latency" is the guaranteed maximum response time of the software to an event such as the click of a mouse or the reception of data by a modem. This number is affected by a very wide range of design choices. Computers that control machinery usually need low interrupt latencies, because the machine cannot, will not or should not wait. For example, computer-controlled anti-lock brakes should not wait for the computer to finish what it is doing; they should brake.

Since cost is usually fixed, the variables are usually latency, throughput, convenience, storage capacity and input/output. The general scheme of optimization is to divide the cost budget among the different parts of the computer system. In a balanced computer system, the data rate will be matched across all parts of the system, and cost will be allocated proportionally to ensure this. The exact forms of the trade-offs depend on whether the computer system is being optimized to minimize latency or to maximize throughput.

CPU design

To a large extent, the design of a central processing unit is the design of its control unit. From about 1965 to 1985, the dominant way to design control logic was to write a microprogram.
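A microprogrammed control unit can be sketched as a table lookup. The sketch assumes a hypothetical accumulator machine, and all names in it are made up for illustration. The "control store" maps each machine instruction to a list of micro-operations on the datapath, so adding or changing an instruction means editing the table rather than the hardware.

```python
from typing import Callable

# Hypothetical datapath state for a toy accumulator machine.
State = dict


def mar_from_operand(s: State) -> None:
    s["MAR"] = s["operand"]  # memory address register <- instruction operand


def read_memory(s: State) -> None:
    s["MDR"] = s["mem"][s["MAR"]]  # memory data register <- mem[MAR]


def write_memory(s: State) -> None:
    s["mem"][s["MAR"]] = s["ACC"]  # mem[MAR] <- accumulator


def acc_from_mdr(s: State) -> None:
    s["ACC"] = s["MDR"]


def acc_add_mdr(s: State) -> None:
    s["ACC"] += s["MDR"]


# The control store: each machine instruction is just a list of micro-operations.
CONTROL_STORE: dict[str, list[Callable[[State], None]]] = {
    "LOAD": [mar_from_operand, read_memory, acc_from_mdr],
    "ADD": [mar_from_operand, read_memory, acc_add_mdr],
    "STORE": [mar_from_operand, write_memory],
}


def execute(program: list[tuple[str, int]], mem: list[int]) -> State:
    """Run each instruction by stepping through its microprogram."""
    s: State = {"ACC": 0, "MAR": 0, "MDR": 0, "operand": 0, "mem": mem}
    for opcode, operand in program:
        s["operand"] = operand
        for micro_op in CONTROL_STORE[opcode]:
            micro_op(s)
    return s


memory = [5, 7, 0]
execute([("LOAD", 0), ("ADD", 1), ("STORE", 2)], memory)
print(memory)  # [5, 7, 12]
```

The complexity of the instruction set lives in the table, which is why one simple datapath could implement a rich set of instructions.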

CPU design was originally an ad hoc process. Just getting a CPU to work was a major technical achievement, often requiring government backing.

Key design innovations include cache, virtual memory, instruction pipelining, CISC, RISC, virtual machines, emulators, microprograms and stacks.

The major problem with early computers was that a program for one would not work on others. In 1962, IBM bet the company that microprogrammed computers, all emulating a single reference computer, could provide a family of computers that could all run the same software. Each computer would be targeted at a specific price point. As users' requirements grew, they could move up to larger computers. This computer family was the System/360, later extended as the System/370, and updated but compatible computers are still being sold as of 2001.

IBM chose to make the reference instruction set quite complex, and very capable. This was a conscious choice. The "control store" containing the microprogram was relatively small, and could be made with very fast memory. Another important effect was that a single instruction could describe quite a complex sequence of operations. Thus the computers would generally have to fetch fewer instructions from the main memory, which could be made slower, smaller and less expensive for a given combination of speed and price.

An often-overlooked feature of the IBM 360 instruction set was that it was designed for data processing as well as for mathematical calculation. The crucial innovation was that memory was designed to be addressed in units of a single printable character, an eight-bit "byte". Also, the instruction set was designed to manipulate not just simple integers, but text, scientific floating-point numbers (similar to the numbers used in a calculator), and the decimal arithmetic needed by accounting systems.
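Decimal arithmetic of this kind is usually stored in a "packed decimal" format, in which each eight-bit byte holds two decimal digits. The sketch assumes the common System/360 convention of a sign code in the final four bits: hexadecimal C for positive, D for negative. `pack_decimal` is a hypothetical helper, not an IBM interface.

```python
def pack_decimal(value: int) -> bytes:
    """Encode an integer as packed decimal: two digits per byte, sign in the last nibble."""
    sign = 0xD if value < 0 else 0xC  # C = positive, D = negative (assumed convention)
    nibbles = [int(d) for d in str(abs(value))] + [sign]
    if len(nibbles) % 2:  # pad with a leading zero digit to fill whole bytes
        nibbles.insert(0, 0)
    # Combine each pair of 4-bit nibbles into one byte.
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))


print(pack_decimal(-123).hex())  # 123d
print(pack_decimal(1234).hex())  # 01234c
```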

Another important feature was that the IBM register set was binary, a design already proven on earlier binary computers such as MIT's Whirlwind. Binary arithmetic is substantially cheaper to implement with digital logic, because it requires fewer electronic devices to store the same number.
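The storage saving can be checked by counting bits. This sketch compares pure binary with one common decimal alternative, binary-coded decimal (BCD), which we assume spends four bits per decimal digit. The helper names are hypothetical.

```python
def bits_binary(digits: int) -> int:
    """Bits to hold any unsigned integer with this many decimal digits, in binary."""
    return (10**digits - 1).bit_length()


def bits_bcd(digits: int) -> int:
    """Bits for the same range if each decimal digit is stored in its own 4 bits."""
    return 4 * digits


for d in (3, 6, 10):
    print(d, bits_binary(d), bits_bcd(d))
# 3 10 12
# 6 20 24
# 10 34 40
```

Binary needs about 3.32 bits (log2 10) per decimal digit versus 4 for BCD, so the saving grows with the size of the number.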

Almost all later computers included these innovations in some form. This basic set of features later came to be called a "complex instruction set computer," or CISC (pronounced "sisk").

In many CISCs, an instruction could access either registers or memory, usually in several different ways. This made the CISCs easier to program, because a programmer could remember just thirty to a hundred instructions, and a set of three to ten "addressing modes," rather than thousands of distinct instructions. This was called an "orthogonal instruction set."
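Orthogonality means that any operation can be combined with any addressing mode. The sketch models a hypothetical machine with five modes; the mode names are illustrative and not taken from any particular CISC. A single `add` method works with all of them, so the programmer learns one ADD plus five modes instead of five separate add instructions.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    regs: list[int]
    mem: list[int]

    def operand(self, mode: str, value: int, base: int = 0) -> int:
        """Fetch an operand using one of a few hypothetical addressing modes."""
        if mode == "immediate":  # the value is in the instruction itself
            return value
        if mode == "register":  # value names a register
            return self.regs[value]
        if mode == "direct":  # value is a memory address
            return self.mem[value]
        if mode == "indirect":  # a register holds the memory address
            return self.mem[self.regs[value]]
        if mode == "base+displacement":  # register base plus a constant offset
            return self.mem[self.regs[base] + value]
        raise ValueError(f"unknown addressing mode: {mode}")

    def add(self, dest_reg: int, mode: str, value: int, base: int = 0) -> None:
        """One ADD instruction that works with every addressing mode."""
        self.regs[dest_reg] += self.operand(mode, value, base)


m = Machine(regs=[0, 2, 0, 0], mem=[10, 20, 30, 40])
m.add(0, "immediate", 5)                  # r0 += 5
m.add(0, "register", 1)                   # r0 += r1        (2)
m.add(0, "direct", 3)                     # r0 += mem[3]    (40)
m.add(0, "indirect", 1)                   # r0 += mem[r1]   (30)
m.add(0, "base+displacement", 1, base=1)  # r0 += mem[r1+1] (40)
print(m.regs[0])  # 117
```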

In the early 1980s, researchers at UC Berkeley found that compilers for most computer languages used only a small subset of a CISC's instructions. They realized that by making the computer simpler and less orthogonal, they could make it faster and less expensive at the same time.
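The kind of measurement behind this finding can be sketched as follows. Count how often each opcode appears in a trace of executed instructions, then find the fewest opcodes that account for a given share of execution. The trace below is made up for illustration and is not measured data. `covering_subset` is a hypothetical helper.

```python
from collections import Counter


def covering_subset(trace: list[str], fraction: float) -> list[str]:
    """Fewest opcodes that together account for `fraction` of executed instructions."""
    counts = Counter(trace)
    needed = fraction * len(trace)
    chosen: list[str] = []
    covered = 0
    # Taking the most frequent opcodes first reaches the target with the fewest picks.
    for opcode, n in counts.most_common():
        if covered >= needed:
            break
        chosen.append(opcode)
        covered += n
    return chosen


# Hypothetical dynamic trace of 100 executed instructions.
trace = (["LOAD"] * 40 + ["ADD"] * 25 + ["STORE"] * 20 + ["BRANCH"] * 10
         + ["DECIMAL_MUL"] * 3 + ["TRANSLATE"] * 2)
print(covering_subset(trace, 0.9))  # ['LOAD', 'ADD', 'STORE', 'BRANCH']
```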

The computer designs based on this theory were called Reduced Instruction Set Computers, or RISC. RISCs generally had larger numbers of registers, accessed by simpler instructions, with a few instructions specifically to load and store data to memory.
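In a load/store design, only dedicated instructions touch memory, and arithmetic works purely on registers. This sketch of a hypothetical toy RISC computes `mem[2] = mem[0] + mem[1]` in four simple instructions, where a CISC might use a single instruction with memory operands. Its opcodes and four-field tuple format are assumptions for illustration.

```python
def run_load_store(program: list[tuple[str, int, int, int]], mem: list[int],
                   num_regs: int = 8) -> list[int]:
    """Execute a toy load/store program; only LOAD and STORE access memory."""
    regs = [0] * num_regs
    for op, a, b, c in program:
        if op == "LOAD":  # regs[a] <- mem[b]
            regs[a] = mem[b]
        elif op == "STORE":  # mem[b] <- regs[a]
            mem[b] = regs[a]
        elif op == "ADD":  # regs[a] <- regs[b] + regs[c]; registers only
            regs[a] = regs[b] + regs[c]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return regs


memory = [5, 7, 0]
program = [
    ("LOAD", 1, 0, 0),   # r1 <- mem[0]
    ("LOAD", 2, 1, 0),   # r2 <- mem[1]
    ("ADD", 3, 1, 2),    # r3 <- r1 + r2
    ("STORE", 3, 2, 0),  # mem[2] <- r3
]
run_load_store(program, memory)
print(memory)  # [5, 7, 12]
```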

RISCs failed to displace CISCs in most markets. Most computers and microprocessors still follow the "complex instruction set computer" (CISC) model. CISCs remain in use in modern computers because their compact code reduces the cost of the memory system, and because they remain compatible with pre-existing software.

Recently, engineers have found ways to compress reduced instruction sets so that programs fit in even smaller memory systems than CISC programs need. In applications that need no compatibility with older software, compressed RISCs are coming to dominate sales.

Another approach to RISC was the "niladic" or "zero-address" instruction set. It was based on the observation that most of the space in an instruction is used to identify its operands. These machines placed the operands on a push-down (last-in, first-out) stack, and supplemented the instruction set with a few instructions to fetch and store memory. Most used simple caching to provide extremely fast RISC machines with very compact code. Another benefit was that their interrupt latencies were extremely small, smaller than those of most CISC machines (a rare trait in RISC machines).
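A zero-address machine is easy to model: arithmetic instructions name no operands and simply take them from the top of the stack. In this sketch the opcode names (`LIT`, `FETCH`, `STORE`, `ADD`, `SUB`, `MUL`) are hypothetical. It computes `mem[3] = (mem[0] + mem[1]) * mem[2]`.

```python
import operator

# Zero-address arithmetic: both operands come from the stack, the result goes back on it.
BINARY_OPS = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}


def run_stack(program: list[tuple], mem: list[int]) -> list[int]:
    """Execute a toy stack-machine program and return the final stack."""
    stack: list[int] = []
    for op, *args in program:
        if op in BINARY_OPS:
            right = stack.pop()  # last in, first out
            left = stack.pop()
            stack.append(BINARY_OPS[op](left, right))
        elif op == "LIT":  # push a constant
            stack.append(args[0])
        elif op == "FETCH":  # push mem[address]
            stack.append(mem[args[0]])
        elif op == "STORE":  # pop the top of the stack into mem[address]
            mem[args[0]] = stack.pop()
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack


memory = [2, 3, 4, 0]
program = [("FETCH", 0), ("FETCH", 1), ("ADD",), ("FETCH", 2), ("MUL",), ("STORE", 3)]
print(run_stack(program, memory), memory)  # [] [2, 3, 4, 20]
```

Only `LIT`, `FETCH` and `STORE` carry an operand field. The arithmetic instructions need nothing but an opcode, which is why such code is compact.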

One zero-address computer, developed by Charles Moore, packed six 5-bit instructions into each 32-bit word.

Commercial variants were mostly characterized as "FORTH" machines, and probably failed because that language became unpopular. Also, the machines were developed by defense contractors at exactly the time that the cold war ended. Loss of funding may have broken up the development teams before the companies could perform adequate commercial marketing.

In the 1980s, to make computer systems faster, designers began using several "execution units" whose operations overlapped in time. At first, one was used to calculate addresses, and another was used to calculate user data. Then each of these uses began to subdivide further. In modern CPUs, as many as eight arithmetic-logic units (ALUs) are coordinated to execute a stream of instructions. These CPUs can execute several instructions per clock cycle, where classical CISCs would take up to twelve clock cycles per instruction, or more for some forms of arithmetic. The resulting control logic is complex and error-prone, and the electronics to coordinate these ALUs need many transistors, increasing power consumption and heat.

Another significant innovation was the realization that the coordination of a multiple-ALU computer could be moved into the compiler, the software that translates a programmer's instructions into machine-level instructions. In software, the coordination consumed no hardware resources or power, and could take advantage of more knowledge about the computer program. A "very long instruction word" (VLIW) computer would just have a wide instruction with sub-fields that directly commanded each ALU.
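The compiler's job can be sketched as a greedy list scheduler. This is a simplified model, not any real VLIW compiler. It assumes identical ALU slots, one-cycle operations, and bundles that read all operands before writing any results. The function names and the (destination, sources) instruction format are hypothetical. Each bundle is one wide instruction word, and empty slots become explicit NOPs.

```python
from collections import defaultdict


def schedule_vliw(instrs: list[tuple[str, list[str]]], width: int) -> list[list[int]]:
    """Greedily pack (dest, sources) instructions into bundles with `width` ALU slots.

    Assumes one-cycle operations and that every slot in a bundle reads its
    operands before any slot writes its result.
    """
    bundles: list[list[int]] = []
    last_write: dict[str, int] = {}  # register -> cycle of its latest writer
    last_read = defaultdict(lambda: -1)  # register -> latest cycle that read it
    for i, (dest, srcs) in enumerate(instrs):
        earliest = 0
        for r in srcs:  # true dependence: wait until the producer has finished
            if r in last_write:
                earliest = max(earliest, last_write[r] + 1)
        if dest in last_write:  # output dependence: stay after the previous writer
            earliest = max(earliest, last_write[dest] + 1)
        earliest = max(earliest, last_read[dest])  # anti-dependence: not before earlier readers
        cycle = earliest
        while cycle < len(bundles) and len(bundles[cycle]) >= width:
            cycle += 1  # bundle full: try the next one
        while len(bundles) <= cycle:
            bundles.append([])
        bundles[cycle].append(i)
        last_write[dest] = cycle
        for r in srcs:
            last_read[r] = max(last_read[r], cycle)
    return bundles


def encode(instrs: list[tuple[str, list[str]]], bundles: list[list[int]], width: int) -> list[str]:
    """Render each bundle as one wide word, padding unused ALU slots with NOP."""
    words = []
    for bundle in bundles:
        slots = [f"{instrs[i][0]}=f({','.join(instrs[i][1])})" for i in bundle]
        words.append(" | ".join(slots + ["NOP"] * (width - len(slots))))
    return words


program = [
    ("r1", ["a"]),         # r1 <- a
    ("r2", ["b"]),         # r2 <- b
    ("r3", ["r1", "r2"]),  # r3 <- r1 op r2
    ("r4", ["c"]),         # r4 <- c
    ("r5", ["r3", "r4"]),  # r5 <- r3 op r4
]
bundles = schedule_vliw(program, width=2)
print(bundles)  # [[0, 1], [2, 3], [4]]
for word in encode(program, bundles, width=2):
    print(word)
```

Five instructions fit in three wide words. The scheduling decisions are fixed at compile time, so the hardware needs no coordination logic of its own.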

There were several unsuccessful attempts to commercialize VLIW. The basic problem was that a VLIW computer does not scale to different price and performance points, as a microprogrammed computer can. Also, VLIW computers maximize throughput rather than minimizing latency, so they were not attractive to the engineers designing controllers and other computers embedded in machinery. This mattered because the embedded systems markets had often pioneered other computer improvements by providing a large market that did not care about compatibility with older software.

Recently a company called "Transmeta" took the radical step of building the compiler into the processor, as a layer of software running on it, and making that compiler translate from a reference instruction set (in their case, that of the 80386) to a VLIW instruction set. This approach appears technically and commercially feasible. It may eventually dominate CPU design because it combines the hardware simplicity, low power and speed of VLIW RISC with the compact main memory system and software compatibility provided by CISC.
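The caching idea that makes run-time translation affordable can be sketched in a few lines. This is not Transmeta's software. It only illustrates the general technique of dynamic binary translation. The first time a block of guest code runs, it is translated and cached by its address, and later executions reuse the translation. The guest opcodes are hypothetical, and Python closures stand in for generated VLIW code.

```python
from typing import Callable

# Hypothetical guest instructions for an accumulator machine: (opcode, immediate).
GuestBlock = list[tuple[str, int]]


class Translator:
    """Translate each guest code block once, cache the result, and reuse it."""

    def __init__(self) -> None:
        self.cache: dict[int, Callable[[int], int]] = {}
        self.translations = 0

    def translate(self, block: GuestBlock) -> Callable[[int], int]:
        # Stand-in for code generation: build one host function per guest op.
        steps: list[Callable[[int], int]] = []
        for op, n in block:
            if op == "ADDI":
                steps.append(lambda acc, n=n: acc + n)
            elif op == "MULI":
                steps.append(lambda acc, n=n: acc * n)
            else:
                raise ValueError(f"unknown guest opcode: {op}")

        def run(acc: int) -> int:
            for step in steps:
                acc = step(acc)
            return acc

        return run

    def execute(self, address: int, block: GuestBlock, acc: int) -> int:
        if address not in self.cache:  # translate only the first time a block runs
            self.cache[address] = self.translate(block)
            self.translations += 1
        return self.cache[address](acc)


t = Translator()
loop_body = [("ADDI", 3), ("MULI", 2)]
acc = 0
for _ in range(5):
    acc = t.execute(0x1000, loop_body, acc)
print(acc, t.translations)  # 186 1
```

Because programs spend most of their time in loops, the one-time translation cost is spread over many executions.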

The majority of computer systems in use today are embedded in other machinery, such as telephones, clocks, appliances, vehicles, and infrastructure. These "embedded systems" usually have small requirements for memory, modest program sizes, and often simple but unusual input/output systems. For example, most embedded systems lack keyboards, screens, disks, printers, or other recognizable I/O devices of a personal computer. They may control electric motors, relays or voltages, and read switches, variable resistors or other electronic devices. Often, the only I/O device readable by a human is a single light-emitting diode, and severe cost or power constraints will eliminate even that.

The most common trade-off in embedded systems is to minimize interrupt latency. A lower latency is often far more useful than another kilobyte of unused memory.

For example, low-latency CPUs generally have relatively few registers in their central processing units. A register is an electronic abacus that stores a number during a calculation. When an electronic device causes an interrupt, the intermediate results held in the registers have to be saved before the software for the interrupt can run, and restored after it is done. The more registers there are, the longer this saving and restoring takes, increasing the latency.
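The effect of register count can be estimated with a simple linear cost model. The sketch assumes a fixed dispatch cost and a fixed number of cycles to save or restore each register. These costs are hypothetical, and real processors differ. The trend holds regardless: quadrupling the registers from 8 to 32 more than doubles the time before the handler can start.

```python
def interrupt_entry_cycles(registers: int, cycles_per_save: int, dispatch_cycles: int) -> int:
    """Cycles from an interrupt until its handler can start: dispatch plus register saves."""
    return dispatch_cycles + registers * cycles_per_save


def interrupt_overhead_cycles(registers: int, cycles_per_save: int, dispatch_cycles: int) -> int:
    """Total bookkeeping per interrupt: entry cost plus restoring every register afterwards."""
    restore = registers * cycles_per_save  # assumes restoring costs the same as saving
    return interrupt_entry_cycles(registers, cycles_per_save, dispatch_cycles) + restore


# Hypothetical costs: 10 cycles to dispatch, 1 cycle to save or restore each register.
for n in (8, 16, 32):
    print(n, interrupt_entry_cycles(n, 1, 10), interrupt_overhead_cycles(n, 1, 10))
# 8 18 26
# 16 26 42
# 32 42 74
```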

Another common problem involves virtual memory. Historically, random-access memory has cost thousands of times more per byte than rotating mechanical storage. For businesses, and for many general computing tasks, it is a good compromise never to let the computer run out of memory, an event that would halt the program and greatly inconvenience the user. Instead of halting the program, many computer systems save less-frequently used blocks of memory to the rotating mechanical storage. In essence, the mechanical storage becomes an extension of main memory. However, mechanical storage is thousands of times slower than electronic memory, so any access to data that has been moved there takes far longer than usual. Thus, almost all general-purpose computing systems use "virtual memory" and consequently have unpredictable interrupt latencies.
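The standard effective-access-time calculation shows the cost. The figures below are hypothetical round numbers chosen only for illustration. With them, even one fault per million accesses raises the average access time by about 10%, and any single access may take the full disk time.

```python
def effective_access_ns(fault_rate: float, memory_ns: float, fault_ns: float) -> float:
    """Average access time when a fraction of accesses must be fetched from disk."""
    return (1 - fault_rate) * memory_ns + fault_rate * fault_ns


MEMORY_NS = 100.0        # hypothetical electronic memory access time
FAULT_NS = 10_000_000.0  # hypothetical disk service time (10 ms)

for rate in (0.0, 1e-6, 1e-4):
    print(rate, round(effective_access_ns(rate, MEMORY_NS, FAULT_NS), 1))

# A latency guarantee must budget for the worst case: an access that faults.
print("worst case (ns):", max(MEMORY_NS, FAULT_NS))
```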

The newest development seems to be optical computing, and this may eventually cause a radical redesign of computer systems.

The most interesting near-term possibility would be to eliminate the bus. Modern vertical-cavity laser diodes could make this change possible. In theory, an optical computer's components could connect directly through a holographic or phased open-air switching system. This could provide a large increase in effective speed and design flexibility, and a large reduction in cost. Since a computer's connectors are also its most likely failure points, a busless system might be more reliable as well.

Another, longer-term possibility is to compute with light instead of electricity. This might run about 30% faster and use less power, as well as permit a direct interface with quantum computational devices. The chief problem with this approach is that for the foreseeable future, electronic devices are faster, smaller (i.e. cheaper) and more reliable. An important theoretical problem is that electronic computational elements are already smaller than some wavelengths of light, and therefore even waveguide-based optical logic may be uneconomic compared to electronic logic. We can therefore expect the majority of development to focus on electronics, no matter how unfair it might seem.