Implementations of the control unit, the measure of performance | Superpipelining And Superscaling.


The major parameters of interest for control units are:

  1. Speed
  2. Complexity (cost)
  3. Flexibility.

The design of the control unit must provide for the fastest possible execution of instructions. Instruction execution speed obviously depends on the number of data-path operations (register transfers) needed to fetch and execute an instruction and on the time taken by each of those operations.

The complexity of the control unit is predominantly a function of the instruction-set size, although factors such as the Arithmetic Logic Unit (ALU) complexity, the register-set complexity, and the processor bus structure also influence it.

Machines built in the early 1960s had small instruction sets because of the complexity and hence the high cost of hardware. As Integrated Circuit technology progressed, it became cost-effective to implement complex control units in hardware. Thus, machines with large instruction set (i.e., complex instruction set computers, CISC) were built.

It was noted that in an integrated-circuit implementation of a CISC, the control unit would occupy 60 to 75% of the total silicon area. It was also observed that, on average, 20 to 30% of the instructions in an instruction set are not commonly used by application programmers, and that it is difficult to design High-Level Language (HLL) compilers that can utilize a large instruction set.

These observations led to the development of reduced instruction set computers (RISC). The RISCs of the early 1980s had relatively small instruction sets (i.e., 50 to 100 instructions), but the instruction sets of modern-day RISCs have 200+ instructions.

Integrated-circuit technology is currently progressing at such a rapid rate that newer and more powerful processors are introduced to the market very quickly. This forces computer manufacturers to introduce their enhanced products rapidly enough that the competition does not gain an edge. To make such rapid introduction possible, the design (or enhancement) cycle time for the product must be as short as possible. The control unit, being the most complex component of the computer system, consumes the largest part of the design cycle time. Hence, it is important to have a flexible design that can be enhanced rapidly, with a minimum number of changes to the hardware structure.

Two popular implementations of the control unit are:

  1. Hardwired.

  2. Microprogrammed.

Note that each instruction cycle corresponds to a sequence of microoperations (or register transfer operations) brought about by the control unit. These sequences are produced by a set of gates and flip-flops in a hardwired control unit, or by microprograms stored in the control Read Only Memory (ROM) in a microprogrammed control unit. Changing the instruction set therefore requires only rewriting the microprograms in the microprogrammed case, while it requires a redesign of the hardwired control unit.
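
To make the contrast concrete, here is a minimal sketch in Python of the microprogrammed idea, using a made-up three-instruction ISA (the register names and microoperation sequences are illustrative, not taken from any real machine): a control ROM maps each opcode to its sequence of microoperations, so changing the instruction set only means editing the table.

```python
# Minimal sketch of a microprogrammed control unit (hypothetical ISA).
# The "control ROM" maps each opcode to its sequence of microoperations;
# a hardwired unit would realize these same sequences with gates and flip-flops.

CONTROL_ROM = {
    "LOAD":  ["MAR <- PC", "MDR <- M[MAR]", "IR <- MDR",
              "MAR <- IR.addr", "MDR <- M[MAR]", "ACC <- MDR"],
    "ADD":   ["MAR <- PC", "MDR <- M[MAR]", "IR <- MDR",
              "MAR <- IR.addr", "MDR <- M[MAR]", "ACC <- ACC + MDR"],
    "STORE": ["MAR <- PC", "MDR <- M[MAR]", "IR <- MDR",
              "MAR <- IR.addr", "MDR <- ACC", "M[MAR] <- MDR"],
}

def execute(opcode):
    """Step through (print) the microprogram for one instruction."""
    for step, microop in enumerate(CONTROL_ROM[opcode], start=1):
        print(f"{opcode} step {step}: {microop}")

execute("ADD")
```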


The microprogrammed control units offer flexibility in terms of tailoring the instruction set for a particular application. The hardwired implementations, on the other hand, offer higher speeds.

Almost all hardwired control units are implemented as synchronous units, whose operation is controlled by a clock signal. Synchronous control units are relatively simpler to design than asynchronous units. Asynchronous units do not have a controlling clock signal; in these units, the completion of one microoperation triggers the next. If designed properly, asynchronous units provide faster speeds.

A popular scheme for enhancing the speed of execution is overlapped instruction execution, where the control unit is designed as a pipeline consisting of several stages (Fetch, Decode, Address compute, Execute, etc.).
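
As a rough illustration of why overlapping helps, here is a purely illustrative Python sketch assuming an ideal five-stage pipeline with no stalls (the fifth stage, "Write back", is an assumption added to round out the example). It compares how many cycles N instructions take when executed strictly one after another versus when they are overlapped in the pipeline.

```python
# Illustrative cycle counts: sequential vs. overlapped (pipelined) execution,
# assuming an ideal five-stage pipeline with no stalls or hazards.

STAGES = ["Fetch", "Decode", "Address compute", "Execute", "Write back"]

def sequential_cycles(n):
    # Each instruction passes through every stage before the next one starts.
    return n * len(STAGES)

def pipelined_cycles(n):
    # After the pipeline fills, one instruction completes every cycle.
    return len(STAGES) + (n - 1)

for n in (1, 10, 100):
    print(f"{n:>3} instructions: {sequential_cycles(n):>4} cycles sequential, "
          f"{pipelined_cycles(n):>4} cycles pipelined")
```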

One measure of performance is the average amount of time it takes a processor to complete a task. The runtime R of a task containing N instructions, on a processor that consumes on average C cycles per instruction, with a clock period of T seconds per cycle, is given by:

R = N x C x T

Thus, three factors determine the overall time to complete a task. N depends upon the task, the compiler, and the skill of the programmer, and is therefore not dependent on the processor architecture. C and T, however, are processor dependent. Reduction of T is accomplished simply through a higher-speed clock.
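
As a worked example with purely illustrative numbers (none of these values describe a real processor):

```python
# Worked example of R = N x C x T with illustrative (made-up) values.
N = 1_000_000        # instructions in the task
C = 1.5              # average cycles per instruction
T = 1 / 100e6        # seconds per cycle (a 100 MHz clock)

R = N * C * T
print(f"Runtime R = {R * 1e3:.1f} ms")   # prints "Runtime R = 15.0 ms"
```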

In the MIPS R4000 processor, for instance, the external clock input is 50 MHz. An on-chip phase-locked loop multiplies this by two to get an internal clock speed of 100 MHz, the speed at which the pipeline runs.

This 100 MHz can be divided by 2, 3, or 4 to produce system-interface speeds of 50, 33.3, or 25 MHz. The main reason for the adjustable clock speed is to make room for a 75 or 100 MHz external clock (150 or 200 MHz internal clocks) while maintaining a 50 MHz system interface.
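
The arithmetic behind those figures is simply division of the internal pipeline clock. The sketch below tabulates it, assuming the doubled internal clock and the divide-by-2, -3, or -4 interface scheme described above.

```python
# System-interface speeds obtained by dividing the internal pipeline clock,
# assuming the internal clock is 2x the external clock and the interface
# divisor can be 2, 3, or 4 (as described above).
for external_mhz in (50, 75, 100):
    internal_mhz = external_mhz * 2
    interface_mhz = [round(internal_mhz / d, 1) for d in (2, 3, 4)]
    print(f"external {external_mhz} MHz -> internal {internal_mhz} MHz, "
          f"possible interface speeds {interface_mhz} MHz")
```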

There are two popular techniques for reducing the C:

  1. Superpipelining, and
  2. Superscaling.

1. In superpipelining, instructions are processed in long (5- to 10-stage) pipelines. This allows simultaneous (overlapped) processing of multiple instructions, thus enhancing throughput.

2. Superscaling also allows processing of multiple instructions per clock cycle, but does this through multiple execution units rather than through pipelining.

In order for superpipelining to work well, there must be a fast clock. And in order for superscaling to work well, there must be good instruction dispatch and scoreboarding mechanisms. Since these are done in hardware, they take up a lot of chip real estate.
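
As a rough, idealized comparison with made-up parameters (ignoring stalls, dependencies, and dispatch limits), both techniques raise the number of instructions completed per base clock cycle, but by different means:

```python
# Idealized peak throughput, in instructions completed per base clock cycle,
# ignoring stalls, dependencies, and dispatch limits.

def superpipelined_throughput(clock_multiplier):
    # Deeper (super)pipelining lets the internal clock run faster, so more
    # instructions complete per base clock cycle.
    return 1.0 * clock_multiplier

def superscalar_throughput(issue_width):
    # Multiple execution units allow several instructions to complete in
    # the same clock cycle.
    return 1.0 * issue_width

print(superpipelined_throughput(2))   # 2.0 instructions per base cycle
print(superscalar_throughput(4))      # 4.0 instructions per cycle
```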

Superpipelining theoretically lacks scalability, because current technology limits the pipeline to running at most twice as fast as the cache.

Superscaling, however, could theoretically have an unlimited bus size and an unlimited number of execution units, and therefore performance could be improved endlessly.

In reality, the dispatch circuitry grows in complexity very rapidly as one increases the dispatch multiplicity past two. Also, the compiler complexity grows very rapidly as one tries to avoid constant stalls across the multiple execution units.
