Chapter Twenty-One

The Arithmetic Logic Unit

The modern computer is a complex assemblage of myriad components, but they can roughly be divided into three categories:

· Memory

· The central processing unit, or CPU

· Input and output (I/O) devices, often called peripherals

You learned in Chapter 19 how random-access memory is constructed and structured and how each byte in memory is accessed through an address. In Chapter 20 you saw how the contents of memory can store numbers and how codes stored in memory can control circuitry that manipulates these numbers. In the more general case, the contents of memory can also contain text, pictures, music, movies, and anything else that can be represented digitally—that is, with 0s and 1s. The instruction codes that are stored in memory are often referred to collectively as code, and everything else as data. In other words, memory contains code and data.

Computers also include several input and output (I/O) devices. Which peripherals a particular computer includes depends greatly on whether the computer sits on a desk, folds up under an arm, resides in a pocket or purse, or is hidden away in a microwave oven, a robot vacuum cleaner, or a car.

The most visible I/O devices on a desktop computer are the video display, keyboard, and mouse, and perhaps a printer sitting in the corner. A laptop might have a touchpad rather than a mouse, while a phone performs all those functions on a single screen. All these computers include a device for mass storage, perhaps a hard drive on a desktop computer, a solid-state drive (SSD) on a laptop, and flash storage on a phone, perhaps augmented by external storage such as thumb drives.

Other I/O devices are less obvious, such as circuits to play sound and music, circuits to connect to the internet via an ethernet connector or Wi-Fi, circuits that pick up Global Positioning System (GPS) signals to tell you where you are and where you’re going, and even devices that detect gravity and motion to determine how your phone is oriented and moving relative to the earth.

But the subject of this chapter (and the next three chapters) is the CPU, which is sometimes called the “heart” or the “soul” or the “brain” of the computer, depending on your metaphorical preference.

Chapter 20 described a Triple-Byte Accumulator that consisted of a counter to access memory, an adder, and latches. Everything was controlled by circuitry that used codes stored in memory to add and subtract numbers and later to write a running total into memory.

A CPU is very much like that Triple-Byte Accumulator, except that it is generalized to respond to many different codes. Consequently, a CPU is much more versatile than that previous machine.

The CPU that I’ll begin building in the pages of this book will work with bytes. That means it can be categorized as an 8-bit CPU or an 8-bit processor. But it will be capable of addressing 64K of random-access memory, which requires a 16-bit memory address, or 2 bytes. Although an 8-bit CPU primarily works with bytes, it must also be capable of working with 16-bit values to a limited extent in connection with this memory address.

Although this CPU won’t exist in the material world, it will (in theory, at least) be capable of reading code and data from memory to perform many different types of arithmetic and logical tasks. In terms of arithmetic and logic processing capabilities, it will be equivalent to any other digital computer, no matter how sophisticated.

Over the years as 8-bit CPUs gave way to 16-bit CPUs and then 32-bit CPUs and 64-bit CPUs, these more advanced CPUs did not become capable of different types of processing tasks. They instead perform the same tasks faster. In some cases that speed makes all the difference—for example, when the CPU is decoding a stream of data that encodes a movie. An 8-bit CPU can do this same processing, but it would probably be too slow to display the movie at the speed at which it was meant to be seen.

Although an 8-bit CPU performs mathematical and logical operations on bytes, it will also be capable of working with numbers that require multiple bytes. For example, suppose you want to add two 16-bit numbers together, perhaps 1388h and 09C4h (the hexadecimal values for 5,000 and 2,500, respectively). You would enter the following values in memory for the CPU to process:

Eighteen bytes stored in memory.

Of course, all those bytes probably don’t make very much sense because they’re a mix of instruction codes and data, and you likely don’t know what the instruction codes are. Here’s the annotated version:

Eighteen bytes in memory annotated to show how instruction codes and data add two 16-bit numbers.

A sequence of instructions such as these is called a computer program. (But you probably guessed that!) This is a fairly straightforward program that adds 1388h (which is decimal 5,000) and 09C4h (decimal 2,500). First, the CPU adds the low bytes of the two 16-bit values (88h and C4h), and the result is stored at memory address 0010h. Then the two high bytes (13h and 09h) are added with a possible carry from the first addition. That sum is stored at address 0011h. Then the CPU halts. The 16-bit sum resides at addresses 0010h and 0011h, where it can be examined.
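
To make those steps concrete, here is a short Python sketch (not part of the program itself, just an illustration) that mimics what the CPU does with these bytes: add the low bytes, save the carry, then add the high bytes along with that carry.

```python
# Simulate the two-step 16-bit addition of 1388h and 09C4h,
# one byte at a time, the way the 8-bit CPU performs it.

memory = [0] * 0x10000          # 64K of RAM, all zeros to start

# Step 1: add the low bytes 88h and C4h.
total = 0x88 + 0xC4             # 0x14C
low_sum = total & 0xFF          # keep only the low byte: 0x4C
carry = total >> 8              # the carry out: 1
memory[0x0010] = low_sum        # store at address 0010h

# Step 2: add the high bytes 13h and 09h plus the saved carry.
total = 0x13 + 0x09 + carry     # 0x1D
memory[0x0011] = total & 0xFF   # store at address 0011h

# The 16-bit sum now sits at addresses 0010h (low byte) and 0011h (high byte).
result = (memory[0x0011] << 8) | memory[0x0010]
print(hex(result), result)      # 0x1d4c 7500, which is 5,000 + 2,500
```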

Where did I get the particular codes of 3Eh, C6h, 32h, CEh, and 76h? At this point I haven’t even started building the CPU, so I could have just made them up. But I didn’t. Instead, I used actual instruction codes implemented by the famous Intel 8080 microprocessor that was used in the MITS Altair 8800, which is widely regarded as the first commercially successful personal computer. The first IBM PC didn’t use the 8080 microprocessor, but it did use the Intel 8088, which (as the number suggests) was the next generation of this family of processors.

In this chapter and the next few chapters, I use the Intel 8080 as a model to design my own CPU. But only as a model. My CPU will implement only a subset of the 8080. The Intel 8080 implemented 244 instruction codes, but when my CPU is finished, it will implement just over half of those. Regardless, you will still have an excellent idea of what goes on in the very heart (or soul or brain) of a computer.

I’ve been referring to these codes as instruction codes or operation codes or opcodes. They are also known as machine codes because they are used directly by the machine—the circuitry that constitutes the central processing unit. The little computer program shown earlier is an example of a machine code program.

All the 8080 instruction codes are just 1 byte. However, some of them require 1 or 2 additional bytes following the instruction byte. In the example above, the instruction codes 3Eh, C6h, and CEh are always followed by another byte. These are known as 2-byte instructions because the byte that follows the operation code is really part of the same instruction. The instruction code 32h is followed by 2 bytes that define a memory address. This is one of several 3-byte instructions. Many instructions don’t require any additional bytes, such as the code 76h, which halts the CPU. This variation in the length of instructions will certainly complicate the design of the CPU.
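
To illustrate how these varying lengths complicate decoding, here is a Python sketch that steps through the little program using only the five instruction codes mentioned so far. The length table is limited to those codes (it is not a complete 8080 decoder), and the byte values are a plausible reconstruction of the program pictured earlier, not a copy of it.

```python
# Lengths (in bytes) of the few 8080 instructions used in this chapter.
# This is only an illustrative subset, not the full instruction set.
LENGTHS = {0x3E: 2,   # load the next byte into the accumulator
           0xC6: 2,   # add the next byte
           0xCE: 2,   # add the next byte with carry
           0x32: 3,   # store the accumulator at the 2-byte address that follows
           0x76: 1}   # halt

program = [0x3E, 0x88, 0xC6, 0xC4, 0x32, 0x10, 0x00,
           0x3E, 0x13, 0xCE, 0x09, 0x32, 0x11, 0x00, 0x76]

address = 0
while address < len(program):
    opcode = program[address]
    length = LENGTHS[opcode]
    chunk = program[address:address + length]
    print(f"{address:04X}h:", " ".join(f"{b:02X}h" for b in chunk))
    if opcode == 0x76:          # halt
        break
    address += length           # skip past the whole instruction
```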

The particular sequence of codes and data in the previous example does not represent the best way to add two 16-bit numbers together. The instruction codes and the data are all jumbled together. Often it’s better to keep the code and data in separate areas of memory. You’ll get a better idea of how this works in the next chapter.

A central processing unit itself is composed of several components. The remainder of this chapter focuses on the most fundamental part of the CPU, which is known as the arithmetic logic unit, or ALU. This is the part of the CPU that adds and subtracts, as well as performing a couple of other useful tasks.

In an 8-bit CPU, the ALU is only capable of 8-bit addition and subtraction. But very often, we need to work with numbers that are 16 bits wide, or 24 bits, or 32 bits, and maybe even larger. As you’ve seen, these large numbers must be added and subtracted in bytes, starting with the least significant byte. Each subsequent 1-byte addition or subtraction must take into account the carry from the previous operation.

This implies that our ALU must be capable of the following basic operations:

· Add one 8-bit number to another.

· Add one 8-bit number to another with the possible carry from the previous addition. This is known as addition with carry.

· Subtract one 8-bit number from another.

· Subtract one 8-bit number from another with the possible carry from the previous subtraction. This is called subtraction with carry or, more commonly, subtraction with borrow, which is just slightly different terminology for the same thing.

For convenience, let’s shorten the descriptions of these four operations:

· Add

· Add with Carry

· Subtract

· Subtract with Borrow

Eventually I will be abbreviating these descriptions even more. Keep in mind that the Add with Carry and the Subtract with Borrow operations use the carry bit from the previous addition or subtraction. That bit could be 0 or 1, depending on whether the operation resulted in a carry or not. This means that the ALU must save the carry bit from one operation to use in the next operation.

As usual, handling the carry bit makes basic arithmetic considerably more complex than it would be without carries.

For example, suppose you needed to add a pair of 32-bit numbers, which are 4 bytes each. You’d first add the two least significant bytes. That addition might result in a carry, or it might not. Let’s call that carry the Carry flag, since it signifies that a carry resulted from the addition. That Carry flag might be 0 or 1. You’d then add the two next more-significant bytes with that Carry flag from the previous addition and continue with the other bytes.

The addition of a pair of 32-bit numbers requires four operations for the four pairs of bytes:

· Add

· Add with Carry

· Add with Carry

· Add with Carry

The process is similar for subtraction, except that the number being subtracted is converted to two's complement, as discussed in Chapter 16: All the 0 bits become 1s and all the 1 bits become 0s (the ones' complement), and then 1 is added. For the first byte of a multibyte number, that 1 is added by setting the carry input of the adder; for the subsequent bytes, the carry input comes from the previous subtraction. A subtraction of one 32-bit number from another thus also requires four operations (both sequences are sketched in code after this list):

· Subtract

· Subtract with Borrow

· Subtract with Borrow

· Subtract with Borrow
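
Here is a rough Python sketch of those two four-step sequences, assuming the 32-bit values are stored as four bytes each with the least significant byte first. It is only meant to show how the carry threads from one byte-sized operation to the next.

```python
def add8(a, b, carry_in):
    """Add two bytes plus a carry; return the 8-bit sum and the carry out."""
    total = a + b + carry_in
    return total & 0xFF, total >> 8

def add32(a_bytes, b_bytes):
    """Add two 32-bit numbers given as 4 bytes each, least significant first."""
    result, carry = [], 0                  # the first addition has no carry in
    for a, b in zip(a_bytes, b_bytes):
        s, carry = add8(a, b, carry)       # Add, then Add with Carry three times
        result.append(s)
    return result, carry

def subtract32(a_bytes, b_bytes):
    """Subtract one 32-bit number from another using ones' complement plus carry."""
    result, carry = [], 1                  # carry in of 1 completes the two's complement
    for a, b in zip(a_bytes, b_bytes):
        s, carry = add8(a, b ^ 0xFF, carry)    # invert each byte being subtracted
        result.append(s)
    return result, carry

# 0x00030D40 (200,000) minus 0x000186A0 (100,000):
print(subtract32([0x40, 0x0D, 0x03, 0x00], [0xA0, 0x86, 0x01, 0x00]))
# The bytes of 0x000186A0 (100,000), plus a final carry of 1.
```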

I want to encapsulate the circuitry that performs these additions and subtractions in a box that looks like this:

A box labeled Add-Subtract, with two inputs and an output, as well as Carry in and Carry out flags, and two function inputs.

This shouldn’t look too unusual. Two 8-bit inputs are added or subtracted for an 8-bit output. But this box does have a few differences from similar boxes that you’ve seen.

Usually when labeling an 8-bit adder I’ve used CI for Carry In and CO for Carry Out. But this box is labeled a little differently. I’m using the abbreviation CY to represent the Carry flag. As you’ll see, the CY Out is the same as the Carry Out from the adder, but CY In is the Carry flag from the previous addition or subtraction, and that might not be the same as the Carry In to the adder.

Also new in this diagram are the two inputs labeled F0 and F1. The F stands for “function,” and these two inputs govern what goes on inside the box:

A table showing how combinations of F-one and F-zero correspond with the Add, Add with Carry, Subtract, and Subtract with Borrow functions.

Keep in mind that we are building something that works in conjunction with instruction codes stored in memory. If we design these instruction codes intelligently, then two bits of these codes might be used to provide the function inputs for this Add/Subtract module, just like the bits for the Add and Subtract codes in the previous chapter. Much of this Add/Subtract module should look familiar:

An 8-bit adder with a ones' complement module for subtractions.

The box labeled Ones’ Complement inverts the input when the Inv (“invert”) signal is 1. This is necessary as a first step to convert to two’s complement when performing subtraction.

The Carry Out from the adder becomes the CY Out from the Add/Subtract module. But what this diagram is missing is the Inv signal for the Ones’ Complement box and the CI signal for the adder. The Inv signal must be 1 for subtraction, but CI is a little more complicated. Let’s see if it can be clarified with a table:

A table showing how the F-one and F-zero signals correspond with the Invert and Carry In signals for the Add and Subtract module.

This table shows that the Inv signal to the Ones’ Complement inverter is the same as F1. That’s easy! But the CI input to the adder is a little messier. It’s 1 for a Subtract operation, which handles the first byte of a multibyte subtraction, when it’s necessary to add 1 to the ones’ complement to get the two’s complement. If F0 is 1, then CI is the CY flag from the previous addition or subtraction. This can all be accomplished with the following circuitry:

A circuit diagram with two AND gates, an OR gate, and an Inverter to generate Invert and Carry In signals for the adder-subtractor.

An interactive version of the complete Add/Subtract module is available on the website CodeHiddenLanguage.com.
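
Here is a small Python sketch of the Add/Subtract module's behavior under the rules just described: Inv is simply F1, CI is 1 for a plain Subtract or the saved Carry flag when F0 is 1, and CY Out is the adder's carry out. It is a behavioral model, not a gate-level one.

```python
def add_subtract(a, b, f1, f0, cy_in):
    """Behavioral model of the Add/Subtract module.
    f1, f0 select: 00 Add, 01 Add with Carry, 10 Subtract, 11 Subtract with Borrow."""
    inv = f1                                # invert B for both subtract operations
    ci = (f1 & (1 - f0)) | (f0 & cy_in)     # 1 for Subtract; the saved CY flag if F0 is 1
    operand = (b ^ 0xFF) if inv else b      # ones' complement when subtracting
    total = a + operand + ci
    return total & 0xFF, total >> 8         # 8-bit result and CY Out

# 5 minus 3 as a Subtract operation (F1 = 1, F0 = 0):
print(add_subtract(0x05, 0x03, 1, 0, 0))    # (2, 1)
```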


What else might you want an arithmetic logic unit to do besides addition and subtraction? If your answer is “multiplication and division,” I’m afraid you’re going to be disappointed. If you consider how difficult it is to construct circuitry to add and subtract, just try to imagine the logical complexity of multiplication and division! Although such circuitry is possible, it is quite beyond the modest ambitions of this book. And since the Intel 8080 didn’t implement multiplication or division, neither will my CPU. If you’re patient, however, you’ll see by the end of Chapter 24 how the CPU that we’re building will have the basic tools to perform multiplication.

Instead of worrying about multiplication, let’s instead think about the second word in the phrase arithmetic logic unit. In this book, the word logic often refers to Boolean operations. How might these be useful?

Suppose you had the following ASCII codes stored in memory starting at some arbitrary address:

A block of memory containing ASCII codes for the text “Tom Sawyer.”

Perhaps you want to convert all that text to lowercase.

If you glance back to page 154 in Chapter 13, you’ll see that the ASCII codes for the uppercase letters range from 41h to 5Ah, and the ASCII codes for the lowercase letters range from 61h to 7Ah. The ASCII codes for corresponding uppercase and lowercase letters differ by 20h. If you know that a letter is uppercase, you can convert it to lowercase by adding 20h to the ASCII code. For example, you can add 20h to 54h, which is the ASCII code for uppercase T, to get 74h, which is the ASCII code for lowercase t. Here’s the addition in binary:

  01010100   (54h, uppercase T)
+ 00100000   (20h)
----------
  01110100   (74h, lowercase t)

But you can’t do that for all the letters. If you add 20h to 6Fh, which is the ASCII code for the lowercase o, you’ll get 8Fh, which isn’t an ASCII code at all:

  01101111   (6Fh, lowercase o)
+ 00100000   (20h)
----------
  10001111   (8Fh, not an ASCII character code)

But look closely at the bit patterns. Here’s the uppercase and lowercase A, which are ASCII codes 41h and 61h:

  01000001   (41h, uppercase A)
  01100001   (61h, lowercase a)

And here’s uppercase and lowercase Z, which are ASCII codes 5Ah and 7Ah:

  01011010   (5Ah, uppercase Z)
  01111010   (7Ah, lowercase z)

For all the letters, the only difference between the uppercase and lowercase letters is a single bit, which is the third bit from the left. You can convert uppercase to lowercase by setting that bit to 1. It doesn’t matter if the letter is already lowercase, because that bit would already be set.

So instead of adding 20h, it makes more sense to use a Boolean OR operation on each pair of bits. Do you recall this table from Chapter 6?

A table for the OR operation, showing that 0 or 0 is 0, 0 or 1 is 1, and 1 or 1 is 1.

The result of an OR operation is 1 if either of the two operands is 1.

Here’s the uppercase T again, but instead of adding 20h, let’s apply an OR operation between the corresponding bits of 54h (the T) and 20h:

   01010100   (54h, uppercase T)
OR 00100000   (20h)
-----------
   01110100   (74h, lowercase t)

The result is a 1 bit if either of the corresponding bits is 1. The advantage of this method is that lowercase letters remain unchanged. Here’s the result of an OR operation with the lowercase o and 20h:

   01101111   (6Fh, lowercase o)
OR 00100000   (20h)
-----------
   01101111   (6Fh, still lowercase o)

If you apply an OR operation with 20h to each of the letters in memory, you can convert all the letters to lowercase:

A block of memory containing ASCII codes for the text “Tom Sawyer” entirely in lowercase.
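
In a high-level language the whole conversion becomes a one-line loop. Here's a Python sketch that applies the bitwise OR with 20h to every byte of the text; the space character (20h) is unaffected because its 20h bit is already set.

```python
text = b"Tom Sawyer"                      # the ASCII codes stored in memory

# OR each byte with 20h to force the lowercase bit on.
lowered = bytes(c | 0x20 for c in text)
print(lowered)                            # b'tom sawyer'
```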

What we’ve been doing here has a name. It’s called a bitwise OR operation, because it performs an OR operation between each pair of corresponding bits. It turns out to be useful for other tasks besides converting text to lowercase. For that reason, I want to add the following circuitry to the arithmetic logic unit:

Eight OR gates assembled to perform a bitwise-OR operation between two 8-bit values.

For the corresponding 8 bits of 2 bytes labeled A and B, the circuit performs an OR operation. Let’s put that circuit in a box with a simple label:

A box labeled OR with two 8-bit inputs and an output.

Now consider how you might convert a block of text to uppercase. It’s a little different process because instead of setting a bit to 1, you want to set that bit to 0. Instead of an OR operation, you’ll need an AND operation. Here’s the table from Chapter 6:

A table for the AND operation, showing that 0 and 0 is 0, 0 and 1 is 0, and 1 and 1 is 1.

Instead of using an OR operation with 20h, you use the AND operation with DFh (11011111 in binary), which is the inverse of 20h. Here’s the lowercase o converted to uppercase:

    01101111   (6Fh, lowercase o)
AND 11011111   (DFh)
------------
    01001111   (4Fh, uppercase O)

The ASCII code 6Fh becomes the code 4Fh, which is the ASCII code for the uppercase O.

If the letter is already uppercase, an AND operation with DFh has no effect. Here’s the uppercase T:

    01010100   (54h, uppercase T)
AND 11011111   (DFh)
------------
    01010100   (54h, still uppercase T)

It remains uppercase after the AND operation with DFh. Over the entire text, an AND operation converts every letter to uppercase:

A block of memory containing ASCII codes for the text “Tom Sawyer” entirely in uppercase.

It will be useful if the ALU contains a collection of eight AND gates to perform a bitwise AND operation between 2 bytes:

Eight AND gates assembled to perform a bitwise-AND operation between two 8-bit values.

Let’s put this in a little box for easy reference:

A box labeled AND with two 8-bit inputs and an output.

A bitwise AND operation is also useful for determining if particular bits of a byte are 0 or 1. For example, suppose you have a byte with an ASCII code for a letter, and you want to know if it’s lowercase or uppercase. Perform a bitwise AND with 20h. If the result is 20h, then the letter was lowercase. If the result is 00h, it was uppercase.
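
Here's a similar Python sketch for the uppercase conversion. One caution that this sketch works around: ANDing the space character (20h) with DFh would clear it to 00h, so this version applies the mask only to bytes that are ASCII letters. The last lines show the AND-with-20h test just described.

```python
text = b"tom sawyer"

def is_letter(c):
    """True if the byte is an ASCII letter (41h-5Ah or 61h-7Ah)."""
    return 0x41 <= (c & 0xDF) <= 0x5A

# AND each letter with DFh to force the lowercase bit off;
# leave the space (20h) alone, since 20h AND DFh would be 00h.
uppered = bytes((c & 0xDF) if is_letter(c) else c for c in text)
print(uppered)                            # b'TOM SAWYER'

# Test whether a letter is lowercase: AND with 20h.
print(ord("o") & 0x20)                    # 32 (20h): lowercase
print(ord("T") & 0x20)                    # 0 (00h): uppercase
```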

Another useful bitwise operation is exclusive OR, or XOR. The following table appeared in Chapter 14 when it became evident that such an operation was useful for addition:

A table showing the XOR logical operation.

Here’s a row of eight XOR gates wired to perform a bitwise XOR operation between 2 bytes:

Eight XOR gates assembled to perform a bitwise-XOR operation between two 8-bit values.

Once again, let’s put that circuit in a convenient box:

A box labeled XOR with two 8-bit inputs and an output.

The XOR operation is useful for inverting bits. For example, if you were to apply an XOR operation with the ASCII codes for “TomSawyer” and 20h, all the uppercase letters would be converted to lowercase, and all the lowercase letters would become uppercase! Performing an XOR operation with FFh would invert all the bits in a value.
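
A quick Python sketch of both uses of XOR: toggling the case bit with 20h and inverting every bit with FFh.

```python
# XOR with 20h flips the case bit, so uppercase and lowercase swap.
print(bytes(c ^ 0x20 for c in b"TomSawyer"))   # b'tOMsAWYER'

# XOR with FFh inverts every bit of a value.
print(hex(0x55 ^ 0xFF))                        # 0xaa
```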

Earlier I defined two function bits labeled F1 and F0 for the Add/Subtract module. For the entire ALU, we’ll need three function bits:

A table showing how the three function bits correspond with the eight operations of the arithmetic logic unit.

I am not assigning these function codes arbitrarily. As you’ll see, these codes are implied by the actual instruction codes implemented by the Intel 8080 microprocessor. Besides the bitwise AND, XOR, and OR, you’ll see that another operation, called Compare, has been added to the table. I’ll discuss that shortly.

Toward the beginning of this chapter I showed you a little program with operation codes C6h and CEh, which performed an addition with the next byte in memory. The C6h code is a regular addition, while CEh is an addition with carry. These are called immediate instructions because they use the next byte following the operation code. In the Intel 8080, those two codes are part of a family of eight operation codes, shown here:

The Intel 8080 opcodes for the arithmetic and logical immediate instructions.

These opcodes have the following general form:

1  1  F2  F1  F0  1  1  0

where F2, F1, and F0 are the bits shown in the previous table. Those three bits are used in the next circuit, which combines the bitwise AND, XOR, and OR boxes:


An assemblage of bitwise AND, XOR, and OR gates selected with function signals.

The A and B inputs are routed to all three of the AND, XOR, and OR boxes, which all perform their assigned duties simultaneously. But only one result must be selected as the output. This is the purpose of the three boxes labeled TRI, which are 8-bit tri-state buffers. They allow one of the results (or none of them) to be selected based on the three function signals F2, F1, and F0. If F2 is 0, or if F2, F1, and F0 are all 1, then none of the outputs is selected.

Let’s encapsulate that diagram in another box:

A box labeled Logic, with A and B inputs, an Out output, and F2, F1, and F0 inputs.

That’s the logic component of the arithmetic logic unit.
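
Here's a behavioral Python sketch of the Logic module, assuming the 8080-derived function codes of 100 for AND, 101 for XOR, and 110 for OR. A return value of None stands in for the disabled state of the tri-state buffers.

```python
def logic_module(a, b, f2, f1, f0):
    """Select the bitwise AND, XOR, or OR result, mimicking the tri-state buffers.
    Returns None when no output is enabled (F2 is 0, or all three bits are 1)."""
    if f2 == 0 or (f1 == 1 and f0 == 1):
        return None            # buffers disabled: addition/subtraction or Compare
    if (f1, f0) == (0, 0):
        return a & b           # function code 100: bitwise AND
    if (f1, f0) == (0, 1):
        return a ^ b           # function code 101: bitwise XOR
    return a | b               # function code 110: bitwise OR

print(hex(logic_module(0x54, 0x20, 1, 1, 0)))   # 0x74: OR turns T into t
```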

The table above shows that if F2, F1, and F0 are all 1, then a Compare operation is performed. What does this mean?

Sometimes it’s useful to determine if one number is less than, greater than, or equal to another. How do you do this? Basically, it’s a subtraction. Subtract byte B from byte A. If the result is zero, you know that the two numbers are equal. Otherwise, if the Carry flag is set, then byte B is larger than byte A, and if the Carry flag is not set, byte A is larger.

The Compare operation is the same as the Subtract operation with the important distinction that the result isn’t saved anywhere. Instead, the Carry flag is saved.

But for a Compare operation, it’s also necessary to know if the result of the operation was zero, which indicates that the two bytes are equal to each other. This implies the need for another flag, called the Zero flag, which must be saved along with the Carry flag.
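
Here's a tiny Python sketch of what a program learns from a Compare, following the description above: the subtraction result is thrown away, and only the flags remain.

```python
def compare(a, b):
    """A Compare is a subtraction whose result is discarded; only the flags are kept."""
    zero = 1 if a == b else 0        # Zero flag: the two bytes are equal
    carry = 1 if b > a else 0        # Carry flag as described above: byte B is larger
    return carry, zero

print(compare(0x42, 0x42))           # (0, 1): equal
print(compare(0x10, 0x42))           # (1, 0): B is larger
print(compare(0x42, 0x10))           # (0, 0): A is larger
```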

While we’re at it, let’s add another flag, called the Sign flag. This flag is set if the most significant bit of the result of an operation is 1. If the number is in two’s complement, the Sign flag indicates whether the number is negative or positive. The flag is 1 if the number is negative and 0 if the number is positive.

(The Intel 8080 actually defines five flags. I won’t be implementing the Auxiliary Carry flag, which indicates whether a carry results from the least significant 4 bits of the adder to the most significant 4 bits. This is necessary for implementing an Intel 8080 instruction called Decimal Adjust Accumulator, which converts the value in the accumulator from binary to binary-coded decimal, or BCD, which I discussed when building clocks in Chapter 18. My CPU won’t implement that instruction. The other flag I won’t be implementing is the Parity flag, which is 1 if the result of the arithmetic or logical operation has an even number of 1 bits. This is fairly easy to implement with seven XOR gates, but it’s much less useful than the other flags.)

It turns out that for some programming tasks, the Compare operation is more important than addition and subtraction. For example, suppose you’re writing a program that finds some text on a webpage. This involves comparing the characters of the text on the webpage with the characters of the text you want to find.

The entire arithmetic logic unit combines the Add/Subtract module and the Logic module with some rather messy support circuitry:


The Add/Subtract module and the Logic module combine to form the arithmetic logic unit.

The two boxes labeled TRI are tri-state buffers. The Logic module enables an output only for the three combinations of F0, F1, and F2 that select AND, OR, and XOR operations. The tri-state buffer on the output of the Add/Subtract module is enabled only if F2 is 0, which indicates addition or subtraction.

Toward the bottom, two latches have their Clk inputs connected to a Clock input at the bottom left that applies to the entire ALU. Another tri-state buffer is controlled by an Enable signal at the bottom left that is also an input to the ALU. The tri-state buffer on the bottom right is the composite output from the Add/Subtract and Logic modules.

Most of the logic gates in the diagram are devoted to determining when the Carry flag (abbreviated as CY) should be saved. The Carry flag should be saved if the F2 signal is 0 (indicating an addition or subtraction operation) or if F1 and F0 are both 1, which indicates a Compare operation.

The three flags are inputs to the latch in the center at the bottom. An eight-input NOR gate determines if the result of an operation is all zeros. That’s the Zero flag (abbreviated as Z). The high bit of the data output is the Sign flag (abbreviated as S). Although there are only three flags, they are treated as 3 bits of a byte as they are output from the ALU. The Carry flag then circles back up to the top to provide the CY In input of the Add/Sub module.
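
To summarize the whole unit's behavior, here is a rough Python sketch of the ALU as a function of its two data inputs, the three function bits, and the incoming Carry flag. It is purely behavioral, not a gate-level model: the tri-state buffers, latches, and clocking are abstracted away, Compare is treated here as a plain subtraction whose result is discarded, and the logic operations simply pass the incoming Carry flag through unchanged.

```python
def alu(a, b, f2, f1, f0, cy_in):
    """Rough behavioral model of the ALU: returns (out, carry, zero, sign).
    Function bits: 000 Add, 001 Add with Carry, 010 Subtract,
    011 Subtract with Borrow, 100 AND, 101 XOR, 110 OR, 111 Compare."""
    if f2 == 1 and not (f1 == 1 and f0 == 1):
        # Logic path: bitwise AND, XOR, or OR.
        result = {(0, 0): a & b, (0, 1): a ^ b, (1, 0): a | b}[(f1, f0)]
        carry = cy_in                      # logic operations leave the Carry flag alone
    else:
        # Arithmetic path; Compare (111) is treated as a plain subtraction here.
        compare = (f2, f1, f0) == (1, 1, 1)
        subtract = (f1 == 1) or compare
        use_cy = (f0 == 1) and not compare
        operand = (b ^ 0xFF) if subtract else b          # ones' complement for subtraction
        ci = cy_in if use_cy else (1 if subtract else 0)
        total = a + operand + ci
        result, carry = total & 0xFF, total >> 8

    zero = 1 if result == 0 else 0         # Zero flag: all bits of the result are 0
    sign = result >> 7                     # Sign flag: the most significant bit
    out = None if (f2, f1, f0) == (1, 1, 1) else result  # Compare discards the result
    return out, carry, zero, sign

# Add with Carry (001): 7Fh plus 7Fh plus an incoming carry of 1.
print(alu(0x7F, 0x7F, 0, 0, 1, 1))         # (255, 0, 0, 1)
```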

The next step is to hide all that messy logic in a simple box:

The arithmetic logic unit encapsulated in a box, with A and B inputs, three function selectors, Carry In, Enable, and Flags and Out data paths.

The arithmetic logic unit is complete!

Although the ALU is an extremely important component of the central processing unit, the CPU needs more than a way to perform arithmetic and logical operations on numbers. It needs a way to get numbers into the ALU, and a way to store the results and move them around. That’s the next step.
