Chapter Twenty-Four

Loops, Jumps, and Calls

Our lives are full of repetition. We count the days through the natural rhythms of the rotation of the earth, the revolution of the moon around the earth, and of the earth around the sun. Each day is different, but our lives are often structured by standard routines that are similar from day to day.

In a sense, repetition is also the essence of computing. Nobody needs a computer to add two numbers together. (Let’s hope not, anyway!) But adding a thousand or a million numbers together? That’s a job for a computer.

This relationship of computing and repetition was obvious early on. In Ada Lovelace’s famous 1843 discussion of Charles Babbage’s Analytical Engine, she wrote:

1. Both for brevity and for distinctness, a recurring group is called a cycle. A cycle of operations, then, must be understood to signify any set of operations which is repeated more than once. It is equally a cycle, whether it be repeated twice only, or an indefinite number of times; for it is the fact of a repetition occurring at all that constitutes it such. In many cases of analysis there is a recurring group of one or more cycles; that is, a cycle of a cycle, or a cycle of cycles.

In modern terminology, these cycles are often called loops. What she calls a cycle of a cycle is now called a nested loop.

The CPU that I’ve been building over the past several chapters seems deficient in that regard. At the end of the previous chapter, I showed you a little program that adds 5 bytes together that are stored in memory beginning at address 1000h:

A short program that accesses memory to add five numbers together.

The HL register pair is used to address memory. The first byte is read from memory into the accumulator with the MOV instruction, and then subsequent ADD instructions add the other 4 bytes to the accumulator. After each byte is read from memory, the value in the HL register pair is incremented with the INX instruction. Finally, the STA instruction stores the result in memory.

How would this program be enhanced to add a hundred or a thousand bytes together? Would you just keep adding INX and ADD instructions to match the number of bytes you need to add? That doesn’t seem quite right. It’s not a solution that would work well with other needs. It’s not a generalized solution.

What might instead be handy is a new instruction that allows repeating certain sequences of instructions, in this case the INX and ADD instructions. But what would that look like?

At first, such an instruction seems so different from existing instructions that you might fear it would require a complete overhaul of the CPU. But don’t despair quite yet.

Normally the program counter is incremented after the CPU fetches each instruction byte. This is how the CPU advances from one instruction to the next. An instruction that performs a loop must also somehow alter the program counter but in a different way.

You’ve seen how instructions such as LDA and STA are followed by 2 bytes that together form a 16-bit memory address. Consider an instruction that is followed by two bytes much like LDA and STA, but the 2 bytes following the instruction are not used to address memory. Instead, those 2 bytes are latched in the program counter. Such an instruction would change the normal course of execution because it would cause the program counter to effectively jump to a different address.

Let’s call this instruction JMP, for “jump.” That’s what it’s called in the Intel 8080 microprocessor. (The Motorola 6809 named a similar instruction BRA, for “branch.”)

The JMP instruction is followed by 2 bytes that form a 16-bit address. In the following example, this address is 0005h. Here’s what it might look like:

A program that uses a JMP instruction to repeat the execution of two other instructions.

Every time the INX and ADD instructions are executed, this JMP instruction then continues execution at address 0005h for another round of the INX and ADD instructions. That’s the loop.

Adding this JMP instruction to the CPU is surprisingly easy. But let’s hold off on that for a moment while we first acknowledge a problem: This little program with the JMP instruction will continue forever. There’s no way to stop the loop, and for that reason it’s called an infinite loop. The HL value will continue to be incremented, and the byte at that address will continue to be added to the sum in the accumulator. Eventually HL will equal FFFFh at the very end of memory. After it’s incremented again, it will roll over to become 0000h, and it will start adding instruction bytes to the accumulator!

Loops are extremely important in programming, but just as important is looping sometimes but not always.

Is there something already in the CPU that might be able to control whether a jump occurs or not?

Yes, there is. You’ll recall that the arithmetic logic unit (ALU) built in Chapter 21 saves several flags in a latch. These are the Carry flag, the Zero flag, and the Sign flag, and they indicate, respectively, whether the ALU operation caused a carry, whether the result was equal to zero, and whether the high bit of the result was 1, indicating a negative two’s complement number.

We might conceive of an instruction that only jumps if the Zero flag is set, or if the Zero flag is not set. In fact, we might define a little collection of jump instructions:

A table of seven variations of a Jump instruction.

I am not inventing these instructions and operation codes! These are instructions implemented by the Intel 8080 microprocessor that I’m using as a guide in building my own subset of that CPU. The addr in the first column is a 2-byte memory address that follows the opcode.

The JMP instruction is known as an unconditional jump. It causes the CPU to alter its normal course of execution regardless of the settings of the ALU flags. The others are known as conditional jumps. These instructions alter the program counter only if certain flags are set or not set in the ALU. (The 8080 CPU also implements two more conditional jumps that are based on the Parity flag. I mentioned that flag in Chapter 21, but my CPU doesn’t implement it.)

Let’s see how these conditional jumps might work in a program. Suppose you want to add up 200 bytes that are stored in memory beginning at address 1000h.

The trick here is to use one of the registers to store a value called a counter. The counter begins at the value 200, which is the number of bytes to add. Every time a byte is accessed and added, this counter is decremented. At any time, the value of the counter indicates the number of bytes left to add. When it reaches zero, the job is completed.

This means that the program needs to juggle two arithmetic operations. It needs to maintain a running total of the bytes that it’s adding up, and it needs to decrement the counter every time it adds a new byte.

This creates a bit of a problem: As you’ll recall, all the arithmetic and logic operations use the accumulator, which means that the program has to move bytes from registers into the accumulator to do some arithmetic; then it needs to move the new bytes back into registers.

Let’s decide to store the running total of the bytes in register B, and the counter in register C. These values must be moved to the accumulator for any arithmetic operations and then moved back to B and C for the next repetition of the instructions.

Because this program is a little longer than those you’ve seen previously, I’ve divided it into three sections.

The first part of a computer program is commonly called the initialization:

A program that uses a conditional jump and an unconditional jump to add 200 numbers.

This section sets the 16-bit composite value of the HL register pair to 1000h, which is the location of the numbers to be added. Register C is set to decimal 200 (hexadecimal C8h), which is how many numbers must be added. Finally, register B is set to the first number in that list.

The second part of this program contains the instructions that are repeated:

The second part of a program to add up 200 numbers from memory.

This section begins by copying the value of the counter to the accumulator. The SUI instruction subtracts 1 from that number. The first time through, the value 200 becomes 199. If that value is zero (which it obviously isn’t yet), the JZ instruction jumps to the address 0015h, which is the next address after this block. This kind of instruction is known as breaking out of the loop.

Otherwise, the value in the accumulator (which is now 199 during the first time through) is moved back to register C. Now HL can be incremented with INX. The running total (stored in register B) is moved to A. The value at memory address HL is added to that, and then the new total is copied back to register B. Then an unconditional JMP instruction jumps up to the top for the next time through.

Each of these times through the code is commonly called an iteration. Eventually, the value in register C will be 1, and when 1 is subtracted from that, it equals zero, and the JZ instruction jumps to address 0015h:

The final part of the program to add 200 numbers.

Register B contains the final sum of all 200 numbers. It’s moved into the accumulator in preparation for the STA instruction, which stores the value in memory. The program is then halted.

Notice that the program is very easy to modify if the numbers to be added reside at a different memory location or if there are more or less than 200. All that information is set at the very top of the program and can easily be changed. It’s always a good idea when writing a computer program to think about how it might be changed in the future.

It is very rarely the case that computer programs can be written only one way. There’s a slightly different way to write this program that involves only one jump instruction. This version starts out almost like the first one:

The beginning of an alternate program to add 200 numbers in a list.

The only difference is that the value in register C is set to 199 rather than 200. You’ll see the reason for this shortly.

The middle of the program has been rearranged. Now it begins by incrementing HL and adding the next value on the list:

The second part of an alternate program to add 200 numbers together.

After the next value is added, the counter value in register C is moved to A, decreased by 1, and the new value is moved back into register A. Then the JNZ instruction jumps to the top of the loop if the result of the SUI instruction is not zero.

If the result of the SUI instruction is zero, then the program continues with the next instruction after JNZ. This is the conclusion of the program that stores the accumulated sum in memory and halts:

The conclusion of an alternate program to add 200 numbers together.

By removing one of the jump instructions, the program has been shortened by 3 bytes, but it might seem a little more complex. Do you see why register C needs to be set to 199 rather than 200? It’s because that value is being modified and examined after a value from memory has been added. If there were only two numbers in the list to be added, both those numbers would be accessed before the first iteration of the JNZ instruction. Hence, C would have to be initialized to 1 rather than 2. This program wouldn’t work at all for only one byte in the list. Do you see why?

It’s common to make mistakes when determining how many times a loop must be iterated. Problems like these are so common in programming that they were given a name. They are referred to as off-by-one errors.

Perhaps you don’t know how many numbers need to be added, but you do know that the last number in the list is 00h. This 00h value signals to your program that the list is complete. Such a value is sometimes called a sentinel. In this case, you’d want to use a Compare instruction to compare the value from memory with 00h to determine when to break out of the loop.

Starting with this alternate program that uses a sentinel, I want to stop showing you values in memory, and just show you the instructions. Instead of showing you memory addresses, I’m going to use words that are called labels. They may look like words, but they still represent locations in memory. The labels are followed by colons:

Start: MVI L,00h

MVI H,10h

MVI B,00h

Loop: MOV A,M

CPI 00h

JZ End

ADD B

MOV B,A

INX HL

JMP Loop

End: MOV A,B

STA Result

HLT

Result:

After the next value in memory has been loaded into the accumulator with the MOV A,M instruction, the CPI instruction compares it with 00h. If A equals 00h, the Zero flag is set, and the JZ instruction jumps to the end. Otherwise, the value is added to the running total in B, and HL is incremented for the next iteration.

Using labels allows us to avoid figuring out the memory addresses of the instructions, but it’s always possible to calculate the memory locations of these labels. If the program starts at memory location 0000h, then the first three instructions require 2 bytes each, so the label Loop represents memory address 0006h. The next seven instructions occupy a total of 12 bytes, so the label End is the memory address 0012h, and Result is 0017h.

If you haven’t surmised this already, the conditional jump is a very important feature of a CPU, but it’s perhaps much more important than you might realize. Let me tell you why.

In 1936, a 24-year-old graduate of Cambridge University named Alan Turing set out to solve a problem in mathematical logic, posed by ­German mathematician David Hilbert, known as the Entscheidungsproblem, or decision problem: Is there a process that could determine whether an arbitrary statement in mathematical logic is decidable—that is, could it be determined whether the statement were true or false?

In answering this question, Alan Turing took an extremely unusual approach. He hypothesized the existence of a simple computing machine that functioned by simple rules. He did not actually build this machine. It was instead a computer of the mind. But besides proving the Entscheidungsproblem false, he established some basic concepts of digital computing that have had an impact far beyond this problem in mathematical logic.

An alternate version of a looping program that uses a sentinel to break out of the loop.

Pictures from History/Universal Images Group/Getty Images

The imaginary computing machine that Turing invented is now known as a Turing machine, and in terms of computational ability, it is functionally equivalent to all the digital computers that have been built since then. (If you’re curious about exploring Turing’s original paper describing his imaginary computer, my book The Annotated Turing: A Guided Tour through Alan Turing’s Historic Paper on Computability and the Turing Machine might be helpful.)

Different digital computers run at different speeds; they can access different amounts of memory and storage; they have different types of hardware attached to them. But in processing power, they are all functionally equivalent. They can all do the same type of tasks because they all have one very special feature: a conditional jump based on the result of an arithmetic operation.

All programming languages that support a conditional jump (or something equivalent to it) are fundamentally equivalent. These programming languages are said to be Turing complete. Nearly all programming languages satisfy this condition, but markup languages—such as the HyperText Markup Language (HTML) used in webpages—are not Turing complete.

Besides the jump instructions that I listed earlier in this chapter, another instruction is useful for performing jumps. This one is based on the value in HL:

A table showing the PCHL instruction.

The seven jump instructions and PCHL are fairly easily incorporated into the timing circuitry shown in the previous chapter. In the circuit on page 366 in Chapter 23, recall that the three decoders have inputs corresponding to the 8 bits of the operation code:

Circuity to decode the instruction byte into Jump instructions.

Various combinations of the outputs of these decoders are then used to generate signals when the opcode corresponds to a jump instruction.

All the jump instructions except PCHL can be consolidated in a group with a seven-input OR gate:

A seven-input OR gate consolidates all the Jump instructions into a Jump Group signal.

This can be used to integrate the jump instructions into the circuit on page 367 in Chapter 23, which determines how many instruction bytes must be fetched from memory and how many cycles are required to execute each instruction. The Jump Group signal indicates that 3 bytes must be fetched from memory: the operation code and a 2-byte address. The PCHL instruction is only 1 byte in length. All these instructions require only one cycle to execute, and only involve the address bus.

For the execution of the jump instructions, let’s create a signal that indicates whether a conditional jump should occur. The signals for the decoded instruction byte must be combined with flags from the ALU in Chapter 21:

A circuit to determine if a conditional jump should occur.

It’s then fairly straightforward to incorporate these signals into the diode ROM matrix shown on page 374 in Chapter 23:

Additions to the diode ROM matrix for implementing jump instructions.

The JMP and conditional jump instructions enable Instruction Latches 2 and 3 on the address bus, while the PCHL instruction enables HL on the address bus. In all cases, that address is stored in the program counter.

Images

An interactive version of the enhanced CPU is available on the website CodeHiddenLanguage.com.

Almost anything substantial that you need to do in a computer program involves repetition and becomes an excellent candidate for a loop. Multiplication is a good example. Back in Chapter 21 I promised to show you how to persuade this CPU to multiply, and now it’s time to see how it’s done.

Let’s look at the simplest case, which is a multiplication of 2 bytes—for example, 132 times 209, or in hexadecimal, 84h times D1h. These two numbers are called the multiplier and the multiplicand, and the result is the product.

In general, multiplying one byte by another creates a product that is 2 bytes wide. For my example, it’s easy to calculate the product as 27,588 or 6BC4h, but let’s have the CPU do it.

In the past I’ve used registers H and L for a 16-bit memory address, but you can alternatively use H and L as normal 8-bit registers, or you can use the register pair HL to store a 2-byte value. In this example, I’ll use HL to store the product. The code to multiply two bytes begins by setting the B register to the multiplicand and C to the multiplier, and the H and L registers to zero:

Start: MVI B,D1h ; Set B to multiplicand

MVI C,84h ; Set C to multiplier

MVI H,00h ; Initialize HL to zero

MVI L,00h

I’ve added little descriptions of the instructions at the right following a semicolon. These are known as comments, and the use of a semicolon to preface a comment can be found in Intel’s original documentation of the 8080 CPU.

I’m first going to show you a simple way to multiply two numbers, which is just repeated addition. I’ll be adding the multiplier to the HL registers a number of times equal to the multiplicand.

The first step is to check whether the multiplicand (stored in register B) is zero. If so, the multiplication is complete:

Loop: MOV A,B ; Check whether B is zero

CPI 00h

JZ Done ; All finished if that’s the case

If that’s not the case, then the multiplier (stored in register C) is added to the contents of registers H and L. Notice that this is essentially a 16-bit addition: C is added to L by first moving the contents of L to the accumulator, and then zero is added to H with a possible carry resulting from the first addition:

MOV A,L ; Add C to HL

ADD C

MOV L,A

MOV A,H

ACI 00h

MOV H,A

Now the multiplicand in register B is decremented, indicating one fewer number to be added to HL, and the program jumps back up to Loop for another iteration:

MOV A,B ; Decrement B

SBI 01h

MOV B,A

JMP Loop ; Repeat calculation

When the earlier jump to the Done label occurs, the multiplication is finished, and the HL registers contain the product:

Done: HLT ; HL contains result

This is not the best way to multiply two bytes, but it has the advantage of being easy to understand. Solutions of this type are sometimes called brute-force approaches. No consideration is being given to performing the multiplication as quickly as possible. The code doesn’t even compare the two numbers to use the smaller of these numbers for the loop. Adding just a little more code to the program would allow performing 132 additions of 209 rather than 209 additions of 132.

Is there a better way to perform this multiplication? Consider how you do a decimal multiplication on paper:

images

The two numbers under the first underline are 132 times 9, and then 132 times 2 shifted left two spaces, effectively 132 times 200. Notice that you don’t even need to write down 132 times 0, because that’s just zero. Rather than performing 209 additions or 132 additions, only two numbers need to be added!

What does this multiplication look like in binary? The multiplier (which is 132 in decimal) is 10000100 in binary, and the multiplicand (209 in decimal) is 11010001 in binary:

images

For each bit in the multiplicand (11010001) starting at the right, multiply that bit by the multiplier (10000100). If the bit is 1, then the result is the multiplier, shifted left for each bit. If the bit is 0, the result is 0 so it can be ignored.

Under the first underline are just four instances of the multiplier (10000100). There are only four because there are only four 1s in the multiplicand (11010001).

This approach reduces the number of additions to a bare minimum. Performing the multiplication like this becomes even more essential if you are multiplying 16-bit or 32-bit numbers.

But it also seems more complicated in some ways. We’re going to need to test which bits of the multiplicand are 1 and which bits are 0.

This testing of bits makes use of the 8080 ANA (AND with accumulator) instruction. This instruction performs a bitwise AND operation between two bytes. It’s called a bitwiseAND because for each bit, the result is 1 if the corresponding bits of two bytes are 1, but 0 otherwise.

Let’s put the multiplicand in register D. In this example, that’s the byte D1h:

MVI D,D1h

How can you tell if the least significant bit of register D is 1? Perform an ANA operation between D and the value 01h. You can first set register E to this value:

MVI E,01h

Because the ALU works only with the accumulator, you’ll need to move one of the numbers into the accumulator first:

MOV A,D

ANA E

The result of this AND operation is 1 if the rightmost bit of D (the least significant bit) is 1 and 0 otherwise. This means that the Zero flag is set if the rightmost bit of D is 0. That flag allows a conditional jump to be performed.

For the next bit, you’ll need to perform an AND operation not with 01h but with 02h, and for the remaining bits you’ll be performing AND operations with 04h, 08h, 10h, 20h, 40h, and 80h. Look at this sequence for a moment and you might realize that each value is twice the previous value: 01h doubled is 02h, and that doubled is 04h, and that doubled is 08h, and so forth. This is useful information!

Register E starts out at 01h. You can double it by adding it to itself:

MOV A,E

ADD E

MOV E,A

Now the value in E equals 02h. Execute those three instructions again, and it equals 04h, then 08h, and so on. What this simple operation essentially does is progressively shift a bit from the least significant position to the most significant position, from 01h to 80h.

It will also be necessary to shift the multiplier to add to the result. That means that the multiplier will no longer fit in an 8-bit register, and it must somehow be treated as a 16-bit value. For that reason, the multiplier is first stored in register C, but register B is set to 0. You can treat registers B and C as a pair that stores this 16-bit multiplier, and it can be shifted for the 16-bit additions. The combination of registers B and C can be referred to as BC.

Here’s how the registers are initialized for this new and improved multiplier:

Start: MVI D,D1h ; Multiplicand

MVI C,84h ; Store multiplier in BC

MVI B,00h

MVI E,01h ; Bit tester

MVI H,00h ; Use HL for 2-byte result

MVI L,00h

The Loop section begins by testing whether a bit in the multiplicand is a 1 or 0:

Loop: MOV A,D

ANA E ; Test whether bit is 0 or 1

JZ Skip

If the bit is 1, the result is not zero, and the following code is executed to add the value of the BC register pair to the HL register pair. But the 8-bit registers need to be handled individually. Notice that ADD is used for the low bytes, while ADC is used for the high byte to take account of a carry:

MOV A,L ; Add BC to HL

ADD C

MOV L,A

MOV A,H

ADC B

MOV H,A

If you were using a real Intel 8080 rather than the subset that I’ve constructed, you could replace those six instructions with DAD BC, which very conveniently adds BC to HL. DAD is one of several 8080 instructions that work with 16-bit values.

The next job is to double the value of BC, essentially shifting it leftward for the next addition. This code is executed regardless of whether BC has been added to HL or not:

Skip: MOV A,C ; Double BC, the multiplier

ADD C

MOV C,A

MOV A,B

ADC B

MOV B,A

The next step is to double the value of register E, which is the bit tester. If the value is not zero, then it jumps back up to the Loop label for another iteration.

MOV A,E ; Double E, the bit tester

ADD E

MOV E,A

JNZ Loop

The code following the Loop label is executed exactly eight times. After E has been doubled eight times, the 8-bit register overflows, and E now equals zero. The multiplication is complete:

Done: HLT ; HL contains result

If you need to multiply two 16-bit values or two 32-bit values, the job obviously gets rather more complicated, and you’ll need to use more registers. When you run out of registers for storing intermediate values, you can use an area of memory for temporary storage. A small block of memory used in that way is commonly referred to as scratchpad memory.

The object of this exercise is not to frighten you. The object is not to dissuade you from pursuing a career in computer programming. It is to demonstrate that an assemblage of logic gates that respond to codes stored in memory can indeed combine very simple operations to perform complex tasks.

In real-life computer programming, multiplication is much easier using high-level languages (as they are called), which I’ll discuss in Chapter 27. It is the magic of software that other people have done the hard work so you don’t have to.

Multiplication in machine code requires shifting bits, which was accomplished earlier by adding a value to itself. If you were working with a real Intel 8080 microprocessor rather than the subset I’ve built, you would have a better way to shift bits. The Intel 8080 contains four instructions that perform bit shifting without the nuisance of adding registers to themselves. They are called rotate instructions:

A table showing the four Intel 8080 Rotate instructions.

These instructions always perform the operation on the value in the accumulator, and they affect the Carry flag.

The RLC instruction shifts the bits of the accumulator to the left. However, the most significant bit is used to set both the Carry flag and the least significant bit:

The RLC instructions shifts bits to the left with the most-significant bits setting the Carry flag and least-significant bit.

The RRC instruction is similar except that it shifts bits to the right. The least significant bit is used to set both the Carry flag and the most significant bit:

The RRC instructions shifts bits to the right with the least-significant bit setting the Carry flag and most-significant bit.

The RAL instruction is similar to doubling the accumulator, except that the existing Carry flag is used to set the least significant bit. This is useful when shifting a multibyte value:

The RAL instruction shifts bits to the left with the Carry flag setting the least-significant bit, and the most-significant bit setting the Carry flag.

The RAR instruction is similar to RAL but rotates the bits right:

The RAR instruction shifts bits to the right with the Carry flag setting the most-significant bit, and the least-significant bit setting the Carry flag.

While these rotate instructions are certainly useful in some circumstances, they are not essential, and I won’t be adding them to the CPU that I’ve been building.

You’ve seen how you can use jumps and loops to execute a group of instructions over and over again. But there will often be times when you want a more flexible way to execute a group of instructions. Perhaps you’ve written a group of instructions that you need to execute from different parts of a computer program. (Perhaps a versatile multiplication is one of them.) These groups of instructions are often called functions or ­procedures or subroutines, or simply routines.

The Intel 8080 CPU implements subroutines with an instruction named CALL. The syntax of the CALL instruction looks a lot like JMP in that it is followed by a memory address:

CALL addr

Like the JMP instruction, the CALL statement jumps to that address to continue execution. But CALL is different from JMP in that it first saves a reminder of where it jumped from—specifically, the address of the instruction that follows the CALL instruction. As you’ll see shortly, this address is stored in a very special place.

Another instruction, called RET (meaning “return”), is also similar to JMP, but the address that it jumps to is the address saved by the CALL instruction. Subroutines often end with a RET statement.

Here are the 8080 instructions for CALL and RET:

Images

The Intel 8080 also supports conditional calls and conditional returns, but these are used much less frequently than are CALL and RET.

Let’s look at a practical example. Suppose you were writing a program where you need to display the value of a byte—for example, the byte 5Bh. You saw in Chapter 13 how you can use ASCII to display letters, numbers, and symbols. But you can’t display the value of the byte 5Bh using the ASCII code 5Bh. That’s the ASCII code for a left square bracket character! Instead, a byte such as 5Bh would need to be converted into two ASCII codes:

· 35h, which is the ASCII code for the character 5

· 42h, which is the ASCII code for the character B

This conversion displays the value of bytes in a way that people understand (or at least people who know hexadecimal).

The strategy here is to first separate the byte into two parts: the top 4 bits and the bottom 4 bits, sometimes called nibbles. In this example, the byte 5Bh is separated into 05h and 0Bh.

Then, each of these 4-bit values is converted into ASCII. For values from 0h to 9h, the ASCII codes are 30h to 39h for the characters 0 through 9. (See the tables on pages 153 and 154 in Chapter 13 if you need an ASCII refresher.) For values from Ah to Fh, the ASCII codes are 41h to 46h for the characters.

Here’s a little subroutine that converts a 4-bit value in the accumulator into ASCII:

A subroutine to convert a four-bit value into ASCII.

I’ve reverted back to showing the memory locations because they’re important to demonstrate what’s going on here. This subroutine happens to begin at memory location 1532h, but there’s nothing special about that. It just happens to be where I decided this subroutine resides in memory.

The subroutine assumes that the accumulator contains the value to be converted. Such an assumed value is often called an argument or parameter to the subroutine.

The subroutine begins with a Compare Immediate instruction, which sets the ALU flags as if it had performed a subtraction. If the accumulator contains 05h (for example), subtracting 0Ah from that number requires a borrow, so the instruction sets the Carry flag. Because the Carry flag is set, the JC instruction jumps to the instruction at the label Number, which adds 30h to the accumulator, making it 35h, the ASCII code for the number 5.

If instead the accumulator contains something like the value 0Bh, a borrow is not required when 0Ah is subtracted. The CPI instruction does not set the Carry flag, so no jump occurs. First, 07h is added to the accumulator (making it 0Bh plus 07h, or 12h, in this example), and then the second ADI instruction adds 30h, making it 42h, the ASCII code for the letter B. Adding two values is a little trick to make use of the second ADI instruction for both letters and numbers.

In either case, the next instruction is a RET, which ends the subroutine.

I said we’d be writing a subroutine that converted a whole byte into two ASCII codes. This second subroutine has two CALL instructions to Digit, once with the low nibble and then with the high nibble. At the beginning of the subroutine, the byte to be converted is in the accumulator, and results are stored in registers H and L. This subroutine, called ToAscii, happens to reside beginning at memory address 14F8h:

A subroutine to convert a byte into two ASCII code.

This subroutine first saves the original byte in B, and then the ANI (AND Immediate) instruction performs a bitwise AND operation with 0Fh to preserve only the low four bits. Then it makes a CALL to the Digit subroutine, located at address 1532h. That result is saved in L. The original byte is retrieved from register B, and then four RRC instructions shift the high nibble down to the low 4 bits. After another ANI instruction is another call to Digit. That result is stored in register H, and the subroutine ends with the RET instruction.

Let’s see how this works. Somewhere else might be a little snippet of code that contains a CALL instruction for the ToAscii subroutine, located at 14F8h:

A snippet of code that makes a call to the ToAscii subroutine.

When the program continues at address 0628h, the values of H and L contain ASCII codes for the two digits of 5Bh.

How do CALL and RET work?

I mentioned earlier that when a CALL instruction is executed, an address is stored in a very special place that allows the code to resume after the subroutine has completed. That very special place is called the stack. It’s an area of memory that is located as far from everything else as possible. In an 8-bit CPU like the Intel 8080, the stack resides at the very end of memory.

The Intel 8080 contains a 16-bit register called the stack pointer. When the 8080 is reset, the stack pointer is initialized to the address 0000h. However, a program can change that address with the instructions SPHL (set stack pointer from HL) or LXI SP (load stack pointer from immediate address). But let’s leave it at the default value of 0000h.

When the Intel 8080 executes the CALL ToAscii instruction, several things happen in sequence:

· The stack pointer is decremented. Because it was initially set to the value 0000h, decrementing causes it to become the value FFFFh, which is the maximum 16-bit value, and which points to the last byte of 16-bit memory.

· The high byte of the address following the CALL instruction (which is address 0628h, and is the current value of the program counter) is saved in memory at the location addressed by the stack pointer. That byte is 06h.

· The stack pointer is decremented, now becoming the value FFFEh.

· The low byte of the address following the CALL instruction is saved in memory at the location addressed by the stack pointer. That byte is 28h.

· The address in the CALL statement (14F8h) is loaded into the program counter, in effect jumping to that address. This is the address of the ToAscii routine.

The upper area of RAM now looks like this:

Upper area of memory following the call to ToAscii.

The CALL instruction has effectively left a little trail of breadcrumbs to find its way back home.

The ToAscii routine is now being executed. It too has a CALL instruction to the Digit routine. The memory location in the ToAscii routine following that instruction is 14FEh, so when that CALL instruction occurs, that address is stored on the stack, which now looks like this:

Upper area of memory following the first call to Digit.

The value of the stack pointer is now FFFCh, and the Digit routine is now in progress. When the RET instruction in the Digit routine is executed, here’s what happens:

· The byte at the memory location addressed by the stack pointer is accessed. That byte is FEh.

· The stack pointer is incremented.

· The byte at the memory location addressed by the stack pointer is accessed. That byte is 14h.

· The stack pointer is incremented.

· Those two bytes are loaded into the program counter, which effectively jumps to memory location 14FEh in the ToAscii routine, returning to the routine that called Digit.

The stack is now returned to the state it was in prior to the first call to Digit:

Upper area of memory following a return from the Digit routine.

The stack pointer is now FFFEh. The address 14FEh is still stored in memory, but it has become irrelevant. The next call to Digit causes a new return address to be stored on the stack:

Upper area of memory following the second call to Digit.

That’s the address following the second call to Digit in the ToAscii routine. When Digit executes the RET instruction again, it jumps to the address 1509h in the ToAscii routine. The stack now looks like this again:

Upper area of memory following a return from the second call to Digit.

Now the RET instruction in the ToAscii routine can be executed. That retrieves the address 0628h from the stack and branches to that address, which is the address following the call to ToAscii.

And that’s how the stack works.

Formally, the stack is categorized as a Last-In-First-Out (or LIFO) form of storage. The most recent value added to the stack becomes the next value retrieved from the stack. Often the stack is visualized as a pile of cafeteria plates held aloft by a springy support. Plates can be added to the pile and then retrieved in the opposite order.

When something is added to the stack, it is said to be “pushed,” and when something is removed from the stack, it is “popped.” The Intel 8080 also supports several PUSH and POP instructions to save registers on the stack and later retrieve them:

A table showing the Intel 8080 Push and Pop instructions.

The abbreviation PSW stands for Program Status Word, and it’s nothing new. It’s just the accumulator in one byte and the ALU flags in another byte.

The PUSH and POP instructions are a convenient way to save the contents of registers when making calls to subroutines. Sometimes code calling a subroutine will push the contents of the registers before the CALL and pop them afterwards. This allows the subroutine to use the registers without worrying about how this will affect the code calling the subroutine. Or a subroutine itself will push registers at the beginning and pop them before the RET.

PUSH and POP instructions must be balanced, as must CALL and RET instructions. If a subroutine calls PUSH twice and POP only once and then executes a RET instruction, the code will jump to a location you probably don’t want it to go!

It’s possible for errant code to pop the stack too many times, which causes the stack pointer to begin addressing the beginning of memory rather than the end! This problem is called stack underflow, and it can result in the contents of the stack overwriting code. A related problem is when too much is pushed on the stack and it grows in size, also probably overwriting code. This is called stack overflow, a condition that has also provided the name for a popular internet forum for programmers seeking answers to their technical problems.

The CALL and RET instructions are not required for a CPU to be Turing complete, but in practice they are quite convenient, and some would even call them indispensable. Subroutines are the primary organizational elements of assembly-language programs, and they also play important roles in many other types of programming languages.

I’m afraid I won’t be adding CALL, RET, PUSH, and POP to the CPU I’ve been designing over the past several chapters. I feel very bad about it, but they would require a more versatile design than the one I’ve shown you.

But I’m sure you can easily imagine how they would be implemented: A new 16-bit latch called the stack pointer would be added to the address bus. This would look much like the latch that stores the program counter. That’s the easy part. But it would also be necessary to push the program counter on the stack during CALL instructions and pop it from the stack during RET instructions, and that would require that the 2 bytes of the program counter also be on the data bus. That is not the case in the present design.

Images

Although I have not added the stack-related instructions to the CPU I’ve been building, I have created a complete 8080 emulator on the website CodeHiddenLanguage.com.

Over the past several chapters, you have seen how an 8-bit microprocessor like the Intel 8080 executes instruction codes. Intel introduced the 8080 in 1974, and it is now considered quite primitive in comparison with everything that’s come since. As CPUs grew in size to accommodate 16-bit, 32-bit, and even 64-bit processing, they became much more complex as well.

Still, however, all CPUs work in basically the same way: They execute instructions to fetch bytes from memory, perform arithmetic and logic operations on them, and store them back into memory.

It’s time to explore what else is required to make a real computer.

If you find an error or have any questions, please email us at admin@erenow.org. Thank you!