Chapter Twenty-Seven

Coding

All computers execute machine code, but programming in machine code is like eating with a toothpick. The bites are so small and the process so laborious that dinner takes forever. Likewise, the bytes of machine code perform the tiniest and simplest imaginable computing tasks—loading a number from memory into the processor, adding it to another, storing the result back to memory—so that it’s difficult to imagine how they contribute to an entire meal.

We have at least progressed from that primitive era at the beginning of the previous chapter, when we were using switches on a control panel to enter binary data into memory. In that chapter, we discovered that we could write simple programs that let us use the keyboard and the video display to enter and examine hexadecimal bytes of machine code. This was certainly better, but it’s not the last word in improvements.

As you know, the bytes of machine code are associated with certain short mnemonics, such as MOV, ADD, JMP, and HLT, that let us refer to the machine code in something vaguely resembling English. These mnemonics are often written with operands that further indicate what the machine-code instruction does. For example, the 8080 machine-code byte 46h causes the microprocessor to move into register B the byte stored at the memory address referenced by the 16-bit value in the register pair HL. This is more concisely written as

MOV B,M

where the M stands for “memory.” The total collection of these mnemonics (with some additional features) is a programming language of a type called assembly language. It’s much easier to write programs in assembly language than directly in machine code. The only problem is that the CPU can’t understand assembly language directly!
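To make that correspondence concrete, here’s a tiny sketch in JavaScript (the language you’ll get to experiment with at the end of this chapter) of the kind of table that pairs mnemonics with machine-code bytes. The opcodes for MOV B,M, MVI C, LXI DE, CALL, and RET all appear later in this chapter; JMP and HLT are the standard 8080 values:

// A few 8080 mnemonics paired with their machine-code bytes.
const opcodes = {
  "MOV B,M": 0x46,  // move the byte addressed by HL into register B
  "MVI C":   0x0E,  // move an immediate byte into register C
  "LXI DE":  0x11,  // load an immediate 16-bit value into pair DE
  "CALL":    0xCD,  // call a subroutine at a 16-bit address
  "RET":     0xC9,  // return from a subroutine
  "JMP":     0xC3,  // jump to a 16-bit address
  "HLT":     0x76   // halt the processor
};
console.log(opcodes["MOV B,M"].toString(16));  // displays "46"

An assembler is essentially a program that performs this lookup automatically, as you’ll see shortly.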

In the early days of working with such a primitive computer, you’d probably spend a lot of time writing assembly-language programs on paper. Only when you were satisfied that you had something that might work would you then hand-assemble it, which means that you’d convert the assembly-language statements to machine-code bytes by hand using a chart or other reference material, and then enter them into memory.

What makes hand assembling so hard are all the jumps and calls. To hand-assemble a JMP or CALL instruction, you have to know the exact binary address of the destination, and that is dependent on having all the other machine code instructions in place. It’s much better to have the computer do this conversion for you. But how would this be done?

You might first write a text editor, which is a program that allows you to type lines of text and save them as a file. (Unfortunately, you’d have to hand-assemble this program.) You could then create text files containing assembly-language instructions. You would also need to hand-assemble another program, called an assembler. This program would read a text file containing assembly-language instructions and convert those instructions into machine code, which would be saved in another file. The contents of that file could then be loaded into memory for execution.

If you were running the CP/M operating system on your 8080 computer, much of this work would already be done for you. You’d already have all the tools you need. The text editor is named ED.COM and lets you create and modify text files. (Simple modern-day text editors include Notepad in Windows and TextEdit in macOS.) Let’s suppose you create a text file with the name PROGRAM1.ASM. The ASM file type indicates that this file contains an assembly-language program. The file might look something like this:

        ORG 0100h
        LXI DE,Text
        MVI C,9
        CALL 5
        RET
Text:   DB 'Hello!$'
        END

This file has a couple of statements we haven’t seen before. The first one is an ORG (for Origin) statement. This statement does not correspond to an 8080 instruction. Instead, it indicates that the next statement is to begin at address 0100h, which is where CP/M loads programs into memory.

The next statement is an LXI (Load Extended Immediate) instruction, which loads a 16-bit value into the register pair DE. This is one of several Intel 8080 instructions that my CPU doesn’t implement. In this case, that 16-bit value is given as the label Text. That label is located near the bottom of the program in front of a DB (Data Byte) statement, something else we haven’t seen before. The DB statement can be followed by several bytes separated by commas or (as I do here) by some text in single quotation marks.

The MVI (Move Immediate) statement moves the value 9 into register C. The CALL 5 statement makes a call into the CP/M operating system, which looks at the value in register C and jumps to the appropriate function. That function displays a string of characters beginning at the address given by the DE register pair and stopping when a dollar sign is encountered. (You’ll notice that the text in the last line of the program ends with a dollar sign. The use of a dollar sign to signify the end of a character string is quite odd, but that’s the way CP/M happens to work.) The final RET statement ends the program and returns control to CP/M. (That’s actually one of several ways to end a CP/M program.) The END statement indicates the end of the assembly-language file.

So now you have a text file containing seven lines of text. The next step is to assemble it. CP/M includes a program named ASM.COM, which is the CP/M assembler. You run ASM.COM from the CP/M command line like this:

ASM PROGRAM1.ASM

The ASM program examines the file PROGRAM1.ASM and creates a new file, named PROGRAM1.COM, that contains the machine code corresponding to the assembly-language statements that we wrote. (Actually there’s another step in the process, but it’s not important in this account of what happens.)

The PROGRAM1.COM file contains the following 16 bytes:

11 09 01 0E 09 CD 05 00 C9 48 65 6C 6C 6F 21 24

The first 3 bytes are the LXI instruction, the next 2 are the MVI instruction, the next 3 are the CALL instruction, and the next is the RET instruction. The last 7 bytes are the ASCII characters for the five letters of “Hello,” the exclamation point, and the dollar sign. You can then run the PROGRAM1 program from the CP/M command line:
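You can check those last 7 bytes yourself with a line of JavaScript of the sort you’ll be able to try at the end of this chapter; charCodeAt returns a character’s ASCII code, and toString(16) converts it to hexadecimal:

// Convert each character of the string to its ASCII code in hex.
const bytes = [..."Hello!$"].map(c => c.charCodeAt(0).toString(16));
console.log(bytes.join(" ").toUpperCase());  // displays "48 65 6C 6C 6F 21 24"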

PROGRAM1

The operating system loads that program into memory and runs it. Appearing on the screen will be the greeting

Hello!

An assembler such as ASM.COM reads an assembly-language program (often called a source-code file) and writes out a file containing machine code—an executable file. In the grand scheme of things, assemblers are fairly simple programs because there’s a one-to-one correspondence between the assembly-language mnemonics and machine code. The assembler works by separating each line of text into mnemonics and arguments and then comparing these small words and letters with a list that the assembler maintains of all the possible mnemonics and arguments. This is a process called parsing, and it involves a lot of CMP instructions followed by conditional jumps. These comparisons reveal which machine-code instructions correspond to each statement.
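To suggest how such parsing might work, here’s a minimal JavaScript sketch—just an illustration, not the actual logic of ASM.COM—that splits one line of assembly language into an optional label, a mnemonic, and its operands:

// A toy parser for one line of 8080 assembly language, assuming
// the format "LABEL: MNEMONIC OPERAND,OPERAND" with optional parts.
function parseLine(line) {
  let label = null;
  let text = line.trim();
  const colon = text.indexOf(":");
  if (colon >= 0) {                    // separate an optional label
    label = text.slice(0, colon).trim();
    text = text.slice(colon + 1).trim();
  }
  const space = text.indexOf(" ");
  const mnemonic = space < 0 ? text : text.slice(0, space);
  const operands = space < 0 ? [] :
    text.slice(space + 1).split(",").map(s => s.trim());
  return { label, mnemonic, operands };
}

console.log(parseLine("MOV B,M"));
// { label: null, mnemonic: "MOV", operands: ["B", "M"] }
console.log(parseLine("Text: DB 'Hello!$'"));
// { label: "Text", mnemonic: "DB", operands: ["'Hello!$'"] }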

The string of bytes contained in the PROGRAM1.COM file begins with 11h, which is the LXI instruction. This is followed by the bytes 09h and 01h, which constitute the 16-bit address 0109h. The assembler figures out this address for you: If the LXI instruction itself is located at 0100h (as it is when CP/M loads the program into memory to run), address 0109h is where the text string begins. Generally a programmer using an assembler doesn’t need to worry about the specific addresses associated with different parts of the program.
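How does the assembler figure out that the label Text corresponds to 0109h? Here’s a hedged sketch of the idea in JavaScript: walk through the program once, keep a running address that grows by the length of each instruction, and record that address whenever a label appears. (The byte counts are the real lengths of the PROGRAM1 instructions; the way the program is represented is invented for this illustration.)

// Compute the address of each label by summing instruction
// lengths, starting at the ORG address of 0100h.
const program = [
  { label: null,   bytes: 3 },  // LXI DE,Text
  { label: null,   bytes: 2 },  // MVI C,9
  { label: null,   bytes: 3 },  // CALL 5
  { label: null,   bytes: 1 },  // RET
  { label: "Text", bytes: 7 }   // DB 'Hello!$'
];

const labels = {};
let address = 0x0100;             // the ORG address
for (const line of program) {
  if (line.label) labels[line.label] = address;
  address += line.bytes;
}

console.log(labels.Text.toString(16));  // displays "109"

A real assembler makes two passes over the source code: the first collects label addresses like this, and the second emits machine code with those addresses filled in.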

The first person to write the first assembler had to hand-assemble the program, of course. A person who writes a new (perhaps improved) assembler for the same computer can write it in assembly language and then use the first assembler to assemble it. Once the new assembler is assembled, it can assemble itself.

Every time a new microprocessor is developed, a new assembler is needed. The new assembler, however, can first be written on an existing computer using that computer’s assembler. This is called a cross-assembler. The assembler runs on Computer A but creates code that runs on Computer B.

An assembler eliminates the less creative aspects of assembly-language programming (the hand-assembling part), but assembly language still has two major problems. You’ve probably already surmised that the first problem is that programming in assembly language can be very tedious. You’re working down on the level of the CPU, and you have to worry about every little thing.

The second problem is that assembly language isn’t portable. If you were to write an assembly-language program for the Intel 8080, it would not run on the Motorola 6800. You must rewrite the program in 6800 assembly language. This probably won’t be as difficult as writing the original program because you’ve already solved the major organizational and algorithmic problems. But it’s still a lot of work.

Much of what computers do is mathematical calculation, but the way that math is carried out in assembly language is clumsy and awkward. It would be much preferable to instead express mathematical operations using a time-honored algebraic notation, for example:

Angle = 27.5
Hypotenuse = 125.2
Height = Hypotenuse × Sine(Angle)

If this text were actually part of a computer program, each of the three lines would be known as a statement. In programming, as in algebra, names such as Angle, Hypotenuse, and Height are called variables because they can be set to different values. The equals sign indicates an assignment: The variable Angle is set to the value 27.5, and Hypotenuse is set to 125.2. Sine is a function: Somewhere there’s code that calculates the trigonometric sine of an angle and returns that value.

Keep in mind also that these numbers are not the integers common in assembly language; these are numbers with decimal points and fractional parts. In computing lingo, they are known as floating-point numbers.

If such statements were in a text file, it should be possible to write an assembly-language program that reads the text file and converts the algebraic expressions to machine code to perform the calculation. Well, why not?
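In JavaScript, as it happens, those three statements require hardly any change. One wrinkle (an assumption about the intent of the example): the built-in Math.sin function expects its angle in radians rather than degrees, so the angle must be converted first:

// The three algebraic statements, written in JavaScript.
let angle = 27.5;
let hypotenuse = 125.2;
let height = hypotenuse * Math.sin(angle * Math.PI / 180);
console.log(height);  // displays approximately 57.8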

What you’re on the verge of creating here is known as a high-level programming language. Assembly language is considered a low-level language because it’s very close to the hardware of the computer. Although the term high-level is used to describe any programming language other than assembly language, some languages are higher level than others. If you were the president of a company and you could sit at your computer and type in (or better yet, just prop your feet up on the desk and dictate) “Calculate all the profits and losses for this year, write up an annual report, print off a couple of thousand copies, and send them to all our stockholders,” you would be working with a very high-level language indeed! In the real world, programming languages don’t come anywhere close to that ideal.

Human languages are the result of thousands of years of complex influences, random changes, and adaptations. Even artificial languages such as Esperanto betray their origins in real language. High-level computer languages, however, are more deliberate conceptions. The challenge of inventing a programming language is quite appealing to some people because the language defines how a person conveys instructions to the computer. When I wrote the first edition of this book, I found a 1993 estimate that there had been over 1000 high-level languages invented and implemented since the beginning of the 1950s. At year-end 2021, a website entitled the Online Historical Encyclopedia of Programming Languages (hopl.info) puts the total at 8,945.

Of course, it’s not enough to simply define a high-level language, which involves developing a syntax to express all the things you want to do with the language. You must also write a compiler, which is the program that converts the statements of your high-level language to machine code. Like an assembler, a compiler must read through a source-code file character by character and break it down into short words and symbols and numbers. A compiler, however, is much more complex than an assembler. An assembler is simplified somewhat because of the one-to-one correspondence between assembly-language statements and machine code. A compiler usually must translate a single statement of a high-level language into many machine-code instructions. Compilers aren’t easy to write. Whole books are devoted to their design and construction.

High-level languages have advantages and disadvantages. A primary advantage is that high-level languages are usually easier to learn and to program in than assembly languages. Programs written in high-level languages are often clearer and more concise. High-level languages are often portable—that is, they aren’t dependent on a particular processor, as are assembly languages. They allow programmers to work without knowing about the underlying structure of the machine on which the program will be running. Of course, if you need to run the program on more than one processor, you’ll need compilers that generate machine code for those processors. The actual executable files are still specific to individual CPUs.

On the other hand, it’s very often the case that a good assembly-language programmer can write faster and more efficient code than a compiler can. What this means is that an executable produced from a program written in a high-level language will be larger and slower than a functionally identical program written in assembly language. (In recent years, however, this has become less obvious as microprocessors have become more complex and compilers have also become more sophisticated in optimizing code.)

Although a high-level language generally makes a processor much easier to use, it doesn’t make it any more powerful. Some high-level languages don’t support operations that are common on CPUs, such as bit shifting and bit testing. These tasks might be more difficult using a high-level language.

In the early days of home computers, most application programs were written in assembly language. These days, however, assembly languages are rarely used except for special purposes. As hardware has been added to processors that implements pipelining—the progressive execution of several instruction codes simultaneously—assembly language has become trickier and more difficult. At the same time, compilers have become more sophisticated. The larger storage and memory capacity of today’s computers has also played a role in this trend: Programmers no longer feel the need to create code that runs in a small amount of memory and fits on a small diskette.


Designers of early computers attempted to formulate problems for them in algebraic notation, but the first real working compiler is generally considered to be Arithmetic Language version 0 (or A-0), created for the UNIVAC by Grace Murray Hopper (1906–1992) at Remington-Rand in 1952. Dr. Hopper also coined the term “compiler.” She got an early start with computers when she worked for Howard Aiken on the Mark I in 1944. In her eighties, she was still working in the computer industry doing public relations for Digital Equipment Corporation (DEC).

The oldest high-level language still in use today (although extensively revised over the years) is FORTRAN. Many early computer languages have made-up names that are written in uppercase because they’re acronyms of sorts. FORTRAN is a combination of the first three letters of FORmula and the first four letters of TRANslation. It was developed at IBM for the 704 series of computers in the mid-1950s. For many years, FORTRAN was considered the language of choice for scientists and engineers. It has very extensive floating-point support and even supports complex numbers, which are combinations of real and imaginary numbers.

COBOL—which stands for COmmon Business Oriented Language—is another old programming language that is still in use, primarily in financial institutions. COBOL was created by a committee of representatives from American industries and the US Department of Defense beginning in 1959, but it was influenced by Grace Hopper’s early compilers. In part, COBOL was designed so that managers, while probably not doing the actual coding, could at least read the program code and check that it was doing what it was supposed to be doing. (In real life, however, this rarely occurs.)

An extremely influential programming language that is not in use today (except possibly by hobbyists) is ALGOL. ALGOL stands for ALGOrithmic Language, but ALGOL also shares its name with the second brightest star in the constellation Perseus. Originally designed by an international committee in 1957 and 1958, ALGOL is the direct ancestor of many popular general-purpose languages of the past half century. It pioneered a concept eventually known as structured programming. Even today, sometimes people refer to “ALGOL-like” programming languages.

ALGOL established programming constructs that are now common to nearly all programming languages. These were associated with certain keywords, which are words within the programming language that indicate particular operations. Multiple statements could be combined into blocks, which were executed under certain conditions or with a particular number of iterations.

The if statement executes a statement or block of statements based on a logical condition—for example, if the variable height is less than 55. The for statement executes a statement or block of statements multiple times, usually based on incrementing a variable. An array is a collection of values of the same type—for example, the names of cities. Programs were organized into blocks and functions.
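All three of those constructs survive almost unchanged in today’s languages. Here’s a small JavaScript illustration (the city names and the test are made up for the occasion):

// An array, a for statement, and an if statement.
const cities = ["London", "Paris", "Tokyo"];
for (let i = 0; i < cities.length; i++) {
  if (cities[i].length < 6) {
    console.log(cities[i] + " has a short name.");
  }
}
// displays "Paris has a short name." and "Tokyo has a short name."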

Although versions of FORTRAN, COBOL, and ALGOL were available for home computers, none of them had quite the impact on small machines that BASIC did.

BASIC (Beginner’s All-purpose Symbolic Instruction Code) was developed in 1964 by John Kemeny and Thomas Kurtz, of the Dartmouth Mathematics department, in connection with Dartmouth’s time-sharing system. Most students at Dartmouth weren’t math or engineering majors and hence couldn’t be expected to mess around with the complexity of computers and difficult program syntax. A Dartmouth student sitting at a terminal could create a BASIC program by simply typing BASIC statements preceded by numbers. The numbers indicated the order of the statements in the program. The first BASIC program in the first published BASIC instruction manual was

10 LET X = (7 + 8) / 3
20 PRINT X
30 END

Many subsequent implementations of BASIC have been in the form of interpreters rather than compilers. While a compiler reads a source-code file and creates an executable file of machine code, an interpreter reads source code and executes it directly without creating an executable file. Interpreters are easier to write than compilers, but the execution time of the interpreted program tends to be slower than that of a compiled program. On home computers, BASIC got an early start when buddies Bill Gates (born 1955) and Paul Allen (born 1953) wrote a BASIC interpreter for the Altair 8800 in 1975 and jump-started their company, Microsoft Corporation.
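To suggest the difference, here’s a minimal JavaScript sketch of an interpreter for the three-line BASIC program shown earlier. The statements are executed directly; no machine code is ever produced. (The way the program is represented here is invented purely for this illustration.)

// A toy interpreter for the three-line BASIC program.
const program = [
  { num: 10, op: "LET",   name: "X", value: () => (7 + 8) / 3 },
  { num: 20, op: "PRINT", name: "X" },
  { num: 30, op: "END" }
];

const variables = {};
for (const stmt of program.sort((a, b) => a.num - b.num)) {
  if (stmt.op === "LET") variables[stmt.name] = stmt.value();
  else if (stmt.op === "PRINT") console.log(variables[stmt.name]);
  else if (stmt.op === "END") break;
}
// displays 5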

The Pascal programming language inherited much of its structure from ALGOL but included features from COBOL. Pascal was designed in the late 1960s by Swiss computer science professor Niklaus Wirth (born 1934). It was quite popular with early IBM PC programmers, but in a very specific form: the product Turbo Pascal, introduced by Borland International in 1983 for the bargain price of $49.95. Turbo Pascal was written by Danish student Anders Hejlsberg (born 1960) and came complete with an integrated development environment (or IDE). The text editor and the compiler were combined in a single program that facilitated very fast programming. Integrated development environments had been popular on large mainframe computers, but Turbo Pascal heralded their arrival on small machines.

Pascal was also a major influence on Ada, a language developed for use by the United States Department of Defense. The language was named after Augusta Ada Byron, who appeared in Chapter 15 as the chronicler of Charles Babbage’s Analytical Engine.

And then there’s C, a much-beloved programming language created between 1969 and 1973 largely by Dennis M. Ritchie at Bell Telephone Laboratories. People often ask why the language is called C. The simple answer is that it was derived from an early language called B, which was a simplified version of BCPL (Basic CPL), which was derived from CPL (Combined Programming Language).

Most programming languages seek to eliminate remnants of assembly language such as memory addresses. But C does not. C includes a feature called the pointer, which is basically a memory address. Pointers were very convenient for programmers who knew how to use them but dangerous for nearly everyone else. Because they make it possible to write over important areas of memory, pointers were a common source of bugs. Programmer Allen I. Holub wrote a book about C entitled Enough Rope to Shoot Yourself in the Foot.

C became the grandparent for a series of languages that were safer than C and added the facility to work with objects, which are programming entities that combine code and data in a very structured way. The most famous of these languages are C++, created by Danish computer scientist Bjarne Stroustrup (born 1950) in 1985; Java, designed by James Gosling (born 1955) at Sun Microsystems in 1995; and C#, originally designed by Anders Hejlsberg at Microsoft in 2000. At the time of this writing, one of the most used programming languages is another C-influenced language called Python, originally designed by Dutch programmer Guido van Rossum (born 1956) in 1991. But if you’re reading this book in the 2030s or 2040s, you might be familiar with languages that haven’t even been invented yet!

Different high-level programming languages compel the programmer to think in different ways. For example, some newer programming languages focus on manipulating functions rather than variables. These are referred to as functional programming languages, and for a programmer accustomed to working with conventional procedural languages, they can initially seem quite strange. Yet they offer alternative solutions that can inspire programmers to entirely reorient their way of approaching problems. Regardless of the language, however, the CPU still executes the same old machine code.
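JavaScript itself has absorbed some of this functional style. Here’s the same small computation—a sum of squares, chosen just for illustration—written both ways:

const numbers = [1, 2, 3, 4];

// Procedural style: a loop and a variable that changes.
let sum = 0;
for (const n of numbers) {
  sum += n * n;
}

// Functional style: functions passed to functions, no loop at all.
const sum2 = numbers.map(n => n * n).reduce((a, b) => a + b, 0);

console.log(sum, sum2);  // both display 30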

Yet there are ways in which software can smooth over the differences among various CPUs and their native machine codes. Software can emulate various CPUs, allowing people to run old software and ancient computer games on modern computers. (This is nothing new: When Bill Gates and Paul Allen decided to write a BASIC interpreter for the Altair 8800, they tested it on an Intel 8080 emulator program that they wrote on a DEC PDP-10 mainframe computer at Harvard University.) Java and C# can be compiled into machine-code-like intermediate code that is then converted into machine code when the program is executed. A project called LLVM is intended to provide a virtual link between any high-level programming language and any set of instructions implemented by a CPU.
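At its heart, an emulator is just the processor’s fetch-decode-execute cycle written as a software loop. Here’s a hedged JavaScript sketch that handles only two 8080 instructions—INR A (opcode 3Ch, which increments the accumulator) and HLT (opcode 76h)—and is nothing close to a complete emulator:

// A tiny fetch-decode-execute loop for two 8080 instructions.
function emulate(memory) {
  let a = 0;                      // the accumulator
  let pc = 0;                     // the program counter
  while (true) {
    const opcode = memory[pc++];  // fetch
    switch (opcode) {             // decode and execute
      case 0x3C: a = (a + 1) & 0xFF; break;  // INR A
      case 0x76: return a;                   // HLT
      default: throw new Error("Opcode not implemented");
    }
  }
}

console.log(emulate([0x3C, 0x3C, 0x3C, 0x76]));  // displays 3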

This is the magic of software. With sufficient memory and speed, any digital computer can do anything that any other digital computer can do. This is the implication of Alan Turing’s work on computability in the 1930s.

Yet what Turing also demonstrated is that there are certain algorithmic problems that will forever be out of reach of the digital computer, and one of these problems has startling implications: You can’t write a computer program that determines if another computer program is working correctly! This means that we can never be assured that our programs are working the way they should.

This is a sobering thought, and it’s why extensive testing and debugging are so important a part of the process of developing software.

One of the most successful C-influenced languages is JavaScript, originally designed by Brendan Eich (born 1961) at Netscape and first appearing in 1995. JavaScript is the language that webpages use to provide interactive capabilities that go beyond the simple presentation of text and bitmaps managed by HTML, the Hypertext Markup Language. As of this writing, almost 98% of the top 10 million websites use at least some JavaScript.

All web browsers in common use today understand JavaScript, which means that you can begin writing JavaScript programs on a desktop or laptop computer without downloading or installing any additional programming tools.

So… would you like to experiment with some JavaScript yourself?

All you need do is create an HTML file that contains some JavaScript using the Windows Notepad or macOS TextEdit program. You then save the file and load it into your favorite web browser, such as Edge, Chrome, or Safari.

On Windows, run the Notepad program. (You might need to find it using the Search facility on the Start menu.) It’s ready for you to type in some text.

On macOS, run the TextEdit program. (You might need to locate it using Spotlight Search.) On the first screen that comes up, click the New Document button. TextEdit is designed to create a rich-text file that contains text formatting information. You don’t want that. You want a plain-text file, so in the Format menu, select Make Plain Text. Also, in the Edit menu’s Spelling and Grammar section, deselect the options to check and correct your spelling.
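Once you have a plain-text editor ready, a first file might look something like this minimal sketch; the title and the message are entirely up to you:

<!DOCTYPE html>
<html>
<head>
    <title>My JavaScript Experiment</title>
</head>
<body>
    <script>
        // Everything between the script tags is JavaScript,
        // and it runs when the browser loads the page.
        alert("Hello!");
    </script>
</body>
</html>

Save the file with a name such as Hello.html, and then open it in your web browser. A little box containing the greeting should appear.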
