Chapter Twenty-Five
The central processing unit (CPU) is certainly the most important component of a computer, but it must be supplemented with other hardware. As you’ve seen, a computer also requires random access memory (RAM) that contains both machine-code instructions for the processor to execute and data for these instructions to access. As you’ll also recall, RAM is volatile—it loses its contents when the power is turned off. So another useful component of a computer is a long-term mass storage device that can retain code and data in the absence of power.
The computer must also include some way for those instructions to get into RAM, and some way for the results of the program to be examined. Modern computers also have microphones, cameras, and speakers, as well as radio transmitters and receivers that connect to Wi-Fi, Bluetooth devices, and the satellites that make up the Global Positioning System (GPS).
These are known as input devices and output devices, commonly referred to collectively by the abbreviation I/O and more generally as peripherals.
The most obvious peripheral is likely the video display because that’s what you’re often staring at regardless of whether you use a desktop computer, a laptop, a tablet, or a cellphone. Perhaps you’re staring at a video display while reading this very book!
All video displays in common use today create an image composed of rows and columns of pixels, which are little colored dots that you can see if you examine a display with a magnifying glass. The number of columns and rows of pixels is often referred to as the display resolution.
For example, the standard high-definition television (HDTV) resolution is denoted as 1920 × 1080, which means 1,920 pixels horizontally and 1,080 pixels vertically, for a total of about 2 million pixels, each of which can be a different color. This has almost become the minimum resolution of computer displays.
These pixels are not illuminated all at once. The contents of the display are stored in a special block of memory, and the individual pixels of the display are refreshed sequentially, starting left to right with the row at the top, and continuing down the display. To prevent flickering, this process occurs very quickly, and the entire display is generally refreshed at least 60 times per second. The circuitry that controls this process is known as a video display adapter.
How much memory is required to store the contents of a 1920 × 1080 display?
Each of the 2 million pixels is a specific color that is a combination of red, green, and blue primary colors, also known as an RGB color. (If you’re an artist, you might be familiar with a different set of primary colors, but these are the three used in video displays.) Varying the intensity of these individual components creates all the colors possible on the video display. The intensities are generally controlled by 1 byte for each primary, which is set to 00h for no color and FFh for maximum intensity. This scheme allows a video display to be capable of 256 different levels of red, 256 levels of green, and 256 levels of blue, for a total of 256 × 256 × 256, or 16,777,216 different colors. (Under the philosophy that everything about computers can be improved, some companies are forging ahead to increase color range and resolution. Doing so requires more than 8 bits per primary.)
If you do any work with HTML in designing webpages, you might know that colors can be specified with six-digit hexadecimal values preceded by a pound sign. Here are the 16 standard colors established by the HTML 4.01 specification from 1999:

  Black    #000000      Green    #008000
  Silver   #C0C0C0      Lime     #00FF00
  Gray     #808080      Olive    #808000
  White    #FFFFFF      Yellow   #FFFF00
  Maroon   #800000      Navy     #000080
  Red      #FF0000      Blue     #0000FF
  Purple   #800080      Teal     #008080
  Fuchsia  #FF00FF      Aqua     #00FFFF
Other colors are defined with different values. Following the pound sign are three pairs of hexadecimal digits: The first is the level of red from 00h to FFh, the second is the level of green, and the third is the level of blue. Black results when all three components are 00h, and white results when all three components are FFh. Shades of gray are possible when all three components are the same.
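To make the relationship between the hexadecimal digits and the three primaries concrete, here is a minimal C sketch (the particular color value is just an arbitrary example) that pulls the red, green, and blue components out of a 24-bit color:

#include <stdio.h>

int main(void)
{
    // An arbitrary example color in #RRGGBB form: a medium violet
    unsigned int color = 0x8040C0;

    // Each pair of hexadecimal digits is one byte of the color
    unsigned int red   = (color >> 16) & 0xFF;   // 80h, or 128
    unsigned int green = (color >> 8)  & 0xFF;   // 40h, or 64
    unsigned int blue  =  color        & 0xFF;   // C0h, or 192

    printf("red = %u, green = %u, blue = %u\n", red, green, blue);
    return 0;
}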
For a 1920 × 1080 display, each of the 2 million pixels requires 3 bytes for the red, green, and blue components, for a total of 6 million bytes, or 6 megabytes.
In previous chapters, I treated the RAM that a CPU accesses as a monolithic block of memory. In reality, the memory devoted to the video display often occupies part of the same memory space as the code and data that the CPU accesses. This configuration allows the computer to update the video display very quickly simply by writing bytes into RAM, which makes high-speed graphical animation possible.
The 8-bit CPU that I built over the past several chapters has a 16-bit memory address that is capable of addressing 64 kilobytes of memory. Obviously you cannot fit 6 megabytes of video memory into 64 kilobytes of memory! (Actually, you might rig up something in which multiple chunks of memory are swapped in and out of the CPU’s memory space, but it would certainly slow things down.)
This is why high-resolution video displays became feasible only when memory became cheap, and when more powerful CPUs could access this memory with more agility. A 32-bit CPU can access memory in 32-bit data chunks, and for that reason, video display memory is often arranged with 4 bytes per pixel rather than just the 3 required for the red, green, and blue components. This means that video memory for a 1920 × 1080 display requires 8 megabytes of memory rather than just 6 megabytes.
This video memory is generally arranged in the same order in which the display is refreshed: the first row comes first, starting with the leftmost pixel, with 3 bytes for the red, green, and blue components and an unused byte for each pixel. Drawing anything on the screen—be it text or graphics—requires a program to determine which pixels to set in the graphics memory.
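As a rough illustration of how a program might set a single pixel in this kind of memory layout, here is a short C sketch. The framebuffer array, the 4-bytes-per-pixel layout, and the byte order shown are assumptions for the example; real video hardware differs in the details:

#include <stdint.h>

#define WIDTH  1920
#define HEIGHT 1080

// Hypothetical framebuffer: 4 bytes per pixel, rows stored top to bottom,
// pixels within each row stored left to right. The total size is
// 1920 x 1080 x 4 bytes, or about 8 megabytes.
static uint8_t framebuffer[WIDTH * HEIGHT * 4];

void set_pixel(int x, int y, uint8_t red, uint8_t green, uint8_t blue)
{
    // Offset of the pixel: full rows above it, plus pixels to its left
    int offset = (y * WIDTH + x) * 4;

    framebuffer[offset + 0] = red;
    framebuffer[offset + 1] = green;
    framebuffer[offset + 2] = blue;
    framebuffer[offset + 3] = 0;     // the unused byte
}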
Computer graphics often involves mathematical tools associated with analytic geometry. The entire display—or a smaller rectangular area of the display—can be treated as a simple coordinate system in which every pixel is a point that is referenced with horizontal and vertical (x, y) coordinates. For example, the pixel at position (10, 5) is ten pixels from the left and five pixels down. Drawing a diagonal line from that point to the position (15, 10) involves coloring the pixels at points (10, 5), (11, 6), (12, 7), (13, 8), (14, 9), and (15, 10). Other types of lines and curves are more complex, of course, but there are plenty of software tools to help out.
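Using the hypothetical set_pixel routine sketched above, that diagonal line from (10, 5) to (15, 10) could be drawn with a simple loop. General-purpose line drawing uses more careful methods (Bresenham’s line algorithm is the classic one), but for a 45-degree line the loop is trivial:

// Draw the six pixels from (10, 5) to (15, 10) in white
void draw_diagonal(void)
{
    for (int i = 0; i <= 5; i++)
        set_pixel(10 + i, 5 + i, 0xFF, 0xFF, 0xFF);
}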
Text is a subset of graphics. Each character of a particular font is defined by a collection of straight lines and curves with additional information (called “hints”) that allow text to be rendered for maximum readability.
Three-dimensional graphics get much more complex, involving various types of shading to indicate the effect of light and shadows. Nowadays, programs are often assisted by a graphics processing unit (GPU) that does some of the heavy mathematics often required for 3D graphics.
When personal computers first became available, high-resolution displays were just not feasible. The first graphics display available for the IBM PC was called the Color Graphics Adapter (CGA), which was capable of three graphics formats (or modes): 160 × 100 pixels with 16 colors (but using 1 byte per pixel), 320 × 200 pixels with four colors (2 bits per pixel), and 640 × 200 pixels with two colors (1 bit per pixel). Regardless of the graphics mode, only 16,000 bytes of memory were required. For example, 320 pixels across times 200 pixels down times ¼ byte per pixel equals 16,000.
Some early computer displays were not capable of displaying graphics at all and were limited to text. This is another way to reduce memory requirements, and this was the rationale behind the Monochrome Display Adapter (MDA), the other display available with early IBM PCs. The MDA was capable only of displaying 25 lines of 80-character text in one color, which was green on a black background. Each character was specified by an 8-bit ASCII code and was accompanied by an “attribute” byte that could be used for brightness, reverse video, underlining, or blinking. The number of bytes required for storing the contents of the display was therefore 25 × 80 × 2, or 4,000 bytes. The video adapter contained circuitry that used read-only memory to convert each ASCII character to rows and columns of pixels.
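Here is a small C sketch of how a program might place a character into this kind of text-mode memory. The buffer and the helper function are hypothetical, but the arithmetic follows directly from the 25 × 80 layout with 2 bytes per character:

#include <stdint.h>

#define TEXT_ROWS    25
#define TEXT_COLUMNS 80

// Hypothetical text-mode buffer: one character byte and one attribute
// byte per position, or 25 x 80 x 2 = 4,000 bytes in all
static uint8_t text_buffer[TEXT_ROWS * TEXT_COLUMNS * 2];

void put_character(int row, int column, char ch, uint8_t attribute)
{
    // Each character cell occupies 2 bytes: the ASCII code and the attribute
    int offset = (row * TEXT_COLUMNS + column) * 2;

    text_buffer[offset]     = (uint8_t)ch;
    text_buffer[offset + 1] = attribute;   // brightness, reverse video, and so on
}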
Just as a CPU contains internal busses to move data between the CPU components, the CPU itself is often connected to external busses that move data between the CPU, memory, and peripherals.

Memory for the video display occupies the regular memory space of the CPU, and other peripherals might do so also. This is called memory-mapped I/O. But a CPU might instead define a separate bus for accessing peripherals, and it might include special facilities for working with these input/output devices.
In the previous several chapters, I’ve been building a CPU based on the Intel 8080 microprocessor. Among the 244 instructions implemented by the 8080 are two instructions named IN and OUT:

  Opcode   Instruction
  DBh      IN port
  D3h      OUT port
Both instructions are followed by an 8-bit port number, which is similar to a memory address but is only 8 bits wide and intended specifically for I/O devices. The IN instruction reads from that port and saves the result in the accumulator. The OUT instruction writes the contents of the accumulator to that port. A special signal from the 8080 indicates whether it is accessing RAM (the normal case) or accessing an I/O port.
For example, consider the keyboard on a desktop or laptop computer. Each key on the keyboard is a simple switch that is closed when the key is pressed. Each key is identified by a unique code. This keyboard might be set up to be accessed as port number 25h. A program could execute the instruction:
IN 25h
The accumulator would then contain a code indicating what key has been pressed.
It’s tempting to assume that this code is the ASCII code for the key. But it’s neither practical nor desirable to design hardware that figures out the ASCII code. For example, the A key on the keyboard could correspond to the ASCII code 41h or 61h depending on whether a user also pressed the Shift key, which is the key that determines whether a typed letter is lowercase or uppercase. Also, computer keyboards have many keys (such as function keys and arrow keys) that don’t correspond to ASCII characters at all. A short computer program can figure out what ASCII code (if any) corresponds to a particular key being pressed on the keyboard.
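The conversion itself can be little more than a table lookup. Here is a much-simplified C sketch; the key codes and the tables are invented for the example, since real keyboards define their own codes:

#include <stdbool.h>

// Invented key codes for a few keys; real keyboards use their own codes
#define KEY_A     0x1E
#define KEY_B     0x30
#define KEY_SHIFT 0x2A

// Tables indexed by key code, giving the ASCII character (0 means none)
static const char unshifted[256] = { [KEY_A] = 'a', [KEY_B] = 'b' };
static const char shifted[256]   = { [KEY_A] = 'A', [KEY_B] = 'B' };

static bool shift_is_down = false;   // updated when Shift is pressed or released

// Returns the ASCII code for a key press, or 0 for keys with no ASCII meaning
char key_code_to_ascii(unsigned char key_code)
{
    return shift_is_down ? shifted[key_code] : unshifted[key_code];
}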
But how would the program know when a key has been pressed on the keyboard? One approach is for the program to check the keyboard very frequently. This approach is called polling. But a better approach is for the keyboard to somehow inform the CPU when a key has been pressed. In the general case, an I/O device can inform a CPU of such an event by issuing an interrupt, which is just a special signal going to the CPU.
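To see why polling is wasteful, here is roughly what a polling loop looks like in C. The read_port function and the status port number are hypothetical stand-ins for whatever the hardware actually provides; on the 8080, read_port would boil down to the IN instruction shown earlier:

// Hypothetical helper that performs an IN instruction on the given port
extern unsigned char read_port(unsigned char port);

#define KEYBOARD_STATUS_PORT 0x24   // hypothetical: nonzero when a key is waiting
#define KEYBOARD_DATA_PORT   0x25   // the port used in the example above

unsigned char wait_for_key(void)
{
    // Polling: ask the keyboard over and over until it has something for us.
    // The CPU can do nothing else while it spins in this loop.
    while (read_port(KEYBOARD_STATUS_PORT) == 0)
        ;   // keep asking

    return read_port(KEYBOARD_DATA_PORT);
}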
To assist with interrupts, the 8080 CPU implemented eight instructions called restart instructions:

  Opcode   Instruction   Destination
  C7h      RST 0         0000h
  CFh      RST 1         0008h
  D7h      RST 2         0010h
  DFh      RST 3         0018h
  E7h      RST 4         0020h
  EFh      RST 5         0028h
  F7h      RST 6         0030h
  FFh      RST 7         0038h

Each of these instructions causes the CPU to save the current program counter on the stack and then jump to the memory address 0000h, 0008h, 0010h, and so forth. A RST 0 is essentially the same as a CPU reset, but the addresses targeted by the other restart instructions typically contain jump or call instructions leading to code elsewhere in memory.
Here’s how this works: The 8080 CPU included an external interrupt signal. When a peripheral device (such as a physical keyboard) sets this interrupt signal, it also puts the byte for one of these restart instructions on the data bus. The CPU executes that instruction, saving the program counter and jumping to the corresponding low-memory address. That memory location contains code to handle that particular I/O device.
This is called interrupt-driven I/O. The CPU doesn’t have to bother polling the I/O devices. It can be doing other tasks until the I/O device uses the interrupt signal to inform the CPU that something new has happened. This is how a keyboard can inform the CPU that a key has been pressed.
It’s also desirable to use interrupts for the mouse on a desktop or laptop computer, a touchpad on a laptop, or a touchscreen on a tablet or cellphone.
A mouse seems to be connected very directly to the video display. After all, you move the mouse up, down, left, or right on your desk, and the mouse pointer moves accordingly on the screen. But that connection is really just an illusion. The mouse merely delivers electrical pulses indicating the direction it’s moved. It is the responsibility of software to redraw the mouse pointer in different locations. Besides movement, the mouse also signals the computer when a mouse button has been pressed and when it’s released, or when a scroll wheel is turned.
A touchscreen is usually a layer on top of a video display that can detect a change in electrical capacitance when touched by a finger. The touchscreen can indicate the location of one or more fingers using the same (x, y) coordinates that a program uses to display graphics to the screen. Programs can be informed when a finger touches the screen, when it is removed from the screen, and by how it moves when it’s touching the screen. This information can then assist the program in performing various tasks, such as scrolling the screen or dragging a graphical object across the screen. A program can also interpret the movement of two-finger gestures such as pinch and zoom.
Everything in the computer is digital. Everything is a number. Yet the real world is often analog. Our perceptions of light and sound seem continuous rather than being of discrete numeric values.
To assist in converting real-world analog data into numbers and back again, two devices have been invented:
· The analog-to-digital converter (ADC)
· The digital-to-analog converter (DAC)
The input of an ADC is a voltage that can vary continuously between two values, and the output is a binary number representing that voltage. ADCs commonly have 8-bit or 16-bit outputs. For example, the output of an 8-bit ADC might be 00h for an input voltage of zero volts, 80h for 2.5 volts, and FFh for 5 volts.
The DAC goes the other way. The input is a binary number, perhaps 8 bits or 16 bits in width, and the output is a voltage corresponding to that number.
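The correspondence between voltages and numbers is a simple proportion. This C sketch shows the idea for a hypothetical 8-bit converter with a 0-to-5-volt range, matching the example above:

// Hypothetical 8-bit converter with a 0-to-5-volt range

// ADC direction: voltage in, number from 00h to FFh out
unsigned char adc(double volts)
{
    return (unsigned char)(volts / 5.0 * 255.0 + 0.5);   // round to nearest
}

// DAC direction: number in, voltage out
double dac(unsigned char value)
{
    return value / 255.0 * 5.0;
}

With these definitions, adc(0.0) is 00h, adc(2.5) is 80h, and adc(5.0) is FFh, just as in the example.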
DACs are used in video displays to convert the digital values of the pixels into voltages that govern the intensity of light emitted from the red, green, and blue components of each pixel.
Digital cameras use an array of active-pixel sensors (APS) that respond to light by emitting a voltage that is then converted to numbers with an ADC. The result is an object called a bitmap, which is a rectangular array of pixels, each of which is a particular color. Just as with the memory in a video display, the pixels of a bitmap are stored sequentially, row by row starting with the top row and ending with the bottom, and within each row from left to right.
Bitmaps can be huge. The camera on my cellphone creates images that are 4032 pixels wide and 3024 pixels high. But not all that data is necessary to reproduce the image. For that reason, engineers and mathematicians have devised several techniques to reduce the number of bytes required to store bitmaps. This is called compression.
One simple form of bitmap compression is run-length encoding, or RLE. For example, if there are ten pixels of the same color in a row, the bitmap need only store that pixel and the number 10. But this works well only for images that contain large swaths of the same color.
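Here is a minimal C sketch of run-length encoding for one row of pixels, assuming for simplicity that each pixel is a single byte, that no run is longer than 255 pixels, and that the output buffer is at least twice the size of the input:

#include <stddef.h>

// Encode a row of single-byte pixels as (count, value) pairs.
// Returns the number of bytes written to the output.
size_t rle_encode(const unsigned char *pixels, size_t count, unsigned char *output)
{
    size_t out = 0;
    size_t i = 0;

    while (i < count)
    {
        unsigned char value = pixels[i];
        unsigned char run = 1;

        // Count how many consecutive pixels have the same value (up to 255)
        while (i + run < count && pixels[i + run] == value && run < 255)
            run++;

        output[out++] = run;     // ten identical pixels become the pair (10, value)
        output[out++] = value;

        i += run;
    }
    return out;
}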
A more sophisticated file-compression scheme that’s still in common use is the Graphics Interchange Format, or GIF, pronounced jif like the brand of peanut butter (though not everyone agrees). It was developed in 1987 by the former online service CompuServe. GIF files use a compression technique called LZW (standing for its creators, Lempel, Ziv, and Welch), which detects patterns of differently valued pixels rather than just consecutive strings of same-value pixels. GIF files also incorporate a rudimentary animation facility using multiple images.
More sophisticated than GIF is Portable Network Graphics (PNG), dating from 1996. PNG effectively converts adjacent pixel values to differences between the values, which are generally smaller numbers that can be more efficiently compressed.
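The idea of storing differences rather than values is easy to see in miniature. This C sketch (a simplification of what PNG calls its Sub filter) replaces each byte in a row with its difference from the byte to its left; runs of similar values turn into runs of small numbers, which compress better:

// Replace each byte with its difference from the previous byte.
// The differences wrap around modulo 256, so the process is exactly reversible.
void delta_encode(unsigned char *row, int length)
{
    for (int i = length - 1; i > 0; i--)
        row[i] = (unsigned char)(row[i] - row[i - 1]);
}

// Reverse the process to recover the original values
void delta_decode(unsigned char *row, int length)
{
    for (int i = 1; i < length; i++)
        row[i] = (unsigned char)(row[i] + row[i - 1]);
}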
A GIF or PNG file is not necessarily smaller than the original uncompressed bitmap! If some bitmaps are reduced in size by a particular compression process, others must be increased in size. This can happen for images with a great many colors or much detail.
In that case, other techniques become useful. Introduced in 1992, the JPEG (pronounced jay-peg) file format has become enormously popular for bitmap files of real-world images. Today’s cellphone cameras create JPEG files ready to be shared or transferred to another computer.
JPEG stands for the Joint Photographic Experts Group, and unlike previous compression techniques, it is based on psychovisual research to exploit the way that the human eye perceives images. In particular, JPEG compression can discard sharp transitions in colors, which reduces the amount of data necessary to reproduce the image. Quite sophisticated mathematics are employed!
The disadvantage of JPEG is that it’s not reversible: You can’t go back to exactly the original image after it’s compressed. In contrast, GIF and PNG are reversible; nothing is lost in the compression process. For this reason, GIF and PNG are referred to as lossless compression techniques, while JPEG is categorized as a form of lossy compression. Information is lost, and in extreme cases, this can result in visual distortions.
Computers often have a microphone that detects sounds from the real world, and a speaker that creates sounds.
Sound is vibration. Human vocal cords vibrate, a tuba vibrates, a tree falling in a forest vibrates, and these objects cause air molecules to move. The air alternately pushes and pulls, compresses and thins, back and forth some hundreds or thousands of times a second. The air in turn vibrates our eardrums, and we sense sound.
A microphone responds to these vibrations by producing an electrical current whose voltage varies analogously to the sound waves. Also analogous to these waves of sound are the little hills and valleys in the surface of the tin foil cylinder that Thomas Edison used to record and play back sound in the first phonograph in 1877, and the hills and valleys in the grooves of vinyl records still beloved by modern audiophiles and enthusiasts of retro technologies.
But for computers, this voltage needs to be digitized—that is, turned into numbers—and that’s another job for the ADC.
Digitized sound made a big consumer splash in 1983 with the compact disc (CD), which became the biggest consumer electronics success story ever. The CD was developed by Philips and Sony to store 74 minutes of digitized sound on one side of a disc 12 centimeters in diameter. The length of 74 minutes was chosen so that Beethoven’s Ninth Symphony could fit on one CD. (Or so the story goes.)
Sound is encoded on a CD using a technique called pulse code modulation, or PCM. Despite the fancy name, PCM is conceptually a fairly simple process: The voltage representing a sound wave is converted to digital values at a constant rate and stored. During playback, the numbers are converted to an electrical current again using a DAC.
The voltage of the sound wave is converted to numbers at a constant rate known as the sampling rate. In 1928, Harry Nyquist of Bell Telephone Laboratories showed that a sampling rate must be at least twice the maximum frequency that needs to be recorded and played back. It’s commonly assumed that humans hear sounds ranging from 20 Hz to 20,000 Hz. The sampling frequency used for CDs is a bit more than double that maximum, specifically 44,100 samples per second.
The number of bits per sample determines the dynamic range of the CD, which is the difference between the loudest and the softest sound that can be recorded and played back. This is somewhat complicated: As the electrical current varies back and forth as an analog of the sound waves, the peaks that it hits represent the waveform’s amplitude. What we perceive as the intensity of the sound is proportional to the square of the amplitude. A bel (which is three-quarters of Alexander Graham Bell’s last name) is a tenfold increase in intensity; a decibel is one-tenth of a bel. One decibel represents approximately the smallest increase in loudness that a person can perceive.
It turns out that the use of 16 bits per sample allows a dynamic range of 96 decibels, which is approximately the difference between the threshold of hearing (below which we can’t hear anything) and the threshold of pain, louder than which might prompt us to hold our hands over our ears. The compact disc uses 16 bits per sample.
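Here’s a brief C sketch of the sampling side of PCM: it computes one second of a 440 Hz sine wave (the musical note A) as 16-bit samples at the CD rate. A real recording would of course digitize a microphone voltage with an ADC rather than compute the waveform:

#include <math.h>
#include <stdint.h>

#define SAMPLE_RATE 44100
#define FREQUENCY   440.0              // the musical note A
#define PI          3.14159265358979

static int16_t samples[SAMPLE_RATE];   // one second of monophonic sound

void generate_tone(void)
{
    for (int i = 0; i < SAMPLE_RATE; i++)
    {
        // Time of this sample in seconds
        double t = (double)i / SAMPLE_RATE;

        // Scale the sine wave (which ranges from -1 to 1) to the 16-bit range
        samples[i] = (int16_t)(32767.0 * sin(2.0 * PI * FREQUENCY * t));
    }
}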
For each second of sound, a compact disc contains 44,100 samples of 2 bytes each. But you probably want stereo as well, so double that for a total of 176,400 bytes per second. That’s 10,584,000 bytes per minute of sound. (Now you know why digital recording of sound wasn’t common before the 1980s.) The full 74 minutes of stereo sound on the CD requires 783,216,000 bytes. Later CDs increased that capacity somewhat.
Although CDs have faded in importance in recent years, the concepts of digital sound remain the same. Because you don’t always need CD quality when recording and playing back sound on home computers, lower sampling rates are often available, including 22,050 Hz, 11,025 Hz, and 8,000 Hz. You can record using a smaller sample size of 8 bits, and you can cut the data in half by recording monophonically.
Just as with bitmaps, it’s often useful to compress audio files to reduce storage and decrease the amount of time required to transfer files between computers. One popular compression technique for audio is MP3, which originated as part of a compression technique for movies called MPEG (standing for Moving Picture Experts Group). MP3 is lossy compression but based on psychoacoustic analysis to reduce data that does not appreciably contribute to the perception of the music.
Bitmaps compressed with GIF, PNG, or JPEG, and audio compressed with MP3, can occupy memory, particularly while a program is working with the information, but very often they’re stored as files on some kind of storage device.
As you’ll recall, random access memory—whether constructed from relays, tubes, or transistors—loses its contents when the electrical power is shut off. For this reason, a complete computer also needs something for long-term storage. One time-honored approach involves punching holes in paper or cardboard, such as in IBM punch cards. In the early days of small computers, rolls of paper tape were punched with holes to save programs and data and to later reload them into memory. A step up from that was using audio cassette tapes, which were also popular in the 1980s for recording and playing music. These were just a smaller version of the magnetic tapes used by large computers for mass storage of data.
Tape, however, isn’t an ideal medium for storage and retrieval because it’s not possible to move quickly to an arbitrary spot on the tape. It can take a lot of time to fast-forward or rewind.
A medium geometrically more conducive to fast access is the disk. The disk itself is spun around its center, while one or more heads attached to arms can be moved from the outside of the disk to the inside. Any area on the disk can be accessed very quickly. Bits are recorded by magnetizing small areas of the disk. The first disk drives used for computers were invented at IBM in 1956. The Random Access Method of Accounting and Control (RAMAC) contained 50 metal disks 2 feet in diameter and could store 5 megabytes of data.
Popular on personal computers were smaller single sheets of coated plastic inside a protective casing made of cardboard or plastic. These were called floppy disks or diskettes and started out as 8 inches in diameter, then 5.25 inches, and then 3.5 inches. Floppy disks could be removed from the disk drive, allowing them to be used for transferring data from one computer to another. Diskettes were also an important distribution medium of commercial software. Diskettes have all but disappeared except for a little drawing of a 3.5-inch diskette, which survives as the Save icon in many computer applications.
A hard disk still found inside some personal computers usually contains multiple metal disks permanently built into the drive. Hard disks are generally faster than floppy disks and can store more data, but the disks themselves can’t easily be removed.
These days, storage is more often in the form of a solid-state drive (SSD) built either inside the computer (or tablet or cellphone) or as flash memory in a portable thumb drive.
Mass-storage devices must accommodate files of various sizes that might originate from a variety of sources on the computer. To facilitate this, the mass-storage device is divided into areas of a fixed size, called sectors. Floppy disks and hard drives often had a sector size of 512 bytes. SSDs often have sector sizes of 512 or 4,096 bytes.
Every file is stored in one or more sectors. If the sector size is 512 bytes and the file is less than 512 bytes, storing the file requires only one sector, but any remaining space in that sector can’t be used for anything else. A file that is 513 bytes requires two sectors, and a file that is a megabyte in size requires 2,048 sectors.
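The number of sectors a file occupies is just a division that rounds up to the next whole sector. A quick C sketch, assuming 512-byte sectors:

#define SECTOR_SIZE 512

// Number of sectors needed to store a file, rounding up to a whole sector
unsigned long sectors_needed(unsigned long file_size)
{
    return (file_size + SECTOR_SIZE - 1) / SECTOR_SIZE;
}

// Examples: sectors_needed(100)     is 1
//           sectors_needed(513)     is 2
//           sectors_needed(1048576) is 2,048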
The sectors associated with a particular file don’t have to be consecutive. They can be spread out all over the drive. As files are deleted, sectors are freed up for other files. As new files are created, available sectors are used, but the sectors are not necessarily grouped together.
Keeping track of all this—including the whole process of storing files and retrieving them—is the province of an extremely important piece of software known as the operating system.