Chapter 4
IN THIS CHAPTER
Using standard hardware
Using specialized hardware
Improving your hardware
Interacting with the environment
In Chapter 1, you discover that one of the reasons for the failure of early AI efforts was a lack of suitable hardware. The hardware just couldn’t perform tasks quickly enough for even mundane needs, much less something as complex as simulating human thought. This issue is described at some length in the movie The Imitation Game (see the description at Amazon.com), in which Alan Turing finally cracked the Enigma code by cleverly looking for a particular phrase, “Heil Hitler,” in each message. Without that particular flaw in the way that operators used the Enigma, the computer equipment that Turing used would never have worked fast enough to solve the problem (and the movie had no small amount of griping about the matter). If anything, the historical account — what little of it is fully declassified — shows that Turing’s problems were more profound than the movie expressed (see “Cracking the Uncrackable” at ScienceABC.com for details). Fortunately, standard, off-the-shelf hardware can overcome the speed issue for many problems today, which is where this chapter begins.
To truly begin to simulate human thought requires specialized hardware, and even the best specialized hardware isn’t up to the task today. Almost all standard hardware relies on the von Neumann architecture (the von Neumann computer model is explained at c-jump.com), which separates memory from computing, creating a wonderfully generic processing environment that just doesn’t work well for some kinds of algorithms because the speed of the bus between the processor and memory creates a von Neumann bottleneck. The second part of this chapter helps you understand the various methods used to overcome the von Neumann bottleneck so that complex, data-intensive algorithms run faster.
Even with custom hardware specially designed to speed computations, a machine designed to simulate human thought can run only as fast as its inputs and outputs will allow. Consequently, people are working to create a better environment in which the hardware can operate. This need can be addressed in a number of ways, but this chapter looks at two: enhancing the capabilities of the underlying hardware, and using specialized sensors. These changes to the hardware environment work well, but as the following material explains, it still isn’t enough to build a human brain.
Ultimately, hardware is useless, even with enhancements, if the humans who rely on it can’t interact with it effectively. The final sections of this chapter describe techniques for making those interactions more efficient. Of special importance now is the use of Deep Learning Processors (DLPs), which are designed specifically to work with deep learning algorithms. However, there are also more mundane approaches that are simply the result of the combination of enhanced output and clever programming. Just as Alan Turing used a trick to make his computer seem to do more than it could, these techniques make modern computers look like miracle workers. In fact, the computer understands nothing; all the credit goes to the people who program the computer.
Relying on Standard Hardware
Most AI projects that you create will at least begin with standard hardware because modern off-the-shelf components actually provide significant processing power, especially when compared to components from the 1980s when AI first began to produce usable results. Consequently, even if you can’t ultimately perform production-level work by using standard hardware, you can get far enough along with your experimental and preproduction code to create a working model that will eventually process a full dataset.
Understanding the standard hardware
The architecture (structure) of the standard PC hasn’t changed since John von Neumann first proposed it in 1946 (see the article “John von Neumann: The Father of the Modern Computer” at https://www.maa.org/external_archive/devlin/devlin_12_03.html for details). Reviewing the history at https://lennartb.home.xs4all.nl/coreboot/col2.html shows that the processor has connected to memory and peripheral devices through a bus in PC products since as early as 1981 (and long before). All these systems use the von Neumann architecture because it provides significant benefits in modularity: you can upgrade each component as an individual decision, increasing the system’s capability over time. For example, within limits, you can increase the amount of memory or storage available to any PC, and you can add advanced peripherals. However, all these elements connect through a bus.
Making a PC more capable doesn’t change its essential architecture. The PC you use today has the same architecture as devices created long ago; it’s simply more capable. In addition, the form factor of a device doesn’t affect its architecture, either. The computers in your car rely on a bus system for connectivity that derives directly from the von Neumann architecture. (Even if the kind of bus differs, the architecture is the same.) Lest you think any device remains unaffected, look at the block diagram for a BlackBerry at http://mobilesaudi.blogspot.com/2011/10/all-blackberry-schematic-complete.html. It, too, relies on a von Neumann setup. Consequently, almost every device you can conceive of today has a similar architecture, despite having different form factors, bus types, and essential capabilities.
Describing standard hardware deficiencies
The ability to create a modular system does have significant benefits, especially in business. The ability to remove and replace individual components keeps costs low while allowing incremental improvements in both speed and efficiency. However, as with most things, there is no free lunch. The modularity provided by the von Neumann architecture comes with some serious deficiencies:
· von Neumann bottleneck: Of all the deficiencies, the von Neumann bottleneck is the most serious when considering the requirements of disciplines such as AI, machine learning, and even data science. You can find this particular deficiency discussed in more detail in the “Considering the von Neumann bottleneck” section, later in this chapter.
· Single points of failure: Any loss of connectivity with the bus necessarily means that the computer fails immediately, rather than gracefully. Even in systems with multiple processors, the loss of a single processor, which should simply produce a loss of capability, instead inflicts complete system failure. The same problem occurs with the loss of other system components: Instead of reducing functionality, the entire system fails. Given that AI often requires continuous system operation, the potential for serious consequences escalates with the manner in which an application relies on the hardware.
· Single-mindedness: The von Neumann bus can either retrieve an instruction or retrieve the data required to execute the instruction, but it can’t do both. Consequently, when data retrieval requires several bus cycles, the processor remains idle, reducing its ability to perform instruction-intensive AI tasks even more.
· Tasking: When the brain performs a task, a number of synapses fire at one time, allowing simultaneous execution of multiple operations. The original von Neumann design allowed just one operation at a time, and only after the system retrieved both the required instruction and data. Computers today typically have multiple cores, which allow simultaneous execution of operations in each core. However, application code must specifically address this requirement, so the functionality sometimes remains unused.
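The tasking point deserves a concrete illustration. The following sketch runs a deliberately CPU-bound chore first on one core and then spread across all available cores; the workload and sizes are made up for illustration, and the timings you see will vary by machine. The only point is that the extra cores sit idle until the code explicitly asks for them.

```python
# Minimal sketch: multicore hardware helps only when the code asks for it.
# The workload and numbers here are illustrative, not a benchmark.
from multiprocessing import Pool
import time

def busy_work(n):
    # A deliberately CPU-bound task (sum of squares).
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    jobs = [2_000_000] * 8

    start = time.perf_counter()
    serial = [busy_work(n) for n in jobs]          # one core does everything
    print("serial  :", time.perf_counter() - start, "seconds")

    start = time.perf_counter()
    with Pool() as pool:                           # one worker per core by default
        parallel = pool.map(busy_work, jobs)       # work is spread across cores
    print("parallel:", time.perf_counter() - start, "seconds")

    assert serial == parallel                      # same answers, different speed
```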
EXAMINING THE HARVARD ARCHITECTURE DIFFERENCE
You may encounter the Harvard architecture during your hardware travels because some systems employ a modified form of this architecture to speed processing. Both the von Neumann architecture and Harvard architecture rely on a bus topology. However, when working with a von Neumann architecture system, the hardware relies on a single bus and a single memory area for both instructions and data, whereas the Harvard architecture relies on individual buses for instructions and data, and can use separate physical memory areas (see the comparison in “Difference between Von Neumann and Harvard Architecture” at GeeksforGeeks.org). The use of individual buses enables a Harvard architecture system to retrieve the next instruction while waiting for data to arrive from memory for the current instruction, thereby making the Harvard architecture both faster and more efficient. However, reliability suffers because now you have two failure points for each operation: the instruction bus and the data bus.
Microcontrollers, such as those that power your microwave, often use the Harvard architecture. In addition, you might find it in some unusual places for a specific reason. The iPhone and Xbox 360 both use modified versions of the Harvard architecture that rely on a single memory area (rather than two), but still rely on separate buses. The reason for using the architecture in this case is Digital Rights Management (DRM). You can make the code area of memory read-only so that no one can modify it or create new applications without permission. From an AI perspective, this can be problematic because one of an AI’s capabilities is to write new algorithms (executable code) as needed to deal with unanticipated situations. Because PCs rarely implement a Harvard architecture in its pure form or as its main bus construction, the Harvard architecture doesn’t receive much attention in this book.
Relying on new computational techniques
Reading literature about how to perform tasks using AI can feel like you’re seeing a marketer on TV proclaiming, “It’s new! It’s improved! It’s downright dazzling!” So it shouldn’t surprise you much that people are always coming up with ways to make the AI development experience faster, more precise, and better in other ways. The problem is that many of these new techniques are largely untested, so even the ones that look great deserve careful scrutiny before you rely on them.
One way around the various issues surrounding AI computational speed is to create new techniques for performing tasks. Although many data scientists rely on the Graphical Processing Unit (GPU) to speed execution of complex code, the article “The startup making deep learning possible without specialized hardware” at MIT Technology Review.com describes another approach, based on a product called Neural Magic (https://neuralmagic.com), which essentially compresses data to make a CPU more efficient. Neural Magic also keeps costs lower than using specialized hardware. (The more specialized the hardware, the higher the costs.)
Reading the fine print with any new technology is always important, however, and this is the case with Neural Magic. The process for using the Neural Magic approach still involves training the model on hardware robust enough to perform the task, which usually means relying on GPUs. You then take the additional step of converting the model with Neural Magic so that it runs on a standard CPU. So, the Neural Magic approach really isn’t an option for someone who is experimenting. Anyone using Neural Magic already has a well-developed application and simply wants to run it on a low-cost machine. In addition, Neural Magic is currently used only for computer vision tasks (for which the computer relies on cameras to capture images and then interprets those images mathematically to do things like categorize objects), which is a relatively small part of AI as a whole.
The advantage of using the Neural Magic approach is that an organization can buy just a few high-cost machines to perform research and create an application. It can then run the resulting application on as many low-cost systems as needed to satisfy user requirements. The big payoff is that these systems need not rely on desktop technology, but can use mobile devices as well, so the application can run anywhere. Consequently, this is a valuable approach within the limits of the technology it currently uses.
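To make the compression idea less abstract, here is a minimal sketch of generic magnitude pruning, one broad family of techniques that makes sparse CPU inference attractive. It only illustrates the concept; it is not Neural Magic’s actual technology, the matrix sizes and pruning level are arbitrary, and a real pipeline fine-tunes the model after pruning to recover accuracy.

```python
# A minimal sketch of generic weight pruning: zero out small weights so a CPU
# can store and touch far fewer numbers. Illustrative only -- this is not
# Neural Magic's actual pipeline.
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
weights = rng.normal(size=(1024, 1024))        # one dense layer's weight matrix
inputs = rng.normal(size=(1024,))

# Magnitude pruning: keep only the largest 10 percent of the weights.
threshold = np.quantile(np.abs(weights), 0.90)
pruned = np.where(np.abs(weights) >= threshold, weights, 0.0)
sparse_weights = sparse.csr_matrix(pruned)     # stores only the surviving weights

dense_product = weights @ inputs               # touches every weight
sparse_product = sparse_weights @ inputs       # skips the zeroed weights entirely

print("weights stored (dense) :", weights.size)
print("weights stored (sparse):", sparse_weights.data.size)
```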
Another new approach relies on using hash tables instead of matrices to model problems. According to the “CPU algorithm trains deep neural nets up to 15 times faster than top GPU trainers” article at TechXplore.com, the Sub-Linear Deep Learning Engine (SLIDE) can train models using commodity processors rather than GPUs. Besides using hash tables in place of matrix multiplication, SLIDE also eliminates some of the more wasteful elements of training a model (see the KDnuggets.com article “Deep Learning Breakthrough: a sub-linear deep learning algorithm that does not need a GPU?”). The problem with this new approach (as with many new approaches) is that it requires a complete change in how tasks are performed. Obviously, organizations won’t be happy about throwing out millions of dollars in existing development to try something new. The white paper “SLIDE : In Defense of Smart Algorithms over Hardware Acceleration for Large-Scale Deep Learning Systems” at arXiv.org provides a more formal discussion of this new methodology.
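The hash-table idea becomes clearer with a toy example. The sketch below uses locality-sensitive hashing (SimHash with random hyperplanes) to pick a small subset of neurons to evaluate instead of multiplying the full weight matrix. It is a drastic simplification for illustration only, not the actual SLIDE implementation, and all the sizes are invented.

```python
# A toy version of the hash-table idea behind SLIDE: bucket neurons with
# locality-sensitive hashing, then evaluate only the neurons that land in the
# same buckets as the input. A simplification for illustration, not the
# actual SLIDE implementation.
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(1)
n_neurons, n_inputs = 4096, 256
n_tables, n_bits = 8, 6                                  # several small hash tables

weights = rng.normal(size=(n_neurons, n_inputs))         # one row per neuron
planes = rng.normal(size=(n_tables, n_bits, n_inputs))   # random hyperplanes

def simhash(vector, table_id):
    # Sign pattern of the vector against the table's hyperplanes, as an integer.
    bits = (planes[table_id] @ vector) > 0
    return int(np.dot(bits, 2 ** np.arange(n_bits)))

# Build the tables once: similar weight vectors end up in the same buckets.
tables = [defaultdict(list) for _ in range(n_tables)]
for table_id in range(n_tables):
    for neuron_id, row in enumerate(weights):
        tables[table_id][simhash(row, table_id)].append(neuron_id)

# At run time, hash the input and gather only the colliding neurons.
x = rng.normal(size=(n_inputs,))
candidates = set()
for table_id in range(n_tables):
    candidates.update(tables[table_id][simhash(x, table_id)])

activations = {i: weights[i] @ x for i in candidates}    # partial, not full, matmul
print(f"evaluated {len(candidates)} of {n_neurons} neurons")
```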
CONSIDERING ALAN TURING’S BOMBE MACHINE
Alan Turing’s Bombe machine wasn’t any form of AI. In fact, it wasn’t even a real computer. It broke Enigma cryptographic messages, and that’s it. However, it did provide food for thought for Turing, which eventually led to a paper entitled “Computing Machinery and Intelligence.” Turing published that paper, which describes the imitation game, in 1950 (The Imitation Game movie is a depiction of the events surrounding the creation of this game). However, the Bombe itself was actually based on a Polish machine called the Bomba.
Even though some sources imply that Alan Turing worked alone, the Bombe was produced with the help of many people, most especially Gordon Welchman. Turing also didn’t spring from a vacuum, ready-made to break German encryption. His time at Princeton was spent with greats like Albert Einstein and John von Neumann (who would go on to invent the concept of computer software). The papers Turing wrote inspired these other scientists to experiment and see what was possible.
Specialized hardware of all sorts will continue to appear as long as scientists are writing papers, bouncing ideas off of each other, creating new ideas of their own, and experimenting. When you see movies or other media, assuming that they’re historically accurate at all, don’t leave with the feeling that these people just woke up one morning, proclaimed, “Today, I will be brilliant!” and went on to do something marvelous. Everything builds on something else, so history is important because it helps show the path followed and illuminates other promising paths — those not followed.
Using GPUs
After creating a prototypical setup to perform the tasks required to simulate human thought on a given topic, you may need additional hardware to provide sufficient processing power to work with the full dataset required of a production system. Many ways are available to provide such processing power, but a common way is to use Graphic Processing Units (GPUs) in addition to the central processor of a machine. The following sections describe the problem domain that a GPU addresses, what precisely the term GPU means, and why a GPU makes processing faster.
Considering the von Neumann bottleneck
The von Neumann bottleneck is a natural result of using a bus to transfer data between the processor, memory, long-term storage, and peripheral devices. No matter how fast the bus performs its task, overwhelming it — that is, forming a bottleneck that reduces speed — is always possible. Over time, processor speeds continue to increase while memory and other device improvements focus on density — the capability to store more in less space. Consequently, the bottleneck becomes more of an issue with every improvement, causing the processor to spend a lot of time being idle.
Within reason, you can overcome some of the issues that surround the von Neumann bottleneck and produce small, but noticeable, increases in application speed. Here are the most common solutions:
· Caching: When problems with obtaining data from memory fast enough with the von Neumann architecture became evident, hardware vendors quickly responded by adding localized memory that didn’t require bus access. This memory appears external to the processor but as part of the processor package. High-speed cache is expensive, however, so cache sizes tend to be small.
· Processor caching: Unfortunately, external caches still don’t provide enough speed. Even using the fastest RAM available and cutting out the bus access completely doesn’t meet the processing capacity needs of the processor. Consequently, vendors started adding internal memory — a cache smaller than the external cache, but with even faster access because it’s part of the processor.
· Prefetching: The problem with caches is that they prove useful only when they contain the correct data. Unfortunately, cache hits prove low in applications that use a lot of data and perform a wide variety of tasks. The next step in making processors work faster is to guess which data the application will require next and load it into a cache before the application requires it.
· Using specialty RAM: You can get buried by RAM alphabet soup because there are more kinds of RAM than most people imagine. Each kind of RAM purports to solve at least part of the von Neumann bottleneck problem, and they do work — within limits. In most cases, the improvements revolve around the idea of getting data from memory and onto the bus faster. Two major (and many minor) factors affect speed: memory speed (how fast the memory moves data) and latency (how long it takes to locate a particular piece of data). You can read more about memory and the factors that affect it in “Different RAM Types and its uses” at Computer Memory Upgrade.net.
As with many other areas of technology, hype can become a problem. For example, multithreading, the act of breaking an application or other set of instructions into discrete execution units that the processor can handle one at a time, is often touted as a means to overcome the von Neumann bottleneck, but it doesn’t actually do anything more than add overhead (making the problem worse). Multithreading is an answer to another problem: making the application more efficient. When an application adds latency issues to the von Neumann bottleneck, the entire system slows. Multithreading ensures that the processor doesn’t waste yet more time waiting for the user or the application, but instead has something to do all the time. Application latency can occur with any processor architecture, not just the von Neumann architecture. Even so, anything that speeds the overall operation of an application is visible to the user and the system as a whole.
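The cache and prefetch entries in the preceding list have a simple, visible consequence that you can try yourself. The sketch below sums the same matrix twice: once walking memory sequentially (cache and prefetch friendly) and once with a large stride (cache hostile). The matrix size is arbitrary, and the exact timings depend on your machine; only the pattern matters.

```python
# A minimal sketch of why caches and prefetching matter: both passes sum the
# same data, but the access pattern differs. Exact timings vary by machine.
import time
import numpy as np

matrix = np.random.default_rng(2).normal(size=(4096, 4096))   # about 128MB

start = time.perf_counter()
row_total = sum(row.sum() for row in matrix)                  # sequential, prefetch-friendly
print("row-wise pass   :", time.perf_counter() - start, "seconds")

start = time.perf_counter()
col_total = sum(matrix[:, j].sum() for j in range(matrix.shape[1]))  # strided, cache-hostile
print("column-wise pass:", time.perf_counter() - start, "seconds")

# Both passes compute the same total (up to floating-point rounding).
print("totals:", row_total, col_total)
```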
Defining the GPU
The original intent of a GPU was to process image data quickly and then display the resulting image onscreen. During the initial phase of PC evolution, the CPU performed all the processing, which meant that graphics could appear slowly while the CPU performed other tasks. During this time, a PC typically came equipped with a display adapter, which contained little or no processing power; a display adapter merely converts computer data into a visual form. In fact, using just one processor proved almost impossible after the PC moved past text-only displays or extremely simple 16-color graphics. However, GPUs didn’t really make many inroads into computing until people began wanting 3D output. At this point, a combination of a CPU and a display adapter simply couldn’t do the job.
A first step in this direction was taken by systems such as the Hauppauge 4860 (see details at Geekdot.com), which included a CPU and a special graphics chip (the 80860, in this case) on the motherboard. The 80860 has the advantage of performing calculations extremely fast (see “Intel 80860 (i860) CPU family” at CPU-World.com for details). Unfortunately, these multiprocessor, asynchronous systems didn’t quite meet the expectations that people had for them (although they were incredibly fast for systems of the time) and they proved extremely expensive. Plus, there was the whole issue of writing applications that included that second (or subsequent) chip. The two chips also shared memory (which was abundant for these systems).
A GPU moves graphics processing from the motherboard to the graphics peripheral board. The CPU can tell the GPU to perform a task, and then the GPU determines the best method for doing so independently of the CPU. A GPU has a separate memory, and the data path for its bus is immense. In addition, a GPU can access the main memory for obtaining data needed to perform a task and to post results independently of the CPU. Consequently, this setup makes modern graphics displays possible.
However, what really sets a GPU apart is that a GPU typically contains hundreds or thousands of cores (see the article about supercharged computing at NVIDIA.com), contrasted with just a few cores for a CPU. (Eight cores is about the best you get, even with the newer i9 processor, described in “11th Generation Intel Core i9 Processors” at Intel.com.) According to the NVIDIA blog post at https://developer.nvidia.com/blog/nvidia-ampere-architecture-in-depth/, an A100 GPU can host up to 80GB of RAM and up to 8,192 FP32 (single-precision floating-point format) CUDA (Compute Unified Device Architecture) cores per full GPU. CUDA is a parallel computing platform and Application Programming Interface (API) developed by NVIDIA. Even though the CPU provides more general-purpose functionality, the GPU performs calculations incredibly fast and can move data from the GPU to the display even faster. This ability is what makes the special-purpose GPU a critical component in today’s systems.
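To see the practical effect of all those cores, consider the following sketch, which multiplies two large matrices on the CPU with NumPy and then again on the GPU. The GPU half assumes an NVIDIA card and the cupy package, neither of which this chapter requires; treat it as an optional experiment rather than a required step.

```python
# A minimal sketch of offloading matrix math to a GPU. The GPU portion assumes
# an NVIDIA card plus the cupy package; without them, only the CPU half runs.
import time
import numpy as np

size = 4096
a = np.random.default_rng(3).normal(size=(size, size)).astype(np.float32)
b = np.random.default_rng(4).normal(size=(size, size)).astype(np.float32)

start = time.perf_counter()
c_cpu = a @ b                                     # a handful of CPU cores
print("CPU matmul:", time.perf_counter() - start, "seconds")

try:
    import cupy as cp
    a_gpu, b_gpu = cp.asarray(a), cp.asarray(b)   # copy the data across the bus
    start = time.perf_counter()
    c_gpu = a_gpu @ b_gpu                         # thousands of CUDA cores
    cp.cuda.Stream.null.synchronize()             # wait for the GPU to finish
    print("GPU matmul:", time.perf_counter() - start, "seconds")
    print("max difference:", float(cp.abs(c_gpu - cp.asarray(c_cpu)).max()))
except ImportError:
    print("cupy (and an NVIDIA GPU) not available; skipping the GPU half")
```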
Considering why GPUs work well
As with the 80860 chip described in the previous section, the GPUs today excel at performing the specialized tasks associated with graphics processing, including working with vectors. All those cores performing tasks in parallel really speed AI calculations. For example, they’re indispensable in creating compute-intensive AI models, like the Generative Adversarial Networks (GANs) that perform tasks like the ones described in the “18 Impressive Applications of Generative Adversarial Networks (GANs)” article at Machine Learning Mastery.com.
In 2011, the Google Brain Project (https://research.google.com/teams/brain/) trained an AI to recognize the difference between cats and people by watching movies on YouTube. However, to make this task work, Google used 2,000 CPUs in one of Google’s giant data centers. Few people would have the resources required to replicate Google’s work.
On the other hand, Bryan Catanzaro (NVIDIA’s research team) and Andrew Ng (Stanford) were able to replicate Google’s work using a set of 12 NVIDIA GPUs (see the “Accelerating AI with GPUs: A New Computing Model” post at the NVIDIA.com blog for details). After people understood that GPUs could replace a host of computer systems stocked with CPUs, they could start moving forward with a variety of AI projects. In 2012, Alex Krizhevsky (University of Toronto) won the ImageNet computer image recognition competition using GPUs. In fact, a number of researchers have now used GPUs with amazing success (see “The 9 Deep Learning Papers You Need To Know About” at https://adeshpande3.github.io/The-9-Deep-Learning-Papers-You-Need-To-Know-About.html for details).
Working with Deep Learning Processors (DLPs)
Researchers engage in a constant struggle to discover better ways to train, verify, and test the models used to create AI applications. One of those ways is to use new computing techniques, as described in the “Relying on new computational techniques” section, earlier in this chapter. Another way is to throw more processing power at the problem, such as by using a GPU.
However, a GPU is beneficial only because it can perform matrix manipulation quickly, and on a massively parallel level. Otherwise, using a GPU can create problems as well, as discussed in the “Using GPUs” section, earlier in this chapter. So, the search for something better is ongoing, and you can find a veritable alphabet soup of processor types described on sites such as Primo.ai, on the page “Processing Units - CPU, GPU, APU, TPU, VPU, FPGA, QPU,” which acquaints you with all the current processor types. However, you should start with the overview provided in the following sections because it’s easy to get mired in the quicksand of too many options (and then your head explodes).
Defining the DLP
A Deep Learning Processor (DLP) is simply a specialized processor that provides some advantages in training, verifying, testing, and running AI applications. DLPs try to create an environment in which AI applications run quickly even on smaller or less capable devices. Most DLPs follow a similar pattern by providing
· Separate data and code memory areas
· Separate data and code buses
· Specialized instruction sets
· Large on-chip memory
· Large buffers to encourage data reuse patterns
In 2014, Tianshi Chen (and others) proposed the first DLP, called DianNao (Chinese for electric brain), in a white paper at http://novel.ict.ac.cn/ychen/pdf/DianNao.pdf. Of course, a first attempt is never good enough, so there is a whole family of DianNao chips: DaDianNao, ShiDianNao, and PuDianNao (and possibly others).
Since these first experiments with DLPs, the number and types of DLPs have soared, but most of these endeavors are currently part of university research efforts. The exceptions are the Neural Processing Unit (NPU) created by Huawei (https://developer.huawei.com/consumer/en/doc/2020314) and Samsung (https://news.samsung.com/global/samsung-electronics-introduces-a-high-speed-low-power-npu-solution-for-ai-deep-learning) for mobile devices, and the Tensor Processing Unit (TPU) created by Google (https://cloud.google.com/tpu/docs/tpus) specifically for use with TensorFlow (https://www.tensorflow.org/). These two DLP types are described next.
Using the mobile Neural Processing Unit (NPU)
A number of mobile devices, notably those by Huawei and Samsung, have a Neural Processing Unit (NPU) in addition to a general CPU to perform AI predictive tasks using models such as Artificial Neural Networks (ANNs) and Random Forests (RFs). You can’t use an NPU for general computing needs because it’s so specialized. However, an NPU characteristically performs up to ten times faster than a GPU does for the same task. An NPU is specialized in these ways:
· It accelerates the running of predefined models (as contrasted to training, verification, and testing)
· It’s designed for use with small devices
· It consumes little power when contrasted to other processor types
· It uses resources, such as memory, efficiently
Because the precise boundaries between processor types are hard to define, you might see a number of NPU look-alikes or alternatives classified as NPUs. However, here is a list of processors that you can currently classify as true NPUs:
· Ali-NPU, by Alibaba
· Ascend, by Huawei
· Neural Engine, by Apple
· Neural Processing Unit (NPU), by Samsung
· NNP, Myriad, EyeQ, by Intel
· NVDLA (mostly used for Internet of Things [IoT] devices), by NVIDIA
Accessing the cloud-based Tensor Processing Unit (TPU)
Google specifically designed the Tensor Processing Unit (TPU) in 2015 to more quickly run applications built on the TensorFlow framework. It represents true chip specialization in that you can’t use it effectively without TensorFlow. It also differs from a CPU or GPU in another way: it’s an Application-Specific Integrated Circuit (ASIC) rather than a full-blown CPU-type chip. The differences are important (a short TensorFlow sketch follows this list):
· An ASIC can perform only one task, and you can’t change it.
· Because of its specialization, an ASIC is typically much less expensive than a CPU.
· Most ASIC implementations are much smaller than the same implementation created with a CPU.
· Compared to a CPU implementation, an ASIC is more power efficient.
· ASICs are incredibly reliable.
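Because TensorFlow is the intended route onto a TPU, connecting to one takes only a few lines. The sketch below assumes you run it somewhere a Cloud TPU is actually attached, such as a Google Cloud TPU VM or a Colab TPU runtime; anywhere else, the resolver simply won’t find a device. The tiny Keras model at the end is just a placeholder to show where your real work goes.

```python
# A minimal sketch of pointing TensorFlow at a Cloud TPU. It assumes the code
# runs where a TPU is attached (for example, a Cloud TPU VM or a Colab TPU
# runtime); otherwise the resolver fails to find a device.
import tensorflow as tf

resolver = tf.distribute.cluster_resolver.TPUClusterResolver()  # locate the TPU
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)                  # spread work over TPU cores

print("TPU cores available:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Any Keras model built inside the scope is replicated across the TPU cores.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(20,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
```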
Creating a Specialized Processing Environment
Deep learning and AI are both non-von Neumann processes, according to many experts, including Massimiliano Versace, CEO of Neurala Inc. (https://www.neurala.com/). Because the task the algorithm performs doesn’t match the underlying hardware, all sorts of inefficiencies exist, hacks are required, and obtaining a result is much harder than it should be. Therefore, designing hardware that matches the software is quite appealing. The Defense Advanced Research Projects Agency (DARPA) undertook one such project in the form of Systems of Neuromorphic Adaptive Plastic Scalable Electronics (SyNAPSE). The idea behind this approach is to duplicate nature’s approach to solving problems by combining memory and processing power, rather than keeping the two separate. They actually built the system (it was immense), and you can read more about it at https://www.darpa.mil/program/systems-of-neuromorphic-adaptive-plastic-scalable-electronics and https://www.darpa.mil/news-events/2014-08-07.
The SyNAPSE project did move forward. IBM used modern technology to build a smaller system that was both incredibly fast and power efficient (see https://www.research.ibm.com/articles/brain-chip.shtml). The only problem is that no one is buying it. Many people would argue that Betamax stored video better than VHS, yet VHS won out on cost, ease of use, and compelling features (see “Betamax vs. VHS: How Sony Lost the Original Home Video Format War” at GIZMODO.com). The same holds true for IBM’s SyNAPSE offering, TrueNorth: it has been hard to find people who are willing to pay the higher price, programmers who can develop software for the new architecture, and products that genuinely benefit from the chip. Consequently, the combination of CPUs and GPUs, even with its inherent weaknesses, continues to win out.
Increasing Hardware Capabilities
The CPU still works well for business systems or in applications in which the need for general flexibility in programming outweighs pure processing power. However, GPUs are now the standard for various kinds of data science, machine learning, AI, and deep learning needs. Of course, everyone is constantly looking for the next big thing in the development environment. Both CPUs and GPUs are production-level processors. In the future, you may see one of two kinds of processors used in place of these standards:
· Application-Specific Integrated Circuits (ASICs): In contrast to general processors, a vendor creates an ASIC for a specific purpose. An ASIC solution offers extremely fast performance using very little power, but it lacks flexibility. You can find an example of an ASIC earlier in this chapter in the form of a TPU (see the “Accessing the cloud-based Tensor Processing Unit (TPU)” section for details).
· Field Programmable Gate Arrays (FPGAs): As with an ASIC, a vendor generally crafts an FPGA for a specific purpose. However, contrary to an ASIC, you can program an FPGA to change its underlying functionality. An example of an FPGA solution is Microsoft’s Brainwave, which is used for deep learning projects (see “Microsoft Brainwave aims to accelerate deep learning with FPGAs” at TechCrunch.com).
The battle between ASICs and FPGAs promises to heat up, with AI developers emerging as the winner. For the time being, Microsoft and FPGAs appear to have taken the lead (see “Microsoft: FPGA Wins Versus Google TPUs For AI” at Moor Insights & Strategy.com). The point is that technology is fluid, and you should expect to see new developments. The article “AI Chips Technology Trends & Landscape (GPU + TPU + FPGA + Startups),” by Jonathan Hui, provides an even better idea of just how much things are changing.
Vendors are also working on entirely new processing types, which may or may not actually work as expected. For example, Graphcore is working on an Intelligence Processing Unit (IPU), as described at https://www.prnewswire.com/news-releases/sequoia-backs-graphcore-as-the-future-of-artificial-intelligence-processors-300554316.html. The company has developed the line of processors shown at https://www.graphcore.ai/products/ipu. However, you have to take the news of these new processors with a grain of salt, given the hype that has surrounded the industry in the past. When you see real applications from large companies such as Google and Microsoft, you can start to feel a little more certain about the future of the technology involved.
Adding Specialized Sensors
An essential component of AI is the capability of the AI to simulate human intelligence using a full set of senses. Input provided through senses helps humans develop the various kinds of intelligence described in Chapter 1. A human’s senses provide the right sort of input to create an intelligent human. Even assuming that it becomes possible for an AI to fully implement all seven kinds of intelligence, it still requires the right sort of input to make that intelligence functional.
Humans typically have five senses with which to interact with the environment: sight, hearing, touch, taste, and smell. Oddly enough, humans still don’t fully understand their own capabilities, so it’s not too surprising that computers lag when it comes to sensing the environment in the same way that humans do. For example, until recently, taste was thought to comprise only four elements: salt, sweet, bitter, and sour. However, two more tastes now appear on the list: umami and fat (see “Sweet, Sour, Salty, Bitter, Umami … And Fat?” at FiveThirtyEight.com for details). Likewise, some women are tetrachromats (https://concettaantico.com/tetrachromacy/), who can see 100,000,000 colors rather than the more usual 1,000,000. (Only women can be tetrachromats because of the chromosomal requirements.) Knowing how many women have this capability isn’t even possible yet, but some sources put the number as high as 20 percent; see http://sciencevibe.com/2016/12/11/the-women-that-see-100-million-colors-live-in-a-different-world/ for details.
The use of filtered static and dynamic data enables an AI to interact with humans in specific ways today. For example, consider Alexa, the Amazon device that apparently hears you and then says something back. Even though Alexa doesn’t actually understand anything you say, the appearance of communication is quite addicting and encourages people to anthropomorphize these devices. To perform its task at all, Alexa requires access to a special sensor: a microphone that allows it to hear. Actually, Alexa has a number of microphones to help it hear well enough to provide the illusion of understanding. Unfortunately, as advanced as Alexa is, it can’t see, feel, touch, or taste anything, which makes it far from human in even the smallest ways.
In some cases, humans actually want their AI to have superior or different senses. An AI that detects motion at night and reacts to it might rely on infrared rather than normal vision. In fact, the use of alternative senses is one of the valid uses for AI today. The capability to work in environments that people can’t work in is one reason that some types of robots have become so popular, but working in these environments often requires that the robots have, or be connected to, a set of nonhuman sensors. Consequently, the topic of sensors actually falls into two categories (neither of which is fully defined): human-like sensors and alternative environment sensors.
Devising Methods to Interact with the Environment
An AI that is self-contained and never interacts with the environment is useless. Of course, that interaction takes the form of inputs and outputs. The traditional method of providing inputs and outputs is directly through data streams that the computer can understand, such as datasets, text queries, and the like. However, these approaches are hardly human friendly, and they require special skills to use.
Interacting with an AI is increasingly occurring in ways that humans understand better than they do direct computer contact. For example, input occurs through a series of microphones when you ask Alexa a question. The AI turns the keywords in the question into tokens it can understand. These tokens then initiate computations that form an output. The AI tokenizes the output into a human-understandable form: a spoken sentence. You then hear the sentence as Alexa speaks to you through a speaker. In short, to provide useful functionality, Alexa must interact with the environment in two different ways that appeal to humans, but which Alexa doesn’t actually understand.
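To make the listen-tokenize-respond loop concrete, here is a minimal sketch of a similar pipeline on a PC. It assumes the third-party speech_recognition and pyttsx3 packages plus a working microphone; it only illustrates the flow described above and has nothing to do with how Alexa is actually built.

```python
# A minimal sketch of the listen -> tokenize -> respond loop described above.
# It assumes the speech_recognition and pyttsx3 packages and a microphone;
# it illustrates the flow, not how Alexa actually works.
import speech_recognition as sr
import pyttsx3

recognizer = sr.Recognizer()
voice = pyttsx3.init()

with sr.Microphone() as source:                    # input: the microphone
    recognizer.adjust_for_ambient_noise(source)
    print("Say something...")
    audio = recognizer.listen(source)

try:
    text = recognizer.recognize_google(audio)      # speech becomes text
except sr.UnknownValueError:
    text = ""

tokens = text.lower().split()                      # text becomes tokens
if "weather" in tokens:
    reply = "You asked about the weather."         # a keyword triggers a canned reply
elif tokens:
    reply = "I heard you say: " + text
else:
    reply = "I did not catch that."

voice.say(reply)                                   # output: a spoken sentence
voice.runAndWait()
```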
Interactions can take many forms. In fact, the number and forms of interaction are increasing continually. For example, an AI can now smell (see “Artificial intelligence grows a nose” at ScienceMag.org). However, the computer doesn’t actually smell anything. Sensors provide a means to turn chemical detection into data that the AI can then use in the same way that it does all other data. The capability to detect chemicals isn’t new; the ability to turn the analysis of those chemicals into data isn’t new; nor are the algorithms used to interact with the resulting data new. What is new are the datasets used to interpret the incoming data as a smell, and those datasets come from human studies. An AI’s “nose” has all sorts of possible uses. For example, think about the AI’s capability to use a nose when working in some dangerous environments, such as to smell a gas leak before being able to see it by using other sensors.
Physical interactions are also on the rise. Robots that work in assembly lines are old hat, but consider the effects of robots that can drive. These are larger uses of physical interaction. Consider also that an AI can react in smaller ways. Hugh Herr, for example, uses an AI to provide interaction with an intelligent foot, as described in “Is This the Future of Robotic Legs?” at Smithsonian Magazine.com and “New surgery may enable better control of prosthetic limbs” at MIT News.edu. This dynamic foot provides a superior replacement for people who have lost their real foot. Instead of the static sort of feedback that a human gets from a standard prosthetic, this dynamic foot provides the sort of active feedback that humans are used to obtaining from a real foot. For example, the amount of pushback from the foot differs between walking uphill and walking downhill. Likewise, navigating a curb requires a different amount of pushback than navigating a step.
The point is that as AI becomes more able to perform complex calculations in smaller packages with ever-larger datasets, the capability of an AI to perform interesting tasks increases. However, the tasks that the AI performs may not currently have a human category. You may not ever truly interact with an AI that understands your speech, but you may come to rely on an AI that helps you maintain life or at least make it more livable.