Chapter 9

Hardware

FOR MANY OF US, THE MOST EXCITING ASPECT OF the Metaverse is the development of new devices that we might use to access, render, and operate it. This usually leads to visions of super-powerful, yet lightweight, augmented reality and immersive virtual reality headsets. These devices are not required for the Metaverse, but are often assumed to be the best or most natural way to experience its many virtual worlds. Big-tech executives seem to agree, even though supposed consumer demand for these devices has yet to translate into sales.

Microsoft began developing its HoloLens AR headset and platform in 2010, releasing the first device in 2016 and the second in 2019. After five years on the market, fewer than half a million units have shipped. Still, investment in the division continues, and Microsoft CEO Satya Nadella regularly highlights the device to investors and customers, particularly in the context of the company’s Metaverse ambitions.

Although Google Glass, Google’s AR device, quickly earned a reputation as one of the most overhyped and failed products in consumer electronics history after it launched in 2013, Google continues to support it. In 2017, the company released an updated model, the Google Glass Enterprise Edition, with a follow-up in 2019. Since June 2020, Google has spent $1 billion–$2 billion acquiring AR glasses start-ups such as North and Raxium.

Though Google’s efforts in VR received less press attention than Google Glass, they’ve been more significant and arguably more disappointing. Google’s first foray, 2014’s Google Cardboard, had the stated goal of inspiring interest in immersive virtual reality. For developers, Google produced a Cardboard software development kit, which helped them create VR-specific apps built in Java, Unity, or Apple’s Metal. For users, Google created a $15 foldout cardboard “Viewer,” into which they could place their iPhones or Android devices in order to experience VR without needing to buy a new device. A year after Cardboard was announced, Google unveiled Jump, a platform and ecosystem for VR filmmaking, and Expeditions, a program focused on providing VR-based field trips for educators. The top-level numbers achieved by Cardboard were impressive: over 15 million Viewers were sold by Google in five years, nearly 200 million Cardboard-enabled apps were downloaded, and over a million students took at least one Expeditions tour within the first year of release. However, these figures reflected evidence of consumer intrigue more than inspiration. In November 2019, Google shut down the Cardboard project and open-sourced its SDK. (Expeditions was discontinued in June 2021.)

In 2016, Google launched its second VR platform, Daydream, which was intended to improve upon Cardboard’s foundation. The improvements started with the quality of the Daydream viewer. The $80–$100 headset was made from foam and covered in soft fabric (available in four colors) and, unlike the Cardboard viewer, could be strapped to a user’s head rather than requiring the user to hold it up in front of them when in use. The Daydream viewer also came with a dedicated handheld remote control, and had an NFC (near-field communication) chip that could automatically recognize the phone being used and put it into VR mode, instead of requiring users to do so themselves. While Daydream received positive reviews from the press and led companies including HBO and Hulu to produce VR-specific apps, consumers showed little interest in the platform. Google cancelled Daydream at the same time it terminated Cardboard.

Despite struggling with AR and VR, Google still appears to view these experiences as central to its Metaverse strategy. Only a few weeks after Facebook publicly unveiled its vision of the future in October 2021, Clay Bavor, Google’s head of AR and VR, was made a direct report to Google/Alphabet CEO Sundar Pichai, and placed in charge of a new group, “Google Labs.” It contains all of Google’s existing AR, VR, and virtualization projects, its in-house incubator, Area 120, and any other “high-potential long-term projects.” According to press reports, Google plans to release a new VR and/or AR headset platform in 2024.

In 2014, Amazon launched its first and only smartphone, the Fire Phone. What differentiated the device from the Android and iOS market leaders was its use of four front-facing cameras, which adjusted the interface in response to the user’s head movements, and Firefly, a software tool that automatically recognized text, sounds, and visual objects. The phone turned out to be—and remains—the company’s biggest failure, cancelled barely a year after launch. Amazon recognized a $170 million write-down, primarily for unsold inventory. Yet the company soon began work on Echo Frames, a pair of glasses that lacked any sort of visual display but included integrated audio, Bluetooth (to pair to a smartphone), and the Alexa assistant. The first Echo Frames were released in 2019, with an updated model a year later. Neither seems to have sold well.

One of the most outspoken proponents of AR and VR devices is Mark Zuckerberg. In 2014, Facebook acquired Oculus VR for $2.3 billion, more than twice the sum it paid for Instagram two years earlier, even though Oculus had yet to release its device to the public. Not long thereafter, Zuckerberg and his lieutenants mused publicly about the prospect of VR headsets replacing PCs as the primary computer for professionals, with wearable AR glasses becoming the primary way consumers would access the digital world. Eight years later, Facebook announced that the Oculus Quest 2 had sold over 10 million units between October 2020 and December 2021—a figure that beat Microsoft’s new Xbox Series S and X consoles, which were released around the same time. However, the device has yet to replace the PC, of course, and Facebook has yet to release an AR device. Still, the bulk of Facebook’s $10 billion–$15 billion in annual Metaverse investments is believed to focus on AR and VR devices.

Apple has been, as usual, secretive about its plans for or even belief in AR or VR—but its acquisitions and patent filings are revealing. Over the past three years, Apple has bought start-ups such as Vrvana, which produced an AR headset called Totem; Akonia, which produced lenses for AR products; Emotient, whose machine learning software tracked facial expressions and discerned emotions; RealFace, a facial recognition company; and Faceshift, which remapped a user’s facial movements to a 3D avatar. Apple also purchased NextVR, a VR content producer, as well as Spaces, which created location-based VR entertainment and VR-based experiences for videoconferencing software. On average, Apple is granted over 2,000 patents per year (and files for even more). Hundreds of those relate to VR, AR, or body tracking.

Beyond the tech giants, a number of midsize social technology players are investing in proprietary AR/VR hardware, despite having little to no history producing, let alone distributing and serving, consumer electronics. For example, although Snap’s first AR glasses, 2017’s Spectacles, received more acclaim for their pop-up vending machine sales model than their technical, experiential, or sales success, the company has released three new models over the last five years.

The size of investment in these devices, even in the face of constant rejection from consumers and developers, stems from a belief that history will repeat itself. Every time there is a large-scale transformation in computing and networking, new devices emerge that better suit its capabilities. The companies that first crack these devices, in turn, have the opportunity to alter the balance of power in technology, not just produce a new business line. Thus companies such as Microsoft, Facebook, Snap, and Niantic see the ongoing struggles of AR and VR as a chance to displace Apple and Google, which operate the most dominant platforms of the mobile era, while Apple and Google understand that they must invest to avoid disruption. There are early signals validating the belief that AR and VR are the next big device technology, too. In March 2021, the US Army announced a deal to buy up to 120,000 customized HoloLens devices from Microsoft over the following decade. This contract was valued at $22 billion—nearly $200,000 per headset (a figure that includes hardware upgrades, repairs, bespoke software, and other Azure cloud computing services).

Another sign that mixed-reality devices are the future is that it’s possible to identify numerous technical shortfalls in VR and AR headsets that might be holding up mass adoption. In this sense, some argue that current devices are to the Metaverse what Apple’s ill-fated Newton tablet was to the smartphone era. The Newton was released in 1993 and offered much of what we expect from a mobile device—a touchscreen, a dedicated mobile operating system, and software—but it lacked even more. The device was nearly the size of a keyboard (and weighed even more), could not access a mobile data network, and required the use of a digital pen rather than the user’s finger.

With AR and VR, one key constraint is the device display. The first consumer Oculus, released in 2016, had a resolution of 1080 × 1200 pixels per eye, while the Oculus Quest 2, released four years later, had a resolution of 1832 × 1920 per eye (roughly equivalent to 4K). Palmer Luckey, one of Oculus’s founders, believes that more than twice the latter resolution is required for VR to overcome pixelation issues and become a mainstream device. The first consumer Oculus peaked at a 90 Hz refresh rate (90 frames per second), while the second generation of devices offered 72–80 Hz. The most recent edition, 2020’s Oculus Quest 2, defaults to 72 Hz but supports most titles at 90 Hz and provides “experimental support” for 120 Hz on less computationally intensive games. Many experts believe 120 Hz is the minimum threshold for avoiding the risk of disorientation and nausea. According to a report published by Goldman Sachs, 14% of those who’ve tried an immersive VR headset say they “frequently” experience motion sickness while using the device, 19% respond “sometimes,” and another 25% encounter it rarely, but not never.
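To make these figures concrete, the arithmetic can be sketched in a few lines of Python. The resolutions and refresh rates are those cited above; treating total pixels rendered per second as the yardstick, and reading Luckey’s “more than twice the resolution” as more than twice the pixels per frame, are simplifying assumptions of mine rather than industry definitions.

```python
# Rough pixel-throughput arithmetic for the headsets discussed above.
# All resolution and refresh-rate figures come from the text; "pixels
# rendered per second" as the comparison metric is a simplification.

def pixels_per_second(width: int, height: int, hz: int, eyes: int = 2) -> int:
    """Total pixels a headset must render each second (both eyes)."""
    return width * height * hz * eyes

rift_2016 = pixels_per_second(1080, 1200, 90)       # first consumer Oculus
quest2_base = pixels_per_second(1832, 1920, 72)     # Quest 2 out of the box
quest2_exp = pixels_per_second(1832, 1920, 120)     # "experimental" 120 Hz

print(f"Oculus Rift (2016): {rift_2016 / 1e6:.0f}M pixels/sec")   # ~233M
print(f"Quest 2 @ 72 Hz:    {quest2_base / 1e6:.0f}M pixels/sec") # ~507M
print(f"Quest 2 @ 120 Hz:   {quest2_exp / 1e6:.0f}M pixels/sec")  # ~844M

# Luckey's mainstream threshold, read as more than twice the Quest 2's
# pixels per frame, at the 120 Hz floor many experts cite for comfort:
target = 2 * pixels_per_second(1832, 1920, 120)
print(f"Implied need:       {target / 1e6:.0f}M+ pixels/sec")     # ~1,688M
```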

AR devices have even greater limitations. The average person sees roughly 200°–220° horizontally and 135° vertically, representing a roughly 250° diagonal field of view. The most recent version of Snap’s AR glasses, which costs roughly $500, has a 26.3° diagonal field of view—meaning roughly 10% of what you can see can be “augmented”—and runs at 30 frames per second. Microsoft’s HoloLens 2, which costs $3,500, has twice the field of view and frame rate, but still leaves roughly 80% of a user’s eyesight without augmentation, even though the entirety of their eyes (and much of their head) is covered by the device. HoloLens 2 weighs 566 grams, or 1.25 pounds (the lightest iPhone 13 weighs 174 grams, while the iPhone 13 Pro Max is 240 grams), and supports only two to three hours of active use. Snap’s Spectacles 4 weigh 134 grams and can operate for only 30 minutes.
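The field-of-view percentages above follow from simple division. A minimal sketch, assuming that comparing diagonal angles is an acceptable stand-in for the true solid-angle ratio, and deriving the HoloLens 2 figure as twice Snap’s 26.3°:

```python
# Share of a person's ~250-degree diagonal field of view that current
# AR glasses can "augment," per the figures in the text.

HUMAN_DIAGONAL_FOV_DEG = 250.0

for device, fov_deg in [("Snap Spectacles", 26.3), ("HoloLens 2", 52.6)]:
    share = fov_deg / HUMAN_DIAGONAL_FOV_DEG
    print(f"{device}: {share:.0%} augmented, {1 - share:.0%} unaugmented")

# Snap Spectacles: 11% augmented, 89% unaugmented
# HoloLens 2:      21% augmented, 79% unaugmented
```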

The Hardest Technology Challenge of Our Time

We might assume that tech companies will inevitably find ways to improve displays, reduce weight, and increase battery life, all while adding new functionality. After all, TV resolutions seem to increase every year, while supported refresh rates go up, prices go down, and the profile of the device itself narrows. Yet Mark Zuckerberg has said that “the hardest technology challenge of our time may be fitting a supercomputer into the frame of normal-looking glasses.”1 As we saw when examining compute, gaming devices don’t just “display” previously created frames, as a TV does—they must render those frames themselves. And as with the challenge of latency, there may be real limitations imposed by the laws of the universe when it comes to what’s possible with AR and VR headsets.

Increasing both the number of pixels rendered per frame as well as the number of frames per second requires substantially greater processing power. This processing power also needs to fit inside a device that can be comfortably worn on your head, rather than be stored inside your living room credenza or held in the palm of your hand. And crucially, we need AR and VR processors to do more than just render more pixels.

The Oculus Quest 2 indicates the scale of the obstacle. Like most gaming platforms, Facebook’s VR device has a battle royale game, Population: One. But this battle royale doesn’t support 150 concurrent users, as Call of Duty: Warzone does, nor Fortnite’s 100, nor even Free Fire’s 50. Instead, it’s limited to 18. The Oculus Quest 2 can’t handle much more. Furthermore, the game’s graphics come closer to those of the PlayStation 3, released in 2006, than to 2013’s PlayStation 4, let alone 2020’s PlayStation 5.

We also need AR and VR devices to perform work we don’t typically ask of a console or PC. For example, Facebook’s Oculus Quest devices include a pair of external cameras that can help alert a user who might otherwise bump into a physical object or wall. At the same time, these cameras must be able to track the user’s hands so that they can be re-created inside a given virtual world, or used as controllers, with given motions or finger movements substituting for the press of a physical button. This might seem a poor substitute for a real controller, but it liberates a VR or AR headset owner from needing to travel with (or walk down the street using) one. Zuckerberg has also spoken of the desire to include cameras on the interior of an AR or VR headset to scan and track the user’s face and eyes, so that the device can steer the user’s avatar based on facial and eye movements alone. However, all these additional cameras add more weight and bulk to a headset, while also requiring more computing power, not to mention more battery power. Of course, they add cost, too.

To put this in perspective, we can compare Microsoft’s HoloLens 2 to Snap’s Spectacles 4. Although the former offers twice the field of view and frame rate of the latter, it is also seven times the price ($3,000–$3,500 versus $500), weighs four times as much, and rather than resemble a pair of futuristic Ray-Bans, looks more like the faceplate and skull of a cyborg. For consumer AR devices to take off, we likely need a device more powerful than the HoloLens, yet smaller than the Spectacles 4. While industrial AR headsets can be larger, they’re still constrained by the need to fit under a helmet and the need to minimize neck strain—and must also improve severalfold.

The immense technical challenge of “supercomputer glasses” helps explain why tens of billions of dollars are being spent annually on the problem. But despite this investment, there will be no sudden breakthrough. Instead, there will be a constant process of improvements that reduce the price and size of AR and VR devices while increasing their computing power and functionality. And even when a key barrier is broken by a given hardware platform or component provider, the rest of the market typically follows within two to three years. What will ultimately differentiate a given platform is the experiences it offers.

We can see this process clearly in the history of the iPhone—the most successful product of the mobile era.

Today, Apple designs many of the chips and sensors inside its devices, but its first several models were built entirely from components created by independent suppliers. The first iPhone’s CPU came from Samsung, its GPU from Imagination Technologies, various image sensors from Micron Technology, the touchscreen’s glass from Corning, and so on. Apple’s innovations were less tangible—how these components were brought together, when, and why.

Most obviously, Apple bet on the touchscreen, skipping a physical keyboard altogether. The move was widely ridiculed at the time, especially by market leaders Microsoft and BlackBerry. Apple also chose to focus on consumers, rather than appealing to large enterprises and small-to-medium businesses, which represented the majority of smartphone sales from the mid-1990s through the late 2000s. Even more radical was the iPhone’s price: $500–$600, compared to $250–$350 for competing smartphones, such as the BlackBerry (which were often free to the end user, too, as they were provided by an employer). Apple co-founder and CEO Steve Jobs believed that its $500 device offered superior value—more than a $200 or $300 device ever could—even if the latter was available for free.

Jobs’s bets on touchscreens, target market, and price point all proved correct. They were also aided by interface choices that often seemed contradictory, yet perfectly navigated the tension between complexity and simplicity. A good case study is the iPhone’s “home button.”

Although Jobs showed little interest in physical keyboards, he nevertheless decided to place a large “home button” on the front of the iPhone. The button is now a familiar design element, but it was a novel approach at the time. It also came at a significant cost. The button took up space that might otherwise have been used for a larger screen, longer-lasting battery, or more powerful processor. However, Jobs saw it as an essential part of introducing consumers to both touchscreens and pocket-sized computing. Not unlike closing a flip phone, a user knew that no matter what was happening on their iPhone’s touchscreen, pressing the home button would always take them back to the main screen.

In 2011, four years after it released the first iPhone, Apple added a new feature to its operating system: multitasking. Before this point, users could only operate a few pre-determined applications at the same time. It was possible to listen to music through the iPod app while reading the New York Times app, but if the user then opened their Facebook app, the Times app would close. If the user wanted to return to a given article in the Times app, they’d have to reopen the app, then navigate back to the article and their specific place in it. Doing so would mean exiting the Facebook app, too. Multitasking now allowed users to effectively “pause” an app while switching to another, all of which was managed by the home button. If a user pressed the home button, the app would be paused and they’d return to the home screen. If they double-pressed it, the app would still be paused, and a tray of all paused apps would be displayed and could be swiped through.

The first few iPhones could have supported multitasking. After all, other smartphones with similar CPUs supported the feature. However, Apple believed it needed to ease users into the mobile computing era, and this meant focusing not just on what technology was possible, but on when users were ready for it. To this end, it wasn’t until 2017, with the release of the tenth-anniversary iPhone X, that Apple felt comfortable removing the physical home button and requiring users to “swipe up” from the bottom of the screen instead.

There are no “best practices” within a brand-new device category. In fact, many of the choices we consider obvious today were once controversial—not just the iPhone’s touchscreen. For example, some early Android builds and apps adopted Apple’s “pinch-to-zoom” concept but implemented it backward—if you’re bringing your fingers closer together, shouldn’t whatever you’re looking at come closer, not move farther out? It’s almost impossible to imagine this logic today, but that’s partly because fifteen years of use have trained us to think the opposite is natural. Apple’s “slide-to-unlock” feature was considered so novel that the company was awarded a patent for it, and it ultimately won over $120 million after a US Court of Appeals found that Samsung had infringed this patent, among others owned by Apple. Even the app store model was controversial. Smartphone leader BlackBerry didn’t launch its app store until 2010, two years after Apple and a year after Apple’s famous “There’s an app for that” campaign. What’s more, BlackBerry’s focus on business users (and therefore security) led to policies so strict, such as the need for notarized documents just to gain access to BlackBerry’s application development kit, that many developers never even bothered with the platform.

We can already observe echoes of the “smartphone wars” in the VR and AR race. As we’ve seen, Snap’s AR glasses cost less than $500 and target consumers, while Microsoft’s run $3,000 or more and focus on enterprises and professionals. Google believed that rather than buy a multi-hundred- or multi-thousand-dollar VR headset, consumers should instead place the expensive smartphone they already own into a “viewer” that costs less than $100. Amazon’s augmented reality glasses don’t even have a digital display, emphasizing instead the audio-based Alexa assistant and a trendy form factor. Facebook, unlike Microsoft, seems to be focusing on VR before AR, and Zuckerberg and many of his top lieutenants have mused that cloud game streaming may be the only way for a VR user to participate in a richly rendered, high-concurrency simulation. Zuckerberg has also said that, as the leader of a socially focused company, he believes Facebook’s AR devices are likely to place a greater emphasis on face- and eye-tracking cameras, sensors, and capabilities than those of its competitors, who might otherwise focus on minimizing the size of a device or maximizing its aesthetics. Yet no one knows the exact trade-offs between, say, device profile and functionality, or price and functionality. To capitalize on developer dissatisfaction with Apple’s and Google’s closed app store models (a topic I’ll explore in more detail in the next chapter), Zuckerberg has promised to keep Oculus “open,” enabling developers to distribute their apps directly to users and users to install non-Oculus app stores on their Oculus devices. Though this will surely help attract developers, it produces new risks in user and data privacy—especially as the number of cameras on the device grows.

For AR and VR, it seems clear that the hardware challenges are greater than those smartphones faced. And because interfaces must be adapted from 2D touch to a mostly intangible 3D space, interface design is likely to be harder as well. What will be the “pinch-to-zoom” or “slide-to-unlock” of AR and VR? What, exactly, are users capable of, and when?

Beyond Headsets

Alongside the many investments in immersive headsets are countless other efforts to produce new Metaverse-focused hardware that will complement our primary computing devices, rather than substitute for them, as some imagine AR and VR devices might one day.

Most commonly, gamers imagine wearing smart gloves and even bodysuits that can provide physical (that is, “haptic”) feedback to simulate what’s happening to their avatar in a virtual world. Many such devices exist today, though they’re so costly and functionally limited that they’re typically used exclusively for industrial purposes. Specifically, these wearables use a network of motors and electroactive actuators that inflate tiny air pockets, thereby applying pressure to their owner or limiting their ability to move.

Haptic vibration technology has advanced considerably since Nintendo introduced the Rumble Pak for the Nintendo 64 in 1997. Today’s controller triggers, for example, can be programmed with context-specific resistance—a shotgun, sniper rifle, and crossbow will all feel different to “pull.” The crossbow might even fight back, with the user struggling to hold it down and sensing the vibrations of a virtual bowstring that doesn’t really exist.

Another class of haptic interface devices emits ultrasonic sound (that is, mechanical energy waves beyond the audible range for humans) from a grid of microelectromechanical systems (known as MEMS), producing what a user might describe as a “force field” in the air in front of them. These devices, which look a bit like short, perforated tin boxes, typically produce a field less than six or eight inches tall and wide, but its nuance tends to surprise. Test subjects claim to be able to sense everything from a plush teddy bear to a bowling ball to the shape of a sandcastle as it crumbles, aided in part by the fact that fingertips contain more nerve endings than almost any other part of the body. Crucially, MEMS devices can also detect the user’s interaction, enabling their sound-based teddy bears to respond to the user’s air-based touch, or the castle to crumble when touched.

Gloves and bodysuits can also be used to capture a user’s motion data, rather than just relay feedback, thereby allowing the wearer’s body and gestures to be reproduced in a virtual environment in real time. This information can also be captured using tracking cameras. However, such cameras require unobstructed views and relative proximity to the user, and can struggle when they need to track more than a single user in rich detail. Many users—families, for instance—will want multiple tracking cameras in their “Metaverse rooms” and might add a few smart wearables to their wrists or ankles.

Such bands may seem clumsy (how could a bracelet or anklet substitute for a high-definition camera watching every finger?), but even current technology is impressive. The sensors in an Apple Watch, for example, can distinguish between a user clenching and releasing their fist, and between pinching one finger to their thumb versus two, and can use these movements to interact with the Apple Watch and potentially other devices. In addition, people wearing the watch can use the clenching motion to place a cursor on the Watch face, then use the orientation of their hand to move it. The software involved, Apple’s AssistiveTouch, works using fairly standard sensors, including an electric heart monitor, a gyroscope, and an accelerometer.

Other approaches promise even greater capabilities. Facebook’s highest-priced acquisition since Oculus VR in 2014 was CTRL-labs, a neural-interface start-up that produces armbands that record electrical activity from skeletal muscles (a technique called electromyography). Although CTRL-labs’ devices are worn more than six inches away from the wrist, and even farther from the fingers, the company’s software enables minute gestures to be reproduced inside virtual worlds—from individual fingers being raised to count, point, or gesture “come hither,” to pinches between different sets of fingers. Importantly, CTRL-labs’ electromyographic signals can go beyond reproducing human appendages. A famous CTRL-labs demo involves a user—in this case, an employee—mapping their fingers to a crab-like robot, then walking it forward, back, and side to side by flexing their fist and moving their fingers.

Facebook is also planning its own line of smartwatches. But unlike Apple, Facebook doesn’t see the device as secondary to, or dependent upon, a smartphone. Instead, Facebook’s watch is intended to have its own wireless data plan and two cameras, both of which are detachable and intended to be integrated into third-party items, such as a backpack or a hat. Meanwhile, Google’s fifth-largest acquisition ever was the smart-wearables company Fitbit, which it bought for over $2 billion in early 2021.

Wearables will shrink in size and increase in performance, and as the technology improves they will be integrated into our clothes. These developments will help users enhance their interactions with the Metaverse, and enable them to interact with it in more places. Carrying a controller everywhere you go is not practical, and if the primary goal of AR is to make technology disappear into an everyday pair of glasses, then pulling out a thumbstick or a smartphone to use it truly defeats the purpose.

Some believe that the future of computing isn’t a pair of AR glasses or a watch, or another kind of wearable, but something smaller. In 2014, only a year after its ill-fated Google Glass launch, Google announced its first Google Contact Lens project, which was intended to help diabetics monitor their glucose levels. Specifically, this “device” comprised two soft lenses with a wireless chip, a wireless antenna thinner than a strand of human hair, and a glucose sensor placed in between. A pinhole between the underlying lens and the wearer’s eye enabled tear fluid to reach the sensor, which measured blood sugar levels. The antenna drew power from the wearer’s smartphone, and the system was intended to support at least one reading per second. Google also planned to add a small LED light that could warn users of spikes or drops in blood sugar levels in real time.

Google discontinued its diabetes smart-lens program four years after it launched; the company claimed the cancellation stemmed from “insufficient consistency in our measurements of the correlation between tear glucose and blood glucose concentration,” a problem that had been widely noted by researchers in the medical community. Regardless, patent filings show that the major Western, East Asian, and Southeast Asian technology companies continue to invest in smart-lens technology.

Fanciful though such technologies might seem in a world in which internet connections remain unstable and computing scarce, they nevertheless feel within reach compared to so-called brain-computer interfaces (BCIs), which have been under development since the 1970s and continue to attract ever-greater investment. Many purported BCI solutions are noninvasive—think of Professor X’s helmet in X-Men, or perhaps a grid of wired sensors hidden underneath the wearer’s hair. Other BCIs are partially or fully invasive, depending on how close the electrodes are placed to brain tissue.

In 2016, Elon Musk founded Neuralink, of which he remains CEO, and announced that the company was working on a “sewing machine–like” device that could implant sensors four to six micrometers thick (roughly 0.0002 inches, or one-tenth the width of a human hair) into the human brain. In April 2021, the company released a video in which a monkey played the game Pong using a wireless Neuralink implant. Only three months later, Facebook announced that it was no longer investing in its own BCI program. In prior years, the company had funded several projects inside and outside the company, including a test at the University of California, San Francisco, that involved wearing a helmet which shot light particles through the skull, then measured blood oxygenation levels in groups of brain cells. A blog post on the subject explained that “while measuring oxygenation may never allow us to decode imagined sentences, being able to recognize even a handful of imagined commands, like ‘home,’ ‘select,’ and ‘delete,’ would provide entirely new ways of interacting with today’s VR systems—and tomorrow’s AR glasses.”2 Another Facebook BCI test involved a physical mesh of electrodes placed on top of the user’s skull, which enabled the subject to write at a speed of roughly 15 words per minute purely through thought (the average person types 39 words per minute, two and a half times faster). Facebook reported that “while we still believe in the long-term potential of head-mounted optical [brain-computer interface] technologies, we’ve decided to focus our immediate efforts on a different neural interface approach that has a nearer-term path to market for AR/VR”3 and that “a head-mounted optical silent speech device is still a very long way out. Possibly longer than we would have foreseen.”4 The “different neural interface approach” Facebook referred to is likely CTRL-labs, but part of the problem with BCIs’ “path to market” is ethics, not tech. How many people want a device that can read their thoughts—and not just the thoughts related to the task at hand? Especially if that device is permanent?

The Hardware Around Us

In addition to the devices we hold, wear, and maybe even implant as part of our transition to the Metaverse, there are the devices that will proliferate throughout the world around us.

In 2021, Google unveiled Project Starline, a physical booth designed to make video conversations feel as if you’re in the same room as the other participant. Unlike a traditional monitor or videoconferencing station, Starline’s booths are powered by a dozen depth sensors and cameras (together producing seven video streams from four viewpoints and three depth maps), as well as a fabric-based, multilayered light-field display and four spatial audio speakers. These features allow participants to be captured and then rendered using volumetric data, rather than flattened 2D video. During internal tests, Google found that in comparison to typical video calls, Starline users focused 15% more on those they were speaking to (based on eye-tracking data), displayed significantly more non-verbal communication (roughly 40% more hand gestures, 25% more head nods, and 50% more eyebrow movements), and had 30% better memory recall when asked to remember details from their conversation or meeting.5 The magic, as always, is in the software, but it depends on extensive hardware to be realized.

Storied camera manufacturer Leica now sells $20,000 photogrammetric cameras that capture up to 360,000 “laser scan set points per second,” and that are designed to capture entire malls, buildings, and homes with greater clarity and detail than the average person would ever see if they were physically on-site. Epic Games’ Quixel, meanwhile, uses proprietary cameras to generate environmental “MegaScans” comprising tens of billions of pixel-precise triangles. The satellite imaging company Planet Labs, mentioned in Chapter 7, scans nearly the entire earth daily across eight spectral bands, enabling not just daily high-resolution imagery, but details including heat, biomass, and haze. To produce this imagery, it operates the second-largest fleet of satellites of any company in the world,* with over 150, many of which weigh less than 5 kilograms and are smaller than 10 × 10 × 30 centimeters. Every photo from these satellites covers 20–25 square kilometers and is made up of 47 megapixels, with each pixel representing 3 × 3 meters. Roughly 1.5 GB of data is sent from each of these satellites per second, from an average distance of 1,000 kilometers. Will Marshall, the CEO and co-founder of Planet Labs, believes that the cost-per-performance of these satellites has improved by 1,000 times since 2011.6 Such scanning devices make it easier and cheaper for companies to produce high-quality “mirror worlds” or “digital twins” of physical spaces—and to use scans of the real world to produce higher-quality and less expensive fantasy worlds.

Also important are real-time tracking cameras. Consider Amazon’s cashier-less, cashless, auto-pay grocery stores, Amazon Go. These stores deploy scores of cameras that track every customer by way of facial scanning, as well as movement tracking and gait analysis. A customer can pick up and put down whatever they like, then simply walk out of the store, having paid for only what they took with them. In the future, this sort of tracking system will be used to reproduce these users, in real time, as digital twins. Technologies such as Google’s Starline might simultaneously allow workers to be “present” in the store (potentially from an offshore “Metaverse call center”), jumping across different screens to help the customer.

Hyper-detailed projection cameras will also play a part, enabling virtual objects, worlds, and avatars to be transplanted into the real world in realistic detail. Key to these projections are various sensors that enable the cameras to scan and understand the non-flat, non-perpendicular landscapes they will project against, and to alter their projection accordingly so that it appears undistorted to the viewer.

Technologists have long imagined an internet-of-things future where sensors and wireless chips are as ubiquitous as electrical outlets, albeit more diverse, thereby enabling us to light up any number of experiences wherever we go. Imagine a construction site with drones overhead, each filled with cameras, sensors, and wireless chips, and with workers below them wearing AR headsets or glasses. This setup would allow a site operator to know exactly what is happening, where, and at all times, including the total volume of sand in a given mound, the number of trips needed to move it by machine, who is closest to a problem area and best able to address it, when, and with what impact.

Of course, not all of these experiences require the Metaverse, or even virtual simulation. However, humans find 3D environments and data presentations far more intuitive—consider the difference between reading a job site’s status on a digital tablet and seeing that information overlaid on the site and its objects. Notably, Google’s second-largest acquisition ever (the largest if we exclude Motorola, which Google divested after three years) was Nest Labs, which develops and operates smart sensor devices and was purchased for $3.2 billion in 2014. Eight months after the acquisition, Google spent another $555 million to acquire Dropcam, a smart-camera maker, which was then folded into Nest Labs.

Long Live the Smartphone?

It’s fun to imagine all the brilliant new devices that will soon enough allow us to jack into the Metaverse. But, at least through the 2020s, it’s likely that most devices for the Metaverse era will be those we already use.

Most experts, including Unity Technologies CEO John Riccitiello, estimate that by 2030, there will be fewer than 250 million active VR and AR headsets in use.7 Of course, betting on such long-term prognostications is dangerous. The first iPhone was released in 2007, eight years after the first BlackBerry smartphone and at a point when smartphone penetration was less than 5% in the US. Eight years later, the iPhone had sold more than 800 million units and driven US penetration to nearly 80%. Few in 2007 believed that by 2020, two-thirds of everyone on earth would have a smartphone.

Still, AR and VR devices face not just significant technical, financial, and experiential obstacles, but also the need to squeeze in. Behind the rapid growth of smartphones was a simple pair of facts: the personal computer was one of the most significant inventions in human history, but more than 30 years after it was invented, less than one in six people worldwide owned one. And those lucky few who did? Well, their computers were large and immovable. AR and VR devices will not be a person’s first computing device, nor even their first portable one. They are fighting to be a person’s third, or even fourth—and for a long time yet will probably be one of their least powerful, too.

AR and VR may come to replace most of the devices we use today. That time is unlikely to be soon. Even if the combined number of VR and AR headsets (two very different device types) in use by 2030 tops one billion, four times the aforementioned forecast, they would still reach fewer than one in six smartphone users. And that’s okay. There are, in 2022, hundreds of millions of people spending hours each day inside real-time rendered virtual worlds through smartphones and tablets—and these devices are rapidly improving.

In earlier sections, I reviewed the ongoing improvements in smartphone CPU and GPU power. These are probably the most important Metaverse-related advancements to these devices, but they’re far from the only ones worth highlighting. Since 2017, new iPhone models have included infrared sensors that track and recognize 30,000 points on the user’s face. While this functionality is most commonly used for Face ID, Apple’s face-based authentication system, it also enables app developers to reproduce a user’s face in real time as an avatar or with virtual augmentations. Examples include Apple’s own Animoji, Snap’s AR lenses, and Epic Games’ Unreal-based Live Link Face app. In the years to come, many virtual world operators will use this capability to let players map their facial expressions to their in-world avatars—live, and without requiring another dollar in hardware.

Apple has also led the deployment of lidar† scanners into smartphones and tablets. As a result, even most engineering professionals no longer see the need to buy $20,000–$30,000 lidar-specific cameras, and close to half of American smartphone users can now create and share virtualizations of their homes, offices, yards, and everything inside them. This innovation has transformed companies such as Matterport (discussed in Chapter 7), which now produces thousands of times the number of scans per year, and with far greater diversity.

The iPhone’s high-resolution, three-lens cameras also enable users to create high-fidelity virtual objects and models using photographs, with the asset stored in the Universal Scene Description interchange framework. These objects can then be transplanted into other virtual environments—thereby reducing the cost and increasing the fidelity of synthetic goods—or overlaid into real environments for the purpose of art, design, and other AR experiences.

Oculus VR, meanwhile, uses the high-resolution, multi-angle iPhone camera to produce mixed-reality experiences. For example, an Oculus user playing Beat Saber‡ can place their iPhone several meters behind them so they can see themselves inside the VR environment from within their VR headset, and all from a third-person perspective.

Many new smartphones also feature ultra-wideband (UWB) chips that emit up to 1 billion radar pulses per second, along with receivers that process the return information. Smartphones can thereby create extensive radar maps of the user’s home and office, and know exactly where the user is within these maps (or others, such as Google’s street or building maps) and relative to other users and devices. Unlike GPS, UWB offers precision down to a few centimeters. The front door to your home can automatically unlock as you approach from outside, yet stay locked while you tidy the shoe rack just inside it. Using a live radar map, you’ll be able to navigate much of your home without ever needing to remove your VR headset—your device will alert you to a potential collision, or render the obstacle inside your headset so that you can move around it.
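To illustrate why centimeter-level precision matters for the front-door example, here is a deliberately hypothetical sketch. The coordinate scheme, thresholds, and function names are inventions of mine for illustration, not any real UWB API:

```python
# Hypothetical door-unlock rule enabled by UWB's ~cm-level positioning.
# GPS error (several meters) could not distinguish "approaching from
# outside" from "standing at the shoe rack just inside the door."

from dataclasses import dataclass

@dataclass
class Fix:
    x: float  # meters from the door along its axis; positive = outside
    t: float  # seconds

def should_unlock(prev: Fix, curr: Fix, threshold_m: float = 1.5) -> bool:
    outside = curr.x > 0            # user is beyond the door plane
    approaching = curr.x < prev.x   # distance to the door is shrinking
    return outside and approaching and curr.x < threshold_m

print(should_unlock(Fix(4.0, 0.0), Fix(1.2, 1.0)))    # walking up: True
print(should_unlock(Fix(-0.4, 0.0), Fix(-0.3, 1.0)))  # shoe rack: False
```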

That all of this is possible through standard consumer-grade hardware is astonishing. The growing role of this functionality in our daily lives explains why the average sales price of Apple’s iPhone has increased from roughly $450 in 2007 to over $750 in 2021. Put another way, consumers have not asked Apple to use the cost side of Moore’s Law to offer the capabilities of the first few iPhones at a lower price. Nor have they asked Apple to use the performance side of Moore’s Law to improve the prior year’s iPhone while maintaining its price. Instead, consumers want more—more of, well, nearly everything the iPhone might do.

Some believe that the future role of the smartphone includes operating as a user’s “edge computer” or “edge server,” providing connectivity and compute to the world around us. Versions of this model already exist. For example, most of the Apple Watches sold today lack a cellular network chip and instead connect to their owner’s iPhone via Bluetooth. The approach has limitations: the Apple Watch cannot make a phone call when taken too far from its tethered iPhone, nor play music to a wearer’s AirPods, download new apps, retrieve messages not already stored on the watch, and so on. But in exchange, the device is much cheaper, lighter, and less battery intensive—all because the user’s iPhone, a far more powerful device with better cost-per-performance, is doing most of the work.

Similarly, the iPhone will send complex Siri queries to Apple’s servers for processing, while many users choose to store the bulk of their photos in the cloud instead of buying iPhones with larger hard drives, which can cost between $100 and $500 more. Earlier, I mentioned that many believe VR headsets must at least double the screen resolution offered by top-of-the-line devices in the market today and reach 33%–50% higher frame rates (that is, produce more than two and a half times the number of pixels per second) if they’re to gain mainstream adoption. Not only that, but this must happen alongside cost reductions, a shrinking device profile, and minimized heat. While no single device offers all of this yet, the Oculus Quest 2 can reliably raise its frame rate and rendering capabilities by connecting to a sufficiently powerful PC via Oculus Link. In January 2022, Sony announced its PlayStation VR2 platform, which boasts 2000 × 2040 pixels per eye (roughly 10% more than the Oculus Quest 2), a 90–120 Hz refresh rate (compared to the OQ2’s 72–120 Hz), a 110° field of view (versus 90°), and eye tracking (which the OQ2 lacks). However, the PSVR2 requires users to own, and physically connect to, Sony’s PlayStation 5 console, which costs more than the cheapest Oculus Quest 2 and does not include the PSVR2 headset.
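As for the “two and a half times” figure, it follows directly from multiplying the two requirements, assuming “double the resolution” means double the pixels per frame:

```python
# Double the pixels per frame, times a 33%-50% higher frame rate,
# gives the overall increase in pixels rendered per second.
low, high = 2 * 1.33, 2 * 1.50
print(f"{low:.2f}x to {high:.2f}x more pixels per second")  # 2.66x to 3.00x
```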

Given the shortage, importance, and cost of compute, it makes sense to focus on a single device’s capabilities rather than invest in scores of others, especially when those other devices face greater physical, thermal, and cost limitations. A computer worn on your wrist or face simply cannot contend with one in your pocket. This logic applies to more than just compute. If Facebook wants us to wear a CTRL-labs band on each limb, why load each one with its own cellular and WiFi network chips when cheaper, less energy-intensive, and smaller Bluetooth chips are available and can send that data to a smartphone to manage? Personal data may be the most important consideration. We probably don’t want our data collected by, stored on, or sent to a wide network of devices. Instead, most of us would prefer that this data be sent from these devices to the one we trust most (and keep on our person), and for that device to manage which other devices can access which parts of our online histories, information, and entitlements.

Hardware as the Gateway

The many devices required and expected to support the Metaverse can be grouped into three categories. First are the “primary computing devices,” which for most consumers are smartphones, but may be AR or immersive VR headsets at some point in the future. Second are the “secondary” or “supporting computing devices,” such as a PC or PlayStation, and likely AR and VR headsets. These devices may or may not rely on a primary device, or be complemented by one, but they will be used less frequently than a main device and for more specific purposes. Finally, we have the tertiary devices, such as a smartwatch or tracking camera, which enrich or extend a Metaverse experience but will rarely operate it directly.

Each category and sub-category of devices will increase Metaverse engagement time and total revenues—and offer manufacturers an opportunity to generate a new business line. However, the massive investment in these devices—many of which are years from mainstream viability—has broader motivations.

The Metaverse is a mostly intangible experience: a persistent network of virtual worlds, data, and supporting systems. However, physical devices are the gateway to accessing and creating these experiences. Without them there is no forest to be known, heard, smelled, touched, or seen. This fact provides device manufacturers and operators with significant soft and hard power. They will determine which GPUs and CPUs are used, the wireless chipsets and standards deployed, the sensors included, and so on. Critical though these intermediary technologies are to a given experience, they rarely interface directly with a developer or end user. Instead, they’re accessed through an operating system, which manages how, when, and why their capabilities are used by a developer, which sorts of experiences developers are allowed to offer a user, and whether and to what extent a commission must be paid to the maker of said device.

In other words, hardware is not just about what the Metaverse might offer and when, but a fight to influence how it works and, ideally, to take a cut of as much of its economic activity as possible. The more important the device—and the more devices that connect to it—the greater the control afforded to the company which makes it. To understand what this means in practice, we need to dive deep into payments.

* As points of comparison, China has fewer than 500 satellites, while Russia has fewer than 200. However, they are typically far larger and more capable than those of Planet Labs.

† Lidar determines the distance and shape of objects by measuring the time it takes for a reflected laser (i.e., light beam) to return to a receiver, similar to how radar scanners use radio waves.

‡ Beat Saber is like Guitar Hero, though notes are hit not by pressing a button on a physical controller, but by hitting a virtual button with a virtual lightsaber.
