But Echo spoke for itself, and not just literally. People quickly saw that the pint-size cylinder was a genuinely new kind of consumer-electronics device. They got excited about the concept and, soon thereafter, about where Amazon was taking it.
Beyond that, Amazon has established itself as a peer of Apple, Google, and Microsoft when it comes to building voice-controlled interfaces and infusing them with artificial intelligence. The number of “skills” that Alexa possesses—tasks that it can perform, such as setting a thermostat or summoning an Uber—has grown from 135 in January to 1,000 in June to 4,000 today, thanks to the tools that the company has given developers to integrate their services and devices into Alexa. By the time proprietors of other voice services have caught up with that figure, Alexa may have sprinted even further ahead.
To hear about what Amazon has learned during the first two years of this journey, I visited its campus and spoke with two of the people who have shaped Echo and Alexa from the days before they’d been announced. Toni Reid, director of product, Echo devices and Alexa, is an 18-year Amazon veteran who’s responsible for the overall consumer experience of Echo and related products such as the Dot and Tap. And Rohit Prasad, VP and head scientist for Alexa machine learning, oversees the artificial-intelligence underpinnings that allow Alexa to understand and fulfill user requests.
Amazon started work on the Alexa and Echo project in 2011, long before it ever told the world about it. And its initial goal couldn’t have been much loftier.
“The inspiration for Alexa,” declares Prasad, “was the Star Trek computer.” In other words, Amazon aspired to create a computing interface that wasn’t all that different from communicating with a really smart, capable human being. That’s not exactly a unique goal: Google has often turned to the same cultural touchstone. But it clearly provided Amazon with enough challenges to keep itself busy for many years and multiple product cycles.
At the same time, the company wanted to focus its effort on something discrete and achievable. “You’ll hear us talk about our working-backwards process, where we write a press release as if it’s going out to customers at the very beginning of a process, and then we start working backwards from that,” explains Reid.
As part of this working-backwards analysis, Amazon decided that Echo would be a voice-controlled speaker with no display at all, a design gambit that meant that Alexa would not only emphasize speech (like Apple’s Siri or Microsoft’s Cortana) but be 100% reliant on it. “Not having a screen meant there was no escape route,” says Prasad. “We had to solve that problem.”
In multiple ways, the Alexa and Echo projects required Amazon to ramp up its expertise in machine learning and other artificial intelligence technologies beyond all the things it was already using them for, such as recommending products and forecasting prices. “A voice-based computer that you could just talk to was definitely the natural progression of that in terms of complete artificial intelligence working at the scale you’d want,” says Prasad.
The company turned to machine learning from the start as it figured out how to teach a speaker that might be located across the room to recognize the “wake word” (“Alexa,” by default) among all the other conversation that might be going on in the room, a different and more daunting challenge than the one presented by earlier, smartphone-centric voice assistants such as Siri.
Here’s what an Echo would hear without any noise cancellation, and what it hears after the software Amazon created has scrubbed out almost all sound except for the wake word “Alexa”:
From a feature standpoint, some of Amazon’s decisions about what to build were pretty straightforward. “We started to look at use cases, what customers would be doing in their home,” Reid says. “Music floated to the top.”
At first, the music you got was mostly from Amazon’s own catalog, which was limiting in a world where many people get their music elsewhere. “That wasn’t something we were oblivious to,” Reid says. “It was just the timing. But our customers very quickly [said] ‘We want Pandora, we want Spotify, we want other providers.’ And so we did the work to implement them.”
Another obvious missing service in the earliest days: Amazon’s own audiobook arm, Audible. “You’d think Amazon would have it, but we didn’t, and that was a big ask for customers,” Reid remembers. “They had this device in their house and they said, ‘I want to listen to my audiobooks.’”
Amazon had a good handle on the types of content people would want to consume via Alexa. But it also listened carefully to the people who bought Echoes—which was a major reason why it sold the product only in limited quantities through an invitation system at first. “We thought, let’s go to a subset of customers who we think will give us feedback and who want to shape the product,” Reid explains. “It turned out that it worked. We had a group of great customers who gave us early feedback and high usage.”
It turned out that those consumers were interested in using Alexa to control other devices in their households. Amazon hadn’t ignored such a scenario entirely, but it hadn’t been one of its initial priorities, as evidenced by this introductory video, which doesn’t even mention home automation:
Two years later, Alexa has become a primary interface for other household gizmos, an area where other giants such as Apple are still finding their footing. Amazon has built the APIs that allow sellers of other consumer-electronics devices to hook their creations into Alexa, and sells many such products on the Alexa Smart Home section of its site, including light switches, light bulbs, thermostats, fans, security cameras, and other devices.
“The smart home has been a big surprise for us from a usage perspective,” says Reid. “You’re moving out of this very early-adopter perspective, with high-tech hackers setting it up. It’s becoming more mainstream and I think Alexa is actually allowing that to happen and simplifying it for customers.”
As third-party companies use Amazon’s tools to make their devices and services work with Alexa, they’re able to build on top of Amazon’s investment in machine learning and other technologies. But they’ve had to confront the question of how to make their offerings usable through voice alone, an interface-design exercise that they would have to go through sooner or later for other voice assistants as well.
“When you talk to developers, you will hear them say, ‘It was a needed challenge for me to think about how I could transition my app that used a GUI [graphical user interface] to a voice experience,’” Prasad says. “I don’t think we intended to challenge developers. But I think it was welcome, given that it was a big shift.”
From a monetization standpoint, Amazon’s long-term goal isn’t just to sell scads of Echoes, Dots, and Taps. As with other Amazon hardware, Alexa-powered gadgets are designed to make buying stuff from the company so easy that the more you use them, the better the company will do. “We build the devices in a way that will start that flywheel effect, from a business perspective,” Reid says, pointing to services such as Amazon Music and Audible that people are more likely to pay for if they use them on an Echo.
Then there’s another way Amazon can make money from Alexa: by turning it into a store. The easiest type of shopping to do through voice alone is simply to order more of something you’ve bought before from Amazon, so that’s where the Alexa team started. “It’s a seamless experience,” says Prasad. “You know very quickly when it’s going to show up.”
It’s when someone wants to buy something new that things get tricky—especially since shoppers usually don’t begin their quest by naming a specific product, and probably don’t want to spend 20 minutes talking to Alexa to narrow down their options.
“I ordered a Roomba through voice,” Prasad says. “But you can’t order a blue shirt. A lot of it is discovery-based: You look for ‘blue shirt’ rather than ‘Calvin Klein blue shirt.’ You’re either brand conscious or price conscious and we need to balance all of these [factors].”
Even for traditional Amazon shopping on a device with a screen, Reid says, “we’re optimizing for the first few search results. You’ve got the concept of ‘above the fold’ on a laptop. In voice, you really just have one—maybe you can argue two—shots. You don’t want Alexa to read off three or four items.”
To do that winnowing for commoditized types of products such as AA batteries, Amazon now uses algorithms to boil down many similar offerings into one pick that it calls “Amazon’s Choice.” But there are many categories where it’s tough for the company to make a selection on behalf of a customer, especially for items that involve an aesthetic aspect, such as clothing.
“I’m not that picky,” says Prasad. “I could say ‘medium blue shirt,’ and I’ll be happy with that. Will everyone in the world be happy with that experience? I don’t think so. We’re going to experiment a lot. I’m not saying we won’t do it. We just have to do it right.”
Amazon is continuing to chip away at the challenge of making voice shopping make sense. In the long run, the company sees voice shopping as something different from online shopping in its conventional form, and perhaps more personal. “In India, it used to be that you’d go to the same shop every time and see the same salesman who’d route you to what you needed,” Prasad says. “It’s that kind of relationship you need to build with a voice-based shopping assistant. You want that kind of trust. And it’s not just a technical problem.”
In the Star Trek universe, the computer had a human quality in part because it was voiced by a human: actress Majel Barrett, who eventually married Gene Roddenberry, the series’ creator. And well before it announced Alexa and Echo, Amazon was at work on making Alexa feel like a personality rather than an automaton. Here, for instance, is a comparison of what Alexa sounded like a year before it debuted—intelligible, but rather lifeless—and the considerably more peppy version that Amazon shipped:
If anything, the consumers who use Alexa have treated it even more like a person than Amazon expected, a fact that it can monitor by keeping track of what they ask for in aggregate. For instance, 13,000 more users have asked Alexa who it’s voting for than have asked who they should vote for, which the company sees as a sign that the service’s personality matters. (For the record, Alexa responds to “Who should I vote for?” with “You should vote for the candidate who best reflects your views and has the best policies” and to “Who are you voting for?” with “There are no voting booths in the cloud—believe me, I’ve looked…It’s all just ones and zeroes up here.”)
“Big data can get you great information, but then customers are also telling us what’s important to them,” Reid says, noting that a U.K. customer was nonplussed that Alexa can’t sing “Happy Birthday” there. (In the U.S., it can.)
Users might care about playful answers and songs, but the next frontier for services such as Alexa is interacting in ways that feel less like automated responses to commands and more like genuine human interaction. That’s a basic research problem rather than something any one company is likely to fully achieve on its own. So in September, Amazon announced the Alexa Prize, an annual competition, with up to $2.5 million in prizes, in which university students will try to create bots that can intelligently chat about topical matters for 20 minutes.
The scenarios that Amazon aims to help create with this contest, says Prasad, are “pushing the boundaries, not just in terms of personality but how you interact, in a setting where you can have more long-term discussions, rather than this transactional nature of interactions, where you say ‘play music,’ but you’re not actually engaging in a full conversation. That’s where we’re going next.”
That sort of ambition will be necessary if Alexa and the devices that run it are to remain trendsetters. Google’s Home speaker, powered by its Google Assistant service, is unmistakably an attempt to out-Echo the Echo, and is getting respectable reviews. Rumor also has it that Apple is working on an Echo-esque device, a move that would make Siri and Alexa into direct rivals. Microsoft, Samsung, and other companies with the wherewithal to compete in the voice-assistant game are presumably also hatching plans to take on Alexa and the Echo.
The other companies whose products fall into the same ballpark as Alexa and the Echo “are building some interesting things,” Reid says. “We want to understand where our gaps are relative to common experiences. What are they doing, and doing better than we are?”
Still, like virtually any other big-company executive you might ask about the competitive landscape, Reid and Prasad emphasize that they don’t obsess over what others are doing. “We have our own roadmap,” says Prasad. “We have a bunch of features we want to get done for our customers. That’s what we worry about mostly, daily.”
When Alexa and the Echo were still just ambitious ideas rather than shipping projects, he adds, “my own team felt like we had a nine out of 10 chance of making this work. I felt that if we were able to solve the technical problems, we’d have a new category that we’d created.”
Prasad was right—and the fact that the category Amazon created has become one of the hottest ones in tech is one reason why the next two years for these products promise to be even more eventful than their first two.