The rise, and fall, and rise of conversational AI
The long-awaited golden age of the chatbot is here.
Am I the only one old enough to remember palling it up with SmarterChild on AIM?
For the uninitiated, SmarterChild was an artifact from the Microsoft Clippy era—a friendly, if dull, AIM chatbot who peaked circa 2004, with whom I had many a middle school conversation. The discourse was one-sided, sure. He could give you some basic information on the weather, but didn’t give much back on average. For certain queries, though, SmarterChild would retort with a whimsical response that made it seem like you were talking to an intelligent Internet friend.
My old friend SmarterChild has come to mind lately as I’ve played with its much more sophisticated progeny, ChatGPT.
If you’ve breathed on the Internet in the past month, you’ve seen the myriad screenshots, experiments, and thinkpieces, so I don’t need to tell you—ChatGPT pretty much broke the Internet in 2022. The tech is incredible. But part of its popularity is also almost certainly its conversational form factor, which has captured the public imagination. You can ask ChatGPT virtually anything, get answers immediately, and have back-and-forth discourse. This is an intelligent Internet friend. (Plausible-sounding misinformation aside. We’ll come back to that.) With over a million users in five days, ChatGPT may be what dialogue systems have been striving towards for decades, with ups and downs along the way.
Welcome to a new era of conversational AI. We’re about to see an explosion of this technology in enterprise and consumer contexts. SmarterChild, you’re up: now may finally be the chatbot’s moment.
Consider the chatbot: A brief history
The core of human experience is interaction—so how can a machine possibly be intelligent if you can’t talk to it? Conversational experiences have long been central to the vision of machine intelligence.
The Botfathers
Alan Turing’s famous Turing test, proposed in 1950, suggested using a computer’s fluency in answering natural language questions as a proxy for intelligence. This may well have been the original germ of the idea of the conversational agent.
There were many experiments in the intervening decades. Most famous was ELIZA, a 1964 project from MIT professor Joseph Weizenbaum, which used keywords in user inputs to determine responses and often responded with vague questions of its own. ELIZA paved the way for many other conversational experiments, but couldn’t derive context in conversation and had limited general knowledge.
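To make the keyword idea concrete, here is a minimal sketch in Python of an ELIZA-style responder. The rules, the `respond` function, and the fallback line are all hypothetical and written for illustration; they are not Weizenbaum's original script.

```python
import re

# Hypothetical keyword rules for illustration only (not ELIZA's real script).
# Each rule pairs a keyword pattern with a canned reply template.
RULES = [
    (re.compile(r"\bmother\b|\bfather\b", re.IGNORECASE), "Tell me more about your family."),
    (re.compile(r"\bi feel (.+)", re.IGNORECASE), "Why do you feel {0}?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
]

# Vague question used when no keyword matches.
FALLBACK = "Can you say more about that?"


def respond(user_input: str) -> str:
    """Return the reply for the first rule whose keyword pattern matches."""
    text = user_input.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            # Echo any captured words back inside the reply template.
            return template.format(*match.groups())
    return FALLBACK
```

Each call to `respond` sees only the current message, so nothing carries over between turns. That is the same limitation described above: no conversational context and no general knowledge beyond the rules.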
The predecessor of the term “chatbot” was first used in the 1990s with Chatterbot, a player in the multi-player virtual world TINYMUD that conversed with other players. 1995 brought ALICE, the first online chatbot (supposedly Spike Jonze’s inspiration for the film Her!). And in 2001, our buddy SmarterChild came on the scene. Available on AOL’s AIM and Microsoft’s MSN, SmarterChild took a step forward by connecting users to external information sources like weather, stock prices, and news. Plus, people liked it! At its high point, SmarterChild reportedly had 30M users in its AIM Buddy List.
The early deep learning era
If you’re not old enough to have trolled SmarterChild, you are likely old enough to remember a later generation of chatbots, born in the deep learning boom of the mid-to-late 2010s.
For instance, let’s pour one out for Tay, ill-fatedly framed as “the AI with zero chill,” whose brief, hateful time on earth left Microsoft with long-lasting reputational scars. Chatbots in this era were at worst, well, Tay, and at best voice-powered conversational assistants like the Google Assistant, Apple’s Siri, Microsoft Cortana, or Amazon’s Alexa. These digital assistants felt like evolved SmarterChild(ren), with mildly amusing “personality quirks” and enhanced, personalized integration with external applications. The technology was good, but not mind-blowing. Experiences could be brittle: outside of a range of pretty narrowly defined queries, assistants were stumped. They also largely couldn’t take action for you outside of their proprietary software ecosystems.
Chatbots were much hyped during this time period, and they didn’t quite live up to the hype. In 2018, Gartner predicted that 15% of all customer service interactions would be handled entirely by AI by 2021. But by 2022, despite broader B2C adoption, customers didn’t really like interacting with chatbots. In a 2022 Verint study, over 30% of respondents shared that chatbots rarely answered their questions, and the same proportion abandoned their efforts after engaging with a chatbot.
Liftoff: Transformers enter the chat
Conversational AI got a lot better with the advent of the Transformer in 2017. Transformers revolutionized the field of language by processing language in context, rather than sequentially as in previous approaches like RNNs. Their approach of “parallelization” was also more computationally efficient, making it easier to scale up model parameters and training tokens. Not long after, Transformer-based models like Google’s BERT and OpenAI’s GPT model series were released with blistering performance on many NLP tasks. Unlike BERT, the GPT model series was focused on generative language tasks.
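For the curious, here is a rough sketch in Python of the mechanism behind processing language "in context": scaled dot-product self-attention. It is heavily simplified and rests on stated assumptions. Real Transformers apply learned query, key, and value projections, use multiple attention heads, and add positional information, all of which are omitted here. The function names are mine, for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Simplified self-attention: queries, keys, and values are the inputs themselves.

    Assumption: identity projections stand in for the learned weight
    matrices a real Transformer would use.
    """
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        # Score this token against every token in the sequence at once.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # The output is a weighted mix of all token vectors.
        mixed = [
            sum(w * value[j] for w, value in zip(weights, vectors))
            for j in range(dim)
        ]
        outputs.append(mixed)
    return outputs
```

Each pass through the outer loop depends only on the inputs, never on another token's output, so all positions could be computed at the same time. An RNN, by contrast, has to finish one step before starting the next.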
Using the Transformer architecture, Google led several groundbreaking experiments in conversational AI, developing models that excelled at conversing about a wide range of topics. In 2020, the company published research on Meena, an open-domain chatbot trained on public dialogue data that began to get closer to approximating human-level conversation. Google built on this research with LaMDA in 2021, which improved on Meena’s results with enhanced pre-training and fine-tuning. LaMDA was so good at simulating natural language conversation that one engineer claimed it was sentient.
And then, of course, came ChatGPT. ChatGPT was fine-tuned using human feedback on a model from the Transformer-based GPT-3.5 series. OpenAI refers to ChatGPT as a “sibling model” to InstructGPT and uses similar human-in-the-loop fine-tuning methodologies. Because it was also trained with human dialogue datasets, as well as with InstructGPT’s dataset transferred into a dialogue format, the result is much more conversational. The model exhibits a degree of memory, enabling users to have a back-and-forth conversation.
The results are pretty incredible. Within days, a million users had tried ChatGPT, and pieces with headlines like “ChatGPT Will Kill High-School English” started to emerge. A big piece of the buzz was also its easy-to-access interface, and, of course, that it was free (the compute costs are supposedly “eye-watering”).
Where will conversational AI show up next?
With ChatGPT, the conversational interface has come roaring back into the public eye, and not just through OpenAI. New applications are emerging in search, creative tools, customer service, and even social media. Expect to see more interfaces that offer the ability to engage via natural language, rather than solely clicks, swiping, and scrolling.
Search
The one that everyone’s talking about.
Many have warned that conversational interfaces powered by large language models (LLMs) represent an existential threat to Google Search, as more users turn to ChatGPT for question answering. Google is reportedly in “code red” in advance of Google I/O in May in response to the ChatGPT threat.
Based on my own experience, code red is no joke at Google, and I/O is a forcing function for all kinds of product announcements across the company. Expect some updates come May. My guess is that Sundar’s I/O keynote will unveil new research or product experiences related to conversational AI. It’s also worth noting that Google/Alphabet has quite a bit of research muscle here, having developed, of course, the Transformer, but also dialogue models like LaMDA in addition to PaLM, Flan-T5, Gopher, BERT, and other LLMs. BERT is reportedly already integrated into today’s Search.
Other incumbents are also taking note. Quora is beta-testing a product called Poe, a conversational question-answering platform. It’s not clear what underlying models they’re using, but nonetheless, Poe is an interesting acknowledgement of both performance advances and burgeoning consumer interest in conversational interfaces.
On the startup side, there are plenty of new players working on reinventing consumer search with LLMs and conversational interfaces, including You.com, Perplexity, Metaphor, and Character.ai. Enterprise search is also full of activity, with startups working on both verticalized natural language solutions (Hebbia) and more horizontal platforms (Glean).
Brand interaction
The one whose best days have always been just around the corner.
As described previously, chatbots have long been heralded as the cheaper, scalable solution to enterprise customer service, providing value by routing or even resolving customer queries. Enterprises might also proactively reach out to customers with chatbots, e.g. for marketing campaigns or e-commerce order updates.
Customer service is high-stakes. According to Bain, enterprises that perform well in customer experience can grow revenues 4-8% above their market. The costs can also be high. Beyond the costliness of customer support to the enterprise, a Zendesk report found that 80% of customers would rather go to a competitor after a bad customer experience.
There are several well-funded startups creating chatbots for brand interaction, including Ada and Forethought. Ada recently announced a partnership with OpenAI to both automate customer interactions and provide customer insights to live agents (for instance, by summarizing a customer conversation before a handoff occurs).
Consumers still don’t much like engaging with chatbots, but the opportunity to bring ChatGPT-style experiences to brand engagement is meaningful.
Creativity
The latest pile-on in the generative AI world.
One popular use case for ChatGPT has been creative generation. Twitter is full of people using the tool to generate stories or write poems or lyrics.
In a previous post, I discussed the brave new world of generative AI and the swarm of startups tackling AI-powered media generation across text, imagery, video, and 3D assets. The furthest along of these new entrants are copywriting startups like Jasper, Copy.ai, and Anyword, which are focused on generating sales and marketing copy. There are signs this crop may be soon to adopt conversational interfaces. Jasper recently announced JasperChat, a ChatGPT-like experience where users can have an iterative, back-and-forth dialogue with Jasper. Expect more generative writing tools doing the same. The big question will be differentiation from ChatGPT itself.
Agents & social experiences
The one that may be most intriguing.
With conversational natural language experiences at their core, how will single-player consumer experiences evolve? What about multiplayer social experiences?
Natural language interfaces may be the way of engaging with AI agents that take actions for users. Adept’s ACT-1 provides a preview of what this might look like. Users can share natural language instructions with ACT-1—either in text or voice—and models can take action across software tools, including developing spreadsheet formulas, logging information in CRMs, and more. Interfaces like this supercharge the proposition of assistance made by the Google Assistant, Siri, and others.
Social experiences also represent a fascinating new frontier. Remember SmarterChild’s 30M internet friends? If history is any indication, people are entertained by engaging with conversational AI, even in a single-player setting. Chatbots with personality fare even better here: the SmarterChild creator credits his bot’s personality as key to its popularity. Are we headed for a future filled with colorful, personalized internet friends?
One indication comes from Circle Labs, a startup building a platform for anyone to develop their own chatbots and embed them into existing social platforms. These representations, which they call “Shapes,” are created through natural language prompts from users, and could be anything: a favorite video game character, a food item, or just an imaginary friend. Circle Labs has started with Discord, but Shapes’ personalities persist across social platforms and in the future could be deployed across Twitch, Twitter, and more. It’s an intriguing look into how conversational interfaces could change digital communication, or even relationships, in the future.
What to watch
Welcome to the golden age of the bot! It’s going to be a fun and wild ride. A few of the evolving trends and challenges to watch as this space unfolds:
Multimodality: More conversational models will incorporate both text and multimedia imagery in conversational interfaces, to, say, ask questions about or manipulate an image or video. DeepMind’s Flamingo model’s multimodal dialogue capabilities are an indication of what this might look like; Twelve Labs also provides an example in an applied setting.
Trustworthiness: ChatGPT’s Achilles heel is that it often relays convincing, but false, information. If anyone’s going to unseat Google, the results need to be auditable and trustworthy.
Efficacy: Prior generations of chatbots have been dismissed by consumers because they worked poorly. ChatGPT is amazingly performant, but what happens when conversational experiences are embedded into existing workflows or product experiences? Trustworthiness for conversational AI will also depend on efficacy.
Content moderation: Remember Tay? This has long been a challenge for chatbots in the deep learning era. ChatGPT uses a moderation API to filter out harmful, toxic, or violent content, but some have claimed there are still embedded stereotypes in the technology. On the other hand, some have railed against content restrictions, suggesting that they are slowing down the pace of innovation. Expect lots of heated debates here.
Anthropomorphism: From ELIZA to Samantha from Her, we’ve long ascribed human characteristics to the chatbots we create. There are already discussions of ChatGPT’s personality versus Google’s LaMDA. This is a human impulse, and I’m sure it’ll continue.
Interestingly, real-life or fictionalized digital assistants have often been explicitly female-identifying or have been ascribed traditionally female characteristics by default, like gendered names or female-sounding voices. For instance, Alexa, Siri, Cortana, and the Google Assistant all launched with female-sounding voices, and all but Google used gendered names. And then, of course, there’s ELIZA, Samantha, etc.
There’s a whole lot here about how we conceptualize and socialize the role of women in our society. It will be interesting to see how the next generation of conversational AI will manage the concept of gender, especially given Gen Z’s changing conceptions. Circle Labs’ “Shapes” designation provides one clue.