3 Things ChatGPT Needs Before it Can Be Deployed in Customer Service

By now the buzz has likely caught up with you: “Generative AI” has taken the world by storm.

Generative AI, a relatively new subfield of AI, consists of models that generate impressive images, text, and even videos, usually in response to a text prompt.

In particular, ChatGPT is a remarkable new model that was released earlier this month by OpenAI and has generated (no pun intended) massive attention and excitement.

I’ve received lots of questions over the last few weeks about what GPT means for AI, Forethought, and the future of customer experience.

Most importantly, can ChatGPT just automate your entire customer service for you?

TLDR: While a model like ChatGPT may one day power the interface for many customer support responses, it must be supplemented with other key models and AI techniques before it can be deployed in the wild and start automatically answering customers.

I believe the future of customer support automation will be a comprehensive AI platform with a conversational interface powered by a large language model similar to ChatGPT, but trained in a different way—on company-specific data, with significant guardrails, and with many other AI models and techniques applied around it.

To understand why, it’s useful to explain at a high level how ChatGPT was trained and what it actually does (feel free to skip ahead if you’re already familiar with this information).

How ChatGPT was Trained

ChatGPT is the model that has garnered the most attention from the general public to date. However, to the AI community, ChatGPT is only one large language model (LLM) in a series of large language models released by OpenAI and others. In fact, ChatGPT is a very close sibling of another OpenAI model called InstructGPT, which has been available for over a year. The main difference with ChatGPT? It’s been repackaged to generate text in response to prompts communicated in the context of a conversation.

In short, ChatGPT exploded in popularity where earlier impressive models from OpenAI did not, primarily because of its intuitive interface.

ChatGPT and InstructGPT are built on top of a model OpenAI has dubbed GPT3.5, which is a large language model consisting of a stack of transformer decoders that have been trained to generate text one word at a time, or more precisely, one part of a word (called a token) at a time. These models are very large: they consist of billions and sometimes hundreds of billions of parameters. Models in the GPT3.5 series have been trained on a large corpus of text and code, from which the models derive their initial understanding of the world (this is the reason why you may have seen many examples of ChatGPT generating code).
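To make "one token at a time" concrete, here is a minimal sketch of autoregressive generation. The lookup table `NEXT_TOKEN_PROBS` and the function `generate` are hypothetical toys invented for illustration: a real model like GPT3.5 computes the next-token distribution with a transformer conditioned on the whole preceding sequence, not with a hand-written table keyed on the last token.

```python
import random

# Hypothetical toy "model": for each token, the candidate next tokens and their
# probabilities. A real LLM computes this distribution with a transformer that
# looks at the entire sequence so far, not only the previous token.
NEXT_TOKEN_PROBS = {
    "<start>": {"your": 1.0},
    "your": {"refund": 0.7, "order": 0.3},
    "refund": {"is": 1.0},
    "order": {"is": 1.0},
    "is": {"on": 0.6, "processing": 0.4},
    "on": {"its": 1.0},
    "its": {"way": 1.0},
    "way": {"<end>": 1.0},
    "processing": {"<end>": 1.0},
}


def generate(max_tokens=10, seed=0):
    """Sample a sequence one token at a time until <end> or max_tokens."""
    rng = random.Random(seed)
    tokens = ["<start>"]
    for _ in range(max_tokens):
        dist = NEXT_TOKEN_PROBS[tokens[-1]]
        # Sample the next token in proportion to its probability.
        next_token = rng.choices(list(dist), weights=list(dist.values()))[0]
        if next_token == "<end>":
            break
        tokens.append(next_token)
    return tokens[1:]  # drop the <start> marker


print(" ".join(generate()))
```

Each call appends a single sampled token and feeds the result back in as context, which is the loop that plays out (at a vastly larger scale) every time a model of this kind writes a reply.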

To produce a model like ChatGPT, a GPT3.5 model is further trained on new data using a specific process called “Reinforcement Learning from Human Feedback”.

This process (greatly simplified) consists of 3 steps:

1. Human labelers write conversations between a person and a hypothetical ideal chatbot. A GPT3.5 model is trained on these conversations to generate good chatbot responses.

2. The resulting model (and potentially other models) is prompted, and multiple responses are sampled per prompt. The labelers then rank the responses to each prompt by appropriateness. These rankings are used to train (in a supervised fashion) a second model that scores the appropriateness of a response to a prompt.

3. The two models from steps 1 and 2 are used in a reinforcement learning setting: the model from step 1 generates responses, the model from step 2 scores (rewards) them, and response generation keeps improving as the process is repeated.
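As a rough illustration of step 2, the sketch below shows one common way to turn labeler rankings into a training signal for a scoring model: split each ranking into (preferred, rejected) pairs and penalize the model when it scores the rejected response higher. The exact loss OpenAI used is not described in this post, so treat the pairwise form and the helper names `ranking_to_pairs` and `pairwise_ranking_loss` as assumptions made for illustration.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked_responses):
    """Turn a best-to-worst ranking into (preferred, rejected) pairs."""
    # combinations() preserves input order, so the first item of each pair
    # is always the one the labelers ranked higher.
    return list(combinations(ranked_responses, 2))


def pairwise_ranking_loss(score_preferred, score_rejected):
    """-log(sigmoid(score_preferred - score_rejected)).

    Small when the scoring model agrees with the labelers, large when it
    scores the rejected response higher.
    """
    diff = score_preferred - score_rejected
    return math.log1p(math.exp(-diff))


# Hypothetical ranking for one prompt, best first.
pairs = ranking_to_pairs(["response_a", "response_b", "response_c"])
print(pairs)
print(pairwise_ranking_loss(2.0, 0.0) < pairwise_ranking_loss(0.0, 2.0))
```

In step 3, the scores produced by a model trained this way serve as the reward that the response generator is optimized against.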

Illustration taken from OpenAI’s website

With this information in mind, let’s dive into what it would take to successfully repurpose ChatGPT for customer support automation.

What ChatGPT Would Need to Be Ready for Customer Support Automation

ChatGPT is an incredible tool. But on its own, it’s not ready for widespread implementation in customer support automation. Below are 3 requirements needed to take it to the next level.

1. Factually Accurate Responses

First and foremost, the system must produce factually accurate responses. This is crucial. Unfortunately, while ChatGPT almost always generates a coherent and plausible response, one of its major limitations is that the information may be incorrect or outdated.

Sam Altman, the CEO of OpenAI, has himself cautioned against the use of ChatGPT for scenarios where “robustness and truthfulness” are important, and customer support certainly falls under that category.

Websites like Stack Overflow have banned answers provided by ChatGPT, citing that “the average rate of getting correct answers from ChatGPT is too low.”

Let’s dive into why.

One of the fundamental ways that a model like ChatGPT can break is by not being aware of new context or new information.

At the time of the conversation above (December 23rd, 2022), Qatar had already finished hosting the FIFA World Cup 2022, which Argentina won. ChatGPT gave a wrong (outdated) answer because the GPT3.5 model it is based on was trained on older data, with a knowledge cutoff in 2021.

It’s clear that a model like ChatGPT must be regularly retrained and retaught what the correct new information is. In order to do that effectively, new techniques or a different approach altogether are required.

The other fundamental issue related to accuracy is that ChatGPT generates responses based on the conversations and text corpora it’s been trained on, but these will not contain answers to the actual domain and company-specific questions that your customers are sending your way.

Thoughts from Forethought’s Primary AI Advisor, Professor Chris Manning

In a recently recorded conversation I had with Professor Chris Manning (one of the leading NLP experts in the world and our primary AI advisor at Forethought), he shares his opinion on whether LLMs will ever be powerful enough to overcome a lack of domain-specific knowledge for certain applications (like customer support).

“We’ve seen a succession of enormous large language models […] the idea of these foundation models is they give a big base of world understanding that could then be useful for all kinds of different problems. And so to some extent, as the breadth and functionality of those base or foundation models grows, it seems like there’s less need for the sort of domain particular data. And to some extent, I think we will see some of that in the future.

But on the other hand, I think there’s huge limits as to how far that will go because these models are essentially being built from the kind of material you can grab off the internet […]. There’s a lot that you can see there, but it doesn’t actually give you a depth into how most industries and companies work. […] I spend plenty of time hanging around on the internet as a lot of people do these days, but it’s not that that’s really taught me how customer service agents at an insurance company deal with questions, right? There’s just so much information and knowledge that’s particular to these very many different industries and companies, and I think that just isn’t going to go away.”

Here’s an example of ChatGPT responding to a request for a car insurance policy extension.

At best, ChatGPT may be smart enough (as it is in the example above) to recognize that it can’t guarantee the accuracy of a given answer, which is itself a very difficult problem in text generation. In this case, you at least won’t be misleading your customer, but actually helping them is another thing altogether.

Fortunately, training generative models on industry and company-specific, accurate, and up-to-date information can bridge this gap and unlock the enormous value of the knowledge that a support organization has accumulated over the years. This is easier said than done and requires peripheral models to achieve, but you can see how incorporating high-quality historical data can provide the missing ingredient.

2. Observability & Control

The second property of a great AI solution for customer experience automation: a company must have observability around what the system is doing, and some control over the experience it’s delivering.

If every incoming customer query is met with a potentially different free-form text response by a large language model, how can a company understand at an aggregate level what its customers are reaching out about in the first place? That feedback loop is essential for product teams to understand the customer pain points and prioritize features.

Similarly, how can a support team improve the AI’s responses when it can’t pinpoint the areas for improvement, much less change a model that is a black box of billions of parameters trained on large datasets?

Another very important consequence of the fact that these models are black boxes is that it’s not obvious how a business can go about pairing a response from the bot with an actual automated action.

For example, consider the following hypothetical bot response: “I’m so sorry to hear about your poor experience with feature X. Since you are a valued customer, I’ve gone ahead and processed a refund to your account. The funds should reach your bank account in the next 3-5 business days.”

While this is a seemingly good, even empathetic response, it’s not useful in and of itself unless accompanied by an actual transaction, which the business would look to configure themselves with a high degree of confidence as part of some tangible, editable workflow.

One way to bridge these gaps is to divide the space of support inquiries into a finite set of customer intents, and to create automation workflows (which may involve actions or just informational responses) for each. Then, a model like ChatGPT can be applied to the responses in these workflows in order to produce a personalized but still accurate response. The outcomes of these conversations can now also be aggregated at the intent level, and support managers can control the substance of the action or information they’d like to respond with.
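The idea can be sketched in a few lines. Everything named here (`Workflow`, `WORKFLOWS`, `classify_intent`, `issue_refund`, `handle`) is hypothetical and invented for illustration: the keyword matcher stands in for a trained intent classifier, `issue_refund` stands in for a real billing call, and the `rephrase` argument marks the place where a ChatGPT-like model could personalize the wording without changing the substance.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Workflow:
    response_template: str  # substance approved by the support team
    action: Optional[Callable[[str], None]] = None  # optional real-world action


processed_refunds = []


def issue_refund(customer_id):
    # Stand-in for a call to a real billing system.
    processed_refunds.append(customer_id)


WORKFLOWS = {
    "refund_request": Workflow("A refund has been issued to your account.", issue_refund),
    "reset_password": Workflow("Use the 'Forgot password' link on the sign-in page."),
}

intent_counts = Counter()  # aggregate view of what customers ask about


def classify_intent(message):
    # Keyword matching as a stand-in for a trained intent classifier.
    text = message.lower()
    if "refund" in text:
        return "refund_request"
    if "password" in text:
        return "reset_password"
    return "unknown"


def handle(customer_id, message, rephrase=lambda text: text):
    intent = classify_intent(message)
    intent_counts[intent] += 1
    workflow = WORKFLOWS.get(intent)
    if workflow is None:
        return "Let me connect you with a support agent."
    if workflow.action is not None:
        workflow.action(customer_id)  # the action runs, not just the words
    return rephrase(workflow.response_template)


print(handle("customer-1", "I would like a refund for feature X"))
print(intent_counts)
```

Because the action and the substance of the reply live in the workflow rather than in the generated text, the business keeps control of what actually happens, and the per-intent counts give the aggregate view that free-form responses alone would not.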

3. Minimal Effort & Low Time to Value

The third property of a great AI solution for customer experience automation: a company should be able to implement and maintain it with minimal effort and low time to value.

A great AI solution would perform the steps above automatically, instead of requiring the support organization to do the heavy lifting. This means the AI platform would:

Train models on the history of support conversations
Automatically split those conversations up into intents
Automatically generate the workflows required to resolve each intent
Use a ChatGPT-like model to accurately and delightfully (and yes, automatically) respond to the customer
Collect feedback from customers, as well as updates to policies or answers, and improve on an ongoing basis without requiring manual intervention
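To give a feel for the "split those conversations up into intents" step, here is a deliberately simple sketch that groups messages by word overlap. This is not how Forethought or any production system does it: the greedy grouping, the Jaccard similarity measure, the `threshold` value, and the function names are all assumptions chosen to keep the example short, where a real platform would rely on learned text representations.

```python
def tokenize(text):
    """Lowercase a message and split it into a set of words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Word-overlap similarity between two token sets, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_into_intents(messages, threshold=0.25):
    """Greedily assign each message to the first sufficiently similar group."""
    groups = []  # each group is a list; its first message is the representative
    for message in messages:
        tokens = tokenize(message)
        for group in groups:
            if jaccard(tokens, tokenize(group[0])) >= threshold:
                group.append(message)
                break
        else:
            groups.append([message])  # no similar group found: new intent
    return groups


history = [
    "how do i reset my password",
    "i need to reset my password",
    "where is my refund",
    "when will my refund arrive",
]
for group in group_into_intents(history):
    print(group)
```

Each resulting group is a candidate intent that a workflow could then be attached to, which is the work a good platform would take off the support team's plate.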

ChatGPT & CX: The Takeaway

Am I implying that ChatGPT is not great or not useful in the customer experience context?

Absolutely not. A generative model like ChatGPT can make automated answers more personalized, more human-like, and more delightful to customers.

Furthermore, ChatGPT has excited millions of people around the world about AI and as such represents a great milestone for the field of NLP. More generally, OpenAI is a remarkable organization that keeps pushing the boundaries of what foundational models can achieve and pioneering innovative AI techniques in the process. All of us in the NLP community owe OpenAI and its researchers our gratitude for how far they have advanced the field.

However, for the reasons mentioned above, while a model like ChatGPT has an exciting part to play in a comprehensive customer service automation platform, it cannot be the platform.

Forethought: The Leading Platform in Customer Experience AI

While ChatGPT was not built to help you automate customer experiences, Forethought is! This is what we do. It is the only thing we do. We are laser-focused on building the world’s leading AI platform for customer experience, and have been for the last 5+ years.

In addition to leveraging models like GPT3.5, we build many of our own generative large language models, and we’re always innovating, plugged into the latest advances in the field, and looking for ways to improve our product offering.

Today, we offer what I (and many of our amazing customers) truly believe is the best AI platform for customer experience that satisfies all 3 properties described above.

We value accuracy of information above all else. Our models are trained on an organization’s historically accumulated customer interactions, and are updated on an ongoing basis.
We give you insights into what your customers are reaching out about and how we’re responding, and enable you to confidently improve our performance (including through automating actions and transactions) as you see fit.
You can implement our platform yourself end-to-end in days, not weeks or months. The burden of uncovering customer intents and the appropriate workflows to execute is on us, not you.

No matter your industry, from e-commerce to SaaS to FinTech and more, conversational AI from Forethought can elevate your customers’ experiences, improving every interaction throughout their journey.

But don’t just take our word for it—leading companies like Upwork, Lime, and Instacart have experienced incredible results using Forethought, including a 77% average reduction in response time and a 47% average ticket deflection rate.

All of these figures add up to one thing: making your CX team more efficient while providing top-tier customer service. Let us show you how you can too in a quick demo with our team. Contact us to schedule yours today.
