Live Roleplays powered by OpenAI Realtime API
October 1, 2024
We’re excited to announce that Speak has spent the past few months working with OpenAI to test their new Realtime API and build next-generation language learning experiences. Language learning and live conversational practice are such a natural fit and such an amazing use case for speech-to-speech that we jumped at the opportunity to collaborate with OpenAI and deeply embed this technology into our core experience.
Our entire team has been blown away by just how much more immersive Speak’s conversational learning method now feels. We’re incredibly excited about the new experience we’ve built, the underlying technology, and what it means for the future of language learning.
Speaking out loud with Speak
The vast majority of language learners fail to achieve fluency because they try everything but speaking out loud, and they have extremely limited access to speaking practice and conversational partners. We started Speak because we saw a future where we could use AI to empower anyone on the planet to access the best speaking tutor and conversational partner in the world.
Nearly two years ago, we released the world’s first AI-powered roleplay experience for conversational practice. It set the standard across the language learning industry, became one of our most popular features, and was Speak’s first step in going from a supplemental speaking-practice tool to a true tutoring experience.
However, there were still many limitations: the conversations felt slow and unnatural because of the clunkiness of transcribing the learner’s speech, running it through a text-based LLM workflow, and then synthesizing the AI character’s speech on the other end. Each of these steps introduced significant lag and error. GPT-4o’s direct speech-to-speech capabilities with the Realtime API fundamentally solve these problems!
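For readers curious about the mechanics, here is a minimal sketch of the client events a speech-to-speech session sends over the Realtime API’s WebSocket connection, in place of the old transcribe → text LLM → synthesize pipeline. The event names follow OpenAI’s published Realtime API schema; the tutor instructions and helper function names are illustrative, not Speak’s production code.

```python
import base64
import json

# Sketch of the three client events a Realtime API session revolves
# around. A real client would send these JSON strings over a WebSocket
# to wss://api.openai.com/v1/realtime; here we only build the payloads.

def session_update(instructions: str) -> str:
    """Configure the session once, e.g. with a tutor persona."""
    return json.dumps({
        "type": "session.update",
        "session": {
            "instructions": instructions,
            "voice": "alloy",
            "modalities": ["audio", "text"],
        },
    })

def append_audio(pcm16_chunk: bytes) -> str:
    """Stream a chunk of the learner's microphone audio (base64 PCM16)."""
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm16_chunk).decode("ascii"),
    })

def request_response() -> str:
    """Ask the model to answer directly in speech -- no separate
    transcription or TTS step, which is where the old pipeline lost time."""
    return json.dumps({"type": "response.create"})
```

Because the model consumes and produces audio natively, the round trip collapses into a single model call, which is where the latency win comes from.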
Live Roleplays with Speak
Today, we’re announcing Live Roleplays, a new Speak experience that combines Realtime API with Speak’s learning engine to enable immersive, life-like speaking practice in a variety of roleplay scenarios.
With the Realtime API on GPT-4o, our AI tutor now responds as fast as or faster than a human partner would, and can understand and provide feedback on aspects of speech beyond the pure text transcript, like tone, pronunciation, prosody, and more.
And while a huge part of this unlock is the new speech-to-speech model, just as important is how we productize it: leveraging our existing learning engine to build something truly specialized for language learning by combining the best technology, product design, and pedagogy. Here are some key ways Live Roleplays goes beyond a general AI assistant’s voice mode:
- As a user progresses through a conversation, we incorporate our proficiency graph system that tracks the exact state of their language knowledge to ensure the dialogue is at the right level and uses the most appropriate sentence patterns and vocabulary.
- We give the user specific goals and objectives to try to achieve in the roleplay, giving them direction and nudging them toward the most effective practice.
- And when a user is stuck, we support them with hints that give just the right amount of help.
All of this is powered by our proprietary learning engine and updates dynamically alongside the live conversation, making the roleplay conversation feel far more immersive, natural, and effective for improving your fluency.
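As a rough illustration of this kind of dynamic scaffolding, here is a minimal sketch. Every name in it (ProficiencyGraph, level_appropriate, hint_for) is hypothetical; Speak’s actual learning engine is proprietary and far richer than this.

```python
from dataclasses import dataclass, field

@dataclass
class ProficiencyGraph:
    """Hypothetical stand-in for a per-learner knowledge state:
    a mastery score (0.0-1.0) per vocabulary item or sentence pattern."""
    mastery: dict = field(default_factory=dict)

    def level_appropriate(self, items, lo=0.3, hi=0.8):
        """Target material the learner partly knows: hard enough to
        stretch them, familiar enough to use in live conversation."""
        return [w for w in items if lo <= self.mastery.get(w, 0.0) < hi]

def hint_for(objective: str, attempts: int) -> str:
    """Escalate help gradually: a light nudge first, more support
    only if the learner stays stuck on the same objective."""
    if attempts == 0:
        return f"Try to {objective}."
    if attempts == 1:
        return f"Hint: think about how you'd politely {objective}."
    return f"Let's do it together -- repeat after the tutor to {objective}."
```

The design point is simply that level selection and hinting are functions of live conversation state, so the scaffolding can update turn by turn alongside the dialogue.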
There are still limitations: these new speech-to-speech models aren’t yet as good as text models at instruction following, and they aren’t yet great at more nuanced, language-learning-specific tasks like pronunciation coaching and feedback. We expect all of this to improve dramatically in the near future, and we’re excited to continue building with OpenAI to help get there.
Live Roleplays is rolling out to a limited number of users in the next few weeks with full rollout expected later this year. In the meantime, we’re leveraging OpenAI’s Realtime API across our entire learning experience and will be shipping many other updated experiences very soon.