AI Voice: Probing Conversations and the Next Trillion-Token Dataset
Voice offers a low-friction interface to train agents through probing dialogue, enabling know-how to be codified.
We are at a pivotal moment with AI Voice: just listen to the demos of ElevenLabs v3 or Sesame. The underlying technology has progressed from relatively good a year ago to very good six months ago, and is now often hard to distinguish from a real human voice. Importantly, it is increasingly accessible. The barriers to building and deploying voice agents have never been lower.
AI Voice will, of course, automate customer support and call centres, but it will also unlock entirely new markets where:
The context required to train agents is not codified and needs to be discovered through dialogue (psychotherapy intake; sales prospecting; complex onboarding)
Voice is a more natural interface for interacting with LLMs (digital laggard demographics; deskless workers; services where emotion matters, e.g., coaching and sales)
It was previously cost-prohibitive to scale voice-based offerings (always-on patient check-ins; at-scale debt servicing)
The opportunities unlocked by these advancements are vast, spanning sectors and use cases, and it is an exciting time to build in this space. Below, I explore one of these areas: how AI Voice provides a low-friction interface for training agents through probing dialogue, facilitating the codification of knowledge. We are particularly excited by this, as it implies there are novel datasets up for grabs and untapped markets to be served. If you are a founder building or operating in the space, please reach out (pascual@northzone.com).
AI Voice agents will create new datasets and enrich existing ones. Probing conversations can surface know-how that remains inaccessible today.
Advances in the reasoning and performance of LLMs mean agents increasingly offer more value on day one. However, just like a great new hire, an agent needs time and context to ramp up. For workers, reaching productive capacity involves acquiring knowledge that is embedded throughout the organisation. That information enables employees to execute tasks proficiently and, over time, become proactive contributors. It is no different with AI agents, especially as we entrust them with more complex tasks.
The problem: not all data lives online. This is becoming increasingly evident as agentic solutions penetrate beyond tech-forward businesses with abundant digital context and move into less digitised sectors of the economy, as well as into use cases where the information needed to train agents is not readily available online, such as therapy.
To effectively build products for these users and use cases, novel ways of accessing know-how accrued from lived experience are needed. This is where AI Voice agents can be particularly helpful.
Whether you are the owner of a small roofing business teaching a new employee the ins and outs of customer success, or a therapist forming an understanding of a patient’s life context as you make a treatment plan, the data needed to complete these tasks to a high and personalised standard is not codified in a PDF, nor is it available online. In the case of the roofing company, that know-how is held by the owner and stems from their 30 years of experience in the space. For the therapist, the information is retrieved through deliberate dialogue that progressively adapts as more information is collected.
AI Voice agents can facilitate the training of agentic products through proactive, probing dialogue initiated by the agent. Voice, as a low-friction interface, makes it easier to codify knowledge than a text-based conversation does. Companies that leverage dialogue for data discovery and onboarding can create novel datasets and highly personalised agentic solutions for these types of customers and use cases.
Voice, however, is only part of the answer. Leveraging this data effectively hinges on building valuable use cases, asking the right questions, building trust, recognising when context is missing, and iterating through dialogue until sufficient data is collected. There are also behavioural and regulatory obstacles to navigate. For example, a company building voice-first patient monitoring solutions in healthcare mentioned they saw patient engagement drop when deploying the latest, hyperrealistic voice models, so they reverted to less human-like voices (the uncanny valley?). But solving those challenges is worth the effort.
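For builders, the loop described above (recognise when context is missing, ask about it, and repeat until enough has been collected) can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not a description of any particular product: the names Slot, ProbingSession and scripted_ask are hypothetical, the ask callback stands in for a real speech-in, speech-out turn, and the "is this answer usable?" check is deliberately naive.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Slot:
    """One piece of know-how the agent wants to capture (hypothetical schema)."""
    name: str
    question: str
    answer: Optional[str] = None


@dataclass
class ProbingSession:
    """Keeps asking about missing context until it is filled or turns run out."""
    slots: list[Slot]
    max_turns: int = 10
    transcript: list[tuple[str, str]] = field(default_factory=list)

    def missing(self) -> list[Slot]:
        """Slots that still have no usable answer."""
        return [s for s in self.slots if not s.answer]

    def run(self, ask: Callable[[str], str]) -> dict[str, str]:
        """Probe the first missing slot each turn; stop when done or at max_turns."""
        turns = 0
        while self.missing() and turns < self.max_turns:
            slot = self.missing()[0]
            reply = ask(slot.question).strip()
            self.transcript.append((slot.question, reply))
            turns += 1
            # Naive assumption: any non-empty reply counts as a usable answer.
            # A real system would judge quality and ask follow-up questions.
            if reply:
                slot.answer = reply
        return {s.name: s.answer for s in self.slots if s.answer}


def scripted_ask(answers: list[str]) -> Callable[[str], str]:
    """Stand-in for a voice turn: returns canned replies in order, then silence."""
    replies = iter(answers)
    return lambda question: next(replies, "")


session = ProbingSession(slots=[
    Slot("lead_time", "How long does a typical roof repair take to schedule?"),
    Slot("pricing", "How do you usually quote a job?"),
])
# The first reply is empty, so the agent asks the same question again.
captured = session.run(scripted_ask(["", "Two weeks", "Per square metre"]))
```

The design choice worth noting is that the stopping condition is explicit: the session ends either when nothing is missing or when the turn budget is spent, which is one simple way to avoid exhausting the person on the other end of the call.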
Nine out of ten firms globally have fewer than 250 employees, and most still rely on clipboards, phone calls, and institutional memory. Voice is also the ideal medium to serve many of these users. Across hands-busy work (tradespeople; field workers), low digital literacy (seniors; children), and services where emotion matters (coaching; sales prospecting), tapping a screen introduces friction, whereas speaking feels natural.
Non-technical businesses will undoubtedly benefit from AI Voice solutions, but the opportunity is far larger. Take education. If you’ve read The Diamond Age, it’s hard not to imagine that we may soon have an always-on voice-based tutor that can continuously evaluate and teach through dialogue, adapting to children’s individual learning needs and lowering barriers to a world-class education (and hopefully reducing screen time). While it may sound dystopian at first, consider that 70% of 10-year-olds in low- and middle-income countries are unable to understand a simple written text, and that there is a worsening global shortage of teachers as attrition rates grow. Against that backdrop, the prospect of AI Voice in education sounds pretty promising to me.
Similarly, although there is evidence of sycophantic behaviour and harmful outcomes from LLM-based therapy products, we are still in the early days. As with self-driving cars, I am optimistic that we will reach a point where sufficient guardrails are in place, such that the net effect will be positive. That opens the door to rethinking how therapy is provided. Therapists may be able to amplify their reach through AI-enabled content between sessions, or they may even license their unique approach through an AI Voice therapy agent personalised to their style and offered at a lower rate, increasing access. Looking further ahead, therapy is likely to change considerably as the limitations of time, method, and interface evolve, giving rise to always-on support delivered not just through AI Voice but perhaps also through micro-actions, such as booking time directly on our calendar to meditate or nudging us to make healthier choices.
Whether through agentic solutions for non-technical SMBs or products that require a deeper understanding of our needs and desires, dialogue and proactive inquiry can surface valuable information, and voice can capture far more of it than text alone. Hesitation, excitement, confidence: paralinguistic signals like these can reduce intent-recognition errors by enriching data with more context. With foundation models having largely exhausted publicly available data, and with much know-how and tacit knowledge missing from their training data, data unlocked through AI Voice can be transformative for both foundation-model and application-layer solutions.
Voice unlocks the next trillion-token dataset and the next billion users. If you are tackling either, we want to hear from you: pascual@northzone.com.