Ponder: Voice AI Infrastructure for the Web


Ponder lets developers build ultra-realistic Voice AI agents for web applications in just a few lines of code. We abstract away the messy pipeline: async websockets, voice activity detection, audio streaming, interruption handling, language model integration, function calling, and keeping the voice realistic and contextual, so that you get consistent 300-500 ms latency that scales across thousands of users. You can bring your own LLM or pick from one of the most popular providers, so developers can focus on building great voice experiences instead of maintaining complex infrastructure. Ponder can take actions on both the front end and the backend, which lets developers build agents that navigate, click, and scroll in the UI and also access databases and APIs on the backend. We also offer a plug-and-play component that renders in the bottom right corner of the website, letting you get started with Voice AI in just one line of code.

Ponder: Voice AI Infrastructure for Web Applications


tl;dr

Ponder lets developers add an ultra-realistic Voice AI agent to the bottom right corner of their websites in one line of code.

It can interact with your UI, call functions, access APIs, navigate, and do basically anything that can be wrapped in a JS function.

Users can talk to it as they would to a human, for use cases such as faster onboarding, customer support, bookings, data entry, or a hands-free UX.

You can also build your own UI for the voice agent using our React SDK.

The Problem

Voice AI models are having a moment. The ability to simply talk, as you would with a friend, has opened up a higher-bandwidth way of communicating with software. It fits especially well with the rise of prompt-based and conversational workflows.

But if you want to add a voice agent to your website, you have to build your own stack. That means dealing with the async hell of parallel websockets and streaming model calls, voice activity detection, background noise suppression, speaker diarization, TTS, STT, language model integration, and streaming function calls, AND somehow keeping the latency to ~400-700 ms.

Similar services have made notable advances, but their stacks remain heavily optimized for telephony.

Web app-specific cases remain a challenge: handling UI state updates asynchronously, waiting for user action, controlling responses based on function calls, and handling interruptions (the user takes an action midway while the AI is talking), all while maintaining a fluid, human-like conversational flow.

Solution: Meet Ponder

Using Ponder’s React SDK, you can add a voice agent to the bottom right corner of the screen. All you have to do is wrap your _app.js component in PonderProvider.

With just this, the Ponder widget renders on your website and users can talk to it.

After that, you can dynamically control the context and the actions the agent can take on each page by using the setActions and setInstructions hooks anywhere in your app.

Actions can be JavaScript functions that are already part of your code base. Simply pass them to setActions along with a description for the agent (check out the docs for more details).
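
To make the idea concrete, here is a minimal sketch of how described actions could be registered and then dispatched by name when the language model emits a function call. It does not use Ponder's SDK: the Action shape, the ActionRegistry class, and the addToCart example are all hypothetical names made up for illustration, and the real setActions hook may look quite different.

```typescript
// Hypothetical shapes for illustration only; not Ponder's actual SDK types.
type ActionHandler = (args: Record<string, unknown>) => unknown;

interface Action {
  name: string;
  description: string; // text the agent reads to decide when to call the action
  handler: ActionHandler;
}

class ActionRegistry {
  private actions: Record<string, Action> = {};

  // Replace the full set of actions, e.g. when the user moves to another page.
  setActions(actions: Action[]): void {
    const next: Record<string, Action> = {};
    for (const action of actions) next[action.name] = action;
    this.actions = next;
  }

  // The name/description pairs that would be handed to the language model.
  describe(): { name: string; description: string }[] {
    return Object.keys(this.actions).map((name) => ({
      name,
      description: this.actions[name].description,
    }));
  }

  // Run the handler matching a function call emitted by the model.
  dispatch(name: string, args: Record<string, unknown> = {}): unknown {
    const action = this.actions[name];
    if (!action) throw new Error(`Unknown action: ${name}`);
    return action.handler(args);
  }
}

// An existing app function (hypothetical) that we expose to the agent.
const cart: string[] = [];
function addToCart(item: string): number {
  cart.push(item);
  return cart.length;
}

const registry = new ActionRegistry();
registry.setActions([
  {
    name: "addToCart",
    description: "Add an item to the shopping cart",
    handler: (args) => addToCart(String(args.item)),
  },
]);

// Simulate the model asking for the action by name.
const newCount = registry.dispatch("addToCart", { item: "lamp" });
```

The point of the sketch is the design choice: the agent only ever sees names and descriptions, while the handlers stay ordinary functions in your own code base.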

You can configure the agent on Ponder’s dashboard: choose a voice, an LLM, and the system prompt. If you want to attach external docs to the context, simply add curly-brace variables to the system prompt and pick a data source for each variable (Confluence and Google Docs are currently supported).

Ponder supports both Voice and Text modes. The messages get populated in the Ponder widget as the conversation unfolds.
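
As a rough illustration of the curly-brace variables, the sketch below fills placeholders in a system prompt with text pulled from a data source. This is an assumption about the general technique, not Ponder's implementation: the renderPrompt helper, the single-brace `{name}` syntax, and the pricing_doc variable are all hypothetical, and the real substitution happens on Ponder's side.

```typescript
// Hypothetical helper; Ponder's actual variable syntax and resolution may differ.
function renderPrompt(template: string, sources: Record<string, string>): string {
  // Replace each {variable} with its source text; unknown variables are left as-is.
  return template.replace(/\{(\w+)\}/g, (match: string, key: string) =>
    Object.prototype.hasOwnProperty.call(sources, key) ? sources[key] : match
  );
}

// In practice the text for pricing_doc would come from a connected document.
const systemPrompt = renderPrompt(
  "You are a support agent. Use this reference: {pricing_doc} Ignore {unset_var}.",
  { pricing_doc: "[contents of the pricing document]" }
);
```

Leaving unknown variables untouched, rather than replacing them with an empty string, makes a missing data source easy to spot in the rendered prompt.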

Who needs Ponder?

Companies that offer prompt-based conversational apps - e.g., creativity tools, text-to-anything, agentic workflows - where the user needs to craft a prompt. Making it conversational lets the user brainstorm the prompt with the agent - “I want it to look more like a…”
Companies that have a data entry part to their apps - for example, we are working with an inspections software provider to let construction inspection officials enter large amounts of data into the software simply by speaking, which is much faster - “The inspection ID is 1234 and the beam is 30 by 30 by 52 inches”
Companies that are struggling with onboarding - especially for applications whose users are not the most tech-savvy and who, despite your best efforts at building an intuitive UI, struggle to operate it. With Ponder, users can simply talk with your application.
In-app assistance, customer support, scheduling, and other usual suspects.

Our Ask 🙏:

If this sounds interesting, reach out to me at sarang@useponder.ai or try out Ponder today at useponder.ai 🚀


Sarang Zambare, Founder
