A fast-growing voice AI startup based in San Francisco is building enterprise-ready AI phone agents that handle real conversations with human-like quality and intelligence. Backed by $65M from leading investors including Emergence Capital, Scale Venture Partners, Y Combinator, and the founders of Twilio, Affirm, and ElevenLabs, the company is redefining how businesses communicate with customers at scale.
About the Role
As a Senior Machine Learning Engineer, you’ll take ownership of the intelligence layer powering the platform. From model optimization to infrastructure design, your work will determine whether these AI agents sound convincingly human or fall flat. This is a rare opportunity to build and scale ML systems that operate in real-time, directly impact revenue, and are used by enterprise customers every day.
What You’ll Do
- Own the ML stack: Lead efforts across speech-to-text (STT), large language models (LLMs), and text-to-speech (TTS) systems, from research through deployment
- Build inference systems: Design high-throughput, low-latency infrastructure for serving millions of voice interactions per day
- Push model performance: Implement cutting-edge techniques to improve conversational quality, optimize RAG pipelines, and reduce response times
- Optimize for scale: Tackle challenges like model quantization, GPU utilization, and cost-effective inference at enterprise scale
- Collaborate cross-functionally: Partner with Deployment Engineers to deliver ML solutions that solve real customer problems in production
- Innovate at the edge: Explore new frontiers in real-time speech, conversational AI, and multi-modal understanding
What Makes You a Great Fit
- 3+ years of experience in machine learning, including at least 1 year focused on speech or conversational AI
- Strong understanding of TTS, STT, or related systems—especially in production settings
- Experience building and scaling ML infrastructure from scratch, with a clear distinction between research and production environments
- Comfort across the ML pipeline—from data and training to inference and monitoring
- Startup experience—comfortable working in ambiguity, owning outcomes, and iterating fast
Bonus Points For
- Hands-on work with real-time speech processing, telephony, or streaming systems
- Background in large-scale distributed training and inference
- Experience with chatbots, voice assistants, or conversational AI
- PhD or research background in AI/ML
How You Show Up
- Owner mentality: You take end-to-end responsibility for the performance of your systems
- Craft-obsessed: You care that the AI sounds real, not robotic
- Data-driven: You run experiments, measure impact, and make decisions based on results
- Collaborative: You work closely with engineering, deployment, and customer teams
- Relentless: You dig into complex technical problems and don’t stop until they’re solved
Compensation & Benefits
- Salary: $140,000–$250,000
- Meaningful equity in a fast-growing company
- Full medical, dental, and vision coverage
- All tools and resources provided
- Beautiful Jackson Square SF office with rooftop views
Note: ML experience at a U.S.-based company is required for this role.
If you’re ready to own the intelligence layer of next-gen AI voice agents and help push the boundaries of real-time conversational AI, this role is a chance to do career-defining work.