Creating an AI-powered live translation solution for multilingual virtual meetings.
Product designer / 2024-2025
SpeechLab, an AI startup backed by Andrew Ng’s AI Fund, focuses on AI-powered speech-to-speech products.
Our first product, AI dubbing, leverages machine learning, text-to-speech, and natural language processing to help users localize video and audio content efficiently.
Our AI dubbing ↗ enables users to generate lifelike multilingual voiceovers easily.
As a startup, we’re evolving from a single-product focus to a diverse multi-product portfolio. To address more real-time user scenarios, our next step is to develop a low-latency speech-to-speech product.
In today's globalized world, effective communication across language barriers remains a significant challenge. Traditional solutions have several limitations:
Key Pain Points
Reduced participation due to language barriers in cross-language meetings and conferences.
Multiple interpreters needed to cover different languages, with advance booking required.
Human interpreters are expensive and require substantial investment.
Other challenges:
Need for transcription alongside translation;
Requirement for real-time translation without disrupting the meeting;
Seamless integration with existing meeting platforms.
Define
Solution: AI live interpretation for multilingual virtual meetings.
Platform: Web-based application integrated with meeting platforms.
Target Users
• Global businesses with distributed teams
• Conference and virtual event organizers
• International content creators and livestream hosts
In short, it caters to business clients.
User Goals
Effortless communication across languages
Reliable high-quality real-time translation
Simple process that integrates with existing workflows
Flexible language selection and audio control
Business Goals
1. User growth. Multilingual support + integration with popular platforms, to increase user adoption & retention + expand product reach.
2. Revenue generation. Drive recurring revenue + cross-sell with the AI dubbing product through different plans.
3. User satisfaction. Cost-effective, multi-product offerings + improved key metrics, to enhance user loyalty.
In our startup, I took on the role of both the product designer and the product manager for this project.
Team Collaboration & Project Planning
To keep the collaborative development on track, I defined the product model and each team member's focus at each stage, along with an initial concept for a mobile app.
Devs
Handle platform integrations; Implement the user interface; Manage the backend to enable real-time audio processing.
LLM
Provide language model capabilities, including real-time speech-to-text processing, translation, and speech synthesis.
Me
Define product vision, translate user needs into actionable tasks, prioritize features; Design all the way from concept to finalized pixels.
Phase 1: MVP Development
Single-speaker, audio-only live translation.
Goal: Validate core concept with minimal viable feature set.
Scope:
• One Core Flow: Live translation for Google Meet.
• Audio-Only: Provides interpretation audio alongside translated text.
• Single Language Direction: One presenter → multiple listeners.
• Basic Controls: Translation language & audio volume
Simple 2-Step Create Experience:
Easy 1-Step Join Process:
Attending live-translated meetings
In this audio-only MVP, guests can either listen to the interpretation while reading the transcript, or watch the video in the original meeting while listening to the translated audio.
User Value:
- Instant solution for basic translation needs
- Simple setup and intuitive, low-barrier workflow
- Google Meet integration for a familiar experience
Business Value:
- Accelerate market entry to gain a competitive edge
- Quick market validation with minimal features
- Build initial user base for feedback and core data
Phase 2: Enhance Functionality
Broader feature support to make it a more comprehensive product.
Goal: Expand functionalities to meet more advanced user needs.
Phase 2 focused on expanding the product’s capabilities to address more meeting scenarios and improve the overall user experience, laying the groundwork for broader adoption. Key features included:
• Support for multiple same-language presenters
• Meeting recording/playback
• Additional meeting platform support
• Volume control for both original and translated audio
At this stage, users can find our product on the Zoom App Marketplace.
User Value:
- Support multiple same-language presenters
- Record meetings and enable playback
- Enhance user experience with better audio controls
Business Value:
- Boost user retention and platform engagement
- Enable cross-selling opportunities with the dubbing product
- Achieve competitive differentiation
We're working on this phase now. Try it out: SpeechLab Live ↗
Phase 3: Complete Solution
Goal: Support complex use cases to meet almost all user needs.
Expected scope:
• Video meeting integration support
• Multiple presenters support
• Different language settings for each presenter
• Full control features
Product Decisions & Trade-offs
Platform Integration
- User Benefit: Familiar interface, minimal learning curve
- Business Benefit: Faster adoption, reduced development costs
Language Selection Flow
- User Benefit: Intuitive language switching
- Business Benefit: Scalable architecture for future expansion
Post-Meeting Integration
- User Benefit: Complete communication solution
- Business Benefit: Cross-product synergy
Key Innovations
Cross-Platform Integration; Language Flexibility; Post-Meeting Features.
Key Learnings
Product Strategy
- Balance between user needs and business goals is crucial
- Iterative development allows for quick adjustments
- Strong core experience drives organic growth
User Experience
- Simple onboarding drives adoption
- Integration with existing workflows is key
- User control builds trust
Business Development
- Solving real problems leads to natural growth
- Cross-product integration creates lasting value
- Data-driven decisions improve outcomes
Team:
Me, 1 designer, 2 developers, 2 LLM engineers, 1 QA
— Live Translation —
That’s it, thank you!
by Akah (Yaxiong Fu)