@siteed/sherpa-onnx.rn
React Native wrapper for sherpa-onnx providing on-device speech-to-text and text-to-speech capabilities
Overview
The @siteed/sherpa-onnx.rn package is a React Native wrapper for the sherpa-onnx library. It brings on-device speech recognition and text-to-speech synthesis to mobile applications, with no cloud dependencies.
Live Demo
Demo Features
The live demo showcases the core functionality of the sherpa-onnx wrapper:
- Real-time Transcription: Instantly convert speech to text as you speak
- Text-to-Speech: Generate natural-sounding voice from entered text
- Cross-platform: Same experience on web, iOS, and Android
- No Cloud Services: All processing happens directly on your device
Tech Stack
C++, React Native, TypeScript, ONNX, TTS, STT, Speech Recognition
Key Features
- On-device Speech-to-Text: Process speech without cloud services using ONNX models
- Text-to-Speech Synthesis: Generate natural-sounding speech on-device
- Streaming Recognition: Real-time speech recognition with partial results
- Native C++ Performance: Optimized implementation for mobile devices
- Cross-platform Support: Works on iOS, Android, and Web
- Multiple Architecture Support: Compatible with both old and new React Native architectures
- Pre-built Binaries: Ready-to-use native libraries for easy integration
API Levels
High-level API
Streamlined interfaces for common use cases:
- Text-to-Speech: Simple API for generating speech from text
- Speech-to-Text: Straightforward audio-to-text conversion
- Streaming Recognition: Real-time audio processing with partial results
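To illustrate what streaming recognition with partial results can look like from application code, here is a minimal sketch. The `StreamingRecognizer` interface, its method names, and the `FakeRecognizer` stand-in are all hypothetical and are not taken from the package's actual API; they only show the shape of the pattern, in which audio chunks go in and an updated transcript comes out whenever it changes.

```typescript
// Hypothetical interface: these names are illustrative, not the package's real API.
interface StreamingRecognizer {
  acceptWaveform(samples: Float32Array): void;
  partialResult(): string;
}

// Feed audio chunks as they arrive and yield the transcript whenever it changes.
async function* transcribe(
  chunks: AsyncIterable<Float32Array>,
  recognizer: StreamingRecognizer,
): AsyncGenerator<string> {
  let last = "";
  for await (const chunk of chunks) {
    recognizer.acceptWaveform(chunk);
    const text = recognizer.partialResult();
    if (text !== last) {
      last = text;
      yield text;
    }
  }
}

// Stand-in recognizer for demonstration only: "recognizes" one word per chunk.
class FakeRecognizer implements StreamingRecognizer {
  private words: string[] = [];
  private readonly script = ["hello", "from", "the", "device"];

  acceptWaveform(_samples: Float32Array): void {
    const next = this.script[this.words.length];
    if (next !== undefined) this.words.push(next);
  }

  partialResult(): string {
    return this.words.join(" ");
  }
}
```

In a real app the chunk source would be a microphone stream and the recognizer would be backed by the native library; the consuming loop (`for await (const text of transcribe(...))`) would stay the same.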
Low-level API
Direct access to sherpa-onnx capabilities for advanced use cases:
- Custom Recognizers: Fine-grained control over recognition parameters
- Stream Management: Manual control of audio streams
- Model Configuration: Advanced model setup and tuning
Model Support
- Transducer Models: Support for encoder-decoder-joiner architecture
- CTC Models: Single-file model support
- Paraformer Models: Advanced encoder-decoder models
- VITS/Matcha/Kokoro: Multiple TTS model architectures
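The speech-to-text model families above differ mainly in which files they need. One way to capture that in TypeScript is a discriminated union, sketched below. The type name, field names, and the `tokens` file entry are assumptions made for illustration (as is the split of a Paraformer model into separate encoder and decoder files); they are not the package's actual configuration schema.

```typescript
// Hypothetical config shapes; field names are illustrative, not the package's API.
type SttModelConfig =
  | { kind: "transducer"; encoder: string; decoder: string; joiner: string; tokens: string }
  | { kind: "ctc"; model: string; tokens: string }
  | { kind: "paraformer"; encoder: string; decoder: string; tokens: string };

// List every file that must exist on disk before the model can be loaded.
function requiredFiles(config: SttModelConfig): string[] {
  switch (config.kind) {
    case "transducer":
      // Encoder-decoder-joiner architecture: three model files.
      return [config.encoder, config.decoder, config.joiner, config.tokens];
    case "ctc":
      // Single model file.
      return [config.model, config.tokens];
    case "paraformer":
      return [config.encoder, config.decoder, config.tokens];
  }
}
```

Because the union is keyed on `kind`, the compiler flags any attempt to read `joiner` from a CTC config, which catches a common class of model setup mistakes before runtime.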
Tech Challenges
- Native Integration: Bridging complex C++ libraries to the React Native environment
- Memory Management: Handling large model files and audio buffers efficiently
- Cross-platform Consistency: Ensuring identical behavior across iOS, Android, and Web
- Performance Optimization: Balancing accuracy and speed for real-time speech processing
- Architecture Compatibility: Supporting both old and new React Native architectures
Implementation Highlights
Notable aspects of the implementation:
- Static Library Integration: Pre-compiled native libraries for performance
- Dual Architecture Support: A bridging layer that works with both the old and new React Native architectures
- Asset Management: Efficient handling of model files in app bundles
- Streaming Architecture: Non-blocking audio processing pipeline
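As a rough illustration of the non-blocking idea on the JavaScript side, the sketch below processes a long PCM buffer in fixed-size chunks and yields to the event loop between chunks. This is an assumption about one way to keep the JS thread responsive, not a description of the package's internals, where the heavy work presumably happens in native code. `processInChunks` and `yieldToEventLoop` are hypothetical helper names.

```typescript
// Yield control to the event loop so UI work can run between chunks.
const yieldToEventLoop = (): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, 0));

// Process a long PCM buffer in fixed-size chunks without monopolizing the JS thread.
// Returns the number of chunks handled.
async function processInChunks(
  samples: Float32Array,
  chunkSize: number,
  handleChunk: (chunk: Float32Array) => void,
): Promise<number> {
  if (chunkSize <= 0) throw new RangeError("chunkSize must be positive");
  let count = 0;
  for (let offset = 0; offset < samples.length; offset += chunkSize) {
    // subarray creates a view over the same memory, so no audio data is copied.
    handleChunk(samples.subarray(offset, offset + chunkSize));
    count++;
    await yieldToEventLoop();
  }
  return count;
}
```

Using `subarray` rather than `slice` avoids allocating a new buffer per chunk, which matters when audio arrives continuously.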