@siteed/sherpa-onnx.rn

React Native wrapper for sherpa-onnx providing on-device speech-to-text and text-to-speech capabilities

Overview

The @siteed/sherpa-onnx.rn package is a comprehensive React Native wrapper for the sherpa-onnx library, enabling high-performance speech processing directly on mobile devices. This library brings powerful on-device speech recognition and text-to-speech synthesis to mobile applications without cloud dependencies.

Live Demo

Try the Live Web Demo →

Demo Features

The demo above showcases the core functionality of the Sherpa ONNX wrapper:

Real-time Transcription: Instantly convert speech to text as you speak
Text-to-Speech: Generate natural sounding voice from entered text
Cross-platform: Same experience on web, iOS and Android platforms
No Cloud Services: All processing happens directly on your device

Tech Stack

C++React NativeTypeScriptONNXTTSSTTSpeech Recognition

Key Features

On-device Speech-to-Text: Process speech without cloud services using ONNX models
Text-to-Speech Synthesis: Generate natural sounding speech on-device
Streaming Recognition: Real-time speech recognition with partial results
Native C++ Performance: Optimized implementation for mobile devices
Cross-platform Support: Works on iOS, Android, and Web
Multiple Architecture Support: Compatible with both old and new React Native architectures
Pre-built Binaries: Ready-to-use native libraries for easy integration

API Levels

High-level API

Streamlined interfaces for common use cases:

Text-to-Speech: Simple API for generating speech from text
Speech-to-Text: Straightforward audio-to-text conversion
Streaming Recognition: Real-time audio processing with partial results

Low-level API

Direct access to sherpa-onnx capabilities for advanced use cases:

Custom Recognizers: Fine-grained control over recognition parameters
Stream Management: Manual control of audio streams
Model Configuration: Advanced model setup and tuning

Model Support

Transducer Models: Support for encoder-decoder-joiner architecture
CTC Models: Single-file model support
Paraformer Models: Advanced encoder-decoder models
VITS/Matcha/Kokoro: Multiple TTS model architectures

Tech Challenges

Native Integration: Bridging complex C++ libraries to React Native environment
Memory Management: Handling large model files and audio buffers efficiently
Cross-platform Consistency: Ensuring identical behavior across iOS, Android, and Web
Performance Optimization: Balancing accuracy and speed for real-time speech processing
Architecture Compatibility: Supporting both old and new React Native architectures

Implementation Highlights

The package includes several technical innovations:

Static Library Integration: Pre-compiled native libraries for performance
Dual Architecture Support: Sophisticated bridging layer for compatibility
Asset Management: Efficient handling of model files in app bundles
Streaming Architecture: Non-blocking audio processing pipeline

Links

Back to projects