Soothly
An offline baby cry detection system I built to run locally on mobile devices, helping parents decode what their little ones need.
As a new dad, I wanted a small weekend project to help with parenting. What started as a simple idea quickly turned into a deep dive: analyzing different baby cry datasets and building a complete training pipeline so everything could run on-device. Classic scope creep, but worth every minute!
In the future, I plan to implement these same models on embedded devices with baby monitoring video/audio systems. Stay tuned!
Preview
Soothly in action, analyzing baby cries and providing insights.

Soothly app interface showing cry classification and history tracking.
Key Features
- AI-Powered Analysis: The machine learning model analyzes baby cries in real time to identify their needs - no more guesswork at 3 AM.
- Quick Recognition: Get instant insights into whether your baby is hungry, tired, uncomfortable, or needs a diaper change.
- History Tracking: Keep track of crying patterns and needs over time to better understand your baby's unique communication style.
- Privacy-First: All processing happens locally on your device - no data ever leaves your phone.
- Real-Time Performance: Optimized for speed to deliver near-instant cry detection, even on older devices.
How It Works
- Audio Capture & Preprocessing: The app records audio using the device's microphone, resamples it to 16 kHz, and converts it to mono for consistent processing (sketched after this list).
- Feature Extraction: I extract rich audio features including MFCC, Chroma, Mel Spectrogram, Spectral Contrast, and Tonnetz to capture the unique characteristics of the baby's cry.
- Feature Aggregation: Features are calculated over small frames of audio and averaged into a compact 194-element feature vector (see the aggregation sketch below).
- Model Inference: The feature vector is fed into an ONNX model, which predicts the cry type along with a confidence score (see the inference sketch below).
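To make the preprocessing step concrete, here is a minimal TypeScript sketch of the downmix-and-resample idea. In the app this happens in native audio code; toMono16k is a hypothetical name, and the linear-interpolation resampler is a simplification of what a production pipeline would use.

```typescript
/**
 * Downmix interleaved multi-channel PCM to mono and resample to
 * 16 kHz with linear interpolation. A sketch only: a real pipeline
 * would use a proper windowed-sinc or polyphase resampler.
 */
function toMono16k(
  interleaved: Float32Array, // interleaved PCM samples
  channels: number,
  inputRate: number,
  targetRate = 16000,
): Float32Array {
  // Average all channels into a single mono signal.
  const frames = Math.floor(interleaved.length / channels);
  const mono = new Float32Array(frames);
  for (let i = 0; i < frames; i++) {
    let sum = 0;
    for (let c = 0; c < channels; c++) sum += interleaved[i * channels + c];
    mono[i] = sum / channels;
  }
  if (inputRate === targetRate) return mono;

  // Linear-interpolation resampling to the target rate.
  const outLen = Math.floor((frames * targetRate) / inputRate);
  const out = new Float32Array(outLen);
  for (let i = 0; i < outLen; i++) {
    const pos = (i * inputRate) / targetRate;
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, frames - 1);
    out[i] = mono[i0] + (mono[i1] - mono[i0]) * (pos - i0);
  }
  return out;
}
```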
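The aggregation step is just a per-dimension mean over the frame-level vectors. A minimal sketch, assuming the audio library has already produced one feature vector per frame (FrameFeatures and aggregateFeatures are illustrative names, not the app's actual code):

```typescript
// One feature vector per analysis frame, e.g. the concatenated MFCC,
// chroma, mel spectrogram, spectral contrast, and tonnetz values.
type FrameFeatures = number[];

/** Average frame-level features into one fixed-length vector. */
function aggregateFeatures(frames: FrameFeatures[]): Float32Array {
  if (frames.length === 0) throw new Error('no audio frames to aggregate');
  const dim = frames[0].length; // 194 in Soothly's case
  const out = new Float32Array(dim);
  for (const frame of frames) {
    for (let i = 0; i < dim; i++) out[i] += frame[i];
  }
  for (let i = 0; i < dim; i++) out[i] /= frames.length;
  return out;
}
```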
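And the inference step, sketched with onnxruntime-react-native. The label order, the [1, 194] input shape, and the trailing softmax are assumptions for illustration; in practice all three are fixed by the model export, and the session would be created once and cached rather than per call.

```typescript
import { InferenceSession, Tensor } from 'onnxruntime-react-native';

// Assumed label order; the real ordering comes from the training pipeline.
const LABELS = ['hungry', 'tired', 'uncomfortable', 'diaper change'];

function softmax(scores: Float32Array): Float32Array {
  const max = Math.max(...scores);
  const exps = scores.map((s) => Math.exp(s - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

export async function classifyCry(
  modelPath: string, // local path to the bundled .onnx file
  features: Float32Array, // the 194-element aggregated feature vector
): Promise<{ label: string; confidence: number }> {
  // A real app would create the session once at startup and reuse it.
  const session = await InferenceSession.create(modelPath);

  // Read input/output names from the session instead of hard-coding them.
  const feeds = {
    [session.inputNames[0]]: new Tensor('float32', features, [1, features.length]),
  };
  const results = await session.run(feeds);
  const scores = results[session.outputNames[0]].data as Float32Array;

  // Softmax turns raw logits into probabilities; skip this step if the
  // exported model already outputs probabilities.
  const probs = softmax(scores);
  let best = 0;
  for (let i = 1; i < probs.length; i++) {
    if (probs[i] > probs[best]) best = i;
  }
  return { label: LABELS[best], confidence: probs[best] };
}
```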
Tech Challenges
- Implementing efficient audio processing on mobile devices with limited resources was a real pain.
- Had to optimize the machine learning model for on-device inference while maintaining accuracy.
- Creating a feature extraction pipeline that matches the training environment took several iterations.
- Balancing performance with battery consumption for real-time analysis required careful optimization.
Technical Implementation
- All audio processing and feature extraction is handled by my own package @siteed/expo-audio-studio (formerly @siteed/expo-audio-stream).
- Built custom implementations for features like spectral contrast that weren't available in mobile libraries (a sketch of the idea follows this list).
- Used the ONNX model format for cross-platform compatibility and performance.
- Implemented in React Native, with native modules for performance-heavy tasks.
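Spectral contrast is a good example of what "not available in mobile libraries" meant in practice. Below is a simplified sketch of the standard peak-minus-valley formulation (the same idea librosa implements) for a single FFT magnitude frame; the octave band edges starting at 200 Hz and the 2% quantile are common defaults, not necessarily the exact parameters Soothly uses.

```typescript
/**
 * Simplified spectral contrast for a single FFT magnitude frame,
 * following the usual peak-minus-valley formulation. Band edges
 * (octaves starting at 200 Hz) and the 2% quantile are common
 * defaults, not necessarily the app's exact parameters.
 */
function spectralContrast(
  magnitudes: Float32Array, // one frame of the magnitude spectrum
  sampleRate: number,
  nBands = 6,
  quantile = 0.02,
): number[] {
  const nyquist = sampleRate / 2;
  const hzPerBin = nyquist / (magnitudes.length - 1);

  // Octave-spaced band edges: 0, 200, 400, 800, ... Hz.
  const edges = [0, 200];
  for (let b = 1; b <= nBands; b++) edges.push(200 * 2 ** b);

  const contrast: number[] = [];
  for (let b = 0; b < edges.length - 1; b++) {
    const lo = Math.floor(edges[b] / hzPerBin);
    const hi = Math.min(Math.ceil(edges[b + 1] / hzPerBin), magnitudes.length);
    if (hi - lo < 2) {
      contrast.push(0); // band falls outside the spectrum
      continue;
    }
    // Sort the band's magnitudes so the quietest/loudest bins sit at the ends.
    const band = Array.from(magnitudes.slice(lo, hi)).sort((x, y) => x - y);
    const k = Math.max(1, Math.floor(quantile * band.length));
    const valley = band.slice(0, k).reduce((a, v) => a + v, 0) / k;
    const peak = band.slice(-k).reduce((a, v) => a + v, 0) / k;
    // Peak-to-valley ratio in log space; epsilon avoids log(0) on silence.
    contrast.push(Math.log(peak + 1e-10) - Math.log(valley + 1e-10));
  }
  return contrast; // nBands + 1 values per frame
}
```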