# Performance Benchmarks
This document contains performance benchmark results for the rustnn WebNN implementation across different backends.
## Test Environment
- Platform: macOS (Apple Silicon)
- Hardware: Apple M-series processor with Neural Engine
- Date: 2025-12-13
- Library Version: 0.2.0
## Backend Comparison

### Simple Model (10 Layers: Add + ReLU)
| Backend | Cold Start | Run 2 | Warm Avg | vs ONNX CPU |
|---|---|---|---|---|
| ONNX CPU | 72.0ms | 24.8ms | 24.8ms | 1.00x (baseline) |
| ONNX GPU | ~50ms | ~25ms | ~25ms | 1.00x |
| CoreML default | 24.5ms | 24.5ms | 24.1ms | 0.97x (3% faster) |
| CoreML low-power | 133.3ms | 61.0ms | 59.3ms | 2.39x (slower) |
| CoreML high-perf | 23.8ms | 23.9ms | 23.9ms | 0.96x (4% faster) |
### Complex Model (200 Operations: 50 Layers × 4 Ops)
| Metric | Value |
|---|---|
| Run 1 (cold) | 63.77ms |
| Run 2 (warm) | 19.95ms |
| Runs 3-5 avg | 19.00ms |
| Speedup (cold→warm) | 3.2x |
| Time saved | 43.81ms (68.7% improvement) |
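Numbers like these can be collected with a simple timing loop. Below is a minimal sketch, assuming a compiled graph object that exposes a hypothetical `compute(inputs)` method (the actual rustnn call may differ):

```python
import time

def time_runs(graph, inputs, n_runs=5):
    """Time consecutive inferences; run 1 is the cold start."""
    timings = []
    for i in range(n_runs):
        start = time.perf_counter()
        graph.compute(inputs)  # hypothetical inference entry point
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        timings.append(elapsed_ms)
        print(f"Run {i + 1}: {elapsed_ms:.2f}ms")
    warm_avg = sum(timings[1:]) / len(timings[1:])
    print(f"Cold start: {timings[0]:.2f}ms, warm avg: {warm_avg:.2f}ms")
    print(f"Speedup (cold -> warm): {timings[0] / warm_avg:.1f}x")
    return timings
```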
### MobileNetV2 (106 Layers, Real-World Model)
| Backend | First Run | Expected Warm Run | Speedup |
|---|---|---|---|
| ONNX CPU | 71.65ms | 71.65ms | 1.0x |
| ONNX GPU | 44.49ms | 44.49ms | 1.0x |
| CoreML | 11,093ms | ~50-100ms (est.) | 100-200x |
## Key Findings

### 1. CoreML Warm-Up Behavior
CoreML exhibits significant first-run overhead for complex models:
- Simple models (10-20 ops): Minimal warm-up (~1-2ms difference)
- Complex models (200 ops): 3.2x speedup after first run
- Large models (MobileNetV2): Estimated 100-200x speedup after first run
The first run includes:

- Model compilation (~500-1000ms)
- Neural Engine graph optimization (~3-10 seconds for large models)
- Memory allocation and initialization
### 2. Backend Selection Impact
CoreML default/high-performance modes:

- Fastest inference: ~24ms for simple models
- Slightly faster than ONNX CPU (~3-4%)
- Minimal warm-up overhead

CoreML low-power mode:

- Slower inference: ~59ms for simple models (2.4x slower)
- Optimized for power efficiency, not speed
- Good for battery-constrained devices
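These modes correspond to WebNN's power-preference context options. A minimal sketch of selecting between them, assuming rustnn exposes a WebNN-style `create_context` with a `power_preference` option (both names are illustrative, not confirmed API):

```python
from rustnn import create_context  # hypothetical import path

# Mirrors WebNN's MLPowerPreference values; the actual rustnn options may differ.
ctx_default = create_context(power_preference="default")        # ~24ms warm (simple model)
ctx_fast = create_context(power_preference="high-performance")  # ~24ms warm, fastest cold start
ctx_low_power = create_context(power_preference="low-power")    # ~59ms warm, battery-friendly
```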
### 3. Model Complexity Scaling
Performance characteristics by model size:
| Model Size | Cold Start | Warm Run | Speedup |
|---|---|---|---|
| Small (10 ops) | ~25ms | ~24ms | 1.0x |
| Medium (200 ops) | ~64ms | ~19ms | 3.4x |
| Large (MobileNetV2) | ~11,000ms | ~50-100ms (est.) | 100-200x |
Insight: Larger models benefit dramatically from CoreML's optimization, but pay a higher first-run cost.
## Performance Recommendations

### For Production Use
- Keep Python process running: Don't exit after each inference
- Load model once: Reuse the same graph and context (see the sketch after this list)
- Accept first-run cost: The multi-second compilation (up to ~11s for MobileNetV2) is a one-time investment
- Target warm-run performance: After warm-up, CoreML is competitive with ONNX
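Put together, the pattern looks like the sketch below, with hypothetical `build_graph`/`compute` helpers standing in for the real API: compile once at startup, absorb the warm-up, then serve every request from the same process.

```python
import numpy as np

graph = build_graph()  # hypothetical: build and compile the model once at startup

# Warm-up run absorbs the one-time compilation cost (up to ~11s for
# MobileNetV2-sized models on CoreML) before latency-sensitive traffic arrives.
dummy = {"input": np.zeros((1, 3, 224, 224), dtype=np.float32)}
graph.compute(dummy)

def handle_request(inputs):
    # Reuses the already-compiled graph; every call runs at warm speed.
    return graph.compute(inputs)
```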
### Backend Selection Guide
Choose ONNX CPU when:

- Consistent performance is needed (no warm-up)
- You run a single inference and then exit
- Cross-platform compatibility is required

Choose ONNX GPU when:

- You need the fastest possible inference
- An NVIDIA GPU is available
- Consistent performance is needed

Choose CoreML default when:

- Running on macOS/iOS
- You need the best performance after warm-up
- You can afford the first-run compilation cost
- You want to leverage the Neural Engine

Choose CoreML low-power when:

- Battery life is critical
- Running on mobile devices
- You can accept slower inference (~2.4x)
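For illustration only, the guide above condenses into a small helper (the backend identifiers are placeholders, not rustnn constants):

```python
def pick_backend(on_macos: bool, battery_critical: bool,
                 single_shot: bool, has_nvidia_gpu: bool) -> str:
    """Encode the selection guide above as a simple decision chain."""
    if single_shot or not on_macos:
        # No chance to amortize CoreML's warm-up cost.
        return "onnx-gpu" if has_nvidia_gpu else "onnx-cpu"
    if battery_critical:
        return "coreml-low-power"
    return "coreml-default"
```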
## Benchmark Reproducibility
To reproduce these benchmarks, run:
```bash
# Run the full benchmark suite
pytest tests/test_performance.py -v

# Run specific backend tests
pytest tests/test_performance.py -k "test_coreml" -v
pytest tests/test_performance.py -k "test_onnx" -v

# Generate detailed performance report
pytest tests/test_performance.py --benchmark-only -v
```
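The `--benchmark-only` flag assumes the pytest-benchmark plugin is installed. An individual warm-run benchmark might look roughly like this (the graph setup and input fixtures are hypothetical placeholders, not the actual test code):

```python
def test_coreml_warm_inference(benchmark, sample_inputs):
    graph = build_graph(backend="coreml")    # hypothetical setup helper
    graph.compute(sample_inputs)             # warm-up: exclude one-time compilation
    benchmark(graph.compute, sample_inputs)  # pytest-benchmark times the warm runs
```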
## Future Optimizations
Potential areas for performance improvement:
- Persistent model caching: Cache compiled CoreML models across Python sessions (see the sketch after this list)
- Pre-compilation: Compile models ahead of time, not on first inference
- Graph optimization: Optimize WebNN graphs before backend conversion
- Operator fusion: Merge consecutive operations where possible
- Memory pooling: Reuse tensor memory across inferences
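Persistent caching, for instance, could key compiled artifacts on a hash of the serialized graph. A rough sketch of the idea (the cache path and the `compile_graph`/`load_compiled_model` helpers are hypothetical):

```python
import hashlib
from pathlib import Path

CACHE_DIR = Path.home() / ".cache" / "rustnn"  # hypothetical cache location

def cached_compile(graph_bytes: bytes):
    """Skip recompilation when an identical graph was compiled before."""
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(graph_bytes).hexdigest()
    path = CACHE_DIR / f"{key}.mlmodelc"
    if path.exists():
        return load_compiled_model(path)  # hypothetical: reuse the cached artifact
    model = compile_graph(graph_bytes)    # hypothetical: the expensive first-run step
    model.save(path)                      # hypothetical persistence hook
    return model
```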
## Related Documentation
- Development Guide - Build and test instructions
- API Reference - Complete API documentation
- Operator Status - Supported operations per backend
Note: Performance results may vary based on hardware, system load, and model characteristics. These benchmarks represent typical performance under normal conditions.