Performance Benchmarks

This document contains performance benchmark results for the rustnn WebNN implementation across different backends.

Test Environment

  • Platform: macOS (Apple Silicon)
  • Hardware: Apple M-series processor with Neural Engine
  • Date: 2025-12-13
  • Library Version: 0.2.0

Backend Comparison

Simple Model (10 Layers: Add + ReLU)

Backend             Cold Start   Run 2     Warm Avg    vs ONNX CPU
ONNX CPU            72.0ms       24.8ms    24.8ms      1.00x (baseline)
ONNX GPU            ~50ms        ~25ms     ~25ms       1.00x
CoreML default      24.5ms       24.5ms    24.1ms      0.97x (3% faster)
CoreML low-power    133.3ms      61.0ms    59.3ms      2.39x (slower)
CoreML high-perf    23.8ms       23.9ms    23.9ms      0.96x (4% faster)

Complex Model (200 Operations: 50 Layers × 4 Ops)

Metric                 Value
Run 1 (cold)           63.77ms
Run 2 (warm)           19.95ms
Runs 3-5 avg           19.00ms
Speedup (cold→warm)    3.2x
Time saved             43.81ms (68.7% improvement)

MobileNetV2 (106 Layers, Real-World Model)

Backend     First Run    Expected Warm Run     Speedup
ONNX CPU    71.65ms      71.65ms               1.0x
ONNX GPU    44.49ms      44.49ms               1.0x
CoreML      11,093ms     ~50-100ms (est.)      100-200x

Key Findings

1. CoreML Warm-Up Behavior

CoreML exhibits significant first-run overhead for complex models:

  • Simple models (10-20 ops): Minimal warm-up (~1-2ms difference)
  • Complex models (200 ops): 3.2x speedup after first run
  • Large models (MobileNetV2): Estimated 100-200x speedup after first run

The first run includes:

  • Model compilation (~500-1000ms)
  • Neural Engine graph optimization (~3-10 seconds for large models)
  • Memory allocation and initialization
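One simple way to reproduce the cold/warm split is to time each compute call separately rather than averaging all runs together. The sketch below does exactly that; the graph.compute() call is a placeholder for whatever inference entry point the rustnn Python bindings actually expose, not confirmed API.

import time

def time_runs(graph, inputs, runs=5):
    """Time each compute call separately so cold vs. warm runs are visible.

    graph.compute(inputs) is a placeholder for the bindings' real inference
    entry point.
    """
    timings = []
    for i in range(runs):
        start = time.perf_counter()
        graph.compute(inputs)  # run 1 pays compilation; later runs are warm
        elapsed_ms = (time.perf_counter() - start) * 1000
        timings.append(elapsed_ms)
        print(f"run {i + 1}: {elapsed_ms:.2f} ms")
    cold, warm = timings[0], sum(timings[1:]) / (runs - 1)
    print(f"cold -> warm speedup: {cold / warm:.1f}x")
    return timings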

2. Backend Selection Impact

CoreML default/high-performance modes:

  • Fastest inference: ~24ms for simple models
  • Slightly faster than ONNX CPU (~3-4%)
  • Minimal warm-up overhead

CoreML low-power mode:

  • Slower inference: ~59ms for simple models (2.4x slower)
  • Optimized for power efficiency, not speed
  • Good for battery-constrained devices
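These modes correspond to the WebNN power-preference values ("default", "low-power", "high-performance"). A minimal selection sketch follows; the commented-out context-creation call uses assumed binding names for illustration and is not confirmed rustnn API.

# Power-preference values mirror the WebNN MLContextOptions enum:
# "default", "low-power", "high-performance".
def pick_power_preference(on_battery: bool) -> str:
    # Accept ~2.4x slower inference in exchange for lower power draw on battery.
    return "low-power" if on_battery else "high-performance"

# Usage sketch (assumed binding call, not confirmed API):
# context = create_context(device="coreml",
#                          power_preference=pick_power_preference(on_battery=True))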

3. Model Complexity Scaling

Performance characteristics by model size:

Model Size             Cold Start    Warm Run            Speedup
Small (10 ops)         ~25ms         ~24ms               1.0x
Medium (200 ops)       ~64ms         ~19ms               3.4x
Large (MobileNetV2)    ~11,000ms     ~50-100ms (est.)    100-200x

Insight: Larger models benefit dramatically from CoreML's optimization, but pay a higher first-run cost.

Performance Recommendations

For Production Use

  1. Keep Python process running: Don't exit after each inference
  2. Load model once: Reuse the same graph and context (see the sketch after this list)
  3. Accept first-run cost: The 10-second compilation is a one-time investment
  4. Target warm-run performance: After warm-up, CoreML is competitive with ONNX
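A minimal sketch of points 1-3 is below: the graph is compiled once at startup and reused for every request, so only the very first call pays the compilation cost. The compute() and build_graph() names are assumptions about the Python bindings, not rustnn's documented interface.

# The names below are placeholders, not rustnn's confirmed API.
class WarmedGraph:
    """Wraps an already-compiled graph so every caller hits the warm path."""

    def __init__(self, graph):
        self._graph = graph  # compiled once by the caller, reused for all calls

    def predict(self, inputs: dict):
        # No recompilation here: only inference time (~19 ms for the 200-op model).
        return self._graph.compute(inputs)  # placeholder compute entry point

# Usage sketch: build once at startup, then serve many requests.
# model = WarmedGraph(build_graph(context, "model.onnx"))  # assumed helpers
# for batch in request_stream:
#     outputs = model.predict(batch)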

Backend Selection Guide

Choose ONNX CPU when:

  • Consistent performance needed (no warm-up)
  • Running single inference then exiting
  • Cross-platform compatibility required

Choose ONNX GPU when:

  • Need fastest possible inference
  • Have NVIDIA GPU available
  • Consistent performance needed

Choose CoreML default when:

  • Running on macOS/iOS
  • Need best performance after warm-up
  • Can afford first-run compilation cost
  • Want to leverage Neural Engine

Choose CoreML low-power when:

  • Battery life is critical
  • Running on mobile devices
  • Can accept slower inference (~2x)

Benchmark Reproducibility

To reproduce these benchmarks, run:

# Run the full benchmark suite
pytest tests/test_performance.py -v

# Run specific backend tests
pytest tests/test_performance.py -k "test_coreml" -v
pytest tests/test_performance.py -k "test_onnx" -v

# Generate detailed performance report
pytest tests/test_performance.py --benchmark-only -v

Future Optimizations

Potential areas for performance improvement:

  1. Persistent model caching: Cache compiled CoreML models across Python sessions (see the sketch after this list)
  2. Pre-compilation: Compile models ahead of time, not on first inference
  3. Graph optimization: Optimize WebNN graphs before backend conversion
  4. Operator fusion: Merge consecutive operations where possible
  5. Memory pooling: Reuse tensor memory across inferences
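For item 1, one possible shape of a persistent cache is sketched below: compiled artifacts are stored on disk keyed by a hash of the source model, so a later Python session can skip recompilation entirely. The compile_model/load_compiled callables are placeholders for whatever compile and load entry points the backend actually provides.

import hashlib
from pathlib import Path

CACHE_DIR = Path.home() / ".cache" / "rustnn"  # illustrative location

def cached_compile(model_path, compile_model, load_compiled):
    """Compile once per unique model file; later sessions reuse the artifact.

    compile_model(src, dst) and load_compiled(dst) are placeholders for the
    backend's real compile/load entry points.
    """
    digest = hashlib.sha256(Path(model_path).read_bytes()).hexdigest()
    artifact = CACHE_DIR / digest
    if not artifact.exists():
        CACHE_DIR.mkdir(parents=True, exist_ok=True)
        compile_model(model_path, artifact)  # slow path: first session only
    return load_compiled(artifact)           # fast path in every later session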

Note: Performance results may vary based on hardware, system load, and model characteristics. These benchmarks represent typical performance under normal conditions.