# Windows Setup Guide: rustnn with TensorRT
This guide provides step-by-step instructions for setting up rustnn with TensorRT support on Windows for high-performance GPU inference.
## Overview
When properly configured, rustnn will automatically use TensorRT as the highest-priority backend for accelerated execution on NVIDIA GPUs, providing significantly better performance than CPU or standard ONNX Runtime execution.
## Prerequisites

### Hardware Requirements
- NVIDIA GPU with compute capability 7.0 or higher
    - Recommended: T4, RTX 20/30/40 series, A10, A100
    - Minimum: GTX 1080, Quadro P4000
- 8GB+ system RAM
- 20GB+ free disk space for dependencies
### Software Requirements
- Windows 10 (64-bit) or Windows 11
- Administrator access for installation
## Installation Steps

### Step 1: Install NVIDIA GPU Driver
1. Check your current driver version by running `nvidia-smi` (see the command after this list). If this command works, you already have drivers installed.
2. Download the latest driver:
    - Visit NVIDIA Driver Downloads
    - Select your GPU model
    - Download and run the installer
3. Reboot your system after installation.
4. Verify the installation by running `nvidia-smi` again. You should see your GPU information displayed.
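Both the initial check and the post-install verification use the same command; run it from PowerShell or Command Prompt:

```powershell
# Lists the installed driver version and all detected NVIDIA GPUs
nvidia-smi
```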
### Step 2: Install CUDA Toolkit
TensorRT requires the CUDA runtime libraries.
1. Download the CUDA Toolkit:
    - Visit NVIDIA CUDA Toolkit Downloads
    - Select Windows → x86_64 → your Windows version
    - Download the installer (network or local installer)
    - Recommended version: CUDA 12.x (check TensorRT-RTX compatibility)
2. Run the installer:
    - Choose "Custom Installation"
    - At minimum, select:
        - CUDA Toolkit
        - CUDA Runtime Libraries
        - CUDA Development Libraries (if you plan to build from source)
3. Install to the default location: `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.x`
4. Verify the installation by running `nvcc --version` (see the commands after this list). You should see CUDA compiler version information.
5. Verify the `CUDA_PATH` environment variable (set automatically by the installer). It should output `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.x`.
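For reference, the two verification commands from steps 4 and 5:

```powershell
# Print the CUDA compiler version to confirm the toolkit is installed
nvcc --version

# Print CUDA_PATH to confirm the installer registered the environment variable
echo $env:CUDA_PATH
```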
### Step 3: Install TensorRT-RTX
1. Download TensorRT-RTX:
    - Visit NVIDIA Developer TensorRT Downloads
    - You may need to create a free NVIDIA Developer account
    - Download TensorRT-RTX for Windows (zip archive)
    - Choose the version compatible with your CUDA installation
2. Extract TensorRT-RTX:
    - Extract the zip file to a permanent location
    - Recommended: `C:\TensorRT-RTX`
    - The extracted directory should contain an `include\` folder (headers) and a `lib\` folder (DLLs and import libraries)
3. Set the `TENSORRT_RTX_DIR` environment variable (see the PowerShell commands after this list).
4. Add the TensorRT `lib` directory to `PATH`.
5. Restart your terminal or reboot for the changes to take effect.
6. Verify the installation by listing the `include` and `lib` directories. You should see TensorRT header files and library files.
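A sketch of steps 3, 4, and 6 in PowerShell, assuming the recommended `C:\TensorRT-RTX` location (the same commands appear in the Environment Variable Summary below):

```powershell
# Run in an elevated (Administrator) PowerShell
[System.Environment]::SetEnvironmentVariable('TENSORRT_RTX_DIR', 'C:\TensorRT-RTX', 'Machine')

# Append the TensorRT lib directory to the machine-wide PATH
$oldPath = [System.Environment]::GetEnvironmentVariable('Path', 'Machine')
[System.Environment]::SetEnvironmentVariable('Path', "$oldPath;C:\TensorRT-RTX\lib", 'Machine')

# In a fresh terminal, confirm headers and libraries are present
dir $env:TENSORRT_RTX_DIR\include
dir $env:TENSORRT_RTX_DIR\lib
```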
### Step 4: Install Rust Toolchain
1. Download Rust:
    - Visit rustup.rs
    - Download and run `rustup-init.exe`
2. Install with default settings:
    - Choose option 1 (default installation)
    - This installs:
        - the Rust compiler (`rustc`)
        - the Cargo package manager
        - the standard library
3. Verify the installation (see the commands after this list).
4. Install Visual Studio Build Tools (required for linking):
    - Download Visual Studio Build Tools
    - Install "Desktop development with C++"
    - Or use full Visual Studio 2019/2022 with the C++ workload
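The verification in step 3 uses the standard toolchain commands; both should print a version string:

```powershell
# Confirm the compiler and the package manager are on PATH
rustc --version
cargo --version
```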
### Step 5: Install Python (for Python bindings)
If you plan to use rustnn from Python:
1. Download Python 3.8 or later:
    - Visit python.org
    - Download the Windows installer (64-bit)
2. Install Python:
    - Check "Add Python to PATH" during installation
    - Choose "Install for all users" (recommended)
3. Verify the installation (see the command after this list).
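For step 3, a quick check from any terminal:

```powershell
# Should print Python 3.8 or later
python --version
```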
### Step 6: Build rustnn with TensorRT Support
1. Clone the rustnn repository.
2. Build the Rust library with TensorRT (see the commands after this list). This will:
    - download and compile dependencies
    - link against the TensorRT-RTX libraries
    - create an optimized release build
    - take 5-15 minutes on the first build
3. Run the tests to verify the build.
4. Build the Python package (if using the Python bindings).
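A sketch of the full sequence, using the `trtx-runtime` feature flag referenced in the Troubleshooting section; the repository URL below is a placeholder, so substitute the project's actual remote:

```powershell
# Clone the repository (placeholder URL - use the project's actual remote)
git clone https://github.com/<org>/rustnn.git
cd rustnn

# Optimized release build with the TensorRT-RTX backend enabled
cargo build --release --features trtx-runtime

# Run the test suite with the same feature set
cargo test --features trtx-runtime

# Optional: build and install the Python package into the active environment
maturin develop --features "python,trtx-runtime"
```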
### Step 7: Verify TensorRT Integration
1. Create a test Python script (`test_trt.py`):

    ```python
    import webnn
    import numpy as np

    # Create context - should select TensorRT backend
    ml = webnn.ML()
    context = ml.create_context(
        power_preference="high-performance",
        accelerated=True
    )
    print(f"Backend selected: {context.accelerated}")
    print("TensorRT backend is active!" if context.accelerated else "Fallback backend")

    # Create a simple graph
    builder = context.create_graph_builder()
    x = builder.input("x", [2, 3], "float32")
    y = builder.relu(x)
    graph = builder.build({"output": y})

    # Execute
    inputs = {"x": np.array([[-1, 2, -3], [4, -5, 6]], dtype=np.float32)}
    outputs = context.compute(graph, inputs)
    print("Output:", outputs["output"])
    print("Success! TensorRT is working.")
    ```

2. Run the test with `python test_trt.py`.
3. Expected output: the ReLU of the input (negative values clamped to zero) followed by the success message.
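Assuming the TensorRT backend is selected, the output should look roughly like this (the exact array formatting comes from NumPy):

```
Backend selected: True
TensorRT backend is active!
Output: [[0. 2. 0.]
 [4. 0. 6.]]
Success! TensorRT is working.
```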
## Troubleshooting

### Build Errors
Error: "Cannot find TensorRT headers"
Solution:
1. Verify TENSORRT_RTX_DIR is set: echo $env:TENSORRT_RTX_DIR
2. Check the directory exists and contains include/ folder
3. Restart terminal after setting environment variables
Error: "Linking error: cannot find -lnvinfer_10"
Solution:
1. Verify TensorRT lib directory is in PATH
2. Check lib files exist: dir $env:TENSORRT_RTX_DIR\lib
3. Ensure you downloaded the correct Windows version of TensorRT-RTX
4. Try adding to PATH manually:
$env:PATH += ";C:\TensorRT-RTX\lib"
Error: "CUDA not found"
Solution:
1. Verify CUDA_PATH is set: echo $env:CUDA_PATH
2. Run: nvcc --version (should work)
3. Reinstall CUDA Toolkit if necessary
### Runtime Errors
Error: "TensorRT execution failed: CUDA initialization failed"
Solution:
1. Check GPU is accessible: nvidia-smi
2. Update GPU drivers to latest version
3. Ensure no other process is using the GPU exclusively
4. Restart your computer
Error: "DLL not found" when running Python
Solution:
1. Ensure TensorRT lib directory is in PATH
2. Copy required DLLs to Python script directory:
- nvinfer_10.dll
- nvonnxparser_10.dll
- cudart64_12.dll (or your CUDA version)
3. Or add to PATH for current session:
$env:PATH += ";C:\TensorRT-RTX\lib;$env:CUDA_PATH\bin"
**Backend falls back to ONNX instead of TensorRT**

Solution:

1. Verify you built with the `trtx-runtime` feature: `cargo build --features trtx-runtime`
2. Check that the Python package includes TensorRT: `pip show rustnn` (it should list `trtx` in the dependencies)
3. Rebuild the Python package with the correct features: `maturin develop --features "python,trtx-runtime"`
### Performance Issues
**TensorRT is slower than expected**

Tips:

1. TensorRT optimizes on the first run (engine building):
    - the first inference may take 10-60 seconds
    - subsequent runs should be much faster
2. Use larger batch sizes when possible
3. Ensure the GPU has adequate cooling (check temperatures with `nvidia-smi`)
4. Close other GPU-intensive applications
## Development Without TensorRT (Mock Mode)
If you want to develop on a machine without an NVIDIA GPU, you can use mock mode:
```powershell
# Build with the mock feature
cargo build --features trtx-runtime-mock

# Run tests with the mock
cargo test --lib --features trtx-runtime-mock

# Build the Python package with the mock
maturin develop --features "python,trtx-runtime-mock"
```
Mock mode:

- compiles and runs without a GPU
- is useful for development and CI/CD
- does NOT perform actual inference
- returns dummy results
## Environment Variable Summary
For quick reference, here are all the environment variables you need:
```powershell
# Run as Administrator
[System.Environment]::SetEnvironmentVariable('CUDA_PATH', 'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.x', 'Machine')
[System.Environment]::SetEnvironmentVariable('TENSORRT_RTX_DIR', 'C:\TensorRT-RTX', 'Machine')

# Add to PATH
$oldPath = [System.Environment]::GetEnvironmentVariable('Path', 'Machine')
$newPath = "$oldPath;C:\TensorRT-RTX\lib;$env:CUDA_PATH\bin"
[System.Environment]::SetEnvironmentVariable('Path', $newPath, 'Machine')
```
After setting these, restart your terminal or reboot.
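To confirm the variables took effect, open a new terminal and check:

```powershell
# Both should print the paths configured above
echo $env:CUDA_PATH
echo $env:TENSORRT_RTX_DIR
```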
## Performance Expectations
With TensorRT properly configured, you should see:
| Operation | CPU (ONNX) | GPU (ONNX) | GPU (TensorRT) |
|---|---|---|---|
| Small models (<10 ops) | ~10ms | ~5ms | ~2ms |
| Medium models (10-100 ops) | ~100ms | ~20ms | ~5ms |
| Large models (>100 ops) | ~1000ms | ~100ms | ~20ms |
Note: First-run times include engine building overhead (10-60 seconds).
## Next Steps
Once TensorRT is working:
- Explore the examples in the `examples/` directory
- Read the API Reference for detailed usage
- Check the Implementation Status for supported operations
- See the Development Guide for contributing
## Additional Resources
## Support
If you encounter issues not covered in this guide:
- Check existing GitHub Issues
- Create a new issue with:
    - Your Windows version
    - GPU model (from `nvidia-smi`)
    - CUDA version (from `nvcc --version`)
    - TensorRT version
    - Full error message and stack trace
    - Steps to reproduce