Local Models
Run AI models directly on your device. No internet required, complete privacy.
Platform Note: Local models are only available on the Android app. The web version does not support on-device inference.
Supported Formats
.pte (ExecuTorch)
Meta's optimized format for mobile inference. Best performance on newer devices with NPU/GPU acceleration; a sketch of the export flow follows the list below.
- Optimized for mobile hardware
- Smaller file sizes
- Hardware acceleration support
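You normally download ready-made .pte files rather than create them, but for the curious, the general shape of the export pipeline (a minimal sketch based on the ExecuTorch getting-started flow, shown on a toy module; real LLM exports go through model-specific export scripts, and the API may shift between ExecuTorch versions) looks roughly like this:

```python
# Minimal sketch: export a PyTorch module to .pte with ExecuTorch.
# Toy module only; real LLM exports use model-specific scripts.
import torch
from executorch.exir import to_edge

class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(8, 4)

    def forward(self, x):
        return self.linear(x)

# Trace with example inputs, lower to the edge dialect,
# then serialize the ExecuTorch program to a .pte file.
exported = torch.export.export(TinyModel(), (torch.randn(1, 8),))
edge = to_edge(exported)
program = edge.to_executorch()

with open("tiny_model.pte", "wb") as f:
    f.write(program.buffer)
```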
.gguf (llama.cpp)
Popular format with wide model availability. Excellent compatibility and community support; see the size estimate sketch after this list.
- Wide model selection
- Quantization options (Q4, Q5, Q8)
- Active community support
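The quantization level is what drives the file sizes you'll see when choosing a variant. As a rough rule of thumb, file size ≈ parameter count × bits per weight ÷ 8. The bits-per-weight figures below are the approximate values for llama.cpp's classic Q4_0/Q5_0/Q8_0 schemes; K-quant variants differ slightly:

```python
# Back-of-the-envelope GGUF size estimate: params * bits-per-weight / 8.
# Bits-per-weight is approximate: Q4_0 stores 4-bit weights plus a scale
# per 32-weight block, hence ~4.5; K-quant variants differ slightly.
BITS_PER_WEIGHT = {"Q4": 4.5, "Q5": 5.5, "Q8": 8.5, "F16": 16.0}

def estimate_size_gb(num_params: float, quant: str) -> float:
    return num_params * BITS_PER_WEIGHT[quant] / 8 / 1e9

# A 3B-parameter model at Q4 lands around 1.7 GB, consistent with the
# "~2GB" figures in the table below once metadata and overhead are added.
for quant in ("Q4", "Q5", "Q8"):
    print(f"3B @ {quant}: ~{estimate_size_gb(3e9, quant):.1f} GB")
```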
Getting Models
Download from HuggingFace
Browse and download models directly within the app; a command-line alternative for fetching files manually is sketched after these steps.
1. Go to Settings → Models
2. Tap "Download Model"
3. Search for a model on HuggingFace
4. Select the model variant (size/quantization)
5. Wait for the download to complete
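If you'd rather fetch files on a computer and sideload them (see Import Local Files below), the huggingface_hub Python package can download a single GGUF file. The repo and filename here are hypothetical placeholders; substitute the model you actually want:

```python
# Download a single GGUF file from the HuggingFace Hub for later import.
# repo_id and filename are hypothetical; browse the Hub for real values.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="someuser/Llama-3.2-1B-Instruct-GGUF",  # hypothetical repo
    filename="llama-3.2-1b-instruct-q4_0.gguf",     # hypothetical file
)
print(f"Saved to {path}")  # transfer this file to your device, then import it
```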
Import Local Files
Import models you've already downloaded; a quick file check you can run before importing is shown after these steps.
1. Go to Settings → Models
2. Tap "Import Model"
3. Select the .pte or .gguf file
4. Optionally add tokenizer files
5. Give the model a name and save
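Before importing, it can save time to confirm the file isn't truncated or mislabeled. GGUF files start with the 4-byte ASCII magic `GGUF`, so a quick check looks like the sketch below (there's no equally simple test for .pte, so this only validates the extension in that case):

```python
# Quick sanity check for a model file before importing it into the app.
# GGUF files begin with the magic bytes b"GGUF"; for .pte files this
# sketch only checks the extension, since there is no simple magic test.
from pathlib import Path

def looks_like_model(path: str) -> bool:
    p = Path(path)
    if p.suffix == ".gguf":
        with p.open("rb") as f:
            return f.read(4) == b"GGUF"
    return p.suffix == ".pte"

print(looks_like_model("llama-3.2-1b-instruct-q4_0.gguf"))
```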
Recommended Models
| Model | Size | Best For |
|---|---|---|
| Llama 3.2 1B | ~1GB | Quick responses, older devices |
| Llama 3.2 3B | ~2GB | Balance of speed and quality |
| Phi-3 Mini | ~2GB | Reasoning tasks |
| Qwen2.5 3B | ~2GB | Multilingual support |
Using Local Models
- Once a model is downloaded or imported, it appears in the model picker
- Start a new conversation and select the local model
- The model loads automatically (the first load may take a moment)
- Chat as normal; all processing happens on your device (a desktop smoke test for troubleshooting is sketched below)
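If a model misbehaves on the device, it's often quicker to test the same GGUF file on a computer first. A minimal smoke test with the llama-cpp-python package (the model path is whatever file you downloaded) might look like:

```python
# Desktop smoke test for a GGUF model using llama-cpp-python
# (pip install llama-cpp-python). If this produces sensible text,
# the file itself is fine and any on-device issue lies elsewhere.
from llama_cpp import Llama

llm = Llama(model_path="llama-3.2-1b-instruct-q4_0.gguf", n_ctx=2048)
out = llm("Q: What is the capital of France?\nA:", max_tokens=16)
print(out["choices"][0]["text"])
```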