Local Models

Run AI models directly on your device. No internet required, complete privacy.

Platform Note: Local models are only available on the Android app. The web version does not support on-device inference.

Supported Formats

.pte

ExecuTorch

Meta's optimized format for mobile inference. Best performance on newer devices with NPU/GPU acceleration.

  • Optimized for mobile hardware
  • Smaller file sizes
  • Hardware acceleration support
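For developers who want to see what this looks like in code, here is a minimal sketch of loading and running a .pte file with the ExecuTorch Android bindings (org.pytorch.executorch). The helper function is illustrative, not the app's actual implementation, and the exact class/method surface can vary between ExecuTorch releases:

```kotlin
import org.pytorch.executorch.EValue
import org.pytorch.executorch.Module
import org.pytorch.executorch.Tensor

// Illustrative helper: load an exported program and run its "forward" method.
// ExecuTorch memory-maps the .pte file, so the weights are not copied into
// the Java heap. API names follow recent ExecuTorch releases.
fun runPteModel(modelPath: String, inputIds: LongArray): EValue {
    val module = Module.load(modelPath)

    // Wrap the token IDs as a 1 x N tensor and invoke the default entry point.
    val input = Tensor.fromBlob(inputIds, longArrayOf(1, inputIds.size.toLong()))
    val outputs = module.forward(EValue.from(input))
    return outputs[0]
}
```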
.gguf

Llama.cpp

Popular format with wide model availability. Excellent compatibility and community support.

  • Wide model selection
  • Quantization options (Q4, Q5, Q8)
  • Active community support
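If you are handling GGUF files yourself, they are easy to sanity-check before import: per the GGUF spec, every file begins with the ASCII magic "GGUF" followed by a little-endian uint32 format version. A small sketch (the helper name is illustrative, and newer format versions may appear over time):

```kotlin
import java.io.DataInputStream
import java.io.File
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Check the 8-byte GGUF header: 4 magic bytes "GGUF", then a uint32 version.
fun isLikelyGguf(file: File): Boolean {
    if (file.length() < 8) return false
    val header = ByteArray(8)
    DataInputStream(file.inputStream()).use { it.readFully(header) }
    val buf = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN)
    val magic = ByteArray(4).also { buf.get(it) }
    val version = buf.int  // versions 1-3 exist as of this writing
    return magic.contentEquals("GGUF".toByteArray(Charsets.US_ASCII)) && version in 1..3
}
```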

Getting Models

Download from HuggingFace

Browse and download models directly within the app; a sketch of the underlying download follows the steps below.

  1. Go to Settings → Models
  2. Tap "Download Model"
  3. Search for a model on HuggingFace
  4. Select the model variant (size/quantization)
  5. Wait for download to complete
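Public model files on HuggingFace are served from stable URLs of the form https://huggingface.co/<repo>/resolve/main/<file>. The sketch below shows a bare-bones version of such a download using only the JDK; the repo and file names in the usage comment are examples, and the app's actual downloader (auth, resume, progress reporting) may work differently:

```kotlin
import java.io.File
import java.net.HttpURLConnection
import java.net.URL

// Stream a file from a HuggingFace "resolve" URL straight to disk.
fun downloadFromHuggingFace(repoId: String, fileName: String, dest: File) {
    val url = URL("https://huggingface.co/$repoId/resolve/main/$fileName")
    val conn = url.openConnection() as HttpURLConnection
    conn.instanceFollowRedirects = true  // resolve URLs redirect to a CDN
    dest.parentFile?.mkdirs()
    conn.inputStream.use { input ->
        dest.outputStream().use { output -> input.copyTo(output) }
    }
    conn.disconnect()
}

// Example usage (repo and file names are illustrative):
// downloadFromHuggingFace(
//     "bartowski/Llama-3.2-1B-Instruct-GGUF",
//     "Llama-3.2-1B-Instruct-Q4_K_M.gguf",
//     File(context.filesDir, "models/llama-3.2-1b-q4.gguf"),
// )
```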

Import Local Files

Import models you've already downloaded; a sketch of the copy step follows the steps below.

  1. Go to Settings → Models
  2. Tap "Import Model"
  3. Select the .pte or .gguf file
  4. Optionally add tokenizer files
  5. Give the model a name and save
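Behind the picker, Android hands the app a content Uri through the Storage Access Framework, and the file has to be streamed into app-private storage so the inference engine can open it by path. A rough sketch of that copy step (function and directory names are illustrative):

```kotlin
import android.content.Context
import android.net.Uri
import java.io.File

// Copy a picked document into app-private storage and return its File handle.
fun importModelFile(context: Context, sourceUri: Uri, modelName: String): File {
    val target = File(context.filesDir, "models/$modelName")
    target.parentFile?.mkdirs()
    val input = context.contentResolver.openInputStream(sourceUri)
        ?: error("Could not open $sourceUri")
    input.use { src ->
        target.outputStream().use { dst -> src.copyTo(dst) }
    }
    return target
}
```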

Recommended Models

Model         Size  Best For
------------  ----  ------------------------------
Llama 3.2 1B  ~1GB  Quick responses, older devices
Llama 3.2 3B  ~2GB  Balance of speed and quality
Phi-3 Mini    ~2GB  Reasoning tasks
Qwen2.5 3B    ~2GB  Multilingual support
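A practical way to read the sizes above: a model needs at least its file size in free RAM, plus headroom for the KV cache and runtime buffers. A rough pre-flight check on Android might look like this (the 1.5x multiplier is an assumption, not a measured constant):

```kotlin
import android.app.ActivityManager
import android.content.Context

// Heuristic: require roughly 1.5x the model's file size in available RAM.
fun modelLikelyFits(context: Context, modelFileBytes: Long): Boolean {
    val am = context.getSystemService(Context.ACTIVITY_SERVICE) as ActivityManager
    val info = ActivityManager.MemoryInfo()
    am.getMemoryInfo(info)
    return info.availMem > (modelFileBytes * 3) / 2
}
```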

Using Local Models

  1. Once a model is downloaded/imported, it appears in the model picker
  2. Start a new conversation and select the local model
  3. The model will load automatically (first load may take a moment; see the caching sketch below)
  4. Chat as normal - all processing happens on your device
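The pause on first load (step 3) is the one-time cost of reading the weights from storage; afterwards the model stays in memory for the session. A sketch of that caching pattern, reusing the hypothetical ExecuTorch loader from the sketch above:

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext
import org.pytorch.executorch.Module

// Load lazily and off the main thread, then reuse the loaded Module so
// later messages skip the expensive read from storage.
object ModelCache {
    @Volatile private var cached: Module? = null

    suspend fun get(modelPath: String): Module =
        cached ?: withContext(Dispatchers.IO) {
            Module.load(modelPath).also { cached = it }
        }
}
```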