Downloadable Models

Browse and download AI models to run directly on your device. No internet required after download.

| Runtime | Type | Model | Description | Parameters | Download size |
|---|---|---|---|---|---|
| ExecuTorch | LLM | LLaMA 3.2 1B | Full precision 1B model from Meta | 1B | ~2.5 GB |
| ExecuTorch | LLM | LLaMA 3.2 1B SpinQuant | 1B model with SpinQuant quantization (recommended) | 1B | ~1.3 GB |
| ExecuTorch | LLM | LLaMA 3.2 3B | Full precision 3B model from Meta | 3B | ~6.5 GB |
| ExecuTorch | LLM | LLaMA 3.2 3B SpinQuant | 3B model with SpinQuant quantization | 3B | ~3.2 GB |
| ExecuTorch | LLM | Qwen 3 0.6B | Lightweight Qwen 3 model with thinking capability | 0.6B | ~1.2 GB |
| ExecuTorch | LLM | Qwen 3 1.7B | Larger Qwen 3 model with thinking capability | 1.7B | ~3.4 GB |
| ExecuTorch | LLM | Qwen 3 4B | Large Qwen 3 model with thinking capability | 4B | ~8.0 GB |
| ExecuTorch | LLM | Qwen 2.5 0.5B | Ultra-lightweight Qwen 2.5 model | 0.5B | ~1.0 GB |
| ExecuTorch | LLM | Qwen 2.5 1.5B | Lightweight Qwen 2.5 model | 1.5B | ~3.0 GB |
| ExecuTorch | LLM | Qwen 2.5 3B | Mid-size Qwen 2.5 model | 3B | ~6.0 GB |
| ExecuTorch | LLM | Hammer 2.1 0.5B | Ultra-lightweight model optimized for function calling | 0.5B | ~1.0 GB |
| ExecuTorch | LLM | Hammer 2.1 1.5B | Lightweight model optimized for function calling | 1.5B | ~3.0 GB |
| ExecuTorch | LLM | Hammer 2.1 3B | Mid-size model optimized for function calling | 3B | ~6.0 GB |
| ExecuTorch | LLM | SmolLM 2 135M | Ultra-lightweight model for simple tasks | 135M | ~270 MB |
| ExecuTorch | LLM | SmolLM 2 360M | Small but capable model | 360M | ~720 MB |
| ExecuTorch | LLM | SmolLM 2 1.7B | Larger SmolLM model with better capabilities | 1.7B | ~3.4 GB |
| ExecuTorch | LLM | Phi 4 Mini 4B | Microsoft's Phi 4 Mini model | 4B | ~8.0 GB |
| ExecuTorch | Embedding | All MiniLM L6 V2 | Lightweight text embedding model for semantic search and RAG (see the sketch after the table) | 22M | ~45 MB |
| Llama.cpp | LLM | Qwen 3 0.6B GGUF | Lightweight Qwen 3 model for llama.cpp with thinking capability | 0.6B | ~0.4 GB |
| Llama.cpp | Embedding | Nomic Embed Text v1.5 GGUF | Text embedding model for RAG using llama.cpp | 137M | ~100 MB |
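
The two embedding models above (All MiniLM L6 V2 and Nomic Embed Text v1.5) map text to fixed-length vectors, and semantic search then ranks documents by how similar their vectors are to the query's. The sketch below is illustrative only: `embed` is a hypothetical function standing in for whichever runtime (ExecuTorch or llama.cpp) you load the embedding model with; the cosine-similarity ranking itself is runtime-agnostic.

```typescript
// Illustrative sketch: semantic search over text embeddings.
// `embed` is a hypothetical function returning the embedding vector for a
// string -- wire it to whichever embedding model/runtime you actually use.
type Embed = (text: string) => Promise<number[]>;

// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank documents by similarity to the query -- the core retrieval step
// behind on-device semantic search and RAG.
async function search(
  embed: Embed,
  query: string,
  documents: string[],
  topK = 3,
): Promise<{ text: string; score: number }[]> {
  const queryVector = await embed(query);
  const scored = await Promise.all(
    documents.map(async (text) => ({
      text,
      score: cosineSimilarity(queryVector, await embed(text)),
    })),
  );
  return scored.sort((a, b) => b.score - a.score).slice(0, topK);
}
```

In practice the document embeddings would be computed once at indexing time and cached, rather than recomputed on every query as in this sketch.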

Download these models directly in the app. Available on Android and the web.

Open App to Download