TuTu Engine Documentation
Everything you need to run AI locally and participate in the TuTu distributed supercomputer network.
Installation
macOS
Linux
Supports x86_64 and ARM64. The installer auto-detects your architecture.
Windows
Or install via WinGet:
Build from Source
Requires Go 1.24+. No CGO dependencies.
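A typical source build looks like the following. The repository path is an assumption (substitute the real one); because there are no CGO dependencies, a plain Go toolchain is all you need:

```shell
# Hypothetical repository path — replace with the actual TuTu repo.
git clone https://github.com/tutu-ai/tutu.git
cd tutu
go build -o tutu .            # pure Go, no CGO toolchain required
sudo install tutu /usr/local/bin/
```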
Verify Installation
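A quick sanity check, sketched below. The `--version` flag is assumed; `tutu help` is documented in the CLI Reference:

```shell
tutu --version   # assumed flag; prints the installed version
tutu help        # lists all available commands
```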
Quick Start
Run your first AI model in 30 seconds:
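A minimal first session, using the `llama3.2` model mentioned below (the interactive prompt shown is illustrative):

```shell
# Downloads llama3.2 on first use, then drops into an interactive session
tutu run llama3.2
>>> Why is the sky blue?
```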
Tip: run `tutu pull llama3.2` to pre-download a model without starting a session.
Upgrading
Re-run the install command for your platform. TuTu handles migrations automatically. You can also check the GitHub Releases page for changelogs.
CLI Reference
TuTu provides a single binary with all commands built in. Run `tutu help` for a quick overview or `tutu <command> --help` for details.
| Command | Description |
|---|---|
| `tutu run` | Run a model interactively |
| `tutu pull` | Download a model without running |
| `tutu list` | List downloaded models |
| `tutu create` | Create a model from a TuTufile |
| `tutu show` | Show model details & metadata |
| `tutu rm` | Remove a downloaded model |
| `tutu serve` | Start the API server daemon |
| `tutu ps` | List running models |
| `tutu stop` | Stop a running model |
tutu run
Run a model interactively. Downloads automatically if not present locally.
tutu pull
Download a model from the TuTu registry without starting a session.
tutu list
List all locally downloaded models.
tutu create
Create a custom model from a TuTufile.
tutu show
Display model details including parameters, system prompt, and template.
tutu rm
Remove a downloaded model and free disk space.
tutu serve
Start the TuTu API server. This runs as a background daemon.
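Starting the daemon and confirming it is up can be sketched as follows; `/api/tags` is the list-models endpoint from the REST API section below:

```shell
tutu serve &
curl http://localhost:11434/api/tags   # confirm the daemon is responding
```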
tutu ps
List all currently running models.
tutu stop
Stop a running model and release resources.
REST API
TuTu exposes a REST API on http://localhost:11434 when the server is running. All endpoints accept
and return JSON.
| Method | Endpoint | Description |
|---|---|---|
| POST | `/api/generate` | Generate a completion |
| POST | `/api/chat` | Chat with a model |
| POST | `/api/embeddings` | Generate embeddings |
| POST | `/api/create` | Create a model |
| GET | `/api/tags` | List local models |
| POST | `/api/pull` | Pull a model |
| DELETE | `/api/delete` | Delete a model |
Generate Completion
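A sketch of a completion request. The `model`, `prompt`, and `stream` fields are assumed request parameters:

```shell
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```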
Chat
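A sketch of a chat request; the `messages` array of role/content objects is an assumed request shape:

```shell
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    {"role": "user", "content": "Hello!"}
  ],
  "stream": false
}'
```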
OpenAI Compatibility
TuTu provides an OpenAI-compatible API at /v1/ so every SDK and tool that works with OpenAI works
with TuTu — just change the base URL.
Python (OpenAI SDK)
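A minimal sketch using the official OpenAI SDK pointed at the local server. The SDK requires an `api_key`, but TuTu is assumed to ignore its value; `llama3.2` must already be pulled:

```python
from openai import OpenAI

# Point the SDK at TuTu's OpenAI-compatible endpoint
client = OpenAI(base_url="http://localhost:11434/v1/", api_key="tutu")

response = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```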
JavaScript (fetch)
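The same call with plain `fetch`, no SDK required (response shape assumed to match the OpenAI chat completions format):

```javascript
const response = await fetch("http://localhost:11434/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "llama3.2",
    messages: [{ role: "user", content: "Hello!" }],
  }),
});
const data = await response.json();
console.log(data.choices[0].message.content);
```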
TuTufile Syntax
A TuTufile defines a custom model — think of it as a Dockerfile for AI models. It lets you set a base model, system prompt, parameters, and more.
| Instruction | Description |
|---|---|
| `FROM` | Base model to build from (required) |
| `SYSTEM` | Set the system prompt |
| `PARAMETER` | Set model parameters (temperature, top_p, etc.) |
| `TEMPLATE` | Custom prompt template (Go template syntax) |
| `ADAPTER` | Path to a LoRA adapter file |
| `LICENSE` | Specify the model license |
| `MESSAGE` | Add few-shot example messages |
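A sketch of a TuTufile using the instructions above; the exact argument syntax (e.g. the `MESSAGE role text` form) is assumed:

```
FROM llama3.2
SYSTEM You are a concise assistant that answers in bullet points.
PARAMETER temperature 0.7
PARAMETER top_p 0.9
MESSAGE user What is TuTu?
MESSAGE assistant - A local AI engine with an optional distributed network.
```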
Build and run your custom model:
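Assuming the TuTufile above is saved in the current directory (the `-f` flag is an assumption; check `tutu create --help`):

```shell
tutu create mymodel -f TuTufile   # -f flag assumed
tutu run mymodel
```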
Configuration
TuTu is configured via environment variables. No config files needed — sensible defaults for everything.
| Variable | Default | Description |
|---|---|---|
| `TUTU_HOST` | `127.0.0.1` | API server bind address |
| `TUTU_PORT` | `11434` | API server port |
| `TUTU_MODELS` | `~/.tutu/models` | Model storage directory |
| `TUTU_GPU_LAYERS` | `auto` | Number of GPU layers (auto-detects) |
| `TUTU_NUM_PARALLEL` | `1` | Max parallel requests |
| `TUTU_MAX_LOADED` | `1` | Max models loaded simultaneously |
| `TUTU_KEEP_ALIVE` | `5m` | Idle model unload timeout |
| `TUTU_DEBUG` | `false` | Enable debug logging |
Set `TUTU_HOST=0.0.0.0` to allow external connections. Be sure to configure your firewall appropriately.
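Since configuration is all environment variables, settings can be applied inline when starting the server, for example:

```shell
# Expose the API on all interfaces and keep idle models loaded for 30 minutes
TUTU_HOST=0.0.0.0 TUTU_KEEP_ALIVE=30m tutu serve
```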
Distributed Computing Network
When your machine is idle, TuTu can optionally contribute compute to the global TuTu Network — the world's first truly distributed AI supercomputer. Every participating node makes AI inference faster and cheaper for the entire network.
How It Works
- Idle Detection: TuTu monitors your system and only uses compute when your machine is genuinely idle.
- Privacy: Your data never leaves your machine. Network tasks are sandboxed and isolated.
- Opt-in: Network participation is completely optional. Local AI works perfectly without it.
- Reward: You earn credits for every compute cycle you contribute.
Credit System
The TuTu credit system powers the distributed network economy:
- Earn credits by contributing idle GPU time to the network.
- Spend credits to access more powerful models or faster inference on the network.
- Every new user receives 500 free credits, plus a 30-day grace period before any prompt to contribute.
- Local AI is always free — credits only apply to network-powered features.
MCP Server (Model Context Protocol)
TuTu Engine implements a full MCP Gateway following the Model Context Protocol 2025-03-26 specification. MCP is an open standard for connecting AI models to external tools, data sources, and services — think of it as USB-C for AI.
How MCP Works
When you run `tutu serve`, the MCP endpoint is automatically available at `/mcp`. Any MCP-compatible AI client (Claude, ChatGPT, custom apps) can connect and use the tools and resources TuTu exposes.
Available MCP Tools
| Tool | Description | Parameters |
|---|---|---|
| `tutu_run` | Run a model with a given prompt | `model`, `prompt` |
| `tutu_list` | List available local models | None |
| `tutu_pull` | Download a model from registry | `model` |
| `tutu_status` | Get system and model status | None |
Available MCP Resources
| Resource URI | Description |
|---|---|
| `tutu://models` | List of all installed models |
| `tutu://status` | System status and health info |
| `tutu://credits` | Credit balance and earnings |
Enterprise MCP Use Cases
TuTu's MCP Gateway enables powerful enterprise AI integrations:
| Use Case | How It Works |
|---|---|
| AI Coding Assistants | Connect your IDE's AI (Cursor, Copilot, Continue.dev) to local models via MCP tools. AI can run models, check status, and manage models — all through the standard MCP protocol. |
| Customer Support Bots | Give AI agents access to local inference without cloud API costs. Use MCP resources to monitor model availability and queue depth. |
| Data Analysis Pipelines | Embed TuTu as the AI layer in your data pipeline. MCP tools let orchestrators (LangChain, AutoGen) call models with structured inputs. |
| DevOps & Automation | AI agents use MCP to run inference tasks, pull models, and check system health — all with rate limiting and SLA guarantees. |
| Multi-Model Orchestration | Use the `tutu_list` and `tutu_run` tools to dynamically select and route to the best model for each task. |
MCP Endpoints
| Method | Endpoint | Description |
|---|---|---|
| POST | `/mcp` | MCP JSON-RPC 2.0 endpoint (Streamable HTTP transport) |
Supported MCP Methods
| Method | Description |
|---|---|
| `initialize` | Initialize MCP session, negotiate capabilities |
| `tools/list` | List available MCP tools |
| `tools/call` | Execute an MCP tool |
| `resources/list` | List available MCP resources |
| `resources/read` | Read an MCP resource |
| `prompts/list` | List available prompt templates |
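A `tools/call` invocation of the `tutu_run` tool can be sketched as a JSON-RPC 2.0 request; the envelope follows the MCP specification, while the argument values are illustrative:

```shell
curl http://localhost:11434/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
      "name": "tutu_run",
      "arguments": {"model": "llama3.2", "prompt": "Summarize MCP in one line."}
    }
  }'
```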
SLA Tiers
TuTu's MCP Gateway supports 4 SLA tiers for different usage levels:
| Tier | Rate Limit | Burst | Latency Target | Price |
|---|---|---|---|---|
| Free | 10 req/min | 20 | Best effort | $0 |
| Pro | 100 req/min | 200 | < 500ms | Credits |
| Business | 1,000 req/min | 2,000 | < 200ms | Credits |
| Enterprise | 10,000 req/min | 20,000 | < 100ms | Credits |
Local Fine-Tuning
Fine-tune models on your own hardware using TuTufile. Define adapters, system prompts, and training parameters in one declarative file.
Distributed Fine-Tuning
Submit fine-tuning jobs to the TuTu network. Tasks are distributed across capable peers and you pay with credits.
How Distributed Fine-Tuning Works
- Submit a Job: Define your base model, training dataset, method (LoRA/QLoRA), and credit budget.
- Network Distribution: TuTu's ML scheduler finds capable peers with the right GPU hardware and distributes training shards.
- Progress Tracking: Monitor training progress in real time via `tutu agent status`.
- Result Delivery: The trained adapter weights are returned to you. Merge them into your model with `tutu create`.
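The workflow above might look like this end to end. Only `tutu agent status` and `tutu create` are documented here; the `submit` subcommand and its flags are hypothetical placeholders:

```shell
# Hypothetical job submission — subcommand and flags are illustrative only
tutu agent submit --base llama3.2 --method qlora --data ./train.jsonl --budget 500
tutu agent status              # follow training progress in real time
tutu create mymodel-tuned      # merge the returned adapter via a TuTufile ADAPTER line
```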
Fine-Tuning Methods & Costs
| Method | VRAM Required | Speed | Quality | Credit Cost |
|---|---|---|---|---|
| Full Fine-Tune | 48GB+ | Slow | Best | High |
| LoRA | 8GB+ | Fast | Great | Medium |
| QLoRA | 4GB+ | Fast | Good | Low |
| Adapter Merging | 4GB+ | Instant | Good | Free |
Run `tutu agent join` to start earning.
Engagement System
TuTu Engine includes a full gamification system to reward and retain contributors.
Level System (L1–L100)
| Level Range | Title | Perks |
|---|---|---|
| 1–10 | Newcomer | Basic access, learning quests |
| 11–25 | Contributor | Priority queue, badge display |
| 26–50 | Builder | Beta features, voting rights |
| 51–75 | Expert | Governance participation, bonus multipliers |
| 76–100 | Legend | Network council, custom badges, max multipliers |
Achievements (25+)
Unlock achievements by hitting milestones. Each achievement rewards credits and XP.
| Achievement | Requirement | Reward |
|---|---|---|
| First Run | Run your first model | 50 credits |
| Network Pioneer | Join distributed network | 100 credits |
| Week Warrior | 7-day contribution streak | 200 credits |
| Diamond Contributor | 10,000 GPU hours | 5,000 credits |
| Fine-Tune Master | Complete 50 fine-tuning jobs | 2,500 credits |
Streak Bonuses
| Streak | Earning Bonus |
|---|---|
| 7 days | 1.1× earnings |
| 30 days | 1.25× earnings |
| 90 days | 1.5× earnings |
| 365 days | 2.0× earnings |
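The streak multipliers above reduce to a highest-tier-first lookup, sketched here with thresholds taken straight from the table (the function name is illustrative):

```python
# Map a contribution streak (in days) to the earning multiplier from the table.
# Tiers are checked from longest streak to shortest.
def streak_multiplier(days: int) -> float:
    tiers = [(365, 2.0), (90, 1.5), (30, 1.25), (7, 1.1)]
    for threshold, bonus in tiers:
        if days >= threshold:
            return bonus
    return 1.0  # no bonus below a 7-day streak

print(streak_multiplier(45))  # → 1.25 (30-day tier applies)
```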
Check your progress anytime:
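The exact command is not documented above; `tutu agent status` (from the fine-tuning section) is assumed to include engagement details:

```shell
tutu agent status   # assumed to show level, XP, achievements, and current streak
```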