TuTu Engine Documentation

Everything you need to run AI locally and participate in the TuTu distributed supercomputer network.

Installation

macOS

$ curl -fsSL https://tutuengine.tech/install.sh | sh

Linux

$ curl -fsSL https://tutuengine.tech/install.sh | sh

Supports x86_64 and ARM64. The installer auto-detects your architecture.

Windows

PS> irm tutuengine.tech/install.ps1 | iex

Or install via WinGet:

PS> winget install tutu-network.tutu

Build from Source

$ git clone https://github.com/NikeGunn/tutu.git
$ cd tutu
$ go build -o tutu ./cmd/tutu
$ ./tutu --version

Requires Go 1.24+. No CGO dependencies.

Verify Installation

$ tutu --version
tutu version 1.0.0 (go1.24.0 linux/amd64)

Quick Start

Run your first AI model in 30 seconds:

# Start the TuTu daemon
$ tutu serve

# Run a model (auto-downloads if needed)
$ tutu run llama3.2

# Chat!
>>> What is the capital of France?
The capital of France is Paris.
Tip: TuTu automatically downloads models on first use. Run tutu pull llama3.2 to pre-download without starting a session.

Upgrading

Re-run the install command for your platform. TuTu handles migrations automatically. You can also check the GitHub Releases page for changelogs.

# macOS (Homebrew)
$ brew upgrade tutu

# Linux / macOS (script)
$ curl -fsSL https://tutuengine.tech/install.sh | sh

# Windows (PowerShell)
PS> irm tutuengine.tech/install.ps1 | iex

CLI Reference

TuTu provides a single binary with all commands built in. Run tutu help for a quick overview or tutu <command> --help for details.

Command Description
tutu run Run a model interactively
tutu pull Download a model without running
tutu list List downloaded models
tutu create Create a model from a TuTufile
tutu show Show model details & metadata
tutu rm Remove a downloaded model
tutu serve Start the API server daemon
tutu ps List running models
tutu stop Stop a running model

tutu run

Run a model interactively. Downloads automatically if not present locally.

$ tutu run <model> [flags]

# Examples
$ tutu run llama3.2               # interactive chat
$ tutu run llama3.2 --verbose     # show debug output
$ tutu run phi3 --nowordwrap      # disable word wrap

tutu pull

Download a model from the TuTu registry without starting a session.

$ tutu pull llama3.2
pulling manifest... done
pulling 8934d96d3f08... 100% ████████████ 750 MB
success

tutu list

List all locally downloaded models.

$ tutu list
NAME              SIZE     MODIFIED
llama3.2:latest   4.7 GB   2 hours ago
phi3:latest       2.3 GB   1 day ago
mistral:latest    4.1 GB   3 days ago

tutu create

Create a custom model from a TuTufile.

$ tutu create my-assistant -f ./TuTufile
transferring model data...
creating model layer... done
success

tutu show

Display model details including parameters, system prompt, and template.

$ tutu show llama3.2

tutu rm

Remove a downloaded model and free disk space.

$ tutu rm llama3.2 deleted 'llama3.2'

tutu serve

Start the TuTu API server. This runs as a background daemon.

$ tutu serve
TuTu is running on http://localhost:11434

tutu ps

List all currently running models.

$ tutu ps
NAME              SIZE     PROCESSOR   UNTIL
llama3.2:latest   4.7 GB   100% GPU    4 minutes from now

tutu stop

Stop a running model and release resources.

$ tutu stop llama3.2

REST API

TuTu exposes a REST API on http://localhost:11434 when the server is running. All endpoints accept and return JSON.

Method Endpoint Description
POST /api/generate Generate a completion
POST /api/chat Chat with a model
POST /api/embeddings Generate embeddings
POST /api/create Create a model
GET /api/tags List local models
POST /api/pull Pull a model
DELETE /api/delete Delete a model

Generate Completion

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
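When "stream" is set to true, local inference servers of this kind typically emit one JSON object per line, each carrying a partial "response" string and a final object with "done": true. Assuming TuTu follows that NDJSON convention (the field names here are an assumption, not confirmed by this document), a client can reassemble the full answer like this:

```python
import json

def collect_stream(lines):
    """Concatenate streamed /api/generate chunks into one string.

    Assumes one JSON object per line with a partial "response" field
    and "done": true on the final chunk (unverified NDJSON convention).
    """
    parts = []
    for line in lines:
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)

# Simulated stream for illustration:
sample = [
    '{"response": "The sky is blue", "done": false}',
    '{"response": " because of Rayleigh scattering.", "done": true}',
]
print(collect_stream(sample))
```

In a real client you would iterate over the HTTP response body line by line instead of a hard-coded list.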

Chat

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    { "role": "user", "content": "Hello!" }
  ],
  "stream": false
}'

OpenAI Compatibility

TuTu provides an OpenAI-compatible API at /v1/ so every SDK and tool that works with OpenAI works with TuTu — just change the base URL.

Python (OpenAI SDK)

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="tutu"  # any string works
)

resp = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(resp.choices[0].message.content)

JavaScript (fetch)

const resp = await fetch("http://localhost:11434/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "llama3.2",
    messages: [{ role: "user", content: "Hello!" }]
  })
});
const data = await resp.json();
console.log(data.choices[0].message.content);

TuTufile Syntax

A TuTufile defines a custom model — think of it as a Dockerfile for AI models. It lets you set a base model, system prompt, parameters, and more.

# TuTufile — Custom coding assistant
FROM llama3.2

SYSTEM """
You are a senior software engineer.
You write clean, well-documented code.
You explain your reasoning step by step.
"""

PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER num_ctx 4096
Instruction Description
FROM Base model to build from (required)
SYSTEM Set the system prompt
PARAMETER Set model parameters (temperature, top_p, etc.)
TEMPLATE Custom prompt template (Go template syntax)
ADAPTER Path to a LoRA adapter file
LICENSE Specify the model license
MESSAGE Add few-shot example messages
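The table above lists MESSAGE for few-shot examples, which the sample TuTufile does not use. A hypothetical sketch (the exact MESSAGE syntax is an assumption based on similar model-file formats; only the instruction names come from the table):

```
# Hypothetical TuTufile with few-shot MESSAGE examples
FROM llama3.2
SYSTEM "You answer in exactly one sentence."
MESSAGE user What is Go?
MESSAGE assistant Go is a statically typed, compiled language designed for simplicity.
PARAMETER temperature 0.3
```

Few-shot messages like these prime the model's tone and format before the first real user turn.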

Build and run your custom model:

$ tutu create my-coder -f ./TuTufile $ tutu run my-coder

Configuration

TuTu is configured via environment variables. No config files needed — sensible defaults for everything.

Variable Default Description
TUTU_HOST 127.0.0.1 API server bind address
TUTU_PORT 11434 API server port
TUTU_MODELS ~/.tutu/models Model storage directory
TUTU_GPU_LAYERS auto Number of GPU layers (auto-detects)
TUTU_NUM_PARALLEL 1 Max parallel requests
TUTU_MAX_LOADED 1 Max models loaded simultaneously
TUTU_KEEP_ALIVE 5m Idle model unload timeout
TUTU_DEBUG false Enable debug logging
Note: When running in production, set TUTU_HOST=0.0.0.0 to allow external connections. Be sure to configure your firewall appropriately.
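Putting the table together, a production deployment might look like the following (variable names from the table above; the specific values are illustrative, not recommendations):

```
# Bind externally, allow more concurrency, keep models warm longer
export TUTU_HOST=0.0.0.0
export TUTU_NUM_PARALLEL=4
export TUTU_KEEP_ALIVE=30m
tutu serve
```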

Distributed Computing Network

When your machine is idle, TuTu can optionally contribute compute to the global TuTu Network — the world's first truly distributed AI supercomputer. Every participating node makes AI inference faster and cheaper for the entire network.

How It Works

Credit System

The TuTu credit system powers the distributed network economy.

Coming Soon: The credit system and distributed network features are under active development. Local AI functionality is fully available today.

MCP Server (Model Context Protocol)

TuTu Engine implements a full MCP Gateway following the Model Context Protocol 2025-03-26 specification. MCP is an open standard for connecting AI models to external tools, data sources, and services — think of it as USB-C for AI.

How MCP Works

When you run tutu serve, the MCP endpoint is automatically available at /mcp. Any MCP-compatible AI client (Claude, ChatGPT, custom apps) can connect and use the tools and resources TuTu exposes.

# Initialize MCP session
curl -X POST http://localhost:11434/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
      "protocolVersion": "2025-03-26",
      "clientInfo": {"name": "my-app", "version": "1.0"}
    }
  }'

Available MCP Tools

Tool Description Parameters
tutu_run Run a model with a given prompt model, prompt
tutu_list List available local models None
tutu_pull Download a model from registry model
tutu_status Get system and model status None
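After initializing a session, a client invokes these tools via the standard JSON-RPC 2.0 tools/call method. A minimal sketch of building such a request for tutu_run (the envelope shape follows the MCP specification; the exact argument keys "model" and "prompt" come from the table above, but their spelling in the wire format is an assumption):

```python
import json

def tools_call(request_id, tool, arguments):
    """Build a JSON-RPC 2.0 tools/call request body for the /mcp endpoint."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }

req = tools_call(2, "tutu_run", {"model": "llama3.2", "prompt": "Hello!"})
print(json.dumps(req, indent=2))
```

The resulting dict can be POSTed to http://localhost:11434/mcp with Content-Type: application/json, as in the initialize example above.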

Available MCP Resources

Resource URI Description
tutu://models List of all installed models
tutu://status System status and health info
tutu://credits Credit balance and earnings

Enterprise MCP Use Cases

TuTu's MCP Gateway enables powerful enterprise AI integrations:

Use Case How It Works
AI Coding Assistants Connect your IDE's AI (Cursor, Copilot, Continue.dev) to local models via MCP tools. AI can run models, check status, and manage models — all through the standard MCP protocol.
Customer Support Bots Give AI agents access to local inference without cloud API costs. Use MCP resources to monitor model availability and queue depth.
Data Analysis Pipelines Embed TuTu as the AI layer in your data pipeline. MCP tools let orchestrators (LangChain, AutoGen) call models with structured inputs.
DevOps & Automation AI agents use MCP to run inference tasks, pull models, and check system health — all with rate limiting and SLA guarantees.
Multi-Model Orchestration Use the tutu_list and tutu_run tools to dynamically select and route to the best model for each task.

MCP Endpoints

Method Endpoint Description
POST /mcp MCP JSON-RPC 2.0 endpoint (Streamable HTTP transport)

Supported MCP Methods

Method Description
initialize Initialize MCP session, negotiate capabilities
tools/list List available MCP tools
tools/call Execute an MCP tool
resources/list List available MCP resources
resources/read Read an MCP resource
prompts/list List available prompt templates

SLA Tiers

TuTu's MCP Gateway supports 4 SLA tiers for different usage levels:

Tier Rate Limit Burst Latency Target Price
Free 10 req/min 20 Best effort $0
Pro 100 req/min 200 < 500ms Credits
Business 1,000 req/min 2,000 < 200ms Credits
Enterprise 10,000 req/min 20,000 < 100ms Credits
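Rate limits with a separate burst allowance, as in the table above, are typically enforced with a token bucket: tokens refill at the sustained rate and the bucket capacity equals the burst. A minimal sketch for the Free tier (10 req/min, burst 20); this is illustrative, not TuTu's actual gateway code:

```python
class TokenBucket:
    """Illustrative token-bucket limiter for the Free tier
    (10 req/min sustained, burst of 20)."""

    def __init__(self, rate_per_min=10, burst=20):
        self.rate = rate_per_min / 60.0   # tokens refilled per second
        self.capacity = burst
        self.tokens = float(burst)        # start full: burst available immediately
        self.last = 0.0

    def allow(self, now):
        # Refill proportionally to elapsed time, capped at burst capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

bucket = TokenBucket()
# A burst of 21 requests at t=0: the first 20 pass, the 21st is rejected.
results = [bucket.allow(0.0) for _ in range(21)]
print(results.count(True))  # 20
```

The higher tiers differ only in the constructor arguments (e.g. rate_per_min=100, burst=200 for Pro).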

Local Fine-Tuning

Fine-tune models on your own hardware using TuTufile. Define adapters, system prompts, and training parameters in one declarative file.

# TuTufile for fine-tuned customer support model
FROM llama3
SYSTEM "You are an expert customer support agent for Acme Corp. Be helpful, concise, and professional."
ADAPTER ./my-lora-weights
PARAMETER temperature 0.7
PARAMETER top_p 0.9
$ tutu create support-bot -f ./TuTufile
$ tutu run support-bot
Cost: Local fine-tuning is completely free. You use your own hardware and compute.

Distributed Fine-Tuning

Submit fine-tuning jobs to the TuTu network. Tasks are distributed across capable peers and you pay with credits.

$ tutu agent finetune \
    --base-model llama3 \
    --dataset ./training-data.jsonl \
    --method lora \
    --epochs 3 \
    --budget 100   # credits

How Distributed Fine-Tuning Works

Fine-Tuning Methods & Costs

Method VRAM Required Speed Quality Credit Cost
Full Fine-Tune 48GB+ Slow Best High
LoRA 8GB+ Fast Great Medium
QLoRA 4GB+ Fast Good Low
Adapter Merging 4GB+ Instant Good Free
Tip: You can earn credits by contributing your GPU time for other users' fine-tuning jobs. Run tutu agent join to start earning.

Engagement System

TuTu Engine includes a full gamification system to reward and retain contributors.

Level System (L1–L100)

Level Range Title Perks
1–10 Newcomer Basic access, learning quests
11–25 Contributor Priority queue, badge display
26–50 Builder Beta features, voting rights
51–75 Expert Governance participation, bonus multipliers
76–100 Legend Network council, custom badges, max multipliers

Achievements (25+)

Unlock achievements by hitting milestones. Each achievement rewards credits and XP.

Achievement Requirement Reward
First Run Run your first model 50 credits
Network Pioneer Join distributed network 100 credits
Week Warrior 7-day contribution streak 200 credits
Diamond Contributor 10,000 GPU hours 5,000 credits
Fine-Tune Master Complete 50 fine-tuning jobs 2,500 credits

Streak Bonuses

Streak Earning Bonus
7 days 1.1× earnings
30 days 1.25× earnings
90 days 1.5× earnings
365 days 2.0× earnings
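The streak table maps cleanly to a lookup from highest threshold down. A small sketch (the 1.0× base multiplier for streaks under 7 days is an assumption; the tier thresholds and bonuses come from the table above):

```python
def streak_multiplier(days):
    """Return the earning bonus for a contribution streak, per the
    streak table: 7d -> 1.1x, 30d -> 1.25x, 90d -> 1.5x, 365d -> 2.0x."""
    tiers = [(365, 2.0), (90, 1.5), (30, 1.25), (7, 1.1)]
    for threshold, bonus in tiers:
        if days >= threshold:
            return bonus
    return 1.0  # assumed base rate below a 7-day streak

print(streak_multiplier(45))  # 1.25
```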

Check your progress anytime:

$ tutu progress