TuTu Engine Documentation

Everything you need to run AI locally and participate in the TuTu distributed supercomputer network.

Installation

macOS

$ curl -fsSL https://tutuengine.tech/install.sh | sh

Linux

$ curl -fsSL https://tutuengine.tech/install.sh | sh

Supports x86_64 and ARM64. The installer auto-detects your architecture.

Windows

PS> irm tutuengine.tech/install.ps1 | iex

Or install via WinGet:

PS> winget install tutu-network.tutu

Build from Source

$ git clone https://github.com/NikeGunn/tutu.git
$ cd tutu
$ go build -o tutu ./cmd/tutu
$ ./tutu --version

Requires Go 1.24+. No CGO dependencies.

Verify Installation

$ tutu --version
tutu version 1.0.0 (go1.24.0 linux/amd64)

Quick Start

Run your first AI model in 30 seconds:

# Start the TuTu daemon
$ tutu serve

# Run a model (auto-downloads if needed)
$ tutu run llama3.2

# Chat!
>>> What is the capital of France?
The capital of France is Paris.
Tip: TuTu automatically downloads models on first use. Run tutu pull llama3.2 to pre-download without starting a session.

Upgrading

Re-run the install command for your platform. TuTu handles migrations automatically. You can also check the GitHub Releases page for changelogs.

# macOS (Homebrew)
$ brew upgrade tutu

# Linux / macOS (script)
$ curl -fsSL https://tutuengine.tech/install.sh | sh

# Windows (PowerShell)
PS> irm tutuengine.tech/install.ps1 | iex

CLI Reference

TuTu provides a single binary with all commands built in. Run tutu help for a quick overview or tutu <command> --help for details.

Command Description
tutu run Run a model interactively
tutu pull Download a model without running
tutu list List downloaded models
tutu create Create a model from a TuTufile
tutu show Show model details & metadata
tutu rm Remove a downloaded model
tutu serve Start the API server daemon
tutu ps List running models
tutu stop Stop a running model

tutu run

Run a model interactively. Downloads automatically if not present locally.

$ tutu run <model> [flags]

# Examples
$ tutu run llama3.2               # interactive chat
$ tutu run llama3.2 --verbose     # show debug output
$ tutu run phi3 --nowordwrap      # disable word wrap

tutu pull

Download a model from the TuTu registry without starting a session.

$ tutu pull llama3.2
pulling manifest... done
pulling 8934d96d3f08... 100% ████████████ 750 MB
success

tutu list

List all locally downloaded models.

$ tutu list
NAME              SIZE     MODIFIED
llama3.2:latest   4.7 GB   2 hours ago
phi3:latest       2.3 GB   1 day ago
mistral:latest    4.1 GB   3 days ago

tutu create

Create a custom model from a TuTufile.

$ tutu create my-assistant -f ./TuTufile
transferring model data...
creating model layer... done
success

tutu show

Display model details including parameters, system prompt, and template.

$ tutu show llama3.2

tutu rm

Remove a downloaded model and free disk space.

$ tutu rm llama3.2 deleted 'llama3.2'

tutu serve

Start the TuTu API server. This runs as a background daemon.

$ tutu serve
TuTu is running on http://localhost:11434

tutu ps

List all currently running models.

$ tutu ps
NAME              SIZE     PROCESSOR   UNTIL
llama3.2:latest   4.7 GB   100% GPU    4 minutes from now

tutu stop

Stop a running model and release resources.

$ tutu stop llama3.2

REST API

TuTu exposes a REST API on http://localhost:11434 when the server is running. All endpoints accept and return JSON.

Method Endpoint Description
POST /api/generate Generate a completion
POST /api/chat Chat with a model
POST /api/embeddings Generate embeddings
POST /api/create Create a model
GET /api/tags List local models
POST /api/pull Pull a model
DELETE /api/delete Delete a model

Generate Completion

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
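When "stream" is set to true, local inference servers of this kind typically emit one JSON object per line, each carrying a partial "response" string and a final object with "done": true. Assuming TuTu follows that NDJSON convention (the field names here are an assumption, not confirmed by this document), a client can reassemble the full answer like this:

```python
import json

def collect_stream(lines):
    """Concatenate streamed /api/generate chunks into one string.

    Assumes one JSON object per line with a partial "response" field
    and "done": true on the final chunk (unverified NDJSON convention).
    """
    parts = []
    for line in lines:
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)

# Simulated stream for illustration:
sample = [
    '{"response": "The sky is blue", "done": false}',
    '{"response": " because of Rayleigh scattering.", "done": true}',
]
print(collect_stream(sample))
```

In a real client you would iterate over the HTTP response body line by line instead of a hard-coded list.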

Chat

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    { "role": "user", "content": "Hello!" }
  ],
  "stream": false
}'

OpenAI Compatibility

TuTu provides an OpenAI-compatible API at /v1/ so every SDK and tool that works with OpenAI works with TuTu — just change the base URL.

Python (OpenAI SDK)

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="tutu"  # any string works
)

resp = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(resp.choices[0].message.content)

JavaScript (fetch)

const resp = await fetch("http://localhost:11434/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "llama3.2",
    messages: [{ role: "user", content: "Hello!" }]
  })
});
const data = await resp.json();
console.log(data.choices[0].message.content);

TuTufile Syntax

A TuTufile defines a custom model — think of it as a Dockerfile for AI models. It lets you set a base model, system prompt, parameters, and more.

# TuTufile — Custom coding assistant
FROM llama3.2

SYSTEM """
You are a senior software engineer.
You write clean, well-documented code.
You explain your reasoning step by step.
"""

PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER num_ctx 4096
Instruction Description
FROM Base model to build from (required)
SYSTEM Set the system prompt
PARAMETER Set model parameters (temperature, top_p, etc.)
TEMPLATE Custom prompt template (Go template syntax)
ADAPTER Path to a LoRA adapter file
LICENSE Specify the model license
MESSAGE Add few-shot example messages
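The table above lists MESSAGE for few-shot examples, which the sample TuTufile does not use. A hypothetical sketch (the exact MESSAGE syntax is an assumption based on similar model-file formats; only the instruction names come from the table):

```
# Hypothetical TuTufile with few-shot MESSAGE examples
FROM llama3.2
SYSTEM "You answer in exactly one sentence."
MESSAGE user What is Go?
MESSAGE assistant Go is a statically typed, compiled language designed for simplicity.
PARAMETER temperature 0.3
```

Few-shot messages like these prime the model's tone and format before the first real user turn.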

Build and run your custom model:

$ tutu create my-coder -f ./TuTufile $ tutu run my-coder

Configuration

TuTu is configured via environment variables. No config files needed — sensible defaults for everything.

Variable Default Description
TUTU_HOST 127.0.0.1 API server bind address
TUTU_PORT 11434 API server port
TUTU_MODELS ~/.tutu/models Model storage directory
TUTU_GPU_LAYERS auto Number of GPU layers (auto-detects)
TUTU_NUM_PARALLEL 1 Max parallel requests
TUTU_MAX_LOADED 1 Max models loaded simultaneously
TUTU_KEEP_ALIVE 5m Idle model unload timeout
TUTU_DEBUG false Enable debug logging
Note: When running in production, set TUTU_HOST=0.0.0.0 to allow external connections. Be sure to configure your firewall appropriately.
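Putting the table together, a production deployment might look like the following (variable names from the table above; the specific values are illustrative, not recommendations):

```
# Bind externally, allow more concurrency, keep models warm longer
export TUTU_HOST=0.0.0.0
export TUTU_NUM_PARALLEL=4
export TUTU_KEEP_ALIVE=30m
tutu serve
```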

Distributed Computing Network

When your machine is idle, TuTu can optionally contribute compute to the global TuTu Network — the world's first truly distributed AI supercomputer. Every participating node makes AI inference faster and cheaper for the entire network.

How It Works

Credit System

The TuTu credit system powers the distributed network economy.

Coming Soon: The credit system and distributed network features are under active development. Local AI functionality is fully available today.

MCP Server (Model Context Protocol)

TuTu Engine implements a full MCP Gateway following the Model Context Protocol 2025-03-26 specification. MCP is an open standard for connecting AI models to external tools, data sources, and services — think of it as USB-C for AI.

How MCP Works

When you run tutu serve, the MCP endpoint is automatically available at /mcp. Any MCP-compatible AI client (Claude, ChatGPT, custom apps) can connect and use the tools and resources TuTu exposes.

# Initialize MCP session
curl -X POST http://localhost:11434/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
      "protocolVersion": "2025-03-26",
      "clientInfo": {"name": "my-app", "version": "1.0"}
    }
  }'

Available MCP Tools

Tool Description Parameters
tutu_run Run a model with a given prompt model, prompt
tutu_list List available local models None
tutu_pull Download a model from registry model
tutu_status Get system and model status None
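After initializing a session, a client invokes these tools via the standard JSON-RPC 2.0 tools/call method. A minimal sketch of building such a request for tutu_run (the envelope shape follows the MCP specification; the exact argument keys "model" and "prompt" come from the table above, but their spelling in the wire format is an assumption):

```python
import json

def tools_call(request_id, tool, arguments):
    """Build a JSON-RPC 2.0 tools/call request body for the /mcp endpoint."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }

req = tools_call(2, "tutu_run", {"model": "llama3.2", "prompt": "Hello!"})
print(json.dumps(req, indent=2))
```

The resulting dict can be POSTed to http://localhost:11434/mcp with Content-Type: application/json, as in the initialize example above.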

Available MCP Resources

Resource URI Description
tutu://models List of all installed models
tutu://status System status and health info
tutu://credits Credit balance and earnings

Enterprise MCP Use Cases

TuTu's MCP Gateway enables powerful enterprise AI integrations:

Use Case How It Works
AI Coding Assistants Connect your IDE's AI (Cursor, Copilot, Continue.dev) to local models via MCP tools. AI can run models, check status, and manage models — all through the standard MCP protocol.
Customer Support Bots Give AI agents access to local inference without cloud API costs. Use MCP resources to monitor model availability and queue depth.
Data Analysis Pipelines Embed TuTu as the AI layer in your data pipeline. MCP tools let orchestrators (LangChain, AutoGen) call models with structured inputs.
DevOps & Automation AI agents use MCP to run inference tasks, pull models, and check system health — all with rate limiting and SLA guarantees.
Multi-Model Orchestration Use the tutu_list and tutu_run tools to dynamically select and route to the best model for each task.

MCP Endpoints

Method Endpoint Description
POST /mcp MCP JSON-RPC 2.0 endpoint (Streamable HTTP transport)

Supported MCP Methods

Method Description
initialize Initialize MCP session, negotiate capabilities
tools/list List available MCP tools
tools/call Execute an MCP tool
resources/list List available MCP resources
resources/read Read an MCP resource
prompts/list List available prompt templates

SLA Tiers

TuTu's MCP Gateway supports 4 SLA tiers for different usage levels:

Tier Rate Limit Burst Latency Target Price
Free 10 req/min 20 Best effort $0
Pro 100 req/min 200 < 500ms Credits
Business 1,000 req/min 2,000 < 200ms Credits
Enterprise 10,000 req/min 20,000 < 100ms Credits
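Rate limits with a separate burst allowance, as in the table above, are typically enforced with a token bucket: tokens refill at the sustained rate and the bucket capacity equals the burst. A minimal sketch for the Free tier (10 req/min, burst 20); this is illustrative, not TuTu's actual gateway code:

```python
class TokenBucket:
    """Illustrative token-bucket limiter for the Free tier
    (10 req/min sustained, burst of 20)."""

    def __init__(self, rate_per_min=10, burst=20):
        self.rate = rate_per_min / 60.0   # tokens refilled per second
        self.capacity = burst
        self.tokens = float(burst)        # start full: burst available immediately
        self.last = 0.0

    def allow(self, now):
        # Refill proportionally to elapsed time, capped at burst capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

bucket = TokenBucket()
# A burst of 21 requests at t=0: the first 20 pass, the 21st is rejected.
results = [bucket.allow(0.0) for _ in range(21)]
print(results.count(True))  # 20
```

The higher tiers differ only in the constructor arguments (e.g. rate_per_min=100, burst=200 for Pro).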

Local Fine-Tuning

Fine-tune models on your own hardware using TuTufile. Define adapters, system prompts, and training parameters in one declarative file.

# TuTufile for fine-tuned customer support model
FROM llama3
SYSTEM "You are an expert customer support agent for Acme Corp. Be helpful, concise, and professional."
ADAPTER ./my-lora-weights
PARAMETER temperature 0.7
PARAMETER top_p 0.9
$ tutu create support-bot -f ./TuTufile
$ tutu run support-bot
Cost: Local fine-tuning is completely free. You use your own hardware and compute.

Distributed Fine-Tuning

Submit fine-tuning jobs to the TuTu network. Tasks are distributed across capable peers and you pay with credits.

$ tutu agent finetune \
    --base-model llama3 \
    --dataset ./training-data.jsonl \
    --method lora \
    --epochs 3 \
    --budget 100   # credits

How Distributed Fine-Tuning Works

Fine-Tuning Methods & Costs

Method VRAM Required Speed Quality Credit Cost
Full Fine-Tune 48GB+ Slow Best High
LoRA 8GB+ Fast Great Medium
QLoRA 4GB+ Fast Good Low
Adapter Merging 4GB+ Instant Good Free
Tip: You can earn credits by contributing your GPU time for other users' fine-tuning jobs. Run tutu agent join to start earning.

Engagement System

TuTu Engine includes a full gamification system to reward and retain contributors.

Level System (L1–L100)

Level Range Title Perks
1–10 Newcomer Basic access, learning quests
11–25 Contributor Priority queue, badge display
26–50 Builder Beta features, voting rights
51–75 Expert Governance participation, bonus multipliers
76–100 Legend Network council, custom badges, max multipliers

Achievements (25+)

Unlock achievements by hitting milestones. Each achievement rewards credits and XP.

Achievement Requirement Reward
First Run Run your first model 50 credits
Network Pioneer Join distributed network 100 credits
Week Warrior 7-day contribution streak 200 credits
Diamond Contributor 10,000 GPU hours 5,000 credits
Fine-Tune Master Complete 50 fine-tuning jobs 2,500 credits

Streak Bonuses

Streak Earning Bonus
7 days 1.1× earnings
30 days 1.25× earnings
90 days 1.5× earnings
365 days 2.0× earnings
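The streak table maps cleanly to a lookup from highest threshold down. A small sketch (the 1.0× base multiplier for streaks under 7 days is an assumption; the tier thresholds and bonuses come from the table above):

```python
def streak_multiplier(days):
    """Return the earning bonus for a contribution streak, per the
    streak table: 7d -> 1.1x, 30d -> 1.25x, 90d -> 1.5x, 365d -> 2.0x."""
    tiers = [(365, 2.0), (90, 1.5), (30, 1.25), (7, 1.1)]
    for threshold, bonus in tiers:
        if days >= threshold:
            return bonus
    return 1.0  # assumed base rate below a 7-day streak

print(streak_multiplier(45))  # 1.25
```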

Check your progress anytime:

$ tutu progress