Docker Model Runner: Run AI/LLM Models Locally with an OpenAI-Compatible API

1. Introduction: What Is Docker Model Runner?

Docker Model Runner (DMR) is a feature introduced in Docker Desktop 4.40 that lets you run AI/LLM models on your own machine through an OpenAI-compatible API, without complicated setup.

Previously, running an LLM locally meant installing Ollama, llama.cpp, or vLLM separately. Docker Model Runner bundles everything into Docker itself, so you can be up and running within a few minutes.

Figure: Docker Model Runner architecture

2. Why Is Docker Model Runner Interesting?

Zero Setup

No need to install Ollama, llama.cpp, or CUDA separately. Start Docker and it is ready to use.

OpenAI Compatible

Code written for the OpenAI API works as-is; only base_url changes.

Lower Cost

No per-call API fees to OpenAI or other cloud providers. Inference runs on your own machine for free.

Privacy First

Data never leaves your machine, which suits workloads with strict security requirements.

Notable statistics for 2025-2026

• Local LLM usage is up 400% from 2024
• Apple Silicon is the main platform for AI development in Thailand
• The OpenAI API costs $0.002-0.03 per 1K tokens on average; running locally costs $0 in API fees

3. System Requirements
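To make the cost comparison concrete, the sketch below works out the API bill at both ends of the quoted price range. The function name and the 5-million-token monthly workload are illustrative assumptions, and "local" counts only API fees, not hardware or electricity.

```python
def api_cost_usd(tokens: int, price_per_1k_usd: float) -> float:
    """Return the API fee in USD for `tokens` tokens at a per-1K-token price."""
    return tokens / 1000 * price_per_1k_usd


# Hypothetical workload: 5 million tokens per month (an assumption for illustration).
monthly_tokens = 5_000_000

low = api_cost_usd(monthly_tokens, 0.002)  # cheapest price quoted above
high = api_cost_usd(monthly_tokens, 0.03)  # most expensive price quoted above

print(f"Hosted API: ${low:.2f} to ${high:.2f} per month")
print("Local model: $0.00 in API fees")
```

Swapping in your own token volume gives a quick sense of when local inference starts to pay off.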

Platform	Hardware	Docker Version	Notes
macOS	Apple Silicon (M1/M2/M3/M4)	Desktop 4.40+	Recommended; best performance
Windows (amd64)	NVIDIA GPU + Driver 576.57+	Desktop 4.41+	Requires an NVIDIA GPU
Windows (arm64)	Qualcomm Adreno GPU (6xx+)	Desktop 4.41+	Some features may not be supported
Linux	CPU / NVIDIA (CUDA) / AMD (ROCm)	Engine only	Requires manual setup

Caveats

• Windows requires NVIDIA driver version 576.57 or later
• Linux requires NVIDIA driver 575.57.08+ for GPU use
• The vLLM engine is supported only on Linux x86_64 and Windows WSL2
• 16GB+ of RAM is recommended for large models

4. Installation

Step 1: Install Docker Desktop

Download and install the latest version of Docker Desktop:

Download Links

# macOS
https://desktop.docker.com/mac/main/arm64/Docker.dmg

# Windows
https://desktop.docker.com/win/main/amd64/Docker%20Desktop%20Installer.exe

# Linux (Docker Engine only)
curl -fsSL https://get.docker.com | sh

Step 2: Verify Docker Model Runner

After installing Docker Desktop, check that Model Runner is available:

Terminal

# Check the Docker version
docker version

# Check that the model command is available
docker model --help

# If the command is not found, create a symlink (macOS)
mkdir -p ~/.docker/cli-plugins
ln -s /Applications/Docker.app/Contents/Resources/cli-plugins/docker-model ~/.docker/cli-plugins/docker-model
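The same check can be scripted, for example in a project setup script. This is a minimal sketch that assumes the Docker CLI exits with a non-zero status when the model plugin is missing; the function name and the injectable `run` and `which` parameters are illustrative, not part of Docker.

```python
import shutil
import subprocess


def model_runner_cli_available(run=subprocess.run, which=shutil.which) -> bool:
    """Return True if `docker model --help` exits successfully.

    `run` and `which` are injectable so the check can be tested without Docker.
    """
    if which("docker") is None:
        return False  # the Docker CLI itself is not on PATH
    try:
        result = run(
            ["docker", "model", "--help"],
            capture_output=True,
            text=True,
            timeout=15,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # Assumption: a missing plugin makes the CLI return a non-zero exit code.
    return result.returncode == 0
```

Calling `model_runner_cli_available()` with no arguments runs the real command on the current machine.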

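Step 3: Enable Model Runner in Docker Desktop

Open Settings in Docker Desktop, go to Features in development, and turn on Enable Docker Model Runner. To reach the API from the host on localhost:12434 (used in section 6), also turn on host-side TCP support in the same panel.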

Settings → Features in development → Enable Docker Model Runner

5. Getting Started with Docker Model Runner

Download an AI Model

Docker Model Runner can pull models from Docker Hub, Hugging Face, and OCI registries:

Terminal - Pull Models

# Pull Qwen 2.5 Coder (recommended for coding tasks)
docker model pull ai/qwen2.5-coder

# Pull Llama 3.2 (general purpose)
docker model pull ai/llama3.2

# Pull Mistral (fast and efficient)
docker model pull ai/mistral

# Pull from Hugging Face
docker model pull hf.co/microsoft/Phi-3-mini-4k-instruct-gguf

List Available Models

Terminal

# Show all models that have been downloaded
docker model ls

# Example output (columns may differ between versions):
# REPOSITORY              TAG       SIZE      INFERENCE ENGINE
# ai/qwen2.5-coder       latest    4.7 GB    llama.cpp
# ai/llama3.2            latest    2.0 GB    llama.cpp

Run a Model and Chat

Terminal - Interactive Chat

# Start an interactive chat
docker model run ai/qwen2.5-coder

>>> Hello, could you write a Python function that computes a factorial?
# The model replies here...

# Press Ctrl+D (or type /bye) to leave the chat

Set the Context Size

For longer conversations, increase the context size:

Terminal

# Set the context size to 8192 tokens
docker model configure --context-size 8192 ai/qwen2.5-coder

# Set it to 32768 for very long contexts
docker model configure --context-size 32768 ai/llama3.2
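Whatever context size is configured, a long chat eventually outgrows it, so client code often drops the oldest turns before sending a request. The sketch below shows one way to do that. The helper names are hypothetical, and the token estimate is a crude character-based assumption rather than the model's real tokenizer (it will be noticeably off for Thai text).

```python
def estimate_tokens(text: str) -> int:
    """Crude estimate: about one token per 4 characters (an assumption, not a real tokenizer)."""
    return max(1, len(text) // 4)


def trim_history(messages: list[dict], context_size: int, reserve_for_reply: int = 1024) -> list[dict]:
    """Drop the oldest non-system messages until the estimated prompt fits.

    Keeps every system message (moved to the front) and as many of the most
    recent turns as fit within context_size - reserve_for_reply estimated tokens.
    """
    budget = context_size - reserve_for_reply
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]

    used = sum(estimate_tokens(m["content"]) for m in system)
    kept: list[dict] = []
    # Walk backwards so the newest turns are kept first.
    for message in reversed(turns):
        cost = estimate_tokens(message["content"])
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    return system + list(reversed(kept))
```

Passing the same number used with --context-size as `context_size` keeps the client and the server in agreement about the limit.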

6. API Integration

Docker Model Runner API Endpoints

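Docker Model Runner serves its API at localhost:12434 and supports both the OpenAI and Ollama formats. Docker's own documentation lists the OpenAI-compatible routes under the /engines/v1 prefix (for example /engines/v1/chat/completions), so try that prefix if a /v1 path below returns 404 on your version: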

API Type	Endpoint	Description
OpenAI	/v1/chat/completions	Chat completion
OpenAI	/v1/models	List available models
Ollama	/api/chat	Ollama chat format
Ollama	/api/generate	Ollama generate format

Test with cURL

Terminal - cURL Request

# OpenAI-compatible API call
curl http://localhost:12434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/qwen2.5-coder",
    "messages": [
      {"role": "user", "content": "Write a Python function that computes Fibonacci numbers"}
    ]
  }'

# List available models
curl http://localhost:12434/v1/models
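The cURL call above maps directly onto Python's standard library, which is handy when installing an SDK is not an option. This is a minimal sketch: the helper names are illustrative, the base URL is the one used in this article, and the response is assumed to follow the OpenAI chat completion shape.

```python
import json
import urllib.request

BASE_URL = "http://localhost:12434/v1"  # endpoint used in this article


def build_chat_request(model: str, prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for the chat completions endpoint."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body: bytes) -> str:
    """Pull the assistant text out of an OpenAI-style chat completion response."""
    payload = json.loads(response_body)
    return payload["choices"][0]["message"]["content"]


def ask(model: str, prompt: str) -> str:
    """Send the request; requires Model Runner to be listening on BASE_URL."""
    with urllib.request.urlopen(build_chat_request(model, prompt), timeout=120) as resp:
        return extract_reply(resp.read())
```

With Model Runner running, `print(ask("ai/qwen2.5-coder", "Say hello"))` sends the same request as the cURL example.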

Python + OpenAI SDK

chat.py

import openai

# Create a client that points at Docker Model Runner
client = openai.OpenAI(
    base_url="http://localhost:12434/v1",
    api_key="not-needed"  # Docker Model Runner does not require an API key
)

# Send a message
response = client.chat.completions.create(
    model="ai/qwen2.5-coder",
    messages=[
        {"role": "system", "content": "You are a programming assistant who answers in Thai"},
        {"role": "user", "content": "Please write a function that reads a CSV file"}
    ],
    temperature=0.7,
    max_tokens=2000
)

print(response.choices[0].message.content)
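Local models can take a while to finish a long answer, so printing tokens as they arrive often feels more responsive. The sketch below assumes the server honors stream=True, as OpenAI-compatible servers generally do; `stream_reply` is a hypothetical helper that accepts any client exposing the OpenAI SDK's chat.completions.create interface.

```python
from typing import Any, Iterator


def stream_reply(client: Any, model: str, prompt: str) -> Iterator[str]:
    """Yield pieces of the reply as they arrive.

    `client` is expected to be an openai.OpenAI instance (or anything with the
    same chat.completions.create interface).
    """
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,  # ask the server to send incremental chunks
    )
    for chunk in stream:
        if not chunk.choices:
            continue  # some chunks carry no choices at all
        piece = chunk.choices[0].delta.content
        if piece:  # the first and last chunks may carry no text
            yield piece
```

With the `client` from chat.py, `for piece in stream_reply(client, "ai/qwen2.5-coder", "Hello"): print(piece, end="", flush=True)` prints the answer incrementally.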

Node.js + OpenAI SDK

chat.js

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:12434/v1',
  apiKey: 'not-needed'
});

async function chat(message) {
  const response = await client.chat.completions.create({
    model: 'ai/qwen2.5-coder',
    messages: [
      { role: 'user', content: message }
    ]
  });
  
  return response.choices[0].message.content;
}

// Usage (top-level await requires an ES module)
const answer = await chat('Briefly explain async/await in JavaScript');
console.log(answer);

7. Framework Integration

LangChain Integration

LangChain can connect to Docker Model Runner through the OpenAI-compatible API:

langchain_chat.py

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Create an LLM that talks to Docker Model Runner
llm = ChatOpenAI(
    base_url="http://localhost:12434/v1",
    api_key="not-needed",
    model="ai/qwen2.5-coder",
    temperature=0.7
)

# Build the chain: prompt -> model -> plain string
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a DevOps assistant who answers in Thai"),
    ("user", "{input}")
])

chain = prompt | llm | StrOutputParser()

# Usage
response = chain.invoke({"input": "Briefly explain a Kubernetes Deployment"})
print(response)

Spring AI Integration (Java)

application.yml

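spring:
  ai:
    openai:
      # Spring AI appends /v1/chat/completions to base-url by default, so /v1 is left out here
      base-url: http://localhost:12434
      api-key: not-needed
      chat:
        options:
          model: ai/qwen2.5-coder
          temperature: 0.7

ChatController.java

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/api/chat")
public class ChatController {

    private final ChatClient chatClient;

    // Spring AI auto-configures a ChatClient.Builder bean
    public ChatController(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    @PostMapping
    public String chat(@RequestBody String message) {
        // Send the request body as the user message and return the reply text
        return chatClient.prompt().user(message).call().content();
    }
}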

IDE Integration

Docker Model Runner works with popular AI coding tools:

Cline

Continue

Cursor

Aider

8. Docker Compose Setup

An example Docker Compose setup for an application that uses AI:

docker-compose.yml

services:
  # Application that uses AI
  app:
    build: .
    ports:
      - "3000:3000"
    environment:
      - OPENAI_API_BASE=http://model-runner:12434/v1
      - OPENAI_API_KEY=not-needed
    depends_on:
      - model-runner
    networks:
      - ai-network

  # Docker Model Runner
  model-runner:
    image: docker/model-runner:latest
    ports:
      - "12434:12434"
    environment:
      - MODEL=ai/qwen2.5-coder
    volumes:
      - model-cache:/root/.cache
    networks:
      - ai-network
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

  # Open WebUI (Optional - ChatGPT-like interface)
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "8080:8080"
    environment:
      - OLLAMA_BASE_URL=http://model-runner:12434
    volumes:
      - open-webui-data:/app/backend/data
    networks:
      - ai-network

volumes:
  model-cache:
  open-webui-data:

networks:
  ai-network:
    driver: bridge
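Inside the `app` container, the code only needs to read the two environment variables set in the Compose file. Because the short form of depends_on orders container start-up but does not wait for the model server to accept requests, it can also help to poll before the first call. This is a minimal sketch with hypothetical helper names; the defaults are assumptions for running outside Compose.

```python
import os
import time
from typing import Callable, Mapping


def load_ai_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read the variables set for the `app` service in the Compose file."""
    return {
        "base_url": env.get("OPENAI_API_BASE", "http://localhost:12434/v1"),
        "api_key": env.get("OPENAI_API_KEY", "not-needed"),
    }


def wait_until_ready(
    probe: Callable[[], bool],
    attempts: int = 30,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `probe` until it returns True or the attempts run out.

    `probe` is any zero-argument check, for example one that requests the
    models endpoint and returns True on an HTTP 200 response.
    """
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay)  # back off before the next try
    return False
```

The resulting `base_url` and `api_key` can be passed straight to the OpenAI client shown in section 6.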

Docker Compose Architecture

9. Troubleshooting Common Problems

docker model command not found

The docker model command cannot be found.

Fix: create a symlink to the CLI plugin.

# macOS
mkdir -p ~/.docker/cli-plugins
ln -s /Applications/Docker.app/Contents/Resources/cli-plugins/docker-model ~/.docker/cli-plugins/docker-model

# Linux
mkdir -p ~/.docker/cli-plugins
ln -s /usr/libexec/docker/cli-plugins/docker-model ~/.docker/cli-plugins/docker-model

Model Runner not enabled

Model Runner has not been enabled yet.

Fix:

• Open Docker Desktop
• Go to Settings → Features in development
• Turn on "Enable Docker Model Runner"
• Restart Docker Desktop

Out of Memory (OOM)

There is not enough RAM for a large model.

Fix:

• Raise the Docker memory limit in Settings → Resources
• Use quantized models (for example Q4_K_M or Q5_K_M)
• Choose a smaller model (7B instead of 70B)
• Reduce the context size: docker model configure --context-size 4096 ai/qwen2.5-coder
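A quick back-of-the-envelope calculation shows why quantization and model size matter so much here. The sketch below estimates memory for the weights alone as parameters times bits per weight divided by 8. The 4.5 bits-per-weight figure for a 4-bit quantization is an assumed average, and the KV cache and runtime overhead come on top.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes).

    Ignores the KV cache, activations, and runtime overhead, so real usage is higher.
    """
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


for label, params in [("7B", 7), ("70B", 70)]:
    fp16 = weight_memory_gb(params, 16)
    q4 = weight_memory_gb(params, 4.5)  # assumed average bits per weight for a 4-bit quantization
    print(f"{label}: about {fp16:.0f} GB at 16-bit, about {q4:.1f} GB at ~4.5 bits")
```

By this estimate a 4-bit 7B model fits comfortably in 16 GB of RAM, while a 70B model does not, even when quantized.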

GPU not detected (Windows)

No GPU is detected on Windows.

Fix:

• Check the NVIDIA driver version: nvidia-smi
• Driver 576.57 or later is required
• Install or update the NVIDIA driver from the NVIDIA website
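The driver check above can be automated. This is a minimal sketch that compares dotted version strings numerically. The helper names are illustrative, and the query relies on nvidia-smi's --query-gpu=driver_version option being available on the machine.

```python
import subprocess

MIN_WINDOWS_DRIVER = "576.57"  # minimum version stated in this article


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '576.57' or '575.57.08' into a tuple of ints for comparison."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_is_new_enough(installed: str, minimum: str = MIN_WINDOWS_DRIVER) -> bool:
    """Compare dotted version strings numerically, component by component."""
    return parse_version(installed) >= parse_version(minimum)


def installed_driver_version() -> str:
    """Ask nvidia-smi for the driver version (first GPU only)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return out.splitlines()[0].strip()
```

On a machine with an NVIDIA GPU, `driver_is_new_enough(installed_driver_version())` reports whether the driver meets the requirement.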

10. Summary

What You Have Learned

• What Docker Model Runner is and why it is interesting
• System requirements for macOS, Windows, and Linux
• How to install it and get started
• Using the OpenAI-compatible API from Python and Node.js
• Integrating with LangChain and Spring AI
• Setting up Docker Compose for AI applications

Next Steps

• Try different models such as Llama 3.2, Mistral, and Phi-3
• Build an AI chatbot with LangChain + Docker Model Runner
• Integrate with IDE tools such as Cursor and Continue
• Deploy as a microservice in production
