How to Run Qwen (14B) on AMD MI200 with vLLM

Posted on February 3, 2026

If you are trying to run LLMs on AMD Instinct MI200 (MI210/MI250) cards, you have probably already experienced the pain of “HSA errors,” random segmentation faults, or containers that just hang forever.

We went through the struggle of finding the right Docker image so you don’t have to. Here is the definitive, battle-tested guide to running an OpenAI-compatible API for Qwen (14B) on ROCm.

The MI200 is a beast, but it uses the gfx90a architecture. Most “bleeding edge” Docker images today are optimized for the newer MI300 (gfx942). If you try to run the latest vLLM (0.11.x) with the default settings, it will crash because the new V1 engine relies on AITER kernels that aren’t fully compatible with MI200 yet.

We are going to use a stable setup that disables the experimental features and just runs fast.

Prerequisites

  • Host OS: Linux with ROCm kernel drivers installed (rocm-dkms).
  • Docker: Installed and running.
  • GPU: AMD Instinct MI200 series (MI210, MI250/X).
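
Before pulling the image, it can help to confirm that the host exposes what the container will ask for. This is only a minimal sketch, assuming a POSIX shell: check_path and check_cmd are hypothetical helper names, and the two device paths are simply the ones passed with --device in the docker command in this guide. A "missing" line points at a driver or install problem, but an "ok" line does not prove the GPU is healthy.

```shell
# Print "ok <path>" if the path exists, otherwise "missing <path>".
check_path() {
    if [ -e "$1" ]; then
        echo "ok $1"
    else
        echo "missing $1"
    fi
}

# Print "ok <name>" if the command is on PATH, otherwise "missing <name>".
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok $1"
    else
        echo "missing $1"
    fi
}

check_path /dev/kfd   # compute device node passed to the container
check_path /dev/dri   # render nodes passed to the container
check_cmd docker      # Docker CLI
```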

Step 1: Running the Container

Don’t use the latest tag. Don’t use 0.11.x. We are using vLLM 0.10.1 on ROCm 6.4, which provides the best balance of modern model support (like Qwen 2.5) and stability.

Copy and paste this exact command, replacing your_hf_token_here with your own Hugging Face token.

docker run -it --rm \
    --device /dev/kfd \
    --device /dev/dri \
    --group-add video \
    --ipc=host \
    --cap-add=SYS_PTRACE \
    --security-opt seccomp=unconfined \
    -p 48700:8000 \
    -e HUGGING_FACE_HUB_TOKEN="your_hf_token_here" \
    -e VLLM_USE_V1=0 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --name qwen-server \
    rocm/vllm:rocm6.4.1_vllm_0.10.1_20250909 \
    vllm serve Qwen/Qwen2.5-14B-Instruct \
    --dtype float16 \
    --gpu-memory-utilization 0.90 \
    --max-model-len 32768 \
    --tensor-parallel-size 1 \
    --host 0.0.0.0 \
    --port 8000

Why these flags matter

  • VLLM_USE_V1=0: This is the most important line. The new “V1” engine in vLLM crashes on MI200 when loading JIT kernels. We force the legacy engine (V0) for rock-solid stability.
  • --dtype float16: We don’t trust auto mode on ROCm containers. Explicitly telling it to use float16 prevents initialization stalls.
  • --security-opt seccomp=unconfined: AMD GPUs need direct memory access that Docker blocks by default. Without this, you get permission errors.
  • The Model Size (14B): We chose 14B because the 32B model (at float16) requires ~64GB of VRAM just for weights. On a single 64GB GPU, that leaves nothing for the KV cache, so you’ll hit OOM (Out Of Memory) right away. 14B sits in the “sweet spot”: fast, smart, and it leaves room for context.
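
The ~64GB figure is simple arithmetic: parameter count times bytes per parameter. Here is a rough sketch, assuming nominal parameter counts (14 and 32 billion), 2 bytes per parameter for float16, and decimal gigabytes. weights_gb is a hypothetical helper name, and the estimate covers weights only: it ignores the KV cache, activations and runtime overhead. The 0.5 bytes line approximates 4-bit weights and ignores quantization metadata.

```shell
# Estimate weight memory in GB: billions of parameters x bytes per parameter.
# LC_ALL=C keeps the decimal separator a dot regardless of the host locale.
weights_gb() {
    LC_ALL=C awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b }'
}

weights_gb 14 2     # 14B at float16      -> prints 28.0
weights_gb 32 2     # 32B at float16      -> prints 64.0
weights_gb 32 0.5   # 32B at 4-bit weights -> prints 16.0
```

At float16 the 14B model's weights take less than half of a 64GB card, which is why it leaves room for context.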

Step 2: Testing the API

Once the container says Application startup complete, your API is live on port 48700.
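
If you would rather not watch the logs, you can poll the server from a script. This is a minimal sketch, assuming a POSIX shell: wait_until is a hypothetical helper that retries any command you hand it, and the commented example probes /v1/models (the model listing route of the OpenAI-compatible server) on the port mapped in the docker command. Pick a retry budget that covers the first-run startup time on your machine.

```shell
# Retry a command until it succeeds or the attempt budget runs out.
# Usage: wait_until <max_tries> <delay_seconds> <command> [args...]
wait_until() {
    max_tries=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$max_tries" ]; do
        # Run the probe command quietly; success ends the loop.
        if "$@" >/dev/null 2>&1; then
            echo "ready after $i attempt(s)"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "gave up after $max_tries attempt(s)" >&2
    return 1
}

# Example (needs the container from the docker command above to be running):
# wait_until 60 5 curl -sf http://localhost:48700/v1/models
```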

You can test it with curl. Note: Be careful with your JSON syntax! Use straight quotes ("), not curly smart quotes (“), or the API will throw a 400 Bad Request error.

Here is a test command that includes a system prompt:

curl http://localhost:48700/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer any_token_is_fine" \
    -d '{
        "model": "Qwen/Qwen2.5-14B-Instruct",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant. Please answer in JSON format."
            },
            {
                "role": "user",
                "content": "Are you running on an AMD GPU?"
            }
        ],
        "temperature": 0.7
    }'
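
If the payload was pasted from a chat window or a word processor, a quick check for curly quotes can save you a confusing 400. This is a small sketch, assuming a POSIX shell and UTF-8 text: has_smart_quotes is a hypothetical helper that reads the payload on stdin and only looks for the two curly double quote characters, so it is not a JSON validator.

```shell
# Return 0 if stdin contains a curly double quote (U+201C or U+201D), else 1.
has_smart_quotes() {
    left=$(printf '\342\200\234')    # UTF-8 bytes of U+201C
    right=$(printf '\342\200\235')   # UTF-8 bytes of U+201D
    LC_ALL=C grep -q -e "$left" -e "$right"
}

# A payload with straight quotes passes the check.
printf '%s\n' '{"model": "Qwen/Qwen2.5-14B-Instruct"}' \
    | has_smart_quotes || echo "payload looks clean"
```

One way to use it is to keep the request body in a file (say payload.json, a name chosen for this example), run has_smart_quotes < payload.json, and then send it with curl -d @payload.json instead of an inline string.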

Note: If you are using a custom model name (like Qwen3), make sure the "model" field in your JSON matches exactly what you passed in the Docker command.

Troubleshooting

1. It hangs at “Loading model weights…”

  • Cause: ROCm is compiling kernels (JIT) for your specific GPU.
  • Fix: Wait. On the very first run, this can take 2–5 minutes. Subsequent runs should be much faster.

2. RuntimeError: Engine core initialization failed

  • Cause: You forgot VLLM_USE_V1=0.
  • Fix: Add the env var. The V1 engine tries to use aiter libraries optimized for MI300, which segfault on MI200.

3. HIP out of memory

  • Cause: Your model is too fat.
  • Fix: If you absolutely need a 32B or 70B model, you must use quantization. Change the vllm serve line in the docker command to use an AWQ model:

    vllm serve Qwen/Qwen2.5-32B-Instruct-AWQ --quantization awq
