Tool-Calling Agents on Laptop Intel Arc GPUs: Dockerizing Qwen3-8B with Ipex-LLM
yourlabs.org·4h·
Discuss: Hacker News
Flag this post

This guide demonstrates how to launch an Intel ipex-llm Docker container, start a vLLM API server configured for the Qwen3-8B model with tool-calling capabilities, query the server using curl, and interpret a sample response.

This setup enables running large language models (LLMs) on Intel XPUs with features like automatic tool choice and reasoning parsing. All commands assume a Linux environment with Docker installed and access to Intel hardware (e.g., via /dev/dri).

1. Download the Model on the Host

Before entering the container, download the Qwen3-8B model to your host’s Hugging Face cache directory using the huggingface-cli. This ensures the model is pre-fetched and available when the container mounts the cache volume, speeding up the server startup.

huggingface...

Similar Posts

Loading similar posts...