This guide demonstrates how to launch an Intel ipex-llm Docker container, start a vLLM API server configured for the Qwen3-8B model with tool-calling capabilities, query the server using curl, and interpret a sample response.

This setup enables running large language models (LLMs) on Intel XPUs with features like automatic tool choice and reasoning parsing. All commands assume a Linux environment with Docker installed and access to Intel hardware (e.g., via /dev/dri).
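As a quick sanity check, you can confirm that the Intel GPU device nodes are visible on the host before starting (the exact card/renderD entries vary by system):

ls -l /dev/dri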

1. Download the Model on the Host

Before entering the container, download the Qwen3-8B model to your host’s Hugging Face cache directory using the huggingface-cli. This ensures the model is pre-fetched and available when the container mounts the cache volume, speeding up the server startup.

huggingface-cli download Qwen/Qwen3-8B
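With the model cached, the remaining steps are launching the container, starting the server, and querying it. The sketches below are illustrative rather than definitive: the image tag (intelanalytics/ipex-llm-serving-xpu:latest), container name, mount paths, and port are assumptions to adapt to your setup.

# Launch the ipex-llm serving container with Intel GPU access and the
# host Hugging Face cache mounted, so the pre-fetched model is reused.
# Image tag, container name, and shm size are assumptions.
docker run -itd \
  --name ipex-llm-vllm \
  --net=host \
  --device=/dev/dri \
  --shm-size=16g \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  intelanalytics/ipex-llm-serving-xpu:latest

Inside the container, an OpenAI-compatible vLLM server for Qwen3-8B with automatic tool choice and reasoning parsing can then be started. The parser names below follow upstream vLLM's conventions for Qwen3; verify them against the vLLM version shipped in your image:

# Serve Qwen3-8B with tool calling and reasoning parsing enabled.
vllm serve Qwen/Qwen3-8B \
  --port 8000 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes \
  --reasoning-parser qwen3

Once the server is up, a curl request that offers the model a hypothetical get_weather tool looks like this:

# Query the OpenAI-compatible endpoint; get_weather is an example tool.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-8B",
    "messages": [{"role": "user", "content": "What is the weather in Paris right now?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }]
  }'

In the response, a successful tool call typically appears as a tool_calls array on the assistant message, with finish_reason set to "tool_calls" instead of plain text content; the arguments field carries the JSON the model generated for get_weather.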
