Hi all,

I’m running Ollama with Gemma 3 12B locally on my 4080, but I’d like my endpoint to expose an interface similar to OpenAI’s batch API. I’m trying to do this with a wrapper around vLLM, but I’m having issues.
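
For reference, this is roughly the shape I’m aiming for: one OpenAI-style batch request per profile in a JSONL file. This is just a sketch of my intent (the model ID is the Hugging Face one I’ve been pointing at, and the file names are placeholders):

```python
# Rough sketch: build an OpenAI-batch-style JSONL file, one request per profile.
# Model ID / file names are placeholders from my setup.
import json

profiles = [{"id": "p1", "text": "likes hiking, photography, indie games"}]  # ~200k of these in reality

with open("batch_input.jsonl", "w") as f:
    for p in profiles:
        request = {
            "custom_id": p["id"],
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "google/gemma-3-12b-it",
                "messages": [
                    {"role": "system", "content": "Return 5-15 classification labels as a JSON list."},
                    {"role": "user", "content": p["text"]},
                ],
                "max_tokens": 128,
            },
        }
        f.write(json.dumps(request) + "\n")
```

My understanding (possibly wrong) is that vLLM can consume a file like this with its OpenAI-compatible batch runner, something like `python -m vllm.entrypoints.openai.run_batch -i batch_input.jsonl -o results.jsonl --model ...`, which is why I went down the wrapper route in the first place.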

I’m not super deep in this space and have been using agents to help me set everything up.

My use case is to send 200k small profiles to a recommendation engine and get 5-15 classifications on each profile.
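
Here’s the kind of thing the agents and I have been sketching with vLLM’s offline API; I haven’t gotten it running cleanly yet, and the prompt/model names are just what I’ve been trying:

```python
# Sketch of the offline vLLM route the agents suggested (not working cleanly yet for me).
# Model name is the Hugging Face ID for Gemma 3 12B instruct.
from vllm import LLM, SamplingParams

llm = LLM(model="google/gemma-3-12b-it", max_model_len=2048)  # profiles are small, so short context
params = SamplingParams(temperature=0.0, max_tokens=128)

profiles = ["likes hiking, photography, indie games"]  # ~200k short strings in practice
prompts = [
    f"Classify this profile with 5-15 short labels, one per line:\n{p}"
    for p in profiles
]

# vLLM batches internally, so in theory the whole list can be handed over at once
outputs = llm.generate(prompts, params)
for out in outputs:
    print(out.outputs[0].text)
```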

Any advice on how to get this accomplished?

Currently the agents are running into trouble; they say the engine isn’t handling memory well. vLLM’s supported-models list doesn’t include the latest Gemma models either.

Am I barking up the wrong tree? Any advice would be much appreciated.
