Cutting LLM Batch Inference Time in Half: Dynamic Prefix Bucketing at Scale
⚡High Performance Computing
What I learned building Python notebooks to run any AI model (LLM, vision, audio) across CPU, GPU, and NPU
⚡High Performance Computing
Ariadne: A Controllable Framework for Probing and Extending VLM Reasoning Boundaries
arxiv.org·23h
⚡High Performance Computing
AI Function Calling: Composing and Decomposing Functions for Complex Tasks
💻Programming
Topographical sparse mapping: A training framework for deep learning models
⚡High Performance Computing
I'm the author of LocalAI (the local OpenAI-compatible API). We just released v3.7.0 with full Agentic Support (tool use!), Qwen 3 VL, and the latest llama.cpp
💻Programming
Show HN: ReadMyMRI DICOM native preprocessor with multi model consensus/ML pipes
⚡High Performance Computing
I Taught an AI to Dream
🏛️Software Architecture Patterns
Show HN: Oodle – Unified Debugging with OpenSearch and Grafana
🏛️Software Architecture Patterns
Loquetier: A Virtualized Multi-LoRA Framework for Unified LLM Fine-tuning and Serving
arxiv.org·23h
💻Programming