AI Infrastructure
Tangram: Unlocking Non-Uniform KV Cache for Efficient Multi-turn LLM Serving
🤖AI Inference Content type: Academic
zhongkaifu/TensorSharp: A C# inference engine for running large language models (LLMs) locally using GGUF model files. TensorSharp provides a console application, a web-based chatbot interface, and Ollama/OpenAI-compatible HTTP APIs for programmatic access. It supports Windows, macOS, and Linux with full GPU capability.
💻Local LLMs Content type: Code
CodegenBench: Can LLMs Write Efficient Code Across Architectures?
⚡Hardware Acceleration Content type: Academic