🪟 Context Windows - bloknayrb · Scour

STAR-KV: Low-Rank KV Cache Compression via Soft Thresholding for Adaptive Rank Control

🤖LLM Academic

markusheimerl/gpt: A generative pretrained transformer implementation

💬LLMs Code

github.com · Hacker News

Alignment Collapse Under KV Cache Quantization: Diagnosis and Mitigation

🤖LLM Academic

mingusb/transformer-golf: The Fully Unrolled Transformer: An experimental repository for architecture simplification and compilation. [2026]

💬LLMs Code

github.com · Hacker News

From Rigid to Dynamic: Entropy-Guided Adaptive Inference for Long-Context LLMs

🤖LLM Academic

harshuljain13/llm-inference-at-scale: A Practitioner handbook for production llm serving.

🤖LLM Code

github.com · Hacker News

RKSC: Reasoning-Aware KV Cache Sharing and Confident Early Exit for Multi-Step LLM Inference

🤖LLM Academic

Dynamic Linear Attention

💬LLMs Academic

A Unifying View of Attention Sinks: Two Algorithms, Two Solutions

💾Cognitive Offloading Academic

Attention Expansion: Enhancing Keyphrase Extraction from Long Documents with Attention-Augmented Contextualized Embeddings

💬LLMs Academic

Parallel Causal Associative Fields: Gated Sparse Memory for Long-Context Language Modeling

💬LLMs Academic

When Vision Misleads, Let Location Speak: A Worldwide Image Geo-Localization Method via Location Attention Mechanism and Large Multimodal Models

💬LLMs Academic

One Step Closer to Ground Truth: A Multi-Scale Residual-Aware Representation Learning Pipeline for Predicting Time Series Data

💬LLMs Academic

Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It

💬LLMs Academic

Still: Amortized KV Cache Compaction in a Single Forward Pass

💾Cognitive Offloading Academic

Beyond Patches: Superpixel Token-based Transformers for Attribute-Specific Fashion Retrieval

💬LLMs Academic

SpectrumKV: Per-Token Mixed-Precision KV Cache Transfer for Prefill-Decode Disaggregated LLM Serving

🤖LLM Academic

EASE-TTT: Evidence-Aligned Selective Test-Time Training for Long-Context Question Answering

🤖LLM Academic

Attention at the Theoretical Minimum: A Mathematics of Arrays Framework for Memory-Optimal Transformer Kernels

💬LLMs Academic

A Four-Condition Diagnostic Protocol for Evidence Utilization in Long-Context and Retrieval-Augmented Language Models

🤖LLM Academic
