🤖 AI - machacek.vitek

RightNow-AI/AutoMegaKernel: An agent harness that compiles a model into one provably-correct, self-retargeting CUDA megakernel and self-tunes it past cuBLAS at batch-1 LLM decode.

🔀Concurrency Code

github.com · Hacker News

My research agenda and work

🤖LLMs

lesswrong.com

Phantom transitions in language model fine-tuning

🤖LLMs Academic

arxiv.org

I open-sourced my UFC prediction model, code, and database after 5 years of work

⚡Performance

mcinerney.ai · Hacker News

The Surprising Truth About AI-Native Semantic Layers

🤖LLMs Blog

motherduck.com

google/gemma-4-12B-it-qat-q4_0-gguf

🤖LLMs

huggingface.co

NVIDIA releases Nemotron 3 Ultra, claiming five times the speed and 30 percent lower costs than prior models. The model delivers 300 tokens per second on benchmar...

⚡Performance

digg.com

techjarves/Portable-AI-USB: A 100% offline, fully portable, zero-trace AI (Ollama + Llama 3 + AnythingLLM) that runs natively from a USB drive on Windows and Mac.

🤖LLMs Code

github.com

AIs like ChatGPT fall apart in classic 'Stroop' psychological test — and that could stand in the way of achieving artificial general intelligence

🤖LLMs

techradar.com

Anthropic: Claude Now Writes 80% of Its Own Code in 2026

🤖LLMs Blog

wowhow.cloud · DEV

What an LLM Actually Does With Your Prompt First

🤖LLMs

siliconopera.com

STAT+: AI titans push Congress for DNA safeguards

🤖LLMs

statnews.com

NVIDIA's Cosmos 3: The World's First Fully Open AI Omnimodel

🤖LLMs News

aimagazine.com

Towards Robust Arabic Speech Emotion Recognition with Deep Learning

🤖LLMs Academic

arxiv.org

Three sleep intervals for three APIs: Steam 250ms, GitHub 100ms, HuggingFace none

⚡Performance Reference

docs.github.com · DEV

What Does Abliteration Actually Cost?

🤖LLMs

lesswrong.com

Wall Attention: Length Generalization With Diagonal Gates | Tilde

🤖LLMs Blog

blog.tilderesearch.com

Florian Brand, Prime Intellect research engineer, adopts Gemma 4 E4B 6-bit quantized as his primary local Mac LLM

⚡Performance News

digg.com · Hacker News

Introducing the Third Generation of Apple’s Foundation Models

Show HN: LLM memory without context bleed; 100% precision vs. <10% vector search
