🎯 RLHF - liqihui02

Less-relevant results

Don't let the LLM speak, just probe it (8 minute read)

✍️Prompt Engineering Blog

blog.j11y.io · Hacker News

The week AI infrastructure crossed from a technology story to a financial one

✍️Prompt Engineering News

mlwhiz.com

Would a prepaid pass for a coding agent solve a real need or is it just my itch?

🤖recommendation systems, LLM, large language model

codehamr.com · r/SideProject

Analyzing and Improving Fine-grained Preference Optimization in Medical LVLMs

🤖recommendation systems, LLM, large language model Academic

arxiv.org

Stack Overflow didn't just help AI learn to code

🤖reinforcement learning, deep learning, machine learning

zozo123.github.io · Hacker News

Researchers develop AI-powered railway control system for efficient urban train operation

🤖reinforcement learning, deep learning, machine learning

techxplore.com

Posting for authoring

✍️Prompt Engineering

turingpost.com

Alignment Defends LLMs from Property Inference Attacks

🤖reinforcement learning, deep learning, machine learning Academic

arxiv.org

umair-tareen/philosopher-council: An eleven-philosopher LLM council - ask it questions or point it at AI-research trends. Claude-powered deliberation through the four classical branches of philosophy. Methodology, not metaphysics.

🤖reinforcement learning, deep learning, machine learning Code

github.com · r/SideProject

local AI agents for Cursor with pre-tuned marketplace/commu

✍️Prompt Engineering

locaible.com · Hacker News

My research agenda and work

🤖reinforcement learning, deep learning, machine learning

lesswrong.com

Scale Robot Reinforcement Learning with NVIDIA Isaac Lab on Amazon SageMaker AI

🤖reinforcement learning, deep learning, machine learning Blog

aws.amazon.com

Training LLMs to Enforce Multi-Level Instruction Hierarchies via Gravity-Weighted Direct Preference Optimization

✍️Prompt Engineering Academic

arxiv.org

Reasoning RL in 2026: GRPO, DPO, RLVR, Agentic PO & Beyond

Mult-DPO: Multinomial Direct Preference Optimization for Recommender Systems

How ChatGPT Actually Works (Beginner Friendly)

Tracing Eval-Awareness Emergence Through Training of OLMo 3

Why LLMs (still) lack taste

How LLMs are Actually Trained

SLUUG Talk: Demystifying Large Language Models on Linux
