✍️ Prompt Engineering - MarkGao · Scour

BEACON: Behavioral Entropy Aggregation for Cross-Model Hallucination Detection in Large Language Models

🤖Agentic Engineering Academic

Declarative Skills for AI Agents in Knowledge-Grounded Tool-Use Workflows

🤖AI Agents Academic

Mutation Without Variation: Convergence Dynamics in LLM-Driven Program Evolution

🧠LLMs Academic

LLM-Guided Neural Architecture Search for Robust Co-Design of Physical Neural Networks

🧠LLMs Academic

You Only Index Once: Cross-Layer Sparse Attention with Shared Routing

🤖Agentic Engineering Academic

When No Answer Is Correct: Diagnosing Absent Answer Detection for MLLMs in Video Understanding

🤖Agentic Engineering Academic

Think Fast: Estimating No-CoT Task-Completion Time Horizons of Frontier AI Models

🤖Agentic Engineering Academic

Proxy Reward Internalization and Mechanistic Exploitation: A Learned Precursor to Reward Hacking and Its Generalization

🤖Agentic Engineering Academic

Arithmetic Pedagogy for Language Models

🤖Agentic Engineering Academic

arxiv.org · Hacker News

VisualLeakBench: Reproducible Action-Boundary Propagation Failures in Vision-Language Agents

⚙️AI Automation Academic

Quantum-Inspired Trace-Augmented Evidence Selection for Reasoning over Structured Hypothesis Spaces

⚛️Quantum Computing Academic

LLM-Based Code Documentation Generation and Multi-Judge Evaluation

🧠LLMs Academic

Towards Autonomous Accelerator Design: FPGA Accelerator Generation with SECDA

🤖Agentic Engineering Academic

IMUG-Bench: Benchmarking Unified Multimodal Models on Interleaved Understanding and Generation

🤖Agentic Engineering Academic

Dep-LLM: Training-Free Depression Diagnosis via Evidence-Guided Structured Multi-factor with Reliable LLM Reasoning

🤖Agentic Engineering Academic

CogManip: Benchmarking Manipulative Behavior in Multi-Turn Interactions with Large Language Model

🧠LLMs Academic

The Arbiter Agent: Continually Monitoring Multi-Agent Conversations to Detect Emergent Misalignment

🤖Multi-Agent Systems Academic

How Small Can You Go? LoRA Fine-Tuning 270M-8B Models for Merchant Information Extraction in Financial Transactions

🤖Agentic Engineering Academic

UrduMMLU: A Massive Multitask Benchmark for Urdu Language Understanding

🧠LLMs Academic

Domain-Conditioned Safety in Frontier Computer-Using Agents: A 793-Episode Browser Benchmark, a Coding-Domain Cross-Reference, and a Reproducibility Audit of Recent Red-Teaming

⚙️AI Automation Academic
