📝 Prompt Engineering - gruggiero · Scour

AuRA: Internalizing Audio Understanding into LLMs as LoRA

🧠LLMs Academic

When LLMs Invent Rust Crates: An Empirical Study of Hallucination Patterns and Mitigation

⚡Effect Systems Academic

Benchmarking and Exploring the Capabilities of LLMs for Attack Investigations

📏Model Evaluation Academic

Time Series as Language: A Universal Tokenizer for General-Purpose Time Series Foundation Models

🧠LLMs Academic

Automatic Extraction of Structured Information from Brain MRI Reports Using an Open-Weight Large Language Model

📊ML Research Academic

IDP-Bench: Benchmarking ability of LLMs to protect personal information in interdependent privacy contexts

✅TLA+ Academic

Distilling Safe LLM Systems via Soft Prompts for On Device Settings

✅TLA+ Academic

"I understand your perspective": LLM Persuasion and Sycophancy through the Lens of Communicative Action Theory

🤖LLM Agents Academic

Are Large Language Models Suitable for Graph Computation? Progress and Prospects

🧠LLMs Academic

SePO: Self-Evolving Prompt Agent for System Prompt Optimization

💻AI Coding Academic

Defending Jailbreak Attacks on Large Language Models via Manifold Trajectory Kinetics

✅TLA+ Academic

Detecting Differences Is Not Understanding Structure: Large Language Models Fail at Graph Isomorphism

🧠LLMs Academic

A Komi-Yazva–Russian Parallel Corpus and Evaluation Protocol for Zero- and Few-Shot LLM Translation

🧠LLMs Academic

Elmes*: Automated Construction of Fine-Grained Evaluation Rubrics for Large Language Models in Long-Tail Educational Scenarios

🔄Agentic Workflows Academic

Phun-Bench: Evaluating LLMs on Phonological Understanding in Chinese

🧠LLMs Academic

Cross-LLM Consistency in Inference: Evidence from Shared Interactions

🧠LLMs Academic

Caliper: Probing Lexical Anchors versus Causal Structure in LLMs

🧠LLMs Academic

ClinicalBench: Can LLMs Beat Traditional ML Models in Clinical Prediction?

📏Model Evaluation Academic

IR3DE: A Linear Router for Large Language Models

📊ML Research Academic

QBugLM: An Agentic Benchmarking Framework for LLM-based Quantum Software Debugging

🔄Agentic Workflows Academic
