📊 Model Evaluation - jasonvh · Scour

On the Shoulders of Giants: Empowering Automated Smart Contract Auditing via the GiAnt Corpus

⚙️Software Engineering Academic

Attention-Discounted Adaptive Sampler for Masked Diffusion Language Models

🧠LLMs Academic

UrduMMLU: A Massive Multitask Benchmark for Urdu Language Understanding

✍️Prompt Engineering Academic

Null-Space Constrained Low-Rank Adaptation for Response-Specified Large Language Model Unlearning

🧠LLMs Academic

More than a Judge: An Empirical Study of Agent-Human Interaction in Crowdsourced Testing Assessment

⚙️Software Engineering Academic

PACE: Anytime-Valid Acceptance Tests for Self-Evolving Agents

🤖AI Agents Academic

Scaffold, Not Vocabulary? A Controlled, Two-Tier, Pre-Registered Study of a Popperian Code-Generation Skill

⚙️Software Engineering Academic

Contemporary AI lacks the imagination to diverge or negate in science

🤖AI Agents Academic

Less is MoE: Trimming Experts in Domain-Specialist Language Models

🧠LLMs Academic

MDP-GRPO: Stabilized Group Relative Policy Optimization for Multi-Constraint Instruction Following

🧠LLMs Academic

LLM Explainability with Counterfactual Chains and Causal Graphs

🧠LLMs Academic

Benchmark Everything Everywhere All at Once

💻AI Coding Academic

Critic-Guided Heterogeneous Multi-Agent Reasoning for Reliable Mathematical Problem Solving

🤖AI Agents Academic

The Fine-Tuning Trap: Evaluating Negative Transfer and the Role of PEFT in Sub-1B Mathematical Reasoning

✍️Prompt Engineering Academic

Self-Commitment Latency: A Reward-Free Probe for Prompted Implicit Hacking

✍️Prompt Engineering Academic

Selective-Advantage Entropy-Adaptive Horizon GRPO: Asymmetric Token-Level Discounting for Efficient Reinforcement Learning of Language Models

✍️Prompt Engineering Academic

PEFT of SLM for Telecommunications Customer Support: A Comparative Study of LoRA Configurations with Energy Consumption Analysis

🧠LLMs Academic

Evidence Markets

🌱Startups Academic
