Model Evaluation

MultiToP: Learning to Patch Visual Tokens to Mitigate Hallucinations in Video Large Multimodal Models

 🔬Hallucination Detection  Content type: Academic
arxiv.org·

Beyond English Benchmarks: Clinical LLM Evaluation in Brazilian Portuguese

 🛡LLM safety  Content type: Academic
arxiv.org·

GuardNet: Ensemble Strategies of Shallow Neural Networks for Robust Prompt Injection and Jailbreak Detection

 🛡LLM safety  Content type: Academic
arxiv.org·

Optimizing 2D Input Representations and Sub-phase Fusion Strategies for Differential Diagnosis of Asthma and COPD Using CNN- and GRU-Based Networks

 🧬Embeddings  Content type: Academic
arxiv.org·

Sample-Efficient LLM-Based Detection of Malicious Web Server Logs with Forensically Explainable Reasoning

 ✍️Prompt Engineering  Content type: Academic
arxiv.org·

Data-Driven Runway and Taxiway Exits Prediction of Landing Aircraft: A Case Study at Hartsfield-Jackson Atlanta International Airport

 🎛️Fine-Tuning  Content type: Academic
arxiv.org·

Rank Intervals for Leaderboards: A Hierarchical Framework for Model Evaluation

 🛡️Red Teaming  Content type: Academic
arxiv.org·

UrduMMLU: A Massive Multitask Benchmark for Urdu Language Understanding

 ✍️Prompt Engineering  Content type: Academic
arxiv.org·

Attention Expansion: Enhancing Keyphrase Extraction from Long Documents with Attention-Augmented Contextualized Embeddings

 💭Context Management  Content type: Academic
arxiv.org·

Null-Space Constrained Low-Rank Adaptation for Response-Specified Large Language Model Unlearning

 🎛️Fine-Tuning  Content type: Academic
arxiv.org·

Aggregating LLM-Based Weak Verifiers for Spatial Layout Generation

 🎯AI Alignment  Content type: Academic
arxiv.org·

MechLens: Late Crystallization of Factual Knowledge Explains Intervention Effectiveness in Language Models

 🤖AI  Content type: Academic
arxiv.org·

The Order Matters: Sequential Fine-Tuning of LLaMA for Coherent Automated Essay Scoring

 🤖AI  Content type: Academic
arxiv.org·

Improving Answer Extraction in Context-based Question Answering Systems Using LLMs

 ✍️Prompt Engineering  Content type: Academic
arxiv.org·

FusionVul: A Multimodal Feature Fusion Framework for Source Code Vulnerability Detection

 🛡️Red Teaming  Content type: Academic
arxiv.org·

Multilingual Detection of Alzheimer's Disease from Speech: A Cross-Linguistic Transfer Learning Approach

 🎛️Fine-Tuning  Content type: Academic
arxiv.org·

Paediatric-HGNN: A Hybrid Heterogeneous Graph Neural Network for Detecting Disfluency in Children's Speech via Multiscale Acoustic Fusion

 🔬Hallucination Detection  Content type: Academic
arxiv.org·

ATTAIN: Automated Exploit Failure Analysis through Trace-Driven Diff Analysis

 🛡️Red Teaming  Content type: Academic
arxiv.org·

Deep Learning-assisted AMD Staging based on OCT and OCT Angiography

 🔬Hallucination Detection  Content type: Academic
arxiv.org·

Anomaly Detection for Electro-Hydrostatic Actuators using LSTM Autoencoder

 🤖AI  Content type: Academic
arxiv.org·
