Fine-tuning

Mult-DPO: Multinomial Direct Preference Optimization for Recommender Systems

 🎯RLHF  Content type: Academic
arxiv.org

The Neutral Mask: How RLHF Provides Shallow Alignment while Leaving Partisan Structure Intact in a Large Language Model

 🎯Fine-Tuning  Content type: Academic
arxiv.org

RL Excursions during Pre-Training: Re-examining Policy Optimization for LLM training

 🎯RLHF  Content type: Academic
arxiv.org

Fisher-Guided Progressive Parameter Selection for Adaptive Fine-Tuning

 🎯Fine-Tuning  Content type: Academic
arxiv.org

Distilling Safe LLM Systems via Soft Prompts for On Device Settings

 💬LLMs  Content type: Academic
arxiv.org

AuRA: Internalizing Audio Understanding into LLMs as LoRA

 🎯Fine-Tuning  Content type: Academic
arxiv.org

PEFT of SLM for Telecommunications Customer Support: A Comparative Study of LoRA Configurations with Energy Consumption Analysis

 🎯Fine-Tuning  Content type: Academic
arxiv.org

On the Geometry of On-Policy Distillation

 🎮Reinforcement Learning  Content type: Academic
arxiv.org

Supervised Fine-tuning with Synthetic Rationale Data Hurts Real-World Disease Prediction

 🎯RLHF  Content type: Academic
arxiv.org

Benchmarking Empirical Privacy Protection for Adaptations of Large Language Models

 💬LLMs  Content type: Academic
arxiv.org

Alignment Defends LLMs from Property Inference Attacks

 🎯Fine-Tuning  Content type: Academic
arxiv.org

Breaking the Tokenizer Barrier: On-Policy Distillation across Model Families

 💬LLMs  Content type: Academic
arxiv.org

High-Dimensional Theory of LoRA Fine-Tuning in a Solvable Attention Model

 🎯Fine-Tuning  Content type: Academic
arxiv.org

Training LLMs to Enforce Multi-Level Instruction Hierarchies via Gravity-Weighted Direct Preference Optimization

 ✍️Prompt Engineering  Content type: Academic
arxiv.org

Data Synthesis and Parameter-Efficient Fine-Tuning for Low-Resource NMT: A Case Study on Q'eqchi' Mayan

 🎯Fine-Tuning  Content type: Academic
arxiv.org

Domain-Adapted Small Language Models with Hybrid Post-Processing: Achieving Cost-Efficient, Low-Latency Multi-Label Structured Prediction via LoRA Fine-Tuning on Scarce Data

 🎯Fine-Tuning  Content type: Academic
arxiv.org

Post-training is (Massive) Supervised Learning

 Transformers  Content type: Academic
arxiv.org

TLA-Prover: Verifiable TLA+ Specification Synthesis via Preference-Optimized Low-Rank Adaptation

 🎯RLHF  Content type: Academic
arxiv.org

Defending Against Malicious Finetuning by Scaling Train-time Adversarial Attacks

 🎯Fine-Tuning  Content type: Academic
arxiv.org

Better Literary Translation: A Multi-Aspect Data Generation and LLM Training Approach

 🎯RLHF  Content type: Academic
arxiv.org
