Recently I was trying to fine-tune Qwen2.5-3B-Instruct to add reasoning capability, but I kept failing to produce a working reasoning model. I trained it on 800 examples, and each time I ended up with a model that either wouldn't generate thinking tokens at all or would start generating garbage on top of its answers. I'd really appreciate someone explaining how this is usually done. From the papers I've read, CoT is typically added via SFT of base models, so in my case 800 examples for 1 epoch might simply be too little.
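
For context, here's the kind of setup I mean (a minimal sketch with TRL's SFTTrainer, not my exact code; the `<think>` tags, the toy example, and the hyperparameters are all illustrative):

```python
# Minimal sketch of SFT on CoT traces with TRL's SFTTrainer.
# Assumption: each assistant turn wraps its reasoning in <think>...</think>
# so the model learns to always emit the thinking span before answering.
from datasets import Dataset
from trl import SFTTrainer, SFTConfig

# Hypothetical CoT example in TRL's conversational ("messages") format.
examples = [
    {
        "messages": [
            {"role": "user", "content": "What is 17 * 24?"},
            {
                "role": "assistant",
                "content": "<think>17 * 24 = 17 * 20 + 17 * 4 "
                           "= 340 + 68 = 408.</think>\nThe answer is 408.",
            },
        ]
    },
    # ... ~800 such examples in the real run
]

train_dataset = Dataset.from_list(examples)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-3B-Instruct",  # TRL loads model + tokenizer from the hub
    train_dataset=train_dataset,       # chat template is applied automatically
    args=SFTConfig(
        output_dir="qwen2.5-3b-cot-sft",
        num_train_epochs=3,              # 1 epoch on 800 examples may be too little
        per_device_train_batch_size=2,
        learning_rate=1e-5,              # small LR to avoid wrecking the instruct tune
    ),
)
trainer.train()
```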