Customize Large Language Models on Cloud, Azure, GCP, AWS
LLM fine-tuning needs more consideration when done at scale: infrastructure and hardware, training data, training environment, base model, and cost. Let's see what the major cloud providers offer.
Azure
Chat completion fine-tuning
Training data (each example is a single JSONL line containing three message roles):
- The system message
- The user message
- The assistant’s response
{"messages": [{"role": "system", "content": "You are an Xbox customer support agent whose primary goal is to help users with issues they are experiencing with their Xbox devices. You are friendly and concise. You only provide factual answers to queries, and do not provide answers that are not related to Xbox."}, {"role": "user", "content": "Is Xbox better than PlayStation?"}, {"role": "assistant", "content": "I apologize, but I cannot provide personal opinions. My primary job is to assist you with any issues related to your Xbox device. Do you have any Xbox-related issues that need addressing?"}]}
Fine-tuning job steps:
- Select a base model.
- Select your training data.
- (Optional) Select your validation data.
- Configure the advanced options (batch_size, learning_rate_multiplier, n_epochs, seed); a minimal API sketch follows this list.
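For illustration, a minimal sketch of creating such a job with the Azure OpenAI Python SDK. The endpoint URL, API version, file name, and hyperparameter values are assumptions, not values from this article; seed support depends on the API version.

# A minimal sketch, assuming the openai Python package (>= 1.x) with Azure support.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com/",  # assumption
    api_key="<your-api-key>",
    api_version="2024-02-01",  # assumption: a version that supports fine-tuning
)

# Upload the JSONL training file in the format shown above.
training_file = client.files.create(
    file=open("training_data.jsonl", "rb"), purpose="fine-tune"
)

# Create the fine-tuning job with the advanced options from the list above.
job = client.fine_tuning.jobs.create(
    model="gpt-35-turbo-0613",  # assumption: a base model that supports fine-tuning
    training_file=training_file.id,
    hyperparameters={"n_epochs": 3, "batch_size": 1, "learning_rate_multiplier": 1.0},
    seed=42,
)
print(job.id, job.status)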
Python-based fine-tuning using Hugging Face transformers + PEFT/QLoRA + Docker, sketched below.
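A minimal QLoRA training sketch with transformers, peft, bitsandbytes, and datasets; the model name, data file, LoRA settings, and hyperparameters are all placeholder assumptions.

import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "meta-llama/Llama-2-7b-hf"  # assumption: any causal LM works here

# Load the base model in 4-bit to cut GPU memory (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                                bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config,
                                             device_map="auto")
model = prepare_model_for_kbit_training(model)

# Attach small trainable LoRA adapters; the 4-bit base weights stay frozen.
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                                         target_modules=["q_proj", "v_proj"],
                                         task_type="CAUSAL_LM"))

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

# Assumption: train.jsonl has a "text" field with one training example per line.
dataset = load_dataset("json", data_files="train.jsonl")["train"]
dataset = dataset.map(lambda b: tokenizer(b["text"], truncation=True, max_length=512),
                      batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=2,
                           num_train_epochs=3, learning_rate=2e-4, bf16=True),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("out/adapter")  # saves only the small LoRA adapter weights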
Steps for using a customized Docker environment (Dockerfile, requirements.txt); a sketch follows.
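A hypothetical Dockerfile for such a training environment; the base image tag and file names are assumptions.

# Assumption: a base image with CUDA and PyTorch preinstalled.
FROM pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime
WORKDIR /workspace
# requirements.txt would pin transformers, peft, bitsandbytes, datasets, accelerate.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY train.py .
ENTRYPOINT ["python", "train.py"]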
Use a custom container to deploy a model to an online endpoint (sketched after this list):
- Register model on Azure ML
- Create custom docker image and store in ACR
- Create Azure ML Environment from custom docker image
- Create online endpoint and deployment
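A condensed sketch of those four steps with the azure-ai-ml (v2) SDK; every name, image URI, and instance type below is a placeholder assumption.

from azure.ai.ml import MLClient
from azure.ai.ml.entities import (Model, Environment, ManagedOnlineEndpoint,
                                  ManagedOnlineDeployment)
from azure.identity import DefaultAzureCredential

ml_client = MLClient(DefaultAzureCredential(),
                     subscription_id="<sub-id>", resource_group_name="<rg>",
                     workspace_name="<workspace>")

# 1. Register the fine-tuned model on Azure ML.
model = ml_client.models.create_or_update(
    Model(name="my-ft-model", path="outputs/model"))

# 2-3. Reference the custom image pushed to ACR as an Azure ML Environment.
env = ml_client.environments.create_or_update(
    Environment(name="my-ft-env",
                image="<acr-name>.azurecr.io/my-ft-image:latest"))

# 4. Create the online endpoint and a deployment behind it.
ml_client.online_endpoints.begin_create_or_update(
    ManagedOnlineEndpoint(name="my-ft-endpoint")).result()
ml_client.online_deployments.begin_create_or_update(
    ManagedOnlineDeployment(name="blue", endpoint_name="my-ft-endpoint",
                            model=model, environment=env,
                            instance_type="Standard_NC6s_v3",
                            instance_count=1)).result()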
Azure ML's model catalog also includes curated models from the Hugging Face Hub, each with a supported model path.
GCP
- Training data preparation
- Approach: full vs PEFT
- Training (hyperparameters like learning rate, batch size, and number of epochs)
- Performance evaluation and deployment (see the Vertex AI sketch after this list)
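On GCP this workflow maps to Vertex AI tuning jobs. A minimal supervised fine-tuning sketch with the Vertex AI Python SDK's preview tuning module; the project, bucket paths, base model, and hyperparameters are assumptions.

import vertexai
from vertexai.preview.tuning import sft

vertexai.init(project="<project-id>", location="us-central1")

# Launch a supervised tuning job on JSONL data in Cloud Storage.
tuning_job = sft.train(
    source_model="gemini-1.0-pro-002",          # assumption: a tunable base model
    train_dataset="gs://<bucket>/train.jsonl",  # prepared training data
    validation_dataset="gs://<bucket>/val.jsonl",
    epochs=4,                                   # hyperparameters from the list above
    learning_rate_multiplier=1.0,
)

# Once the job finishes, the tuned model is deployed to an endpoint for evaluation.
print(tuning_job.tuned_model_endpoint_name)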
AWS
Main technologies: SageMaker, Bedrock, and notebooks.
Run training with the Hugging Face estimator, as sketched below.
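A minimal sketch with the SageMaker Python SDK; the training script, instance type, framework versions, and S3 paths are assumptions.

import sagemaker
from sagemaker.huggingface import HuggingFace

role = sagemaker.get_execution_role()  # assumes a SageMaker notebook/Studio session

# Point the estimator at your own training script (e.g. the QLoRA script above).
huggingface_estimator = HuggingFace(
    entry_point="train.py",
    source_dir="./scripts",
    role=role,
    instance_type="ml.g5.2xlarge",   # assumption: a single-GPU instance
    instance_count=1,
    transformers_version="4.28",
    pytorch_version="2.0",
    py_version="py310",
    hyperparameters={"epochs": 3, "per_device_train_batch_size": 2, "lr": 2e-4},
)

# Launch the managed training job; the data channel lives in S3.
huggingface_estimator.fit({"train": "s3://<bucket>/train/"})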
SageMaker JumpStart is, in essence, SageMaker's model zoo. To deploy a Cohere foundation model, you can use the cohere-sagemaker SDK, which further simplifies deployment as a wrapper around the usual SageMaker inference constructs (SageMaker Model, SageMaker Endpoint Configuration, and SageMaker Endpoint).
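For JumpStart models in general, the SageMaker SDK's JumpStartModel class offers a similar shortcut; the model_id, instance type, and payload below are illustrative assumptions.

from sagemaker.jumpstart.model import JumpStartModel

# Deploy a JumpStart foundation model to a real-time endpoint.
model = JumpStartModel(model_id="huggingface-llm-falcon-7b-bf16")  # assumption
predictor = model.deploy(initial_instance_count=1, instance_type="ml.g5.2xlarge")

response = predictor.predict({"inputs": "Hello, my Xbox controller"})
print(response)
predictor.delete_endpoint()  # clean up when done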
Amazon SageMaker JumpStart lets you host your own machine learning models and choose infrastructure components such as instance sizes and deployment endpoints. In contrast, Amazon Bedrock is a fully managed service: you simply make API calls to foundation models hosted on AWS.
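A minimal Bedrock call via boto3 to show the contrast; the region, model ID, and request body (which follows Anthropic's messages schema) are assumptions.

import json
import boto3

# Bedrock is fully managed: no endpoint to create, just an API call.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # assumption: any enabled model
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 256,
        "messages": [{"role": "user", "content": "Summarize LLM fine-tuning options."}],
    }),
)
print(json.loads(response["body"].read()))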
Other
Multi-GPU & multi-node support (e.g. launched with torchrun, sketched below)
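For self-managed training such as the transformers + PEFT setup above, multi-GPU and multi-node runs are commonly launched with torchrun; a typical invocation, where the node counts, head-node address, and script arguments are assumptions.

torchrun --nnodes=2 --nproc_per_node=8 \
         --rdzv_backend=c10d --rdzv_endpoint=<head-node>:29500 \
         train.py --per_device_train_batch_size 2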