Machine learning models are powerful tools for making predictions and uncovering patterns in data. However, if these models are confined to a Jupyter notebook or a local script, their real-world usefulness is limited. In most practical scenarios, we want our models to interact with other applications, websites, or mobile apps, providing predictions in real time. This is where APIs (Application Programming Interfaces) come into play.

An API acts as a bridge that allows different software systems to communicate with each other. By serving our machine learning model as an API, we enable external programs to send input data to the model and receive predictions instantly. This approach transforms a trained model from a static artifact into a dynamic, usable service, opening the door to automation, real-time analytics, and integration with broader software ecosystems. Hence, serving our model via an API is the first step toward making machine learning accessible and actionable in real-world applications.
Let’s say I have trained a machine learning model to predict rental prices for properties in Colombo. The dataset includes features like property size in square feet, number of bedrooms and bathrooms, neighborhood, furnishing status, building type, property age, and distance to the city center.

After exploring and preprocessing the data, I decide to build a robust model using an ensemble approach. In this example, I combined a Huber Regressor with a RANSAC Regressor using a scikit-learn Pipeline and VotingRegressor. This setup ensures that both regressors contribute to the final prediction, weighted according to their performance. The model is trained on the training data and then evaluated on the test set using metrics like R² and RMSE. Once satisfied with the performance, the model is ready to be saved for later use.
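A rough sketch of that setup is shown below. The column names match our dataset, but the preprocessing choices and voting weights are illustrative assumptions; the complete code linked below has the real details.

import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import HuberRegressor, RANSACRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric = ["Size_in_Sqft", "Bedrooms", "Bathrooms", "Property_Age", "Distance_to_City_Center"]
categorical = ["Neighborhood", "Furnished", "Building_Type"]

# Scale numeric features, one-hot encode categorical ones
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

# Both robust regressors vote on the final prediction (weights are illustrative)
ensemble = VotingRegressor(
    estimators=[("huber", HuberRegressor()), ("ransac", RANSACRegressor())],
    weights=[0.6, 0.4],
)

final_model = Pipeline([("preprocess", preprocess), ("model", ensemble)])
final_model.fit(X_train, y_train)  # assumes an existing train/test split

pred = final_model.predict(X_test)
print("R2:", r2_score(y_test, pred))
print("RMSE:", np.sqrt(mean_squared_error(y_test, pred)))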
You can access the complete code here!
Once the model performs well, we save it so we can reuse it later without retraining. Saving a model allows us to deploy it in applications or serve it via an API. In Python, there are two popular ways to save models: joblib and pickle.
- Joblib: Optimized for scikit-learn pipelines and large NumPy arrays. It is fast, efficient, and handles memory-heavy models well.
- Pickle: A general-purpose Python serialization library that can save almost any Python object. It is slightly slower for large arrays but versatile for smaller or custom objects.
For scikit-learn models like ours, joblib is the preferred choice. Saving the model with joblib preserves the entire pipeline, including preprocessing steps and trained regressors, which can then be loaded later to make predictions immediately. For example, the model is saved with:
joblib.dump(final_model, "saved_rental_price_model.joblib")
This file can then be loaded in another Python script or API to generate predictions on new data. But what does it really mean to save a model? When we save a model using joblib or pickle, Python takes the entire trained model object, including the algorithm, the learned coefficients, any preprocessing steps like scaling or encoding, and even the structure of the pipeline, and converts it into a binary file.
When we load this file back, Python reads the binary data and rebuilds the exact same model object in memory, ready to make predictions. We don’t need to retrain it; the model remembers everything it learned before.
So essentially, saving a model doesn’t just store the code. It stores the knowledge the model gained from the training data in a reusable and portable way. This makes it easy to deploy the model in a web application, API, or any other environment without starting from scratch.
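As a quick sketch, reloading the file and predicting looks like this (the sample property below is made up for illustration):

import joblib
import pandas as pd

# Reload the saved pipeline; no retraining needed
model = joblib.load("saved_rental_price_model.joblib")

# One made-up property, using the same column names as the training data
sample = pd.DataFrame([{
    "Size_in_Sqft": 1200,
    "Bedrooms": 3,
    "Bathrooms": 2,
    "Neighborhood": "Uptown",
    "Furnished": "Furnished",
    "Building_Type": "Apartment",
    "Property_Age": 5,
    "Distance_to_City_Center": 2.5,
}])

print(model.predict(sample))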
Serving our Model as an API with FastAPI
After training and saving a machine learning model, the real value comes when it can be used in real-world applications. To make predictions accessible to other systems, websites, or users in real time, we need to serve the model via an API. This is where FastAPI comes in: a modern, fast, and easy-to-use Python framework that allows us to turn our trained ML model into a production-ready API.
What is FastAPI?
FastAPI is a modern, high-performance web framework for building APIs with Python. Unlike traditional web frameworks that focus mainly on creating websites, FastAPI is designed specifically for building APIs that allow software applications to communicate with each other.

FastAPI can handle many requests simultaneously without slowing down, making it ideal for deploying machine learning models, real-time applications, and microservices. Its combination of speed, simplicity, and scalability has made it one of the fastest-growing frameworks in the Python ecosystem.
Why FastAPI is Popular for Machine Learning
One of the main reasons FastAPI is popular in the machine learning community is its simplicity and performance. With just a few lines of code, you can turn a Python function into a fully functional API endpoint, as the minimal example after this list shows. FastAPI automatically handles:
- Data validation, ensuring the input matches the expected format.
- Serialization and deserialization, converting Python objects to JSON and vice versa.
- Interactive documentation via Swagger UI, allowing you to test API endpoints directly in the browser.
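To see how little code is involved, here is a minimal, self-contained FastAPI app. The /ping route is a hypothetical example, unrelated to our model:

from fastapi import FastAPI

app = FastAPI()

@app.get("/ping")
def ping():
    # Visiting http://127.0.0.1:8000/ping returns this dict as JSON
    return {"status": "ok"}

Running it with uvicorn (covered in Step 6 below) also gives you the interactive documentation at /docs for free.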
Serving the Model as an API with FastAPI
Using FastAPI, we can expose our trained machine learning model as a web service. This allows any client, such as a website, mobile app, or another program, to send input data to the API and receive predictions in real time. Put simply, FastAPI acts as a bridge between the trained model and the outside world, turning a static model into a live, usable service.
We have already saved our model as a joblib file. The next step is to make it accessible as an API so that other applications or users can get predictions in real time. Here’s how we can do that using FastAPI.
Step 1: Import Required Libraries
from fastapi import FastAPI
from pydantic import BaseModel
import joblib
import pandas as pd
- FastAPI: This is the web framework we are using to create our API.
- Pydantic: It helps validate incoming data automatically to make sure it matches the expected format.
Note: Pydantic helps ensure that the data coming into our API is in the correct format. For example, if our model expects a number for age or a string for a name, Pydantic will automatically check the incoming data and raise an error if it doesn’t match the expected type. This makes our API more reliable and safe, preventing incorrect data from breaking the model (see the short sketch after this list).
- Joblib: This loads the saved model from disk.
- Pandas: We use it to convert input data into a DataFrame, which is what our scikit-learn pipeline expects.
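Here is a tiny standalone sketch of that validation behavior. The Person model is a hypothetical example, not part of the rental-price API:

from pydantic import BaseModel, ValidationError

class Person(BaseModel):
    name: str
    age: int

Person(name="Amal", age=30)  # valid input passes silently

try:
    Person(name="Amal", age="thirty")  # invalid: age is not an integer
except ValidationError as err:
    print(err)  # explains that age must be a valid integer

When the same kind of model is used as a FastAPI request body, a failed validation like this is returned to the client as a 422 error instead of ever reaching our prediction code.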
Step 2: Initialize FastAPI
app = FastAPI()
This creates a FastAPI app object. All our API endpoints will be defined on this object. This is the main entry point for our web service.
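Optionally, FastAPI accepts metadata that shows up in the auto-generated documentation; the title and version below are illustrative:

app = FastAPI(title="Rental Price Prediction API", version="1.0")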
Step 3: Define Input Data Structure
class model_input(BaseModel):
    Size_in_Sqft: int
    Bedrooms: int
    Bathrooms: int
    Neighborhood: str
    Furnished: str
    Building_Type: str
    Property_Age: int
    Distance_to_City_Center: float
Here, we define a Pydantic model to describe the expected input for our API.
- Each attribute represents a feature in our dataset.
- Pydantic automatically checks that the input data matches these types, e.g., Size_in_Sqft must be an integer and Neighborhood must be a string.
- This helps prevent invalid inputs from crashing your model.
By default, all fields in the class are required. If some features are optional, you can use Optional and provide a default:
from typing import Optional

Furnished: Optional[str] = "UnFurnished"  # declared inside the model_input class
For a field that should only accept specific values, the best approach in Pydantic is to use an Enum. This ensures that FastAPI will automatically validate the input and reject anything outside the allowed options. For example, I expect the input field Furnished to be either Furnished or UnFurnished:
from pydantic import BaseModel
from enum import Enum

class Furnishing(str, Enum):
    Furnished = "Furnished"
    UnFurnished = "UnFurnished"

class model_input(BaseModel):
    Size_in_Sqft: int
    Bedrooms: int
    Bathrooms: int
    Neighborhood: str
    Furnished: Furnishing  # now only accepts "Furnished" or "UnFurnished"
    Building_Type: str
    Property_Age: int
    Distance_to_City_Center: float
Step 4: Load the Saved Model
rental_model = joblib.load("saved_rental_price_model.joblib")
- This loads the trained model pipeline from disk.
- Because we saved the entire pipeline (preprocessing + model), the API can now handle raw input data just like during training.
Step 5: Define the Prediction Endpoint
@app.post('/predict_rental_price')
def predict_rental_price(data: model_input):
    input_df = pd.DataFrame([data.dict()])
    prediction = rental_model.predict(input_df)
    return {'Predicted_Rental_Price': prediction[0]}
- @app.post('/predict_rental_price'): This creates a POST endpoint called /predict_rental_price. Clients can send data to this endpoint to get predictions.
- data: model_input: FastAPI automatically converts incoming JSON to a model_input object and validates it.
- input_df = pd.DataFrame([data.dict()]): Converts the validated input into a pandas DataFrame, which is what the model pipeline expects. (On Pydantic v2, data.model_dump() replaces the deprecated data.dict().)
- prediction = rental_model.predict(input_df): Passes the input through the pipeline (including preprocessing) and generates a prediction.
- return {'Predicted_Rental_Price': prediction[0]}: Sends the prediction back as a JSON response.
Step 6: Running the API
To run the API locally, save the code in a file called main.py and use the following command; here main is the file name, app is the FastAPI object, and --reload restarts the server whenever the code changes:
uvicorn main:app --reload
Once the server is running, open http://127.0.0.1:8000/docs in your browser. This opens Swagger UI, an interactive interface where we can test the API without writing any code. FastAPI also serves an alternative view of the same documentation at http://127.0.0.1:8000/redoc.
Step 7: Testing the API with Python (Without Swagger)
We can also send requests to the API programmatically using Python’s requests library:
import requests

url = "http://127.0.0.1:8000/predict_rental_price"
data = {
    "Size_in_Sqft": 1200,
    "Bedrooms": 3,
    "Bathrooms": 2,
    "Neighborhood": "Uptown",
    "Furnished": "Furnished",
    "Building_Type": "Apartment",
    "Property_Age": 5,
    "Distance_to_City_Center": 2.5
}

response = requests.post(url, json=data)
if response.status_code == 200:
    print("Predicted Rental Price:", response.json()['Predicted_Rental_Price'])
else:
    print("Error:", response.status_code, response.text)

We send a POST request with input data formatted as a JSON object that matches the expected model fields. The API processes the request, validates the input, runs it through the machine learning model, and returns a prediction.
We can then check if the request was successful and extract the predicted value from the response. This approach allows us to interact with the API directly from Python scripts, making it easy to integrate the model into other applications or automated workflows.
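Alternatively, during development we can exercise the endpoint in-process with FastAPI's TestClient, without starting a server at all. This sketch assumes the API code above lives in main.py and that the httpx package is installed:

from fastapi.testclient import TestClient
from main import app  # the FastAPI object defined above

client = TestClient(app)
payload = {
    "Size_in_Sqft": 1200,
    "Bedrooms": 3,
    "Bathrooms": 2,
    "Neighborhood": "Uptown",
    "Furnished": "Furnished",
    "Building_Type": "Apartment",
    "Property_Age": 5,
    "Distance_to_City_Center": 2.5
}

# The request goes straight to the app object, no network involved
response = client.post("/predict_rental_price", json=payload)
print(response.status_code, response.json())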
This approach makes your machine learning model fully deployable, reusable, and accessible to any application capable of sending HTTP requests. The combination of FastAPI + Pydantic + Joblib ensures the API is fast, reliable, and robust against invalid input.