Deploy an ML Model with FastAPI

Introduction

Build a simple FastAPI service that loads a model artifact, exposes a prediction endpoint, and can be tested with curl.

Before You Start

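You need Python 3, a saved model artifact (here a scikit-learn classifier serialized with joblib as models/churn.pkl), and a clear request schema. The service loads the model once at startup and returns JSON containing the prediction and the model version.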

Project Structure

ml-api/
├── app.py
├── models/
│   └── churn.pkl
├── requirements.txt
└── Dockerfile

Step-by-Step Deployment

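Create the API in app.py:

from fastapi import FastAPI
from pydantic import BaseModel
import joblib

# Reported with every response so a prediction can be traced to a model build.
MODEL_VERSION = "churn:17"
# Loaded once at import time, not on every request.
model = joblib.load("models/churn.pkl")

class Customer(BaseModel):
    """Request schema; FastAPI rejects bodies that do not match it."""
    tenure: int
    monthly_charges: float

app = FastAPI()

@app.get("/health")
def health():
    return {"status": "ok", "model_version": MODEL_VERSION}

@app.post("/predict")
def predict(customer: Customer):
    # Feature order must match the order used when the model was trained.
    features = [[customer.tenure, customer.monthly_charges]]
    # Column 1 is the second class in model.classes_, assumed here to mean "churned".
    probability = float(model.predict_proba(features)[0][1])
    return {"churn_probability": probability, "model_version": MODEL_VERSION}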

Install and run locally:

pip install fastapi uvicorn scikit-learn joblib
uvicorn app:app --host 0.0.0.0 --port 8000
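
The Customer schema above checks types only. As a minimal sketch of tighter validation, the variant below adds range constraints with pydantic's Field. The bounds (non-negative tenure and charges) and the helper name invalid_fields are assumptions for illustration, not part of the original service. When a model like this is used as the request body, FastAPI answers invalid input with a 422 response without reaching the prediction code.

```python
from pydantic import BaseModel, Field, ValidationError


class Customer(BaseModel):
    # Bounds are illustrative assumptions, not taken from the churn model.
    tenure: int = Field(..., ge=0)             # required, non-negative
    monthly_charges: float = Field(..., ge=0)  # required, non-negative


def invalid_fields(payload: dict) -> list:
    """Return the sorted names of fields that fail validation (hypothetical helper)."""
    try:
        Customer(**payload)
        return []
    except ValidationError as exc:
        # Each error carries a "loc" tuple whose first item is the field name.
        return sorted({str(err["loc"][0]) for err in exc.errors()})
```

Calling invalid_fields({"tenure": -1, "monthly_charges": 89.9}) reports the tenure field, while a well-formed payload yields an empty list.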

Testing the Deployment

Test the endpoints:

curl -s http://localhost:8000/health
curl -s http://localhost:8000/predict -H "Content-Type: application/json" -d '{"tenure": 12, "monthly_charges": 89.9}'

Example responses, in the same order (the probability value depends on your model):

{"status":"ok","model_version":"churn:17"}
{"churn_probability":0.73,"model_version":"churn:17"}
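
The same check can be scripted so that it runs after every deploy. The sketch below uses only the Python standard library. The helper names post_json and check_prediction are hypothetical, and the rule that the probability must lie in [0, 1] is an assumption about what a healthy response looks like.

```python
import json
import urllib.request


def post_json(url: str, payload: dict, timeout: float = 5.0) -> dict:
    """POST a JSON body and decode the JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def check_prediction(body: dict) -> list:
    """Return a list of problems found in a /predict response body."""
    problems = []
    probability = body.get("churn_probability")
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(probability, bool) or not isinstance(probability, (int, float)):
        problems.append("churn_probability is missing or not a number")
    elif not 0.0 <= probability <= 1.0:
        problems.append("churn_probability is outside [0, 1]")
    if not body.get("model_version"):
        problems.append("model_version is missing")
    return problems
```

With the service running locally, check_prediction(post_json("http://localhost:8000/predict", {"tenure": 12, "monthly_charges": 89.9})) should return an empty list.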

Production Considerations

Add request validation, structured logs, metrics, model version reporting, resource limits, readiness probes, and a rollback plan before production.
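
As one way to approach structured logs, the sketch below emits a single JSON line per prediction using the standard logging module. The logger name "ml-api", the field names, and the helpers prediction_log_line and timed_predict are assumptions chosen for illustration. A real service might use a dedicated logging library or a metrics client instead.

```python
import json
import logging
import time

logger = logging.getLogger("ml-api")  # hypothetical logger name


def prediction_log_line(model_version: str, probability: float, latency_ms: float) -> str:
    """Build one JSON log line describing a served prediction."""
    record = {
        "event": "prediction",
        "model_version": model_version,
        "churn_probability": round(probability, 4),
        "latency_ms": round(latency_ms, 2),
    }
    return json.dumps(record, sort_keys=True)


def timed_predict(predict_fn, features, model_version: str) -> float:
    """Call predict_fn(features), log the result with its latency, and return it."""
    start = time.perf_counter()
    probability = float(predict_fn(features))
    latency_ms = (time.perf_counter() - start) * 1000.0
    logger.info(prediction_log_line(model_version, probability, latency_ms))
    return probability
```

Logging the model version next to each prediction makes it possible to tell, after a rollback, which build produced a given result.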

Common Mistakes

  • Loading the model on every request.
  • Returning predictions without model version.
  • Accepting unvalidated JSON.
  • Hiding exceptions instead of logging them.
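
To illustrate the first mistake, the sketch below shows one way to guarantee that an artifact is loaded only once even when the loader is called from inside a request handler. It caches the result with functools.lru_cache. The function load_artifact is a hypothetical stand-in that records its calls; in the service it would wrap joblib.load.

```python
from functools import lru_cache

load_calls = []  # only here to make the caching visible


def load_artifact(path: str) -> dict:
    """Stand-in for an expensive load such as joblib.load(path)."""
    load_calls.append(path)
    return {"path": path}


@lru_cache(maxsize=1)
def get_model(path: str = "models/churn.pkl") -> dict:
    """Load the artifact on first use and reuse it afterwards."""
    return load_artifact(path)
```

Every call to get_model() after the first returns the cached object, so the cost of loading is paid once per process.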

Summary

A reliable model deployment is versioned, testable, observable, and reversible.