12 API design

12.1 “Now serve the model”

The model works. It scores well, the notebook runs, and then comes the sentence that ends the comfortable phase of every data science project: now we need it to serve predictions to the website. Another system — one you didn’t write and may never see — needs an answer from your model, on demand, hundreds of times a minute. You cannot email it a notebook. The model has to become a service: something other software can ask a question and get an answer back.

An API (application programming interface) is how software asks other software for something. A web API does it over HTTP: a client sends a request, your code runs, and a response goes back. Turning a model into an API is the concrete form of the “production gap” from Chapter 1 — the step where a thing that worked on your laptop becomes a thing the rest of the system can rely on. It is more approachable than the silence after “now serve the model” suggests, and the tooling has made the common case genuinely simple.

12.2 An endpoint is a function over HTTP

Strip away the web vocabulary and a web API is a function call across a network. The client names an operation by its HTTP method and path (POST /predict), sends arguments as a JSON body, and receives a return value as a JSON response with a status code (200 for success, 4xx when the request itself is at fault, 5xx when the server fails). A /predict endpoint is your model.predict() with an HTTP wrapper around it.
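To make the "function call across a network" idea concrete, here is a minimal sketch that uses only the standard library and no real network. The names predict, ROUTES, and handle are hypothetical, and the scoring formula is a made-up stand-in for a model; the point is only the mapping from method, path, and JSON body to a function call and a status code.

```python
import json

def predict(recency: float, frequency: float) -> dict:
    """Hypothetical stand-in for model.predict(); any plain function would do."""
    score = min(frequency, 1.0) / (1.0 + recency)
    return {"churn_probability": round(score, 3)}

# The routing table: an (HTTP method, path) pair names the operation.
ROUTES = {("POST", "/predict"): predict}

def handle(method: str, path: str, body: str) -> tuple[int, str]:
    """Turn one request into a (status code, JSON response body) pair."""
    func = ROUTES.get((method, path))
    if func is None:
        return 404, json.dumps({"detail": "Not Found"})
    try:
        args = json.loads(body)      # JSON body -> keyword arguments
        result = func(**args)        # the ordinary function call
    except (json.JSONDecodeError, TypeError) as exc:
        # Unparseable JSON or wrong arguments: the request is at fault.
        return 400, json.dumps({"detail": str(exc)})
    except Exception:
        # Anything else is the server's problem.
        return 500, json.dumps({"detail": "Internal Server Error"})
    return 200, json.dumps(result)   # return value -> JSON response

status, payload = handle("POST", "/predict", '{"recency": 1.0, "frequency": 0.5}')
print(status, payload)
```

A web framework does the same job with far more care (parsing real HTTP, content types, concurrency), which is why the rest of the chapter leans on one rather than on a hand-rolled dispatcher like this.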

FastAPI makes the wrapper almost disappear: you write a normal Python function, decorate it with the method and path, and it handles the HTTP machinery. We’ll see the whole thing run in a moment; the shape is just a decorated function that takes features and returns a prediction.

Note: Data Science Bridge

An endpoint is model.predict() exposed over HTTP. You already call predict with a feature vector and get a result back; an API is the same call, except the caller is on another machine and the conversation happens in JSON. The request schema is the contract for the feature vector going in; the response schema is the contract for the prediction coming out. Seen this way, serving a model isn’t a new skill so much as a wrapper around one you use daily.

Where the analogy breaks down: when you call predict in a notebook, you trust the input completely, because you built the array yourself a cell earlier. An endpoint receives its input from strangers — other teams, other systems, the open internet — who will send nulls, strings where you expect numbers, missing fields, and values far outside any range you trained on. So an endpoint has to do things a notebook predict never does: validate its input, handle errors gracefully, and stay stable as the model behind it changes. Those concerns, not the prediction itself, are most of what API design is about.
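The kinds of bad input listed above can be sketched directly against a schema. This example assumes pydantic v2 and borrows the CustomerFeatures schema that the next section defines; the payloads and the check helper are hypothetical, chosen only to show one case each of a null, a wrong type, a missing field, and an out-of-range value.

```python
from pydantic import BaseModel, Field, ValidationError

class CustomerFeatures(BaseModel):
    recency: float = Field(ge=0)
    frequency: float = Field(ge=0)

# Payloads of the kind an unknown caller might send (illustrative only).
payloads = {
    "null value": {"recency": None, "frequency": 0.5},
    "string for number": {"recency": "yesterday", "frequency": 0.5},
    "missing field": {"recency": 1.2},
    "out of range": {"recency": -3.0, "frequency": 0.5},
    "well formed": {"recency": 1.2, "frequency": 0.5},
}

def check(payload: dict) -> list[str]:
    """Return the validation error types for a payload (empty list if valid)."""
    try:
        CustomerFeatures.model_validate(payload)
    except ValidationError as exc:
        return [err["type"] for err in exc.errors()]
    return []

results = {label: check(payload) for label, payload in payloads.items()}
for label, errors in results.items():
    print(f"{label:18} -> {errors or 'accepted'}")
```

Only the last payload should be accepted. A notebook would have passed any of the first four straight to predict, where they would either raise deep inside the model or, worse, produce a number.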

12.3 The contract: request and response schemas

The heart of a well-designed API is its contract: an explicit statement of what a valid request looks like and what the response will contain. With FastAPI you declare the contract as pydantic models (the same typed-and-validated objects from the previous chapter), and the framework enforces it automatically. A request that doesn’t match is rejected with a 422 status and a message describing what failed, before your code runs, and the contract is published as interactive documentation that you never have to write by hand.

Because FastAPI’s TestClient can call an app in-process, we can define a small service and exercise it here, with no server to start:

import numpy as np
from fastapi import FastAPI
from fastapi.testclient import TestClient
from pydantic import BaseModel, Field
from sklearn.linear_model import LogisticRegression

# A tiny model, fit once at startup — it stands in for a loaded artefact.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 2))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
model = LogisticRegression().fit(X, y)

class CustomerFeatures(BaseModel):       # the request contract
    recency: float = Field(ge=0)
    frequency: float = Field(ge=0)

class Prediction(BaseModel):             # the response contract
    churn_probability: float

app = FastAPI()

@app.post("/predict", response_model=Prediction)
def predict(features: CustomerFeatures) -> Prediction:
    proba = model.predict_proba([[features.recency, features.frequency]])[0, 1]
    return Prediction(churn_probability=round(float(proba), 3))

client = TestClient(app)

ok = client.post("/predict", json={"recency": 1.2, "frequency": 0.5})
print(f"valid request   -> {ok.status_code}  {ok.json()}")

bad = client.post("/predict", json={"recency": "yesterday", "frequency": 0.5})
print(f"invalid request -> {bad.status_code}  ({bad.json()['detail'][0]['msg']})")

valid request   -> 200  {'churn_probability': 1.0}
invalid request -> 422  (Input should be a valid number, unable to parse string as a number)

The valid request returns 200 and a typed prediction; the malformed one — a string where a number belongs — is rejected with 422 and a message naming the problem, and our predict function never even runs. We wrote no validation logic and no error handling: declaring the schema was enough. That same schema also generates an interactive documentation page (served at /docs) that callers can read and try out, kept in sync with the code automatically because it is the code.

12.4 Designing for callers you’ll never meet

A production endpoint needs a few things a toy one doesn’t, all flowing from the fact that its callers are unknown. Input validation rejects garbage at the door with a helpful message rather than letting it reach the model and produce nonsense. Error handling ensures a failure returns a meaningful status and message, not a stack trace that leaks your internals. Versioning, such as serving the endpoint at /v1/predict, lets you ship a changed or retrained model under a new version while the callers that depend on the old one keep working. And the auto-generated docs serve as the contract those callers read instead of asking you.

Running the service in production is a matter of pointing a server at the app (uvicorn main:app, assuming the app object lives in main.py), which is where the next part of the book picks up: packaging it into a container and deploying it. The design work, though, is done here: a clear contract, validated inputs, sensible errors, and a version.

Tip: Author’s Note

A notebook has exactly one user — you — and that single fact explains why serving a model feels so unexpectedly involved. You know what every input means, you trust the values because you made them, and when something breaks you see the error yourself and fix it on the spot. An API inverts all three. Its callers are strangers who will send the inputs you never thought to guard against; they can’t be trusted, because they don’t know your assumptions; and when something fails, they experience a cryptic error with none of the context you’d have in front of you.

The shift, then, is from trusting your input to defending against it, and from a result you read yourself to a contract other systems build on. That reframing is uncomfortable because it feels like a lot of ceremony around a one-line predict call — but the ceremony is the product. The model is the easy part, already built; the contract around it — the schema that rejects bad input, the version that protects existing callers, the docs that let others integrate without you — is the thing that turns a model into a service other people can actually depend on.

12.5 Summary

An API turns a model into a service other software can rely on:

  1. An endpoint is a function over HTTP. A POST /predict is model.predict() wrapped so that another machine can call it in JSON; FastAPI makes the wrapper a decorated function.

  2. The contract is the design. Declare request and response schemas as pydantic models; FastAPI validates input automatically, returns a clear 422 on bad requests, and generates interactive docs from the schema.

  3. Design for unknown callers. Validate input, handle errors without leaking internals, and version the endpoint so you can change the model without breaking the systems that depend on it.

  4. The contract is the deliverable, not the model. The prediction is the easy part; the schema, versioning, and docs around it are what make the model dependable as a service.

This completes Part 3. Part 4 takes the service the rest of the way to production — running its tests automatically, packaging it, deploying it, and watching it — beginning with continuous integration.

12.6 Exercises

  1. Wrap a model — one of your own, or a trivial one — in a FastAPI /predict endpoint with a pydantic request schema. Run it locally with uvicorn and call it, either with curl or through the interactive /docs page. What did you have to decide about the request format that a notebook predict let you ignore?

  2. Add validation to the endpoint by constraining the request fields (required fields, numeric ranges), then send a malformed request and confirm it returns a clear 422 rather than a 500 or a confidently wrong answer.

  3. Add a response schema and open the auto-generated documentation at /docs. Change a field in the code and reload the page — how does the documentation stay in sync with the implementation, and why does that matter for the people calling your API?

  4. Conceptual: The Data Science Bridge compares an endpoint to model.predict(). Give one way the analogy holds and one way it breaks down. What must an endpoint handle that an in-notebook predict call never has to?

  5. Conceptual: Not every model needs a real-time API. Describe a situation where a scheduled batch job scoring a file is the right delivery mechanism, and one where a real-time API is genuinely necessary. What property of the use case decides between them?