---
# Content: CC BY-NC-SA 4.0 | Code: MIT - see /LICENSE.md
title: "Notebook to production API"
---
## The model that has to ship {#sec-model-that-ships}
You have a churn model that works. In a notebook, you loaded the customer table, engineered a few features, trained a classifier, and the holdout numbers are good enough that product wants it serving the app: when a customer's account page loads, the system should ask your model how likely they are to churn and, if the risk is high, offer a retention incentive. The exploration is over; now it has to *ship*.
This chapter is that journey end to end, on one concrete project. Nothing in it is new — every practice has its own chapter in Parts 1 to 5 — but seeing them applied in sequence to a single model is the point. In isolation each step can feel like ceremony; assembled, they are what turns "it works in my notebook" into "it serves the business reliably". We'll go from the notebook to a tested, configured, containerised, deployed, and monitored service, and at each step point back to the chapter that covers it in depth.
## Out of the notebook, into a package {#sec-into-a-package}
The notebook's value was discovery; its structure is a liability (Chapter 1). The first move is to extract the logic worth keeping — the feature engineering and the training — into pure functions in a `src/` package (Chapters 6 and 9), leaving the notebook as a thin layer that imports and explores. The project takes the shape from Chapter 9:
```text
churn-service/
├── pyproject.toml
├── requirements.txt
├── config.yaml
├── src/churn_service/
│   ├── features.py     # feature engineering
│   ├── train.py        # training, producing the artefact
│   └── api.py          # the serving layer
├── tests/
└── Dockerfile
```
The feature and training logic, lifted from the notebook into `features.py` and `train.py`, become ordinary importable functions:
```{python}
#| label: package-core
#| echo: true
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
def add_features(customers: pd.DataFrame) -> pd.DataFrame:
    """Derive model features from raw customer fields."""
    return customers.assign(
        spend_per_active_day=customers["spend"] / customers["active_days"].clip(lower=1),
        is_recent=(customers["days_since_login"] < 30).astype(int),
    )


def train(customers: pd.DataFrame) -> GradientBoostingClassifier:
    """Train the churn classifier on engineered features."""
    features = add_features(customers)
    X = features[["spend_per_active_day", "is_recent", "days_since_login"]]
    return GradientBoostingClassifier(random_state=42).fit(X, features["churned"])

# Stand-in for the real customer table (data lives outside the repo, Chapter 2).
rng = np.random.default_rng(42)
n = 2_000
customers = pd.DataFrame({
    "spend": rng.exponential(50, n),
    "active_days": rng.integers(0, 365, n),
    "days_since_login": rng.integers(0, 90, n),
})
customers["churned"] = rng.binomial(1, 1 / (1 + np.exp(-(customers["days_since_login"] / 30 - 1))))
model = train(customers)
print(f"trained {type(model).__name__} on {len(customers)} customers")
```
The logic is identical to the notebook's; what's changed is that it now lives in named, importable, testable functions rather than in cells whose order you have to remember.
## Reproducible and configured {#sec-repro-configured}
With the code in a package, the surrounding scaffolding follows quickly, each from its own chapter. The environment is pinned to a lockfile (Chapter 3) so the model trains identically on a colleague's machine and the CI runner. The project is under version control with a data-and-secrets-aware `.gitignore` (Chapter 2). And the values that were hard-coded in the notebook — the feature thresholds, the model's hyperparameters, the path to the data — move into a `config.yaml` loaded into a validated object (Chapter 11):
```yaml
# config.yaml
features:
  recent_login_days: 30
model:
  random_state: 42
  n_estimators: 100
```
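As a rough illustration of what "loaded into a validated object" can mean, here is a minimal sketch using only the standard library. It is not necessarily the approach Chapter 11 takes (a pydantic model would serve equally well). The names `FeatureConfig`, `ModelConfig`, `Config` and `load_config` are hypothetical, and the `raw` dict stands in for what a YAML parser would return for the file above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureConfig:
    recent_login_days: int

    def __post_init__(self) -> None:
        if self.recent_login_days <= 0:
            raise ValueError("recent_login_days must be positive")


@dataclass(frozen=True)
class ModelConfig:
    random_state: int
    n_estimators: int

    def __post_init__(self) -> None:
        if self.n_estimators < 1:
            raise ValueError("n_estimators must be at least 1")


@dataclass(frozen=True)
class Config:
    features: FeatureConfig
    model: ModelConfig


def load_config(raw: dict) -> Config:
    """Build a validated Config from a parsed-YAML dict.

    Unknown keys raise TypeError; out-of-range values raise ValueError.
    """
    return Config(
        features=FeatureConfig(**raw["features"]),
        model=ModelConfig(**raw["model"]),
    )


# Stand-in for the dict a YAML parser would produce from config.yaml.
raw = {
    "features": {"recent_login_days": 30},
    "model": {"random_state": 42, "n_estimators": 100},
}
config = load_config(raw)
print(config.features.recent_login_days, config.model.n_estimators)
```

The point of the design is that a typo in a key or a nonsensical value fails loudly at load time, before any training starts, rather than surfacing later as a subtly different model.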
None of this changes what the model *does*. It changes whether the model can be rebuilt, by someone else, next month, to the same result — which is the difference between a finding and a rumour.
::: {.callout-note}
## Data Science Bridge
Productionising a model is the engineering equivalent of writing a result up for publication. The exploratory analysis is where you *found* something; the paper is the rigorous, reproducible, reviewable version that others can build on — with the methods stated, the data described, and the result reproducible by a stranger. Everything in this chapter is that write-up, in code: the package is the methods section, the lockfile and config are the reproducibility statement, the tests are the peer review you run yourself.
Where the analogy breaks down: a paper, once published, is finished — it describes a result frozen in time. A deployed service is never finished. It keeps running, against data that keeps changing, so it needs the thing a paper never does — monitoring — to tell you when the result it embodies has quietly stopped being true.
:::
## Test the pieces {#sec-test-pieces}
Now the model can be tested, because its logic is in functions rather than cells (Chapter 7). You test the deterministic transforms against known inputs and edge cases, and — critically for deployment — you confirm the trained artefact survives a save/load round-trip (Chapter 15), since the deployed service loads a serialised model rather than retraining:
```{python}
#| label: tests-and-artefact
#| echo: true
import tempfile
from pathlib import Path
import joblib
# A transform test: the feature must be defined even when active_days is zero
# (the bug that bit us in Chapter 19 — guarded here, and now pinned by a test).
edge = pd.DataFrame({"spend": [100.0], "active_days": [0], "days_since_login": [5]})
assert np.isfinite(add_features(edge)["spend_per_active_day"]).all()
# The serialisation round-trip the deployment depends on.
artefact = Path(tempfile.mkdtemp()) / "model.joblib"
joblib.dump(model, artefact)
loaded = joblib.load(artefact)
sample = add_features(customers).iloc[:5][["spend_per_active_day", "is_recent", "days_since_login"]]
assert np.array_equal(model.predict_proba(sample), loaded.predict_proba(sample))
print("transform handles the zero edge case; saved model predicts identically")
```
Note what we *don't* test: that the model achieves a particular accuracy. That belongs in evaluation and monitoring, not a pass/fail gate (Chapter 7). We test the code around the model, where there's a right answer.
## Wrap it in an API {#sec-wrap-api}
The service loads the saved artefact and exposes a `/predict` endpoint with a typed request and response contract (Chapter 12), the same contract the engineering team agreed to consume (Chapter 20). Because FastAPI's `TestClient` calls the app in-process, with no server to start, we can serve the trained model and exercise it here:
```{python}
#| label: serve-the-model
#| echo: true
from fastapi import FastAPI
from fastapi.testclient import TestClient
from pydantic import BaseModel, Field
class CustomerInput(BaseModel):
    spend: float = Field(ge=0)
    active_days: int = Field(ge=0)
    days_since_login: int = Field(ge=0)


class ChurnResponse(BaseModel):
    churn_probability: float
    model_version: str


app = FastAPI()


@app.post("/predict", response_model=ChurnResponse)
def predict(customer: CustomerInput) -> ChurnResponse:
    row = add_features(pd.DataFrame([customer.model_dump()]))
    features = row[["spend_per_active_day", "is_recent", "days_since_login"]]
    proba = float(loaded.predict_proba(features)[0, 1])
    return ChurnResponse(churn_probability=round(proba, 3), model_version="1.0.0")


client = TestClient(app)
ok = client.post("/predict", json={"spend": 40.0, "active_days": 12, "days_since_login": 70})
print(f"valid -> {ok.status_code} {ok.json()}")
bad = client.post("/predict", json={"spend": 40.0, "active_days": 12, "days_since_login": -3})
print(f"invalid -> {bad.status_code} ({bad.json()['detail'][0]['msg']})")
```
The endpoint reuses the *same* `add_features` the model was trained with — not a reimplementation — which is how you avoid train–serve skew, the bug where serving features are computed differently from training features. A valid request returns a typed probability and a model version; an invalid one is rejected at the door with a clear 422, no hand-written validation required.
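To see why a reimplementation is risky, consider a small sketch of how skew can creep in. The first function mirrors the `is_recent` rule in `add_features` (strictly fewer than 30 days). The second is a hypothetical serving-side rewrite with an off-by-one. Both function names are invented for this illustration.

```python
def is_recent_training(days_since_login: int) -> int:
    """The rule the model was trained with: strictly fewer than 30 days."""
    return int(days_since_login < 30)


def is_recent_reimplemented(days_since_login: int) -> int:
    """A hypothetical serving-side rewrite with an off-by-one (<= 30)."""
    return int(days_since_login <= 30)


# Compare the two rules over the range of values in the stand-in data.
disagreements = [
    d for d in range(0, 90)
    if is_recent_training(d) != is_recent_reimplemented(d)
]
print(disagreements)  # → [30]
```

The two versions agree on 89 of 90 inputs, so a spot check would probably pass. Only customers at exactly 30 days get a feature value the model never saw paired with that input during training, and nothing raises an error when it happens. Calling one shared function removes the possibility altogether.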
## Package, integrate, deploy {#sec-package-deploy}
From here the operational chapters assemble in sequence, mostly as configuration. The service is containerised (Chapter 14) so it runs identically everywhere, with the model artefact and its locked dependencies sealed in:
```dockerfile
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY src/ ./src/
# Also COPY the trained model artefact here; its path depends on where train.py writes it.
CMD ["uvicorn", "src.churn_service.api:app", "--host", "0.0.0.0", "--port", "80"]
```
Continuous integration runs the tests and linters on every change (Chapter 13), so a refactor that breaks the feature logic is caught before it merges:
```yaml
# .github/workflows/ci.yml (abridged)
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "3.12" }
      - run: pip install -r requirements.txt
      - run: ruff check . && pytest
```
And the verified image is deployed as an online service (Chapter 15), promoted through staging to production with a rollback kept ready.
## Watch it in production {#sec-watch-production}
Deployment is not the finish line (Chapter 16). The running service logs each prediction with its inputs and the model version, exposes a `/health` endpoint for the platform to poll, and compares the distribution of incoming features against the training reference so that drift — a new customer segment, a changed upstream field — raises an alert rather than silently degrading the model. The same `model_version` carried in the response makes it possible to trace a bad prediction back to the exact artefact that produced it. The loop the whole system enables is: serve, watch, and when the data drifts far enough, retrain and redeploy — which is exactly the pipeline the next chapters automate.
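The drift comparison can take many forms, and this chapter does not prescribe one. As a hedged sketch, the example below uses the Population Stability Index (PSI), one common choice, computed with the standard library only. The function name, the bin edges for `days_since_login`, the two simulated serving windows and the 0.2 alert threshold (a frequently quoted rule of thumb) are all assumptions made for illustration.

```python
import bisect
import math
import random


def population_stability_index(reference, current, edges, eps=1e-6):
    """PSI between two non-empty samples, binned on shared edges.

    Sums (cur - ref) * ln(cur / ref) over per-bin proportions. eps floors
    empty bins so the logarithm and the ratio stay defined.
    """
    def proportions(values):
        counts = [0] * (len(edges) - 1)
        for v in values:
            # Clamp out-of-range values into the outermost bins.
            i = min(max(bisect.bisect_right(edges, v) - 1, 0), len(counts) - 1)
            counts[i] += 1
        return [max(c / len(values), eps) for c in counts]

    ref, cur = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


sampler = random.Random(42)
edges = [0, 15, 30, 45, 60, 75, 90]  # bins for days_since_login (assumed)

# Reference: uniform over 0-89 days, like the chapter's stand-in data.
training = [sampler.randrange(0, 90) for _ in range(2_000)]
# Two hypothetical serving windows: one like training, one skewed late.
stable = [sampler.randrange(0, 90) for _ in range(2_000)]
shifted = [sampler.randrange(45, 90) for _ in range(2_000)]

ALERT_THRESHOLD = 0.2  # assumed rule of thumb; tune per feature
scores = {
    name: population_stability_index(training, window, edges)
    for name, window in [("stable", stable), ("shifted", shifted)]
}
for name, psi in scores.items():
    print(f"{name}: PSI={psi:.3f} alert={psi > ALERT_THRESHOLD}")
```

The stable window scores close to zero, while the window made up entirely of customers absent for 45 days or more scores well above the threshold. In a real service the reference proportions would be computed once at training time and stored alongside the artefact, so that every monitoring run compares against the data that particular `model_version` was trained on.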
::: {.callout-tip}
## Author's Note
Walking the whole route at once reveals something that's invisible chapter by chapter: none of these practices is really a separate chore. They're one continuous act of hardening — taking a single idea, the churn model, and removing the ways it could fail to run, fail to reproduce, fail to scale to a second user, or fail silently over time. The package makes it reusable, the tests make it changeable, the container makes it portable, the monitoring makes it trustworthy — and each only matters because the others are there too. A tested model nobody can deploy is as stuck as a deployed model nobody can trust.
The other thing the journey makes clear is that you choose how far to walk it. A throwaway analysis stops at the notebook; an internal tool might stop after the package and a few tests; only a model that real users depend on needs the whole route. The skill this book has been building is not "always do all of it" but the judgement to know how far down this path a given piece of work needs to go — and the ability to keep going when it does. The notebook was the hypothesis. Everything after it is the work of making the hypothesis dependable.
:::
## Summary {#sec-notebook-to-api-summary}
This chapter assembled the book so far into one journey:
1. **Extract, then scaffold.** Lift the keep-worthy logic out of the notebook into a package of pure functions, then add the environment, version control, and configuration around it.
2. **Test the code, serialise the model.** Test the deterministic transforms and confirm the artefact round-trips; leave accuracy to evaluation, not a pass/fail gate.
3. **Serve behind a contract.** Wrap the loaded artefact in a typed API that reuses the training-time feature code, avoiding train–serve skew.
4. **Integrate, deploy, and watch.** Containerise, gate every change with CI, deploy with a rollback, and monitor for drift — because a deployed model that runs is not yet a model that still works.
The next chapter applies the same end-to-end discipline to a different shape of problem, where the deliverable is not a live service but a result others must be able to reproduce exactly: *the reproducible research pipeline*.
## Exercises {#sec-notebook-to-api-exercises}
1. Take a model of your own that currently lives in a notebook and carry it one stage further down this route than it is now — for most readers, that means extracting the feature and training logic into an importable module with pure functions. What did you have to change about the code to make it importable?
2. Add the train–serve safeguard: make your serving path call the *exact same* feature function used in training, then write a test that feeds one raw record through both the training and serving paths and asserts the features match. Why is this test worth more than a test of the model's accuracy?
3. Wrap your model in a FastAPI `/predict` endpoint with a typed request and response, including a `model_version` in the response. Send it a malformed request and confirm it returns a clear 422. What did you have to decide about the contract that the notebook let you leave implicit?
4. **Conceptual:** The Data Science Bridge compares productionising a model to writing a result up for publication. Give one way the analogy holds and one way it breaks down. What does a deployed service require that a published paper never does, and which chapter addresses it?
5. **Conceptual:** This chapter insists you choose how far down the route to walk. For three pieces of work you've actually done — a throwaway analysis, an internal tool, and something users depend on — state where each should stop, and the signal that would tell you it needs to go further.