21 Notebook to production API

21.1 The model that has to ship

You have a churn model that works. In a notebook, you loaded the customer table, engineered a few features, trained a classifier, and the holdout numbers are good enough that product wants it serving the app: when a customer’s account page loads, the system should ask your model how likely they are to churn and, if the risk is high, offer a retention incentive. The exploration is over; now it has to ship.

This chapter is that journey end to end, on one concrete project. Nothing in it is new — every practice has its own chapter in Parts 1 to 5 — but seeing them applied in sequence to a single model is the point. In isolation each step can feel like ceremony; assembled, they are what turns “it works in my notebook” into “it serves the business reliably”. We’ll go from the notebook to a tested, configured, containerised, deployed, and monitored service, and at each step point back to the chapter that covers it in depth.

21.2 Out of the notebook, into a package

The notebook’s value was discovery; its structure is a liability (Chapter 1). The first move is to extract the logic worth keeping — the feature engineering and the training — into pure functions in a src/ package (Chapter 6 and Chapter 9), leaving the notebook as a thin layer that imports and explores. The project takes the shape from Chapter 9:

churn-service/
├── pyproject.toml
├── requirements.txt
├── config.yaml
├── src/churn_service/
│   ├── features.py     # feature engineering
│   ├── train.py        # training, producing the artefact
│   └── api.py          # the serving layer
├── models/
│   └── model.joblib    # the trained artefact train.py produces
├── tests/
└── Dockerfile

The feature and training logic, lifted from the notebook into features.py and train.py, become ordinary importable functions:

import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

def add_features(customers: pd.DataFrame) -> pd.DataFrame:
    """Derive model features from raw customer fields."""
    return customers.assign(
        spend_per_active_day=customers["spend"] / customers["active_days"].clip(lower=1),
        is_recent=(customers["days_since_login"] < 30).astype(int),
    )

def train(customers: pd.DataFrame) -> GradientBoostingClassifier:
    """Train the churn classifier on engineered features."""
    features = add_features(customers)
    X = features[["spend_per_active_day", "is_recent", "days_since_login"]]
    return GradientBoostingClassifier(random_state=42).fit(X, features["churned"])

# Stand-in for the real customer table (data lives outside the repo, Chapter 2).
rng = np.random.default_rng(42)
n = 2_000
customers = pd.DataFrame({
    "spend": rng.lognormal(4.8, 0.85, n),
    "active_days": rng.integers(0, 365, n),
    "days_since_login": rng.integers(0, 90, n),
})
customers["churned"] = rng.binomial(1, 1 / (1 + np.exp(-(customers["days_since_login"] / 30 - 1))))

model = train(customers)
print(f"trained {type(model).__name__} on {len(customers)} customers")

trained GradientBoostingClassifier on 2000 customers

The logic is identical to the notebook’s; what’s changed is that it now lives in named, importable, testable functions rather than in cells whose order you have to remember.

21.3 Reproducible and configured

With the code in a package, the surrounding scaffolding follows quickly, each piece from its own chapter. The environment is pinned to a lockfile (Chapter 3) so the model trains identically on a colleague’s machine and on the CI runner. The project is under version control with a data-and-secrets-aware .gitignore (Chapter 2). And the values that were hard-coded in the notebook (the feature thresholds, the model’s hyperparameters, the path to the data) move into a config.yaml loaded into a validated object (Chapter 11):

# config.yaml
features:
  recent_login_days: 30
model:
  random_state: 42
  n_estimators: 100

None of this changes what the model does. It changes whether the model can be rebuilt, by someone else, next month, to the same result — which is the difference between a finding and a rumour.
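Here is a minimal sketch of the "validated object" side. It assumes PyYAML for parsing and pydantic for validation, in the spirit of Chapter 11. The class names are illustrative.

```python
import yaml
from pydantic import BaseModel, Field, ValidationError


class FeatureConfig(BaseModel):
    recent_login_days: int = Field(gt=0)


class ModelConfig(BaseModel):
    random_state: int
    n_estimators: int = Field(gt=0)


class ServiceConfig(BaseModel):
    features: FeatureConfig
    model: ModelConfig


def load_config(text: str) -> ServiceConfig:
    """Parse YAML text and validate it; in the project, text = Path("config.yaml").read_text()."""
    return ServiceConfig.model_validate(yaml.safe_load(text))


CONFIG_YAML = """
features:
  recent_login_days: 30
model:
  random_state: 42
  n_estimators: 100
"""

config = load_config(CONFIG_YAML)
print(config.features.recent_login_days, config.model.n_estimators)  # 30 100

# A bad value is rejected when the file is loaded, before any training starts.
try:
    load_config(CONFIG_YAML.replace("n_estimators: 100", "n_estimators: -5"))
except ValidationError as err:
    print(f"rejected: {err.error_count()} error(s)")
```

From there, add_features could take config.features.recent_login_days in place of its literal 30, and train could pass config.model's fields to the classifier. A typo'd key or an impossible value then fails loudly at load time rather than quietly changing the model.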

Data Science Bridge

Productionising a model is the engineering equivalent of writing a result up for publication. The exploratory analysis is where you found something; the paper is the rigorous, reproducible, reviewable version that others can build on — with the methods stated, the data described, and the result reproducible by a stranger. Everything in this chapter is that write-up, in code: the package is the methods section, the lockfile and config are the reproducibility statement, the tests are the peer review you run yourself.

Where the analogy breaks down: a paper, once published, is finished — it describes a result frozen in time. A deployed service is never finished. It keeps running, against data that keeps changing, so it needs the thing a paper never does — monitoring — to tell you when the result it embodies has quietly stopped being true.

21.4 Test the pieces

Now the model can be tested, because its logic is in functions rather than cells (Chapter 7). You test the deterministic transforms against known inputs and edge cases, and — critically for deployment — you confirm the trained artefact survives a save/load round-trip (Chapter 15), since the deployed service loads a serialised model rather than retraining:

import tempfile
from pathlib import Path
import joblib

# A transform test: the feature must be defined even when active_days is zero
# (the bug that bit us in Chapter 19 — guarded here, and now pinned by a test).
edge = pd.DataFrame({"spend": [100.0], "active_days": [0], "days_since_login": [5]})
assert np.isfinite(add_features(edge)["spend_per_active_day"]).all()

# The serialisation round-trip the deployment depends on.
artefact = Path(tempfile.mkdtemp()) / "model.joblib"
joblib.dump(model, artefact)
loaded = joblib.load(artefact)
sample = add_features(customers).iloc[:5][["spend_per_active_day", "is_recent", "days_since_login"]]
assert np.array_equal(model.predict_proba(sample), loaded.predict_proba(sample))

print("transform handles the zero edge case; saved model predicts identically")

transform handles the zero edge case; saved model predicts identically

Note what we don’t test: that the model achieves a particular accuracy. That belongs in evaluation and monitoring, not a pass/fail gate (Chapter 7). We test the code around the model, where there’s a right answer.
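In the project, the asserts above would live as pytest functions under tests/, where CI picks them up. Below is a sketch that assumes a file named tests/test_features.py. The boundary and no-mutation tests are additions in the same spirit, not checks the chapter itself names. add_features and the raw_customers helper are inline, hypothetical stand-ins so the sketch runs on its own; the real file would import the transform from churn_service.features.

```python
# tests/test_features.py (hypothetical file; the real one would start with
# `from churn_service.features import add_features` instead of the inline copy)
import tempfile
from pathlib import Path

import joblib
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

FEATURE_COLUMNS = ["spend_per_active_day", "is_recent", "days_since_login"]


def add_features(customers: pd.DataFrame) -> pd.DataFrame:
    """Inline copy of the package transform so this sketch runs standalone."""
    return customers.assign(
        spend_per_active_day=customers["spend"] / customers["active_days"].clip(lower=1),
        is_recent=(customers["days_since_login"] < 30).astype(int),
    )


def raw_customers(days_since_login, active_days=None) -> pd.DataFrame:
    """Build a small raw frame for a test (hypothetical helper)."""
    n = len(days_since_login)
    return pd.DataFrame({
        "spend": [100.0] * n,
        "active_days": active_days if active_days is not None else [10] * n,
        "days_since_login": days_since_login,
    })


def test_spend_per_active_day_is_finite_when_active_days_is_zero():
    features = add_features(raw_customers([5], active_days=[0]))
    assert np.isfinite(features["spend_per_active_day"]).all()


def test_is_recent_boundary_is_strict():
    # `< 30` means day 29 counts as recent and day 30 does not.
    assert add_features(raw_customers([29, 30]))["is_recent"].tolist() == [1, 0]


def test_add_features_leaves_its_input_untouched():
    raw = raw_customers([5, 50])
    before = raw.copy()
    add_features(raw)
    pd.testing.assert_frame_equal(raw, before)


def test_model_survives_save_load_round_trip():
    # A tiny stand-in model, so the test does not depend on the real artefact.
    rng = np.random.default_rng(0)
    raw = raw_customers(list(rng.integers(0, 90, 200)))
    X = add_features(raw)[FEATURE_COLUMNS]
    y = (raw["days_since_login"] > 45).astype(int)
    model = GradientBoostingClassifier(n_estimators=10, random_state=0).fit(X, y)
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "model.joblib"
        joblib.dump(model, path)
        loaded = joblib.load(path)
    assert np.array_equal(model.predict_proba(X), loaded.predict_proba(X))
```

Each test still checks a right answer about the code (a finite value, a boundary, an unchanged input, identical predictions). None of them asserts an accuracy figure.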

21.5 Wrap it in an API

The service loads the saved artefact and exposes a /predict endpoint with a typed request and response contract (Chapter 12), the same contract the engineering team agreed to consume (Chapter 20). Because FastAPI’s TestClient calls the app in-process, without starting a server, we can serve the model loaded back from disk above and exercise it here:

from fastapi import FastAPI
from fastapi.testclient import TestClient
from pydantic import BaseModel, Field

class CustomerInput(BaseModel):
    spend: float = Field(ge=0)
    active_days: int = Field(ge=0)
    days_since_login: int = Field(ge=0)

class ChurnResponse(BaseModel):
    churn_probability: float
    model_version: str

app = FastAPI()

@app.post("/predict", response_model=ChurnResponse)
def predict(customer: CustomerInput) -> ChurnResponse:
    row = add_features(pd.DataFrame([customer.model_dump()]))
    features = row[["spend_per_active_day", "is_recent", "days_since_login"]]
    proba = float(loaded.predict_proba(features)[0, 1])
    return ChurnResponse(churn_probability=round(proba, 3), model_version="1.0.0")

client = TestClient(app)
ok = client.post("/predict", json={"spend": 40.0, "active_days": 12, "days_since_login": 70})
print(f"valid   -> {ok.status_code}  {ok.json()}")
bad = client.post("/predict", json={"spend": 40.0, "active_days": 12, "days_since_login": -3})
print(f"invalid -> {bad.status_code}  ({bad.json()['detail'][0]['msg']})")

valid   -> 200  {'churn_probability': 0.866, 'model_version': '1.0.0'}
invalid -> 422  (Input should be greater than or equal to 0)

The endpoint reuses the same add_features the model was trained with — not a reimplementation — which is how you avoid train–serve skew, the bug where serving features are computed differently from training features. A valid request returns a typed probability and a model version; an invalid one is rejected at the door with a clear 422, no hand-written validation required.
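Reusing add_features covers the transform, but the list of model columns is still typed out three times above: in train, in the round-trip test, and in predict. If those copies drift apart, that is the same skew by another route. The sketch below gives training and serving one shared path from raw records to model input. FEATURE_COLUMNS, to_model_input and predict_one are hypothetical names.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

# One definition of which columns the model sees, and in what order.
FEATURE_COLUMNS = ["spend_per_active_day", "is_recent", "days_since_login"]


def add_features(customers: pd.DataFrame) -> pd.DataFrame:
    return customers.assign(
        spend_per_active_day=customers["spend"] / customers["active_days"].clip(lower=1),
        is_recent=(customers["days_since_login"] < 30).astype(int),
    )


def to_model_input(raw: pd.DataFrame) -> pd.DataFrame:
    """The single route from raw customer records to model input."""
    return add_features(raw)[FEATURE_COLUMNS]


def train(customers: pd.DataFrame) -> GradientBoostingClassifier:
    return GradientBoostingClassifier(random_state=42).fit(
        to_model_input(customers), customers["churned"]
    )


def predict_one(model: GradientBoostingClassifier, record: dict) -> float:
    """What the /predict handler would call with customer.model_dump()."""
    return float(model.predict_proba(to_model_input(pd.DataFrame([record])))[0, 1])


# Usage: a single served record scores exactly as the same row does in a batch.
rng = np.random.default_rng(0)
customers = pd.DataFrame({
    "spend": rng.lognormal(4.8, 0.85, 500),
    "active_days": rng.integers(0, 365, 500),
    "days_since_login": rng.integers(0, 90, 500),
})
customers["churned"] = (customers["days_since_login"] > 45).astype(int)
model = train(customers)

record = {"spend": 40.0, "active_days": 12, "days_since_login": 70}
batch = model.predict_proba(to_model_input(pd.DataFrame([record] * 3)))[:, 1]
assert np.allclose(batch, predict_one(model, record))
```

Recent scikit-learn versions also check the column names seen at fit against those passed at predict time. Selecting through one list keeps the two in step by construction.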

21.6 Package, integrate, deploy

From here the operational chapters assemble in sequence, mostly as configuration. The service is containerised (Chapter 14) so it runs identically everywhere, with the model artefact and its locked dependencies sealed in:

FROM python:3.12-slim
WORKDIR /app

# Dependencies first, as their own cached layer.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Install the package itself, so `churn_service` is importable by name.
# A src layout is not on the import path until it's installed.
COPY pyproject.toml .
COPY src/ ./src/
RUN pip install --no-cache-dir --no-deps .

# The trained artefact, sealed in alongside the code that produced it.
COPY models/model.joblib ./models/model.joblib

CMD ["uvicorn", "churn_service.api:app", "--host", "0.0.0.0", "--port", "80"]

Two things here are easy to get wrong and worth stating plainly. The pip install . step is not optional decoration: a src layout deliberately keeps the package off the import path until it is installed, so without it the container starts and immediately fails to import churn_service. And the artefact has to be copied in explicitly. An image built from code alone will pass every build check and then fail with a FileNotFoundError, at startup if the service loads the model when it boots or on the first request if it loads lazily, because nothing put the model where the service expects it.

Baking the artefact into the image is a real choice, not the only one. It makes the image wholly self-describing — one tag identifies one model and one set of code, and a rollback is a single image swap — at the cost of rebuilding to ship a retrained model. The alternative from Chapter 15, fetching the artefact from a registry at startup, decouples the two and suits a model that retrains far more often than its serving code changes. For a service like this one, where they change together, sealing it in is simpler.

Continuous integration runs the tests and linters on every change (Chapter 13), so a refactor that breaks the feature logic is caught before it merges:

# .github/workflows/ci.yml (abridged)
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
      - uses: actions/setup-python@v6
        with: { python-version: "3.12" }
      - run: pip install -r requirements.txt
      # As in the Dockerfile: the src layout must be installed before pytest can import churn_service.
      - run: pip install --no-deps .
      - run: ruff check . && pytest

And the verified image is deployed as an online service (Chapter 15), promoted through staging to production with a rollback kept ready.

21.7 Watch it in production

Deployment is not the finish line (Chapter 16). The running service logs each prediction with its inputs and the model version, exposes a /health endpoint for the platform to poll, and compares the distribution of incoming features against the training reference. That way, drift (a new customer segment, a changed upstream field) raises an alert rather than silently degrading the model. The same model_version carried in the response makes it possible to trace a bad prediction back to the exact artefact that produced it. The loop the whole system enables is: serve, watch, and when the data drifts far enough, retrain and redeploy. That is the pipeline later chapters automate.
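The paragraph leaves open how the distribution comparison is measured. One common choice is the population stability index (PSI), computed per feature: bin the training reference by its quantiles, then compare the share of recent serving inputs that lands in each bin. It is swapped in here as an illustration, not as the method Chapter 16 prescribes. Below is a minimal numpy sketch. The 10 bins and the 0.25 alert threshold are conventional rules of thumb, not values from this project.

```python
import numpy as np


def population_stability_index(reference, current, bins: int = 10, eps: float = 1e-6) -> float:
    """PSI of `current` against `reference` for one numeric feature (0 means identical shares)."""
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)
    # Interior cut points at reference quantiles, so each bin holds ~1/bins of training data.
    cuts = np.quantile(reference, np.linspace(0, 1, bins + 1)[1:-1])
    ref_share = np.bincount(np.searchsorted(cuts, reference, side="right"), minlength=bins) / reference.size
    cur_share = np.bincount(np.searchsorted(cuts, current, side="right"), minlength=bins) / current.size
    # Floor empty bins so the log term stays finite.
    ref_share = np.clip(ref_share, eps, None)
    cur_share = np.clip(cur_share, eps, None)
    return float(np.sum((cur_share - ref_share) * np.log(cur_share / ref_share)))


def drifted_features(reference: dict, current: dict, threshold: float = 0.25) -> dict:
    """Return {feature: psi} for every feature whose PSI exceeds the alert threshold."""
    scores = {name: population_stability_index(reference[name], current[name]) for name in reference}
    return {name: psi for name, psi in scores.items() if psi > threshold}


# Usage: spend is unchanged, but recent customers log in far less often than in training.
rng = np.random.default_rng(0)
train_ref = {"days_since_login": rng.integers(0, 90, 5_000), "spend": rng.lognormal(4.8, 0.85, 5_000)}
recent = {"days_since_login": rng.integers(45, 135, 1_000), "spend": rng.lognormal(4.8, 0.85, 1_000)}
print(drifted_features(train_ref, recent))
```

PSI is one option among several; a two-sample Kolmogorov–Smirnov test is another. Whichever measure is used, the service needs the training reference to compare against. That presumably means recording it at training time alongside the artefact.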

Author’s Note

Walking the whole route at once reveals something that’s invisible chapter by chapter: none of these practices is really a separate chore. They’re one continuous act of hardening — taking a single idea, the churn model, and removing the ways it could fail to run, fail to reproduce, fail to scale to a second user, or fail silently over time. The package makes it reusable, the tests make it changeable, the container makes it portable, the monitoring makes it trustworthy — and each only matters because the others are there too. A tested model nobody can deploy is as stuck as a deployed model nobody can trust.

The other thing the journey makes clear is that you choose how far to walk it. A throwaway analysis stops at the notebook; an internal tool might stop after the package and a few tests; only a model that real users depend on needs the whole route. The skill this book has been building is not “always do all of it” but the judgement to know how far down this path a given piece of work needs to go — and the ability to keep going when it does. The notebook was the hypothesis. Everything after it is the work of making the hypothesis dependable.

21.8 Summary

This chapter assembled the book so far into one journey:

1. Extract, then scaffold. Lift the keep-worthy logic out of the notebook into a package of pure functions, then add the environment, version control, and configuration around it.
2. Test the code, serialise the model. Test the deterministic transforms and confirm the artefact round-trips; leave accuracy to evaluation, not a pass/fail gate.
3. Serve behind a contract. Wrap the loaded artefact in a typed API that reuses the training-time feature code, avoiding train–serve skew.
4. Integrate, deploy, and watch. Containerise, gate every change with CI, deploy with a rollback, and monitor for drift, because a deployed model that runs is not yet a model that still works.

The next chapter applies the same end-to-end discipline to a different shape of problem, where the deliverable is not a live service but a result others must be able to reproduce exactly: the reproducible research pipeline.

21.9 Exercises

1. Take a model of your own that currently lives in a notebook and carry it one stage further down this route than it is now. For most readers, that means extracting the feature and training logic into an importable module with pure functions. What did you have to change about the code to make it importable?
2. Add the train–serve safeguard: make your serving path call the exact same feature function used in training, then write a test that feeds one raw record through both the training and serving paths and asserts the features match. Why is this test worth more than a test of the model’s accuracy?
3. Wrap your model in a FastAPI /predict endpoint with a typed request and response, including a model_version in the response. Send it a malformed request and confirm it returns a clear 422. What did you have to decide about the contract that the notebook let you leave implicit?
4. Conceptual: The Data Science Bridge maps parts of a paper onto parts of a shipped service: the package is the methods section, the lockfile and config are the reproducibility statement, the tests are the peer review you run yourself. Extend the mapping to two things a paper has that the chapter left unplaced: its abstract, and the retraction it issues when a published result turns out to be wrong. What plays each part for a deployed service, and is either one something a service has no honest equivalent of?
5. Conceptual: This chapter insists you choose how far down the route to walk. For three pieces of work you’ve actually done (a throwaway analysis, an internal tool, and something users depend on), state where each should stop, and the signal that would tell you it needs to go further.

Content: CC BY-NC-SA 4.0 | Code: MIT - see /LICENSE.md