Defensive Programming in Data Science
Data is messy. APIs fail. Servers run out of memory. If your code assumes "happy path" inputs, it will break—and usually at 3 AM. Defensive programming is the practice of anticipating failure and guarding against it.
1. Fail Fast with Assertions
When you make an assumption about your data, enforce it explicitly.
Assumption: "This dataframe has no nulls."
Defensive Code:
import pandas as pd
df = pd.read_csv("data.csv")
# Stop immediately if any cell in the dataframe is missing
assert df.isnull().sum().sum() == 0, "Input data contains NA values!"
If the data is bad, the script crashes immediately, saving you from training a model on garbage data for 4 hours before realizing something was wrong.
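One caveat worth knowing: Python skips assert statements entirely when it is run with the -O flag. For checks that must always run, an explicit raise is the safer choice. Below is a minimal sketch of that idea. The helper name require_no_missing is hypothetical, and it works on plain list-of-dict rows rather than a dataframe so that it needs no third-party packages.

```python
import math


def require_no_missing(rows, columns):
    """Raise ValueError if any row lacks a value for one of `columns`.

    Hypothetical helper: treats None and float NaN as "missing".
    """
    for i, row in enumerate(rows):
        for col in columns:
            value = row.get(col)
            is_nan = isinstance(value, float) and math.isnan(value)
            if value is None or is_nan:
                raise ValueError(f"Row {i}: missing value in column {col!r}")


rows = [
    {"age": 31, "income": 52000.0},
    {"age": None, "income": 48000.0},  # deliberately broken row
]

try:
    require_no_missing(rows, ["age", "income"])
except ValueError as err:
    print(err)  # Row 1: missing value in column 'age'
```

Unlike an assert, this check still runs under -O, and the error message points at the offending row.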
2. Parameter Validation
When writing functions, validate inputs first.
def calculate_metrics(y_true, y_pred):
    if len(y_true) != len(y_pred):
        raise ValueError(f"Shape mismatch: {len(y_true)} vs {len(y_pred)}")
    # ... logic ...
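To show the pattern end to end, here is one way the function might look once the elided logic is filled in. This is only a sketch: the accuracy calculation and the extra empty-input check are assumptions added for illustration, not part of the original function.

```python
def calculate_metrics(y_true, y_pred):
    # Validate first, so bad inputs never reach the real logic.
    if len(y_true) != len(y_pred):
        raise ValueError(f"Shape mismatch: {len(y_true)} vs {len(y_pred)}")
    if len(y_true) == 0:
        # Assumed extra guard: avoids a ZeroDivisionError below.
        raise ValueError("Cannot compute metrics on empty inputs")

    # Illustrative logic only: fraction of predictions that match.
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return {"accuracy": correct / len(y_true)}


print(calculate_metrics([1, 0, 1, 1], [1, 0, 0, 1]))  # {'accuracy': 0.75}
```

Putting the checks at the top means the caller gets a clear ValueError instead of a confusing failure from deep inside the calculation.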
3. Graceful Error Handling (try/except)
Sometimes, you expect things to fail (e.g., a network request). Handle these known errors gracefully.
Bad:
response = requests.get(url) # No timeout or error handling: a failure crashes the script, and a stalled server can hang it
data = response.json()
Good:
import logging
import requests
logger = logging.getLogger(__name__)
try:
    response = requests.get(url, timeout=5)
    response.raise_for_status() # Raise error for 4xx/5xx status codes
    data = response.json()
except requests.exceptions.RequestException as e:
    logger.error(f"Failed to fetch data: {e}")
    data = None # Or return a default value / retry
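The "retry" option mentioned in the last comment can be sketched as a small wrapper with exponential backoff. Everything here is an assumption for illustration: fetch_with_retry, its parameters, and the flaky stand-in function are hypothetical names, and the example uses only the standard library so it runs without a network.

```python
import time


def fetch_with_retry(fetch, attempts=3, base_delay=1.0,
                     retry_on=(ConnectionError, TimeoutError),
                     sleep=time.sleep):
    """Call `fetch()` until it succeeds or `attempts` is exhausted.

    Hypothetical helper: waits base_delay, then 2x, 4x, ... between tries.
    Only exceptions listed in `retry_on` are retried; others propagate.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except retry_on as err:
            last_error = err
            if attempt < attempts - 1:  # no point sleeping after the last try
                sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"Gave up after {attempts} attempts") from last_error


# Stand-in for a network call that fails twice, then succeeds.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return {"status": "ok"}


result = fetch_with_retry(flaky, base_delay=0.01)
print(result, calls["n"])  # {'status': 'ok'} 3
```

With requests, the same wrapper could be given a callable such as lambda: requests.get(url, timeout=5) and retry_on=(requests.exceptions.ConnectionError, requests.exceptions.Timeout), so that only transient network errors are retried.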
4. Type Hinting (Again)
Type hints are a form of defensive programming. Python does not enforce them at runtime, but a static type checker such as mypy can use them to catch type errors before the code even runs.
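A small sketch of what this looks like in practice. The function mean_absolute_error is a hypothetical example, not something defined earlier; the hints document what the function expects, and the runtime checks cover what the hints alone cannot.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float],
                        y_pred: Sequence[float]) -> float:
    """Average absolute difference between two equal-length sequences."""
    # Hints are not enforced at runtime, so length is still checked here.
    if len(y_true) != len(y_pred):
        raise ValueError(f"Shape mismatch: {len(y_true)} vs {len(y_pred)}")
    if len(y_true) == 0:
        raise ValueError("Cannot compute an error on empty inputs")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


print(mean_absolute_error([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))  # 0.5

# A static checker such as mypy should flag the call below before the
# script runs, because a str is not a sequence of floats:
# mean_absolute_error("abc", [1.0, 2.0, 3.0])
```

The two layers complement each other: the type checker catches wrong kinds of input while you edit, and the explicit checks catch wrong values when the code runs.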
Summary
- Trust no one (especially not your data source).
- Assert your assumptions.
- Catch specific exceptions (never just except:).
- Fail fast so you can fix it fast.
Robust code is confident code.