Learn.

Learn Small Amount Everyday

← Back to Home

Configuration Management: Separating Config from Code

Have you ever had to change a file path in five different scripts just because you moved a folder? Or accidentally pushed an API key to GitHub? If so, you need better configuration management.

The Problem: Hardcoded Values

Hardcoding is the practice of embedding data directly into the source code rather than obtaining it from external sources.

Bad Practice:

# main.py
API_KEY = "12345-secret-key"
DATA_PATH = "/Users/adipta/Downloads/data.csv"
MODEL_PARAMS = {"lr": 0.01, "epochs": 10}

def train():
    data = load(DATA_PATH)
    # ...

Why it's bad:

  1. Security Risk: Secrets in code get committed to version control.
  2. Portability: The code won't run on your colleague's machine (they don't have /Users/adipta).
  3. Flexibility: You have to edit the code to change hyperparameters.

The Solution: External Configuration

Software engineering best practices dictate that configuration (things that change between deployments/runs) should be separated from code (logic that stays the same).

1. Environment Variables for Secrets

For sensitive data like passwords and API keys, use environment variables.

.env file:

API_KEY=12345-secret-key
DB_PASSWORD=supersecret

Python Code:

import os
from dotenv import load_dotenv

load_dotenv() # Load variables from .env

api_key = os.getenv("API_KEY")

2. YAML/JSON for Hyperparameters and Paths

For non-sensitive configuration like model parameters or file paths, use structured files like YAML.

config.yaml:

logging:
  level: "INFO"
data:
  path: "./data/raw/dataset.csv"
  test_split: 0.2
model:
  learning_rate: 0.01
  epochs: 100

Python Code:

import yaml

with open("config.yaml", "r") as f:
    config = yaml.safe_load(f)

print(f"Training with lr={config['model']['learning_rate']}")

Benefits for Data Science

  • Reproducibility: You can commit config.yaml to Git, documenting exactly what parameters created a specific model version.
  • Experimentation: You can run multiple experiments by simply pointing to different config files (e.g., python train.py --config config_v1.yaml).
  • Cleanliness: Your code focuses on logic, not settings.

Start separating your concerns today. Your future self (and your team) will thank you.