Configuration Management: Separating Config from Code
Have you ever had to change a file path in five different scripts just because you moved a folder? Or accidentally pushed an API key to GitHub? If so, you need better configuration management.
The Problem: Hardcoded Values
Hardcoding is the practice of embedding data directly into the source code rather than obtaining it from external sources.
Bad Practice:
# main.py
API_KEY = "12345-secret-key"
DATA_PATH = "/Users/adipta/Downloads/data.csv"
MODEL_PARAMS = {"lr": 0.01, "epochs": 10}
def train():
    data = load(DATA_PATH)
    # ...
Why it's bad:
- Security Risk: Secrets in code get committed to version control.
- Portability: The code won't run on your colleague's machine (they don't have /Users/adipta).
- Flexibility: You have to edit the code to change hyperparameters.
The Solution: External Configuration
Software engineering best practices dictate that configuration (things that change between deployments/runs) should be separated from code (logic that stays the same).
1. Environment Variables for Secrets
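For sensitive data like passwords and API keys, use environment variables. If you keep them in a .env file for local development, exclude that file from version control (for example, by listing it in .gitignore), otherwise the secrets end up in the repository anyway.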
.env file:
API_KEY=12345-secret-key
DB_PASSWORD=supersecret
Python Code:
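import os
from dotenv import load_dotenv  # provided by the python-dotenv package

load_dotenv()  # Load variables from .env into the process environment
api_key = os.getenv("API_KEY")  # Returns None if API_KEY is not set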
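Because os.getenv quietly returns None when a variable is missing, a misconfigured machine may only fail much later, deep inside an API call. One way to surface the problem at startup is a small wrapper like the sketch below. Note that require_env is a hypothetical helper written for this post, not part of python-dotenv or the standard library.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear error.

    Hypothetical helper for illustration; treats an empty string as missing.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value
```

Calling api_key = require_env("API_KEY") right after load_dotenv() would then stop the program immediately, with the variable's name in the error message, instead of passing None along.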
2. YAML/JSON for Hyperparameters and Paths
For non-sensitive configuration like model parameters or file paths, use structured files like YAML.
config.yaml:
logging:
  level: "INFO"
data:
  path: "./data/raw/dataset.csv"
  test_split: 0.2
model:
  learning_rate: 0.01
  epochs: 100
Python Code:
import yaml  # provided by the PyYAML package

with open("config.yaml", "r") as f:
    config = yaml.safe_load(f)  # Parses the file into nested dicts

print(f"Training with lr={config['model']['learning_rate']}")
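A plain nested dict works, but a typo in a key only shows up when that key is first read. As a rough sketch, the dict returned by yaml.safe_load could be converted into small dataclasses and checked once at startup. The class names, the parse_config function, and the validation rules here are assumptions made for illustration. The raw dict is written inline so the example runs without PyYAML.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataConfig:
    path: str
    test_split: float


@dataclass(frozen=True)
class ModelConfig:
    learning_rate: float
    epochs: int


def parse_config(raw: dict) -> tuple[DataConfig, ModelConfig]:
    """Build typed config objects from the nested dict (hypothetical helper).

    Unknown or missing keys raise TypeError from the dataclass constructor.
    """
    data = DataConfig(**raw["data"])
    model = ModelConfig(**raw["model"])
    if not 0.0 < data.test_split < 1.0:
        raise ValueError("data.test_split must be between 0 and 1")
    if model.epochs <= 0:
        raise ValueError("model.epochs must be positive")
    return data, model


# Same structure as config.yaml, written as the dict safe_load would return.
raw = {
    "logging": {"level": "INFO"},
    "data": {"path": "./data/raw/dataset.csv", "test_split": 0.2},
    "model": {"learning_rate": 0.01, "epochs": 100},
}
data_cfg, model_cfg = parse_config(raw)
print(f"Training with lr={model_cfg.learning_rate}")
```

The frozen dataclasses also prevent code from silently mutating settings halfway through a run.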
Benefits for Data Science
- Reproducibility: You can commit config.yaml to Git, documenting exactly what parameters created a specific model version.
- Experimentation: You can run multiple experiments by simply pointing to different config files (e.g., python train.py --config config_v1.yaml).
- Cleanliness: Your code focuses on logic, not settings.
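The --config flag from the experimentation example could be wired up with argparse from the standard library. This is a minimal sketch, not a definitive train.py. The function names parse_args and load_config, the default of config.yaml, and the choice to pick a parser by file extension are all assumptions made for illustration.

```python
import argparse
import json
from pathlib import Path


def load_config(path: Path) -> dict:
    """Load a JSON or YAML config file, chosen by file extension."""
    text = path.read_text(encoding="utf-8")
    if path.suffix == ".json":
        return json.loads(text)
    import yaml  # PyYAML; imported lazily so JSON configs work without it
    return yaml.safe_load(text)


def parse_args(argv=None) -> argparse.Namespace:
    """Parse command-line arguments; argv=None means read sys.argv."""
    parser = argparse.ArgumentParser(description="Train a model from a config file")
    parser.add_argument(
        "--config",
        type=Path,
        default=Path("config.yaml"),
        help="path to the experiment config file",
    )
    return parser.parse_args(argv)
```

Inside train.py, the entry point would then call args = parse_args() followed by config = load_config(args.config), so switching experiments means changing only the command line.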
Start separating your concerns today. Your future self (and your team) will thank you.