

Design Patterns: Factory and Strategy

Design patterns are reusable, named solutions to recurring problems in software design. For data scientists, two patterns are particularly useful for managing experiments that compare many models: Factory and Strategy.

The Problem: The "If-Else" Hell

We've all written code like this:

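from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from xgboost import XGBClassifier

if model_name == 'rf':
    model = RandomForestClassifier()
elif model_name == 'xgb':
    model = XGBClassifier()
elif model_name == 'svm':
    model = SVC()
# ... 10 more branches ...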

This is hard to maintain. Every new model adds another branch, and because the branching lives inside the training script, adding a model means editing the training logic itself.

1. The Strategy Pattern

The Strategy pattern defines a family of algorithms, encapsulates each one, and makes them interchangeable.

In Python, this is easy. Functions are first-class objects, so a strategy can be a plain function or any object that implements a common interface (e.g., fit and predict). scikit-learn classifiers and regressors already share this fit/predict interface, so each one is effectively a strategy.
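A minimal sketch of the idea follows. It uses only the standard library, so it runs without scikit-learn installed. `Estimator`, `MeanBaseline`, `MedianBaseline`, `mae`, `mse`, and `run_experiment` are hypothetical names chosen for illustration. The `Estimator` protocol only mimics the scikit-learn fit/predict convention. Both the model classes and the plain metric functions act as interchangeable strategies.

```python
from statistics import mean, median
from typing import Callable, List, Protocol, Sequence


class Estimator(Protocol):
    """Common strategy interface (hypothetical; mimics scikit-learn's fit/predict)."""
    def fit(self, X: Sequence[float], y: Sequence[float]) -> "Estimator": ...
    def predict(self, X: Sequence[float]) -> List[float]: ...


class MeanBaseline:
    """Toy strategy: predicts the mean of the training targets."""
    def fit(self, X, y):
        self.value_ = mean(y)
        return self  # returning self allows chaining, as scikit-learn does

    def predict(self, X):
        return [self.value_ for _ in X]


class MedianBaseline:
    """Toy strategy: predicts the median of the training targets."""
    def fit(self, X, y):
        self.value_ = median(y)
        return self

    def predict(self, X):
        return [self.value_ for _ in X]


# Because functions are first-class, a plain function can also be a strategy.
def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def run_experiment(model: Estimator,
                   metric: Callable[[Sequence[float], Sequence[float]], float],
                   X: Sequence[float], y: Sequence[float]) -> float:
    """Depends only on the interfaces, never on a concrete model or metric."""
    preds = model.fit(X, y).predict(X)
    return metric(y, preds)


X, y = [0.0, 1.0, 2.0], [1.0, 2.0, 6.0]
for model in (MeanBaseline(), MedianBaseline()):
    for metric in (mae, mse):
        print(type(model).__name__, metric.__name__, run_experiment(model, metric, X, y))
```

`run_experiment` never changes. Swapping the model or the metric is just a matter of passing a different object or function.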

2. The Factory Pattern

The Factory pattern centralizes object creation. Callers ask for an object by name (or by configuration) instead of specifying the exact class to instantiate.

# model_factory.py
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

def get_model(config):
    """Factory function: build a model from a config dict with 'type' and optional 'params'."""
    model_type = config.get('type')
    params = config.get('params', {})

    if model_type == 'random_forest':
        return RandomForestClassifier(**params)
    elif model_type == 'xgboost':
        return XGBClassifier(**params)
    else:
        raise ValueError(f"Unknown model type: {model_type}")
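One common refinement replaces the if/elif chain inside `get_model` with a dictionary lookup, so adding a model means adding one registry entry. This sketch assumes every model accepts its hyperparameters as keyword arguments. To keep it runnable without scikit-learn or XGBoost, `ToyForest` and `ToyBoost` are hypothetical stand-ins for `RandomForestClassifier` and `XGBClassifier`, and `MODEL_REGISTRY` is an illustrative name.

```python
from typing import Any, Dict


class ToyForest:
    """Hypothetical stand-in for RandomForestClassifier."""
    def __init__(self, n_estimators: int = 100):
        self.n_estimators = n_estimators


class ToyBoost:
    """Hypothetical stand-in for XGBClassifier (toy default, not XGBoost's API)."""
    def __init__(self, learning_rate: float = 0.3):
        self.learning_rate = learning_rate


# Maps the config's 'type' string to the class to instantiate.
MODEL_REGISTRY: Dict[str, type] = {
    "random_forest": ToyForest,
    "xgboost": ToyBoost,
}


def get_model(config: Dict[str, Any]):
    """Factory function: look up the class by name instead of branching."""
    model_type = config.get("type")
    params = config.get("params", {})
    try:
        model_cls = MODEL_REGISTRY[model_type]
    except KeyError:
        raise ValueError(f"Unknown model type: {model_type}") from None
    return model_cls(**params)


model = get_model({"type": "random_forest", "params": {"n_estimators": 50}})
print(type(model).__name__, model.n_estimators)  # ToyForest 50
```

With the real libraries, the registry values would simply be the `RandomForestClassifier` and `XGBClassifier` classes themselves.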

Putting it Together

Now your main script is clean and agnostic to the specific model details.

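# main.py
import yaml
from model_factory import get_model

# Load config (Configuration Management!)
# config.yaml is expected to look like, for example:
#   model:
#     type: random_forest
#     params:
#       n_estimators: 100
with open("config.yaml") as f:
    config = yaml.safe_load(f)

# Get Strategy from Factory
model = get_model(config['model'])

# Execute Strategy (Polymorphism)
# X_train and y_train are assumed to be loaded earlier in the script
model.fit(X_train, y_train)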
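Here is the decoupling end to end, without PyYAML or scikit-learn installed. In the sketch below, plain dictionaries stand in for what `yaml.safe_load` would return for two different config files. `MajorityClass`, `ThresholdClassifier`, and `train` are hypothetical names for illustration. The `train` function stays identical; only the config changes.

```python
from collections import Counter
from typing import Any, Dict, List


class MajorityClass:
    """Hypothetical toy classifier: predicts the most frequent training label."""
    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class ThresholdClassifier:
    """Hypothetical toy classifier: predicts 1 when the single feature exceeds a threshold."""
    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold

    def fit(self, X, y):
        return self  # nothing to learn in this toy

    def predict(self, X):
        return [int(x > self.threshold) for x in X]


def get_model(config: Dict[str, Any]):
    """Factory with the same shape as the article's get_model, returning toy classes."""
    model_type = config.get("type")
    params = config.get("params", {})
    if model_type == "majority":
        return MajorityClass(**params)
    elif model_type == "threshold":
        return ThresholdClassifier(**params)
    else:
        raise ValueError(f"Unknown model type: {model_type}")


def train(config: Dict[str, Any], X_train: List[float], y_train: List[int]) -> List[int]:
    """Training loop: never mentions a concrete model class."""
    model = get_model(config["model"])
    model.fit(X_train, y_train)
    return model.predict(X_train)


X_train, y_train = [0.1, 0.3, 0.4, 0.9], [0, 0, 0, 1]
# Dicts standing in for two different config.yaml files after yaml.safe_load.
config_a = {"model": {"type": "majority"}}
config_b = {"model": {"type": "threshold", "params": {"threshold": 0.5}}}
print(train(config_a, X_train, y_train))  # [0, 0, 0, 0]
print(train(config_b, X_train, y_train))  # [0, 0, 0, 1]
```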

Why use this?

  1. Decoupling: Your training loop doesn't know (or care) whether it's training a neural net or a logistic regression. It just calls .fit().
  2. Scalability: You add new models by extending the factory, without touching the training loop (Open/Closed Principle).
  3. Configurability: Swapping models or hyperparameters becomes a config edit rather than a code change, so the entire pipeline can be driven from a YAML file.
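The Scalability point can be pushed further with a registration decorator, a common Python idiom that is not part of the factory shown earlier. Each model class registers itself under a name, so adding a model means writing a new class, with no edits to `get_model` or the training loop. `register_model`, `MODEL_REGISTRY`, `ConstantModel`, and `MeanModel` are hypothetical names for this sketch.

```python
from typing import Any, Callable, Dict

MODEL_REGISTRY: Dict[str, type] = {}


def register_model(name: str) -> Callable[[type], type]:
    """Class decorator that adds the decorated class to MODEL_REGISTRY under `name`."""
    def decorator(cls: type) -> type:
        if name in MODEL_REGISTRY:
            raise ValueError(f"Model type already registered: {name}")
        MODEL_REGISTRY[name] = cls
        return cls
    return decorator


def get_model(config: Dict[str, Any]):
    """Factory: closed for modification, open for extension via the registry."""
    model_type = config.get("type")
    if model_type not in MODEL_REGISTRY:
        raise ValueError(f"Unknown model type: {model_type}")
    return MODEL_REGISTRY[model_type](**config.get("params", {}))


@register_model("constant")
class ConstantModel:
    """Hypothetical toy model that always predicts a fixed value."""
    def __init__(self, value: float = 0.0):
        self.value = value

    def fit(self, X, y):
        return self

    def predict(self, X):
        return [self.value for _ in X]


# Later, possibly in a separate module: a new model plugs in
# without editing get_model() or any training code.
@register_model("mean")
class MeanModel:
    """Hypothetical toy model that predicts the mean training target."""
    def fit(self, X, y):
        self.value = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.value for _ in X]


model = get_model({"type": "mean"}).fit([1, 2, 3], [2.0, 4.0, 6.0])
print(model.predict([10]))  # [4.0]
```

One caveat of this design: registration happens at import time, so a model defined in a separate module is only available after that module has been imported somewhere.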

Design patterns give you a vocabulary to discuss and structure complex systems.