Type Hinting: Making Python Self-Documenting

Python is a dynamically typed language. This means you don't have to declare that a variable x is an integer. It just works.

def add(a, b):
    return a + b

This flexibility is great for quick scripts, but risky in large systems. In data science specifically: is b meant to be a scalar, a list, or a pandas Series?
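To make the ambiguity concrete, here is a small sketch of the untyped add from above called with three different kinds of arguments. All three calls succeed, which is why the signature alone tells a reader nothing about the intended input.

```python
def add(a, b):
    # No hints: any pair of operands that supports "+" is accepted.
    return a + b


int_sum = add(1, 2)           # numeric addition
text_sum = add("1", "2")      # string concatenation
list_sum = add([1, 2], [3])   # list concatenation

print(int_sum, text_sum, list_sum)
```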

What are Type Hints?

Introduced in Python 3.5, type hints allow you to optionally specify what type of data your functions expect and return.

def add(a: int, b: int) -> int:
    return a + b

Now it is clear: add takes two integers and returns an integer.
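One caveat worth keeping in mind: Python itself does not enforce hints at runtime. The sketch below assumes nothing beyond the standard interpreter and shows that the annotated add still accepts strings, while the hints stay available to tools through the function's __annotations__ attribute.

```python
def add(a: int, b: int) -> int:
    return a + b


# The interpreter ignores the hints, so this call still runs.
result = add("type ", "hints")

# The hints are stored on the function, where editors and checkers can read them.
hints = add.__annotations__

print(result)
print(hints)
```

The hints only pay off once an editor or a static checker reads them.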

Why Use Type Hints in Data Science?

1. Documentation

Looking at function signatures instantly tells you what inputs are required.

Without hints:

def process_data(data, config):
    ...

Is data a CSV path? A DataFrame? A dict?

With hints:

import pandas as pd
from typing import Dict, Any

def process_data(data: pd.DataFrame, config: Dict[str, Any]) -> pd.DataFrame:
    ...

Ah, it takes a DataFrame and a dictionary config, and returns a DataFrame.

2. Catching Errors Early

Modern IDEs (like VS Code) and static type checkers (like mypy) use these hints to warn you before you even run the code.

def greet(name: str) -> str:
    return "Hello " + name

greet(123)  # Editor warning: Expected type 'str', got 'int' instead.
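Without a checker in the loop, the same mistake only surfaces when the line actually executes. A minimal sketch of that runtime failure, reusing greet from above (error_message is just an illustrative variable):

```python
def greet(name: str) -> str:
    return "Hello " + name


error_message = ""
try:
    greet(123)  # a static checker would flag this call before the program runs
except TypeError as exc:
    # At runtime the problem appears only once this call is reached.
    error_message = str(exc)
    print("Runtime failure:", error_message)
```

In a long-running pipeline that call might sit behind hours of processing, which is the case for catching it statically instead.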

3. Better Autocomplete

When the editor knows df is a pandas.DataFrame, it can provide accurate autocomplete suggestions for methods like .groupby(), .apply(), etc.

Common Types for Data Science

from typing import List, Dict, Optional, Tuple, Union
import pandas as pd
import numpy as np

# Primitives
x: int = 1
y: float = 1.5
z: str = "hello"

# Collections
names: List[str] = ["Alice", "Bob"]
scores: Dict[str, int] = {"Alice": 90, "Bob": 85}

# Optional (can be None)
user_id: Optional[int] = None 

# Data Functions
def clean_dataset(
    df: pd.DataFrame, 
    columns_to_drop: List[str]
) -> pd.DataFrame:
    return df.drop(columns=columns_to_drop)

def calculate_metric(
    y_true: np.ndarray,
    y_pred: np.ndarray
) -> float:
    # Mean squared error; float() converts NumPy's scalar to a built-in float.
    return float(np.mean((y_true - y_pred) ** 2))
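The import line above also brings in Tuple and Union, which the listing does not exercise. As a hedged sketch, the two helpers below (min_max and to_float are hypothetical names for illustration, not from any library) show where they typically appear: Tuple for a fixed-size return value, Union for an argument that may be one of several types.

```python
from typing import List, Tuple, Union

Number = Union[int, float]  # alias: either an int or a float


def min_max(values: List[float]) -> Tuple[float, float]:
    """Return the smallest and largest value as a fixed-size pair."""
    return min(values), max(values)


def to_float(value: Union[Number, str]) -> float:
    """Accept a number or a numeric string and convert it to float."""
    return float(value)


low, high = min_max([0.5, 2.0, 1.25])
converted = to_float("3.5")
print(low, high, converted)
```

On Python 3.9 and later, the built-in list, dict, and tuple can be subscripted directly (list[str], dict[str, int]), and Python 3.10 adds the X | Y spelling for unions, so the typing imports become optional for these cases.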

Conclusion

Type hinting adds a small amount of overhead when writing code, but saves far more time when reading and debugging it later. It acts as a contract between parts of your code, letting your editor and type checker verify that the pieces of your data pipeline fit together before anything runs.