Documentation and Docstrings: Be Kind to Your Future Self

We've all been there: opening a project we worked on six months ago and having absolutely no idea what x = y * 0.5 means. Documentation isn't just a chore; it's a love letter to your future self and your teammates.

The Hierarchy of Documentation

  1. Self-Documenting Code: Variable and function names so clear they don't need explanation.
  2. Docstrings: Structured explanations of what a function/class does, its inputs, and outputs.
  3. Comments: Explanations of why something is done (avoid restating what is done; the code already shows that).
  4. External Docs (README/Wiki): High-level overview, installation, and usage guides.
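
To illustrate the first level, here is a sketch of the mystery line x = y * 0.5 from the introduction rewritten with descriptive names. The names SALE_PRICE_FACTOR, list_price, and sale_price are hypothetical, chosen only to show the idea; the original line does not say what x and y stand for.

```python
# Hypothetical meaning assigned to the opaque line `x = y * 0.5`.
SALE_PRICE_FACTOR = 0.5  # assumed: sale items cost half the list price


def sale_price(list_price: float) -> float:
    """Returns the price of an item while it is on sale."""
    return list_price * SALE_PRICE_FACTOR


print(sale_price(80.0))  # 40.0
```

With names like these, the line explains itself and needs no comment at all.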

Docstrings: The Standard

In Python, a docstring is a string literal that occurs as the first statement in a module, function, class, or method definition.
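
Because the docstring is the first statement, Python stores it on the object's __doc__ attribute, which is what help() and editor tooltips read. A minimal sketch, using a hypothetical celsius_to_fahrenheit function, shows the difference between a docstring and an ordinary comment:

```python
import inspect


def celsius_to_fahrenheit(celsius: float) -> float:
    """Converts a temperature from Celsius to Fahrenheit."""
    return celsius * 9 / 5 + 32


def no_docstring(celsius: float) -> float:
    # A comment is not a docstring, so __doc__ stays None.
    return celsius * 9 / 5 + 32


print(celsius_to_fahrenheit.__doc__)
# Converts a temperature from Celsius to Fahrenheit.
print(no_docstring.__doc__)
# None

# inspect.getdoc returns the same text with indentation cleaned up.
print(inspect.getdoc(celsius_to_fahrenheit))
```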

Bad:

def split(df, r):
    # Splits data
    n = int(len(df) * r)
    return df[:n], df[n:]

Good (Google Style):

import pandas as pd


def split_dataset(data: pd.DataFrame, ratio: float) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Splits the dataset into training and testing sets.

    Args:
        data: The input dataframe containing features and labels.
        ratio: A float between 0 and 1 indicating the proportion of data
               to include in the training set.

    Returns:
        A tuple containing (train_df, test_df).

    Raises:
        ValueError: If ratio is not between 0 and 1.
    """
    if not 0 < ratio < 1:
        raise ValueError("Ratio must be between 0 and 1.")

    # Slice by position so the split works regardless of the index labels.
    n = int(len(data) * ratio)
    return data.iloc[:n], data.iloc[n:]

Why do standard formats matter?

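Documentation generators such as Sphinx and editor tooling such as VS Code's Python extensions can parse standard docstring styles (Google, NumPy, or reST) to generate HTML documentation or hover tooltips automatically. Sphinx reads reST natively and handles Google and NumPy styles through its napoleon extension.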
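
For comparison, a NumPy-style docstring for the same kind of function might look like the sketch below. To keep it dependency-free, this hypothetical split_items works on a plain list rather than a DataFrame; only the docstring layout is the point.

```python
def split_items(items: list, ratio: float) -> tuple[list, list]:
    """Split a list into training and testing parts.

    Parameters
    ----------
    items : list
        The input records.
    ratio : float
        Proportion of records to place in the training part,
        strictly between 0 and 1.

    Returns
    -------
    tuple of (list, list)
        The training part followed by the testing part.

    Raises
    ------
    ValueError
        If ratio is not strictly between 0 and 1.
    """
    if not 0 < ratio < 1:
        raise ValueError("Ratio must be between 0 and 1.")

    n = int(len(items) * ratio)
    return items[:n], items[n:]


train, test = split_items([1, 2, 3, 4], 0.5)
print(train, test)  # [1, 2] [3, 4]
```

The content matches the Google-style sections (Args, Returns, Raises); only the section headers and indentation differ, so the choice mostly comes down to team convention.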

Comments: The Why, Not the What

Useless Comment:

x = x + 1  # Increment x

Useful Comment:

# Add a small epsilon so the log transform never evaluates log(0), which is undefined
x = x + 1e-9
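
To show a "why" comment in context, here is a minimal sketch built around the same epsilon. The function name safe_log, the constant EPSILON, and the claim that zero is a valid input are all assumptions made for illustration.

```python
import math

EPSILON = 1e-9  # assumed value, matching the snippet above


def safe_log(value: float) -> float:
    """Returns the natural log of a non-negative value, shifted by EPSILON."""
    # math.log(0) raises ValueError, and zero is assumed to be valid input
    # here, so shift by a small epsilon instead of special-casing zero.
    return math.log(value + EPSILON)


print(safe_log(0.0))  # roughly -20.72
```

The comment records a decision the code cannot express by itself: why the shift exists and why zero was not handled as a special case.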

Summary

  • Name well: learning_rate is better than lr.
  • Docstring always: Every function needs a docstring covering its arguments, return value, and any exceptions it raises.
  • Comment sparingly: Only explain complex logic or "why" decisions.

Good documentation turns a "script" into a "product."