EC1B1 Python Coding Support
Follow-up resources for LSE BSc Economics

Python Pattern Library for Economists

Quick reference of common code patterns for loading data, calculations, visualization, and analysis.

Updated 2026-01-29

A quick reference of common code patterns you'll use throughout this course. When you see these patterns in examples, you'll know what they do. When you need to do something similar, look here first.


1. Loading Data

Load a CSV file

import pandas as pd

df = pd.read_csv("filename.csv")

Load a CSV from a URL

import pandas as pd

url = "https://example.com/data.csv"
df = pd.read_csv(url)

Check what your data looks like

df.head()        # First 5 rows
df.tail()        # Last 5 rows
df.shape         # (rows, columns)
df.columns       # Column names
df.info()        # Column types and missing values
df.describe()    # Summary statistics

2. Selecting Data

Select a single column

# Returns a Series (one-dimensional)
gdp_values = df["gdp"]
gdp_values = df.gdp  # Alternative syntax (only if the name has no spaces and doesn't clash with a DataFrame method)

Select multiple columns

# Returns a DataFrame (two-dimensional)
subset = df[["year", "gdp", "inflation"]]

Select rows by position

df.iloc[0]       # First row
df.iloc[0:5]     # First 5 rows
df.iloc[-1]      # Last row

Select rows by label/index

df.loc[2020]     # Row where index is 2020
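
Note that df.loc[2020] only finds a row if the index actually holds years. A freshly loaded CSV usually has a default index of 0, 1, 2, and so on. Here is a minimal sketch of one way to set this up, assuming a column called "year" and using made-up numbers:

```python
import pandas as pd

# Made-up illustrative data; the column names are assumptions
df = pd.DataFrame({
    "year": [2019, 2020, 2021],
    "gdp": [100.0, 90.0, 97.0],
})

# Make "year" the index so that .loc can look rows up by year
df_by_year = df.set_index("year")

row_2020 = df_by_year.loc[2020]          # One row, returned as a Series
gdp_2020 = df_by_year.loc[2020, "gdp"]   # A single value
print(gdp_2020)
```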

3. Filtering Data

Filter by a condition

# All rows where year is greater than 2000
recent = df[df["year"] > 2000]

# All rows where country is UK
uk_data = df[df["country"] == "United Kingdom"]

# All rows where growth is negative
recessions = df[df["growth_rate"] < 0]

Multiple conditions (AND)

# Year after 2000 AND positive growth
subset = df[(df["year"] > 2000) & (df["growth_rate"] > 0)]

Multiple conditions (OR)

# UK or US data
anglo = df[(df["country"] == "United Kingdom") | (df["country"] == "United States")]

Filter by list of values

countries = ["United Kingdom", "Germany", "France"]
europe = df[df["country"].isin(countries)]

4. Basic Calculations

Summary statistics

df["gdp"].mean()      # Average
df["gdp"].median()    # Median
df["gdp"].std()       # Standard deviation
df["gdp"].min()       # Minimum
df["gdp"].max()       # Maximum
df["gdp"].sum()       # Total
df["gdp"].count()     # Count (excluding NaN)

Create a new column

# GDP in trillions (from billions)
df["gdp_trillions"] = df["gdp_billions"] / 1000

# Real GDP (adjusting for inflation)
df["real_gdp"] = df["nominal_gdp"] / df["price_index"] * 100

# Growth rate from levels
df["growth"] = df["gdp"].pct_change() * 100

Group and aggregate

# Average GDP by country
df.groupby("country")["gdp"].mean()

# Multiple statistics by group
df.groupby("country")["gdp"].agg(["mean", "std", "count"])

# Multiple columns
df.groupby("country")[["gdp", "unemployment"]].mean()

5. Sorting Data

Sort by one column

df.sort_values("gdp")                    # Ascending (smallest first)
df.sort_values("gdp", ascending=False)   # Descending (largest first)

Sort by multiple columns

df.sort_values(["country", "year"])

6. Handling Missing Data

Check for missing values

df.isnull().sum()        # Count of NaN per column
df["gdp"].isnull().sum() # Count of NaN in one column

Remove rows with missing values

# dropna() returns a new DataFrame; assign the result if you want to keep it
df.dropna()              # Remove any row with NaN
df.dropna(subset=["gdp"]) # Remove rows where gdp is NaN

Fill missing values

df["gdp"].fillna(0)              # Replace NaN with 0
df["gdp"].fillna(df["gdp"].mean()) # Replace with mean
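
The patterns above leave df itself unchanged unless you store the result. A small sketch of keeping the cleaned values, using made-up data (the names gdp_filled and df_complete are just illustrative choices):

```python
import numpy as np
import pandas as pd

# Made-up data with one missing GDP value
df = pd.DataFrame({"country": ["A", "B", "C"], "gdp": [10.0, np.nan, 30.0]})

# Calling fillna on its own does not change df
df["gdp"].fillna(0)
print(df["gdp"].isnull().sum())  # Still 1 missing value

# Assign the result to keep it
df["gdp_filled"] = df["gdp"].fillna(df["gdp"].mean())  # Mean of 10 and 30
df_complete = df.dropna(subset=["gdp"])                # Only rows with a GDP value
```

Whether to drop or fill depends on the question you are asking; filling with the mean changes the spread of the data, so treat it as a convenience rather than a default.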

7. Basic Visualization

Line chart

import matplotlib.pyplot as plt

plt.figure(figsize=(10, 6))
plt.plot(df["year"], df["gdp"])
plt.xlabel("Year")
plt.ylabel("GDP (billions)")
plt.title("UK GDP Over Time")
plt.show()

Bar chart

plt.figure(figsize=(10, 6))
plt.bar(df["country"], df["gdp"])
plt.xlabel("Country")
plt.ylabel("GDP")
plt.title("GDP by Country")
plt.xticks(rotation=45)
plt.show()

Scatter plot

plt.figure(figsize=(10, 6))
plt.scatter(df["education_years"], df["gdp_per_capita"])
plt.xlabel("Years of Education")
plt.ylabel("GDP per Capita")
plt.title("Education vs Income")
plt.show()

Histogram

plt.figure(figsize=(10, 6))
plt.hist(df["growth_rate"], bins=20)
plt.xlabel("Growth Rate (%)")
plt.ylabel("Frequency")
plt.title("Distribution of Growth Rates")
plt.show()

Quick pandas plot (shortcut)

df.plot(x="year", y="gdp", kind="line")
df.plot(x="year", y="gdp", kind="bar")
df.plot(x="education", y="gdp", kind="scatter")
df["growth"].plot(kind="hist")

8. Statistical Analysis

Correlation

# Correlation between two variables
df["gdp"].corr(df["unemployment"])

# Correlation matrix for all numeric columns
# (numeric_only=True skips text columns such as country names)
df.corr(numeric_only=True)

Simple linear regression

import statsmodels.formula.api as smf

# Regression: gdp depends on education
model = smf.ols("gdp_per_capita ~ education_years", data=df).fit()
print(model.summary())

# Get coefficients
model.params["education_years"]  # Slope
model.params["Intercept"]        # Intercept

Multiple regression

model = smf.ols("gdp_per_capita ~ education_years + life_expectancy", data=df).fit()
print(model.summary())

9. Useful Shortcuts

Print formatted output

# f-strings for formatted printing
avg = df["gdp"].mean()
print(f"The average GDP is {avg:.2f} billion")  # 2 decimal places
print(f"The average GDP is {avg:,.0f} billion")  # Comma separator, no decimals

Quick data exploration

# All in one go
# (display() is built into Jupyter/Colab notebooks; in a plain script, use print() instead)
def explore(df):
    print("Shape:", df.shape)
    print("\nColumns:", list(df.columns))
    print("\nFirst rows:")
    display(df.head())
    print("\nStatistics:")
    display(df.describe())
    print("\nMissing values:")
    print(df.isnull().sum())

explore(df)

Save your results

# Save DataFrame to CSV
df.to_csv("my_results.csv", index=False)

# Save a figure (call this before plt.show(), otherwise the saved file may be blank)
plt.savefig("my_chart.png", dpi=300, bbox_inches="tight")

10. Common Patterns for Economics

Calculate growth rates

# Period-over-period growth rate
df["growth_rate"] = df["gdp"].pct_change() * 100

# Year-over-year growth (if monthly data)
df["yoy_growth"] = df["gdp"].pct_change(periods=12) * 100
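
If your data stacks several countries in one table, a plain pct_change() will also compare the first row of one country with the last row of the previous one. One way to avoid that, sketched here with made-up numbers and assumed column names, is to compute growth within each country:

```python
import pandas as pd

# Made-up panel data: two countries, three years each
df = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "year": [2019, 2020, 2021, 2019, 2020, 2021],
    "gdp": [100.0, 110.0, 121.0, 200.0, 190.0, 209.0],
})

# Sort first so each country's rows are in time order
df = df.sort_values(["country", "year"])

# Growth is computed separately for each country;
# the first year of every country is NaN (no previous value)
df["growth_rate"] = df.groupby("country")["gdp"].pct_change() * 100
print(df)
```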

Calculate real values (inflation adjustment)

# Rescale nominal GDP to base-year prices using the CPI
base_year = 2020
base_cpi = df[df["year"] == base_year]["cpi"].values[0]  # CPI in the base year
df["real_gdp"] = df["nominal_gdp"] * (base_cpi / df["cpi"])

Calculate index numbers

base_value = df[df["year"] == 2000]["gdp"].values[0]
df["gdp_index"] = (df["gdp"] / base_value) * 100

Rolling averages (smoothing)

# 3-period moving average
df["gdp_ma3"] = df["gdp"].rolling(window=3).mean()

Lag variables

# Previous year's GDP
df["gdp_lag1"] = df["gdp"].shift(1)

# GDP from 2 years ago
df["gdp_lag2"] = df["gdp"].shift(2)

Quick Debugging Reference

"KeyError: 'column_name'"

The column doesn't exist. Check spelling and use df.columns to see available columns.
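
A common cause is a header with stray spaces or different capitalisation. A minimal sketch of checking and tidying column names, using a made-up DataFrame with deliberately messy headers:

```python
import pandas as pd

# Made-up data with typical header problems: stray spaces and capitals
df = pd.DataFrame({" GDP ": [1.0, 2.0], "Year": [2020, 2021]})

print("gdp" in df.columns)  # False: the real name is " GDP "

# Tidy the headers: remove surrounding spaces and use lower case
df.columns = df.columns.str.strip().str.lower()

print(list(df.columns))
gdp_values = df["gdp"]  # Now works
```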

"ValueError: could not convert string to float"

You're trying to do math on text data. Check your column types with df.dtypes.
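
One possible fix is to convert the column with pd.to_numeric. The sketch below assumes the text contains thousands separators and a placeholder like "n/a"; your data may need different cleaning.

```python
import pandas as pd

# Made-up column that was read in as text
df = pd.DataFrame({"gdp": ["1,200", "950", "n/a"]})
print(df.dtypes)  # The gdp column is text, not a number type

# Remove thousands separators, then convert to numbers.
# errors="coerce" turns anything unreadable (here "n/a") into NaN
cleaned = df["gdp"].str.replace(",", "", regex=False)
df["gdp_num"] = pd.to_numeric(cleaned, errors="coerce")

print(df["gdp_num"].mean())  # Mean of the values that converted
```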

"IndexError: single positional indexer is out-of-bounds"

You're trying to access a row/column that doesn't exist. Check df.shape.
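
As a rough illustration, you can compare the position you want with the number of rows before using iloc. The data and the position value here are made up:

```python
import pandas as pd

# Made-up data with three rows
df = pd.DataFrame({"year": [2019, 2020, 2021], "gdp": [100.0, 90.0, 97.0]})

n_rows, n_cols = df.shape
position = 5  # Hypothetical position we want to read

if position < n_rows:
    row = df.iloc[position]
else:
    row = None
    print(f"Position {position} is out of range: df has only {n_rows} rows")

# Negative positions count from the end, so -1 is always the last row
last_row = df.iloc[-1]
```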

Graph is empty or wrong

Check that your x and y columns exist and have data. Use df["column"].head() to verify.

NaN in calculations

You have missing values. Count them with df.isnull().sum(), then use df.dropna() or df.fillna() before calculating.
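
It helps to know where NaN shows up. In the made-up example below, summary functions such as mean() skip missing values by default, while row-by-row arithmetic passes them through to the new column:

```python
import numpy as np
import pandas as pd

# Made-up data with one missing GDP value
df = pd.DataFrame({
    "gdp": [100.0, np.nan, 120.0],
    "population": [10.0, 20.0, 30.0],
})

# mean() ignores NaN by default, so this is the mean of 100 and 120
mean_gdp = df["gdp"].mean()

# Arithmetic keeps NaN: the second row of the new column is NaN
df["gdp_per_capita"] = df["gdp"] / df["population"]
n_missing = df["gdp_per_capita"].isnull().sum()

print(mean_gdp, n_missing)
```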


Keep this reference handy. You don't need to memorize everything; just know where to look!