Python Pattern Library for Economists
A quick reference of common code patterns you'll use throughout this course. When you see these patterns in examples, you'll know what they do. When you need to do something similar, look here first.
1. Loading Data
Load a CSV file
import pandas as pd
df = pd.read_csv("filename.csv")
Load a CSV from a URL
import pandas as pd
url = "https://example.com/data.csv"
df = pd.read_csv(url)
Check what your data looks like
df.head() # First 5 rows
df.tail() # Last 5 rows
df.shape # (rows, columns)
df.columns # Column names
df.info() # Column types and missing values
df.describe() # Summary statistics
2. Selecting Data
Select a single column
# Returns a Series (one-dimensional)
gdp_values = df["gdp"]
gdp_values = df.gdp # Alternative syntax (only works when the name is a valid Python identifier and not a DataFrame method)
Select multiple columns
# Returns a DataFrame (two-dimensional)
subset = df[["year", "gdp", "inflation"]]
Select rows by position
df.iloc[0] # First row
df.iloc[0:5] # First 5 rows
df.iloc[-1] # Last row
Select rows by label/index
df.loc[2020] # Row where index is 2020
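df.loc looks up rows by index label, so df.loc[2020] only works once the years are the index rather than an ordinary column. A minimal sketch, assuming a small made-up DataFrame with a year column (the numbers are illustrative, not real data):

```python
import pandas as pd

# Made-up numbers, for illustration only
df = pd.DataFrame({"year": [2019, 2020, 2021], "gdp": [100.0, 95.0, 102.0]})

by_year = df.set_index("year")        # the year column becomes the row labels
row_2020 = by_year.loc[2020]          # label-based lookup: the row labelled 2020
gdp_2020 = by_year.loc[2020, "gdp"]   # a single cell: row label, then column name
```

If you leave year as a normal column, use a filter such as df[df["year"] == 2020] instead.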
3. Filtering Data
Filter by a condition
# All rows where year is greater than 2000
recent = df[df["year"] > 2000]
# All rows where country is UK
uk_data = df[df["country"] == "United Kingdom"]
# All rows where growth is negative
recessions = df[df["growth_rate"] < 0]
Multiple conditions (AND)
# Year after 2000 AND positive growth
subset = df[(df["year"] > 2000) & (df["growth_rate"] > 0)]
Multiple conditions (OR)
# UK or US data
anglo = df[(df["country"] == "United Kingdom") | (df["country"] == "United States")]
Filter by list of values
countries = ["United Kingdom", "Germany", "France"]
europe = df[df["country"].isin(countries)]
4. Basic Calculations
Summary statistics
df["gdp"].mean() # Average
df["gdp"].median() # Median
df["gdp"].std() # Standard deviation
df["gdp"].min() # Minimum
df["gdp"].max() # Maximum
df["gdp"].sum() # Total
df["gdp"].count() # Count (excluding NaN)
Create a new column
# GDP in trillions (from billions)
df["gdp_trillions"] = df["gdp_billions"] / 1000
# Real GDP (adjusting for inflation)
df["real_gdp"] = df["nominal_gdp"] / df["price_index"] * 100
# Growth rate from levels
df["growth"] = df["gdp"].pct_change() * 100
Group and aggregate
# Average GDP by country
df.groupby("country")["gdp"].mean()
# Multiple statistics by group
df.groupby("country")["gdp"].agg(["mean", "std", "count"])
# Multiple columns
df.groupby("country")[["gdp", "unemployment"]].mean()
5. Sorting Data
Sort by one column
df.sort_values("gdp") # Ascending (smallest first)
df.sort_values("gdp", ascending=False) # Descending (largest first)
Sort by multiple columns
df.sort_values(["country", "year"])
6. Handling Missing Data
Check for missing values
df.isnull().sum() # Count of NaN per column
df["gdp"].isnull().sum() # Count of NaN in one column
Remove rows with missing values
df.dropna() # Remove any row with NaN (returns a new DataFrame)
df.dropna(subset=["gdp"]) # Remove rows where gdp is NaN
df = df.dropna(subset=["gdp"]) # Assign the result to keep the change
Fill missing values
df["gdp"].fillna(0) # Replace NaN with 0 (returns a new Series)
df["gdp"].fillna(df["gdp"].mean()) # Replace with mean
df["gdp"] = df["gdp"].fillna(0) # Assign the result to keep the change
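A common surprise is that dropna() and fillna() hand back a new object and leave the original alone. The sketch below, using a made-up three-row DataFrame, shows the difference between calling the method and keeping its result. The names cleaned and filled are just illustrative.

```python
import numpy as np
import pandas as pd

# Made-up numbers with one missing GDP value
df = pd.DataFrame({"year": [2019, 2020, 2021], "gdp": [100.0, np.nan, 102.0]})

df.dropna(subset=["gdp"])              # result is thrown away; df is unchanged
rows_before = len(df)                  # still 3

cleaned = df.dropna(subset=["gdp"])    # keep the result under a new name
rows_after = len(cleaned)              # 2

filled = df.copy()
# mean() skips NaN by default, so the fill value is (100 + 102) / 2
filled["gdp"] = filled["gdp"].fillna(filled["gdp"].mean())
```

Whether to drop or fill depends on the analysis. Filling with the mean or with 0 changes your summary statistics, so treat it as a deliberate choice.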
7. Basic Visualization
Line chart
import matplotlib.pyplot as plt
plt.figure(figsize=(10, 6))
plt.plot(df["year"], df["gdp"])
plt.xlabel("Year")
plt.ylabel("GDP (billions)")
plt.title("UK GDP Over Time")
plt.show()
Bar chart
plt.figure(figsize=(10, 6))
plt.bar(df["country"], df["gdp"])
plt.xlabel("Country")
plt.ylabel("GDP")
plt.title("GDP by Country")
plt.xticks(rotation=45)
plt.show()
Scatter plot
plt.figure(figsize=(10, 6))
plt.scatter(df["education_years"], df["gdp_per_capita"])
plt.xlabel("Years of Education")
plt.ylabel("GDP per Capita")
plt.title("Education vs Income")
plt.show()
Histogram
plt.figure(figsize=(10, 6))
plt.hist(df["growth_rate"], bins=20)
plt.xlabel("Growth Rate (%)")
plt.ylabel("Frequency")
plt.title("Distribution of Growth Rates")
plt.show()
Quick pandas plot (shortcut)
df.plot(x="year", y="gdp", kind="line")
df.plot(x="year", y="gdp", kind="bar")
df.plot(x="education", y="gdp", kind="scatter")
df["growth"].plot(kind="hist")
8. Statistical Analysis
Correlation
# Correlation between two variables
df["gdp"].corr(df["unemployment"])
# Correlation matrix for all numeric columns
df.corr(numeric_only=True) # Skips text columns; pandas 2.0+ raises an error without this if any exist
Simple linear regression
import statsmodels.formula.api as smf
# Regression of GDP per capita on years of education
model = smf.ols("gdp_per_capita ~ education_years", data=df).fit()
print(model.summary())
# Get coefficients
model.params["education_years"] # Slope
model.params["Intercept"] # Intercept
Multiple regression
model = smf.ols("gdp_per_capita ~ education_years + life_expectancy", data=df).fit()
print(model.summary())
9. Useful Shortcuts
Print formatted output
# f-strings for formatted printing
avg = df["gdp"].mean()
print(f"The average GDP is {avg:.2f} billion") # 2 decimal places
print(f"The average GDP is {avg:,.0f} billion") # Comma separator, no decimals
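The part after the colon in an f-string is a format specification. A short sketch of the specifiers used above, plus the percent format, with a hypothetical number standing in for df["gdp"].mean():

```python
avg = 1234.567   # hypothetical value, standing in for df["gdp"].mean()

two_decimals = f"{avg:.2f}"     # "1234.57"  (fixed point, 2 decimal places)
with_commas = f"{avg:,.0f}"     # "1,235"    (thousands separator, no decimals)
as_percent = f"{0.0312:.1%}"    # "3.1%"     (multiplies by 100 and adds the % sign)

print(two_decimals, with_commas, as_percent)
```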
Quick data exploration
# All in one go
# (display() is built into Jupyter notebooks; use print() in a plain script)
def explore(df):
    print("Shape:", df.shape)
    print("\nColumns:", list(df.columns))
    print("\nFirst rows:")
    display(df.head())
    print("\nStatistics:")
    display(df.describe())
    print("\nMissing values:")
    print(df.isnull().sum())

explore(df)
Save your results
# Save DataFrame to CSV
df.to_csv("my_results.csv", index=False)
# Save a figure (call this before plt.show(), otherwise the saved image may be blank)
plt.savefig("my_chart.png", dpi=300, bbox_inches="tight")
10. Common Patterns for Economics
Calculate growth rates
# Period-over-period growth rate
df["growth_rate"] = df["gdp"].pct_change() * 100
# Year-over-year growth (if monthly data)
df["yoy_growth"] = df["gdp"].pct_change(periods=12) * 100
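If your DataFrame stacks several countries on top of each other, a plain pct_change() will compare the first year of one country with the last year of the previous one. One way around this is to compute the change within each country. The sketch below uses made-up numbers and hypothetical country names A and B.

```python
import pandas as pd

# Made-up panel: two countries, two years each
df = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "year": [2020, 2021, 2020, 2021],
    "gdp": [100.0, 110.0, 200.0, 210.0],
})

# Sort so that each country's rows are in time order
df = df.sort_values(["country", "year"])

# Growth within each country; the first year of each country is NaN
df["growth_rate"] = df.groupby("country")["gdp"].pct_change() * 100
```

The same idea applies to lags: df.groupby("country")["gdp"].shift(1) gives the previous year's GDP for the same country.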
Calculate real values (inflation adjustment)
base_year = 2020
base_cpi = df[df["year"] == base_year]["cpi"].values[0]
df["real_gdp"] = df["nominal_gdp"] * (base_cpi / df["cpi"])
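To see what the adjustment does, here is a worked sketch with made-up numbers. In the base year the real and nominal values match. In other years the nominal value is scaled by how far prices have moved from the base year.

```python
import pandas as pd

# Made-up numbers, for illustration only
df = pd.DataFrame({
    "year": [2019, 2020, 2021],
    "nominal_gdp": [95.0, 100.0, 110.0],
    "cpi": [98.0, 100.0, 105.0],
})

base_year = 2020
# CPI in the base year (assumes exactly one row for that year)
base_cpi = df.loc[df["year"] == base_year, "cpi"].iloc[0]

# Express every year's GDP in base-year prices
df["real_gdp"] = df["nominal_gdp"] * (base_cpi / df["cpi"])

real_2020 = df.loc[df["year"] == 2020, "real_gdp"].iloc[0]   # equals nominal: 100.0
real_2021 = df.loc[df["year"] == 2021, "real_gdp"].iloc[0]   # 110 * 100 / 105, about 104.76
```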
Calculate index numbers
base_value = df[df["year"] == 2000]["gdp"].values[0]
df["gdp_index"] = (df["gdp"] / base_value) * 100
Rolling averages (smoothing)
# 3-period moving average
df["gdp_ma3"] = df["gdp"].rolling(window=3).mean()
Lag variables
# Previous year's GDP
df["gdp_lag1"] = df["gdp"].shift(1)
# GDP from 2 years ago
df["gdp_lag2"] = df["gdp"].shift(2)
Quick Debugging Reference
"KeyError: 'column_name'"
The column doesn't exist. Check spelling and use df.columns to see available columns.
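Stray spaces or capital letters in the header row are a frequent cause. A small sketch, assuming a made-up DataFrame whose column names need cleaning:

```python
import pandas as pd

# Made-up data with messy column names
df = pd.DataFrame({" GDP ": [100.0, 110.0], "Year": [2020, 2021]})

has_gdp_before = "gdp" in df.columns    # False: the real name is " GDP "

# Strip surrounding whitespace and lower-case every column name
df.columns = df.columns.str.strip().str.lower()

has_gdp_after = "gdp" in df.columns     # True
```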
"ValueError: could not convert string to float"
You're trying to do math on text data. Check your column types with df.dtypes.
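One possible fix, sketched below on made-up values, is to strip the characters that stop the text from parsing and then convert with pd.to_numeric. With errors="coerce", anything that still cannot be read as a number becomes NaN rather than raising an error.

```python
import pandas as pd

# Made-up column read in as text: thousands separators and a placeholder
df = pd.DataFrame({"gdp": ["1,200", "1,350", "n/a"]})

# Remove the thousands separators, then convert to numbers
without_commas = df["gdp"].str.replace(",", "", regex=False)
df["gdp"] = pd.to_numeric(without_commas, errors="coerce")

n_unreadable = int(df["gdp"].isnull().sum())   # "n/a" became NaN
```

Check n_unreadable afterwards so you know how many values were silently turned into NaN.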
"Index out of bounds"
You're trying to access a row/column that doesn't exist. Check df.shape.
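With iloc, positions start at 0, so the last valid position is one less than the number of rows. A minimal sketch on a made-up three-row DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"gdp": [100.0, 110.0, 120.0]})   # made-up numbers

n_rows = df.shape[0]      # 3 rows, so valid positions are 0, 1 and 2

caught = False
try:
    df.iloc[n_rows]       # position 3 does not exist
except IndexError:
    caught = True         # pandas raises IndexError here

last_row = df.iloc[n_rows - 1]   # the same row as df.iloc[-1]
```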
Graph is empty or wrong
Check that your x and y columns exist and have data. Use df["column"].head() to verify.
NaN in calculations
You have missing values. Use df.dropna() or df.fillna() before calculations.
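It helps to know where NaN is skipped and where it spreads. In the sketch below (made-up numbers), pandas summary methods skip NaN by default, while ordinary arithmetic carries it through to the result.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"gdp": [100.0, np.nan, 120.0]})   # made-up numbers

n_missing = int(df["gdp"].isnull().sum())    # how many values are missing

mean_skipping = df["gdp"].mean()             # NaN skipped: (100 + 120) / 2
mean_strict = df["gdp"].mean(skipna=False)   # NaN, because one value is missing

doubled = df["gdp"] * 2                      # the missing row stays NaN
```

Some NaN values are expected: the first row after pct_change() or shift(1) has nothing to compare with.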
Keep this reference handy. You don't need to memorize everything—just know where to look!