
Polynomial Linear Regression

Polynomial regression extends linear regression to model non-linear relationships between variables by transforming features into polynomial terms. Despite the name, it remains a linear model because the relationship between Y and the coefficients is linear.


Introduction to Linear Regression (Quick Recap)

Simple Linear Regression

When you have a single input variable (X) and a single output variable (Y) with a linear relationship:

Y = \beta_0 + \beta_1 X

Where:

  • \beta_0 is the intercept
  • \beta_1 is the slope (coefficient)

Multiple Linear Regression

When you have multiple input variables (X₁, X₂, X₃, etc.) and a single output variable (Y):

Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \ldots

Where:

  • \beta values are the coefficients (parameters) that the model learns
  • The relationship between variables and output remains linear

Motivation for Polynomial Regression

Non-Linear Relationships

Linear regression assumes a linear relationship between input and output variables. However, in many real-world scenarios, data exhibits non-linear patterns:

  • Data points follow a curve rather than a straight line
  • Examples: parabolic paths, exponential growth, cubic relationships
  • A straight line cannot accurately capture these underlying patterns

Limitations of Linear Models

When data exhibits non-linear patterns:

  • Poor R-squared Score: A linear model will result in a poor R² score, indicating inadequate fit.
  • Underfitting: The model fails to capture the true relationship in the data.
  • Visual Evidence: Plotting a linear regression line on non-linear data shows obvious mismatch with the actual data points.

Example: A dataset following a parabolic curve (Y = 0.8X^2 + 0.8X + 2) fit with simple linear regression yields an R² ≈ 0.32, clearly inadequate.


What is Polynomial Regression?

Definition

Polynomial regression is a form of linear regression in which the relationship between the independent variable (X) and the dependent variable (Y) is modeled as an nth-degree polynomial.

Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3 + \ldots + \beta_n X^n

Feature Transformation: The Core Idea

The key insight of polynomial regression is feature engineering: transform the original features into polynomial features before applying linear regression.

Single Input Example (Degree 2 - Quadratic)

Original features: X

Transformed features:

  • X^0 = 1 (constant/bias term)
  • X^1 = X (original feature)
  • X^2 (squared feature)

Equation: Y = \beta_0 + \beta_1 X + \beta_2 X^2

Single Input Example (Degree 3 - Cubic)

Transformed features:

  • X^0 = 1
  • X^1 = X
  • X^2
  • X^3

Equation: Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3

Why It's Still "Linear" Regression

Important: Despite involving powers of X, polynomial regression is still a linear model because:

  • The relationship between Y and the coefficients (\beta_0, \beta_1, \beta_2, \ldots) is linear
  • We're finding a linear combination of polynomial features: Y = \beta_0 \cdot 1 + \beta_1 \cdot X + \beta_2 \cdot X^2 + \ldots
  • From the model's perspective, these are just different input features (even though they're derived from X)

It's not linear in the input variable X, but it is linear in the parameters.

How It Works

  1. Convert original features (e.g., X) into polynomial features (e.g., X, X^2, X^3), as sketched after this list
  2. Apply standard linear regression on these transformed features
  3. The model learns coefficients for each polynomial term
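
Before reaching for scikit-learn, the transformation itself is easy to do by hand; a minimal NumPy sketch (toy values, illustrative names):

import numpy as np

# Toy input: one feature column
x = np.array([1.0, 2.0, 3.0])

# Manual degree-2 polynomial features: [1, x, x^2]
X_poly = np.column_stack([np.ones_like(x), x, x**2])
print(X_poly)
# [[1. 1. 1.]
#  [1. 2. 4.]
#  [1. 3. 9.]]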

Practical Application: Single Input Feature

Scenario

Dataset generated by: Y = 0.8X² + 0.8X + 2 + noise

Step 1: Initial Linear Regression Attempt

Applying standard linear regression directly:

  • R² Score: ~0.32 (very poor fit)
  • Visualization: Straight line clearly doesn't match the curved data points
  • Conclusion: Simple linear regression is inadequate
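
A minimal sketch of this first attempt (the original data-generation code isn't shown, so the range, sample size, and noise level here are assumptions):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Reconstruct the example dataset: Y = 0.8 X^2 + 0.8 X + 2 + noise
rng = np.random.default_rng(42)
X = rng.uniform(-4, 4, size=(200, 1))
y = 0.8 * X[:, 0]**2 + 0.8 * X[:, 0] + 2 + rng.normal(0, 1, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# A straight line through curved data: expect a low R^2
linear = LinearRegression().fit(X_train, y_train)
print(linear.score(X_test, y_test))  # poor, in the spirit of the ~0.32 above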

Step 2: Applying Polynomial Features

Using Scikit-learn's PolynomialFeatures class:

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Create polynomial features
poly = PolynomialFeatures(degree=2, include_bias=True)
X_train_poly = poly.fit_transform(X_train)
X_test_poly = poly.transform(X_test)

# Train linear regression on polynomial features
model = LinearRegression()
model.fit(X_train_poly, y_train)

# Evaluate on the transformed test features
r2 = model.score(X_test_poly, y_test)

Key Parameters

  • degree: The degree of the polynomial (e.g., 2 for quadratic)
  • include_bias:
    • True (default): Adds a column of ones (X^0), representing the intercept term
    • False: Omits the ones column (useful because LinearRegression fits its own intercept by default, which makes the bias column redundant)

Features Generated (for degree=2)

Feature     Description
X^0 = 1     Constant (bias)
X^1 = X     Original feature
X^2         Squared feature

Important: Polynomial features are only applied to input (X) features, NOT the output (Y). This transformation must be applied to both training and test data.
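
You can verify the generated features directly; get_feature_names_out is available in scikit-learn 1.0+ (a small sanity-check sketch):

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

poly = PolynomialFeatures(degree=2, include_bias=True)
poly.fit(np.array([[2.0]]))

print(poly.get_feature_names_out(["x"]))  # ['1' 'x' 'x^2']
print(poly.transform(np.array([[2.0]])))  # [[1. 2. 4.]]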

Step 3: Improved Performance

After polynomial transformation:

  • R² Score: ~0.9817 (excellent fit)
  • Learned Coefficients:
    • \beta_2 \approx 0.89 (actual: 0.8)
    • \beta_1 \approx 0.78 (actual: 0.8)
    • \beta_0 \approx 1.97 (actual: 2)
  • Visualization: Fitted curve closely matches the non-linear pattern
  • Note: The true coefficients aren't recovered exactly because of the random noise in the dataset
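
To read the fitted parameters back, inspect intercept_ and coef_ (a sketch continuing Step 2; because include_bias=True makes the ones column redundant with LinearRegression's own intercept, that column receives a coefficient of essentially zero):

# Continuing from the fitted model in Step 2
print(model.intercept_)  # close to 2 (the beta_0 term)
print(model.coef_)       # close to [0, 0.78, 0.89], ordered like [1, X, X^2]
# The leading ~0 belongs to the redundant ones column; LinearRegression
# already fits its own intercept, so the constant lands in intercept_.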

Overfitting and Underfitting

Underfitting (Degree Too Low)

Problem: Polynomial degree is too low to capture the true relationship.

  • Model performs poorly on both training and test data
  • Fails to capture underlying patterns
  • Looks like a line when the data is clearly curved

Example: Using degree=1 for data that's actually quadratic.

Overfitting (Degree Too High)

Problem: Polynomial degree is too high, causing the model to fit noise and minor fluctuations.

  • Training Performance: Excellent (fits every point, including noise)
  • Test Performance: Poor (fails to generalize)
  • Visualization: Highly erratic curve trying to "catch" every training point
  • Real-World Impact: The model learned specific patterns from training noise that don't exist in the true relationship

Example: Using degree=5 or degree=10 for data that's actually quadratic results in an overly wiggly curve.
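
One quick way to see both failure modes is to sweep the degree and compare train vs. test R²; a sketch reusing the Step 1 data:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Sweep the degree on the Step 1 data and compare train vs. test fit
for degree in [1, 2, 5, 10]:
    pipe = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    pipe.fit(X_train, y_train)
    print(degree, pipe.score(X_train, y_train), pipe.score(X_test, y_test))
# Expect: degree 1 poor on both (underfitting); degree 2 good on both;
# high degrees inch up on train while test typically stalls or degrades.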

Optimizing the Degree

Selecting the Right Degree

For Simple Cases:

  • Trial and error
  • Visual inspection of the plotted data

For Complex Cases:

  • Cross-Validation: Test different degrees and measure performance (see the sketch after this list)
  • Learning Curves: Analyze how training and test errors change with model complexity
  • Validation Curves: Plot model performance as a function of polynomial degree
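
A minimal cross-validation sketch along these lines, wrapping the transform and the regressor in a pipeline so the polynomial features are refit inside each fold:

from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Pick the degree with the best mean cross-validated R^2
for degree in range(1, 7):
    pipe = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    scores = cross_val_score(pipe, X_train, y_train, cv=5, scoring="r2")
    print(degree, scores.mean())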

The Sweet Spot

For the example dataset (generated with degree 2):

  • Optimal degree: 2
  • degree < 2: Underfitting
  • degree = 2: Perfect balance
  • degree > 2: Overfitting

Multiple Polynomial Regression

Scenario

When you have multiple input features (e.g., X, Y) and a non-linear relationship with output (Z):

Z = X² + Y² + 0.2X + 0.2Y + 0.1XY + 2 + noise

This is a degree-2 polynomial with two input variables.
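
A sketch of generating such a dataset (the ranges, sample size, and noise level are assumptions; the final split provides the X_train/y_train used in the implementation further below):

import numpy as np
from sklearn.model_selection import train_test_split

# Two input columns (the X and Y above), one output (Z)
rng = np.random.default_rng(0)
inputs = rng.uniform(-3, 3, size=(500, 2))
x, y = inputs[:, 0], inputs[:, 1]
z = x**2 + y**2 + 0.2 * x + 0.2 * y + 0.1 * x * y + 2 + rng.normal(0, 0.5, 500)

X_train, X_test, y_train, y_test = train_test_split(inputs, z, random_state=0)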

Visualization

  • Data plotted in 3D appears as a curved, parabolic-like surface
  • Cannot be fit by a simple plane (which multiple linear regression produces)

Feature Generation with Multiple Inputs

Using PolynomialFeatures with multiple inputs generates all combinations of powers up to the specified degree:

Example: 2 Inputs (X, Y) with degree=2

Generated features:

  • X^0 Y^0 = 1 (constant/bias)
  • X^1 Y^0 = X
  • X^0 Y^1 = Y
  • X^2 Y^0 = X^2
  • X^1 Y^1 = XY (interaction term)
  • X^0 Y^2 = Y^2

Total: 6 features
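
This enumeration matches what scikit-learn actually produces, which you can confirm from the feature names (scikit-learn 1.0+):

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

poly = PolynomialFeatures(degree=2, include_bias=True)
poly.fit(np.zeros((1, 2)))
print(poly.get_feature_names_out(["x", "y"]))
# ['1' 'x' 'y' 'x^2' 'x y' 'y^2']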

Degree Definition

The degree of a term is the sum of the powers of all variables:

  • X^2 Y^0: degree = 2 + 0 = 2 ✓
  • X^1 Y^1: degree = 1 + 1 = 2 ✓
  • X^0 Y^2: degree = 0 + 2 = 2 ✓

The maximum degree of any term equals the specified degree parameter.

Number of Generated Features

For n input features and degree d:

Number of features = C(n+d, d)

For 2 features and degree 2: C(4, 2) = 6 features ✓
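
A two-line check of this formula, comparing math.comb against the transformer's n_output_features_ attribute:

import math
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

n, d = 2, 2
print(math.comb(n + d, d))  # 6

poly = PolynomialFeatures(degree=d).fit(np.zeros((1, n)))
print(poly.n_output_features_)  # 6, matching the formula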

Implementation

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Generate polynomial features for 2 inputs, degree 2
poly = PolynomialFeatures(degree=2, include_bias=True)
X_train_poly = poly.fit_transform(X_train)
X_test_poly = poly.transform(X_test)

# Train and evaluate
model = LinearRegression()
model.fit(X_train_poly, y_train)
r2_train = model.score(X_train_poly, y_train)
r2_test = model.score(X_test_poly, y_test)

Performance Comparison

Simple Multiple Linear Regression

  • Fits a flat plane through 3D space
  • R² Score: Poor (e.g., 0.25-0.40)
  • Cannot capture curved surface

Polynomial Multiple Linear Regression

  • Fits a curved surface through 3D space
  • R² Score: Excellent (e.g., 0.96-0.99)
  • Captures the non-linear relationships and interactions

Overfitting in Multiple Polynomial Regression

Increasing degree too high leads to:

  • Surface becomes highly irregular: Tries to precisely capture every training point
  • Poor generalization: Model doesn't work well on test/unseen data
  • Noise amplification: Learns patterns specific to training noise

Key Insights

Why Polynomial Regression is "Linear"

The crucial distinction:

Aspect                                     Linear?
Relationship between Y and coefficients    Yes (linear)
Relationship between Y and input X         No (can be non-linear)

Polynomial regression is linear in its coefficients, which is why it falls under the umbrella of linear regression methods and uses the same optimization techniques (like Ordinary Least Squares).
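
To make this concrete, stacking each sample's polynomial features into a design matrix reduces the fit to textbook OLS (the standard formulation, not specific to this article's example):

X = \begin{pmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^n \\
1 & x_2 & x_2^2 & \cdots & x_2^n \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_m & x_m^2 & \cdots & x_m^n
\end{pmatrix},
\qquad
\hat{\beta} = (X^\top X)^{-1} X^\top Y

Because Y is linear in \beta, the same solvers used for plain linear regression (closed-form OLS or gradient descent) apply unchanged.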

Feature Engineering Power

Polynomial regression demonstrates the power of feature engineering:

  • Same algorithm (linear regression)
  • Better features (polynomial terms)
  • Much better results

Interaction Terms

Multiple polynomial regression captures interaction effects between variables:

  • XY terms show how X and Y influence each other
  • Higher-degree terms such as X^2 Y (generated when degree ≥ 3) capture more complex interactions
  • These are crucial for modeling real-world relationships

Summary

Aspect                   Simple Linear              Polynomial (degree=2)
Formula                  Y = \beta_0 + \beta_1 X    Y = \beta_0 + \beta_1 X + \beta_2 X^2
Flexibility              Limited                    Captures curves
Model Complexity         Simple                     Moderate
R² on Non-linear Data    Poor                       Excellent
Risk of Overfitting      Low                        Moderate-High (if degree too high)
Use Case                 Linear relationships       Non-linear relationships

Polynomial regression is an essential technique for handling non-linear patterns while maintaining the interpretability and efficiency of linear regression methods.