
Polynomial Linear Regression

Polynomial regression extends linear regression to model non-linear relationships between variables by transforming features into polynomial terms. Despite the name, it remains a linear model because the relationship between Y and the coefficients is linear.


Introduction to Linear Regression (Quick Recap)

Simple Linear Regression

When you have a single input variable (X) and a single output variable (Y) with a linear relationship:

Y = \beta_0 + \beta_1 X

Where:

  • \beta_0 is the intercept
  • \beta_1 is the slope (coefficient)

Multiple Linear Regression

When you have multiple input variables (X₁, X₂, X₃, etc.) and a single output variable (Y):

Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \ldots

Where:

  • \beta values are the coefficients (parameters) that the model learns
  • The relationship between variables and output remains linear

Motivation for Polynomial Regression

Non-Linear Relationships

Linear regression assumes a linear relationship between input and output variables. However, in many real-world scenarios, data exhibits non-linear patterns:

  • Data points follow a curve rather than a straight line
  • Examples: parabolic paths, exponential growth, cubic relationships
  • A straight line cannot accurately capture these underlying patterns

Limitations of Linear Models

When data exhibits non-linear patterns:

  • Poor R-squared Score: A linear model will result in a poor R² score, indicating inadequate fit.
  • Underfitting: The model fails to capture the true relationship in the data.
  • Visual Evidence: Plotting a linear regression line on non-linear data shows obvious mismatch with the actual data points.

Example: A dataset following a parabolic curve (Y = 0.8X^2 + 0.8X + 2) fit with simple linear regression yields an R² ≈ 0.32, clearly inadequate.


What is Polynomial Regression?

Definition

Polynomial regression is a form of linear regression in which the relationship between the independent variable (X) and the dependent variable (Y) is modeled as an nth-degree polynomial.

Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3 + \ldots + \beta_n X^n

Feature Transformation: The Core Idea

The key insight of polynomial regression is feature engineering: transform the original features into polynomial features before applying linear regression.

Single Input Example (Degree 2 - Quadratic)

Original features: X

Transformed features:

  • X^0 = 1 (constant/bias term)
  • X^1 = X (original feature)
  • X^2 (squared feature)

Equation: Y = \beta_0 + \beta_1 X + \beta_2 X^2

Single Input Example (Degree 3 - Cubic)

Transformed features:

  • X^0 = 1
  • X^1 = X
  • X^2
  • X^3

Equation: Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3

Why It's Still "Linear" Regression

Important: Despite involving powers of X, polynomial regression is still a linear model because:

  • The relationship between Y and the coefficients (\beta_0, \beta_1, \beta_2, \ldots) is linear
  • We're finding a linear combination of polynomial features: Y = \beta_0 \cdot 1 + \beta_1 \cdot X + \beta_2 \cdot X^2 + \ldots
  • From the model's perspective, these are just different input features (even though they're derived from X)

It's not linear in the input variable X, but it is linear in the parameters.

How It Works

  1. Convert original features (e.g., X) into polynomial features (e.g., X, X^2, X^3), as sketched after this list
  2. Apply standard linear regression on these transformed features
  3. The model learns coefficients for each polynomial term
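
Before reaching for scikit-learn, the transformation itself is easy to do by hand; a minimal NumPy sketch (toy values, illustrative names):

import numpy as np

# Toy input: one feature column
x = np.array([1.0, 2.0, 3.0])

# Manual degree-2 polynomial features: [1, x, x^2]
X_poly = np.column_stack([np.ones_like(x), x, x**2])
print(X_poly)
# [[1. 1. 1.]
#  [1. 2. 4.]
#  [1. 3. 9.]]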

Practical Application: Single Input Feature

Scenario

Dataset generated by: Y = 0.8X² + 0.8X + 2 + noise

Step 1: Initial Linear Regression Attempt

Applying standard linear regression directly:

  • R² Score: ~0.32 (very poor fit)
  • Visualization: Straight line clearly doesn't match the curved data points
  • Conclusion: Simple linear regression is inadequate
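
A minimal sketch of this first attempt (the original data-generation code isn't shown, so the range, sample size, and noise level here are assumptions):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Reconstruct the example dataset: Y = 0.8 X^2 + 0.8 X + 2 + noise
rng = np.random.default_rng(42)
X = rng.uniform(-4, 4, size=(200, 1))
y = 0.8 * X[:, 0]**2 + 0.8 * X[:, 0] + 2 + rng.normal(0, 1, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# A straight line through curved data: expect a low R^2
linear = LinearRegression().fit(X_train, y_train)
print(linear.score(X_test, y_test))  # poor, in the spirit of the ~0.32 above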

Step 2: Applying Polynomial Features

Using Scikit-learn's PolynomialFeatures class:

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Create polynomial features
poly = PolynomialFeatures(degree=2, include_bias=True)
X_train_poly = poly.fit_transform(X_train)
X_test_poly = poly.transform(X_test)

# Train linear regression on polynomial features
model = LinearRegression()
model.fit(X_train_poly, y_train)

# Evaluate on the transformed test features
r2 = model.score(X_test_poly, y_test)

Key Parameters

  • degree: The degree of the polynomial (e.g., 2 for quadratic)
  • include_bias:
    • True (default): Adds a column of ones (X^0), representing the intercept term
    • False: Omits the ones column (useful because LinearRegression fits its own intercept by default, which makes the bias column redundant)

Features Generated (for degree=2)

Feature     Description
X^0 = 1     Constant (bias)
X^1 = X     Original feature
X^2         Squared feature

Important: Polynomial features are only applied to input (X) features, NOT the output (Y). This transformation must be applied to both training and test data.
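
You can verify the generated features directly; get_feature_names_out is available in scikit-learn 1.0+ (a small sanity-check sketch):

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

poly = PolynomialFeatures(degree=2, include_bias=True)
poly.fit(np.array([[2.0]]))

print(poly.get_feature_names_out(["x"]))  # ['1' 'x' 'x^2']
print(poly.transform(np.array([[2.0]])))  # [[1. 2. 4.]]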

Step 3: Improved Performance

After polynomial transformation:

  • R² Score: ~0.9817 (excellent fit)
  • Learned Coefficients:
    • \beta_2 \approx 0.89 (actual: 0.8)
    • \beta_1 \approx 0.78 (actual: 0.8)
    • \beta_0 \approx 1.97 (actual: 2)
  • Visualization: Fitted curve closely matches the non-linear pattern
  • Note: The true coefficients aren't recovered exactly because of the random noise in the dataset
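
To read the fitted parameters back, inspect intercept_ and coef_ (a sketch continuing Step 2; because include_bias=True makes the ones column redundant with LinearRegression's own intercept, that column receives a coefficient of essentially zero):

# Continuing from the fitted model in Step 2
print(model.intercept_)  # close to 2 (the beta_0 term)
print(model.coef_)       # close to [0, 0.78, 0.89], ordered like [1, X, X^2]
# The leading ~0 belongs to the redundant ones column; LinearRegression
# already fits its own intercept, so the constant lands in intercept_.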

Overfitting and Underfitting

Underfitting (Degree Too Low)

Problem: Polynomial degree is too low to capture the true relationship.

  • Model performs poorly on both training and test data
  • Fails to capture underlying patterns
  • Looks like a line when the data is clearly curved

Example: Using degree=1 for data that's actually quadratic.

Overfitting (Degree Too High)

Problem: Polynomial degree is too high, causing the model to fit noise and minor fluctuations.

  • Training Performance: Excellent (fits every point, including noise)
  • Test Performance: Poor (fails to generalize)
  • Visualization: Highly erratic curve trying to "catch" every training point
  • Real-World Impact: The model learned specific patterns from training noise that don't exist in the true relationship

Example: Using degree=5 or degree=10 for data that's actually quadratic results in an overly wiggly curve.
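
One quick way to see both failure modes is to sweep the degree and compare train vs. test R²; a sketch reusing the Step 1 data:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Sweep the degree on the Step 1 data and compare train vs. test fit
for degree in [1, 2, 5, 10]:
    pipe = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    pipe.fit(X_train, y_train)
    print(degree, pipe.score(X_train, y_train), pipe.score(X_test, y_test))
# Expect: degree 1 poor on both (underfitting); degree 2 good on both;
# high degrees inch up on train while test typically stalls or degrades.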

Optimizing the Degree

Selecting the Right Degree

For Simple Cases:

  • Trial and error
  • Visual inspection of the plotted data

For Complex Cases:

  • Cross-Validation: Test different degrees and measure performance (see the sketch after this list)
  • Learning Curves: Analyze how training and test errors change with model complexity
  • Validation Curves: Plot model performance as a function of polynomial degree
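
A minimal cross-validation sketch along these lines, wrapping the transform and the regressor in a pipeline so the polynomial features are refit inside each fold:

from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Pick the degree with the best mean cross-validated R^2
for degree in range(1, 7):
    pipe = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    scores = cross_val_score(pipe, X_train, y_train, cv=5, scoring="r2")
    print(degree, scores.mean())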

The Sweet Spot

For the example dataset (generated with degree 2):

  • Optimal degree: 2
  • degree < 2: Underfitting
  • degree = 2: Perfect balance
  • degree > 2: Overfitting

Multiple Polynomial Regression

Scenario

When you have multiple input features (e.g., X, Y) and a non-linear relationship with output (Z):

Z = X² + Y² + 0.2X + 0.2Y + 0.1XY + 2 + noise

This is a degree-2 polynomial with two input variables.
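
A sketch of generating such a dataset (the ranges, sample size, and noise level are assumptions; the final split provides the X_train/y_train used in the implementation further below):

import numpy as np
from sklearn.model_selection import train_test_split

# Two input columns (the X and Y above), one output (Z)
rng = np.random.default_rng(0)
inputs = rng.uniform(-3, 3, size=(500, 2))
x, y = inputs[:, 0], inputs[:, 1]
z = x**2 + y**2 + 0.2 * x + 0.2 * y + 0.1 * x * y + 2 + rng.normal(0, 0.5, 500)

X_train, X_test, y_train, y_test = train_test_split(inputs, z, random_state=0)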

Visualization

  • Data plotted in 3D appears as a curved, parabolic-like surface
  • Cannot be fit by a simple plane (which multiple linear regression produces)

Feature Generation with Multiple Inputs

Using PolynomialFeatures with multiple inputs generates all combinations of powers up to the specified degree:

Example: 2 Inputs (X, Y) with degree=2

Generated features:

  • X^0 Y^0 = 1 (constant/bias)
  • X^1 Y^0 = X
  • X^0 Y^1 = Y
  • X^2 Y^0 = X^2
  • X^1 Y^1 = XY (interaction term)
  • X^0 Y^2 = Y^2

Total: 6 features
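
This enumeration matches what scikit-learn actually produces, which you can confirm from the feature names (scikit-learn 1.0+):

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

poly = PolynomialFeatures(degree=2, include_bias=True)
poly.fit(np.zeros((1, 2)))
print(poly.get_feature_names_out(["x", "y"]))
# ['1' 'x' 'y' 'x^2' 'x y' 'y^2']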

Degree Definition

The degree of a term is the sum of the powers of all variables:

  • X^2 Y^0: degree = 2 + 0 = 2 ✓
  • X^1 Y^1: degree = 1 + 1 = 2 ✓
  • X^0 Y^2: degree = 0 + 2 = 2 ✓

The maximum degree of any term equals the specified degree parameter.

Number of Generated Features

For n input features and degree d:

Number of features = C(n+d, d)

For 2 features and degree 2: C(4, 2) = 6 features ✓
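
A two-line check of this formula, comparing math.comb against the transformer's n_output_features_ attribute:

import math
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

n, d = 2, 2
print(math.comb(n + d, d))  # 6

poly = PolynomialFeatures(degree=d).fit(np.zeros((1, n)))
print(poly.n_output_features_)  # 6, matching the formula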

Implementation

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Generate polynomial features for 2 inputs, degree 2
poly = PolynomialFeatures(degree=2, include_bias=True)
X_train_poly = poly.fit_transform(X_train)
X_test_poly = poly.transform(X_test)

# Train and evaluate
model = LinearRegression()
model.fit(X_train_poly, y_train)
r2_train = model.score(X_train_poly, y_train)
r2_test = model.score(X_test_poly, y_test)

Performance Comparison

Simple Multiple Linear Regression

  • Fits a flat plane through 3D space
  • R² Score: Poor (e.g., 0.25-0.40)
  • Cannot capture curved surface

Polynomial Multiple Linear Regression

  • Fits a curved surface through 3D space
  • R² Score: Excellent (e.g., 0.96-0.99)
  • Captures the non-linear relationships and interactions

Overfitting in Multiple Polynomial Regression

Increasing degree too high leads to:

  • Surface becomes highly irregular: Tries to precisely capture every training point
  • Poor generalization: Model doesn't work well on test/unseen data
  • Noise amplification: Learns patterns specific to training noise

Key Insights

Why Polynomial Regression is "Linear"

The crucial distinction:

Aspect                                     Linear?
Relationship between Y and coefficients    Yes (linear)
Relationship between Y and input X         No (can be non-linear)

Polynomial regression is linear in its coefficients, which is why it falls under the umbrella of linear regression methods and uses the same optimization techniques (like Ordinary Least Squares).
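
To make this concrete, stacking each sample's polynomial features into a design matrix reduces the fit to textbook OLS (the standard formulation, not specific to this article's example):

X = \begin{pmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^n \\
1 & x_2 & x_2^2 & \cdots & x_2^n \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_m & x_m^2 & \cdots & x_m^n
\end{pmatrix},
\qquad
\hat{\beta} = (X^\top X)^{-1} X^\top Y

Because Y is linear in \beta, the same solvers used for plain linear regression (closed-form OLS or gradient descent) apply unchanged.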

Feature Engineering Power

Polynomial regression demonstrates the power of feature engineering:

  • Same algorithm (linear regression)
  • Better features (polynomial terms)
  • Much better results

Interaction Terms

Multiple polynomial regression captures interaction effects between variables:

  • XY terms show how X and Y influence each other
  • Higher-degree terms such as X^2 Y (generated when degree ≥ 3) capture more complex interactions
  • These are crucial for modeling real-world relationships

Summary

Aspect                   Simple Linear              Polynomial (degree=2)
Formula                  Y = \beta_0 + \beta_1 X    Y = \beta_0 + \beta_1 X + \beta_2 X^2
Flexibility              Limited                    Captures curves
Model Complexity         Simple                     Moderate
R² on Non-linear Data    Poor                       Excellent
Risk of Overfitting      Low                        Moderate-High (if degree too high)
Use Case                 Linear relationships       Non-linear relationships

Polynomial regression is an essential technique for handling non-linear patterns while maintaining the interpretability and efficiency of linear regression methods.