Polynomial Linear Regression
Polynomial regression extends linear regression to model non-linear relationships between variables by transforming features into polynomial terms. Despite the name, it remains a linear model because the relationship between Y and the coefficients is linear.
Introduction to Linear Regression (Quick Recap)
Simple Linear Regression
When you have a single input variable (X) and a single output variable (Y) with a linear relationship:
Y = β₀ + β₁X
Where:
- β₀ is the intercept
- β₁ is the slope (coefficient)
Multiple Linear Regression
When you have multiple input variables (X₁, X₂, X₃, etc.) and a single output variable (Y):
Y = β₀ + β₁X₁ + β₂X₂ + ... + βₙXₙ
Where:
- The β values are the coefficients (parameters) that the model learns
- The relationship between the variables and the output remains linear
Motivation for Polynomial Regression
Non-Linear Relationships
Linear regression assumes a linear relationship between input and output variables. However, in many real-world scenarios, data exhibits non-linear patterns:
- Data points follow a curve rather than a straight line
- Examples: parabolic paths, exponential growth, cubic relationships
- A straight line cannot accurately capture these underlying patterns
Limitations of Linear Models
When data exhibits non-linear patterns:
- Poor R-squared Score: A linear model will result in a poor R² score, indicating inadequate fit.
- Underfitting: The model fails to capture the true relationship in the data.
- Visual Evidence: Plotting a linear regression line on non-linear data shows obvious mismatch with the actual data points.
Example: A dataset following a parabolic curve (such as the Y = 0.8X² + 0.8X + 2 plus noise dataset used below) fit with simple linear regression yields an R² ≈ 0.32, clearly inadequate.
What is Polynomial Regression?
Definition
Polynomial regression is a form of linear regression in which the relationship between the independent variable (X) and the dependent variable (Y) is modelled as an nth-degree polynomial.
Feature Transformation: The Core Idea
The key insight of polynomial regression is feature engineering: transform the original features into polynomial features before applying linear regression.
Single Input Example (Degree 2 - Quadratic)
Original features: X
Transformed features:
- X⁰ = 1 (constant/bias term)
- X¹ = X (original feature)
- X² (squared feature)
Equation: Y = β₀ + β₁X + β₂X²
Single Input Example (Degree 3 - Cubic)
Transformed features: 1, X, X², X³
Equation: Y = β₀ + β₁X + β₂X² + β₃X³
Why It's Still "Linear" Regression
Important: Despite involving powers of X, polynomial regression is still a linear model because:
- The relationship between Y and the coefficients (β₀, β₁, β₂, ...) is linear
- We're finding a linear combination of polynomial features: Y = β₀ + β₁X + β₂X² + ...
- From the model's perspective, these are just different input features (even though they're derived from X)
It's not linear in the input variable X, but it is linear in the parameters.
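To make this concrete, here is a minimal sketch (assuming NumPy and a small synthetic array, not taken from the original text) that builds the design matrix [1, X, X²] and solves for the coefficients with ordinary least squares; the fit itself is plain linear algebra, exactly as in linear regression:

```python
import numpy as np

# Small synthetic example: y roughly follows 0.8*x^2 + 0.8*x + 2
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50)
y = 0.8 * x**2 + 0.8 * x + 2 + rng.normal(0, 0.5, size=x.shape)

# Design matrix of polynomial features: [1, x, x^2]
A = np.column_stack([np.ones_like(x), x, x**2])

# Ordinary least squares: the model is linear in the coefficients beta
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
print(beta)  # roughly [2.0, 0.8, 0.8] up to noise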
How It Works
- Convert original features (e.g., X) into polynomial features (e.g., X, X², X³)
- Apply standard linear regression on these transformed features
- The model learns coefficients for each polynomial term
Practical Application: Single Input Feature
Scenario
Dataset generated by: Y = 0.8X² + 0.8X + 2 + noise
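For reference, a dataset like this could be generated along the following lines (a sketch; the variable names X_train, X_test, y_train, y_test are assumptions reused in the later snippets):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data following Y = 0.8X^2 + 0.8X + 2 + noise
rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.8 * X[:, 0]**2 + 0.8 * X[:, 0] + 2 + rng.normal(0, 1, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```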
Step 1: Initial Linear Regression Attempt
Applying standard linear regression directly:
- R² Score: ~0.32 (very poor fit)
- Visualization: Straight line clearly doesn't match the curved data points
- Conclusion: Simple linear regression is inadequate
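A minimal sketch of this baseline attempt, assuming the train/test split from the hypothetical data-generation step above:

```python
from sklearn.linear_model import LinearRegression

# Fit a straight line directly to the raw feature
lin = LinearRegression()
lin.fit(X_train, y_train)

# On strongly curved data this score comes out low (around 0.3 for this dataset)
print(lin.score(X_test, y_test))
```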
Step 2: Applying Polynomial Features
Using Scikit-learn's PolynomialFeatures class:
```python
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Create polynomial features
poly = PolynomialFeatures(degree=2, include_bias=True)
X_train_poly = poly.fit_transform(X_train)
X_test_poly = poly.transform(X_test)

# Train linear regression on polynomial features
model = LinearRegression()
model.fit(X_train_poly, y_train)

# Evaluate
r2_score = model.score(X_test_poly, y_test)
```
Key Parameters
- degree: The degree of the polynomial (e.g., 2 for quadratic)
- include_bias:
  - True (default): Adds a column of ones (X⁰ = 1), representing the intercept term
  - False: Does not include the bias column (useful when the linear regression model already fits its own intercept)
Features Generated (for degree=2)
| Feature | Description |
|---|---|
| X⁰ = 1 | Constant (bias) |
| X¹ = X | Original feature |
| X² | Squared feature |
Important: Polynomial features are only applied to input (X) features, NOT the output (Y). This transformation must be applied to both training and test data.
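To see exactly which columns PolynomialFeatures produces, you can inspect the generated feature names (a sketch; get_feature_names_out is available in recent scikit-learn versions):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

poly = PolynomialFeatures(degree=2, include_bias=True)
X_demo = np.array([[2.0], [3.0]])

print(poly.fit_transform(X_demo))
# [[1. 2. 4.]
#  [1. 3. 9.]]
print(poly.get_feature_names_out(["x"]))
# ['1' 'x' 'x^2']
```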
Step 3: Improved Performance
After polynomial transformation:
- R² Score: ~0.9817 (excellent fit)
- Learned Coefficients (one way to print them is sketched after this list):
  - Coefficient of X²: close to the actual value of 0.8
  - Coefficient of X: close to the actual value of 0.8
  - Intercept: close to the actual value of 2
- Visualization: Fitted curve closely matches the non-linear pattern
- Note: Perfect coefficients aren't achieved due to random noise in the dataset
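A sketch for inspecting the learned values, assuming the poly and model objects from the code above (note that with include_bias=True the constant column gets a near-zero coefficient because LinearRegression stores the constant separately in intercept_):

```python
# Map each learned coefficient back to its polynomial feature
for name, coef in zip(poly.get_feature_names_out(["x"]), model.coef_):
    print(f"{name}: {coef:.3f}")
print(f"intercept: {model.intercept_:.3f}")
```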
Overfitting and Underfitting
Underfitting (Degree Too Low)
Problem: Polynomial degree is too low to capture the true relationship.
- Model performs poorly on both training and test data
- Fails to capture underlying patterns
- Looks like a line when the data is clearly curved
Example: Using degree=1 for data that's actually quadratic.
Overfitting (Degree Too High)
Problem: Polynomial degree is too high, causing the model to fit noise and minor fluctuations.
- Training Performance: Exceptionally good (fits every point, including the noise)
- Test Performance: Poor (fails to generalize)
- Visualization: Highly erratic curve trying to "catch" every training point
- Real-World Impact: The model learned specific patterns from training noise that don't exist in the true relationship
Example: Using degree=5 or degree=10 for data that's actually quadratic results in an overly wiggly curve.
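A quick way to see this effect is to sweep the degree and compare train and test R² (a sketch assuming the same X_train/X_test split as above):

```python
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

for degree in [1, 2, 5, 10]:
    poly = PolynomialFeatures(degree=degree)
    Xtr = poly.fit_transform(X_train)
    Xte = poly.transform(X_test)
    model = LinearRegression().fit(Xtr, y_train)
    # Train R2 keeps climbing with degree; test R2 peaks near the true degree
    print(degree, model.score(Xtr, y_train), model.score(Xte, y_test))
```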
Optimizing the Degree
Selecting the Right Degree
For Simple Cases:
- Trial and error
- Visual inspection of the plotted data
For Complex Cases:
- Cross-Validation: Test different degrees and measure performance (see the sketch after this list)
- Learning Curves: Analyze how training and test errors change with model complexity
- Validation Curves: Plot model performance as a function of polynomial degree
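As a sketch of the cross-validation approach, one can wrap the transform and the regression in a Pipeline and score each candidate degree:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Score each candidate degree with 5-fold cross-validation
for degree in range(1, 7):
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    scores = cross_val_score(model, X_train, y_train, cv=5, scoring="r2")
    print(degree, scores.mean())
```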
The Sweet Spot
For the example dataset (generated with degree 2):
- Optimal degree: 2
- degree < 2: Underfitting
- degree = 2: Perfect balance
- degree > 2: Overfitting
Multiple Polynomial Regression
Scenario
When you have multiple input features (e.g., X, Y) and a non-linear relationship with output (Z):
Z = X² + Y² + 0.2X + 0.2Y + 0.1XY + 2 + noise
This is a degree-2 polynomial with two input variables.
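A sketch of how such a two-feature dataset could be generated (the variable names here are assumptions for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data following Z = X^2 + Y^2 + 0.2X + 0.2Y + 0.1XY + 2 + noise
rng = np.random.default_rng(7)
X1 = rng.uniform(-3, 3, size=500)
X2 = rng.uniform(-3, 3, size=500)
Z = X1**2 + X2**2 + 0.2 * X1 + 0.2 * X2 + 0.1 * X1 * X2 + 2 + rng.normal(0, 1, size=500)

X = np.column_stack([X1, X2])
X_train, X_test, y_train, y_test = train_test_split(X, Z, test_size=0.2, random_state=7)
```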
Visualization
- Data plotted in 3D appears as a curved, parabolic-like surface
- Cannot be fit by a simple plane (which multiple linear regression produces)
Feature Generation with Multiple Inputs
Using PolynomialFeatures with multiple inputs generates all combinations of powers up to the specified degree:
Example: 2 Inputs (X, Y) with degree=2
Generated features:
- 1 (constant/bias)
- X
- Y
- X²
- XY (interaction term)
- Y²
Total: 6 features
Degree Definition
The degree of a term is the sum of the powers of all variables:
- X²: degree = 2 + 0 = 2 ✓
- XY: degree = 1 + 1 = 2 ✓
- Y²: degree = 0 + 2 = 2 ✓
The maximum degree of any term equals the specified degree parameter.
Number of Generated Features
For n input features and degree d:
Number of features = C(n+d, d) (including the bias column)
For 2 features and degree 2: C(4, 2) = 6 features ✓
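This count can be checked directly against what scikit-learn generates (a sketch; n_output_features_ is set after fitting):

```python
from math import comb
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

n, d = 2, 2
poly = PolynomialFeatures(degree=d, include_bias=True)
poly.fit(np.zeros((1, n)))

print(comb(n + d, d))           # 6
print(poly.n_output_features_)  # 6
```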
Implementation
```python
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Generate polynomial features for 2 inputs, degree 2
poly = PolynomialFeatures(degree=2, include_bias=True)
X_train_poly = poly.fit_transform(X_train)
X_test_poly = poly.transform(X_test)

# Train and evaluate
model = LinearRegression()
model.fit(X_train_poly, y_train)
r2_train = model.score(X_train_poly, y_train)
r2_test = model.score(X_test_poly, y_test)
```
Performance Comparison
Simple Multiple Linear Regression
- Fits a flat plane through 3D space
- R² Score: Poor (e.g., 0.25-0.40)
- Cannot capture curved surface
Polynomial Multiple Linear Regression
- Fits a curved surface through 3D space
- R² Score: Excellent (e.g., 0.96-0.99)
- Captures the non-linear relationships and interactions
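A side-by-side sketch of the two fits compared above (assuming the two-feature split from earlier):

```python
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Plane fit on the raw features
plane = LinearRegression().fit(X_train, y_train)
print("plane R2:", plane.score(X_test, y_test))

# Curved-surface fit on degree-2 polynomial features
surface = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X_train, y_train)
print("surface R2:", surface.score(X_test, y_test))
```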
Overfitting in Multiple Polynomial Regression
Increasing degree too high leads to:
- Surface becomes highly irregular: Tries to precisely capture every training point
- Poor generalization: Model doesn't work well on test/unseen data
- Noise amplification: Learns patterns specific to training noise
Key Insights
Why Polynomial Regression is "Linear"
The crucial distinction:
| Aspect | Linear? |
|---|---|
| Relationship between Y and coefficients | Yes (linear) |
| Relationship between Y and input X | No (can be non-linear) |
Polynomial regression is linear in its coefficients, which is why it falls under the umbrella of linear regression methods and uses the same optimization techniques (like Ordinary Least Squares).
Feature Engineering Power
Polynomial regression demonstrates the power of feature engineering:
- Same algorithm (linear regression)
- Better features (polynomial terms)
- Much better results
Interaction Terms
Multiple polynomial regression captures interaction effects between variables:
- XY terms show how X and Y jointly influence the output (see the sketch after this list)
- Higher-degree interaction terms (e.g., X²Y, XY² at degree 3) show more complex interactions
- These are crucial for modeling real-world relationships
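Related to this, PolynomialFeatures exposes an interaction_only flag that keeps only the cross terms such as XY and drops the pure powers; a small sketch:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X_demo = np.array([[2.0, 3.0]])

full = PolynomialFeatures(degree=2)
print(full.fit_transform(X_demo))   # [[1. 2. 3. 4. 6. 9.]] -> 1, x, y, x^2, xy, y^2

inter = PolynomialFeatures(degree=2, interaction_only=True)
print(inter.fit_transform(X_demo))  # [[1. 2. 3. 6.]] -> 1, x, y, xy
```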
Summary
| Aspect | Simple Linear | Polynomial (degree=2) |
|---|---|---|
| Formula | Y = β₀ + β₁X | Y = β₀ + β₁X + β₂X² |
| Flexibility | Limited | Captures curves |
| Model Complexity | Simple | Moderate |
| R² on Non-linear Data | Poor | Excellent |
| Risk of Overfitting | Low | Moderate-High (if degree too high) |
| Use Case | Linear relationships | Non-linear relationships |
Polynomial regression is an essential technique for handling non-linear patterns while maintaining the interpretability and efficiency of linear regression methods.