Logistic Regression
Logistic Regression is one of the most important ML algorithms and a natural bridge to Deep Learning. The Perceptron, the basic building block of neural networks, is closely related to Logistic Regression, so understanding it pays dividends later.
What is it?
- A linear classifier that models the probability of a class using the logistic (sigmoid) link.
- This note takes the probabilistic point of view rather than the geometric one.
When it works best
- Works well when data is linearly separable or almost linearly separable.
- Linearly separable: classes can be split by a straight line (2D), plane (3D), or hyperplane (higher‑D).
- Almost separable: allow a few misclassifications, but a single boundary still explains most points.
- If classes are highly non‑linear, a single linear boundary will struggle → consider feature engineering or non‑linear models.
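To make "(almost) linearly separable" concrete, here is a minimal sketch; the dataset, the boundary y = x, and the amount of label noise are illustrative choices, not part of the note.

import numpy as np

# Hypothetical 2D dataset: class 1 lies above the line y = x, class 0 below it,
# so a single straight line separates the classes exactly.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))   # 200 points with features [x, y]
y = (X[:, 1] > X[:, 0]).astype(int)     # label = which side of y = x the point falls on

# "Almost separable": flip a handful of labels so no line is perfect,
# yet one boundary still explains most points.
flip = rng.choice(len(y), size=5, replace=False)
y_almost = y.copy()
y_almost[flip] = 1 - y_almost[flip]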
The Perceptron Trick (intuition and simple learner)
Purpose: a simple approach to learn a separating line; it’s foundational for Deep Learning.
Notes:
- Simple and easy to implement, though not always the optimal logistic solution.
- Goal: learn coefficients (A, B, C) for the line A*x + B*y + C = 0 that best separates the classes.
Core loop
- Initialise
(A, B, C)randomly (or zeros) → an initial, likely poor line. - Repeat for several epochs or until convergence:
- Pick a random training point (X_i, Y_true).
- Ask: is it correctly classified by the current line?
- If misclassified, update the line to move toward the point if it’s positive, or away if it’s negative.
Positive/negative regions for A*x + B*y + C = 0
- Point (x1, y1) is positive if A*x1 + B*y1 + C > 0.
- Negative if A*x1 + B*y1 + C < 0.
- On the line if A*x1 + B*y1 + C = 0.
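For a quick numeric check, take a hypothetical line with A = 1, B = 1, C = -2 (i.e. x + y - 2 = 0):

# Hypothetical coefficients: A = 1, B = 1, C = -2
A, B, C = 1.0, 1.0, -2.0

def region(x1, y1):
    s = A * x1 + B * y1 + C
    return "positive" if s > 0 else ("negative" if s < 0 else "on the line")

print(region(3, 1))  # 3 + 1 - 2 =  2 > 0 -> positive
print(region(0, 1))  # 0 + 1 - 2 = -1 < 0 -> negative
print(region(1, 1))  # 1 + 1 - 2 =  0     -> on the line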
How changing coefficients moves the line
- Change C → parallel shift of the line (the slope -A/B is unchanged).
- Change A → the Y-intercept -C/B stays fixed, so the line pivots about its crossing point on the Y-axis.
- Change B → the X-intercept -C/A stays fixed, so the line pivots about its crossing point on the X-axis.
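A small sketch of why this happens, using the intercepts (the example values are mine): for A*x + B*y + C = 0 the x-intercept is -C/A and the y-intercept is -C/B, so changing A leaves the Y-axis crossing fixed, changing B leaves the X-axis crossing fixed, and changing C scales both intercepts while the slope stays the same.

# For A*x + B*y + C = 0: x-intercept = -C/A, y-intercept = -C/B, slope = -A/B.
def intercepts(A, B, C):
    return (-C / A, -C / B)

print(intercepts(1, 1, -2))  # (2.0, 2.0)  baseline: x + y - 2 = 0
print(intercepts(1, 1, -4))  # (4.0, 4.0)  larger |C|: slope unchanged -> parallel shift
print(intercepts(2, 1, -2))  # (1.0, 2.0)  new A: Y-intercept fixed -> pivots about the Y-axis crossing
print(intercepts(1, 2, -2))  # (2.0, 1.0)  new B: X-intercept fixed -> pivots about the X-axis crossing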
Update rules with augmented coordinates
Use augmented inputs with X0 = 1 and weights W = [W0, W1, W2] where W0=C, W1=A, W2=B.
- Case 1: Negative point (Y_true = 0) predicted positive (Y_pred = 1) → subtract: W_new = W_old - eta * X_i.
- Case 2: Positive point (Y_true = 1) predicted negative (Y_pred = 0) → add: W_new = W_old + eta * X_i.
Here eta is the learning rate (e.g., 0.01).
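One update step worked through with made-up numbers (the weights, the point, and eta are illustrative):

import numpy as np

eta = 0.01
W = np.array([-2.0, 1.0, 1.0])   # [C, A, B] for the hypothetical line x + y - 2 = 0
X_i = np.array([1.0, 3.0, 1.0])  # augmented point [X0=1, x=3, y=1]
Y_true = 0                       # actually a negative point

S = W @ X_i                      # -2 + 3 + 1 = 2 >= 0 -> Y_pred = 1 (misclassified)
W_new = W - eta * X_i            # Case 1: subtract -> [-2.01, 0.97, 0.99]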
Unified update rule
Let S = W·X_i (dot product with augmented X_i). Predict Y_pred = 1 if S >= 0, else 0.
W_new = W_old + eta * (Y_true - Y_pred) * X_i
Behaviours:
- If correct: Y_true == Y_pred → no change.
- If positive misclassified (Y_true = 1 vs Y_pred = 0): add eta * X_i.
- If negative misclassified (Y_true = 0 vs Y_pred = 1): subtract eta * X_i.
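A minimal sketch of the unified rule covering all three behaviours (the helper name and test points are assumptions):

import numpy as np

def perceptron_step(W, X_i, Y_true, eta=0.01):
    # W_new = W_old + eta * (Y_true - Y_pred) * X_i
    Y_pred = 1 if W @ X_i >= 0 else 0
    return W + eta * (Y_true - Y_pred) * X_i

W = np.array([-2.0, 1.0, 1.0])                            # [C, A, B]
print(perceptron_step(W, np.array([1.0, 3.0, 3.0]), 1))   # correct -> unchanged
print(perceptron_step(W, np.array([1.0, 0.0, 1.0]), 1))   # positive misclassified -> adds eta * X_i
print(perceptron_step(W, np.array([1.0, 3.0, 1.0]), 0))   # negative misclassified -> subtracts eta * X_i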
Convergence and stopping
- Stop when there are no misclassified points (convergence, guaranteed only if the data is linearly separable) or after a fixed number of epochs.
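A sketch of a convergence check, assuming the augmented data and [C, A, B] weight layout used above (the helper name is mine):

import numpy as np

def misclassified_count(W, X_aug, y):
    # Count points on the wrong side of the current boundary.
    preds = (X_aug @ W >= 0).astype(int)
    return int(np.sum(preds != y))

# Inside a training loop, one could stop early:
# if misclassified_count(W, X_aug, y) == 0:
#     break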
Perceptron‑style training loop (pseudo‑Python)
import numpy as np

def train_perceptron(X, y, epochs=1000, eta=0.01, seed=42):
    # X: shape (n_samples, 2) for features [x, y]; we'll augment inside
    rng = np.random.default_rng(seed)
    X_aug = np.hstack([np.ones((X.shape[0], 1)), X])  # add X0 = 1
    W = np.zeros(X_aug.shape[1])  # [C, A, B]
    for _ in range(epochs):
        i = rng.integers(0, X_aug.shape[0])
        Xi, yi = X_aug[i], y[i]
        S = W @ Xi
        y_pred = 1 if S >= 0 else 0
        W = W + eta * (yi - y_pred) * Xi
    return W
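A usage sketch of the loop above on a toy dataset (the data and the evaluation are illustrative, not part of the note):

import numpy as np

# Toy data: class 1 above the line y = x, class 0 below it.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = (X[:, 1] > X[:, 0]).astype(int)

W = train_perceptron(X, y, epochs=5000, eta=0.01)
C, A, B = W
print(f"Learned line: {A:.2f}*x + {B:.2f}*y + {C:.2f} = 0")

# Evaluate with the same decision rule used during training.
X_aug = np.hstack([np.ones((X.shape[0], 1)), X])
accuracy = ((X_aug @ W >= 0).astype(int) == y).mean()
print("Training accuracy:", accuracy)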
Key takeaways
- Logistic Regression is central to ML and underpins Perceptron/Deep Learning ideas.
- It works best when data is (almost) linearly separable; a single linear boundary is assumed.
- The Perceptron Trick offers a simple learning procedure with intuitive add/subtract updates.
- Use augmented coordinates and the unified rule W_new = W_old + eta * (Y_true - Y_pred) * X_i.