Logistic Regression

In statistics, logistic regression (also called logit regression or the logit model) is a regression model in which the dependent variable (DV) is categorical, which makes it a classification algorithm.

Logistic regression:
$$0 \le h_\theta(x) \le 1$$

$$h_\theta(x) = g(\theta^T x)$$

$$g(z) = \frac{1}{1+e^{-z}}$$
(Figure: the logistic (sigmoid) curve.)

The figure above shows the sigmoid function. In machine learning, the terms sigmoid function and logistic function mean the same thing.

When $z \ge 0$, $g(z) \ge 0.5$, and when $z < 0$, $g(z) < 0.5$. So we can predict $y=1$ whenever $h_\theta(x) \ge 0.5$; in this way, logistic regression can be used as a classifier:
$$πœƒ^Tx \ge 0, h_πœƒ(x)=g(πœƒ^Tx) \ge 0.5, y=1$$

$$πœƒ^Tx \lt 0, h_πœƒ(x)=g(πœƒ^Tx) \lt 0.5, y=0$$

Cost Function

$$h_πœƒ(x)= \frac{1}{1+e^{πœƒ^Tx}}$$

$$Cost(h_πœƒ(x), y)= f(n) =
\begin{cases}
-log(h(x)), & \text{if $y=1$} \\
-log(1 - h(x)), & \text{if $y=0$} \\
\end{cases}$$

$$Cost(h_πœƒ(x), y)=-ylog(h(x))-(1-y)log(1-h(x))$$

$$J(πœƒ) = \frac{1}{m} \sum_{i=1}^m Cost(h(x^{(i)}),y^{(i)})$$

$$J(πœƒ) = \frac{1}{m} \sum_{i=1}^m [-y^{(i)}log(h(x^{(i)}))-(1-y^{(i)})log(1-h(x^{(i)}))]$$
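
As an illustration only (the loop-based implementation used in this post appears in the Implementation section below), the cost $J(\theta)$ can also be computed in a vectorized way, assuming X is the m-by-(n+1) design matrix and y is the m-by-1 label vector:

% Vectorized sketch of the cost J(theta); X, y and theta are assumed to exist.
m = length(y);
h = sigmoid(X * theta);                               % m-by-1 vector of predictions
J = (1/m) * (-y' * log(h) - (1 - y)' * log(1 - h));   % scalar cost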

Gradient Descent

The gradient of the cost is a vector of the same length as $\theta$ where the $j^{th}$ element (for $j = 0,1,\ldots,n$) is defined as follows:

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^m \left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}$$

So we update $\theta$ in this way:

Repeat {
$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^m \left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}$$
} (simultaneously update for every $j = 0,\ldots,n$)
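
A bare-bones gradient-descent loop following this update rule might look like the sketch below; alpha (the learning rate) and num_iters are assumed values you would tune yourself:

% Gradient-descent sketch; X, y and an initial theta are assumed to exist.
alpha = 0.01;      % learning rate (assumed value)
num_iters = 400;   % number of iterations (assumed value)
m = length(y);
for iter = 1:num_iters
    h = sigmoid(X * theta);
    grad = (1/m) * (X' * (h - y));   % gradient for all theta_j at once
    theta = theta - alpha * grad;    % simultaneous update of every theta_j
end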

Overfitting & Regularization

There are two options for addressing overfitting: one is to reduce the number of features, and the other is regularization.

In regularization:

  • Keep all the features, but reduce magnitude/values of parameters πœƒ.
  • Works well when we have a lot of features, each of which contributes a bit to predicting y.

Cost function for regularized logistic regression

$$J(\theta) = \frac{1}{m} \sum_{i=1}^m \left[-y^{(i)}\log(h_\theta(x^{(i)}))-(1-y^{(i)})\log(1-h_\theta(x^{(i)}))\right] + \frac{\lambda}{2m}\sum_{j=1}^n \theta_j^2$$

Gradient for regularized logistic regression

$$\frac{\partial J(\theta)}{\partial \theta_0} = \frac{1}{m} \sum_{i=1}^m \left(h_\theta(x^{(i)})-y^{(i)}\right)x_0^{(i)} \qquad \text{for } j = 0$$

$$\frac{\partial J(\theta)}{\partial \theta_j} = \left(\frac{1}{m} \sum_{i=1}^m \left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}\right) + \frac{\lambda}{m}\theta_j \qquad \text{for } j \ge 1$$

Note

Note that you should not regularize the parameter $\theta_0$; that is, the first element of theta (theta(1) in Matlab, since indexing starts at 1) is left out of the regularization term.
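
As a small sketch of how this exclusion is usually handled in code (theta is assumed to be a column vector; lambda and m are assumed to exist), one can zero out the first entry before forming the penalty terms:

% Sketch: leave theta(1) (i.e. theta_0) out of the regularization.
theta_reg = [0; theta(2:end)];                    % copy of theta with the first entry zeroed
penalty   = (lambda/(2*m)) * sum(theta_reg.^2);   % term added to the cost
grad_reg  = (lambda/m) * theta_reg;               % term added to the gradient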

Implementation

Now let’s implement the algorithm in Matlab.
First, suppose you have the following function to calculate the sigmoid:
sigmoid.m

function g = sigmoid(z)
%SIGMOID Compute sigmoid function
%   g = SIGMOID(z) computes the sigmoid of z, element-wise.
g = zeros(size(z));
gSize = size(z);

for i = 1:gSize(1)
    for j = 1:gSize(2)
        g(i,j) = 1 / (1 + exp(-z(i,j)));
    end
end
end
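
The element-wise loop above works, but the same result can be obtained with a single vectorized expression; for example, sigmoid(0) returns 0.5 either way:

% Equivalent vectorized form (element-wise operations on the whole matrix z):
g = 1 ./ (1 + exp(-z));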

Here is the code to compute initial cost and gradient:
costFunction.m

function [J, grad] = costFunction(theta, X, y)
%COSTFUNCTION Compute cost and gradient for logistic regression
%   J = COSTFUNCTION(theta, X, y) computes the cost of using theta as the
%   parameter for logistic regression and the gradient of the cost

% Initialize some useful values
m = length(y); % number of training examples

J = 0;
grad = zeros(size(theta));

for i = 1:m
    h = sigmoid(X(i,:) * theta);
    J = J + (-y(i)*log(h) - (1-y(i))*log(1-h));
end
J = J / m;

thetaSize = size(theta);
for j = 1:thetaSize(1)
    for i = 1:m
        h = sigmoid(X(i,:) * theta);
        grad(j) = grad(j) + (h - y(i)) * X(i,j);
    end
    grad(j) = grad(j) / m;
end
end
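
To actually fit the parameters, this cost function is typically handed to an optimizer. Below is a minimal sketch using Matlab's built-in fminunc (initial_theta is an assumed starting vector, e.g. zeros, and X and y are assumed to exist, with X containing the intercept column of ones):

% Sketch: minimize the cost with fminunc, using the gradient we return.
options = optimset('GradObj', 'on', 'MaxIter', 400);
[theta, cost] = fminunc(@(t)(costFunction(t, X, y)), initial_theta, options);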

If you want to compute the initial cost and gradient for regularized logistic regression, then take a look at this:
costFunctionReg.m

function [J, grad] = costFunctionReg(theta, X, y, lambda)
%COSTFUNCTIONREG Compute cost and gradient for logistic regression with regularization
%   J = COSTFUNCTIONREG(theta, X, y, lambda) computes the cost of using
%   theta as the parameter for regularized logistic regression and the
%   gradient of the cost w.r.t. the parameters.

% Initialize some useful values
m = length(y); % number of training examples

J = 0;
grad = zeros(size(theta));
thetaSize = size(theta);

for i = 1:m
    h = sigmoid(X(i,:) * theta);
    J = J + (-y(i)*log(h) - (1-y(i))*log(1-h));
end
J = J/m + lambda/(2*m) * sum(theta(2:thetaSize(1), :).^2);

for j = 1:thetaSize(1)
    for i = 1:m
        h = sigmoid(X(i,:) * theta);
        grad(j) = grad(j) + (h - y(i)) * X(i,j);
    end
    if j == 1
        grad(j) = grad(j) / m;
    else
        grad(j) = grad(j)/m + lambda/m * theta(j);
    end
end
end
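
The regularized version can be fitted the same way; the only difference is that lambda is captured by the anonymous function handed to the optimizer (lambda = 1 is just an assumed value):

% Sketch: fit regularized logistic regression with fminunc.
lambda = 1;   % regularization strength (assumed value)
options = optimset('GradObj', 'on', 'MaxIter', 400);
[theta, cost] = fminunc(@(t)(costFunctionReg(t, X, y, lambda)), initial_theta, options);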