In statistics, logistic regression (also known as logit regression or the logit model) is a regression model where the dependent variable (DV) is categorical, so it is commonly used as a classification algorithm.

Logistic regression:

$$0 \le h_\theta(x) \le 1$$

$$h_\theta(x) = g(\theta^Tx)$$

$$g(z) = \frac{1}{1+e^{-z}}$$

The above is the sigmoid function $g(z)$. In machine learning, the terms "sigmoid function" and "logistic function" mean the same thing.

When $z \ge 0$, $g(z) \ge 0.5$, and when $z < 0$, $g(z) < 0.5$. So we can predict $y=1$ whenever $h_\theta(x) \ge 0.5$ and $y=0$ otherwise. In this way, we can use logistic regression to form a classifier.

$$\theta^Tx \ge 0 \implies h_\theta(x)=g(\theta^Tx) \ge 0.5 \implies y=1$$

$$\theta^Tx \lt 0 \implies h_\theta(x)=g(\theta^Tx) \lt 0.5 \implies y=0$$
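The decision rule above can be sketched in a few lines of plain Python (the article's own implementation, shown later, is in Matlab; the function names `sigmoid` and `predict` here are just illustrative):

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(theta, x):
    """Predict y = 1 exactly when theta^T x >= 0, i.e. h(x) >= 0.5."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1 if sigmoid(z) >= 0.5 else 0
```

Note that thresholding $h_\theta(x)$ at 0.5 is equivalent to checking the sign of $\theta^Tx$, since $g(0) = 0.5$.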

### Cost Function

$$h_\theta(x)= \frac{1}{1+e^{-\theta^Tx}}$$

$$Cost(h_\theta(x), y) =
\begin{cases}
-\log(h_\theta(x)), & \text{if $y=1$} \\
-\log(1 - h_\theta(x)), & \text{if $y=0$}
\end{cases}$$

$$Cost(h_\theta(x), y)=-y\log(h_\theta(x))-(1-y)\log(1-h_\theta(x))$$

$$J(\theta) = \frac{1}{m} \sum_{i=1}^m Cost(h_\theta(x^{(i)}),y^{(i)})$$

$$J(\theta) = \frac{1}{m} \sum_{i=1}^m \left[-y^{(i)}\log(h_\theta(x^{(i)}))-(1-y^{(i)})\log(1-h_\theta(x^{(i)}))\right]$$
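A direct translation of $J(\theta)$ into plain Python looks like this (a sketch for illustration; the `cost` name is ours, and `X` is a list of feature vectors with `y` a list of 0/1 labels):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost(theta, X, y):
    """Average cross-entropy cost J(theta) over m training examples."""
    m = len(y)
    total = 0.0
    for xi, yi in zip(X, y):
        h = sigmoid(sum(t * x for t, x in zip(theta, xi)))
        total += -yi * math.log(h) - (1 - yi) * math.log(1 - h)
    return total / m
```

As a sanity check, with $\theta = 0$ every prediction is $h = 0.5$, so each example contributes $-\log(0.5) = \log 2$ and the cost is exactly $\log 2 \approx 0.693$.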

### Gradient Descent

The gradient of the cost is a vector of the same length as $\theta$ where the $j^{th}$ element (for $j = 0,1,\ldots,n$) is defined as follows:

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^m \left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}$$

So we update $\theta$ in this way:

Repeat {

$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^m \left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}$$

} (simultaneously update for every $j = 0,\ldots,n$)

### Overfitting & Regularization

There are two options for addressing overfitting: one is to reduce the number of features, and the other is regularization.

In regularization:

- Keep all the features, but reduce the magnitude/values of the parameters $\theta$.
- Works well when we have a lot of features, each of which contributes a bit to predicting y.

#### Cost function for regularized logistic regression
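The standard regularized cost adds an L2 penalty on $\theta_1,\ldots,\theta_n$ (but not $\theta_0$) to the unregularized $J(\theta)$:

$$J(\theta) = \frac{1}{m} \sum_{i=1}^m \left[-y^{(i)}\log(h_\theta(x^{(i)}))-(1-y^{(i)})\log(1-h_\theta(x^{(i)}))\right] + \frac{\lambda}{2m}\sum_{j=1}^n \theta_j^2$$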

#### Gradient for regularized logistic regression
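The standard gradient for the regularized cost keeps the $\theta_0$ component unchanged and adds $\frac{\lambda}{m}\theta_j$ to every other component:

$$\frac{\partial J(\theta)}{\partial \theta_0} = \frac{1}{m} \sum_{i=1}^m \left(h_\theta(x^{(i)})-y^{(i)}\right)x_0^{(i)}$$

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^m \left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j \quad \text{for } j \ge 1$$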

#### Note

**Note that you should not regularize the parameter $\theta_0$**. That is to say, you should not regularize the first parameter in theta.

### Implementation

Now let's implement the algorithm in Matlab.

First, suppose you have the following function to calculate the sigmoid: **sigmoid.m**

```matlab
function g = sigmoid(z)
% Compute the sigmoid of each element of z (scalar, vector, or matrix).
g = 1 ./ (1 + exp(-z));
end
```

Here is the code to compute the initial cost and gradient: **costFunction.m**

```matlab
function [J, grad] = costFunction(theta, X, y)
% Compute cost and gradient for logistic regression (vectorized).
m = length(y);              % number of training examples
h = sigmoid(X * theta);     % hypothesis for all examples
J = (1/m) * (-y' * log(h) - (1 - y)' * log(1 - h));
grad = (1/m) * (X' * (h - y));
end
```

If you want to compute the initial cost and gradient for **regularized** logistic regression, then take a look at this: **costFunctionReg.m**

```matlab
function [J, grad] = costFunctionReg(theta, X, y, lambda)
% Compute cost and gradient for regularized logistic regression.
m = length(y);
h = sigmoid(X * theta);
theta_reg = [0; theta(2:end)];   % exclude theta(1) from regularization
J = (1/m) * (-y' * log(h) - (1 - y)' * log(1 - h)) ...
    + (lambda / (2*m)) * (theta_reg' * theta_reg);
grad = (1/m) * (X' * (h - y)) + (lambda / m) * theta_reg;
end
```