Logistic Regression uses a loss function more suited to the task of categorization where the target is 0 or 1 rather than any number.
Definition Note: In this course, these definitions are used:
Loss is a measure of the difference between a single example's prediction and its target value, while the
Cost is a measure of the losses over the whole training set.
This is defined:
- $loss(f_{\mathbf{w},b}(\mathbf{x}^{(i)}), y^{(i)})$ is the cost for a single data point, which is:

$$loss(f_{\mathbf{w},b}(\mathbf{x}^{(i)}), y^{(i)}) = \begin{cases}
    - \log\left(f_{\mathbf{w},b}\left( \mathbf{x}^{(i)} \right) \right) & \text{if $y^{(i)}=1$}\\
    - \log \left( 1 - f_{\mathbf{w},b}\left( \mathbf{x}^{(i)} \right) \right) & \text{if $y^{(i)}=0$}
  \end{cases}$$

- $f_{\mathbf{w},b}(\mathbf{x}^{(i)})$ is the model's prediction, while $y^{(i)}$ is the target value.
- $f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = g(\mathbf{w} \cdot \mathbf{x}^{(i)} + b)$, where function $g$ is the sigmoid function.
The defining feature of this loss function is that it uses two separate curves: one for the case when the target is zero ($y^{(i)}=0$) and another for when the target is one ($y^{(i)}=1$). Combined, these curves provide the behavior useful for a loss function, namely, being zero when the prediction matches the target and rapidly increasing in value as the prediction differs from the target. Consider the curves below:
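As a small numeric sketch of that behavior, the two-branch definition can be evaluated directly in plain Python (the helper name `loss_piecewise` is mine, not the lab's):

```python
import math

def loss_piecewise(f_x, y):
    """Logistic loss for a single example, following the two-branch
    definition above: f_x is the model's prediction f_wb(x) in (0, 1),
    y is the target (0 or 1)."""
    if y == 1:
        return -math.log(f_x)
    return -math.log(1 - f_x)

# The loss is near zero when the prediction matches the target,
# and grows rapidly as the prediction moves away from it:
print(loss_piecewise(0.99, 1))  # small (about 0.01)
print(loss_piecewise(0.01, 1))  # large (about 4.61)
```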
The loss function above can be rewritten to be easier to implement:

$$loss(f_{\mathbf{w},b}(\mathbf{x}^{(i)}), y^{(i)}) = -y^{(i)} \log\left(f_{\mathbf{w},b}\left( \mathbf{x}^{(i)} \right) \right) - \left( 1 - y^{(i)}\right) \log \left( 1 - f_{\mathbf{w},b}\left( \mathbf{x}^{(i)} \right) \right)$$
This is a rather formidable-looking equation. It is less daunting when you consider that $y^{(i)}$ can have only two values, 0 and 1. One can then consider the equation in two pieces:
when $y^{(i)} = 0$, the left-hand term is eliminated:

$$
\begin{align}
loss(f_{\mathbf{w},b}(\mathbf{x}^{(i)}), 0) &= -(0) \log\left(f_{\mathbf{w},b}\left( \mathbf{x}^{(i)} \right) \right) - \left( 1 - 0\right) \log \left( 1 - f_{\mathbf{w},b}\left( \mathbf{x}^{(i)} \right) \right) \\
&= -\log \left( 1 - f_{\mathbf{w},b}\left( \mathbf{x}^{(i)} \right) \right)
\end{align}
$$
and when $y^{(i)} = 1$, the right-hand term is eliminated:

$$
\begin{align}
loss(f_{\mathbf{w},b}(\mathbf{x}^{(i)}), 1) &= -(1) \log\left(f_{\mathbf{w},b}\left( \mathbf{x}^{(i)} \right) \right) - \left( 1 - 1\right) \log \left( 1 - f_{\mathbf{w},b}\left( \mathbf{x}^{(i)} \right) \right) \\
&= -\log\left(f_{\mathbf{w},b}\left( \mathbf{x}^{(i)} \right) \right)
\end{align}
$$
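A quick way to convince yourself that the rewritten form matches the two-branch definition is to compare them numerically. This is a minimal sketch in plain Python (the helper names are mine, not the lab's):

```python
import math

def loss_piecewise(f_x, y):
    # Two-branch definition of the logistic loss
    return -math.log(f_x) if y == 1 else -math.log(1 - f_x)

def loss_combined(f_x, y):
    # Single-expression rewrite: -y*log(f) - (1-y)*log(1-f)
    return -y * math.log(f_x) - (1 - y) * math.log(1 - f_x)

# The two forms agree for both target values at any prediction in (0, 1):
for f_x in (0.1, 0.5, 0.9):
    for y in (0, 1):
        assert math.isclose(loss_piecewise(f_x, y), loss_combined(f_x, y))
```

The single-expression form is preferred in implementations because it avoids a branch and vectorizes cleanly over a whole training set.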