# 1. 基本模型

Logistic 回归的基本形式为

$$f(\boldsymbol{x}) = g(\boldsymbol{w}^\mathrm{T} \boldsymbol{x}) = \frac{1}{1 + e^{-\boldsymbol{w}^\mathrm{T} \boldsymbol{x}}} \tag{1}$$

$$\boldsymbol{w} = \begin{bmatrix} w_0 \\ w_1 \\ w_2 \\ \vdots \\ w_n \end{bmatrix} , \; \boldsymbol{x} = \begin{bmatrix} 1 \\ x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$

$f(\boldsymbol{x})$ 为预测样本为正例的概率，即

$$P(y = 1|\boldsymbol{x};\boldsymbol{w}) = f(\boldsymbol{x}) \tag{2}$$

$$P(y = 0|\boldsymbol{x};\boldsymbol{w}) = 1 – f(\boldsymbol{x}) \tag{3}$$

$$P(y|\boldsymbol{x};\boldsymbol{w}) = \big(f(\boldsymbol{x})\big)^y \big(1 – f(\boldsymbol{x})\big)^{(1 – y)} \tag{4}$$

此外，由式 $(1)$ 可以解得

$$\boldsymbol{w}^\mathrm{T} \boldsymbol{x} = \log \frac{f(\boldsymbol{x})}{1 – f(\boldsymbol{x})} = \log \frac{P(y = 1|\boldsymbol{x};\boldsymbol{w})}{P(y = 0|\boldsymbol{x};\boldsymbol{w})}$$

# 2. Sigmoid 函数

式 $(1)$ 中的 $g(z)$ 为 Sigmoid 函数

$$g(z) = \frac{1}{1 + e^{-z}} \tag{5}$$

计算 $g(z)$ 的导数为

\begin{aligned} g'(z) &= -\frac{1}{(1 + e^{-z})^2} (-e^{-z}) \\ &= \frac{1}{1 + e^{-z}} \cdot \frac{e^{-z}}{1 + e^{-z}} \\ &= \frac{1}{1 + e^{-z}} \bigg( 1 – \frac{1}{1 + e^{-z}} \bigg) \\ &= g(z) \big(1 – g(z)\big) \end{aligned} \tag{6}

# 3. 最大似然估计

由式 $(4)$，可以得到似然函数

\begin{aligned} L(\boldsymbol{w}) &= p(\boldsymbol{y}|X; \boldsymbol{w}) \\ &= \prod_{i=1}^m p(y^{(i)}|\boldsymbol{x}^{(i)}; \boldsymbol{w}) \\ &= \prod_{i=1}^m \big(f(\boldsymbol{x}^{(i)})\big)^{y^{(i)}} \big(1 – f(\boldsymbol{x}^{(i)})\big)^{(1 – y^{(i)})} \end{aligned} \tag{7}

对数似然函数为

\begin{aligned} l(\boldsymbol{w}) &= \log L(\boldsymbol{w}) \\ &= \sum_{i=1}^m y^{(i)} \log f(\boldsymbol{x}^{(i)}) + (1 – y^{(i)}) \log \big(1 – f(\boldsymbol{x}^{(i)})\big) \end{aligned} \tag{8}

# 4. 梯度下降

定义代价函数为负的对数似然函数，即

\begin{aligned} J(\boldsymbol{w}) = -l(\boldsymbol{w}) = – \sum_{i=1}^m y^{(i)} \log f(\boldsymbol{x}^{(i)}) + (1 – y^{(i)}) \log \big(1 – f(\boldsymbol{x}^{(i)})\big) \end{aligned} \tag{9}

$$Loss(\boldsymbol{w}) = -\Big( y \log f(\boldsymbol{x}) + (1 – y) \log \big(1 – f(\boldsymbol{x})\big) \Big) \tag{10}$$

计算式 $(9)$ 关于第 $j$ 个参数 $\boldsymbol{w}_j$ 的偏导，对于单个样例 $(\boldsymbol{x}, y)$，有

\begin{aligned} \frac{\partial}{\partial \boldsymbol w_j} J(\boldsymbol{w}) &= -\bigg( \frac{y}{f(\boldsymbol{x})} – \frac{1-y}{1 – f(\boldsymbol{x})} \bigg) \frac{\partial}{\partial \boldsymbol w_j} f(\boldsymbol{x}) \\ &= -\bigg( \frac{y}{f(\boldsymbol{x})} – \frac{1-y}{1 – f(\boldsymbol{x})} \bigg) g(\boldsymbol{w}^\mathrm{T} \boldsymbol{x}) \big(1 – g(\boldsymbol{w}^\mathrm{T} \boldsymbol{x})\big) \frac{\partial}{\partial \boldsymbol w_j} \boldsymbol{w}^\mathrm{T} \boldsymbol{x} \\ &= -\bigg( \frac{y}{f(\boldsymbol{x})} – \frac{1-y}{1 – f(\boldsymbol{x})} \bigg) f(\boldsymbol{x}) \big(1 – f(\boldsymbol{x})\big) x_j \\ &= -\Big( y\big(1 – f(\boldsymbol{x})\big) – (1 – y) f(\boldsymbol{x}) \Big) x_j \\ &= -\big(y – f(\boldsymbol{x}) \big) x_j \end{aligned} \tag{11}

$$w_j := w_j – \alpha \frac{\partial}{\partial w_j} J(\boldsymbol{w})$$

$$w_j := w_j + \alpha \big(y^{(i)} – f(\boldsymbol{x}^{(i)}) \big) \boldsymbol{x}^{(i)}_j \tag{12}$$