long read

Preliminaries

We usually have a dataset $\mathcal{D} = \{x_1, \dots, x_N\}$ and a parametrized family of distributions $p_\theta(x)$, and we would like to find the parameters that best describe the data. This is typically done using [[MLE and MAP|maximum likelihood estimation (MLE)]], in which the optimal parameters are those that maximize the average log-likelihood of the data. Mathematically, $$ \hat{\theta}_\mathrm{MLE} = \arg\max_\theta \frac{1}{N}\sum_{i=1}^{N}\log p_{\theta}(x_i)....

long read

Variational Inference

Optimization Primer