Lecture 18: Stochastic, Constrained, and Penalized Optimization. Difficulties of optimizing statistical functions when the data is large. Sampling as an alternative to averaging over the whole data. Stochastic gradient descent and stochastic Newton's method as an application of sampling. Simulated annealing to escape local minima. Constrained optimization: an example of why constraints matter. The method of Lagrange multipliers for equality constraints. Lagrange multipliers as shadow prices, indicating how much a marginal weakening of the constraint would improve the optimum. Inequality constraints and their Lagrange multipliers. Mathematical programming. Barrier methods for inequality constraints. The correspondence between constrained and penalized optimization.
Optional reading 1: Léon Bottou and Olivier Bosquet, "The Tradeoffs of Large Scale Learning"
Please comment on the article here: Three-Toed Sloth