Stochastic gradient descent

Stochastic gradient descent (SGD) is the same as gradient descent, except that each update uses only part of the training data rather than the whole set. The size of that part is a parameter called the mini-batch size. Theoretically, even a single example can be used for each update.
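As a rough illustration of the idea, the sketch below fits a straight line with mini-batch SGD in plain Python. The function name `sgd_linear_fit`, the toy data, and the learning rate and epoch count are assumptions chosen for this example, not values from the text.

```python
import random


def sgd_linear_fit(xs, ys, batch_size=4, lr=0.05, epochs=200, seed=0):
    """Fit y ~ w*x + b by mini-batch SGD on the mean squared error.

    Hypothetical helper for illustration; hyperparameters are assumed values.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the examples in a new random order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the mean squared error over this mini-batch only
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            # One parameter update per mini-batch
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Toy, noise-free data generated from y = 3x + 1 (assumed for the example)
xs = [i / 10 for i in range(20)]
ys = [3.0 * x + 1.0 for x in xs]

w, b = sgd_linear_fit(xs, ys, batch_size=4)
print(w, b)
```

Setting `batch_size=1` gives the single-example case mentioned above, while `batch_size=len(xs)` reduces to ordinary gradient descent on the full data set.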

In practice, it is better to experiment with various mini-batch sizes. In the next section, we will discuss convolutional neural networks, which work better on image data than the standard ANN.

Visit https://yihui.name/animation/example/grad-desc/ to see a great visualization of gradient descent on convex and non-convex surfaces.
