For a broad discussion see:
Y. Cheng, Mean-shift, mode seeking, and clustering, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.17, 1995, pp. 790-799
The radius or bandwidth is tied to the ‘width’ of the distribution and is data dependent. Note that the data should be normalized first so that all the dimensions have the same bandwidth. The rate determines how large the gradient decent steps are. The smaller the rate, the more iterations are needed for convergence, but the more likely minima are not overshot. A reasonable value for the rate is .2. Low value of the rate may require an increase in maxIter. Increase maxIter until convergence occurs regularly for a given data set (versus the algorithm being cut off at maxIter).