Random Forest
Random Forest is an ensemble classification method that grows a set of decision trees, each trained on a different bootstrap sample of the training set, combining CART (Classification and Regression Trees) with the bagging technique (Breiman, 2001). Instead of searching for the optimal split predictor among all candidate variables, the learning algorithm determines the best split at each node of a tree from a randomly selected subset of features. This process not only ensures the model scales well when each feature vector contains many features, but also lessens the interdependence between the features. In other words, from the viewpoint of Random Forest, attributes with low correlation are less vulnerable to inherent noise in the data (Criminisi, Shotton & Konukoglu, 2012). At the same time, the diversity among the trees effectively restrains overfitting. The classification decision is the mode of the classes output by the individual trees.
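The mechanism described above can be sketched in a few lines of Python. This is a minimal illustrative sketch, not a production implementation: the base learner is reduced to a decision stump instead of a full CART, and the toy dataset and function names (`best_stump`, `fit_forest`, `predict`) are hypothetical choices made for this example. It still shows the three defining ingredients: bootstrap sampling, a random feature subset at each split, and a majority-vote (mode) decision.

```python
import random
from collections import Counter

random.seed(0)

# Hypothetical toy dataset for illustration only: 3 features, 2 classes.
X = [[2.0, 1.0, 0.5], [1.5, 2.0, 0.4], [3.0, 1.2, 0.6],
     [7.0, 8.0, 0.9], [6.5, 7.5, 1.1], [8.0, 9.0, 1.0]]
y = [0, 0, 0, 1, 1, 1]

def best_stump(Xb, yb, feature_subset):
    """Pick the (feature, threshold) split with the lowest error,
    searching only the randomly chosen feature subset."""
    best = None
    for f in feature_subset:
        for row in Xb:
            t = row[f]
            pred = [1 if r[f] >= t else 0 for r in Xb]
            err = sum(p != label for p, label in zip(pred, yb))
            err = min(err, len(yb) - err)  # allow flipped polarity
            if best is None or err < best[0]:
                best = (err, f, t)
    return best[1], best[2]

def fit_forest(X, y, n_trees=25, n_features=1):
    forest = []
    n = len(X)
    for _ in range(n_trees):
        # Bootstrap aggregation: sample n rows with replacement.
        idx = [random.randrange(n) for _ in range(n)]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        # Randomized node optimization: split on a random feature subset.
        subset = random.sample(range(len(X[0])), n_features)
        f, t = best_stump(Xb, yb, subset)
        # Orient the stump to agree with the bootstrap labels.
        side = [yb[i] for i in range(len(Xb)) if Xb[i][f] >= t]
        hi = Counter(side).most_common(1)[0][0] if side else 1
        forest.append((f, t, hi))
    return forest

def predict(forest, x):
    """Classification decision: mode of the individual trees' votes."""
    votes = [hi if x[f] >= t else 1 - hi for f, t, hi in forest]
    return Counter(votes).most_common(1)[0][0]

forest = fit_forest(X, y)
print(predict(forest, [2.2, 1.1, 0.5]))  # expected to vote class 0
print(predict(forest, [7.5, 8.2, 1.0]))  # expected to vote class 1
```

Because each tree sees a different bootstrap sample and a different feature subset, individual trees disagree, and the majority vote smooths out their errors.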
Regarding Random Forest's classification performance, Breiman (2001) proposed the out-of-bag (OOB) error rate as an indicator of how well a forest classifier works on the data. The OOB error estimate can replace the cross-validation approach: each observation is classified only by the trees whose bootstrap samples did not contain it, and the misclassification rate of these aggregated predictions is averaged over all observations. Theoretically, the error rate of the Random Forest classifier is governed by the classification strength of each individual tree and the correlation between trees; increasing the number of features selected at each node raises both the correlation between the trees and the strength of each tree.
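The OOB bookkeeping can be made concrete with a short self-contained sketch. To keep it brief, a 1-nearest-neighbour rule stands in for a CART as the base learner (an assumption of this example, not part of the method itself); the dataset and the name `nn_predict` are likewise hypothetical. The point is the accounting: a row contributes to the OOB error only through trees whose bootstrap sample excluded it.

```python
import random
from collections import Counter

random.seed(1)

# Hypothetical toy binary dataset for illustration only.
X = [[1.0, 1.2], [0.8, 1.0], [1.1, 0.9], [0.9, 1.1],
     [3.0, 3.2], [2.8, 3.0], [3.1, 2.9], [2.9, 3.1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

def nn_predict(Xb, yb, x):
    """Base learner: 1-nearest-neighbour on the bootstrap sample
    (standing in for a CART to keep the sketch short)."""
    d = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in Xb]
    return yb[d.index(min(d))]

n, n_trees = len(X), 50
votes = [[] for _ in range(n)]  # OOB votes collected per sample
for _ in range(n_trees):
    idx = [random.randrange(n) for _ in range(n)]   # bootstrap sample
    Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
    for i in set(range(n)) - set(idx):              # out-of-bag rows only
        votes[i].append(nn_predict(Xb, yb, X[i]))

# OOB error: misclassification rate of the aggregated OOB votes.
wrong = sum(Counter(v).most_common(1)[0][0] != y[i]
            for i, v in enumerate(votes) if v)
oob_error = wrong / sum(1 for v in votes if v)
print(oob_error)  # the clusters are well separated, so this should be small
```

Each row is out of bag for roughly a third of the bootstrap samples, so every row accumulates enough votes for the estimate to behave like a built-in validation score, with no separate hold-out set.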
Compared with the CART approach, the Random Forest method fits a multitude of CARTs to bootstrap sets resampled from the training set, and performs prediction by taking the mode of the predictions of the fitted CARTs. To avoid the main disadvantage of CART, its high variance, the Random Forest method not only applies bagging but also adopts randomized node optimization to further reduce that variance (Mei, He & Qu, 2014). In other words, single decision trees often suffer from high variance or high bias.
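The variance-reduction claim can be checked numerically with a small Monte Carlo sketch. Here an intentionally noisy estimator (its prediction is the mean of a small random subsample) stands in for a high-variance learner such as a deep CART; the names `noisy_fit` and `bagged_fit` and all the numbers are assumptions of this example. Averaging the estimator over bootstrap resamples, as bagging does, should visibly shrink the spread of its predictions.

```python
import random
import statistics

random.seed(2)

def noisy_fit(train):
    """Stand-in for a single high-variance learner (e.g. a deep CART):
    its prediction is the mean of a small random subsample."""
    sub = random.sample(train, 3)
    return sum(sub) / len(sub)

def bagged_fit(train, n_estimators=25):
    """Bagging: average the base learner over bootstrap resamples."""
    n = len(train)
    preds = []
    for _ in range(n_estimators):
        boot = [train[random.randrange(n)] for _ in range(n)]
        preds.append(noisy_fit(boot))
    return sum(preds) / len(preds)

train = [random.gauss(5.0, 2.0) for _ in range(30)]

single = [noisy_fit(train) for _ in range(300)]
bagged = [bagged_fit(train) for _ in range(300)]
print(statistics.variance(single))  # base learner: large spread
print(statistics.variance(bagged))  # bagged ensemble: much smaller spread
```

The bagged variance does not fall by the full factor of the ensemble size, because predictions from the same training set are correlated; that residual correlation is exactly what randomized node optimization attacks in the full Random Forest method.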
Theoretically, the Random Forest approach can produce a reasonable predictive model with highly accurate predictions by striking a natural balance between the two extremes of high variance and high bias. Many studies have demonstrated that Random Forest classifiers achieve high accuracy when classifying high-dimensional data with many classes (Banfield, Hall, Bowyer, & Kegelmeyer, 2007). Moreover, applications of the Random Forest algorithm such as real-time keypoint recognition (Parkour, 2013) and semantic segmentation (Shotton, Johnson & Cipolla, 2008) provide further evidence of performance better than or comparable to other classification methods. In practice, Random Forest has been realized within a variety of forecasting frameworks; for instance, Georga, Protopappas, Polyzos and Fotiadis (2012) employed the Random Forest regression technique to predict subcutaneous glucose concentration in type 1 diabetes on the basis of a multivariate dataset obtained under free-living conditions.