Decision trees are structures that provide a graphical representation of how a large set of records is divided into successively smaller sets. They are governed by a set of rules, each of which splits a larger set of records into smaller ones.
This technique is used to perform both classification and estimation tasks. The division of a larger set of records (the parent node) is called splitting, and the successively smaller sets of records are called child nodes.
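The sketch below illustrates a single split in Python; the records, the "age" attribute, and the rule "age < 30" are hypothetical and serve only to show how a parent node's records are divided into two child nodes.

```python
# A minimal sketch of one split, using a small list of hypothetical records.
parent_records = [
    {"age": 22, "label": 1},
    {"age": 35, "label": 0},
    {"age": 28, "label": 1},
    {"age": 41, "label": 0},
]

# Splitting: records satisfying the rule "age < 30" go to the left child,
# the remaining records go to the right child.
left_child = [r for r in parent_records if r["age"] < 30]
right_child = [r for r in parent_records if r["age"] >= 30]

print(len(left_child), len(right_child))  # 2 2
```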
The top of the tree is the root node, the subsequent rules form interior nodes, and the terminal nodes (those with only one connection) are leaf nodes. Decision trees are built with split-search algorithms that use different strategies to choose splits. Since the variable under observation here is binary ("1" or "0"), "decision" is used as the assessment measure for selecting the subtree model.
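To make the split search concrete, the following Python sketch evaluates candidate thresholds on one attribute using the Gini impurity criterion; the attribute values, the labels, and the choice of Gini (rather than any specific tool's assessment measure) are assumptions for illustration, not details from the text.

```python
def gini(labels):
    """Gini impurity of a set of binary (0/1) labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(values, labels):
    """Return the threshold with the lowest weighted impurity of the two children."""
    best_threshold, best_impurity = None, float("inf")
    for threshold in sorted(set(values)):
        left = [lab for v, lab in zip(values, labels) if v < threshold]
        right = [lab for v, lab in zip(values, labels) if v >= threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if weighted < best_impurity:
            best_threshold, best_impurity = threshold, weighted
    return best_threshold, best_impurity

# Hypothetical data: six records with an "age" value and a binary target.
ages = [22, 25, 28, 35, 41, 47]
labels = [1, 1, 1, 0, 0, 0]
print(best_split(ages, labels))  # (35, 0.0) -- a perfect split at age 35
```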
Advantages of Decision Trees
- Decision trees are relatively simple structures that are easy to interpret.
- They are not sensitive to unusual values (outliers) in the data and thus provide better performance.
- They reveal a lot about the data and require relatively little data preparation.