This module offers an overview of tree-based modeling techniques. Learn how they work, when to use them, their strengths and weaknesses, and their implementation including validation. Several applications are presented.
Tree-Based Modeling Techniques
Upon completion of this module, participants will have learned:
- The principle of regression trees
- How to build and interpret classification and regression trees
- How to measure the performance and fit of a regression tree, and how to improve it
- The advantages as well as the downsides of using this technique
This module is aimed at scientific staff who collect data and must make decisions based on those data. The regression techniques covered in this session will be particularly useful for people interested in exploring a new way of relating a response variable to, or predicting it from, a set of explanatory variables.
- Participants should know the essential tools in statistics: descriptive statistics, both numerical (mean, standard deviation, standard error, etc.) and graphical (histogram, box plot, scatter plot, etc.), as well as hypothesis testing and confidence intervals.
- Potential participants should either have attended the training session Fundamental Tools in Statistics or should possess a similar background.
- Working knowledge of ordinary multiple regression techniques is desirable but not mandatory.
A Different Approach
- C&RT methods as an alternative to linear regression
- A few examples
- Interpretation and use of tree-based models
- Models for a continuous response
- Models for a discrete response
Basic Model-Building: Tree Growing
- Criteria and algorithms for selecting optimal split
- Constraints on node and leaf size
- Modifying control parameters
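
As an illustration of the tree-growing topics above, the following is a minimal sketch of how one split might be chosen for a continuous response. It assumes a single numeric predictor, uses the within-node sum of squared errors as the split criterion, and enforces a minimum leaf size. The names `sse`, `best_split`, and `min_leaf` are hypothetical and do not refer to any particular software package; real implementations search over all predictors and support other criteria.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y, min_leaf=1):
    """Find the best binary split of y on a single numeric predictor x.

    Returns (threshold, total_sse) for the split that minimizes the summed
    SSE of the two child nodes, or None if no split leaves at least
    `min_leaf` observations on each side.
    """
    pairs = sorted(zip(x, y))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    best = None
    # i is the number of observations sent to the left child.
    for i in range(min_leaf, len(xs) - min_leaf + 1):
        if xs[i - 1] == xs[i]:
            continue  # cannot split between identical predictor values
        total = sse(ys[:i]) + sse(ys[i:])
        if best is None or total < best[1]:
            threshold = (xs[i - 1] + xs[i]) / 2  # midpoint between neighbors
            best = (threshold, total)
    return best
```

Growing a full tree would apply this search recursively to each child node until a stopping rule, such as the minimum leaf size, prevents further splits. Raising `min_leaf` is one example of a control parameter that yields smaller, more stable trees.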
Model Improvement: Pruning the Tree
- Reasons for pruning
- Pruning methods
- Model selection
Cross-validation and alternative techniques
- Selecting a number of nodes
- Pruning output
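
To make the idea of selecting a number of nodes concrete, here is a small sketch of cost-complexity selection, one common way to trade off error against tree size. It assumes a sequence of candidate pruned subtrees is already available, each summarized by its number of leaves and an error estimate (for example, from cross-validation). The function name `select_subtree`, the penalty `alpha`, and the example values are illustrative assumptions, not output from any real analysis.

```python
def select_subtree(candidates, alpha):
    """Pick the candidate minimizing error + alpha * number_of_leaves.

    `candidates` is a list of (n_leaves, error) tuples, one per pruned
    subtree; `alpha` is the complexity penalty per leaf.
    """
    return min(candidates, key=lambda c: c[1] + alpha * c[0])


# Made-up illustrative values: (number of leaves, estimated error).
candidates = [(1, 10.0), (2, 6.0), (4, 4.0), (8, 3.5)]

no_penalty = select_subtree(candidates, alpha=0.0)    # largest tree wins
moderate = select_subtree(candidates, alpha=0.5)      # mid-sized tree wins
heavy_penalty = select_subtree(candidates, alpha=5.0) # root-only tree wins
```

With no penalty the largest tree is chosen because it has the lowest error; as `alpha` grows, smaller trees are preferred. In practice the penalty itself is usually tuned by cross-validation.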
- Understanding and interpreting software output for tree models
- Tables and summary statistics
- Graphical display
- Final Model Performance and Stability
- C&RT methods versus classical regression performance
- Sample size considerations
- Recent tools to improve model stability: bagging and boosting
- Using trees for prediction purposes
- Combining tree-based techniques with classical regression tools
- An Overview of Other Tree-Based Methods
Recommended Duration: 1 day
Course materials: