Tree-Based Modeling Techniques

This module offers an overview of tree-based modeling techniques. Learn how they work, when to use them, what their strengths and weaknesses are, and how to implement and validate them. Several applications are presented.

Course Details

Learning Objectives

Upon completion of this module, participants will have learned:

  • The principle of regression trees
  • How to build and interpret classification and regression trees
  • How to measure the performance and fit of a regression tree, and how to improve it
  • The advantages as well as the downsides of using this technique

Target Audience

This module is aimed at scientific staff who collect data and who must make decisions based on them. The regression techniques covered in this session will be particularly useful for people who are interested in exploring a new way of relating a variable to, or predicting it from, a set of explanatory variables.


  • Participants should know the essential tools in statistics: descriptive statistics, both numerical (mean, standard deviation, standard error, etc.) and graphical (histogram, box plot, scatter plot, etc.), as well as hypothesis testing and confidence intervals.
  • Potential participants should either have attended the training session Fundamental Tools in Statistics or should possess a similar background.
  • Working knowledge of ordinary multiple regression techniques is desirable but not mandatory.

Course Outline

A Different Approach

  • C&RT methods as an alternative to linear regression
  • A few examples
  • Interpretation and use of tree-based models
  • Models for a continuous response
  • Models for a discrete response

Basic Model-Building: Tree Growing

  • Criteria and algorithms for selecting the optimal split
  • Constraints on node and leaf size
  • Modifying control parameters
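
To give a flavor of the tree-growing step, the sketch below searches for the best binary split of a single numeric predictor. It is only an illustration under stated assumptions: it uses the within-node sum of squared errors, one common criterion for regression trees, and the names `sse`, `best_split`, and `min_leaf` are hypothetical, not taken from the course material or from any particular software package.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y, min_leaf=1):
    """Return (threshold, total_sse) for the best binary split on x.

    Returns None when no split leaves at least `min_leaf` observations
    in each child node (a simple constraint on leaf size).
    """
    pairs = sorted(zip(x, y))
    best = None
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        # A split cannot separate two observations with the same x value.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left = [p[1] for p in pairs[:i]]
        right = [p[1] for p in pairs[i:]]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            # Place the threshold halfway between the two neighboring x values.
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (threshold, total)
    return best


# Made-up data with an obvious jump in the response between x = 3 and x = 10.
x = [1, 2, 3, 10, 11, 12]
y = [5.0, 6.0, 5.5, 20.0, 21.0, 19.5]
print(best_split(x, y))
```

Raising `min_leaf` is one example of a control parameter: with `min_leaf=4` no admissible split exists for these six observations, so the function returns `None` and the node would stay a leaf.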

Model Improvement: Pruning the Tree

  • Reasons for pruning
  • Pruning methods
  • Model selection

Cross-Validation and Alternative Techniques

  • Selecting a number of nodes
  • Pruning output

Detailed Output

  • Understanding and interpreting software output for tree models
  • Tables and summary statistics
  • Graphical display

Final Model Performance and Stability

  • C&RT methods versus classical regression performance
  • Sample size considerations
  • Recent tools to improve the model stability: bagging and boosting

Advanced Methods

  • Using trees for prediction purposes
  • Combining tree-based techniques with classical regression tools
  • An overview of other tree-based methods

Practical Info

Recommended Duration: 1 day

Course materials:

  • Course notes on statistical techniques
  • Sample datasets

    Related Sessions

    • An applied set of modules focusing on the most widely used multivariate methods and their applications in several fields. Learn about the principles of the methods, the data they require, and the information they provide.

    • Learn about preference mapping techniques to explore and understand consumer preferences. Applications dealing with segmentation and the identification of niche markets are discussed. Focus on pitfalls and good practices.

    • Predictive analytics (PA) is on everyone's lips. But what is it really all about? Discover its principle, implementation, typical pitfalls and good practices. Learn about data wrangling and munging, a crucial step in predictive analytics. An overview of the most commonly used models is also presented.

    • The primary goal of discriminant analysis is to discover which variables are best able to discriminate between two or more known groups in your data. Discriminant analysis may also be used to build predictive analytics models.