Cluster Analysis

Learn how to take data (consumers, genes, stores, ...) and organise them into homogeneous groups for use in many applications, such as market analysis and biomedical data analysis, or as a pre-processing step for many data mining tasks. Cluster analysis comprises a collection of powerful techniques. Learn about this very active field of research in statistics and data mining, and discover new techniques.

Course Details

Learning Objectives

Upon completion of this module, participants will be able to:

  • Understand the context of use for cluster analysis
  • Understand the principles behind the clustering techniques, such as distance measures and agglomeration methods
  • Understand the differences between the classification techniques covered
  • Select an appropriate clustering technique based on the study objective and the type of classification variables
  • Interpret statistical software output
  • Determine the most likely number of clusters to retain
  • Validate and interpret the clusters formed
  • Appreciate the limitations and difficulties associated with cluster analysis

Target Audience

This module is intended for all scientific staff who collect large datasets and who wish to graphically summarise them as well as identify groups of objects or individuals with similar characteristics.

Prerequisite

This workshop introduces the important concepts in statistics and data analysis. It assumes that participants have no previous knowledge of statistics or that they have not used it for a long time.

Course Outline

  • Introduction to Cluster Analysis
  • Context of Use, Objective, Terminology
  • Principle of Hierarchical Methods: Determining the Distance Between Objects & Linking Clusters
  • Modeling Techniques
  • Optimization Methods
  • Other Methods: Fuzzy Clustering
  • Use and Interpretation of Clusters
  • Software Packages for Cluster Analysis
  • Summary

Practical Info

Recommended Duration: 1-1.5 days

Course Materials:

  • Course notes on statistical techniques
  • Datasets to illustrate specific statistical concepts

Course Reviews

  • posted by RD Reeleder

    This course was excellent value for the money. Well-structured and with plenty of hands-on opportunities, it is suited to both beginners and to those with some experience in the technique. The instructors were familiar with all the software packages used by the students and were able to offer practical advice on getting the desired output. A very practical course; loaded with information I could put to use right away. Highly recommended.

  • posted by Ping Qiu

    This course is very well structured and instructed. I attended both the PCA and cluster analysis sessions, followed by the workshop. The instructor (Natalie) is very knowledgeable and very good at explaining difficult statistical problems in a simple way. This course is especially suitable for non-statisticians who need to perform hands-on data analysis. This course also exposed students to many different popular statistics packages, so you can get a flavor of each of them, which helps me a lot in choosing tools for my future research.

Related Sessions

  • This module offers an easy introduction to R programming. Learn the basics of R programming and the commonly used plots and statistical tools without pain.

  • Several clustering methods are covered. Learn about their principles, conditions of use, data preparation phases, common pitfalls, and good practices. Several real-life applications are presented.

  • Predictive analytics (PA) is on everyone's lips. But what is it really all about? Discover its principle, implementation, typical pitfalls and good practices. Learn about data wrangling and munging, a crucial step in predictive analytics. An overview of the most commonly used models is also presented.

  • Conceptually similar to PCA, correspondence analysis is a method designed for discovering associations in categorical rather than continuous data. Discover the informative 2D plots for efficient data mapping.