Eva Huang

KAGGLE ENSEMBLING GUIDE

This is a reading digest of Kaggle's ensembling guide.

"This is how you win ML competitions: you take other peoples' work and ensemble them together." — Vitaly Kuznetsov, NIPS 2014

Creating ensembles from submission files

The most basic and convenient way to ensemble is to combine Kaggle submission CSV files directly, without retraining any models.

Voting ensembles

Averaging
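For probabilistic or regression submissions, averaging the predicted values is the analogue of voting. A minimal sketch with made-up probability columns:

```python
# Averaging predicted probabilities from several submission files.
def average_predictions(predictions):
    """Element-wise mean of several models' prediction columns."""
    n = len(predictions)
    return [sum(values) / n for values in zip(*predictions)]

# Two hypothetical models' probabilities on three test rows:
model_a = [0.9, 0.2, 0.6]
model_b = [0.7, 0.4, 0.8]

print(average_predictions([model_a, model_b]))
```

Averaging smooths out overconfident individual predictions; rank averaging is a common variant when models' outputs are on different scales.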

Stacked Generalization & Blending

Netflix Prize

Stacked Generalization

The basic idea behind stacked generalization is to use a pool of base classifiers, then use another classifier to combine their predictions, with the aim of reducing the generalization error. The base models produce out-of-fold predictions on the train set, so the stacker never trains on predictions for rows the base models saw during their own training.
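The out-of-fold mechanics can be sketched with toy base models (both models, the data, and the least-squares stacker below are made-up illustrations, not the guide's code):

```python
# Stacked generalization sketch: build out-of-fold (OOF) predictions from
# two toy base models, then fit a least-squares meta-model on the OOF columns.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

def fit_mean_model(X_tr, y_tr):
    # Base model 1: predicts the training mean (deliberately weak).
    m = y_tr.mean()
    return lambda X_te: np.full(len(X_te), m)

def fit_linear_model(X_tr, y_tr):
    # Base model 2: ordinary least squares on the raw features.
    w, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)
    return lambda X_te: X_te @ w

base_fits = [fit_mean_model, fit_linear_model]
oof = np.zeros((len(X), len(base_fits)))

# 5-fold OOF predictions: each row is predicted only by models that never
# saw it during training, which is what prevents leakage into the stacker.
folds = np.array_split(np.arange(len(X)), 5)
for test_idx in folds:
    train_idx = np.setdiff1d(np.arange(len(X)), test_idx)
    for j, fit in enumerate(base_fits):
        model = fit(X[train_idx], y[train_idx])
        oof[test_idx, j] = model(X[test_idx])

# Meta-model: least-squares weights over the base models' OOF columns.
meta_w, *_ = np.linalg.lstsq(oof, y, rcond=None)
print("stacker weights:", meta_w)
```

The stacker learns to weight the accurate linear model heavily and the weak mean model near zero, which is the behavior stacking relies on.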

Blending

With blending, instead of creating out-of-fold predictions for the train set, you create a small holdout set of, say, 10% of the train set. The base models train on the other 90%, and the stacker model then trains on their holdout-set predictions only.
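A sketch of the holdout split, again with made-up data and least-squares models standing in for real base learners:

```python
# Blending sketch: hold out 10% of the train set, train base models on the
# remaining 90%, and fit the stacker only on holdout predictions.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([0.5, 1.0, -1.0]) + rng.normal(scale=0.1, size=200)

# 90/10 split: base models see only the first 180 rows.
split = int(0.9 * len(X))
X_base, y_base = X[:split], y[:split]
X_hold, y_hold = X[split:], y[split:]

def fit_ols(X_tr, y_tr):
    # Ordinary least squares, used here as a stand-in base learner.
    w, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)
    return lambda X_te: X_te @ w

# Two base models trained on different feature subsets (an arbitrary choice
# to make them disagree, as real heterogeneous base models would).
model_a = fit_ols(X_base[:, :2], y_base)
model_b = fit_ols(X_base[:, 1:], y_base)

# The stacker trains on base-model predictions for the holdout rows only,
# so it never sees predictions on rows the base models trained on.
hold_preds = np.column_stack([model_a(X_hold[:, :2]), model_b(X_hold[:, 1:])])
blend_w, *_ = np.linalg.lstsq(hold_preds, y_hold, rcond=None)
print("blend weights:", blend_w)
```

Blending is simpler than full stacking but the stacker sees far less data (here 20 rows instead of 200), which is the usual trade-off.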

Stacking Methods

Everything is a hyper-parameter

When doing stacking/blending/meta-modeling it is healthy to think of every action as a hyper-parameter for the stacker model.
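One way to act on this: enumerate ensemble design choices (here just which base models to average, with made-up validation predictions) and score each configuration like any other hyper-parameter setting:

```python
# Treating ensemble design choices as hyper-parameters: search over which
# base models to include in an average, scored on a validation set.
# Model names, predictions, and targets are made up for illustration.
from itertools import combinations

preds = {
    "gbm": [0.9, 0.3, 0.7, 0.2],
    "nn":  [0.8, 0.4, 0.6, 0.3],
    "glm": [0.6, 0.5, 0.5, 0.4],
}
y_val = [1, 0, 1, 0]

def mse(p, y):
    return sum((a - b) ** 2 for a, b in zip(p, y)) / len(y)

best = None
for r in range(1, len(preds) + 1):
    for subset in combinations(preds, r):
        avg = [sum(preds[m][i] for m in subset) / r for i in range(len(y_val))]
        score = mse(avg, y_val)
        if best is None or score < best[0]:
            best = (score, subset)
print(best)
```

The same loop extends naturally to other "hyper-parameters" of the stacker: the combining rule (mean, rank mean, weighted mean), transformations of base predictions, and so on.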

Model Selection

You can further optimize scores by combining multiple ensembled models.

Automation

