Cuttlefish predictions #6

Open · wants to merge 10 commits into master
Conversation

nickponline

It's a hard learning problem, so I used a number of predictors to reduce the variance of the final estimator and to try to prevent overfitting.

@mdagost commented Dec 13, 2013

Nice! What was the best accuracy you got from averaging all of those models? Was it significantly better than just the random forest?

@nickponline (Author)

It was a fair improvement, around 0.02, which seemed to be significant by a
Wilcoxon signed-rank test, but it's difficult to tell for sure. There isn't
a lot separating the methods on this dataset when they're all correctly
trained, but ultimately you want the best generalization. Random forests
are fairly resistant to overfitting anyway, so the voting was more to
offset the variance introduced by misclassifications: if one classifier
misclassifies an example, the other classifiers can correct the error
by out-voting it.
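
In sketch form, the voting setup looks something like the snippet below; the particular base estimators and the synthetic data are placeholders to show the idea, not the exact configuration in this PR:

```python
# Minimal sketch of the hard-voting idea: each classifier predicts a label and
# the majority label wins, so a single misclassification can be out-voted.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data; the real features and labels come from the PR's dataset.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
        ("svm", SVC(random_state=0)),
    ],
    voting="hard",  # majority vote over predicted class labels
).fit(X_train, y_train)

print("voting accuracy:", ensemble.score(X_test, y_test))
```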

There are a bunch of optimizations that can improve things further, like
learning the optimal ensemble of classifiers by using predicted
probabilities instead of class labels. This seems to help, although it's
more expensive to train. I had initially included Gaussian Processes and
Boltzmann Machines, but overfitting with GPs is more subtle and RBMs are
pretty hard to analyze.
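
One way to read "learning the ensemble from predicted probabilities" is as a simple stacking step: feed each base model's predict_proba output into a small meta-classifier that learns how much to trust each one. A rough sketch of that idea, with illustrative models, synthetic data, and a plain holdout split as assumptions:

```python
# Sketch: combine classifiers through their predicted probabilities rather than
# hard labels, and learn the combination with a meta-classifier (stacking).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_meta, y_train, y_meta = train_test_split(X, y, random_state=0)

base_models = [
    RandomForestClassifier(n_estimators=200, random_state=0),
    LogisticRegression(max_iter=1000),
    SVC(probability=True, random_state=0),  # probability=True exposes predict_proba
]
for model in base_models:
    model.fit(X_train, y_train)

# Stack each model's class probabilities side by side as meta-features.
meta_features = np.hstack([m.predict_proba(X_meta) for m in base_models])

# The meta-classifier learns how much weight to give each base model.
meta_clf = LogisticRegression(max_iter=1000).fit(meta_features, y_meta)
```

At prediction time, the same stacked probabilities are built for new examples and passed to meta_clf.predict.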

I'm also trying to cascade the classifiers based on the residue (examples
classified with low confidence): re-learn on those examples and only let a
classifier predict when it is fairly certain. This seems to be improving things.
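
A rough sketch of that cascading idea, where the confidence threshold, second-stage model, and data are assumptions rather than the actual values used here:

```python
# Sketch: a first stage handles confident predictions; the "residue" (examples
# it is unsure about) is used to re-train a second stage that takes over on
# uncertain cases. Threshold and models are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split

X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           class_sep=0.8, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

THRESHOLD = 0.75  # assumed confidence cut-off

stage1 = RandomForestClassifier(n_estimators=200, random_state=1)

# Out-of-fold probabilities give an honest estimate of stage-1 confidence;
# probabilities on its own training data would look over-confident.
oof_proba = cross_val_predict(stage1, X_train, y_train, cv=5,
                              method="predict_proba")
residue = oof_proba.max(axis=1) < THRESHOLD

stage1.fit(X_train, y_train)
stage2 = LogisticRegression(max_iter=1000).fit(X_train[residue], y_train[residue])

# At prediction time, defer to stage 2 only when stage 1 is uncertain.
test_proba = stage1.predict_proba(X_test)
confident = test_proba.max(axis=1) >= THRESHOLD
final_pred = np.where(confident,
                      stage1.classes_[test_proba.argmax(axis=1)],
                      stage2.predict(X_test))
```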


@nickponline (Author)

Just checked, and my ensemble differs from a pure random forest on 11 out of
499 predictions (2.2%), so it seems to be a consistent result.

Nick

