Cuttlefish predictions #6

Open · wants to merge 10 commits into master
Conversation

nickponline

It's a hard learning problem, so I used a number of predictors to reduce the variance of the final estimator and to try to prevent overfitting.

@mdagost commented Dec 13, 2013

Nice! What was the best accuracy you got from averaging all of those models? Was it significantly better than just the random forest?

@nickponline (Author)

It was a fair improvement, around 0.02, which seemed to be significant by a
Wilcoxon signed-rank test, but it's difficult to tell for sure. There isn't
a lot separating the methods on this dataset when they're all correctly
trained, but ultimately you want the best generalization. Random forests
are fairly resistant to overfitting anyway, so the voting was more to
offset the variance introduced by misclassifications: if one classifier
misclassifies an example, the other classifiers can correct the error
by out-voting it.
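
In sketch form, the voting setup looks something like the snippet below; the particular base estimators and the synthetic data are placeholders to show the idea, not the exact configuration in this PR:

```python
# Minimal sketch of the hard-voting idea: each classifier predicts a label and
# the majority label wins, so a single misclassification can be out-voted.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data; the real features and labels come from the PR's dataset.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
        ("svm", SVC(random_state=0)),
    ],
    voting="hard",  # majority vote over predicted class labels
).fit(X_train, y_train)

print("voting accuracy:", ensemble.score(X_test, y_test))
```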

There are a bunch of optimizations that can improve things further, like
learning the optimal ensemble of classifiers by using predicted
probabilities instead of class labels. This seems to help, although it's
more expensive to train. I had initially included Gaussian Processes and
Boltzmann Machines, but overfitting with GPs is more subtle and RBMs are
pretty hard to analyze.
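
One way to read "learning the ensemble from predicted probabilities" is as a simple stacking step: feed each base model's predict_proba output into a small meta-classifier that learns how much to trust each one. A rough sketch of that idea, with illustrative models, synthetic data, and a plain holdout split as assumptions:

```python
# Sketch: combine classifiers through their predicted probabilities rather than
# hard labels, and learn the combination with a meta-classifier (stacking).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_meta, y_train, y_meta = train_test_split(X, y, random_state=0)

base_models = [
    RandomForestClassifier(n_estimators=200, random_state=0),
    LogisticRegression(max_iter=1000),
    SVC(probability=True, random_state=0),  # probability=True exposes predict_proba
]
for model in base_models:
    model.fit(X_train, y_train)

# Stack each model's class probabilities side by side as meta-features.
meta_features = np.hstack([m.predict_proba(X_meta) for m in base_models])

# The meta-classifier learns how much weight to give each base model.
meta_clf = LogisticRegression(max_iter=1000).fit(meta_features, y_meta)
```

At prediction time, the same stacked probabilities are built for new examples and passed to meta_clf.predict.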

I'm also trying to cascade the classifiers based on the residue (examples
classified with low confidence): re-learn on those examples and only let a
classifier predict when it is fairly certain. This seems to be improving things.
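
A rough sketch of that cascading idea, where the confidence threshold, second-stage model, and data are assumptions rather than the actual values used here:

```python
# Sketch: a first stage handles confident predictions; the "residue" (examples
# it is unsure about) is used to re-train a second stage that takes over on
# uncertain cases. Threshold and models are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split

X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           class_sep=0.8, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

THRESHOLD = 0.75  # assumed confidence cut-off

stage1 = RandomForestClassifier(n_estimators=200, random_state=1)

# Out-of-fold probabilities give an honest estimate of stage-1 confidence;
# probabilities on its own training data would look over-confident.
oof_proba = cross_val_predict(stage1, X_train, y_train, cv=5,
                              method="predict_proba")
residue = oof_proba.max(axis=1) < THRESHOLD

stage1.fit(X_train, y_train)
stage2 = LogisticRegression(max_iter=1000).fit(X_train[residue], y_train[residue])

# At prediction time, defer to stage 2 only when stage 1 is uncertain.
test_proba = stage1.predict_proba(X_test)
confident = test_proba.max(axis=1) >= THRESHOLD
final_pred = np.where(confident,
                      stage1.classes_[test_proba.argmax(axis=1)],
                      stage2.predict(X_test))
```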


@nickponline (Author)

Just checked, and my ensemble differs from a pure random forest on 11 out of
499 predictions (2.2%), so it seems to be a consistent result.

Nick

