
Learning intercepts? #10

Open
sergeyf opened this issue Apr 24, 2017 · 10 comments

@sergeyf

sergeyf commented Apr 24, 2017

First, thanks for the hard work on this package. It looks like a great way to get higher-order interactions to potentially improve on the standard FM models/packages.

It looks like the constant offsets/intercepts are not learned. Is this a to-do item, or is it easy to work around, for example by globally demeaning the training outputs y_train in the regression case? What about classification? Does it matter at all there?

@vene
Collaborator

vene commented Apr 24, 2017

Hi, thanks a lot!

First of all, I agree this is a feature that should be implemented, and it should not be too difficult. Would you be interested in contributing it? I am a bit caught up for the following month, but I can look into it afterwards.

Regarding workarounds:

I think for regression it's simply a matter of subtracting the mean of y_train and adding it back at prediction time, as you say. For classification this doesn't carry over, but using sample weights can deal with imbalanced classes quite well.
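
A minimal sketch of that regression workaround (assuming polylearn's FactorizationMachineRegressor; X_train, y_train, X_test are placeholder names):

from polylearn import FactorizationMachineRegressor

y_mean = y_train.mean()
fm = FactorizationMachineRegressor(degree=3, fit_linear=True)
fm.fit(X_train, y_train - y_mean)     # train on demeaned targets
y_pred = fm.predict(X_test) + y_mean  # add the mean back at the end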

A simple way around this is to add a dummy column, as performed by the fit_lower='augment' option.

If you're training a third-order FM, then as long as you use fit_lower='augment', fit_linear=True, a dummy column is added, so an intercept is effectively learned.

Otherwise, you can easily do this in user code with sklearn's add_dummy_feature.

Of course, this workaround leads to a regularized intercept, which might not be ideal.

HTH!

@sergeyf
Author

sergeyf commented Apr 24, 2017

That is very helpful and answers my question, thanks.

I might have time to contribute this feature, depending on the complexity. What would be involved?

@vene
Collaborator

vene commented Apr 25, 2017

The first step should be figuring out what objective function we want, so we can work out the intercept updates. Then, writing some failing unit tests.
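
For instance, a hypothetical failing test (the test name and tolerance are illustrative, not from polylearn's test suite): without an intercept, a regularized FM cannot fit a constant target exactly.

import numpy as np
from polylearn import FactorizationMachineRegressor

def test_learns_intercept():
    rng = np.random.RandomState(0)
    X = rng.randn(50, 5)
    y = np.full(50, 10.0)  # pure offset, no signal in X
    fm = FactorizationMachineRegressor(degree=2, fit_linear=True)
    fm.fit(X, y)
    # should pass once an (unregularized) intercept is learned
    np.testing.assert_allclose(fm.predict(X), y, atol=1e-3)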

@sergeyf
Author

sergeyf commented Apr 25, 2017

Sure, sounds fun. I imagine we can just take the current objective functions and stick a +b term into them?
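
For concreteness, this is what that would look like for the degree-2 FM prediction in the standard Rendle-style formulation, with $b$ as the new term ($b$ would typically be left unregularized, per the earlier point about regularized intercepts):

$$\hat{y}(\mathbf{x}) = b + \sum_{j=1}^{p} w_j x_j + \sum_{j=1}^{p} \sum_{k=j+1}^{p} \langle \mathbf{v}_j, \mathbf{v}_k \rangle \, x_j x_k$$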

@vonjackustc

If I'm training a second-order FM, how can I fit the intercept? I see that fit_lower='augment', fit_linear=True does not give me an intercept. Thank you!

@vene
Collaborator

vene commented Jan 29, 2019

If I'm not mistaken, if you use fit_lower='augment', fit_linear=True, you will (indirectly) be learning an intercept. Check the dimensionality of the learned weight vectors and matrices: it should be one greater than the number of input features, and the first entry should correspond to the intercept.

@vonjackustc

I set the parameters as follows:
loss='logistic', fit_lower='augment', fit_linear=1, degree=2, n_components=2
I have 29 features, but fm.P_.shape == (1, 2, 29): the last axis still has size 29, not 30.
Am I doing something wrong?

@vene
Collaborator

vene commented Jan 29, 2019

Thanks for pointing that out; you are not doing anything wrong. Indeed, fit_lower='augment' was designed with lower degrees in mind, not with the linear term in mind. If you set fit_linear=False, fit_lower='augment', you will indeed get fm.P_ of width 30, but there will be no linear term fm.w_.

This is by design of the API, and I realize it is not ideal. We could change the API with a deprecation cycle, but I would prefer a PR that explicitly learns the bias by coordinate descent.

For your use case, I recommend that you just add the dummy feature (a column of all ones) explicitly:

from polylearn import FactorizationMachineClassifier
from sklearn.preprocessing import add_dummy_feature

# prepend a constant column of ones; its learned linear weight
# plays the role of the intercept
X_aug = add_dummy_feature(X, value=1.0)
fm = FactorizationMachineClassifier(degree=2, loss='logistic',
                                    fit_lower=None, fit_linear=True)
fm.fit(X_aug, y)

@vene
Collaborator

vene commented Jan 29, 2019

(I had some typos in the comment above; if you are viewing this by e-mail, please see the updated comment on GitHub.)

@vonjackustc

Thank you for replying!
I modified the _cd_direct_ho routine: when calling _cd_linear_epoch, it augments X with a dummy feature so that the intercept is fit as _w[0].
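
For reference, a minimal standalone sketch (not polylearn's actual code) of the exact coordinate-descent step for an unregularized intercept under squared loss; update_intercept and the variable names are hypothetical:

import numpy as np

def update_intercept(b, y, y_pred):
    # exact minimizer of sum((y - y_pred - d)**2) over the shift d
    d = np.mean(y - y_pred)
    # shift the intercept and keep the cached predictions in sync
    return b + d, y_pred + d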
