Extend GLM Families #35

Open

mrocklin opened this issue Mar 20, 2017 · 6 comments

@mrocklin (Member)

So far in https://github.com/dask/dask-glm/blob/master/dask_glm/families.py we have families for linear/normal and logistic that look like the following:

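# NB: the helpers used below (exp, log1p, dot, sigmoid) are not builtins;
# the snippet assumes something like
#     from dask_glm.utils import dot, exp, log1p, sigmoid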
class Logistic(object):
    @staticmethod
    def loglike(Xbeta, y):
        eXbeta = exp(Xbeta)
        return (log1p(eXbeta)).sum() - dot(y, Xbeta)

    @staticmethod
    def pointwise_loss(beta, X, y):
        '''Logistic Loss, evaluated point-wise.'''
        beta, y = beta.ravel(), y.ravel()
        Xbeta = X.dot(beta)
        return Logistic.loglike(Xbeta, y)

    @staticmethod
    def pointwise_gradient(beta, X, y):
        '''Logistic gradient, evaluated point-wise.'''
        beta, y = beta.ravel(), y.ravel()
        Xbeta = X.dot(beta)
        return Logistic.gradient(Xbeta, X, y)

    @staticmethod
    def gradient(Xbeta, X, y):
        p = sigmoid(Xbeta)
        return dot(X.T, p - y)

    @staticmethod
    def hessian(Xbeta, X):
        p = sigmoid(Xbeta)
        return dot(p * (1 - p) * X.T, X)

• Do we want to extend these to other GLM families?
• If so, then what other options are most useful?
• Is the current interface sufficient and ideal?
@cicdw (Collaborator) commented Mar 20, 2017

The only other GLM family I know of that is used "in the wild" is the Poisson family, which should be very straightforward to include. (@mpancia do you know of any others?)
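
For reference, here is a sketch of what a Poisson family (log link) could look like under the current interface. This is illustrative only, not merged code; exp and dot as in the Logistic class above, and as there, `loglike` is really the negative log-likelihood (the constant log(y!) term is dropped since it doesn't depend on beta):

class Poisson(object):
    @staticmethod
    def loglike(Xbeta, y):
        eXbeta = exp(Xbeta)
        return eXbeta.sum() - dot(y, Xbeta)

    @staticmethod
    def gradient(Xbeta, X, y):
        # X.T (mu - y) with mu = exp(Xbeta)
        return dot(X.T, exp(Xbeta) - y)

    @staticmethod
    def hessian(Xbeta, X):
        # X.T diag(mu) X
        return dot(exp(Xbeta) * X.T, X)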

@hussainsultan didn't like the use of classes filled with staticmethods just for convenient namespacing; he would prefer a families module with Normal.py, etc. I originally implemented this with classes in case we ever wanted to take advantage of caching to speed up function calls, but that's premature optimization on my part.

@mpancia (Contributor) commented Mar 20, 2017

@moody-marlin Poisson, for sure; I would also include some of the other basic families, even though they are slightly less common (e.g. binomial), and/or families with alternate link functions (e.g. probit).

@mpancia (Contributor) commented Mar 22, 2017

After looking at it for a little bit, I think mimicking the statsmodels API for their families makes sense: https://github.com/statsmodels/statsmodels/blob/master/statsmodels/genmod/families/family.py

They have (what amounts to) an abstract base class Family which specifies a Link and a Variance. The abstract properties/methods reflect the method they use for fitting (IRLS), as well as some more generic things (computation of residuals, a fit method, etc.).

I think something similar would make just as much sense here, except our specification of the ABC Family ought to incorporate the functions necessary for fitting via ADMM, Newton's method, etc. (the log-likelihood, the pointwise gradient, the pointwise Hessian, and so on).
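
Concretely, something like this rough sketch (illustrative only, not statsmodels' actual API; method names mirror the current families.py):

from abc import ABC, abstractmethod

class Family(ABC):
    '''Abstract GLM family exposing what the solvers need.'''

    @staticmethod
    @abstractmethod
    def loglike(Xbeta, y):
        '''(Negative) log-likelihood evaluated at Xbeta = X.dot(beta).'''

    @staticmethod
    @abstractmethod
    def gradient(Xbeta, X, y):
        '''Gradient of the loss with respect to beta.'''

    @staticmethod
    @abstractmethod
    def hessian(Xbeta, X):
        '''Hessian of the loss with respect to beta.'''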

In addition, I think that the Family specification should be factored out of the algorithms in algorithms.py, as it was before -- there's no reason to peg them to a Family, since they're just generic implementations and should be reusable for other things.

@mrocklin (Member, Author)

@mpancia to be concrete, can you give an example of what the algorithms in algorithms.py should take as inputs?

@mpancia (Contributor) commented Mar 22, 2017

@mrocklin Yeah, sure! Take, for example, Newton's method:

import numpy as np
import dask.array as da
from dask.array import dot

from dask_glm.families import Logistic  # as defined above

def newton(X, y, max_steps=50, tol=1e-8, family=Logistic):
    '''Newton's method for a given GLM family.'''

    gradient, hessian = family.gradient, family.hessian
    n, p = X.shape
    beta = np.zeros(p)  # always init to zeros?
    Xbeta = dot(X, beta)

    iter_count = 0
    converged = False

    while not converged:
        beta_old = beta

        # should this use map_blocks()?
        hess = hessian(Xbeta, X)
        grad = gradient(Xbeta, X, y)

        hess, grad = da.compute(hess, grad)

        # should this be dask or numpy?
        # currently uses Python 3 specific syntax
        step, _, _, _ = np.linalg.lstsq(hess, grad)
        beta = (beta_old - step)

        iter_count += 1

        # should change this criterion
        coef_change = np.absolute(beta_old - beta)
        converged = (
            (not np.any(coef_change > tol)) or (iter_count > max_steps))

        if not converged:
            Xbeta = dot(X, beta)  # numpy -> dask conversion of beta

    return beta

The only time that family is accessed is to retrieve family.gradient and family.hessian; it seems to me that newton ought to just take gradient and hessian arguments, so that I could use it on a more arbitrary function.
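
In other words, something like this (a sketch of the suggested signature, not merged code):

def newton(X, y, gradient, hessian, max_steps=50, tol=1e-8):
    '''Newton's method over arbitrary gradient/hessian callables.'''
    ...  # same loop as above, minus the family lookup

# the GLM case then becomes just one caller among many:
# beta = newton(X, y, Logistic.gradient, Logistic.hessian)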

@cicdw (Collaborator) commented Mar 22, 2017

I actually really like this idea, but I do think a refactor which preserves the runtimes will be more difficult than it initially seems; for example, @mpancia I'd encourage you to try refactoring the gradient_descent code and tracking the runtimes for a given dataset. In that code, we explicitly exploit the linear-model-ness and keep Xbeta as an in-memory numpy array, instead of making a pointwise call on beta every time we want to update. A refactor would probably need some clever caching to preserve this.
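
To make the concern concrete, here is a minimal sketch of the trick (illustrative names, not the actual gradient_descent code): because the model is linear in beta, X.dot(beta - t * direction) equals Xbeta - t * Xdir, so a backtracking line search can reuse two matrix-vector products instead of recomputing X.dot(beta) for every trial step size.

def backtracking_step(X, y, beta, direction, loglike, t=1.0, shrink=0.5):
    # two expensive matrix-vector products, computed once
    Xbeta = X.dot(beta)
    Xdir = X.dot(direction)
    f0 = loglike(Xbeta, y)
    # each trial step below only touches length-n vectors
    # (assumes direction is a descent direction, so this terminates)
    while loglike(Xbeta - t * Xdir, y) >= f0:
        t *= shrink
    return beta - t * direction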

However, for newton / admm, this seems very doable.

Moreover, if we could pull this off for all the algorithms, it would most likely allow for improved testing: we could run the algorithms on many carefully crafted convex functions for which we know the optima (which isn't currently possible for regularized problems), and separately test the implementations of the GLM families / regularizers. This wouldn't be full-blown integration testing, but it would still increase our understanding of the accuracy of our implementations.
