Inverse isometric log ratio simplex method #39

Open
spinkney opened this issue Jul 22, 2022 · 17 comments

@spinkney
Collaborator

spinkney commented Jul 22, 2022

I can't find where @bob-carpenter mentioned that he was still trying to understand this transform, but it piqued my curiosity because I hadn't heard of it.

After grokking the idea I was able to put together an R script that generates a simplex using it. The idea connects the unit-vector and simplex machinery. What follows is the reverse of the ILR, so it's a bit backwards: we need to invert the Helmert matrix, which enforces the sum-to-zero constraint in the forward mapping.

make_V <- function(D) {
  # normalize each Helmert contrast to unit length (each one already sums to zero)
  apply(t(contr.helmert(D)), 1, function(x) x / sqrt(sum(x^2)))
}

clr_trans <- function(x) {
  # centered log-ratio transform (shown for reference)
  D <- length(x)
  log(x) - sum(log(x)) / D
}

random_simplex <- function(D) {
  x_test_ilr <- runif(D - 1, -5, 5)

  # To reverse the transform,
  # append a row with all 0s except the last as 1 to make V invertible
  V <- make_V(D)
  V_fullrank <- rbind(t(V), c(rep(0, D - 1), 1))
  V_inv_fullrank <- solve(V_fullrank)

  # append 0 to the ILR values
  x_test_ilr_zero <- c(x_test_ilr, 0)

  # the last entry of exp_s is 1 by construction (the last row of the inverse is e_D)
  exp_s <- exp(V_inv_fullrank %*% x_test_ilr_zero)

  alpha <- sum(exp_s[1:(D - 1)])
  Z <- -log1p(alpha)

  # normalize so the result sums to 1
  return(exp_s * exp(Z))
}
random_simplex(5)
         [,1]
1 0.020302134
2 0.039406348
3 0.704896030
4 0.005134803
5 0.230260686
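For a quick round-trip sanity check, here is a small Julia sketch (not from the original script; the helmert_basis, clr, and softmax helpers are illustrative stand-ins for make_V and clr_trans above): apply the forward ILR, then map back with the inverse.

# D x (D-1) matrix with orthonormal columns, each orthogonal to ones(D)
# (normalized Helmert contrasts, the transpose of make_V's output)
function helmert_basis(D)
    V = zeros(D, D - 1)
    for j in 1:D-1
        V[1:j, j] .= -1
        V[j+1, j] = j
        V[:, j] ./= sqrt(j * (j + 1))
    end
    V
end

clr(z) = log.(z) .- sum(log.(z)) / length(z)   # centered log ratio
softmax(s) = exp.(s .- log(sum(exp.(s))))

D = 5
V = helmert_basis(D)

z = rand(D); z ./= sum(z)        # a random point on the simplex
y = V' * clr(z)                  # forward ILR
z_back = softmax(V * y)          # inverse ILR

@assert isapprox(z, z_back)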

I've written a Stan program without the log-det-jac adjustment because I'm tired and I think @sethaxen may be able to whip it up much quicker than me. I put a normal prior just to get sampling going.

data {
 int<lower=0> N;
 matrix[N - 1 , N - 1] Vinv;
}
parameters {
 vector[N - 1] y;
}
transformed parameters {
  simplex[N] z;
  {
    vector[N - 1] s = Vinv * y;
    real alpha = log_sum_exp(append_row(s, 0));
    z = append_row(exp(s - alpha), 1 / exp(alpha));
  }
 
}
model {
 y ~ normal(0, 10);
}
generated quantities {
  real z_sum = sum(z);
}

To run this you need the following (mod is the Stan program above compiled with cmdstanr, and N is the simplex dimension; the output below used N = 5):

make_V <- function(D) {
  apply(t(contr.helmert(D)), 1, function(x) x / sqrt(sum(x^2)))
}

V <- make_V(N)
V_fullrank <- rbind(t(V), c(rep(0, N - 1), 1))
V_inv_fullrank <- solve(V_fullrank)

mod_out <- mod$sample(
  data = list(N = N,
              Vinv = matrix(V_inv_fullrank[1:(N-1), 1:(N-1)], N-1, N-1)),
  parallel_chains = 4,
  iter_sampling = 2000
)
mod_out$summary("z")

and example output of

 mod_out$summary("z")
# A tibble: 5 × 10
  variable  mean    median    sd       mad       q5   q95  rhat ess_bulk ess_tail
  <chr>    <dbl>     <dbl> <dbl>     <dbl>    <dbl> <dbl> <dbl>    <dbl>    <dbl>
1 z[1]     0.201 0.0000368 0.371 0.0000546 6.36e-14  1.00  1.00    8449.    6023.
2 z[2]     0.206 0.0000382 0.374 0.0000567 3.85e-14  1.00  1.00    9083.    6334.
3 z[3]     0.199 0.0000351 0.369 0.0000521 4.16e-14  1.00  1.00    8053.    6443.
4 z[4]     0.194 0.0000241 0.365 0.0000357 4.58e-14  1.00  1.00    7887.    6253.
5 z[5]     0.201 0.0000500 0.371 0.0000742 3.62e-14  1.00  1.00    7808.    6525.
@spinkney
Collaborator Author

Now that I look at it again, it looks like the softmax formulation but using s instead of y. So just

data {
 int<lower=0> N;
 matrix[N - 1 , N - 1] Vinv;
}
parameters {
 vector[N - 1] y;
}
transformed parameters {
  simplex[N] z;
  vector[N - 1] s = Vinv * y;
  
  {
    real alpha = log_sum_exp(append_row(s, 0));
    z = append_row(exp(s - alpha), 1 / exp(alpha));
  }
}
model {
 target += -N * log1p_exp(log_sum_exp(s)) + sum(s);
}
generated quantities {
  real z_sum = sum(z);
}

@sethaxen
Collaborator

Is there an explanation of the reasoning behind this transformation somewhere?

I've written a Stan program without the log-det-jac adjustment because I'm tired and I think @sethaxen may be able to whip it up much quicker than me.

Happy to! 🙂

@spinkney
Collaborator Author

spinkney commented Jul 22, 2022

I was following this: https://www.researchgate.net/publication/266864217_AN_ALGEBRAIC_METHOD_TO_COMPUTE_ISOMETRIC_LOGRATIO_TRANSFORMATION_AND_BACK_TRANSFORMATION_OF_COMPOSITIONAL_DATA.

It isn't a great paper, but the calculations were confirmed with the compositions R package.

Here's my noob Julia code to get the Jacobian. Its result does not equal what I have in the Stan program above.

using Test, LinearAlgebra, ForwardDiff, StatsFuns, StatsModels, Distributions  # Distributions provides Uniform

D = 5
helmert_mat = transpose(StatsModels.ContrastsMatrix(HelmertCoding(), 1:D).matrix)

V_raw = eachrow(helmert_mat) ./ norm.(eachrow(helmert_mat))
V_raw = transpose(reduce(hcat, V_raw))

V_fullrank = vcat(V_raw, vcat(fill(0, D - 1), 1)')
V_inv_fullrank = inv(V_fullrank)

y = rand(Uniform(-10, 10), D - 1)

function euclid_to_simplex(y, Vinv)
    s = Vinv * y
    alpha = log(sum(exp.(vcat(s, 0))))
    return vcat(exp.(s .- alpha), 1 / exp(alpha))
end

J = ForwardDiff.jacobian(y -> euclid_to_simplex(y, V_inv_fullrank[1:D-1, 1:D-1]), y)
logabsdet(J'J)[1]/2

# does not equal this
s_test = V_inv_fullrank[1:D-1,1:D-1]  * y
-D * log1pexp(logsumexp(s_test)) + sum(s_test)

@spinkney
Collaborator Author

spinkney commented Jul 22, 2022

I did get the Jacobian by plugging exp(V*y - vector(log(sum(exp(V*y))))) into matrixcalc.

There's gotta be a simpler formula though

data {
 int<lower=0> N;
 matrix[N - 1 , N - 1] Vinv;
 matrix[N, N] Vinv_full;
}
parameters {
 vector[N - 1] y;
}
transformed parameters {
  simplex[N] z;
  vector[N - 1] s = Vinv * y;
  real logabsdet = 0;
  
  {
    real alpha = log_sum_exp(append_row(s, 0));
    z = append_row(exp(s - alpha), 1 / exp(alpha));
    
    vector[N] t0 = exp(Vinv_full * append_row(y, 0));
    real t1 = sum(t0);
    vector[N] t2 = t0 / t1;
    
    // matrix[N, N] J = diag_pre_multiply(t2, Vinv_full) - (1/t1) * t2 * (t0' * Vinv_full);
    // a bit faster:
    matrix[N, N - 1] J = add_diag(-(1/t1) * t2 * t0', t2) * Vinv_full[:, 1:N - 1];
    logabsdet += log_determinant_spd(crossprod(J)) * 0.5;
  }
 
}
model {
 target += logabsdet;
}
generated quantities {
  real z_sum = sum(z);
}

Test of jacobian in Julia

using Test, LinearAlgebra, ForwardDiff, StatsFuns, StatsModels, Distributions  # Distributions provides Uniform


D = 5
helmert_mat = transpose(StatsModels.ContrastsMatrix(HelmertCoding(), 1:D).matrix)

V_raw = eachrow(helmert_mat) ./ norm.(eachrow(helmert_mat))
V_raw = transpose(reduce(hcat, V_raw))

V_fullrank = vcat(V_raw, vcat(fill(0, D - 1), 1)')
V_inv_fullrank = inv(V_fullrank)

y = rand(Uniform(-10, 10), D - 1)

s = V_inv_fullrank[1:D-1, 1:D-1] * y
alpha = log(sum(exp.(vcat(s, 0))))
z = vcat(exp.(s .- alpha), 1 / exp(alpha))

function euclid_to_simplex(y, Vinv)
    s = Vinv * y
    alpha = log(sum(exp.(vcat(s, 0))))
    return vcat(exp.(s .- alpha), 1 / exp(alpha))
end

J = ForwardDiff.jacobian(y -> euclid_to_simplex(y, V_inv_fullrank[1:D-1, 1:D-1]), y)

t0 =  V_inv_fullrank * vcat(y, 0)
t1 = exp.(t0)
t2 = sum(t1)
t3 = exp.(t0 .- log(t2) )

J_test = diagm(t3) * V_inv_fullrank - (1/t2) * t3 * (transpose(t1) * V_inv_fullrank)

@test logabsdet(J_test[:, 1:D-1]'J_test[:, 1:D-1])[1] * 0.5 ≈ logabsdet(J'J)[1] / 2

Test Passed
  Expression: (logabsdet((J_test[:, 1:D - 1])' * J_test[:, 1:D - 1]))[1] * 0.5 ≈ (logabsdet(J' * J))[1] / 2
   Evaluated: -24.100049227510453 ≈ -24.100049227559065

@bob-carpenter
Collaborator

bob-carpenter commented Jul 22, 2022

Two things should help.

$$ \det(A \ B) = (\det A \ \det B) \qquad \det(A^{\top}) = \det(A) $$

That means

$$ \log \det J J^{\top} = \log \det J + \log \det J^{\top} = 2 \log \det J $$

so we can avoid the cross-product. And then $J$ itself is a product that can be decomposed the same way with a component that can be precomputed as transformed data (Vinv_full), so we are left with needing a determinant for

$$ \textrm{diag}(t_2) + \frac{-t_2}{t_1} \ t_0^{\top} $$

Now we can apply the matrix determinant lemma

$$ \det (A + uv^{\top}) = (1 + v^{\top} A^{-1} u) \ \det(A) $$

with $A = \textrm{diag}(t_2)$, $u = \frac{-t_2}{t_1}$, and $v = t_0$.

I just learned this matrix determinant lemma last week and it's cropping up everywhere! Of course, I could've botched the linear algebra, so please double check!
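A quick numerical sanity check of the lemma itself (a sketch in Julia; the diagonal A and the vectors u and v are random stand-ins for diag(t2), -t2/t1, and t0):

using LinearAlgebra

D = 5
A = Diagonal(rand(D) .+ 0.5)   # an invertible diagonal matrix, playing the role of diag(t2)
u = randn(D)
v = randn(D)

lhs = det(Matrix(A) + u * v')          # det(A + u v')
rhs = (1 + v' * (A \ u)) * det(A)      # (1 + v' A^{-1} u) det(A)
@assert isapprox(lhs, rhs; rtol = 1e-8)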

@spinkney
Collaborator Author

spinkney commented Jul 22, 2022

I got it!!

By chance I happened to find an ILR cheat sheet. Wow! So easy to take the derivative. I confirmed it with the Julia code.

$$ \log |J| = \sum_{i=1}^{D} \big(\log \operatorname{softmax}(Vy)\big)_i + \log(D) $$

data {
 int<lower=0> N;
 matrix[N - 1 , N - 1] Vinv;
 matrix[N, N] Vinv_full;
}
parameters {
 vector[N - 1] y;
}
transformed parameters {
  simplex[N] z;
  vector[N - 1] s = Vinv * y;
  real logabsdet = sum(log_softmax( Vinv_full[:, 1:N-1] * y)) + log(N);
  
  {
    real alpha = log_sum_exp(append_row(s, 0));
    z = append_row(exp(s - alpha), 1 / exp(alpha));
  }
 
}
model {
 target += logabsdet;
}
generated quantities {
  real z_sum = sum(z);
}
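For reference, a sketch of this kind of ForwardDiff check (not the exact script used above; nullspace(ones(1, D)) supplies an orthonormal sum-to-zero basis, standing in for the normalized Helmert matrix, and the softmax helper is defined inline):

using LinearAlgebra, ForwardDiff

D = 5
# any orthonormal basis of the sum-to-zero hyperplane works for this check
V = nullspace(ones(1, D))                # D x (D-1), columns orthonormal, each ⟂ ones(D)

softmax(s) = exp.(s .- log(sum(exp.(s))))

y = randn(D - 1)
z = softmax(V * y)

J = ForwardDiff.jacobian(y -> softmax(V * y), y)   # full D x (D-1) Jacobian
lhs = logabsdet(J' * J)[1] / 2
rhs = sum(log.(z)) + log(D)                        # claimed adjustment
@assert isapprox(lhs, rhs)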

@spinkney
Collaborator Author

spinkney commented Jul 22, 2022

The code can be simplified quite a bit; here I've updated Vinv to be N x (N - 1) (i.e., the first N - 1 columns of V_inv_fullrank from before). The adjustment is just the sum of the log simplex plus the log of the dimension of the simplex.

data {
 int<lower=0> N;
 matrix[N, N - 1] Vinv;
}
transformed data {
  real logN = log(N);
}
parameters {
 vector[N - 1] y;
}
transformed parameters {
  vector[N] s = Vinv * y;
  real alpha = log_sum_exp(s);
  simplex[N] z = exp(s - alpha);
}
model {
target += sum(s - alpha) + logN;
}
generated quantities {
  real z_sum = sum(z);
}

@mjhajharia
Owner

mjhajharia commented Jul 22, 2022 via email

@spinkney
Collaborator Author

@sethaxen I'm seeing discrepancies between ForwardDiff and this log-abs-det that I'm not sure are due to numerical error or to something I'm still missing about this transform. If I make the dimension 10000, the difference is about 2, which seems much too high. Is that typical, or does it indicate I'm still missing something? The differences are very small for low dimensions.

@spinkney
Collaborator Author

OK, it's floating-point error. If I restrict the domain of y to (0, 1) or (-1, 1) or so, the differences disappear. I'm guessing it's the final exp.

@spinkney
Collaborator Author

spinkney commented Jul 23, 2022

I haven't had this much fun doing math in a long time. What I have is correct, and it follows from the matrix determinant lemma that @bob-carpenter gave (thanks a ton for that hint)! It's quite elegant that the log determinant from the matrix calculus formula boils down to this simple update. I'll update the paper with the math and make a pull request for the Stan code over the next few days.
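For the write-up, here is a sketch of how that algebra can go. Take $V$ to be a $D \times (D-1)$ matrix with orthonormal columns orthogonal to $\mathbf{1}$, set $z = \operatorname{softmax}(Vy)$, and write $M = \operatorname{diag}(z) - zz^{\top}$ for the softmax Jacobian factor. (The Vinv in the code differs from such a $V$ only by terms proportional to $\mathbf{1}$, which $M$ annihilates since $M\mathbf{1} = 0$, so the Jacobian below is unchanged.) Then

$$ J = \frac{\partial z}{\partial y} = \big(\operatorname{diag}(z) - zz^{\top}\big)\, V = M V . $$

Since $M$ is symmetric with $M\mathbf{1} = 0$, completing $V$ with $\mathbf{1}/\sqrt{D}$ to an orthogonal matrix block-diagonalizes $M$, which gives

$$ \det(J^{\top}J) = \det(V^{\top} M^{2} V) = \det(V^{\top} M V)^{2}, \qquad \det(V^{\top} M V) = \det\!\Big(M + \tfrac{1}{D}\,\mathbf{1}\mathbf{1}^{\top}\Big). $$

Writing $-zz^{\top} + \tfrac{1}{D}\mathbf{1}\mathbf{1}^{\top} = U C U^{\top}$ with $U = [\, z \;\; \mathbf{1} \,]$ and $C = \operatorname{diag}(-1, \tfrac{1}{D})$, the matrix determinant lemma (in its rank-two form) gives

$$ \det\!\Big(\operatorname{diag}(z) + U C U^{\top}\Big) = \det\!\big(\operatorname{diag}(z)\big)\, \det\!\big(I_2 + C\, U^{\top} \operatorname{diag}(z)^{-1} U\big) = \Big(\prod_{i=1}^{D} z_i\Big) \det\!\begin{pmatrix} 0 & -D \\ 1 & 1 + \tfrac{1}{D}\sum_i z_i^{-1} \end{pmatrix} = D \prod_{i=1}^{D} z_i , $$

using $z^{\top}\operatorname{diag}(z)^{-1} z = \sum_i z_i = 1$ and $\mathbf{1}^{\top}\operatorname{diag}(z)^{-1} z = D$. Hence

$$ \tfrac{1}{2}\log\det(J^{\top}J) = \sum_{i=1}^{D} \log z_i + \log D , $$

which is exactly the adjustment in the simplified program.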

@bob-carpenter
Collaborator

I haven't had this much fun doing math in a long time.

Algebra is a lot of fun when a big pile of math like a log Jacobian determinant works out to something simple. I'm hoping we can convey that to the reader while making all this understandable.

@spinkney
Collaborator Author

After going through all the math, this is just the softmax transform with a linear transformation applied first.

@spinkney spinkney reopened this Jul 25, 2022
@spinkney
Collaborator Author

spinkney commented Jul 25, 2022

I'm opening this back up because - although I don't understand why - the ESS is 10x higher for the last parameter in this transform vs the softmax.

@bob-carpenter
Collaborator

the ESS is 10x higher for the last parameter in this transform vs the softmax.

Is that softmax with the last element pinned to zero?

@spinkney
Collaborator Author

Yes, softmax with the last value pinned to zero

@bob-carpenter
Collaborator

Perhaps that's not surprising when you look at the Jacobian, which is all about how much probability is left over for the last element, raised to the power of the dimensionality.
