
MLE best-fit parameter disagreeing with pyhf #345

Closed

Moelf opened this issue Apr 13, 2022 · 15 comments

Moelf commented Apr 13, 2022

Related to:

using PyCall, BAT, Distributions

py"""
import pyhf
spec = {
	'channels': [
		{
			'name': 'signal',
			'samples': [
				{
					'name': 'signal',
					'data': [2,3,4,5],
					'modifiers': [
						{'name': 'mu', 'type': 'normfactor', 'data': None}
					],
				},
				{
					'name': 'bkg1',
					'data': [30,19,9,4],
					'modifiers': [
						{ 
							"name": "theta", 
							"type": "histosys", 
							"data": {
								"hi_data": [31,21,12,7], 
								"lo_data": [29,17,6,1]
							}
						}
					],
				},
			],
		}
	]
}
pdf_pyhf = pyhf.Model(spec)
"""

pdf_pyhf = py"pdf_pyhf"
v_data = [34,22,13,11] # observed data
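# pyhf's logpdf expects the observed channel counts concatenated with the auxiliary data of the constraint terms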
data_pyhf = [v_data; pdf_pyhf.config.auxdata]

prior = BAT.NamedTupleDist(
    μ = Uniform(0, 4),
    θ = Normal()
)

function likelihood_pyhf(v)
	(;μ, θ) = v
	LogDVal(only(pdf_pyhf.logpdf([μ, θ], data_pyhf)))
end

@assert likelihood((;μ=1.3, θ=-0.07)).logval ≈ likelihood_pyhf((;μ=1.3, θ=-0.07)).logval

Result

pyhf = pyimport("pyhf")
(μ̂_pyhf, θ̂_pyhf), nll_pyhf = pyhf.infer.mle.fit(data_pyhf, pdf_pyhf, return_fitted_val=true)
# μ = 1.30654, θ = -0.0603666

posterior_BATpyhf = PosteriorDensity(likelihood_pyhf, prior)
best_fit_BATpyhf = bat_findmode(posterior_BATpyhf).result
# μ = 1.2872543725923415, θ = -0.030559943549792377
@lukasheinrich

@Moelf can you try with a Flat prior for theta?


Moelf commented Apr 13, 2022

changing prior to:

	prior = BAT.NamedTupleDist(
		μ = Uniform(0, 4),
		θ = Uniform(-3, 3)
	)

fixed it:

μ = 1.3064153351104848, θ = -0.06050344150637746

that's pretty unexpected, shouldn't the MLE fit mostly not care about the prior (since we're not sampling)?


Moelf commented Apr 13, 2022

for comparison, here is the same MLE problem (no sampling) solved with Turing.jl:

using Turing

function nll(μ, θ)
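    # per-bin ±1σ shift of the background template:
    # hi_data .- nominal = [31,21,12,7] .- [30,19,9,4] = [1,2,3,3]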
    variations = [1,2,3,3]
    v_data = [34,22,13,11] # observed data
    v_sig = [2,3,4,5] # signal
    v_bg = [30,19,9,4] # BKG

    bg = @. v_bg *(1 + θ*variations/v_bg)
    k = μ*v_sig + bg
    n_logfac = map(x->sum(log, 1:x), v_data)
    NM = Normal(0, 1)
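    # per-bin Poisson log-pmf (n*log(k) - log(n!) - k), summed over bins,
    # plus the Gaussian constraint term on θ (pyhf's auxiliary measurement)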
    sum(@. v_data * log(k) - n_logfac - k) + logpdf(NM, θ)
end

@model function binned_f(bincounts)
	μ ~ Uniform(0, 6)
	θ ~ Normal(0, 1)
	Turing.@addlogprob! nll(μ, θ)
end

chain_f = optimize(binned_f(v_data), MLE())

ModeResult with maximized lp of -10.51
2-element Named Vector{Float64}
A  │
───┼───────────
:μ │    1.30648
:θ │ -0.0605151

@lukasheinrich

MLE and posterior mode are only equivalent for flat priors
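In symbols, the two estimates maximize different objectives:

$$
\hat\theta_{\mathrm{MAP}} = \arg\max_\theta \bigl[\log L(\theta) + \log\pi(\theta)\bigr],
\qquad
\hat\theta_{\mathrm{MLE}} = \arg\max_\theta \log L(\theta),
$$

and they coincide only when the log-prior $\log\pi(\theta)$ is constant over the relevant region.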


Moelf commented Apr 13, 2022

I understand the posterior would deviate from the MLE if the prior is not flat, but I thought the MLE itself shouldn't care whether the prior is flat or not.

Turing gives the same result regardless of:

θ ~ Normal()
# or
θ ~ Uniform(-1, 1)


lukasheinrich commented Apr 13, 2022

Afaik BAT gives you the posterior mode in any case; I don't know if there is a pure MLE API in BAT.

@lukasheinrich

I.e. findmode gives you the maximum a posteriori (MAP) estimate, not the MLE.


Moelf commented Apr 13, 2022

that would make sense then, so I guess as advertised bat_findmode still involves Bayesian sampling and is simply giving me the mode of the parameter, which is only equivalent to a pure-likelihood MLE when all priors are flat.

Then for doing a frequentist procedure, I would use Turing.jl + pyhf or (PyCall +) pyhf for now!

Moelf closed this as completed Apr 13, 2022
oschulz (Member) commented Apr 14, 2022

I guess as advertised bat_findmode still involves Bayesian sampling and is simply giving me the mode of the parameter

It doesn't sample, but yes, it finds (or at least tries to find) the global maximum of the posterior density.

You can use BAT and ValueShape tools to do a MLE:

using BAT, DensityInterface, InverseFunctions, ValueShapes, Optim

posterior = PosteriorDensity(likelihood, prior)
vshp = varshape(posterior.prior)
x_init = inverse(vshp)(rand(posterior.prior))
neg_unshaped_likelihood = BAT.negative(logdensityof(posterior.likelihood) ∘ vshp)
r = Optim.optimize(neg_unshaped_likelihood, x_init)
shaped_result = vshp(Optim.minimizer(r))

It's not really in line with BAT's philosophy as a Bayesian package, but we could add an MLE function to BAT, e.g. to enable comparison with frequentist results. The prior would then be used to inform the optimizer about the shape of the space, without influencing the location of the MLE.
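For illustration, such a function could be a thin wrapper around the snippet above (the name bat_mle is hypothetical, not an existing BAT API):

using BAT, DensityInterface, InverseFunctions, ValueShapes, Optim

# hypothetical helper, not part of BAT: maximize the likelihood alone,
# using the prior only for the parameter shape and a starting point
function bat_mle(posterior)
    vshp = varshape(posterior.prior)
    x_init = inverse(vshp)(rand(posterior.prior))
    neg_loglik = BAT.negative(logdensityof(posterior.likelihood) ∘ vshp)
    r = Optim.optimize(neg_loglik, x_init)
    return vshp(Optim.minimizer(r))
end

# usage with the model from the original post:
# bat_mle(PosteriorDensity(likelihood_pyhf, prior))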

@lukasheinrich

I think it's fine to require folks to use flat priors if they want to compare to MLE

Moelf reopened this Apr 15, 2022
Moelf closed this as completed Apr 16, 2022

Moelf commented Apr 16, 2022

For posterity (moral of the story: learn more stats, kids...), the real cause of the disagreement is that the frequentist likelihood and the Bayesian likelihood should not be equal to begin with:

julia> function baye_nll(v)
           variations = [1,2,3,3]
           v_data = [34,22,13,11] # observed data
           v_sig = [2,3,4,5] # signal
           v_bg = [30,19,9,4] # BKG
           (;μ, θ) = v
           bg = @. v_bg *(1 + θ*variations/v_bg)
           k = μ*v_sig + bg
           n_logfac = map(x->sum(log, 1:x), v_data)
           NM = Normal(0, 1)
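           # note: no `+ logpdf(NM, θ)` constraint term here; the θ ~ Normal() prior plays that role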
           LogDVal(sum(@. v_data * log(k) - n_logfac - k))
       end

julia> posterior_BAT = PosteriorDensity(baye_nll, prior);

julia> best_fit_BAT = bat_findmode(posterior_BAT).result
[ Info: Using transform algorithm DensityIdentityTransform()
ShapedAsNT((μ = 1.306479238048563, θ = -0.06063208083677547))

notice the deliberate omission of logpdf(NM, θ) in the likelihood. This term, which we add to the frequentist likelihood (and which is associated with the aux data in pyhf), makes the MLE equivalent to finding the MAP in a Bayesian model with the prior θ ~ NM, so we don't need it in a Bayesian likelihood.
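A minimal numerical check of this, reusing the nll function from the Turing comment above and baye_nll from this one (illustrative only):

using Distributions

μ, θ = 1.3, -0.06
# the frequentist objective (with the constraint term) and the Bayesian
# log-likelihood (without it) differ exactly by that constraint term:
Δ = nll(μ, θ) - (baye_nll((; μ, θ)).logval + logpdf(Normal(0, 1), θ))  # Δ == 0 for any (μ, θ)
# so adding the same Normal(0, 1) as a prior on θ (plus the flat prior on μ,
# a constant inside its support) makes the MAP coincide with the MLE.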


lukasheinrich commented Apr 16, 2022

It depends a bit on which part you model with data and what you leave to the prior. The measurements in the constraint terms often represent actual measurements, done e.g. by the collaboration, that should be taken into account and not overridden by the prior. One way is to have a first Bayesian step that derives a posterior on the NP from this aux measurement; alternatively, model it directly in the likelihood.

oschulz (Member) commented Apr 16, 2022

I guess it really depends on what you consider your "prior knowledge" to be - this can, of course, be a philosophical question. :-)


Moelf commented Apr 16, 2022

the prior in LHC experiments is what the Combined Performance groups measured. The CP tools give you the expected bin counts at ±1 sigma (of a nuisance parameter), which is why all these parameters have a Normal(0, 1) prior.

It now feels a bit odd to me that we do this, namely adding constraint terms to the likelihood to mimic a Bayesian prior (in the Bayesian formulation this arises naturally).

@lukasheinrich

We can discuss offline, but I would posit it's natural either way.

At the core, the CP groups provide measurements of actual data, not beliefs. You can either use that to update your prior on the nuisance parameters, which you then use for your main measurement, or you skip that step and model the joint measurement of the CP measurement and your main analysis.

I wouldn't say one is more natural than the other.
