
MLE best-fit parameter disagreeing with pyhf #345

Closed

Moelf opened this issue Apr 13, 2022 · 15 comments

Moelf commented Apr 13, 2022

Related to:

using PyCall, BAT, Distributions

py"""
import pyhf
spec = {
	'channels': [
		{
			'name': 'signal',
			'samples': [
				{
					'name': 'signal',
					'data': [2,3,4,5],
					'modifiers': [
						{'name': 'mu', 'type': 'normfactor', 'data': None}
					],
				},
				{
					'name': 'bkg1',
					'data': [30,19,9,4],
					'modifiers': [
						{ 
							"name": "theta", 
							"type": "histosys", 
							"data": {
								"hi_data": [31,21,12,7], 
								"lo_data": [29,17,6,1]
							}
						}
					],
				},
			],
		}
	]
}
pdf_pyhf = pyhf.Model(spec)
"""

pdf_pyhf = py"pdf_pyhf"
v_data = [34,22,13,11] # observed data
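# pyhf's logpdf expects the observed channel counts concatenated with the auxiliary data of the constraint terms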
data_pyhf = [v_data; pdf_pyhf.config.auxdata]

prior = BAT.NamedTupleDist(
    μ = Uniform(0, 4),
    θ = Normal()
)

function likelihood_pyhf(v)
	(;μ, θ) = v
	LogDVal(only(pdf_pyhf.logpdf([μ, θ], data_pyhf)))
end

@assert likelihood((;μ=1.3, θ=-0.07)).logval ≈ likelihood_pyhf((;μ=1.3, θ=-0.07)).logval

Result

pyhf = pyimport("pyhf")
(μ̂_pyhf, θ̂_pyhf), nll_pyhf = pyhf.infer.mle.fit(data_pyhf, pdf_pyhf, return_fitted_val=true)
# μ = 1.30654, θ = -0.0603666

posterior_BATpyhf = PosteriorDensity(likelihood_pyhf, prior)
best_fit_BATpyhf = bat_findmode(posterior_BATpyhf).result
# μ = 1.2872543725923415, θ = -0.030559943549792377
@lukasheinrich

@Moelf can you try with a Flat prior for theta?


Moelf commented Apr 13, 2022

changing prior to:

	prior = BAT.NamedTupleDist(
		μ = Uniform(0, 4),
		θ = Uniform(-3, 3)
	)

fixed it:

μ = 1.3064153351104848, θ = -0.06050344150637746

that's pretty unexpected, shouldn't the MLE fit mostly not care about the prior (since we're not sampling)?


Moelf commented Apr 13, 2022

for comparison, here is the same MLE problem (no sampling) solved with Turing.jl:

using Turing

function nll(μ, θ)
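    # per-bin ±1σ shift of the background template:
    # hi_data .- nominal = [31,21,12,7] .- [30,19,9,4] = [1,2,3,3]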
    variations = [1,2,3,3]
    v_data = [34,22,13,11] # observed data
    v_sig = [2,3,4,5] # signal
    v_bg = [30,19,9,4] # BKG

    bg = @. v_bg *(1 + θ*variations/v_bg)
    k = μ*v_sig + bg
    n_logfac = map(x->sum(log, 1:x), v_data)
    NM = Normal(0, 1)
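    # per-bin Poisson log-pmf (n*log(k) - log(n!) - k), summed over bins,
    # plus the Gaussian constraint term on θ (pyhf's auxiliary measurement)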
    sum(@. v_data * log(k) - n_logfac - k) + logpdf(NM, θ)
end

@model function binned_f(bincounts)
	μ ~ Uniform(0, 6)
	θ ~ Normal(0, 1)
	Turing.@addlogprob! nll(μ, θ)
end

chain_f = optimize(binned_f(v_data), MLE())

ModeResult with maximized lp of -10.51
2-element Named Vector{Float64}
A  │
───┼───────────
:μ │    1.30648
:θ │ -0.0605151

@lukasheinrich

MLE and posterior mode are only equivalent for flat priors
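In symbols, the two estimates maximize different objectives:

$$
\hat\theta_{\mathrm{MAP}} = \arg\max_\theta \bigl[\log L(\theta) + \log\pi(\theta)\bigr],
\qquad
\hat\theta_{\mathrm{MLE}} = \arg\max_\theta \log L(\theta),
$$

and they coincide only when the log-prior $\log\pi(\theta)$ is constant over the relevant region.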


Moelf commented Apr 13, 2022

I understand the posterior would deviate from the MLE if the prior is not flat, but I thought the MLE itself shouldn't care whether the prior is flat or not.

Turing gives the same result regardless of:

θ ~ Normal()
# or
θ ~ Uniform(-1, 1)


lukasheinrich commented Apr 13, 2022

Afaik BAT gives you the posterior mode in any case; I don't know if there is a pure MLE API in BAT.

@lukasheinrich

I.e. findmode gives you the maximum a posteriori (MAP) estimate, not the MLE.


Moelf commented Apr 13, 2022

that would make sense then, so I guess as advertised bat_findmode still involves Bayesian sampling and is simply giving me the mode of the parameter, which is only equivalent to a pure-likelihood MLE when all priors are flat.

Then for doing a frequentist procedure, I would use Turing.jl + pyhf or (PyCall +) pyhf for now!

Moelf closed this as completed Apr 13, 2022
oschulz (Member) commented Apr 14, 2022

I guess as advertised bat_findmode still involves Bayesian sampling and is simply giving me the mode of the parameter

It doesn't sample, but yes, it finds (or at least tries to find) the global maximum of the posterior density.

You can use BAT and ValueShape tools to do a MLE:

using BAT, DensityInterface, InverseFunctions, ValueShapes, Optim

posterior = PosteriorDensity(likelihood, prior)
vshp = varshape(posterior.prior)
x_init = inverse(vshp)(rand(posterior.prior))
neg_unshaped_likelihood = BAT.negative(logdensityof(posterior.likelihood) ∘ vshp)
r = Optim.optimize(neg_unshaped_likelihood, x_init)
shaped_result = vshp(Optim.minimizer(r))

It's not really in line with BAT's philosophy as a Bayesian package, but we could add an MLE function to BAT, e.g. to enable comparison with frequentist results. The prior would then be used to inform the optimizer about the shape of the space, without influencing the location of the MLE.
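For illustration, such a function could be a thin wrapper around the snippet above (the name bat_mle is hypothetical, not an existing BAT API):

using BAT, DensityInterface, InverseFunctions, ValueShapes, Optim

# hypothetical helper, not part of BAT: maximize the likelihood alone,
# using the prior only for the parameter shape and a starting point
function bat_mle(posterior)
    vshp = varshape(posterior.prior)
    x_init = inverse(vshp)(rand(posterior.prior))
    neg_loglik = BAT.negative(logdensityof(posterior.likelihood) ∘ vshp)
    r = Optim.optimize(neg_loglik, x_init)
    return vshp(Optim.minimizer(r))
end

# usage with the model from the original post:
# bat_mle(PosteriorDensity(likelihood_pyhf, prior))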

@lukasheinrich

I think it's fine to require folks to use flat priors if they want to compare to MLE

Moelf reopened this Apr 15, 2022
Moelf closed this as completed Apr 16, 2022

Moelf commented Apr 16, 2022

For posterity (moral of the story: learn more stats, kids...), the real cause of the disagreement is that the frequentist likelihood and the Bayesian likelihood should not be equal to begin with:

julia> function baye_nll(v)
           variations = [1,2,3,3]
           v_data = [34,22,13,11] # observed data
           v_sig = [2,3,4,5] # signal
           v_bg = [30,19,9,4] # BKG
           (;μ, θ) = v
           bg = @. v_bg *(1 + θ*variations/v_bg)
           k = μ*v_sig + bg
           n_logfac = map(x->sum(log, 1:x), v_data)
           NM = Normal(0, 1)
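           # note: no `+ logpdf(NM, θ)` constraint term here; the θ ~ Normal() prior plays that role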
           LogDVal(sum(@. v_data * log(k) - n_logfac - k))
       end

julia> posterior_BAT = PosteriorDensity(baye_nll, prior);

julia> best_fit_BAT = bat_findmode(posterior_BAT).result
[ Info: Using transform algorithm DensityIdentityTransform()
ShapedAsNT((μ = 1.306479238048563, θ = -0.06063208083677547))

notice the deliberate omission of logpdf(NM, θ) in the likelihood. This term, which we add to the frequentist likelihood (and which is associated with the aux data in pyhf), makes the MLE equivalent to finding the MAP in a Bayesian model with the prior θ ~ NM, so we don't need it in a Bayesian likelihood.
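A minimal numerical check of this, reusing the nll function from the Turing comment above and baye_nll from this one (illustrative only):

using Distributions

μ, θ = 1.3, -0.06
# the frequentist objective (with the constraint term) and the Bayesian
# log-likelihood (without it) differ exactly by that constraint term:
Δ = nll(μ, θ) - (baye_nll((; μ, θ)).logval + logpdf(Normal(0, 1), θ))  # Δ == 0 for any (μ, θ)
# so adding the same Normal(0, 1) as a prior on θ (plus the flat prior on μ,
# a constant inside its support) makes the MAP coincide with the MLE.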


lukasheinrich commented Apr 16, 2022

It depends a bit on which part you model with data and what you leave to the prior. The measurements in the constraint terms often represent actual measurements, done e.g. by the collaboration, that should be taken into account and not overridden by the prior. One way is to have a first Bayesian step that derives a posterior on the NP from this aux measurement; alternatively, model it directly in the likelihood.

oschulz (Member) commented Apr 16, 2022

I guess it really depends on what you consider your "prior knowledge" to be - this can, of course, be a philosophical question. :-)


Moelf commented Apr 16, 2022

the prior in LHC experiments is what the Combined Performance groups measured. The CP tools give you the expected bin counts at ±1 sigma (of a nuisance parameter), which is why all these parameters have a Normal(0, 1) prior.

It now feels a bit odd to me that we do this, namely adding constraint terms to the likelihood to mimic a Bayesian prior (in the Bayesian formulation this arises naturally).

@lukasheinrich

We can discuss offline, but I would posit it's natural either way.

At the core, the CP groups provide measurements of actual data, not beliefs. You can either use that to update your prior on the nuisance parameters, which you then use for your main measurement, or you skip that step and model the joint measurement of the CP measurement and your main analysis.

I wouldn't say one is more natural than the other.
