Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

12 4 8 new vars v1 #47

Open
wants to merge 11 commits into
base: 12_4_8
Choose a base branch
from
Open

Conversation

angzaza
Copy link

@angzaza angzaza commented Dec 14, 2022

New variables for FatJets and SubJets are added in PFNano/python/addBTV.py

  • uncorrpt
  • residual
  • jes
  • CombIVF
  • DeepCSVBDisc
  • btagDeepB_c
  • DeepCSVCvsLDisc
  • DeepCSVCvsBDisc

For FatJets, a MuonTable is defined in PFNano/plugins/JetConstituentTableProducer.cc, which contains the following variables:

  • pt
  • eta
  • ptrel
  • phi
  • isGlobal
  • nMuHit
  • nMatched
  • nTkHit
  • nPixHit
  • nOutHit

The distributions of these variables, obtained by running PFNano/test/nano_mc_Run3_122X_NANO.py over 1k events of /ZprimeToTT_M3000_W30_TuneCP2_13p6TeV-madgraph-pythiaMLM-pythia8/Run3Winter22MiniAOD-122X_mcRun3_2021_realistic_v9-v2/MINIAODSIM, are shown at this link.

This work was carried out under the supervision of Soureek Mitra and Denise Muller.

@demuller
Copy link

Hi @angzaza many thanks for your PR! I will have a look later today at it.

@AnnikaStein could you review it as well?

@demuller
Copy link

Maybe one first thing I directly noticed: the commit history seems to contain a lot of older changes not made by you, can you try to make a cleaner PR that only contains your own changes? It is otherwise a bit hard to review

@demuller
Copy link

Maybe one first thing I directly noticed: the commit history seems to contain a lot of older changes not made by you, can you try to make a cleaner PR that only contains your own changes? It is otherwise a bit hard to review

it probably would also help with the branch conflicts

@AnnikaStein
Copy link
Contributor

Hi @angzaza many thanks for your PR! I will have a look later today at it.

@AnnikaStein could you review it as well?

Sure, I can do that!

[General comment]
What I see is that this PR which is meant for 12_4_8 has been prepared for the master branch, hence the conflicts. If this was updated, I guess it would already look a lot cleaner. I'll use the edit button to switch that, and then we can look at the details.

@AnnikaStein AnnikaStein changed the base branch from master to 12_4_8 December 16, 2022 13:48
@AnnikaStein AnnikaStein self-requested a review December 16, 2022 13:52
@AnnikaStein
Copy link
Contributor

Maybe one first thing I directly noticed: the commit history seems to contain a lot of older changes not made by you, can you try to make a cleaner PR that only contains your own changes? It is otherwise a bit hard to review

it probably would also help with the branch conflicts

Indeed after switching the target branch, neither conflicts nor commits from other people are left in the PR :)

So thanks also from my side @angzaza ! I'll take a closer look and get back to you with specific comments later.

Copy link
Contributor

@AnnikaStein AnnikaStein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this nice work @angzaza !

Although more detailed statements and examples are directly marked in the code, here is a summary:

[Style]
Mostly indentation & commented out code, especially for indentation I only commented on the first occurrence although there were many:

  • For plugins/JetConstituentTableProducer.cc in L51,54,67,82,85,92,97,110,116-118,125,… 243ff., 326ff.,353,356,363f.
  • python/addBTV.py L454
  • python/addPFcands_cff.py L65,76,129,139

[Implementation itself, technical details & storage]

  • Some comments on more robust calculation of discriminators, similar to what is done in NanoAOD, directly marked inline.
  • Synchronize precision of floating point branches (10 so far for everything versus 20 for Muon)
  • Cross-check variables w.r.t. RecoBTag-PerformanceMeasurements (where there seem to be more)

[Suggestion]

  • Checked out this development branch and ran a quick test myself: currently, the Muon information is added for all Jet collections, FatJets included which is good, but I’m not sure we need this also for AK4, AK4GenJets, AK8GenJets? I’d suggest we add a boolean flag per jet collection which is read by the producer. Like so: addMuonTable = cms.bool(False) for those where we will definitely not add them, but addMuonTable = cms.bool(True) for FatJets, needs some if{} clauses inside the producer then and could be handled similar to readBtag. Whether or not it should be added for more than just FatJets is probably something for discussion, can other commissioning workflows besides the one for fat jets make use of (Gen)JetMuon tables? Storage-wise it doesn’t look too heavy to include that also for the other collections as it’s not hundreds of features and half of them are integers.

@@ -48,19 +48,23 @@ class JetConstituentTableProducer : public edm::stream::EDProducer<> {
//const std::string name_;
const std::string name_;
const std::string nameSV_;
const std::string nameMu_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Style] Indentation level seems to vary for the newly added lines, compared to the surrounding statements

Comment on lines 131 to 137
/*outMuons->clear();
jetIdx_mu.clear();
muIdx.clear();
muon_pt.clear();
muon_eta.clear();
muon_phi.clear();
muon_ptrel.clear();*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Style] Places with commented out code which could be removed for clarity

python/addBTV.py Outdated
@@ -363,6 +449,7 @@ def add_BTV(process, runOnMC=False, onlyAK4=False, onlyAK8=False, keepInputs=['D
extension=cms.bool(True), # this is the extension table for FatJets
variables=cms.PSet(
CommonVars,
SubJetVars,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Style] Indentation level seems to vary for the newly added lines, compared to the surrounding statements

@@ -62,6 +62,8 @@ def addPFCands(process, runOnMC=False, allPF = False, onlyAK4=False, onlyAK8=Fal
idx_name = cms.string("pFCandsIdx"),
nameSV = cms.string("FatJetSVs"),
idx_nameSV = cms.string("sVIdx"),
nameMu = cms.string("FatJetMuons"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Style] Indentation level seems to vary for the newly added lines, compared to the surrounding statements
Especially here, this looked like an empty line and caused some confusion first

python/addBTV.py Outdated
Comment on lines 371 to 378
DeepCSVCvsLDisc=Var("? (bDiscriminator('pfDeepCSVJetTags:probc'))!=-1 ? bDiscriminator('pfDeepCSVJetTags:probc')/(bDiscriminator('pfDeepCSVJetTags:probc')+bDiscriminator('pfDeepCSVJetTags:probudsg')) : -1",
float,
doc="DeepCSV C vs L discriminator",
precision=10),
DeepCSVCvsBDisc=Var("? (bDiscriminator('pfDeepCSVJetTags:probc'))!=-1 ? bDiscriminator('pfDeepCSVJetTags:probc')/(bDiscriminator('pfDeepCSVJetTags:probc')+bDiscriminator('pfDeepCSVJetTags:probb')+bDiscriminator('pfDeepCSVJetTags:probbb')) : -1",
float,
doc="DeepCSV C vs B discriminator",
precision=10),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Implementation itself, technical details & storage]
We might want to synchronize this check for invalid values with the statement in NanoAOD, example here: https://github.com/cms-sw/cmssw/blob/master/PhysicsTools/NanoAOD/python/jetsAK4_Puppi_cff.py#L86-L87 —> switch from !=-1 to >=0. Although that should not make that big of a difference, probably it has some effects related to floating point precision.

python/addBTV.py Outdated
Comment on lines 328 to 335
DeepCSVCvsLDisc=Var("? (bDiscriminator('pfDeepCSVJetTags:probc'))!=-1 ? bDiscriminator('pfDeepCSVJetTags:probc')/(bDiscriminator('pfDeepCSVJetTags:probc')+bDiscriminator('pfDeepCSVJetTags:probudsg')) : -1",
float,
doc="DeepCSV C vs L discriminator",
precision=10),
DeepCSVCvsBDisc=Var("? (bDiscriminator('pfDeepCSVJetTags:probc'))!=-1 ? bDiscriminator('pfDeepCSVJetTags:probc')/(bDiscriminator('pfDeepCSVJetTags:probc')+bDiscriminator('pfDeepCSVJetTags:probb')+bDiscriminator('pfDeepCSVJetTags:probbb')) : -1",
float,
doc="DeepCSV C vs B discriminator",
precision=10),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Implementation itself, technical details & storage]
We might want to synchronize this check for invalid values with the statement in NanoAOD, example here: https://github.com/cms-sw/cmssw/blob/master/PhysicsTools/NanoAOD/python/jetsAK4_Puppi_cff.py#L86-L87 —> switch from !=-1 to >=0. Although that should not make that big of a difference, probably it has some effects related to floating point precision.

python/addBTV.py Outdated
float,
doc="combinedIVF",
precision=10),
DeepCSVBDisc=Var("bDiscriminator('pfDeepCSVJetTags:probb')+bDiscriminator('pfDeepCSVJetTags:probbb')",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Implementation itself, technical details & storage]
For the b discriminator (the sum of the individual b, bb related outputs) something similar to the CvsL/CvsB discriminator approach could be done to avoid getting irregular -2 bins: https://github.com/cms-sw/cmssw/blob/master/PhysicsTools/NanoAOD/python/jetsAK4_Puppi_cff.py#L83 -> also add the ternary operator to test whether the sum is >=0 and assign either that sum or -1 finally.

python/addBTV.py Outdated
float,
doc="combinedIVF",
precision=10),
DeepCSVBDisc=Var("bDiscriminator('pfDeepCSVJetTags:probb')+bDiscriminator('pfDeepCSVJetTags:probbb')",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Implementation itself, technical details & storage]
Same comment as above

Comment on lines 327 to 343
auto muonTable = std::make_unique<nanoaod::FlatTable>(outMuons->size(), nameMu_, false);
// We fill from here only stuff that cannot be created with the SimpleFlatTnameableProducer
muonTable->addColumn<int>("jetIdx", jetIdx_mu, "Index of the parent jet");
muonTable->addColumn<int>(idx_nameMu_, muIdx, "Index in the Muon list");
if (readBtag_) {
muonTable->addColumn<float>("pt", muon_pt, "pt", 20);
muonTable->addColumn<float>("eta", muon_eta, "eta", 20);
muonTable->addColumn<float>("phi", muon_phi, "phi", 20);
muonTable->addColumn<double>("isGlobal", muon_isGlobal, "isGlobal", 20);
muonTable->addColumn<float>("ptrel", muon_ptrel, "pt relative to parent jet", 20);
muonTable->addColumn<int>("nMuHit", muon_nMuHit, "number of muon hits", 20);
muonTable->addColumn<int>("nMatched", muon_nMatched, "number of matched stations", 20);
muonTable->addColumn<int>("nTkHit", muon_nTkHit, "number of tracker hits", 20);
muonTable->addColumn<int>("nPixHit", muon_nPixHit, "number of pixel hits", 20);
muonTable->addColumn<int>("nOutHit", muon_nOutHit, "number of missing outer hits", 20);
muonTable->addColumn<double>("chi2", muon_chi2, "chi2", 20);
muonTable->addColumn<double>("chi2Tk", muon_chi2Tk, "chi2 inner track", 20);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Implementation itself, technical details & storage]
The observables of type float or double have precision of 20, while the other variables in JetConstituentTableProducer all use 10. Was there a specific reason to effectively double the precision? Maybe we should stick to one level of precision for jet-constituent-related features.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[For my own education, but maybe to avoid we might be missing something here]
How were the stored Muon variables chosen, related to e.g. https://github.com/cms-btv-pog/RecoBTag-PerformanceMeasurements/blob/12_2_X/interface/JetInfoBranches.h#L344-L369?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what concerns the precision, I tried with 10 but I obtained a segmentation violation error.
About the variables stored, I followed this list according to Soureek instructions https://github.com/cms-btv-pog/RecoBTag-PerformanceMeasurements/blob/10_6_X_UL2016_PreliminaryJECs/python/varGroups_cfi.py#L1738-L1890

Comment on lines 76 to 77
#nameMu = cms.string("JetMuons"),
#idx_nameMu = cms.string("MuIdx"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Style] Places with commented out code which could be removed for clarity
The default name JetMuons and Idx when adding the Muon table to AK4 jets could also be removed (currently commented out).

@AnnikaStein
Copy link
Contributor

Hi,
Happy New Year to you! Just wanted to check, @angzaza if you had any questions regarding the review comments, or how to implement them. @demuller was there something else that I missed so far which should be included in the merge or were there further comments?
Thanks! 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants