Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Charges are missing #42

Open
raimis opened this issue Aug 24, 2022 · 8 comments
Open

Charges are missing #42

raimis opened this issue Aug 24, 2022 · 8 comments
Labels
enhancement New feature or request

Comments

@raimis
Copy link

raimis commented Aug 24, 2022

The dataset file (https://github.com/openmm/spice-dataset/releases/download/1.0/SPICE.hdf5) doesn't contain the total molecular charge. This could be extracted parsing the SMILES, but it is inconvenient and adds additional burden on the users.

The dataset should provide the complete QM description of a molecule (i.e. elements, positions, charge, and spin state) in a convenient form. The downloader should be modified to add a field with the total charge (and maybe formal charges) for each molecule.

Also, I would suggest including Mulliken charges (Psi4 computes them by default). They could be used to filter "broken" molecules. From my recent experience, the large forces aren't enough to catch them all.

@peastman
Copy link
Member

Also, I would suggest including Mulliken charges (Psi4 computes them by default).

I don't think it's possible to add them without recomputing the whole dataset from scratch. Perhaps @pavankum or @dotsdl can confirm that?

@pavankum
Copy link
Collaborator

pavankum commented Aug 24, 2022

Yeah, I agree with @peastman that we have to recompute from scratch. Although, MBIS charges are available on almost all SPICE sets (except the DES370K supplement). Even the psi4 stdout on the QCA records don't have Mulliken charges printed out since they're calculated post SCF (afaik) and we didn't specify in our inputs to calculate those.

@raimis
Copy link
Author

raimis commented Sep 2, 2022

Let's go back to the main issue: how to get the molecular and optionally partial charges. I want to make the SPICE loader in TorchMD-NET (https://github.com/torchmd/torchmd-net/blob/main/torchmdnet/datasets/spice.py) to be able load them.

@peastman
Copy link
Member

peastman commented Sep 2, 2022

Having the downloader create a formal_charges field would be reasonable. You can retrieve them like this:

mol = Chem.MolFromSmiles(smiles)
charges = [atom.GetFormalCharge() for atom in mol.GetAtoms()]

If you want MBIS partial charges, you can already store those by including the 'MBIS CHARGES' option in the config file.

@jchodera
Copy link
Member

jchodera commented Sep 2, 2022

The dataset should provide the complete QM description of a molecule (i.e. elements, positions, charge, and spin state) in a convenient form.

@bennybp: Any chance you have thought about how to represent this information in a common way in your HDF5 files built for machine learning?

@jchodera
Copy link
Member

jchodera commented Sep 3, 2022

Also, I would suggest including Mulliken charges (Psi4 computes them by default). They could be used to filter "broken" molecules. From my recent experience, the large forces aren't enough to catch them all.

I just wanted to point you to this PR that shows how to identify molecules that have changed connectivity as an alternative to using charges to filter "broken" molecules.

@davkovacs
Copy link

I would like to extend the SPICE dataset, and am trying to reproduce some of the QM calculations to ensure my DFT settings are correct.

I would really support adding to the dataset the total charge and spin multiplicity of the molecules at the very least to improve reproducibility of the DFT calculations.

@jokpreiksa
Copy link

[davkovacs], you can easily extract this information from smiles using rdkit:

FOR MULTIPLICITY:

def GetSpinMultiplicity(Mol, CheckMolProp = True):
Name = 'SpinMultiplicity'
if (CheckMolProp and Mol.HasProp(Name)):
return int(float(Mol.GetProp(Name)))

# Calculate spin multiplicity using Hund's rule of maximum multiplicity...
NumRadicalElectrons = 0
for Atom in Mol.GetAtoms():
   NumRadicalElectrons += Atom.GetNumRadicalElectrons()

TotalElectronicSpin = NumRadicalElectrons/2
SpinMultiplicity = 2 * TotalElectronicSpin + 1

return int(SpinMultiplicity)

FOR CHARGE:

charge = Chem.GetFormalCharge(molecule)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

6 participants