
make test fails for multiple tests #75

Closed
wiederm opened this issue May 18, 2022 · 6 comments
Labels: help wanted (Extra attention is needed)

Comments


wiederm commented May 18, 2022

I have built openmm-torch from source and I can run the Python example shown in the README (roughly the sketch below).
But since I am still having trouble here, I wanted to make sure that openmm-torch behaves as expected, so I tried to run make test.
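
For context, what I ran is along these lines; this is a minimal sketch from memory rather than the exact README code, and the toy ForceModule and three-particle System are only illustrative:

import torch
from openmm import System
from openmmtorch import TorchForce

class ForceModule(torch.nn.Module):
    # Toy potential: the energy is the sum of the squared particle positions.
    def forward(self, positions):
        return torch.sum(positions ** 2)

# Script the model and write it to disk as TorchScript.
module = torch.jit.script(ForceModule())
module.save('model.pt')

# Build a small System and attach the saved model as a TorchForce.
system = System()
for _ in range(3):
    system.addParticle(1.0)
system.addForce(TorchForce('model.pt'))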

(rew) [mwieder@a7srv5 build 💡 ](master)$ make test
Running tests...
Test project /data/shared/software/openmm-torch/build
    Start 1: TestSerializeTorchForce
1/8 Test #1: TestSerializeTorchForce ..........   Passed    0.20 sec
    Start 2: TestReferenceTorchForce
2/8 Test #2: TestReferenceTorchForce ..........***Failed    0.22 sec
    Start 3: TestOpenCLTorchForceSingle
3/8 Test #3: TestOpenCLTorchForceSingle .......***Failed    0.53 sec
    Start 4: TestOpenCLTorchForceMixed
4/8 Test #4: TestOpenCLTorchForceMixed ........***Failed    0.55 sec
    Start 5: TestOpenCLTorchForceDouble
5/8 Test #5: TestOpenCLTorchForceDouble .......***Failed    0.55 sec
    Start 6: TestCudaTorchForceSingle
6/8 Test #6: TestCudaTorchForceSingle .........***Failed    0.40 sec
    Start 7: TestCudaTorchForceMixed
7/8 Test #7: TestCudaTorchForceMixed ..........***Failed    0.37 sec
    Start 8: TestCudaTorchForceDouble
8/8 Test #8: TestCudaTorchForceDouble .........***Failed    0.40 sec

13% tests passed, 7 tests failed out of 8

Total Test time (real) =   3.28 sec

The following tests FAILED:
	  2 - TestReferenceTorchForce (Failed)
	  3 - TestOpenCLTorchForceSingle (Failed)
	  4 - TestOpenCLTorchForceMixed (Failed)
	  5 - TestOpenCLTorchForceDouble (Failed)
	  6 - TestCudaTorchForceSingle (Failed)
	  7 - TestCudaTorchForceMixed (Failed)
	  8 - TestCudaTorchForceDouble (Failed)
Errors while running CTest
Output from these tests are in: /data/shared/software/openmm-torch/build/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.
make: *** [Makefile:71: test] Error 8

A closer look at the log file shows the same error (up to frame #9) for all 7 failing tests:

2/8 Testing: TestReferenceTorchForce
2/8 Test: TestReferenceTorchForce
Command: "/data/shared/software/openmm-torch/build/TestReferenceTorchForce" "single"
Directory: /data/shared/software/openmm-torch/build
"TestReferenceTorchForce" start time: May 18 14:00 CEST
Output:
----------------------------------------------------------
exception: Legacy model format is not supported on mobile.
Exception raised from deserialize at /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1650973827143/work/torch/csrc/jit/serialization/import.cpp:267 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x68 (0x7f692001dc78 in /data/shared/software/python_env/anaconda3/envs/rew/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) + 0xf4 (0x7f691fffbb5b in /data/shared/software/python_env/anaconda3/envs/rew/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #2: <unknown function> + 0x34c0b44 (0x7f694fa24b44 in /data/shared/software/python_env/anaconda3/envs/rew/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #3: torch::jit::load(std::shared_ptr<caffe2::serialize::ReadAdapterInterface>, c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&) + 0x1c6 (0x7f694fa25b36 in /data/shared/software/python_env/anaconda3/envs/rew/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #4: torch::jit::load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&) + 0xc7 (0x7f694fa27517 in /data/shared/software/python_env/anaconda3/envs/rew/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #5: torch::jit::load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<c10::Device>) + 0x6c (0x7f694fa275fc in /data/shared/software/python_env/anaconda3/envs/rew/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #6: TorchPlugin::TorchForceImpl::initialize(OpenMM::ContextImpl&) + 0x58 (0x7f69535889f8 in /data/shared/software/openmm-torch/build/libOpenMMTorch.so)
frame #7: OpenMM::ContextImpl::initialize() + 0x3c5 (0x7f69530ec6a5 in /data/shared/software/python_env/anaconda3/envs/rew/lib/libOpenMM.so.7.7)
frame #8: OpenMM::Context::Context(OpenMM::System const&, OpenMM::Integrator&, OpenMM::Platform&) + 0x7f (0x7f69530e53ff in /data/shared/software/python_env/anaconda3/envs/rew/lib/libOpenMM.so.7.7)
frame #9: /data/shared/software/openmm-torch/build/TestReferenceTorchForce() [0x403220]
frame #10: /data/shared/software/openmm-torch/build/TestReferenceTorchForce() [0x402d5d]
frame #11: __libc_start_main + 0xf3 (0x7f692008fca3 in /lib64/libc.so.6)
frame #12: /data/shared/software/openmm-torch/build/TestReferenceTorchForce() [0x402dfe]

Do you have any idea what is going on here?

@peastman (Member)

exception: Legacy model format is not supported on mobile.

You're running on a mobile device? What hardware and OS are you using?


wiederm commented May 18, 2022 via email

@peastman (Member)

I'm confused then! A search online turned up a few people encountering that error message, but they were all on Android. The "legacy model format" part is easier to understand. The tests try to load models from files. It sounds like PyTorch has changed its format since those files were created. We could regenerate them. But I can't explain why it thinks you're on mobile.
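
For reference, regenerating a model file with the currently installed PyTorch is basically just re-scripting and re-saving it; the class and the forces.pt filename below are placeholders rather than the repository's actual test models:

import torch

class PlaceholderModel(torch.nn.Module):
    # Stand-in for one of the test models; the real test files compute
    # their own specific energies, which are not reproduced here.
    def forward(self, positions):
        return torch.sum(positions ** 2)

# Saving with the installed PyTorch writes the current TorchScript format,
# which the same version's torch::jit::load can read back without the
# "legacy model format" complaint.
torch.jit.script(PlaceholderModel()).save('forces.pt')

# Quick round-trip check in Python.
torch.jit.load('forces.pt')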

raimis added the help wanted label on May 18, 2022

wiederm commented May 19, 2022

After reinstalling with the new OpenMM dev build and rebuilding, all tests pass now!
I have no idea what caused the mobile platform error, but it seems to have vanished. Thanks for the help!

wiederm closed this as completed on May 19, 2022
@peastman (Member)

The world is a mysterious place!


wiederm commented May 20, 2022

Indeed :-)
Just for the record: the tests pass with PyTorch 1.10, but after upgrading to 1.11 they fail with the legacy import issue.
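
In case someone else lands here: a quick way to check which PyTorch the environment provides, and which libtorch CMake should pick up when building the plugin, is something like the following; torch.__version__ and torch.utils.cmake_prefix_path are standard PyTorch attributes, and the point is just to confirm that the plugin, the saved models, and the runtime all come from the same installation:

import torch

# Version of the PyTorch/libtorch in the active environment; the plugin and
# the serialized models should all correspond to this installation.
print(torch.__version__)

# CMake prefix containing this installation's TorchConfig.cmake; pointing
# CMAKE_PREFIX_PATH at it when configuring openmm-torch helps ensure the
# plugin is built against this libtorch rather than a stale one.
print(torch.utils.cmake_prefix_path)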
