This repository has been archived by the owner on Jul 22, 2024. It is now read-only.

powerai:1.5.4 Tensorflow #274

Open
ghost opened this issue Feb 2, 2021 · 5 comments

Comments

@ghost

ghost commented Feb 2, 2021

Hi,

I am trying to use ibmcom/powerai:1.5.4-all-ubuntu18.04-py3 since it's listed as having TF 1.12.0, which is required for code I am trying to run.

I've also tried source /opt/DL/tensorflow/bin/install_dependencies followed by source /opt/DL/tensorflow/bin/tensorflow-activate, as recommended in a similar issue, but import tensorflow still fails with ImportError: libcuda.so.1: cannot open shared object file: No such file or directory

The architecture is ppc64le.
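
A quick way to confirm whether the problem is the loader not finding the driver library, rather than something specific to TensorFlow, is to try opening libcuda.so.1 directly. The following is a minimal sketch, assuming Python 3 is available inside the container; the helper name can_load is made up for illustration and is not part of PowerAI or TensorFlow.

```python
import ctypes


def can_load(library_name: str) -> bool:
    """Return True if the dynamic loader can find and open the library."""
    try:
        ctypes.CDLL(library_name)
        return True
    except OSError:
        # Raised when the shared object cannot be found or opened,
        # which is the same condition behind the ImportError above.
        return False


if __name__ == "__main__":
    # Library name taken from the ImportError message in this comment.
    name = "libcuda.so.1"
    status = "found" if can_load(name) else "NOT found"
    print(f"{name}: {status}")
```

If this reports the library as not found inside the container but found on the host, that would point at the container's GPU setup rather than at the TensorFlow install.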

@hartb
Member

hartb commented Feb 2, 2021

I think we weren't shipping CUDA as part of PowerAI back in the 1.5.4 timeframe, so it needed to be downloaded directly from NVIDIA and installed separately as a prerequisite to PowerAI itself.

There's some info in the System Setup section of the 1.5.4 docs: https://www.ibm.com/support/knowledgecenter/SS5SF7_1.5.4/navigation/pai_systemsetup.html

Note that each version of PowerAI was built against a specific CUDA version, and for 1.5.4 that was CUDA 10.0. That may limit the RHEL versions you could install on. The GPU driver version is a bit more flexible: generally, newer driver versions should support older CUDA versions. But you'd still need a RHEL OS, CUDA, and driver version that are all supported together.

@jayfurmanek
Contributor

Right. Can you share your CUDA installation details?
Do you have nvidia-docker set up properly? (Does nvidia-smi work from inside the container?)

@ghost
Author

ghost commented Feb 4, 2021

I am working on a provided machine so I have very restricted permissions, i.e. no installation.

OS is Ubuntu 16.04, CUDA is at 9.2:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Tue_Jun_12_23:08:37_CDT_2018
Cuda compilation tools, release 9.2, V9.2.148

nvidia-docker is also installed but I am not sure about the setup:

$ nvidia-docker --version
Docker version 18.03.1-ce, build 9ee9f40

nvidia-smi works outside the container, but not inside.
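
To compare the nvcc output above with the CUDA 10.0 requirement mentioned earlier in the thread, the release number can be pulled out of the text and compared as a tuple. This is only a sketch: the function name parse_nvcc_release and the constant REQUIRED_CUDA are illustrative, and whether the host toolkit version matters at all depends on how CUDA is provided inside the container.

```python
import re
from typing import Optional, Tuple

# CUDA version PowerAI 1.5.4 was built against, per the earlier comment.
REQUIRED_CUDA = (10, 0)


def parse_nvcc_release(output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from the 'release X.Y' part of nvcc output."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    # Last line of the nvcc --version output pasted above.
    sample = "Cuda compilation tools, release 9.2, V9.2.148"
    installed = parse_nvcc_release(sample)
    print("installed:", installed)
    print("matches required:", installed == REQUIRED_CUDA)
```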

@jayfurmanek
Contributor

OK, it sounds like your nvidia-docker setup is not working.
What is the output of docker info?

@ghost
Author

ghost commented Feb 5, 2021

$ docker info
Containers: 4
 Running: 2
 Paused: 0
 Stopped: 2
Images: 2423
Server Version: 18.03.1-ce
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 1669
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: nvidia runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 773c489c9c1b21a6d78b5c538cd395416ec50f88
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-193-generic
Operating System: Ubuntu 16.04.6 LTS
OSType: linux
Architecture: ppc64le
CPUs: 152
Total Memory: 510.9GiB
Name: 
ID: 
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support
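
The two lines of this output most relevant to the GPU question are Runtimes and Default Runtime: the nvidia runtime is registered, but the default is runc. A small sketch for pulling those two fields out of saved docker info text follows; the function name parse_docker_runtimes is hypothetical, and the parsing assumes the plain "Key: value" layout shown above.

```python
from typing import Dict, List, Optional, Union


def parse_docker_runtimes(info_text: str) -> Dict[str, Union[List[str], Optional[str]]]:
    """Return the registered runtimes and the default runtime from docker info text."""
    runtimes: List[str] = []
    default: Optional[str] = None
    for line in info_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value"
        key = key.strip()
        if key == "Runtimes":
            runtimes = value.split()
        elif key == "Default Runtime":
            default = value.strip()
    return {"runtimes": runtimes, "default": default}


if __name__ == "__main__":
    # Two lines copied from the docker info output above.
    sample = "Runtimes: nvidia runc\nDefault Runtime: runc\n"
    result = parse_docker_runtimes(sample)
    print(result)
    if "nvidia" in result["runtimes"] and result["default"] != "nvidia":
        print("nvidia runtime is registered but is not the default")
```

If the container was started with plain docker run, it would have used runc and would not see the GPU driver libraries. Starting it through the nvidia-docker wrapper or with docker run --runtime=nvidia may be worth trying; this is a guess based on the output, not something confirmed in the thread.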
