
ResourceExhaustedError #85

Open · sayami888 opened this issue Jan 16, 2021 · 0 comments

@sayami888
I tried to train with my own dataset, but the run fails with an out-of-memory error. I confirmed that training works normally with the KITTI data.

The images in my dataset are 1024x576. The GPU is a GeForce GTX 1080 Ti with 9.92 GB of free memory. The training hkl file is 4.7 GB and the evaluation data is 1.4 GB.

A warning message is displayed as soon as training begins:
Epoch 1/150
2021-01-16 16:11:36.539265: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 486.00MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.

The final error message is:
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[4,96,288,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: pred_net_1/while/convolution_19 = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](pred_net_1/while/concat_5, pred_net_1/while/convolution_19/Enter)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

 [[Node: loss/mul/_577 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_7402_loss/mul", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
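Following the hint in the error, I think I can pass report_tensor_allocations_upon_oom through RunOptions. Below is a minimal sketch of how I understand this would look, assuming Keras with the TensorFlow 1.x backend (where extra keyword arguments to compile() are forwarded to Session.run()); the Sequential model here is only a stand-in for the PredNet training model built in kitti_train.py:

```python
import numpy as np
import tensorflow as tf
from keras.models import Sequential
from keras.layers import Conv2D

# Ask TensorFlow to dump the list of live tensor allocations whenever an OOM occurs.
run_opts = tf.RunOptions(report_tensor_allocations_upon_oom=True)

# Stand-in model; in practice this would be the PredNet training model from kitti_train.py.
model = Sequential([Conv2D(8, 3, padding='same', input_shape=(576, 1024, 3))])

# With the Keras TF 1.x backend, extra keyword arguments given to compile()
# are forwarded to Session.run(), so run_opts applies to every training step.
model.compile(loss='mean_absolute_error', optimizer='adam', options=run_opts)

# Dummy batch just to trigger one training step and exercise the option.
x = np.zeros((1, 576, 1024, 3), dtype='float32')
y = np.zeros((1, 576, 1024, 8), dtype='float32')
model.train_on_batch(x, y)
```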

The images were larger originally, and I have already reduced their size several times. I also reduced the number of images, but the error persists. Is my GPU memory simply insufficient? Do I have to reduce the image size even further?
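In case it helps, here is a sketch of the two things I plan to try next: letting TensorFlow allocate GPU memory on demand instead of pre-allocating it, and shrinking the batch size (or sequence length). This assumes the Keras + TF 1.x setup this repo uses; batch_size and nt refer to the variables in kitti_train.py:

```python
import tensorflow as tf
from keras import backend as K

# (a) Allocate GPU memory on demand instead of grabbing it all up front.
#     This does not reduce the model's peak memory requirement, but it can
#     avoid fragmentation-related failures and gives clearer OOM reports.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
K.set_session(tf.Session(config=config))

# (b) Activation memory grows roughly with batch_size * nt * H * W, so
#     halving the batch size (or the sequence length nt, or each image
#     dimension) should substantially lower the per-step footprint.
batch_size = 2   # was 4 in kitti_train.py
nt = 10          # timesteps per sequence; reducing this would also help
```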

If there is a solution, please let me know.
