Speed up raster import process and solve some issues #82

Closed
ivanprado opened this issue Nov 16, 2017 · 5 comments

@ivanprado

  • Use virtual rasters (VRT) to avoid rewriting the raster several times.
  • Intermediate rasters are now just VRT files, so only one TIFF file is written. The exception is when downsampling (changing the raster type) is needed, because an intermediate file is required there.
  • All transformations are done together in gdaltransform, so compression is much better (if compression is done at the gdalwarp step it fails: the resulting TIFF size explodes). See the sketch after this list.
  • The BIGTIFF format is now always used.
  • TIFFs are saved internally with tiles.
  • More memory is assigned to gdalwarp and the -multi option is used.
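
For reference, a minimal sketch of what this pipeline could look like from Ruby, assuming gdal_translate as the final conversion step that applies the compression and creation options above (file names, the LZW codec and the memory figures are illustrative, not the exact production values):

```ruby
require 'open3'

# Run a shell command and fail loudly if it errors out.
def run!(cmd)
  out, err, status = Open3.capture3(cmd)
  raise "command failed: #{cmd}\n#{err}" unless status.success?
  out
end

input  = 'raster.tif'          # illustrative paths
vrt    = 'raster_warped.vrt'
output = 'raster_final.tif'

# 1. Warp into a lightweight virtual raster instead of a big intermediate
#    TIFF; give gdalwarp more memory and enable multithreading.
run!("gdalwarp -multi -wm 500 --config GDAL_CACHEMAX 500 -of VRT #{input} #{vrt}")

# 2. Write the real file exactly once: compressed, internally tiled, BIGTIFF.
run!("gdal_translate -co COMPRESS=LZW -co TILED=YES -co BIGTIFF=YES #{vrt} #{output}")
```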
@ivanprado ivanprado self-assigned this Nov 16, 2017
@ivanprado

Working code at https://github.com/splashblot/dronedb/tree/tileo-feature/speedup_gdal_import

Deploying into beta to test there.

@ivanprado

These are the proposed changes: 86a13ec

@ivanprado commented Nov 17, 2017

Noticed that the import got stuck with big rasters. Finally found the cause.

The source of the problem was a statement timeout (it seems to be 5 minutes). But there was a second problem, and that was the one that made the process get stuck instead of just failing.
The Open3 library is used to run commands. It has a method that executes a pipe and then reads the output of the pipe, but it was being used in a blocking fashion: if the pipe output is bigger than the pipe buffer, the child process stops and waits for somebody to read the buffer, while Open3 waits for the pipe to finish before consuming the buffer, so we end up with a deadlock (see the sketch below). I've changed the code to avoid piping (just writing intermediate files), and now the statement timeout shows up.
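
To illustrate the deadlock (the command here is a placeholder, not the real import pipe):

```ruby
require 'open3'

# Deadlock-prone pattern: wait for the child before draining its stdout.
# If the child writes more than the OS pipe buffer (~64 KB), it blocks on
# write while we block on wait, so neither side ever finishes.
Open3.popen3('some_command_with_big_output') do |_stdin, stdout, _stderr, wait_thr|
  wait_thr.value   # hangs here for big outputs
  puts stdout.read # never reached
end

# Non-blocking alternative: Open3.capture3 drains stdout/stderr while waiting.
out, err, status = Open3.capture3('some_command_with_big_output')
```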

Remaining tasks:

  • Find a way to increase the statement_timeout, but only for these import processes, covering both the calls to psql and the direct SQL through the Rails SQL driver (see the sketch after this list).
  • Add new tests for the new code. See the file raster2psql_spec.rb.
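
One possible approach for the first task, just a sketch and not the final implementation (db_name, import.sql, big_import_sql and the timeout values are placeholders):

```ruby
# psql call: a per-session override via PGOPTIONS, so only this invocation
# runs without the global statement_timeout.
system({ 'PGOPTIONS' => '-c statement_timeout=0' },
       "psql #{db_name} -f import.sql")

# Rails side: SET LOCAL scopes the override to this transaction only,
# leaving the global setting untouched for everything else.
ActiveRecord::Base.transaction do
  ActiveRecord::Base.connection.execute("SET LOCAL statement_timeout TO '1h'")
  ActiveRecord::Base.connection.execute(big_import_sql)
end
```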

@ivanprado

The code is implemented and in the process of being deployed to production: 860a05c

There is a remaining problem. Diego's TIFF 2017-09-27_rgb_olivo_valdezarza_2_transparent_mosaic_group1.tif (3 GB) is painfully slow. Interestingly, I was able to load a 500 MB crop of that raster pretty quickly, so GDAL is probably not doing its work properly here.

I have noticed that we have a pretty old GDAL version: GDAL 1.11.0, released 2014/04/16. Maybe newer versions have solved this in a better way. As a next step it would be nice to compile a newer version and test with it to see if the problem disappears.
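
A quick diagnostic to confirm the installed version and to see how the problematic TIFF is laid out internally (block size, overviews), which may help explain the slowdown:

```ruby
# Print the GDAL version in use and the internal layout of the slow raster.
puts `gdalinfo --version`
puts `gdalinfo 2017-09-27_rgb_olivo_valdezarza_2_transparent_mosaic_group1.tif`
```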

@ivanprado

On production. The issue with Diego's TIFF still remains, but with a working workaround. We'll keep track of it in issue #88. Closing this one.
