Optimized rasterOptions() for a (virtually) infinite RAM machine


R-sig-geo mailing list
Dear all,
I am using the raster package to process a total of 32 daily climate files supplied as netcdf files. Each file is a raster brick with 100 rows x 95 cols x 54750 time slices and weighs about 1.5 GB.
Essentially, all the processing I am performing on each netcdf file is:
a) to subset a specific date range
b) to extract values using points
After that, I just convert the extracted data to data.tables and keep working in that format.
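A minimal sketch of that workflow (the file name, variable name, dates, and the `pts` points object are hypothetical placeholders):

```r
library(raster)
library(data.table)

# Open one netCDF file as a brick (varname is hypothetical)
b <- brick("/dev/shm/climate_01.nc", varname = "tas")

# a) subset a specific date range via the z (time) dimension
idx <- which(getZ(b) >= as.Date("1990-01-01") &
             getZ(b) <= as.Date("1990-12-31"))
b_sub <- subset(b, idx)

# b) extract values at ~450 points (pts: a SpatialPoints object)
vals <- extract(b_sub, pts)

# then convert to data.table and keep working in that format
dt <- as.data.table(vals)
```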
Since I extract data for about 450 points, and append all the data in a huge data.table, I need to use a computer with as much RAM as possible.
I ended up using a spot instance on Amazon EC2. Using an instance with 32 cores and 244GB of RAM will cost me around $0.30/hour.
Since I will be charged per hour, I need to optimize my code to get my results as fast as possible.
I don't even copy my data to the instance's hard disk; I send the files directly to the RAM disk (/dev/shm). Even using 48GB of RAM disk to store the files, I'll still have 196GB of RAM.
Under the scenario of having virtually infinite RAM, what would be the best rasterOptions() to make sure I am processing all my rasters in memory? Any other tips to benefit from such a large amount of RAM?
Thanks,
-- Thiago V. dos Santos
Postdoctoral Research Fellow
Department of Climate and Space Science and Engineering
University of Michigan

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Re: Optimized rasterOptions() for a (virtually) infinite RAM machine

Michael Sumner-2
See Noam's post here for good advice; avoiding temp files is very
important in your case:

https://discuss.ropensci.org/t/how-to-avoid-space-hogging-raster-tempfiles/864
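In that spirit, here is one hedged sketch of settings for a machine with ~200 GB free. Note the units of maxmemory and chunksize have changed between raster releases (cells in older versions, bytes in newer ones), so check ?rasterOptions for your installed version; the values below are illustrative, not prescriptive:

```r
library(raster)

# Keep everything in RAM and, if raster must spill, spill to the RAM disk.
rasterOptions(
  maxmemory = 1e10,       # raise the in-memory threshold far above brick size
  chunksize = 1e9,        # bigger chunks if raster does iterate
  todisk    = FALSE,      # don't force intermediate results to disk
  tmpdir    = "/dev/shm"  # any unavoidable temp file stays in RAM
)

# Sanity check that a given brick b will be processed in memory:
# canProcessInMemory(b, n = 4)   # should be TRUE once the options are set
```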

For using data frames, raster's cell index abstraction is super powerful and
sadly underused; see tabularaster for some easy approaches. Don't store
coordinates explicitly, for example, at least not until you are ready to
plot with ggplot2.
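For example, a sketch of working with cell indices instead of coordinates (assuming `b` is a brick and `pts` a SpatialPoints object, as in the original post):

```r
library(raster)

cells <- cellFromXY(b, pts)   # one integer cell index per point
vals  <- extract(b, cells)    # extract by cell: a matrix, points x layers

# Recover coordinates only at plotting time:
xy <- xyFromCell(b, cells)
```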

Finally, raster is generally great with NetCDF if you let it control the
task, but different situations and file setups can really matter, so feel
free to provide details if things aren't working well. Generally, using
raster can easily match the best you can achieve with the NetCDF API, but
lots of specifics can bite. Raster is generally not able to efficiently
crop space and time together, for example, but functions mapped to slice
extraction can be used to hone performance.
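A hedged sketch of that idea: crop the time dimension first, then extract, either over the whole subset or mapped slice by slice (the date range is hypothetical, and z-values are assumed to have been set, e.g. via setZ()):

```r
library(raster)

idx   <- which(getZ(b) >= as.Date("1995-01-01") &
               getZ(b) <= as.Date("1999-12-31"))
b_sub <- subset(b, idx)   # crop the time dimension first

# extract() over the whole subset at once...
vals <- extract(b_sub, pts)

# ...or mapped over individual slices, which can be tuned or parallelised:
vals_by_slice <- sapply(idx, function(i) extract(raster(b, layer = i), pts))
```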

Cheers, Mike

On Sat, 23 Sep 2017, 14:02 Thiago V. dos Santos via R-sig-Geo <
[hidden email]> wrote:


--
Dr. Michael Sumner
Software and Database Engineer
Australian Antarctic Division
203 Channel Highway
Kingston Tasmania 7050 Australia

