Raster math on a stack of large rasters


Raster math on a stack of large rasters

Dave Gregovich
Hi,
I am performing a math operation on a stack of large rasters. The code below uses smaller files for illustration and reproducibility.
Any alternative way of performing this task that does not create huge temporary files, and perhaps cuts down on processing time, would be greatly appreciated. The calc() process creates a couple of huge temporary raster files: the first reaches 142 GB, and I don't have the hard drive space for it plus the second one that begins writing during the process.
Thanks kindly for any advice!
Dave.

#create raster stack and coefficients...
library(raster)
mod.coefs <- rnorm(10)
s <- stack()
r <- raster(nrow = 100, ncol = 100)
#actual rasters I am working with are about 40,000 pixels square, with each GeoTiff raster in the stack taking about 2.5 GB on disk

for(i in 1:10){
  r[] <- rnorm(10000)
  s <- addLayer(s, r)
}

#attempt to perform raster math...
out.file <- 'C:/out.rast.tif'
out.rast <- calc(s, function(x){exp(sum(mod.coefs * x))}, filename = out.file)

#at this point, the temporary files in the \AppData\Local\Temp\RtmpqQYzfS\raster folder eventually become quite large,
#with one .GRI file reaching 142 GB and another growing to 8 GB before I ended the process.
#the file named in out.file had not been created at that point.
#_____________end____________________________________________________________________


_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Re: Raster math on a stack of large rasters

Dave Gregovich
I apologize for not including my sessionInfo() previously:
> sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] raster_2.5-8 sp_1.2-4

loaded via a namespace (and not attached):
[1] rgdal_1.2-5     tools_3.3.3     Rcpp_0.12.9     grid_3.3.3      lattice_0.20-34

Re: Raster math on a stack of large rasters

Benjamin Leutner
In reply to this post by Dave Gregovich
You could stick to the native raster format (grd), in which case calc()
writes to your final file directly.
Of course that doesn't change the file size issue, but it saves you the
translation step to GeoTiff.

For the file size you could consider restricting the datatype to integer
(see ?dataType). If I have reflectance data in [0, 1], for example, I
scale it by a factor of 10000 and then save as "INT4U" or "INT2U"
(depending on your maximum expected value range), or "INT4S" or "INT2S"
if you have negative values. That can bring down the file size quite a
bit while retaining most of the relevant precision (beware: the
remaining decimal places will be cut off), e.g.

calc(..., function(x) { yourcalculations(x) * 10000 }, datatype = "INT4S")

Pro tip: the argument in calc() (or more precisely in writeRaster()) is
called datatype, not to be confused with the stand-alone function
dataType() with a capital T. That one has bitten me many times, because
due to the "..." argument there is no warning if you mistype it ;-)
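A minimal runnable sketch of the scaling idea with toy data (the factor
of 10000 and the INT2S type are just the example values from above, not
a recommendation for your particular value range):

```r
library(raster)

# toy raster with values in [0, 1], e.g. reflectance
r <- raster(nrow = 100, ncol = 100)
r[] <- runif(ncell(r))

# scale to integers before writing; INT2S stores each cell in 2 bytes
# instead of the 4 or 8 bytes of a float type, at the cost of any
# precision beyond the scaling factor
f <- tempfile(fileext = ".grd")
out <- calc(r, function(x) round(x * 10000), filename = f, datatype = "INT2S")
dataType(out)  # should report "INT2S"

# divide by the scaling factor to recover the original scale when reading
head(values(out) / 10000)
```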




--
Benjamin Leutner M.Sc.
PhD candidate

Department of Remote Sensing
University of Wuerzburg
Campus Hubland Nord 86
97074 Wuerzburg, Germany

Tel: +49-(0)931-31 89594
Email: [hidden email]
Web: http://www.fernerkundung.uni-wuerzburg.de


Re: Raster math on a stack of large rasters

Bacou, Melanie
Also take a look at using raster::clusterR() in combination with calc()
to use multiple cores. This works well if your calculation function
does not require neighboring cells.
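A minimal sketch of that approach, using toy stand-ins for the stack
and coefficients from the original post (the cluster size of 4 is an
arbitrary example):

```r
library(raster)

# toy stand-ins for the real data (shapes from the original post)
mod.coefs <- rnorm(10)
s <- stack(lapply(1:10, function(i) raster(matrix(rnorm(10000), 100, 100))))

beginCluster(4)  # start 4 worker processes (adjust to your machine)
# clusterR() splits the raster into chunks and runs calc() on the workers;
# 'args' passes the per-cell function through to calc(), and 'export'
# ships the coefficient vector to each worker
out.rast <- clusterR(s, calc,
                     args = list(fun = function(x) exp(sum(mod.coefs * x))),
                     export = "mod.coefs")
endCluster()
```

As with plain calc(), a filename and datatype can be passed to
clusterR() to write the result straight to disk.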

--Mel.




Re: Raster math on a stack of large rasters

Christoph Raab
In reply to this post by Dave Gregovich
You can try to run the calc process on tiles in parallel (pbapply
package) and delete all files in the raster package's tempdir
after each iteration. At the end you mosaic the tiles back together.


library(rgdal)
library(raster)
library(parallel)  # for detectCores()
library(pbapply)
library(GSIF)

#set the tempdir for the raster package;
#it can be cleared after each tile is processed
tempdir <- 'path_to_your_desired_tempdir'
rasterOptions(tmpdir = tempdir)

#detect the number of cores
#you should adjust this according to your memory
n.cores <- detectCores()

out.file <- 'path_to_result_folder_where_the_tiles_will_be_saved'
fn <- "path_to_your_in_raster.tif"
obj <- GDALinfo(fn)

#create tiles of e.g. 2000 x 2000 pixels with no overlap
#you can adjust the block size in the x and y directions
ras.lst <- GSIF::getSpatialTiles(obj, block.x = 2000, overlap.percent = 0)

#use pblapply in order to run the processing in parallel
out.list <- pblapply(1:nrow(ras.lst), function(i){
  #credit goes to stackoverflow or so, but I couldn't find the source I
  #got it from anymore
  offset <- c(ras.lst$offset.y[i], ras.lst$offset.x[i])
  region.dim <- c(ras.lst$region.dim.y[i], ras.lst$region.dim.x[i])
  ## read a tile and convert it to a raster brick
  s <- brick(readGDAL(fn, offset = offset, region.dim = region.dim))

  #perform calc on the tile
  out.rast <- calc(s, function(x){exp(sum(mod.coefs * x))},
                   filename = paste0(out.file, toString(i), ".tif"))
  #clear the tempdir (careful: other workers may still be using it)
  do.call(file.remove, list(list.files(tempdir, full.names = TRUE)))
  #return the tile so that mosaic() can use it below
  out.rast
}, cl = n.cores)

#settings for the mosaic process
out.list$fun <- mean
out.list$filename <- 'final_result.tif'
#mosaic the tiles in the list returned by pblapply
mos <- do.call(mosaic, out.list)


