# Re: [DKIM] Fw: Why is there a large predictive difference for Univ. Kriging? [SEC=UNCLASSIFIED]

11 messages

## Re: [DKIM] Fw: Why is there a large predictive difference for Univ. Kriging? [SEC=UNCLASSIFIED]

Hi Chris,

The UK used here is usually called kriging with an external drift (KED). It is, in fact, a linear model plus kriging, which assumes a linear relationship that usually does not hold. It has been tested in several studies and was outperformed by machine learning methods such as RF, RFOK and RFIDW. I have released an R package, spm, to introduce these methods. It is easy to use, as demonstrated in vignette('spm').

Hope this helps.

Regards,
Jin

-----Original Message-----
From: R-sig-Geo [mailto:[hidden email]] On Behalf Of Joelle k. Akram
Sent: Wednesday, 22 November 2017 11:08 AM
To: [hidden email]
Subject: [DKIM] [R-sig-Geo] Fw: Why is there a large predictive difference for Univ. Kriging?

I am using the Meuse dataset for universal kriging (UK) via the gstat library in R. I am following a strategy used in machine learning where the data is partitioned into a Train set and a Hold Out set. The Train set is used for defining the regression model and the semivariogram. I employ UK to predict on both the Train sample set and the Hold Out sample set. However, the mean absolute errors (MAE) between the predictions of the response variable (i.e., zinc for the Meuse dataset) and the actual values are very different. I would expect them to be similar, or at least closer. So far I have MAE_training_set = 1 and MAE_holdOut_set = 76.5. My code is below and advice is welcome.

```r
library(sp)
library(gstat)

data(meuse)
dataset = meuse
set.seed(999)

# Split Meuse Dataset into Training and HoldOut Sample datasets
Training_ids <- sample(seq_len(nrow(dataset)), size = (0.7 * nrow(dataset)))
Training_sample = dataset[Training_ids, ]
Holdout_sample_allvars = dataset[-Training_ids, ]
holdoutvars_df <- (dataset[, which(names(dataset) %in% c("x", "y", "lead", "copper", "elev", "dist"))])
Hold_out_sample = holdoutvars_df[-Training_ids, ]
coordinates(Training_sample) <- c('x', 'y')
coordinates(Hold_out_sample) <- c('x', 'y')

# Semivariogram modeling
m1 <- variogram(log(zinc) ~ lead + copper + elev + dist, Training_sample)
m <- vgm("Exp")
m <- fit.variogram(m1, m)

# Apply Univ Krig to Training dataset
prediction_training_data <- krige(log(zinc) ~ lead + copper + elev + dist, Training_sample, Training_sample, model = m)
prediction_training_data <- expm1(prediction_training_data$var1.pred)

# Apply Univ Krig to Hold Out dataset
prediction_holdout_data <- krige(log(zinc) ~ lead + copper + elev + dist, Training_sample, Hold_out_sample, model = m)
prediction_holdout_data <- expm1(prediction_holdout_data$var1.pred)

# Computing Predictive errors for Training and Hold Out samples respectively
training_prediction_error_term <- Training_sample$zinc - prediction_training_data
holdout_prediction_error_term <- Holdout_sample_allvars$zinc - prediction_holdout_data

# Function that returns Mean Absolute Error
mae <- function(error) {
  mean(abs(error))
}

# Mean Absolute Error metric:
# UK Predictive errors for Training sample set, and UK Predictive Errors for HoldOut sample set
print(mae(training_prediction_error_term))  # Error for Training sample set
print(mae(holdout_prediction_error_term))   # Error for Hold out sample set
```

cheers,
Kristopher (Chris)

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Geoscience Australia Disclaimer: This e-mail (and files transmitted with it) is intended only for the person or entity to which it is addressed. If you are not the intended recipient, then you have received this e-mail by mistake and any use, dissemination, forwarding, printing or copying of this e-mail and its file attachments is prohibited. The security of emails transmitted cannot be guaranteed; by forwarding or replying to this email, you acknowledge and accept these risks.
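A possible reading of the training MAE of exactly 1, offered as a sketch and not something confirmed in the thread: kriging reproduces the observed values at the locations it was fitted on, so predicting on the training set returns log(zinc) itself. The posted code then back-transforms with `expm1()`, which computes exp(x) - 1 and is the inverse of `log1p()`, not of `log()`. Every training prediction would therefore be off by exactly 1. The base R snippet below illustrates only that arithmetic; the zinc values are made up for illustration and `pred_log` stands in for the kriging output at training locations.

```r
# Illustrative zinc concentrations (hypothetical values, not the Meuse data)
zinc <- c(120, 350, 640, 1100)

# Assumption: an exact interpolator returns the observed log value
# at every training location, so the "prediction" is just log(zinc)
pred_log <- log(zinc)

# Back-transform as in the posted code: expm1(x) is exp(x) - 1
pred_expm1 <- expm1(pred_log)

# Back-transform with the actual inverse of log()
pred_exp <- exp(pred_log)

# Same MAE helper as in the posted code
mae <- function(error) {
  mean(abs(error))
}

mae_expm1 <- mae(zinc - pred_expm1)  # offset of 1 introduced by expm1()
mae_exp <- mae(zinc - pred_exp)      # essentially zero, up to rounding

print(mae_expm1)
print(mae_exp)
```

If this reading is right, the training-set figure says nothing about predictive skill, and only the hold-out MAE is an out-of-sample estimate. A like-for-like check on the training data would need cross-validation, for example with gstat's `krige.cv()`, instead of predicting at the fitted locations.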

## Re: [DKIM] Fw: Why is there a large predictive difference for Univ. Kriging? [SEC=UNCLASSIFIED]

Hi Jin,

Thank you for sharing. I was reading your paper, "Application of machine learning methods to spatial interpolation of environmental variables", on which the spm package is based. In Table 1 of the paper you compare many algorithms. I am interested in assessing RKglm, RKgls and RKlm. Are these available in spm?

Thanks,
Chris

________________________________
From: Li Jin <[hidden email]>
Sent: November 21, 2017 5:33 PM
To: Joelle k. Akram; [hidden email]
Subject: RE: [DKIM] [R-sig-Geo] Fw: Why is there a large predictive difference for Univ. Kriging? [SEC=UNCLASSIFIED]

Hi Chris,

The UK used here is usually called kriging with an external drift (KED). It is, in fact, a linear model plus kriging, which assumes a linear relationship that usually does not hold. It has been tested in several studies and was outperformed by machine learning methods such as RF, RFOK and RFIDW. I have released an R package, spm, to introduce these methods. It is easy to use, as demonstrated in vignette('spm').

Hope this helps.

Regards,
Jin