Re: [DKIM] Fw: Why is there a large predictive difference forUniv. Kriging? [SEC=UNCLASSIFIED]


Re: [DKIM] Fw: Why is there a large predictive difference forUniv. Kriging? [SEC=UNCLASSIFIED]

Li Jin
Hi Chris,
The UK used here is usually called kriging with an external drift (KED). It is, in fact, a linear model plus kriging, which assumes a linear relationship, and that assumption is usually not true. It has been tested in several studies and was outperformed by machine learning methods such as RF, RFOK and RFIDW. I have released an R package, spm, that introduces these methods. It is easy to use, as demonstrated in vignette('spm').
Hope this helps.
Regards,
Jin
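To make the "linear model plus kriging" description concrete, here is a minimal regression-kriging sketch using sp and gstat (both assumed to be installed): an ordinary least-squares trend fitted with lm(), followed by ordinary kriging of its residuals with an exponential variogram. It is an illustrative two-step approximation, not the method in spm and not identical to gstat's one-step KED used in Chris's code (which estimates the trend by generalised least squares), so its numbers will differ somewhat. The split and covariates mirror Chris's code; object names such as trend and res_krig are hypothetical.

```r
library(sp)
library(gstat)

data(meuse)
set.seed(999)
# 70/30 split, as in the question
train_ids <- sample(seq_len(nrow(meuse)), size = floor(0.7 * nrow(meuse)))
train <- meuse[train_ids, ]
holdout <- meuse[-train_ids, ]
coordinates(train) <- ~x + y
coordinates(holdout) <- ~x + y

# Step 1: linear trend (the "linear model" part); coef() gives the betas
trend <- lm(log(zinc) ~ lead + copper + elev + dist, data = as.data.frame(train))
print(coef(trend))
train$res <- residuals(trend)

# Step 2: ordinary kriging of the trend residuals (the "kriging" part)
res_vgm <- fit.variogram(variogram(res ~ 1, train), vgm("Exp"))
res_krig <- krige(res ~ 1, train, holdout, model = res_vgm)

# Prediction on the log scale = trend + kriged residual; back-transform with exp()
pred_log <- predict(trend, newdata = as.data.frame(holdout)) + res_krig$var1.pred
pred_holdout <- exp(pred_log)

mae_holdout <- mean(abs(holdout$zinc - pred_holdout))
print(mae_holdout)
```

Because the trend is an ordinary lm() fit, coef(trend) returns one beta per covariate, which is one way to keep the regression part transparent.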

-----Original Message-----
From: R-sig-Geo [mailto:[hidden email]] On Behalf Of Joelle k. Akram
Sent: Wednesday, 22 November 2017 11:08 AM
To: [hidden email]
Subject: [DKIM] [R-sig-Geo] Fw: Why is there a large predictive difference forUniv. Kriging?




Stack Overflow question: https://stackoverflow.com/questions/47424740/why-is-predictive-error-large-for-universal-kriging


I am using the Meuse dataset for universal kriging (UK) via the gstat library in R. I am following a strategy used in machine learning, where the data are partitioned into a training set and a hold-out set. The training set is used to define the regression model and the semivariogram.

I use UK to predict on both the training set and the hold-out set. However, the mean absolute errors (MAE) between the predicted and actual values of the response variable (zinc, for the Meuse dataset) are very different for the two sets. I would expect them to be similar, or at least closer. So far I have MAE_training_set = 1 and MAE_holdOut_set = 76.5. My code is below; advice is welcome.

library(sp)
library(gstat)
data(meuse)
dataset <- meuse
set.seed(999)

# Split the Meuse dataset into training (70%) and hold-out samples
Training_ids <- sample(seq_len(nrow(dataset)), size = floor(0.7 * nrow(dataset)))

Training_sample <- dataset[Training_ids, ]
Holdout_sample_allvars <- dataset[-Training_ids, ]

# Hold-out locations with coordinates and covariates only (no zinc)
holdoutvars_df <- dataset[, names(dataset) %in% c("x", "y", "lead", "copper", "elev", "dist")]
Hold_out_sample <- holdoutvars_df[-Training_ids, ]

coordinates(Training_sample) <- c("x", "y")
coordinates(Hold_out_sample) <- c("x", "y")

# Semivariogram modelling of the residuals of log(zinc) from the linear drift
m1 <- variogram(log(zinc) ~ lead + copper + elev + dist, Training_sample)
m <- vgm("Exp")
m <- fit.variogram(m1, m)

# Apply universal kriging to the training dataset
prediction_training_data <- krige(log(zinc) ~ lead + copper + elev + dist, Training_sample, Training_sample, model = m)
# Note: expm1(x) = exp(x) - 1 is the inverse of log1p(), not log(); exp() inverts log()
prediction_training_data <- expm1(prediction_training_data$var1.pred)

# Apply universal kriging to the hold-out dataset
prediction_holdout_data <- krige(log(zinc) ~ lead + copper + elev + dist, Training_sample, Hold_out_sample, model = m)
prediction_holdout_data <- expm1(prediction_holdout_data$var1.pred)

# Prediction errors for the training and hold-out samples
training_prediction_error_term <- Training_sample$zinc - prediction_training_data
holdout_prediction_error_term <- Holdout_sample_allvars$zinc - prediction_holdout_data

# Function that returns the mean absolute error
mae <- function(error) {
  mean(abs(error))
}

# Mean absolute error of the UK predictions for the training and hold-out sample sets
print(mae(training_prediction_error_term)) # Error for training sample set
print(mae(holdout_prediction_error_term))  # Error for hold-out sample set


cheers,

Kristopher (Chris)


_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Geoscience Australia Disclaimer: This e-mail (and files transmitted with it) is intended only for the person or entity to which it is addressed. If you are not the intended recipient, then you have received this e-mail by mistake and any use, dissemination, forwarding, printing or copying of this e-mail and its file attachments is prohibited. The security of emails transmitted cannot be guaranteed; by forwarding or replying to this email, you acknowledge and accept these risks.


Re: [DKIM] Fw: Why is there a large predictive difference forUniv. Kriging? [SEC=UNCLASSIFIED]

Joelle k. Akram
Hi Jin,


Thank you for sharing. I have been reading your paper "Application of machine learning methods to spatial interpolation of environmental variables", on which the spm package is based.


In Table 1 of the paper you compare many algorithms. I am interested in assessing RKglm, RKgls and RKlm. Are these available in spm?


Thanks

Chris


________________________________
From: Li Jin <[hidden email]>
Sent: November 21, 2017 5:33 PM
To: Joelle k. Akram; [hidden email]
Subject: RE: [DKIM] [R-sig-Geo] Fw: Why is there a large predictive difference forUniv. Kriging? [SEC=UNCLASSIFIED]


Re: [DKIM] Fw: Why is there a large predictive difference forUniv. Kriging? [SEC=UNCLASSIFIED]

Li Jin
In reply to this post by Li Jin
They are not yet available in spm.

From: Joelle k. Akram [mailto:[hidden email]]
Sent: Wednesday, 22 November 2017 11:56 AM
To: Li Jin; [hidden email]
Subject: [DKIM] Re: [DKIM] [R-sig-Geo] Fw: Why is there a large predictive difference forUniv. Kriging? [SEC=UNCLASSIFIED]



Re: [DKIM] Fw: Why is there a large predictive difference forUniv. Kriging? [SEC=UNCLASSIFIED]

Joelle k. Akram
No problem, Jin. I am looking for a regression model that is transparent, i.e., one from which I can obtain the fitted regression coefficients (beta) for each covariate. Do you recommend any of the methods in spm?

Also, from your experience, which method do you think will have similar predictive performance (MAE) on both the training sample set and the hold-out test set?

cheers,
Chris
________________________________
From: Li Jin <[hidden email]>
Sent: November 21, 2017 6:07 PM
To: Joelle k. Akram; [hidden email]
Subject: RE: [DKIM] [R-sig-Geo] Fw: Why is there a large predictive difference forUniv. Kriging? [SEC=UNCLASSIFIED]



Re: [DKIM] Fw: Why is there a large predictive difference forUniv. Kriging? [SEC=UNCLASSIFIED]

Li Jin
Although regression models are transparent, their predictive accuracy is poor in many cases, especially in environmental modelling, because of non-linear relationships and interactions. If your modelling purpose is to generate spatial predictions, I would suggest trying spm first.
As to the assessment of predictive models, MAE has its limitations; you may be interested in https://doi.org/10.1016/j.envsoft.2016.02.004 and https://doi.org/10.1371/journal.pone.0183250.
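MAE depends on the scale of the data, so it can help to report it alongside other measures. A minimal base-R sketch, assuming the observed and predicted hold-out values are numeric vectors of equal length; accuracy_summary() is a hypothetical helper, and RMSE plus a variance-explained percentage are chosen for illustration only, not necessarily the measures recommended in the papers above.

```r
# Hypothetical helper: common accuracy measures for hold-out predictions
accuracy_summary <- function(observed, predicted) {
  err <- observed - predicted
  c(
    MAE  = mean(abs(err)),
    RMSE = sqrt(mean(err^2)),
    # Percentage of the variance of the observations explained by the
    # predictions (100 = perfect, 0 = no better than predicting their mean)
    VE   = (1 - sum(err^2) / sum((observed - mean(observed))^2)) * 100
  )
}

# Usage with made-up numbers
obs  <- c(10, 20, 30, 40)
pred <- c(12, 18, 33, 37)
print(accuracy_summary(obs, pred))
```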

From: Joelle k. Akram [mailto:[hidden email]]
Sent: Wednesday, 22 November 2017 12:13 PM
To: Li Jin; [hidden email]
Subject: Re: [DKIM] [R-sig-Geo] Fw: Why is there a large predictive difference forUniv. Kriging? [SEC=UNCLASSIFIED]



Re: [DKIM] Re: [DKIM] Fw: Why is there a large predictive difference forUniv. Kriging? [SEC=UNCLASSIFIED]

Li Jin
In reply to this post by Joelle k. Akram
BTW, regarding your question: the first MAE measures the goodness of fit, while the second measures the predictive accuracy. The second paper below partially addresses this.
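A minimal base-R sketch of why the training-set MAE in the original code comes out as exactly 1. It assumes, as the reported value suggests, that kriging reproduces the observed log(zinc) at every training location (exact interpolation). The back-transform then exposes the difference between expm1() and exp(): expm1(x) returns exp(x) - 1 and is the inverse of log1p(), not of log(). The zinc values are illustrative.

```r
# Illustrative zinc concentrations (ppm)
zinc <- c(1022, 1141, 640, 257, 269)

# An exact interpolator returns the observed value at a data location,
# so the "prediction" of log(zinc) at a training point is log(zinc) itself
pred_log <- log(zinc)

mae <- function(error) mean(abs(error))

mae_expm1 <- mae(zinc - expm1(pred_log))  # each error is 1 (up to rounding): expm1(x) = exp(x) - 1
mae_exp   <- mae(zinc - exp(pred_log))    # essentially 0: exp() inverts log()
print(c(expm1 = mae_expm1, exp = mae_exp))
```

Under that assumption, the training MAE reflects only the back-transform offset, which is why the hold-out MAE is the more informative number.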

-----Original Message-----
From: R-sig-Geo [mailto:[hidden email]] On Behalf Of Li Jin
Sent: Wednesday, 22 November 2017 12:22 PM
To: Joelle k. Akram; [hidden email]
Subject: [DKIM] Re: [R-sig-Geo] [DKIM] Fw: Why is there a large predictive difference forUniv. Kriging? [SEC=UNCLASSIFIED]

Although regression models are transparent, their predictive accuracy is poor in many cases, especially in environmental modelling, because of non-linear relationships and interactions. If your modelling purpose is to generate spatial predictions, I would suggest trying spm first.
As to the assessment of predictive models, MAE has its limitations and you may be interested in https://doi.org/10.1016/j.envsoft.2016.02.004 and https://doi.org/10.1371/journal.pone.0183250.

From: Joelle k. Akram [mailto:[hidden email]]
Sent: Wednesday, 22 November 2017 12:13 PM
To: Li Jin; [hidden email]
Subject: Re: [DKIM] [R-sig-Geo] Fw: Why is there a large predictive difference forUniv. Kriging? [SEC=UNCLASSIFIED]


no problem Jin. I am looking a for regression model that is transparent, i.e., where I can obtain the regression fitting coefficients (beta) for each covariate. Do you recommend any in spm to use?

Also which you do think from your experience, will have a similar predictive performance (MAE) for both the training sample set, as well as, the hold-out sample test set?

cheers,
Chris
________________________________
From: Li Jin <[hidden email]>
Sent: November 21, 2017 6:07 PM
To: Joelle k. Akram; [hidden email]
Subject: RE: [DKIM] [R-sig-Geo] Fw: Why is there a large predictive difference forUniv. Kriging? [SEC=UNCLASSIFIED]


They are not yet.



From: Joelle k. Akram [mailto:[hidden email]]
Sent: Wednesday, 22 November 2017 11:56 AM
To: Li Jin; [hidden email]
Subject: [DKIM] Re: [DKIM] [R-sig-Geo] Fw: Why is there a large predictive difference forUniv. Kriging? [SEC=UNCLASSIFIED]



Hi Jin,

Thank you for sharing. I was reading your paper "Application of machine learning methods to spatial interpolation of environmental variables", on which the spm package is based.

In Table 1 of the paper you compare many algorithms. I am interested in assessing RKglm, RKgls and RKlm. Are these available in spm?

Thanks,
Chris

________________________________

From: Li Jin <[hidden email]>
Sent: November 21, 2017 5:33 PM
To: Joelle k. Akram; [hidden email]
Subject: RE: [DKIM] [R-sig-Geo] Fw: Why is there a large predictive difference forUniv. Kriging? [SEC=UNCLASSIFIED]



Hi Chris,
The UK used here is usually called kriging with an external drift (KED). It is, in fact, a linear model plus kriging, which assumes a linear relationship; that assumption is usually not true. KED has been tested in several studies and was outperformed by machine learning methods such as RF, RFOK and RFIDW. I have released an R package, spm, to introduce these methods. It is easy to use, as demonstrated in vignette('spm').
Hope this helps.
Regards,
Jin
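As a rough illustration of "a linear model plus kriging", here is a base-R regression-kriging sketch. It fits the trend by ordinary least squares, kriges the residuals by simple kriging, and adds the two. This is a simplification. gstat's universal kriging/KED estimates the trend jointly within the kriging system, which is equivalent to a generalised-least-squares trend, so its predictions will generally differ somewhat. The data, the covariate `covar` and the covariance parameters are all synthetic assumptions.

```r
# Regression-kriging sketch in base R: OLS trend + simple kriging of the
# residuals. Synthetic data; sill and range are arbitrary assumptions.
set.seed(42)
n     <- 40
xy    <- cbind(runif(n, 0, 1000), runif(n, 0, 1000))
covar <- runif(n)                                   # hypothetical covariate
y     <- 2 - 1.5 * covar + sin(xy[, 1] / 150) + rnorm(n, sd = 0.1)
train <- 1:30
hold  <- 31:40

exp_cov <- function(h, sill = 0.5, range = 200) sill * exp(-h / range)

# Step 1: linear model for the trend, fitted on the training data only
fit <- lm(y ~ covar, data = data.frame(y = y[train], covar = covar[train]))
res <- residuals(fit)

# Step 2: simple kriging (known zero mean) of the training residuals;
# the predictor is c0' C^-1 r, so C^-1 r can be computed once
C_inv_res <- solve(exp_cov(as.matrix(dist(xy[train, ]))), res)
krige_res <- function(p) {
  c0 <- exp_cov(sqrt(colSums((t(xy[train, ]) - p)^2)))
  sum(c0 * C_inv_res)
}

# Prediction = linear trend + kriged residual
rk_predict <- function(idx) {
  trend <- predict(fit, newdata = data.frame(covar = covar[idx]))
  unname(trend) + apply(xy[idx, , drop = FALSE], 1, krige_res)
}

mae <- function(error) mean(abs(error))
lm_train <- mae(y[train] - fitted(fit))
rk_train <- mae(y[train] - rk_predict(train))
lm_hold  <- mae(y[hold] - predict(fit, newdata = data.frame(covar = covar[hold])))
rk_hold  <- mae(y[hold] - rk_predict(hold))
print(round(c(lm_train = lm_train, rk_train = rk_train,
              lm_hold = lm_hold, rk_hold = rk_hold), 4))
```

At a training location the kriged residual equals the OLS residual, so trend plus residual reproduces the observation and `rk_train` is essentially 0. The linear model alone does not reproduce the observations. This mirrors the contrast between the KED and linear-regression training MAEs discussed in this thread. Which method has the lower hold-out MAE depends on the data.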

-----Original Message-----
From: R-sig-Geo [mailto:[hidden email]] On Behalf Of Joelle k. Akram
Sent: Wednesday, 22 November 2017 11:08 AM
To: [hidden email]
Subject: [DKIM] [R-sig-Geo] Fw: Why is there a large predictive difference forUniv. Kriging?

I am using the Meuse dataset for universal kriging (UK) via the gstat library in R. I am following a strategy used in machine learning, where the data are partitioned into a training set and a hold-out set. The training set is used to define the regression model and to fit the semivariogram.

I use UK to predict on both the training set and the hold-out set. However, the mean absolute errors (MAE) between the predicted and actual values of the response variable (zinc, for the Meuse dataset) are very different for the two sets. I would expect them to be similar, or at least closer. So far I have MAE_training_set = 1 and MAE_holdOut_set = 76.5. My code is below; advice is welcome.

library(sp)
library(gstat)
data(meuse)
dataset= meuse
set.seed(999)

# Split Meuse Dataset into Training and HoldOut Sample datasets
Training_ids <- sample(seq_len(nrow(dataset)), size = (0.7* nrow(dataset)))

Training_sample = dataset[Training_ids,]
Holdout_sample_allvars = dataset[-Training_ids,]

holdoutvars_df <-(dataset[,which(names(dataset) %in% c("x","y","lead","copper","elev","dist"))])
Hold_out_sample = holdoutvars_df[-Training_ids,]

coordinates(Training_sample) <- c('x','y')
coordinates(Hold_out_sample) <- c('x','y')

# Semivariogram modeling
m1  <- variogram(log(zinc)~lead+copper+elev+dist, Training_sample)
m <- vgm("Exp")
m <- fit.variogram(m1, m)

# Apply Univ Krig to Training dataset
prediction_training_data <- krige(log(zinc)~lead+copper+elev+dist, Training_sample, Training_sample, model = m)
prediction_training_data <- expm1(prediction_training_data$var1.pred)

# Apply Univ Krig to Hold Out dataset
prediction_holdout_data <- krige(log(zinc)~lead+copper+elev+dist, Training_sample, Hold_out_sample, model = m)
prediction_holdout_data <- expm1(prediction_holdout_data$var1.pred)

# Computing Predictive errors for Training and Hold Out samples respectively
training_prediction_error_term <- Training_sample$zinc - prediction_training_data
holdout_prediction_error_term <- Holdout_sample_allvars$zinc - prediction_holdout_data

# Function that returns Mean Absolute Error
mae <- function(error) {
  mean(abs(error))
}

# Mean Absolute Error metric:
# UK predictive errors for the training sample set and for the hold-out sample set
print(mae(training_prediction_error_term)) #Error for Training sample set
print(mae(holdout_prediction_error_term)) #Error for Hold out sample set


cheers,

Kristopher (Chris)

Geoscience Australia Disclaimer: This e-mail (and files transmitted with it) is intended only for the person or entity to which it is addressed. If you are not the intended recipient, then you have received this e-mail by mistake and any use, dissemination, forwarding, printing or copying of this e-mail and its file attachments is prohibited. The security of emails transmitted cannot be guaranteed; by forwarding or replying to this email, you acknowledge and accept these risks.
-------------------------------------------------------------------------------------------------------------------------


Re: [DKIM] Re: [DKIM] Fw: Why is there a large predictive difference forUniv. Kriging? [SEC=UNCLASSIFIED]

Joelle k. Akram
Thanks, Jin. The reason I am so surprised by the difference between MAE_train and MAE_holdOut is my comparison of KED (i.e., the universal kriging code in my initial post) with linear regression.

Please see below for the linear regression code, where MAE_training_set = 90.1 and MAE_holdOut_set = 97.4.

On the other hand, KED gave me MAE_training_set = 1 and MAE_holdOut_set = 76.5.

Given that KED is a linear model (i.e., linear regression + ordinary kriging), I am surprised by these differences. Any insight from your end is appreciated.


rm(list=ls())            # start from an empty workspace
graphics.off()           # close any open graphics devices
options(scipen = 999)    # print numbers without scientific notation



library(sp)
library(gstat)
data(meuse)
dataset= meuse
set.seed(999)

# Split Meuse Dataset into Training and HoldOut Sample datasets
Training_ids <- sample(seq_len(nrow(dataset)), size = (0.7* nrow(dataset)))

Training_sample = dataset[Training_ids,]
Holdout_sample_allvars = dataset[-Training_ids,]

holdoutvars_df <-(dataset[,which(names(dataset) %in% c("x","y","lead","copper","elev","dist"))])
Hold_out_sample = holdoutvars_df[-Training_ids,]

coordinates(Training_sample) <- c('x','y')
coordinates(Hold_out_sample) <- c('x','y')

# Semivariogram modeling
m1  <- variogram(log(zinc)~lead+copper+elev+dist, Training_sample)
m <- vgm("Exp")
m <- fit.variogram(m1, m)


# Apply Linear regression to Training dataset
train_model <- lm(log(zinc)~lead+copper+elev+dist, Training_sample)
prediction_training_data <- expm1(predict(train_model,newdata =Training_sample ))

# Apply Linear Regression to Hold Out dataset
prediction_holdout_data <- expm1(predict(train_model,newdata =Hold_out_sample ))

# Computing Predictive errors for Training and Hold Out samples respectively
training_prediction_error_term <- Training_sample$zinc - prediction_training_data
holdout_prediction_error_term <- Holdout_sample_allvars$zinc - prediction_holdout_data

# Function that returns Mean Absolute Error
mae <- function(error)
{
  mean(abs(error))
}

# Mean Absolute Error metric:
# linear regression prediction errors for the training and hold-out sample sets
print(mae(training_prediction_error_term)) #Error for Training sample set
print(mae(holdout_prediction_error_term)) #Error for Hold out sample set
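There is a possible explanation for the KED training MAE of exactly 1. It is a reading of the code, not something established in this thread. Universal kriging with no measurement-error term is an exact interpolator, so krige() evaluated at the training locations should return log(zinc) itself. expm1() then maps that value to zinc - 1 rather than zinc, because expm1(x) = exp(x) - 1 is the inverse of log1p(), not of log(). Every training residual would then be exactly 1. The sketch below checks only this back-transform arithmetic, on hypothetical values. It does not run gstat.

```r
# Illustrative positive values standing in for zinc concentrations (hypothetical)
zinc <- c(1022, 1141, 640, 257, 269)

# If krige() reproduces the data at the training locations, its
# log-scale prediction there is simply log(zinc)
log_pred <- log(zinc)

mae <- function(error) mean(abs(error))

# expm1(x) computes exp(x) - 1: it is the inverse of log1p(), not of log()
mae_expm1 <- mae(zinc - expm1(log_pred))   # every residual equals 1
# exp() is the inverse of log()
mae_exp   <- mae(zinc - exp(log_pred))     # residuals are zero up to rounding

print(c(expm1 = mae_expm1, exp = mae_exp))
```

If this reading is right, the consistent back-transforms are exp() after log(), or expm1() after log1p(). The same expm1() call in the linear-regression code above also shifts every prediction down by 1. That changes each absolute error by at most 1, which is small next to MAEs of about 90.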



________________________________
From: Li Jin <[hidden email]>
Sent: November 21, 2017 6:36 PM
To: Li Jin; Joelle k. Akram; [hidden email]
Subject: RE: [DKIM] Re: [R-sig-Geo] [DKIM] Fw: Why is there a large predictive difference forUniv. Kriging? [SEC=UNCLASSIFIED]

BTW, regarding your question: the first MAE measures goodness of fit, while the second measures predictive accuracy. The second paper below partially addresses this.


Re: [DKIM] Re: [DKIM] Fw: Why is there a large predictive difference forUniv. Kriging? [SEC=UNCLASSIFIED]

Li Jin
For both models, the MAE for the hold-out set is larger than that for the training set. That is expected.


Re: [DKIM] Re: [DKIM] Fw: Why is there a large predictive difference forUniv. Kriging? [SEC=UNCLASSIFIED]

Joelle k. Akram
Jin,

Do you think there is potential evidence of overfitting for KED, given the large difference in MAE between the training and hold-out sets?


________________________________
From: Li Jin <[hidden email]>
Sent: November 21, 2017 7:00 PM
To: Joelle k. Akram; [hidden email]
Subject: RE: [DKIM] Re: [R-sig-Geo] [DKIM] Fw: Why is there a large predictive difference forUniv. Kriging? [SEC=UNCLASSIFIED]


For both models, the MAE for the hold-out set is larger than that for the training set. That is expected.



From: Joelle k. Akram [mailto:[hidden email]]
Sent: Wednesday, 22 November 2017 12:49 PM
To: Li Jin; [hidden email]
Subject: Re: [DKIM] Re: [R-sig-Geo] [DKIM] Fw: Why is there a large predictive difference forUniv. Kriging? [SEC=UNCLASSIFIED]



Thanks, Jin. The reason I am so surprised by the difference between MAE_train and MAE_holdOut is my comparison of KED (i.e., the universal kriging code in my initial post) with linear regression.

The linear regression code is below; it gives MAE_training_set = 90.1 and MAE_holdOut_set = 97.4.

KED, on the other hand, gave me MAE_training_set = 1 and MAE_holdOut_set = 76.5.

Given that KED is a linear model (i.e., linear regression plus ordinary kriging), I am surprised by these differences. Any insight from your end is appreciated.

library(sp)
library(gstat)

data(meuse)
dataset = meuse
set.seed(999)

# Split Meuse dataset into training and hold-out samples
Training_ids <- sample(seq_len(nrow(dataset)), size = (0.7 * nrow(dataset)))
Training_sample = dataset[Training_ids, ]
Holdout_sample_allvars = dataset[-Training_ids, ]

holdoutvars_df <- dataset[, which(names(dataset) %in% c("x", "y", "lead", "copper", "elev", "dist"))]
Hold_out_sample = holdoutvars_df[-Training_ids, ]

coordinates(Training_sample) <- c('x', 'y')
coordinates(Hold_out_sample) <- c('x', 'y')

# Semivariogram modelling (kept from the KED script; lm() below does not use it)
m1 <- variogram(log(zinc) ~ lead + copper + elev + dist, Training_sample)
m <- vgm("Exp")
m <- fit.variogram(m1, m)

# Apply linear regression to the training dataset
train_model <- lm(log(zinc) ~ lead + copper + elev + dist, Training_sample)
# Note: the model is fitted on log(zinc), whose exact inverse is exp();
# expm1() returns exp(x) - 1
prediction_training_data <- expm1(predict(train_model, newdata = Training_sample))

# Apply linear regression to the hold-out dataset
prediction_holdout_data <- expm1(predict(train_model, newdata = Hold_out_sample))

# Prediction errors for the training and hold-out samples respectively
training_prediction_error_term <- Training_sample$zinc - prediction_training_data
holdout_prediction_error_term <- Holdout_sample_allvars$zinc - prediction_holdout_data

# Function that returns the mean absolute error
mae <- function(error) {
  mean(abs(error))
}

# Linear regression MAE for the training and hold-out samples
print(mae(training_prediction_error_term))  # training sample
print(mae(holdout_prediction_error_term))   # hold-out sample
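A quick check on the back-transform, offered as a sketch: both scripts fit log(zinc) but invert the predictions with expm1(), which computes exp(x) - 1 (it is the inverse of log1p(), not of log()). If kriging reproduces log(zinc) exactly at the training locations, expm1() returns zinc - 1 there, so every training residual is exactly 1, which is consistent with the reported KED MAE_training_set = 1. The zinc values below are made up for illustration.

```r
# Hypothetical zinc concentrations (made-up values, not the Meuse data)
zinc <- c(113, 250, 480, 1022)

# An exact interpolator returns log(zinc) at the data locations
log_pred <- log(zinc)

# Back-transforming with expm1() gives exp(x) - 1, i.e. zinc - 1
resid_expm1 <- zinc - expm1(log_pred)

# exp() is the matching inverse of log(), so these residuals vanish
resid_exp <- zinc - exp(log_pred)

mae <- function(error) mean(abs(error))
mae(resid_expm1)  # 1, up to floating-point error
mae(resid_exp)    # 0, up to floating-point error
```

If the response were modelled as log1p(zinc) instead, expm1() would be the correct inverse.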





________________________________

From: Li Jin <[hidden email]>
Sent: November 21, 2017 6:36 PM
To: Li Jin; Joelle k. Akram; [hidden email]
Subject: RE: [DKIM] Re: [R-sig-Geo] [DKIM] Fw: Why is there a large predictive difference forUniv. Kriging? [SEC=UNCLASSIFIED]

BTW, to answer your question: the first MAE (training set) measures goodness of fit, and the second (hold-out set) measures predictive accuracy. The second paper below partially addresses this.
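To make the goodness-of-fit versus predictive-accuracy distinction concrete, here is a base-R sketch on synthetic data that reports the in-sample MAE of an lm() fit next to a k-fold cross-validated MAE, where every observation is predicted by a model that never saw it. The helper kfold_mae() is hypothetical; for kriging models, gstat's krige.cv() performs the analogous cross-validation.

```r
# In-sample versus k-fold cross-validated MAE for a linear model.
# Synthetic data; kfold_mae() is a hypothetical helper, not from any package.
mae <- function(error) mean(abs(error))

kfold_mae <- function(formula, data, k = 5, seed = 999) {
  set.seed(seed)  # reproducible fold assignment (resets the global RNG)
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))
  # Assumes the left-hand side is a plain column name (no log() etc.)
  response <- all.vars(formula)[1]
  oof_pred <- numeric(nrow(data))
  for (i in seq_len(k)) {
    test <- folds == i
    fit <- lm(formula, data = data[!test, ])                # fit without fold i
    oof_pred[test] <- predict(fit, newdata = data[test, ])  # predict fold i
  }
  full_fit <- lm(formula, data = data)
  c(in_sample = mae(data[[response]] - fitted(full_fit)),
    cross_validated = mae(data[[response]] - oof_pred))
}

set.seed(1)
n <- 150
sim <- data.frame(x1 = runif(n), x2 = runif(n))
sim$y <- 2 + 3 * sim$x1 - sim$x2 + rnorm(n, sd = 0.5)

kfold_mae(y ~ x1 + x2, sim, k = 5)
```

The in-sample MAE is usually, though not necessarily, the smaller of the two; only the cross-validated figure says how well the model predicts at unsampled locations.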

-----Original Message-----
From: R-sig-Geo [mailto:[hidden email]] On Behalf Of Li Jin
Sent: Wednesday, 22 November 2017 12:22 PM
To: Joelle k. Akram; [hidden email]
Subject: [DKIM] Re: [R-sig-Geo] [DKIM] Fw: Why is there a large predictive difference forUniv. Kriging? [SEC=UNCLASSIFIED]

Although regression models are transparent, their predictive accuracy is poor in many cases, especially in environmental modelling, because of non-linear relationships and interactions. If your modelling purpose is to generate spatial predictions, I would suggest trying spm first.
As to the assessment of predictive models, MAE has its limitations, and you may be interested in https://doi.org/10.1016/j.envsoft.2016.02.004 and https://doi.org/10.1371/journal.pone.0183250.
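One way to see MAE's limitation is that it is in the units of the response, so it cannot be compared across data sets with different scales or variability. The sketch below (hypothetical helper accuracy_summary(), made-up numbers) reports MAE and RMSE alongside two relative measures: MAE as a percentage of the mean observation, and the percentage of variance explained computed as 1 - SSE/SST. Whether this matches the exact definitions in the papers cited above is an assumption.

```r
# Complementary accuracy measures for hold-out predictions.
# accuracy_summary() is a hypothetical helper; ve_percent uses the
# 1 - SSE/SST form (an assumption about which relative measure to use).
accuracy_summary <- function(observed, predicted) {
  err <- observed - predicted
  c(mae = mean(abs(err)),
    rmse = sqrt(mean(err^2)),
    # MAE relative to the mean observation, comparable across data sets
    rmae_percent = 100 * mean(abs(err)) / mean(observed),
    # Percentage of the variance of the observations explained by the predictions
    ve_percent = 100 * (1 - sum(err^2) / sum((observed - mean(observed))^2)))
}

observed  <- c(10, 12, 9, 15, 14)
predicted <- c(11, 12, 8, 13, 15)
round(accuracy_summary(observed, predicted), 2)
# mae 1.00, rmse 1.18, rmae_percent 8.33, ve_percent 73.08
```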

From: Joelle k. Akram [mailto:[hidden email]]
Sent: Wednesday, 22 November 2017 12:13 PM
To: Li Jin; [hidden email]
Subject: Re: [DKIM] [R-sig-Geo] Fw: Why is there a large predictive difference forUniv. Kriging? [SEC=UNCLASSIFIED]


No problem, Jin. I am looking for a regression model that is transparent, i.e., one where I can obtain the fitted regression coefficients (beta) for each covariate. Do you recommend any in spm?

Also, from your experience, which method will have similar predictive performance (MAE) on both the training set and the hold-out test set?

cheers,
Chris
________________________________
From: Li Jin <[hidden email]>
Sent: November 21, 2017 6:07 PM
To: Joelle k. Akram; [hidden email]
Subject: RE: [DKIM] [R-sig-Geo] Fw: Why is there a large predictive difference forUniv. Kriging? [SEC=UNCLASSIFIED]

They are not yet available in spm.



From: Joelle k. Akram [mailto:[hidden email]]
Sent: Wednesday, 22 November 2017 11:56 AM
To: Li Jin; [hidden email]
Subject: [DKIM] Re: [DKIM] [R-sig-Geo] Fw: Why is there a large predictive difference forUniv. Kriging? [SEC=UNCLASSIFIED]

Hi Jin,

Thank you for sharing. I have been reading your paper "Application of machine learning methods to spatial interpolation of environmental variables", on which the spm package is based.

In Table 1 of the paper you compare many algorithms. I am interested in assessing RKglm, RKgls and RKlm. Are these available in spm?

Thanks,
Chris
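Since RKlm is not in spm yet, and the request above is for a model with transparent coefficients, here is a minimal base-R sketch of regression kriging in the RKlm style: an lm() trend whose coefficients stay visible, plus simple kriging of the lm residuals. Assumptions: an exponential covariance with fixed, hypothetical sill and range (no variogram fitting), zero-mean residuals, and coordinates in columns named x and y; rk_lm_predict() and its helpers are hypothetical. It is not identical to gstat's KED, which implicitly estimates the trend by generalised rather than ordinary least squares.

```r
# Regression kriging sketch: lm() trend + simple kriging of its residuals.
# Exponential covariance with hypothetical, fixed sill and range; a real
# analysis would fit these from a residual variogram.
exp_cov <- function(h, sill, range) sill * exp(-h / range)

cross_dist <- function(a, b) {
  sqrt(outer(a[, 1], b[, 1], "-")^2 + outer(a[, 2], b[, 2], "-")^2)
}

rk_lm_predict <- function(train, newdata, formula, sill, range) {
  fit <- lm(formula, data = train)       # transparent linear trend
  res <- residuals(fit)
  train_xy <- as.matrix(train[, c("x", "y")])
  new_xy <- as.matrix(newdata[, c("x", "y")])
  C  <- exp_cov(cross_dist(train_xy, train_xy), sill, range)  # data-to-data
  c0 <- exp_cov(cross_dist(new_xy, train_xy), sill, range)    # target-to-data
  res_pred <- c0 %*% solve(C, res)       # simple kriging, residual mean 0
  list(coefficients = coef(fit),
       prediction = as.vector(predict(fit, newdata = newdata) + res_pred))
}

# Usage on synthetic data (standing in for log(zinc) and one covariate)
set.seed(42)
n <- 60
train <- data.frame(x = runif(n, 0, 1000), y = runif(n, 0, 1000),
                    elev = runif(n, 5, 10))
train$logzinc <- 7 - 0.3 * train$elev + rnorm(n, sd = 0.3)
out <- rk_lm_predict(train, train[1:5, ], logzinc ~ elev, sill = 0.09, range = 300)
out$coefficients  # the betas of the linear trend
```

Predicting back at training rows, as in the usage line, returns the observed values, which is the same exactness behind the training MAE discussed in this thread.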




Re: [DKIM] Re: [DKIM] Fw: Why is there a large predictive difference forUniv. Kriging? [SEC=UNCLASSIFIED]

Li Jin
I think you now need to understand how KED works. This paper https://doi.org/10.1016/j.envsoft.2013.12.008 may give you some clues. Theoretically, the MAE of KED for the training dataset should be 0 because of its nature (hint: exactness).
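The exactness hint can be checked numerically. The sketch below uses simple kriging (known zero mean) rather than KED, only because it needs nothing beyond base R; the same exactness holds for universal kriging/KED without a nugget. At a data location the target-to-data covariance vector equals a column of the data covariance matrix, so the kriging weights collapse to 1 on that observation and 0 elsewhere. Sill, range and the data are hypothetical, and simple_krige() is an illustrative helper, not gstat's implementation.

```r
# Exactness of kriging at the data locations: a base-R simple kriging sketch.
# Assumptions: known zero mean, exponential covariance, no nugget,
# hypothetical sill/range values.
exp_cov <- function(h, sill = 1, range = 300) sill * exp(-h / range)

simple_krige <- function(obs_xy, obs_z, new_xy, sill = 1, range = 300) {
  C  <- exp_cov(as.matrix(dist(obs_xy)), sill, range)  # data-to-data covariances
  d0 <- sqrt(outer(new_xy[, 1], obs_xy[, 1], "-")^2 +
             outer(new_xy[, 2], obs_xy[, 2], "-")^2)
  c0 <- exp_cov(d0, sill, range)                       # target-to-data covariances
  as.vector(c0 %*% solve(C, obs_z))                    # kriging-weighted sum
}

set.seed(1)
xy <- cbind(runif(30, 0, 1000), runif(30, 0, 1000))
z  <- rnorm(30)  # stand-in for zero-mean log(zinc) residuals

train <- 1:20
pred_train <- simple_krige(xy[train, ], z[train], xy[train, ])
pred_hold  <- simple_krige(xy[train, ], z[train], xy[-train, ])

mean(abs(z[train] - pred_train))  # essentially 0: the data are reproduced
mean(abs(z[-train] - pred_hold))  # positive: genuine prediction error
```

A training-set MAE of (near) zero is therefore a property of the interpolator rather than, by itself, evidence of overfitting; the hold-out or cross-validated error is the informative number.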


Re: [DKIM] Re: [DKIM] Fw: Why is there a large predictive difference forUniv. Kriging? [SEC=UNCLASSIFIED]

Joelle k. Akram
Thanks, Jin.


________________________________
From: Li Jin <[hidden email]>
Sent: November 22, 2017 3:05 PM
To: Joelle k. Akram; [hidden email]
Subject: RE: [DKIM] Re: [R-sig-Geo] [DKIM] Fw: Why is there a large predictive difference forUniv. Kriging? [SEC=UNCLASSIFIED]


Now I guess you need to understand how KED works. This paper https://doi.org/10.1016/j.envsoft.2013.12.008 may give you some clue. Theoretically, the mae of KED for the training dataset should be 0 due to its nature (hint: exactness).



From: Joelle k. Akram [mailto:[hidden email]]
Sent: Wednesday, 22 November 2017 5:38 PM
To: Li Jin; [hidden email]
Subject: Re: [DKIM] Re: [R-sig-Geo] [DKIM] Fw: Why is there a large predictive difference forUniv. Kriging? [SEC=UNCLASSIFIED]



Jin,



do you think there is potential evidence of overfitting for KED given the large difference in MAE betwen the train and holdout sets?



________________________________

From: Li Jin <[hidden email]<mailto:[hidden email]>>
Sent: November 21, 2017 7:00 PM
To: Joelle k. Akram; [hidden email]<mailto:[hidden email]>
Subject: RE: [DKIM] Re: [R-sig-Geo] [DKIM] Fw: Why is there a large predictive difference forUniv. Kriging? [SEC=UNCLASSIFIED]



For both models, the MAE for holdout is larger than that for the training. That is expected.



From: Joelle k. Akram [mailto:[hidden email]]
Sent: Wednesday, 22 November 2017 12:49 PM
To: Li Jin; [hidden email]<mailto:[hidden email]>
Subject: Re: [DKIM] Re: [R-sig-Geo] [DKIM] Fw: Why is there a large predictive difference forUniv. Kriging? [SEC=UNCLASSIFIED]



thanks Jin. The reason I am very surprised by the MAE_train and MAE_holdOut differences is due to my comparison of the KED (i.e., Univ krig. code in my initial message post) with Linear Regression.



Please see below for the Linear Regression code where the MAE_training_set = 90.1 and the MAE_holdOut_set = 97.4

On the other hand, KED  gave me MAE_training_set = 1 and the MAE_holdOut_set = 76.5.



Given that KED is a linear model (i.e. Linear Reg + Ord Krig.) I am surprised by these differences. Any insight from your end is appreciated.



rm(list = ls())
options(scipen = 999)

library(sp)
library(gstat)
data(meuse)
dataset <- meuse
set.seed(999)

# Split Meuse Dataset into Training and HoldOut Sample datasets
Training_ids <- sample(seq_len(nrow(dataset)), size = (0.7 * nrow(dataset)))

Training_sample <- dataset[Training_ids, ]
Holdout_sample_allvars <- dataset[-Training_ids, ]

holdoutvars_df <- dataset[, which(names(dataset) %in% c("x", "y", "lead", "copper", "elev", "dist"))]
Hold_out_sample <- holdoutvars_df[-Training_ids, ]

coordinates(Training_sample) <- c("x", "y")
coordinates(Hold_out_sample) <- c("x", "y")

# Semivariogram modeling (not used by lm() below; kept for comparison with KED)
m1 <- variogram(log(zinc) ~ lead + copper + elev + dist, Training_sample)
m <- vgm("Exp")
m <- fit.variogram(m1, m)

# Apply Linear regression to Training dataset
train_model <- lm(log(zinc) ~ lead + copper + elev + dist, Training_sample)
# Back-transform with exp(), the inverse of log(); expm1() inverts log1p()
# and would lower every prediction by exactly 1
prediction_training_data <- exp(predict(train_model, newdata = Training_sample))

# Apply Linear Regression to Hold Out dataset
prediction_holdout_data <- exp(predict(train_model, newdata = Hold_out_sample))

# Computing Predictive errors for Training and Hold Out samples respectively
training_prediction_error_term <- Training_sample$zinc - prediction_training_data
holdout_prediction_error_term <- Holdout_sample_allvars$zinc - prediction_holdout_data

# Function that returns Mean Absolute Error
mae <- function(error) {
  mean(abs(error))
}

# Mean Absolute Error metric:
# linear regression predictive errors for the Training and HoldOut sample sets
print(mae(training_prediction_error_term)) # Error for Training sample set
print(mae(holdout_prediction_error_term))  # Error for Hold out sample set
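A detail in the back-transformation probably explains the KED training MAE of exactly 1. The models are fitted to log(zinc); exp() is the inverse of log(), whereas expm1() is the inverse of log1p(), i.e. log(1 + z). Back-transforming a log() prediction with expm1() therefore lowers it by exactly 1, and if kriging reproduces the training observations, every training error becomes z - (z - 1) = 1. A quick base-R check; the values in z are illustrative only.

```r
z <- c(113, 500, 1839)  # illustrative zinc values (ppm)

exp(log(z))      # recovers z
expm1(log(z))    # equals exp(log(z)) - 1, i.e. z - 1
expm1(log1p(z))  # recovers z
```

For the linear-regression code the same shift changes each absolute error by at most 1, so it barely moves MAE values around 90.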





________________________________

From: Li Jin <[hidden email]>
Sent: November 21, 2017 6:36 PM
To: Li Jin; Joelle k. Akram; [hidden email]
Subject: RE: [DKIM] Re: [R-sig-Geo] [DKIM] Fw: Why is there a large predictive difference forUniv. Kriging? [SEC=UNCLASSIFIED]



BTW, to your question: the first MAE measures goodness of fit and the second measures predictive accuracy. The second paper below partially addresses this.
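A sketch of one way to estimate predictive accuracy from the training data alone, assuming gstat's krige.cv(), which by default runs leave-one-out cross-validation: each point is predicted from all the others, so the resulting MAE is comparable to a hold-out MAE rather than a goodness-of-fit figure. The model matches the thread's KED formula; the variable names are ours.

```r
library(sp)
library(gstat)

data(meuse)
set.seed(999)
train_ids <- sample(seq_len(nrow(meuse)), size = floor(0.7 * nrow(meuse)))
train <- meuse[train_ids, ]
coordinates(train) <- c("x", "y")

v <- variogram(log(zinc) ~ lead + copper + elev + dist, train)
m <- fit.variogram(v, vgm("Exp"))

# Leave-one-out cross-validation (krige.cv's default nfold): each training
# point is predicted from the remaining training points only
cv <- krige.cv(log(zinc) ~ lead + copper + elev + dist, train, model = m)

# MAE on the original zinc scale; cv$observed holds the response, log(zinc)
cv_mae <- mean(abs(exp(cv$observed) - exp(cv$var1.pred)))
print(cv_mae)
```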

-----Original Message-----
From: R-sig-Geo [mailto:[hidden email]] On Behalf Of Li Jin
Sent: Wednesday, 22 November 2017 12:22 PM
To: Joelle k. Akram; [hidden email]
Subject: [DKIM] Re: [R-sig-Geo] [DKIM] Fw: Why is there a large predictive difference forUniv. Kriging? [SEC=UNCLASSIFIED]

Although regression models are transparent, their predictive accuracy is poor in many cases, especially in environmental modelling, because of non-linear relationships and interactions. If your modelling purpose is to generate spatial predictions, I would suggest trying spm first.
As to the assessment of predictive models, MAE has its limitations and you may be interested in https://doi.org/10.1016/j.envsoft.2016.02.004 and https://doi.org/10.1371/journal.pone.0183250.
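As we understand the two cited papers, they discuss relative accuracy measures that, unlike MAE, are scaled by the variation in the observed data, such as the variance explained by cross-validation (VEcv) and Legates and McCabe's E1. A minimal base-R sketch of both, with hypothetical helper names vecv() and e1(); check the papers for the exact definitions before relying on them.

```r
# Hypothetical helpers (names are ours), both returned in percent
vecv <- function(obs, pred) {
  # Variance explained: 100 * (1 - sum of squared errors / total sum of squares)
  100 * (1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2))
}

e1 <- function(obs, pred) {
  # Legates-McCabe E1: 100 * (1 - sum |error| / sum |obs - mean(obs)|)
  100 * (1 - sum(abs(obs - pred)) / sum(abs(obs - mean(obs))))
}

obs  <- c(10, 20, 30, 40)
pred <- c(12, 18, 33, 37)
print(vecv(obs, pred))  # 94.8
print(e1(obs, pred))    # 75
```

Both are 100 for perfect predictions and 0 when the predictions are no better than the mean of the observations, which makes them comparable across data sets with different spreads.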

From: Joelle k. Akram [mailto:[hidden email]]
Sent: Wednesday, 22 November 2017 12:13 PM
To: Li Jin; [hidden email]
Subject: Re: [DKIM] [R-sig-Geo] Fw: Why is there a large predictive difference forUniv. Kriging? [SEC=UNCLASSIFIED]


No problem, Jin. I am looking for a regression model that is transparent, i.e., one from which I can obtain the regression coefficients (beta) for each covariate. Do you recommend any in spm?

Also, which do you think, from your experience, will have similar predictive performance (MAE) on both the training sample set and the hold-out test set?
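For the transparency part of the question, an ordinary least-squares fit already exposes one coefficient per covariate. A minimal sketch on the full meuse data (fitting on Training_sample works the same way); fit is our name, and the coefficients are on the log(zinc) scale.

```r
library(sp)
data(meuse)

# Trend model from this thread, fitted by ordinary least squares
fit <- lm(log(zinc) ~ lead + copper + elev + dist, data = meuse)

# One coefficient (beta) per covariate, plus the intercept
print(coef(fit))

# 95% confidence intervals for the coefficients
print(confint(fit))
```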

cheers,
Chris
________________________________
From: Li Jin <[hidden email]>
Sent: November 21, 2017 6:07 PM
To: Joelle k. Akram; [hidden email]
Subject: RE: [DKIM] [R-sig-Geo] Fw: Why is there a large predictive difference forUniv. Kriging? [SEC=UNCLASSIFIED]


They are not available in spm yet.



From: Joelle k. Akram [mailto:[hidden email]]
Sent: Wednesday, 22 November 2017 11:56 AM
To: Li Jin; [hidden email]
Subject: [DKIM] Re: [DKIM] [R-sig-Geo] Fw: Why is there a large predictive difference forUniv. Kriging? [SEC=UNCLASSIFIED]



Hi Jin,



Thank you for sharing. I was reading your paper "Application of machine learning methods to spatial interpolation of environmental variables", on which the spm package is based.



In Table 1 from the paper you compare many algorithms. I was interested in assessing RKglm, RKgls, RKlm. Are these available in spm?
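For orientation, a two-step regression kriging with a linear model (our reading of what RKlm denotes; this is not the code from the paper) can be assembled from lm() and gstat: fit the trend by least squares, krige its residuals by ordinary kriging, and add the two parts. Unlike KED, which estimates the trend by generalized least squares together with the residual variogram, the trend here is plain OLS. Names such as trend, res_ok and rk_pred are hypothetical.

```r
library(sp)
library(gstat)

data(meuse)
set.seed(999)
train_ids <- sample(seq_len(nrow(meuse)), size = floor(0.7 * nrow(meuse)))
train   <- meuse[train_ids, ]
holdout <- meuse[-train_ids, ]

# Step 1: linear trend fitted by ordinary least squares on the training set
trend <- lm(log(zinc) ~ lead + copper + elev + dist, data = train)
train$resid   <- residuals(trend)
trend_holdout <- predict(trend, newdata = holdout)

coordinates(train)   <- c("x", "y")
coordinates(holdout) <- c("x", "y")

# Step 2: ordinary kriging of the trend residuals
v_res  <- variogram(resid ~ 1, train)
m_res  <- fit.variogram(v_res, vgm("Exp"))
res_ok <- krige(resid ~ 1, train, holdout, model = m_res)

# Step 3: trend + kriged residual, back-transformed with exp()
rk_pred <- exp(trend_holdout + res_ok$var1.pred)
rk_mae  <- mean(abs(holdout$zinc - rk_pred))
print(rk_mae)
```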



thanks

Chris



________________________________

From: Li Jin <[hidden email]>
Sent: November 21, 2017 5:33 PM
To: Joelle k. Akram; [hidden email]
Subject: RE: [DKIM] [R-sig-Geo] Fw: Why is there a large predictive difference forUniv. Kriging? [SEC=UNCLASSIFIED]



Hi Chris,
The UK used here is usually called kriging with an external drift (KED). It is, in fact, a linear model plus kriging, which assumes a linear relationship; that assumption is usually not true. It has been tested in several studies and was outperformed by machine learning methods such as RF, RFOK and RFIDW. I have released an R package, spm, to introduce these methods. It is easy to use, as demonstrated in vignette('spm').
Hope this helps.
Regards,
Jin

-----Original Message-----
From: R-sig-Geo [mailto:[hidden email]] On Behalf Of Joelle k. Akram
Sent: Wednesday, 22 November 2017 11:08 AM
To: [hidden email]
Subject: [DKIM] [R-sig-Geo] Fw: Why is there a large predictive difference forUniv. Kriging?




https://stackoverflow.com/questions/47424740/why-is-predictive-error-large-for-universal-kriging


I am using the Meuse dataset for universal kriging (UK) via the gstat library in R. I am following a strategy used in machine learning, where the data are partitioned into a Train set and a Hold Out set. The Train set is used to fit the regression model and the semivariogram.

I use UK to predict on both the Train sample set and the Hold Out sample set. However, the mean absolute errors (MAE) between the predicted and actual values of the response variable (i.e., zinc for the Meuse dataset) are very different for the two sets. I would expect them to be similar, or at least closer. So far I have MAE_training_set = 1 and MAE_holdOut_set = 76.5. My code is below, and advice is welcome.

library(sp)
library(gstat)
data(meuse)
dataset = meuse
set.seed(999)

# Split Meuse Dataset into Training and HoldOut Sample datasets
Training_ids <- sample(seq_len(nrow(dataset)), size = (0.7 * nrow(dataset)))

Training_sample = dataset[Training_ids,]
Holdout_sample_allvars = dataset[-Training_ids,]

holdoutvars_df <- (dataset[,which(names(dataset) %in% c("x","y","lead","copper","elev","dist"))])
Hold_out_sample = holdoutvars_df[-Training_ids,]

coordinates(Training_sample) <- c('x','y')
coordinates(Hold_out_sample) <- c('x','y')

# Semivariogram modeling
m1 <- variogram(log(zinc)~lead+copper+elev+dist, Training_sample)
m <- vgm("Exp")
m <- fit.variogram(m1, m)

# Apply Univ Krig to Training dataset
prediction_training_data <- krige(log(zinc)~lead+copper+elev+dist, Training_sample, Training_sample, model = m)
# Back-transform with exp(), the inverse of log(); expm1() inverts log1p()
# and would lower every prediction by exactly 1
prediction_training_data <- exp(prediction_training_data$var1.pred)

# Apply Univ Krig to Hold Out dataset
prediction_holdout_data <- krige(log(zinc)~lead+copper+elev+dist, Training_sample, Hold_out_sample, model = m)
prediction_holdout_data <- exp(prediction_holdout_data$var1.pred)

# Computing Predictive errors for Training and Hold Out samples respectively
training_prediction_error_term <- Training_sample$zinc - prediction_training_data
holdout_prediction_error_term <- Holdout_sample_allvars$zinc - prediction_holdout_data

# Function that returns Mean Absolute Error
mae <- function(error) {
  mean(abs(error))
}

# Mean Absolute Error metric:
# UK Predictive errors for Training sample set, and UK Predictive Errors for HoldOut sample set
print(mae(training_prediction_error_term)) # Error for Training sample set
print(mae(holdout_prediction_error_term))  # Error for Hold out sample set


cheers,

Kristopher (Chris)


_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Geoscience Australia Disclaimer: This e-mail (and files transmitted with it) is intended only for the person or entity to which it is addressed. If you are not the intended recipient, then you have received this e-mail by mistake and any use, dissemination, forwarding, printing or copying of this e-mail and its file attachments is prohibited. The security of emails transmitted cannot be guaranteed; by forwarding or replying to this email, you acknowledge and accept these risks.
-------------------------------------------------------------------------------------------------------------------------
