Using stsls() with predict.sarlm()

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Using stsls() with predict.sarlm()

Banabak, Selim
Dear All,


I am using the "spatialreg" package to estimate a spatial autoregressive model for a large number of observations (~75.000) and would thus prefer to use stsls() over lagsarlm() to gain estimation speed. In a second step however, I want to use the model for out of sample prediction with predict.sarlm().

As predict.sarlm() is not designed for stsls() outputs I tried to modify the stsls() function outputs in a way that the prediction function would be able to handle them.


Now I have two question:

- Is it, from a theoretical point of view, legitimate to just use the IV estimates instead of the ML estimates for the predictors described in

Goulard et al. (2017)?

- Is the following modification of the original stsls() function in combination with the predict.sarlm() an acceptable approach to implement the prediction methods suggested by Goulard et al. (2017) with an IV estimator? (Unfortunately I could not come up with a way to test this myself)



Modifications + test predictions in R:

[modifications are marked with "## additional outputs start ##" and  "## additional outputs end ##" ]



library("spdep")
library("spatialreg")

# modify stsls function outputs
stsls2 <- function(formula, data = list(), listw, zero.policy=NULL,
                  na.action=na.fail, robust=FALSE, HC=NULL, legacy=FALSE, W2X=TRUE) {
  if (!inherits(listw, "listw"))
    stop("No neighbourhood list")

  if (is.null(zero.policy))
    zero.policy <- get("zeroPolicy", envir = .spatialregOptions)
  stopifnot(is.logical(zero.policy))
  if (!inherits(formula, "formula")) formula <- as.formula(formula)
  mt <- terms(formula, data = data)
  mf <- lm(formula, data, na.action=na.action, method="model.frame")
  na.act <- attr(mf, "na.action")
  if (!is.null(na.act)) {
    subset <- !(1:length(listw$neighbours) %in% na.act)
    listw <- subset(listw, subset, zero.policy=zero.policy)
  }

  y <- model.extract(mf, "response")
  if (any(is.na(y))) stop("NAs in dependent variable")
  X <- model.matrix(mt, mf)
  if (any(is.na(X))) stop("NAs in independent variable")
  if (robust) {
    if (is.null(HC)) HC <- "HC0"
    if (!any(HC %in% c("HC0", "HC1")))
      stop("HC must be one of HC0, HC1")
  }
  # modified to pass zero.policy Juan Tomas Sayago 100913
  Wy <- lag.listw(listw, y, zero.policy=zero.policy)
  dim(Wy) <- c(nrow(X),1)
  colnames(Wy) <- c("Rho")

  #    WX <- lag.listw(W,X[,2:ncol(X)])
  n <- NROW(X)
  m <- NCOL(X)
  xcolnames <- colnames(X)
  K <- ifelse(xcolnames[1] == "(Intercept)", 2, 1)
  if (m > 1) {
    WX <- matrix(nrow=n, ncol=(m-(K-1)))
    if(W2X) WWX <- matrix(nrow = n, ncol = ncol(WX) )
    for (k in K:m) {
      wx <- lag.listw(listw, X[,k], zero.policy=zero.policy)
      if(W2X) wwx <- lag.listw(listw, wx, zero.policy = zero.policy)
      if (any(is.na(wx)))
        stop("NAs in lagged independent variable")
      WX[,(k-(K-1))] <- wx
      if(W2X) WWX[, (k - (K - 1))] <- wwx
    }
    if(W2X) inst <- cbind(WX, WWX)
    else inst <- WX
  }
  if (K == 2 && listw$style != "W") {
    # modified to meet other styles, email from Rein Halbersma
    wx1 <- as.double(rep(1, n))
    wx <- lag.listw(listw, wx1, zero.policy=zero.policy)
    if(W2X) wwx <- lag.listw(listw, wx, zero.policy=zero.policy)
    if (m > 1) {
      inst <- cbind(wx, inst)
      if(W2X) inst <- cbind(wwx, inst)
    } else {
      inst <- matrix(wx, nrow=n, ncol=1)
      if(W2X) inst <- cbind(inst, wwx)
    }
    #        colnames(inst) <- xcolnames

  }
  #    if (listw$style == "W") colnames(WX) <- xcolnames[-1]
  result <- tsls(y=y, yend=Wy, X=X, Zinst=inst, robust=robust, HC=HC,
                 legacy=legacy)
  result$zero.policy <- zero.policy
  result$robust <- robust
  if (robust) result$HC <- HC
  result$legacy <- legacy
  result$listw_style <- listw$style
  result$call <- match.call()

  ## additional outputs ##
  result$rho <- result$coefficients[1]
  result$coefficients <- result$coefficients[2:length(result$coefficients)]
  result$type <- "lag"
  result$X <- X
  result$y <- y
  result$s2<-result$sse / result$df

  ## additional outputs end ##

  class(result) <- "stsls"
  result
}


# update original function in spatialreg
tmpfun <- get("stsls", envir = asNamespace("spatialreg"))
environment(stsls2) <- environment(tmpfun)
attributes(stsls2) <- attributes(tmpfun)
assignInNamespace("stsls", stsls2, ns="spatialreg")


#-# test prediction with updated function

# data
data(oldcol)

# set seed
set.seed(123)

# draw observations to use as test data
outsamp <- sample(1:nrow(COL.OLD), nrow(COL.OLD)*.3, replace = FALSE)

# create train and test dataframes
S.sample <- COL.OLD[-outsamp, ] # train
O.sample <- COL.OLD[outsamp, ] # test

S.sample.nb <- subset.nb(COL.nb, !(1:length(COL.nb) %in% outsamp)) # train

# run regression
testreg <- spatialreg::stsls(CRIME ~ INC + HOVAL, data=S.sample,
                     nb2listw(S.sample.nb, style="W"))

# in and outsample prediction
S.PRED.TC <- spatialreg::predict.sarlm(object = testreg, listw = nb2listw(S.sample.nb, style="W"),
                                       pred.type = "TC", power = FALSE)


O.PRED.TC <- spatialreg::predict.sarlm(object = testreg , newdata = O.sample,
                                           listw = nb2listw(COL.nb, style="W"), pred.type = "TC", power = FALSE)

O.PRED.BP <- spatialreg::predict.sarlm(object = testreg , newdata = O.sample,
                                           listw = nb2listw(COL.nb, style="W"), pred.type = "BP", power = FALSE)




I would be very glad to get some feedback on both the general idea and the implementation!


Best regards,

Selim Banabak


        [[alternative HTML version deleted]]

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Reply | Threaded
Open this post in threaded view
|

Re: Using stsls() with predict.sarlm()

Roger Bivand
Administrator
On Mon, 13 Jul 2020, Banabak, Selim wrote:

> Dear All,
>
>
> I am using the "spatialreg" package to estimate a spatial autoregressive
> model for a large number of observations (~75.000) and would thus prefer
> to use stsls() over lagsarlm() to gain estimation speed. In a second
> step however, I want to use the model for out of sample prediction with
> predict.sarlm().
>
> As predict.sarlm() is not designed for stsls() outputs I tried to modify
> the stsls() function outputs in a way that the prediction function would
> be able to handle them.
>
>
> Now I have two question:
>
> - Is it, from a theoretical point of view, legitimate to just use the IV
>   estimates instead of the ML estimates for the predictors described in
>   Goulard et al. (2017)?

You would need to go through the underlying equations. Just using the
point estimates of the coefficients and their standard errors is one
approach, but there are few studies beyond Goulard et al. Maybe look at
Suesse and co-author https://doi.org/10.1016/j.csda.2017.11.004,
https://doi.org/10.1080/00949655.2017.1286495 (also ML - much of the
STSLS/GMM literature avoids looking at important problems).

While estimating and fitting larger data sets is more time-consuming with
ML, using LU or Cholesky decomposition is practical when the spatial
weights are sparse (this applies to STSLS too). Predicting is also
constrained when n is large, as inverting the nxn matrix may be needed.

Check whether Stata knows how to predict from STSLS, and check usage of
sphet::spreg rather than spatialreg::stsls. Consider contacting the
maintainer of sphet if there is no response on this list, as it would make
more sense to explore predict methods for more modern sphet
implementations than legacy ones in spatialreg.

>
> - Is the following modification of the original stsls() function in
>   combination with the predict.sarlm() an acceptable approach to
>   implement the prediction methods suggested by Goulard et al. (2017)
>   with an IV estimator? (Unfortunately I could not come up with a way to
>   test this myself)
>

You would need to do the matrix math separately - maybe contacting the
author of the spatialreg implementations, Martin Gubri, would also make
sense.

Interesting topic!

Roger

>
>
> Modifications + test predictions in R:
>
> [modifications are marked with "## additional outputs start ##" and  "## additional outputs end ##" ]
>
>
>
> library("spdep")
> library("spatialreg")
>
> # modify stsls function outputs
> stsls2 <- function(formula, data = list(), listw, zero.policy=NULL,
>                  na.action=na.fail, robust=FALSE, HC=NULL, legacy=FALSE, W2X=TRUE) {
>  if (!inherits(listw, "listw"))
>    stop("No neighbourhood list")
>
>  if (is.null(zero.policy))
>    zero.policy <- get("zeroPolicy", envir = .spatialregOptions)
>  stopifnot(is.logical(zero.policy))
>  if (!inherits(formula, "formula")) formula <- as.formula(formula)
>  mt <- terms(formula, data = data)
>  mf <- lm(formula, data, na.action=na.action, method="model.frame")
>  na.act <- attr(mf, "na.action")
>  if (!is.null(na.act)) {
>    subset <- !(1:length(listw$neighbours) %in% na.act)
>    listw <- subset(listw, subset, zero.policy=zero.policy)
>  }
>
>  y <- model.extract(mf, "response")
>  if (any(is.na(y))) stop("NAs in dependent variable")
>  X <- model.matrix(mt, mf)
>  if (any(is.na(X))) stop("NAs in independent variable")
>  if (robust) {
>    if (is.null(HC)) HC <- "HC0"
>    if (!any(HC %in% c("HC0", "HC1")))
>      stop("HC must be one of HC0, HC1")
>  }
>  # modified to pass zero.policy Juan Tomas Sayago 100913
>  Wy <- lag.listw(listw, y, zero.policy=zero.policy)
>  dim(Wy) <- c(nrow(X),1)
>  colnames(Wy) <- c("Rho")
>
>  #    WX <- lag.listw(W,X[,2:ncol(X)])
>  n <- NROW(X)
>  m <- NCOL(X)
>  xcolnames <- colnames(X)
>  K <- ifelse(xcolnames[1] == "(Intercept)", 2, 1)
>  if (m > 1) {
>    WX <- matrix(nrow=n, ncol=(m-(K-1)))
>    if(W2X) WWX <- matrix(nrow = n, ncol = ncol(WX) )
>    for (k in K:m) {
>      wx <- lag.listw(listw, X[,k], zero.policy=zero.policy)
>      if(W2X) wwx <- lag.listw(listw, wx, zero.policy = zero.policy)
>      if (any(is.na(wx)))
>        stop("NAs in lagged independent variable")
>      WX[,(k-(K-1))] <- wx
>      if(W2X) WWX[, (k - (K - 1))] <- wwx
>    }
>    if(W2X) inst <- cbind(WX, WWX)
>    else inst <- WX
>  }
>  if (K == 2 && listw$style != "W") {
>    # modified to meet other styles, email from Rein Halbersma
>    wx1 <- as.double(rep(1, n))
>    wx <- lag.listw(listw, wx1, zero.policy=zero.policy)
>    if(W2X) wwx <- lag.listw(listw, wx, zero.policy=zero.policy)
>    if (m > 1) {
>      inst <- cbind(wx, inst)
>      if(W2X) inst <- cbind(wwx, inst)
>    } else {
>      inst <- matrix(wx, nrow=n, ncol=1)
>      if(W2X) inst <- cbind(inst, wwx)
>    }
>    #        colnames(inst) <- xcolnames
>
>  }
>  #    if (listw$style == "W") colnames(WX) <- xcolnames[-1]
>  result <- tsls(y=y, yend=Wy, X=X, Zinst=inst, robust=robust, HC=HC,
>                 legacy=legacy)
>  result$zero.policy <- zero.policy
>  result$robust <- robust
>  if (robust) result$HC <- HC
>  result$legacy <- legacy
>  result$listw_style <- listw$style
>  result$call <- match.call()
>
>  ## additional outputs ##
>  result$rho <- result$coefficients[1]
>  result$coefficients <- result$coefficients[2:length(result$coefficients)]
>  result$type <- "lag"
>  result$X <- X
>  result$y <- y
>  result$s2<-result$sse / result$df
>
>  ## additional outputs end ##
>
>  class(result) <- "stsls"
>  result
> }
>
>
> # update original function in spatialreg
> tmpfun <- get("stsls", envir = asNamespace("spatialreg"))
> environment(stsls2) <- environment(tmpfun)
> attributes(stsls2) <- attributes(tmpfun)
> assignInNamespace("stsls", stsls2, ns="spatialreg")
>
>
> #-# test prediction with updated function
>
> # data
> data(oldcol)
>
> # set seed
> set.seed(123)
>
> # draw observations to use as test data
> outsamp <- sample(1:nrow(COL.OLD), nrow(COL.OLD)*.3, replace = FALSE)
>
> # create train and test dataframes
> S.sample <- COL.OLD[-outsamp, ] # train
> O.sample <- COL.OLD[outsamp, ] # test
>
> S.sample.nb <- subset.nb(COL.nb, !(1:length(COL.nb) %in% outsamp)) # train
>
> # run regression
> testreg <- spatialreg::stsls(CRIME ~ INC + HOVAL, data=S.sample,
>                     nb2listw(S.sample.nb, style="W"))
>
> # in and outsample prediction
> S.PRED.TC <- spatialreg::predict.sarlm(object = testreg, listw = nb2listw(S.sample.nb, style="W"),
>                                       pred.type = "TC", power = FALSE)
>
>
> O.PRED.TC <- spatialreg::predict.sarlm(object = testreg , newdata = O.sample,
>                                           listw = nb2listw(COL.nb, style="W"), pred.type = "TC", power = FALSE)
>
> O.PRED.BP <- spatialreg::predict.sarlm(object = testreg , newdata = O.sample,
>                                           listw = nb2listw(COL.nb, style="W"), pred.type = "BP", power = FALSE)
>
>
>
>
> I would be very glad to get some feedback on both the general idea and the implementation!
>
>
> Best regards,
>
> Selim Banabak
>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-Geo mailing list
> [hidden email]
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>

--
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: [hidden email]
https://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Roger Bivand
Department of Economics
Norwegian School of Economics
Helleveien 30
N-5045 Bergen, Norway