Getting averaged variable importance after bootstrapping


Hounkpatin Ozias
Dear All,



I am using a bootstrapping approach with the Cubist model, as in the example
below with 3 bootstraps.



library(Cubist)
library(ithir) # install.packages("ithir", repos = "http://R-Forge.R-project.org")
library(caret)

# Point data
data(HV_subsoilpH)

# Subset data for modeling
training <- sample(nrow(HV_subsoilpH), 0.7 * nrow(HV_subsoilpH))
cDat <- HV_subsoilpH[training, ]

# Number of bootstraps
nbag <- 3

# Fit a cubist model for each bootstrap sample
# (fit_cubist is overwritten on every iteration)
for (i in 1:nbag) {
  trainingREP <- sample.int(nrow(cDat), 1.0 * nrow(cDat), replace = TRUE)
  fit_cubist <- cubist(x = cDat[trainingREP, 4:ncol(cDat)],
                       y = cDat$pH60_100cm[trainingREP],
                       committees = 3,
                       control = cubistControl(rules = 5, extrapolation = 5))
}



After running the models, it is possible to get the variable importance
(the percentage usage of each variable in the models). Because of the random
sampling at each run, the variable importance differs from model to model. A
more robust estimate may be obtained by averaging, for each variable, its
percentage usage across all the models.
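The averaging step described above can be sketched in base R. This is only a minimal illustration: the data frame vimp_long and its values are made up for the example, and dividing by the number of models (so that a variable a model did not report counts as 0) is an assumption about how missing variables should be treated, not something stated in this post.

```r
# Illustrative per-model importance values (hypothetical), one row per
# variable per bootstrap model.
vimp_long <- data.frame(
  Variables   = c("MRVBF", "AACN", "NDVI", "MRVBF", "Elevation", "NDVI"),
  VImp        = c(62.5, 72.5, 41, 81.5, 48.5, 62.5),
  cubist.type = rep(c("Vimp.cubist1", "Vimp.cubist2"), each = 3)
)

n_models <- length(unique(vimp_long$cubist.type))

# Sum the usage per variable and divide by the number of models, so a
# variable that a model did not report contributes 0 to its average
# (assumption: absent means unused).
vimp_sum <- aggregate(VImp ~ Variables, data = vimp_long, FUN = sum)
vimp_sum$mean_VImp <- vimp_sum$VImp / n_models

# Sort from most to least important on average.
vimp_avg <- vimp_sum[order(-vimp_sum$mean_VImp), c("Variables", "mean_VImp")]
print(vimp_avg)
```

If every model reports every variable (including zeros), a plain mean per variable gives the same result.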



However, because fit_cubist is overwritten in each iteration of the loop,
varImp(fit_cubist) only gives the variable importance for the final model.



Is there any way to extract the variable importance for each model and
combine the results into a final table like the one presented below? I am
actually running the models 100 times. It is possible to do it manually by
saving each model to file and then calling varImp() on each one, but the
workload is too high with 100 bootstraps.



The final table I am expecting should look like the table below.


Expected table:

Variables     VImp   cubist.type
MRVBF         62.5   Vimp.cubist1
AACN          72.5   Vimp.cubist1
NDVI          41     Vimp.cubist1
MRVBF         81.5   Vimp.cubist2
Elevation     48.5   Vimp.cubist2
NDVI          62.5   Vimp.cubist2
MRVBF         78     Vimp.cubist3
Hillshading   62     Vimp.cubist3
TWI           57     Vimp.cubist3
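One possible way to build a table of this shape is to call varImp() inside the loop and store each result in a list, then bind the pieces together. The following is a sketch rather than a tested solution: it assumes that caret's varImp() for a cubist fit returns a data frame with an Overall column and the variable names as row names, and the names vimp_list, vimp_long and vimp_avg, as well as the set.seed() call, are additions for illustration.

```r
library(Cubist)
library(ithir)   # provides the HV_subsoilpH data set
library(caret)   # provides varImp() for cubist fits

data(HV_subsoilpH)

set.seed(123)  # assumption: added only to make the sketch reproducible
training <- sample(nrow(HV_subsoilpH), floor(0.7 * nrow(HV_subsoilpH)))
cDat <- HV_subsoilpH[training, ]

nbag <- 3
vimp_list <- vector("list", nbag)  # one importance table per bootstrap

for (i in seq_len(nbag)) {
  # Bootstrap sample of the training rows
  trainingREP <- sample.int(nrow(cDat), replace = TRUE)

  fit_cubist <- cubist(x = cDat[trainingREP, 4:ncol(cDat)],
                       y = cDat$pH60_100cm[trainingREP],
                       committees = 3,
                       control = cubistControl(rules = 5, extrapolation = 5))

  # Extract the importance before fit_cubist is overwritten
  vi <- varImp(fit_cubist)
  vimp_list[[i]] <- data.frame(Variables   = rownames(vi),
                               VImp        = vi$Overall,
                               cubist.type = paste0("Vimp.cubist", i))
}

# Long table: one row per variable per model
vimp_long <- do.call(rbind, vimp_list)

# Average importance per variable across the bootstrap models
vimp_avg <- aggregate(VImp ~ Variables, data = vimp_long, FUN = mean)
vimp_avg <- vimp_avg[order(-vimp_avg$VImp), ]
```

Changing nbag to 100 should be the only edit needed for the full run. Only the small importance tables are kept here; if the fitted models themselves are needed later, they could be stored in a second list in the same way.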



I would appreciate any help.



Best Regards,

Ozias


_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo