
Hi r-sig-geo,
I am looking at the spatial distribution of poor households in a region
comprising a gradient of urban-rural postcodes. The data are counts and
they fit a negative binomial distribution, rather than a Poisson
distribution.
I am applying DCluster (ver. 0.13, windows) and would be grateful for
advice on a few topics.
1. GAM
A) As I understand it, the default setting is based on a Poisson
distribution. This creates some not implausible clusters, but I wonder
whether I could set up opgam so that it uses a negative binomial
distribution (for which I have the parameters for the "disease"
variable: size and mu), or uses a bootstrap procedure instead. Some of
the internal functions, like opgam.iscluster.negbin, seem to support
this, but I am uncertain about how to incorporate them.
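To be clear about what I mean by size and mu: I am using the base R
parameterisation of the negative binomial. A minimal sketch with made-up
values (not my estimates), using only base R and making no assumption
about how DCluster expects these parameters to be passed:

```r
# Illustrative values only; these are NOT estimates from my data
size <- 1.5   # dispersion parameter
mu   <- 4     # mean count

# Base R accepts either (size, mu) or (size, prob); the two are linked by
# prob = size / (size + mu)
prob <- size / (size + mu)

# The two parameterisations give the same probabilities
same_density <- isTRUE(all.equal(
  dnbinom(0:10, size = size, mu = mu),
  dnbinom(0:10, size = size, prob = prob)
))

# The variance is mu + mu^2 / size, i.e. larger than the Poisson variance mu
nb_var <- mu + mu^2 / size

# Simulated counts under this model
set.seed(1)
counts <- rnbinom(1000, size = size, mu = mu)
```

A prob outside (0, 1] or a non-positive size makes rnbinom complain, so
the conversion may be worth checking before anything is simulated.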
B) To reduce the multiple testing problem (Waller & Gotway 2004,
"Applied Spatial Statistics for Public Health Data", Wiley, p.208) I
wonder whether to set radius to <50% of step size, e.g. 100m radius in a
300m grid, so that the smallest circles won't touch?
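The geometry I have in mind can be checked with a small base R sketch.
The 4 x 4 grid below is hypothetical and only stands in for my real
grid; it shows that circles do not overlap whenever twice the radius is
smaller than the step:

```r
step   <- 300  # grid spacing in metres
radius <- 100  # circle radius in metres

# Hypothetical regular grid of circle centres
grid <- expand.grid(x = seq(0, 900, by = step),
                    y = seq(0, 900, by = step))

# Smallest centre-to-centre distance, minus the two radii
d <- as.matrix(dist(grid))
min_gap <- min(d[upper.tri(d)]) - 2 * radius

# Negative gap would mean that neighbouring circles overlap
circles_overlap <- min_gap < 0
```

The flip side is that non-overlapping circles leave parts of the study
area uncovered, which is part of what I am unsure about.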
2. Besag-Newell
I am getting results with "poisson" (almost everything becomes a
cluster, possibly because the sites are clumped and not randomly
distributed) and with "permutation", but I wonder how "negbin" is used.
Not like this:
> bnresults <- opgam(pcpoor, thegrid=pcpoor[,c("x","y")], alpha=.05,
+ iscluster=bn.iscluster, set.idxorder=TRUE, k=20, model="negbin",
+ R=100, mle=calculate.mle(pcpoor) )
> > Error in rnbinom(n, size, prob) : invalid arguments
3. Kulldorff & Nagarwalla
Again I struggle with the parameters. Not like this:
> #K&N's method over the centroids
> mle <- calculate.mle(pcpoor, model="negbin")
> > Error in while (((abs(m - m0) > tol * (m + m0)) || (abs(v - v0) > tol
* :
missing value where TRUE/FALSE needed
> knresults <- opgam(data=pcpoor, thegrid=pcpoor[,c("x","y")], alpha=.05,
+ iscluster=kn.iscluster, fractpop=.5, R=100, model="negbin", mle=mle)
> > Error in rnbinom(n, size, prob) : invalid arguments
4. Turnbull. Is Turnbull analysis possible in DCluster yet? There are
some references to it in the manual, but I haven't been able to locate
it.
5. General
A) I am considering increasing the study area (currently working with
1262 postcode points) and wonder what the limits might be for a desktop
PC. I gather that the distance matrices (created by tripack or spdep)
could be a limiting factor? Would it be an idea to run this step first
and, once the table is created, run the cluster detection algorithm?
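My own back-of-envelope estimate in base R is below. It assumes a dense
n x n matrix of doubles (8 bytes per entry), which may not be how
tripack or spdep actually store distances; the helper name is mine:

```r
# Memory in MB for a dense n x n matrix of doubles (8 bytes per entry)
dense_matrix_mb <- function(n) n^2 * 8 / 1024^2

n_now <- 1262
bytes_now <- n_now^2 * 8

# Current size, then hypothetical 5- and 10-fold increases in points
sizes_mb <- sapply(c(n_now, 5 * n_now, 10 * n_now), dense_matrix_mb)

# Memory grows with the square of the number of points
growth <- sizes_mb / sizes_mb[1]
```

So ten times as many points would need about a hundred times the memory,
if the dense-matrix assumption holds.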
B) I wonder whether permutation tests are always superior to tests based
on standard statistical distributions, and if not, why not?
Best wishes, Jakob
Jakob Petersen
GISc student (MSc)
Birkbeck, University of London
