GEOSTATS: large dataset in geographic coordinates

  • Timothy H. Keitt
    Feb 10, 1997

      I would like some advice on tools for doing spatial analysis on two
      large, multivariate spatial datasets. (I've checked several of the
      packages listed in the ai-geostats home page, but it's not clear if they
      will do what I want.) Both of the datasets are in the form of text
      files and the spatial locations are given in geographic (lat-lon)
      coordinates. There are more than 3,000 points in each set and the
      coverage is most of North America. I would like to do an analysis of
      spatial autocorrelation on both datasets, and an analysis of their
      cross-correlation, i.e., to test the hypothesis that one of the
      datasets is influencing the other.

      The main stumbling block has been calculating the distances among the
      lat-lon coordinate pairs. I have been using a combination of PERL and
      the "geod" program from the PROJ.4 distribution. Unfortunately,
      "geod" is written in such a way that it is extremely difficult to call
      repeatedly from within PERL. (If someone could provide documentation
      for the PROJ.4 library routines, I would consider encapsulating them
      in a perl module.) I can't just dump all the pair-wise comparisons
      and then run them through "geod" because it would require at least 2GB
      to store the intermediate data. (Probably more; I ran out of
      memory/disk space long before it finished.)
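
      A minimal sketch of one way around the storage problem: compute each
      distance on the fly and accumulate it straight into distance classes,
      so no pairwise table is ever written out. It assumes a spherical
      Earth and uses the haversine formula rather than the ellipsoidal
      calculation that "geod" performs; the function names, the mean radius
      constant and the bin width are illustrative choices, not part of
      PROJ.4. For 3,000 points there are about 4.5 million unordered pairs,
      which is slow but feasible to stream.

```python
import math
from itertools import combinations

# Mean Earth radius in km. Treating the Earth as a sphere is an assumption;
# results can differ from ellipsoidal distances by a few tenths of a percent.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2, radius_km=EARTH_RADIUS_KM):
    """Great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against rounding error near antipodal points.
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, a)))


def pairwise_distances(points):
    """Yield (i, j, distance_km) for every unordered pair, one at a time."""
    for (i, p), (j, q) in combinations(enumerate(points), 2):
        yield i, j, haversine_km(p[0], p[1], q[0], q[1])


def distance_histogram(points, bin_width_km):
    """Count pairs per distance class without storing the distances."""
    counts = {}
    for _, _, d in pairwise_distances(points):
        k = int(d // bin_width_km)
        counts[k] = counts.get(k, 0) + 1
    return counts
```

      Here "points" is a list of (lat, lon) tuples read from the text
      files. Whether the spherical approximation stays within +/-1km depends
      on the separation: at continental ranges a few tenths of a percent can
      exceed 1 km, so long lags may still need an ellipsoidal calculation.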

      So here are a couple of questions:

      1) If I only need +/-1km precision in my distances, is there an
      alternative to PROJ.4, i.e., a simple formula for calculating the
      distance between two lat-lon points?

      2) Can I transform the data points into some Cartesian coordinate
      system and then use simple linear distances? PROJ.4 has many planar
      projections, but it's not clear to me that distances wouldn't become
      distorted over an area the size of North America.

      3) Are there any (free Unix ;-) geostats programs that work in
      geographic coordinates and can process large data sets?
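
      The distortion concern in question 2 can be checked numerically for
      any candidate projection by comparing planar distances against
      great-circle distances for a sample of point pairs. The sketch below
      does this for the equirectangular projection only, chosen because it
      needs no library; it is not one of the better-suited PROJ.4
      projections and says nothing about how those would perform. The
      spherical Earth, the function names and the projection centre are all
      assumptions made for illustration.

```python
import math

R_KM = 6371.0088  # mean Earth radius in km (spherical assumption)


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in km (decimal-degree inputs)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * R_KM * math.asin(math.sqrt(min(1.0, a)))


def equirect_xy_km(lat, lon, lat0, lon0):
    """Equirectangular projection centred at (lat0, lon0); returns km."""
    x = R_KM * math.radians(lon - lon0) * math.cos(math.radians(lat0))
    y = R_KM * math.radians(lat - lat0)
    return x, y


def planar_error_km(p, q, centre):
    """Planar distance minus great-circle distance for points p and q."""
    x1, y1 = equirect_xy_km(p[0], p[1], centre[0], centre[1])
    x2, y2 = equirect_xy_km(q[0], q[1], centre[0], centre[1])
    planar = math.hypot(x2 - x1, y2 - y1)
    return planar - great_circle_km(p[0], p[1], q[0], q[1])


if __name__ == "__main__":
    centre = (45.0, -100.0)  # hypothetical centre, roughly mid-continent
    # Illustrative pairs: one along a meridian, one along a northern parallel.
    for p, q in [((30.0, -100.0), (50.0, -100.0)),
                 ((60.0, -130.0), (60.0, -70.0))]:
        print(p, q, round(planar_error_km(p, q, centre), 1))
```

      Running the same comparison with coordinates produced by a projection
      from PROJ.4 in place of equirect_xy_km would show whether its errors
      stay within the +/-1km tolerance over the actual data extent.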

      Finally, are there standard methods to test for spatial dependence of
      one dataset on another?


