Fishes of Texas Project Documentation

SDM Class 02

Models available through the Modeling section of the FoTX website include model classes 01 and 02. Future derivations and alternate model versions will be designated as subsequent classes. The specific construction methods for model class 02 are described below.

What's different with this new model class?

Model class 02 includes models only for Species of Greatest Conservation Need (SGCN) listed in the Texas Conservation Action Plan. Updates to the model construction and analysis methodology include i.) additional records from an updated Fishes of Texas database, ii.) additional and more hydrologically relevant environmental covariates (see table below), and iii.) a method to account for survey sampling bias. Additional details on model production methodology and on the use and interpretation of these models are provided below.

Suggested Interpretation

Species Distribution Models (SDMs) predict the potential geographic distribution of a species from its occurrence points and a set of predictive environmental variables; they are sometimes interpreted as approximating the species' ecological niche (Guisan & Thuiller 2005). Over the last decade, SDMs have often been constructed with machine-learning algorithms that use the co-occurrence of species occurrence points and environmental data to predict the environmental conditions in which a species is likely to occur. These predictions are then projected back into geographic space to obtain the potential distribution. When constraints on dispersal due to geography or behavior (Margules & Sarkar 2007) are taken into account during model development, the models predict realized distributions (Pawar et al. 2007).



This technique converts disparate occurrence records into continuous probabilities of occurrence that predict habitat suitability. SDM output is therefore more amenable to the diverse analyses performed in geographic information systems (GIS) than the raw occurrence data (Guisan & Thuiller 2005), which typically lack the temporal and spatial representation needed for direct use in the comparative or trend analyses commonly used to assess changes in biological communities. By combining these disparate, temporally diverse historical occurrence data with environmental variables that capture only broad-scale physiological and biogeographical constraints, we propose that SDMs constructed as described here provide a robust, quantifiable estimate of historical habitat suitability (Labay et al. 2011).

Some things to keep in mind while using and viewing models:

  • The models do not directly incorporate anthropogenic influences such as dams or land use, and results should be interpreted with this in mind. For example, high predicted probabilities within a reservoir for a species found primarily in lotic habitat should be read as the potential probability of occurrence in the absence of reservoir conditions.
  • The models do not directly incorporate biotic interactions.
  • The model images (JPEGs) available through this site display only the occurrence probability range of 0.5 to 1, which generally indicates a high probability of occurrence. To view the full range of probabilities, download the ASCII grid file, load it into a GIS, and symbolize it as desired. Note that the symbology used to view the models (e.g., the range of values shown, whether values are displayed as categories or stretched, and the color scale) strongly influences interpretation. We display only high probabilities (0.5 to 1) to highlight the modeling technique's ability to identify primary suitable habitat.
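The last point notes that the full probability range is available only in the downloadable ASCII file. As a minimal sketch, assuming the file follows the ESRI ASCII grid layout (keyword header lines such as `ncols`, `nrows`, `cellsize`, and an optional `NODATA_value`, followed by the cell values), the Python below reads such a grid with the standard library and tallies cells by probability class. This makes visible how much of a prediction falls below the 0.5 cutoff used for the site images. The function names and the sample grid are hypothetical; in practice a GIS or a raster library would normally be used.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class AsciiGrid:
    ncols: int
    nrows: int
    cellsize: float
    nodata: Optional[float]
    rows: List[List[Optional[float]]]  # None marks NODATA cells

    def valid_values(self) -> List[float]:
        return [v for row in self.rows for v in row if v is not None]


def read_ascii_grid(text: str) -> AsciiGrid:
    """Parse ESRI ASCII grid text: keyword header lines, then nrows * ncols values."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header = {}
    i = 0
    # Header lines start with a keyword; data lines start with a digit, sign, or '.'.
    while i < len(lines) and lines[i][0][0].isalpha():
        header[lines[i][0].lower()] = float(lines[i][1])
        i += 1
    ncols, nrows = int(header["ncols"]), int(header["nrows"])
    nodata = header.get("nodata_value")
    flat = [float(tok) for parts in lines[i:] for tok in parts]
    if len(flat) != ncols * nrows:
        raise ValueError(f"expected {ncols * nrows} values, found {len(flat)}")
    cells = [None if nodata is not None and v == nodata else v for v in flat]
    rows = [cells[r * ncols:(r + 1) * ncols] for r in range(nrows)]
    return AsciiGrid(ncols, nrows, header["cellsize"], nodata, rows)


def bin_counts(grid: AsciiGrid, edges: Sequence[float]) -> List[int]:
    """Count valid cells per class [edges[k], edges[k+1]); the last class includes its upper edge."""
    counts = [0] * (len(edges) - 1)
    for v in grid.valid_values():
        for k in range(len(counts)):
            last = k == len(counts) - 1
            if edges[k] <= v < edges[k + 1] or (last and v == edges[k + 1]):
                counts[k] += 1
                break
    return counts


# Hypothetical 3 x 2 probability grid with one NODATA cell.
SAMPLE = """ncols 3
nrows 2
xllcorner -100.0
yllcorner 30.0
cellsize 0.008333333333
NODATA_value -9999
0.10 0.55 -9999
0.80 0.45 0.95
"""

if __name__ == "__main__":
    grid = read_ascii_grid(SAMPLE)
    counts = bin_counts(grid, [0.0, 0.25, 0.5, 0.75, 1.0])
    print(counts)  # [1, 1, 1, 2]
    shown = sum(counts[2:]) / sum(counts)
    print(f"{shown:.0%} of valid cells fall in the 0.5-1 range shown on the site images")
```

Changing the `edges` list is the code equivalent of changing the symbology: the same grid can look broad or narrow depending on which classes are drawn.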


Model Construction

The general construction protocol used for these models has been published previously (Sarkar et al. 2010, Labay et al. 2011), so the description here is brief. A wide variety of machine-learning algorithms have been used for SDM construction (reviewed in Elith et al. 2006). This project used the maximum entropy algorithm implemented in the Maxent software package (Phillips et al. 2006; Phillips & Dudik 2008) because it directly provides probabilistic output that can be used in subsequent analyses without further treatment (unlike the genetic algorithm of GARP; Stockwell 1999), and because several recent studies have concluded that it outperforms other methods (Elith et al. 2006, Wisz et al. 2008). Maxent was parameterized following published recommendations (Phillips et al. 2006). Models were replicated 100 times; in each replicate, 40% of localities were randomly withheld as "test" records and the remaining 60% served as model "training" records. Model performance was evaluated using a threshold-independent receiver operating characteristic (ROC) analysis and 11 internal binomial tests of "training" and "test" occurrence omission. The ROC analysis characterizes model performance across all possible thresholds using the area under the curve (AUC), a measure of performance independent of any single threshold (Hanley & McNeil 1982). A model with perfect discrimination would have an AUC of 1, whereas a model that predicted species occurrences at random would have an AUC of 0.5 (Hanley & McNeil 1982).
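The replicate scheme and the AUC interpretation described above can be illustrated with a short sketch. It assumes each replicate shuffles the localities independently and withholds 40% of them for testing, and it computes AUC as the probability that a randomly chosen presence receives a higher score than a randomly chosen negative (ties count one half), which is the Mann-Whitney form of the area under the ROC curve. In Maxent the negatives are background points rather than true absences. This is not Maxent's internal code; `replicate_splits` and `auc` are hypothetical helpers.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def replicate_splits(
    localities: Sequence[T],
    n_replicates: int = 100,
    test_fraction: float = 0.4,
    seed: int = 0,
) -> List[Tuple[List[T], List[T]]]:
    """Return one (train, test) pair per replicate, each an independent random split."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    n_test = round(len(localities) * test_fraction)
    splits = []
    for _ in range(n_replicates):
        shuffled = list(localities)
        rng.shuffle(shuffled)
        # The first n_test shuffled localities are withheld as "test"; the rest train the model.
        splits.append((shuffled[n_test:], shuffled[:n_test]))
    return splits


def auc(presence_scores: Sequence[float], background_scores: Sequence[float]) -> float:
    """Threshold-independent ROC area computed with the Mann-Whitney statistic."""
    if not presence_scores or not background_scores:
        raise ValueError("need at least one presence and one background score")
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(presence_scores) * len(background_scores))


if __name__ == "__main__":
    sites = [f"site_{i}" for i in range(10)]
    splits = replicate_splits(sites)
    train, test = splits[0]
    print(len(splits), len(train), len(test))  # 100 6 4
    print(auc([0.9, 0.8], [0.1, 0.2]))  # 1.0: perfect discrimination
    print(auc([0.5, 0.5], [0.5, 0.5]))  # 0.5: no better than random
```

Because every pair is compared, the result does not depend on any one probability threshold, which is why AUC is described above as threshold-independent. A per-species summary would typically average the test AUC across the 100 replicates.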

Biological and Environmental Data
Occurrence data input consists of FoTX records. Records with a potential georeferencing error (radius; see Georeferencing and Geographic Units) greater than one km were excluded so that input occurrences closely matched the spatial resolution of the environmental layers used in modeling. This one-km error threshold approximately matches the 30-arc-second grid cell resolution (about one km at the Equator), but it is slightly larger than the cells at the latitude of Texas, where converging meridians shorten each cell's east-west dimension and the average cell covers about 0.73 km². However, the maximum entropy algorithm used for analysis (see Model Construction above) has been shown not to be strongly affected by spatial errors in occurrence datasets with standard deviations of up to five km (Hernandez et al. 2006, Wisz et al. 2008). Occurrence records from before 1950 were similarly excluded so that occurrence data were temporally congruent with the climatic variables used (see Table 1 below). Finally, because the predictive accuracy of models built with the maximum entropy algorithm stabilizes at about 10 records (Phillips & Dudik 2008, Phillips et al. 2006), models were produced only for species with a minimum of 10 occurrences falling in at least 10 unique cells of the environmental layer grids.
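The record filters above (georeferencing error of at most one km, no records before 1950, at least 10 distinct grid cells per species) and the cell-size arithmetic can be sketched as follows. The `Occurrence` record layout and its field names are hypothetical, cell indices assume a 30-arc-second grid aligned to whole degrees, and cell dimensions use a spherical approximation of about 111.32 km per degree; none of this is taken from the FoTX processing code.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple

CELL_DEG = 30 / 3600  # 30 arc-seconds expressed in degrees
KM_PER_DEG = 111.32   # spherical approximation of one degree at the Equator


@dataclass(frozen=True)
class Occurrence:
    """Hypothetical occurrence record; field names are illustrative only."""
    species: str
    lat: float
    lon: float
    year: int
    georef_error_km: float  # georeferencing error radius


def cell_dims_km(lat: float) -> Tuple[float, float]:
    """(north-south, east-west) size in km of a 30-arc-second cell at a given latitude."""
    ns = KM_PER_DEG * CELL_DEG
    ew = ns * math.cos(math.radians(lat))  # meridians converge toward the poles
    return ns, ew


def cell_id(lat: float, lon: float) -> Tuple[int, int]:
    """Row/column index of the grid cell, assuming cells aligned to whole degrees."""
    return math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG)


def eligible_species(
    records: Iterable[Occurrence],
    max_error_km: float = 1.0,
    min_year: int = 1950,
    min_cells: int = 10,
) -> Dict[str, Set[Tuple[int, int]]]:
    """Apply the error, date, and minimum-cell rules; return occupied cells per species."""
    cells: Dict[str, Set[Tuple[int, int]]] = defaultdict(set)
    for r in records:
        if r.georef_error_km > max_error_km or r.year < min_year:
            continue  # too imprecise for the grid, or predates the climate baseline
        cells[r.species].add(cell_id(r.lat, r.lon))
    return {sp: c for sp, c in cells.items() if len(c) >= min_cells}


if __name__ == "__main__":
    ns, ew = cell_dims_km(31.5)  # roughly the middle of Texas
    print(f"{ns:.3f} km x {ew:.3f} km = {ns * ew:.2f} km^2")  # roughly 0.73 km^2
```

Counting unique cells rather than raw records means that several collections from the same locality contribute only once toward the 10-cell minimum.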

Table 1. SDM environmental layers used in model class 02

Layer category | Description | Variable code | Source
Topographic | slope | slope | National Hydrography Dataset (NHDPlus V2.1)
Topographic | compound topographic index, ln(flow accumulation / tan(slope)) | cti | 30-arc-second DEM
Climate | annual mean temperature | bio_1 | WorldClim variable 1
Climate | mean diurnal range (mean of monthly (max temp - min temp)) | bio_2 | WorldClim variable 2
Climate | isothermality ((P2/P7) × 100) | bio_3 | WorldClim variable 3
Climate | temperature seasonality (standard deviation × 100) | bio_4 | WorldClim variable 4
Climate | max temperature of warmest month | bio_5 | WorldClim variable 5
Climate | min temperature of coldest month | bio_6 | WorldClim variable 6
Climate | temperature annual range (P5 - P6) | bio_7 | WorldClim variable 7
Climate | mean temperature of wettest quarter | bio_8 | WorldClim variable 8
Climate | mean temperature of driest quarter | bio_9 | WorldClim variable 9
Climate | mean temperature of warmest quarter | bio_10 | WorldClim variable 10
Climate | mean temperature of coldest quarter | bio_11 | WorldClim variable 11
Climate | annual precipitation | bio_12 | WorldClim variable 12
Climate | precipitation of wettest month | bio_13 | WorldClim variable 13
Climate | precipitation of driest month | bio_14 | WorldClim variable 14
Climate | precipitation seasonality (coefficient of variation) | bio_15 | WorldClim variable 15
Climate | precipitation of wettest quarter | bio_16 | WorldClim variable 16
Climate | precipitation of driest quarter | bio_17 | WorldClim variable 17
Climate | precipitation of warmest quarter | bio_18 | WorldClim variable 18
Climate | precipitation of coldest quarter | bio_19 | WorldClim variable 19
Geographic | freshwater ecoregion | feow | The Nature Conservancy
Hydrologic | upstream distance (arbolate sum) | arbolatesu | National Hydrography Dataset (NHDPlus V2.1)
Hydrologic | maximum elevation | maxelevsmo | National Hydrography Dataset (NHDPlus V2.1)
Hydrologic | distance to Gulf of Mexico | pathlength | National Hydrography Dataset (NHDPlus V2.1)
Hydrologic | potential evapotranspiration | pet0001 | National Hydrography Dataset (NHDPlus V2.1)
Hydrologic | annual precipitation of catchment | ppt0001 | National Hydrography Dataset (NHDPlus V2.1)
Hydrologic | mean annual flow with reference gage regression applied | q0001c | National Hydrography Dataset (NHDPlus V2.1)
Hydrologic | mean runoff in area upstream | runoffvc | National Hydrography Dataset (NHDPlus V2.1)
Hydrologic | modified Strahler stream order | Streamorde | National Hydrography Dataset (NHDPlus V2.1)
Hydrologic | annual temperature of catchment | temp0001 | National Hydrography Dataset (NHDPlus V2.1)
Hydrologic | total upstream cumulative drainage area (km²) | totdasqkm | National Hydrography Dataset (NHDPlus V2.1)
Hydrologic | velocity for q0001c | v0001c | National Hydrography Dataset (NHDPlus V2.1)
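Several entries in the table are defined by formulas. The minimal sketch below computes two groups of them: the compound topographic index as written in the table, and the temperature-based bioclimatic variables bio_1, bio_2, bio_3, bio_5, bio_6, and bio_7 from 12 monthly minimum and maximum temperatures. It assumes monthly mean temperature is (min + max) / 2, a common convention in bioclim implementations, and it floors slope at a small positive value so that flat cells do not divide by zero. Both choices are assumptions for illustration, not a description of how the WorldClim or FoTX layers were actually produced; some CTI implementations also scale flow accumulation by cell area, which is not done here.

```python
import math
from typing import Dict, Sequence


def compound_topographic_index(
    flow_accumulation: float, slope_degrees: float, min_slope_degrees: float = 0.01
) -> float:
    """ln(flow accumulation / tan(slope)), following the formula in the table.

    Slope is floored at min_slope_degrees (an assumption) so flat cells stay finite.
    """
    if flow_accumulation <= 0:
        raise ValueError("flow accumulation must be positive")
    slope = math.radians(max(slope_degrees, min_slope_degrees))
    return math.log(flow_accumulation / math.tan(slope))


def bioclim_temperature(tmin: Sequence[float], tmax: Sequence[float]) -> Dict[str, float]:
    """Temperature bioclim variables from 12 monthly minimum and maximum temperatures."""
    if len(tmin) != 12 or len(tmax) != 12:
        raise ValueError("expected 12 monthly values for tmin and tmax")
    tavg = [(lo + hi) / 2 for lo, hi in zip(tmin, tmax)]      # assumed monthly mean
    bio_1 = sum(tavg) / 12                                     # annual mean temperature
    bio_2 = sum(hi - lo for lo, hi in zip(tmin, tmax)) / 12    # mean diurnal range
    bio_5 = max(tmax)                                          # max temperature of warmest month
    bio_6 = min(tmin)                                          # min temperature of coldest month
    bio_7 = bio_5 - bio_6                                      # temperature annual range (P5 - P6)
    if bio_7 == 0:
        raise ValueError("annual range is zero; isothermality is undefined")
    bio_3 = bio_2 / bio_7 * 100                                # isothermality (P2/P7) x 100
    return {"bio_1": bio_1, "bio_2": bio_2, "bio_3": bio_3,
            "bio_5": bio_5, "bio_6": bio_6, "bio_7": bio_7}


if __name__ == "__main__":
    tmin = [0.0] * 6 + [10.0] * 6
    tmax = [10.0] * 6 + [30.0] * 6
    print(bioclim_temperature(tmin, tmax))
    print(round(compound_topographic_index(100.0, 2.0), 3))
```

Isothermality compares the day-to-night swing with the summer-to-winter swing: a value near 100 means daily variation is as large as annual variation.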

References

Elith, J., C. H. Graham, R. P. Anderson, M. Dudik, S. Ferrier, et al. 2006. Novel methods improve prediction of species' distributions from occurrence data. Ecography 29:129-151.

González, C., O. Wang, S. E. Strutz, C. González-Salazar, V. Sánchez-Cordero, et al. 2010. Climate change and risk of Leishmaniasis in North America: Predictions from ecological niche models of vector and reservoir species. PLoS Neglected Tropical Diseases 4: e585.

Guisan, A., and W. Thuiller. 2005. Predicting species distribution: offering more than simple habitat models. Ecology Letters 8:993-1009.

Hanley, J. A., and B. J. McNeil. 1982. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143:29-36.

Hernandez, P. A., C. H. Graham, L. L. Master, and D. L. Albert. 2006. The effect of sample size and species characteristics on performance of different species distribution modeling methods. Ecography 29:773-785.

Illoldi-Rangel, P., T. Fuller, M. Linaje, C. Pappas, V. Sánchez-Cordero, et al. 2008. Solving the maximum representation problem to prioritize areas for the conservation of terrestrial mammals at risk in Oaxaca. Diversity and Distributions 14:493-508.

Labay, B. J., A. E. Cohen, B. Sissel, D. A. Hendrickson, F. D. Martin, and S. Sarkar. 2011. Assessing historical fish community composition using surveys, historical collection data, and species distribution models. PLoS ONE 6: e25145.

Labay, B. J., and D. A. Hendrickson. 2014. Final Report: Conservation assessment and mapping products for GPLCC priority fish taxa. Submitted to the United States Department of Interior, Fish & Wildlife Service, Great Plains Landscape Conservation Cooperative; The University of Texas at Austin, December 31st, 2014.

Labay, B. J., D. A. Hendrickson, A. E. Cohen, T. H. Bonner, R. S. King, L. J. Kleinsasser, G. W. Linam, and K. O. Winemiller. 2015. Can species distribution models aid bioassessment when reference sites are lacking? Tests based on freshwater fishes. Environmental Management 56:835-846.

Margules, C. R., and S. Sarkar. 2007. Systematic conservation planning. Cambridge University Press, Cambridge, UK.

Pawar, S., M. S. Koo, C. Kelley, M. F. Ahmed, S. Chaudhuri, et al. 2007. Conservation assessment and prioritization of areas in Northeast India: priorities for amphibians and reptiles. Biological Conservation 136:346-361.

Phillips, S. J., R. P. Anderson, and R. E. Schapire. 2006. Maximum entropy modeling of species geographic distributions. Ecological Modelling 190:231-259.

Phillips, S. J., and M. Dudik. 2008. Modeling of species distributions with Maxent: new extensions and a comprehensive evaluation. Ecography 31:161-175.

Sarkar, S., V. Sánchez-Cordero, M. Londoño, and T. Fuller. 2009. Systematic conservation assessment for the Mesoamerica, Chocó, and Tropical Andes biodiversity hotspots: a preliminary analysis. Biodiversity and Conservation 18:1793-1828.

Sarkar, S., S. E. Strutz, D. M. Frank, C. Rivaldi, B. Sissel, et al. 2010. Chagas disease risk in Texas. PLoS Neglected Tropical Diseases 4: e836. Accessed 8 October 2010.

Stockwell, D. 1999. The GARP modelling system: problems and solutions to automated spatial prediction. International Journal of Geographical Information Science 13:143-158.

Wisz, M. S., R. J. Hijmans, J. Li, A. T. Peterson, C. H. Graham, et al. 2008. Effects of sample size on the performance of species distribution models. Diversity and Distributions 14:763-773.

Visit the site at www.fishesoftexas.org