For a long time, geospatial data analysis was limited to individuals and organizations who could afford expensive software such as TerrSet, ERDAS, ENVI, or ArcGIS. R has emerged as an alternative to these packages. R is an open-source statistical programming language that enables users to create reproducible, easy-to-understand, and efficient workflows. Its widespread use has inspired many individuals (many, if not most, of whom are researchers and data scientists) to create packages that expand the capabilities of the language, including in GIS and raster image analysis. This project highlights some of R's abilities to work with geospatial data, all made possible through these packages. Specifically, the focus is on data cleaning, interpolation, and advanced analysis of time-series data.
This tutorial, developed in collaboration with Priyanka Verma, seeks to gain insights from particulate matter air pollution trends in the New York Metropolitan Area. Specifically, we look at particulate matter with a diameter of less than 2.5 micrometers (PM2.5), and we use only data collected during winter months, when PM2.5 levels are higher.
Our study area covers part of the New York Metropolitan Area, including New York City, Long Island, counties in Upstate New York, and large portions of New Jersey. These areas are defined by the New York Core Based Statistical Area (NY CBSA).
PM2.5 data are provided by the Environmental Protection Agency (EPA) and can be downloaded from its open data portal: https://www.epa.gov/outdoor-air-quality-data/download-daily-data.
The CBSA boundary shapefile can be obtained from the United States Census Bureau.
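To give a sense of the winter-only filtering mentioned above, here is a minimal sketch in base R. It is not the code from the tutorial itself: the small data frame, its column names (`date`, `pm25_ugm3`), and the choice of December, January, and February as "winter" are all assumptions made for illustration, and the values are invented placeholders rather than EPA measurements.

```r
# Hypothetical sample standing in for a daily PM2.5 download.
# Column names and values are illustrative placeholders, not EPA data.
pm25 <- data.frame(
  date = as.Date(c("2018-01-15", "2018-04-10", "2018-07-04",
                   "2018-12-25", "2019-02-01")),
  pm25_ugm3 = c(12.4, 6.1, 9.8, 14.2, 11.0)
)

# Assumption: "winter" means December, January, and February.
winter_months <- c(12, 1, 2)

# Extract the month number from each date and keep only winter rows.
month_num <- as.integer(format(pm25$date, "%m"))
pm25_winter <- pm25[month_num %in% winter_months, ]

print(pm25_winter)
```

With real data, the same pattern would apply after reading the downloaded file and converting its date column to the `Date` class; the actual column names would need to be checked against the file.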
The full tutorial is hosted on GitHub, which I have found to be a much easier way of sharing code and its output. You can find it at https://freygeospatial.github.io/PM25-TimeSeries-R-Tutorial/, which presents the code and output in an easy-to-follow format. The main repository, at https://github.com/verma-priyanka/PM25_NYC, contains the actual files (including raw data) used in our analysis.