Data provided at NASS Quick Stats is very useful in understanding the history and state of agricultural production at aggregate levels (county and state). Also, they have been used widely in various studies, including the estimation of climate change impacts on crop yield. In this blog post, I will show how to download NASS Quick Stats data from within R using the rnassqs package.
While their website is useful, they allows you to download \(50,000\) records (rows of data) at a time. This can be quite annoying when your target dataset is much larger, which means you have to do manual queries (lots of clicking) many many times. Now, rnassqs makes use of the Quick Stats API which puts the same limitation of the maximum of \(50,000\) records at a time. This of course means that rnassqs is also bound by the limitation. However, since it’s an R program, you can easily write a loop in a way that each download involves less than \(50,000\) records, obviating the need to manually implementing queries on the website repeatedly. But, before we get started, I need to mention that you can visit their ftp website to bulk-download the entire data available at the website. So, if you do not mind a large data sitting on your computer, you should stop reading and just go ahead and download the entire dataset.
At the time of writing this post, the package is not available on the CRANs. So, here is how you install the package: 1) load the devtools package first (install it first if you have not) and 2) install the rnassqs package as follows:
library(devtools)
install_github('potterzot/rnassqs')
Let’s load all the libraries we use in this blog post:
#--- rnassqs package ---#
library(rnassqs)
#--- other packages ---#
library(magrittr) # for code piping
library(data.table)
## No encoding supplied: defaulting to UTF-8.
We first set the query parameters. Here I would like to have a data of corn yield (bu/acre) at the county level in 1995.
#--- set the query parameters ---#
params <- list(
'year__GE'=1995,
'agg_level_desc'='COUNTY',
'short_desc'='CORN, GRAIN - YIELD, MEASURED IN BU / ACRE'
)
We then can use the nassqs() function to download the data we want. But, before doing that you need to obtain the API key from here.
#--- download the data ---#
corn_all <- nassqs(params=params, key='your key comes here') %>%
data.table()
Here is a look at some of the variables of the dataset we just downloaded.
corn_all %>%
setnames(names(.),tolower(names(.))) %>%
.[,.(year,state_name,state_alpha,state_fips_code,county_name,agg_level_desc,short_desc,value,unit_desc)]
## year state_name state_alpha state_fips_code
## 1: 2017 ALABAMA AL 01
## 2: 2016 ALABAMA AL 01
## 3: 2015 ALABAMA AL 01
## 4: 2014 ALABAMA AL 01
## 5: 2013 ALABAMA AL 01
## ---
## 45177: 1999 WYOMING WY 56
## 45178: 1998 WYOMING WY 56
## 45179: 1997 WYOMING WY 56
## 45180: 1996 WYOMING WY 56
## 45181: 1995 WYOMING WY 56
## county_name county_code agg_level_desc
## 1: COLBERT 033 COUNTY
## 2: COLBERT 033 COUNTY
## 3: COLBERT 033 COUNTY
## 4: COLBERT 033 COUNTY
## 5: COLBERT 033 COUNTY
## ---
## 45177: OTHER (COMBINED) COUNTIES 998 COUNTY
## 45178: OTHER (COMBINED) COUNTIES 998 COUNTY
## 45179: OTHER (COMBINED) COUNTIES 998 COUNTY
## 45180: OTHER (COMBINED) COUNTIES 998 COUNTY
## 45181: OTHER (COMBINED) COUNTIES 998 COUNTY
## short_desc value unit_desc
## 1: CORN, GRAIN - YIELD, MEASURED IN BU / ACRE 195.3 BU / ACRE
## 2: CORN, GRAIN - YIELD, MEASURED IN BU / ACRE 126.2 BU / ACRE
## 3: CORN, GRAIN - YIELD, MEASURED IN BU / ACRE 159.1 BU / ACRE
## 4: CORN, GRAIN - YIELD, MEASURED IN BU / ACRE 191.5 BU / ACRE
## 5: CORN, GRAIN - YIELD, MEASURED IN BU / ACRE 182.2 BU / ACRE
## ---
## 45177: CORN, GRAIN - YIELD, MEASURED IN BU / ACRE 0 BU / ACRE
## 45178: CORN, GRAIN - YIELD, MEASURED IN BU / ACRE 90 BU / ACRE
## 45179: CORN, GRAIN - YIELD, MEASURED IN BU / ACRE 110 BU / ACRE
## 45180: CORN, GRAIN - YIELD, MEASURED IN BU / ACRE 94 BU / ACRE
## 45181: CORN, GRAIN - YIELD, MEASURED IN BU / ACRE 95 BU / ACRE
Simple loop to get more years of corn data or other crops
It is very simple to get more years of county-level corn data by looping.
get_nass_data <- function(y){
params <- list(
'year__GE'=y,
'agg_level_desc'='COUNTY',
'short_desc'='CORN, GRAIN - YIELD, MEASURED IN BU / ACRE'
)
corn_all <- nassqs(params=params, key='your key comes here') %>%
data.table()
return(corn_all)
}
corn_data <- lapply(1995:2015,get_nass_data) %>%
rbindlist()
It should be clear by now how you can loop over crop types if you want other crops than corn.
Session Information
## R version 3.5.1 (2018-07-02)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS 10.14.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] data.table_1.11.8 magrittr_1.5 rnassqs_0.2.2
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.0 bookdown_0.7 digest_0.6.18 rprojroot_1.3-2
## [5] R6_2.3.0 jsonlite_1.5 backports_1.1.2 evaluate_0.12
## [9] httr_1.3.1 blogdown_0.9 stringi_1.2.4 rmarkdown_1.10
## [13] tools_3.5.1 stringr_1.3.1 xfun_0.4 yaml_2.2.0
## [17] compiler_3.5.1 htmltools_0.3.6 knitr_1.20