Downloading data from the NASS Quick Stats website using R

Aug 18, 2018 software R, Federal data

Data provided at NASS Quick Stats is very useful in understanding the history and state of agricultural production at aggregate levels (county and state). Also, they have been used widely in various studies, including the estimation of climate change impacts on crop yield. In this blog post, I will show how to download NASS Quick Stats data from within R using the rnassqs package.

While their website is useful, they allows you to download $50, 000$ records (rows of data) at a time. This can be quite annoying when your target dataset is much larger, which means you have to do manual queries (lots of clicking) many many times. Now, rnassqs makes use of the Quick Stats API which puts the same limitation of the maximum of $50, 000$ records at a time. This of course means that rnassqs is also bound by the limitation. However, since it’s an R program, you can easily write a loop in a way that each download involves less than $50, 000$ records, obviating the need to manually implementing queries on the website repeatedly. But, before we get started, I need to mention that you can visit their ftp website to bulk-download the entire data available at the website. So, if you do not mind a large data sitting on your computer, you should stop reading and just go ahead and download the entire dataset.

At the time of writing this post, the package is not available on the CRANs. So, here is how you install the package: 1) load the devtools package first (install it first if you have not) and 2) install the rnassqs package as follows:

library(devtools)
install_github('potterzot/rnassqs')

Let’s load all the libraries we use in this blog post:

#--- rnassqs package ---#
library(rnassqs)

#--- other packages ---#
library(magrittr) # for code piping
library(data.table)

## No encoding supplied: defaulting to UTF-8.

We first set the query parameters. Here I would like to have a data of corn yield (bu/acre) at the county level in 1995.

#--- set the query parameters ---#
params <- list(
    'year__GE'=1995, 
  'agg_level_desc'='COUNTY',
  'short_desc'='CORN, GRAIN - YIELD, MEASURED IN BU / ACRE'
  )

We then can use the nassqs() function to download the data we want. But, before doing that you need to obtain the API key from here.

#--- download the data ---#
corn_all <- nassqs(params=params, key='your key comes here') %>% 
    data.table()

Here is a look at some of the variables of the dataset we just downloaded.

corn_all %>% 
    setnames(names(.),tolower(names(.))) %>% 
    .[,.(year,state_name,state_alpha,state_fips_code,county_name,agg_level_desc,short_desc,value,unit_desc)]

##        year state_name state_alpha state_fips_code
##     1: 2017    ALABAMA          AL              01
##     2: 2016    ALABAMA          AL              01
##     3: 2015    ALABAMA          AL              01
##     4: 2014    ALABAMA          AL              01
##     5: 2013    ALABAMA          AL              01
##    ---                                            
## 45177: 1999    WYOMING          WY              56
## 45178: 1998    WYOMING          WY              56
## 45179: 1997    WYOMING          WY              56
## 45180: 1996    WYOMING          WY              56
## 45181: 1995    WYOMING          WY              56
##                      county_name county_code agg_level_desc
##     1:                   COLBERT         033         COUNTY
##     2:                   COLBERT         033         COUNTY
##     3:                   COLBERT         033         COUNTY
##     4:                   COLBERT         033         COUNTY
##     5:                   COLBERT         033         COUNTY
##    ---                                                     
## 45177: OTHER (COMBINED) COUNTIES         998         COUNTY
## 45178: OTHER (COMBINED) COUNTIES         998         COUNTY
## 45179: OTHER (COMBINED) COUNTIES         998         COUNTY
## 45180: OTHER (COMBINED) COUNTIES         998         COUNTY
## 45181: OTHER (COMBINED) COUNTIES         998         COUNTY
##                                        short_desc value unit_desc
##     1: CORN, GRAIN - YIELD, MEASURED IN BU / ACRE 195.3 BU / ACRE
##     2: CORN, GRAIN - YIELD, MEASURED IN BU / ACRE 126.2 BU / ACRE
##     3: CORN, GRAIN - YIELD, MEASURED IN BU / ACRE 159.1 BU / ACRE
##     4: CORN, GRAIN - YIELD, MEASURED IN BU / ACRE 191.5 BU / ACRE
##     5: CORN, GRAIN - YIELD, MEASURED IN BU / ACRE 182.2 BU / ACRE
##    ---                                                           
## 45177: CORN, GRAIN - YIELD, MEASURED IN BU / ACRE     0 BU / ACRE
## 45178: CORN, GRAIN - YIELD, MEASURED IN BU / ACRE    90 BU / ACRE
## 45179: CORN, GRAIN - YIELD, MEASURED IN BU / ACRE   110 BU / ACRE
## 45180: CORN, GRAIN - YIELD, MEASURED IN BU / ACRE    94 BU / ACRE
## 45181: CORN, GRAIN - YIELD, MEASURED IN BU / ACRE    95 BU / ACRE

Simple loop to get more years of corn data or other crops

It is very simple to get more years of county-level corn data by looping.

get_nass_data <- function(y){
    params <- list(
        'year__GE'=y, 
    'agg_level_desc'='COUNTY',
    'short_desc'='CORN, GRAIN - YIELD, MEASURED IN BU / ACRE'
  )

  corn_all <- nassqs(params=params, key='your key comes here') %>% 
        data.table()
    
    return(corn_all)
}

corn_data <- lapply(1995:2015,get_nass_data) %>% 
    rbindlist()

It should be clear by now how you can loop over crop types if you want other crops than corn.

Session Information

## R version 3.5.1 (2018-07-02)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS  10.14.1
## 
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] data.table_1.11.8 magrittr_1.5      rnassqs_0.2.2    
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.0      bookdown_0.7    digest_0.6.18   rprojroot_1.3-2
##  [5] R6_2.3.0        jsonlite_1.5    backports_1.1.2 evaluate_0.12  
##  [9] httr_1.3.1      blogdown_0.9    stringi_1.2.4   rmarkdown_1.10 
## [13] tools_3.5.1     stringr_1.3.1   xfun_0.4        yaml_2.2.0     
## [17] compiler_3.5.1  htmltools_0.3.6 knitr_1.20