Opioid Prescribing by State

A detailed guide on how I scraped opioid prescription history from the CDC and created an animated map to highlight changing behavior.

R
dataviz
scraping
maps
Author
Affiliation

Mirza S. Khan

Vanderbilt University

Published

January 28, 2021

For some reason, I really enjoy making maps as a way to illustrate state and regional variation. One area that has appropriately gotten a lot of press is opioid prescribing. The CDC is a great data resource, and maintains state and county-level prescribing data.

Unfortunately, I did not find that the data was available in an easily readable format, e.g. CSV, JSON, etc. What follows is a walk-through of how I went about scraping the CDC website to get this data.

Scraping

May I scrape?

It is good practice to check a site’s robots.txt to make sure that they are cool with you accessing and scraping their data. The polite package makes this pretty simple.

library(polite)

bow(url = "https://www.cdc.gov/",
    user_agent = "M Khan",
    force = TRUE)
<polite session> https://www.cdc.gov/
    User-agent: M Khan
    robots.txt: 44 rules are defined for 4 bots
   Crawl delay: 5 sec
  The path is scrapable for this user-agent

Test case with 2017

Now that I see that it is ok for me to scrape the data, I’d like to start out with a single case - the year 2017 - to test things out before replicating it for all available years.

Tools:

  • ralger package
  • SelectorGadget Chrome extension
  • Cmd+Shift+C to inspect elements in the browser
library(ralger)

opioidRx_2017 <- ralger::table_scrap("https://www.cdc.gov/drugoverdose/rxrate-maps/state2017.html")
Warning: The `fill` argument of `html_table()` is deprecated as of rvest 1.0.0.
An improved algorithm fills by default so it is no longer needed.
This warning is displayed once every 8 hours.
Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
# Tidy up the column names
col_names <- c("state", "state_abbrev", "opioid_Rx_rate")
colnames(opioidRx_2017) <- col_names 

opioidRx_2017 %>% 
  head() %>% 
  gt::gt()
state state_abbrev opioid_Rx_rate
United States US 59.0
Alaska AK 52.0
Alabama AL 108.8
Arkansas AR 106.1
Arizona AZ 60.8
California CA 39.8

Things seem reasonable for 2017, I can now apply the same process for all of the available years.

Scrape each year’s data

Now that we’ve got all of our links, we can perform the same function we did above with the 2017 for all the years.

Scrape all of the links:

opioid_df <- opioid_links %>% 
  mutate(data_df = map(link, possibly(ralger::table_scrap, otherwise = NULL, quiet = FALSE)))

I’d like to point out that I used possibly. This is to make sure that the mapping function is all for naught if there is an error of some sort. Check out the R4DS section on Iteration: Dealing with Failure, which also covers safely and quietly.

  • Explanation of possibly arguments used here:
    • otherwise = NULL, i.e. if there is a failure, it will assign a value of NULL.
    • quiet = FALSE because I wanted a verbose output to see what’s going on
# Notice how 'data_df' column is just a nested df
opioid_df %>% 
  head(3)
# A tibble: 3 × 3
   year link                                                        data_df 
  <dbl> <chr>                                                       <list>  
1  2006 https://www.cdc.gov/drugoverdose/rxrate-maps/state2006.html <tibble>
2  2007 https://www.cdc.gov/drugoverdose/rxrate-maps/state2007.html <tibble>
3  2008 https://www.cdc.gov/drugoverdose/rxrate-maps/state2008.html <tibble>

For each year, I have a corresponding tibble that contains state level opioid prescribing data. I will have to use unnest() to expand this.

complete_df <- opioid_df %>% 
  unnest(cols = c(data_df)) %>% 
  rename(
    "state" = "State",
    "state_abbrev" = "State Abbreviation",
    "opioid_Rx_rate" = "Opioid Dispensing Rate per 100"
  ) %>% 
  # Name of abbreviation column changed from pre- and post-2017
  mutate(state_abbrev = coalesce(state_abbrev, Abbreviation)) %>% 
  select(-Abbreviation, -link)

complete_df %>% 
  head() %>% 
  gt()
year state state_abbrev opioid_Rx_rate
2006 Alabama AL 115.6
2006 Alaska AK 63.4
2006 Arizona AZ 74.3
2006 Arkansas AR 98.3
2006 California CA 51.0
2006 Colorado CO 62.2

Mapping

Above, I outlined how I went about scraping the CDC website for opioid prescribing data. Now, I’ll describe how I took this data and made a map and used gganmiate to make a GIF that better shows the state-level trends from 2006-2019.

First, a blank canvas:

# library(sf)        # follow installation instructions at: https://r-spatial.github.io/sf/
# library(albersusa)
# library(gganimate)

ggplot() + geom_sf(data = albersusa::usa_sf()) +
  ggthemes::theme_map() + coord_sf(crs = albersusa::us_aeqd_proj)

You may notice that the relevant geometric information for each state is provided in albersusa::usa_sf(). Thus, we must map this to the data of interest with a join, e.g. *_join, merge, etc.

A map using 2006 data

I’ll start by making a map of prescribing rates for 2006.

usa_sf() %>% 
  inner_join(complete_df %>% filter(year == 2006), # filter for the year 2006
             by = c("iso_3166_2" = "state_abbrev")) %>% 
  ggplot() +
  geom_sf(aes(fill = opioid_Rx_rate)) +
  scale_fill_gradient2(name = "Opioid Rx rate",
                       low = "blue", high = "red", midpoint = median(complete_df$opioid_Rx_rate)) +
  ggthemes::theme_map() +
  coord_sf(crs = albersusa::us_aeqd_proj) +
  theme(legend.position = "right") +
  labs(title = "Opioid Prescribing Rates in 2006",
       subtitle = "Rates reported as retail opioid prescriptions per 100 persons",
       caption = "by Mirza Khan (@mirzakhan_) \n 
       Source: CDC National Center for Injury Prevention and Control")
old-style crs object detected; please recreate object with a recent sf::st_crs()

Animated Map

Now, I’d like to use gganimate to get a sense of how things are changing from 2006 to 2019, i.e. better, worse, the same?

This is simpler than one would think. All I added to the previous map were transition_states(year) and to the title, I replaced 2006 with {closest_state}.

#remotes::install_github("thomasp85/transformr")

usa_sf() %>% 
  inner_join(complete_df, by = c("iso_3166_2" = "state_abbrev")) %>% 
  ggplot() +
  geom_sf(aes(fill = opioid_Rx_rate)) +
  scale_fill_gradient2(name = "Opioid Rx rate",
                       low = "blue", high = "red", midpoint = median(complete_df$opioid_Rx_rate)) +
  ggthemes::theme_map() +
  coord_sf(crs = albersusa::us_aeqd_proj) +
  transition_states(year) +
  theme(legend.position = "right") +
  labs(title = "Opioid Prescribing Rates in {closest_state}",
       subtitle = "Rates reported as retail opioid prescriptions per 100 persons",
       caption = "by Mirza Khan (@mirzakhan_) \n 
       Source: CDC National Center for Injury Prevention and Control")
old-style crs object detected; please recreate object with a recent sf::st_crs()

#anim_save("opioid_Rx.gif")

Session Info

sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.1.2 (2021-11-01)
 os       macOS Catalina 10.15.4
 system   x86_64, darwin17.0
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/Chicago
 date     2022-10-06
 pandoc   2.19.2 @ /usr/local/bin/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 package     * version    date (UTC) lib source
 albersusa   * 0.4.1      2022-10-06 [1] Github (hrbrmstr/albersusa@07aa87f)
 assertthat    0.2.1      2019-03-21 [1] CRAN (R 4.1.0)
 backports     1.4.1      2021-12-13 [1] CRAN (R 4.1.0)
 broom         0.7.12     2022-01-28 [1] CRAN (R 4.1.2)
 cachem        1.0.6      2021-08-19 [1] CRAN (R 4.1.0)
 cellranger    1.1.0      2016-07-27 [1] CRAN (R 4.1.0)
 checkmate     2.0.0      2020-02-06 [1] CRAN (R 4.1.0)
 class         7.3-19     2021-05-03 [1] CRAN (R 4.1.2)
 classInt      0.4-8      2022-09-29 [1] CRAN (R 4.1.2)
 cli           3.4.1      2022-09-23 [1] CRAN (R 4.1.2)
 codetools     0.2-18     2020-11-04 [1] CRAN (R 4.1.2)
 colorspace    2.0-3      2022-02-21 [1] CRAN (R 4.1.2)
 crayon        1.5.1      2022-03-26 [1] CRAN (R 4.1.2)
 curl          4.3.2      2021-06-23 [1] CRAN (R 4.1.0)
 DBI           1.1.3      2022-06-18 [1] CRAN (R 4.1.2)
 dbplyr        2.1.1      2021-04-06 [1] CRAN (R 4.1.0)
 digest        0.6.29     2021-12-01 [1] CRAN (R 4.1.0)
 dplyr       * 1.0.8      2022-02-08 [1] CRAN (R 4.1.2)
 e1071         1.7-11     2022-06-07 [1] CRAN (R 4.1.2)
 ellipsis      0.3.2      2021-04-29 [1] CRAN (R 4.1.0)
 evaluate      0.15       2022-02-18 [1] CRAN (R 4.1.2)
 fansi         1.0.3      2022-03-24 [1] CRAN (R 4.1.2)
 farver        2.1.1      2022-07-06 [1] CRAN (R 4.1.2)
 fastmap       1.1.0      2021-01-25 [1] CRAN (R 4.1.0)
 forcats     * 0.5.1      2021-01-27 [1] CRAN (R 4.1.0)
 foreign       0.8-81     2020-12-22 [1] CRAN (R 4.1.2)
 fs            1.5.2      2021-12-08 [1] CRAN (R 4.1.0)
 generics      0.1.2      2022-01-31 [1] CRAN (R 4.1.2)
 gganimate   * 1.0.8      2022-09-08 [1] CRAN (R 4.1.2)
 ggplot2     * 3.3.6      2022-05-03 [1] CRAN (R 4.1.2)
 ggthemes      4.2.4      2021-01-20 [1] CRAN (R 4.1.0)
 glue          1.6.2      2022-02-24 [1] CRAN (R 4.1.2)
 gt          * 0.4.0      2022-02-15 [1] CRAN (R 4.1.2)
 gtable        0.3.0      2019-03-25 [1] CRAN (R 4.1.0)
 haven         2.4.3      2021-08-04 [1] CRAN (R 4.1.0)
 hms           1.1.1      2021-09-26 [1] CRAN (R 4.1.0)
 htmltools     0.5.2      2021-08-25 [1] CRAN (R 4.1.0)
 htmlwidgets   1.5.4      2021-09-08 [1] CRAN (R 4.1.0)
 httr          1.4.3      2022-05-04 [1] CRAN (R 4.1.2)
 jsonlite      1.8.0      2022-02-22 [1] CRAN (R 4.1.2)
 KernSmooth    2.23-20    2021-05-03 [1] CRAN (R 4.1.2)
 knitr         1.39       2022-04-26 [1] CRAN (R 4.1.2)
 labeling      0.4.2      2020-10-20 [1] CRAN (R 4.1.0)
 lattice       0.20-45    2021-09-22 [1] CRAN (R 4.1.2)
 lifecycle     1.0.1      2021-09-24 [1] CRAN (R 4.1.0)
 lpSolve       5.6.16     2022-09-04 [1] CRAN (R 4.1.2)
 lubridate     1.8.0      2021-10-07 [1] CRAN (R 4.1.0)
 magick        2.7.3      2021-08-18 [1] CRAN (R 4.1.0)
 magrittr      2.0.3      2022-03-30 [1] CRAN (R 4.1.2)
 maptools      1.1-4      2022-04-17 [1] CRAN (R 4.1.2)
 memoise       2.0.1      2021-11-26 [1] CRAN (R 4.1.0)
 modelr        0.1.8      2020-05-19 [1] CRAN (R 4.1.0)
 munsell       0.5.0      2018-06-12 [1] CRAN (R 4.1.0)
 pillar        1.7.0      2022-02-01 [1] CRAN (R 4.1.2)
 pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.1.0)
 polite      * 0.1.2      2022-08-09 [1] CRAN (R 4.1.2)
 prettyunits   1.1.1      2020-01-24 [1] CRAN (R 4.1.0)
 progress      1.2.2      2019-05-16 [1] CRAN (R 4.1.0)
 proxy         0.4-27     2022-06-09 [1] CRAN (R 4.1.2)
 purrr       * 0.3.4      2020-04-17 [1] CRAN (R 4.1.0)
 R6            2.5.1      2021-08-19 [1] CRAN (R 4.1.0)
 ralger      * 2.2.4      2021-03-17 [1] CRAN (R 4.1.0)
 ratelimitr    0.4.1      2018-10-07 [1] CRAN (R 4.1.0)
 Rcpp          1.0.9      2022-07-08 [1] CRAN (R 4.1.2)
 readr       * 2.1.2      2022-01-30 [1] CRAN (R 4.1.2)
 readxl        1.3.1      2019-03-13 [1] CRAN (R 4.1.0)
 reprex        2.0.1      2021-08-05 [1] CRAN (R 4.1.0)
 rgdal         1.5-32     2022-05-09 [1] CRAN (R 4.1.2)
 rgeos         0.5-9      2021-12-15 [1] CRAN (R 4.1.0)
 rlang         1.0.6      2022-09-24 [1] CRAN (R 4.1.2)
 rmarkdown     2.11       2021-09-14 [1] CRAN (R 4.1.0)
 robotstxt     0.7.13     2020-09-03 [1] CRAN (R 4.1.0)
 rstudioapi    0.13       2020-11-12 [1] CRAN (R 4.1.0)
 rvest         1.0.2      2021-10-16 [1] CRAN (R 4.1.0)
 s2            1.1.0      2022-07-18 [1] CRAN (R 4.1.2)
 sass          0.4.1      2022-03-23 [1] CRAN (R 4.1.2)
 scales        1.1.1      2020-05-11 [1] CRAN (R 4.1.0)
 selectr       0.4-2      2019-11-20 [1] CRAN (R 4.1.0)
 sessioninfo   1.2.2      2021-12-06 [1] CRAN (R 4.1.0)
 sf          * 1.0-8      2022-07-14 [1] CRAN (R 4.1.2)
 sp            1.5-0      2022-06-05 [1] CRAN (R 4.1.2)
 spiderbar     0.2.4      2021-05-16 [1] CRAN (R 4.1.0)
 stringi       1.7.6      2021-11-29 [1] CRAN (R 4.1.0)
 stringr     * 1.4.0      2019-02-10 [1] CRAN (R 4.1.0)
 tibble      * 3.1.7      2022-05-03 [1] CRAN (R 4.1.2)
 tidyr       * 1.2.0      2022-02-01 [1] CRAN (R 4.1.2)
 tidyselect    1.1.2      2022-02-21 [1] CRAN (R 4.1.2)
 tidyverse   * 1.3.1      2021-04-15 [1] CRAN (R 4.1.0)
 transformr    0.1.4.9000 2022-10-06 [1] Github (thomasp85/transformr@edea9ce)
 tweenr        2.0.2      2022-09-06 [1] CRAN (R 4.1.2)
 tzdb          0.2.0      2021-10-27 [1] CRAN (R 4.1.0)
 units         0.8-0      2022-02-05 [1] CRAN (R 4.1.2)
 usethis       2.1.5      2021-12-09 [1] CRAN (R 4.1.0)
 utf8          1.2.2      2021-07-24 [1] CRAN (R 4.1.0)
 vctrs         0.4.2      2022-09-29 [1] CRAN (R 4.1.2)
 withr         2.5.0      2022-03-03 [1] CRAN (R 4.1.2)
 wk            0.6.0      2022-01-03 [1] CRAN (R 4.1.2)
 xfun          0.31       2022-05-10 [1] CRAN (R 4.1.2)
 xml2          1.3.3      2021-11-30 [1] CRAN (R 4.1.0)
 yaml          2.3.5      2022-02-21 [1] CRAN (R 4.1.2)

 [1] /Library/Frameworks/R.framework/Versions/4.1/Resources/library

──────────────────────────────────────────────────────────────────────────────