Opioid Prescribing by State

A detailed guide on how I scraped opioid prescription history from the CDC and created an animated map to highlight changing behavior.

Mirza S. Khan (Vanderbilt University)
2021-01-28

For some reason, I really enjoy making maps as a way to illustrate state and regional variation. One area that has appropriately gotten a lot of press is opioid prescribing. The CDC is a great data resource, and maintains state- and county-level prescribing data.

Unfortunately, the data is not available in an easily readable format, e.g. CSV, JSON, etc. What follows is a walk-through of how I went about scraping the CDC website to get this data.

Scraping

May I scrape?

It is good practice to check a site’s robots.txt to make sure that they are cool with you accessing and scraping their data. The polite package makes this pretty simple.

library(polite)

bow(url = "https://www.cdc.gov/",
    user_agent = "M Khan",
    force = TRUE)
<polite session> https://www.cdc.gov/
    User-agent: M Khan
    robots.txt: 44 rules are defined for 4 bots
   Crawl delay: 5 sec
  The path is scrapable for this user-agent
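
Note that bow() above checks permissions at the site root. To be thorough, we could also confirm that the specific page we plan to scrape is allowed. Here is a minimal sketch using polite::nod(), which re-checks an individual path against the robots.txt rules (the session variable name is my own):

session <- polite::bow("https://www.cdc.gov/", user_agent = "M Khan")

# Re-check permissions for the specific page before scraping it
polite::nod(session, path = "drugoverdose/maps/rxstate2017.html")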

Test case with 2017

Now that I see that it is ok for me to scrape the data, I’d like to start out with a single case - the year 2017 - to test things out before replicating it for all available years.

Tools:

library(ralger) # table_scrap() and scrap()
library(dplyr)  # pipes, mutate(), rename(), coalesce()
library(purrr)  # map() and possibly()
library(tidyr)  # unnest()
library(gt)     # display tables

# Scrape the HTML table of 2017 state-level prescribing rates
opioidRx_2017 <- ralger::table_scrap("https://www.cdc.gov/drugoverdose/maps/rxstate2017.html")

# Tidy up the column names
col_names <- c("state", "state_abbrev", "opioid_Rx_rate")
colnames(opioidRx_2017) <- col_names 

opioidRx_2017 %>% 
  head() %>% 
  gt::gt()
state          state_abbrev  opioid_Rx_rate
United States  US            59.0
Alaska         AK            52.0
Alabama        AL            108.8
Arkansas       AR            106.1
Arizona        AZ            60.8
California     CA            39.8

Things seem reasonable for 2017, so I can now apply the same process to all of the available years.

Get the links for each year

The links for the years 2006 to 2019 are available at https://www.cdc.gov/drugoverdose/maps/rxrate-maps.html. I can use SelectorGadget to identify the CSS selector and subsequently scrape the links for each of these years.

# Collect list of sublinks
sublinks <- ralger::scrap(
  "https://www.cdc.gov/drugoverdose/maps/rxrate-maps.html", 
  node = ".mb-3+ .mb-3 a")

Looks like I’ll need to build each year’s URL from the base path “https://www.cdc.gov/drugoverdose/maps/rxstate”. Let’s create a dataframe that has the year and the corresponding link.

opioid_links <- tibble::tibble(year = readr::parse_number(sublinks),
               link = purrr::map_chr(sublinks, 
                                     ~glue::glue("https://www.cdc.gov/drugoverdose/maps/rxstate{.x}.html")))

opioid_links %>% 
  head(3) %>% 
  gt()
year link
2006 https://www.cdc.gov/drugoverdose/maps/rxstate2006.html
2007 https://www.cdc.gov/drugoverdose/maps/rxstate2007.html
2008 https://www.cdc.gov/drugoverdose/maps/rxstate2008.html

Scrape each year’s data

Now that we’ve got all of our links, we can apply the same scraping step we used above for 2017 to all of the years.

Scrape all of the links:

opioid_df <- opioid_links %>% 
  mutate(data_df = map(link, possibly(ralger::table_scrap, otherwise = NULL, quiet = FALSE)))

I’d like to point out that I used possibly. This makes sure that the whole mapping operation isn’t all for naught if one scrape throws an error of some sort: the failing call returns NULL instead of aborting everything. Check out the R4DS section on Iteration: Dealing with Failure, which also covers safely and quietly.
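
For example, here is a quick sketch of the behavior (the URL below is hypothetical, just for illustration):

# Wrap table_scrap() so failures return NULL instead of raising an error
safe_scrap <- possibly(ralger::table_scrap, otherwise = NULL)

# A hypothetical bad URL returns NULL rather than halting execution
safe_scrap("https://www.cdc.gov/drugoverdose/maps/no-such-page.html")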

# Notice how 'data_df' column is just a nested df
opioid_df %>% 
  head(3)
# A tibble: 3 x 3
   year link                                           data_df        
  <dbl> <chr>                                          <list>         
1  2006 https://www.cdc.gov/drugoverdose/maps/rxstate… <df[,3] [51 × …
2  2007 https://www.cdc.gov/drugoverdose/maps/rxstate… <df[,3] [51 × …
3  2008 https://www.cdc.gov/drugoverdose/maps/rxstate… <df[,3] [51 × …
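
Since possibly() returns NULL on failure, a quick check (my own addition, not in the original workflow) confirms that every year scraped successfully before unnesting:

# Any rows where the scrape failed? An empty result means all years succeeded
opioid_df %>% 
  filter(map_lgl(data_df, is.null))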

For each year, I have a corresponding tibble that contains state-level opioid prescribing data. I will have to use tidyr::unnest() to expand this.

complete_df <- opioid_df %>% 
  unnest(cols = c(data_df)) %>% 
  rename(
    "state" = "State",
    "state_abbrev" = "State Abbreviation",
    "opioid_Rx_rate" = "Opioid Dispensing Rate per 100"
  ) %>% 
  # The abbreviation column is named differently on the pre- and post-2017
  # pages, so take whichever of the two columns is non-missing
  mutate(state_abbrev = coalesce(state_abbrev, Abbreviation)) %>% 
  select(-Abbreviation, -link)

complete_df %>% 
  head() %>% 
  gt()
year  state       state_abbrev  opioid_Rx_rate
2006  Alabama     AL            115.6
2006  Alaska      AK            63.4
2006  Arizona     AZ            74.3
2006  Arkansas    AR            98.3
2006  California  CA            51.0
2006  Colorado    CO            62.2

Mapping

Above, I outlined how I went about scraping the CDC website for opioid prescribing data. Now, I’ll describe how I took this data, made a map, and used gganimate to make a GIF that better shows the state-level trends from 2006-2019.

First, a blank canvas:

library(ggplot2)
library(sf)        # follow installation instructions at: https://r-spatial.github.io/sf/
library(albersusa) # not on CRAN; remotes::install_github("hrbrmstr/albersusa")
library(gganimate)

# A blank US map: state geometries from albersusa in an Albers-style projection
ggplot() + 
  geom_sf(data = albersusa::usa_sf()) +
  ggthemes::theme_map() + 
  coord_sf(crs = albersusa::us_aeqd_proj)

You may notice that the relevant geometric information for each state is provided by albersusa::usa_sf(). Thus, we must join this geometry to the data of interest, e.g. with dplyr’s *_join functions or base merge.

A map using 2006 data

I’ll start by making a map of prescribing rates for 2006.

usa_sf() %>% 
  # Join the state geometry to the 2006 prescribing data
  inner_join(complete_df %>% filter(year == 2006),
             by = c("iso_3166_2" = "state_abbrev")) %>% 
  ggplot() +
  geom_sf(aes(fill = opioid_Rx_rate)) +
  scale_fill_gradient2(name = "Opioid Rx rate",
                       low = "blue", high = "red",
                       midpoint = median(complete_df$opioid_Rx_rate)) +
  ggthemes::theme_map() +
  coord_sf(crs = albersusa::us_aeqd_proj) +
  theme(legend.position = "right") +
  labs(title = "Opioid Prescribing Rates in 2006",
       subtitle = "Rates reported as retail opioid prescriptions per 100 persons",
       caption = "by Mirza Khan (@mirzakhan_)\nSource: CDC National Center for Injury Prevention and Control")
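
Finally, the animation. The original GIF isn’t reproduced here, but below is a minimal sketch of how the same plot can be animated with gganimate; this is my own reconstruction, not necessarily the exact code behind the original. transition_manual() swaps in one year of data per frame, and {current_frame} exposes the current year for the title.

anim <- usa_sf() %>% 
  inner_join(complete_df, by = c("iso_3166_2" = "state_abbrev")) %>% 
  ggplot() +
  geom_sf(aes(fill = opioid_Rx_rate)) +
  scale_fill_gradient2(name = "Opioid Rx rate",
                       low = "blue", high = "red",
                       midpoint = median(complete_df$opioid_Rx_rate)) +
  ggthemes::theme_map() +
  coord_sf(crs = albersusa::us_aeqd_proj) +
  theme(legend.position = "right") +
  labs(title = "Opioid Prescribing Rates in {current_frame}",
       subtitle = "Rates reported as retail opioid prescriptions per 100 persons") +
  # One frame per year, with no interpolation between frames
  transition_manual(year)

# Render the animation as a GIF (frame rate is a matter of taste)
animate(anim, fps = 2)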