I love long camping adventures, but I also love camping vicariously through couples like the Watsons, who make adventure their life. The Watsons are particularly cool because they’ve used their talents to make an awesome website with beautiful maps of their journey over the years, as well as data about it.

I was curious about how much this kind of lifestyle costs, so I did some quick plotting. But first, I needed get the data into R.

Data exploration - set-up

While it’s possible to grab data directly from the web, I decided to go the copy and paste approach. I copied the table of camping locations and prices from their website and saved it into a text file. Then, I read it in and used dplyr to clean it.

library(tidyverse)
library(lubridate)
library(plotly)

data_raw <- readLines("data/data.txt")

date_seq <- tibble(arrived = ymd("2012-06-14") + 1:2142)

data <- tibble(text = data_raw[-1]) %>% 
    mutate(lines = ceiling(1:n()/3)) %>% 
    group_by(lines) %>% 
    summarize(text = str_c(text, collapse = "\t")) %>% 
    separate(text, 
             into = c("arrived", "campground", "city", "type", "price", "extra"), 
             sep = "\t") %>% 
    select(-lines, -extra) %>% 
    mutate(arrived = mdy(arrived),
           price = str_extract(price, "[0-9]+") %>% as.numeric()) %>% 
    separate(city, c("city", "state"), sep = ", ", extra = "merge") %>% 
    right_join(date_seq) %>% 
    fill(-arrived)

print(data)
## # A tibble: 2,142 x 6
##    arrived    campground                 city           state  type  price
##    <date>     <chr>                      <chr>          <chr>  <chr> <dbl>
##  1 2012-06-15 Chapin Rd                  Essex          Vermo~ Priv~    0.
##  2 2012-06-16 Long Point State Park      Three Mile Bay New Y~ Stat~   30.
##  3 2012-06-17 Long Point State Park      Three Mile Bay New Y~ Stat~   30.
##  4 2012-06-18 Keuka Lake                 Keuka Park     New Y~ Stat~   26.
##  5 2012-06-19 Keuka Lake                 Keuka Park     New Y~ Stat~   26.
##  6 2012-06-20 Keuka Lake                 Keuka Park     New Y~ Stat~   26.
##  7 2012-06-21 Keuka Lake                 Keuka Park     New Y~ Stat~   26.
##  8 2012-06-22 Sampson State Park         Romulus        New Y~ Stat~   30.
##  9 2012-06-23 Sampson State Park         Romulus        New Y~ Stat~   30.
## 10 2012-06-24 Buckaloons Recreation Area Irvine         Penns~ Nati~   23.
## # ... with 2,132 more rows

Cumulative cost

With the data in R, I could do some plotting! I started with the cumulative cost of the trip - how much the Watsons have spent so far on accomodation alone.

plot_cumcost <- data %>% 
    mutate(cum_cost = cumsum(price)) %>% 
    ggplot(aes(x = arrived, y = cum_cost, text = arrived)) +
    geom_point() +
    xlab("Date") + ylab("Cumulative cost ($)") +
    theme_bw()

ggplotly(plot_cumcost, tooltip = "text")

From June 2012 to April 2018 (almost 6 years!), they’ve spent just over $40,000 on housing. Considering that the average individual in coastal California spends at least $1000 on housing per month ($12,000/year), that’s pretty good!

Cost per month

Most of us think of housing costs in terms of the cost per month, so let’s redo the graph a little:

data_monthly <- data %>% 
    mutate(month = month(arrived),
           year = year(arrived),
           date = ymd(paste(year, month, 1, sep = "-")),
           date_lab = paste(year, month, sep = "-")) %>% #for labeling
    group_by(date, date_lab) %>% 
    summarize(month_price = sum(price))

plot_monthly <- data_monthly %>% 
    ggplot(aes(x = date, y = month_price)) +
    geom_line() +
    geom_point(aes(text = date_lab)) +
    xlab("Date") + ylab("Cost per month ($)") +
    theme_bw()

ggplotly(plot_monthly, tooltip = c("month_price", "text"))

Now we see that early 2017 was an expensive time for the Watsons, but otherwise, they were generally keeping their accomodation costs to less than $1000/month!

#remove the first and last month since they might be incomplete
data_monthly_clean <- data_monthly[-c(1, nrow(data_monthly)),]

mean(data_monthly_clean$month_price)
## [1] 608.9855

In fact, the mean cost is about $600.

Types of accomodation

Now let’s look at the types of accomodation, and how they broke down in terms of price. I started off using a boxplot, but I added on geom_jitter because it gave me a better sense of the distribution. Here, I’ve also plotted each campground just once.

plot_types <- data %>% 
    distinct(campground, .keep_all = TRUE) %>% 
    mutate(type = reorder(type, price)) %>%
    ggplot(aes(x = type, y = price, text = campground)) +
    geom_boxplot(fill = "palegreen") +
    geom_jitter(width = .3) +
    coord_flip() +
    xlab("Price ($)") + ylab("Accomodation type") +
    theme_bw()

ggplotly(plot_types, tooltip = c("text", "price"))

Some of these categories seem to be more detailed than necessary, so let’s group some together. “Boondocking” by definition denotes $0, but maybe we can gain some insight if we group these categories into federal, state, private, and miscellaneous government land.

data_refact <- data %>% 
    distinct(campground, .keep_all = TRUE) %>% 
    mutate(type = fct_collapse(type,
                               Federal = c("Army Corps of Engineers",
                                           "BLM Boondocking",
                                           "BLM Campground",
                                           "National Forest",
                                           "National Forest Boondocking",
                                           "National Park",
                                           "National Park Boondocking",
                                           "Tennessee Valley Authority"),
                               State = c("Montana Fish & Wildlife",
                                         "State Forest Campground",
                                         "State Park",
                                         "State Park Boondocking"),
                               Private = c("Parking Lot",
                                           "Private Residence",
                                           "Private RV Park",
                                           "House Rental"),
                               OtherGov = c("City Park",
                                            "County Park"))) 

#let's first save it for future use...
write.csv(data_refact, "data/data_refact.csv")

plot_refact <- data_refact %>% 
    mutate(type = reorder(type, price)) %>%
    ggplot(aes(x = type, y = price, text = campground, color = type)) +
    geom_jitter(width = .3) +
    coord_flip() +
    xlab("Price ($)") + ylab("Accomodation type") +
    theme_bw()

ggplotly(plot_refact, tooltip = c("text", "price")) %>% 
    hide_legend()

Here we can kind of see that federal and state parks are similar in price, but federal land has more boondocking options. The private category now spans the cheapest and most expensive extremes because it includes both parking lots (at Walmarts and casinos, for the most part) and house rentals.

State comparison

Perhaps what we really need is a comparison across states. A state like California, for example, can be very expensive (at least on the coast), especially compared to a state with a lot of federal land, like Nevada. To keep things simple, I filtered out the wild variation found in the private and other government categories, keeping just State and Federal. I then plotted as before, but this time by state:

plot_states <- data_refact %>% 
    filter(type %in% c("State", "Federal"),
           !str_detect(state, "Ontario|British")) %>% #filter out Canada
    mutate(state = reorder(state, price)) %>% 
    ggplot(aes(x = state, y = price, color = type, text = campground)) +
    geom_jitter() +
    coord_flip() +
    xlab("Price ($)") + ylab("State") +
    theme_bw()

ggplotly(plot_states)

California’s looking pretty crazy! It has both a lot of expensive camping options (I can definitely attest to this) and really cheap (free) options.

Otherwise, the West (as expected) is generally pretty cheap camping, mostly thanks to amazing federal land and its boondocking splendor.

Geocoding

Geocoding - getting the lat/longs from the description of a location - is an amazing but somewhat expensive process (in terms of time/API calls). The mutate_geocode function from the ggmap package goes row by row through your location description column and makes calls to the Google Maps API (or Data Science Toolkit). In my case, it could not find all values the first time around, so I had to repeat the process several times until I had lat/longs for all my locations. You can see that in my original script.

Here, I’ve cleaned the script up a bit, and here’s what it looks like:

library(ggmap)
data_geo <- data_refact %>% 
    distinct() %>% 
    mutate(full_loc = paste(campground, city, state, sep = ", ")) %>% 
    mutate_geocode(full_loc)

With this information, it’s easy to go onto the next step, and map the locations! A big thanks to this blogpost by Jesse Sadler for getting me started on the geocoding process.

Mapping

We’ll need a few new packages to make this work: the leaflet package for interactive maps, and the viridis package for adding color!

library(leaflet)
library(viridis)

I’m most interested in the prices of the campsites, so I’ll categorize them into four groups: “free”, “cheap” (less than $12/night, as defined by freecampsites.net), “moderate” (about double “cheap”), and “expensive” (as far as sleeping in a tent is concerned).

data_geo <- data_geo %>% 
    mutate(price_cat = case_when(
        price == 0 ~ "free",
        price > 0 & price <= 12 ~ "cheap",
        price > 12 & price <= 25 ~ "moderate",
        price > 25 ~ "expensive"))

Along with the categories, I need to define a color palette. Here, I’ve just used the viridis pre-sets.

pal <- colorFactor(viridis_pal()(4), 
                   domain = c("free", "cheap", "moderate", "expensive"))

With just a few lines of code, my map is ready to go. Voila!

m <- leaflet(data_geo) %>%
    addTiles() %>%  
    addCircleMarkers(color = ~pal(price_cat),
                     popup = paste(data_geo$campground,
                                   data_geo$price,
                                   sep = ", $"))
m 



In green are the free sites, purple is “cheap”, yellow is “moderate”, and blue is “expensive”. If you click on a site, you can see the campground (or sometimes parking lot/airbnb) name and the price.