Plotting Weekly Mobile Retention from the Localytics API using R and ggplot2

Part of building mobile web apps is understanding the myriad of mobile analytics and in part visualizing the data to shed light on trends that my otherwise be difficult to see in tabular data or even a colorful cohort table. I’ve been building a dashboard using R, RStudio, Shiny, and Shiny Dashboards aggregating data from MSSQL, Postgres, Google Analytics, and Localytics.

Within the Product section of the dashboard I’ve included a retention chart and found some great articles at R-Blogger like this one. The retention data comes from the Localytics API which I discussed previously though getting the data into the proper format took a few steps. Let’s start with the data, here’s an example of the REST response from Localytics looks like for a weekly retention cohort:

{
"results": [
{
"birth_week": "2014-09-08",
"users": 1,
"week": "2014-12-29"
},
{
"birth_week": "2014-09-29",
"users": 1640,
"week": "2014-12-29"
},
{
"birth_week": "2014-10-06",
"users": 2988,
"week": "2014-12-29"
},
{
"birth_week": "2014-10-13",
"users": 4747,
"week": "2014-12-29"
},
{
"birth_week": "2014-10-20",
"users": 2443,
"week": "2014-12-29"
},
…

Below is the main function to fetch the Localytics sample data and convert it into a data frame that’s suitable for plotting. Now, admittedly I’m not an R expert so there may well be better ways to slice this JSON response but this is a fairly straight forward approach. Essentially, this fetches the data, converts it from JSON to an R object, extracts the weeks, preallocates a matrix and then iterates over the data filling the matrix to build a data frame.

retentionDF <- function() {
  # Example data from: http://docs.localytics.com/dev/query-api.html#query-api-example-users-by-week-and-birth_week
  localyticsExampleJSON <- getURL('https://gist.githubusercontent.com/strefethen/180efcc1ecda6a02b1351418e95d0a29/raw/1ad93c22488e48b5e62b017dc5428765c5c3ba0f/localyticsexampledata.json')
  cohort <- fromJSON(localyticsExampleJSON)
  weeks <- unique(cohort$results$week)
  numweeks <- length(weeks)
  # Take the JSON response and convert it to a retention matrix (all numeric for easy conversion to a dataframe) like so:
  # Weekly.Cohort Users Week.1
  # 1    2014-12-29  7187   4558
  # 2    2015-01-05  5066     NA
  i <- 1
  # Create a matrix big enough to hold all of the data
  m <- matrix(nrow=numweeks, ncol=numweeks + 1)
  for (week in weeks) {
    # Get data for all weeks of this cohort
    d <- cohort$results[cohort$results$birth_week==week,][,2]
    lencohort <- length(d)
    for (n in 1:lencohort) {
      # Skip the first column using "+ 1" below which will be Weekly.Cohort (date)
      m[i,n + 1] <- d[n]
    }
    i <- i + 1
  }
  # Convert matrix to a dataframe
  df <- as.data.frame(m)
  # Set values of the first column to the cohort dates
  df$V1 <- weeks
  # Set the column names accordingly
  colnames(df) <- c("Weekly.Cohort", "Users", paste0("Week.", rep(1:(numweeks-1))))
  return(df)
}

To make things easy I put together a gist and if you’re using R you can runGist it yourself. It requires several other packages so be sure to check the sources in case you’re missing any.
Fair warning the Localytics API demo has very limited data so the chart, let’s just say simplistic however given many weeks worth of data it will fill out nicely (see example below).

> library(shiny)
> runGist("180efcc1ecda6a02b1351418e95d0a29")
Localytics Retention Plot Example
Retention Chart