Data from #tidytuesday week of 2019-07-30 (source)
Load R packages
library(tidyverse)
library(RColorBrewer)
library(scales)
steam_raw <- read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-07-30/video_games.csv")
steam_games <- steam_raw %>%
  # VARIABLE FOR AGE OF GAME
  mutate(release_year = substring(release_date, 8, 12), # EXTRACT YEAR
         release_year = as.numeric(str_trim(release_year)),
         release_year = case_when(release_year == 5 ~ 2015, # INCORRECT DATA POINT
                                  TRUE ~ release_year),
         age = 2019 - release_year) %>%
  # VARIABLE FOR MIN/MAX NUMBER OF OWNERS
  mutate(max_owners = str_trim(word(owners, 2, sep = "\\..")),
         max_owners = as.numeric(str_replace_all(max_owners, ",", "")),
         min_owners = str_trim(word(owners, 1, sep = "\\..")),
         min_owners = as.numeric(str_replace_all(min_owners, ",", ""))) %>%
  # REMOVE VALUES WITH INCONSISTENT RELEASE_DATE FORMAT (n=37)
  filter(age < 15) %>%
  # FILTER OUT STUDIO SOFTWARE
  filter(price < 150)
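To make the owners parsing easier to follow, here is a minimal sketch on a toy table. It assumes the owners column always looks like "min .. max" with comma thousands separators; the two rows are illustrative, not taken from the dataset. It also swaps word() for str_split_fixed() with a literal ".." separator, which is a different route to the same min/max columns rather than the code used above.

```r
library(dplyr)
library(stringr)
library(tibble)

# Toy rows in the assumed "min .. max" format (illustrative values only)
toy <- tibble(owners = c("0 .. 20,000", "1,000,000 .. 2,000,000"))

toy_parsed <- toy %>%
  mutate(
    # split each string once on the literal ".." and trim the padding
    min_owners = str_trim(str_split_fixed(owners, fixed(".."), 2)[, 1]),
    max_owners = str_trim(str_split_fixed(owners, fixed(".."), 2)[, 2]),
    # drop the thousands separators before converting to numeric
    min_owners = as.numeric(str_replace_all(min_owners, ",", "")),
    max_owners = as.numeric(str_replace_all(max_owners, ",", ""))
  )

print(toy_parsed)
```

Splitting on a fixed string avoids any regex escaping, which may be easier to read than the "\\.." pattern.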
Question: how many people still play games that are X years old (on Steam)?
steam_games %>%
  select(age) %>%
  ggplot() +
  # geom_bar() counts rows per age; equivalent to geom_histogram(stat = 'count')
  geom_bar(aes(as.factor(age)), fill = 'steelblue', alpha = 0.8) +
  scale_y_continuous(labels = comma_format()) +
  labs(x = "Years Since Release", y = NULL,
       title = "Count of active games on Steam since release",
       caption = "by: @eeysirhc\nsource: Steam Spy") +
  theme_light()
Question: What is the distribution of game ownership for the top publishers?
steam_games %>%
  select(-number, -release_date) %>%
  # express the owner ceiling in millions and treat it as a category
  mutate(max_owners = as.factor(max_owners / 1000000)) %>%
  # count titles per publisher
  group_by(publisher) %>%
  mutate(n = n()) %>%
  ungroup() %>%
  # keep publishers with at least 80 titles and drop missing names
  filter(n >= 80, !is.na(publisher)) %>%
  mutate(publisher = paste0(publisher, " (", n, ")"),
         publisher = reorder(publisher, desc(n))) %>%
  ggplot(aes(publisher, max_owners, color = publisher)) +
  geom_jitter(show.legend = FALSE, size = 3, alpha = 0.5) +
  scale_color_brewer(palette = 'RdBu') +
  theme_light() +
  labs(x = NULL, y = "Estimated Game Ownership per Million",
       title = "Distribution of ownership for top game publishers",
       subtitle = "(n) indicates total titles released",
       caption = "by: @eeysirhc\nsource: Steam Spy") +
  theme(panel.background = element_rect(fill = 'grey95'),
        plot.background = element_rect(fill = 'grey95'),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        axis.text.x = element_text(angle = 45, hjust = 1))
- Mass Market: whether through their distribution network or brand name, publishers such as Ubisoft (117) and Square Enix (85) consistently break into the 1M+ user categories.
- Some Hits, Mostly Misses: the majority of the top publishers fall into this category, such as Strategy First (126) or Degica (81), where a few titles break 50K or 100K users but most land in the 20K bucket.
- Niche Market: publishers like Big Fish Games (265) and Choice of Games (92) have many titles under their belts, yet each game acquires no more than 20K users. They may prefer not to compete against major publishers like Ubisoft or SEGA and instead churn out games that cater to a very niche audience.
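One rough way to put numbers behind these three groupings is to compute, per publisher, the share of titles in the 20K bucket and the share at 1M+. The sketch below runs on made-up toy data: the publisher names "A" and "B" and their owner ceilings are hypothetical, and the 20K and 1M cutoffs are taken from the descriptions above, not from any formal definition.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data: publishers and owner ceilings are invented for illustration
toy_games <- tibble(
  publisher  = c("A", "A", "A", "B", "B", "B", "B"),
  max_owners = c(2e6, 5e6, 2e4, 2e4, 2e4, 2e4, 1e5)
)

toy_summary <- toy_games %>%
  group_by(publisher) %>%
  summarise(
    titles        = n(),
    # share of titles whose owner ceiling sits in the lowest (20K) bucket
    share_20k     = mean(max_owners <= 20000),
    # share of titles whose owner ceiling is 1M or more
    share_1m_plus = mean(max_owners >= 1e6),
    .groups = "drop"
  )

print(toy_summary)
```

On the real data, a high share_1m_plus would point toward the mass-market pattern and a share_20k near 1 toward the niche pattern, though where to draw those lines is a judgment call.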