TidyTuesday: Cocktails pt.2

This is part 2 of TidyTuesday: Cocktails. Below shows how we can use #rstats to write a cocktail recommendation system that takes in a drink and returns a few other cocktails based on similarly mixed ingredients. Load libraries library(tidyverse) library(recommenderlab) Download and parse data Note: please check out part 1 for deatils on processing steps bc_raw <- read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-05-26/boston_cocktails.csv') bc <- bc_raw %>% mutate(ingredient = str_to_lower(ingredient)) %>% distinct() %>% select(name, ingredient) bc_tidy <- bc %>% filter(!...

May 28, 2020 · Christopher Yee

TidyTuesday: Cocktails

Data from #tidytuesday week of 2020-05-26 (source) If you are looking for the R script then you can find it here Load packages library(tidyverse) library(ggrepel) library(FactoMineR) Download data bc_raw <- read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-05-26/boston_cocktails.csv') Data processing Standardize cases bc_raw %>% count(ingredient, sort = TRUE) %>% filter(str_detect(ingredient, "red pepper sauce")) ## # A tibble: 2 x 2 ## ingredient n ## <chr> <int> ## 1 Hot red pepper sauce 4 ## 2 hot red pepper sauce 1 Let’s fix that by making all ingredient values to lower case:...

May 26, 2020 · Christopher Yee

R script for the CausalImpact package

Google has an amazing #rstats package called CausalImpact to predict the counterfactual: what would have happened if an intervention did not occur. This is a quick technical post to get someone up and running rather than a review of its literature, usage, or idiosyncrasies Load libraries library(tidyverse) library(CausalImpact) Download (dummy) data df <- read_csv("https://raw.githubusercontent.com/Eeysirhc/random_datasets/master/cimpact_sample_data.csv") df %>% sample_n(5) ## # A tibble: 5 x 3 ## date experiment_type revenue ## <date> <chr> <dbl> ## 1 2020-04-02 control 309....

May 19, 2020 · Christopher Yee

TidyTuesday: Volcano Eruptions (python)

Data from #tidytuesday week of 2020-05-12 (source) but plotting in python. Load modules import pandas as pd import matplotlib.pyplot as plt import seaborn as sns Download and parse data volcano_raw = pd.read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-05-12/volcano.csv") volcano = volcano_raw[['primary_volcano_type', 'elevation']].sort_values(by='elevation', ascending=False) Visualize dataset sns.set(style="darkgrid") plt.figure(figsize=(20,15)) p = sns.boxplot(x=volcano.elevation, y=volcano.primary_volcano_type) p = sns.swarmplot(x=volcano.elevation, y=volcano.primary_volcano_type, color=".35") plt.xlabel("Elevation") plt.ylabel("") plt.title("What is the average elevation by volcano type?", x=0.01, horizontalalignment="left", fontsize=20) plt.figtext(0.9, 0.08, "by: @eeysirhc", horizontalalignment="right") plt....

May 12, 2020 · Christopher Yee

TidyTuesday: Animal Crossing

Data from #tidytuesday week of 2020-05-05 (source) Load packages library(tidyverse) library(ggfortify) Download data villagers_raw <- read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-05-05/villagers.csv') Process data villagers <- villagers_raw %>% select(gender, species, personality) %>% mutate(species = str_to_title(species)) %>% group_by(gender, species, personality) %>% summarize(n = n()) %>% mutate(pct_total = n / sum(n)) %>% ungroup() Visualize data villagers %>% ggplot(aes(personality, pct_total, fill = gender, color = gender, group = gender)) + geom_polygon(alpha = 0.5) + geom_point() + coord_polar() + facet_wrap(~species) + labs(x = NULL, y = NULL, color = NULL, fill = NULL, title = "Animal Crossing: villager personality traits by species & gender", caption = "by: @eeysirhc\nsource:VillagerDB") + theme_bw() + theme(legend....

May 6, 2020 · Christopher Yee

Exploratory data analysis on COVID-19 search queries

The team at Bing were generous enough to release search query data with COVID-19 intent. The files are broken down by country and state level granularity so we can understand how the world is coping with the pandemic through search. What follows is an exploratory analysis on how US Bing users are searching for COVID-19 (a.k.a. coronavirus) information. tl;dr COVID-19 search queries generally fall into five distinct categories: 1....

May 5, 2020 · Christopher Yee

TidyTuesday: Beer Production

Data from #tidytuesday week of 2020-03-31 (source) Load packages library(tidyverse) library(gganimate) library(gifski) Download data beer_states_raw <- read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-03-31/beer_states.csv") Clean data beer_total <- beer_states_raw %>% # FILL NULL VALUES WITH 0 replace(., is.na(.), 0) %>% # REMOVE LINE ITEM FOR 'TOTAL' filter(state != 'total') %>% # COMPUTE TOTAL BARRELS PER YEAR BY STATE group_by(year, state) %>% summarize(total_barrels = sum(barrels)) %>% ungroup() Create rankings beer_final <- beer_total %>% group_by(year) %>% mutate( # CALCULATE RANKINGS BY TOTAL BARRELS PRODUCED EACH YEAR rank = min_rank(-total_barrels) * 1....

April 14, 2020 · Christopher Yee

Script to track COVID-19 cases in the US

A couple weeks ago I shared an #rstats script to track global coronavirus cases by country. The New York Times also released COVID-19 data for new cases in the United States, both at the state and county level. You can run the code below on a daily basis to get the most up to date figures. Feel free to modify for your own needs: library(scales) library(tidyverse) library(gghighlight) state <- read_csv("https://raw....

March 30, 2020 · Christopher Yee