TidyTuesday: Cocktails pt.2

This is part 2 of TidyTuesday: Cocktails. This post shows how we can use #rstats to write a cocktail recommendation system that takes in a drink and returns a few other cocktails based on similarly mixed ingredients. Load libraries library(tidyverse) library(recommenderlab) Download and parse data Note: please check out part 1 for details on the processing steps bc_raw <- read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-05-26/boston_cocktails.csv') bc <- bc_raw %>% mutate(ingredient = str_to_lower(ingredient)) %>% distinct() %>% select(name, ingredient) bc_tidy <- bc %>% filter(!...
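
For readers who want the intuition before opening the full post, here is a minimal Python sketch of ingredient-based recommendation. It is not the post's method: the post uses the {recommenderlab} package in R, while this sketch swaps in plain Jaccard similarity on ingredient sets. The toy cocktail list and the `recommend` helper are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical toy data: (cocktail name, ingredient) pairs, already lower-cased.
# The real dataset is boston_cocktails.csv; these rows are made up for the sketch.
pairs = [
    ("margarita", "tequila"), ("margarita", "lime juice"), ("margarita", "triple sec"),
    ("daiquiri", "light rum"), ("daiquiri", "lime juice"), ("daiquiri", "simple syrup"),
    ("sidecar", "brandy"), ("sidecar", "lemon juice"), ("sidecar", "triple sec"),
    ("mojito", "light rum"), ("mojito", "lime juice"),
    ("mojito", "simple syrup"), ("mojito", "mint"),
]

# Build one ingredient set per cocktail.
ingredients = defaultdict(set)
for name, ingredient in pairs:
    ingredients[name].add(ingredient)


def jaccard(a, b):
    """Share of ingredients two drinks have in common (intersection over union)."""
    return len(a & b) / len(a | b)


def recommend(drink, n=2):
    """Return the n cocktails whose ingredient sets are most similar to `drink`."""
    target = ingredients[drink]
    scores = [
        (other, jaccard(target, items))
        for other, items in ingredients.items()
        if other != drink
    ]
    # Highest similarity first; break ties alphabetically for a stable order.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:n]


top_matches = recommend("daiquiri")
print(top_matches)
```

In this toy data the mojito shares three of four distinct ingredients with the daiquiri, so it ranks first.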

May 28, 2020 · Christopher Yee

TidyTuesday: Cocktails

Data from #tidytuesday week of 2020-05-26 (source) If you are looking for the R script then you can find it here Load packages library(tidyverse) library(ggrepel) library(FactoMineR) Download data bc_raw <- read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-05-26/boston_cocktails.csv') Data processing Standardize cases bc_raw %>% count(ingredient, sort = TRUE) %>% filter(str_detect(ingredient, "red pepper sauce")) ## # A tibble: 2 x 2 ## ingredient n ## <chr> <int> ## 1 Hot red pepper sauce 4 ## 2 hot red pepper sauce 1 Let’s fix that by converting all ingredient values to lower case:...
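
The effect of lower-casing can be sketched in a few lines of Python. This is an illustration only, not the post's R code: the two counts are copied from the tibble above, and the `merged` counter is a hypothetical stand-in for re-counting after the fix.

```python
from collections import Counter

# Counts taken from the output shown in the excerpt above.
raw_counts = Counter({"Hot red pepper sauce": 4, "hot red pepper sauce": 1})

# Lower-case every ingredient name so spelling variants collapse into one key.
merged = Counter()
for ingredient, n in raw_counts.items():
    merged[ingredient.lower()] += n

print(merged)
```

After the change both spellings fall under a single key, so the ingredient is counted once with the combined total.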

May 26, 2020 · Christopher Yee

TidyTuesday: Volcano Eruptions (python)

Data from #tidytuesday week of 2020-05-12 (source) but plotting in python. Load modules import pandas as pd import matplotlib.pyplot as plt import seaborn as sns Download and parse data volcano_raw = pd.read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-05-12/volcano.csv") volcano = volcano_raw[['primary_volcano_type', 'elevation']].sort_values(by='elevation', ascending=False) Visualize dataset sns.set(style="darkgrid") plt.figure(figsize=(20,15)) p = sns.boxplot(x=volcano.elevation, y=volcano.primary_volcano_type) p = sns.swarmplot(x=volcano.elevation, y=volcano.primary_volcano_type, color=".35") plt.xlabel("Elevation") plt.ylabel("") plt.title("What is the average elevation by volcano type?", x=0.01, horizontalalignment="left", fontsize=20) plt.figtext(0.9, 0.08, "by: @eeysirhc", horizontalalignment="right") plt....

May 12, 2020 · Christopher Yee

TidyTuesday: Animal Crossing

Data from #tidytuesday week of 2020-05-05 (source) Load packages library(tidyverse) library(ggfortify) Download data villagers_raw <- read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-05-05/villagers.csv') Process data villagers <- villagers_raw %>% select(gender, species, personality) %>% mutate(species = str_to_title(species)) %>% group_by(gender, species, personality) %>% summarize(n = n()) %>% mutate(pct_total = n / sum(n)) %>% ungroup() Visualize data villagers %>% ggplot(aes(personality, pct_total, fill = gender, color = gender, group = gender)) + geom_polygon(alpha = 0.5) + geom_point() + coord_polar() + facet_wrap(~species) + labs(x = NULL, y = NULL, color = NULL, fill = NULL, title = "Animal Crossing: villager personality traits by species & gender", caption = "by: @eeysirhc\nsource:VillagerDB") + theme_bw() + theme(legend....

May 6, 2020 · Christopher Yee

TidyTuesday: Beer Production

Data from #tidytuesday week of 2020-03-31 (source) Load packages library(tidyverse) library(gganimate) library(gifski) Download data beer_states_raw <- read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-03-31/beer_states.csv") Clean data beer_total <- beer_states_raw %>% # FILL NULL VALUES WITH 0 replace(., is.na(.), 0) %>% # REMOVE LINE ITEM FOR 'TOTAL' filter(state != 'total') %>% # COMPUTE TOTAL BARRELS PER YEAR BY STATE group_by(year, state) %>% summarize(total_barrels = sum(barrels)) %>% ungroup() Create rankings beer_final <- beer_total %>% group_by(year) %>% mutate( # CALCULATE RANKINGS BY TOTAL BARRELS PRODUCED EACH YEAR rank = min_rank(-total_barrels) * 1....
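
The ranking step relies on `min_rank(-total_barrels)`, which ranks the largest producer first and gives tied values the same, lowest available rank. A rough Python equivalent, assuming made-up state labels and barrel totals (the `min_rank_desc` helper is hypothetical), might look like this:

```python
def min_rank_desc(values):
    """Rank values from largest to smallest; tied values share the smallest rank."""
    ordered = sorted(values, reverse=True)
    # list.index returns the first position of a value, so ties get the same rank.
    return [ordered.index(v) + 1 for v in values]


# Hypothetical totals for a single year; not taken from beer_states.csv.
total_barrels = {"state_a": 500, "state_b": 900, "state_c": 500, "state_d": 100}

ranks = dict(zip(total_barrels, min_rank_desc(list(total_barrels.values()))))
print(ranks)
```

Note that the rank after a tie skips ahead (1, 2, 2, 4), which matches how `min_rank` behaves in dplyr.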

April 14, 2020 · Christopher Yee

TardyThursday: College Tuition, Diversity & Pay

The differences between this unsanctioned #tardythursday and the official #tidytuesday: These will publish on Thursday (obviously) The dataset will come from a completely different week of TidyTuesday For a surprise, I’ll code with either #rstats or python (similar to #makeovermonday) Load modules import pandas as pd import seaborn as sns import matplotlib.pyplot as plt Download and parse data df_raw=pd.read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-03-10/salary_potential.csv") df=df_raw[['state_name', 'early_career_pay', 'mid_career_pay']].groupby('state_name').mean().reset_index() Visualize dataset sns.set(style="darkgrid") plt.figure(figsize=(20,15)) g=sns....

March 19, 2020 · Christopher Yee

MakeoverMonday: Women in the Workforce

Goal of #makeovermonday is to transform some of my #rstats articles and visualizations to their python equivalent. Original plot for this #tidytuesday dataset can be found here. Load modules import pandas as pd import seaborn as sns import matplotlib.pyplot as plt Download and parse data df_raw = pd.read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-03-05/jobs_gender.csv", sep=',', error_bad_lines=False, index_col=False, dtype='unicode') # FILTER ONLY FOR 2016 df_raw = df_raw[df_raw['year']=='2016'] df_raw = df_raw[['major_category', 'total_earnings_male', 'total_earnings_female', 'total_earnings', 'total_workers', 'workers_male', 'workers_female']] # REMOVE NULL VALUES df_raw = df_raw....

February 17, 2020 · Christopher Yee

TidyTuesday: Adoptable Dogs

Data from #tidytuesday week of 2019-12-17 (source) Quick post to showcase the amazing {reticulate} package which has made my life so much easier! Who said you had to choose between R vs Python? Load packages library(tidyverse) library(reticulate) R then Python Grab and parse data df_rdata <- read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-12-17/dog_moves.csv") df_rdata <- df_rdata %>% filter(inUS == 'TRUE') %>% select(location, total) df_rdata %>% head() ## # A tibble: 6 x 2 ## location total ## <chr> <dbl> ## 1 Texas 566 ## 2 Alabama 1428 ## 3 North Carolina 2627 ## 4 South Carolina 1618 ## 5 Georgia 3479 ## 6 California 1664 Plot data import pandas as pd import seaborn as sns import matplotlib....

December 17, 2019 · Christopher Yee