Visualizing the relationship between quality score & CPC

The SEM industry has published a lot of information about the importance of improving quality score to lower average cost per click (CPC). Most of those articles, however, just share a table with quality score in one column and the associated % increase or decrease in average CPC in the other. Although helpful, I think this format fails to convey how much QS can actually move CPC. We will do something different: the Python code below takes that data and visualizes the impact on average CPC for a given quality score....

August 11, 2020 · Christopher Yee
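
The post's own table and plotting code aren't reproduced in this excerpt. As a minimal Python sketch of the idea, assume the table gives each quality score's % change in average CPC relative to a baseline score. Converting those percentages into multipliers of the baseline CPC makes the gap between two scores easy to state as "X times more per click." The function names and the percentages below are hypothetical placeholders, not figures from any published table.

```python
def cpc_multipliers(pct_change_by_qs: dict[int, float]) -> dict[int, float]:
    """Turn a % change in average CPC (0.25 means +25%, -0.5 means -50%)
    into a multiplier of the baseline CPC for each quality score."""
    return {qs: 1.0 + pct for qs, pct in pct_change_by_qs.items()}


def relative_cpc(pct_change_by_qs: dict[int, float], low_qs: int, high_qs: int) -> float:
    """How many times more a low-QS advertiser pays per click than a high-QS one."""
    multipliers = cpc_multipliers(pct_change_by_qs)
    return multipliers[low_qs] / multipliers[high_qs]


# Hypothetical table: QS 5 is the baseline (0% change); values are placeholders.
example_table = {1: 4.0, 5: 0.0, 10: -0.5}

print(cpc_multipliers(example_table))      # {1: 5.0, 5: 1.0, 10: 0.5}
print(relative_cpc(example_table, 1, 10))  # 10.0
```

Plotting the multipliers rather than the raw percentages is one way to make the magnitude visible: a bar at 5.0 next to a bar at 0.5 says more than "+400%" next to "-50%".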

Star Wars: exploring Lucas vs Disney era ticket sales

With the end of the latest Star Wars trilogy, I wanted to compare, contrast, and explore Lucas-era vs Disney-era domestic box office revenue. The analysis and Python code below parse weekly ticket sales from Box Office Mojo, adjust revenue for inflation, visualize the results, and attempt to uncover insights from the data.

TL;DR
- The top 3 revenue-generating films (inflation-adjusted) are the first movie of each trilogy
- Disney-era films do not make it past week 20, in contrast to the Lucas era
- On average, Lucas-era movies generate 80% of their revenue within the first 10 weeks of release while Disney takes 2....

August 1, 2020 · Christopher Yee
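
The excerpt doesn't show the post's code or which price index it used. Below is a minimal Python sketch, assuming a CPI-style index, of the two calculations the TL;DR leans on: restating revenue in constant dollars, and finding the first week by which a film has earned a given share (such as 80%) of its total run. The function names and all numbers are hypothetical.

```python
from itertools import accumulate


def adjust_for_inflation(nominal: float, cpi_then: float, cpi_now: float) -> float:
    """Restate a past dollar amount in today's dollars using a price index ratio."""
    return nominal * cpi_now / cpi_then


def weeks_to_share(weekly_revenue: list[float], share: float = 0.8) -> int:
    """Return the first week (1-indexed) by which cumulative revenue reaches
    `share` of the film's total run."""
    total = sum(weekly_revenue)
    if total <= 0:
        raise ValueError("weekly_revenue must sum to a positive amount")
    target = share * total
    for week, running_total in enumerate(accumulate(weekly_revenue), start=1):
        if running_total >= target:
            return week
    # Only reachable if share > 1; fall back to the full run length.
    return len(weekly_revenue)


# Placeholder numbers: $100 earned when the index was 50 is $200 at an index of 100.
print(adjust_for_inflation(100.0, 50.0, 100.0))             # 200.0
# A toy 4-week run: half of the total arrives in week 1.
print(weeks_to_share([50.0, 30.0, 10.0, 10.0], share=0.5))  # 1
```

Adjusting each week's gross by the index for that week (or year) before calling `weeks_to_share` keeps the comparison across eras in constant dollars.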

Examining drug effectiveness studies via simulation

One of my dogs was recently diagnosed with an enlarged heart, so the vet prescribed some medicine to mitigate the problem. The box came with a pamphlet that included the company’s effectiveness study for the drug, Vetmedin. I thought it would be fun to visualize one portion of the study with simulation. What follows is the #rstats code I used to examine the drug’s efficacy based on the reported results....

July 10, 2020 · Christopher Yee

Algorithm to prioritize home improvement projects

My wife and I moved to Los Angeles in October 2019 with a list of home improvement projects to complete and things to purchase. The problem was that we disagreed on where to start, since we had to juggle costs and compromise on what was most important at the time. For example, if we focused too much on lower-ticket purchases, we would delay projects that had the potential to improve our home value....

July 2, 2020 · Christopher Yee

10x SEM performance: unlock the power of your own data

Excerpt from my original article about using first-party data for custom SEM bidding: With each passing year, the utility of third-party cookies continues to decline as they face barriers from web browsers and government regulation. The current system still works, and there will be alternatives, but it is always best to start preparing for the new status quo. FT Optimize believes that the best way to maximize SEM ad revenue is to use first-party data....

June 25, 2020 · Christopher Yee

On Business Value vs Technical Knowledge

The purpose of this article is to elaborate on and visualize (surprise!) a comment I left on Reddit in response to: Professional data scientists: did you overcome the feeling of never knowing enough? If so, how? I think this concept can be applied to any field, not just data science. My personal advice, which has worked for me to quell any “insecurities”, is to frame your mindset in terms of business value vs technical knowledge....

June 15, 2020 · Christopher Yee

Recreating plots in R: intro to bootstrapping

Objective: use R to recreate and visualize the sampling distribution of means from 500K bootstrap resamples, as shown in this intro to bootstrapping in statistics post.

Load libraries

```r
library(tidyverse)
library(rsample)
```

Download data

```r
df <- read_csv("https://statisticsbyjim.com/wp-content/uploads/2017/04/body_fat.csv")
```

Bootstrap resampling 500K

```r
# Draw 500K bootstrap resamples and compute the mean %Fat of each one
df_bs <- df %>%
  bootstraps(times = 500000) %>%
  mutate(average = map_dbl(splits, ~ mean(as.data.frame(.)$`%Fat`)))
```

Visualize sampling distribution of means

```r
df_bs %>%
  ggplot(aes(average)) +
  geom_histogram(binwidth = 0.1, alpha = 0.75, color = 'white', fill = 'steelblue') +
  scale_x_continuous(limits = c(25, 32)) +
  scale_y_continuous(labels = scales::comma_format()) +
  labs(title = "Histogram of % Fat",
       subtitle = "500K bootstrapped samples with 92 observations in each",
       x = "Average Mean",
       y = "Frequency") +
  theme_minimal()
```

...

June 1, 2020 · Christopher Yee
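
For readers following along in Python rather than R, here is a minimal NumPy sketch of the same resampling idea: draw as many rows as the data has, with replacement, and record each resample's mean. It is not a port of the rsample code; `bootstrap_means` and its `batch_size` parameter are hypothetical. Batching is there because 500K resamples of 92 observations would otherwise need an index array of about 46 million integers in memory at once.

```python
import numpy as np


def bootstrap_means(values, n_resamples=500_000, batch_size=50_000, seed=0):
    """Return the mean of each bootstrap resample of `values`.

    Each resample draws len(values) observations with replacement.
    Resamples are generated in batches to cap memory use.
    """
    data = np.asarray(values, dtype=float)
    n = data.size
    rng = np.random.default_rng(seed)
    means = np.empty(n_resamples)
    for start in range(0, n_resamples, batch_size):
        stop = min(start + batch_size, n_resamples)
        # Row i holds the indices drawn for resample number start + i.
        idx = rng.integers(0, n, size=(stop - start, n))
        means[start:stop] = data[idx].mean(axis=1)
    return means


# Toy data, not the body fat dataset.
means = bootstrap_means(np.arange(1, 11), n_resamples=20_000, batch_size=5_000)
print(means.shape)  # (20000,)
```

Passing the resulting array to a histogram function (for example matplotlib's `hist`) gives the Python counterpart of the ggplot figure.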

TidyTuesday: Cocktails pt.2

This is part 2 of TidyTuesday: Cocktails. Below, we use #rstats to write a cocktail recommendation system that takes in a drink and returns a few other cocktails with similar ingredients.

Load libraries

```r
library(tidyverse)
library(recommenderlab)
```

Download and parse data

Note: please check out part 1 for details on the processing steps.

```r
bc_raw <- read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-05-26/boston_cocktails.csv')

# Lowercase ingredients, drop duplicate rows, and keep drink/ingredient pairs
bc <- bc_raw %>%
  mutate(ingredient = str_to_lower(ingredient)) %>%
  distinct() %>%
  select(name, ingredient)

bc_tidy <- bc %>%
  filter(!...
```

May 28, 2020 · Christopher Yee
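
The post builds its recommender with recommenderlab in R, and the excerpt cuts off before the modeling step. As a swapped-in, much simpler technique, here is a Python sketch that ranks drinks by Jaccard similarity of their ingredient sets (shared ingredients divided by all distinct ingredients across both drinks). The `recipes` dictionary and the helper names are hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Share of ingredients two recipes have in common (0 = none, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(drink: str, recipes: dict[str, set[str]], k: int = 3) -> list[str]:
    """Return up to k other drinks ranked by ingredient overlap with `drink`.
    Ties are broken alphabetically so results are deterministic."""
    target = recipes[drink]  # raises KeyError for unknown drinks
    scored = [
        (jaccard(target, ingredients), name)
        for name, ingredients in recipes.items()
        if name != drink
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]


# Toy recipes for illustration only, not rows from boston_cocktails.csv.
recipes = {
    "Gimlet": {"gin", "lime juice", "simple syrup"},
    "Daiquiri": {"light rum", "lime juice", "simple syrup"},
    "Gin Rickey": {"gin", "lime juice", "club soda"},
    "Manhattan": {"rye whiskey", "sweet vermouth", "bitters"},
}
print(recommend("Gimlet", recipes, k=2))  # ['Daiquiri', 'Gin Rickey']
```

Jaccard ignores measures and treats every ingredient equally, so a shared garnish counts as much as a shared base spirit; weighting ingredients is one possible refinement.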