Christopher Yee

Deciphering Hopper's Data Puzzle

I like to browse company career pages once in a while to see what positions they have open. In my opinion, this provides a glimpse into what they are investing in for the next few years. Hopper is one company that stands out, but the reason I am writing this is a puzzle they included in the job description: At Hopper, every dataset tells a story. Do you have what it takes to decipher the clues? bit.ly/2q6U8dq ...

Using R & GSC data to identify stale content

My friend John-Henry Scherck recently tweeted his process on how to refresh stale content:

Put together a quick video on how to refresh stale content using nothing more than Google Search Console and a word doc. Check out the full video here: https://t.co/Vva4Zm4mNn pic.twitter.com/74Fm2oIz4c - John-Henry Scherck (@JHTScherck) January 21, 2020

I imagine this can be broken down into five distinct parts: (1) stale content selection, (2) understanding keyword intent, (3) actually refreshing the content, (4) internal link optimization, and (5) publish. This short guide will focus on the first aspect where we’ll use #rstats to remove the manual work associated with stale candidate selection. ...

[Updated] Top Industries from Inc.5000 Companies

Changelog
Originally published on September 10th, 2019
Built a Shiny app for this
Full code can be found on GitHub

One of my favorite online marketers, (the) Glen Allsopp, tweeted the following:

Over the past few weeks I've went through every site in the Inc. 5000. My mind has been blown multiple times. Don't click if you're easily distracted. Enjoy! https://t.co/mHVK8rvb9X pic.twitter.com/BoEb3qQ7LZ - Glen Allsopp (@ViperChill) August 27, 2019

The public spreadsheet contains four fields: ...

Visualizing intraday SEM performance with R

Aside from the base bid, Google SEM campaign performance can be influenced by contextual signals from the customer. These include but are not limited to: device, location, gender, parental status, household income, etc. For this post we’ll focus on ad schedule (or intraday) and visualize how time of day and day of week are performing.

Load data

    library(tidyverse)

    # ANONYMIZED SAMPLE DATA
    df <- read_csv("https://raw.githubusercontent.com/Eeysirhc/random_datasets/master/intraday_performance.csv")

Spot check our data

    df %>% sample_n(20)

    ## # A tibble: 20 x 5
    ##    account   day_of_week hour_of_day   roas conv_rate
    ##    <chr>     <chr>             <dbl>  <dbl>     <dbl>
    ##  1 Account 3 Tuesday               5 0.509    0.0183
    ##  2 Account 2 Friday                4 1.11     0.0401
    ##  3 Account 2 Sunday               11 1.07     0.0309
    ##  4 Account 3 Saturday             18 1.09     0.0301
    ##  5 Account 1 Thursday             19 0.303    0.0165
    ##  6 Account 1 Tuesday               8 0.362    0.0230
    ##  7 Account 2 Saturday              4 0.722    0.0340
    ##  8 Account 3 Friday               10 0.653    0.00844
    ##  9 Account 2 Wednesday             8 0.448    0.0262
    ## 10 Account 1 Saturday              9 0.858    0.0467
    ## 11 Account 1 Saturday             18 0.266    0.0136
    ## 12 Account 1 Saturday              8 0.871    0.0349
    ## 13 Account 2 Friday               14 0.546    0.0196
    ## 14 Account 1 Sunday                5 0.0444   0.00889
    ## 15 Account 3 Wednesday            21 0.530    0.0248
    ## 16 Account 1 Tuesday              16 0.801    0.0451
    ## 17 Account 2 Monday                2 0.884    0.0230
    ## 18 Account 2 Wednesday            19 0.772    0.0275
    ## 19 Account 3 Monday               21 0.444    0.0367
    ## 20 Account 1 Tuesday               3 0        0

Clean data

Convert to factors

The day_of_week column is a character and hour_of_day is a double data type. We need to transform them to factors so they don’t surprise us later. ...
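As a rough sketch of what that factor conversion could look like (not necessarily the code from the full post), the snippet below builds a tiny stand-in data frame with the same column names as the sample output (day_of_week, hour_of_day) and sets explicit factor levels. The Monday-first ordering, the 0 to 23 hour range, and the names day_levels and df_clean are assumptions made for illustration.

```r
library(tidyverse)

# Stand-in data with the same column names as the sample output above
df <- tibble(
  account     = c("Account 1", "Account 2", "Account 3"),
  day_of_week = c("Tuesday", "Sunday", "Friday"),
  hour_of_day = c(5, 11, 23),
  roas        = c(0.509, 1.07, 0.653),
  conv_rate   = c(0.0183, 0.0309, 0.00844)
)

# ASSUMPTION: the week starts on Monday and hours run from 0 to 23
day_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday")

df_clean <- df %>%
  mutate(
    # explicit levels keep the days in calendar order instead of alphabetical
    day_of_week = factor(day_of_week, levels = day_levels),
    # treating hours as a factor gives one discrete level per hour
    hour_of_day = factor(hour_of_day, levels = 0:23)
  )

levels(df_clean$day_of_week)
```

Setting the levels explicitly matters mostly for plotting: without them, ggplot2 would order the days alphabetically and treat the hour as a continuous number.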

Code Answers to SQL Murder Mystery

Pretty fun murder mystery from @knightlab - can you find the killer using #SQL? https://t.co/vXcMtY2b1c - Christopher Yee (@Eeysirhc) December 20, 2019

CLUE #1

There is a murder in SQL City on 2018-01-15.

    select *
    from crime_scene_report
    where type = 'murder'
      and city = 'SQL City'
      and date = '20180115'

CLUE #2

Witness 1 lives in the last house on Northwestern Dr. Witness 2 is named Annabel and lives somewhere on Franklin Ave. ...

TidyTuesday: Adoptable Dogs

Data from #tidytuesday week of 2019-12-17 (source)

Quick post to showcase the amazing {reticulate} package which has made my life so much easier! Who said you had to choose between R vs Python?

Load packages

    library(tidyverse)
    library(reticulate)

R then Python

Grab and parse data

    df_rdata <- read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-12-17/dog_moves.csv")

    df_rdata <- df_rdata %>%
      filter(inUS == 'TRUE') %>%
      select(location, total)

    df_rdata %>% head()

    ## # A tibble: 6 x 2
    ##   location       total
    ##   <chr>          <dbl>
    ## 1 Texas            566
    ## 2 Alabama         1428
    ## 3 North Carolina  2627
    ## 4 South Carolina  1618
    ## 5 Georgia         3479
    ## 6 California      1664

Plot data

    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt

    # note the r. before the df_rdata value
    # sns.barplot returns a matplotlib Axes object
    ax = sns.barplot(x="total", y="location", data=r.df_rdata, orient="h")
    plt.xlabel("Adoptable Dogs Available")
    plt.ylabel("")
    plt.figtext(0.9, 0.03, "by: @eeysirhc", horizontalalignment="right")
    plt.figtext(0.9, 0.01, "source: The Pudding", horizontalalignment="right")
    # plt.show() renders the current figure; it does not take the plot as an argument
    plt.show()

...

Calculating & estimating annual salaries with R

A couple of weeks ago, a friend asked me about my base annual salary during my time as Square’s SEO Lead. Rather than spitting out a number, I thought it would be more interesting to see if we could answer her question using #rstats.

tl;dr

This is what I posted on Twitter:

Ok #bayesian twitter: helping a friend with salary negotiations and this incorporates what she wants, job boards, confirmed salaries, etc……how do I validate if this model is a load of crock or not? pic.twitter.com/WUfcdHBtUX ...

Connect R to Amazon Redshift Database

This is a quick technical post for anyone who needs full CRUD capabilities: retrieving data from a Redshift table, manipulating it in #rstats, and sending it all back up again.

Dependencies

Load libraries

    library(tidyverse)
    library(RPostgreSQL) # INTERACT WITH REDSHIFT DATABASE
    library(glue) # FORMAT AND INTERPOLATE STRINGS

Amazon S3

For this data pipeline to work you’ll also need the AWS command line interface installed.

    # RUN THESE COMMANDS INSIDE TERMINAL
    brew install awscli
    aws configure
    # ANSWER QUESTIONS: access / secret / zone

Read data

Set connection

You’ll need to replace with your own database credentials below: ...