My friend John-Henry Scherck recently tweeted his process for refreshing stale content:
> Put together a quick video on how to refresh stale content using nothing more than Google Search Console and a word doc. Check out the full video here: https://t.co/Vva4Zm4mNn
>
> John-Henry Scherck (@JHTScherck), January 21, 2020
I imagine this can be broken down into the following distinct parts:
- Stale content selection
- Understanding keyword intent
- Actually refreshing the content
- Internal link optimization
This short guide focuses on the first part: using #rstats to remove the manual work of selecting stale content candidates.
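First, load the libraries and authenticate with Google Search Console:

```r
library(tidyverse)
library(searchConsoleR)

scr_auth()  # authenticate with your Google account (OAuth)
```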
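The code below grabs up to 100K rows (`rowLimit = 1e5`) of page and query data for roughly the last five weeks, ending three days ago because Search Console data usually lags by a few days. Feel free to revise the dates and limits as you see fit.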
```r
df <- as_tibble(
  search_analytics(
    "https://www.christopheryee.org/",
    Sys.Date() - 35,        # START DATE
    Sys.Date() - 3,         # END DATE
    c("page", "query"),     # DIMENSIONS
    searchType = "web",
    rowLimit = 1e5
  )
)
```
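Note that `Sys.Date() - 35` through `Sys.Date() - 3` covers 33 days, a little under five full weeks. If you want an exact window, a sketch like the one below computes it. `gsc_window()` is a hypothetical helper, and the three-day lag is an assumption about how far Search Console data trails behind, not a fixed rule.

```r
# Hypothetical helper: build an exact N-week date window that ends
# `lag_days` before a reference date (assumes ~3 days of GSC reporting lag).
gsc_window <- function(weeks = 5, lag_days = 3, today = Sys.Date()) {
  end_date   <- today - lag_days
  start_date <- end_date - (weeks * 7 - 1)  # inclusive range of weeks * 7 days
  list(start = start_date, end = end_date)
}

# Example with a fixed reference date so the result is reproducible
win <- gsc_window(today = as.Date("2020-01-21"))
win$start                             # "2019-12-15"
win$end                               # "2020-01-18"
as.integer(win$end - win$start) + 1   # 35 days, inclusive
```

You could then pass `win$start` and `win$end` as the start and end dates in the `search_analytics()` call.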
Next, we exclude brand terms and keep only keywords with at least 2,000 impressions and an average position from 5 up to (but not including) 15.
```r
keywords <- df %>%
  group_by(query) %>%
  summarize(impressions = sum(impressions),
            position = mean(position)) %>%
  filter(!grepl("brand_term", query)) %>%   # EXCLUDE BRAND TERMS HERE
  arrange(desc(impressions)) %>%
  filter(impressions >= 2000, position >= 5 & position < 15) %>%
  select(query)
```
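To sanity-check the keyword filter without calling the API, here is a sketch that runs the same pipeline on a small hypothetical tibble (`toy` and `keywords_toy` are made-up names and data). Only "content refresh" and "stale content" survive: the branded query is dropped, and "seo audit" has enough impressions but averages outside positions 5 to 15.

```r
library(dplyr)

# Hypothetical page/query rows standing in for search_analytics() output
toy <- tibble(
  query       = c("stale content", "stale content", "content refresh",
                  "brand_term login", "seo audit", "seo audit"),
  page        = c("/a", "/b", "/a", "/home", "/c", "/c"),
  clicks      = c(40, 10, 25, 500, 5, 3),
  impressions = c(1500, 900, 2600, 9000, 1200, 1100),
  position    = c(6, 12, 9, 1.2, 18, 16)
)

keywords_toy <- toy %>%
  group_by(query) %>%
  summarize(impressions = sum(impressions), position = mean(position)) %>%
  filter(!grepl("brand_term", query)) %>%      # drop branded queries
  arrange(desc(impressions)) %>%
  filter(impressions >= 2000, position >= 5 & position < 15) %>%
  select(query)

keywords_toy
```

If you have more than one brand term, a regular expression alternation such as `grepl("brand_a|brand_b", query, ignore.case = TRUE)` excludes them all in one pass (the brand names here are placeholders).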
Dedupe landing pages
There may be instances where a keyword ranks with multiple pages.
We can remove those duplicates by keeping, for each keyword, the page with the most clicks.
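```r
pages <- df %>%
  inner_join(keywords, by = "query") %>%   # JOIN OUR KEYWORDS DATASET
  group_by(query) %>%
  arrange(desc(clicks)) %>%
  mutate(candidate = row_number()) %>%     # rank pages by clicks within each query
  ungroup() %>%
  filter(candidate == 1) %>%               # keep the top page per query
  select(page) %>%
  distinct()                               # a page can be the top result for several queries
```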
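A quick way to check the dedupe logic is to run it on a handful of hypothetical rows. In this sketch (the data and the `*_toy` names are made up), "stale content" ranks with two pages and the one with more clicks wins. Unlike the script above, it keeps `query` next to `page` so you can see which keyword each page won.

```r
library(dplyr)

# Hypothetical rows: each query ranks with two pages
toy <- tibble(
  query  = c("stale content", "stale content", "content refresh", "content refresh"),
  page   = c("/a", "/b", "/a", "/c"),
  clicks = c(40, 10, 25, 30)
)
keywords_toy <- tibble(query = c("stale content", "content refresh"))

pages_toy <- toy %>%
  inner_join(keywords_toy, by = "query") %>%
  group_by(query) %>%
  arrange(desc(clicks)) %>%              # sort all rows by clicks, high to low
  mutate(candidate = row_number()) %>%   # rank rows within each query
  ungroup() %>%
  filter(candidate == 1) %>%             # keep the top page per query
  select(query, page)

pages_toy
```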
Fun fact: I often use `candidate = row_number()` as a quick hack to keep the “top” or “bottom” rows for each group in a dataset.
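Here is a sketch of that hack on a hypothetical tibble, picking the top and bottom page by clicks within each query. It also shows dplyr's `slice_max()` (available since dplyr 1.0.0), which handles the "top" case directly. `with_ties = FALSE` makes it return exactly one row per group, like `row_number() == 1`, though the two may break ties differently.

```r
library(dplyr)

# Hypothetical data: clicks per page within each query
df_demo <- tibble(
  query  = c("q1", "q1", "q1", "q2", "q2"),
  page   = c("/a", "/b", "/c", "/a", "/d"),
  clicks = c(5, 20, 12, 7, 3)
)

# "Top" row per group via the row_number() hack
top_hack <- df_demo %>%
  group_by(query) %>%
  arrange(desc(clicks), .by_group = TRUE) %>%
  mutate(candidate = row_number()) %>%
  filter(candidate == 1) %>%
  ungroup() %>%
  select(query, page)

# "Bottom" row per group: same hack with the sort order flipped
bottom_hack <- df_demo %>%
  group_by(query) %>%
  arrange(clicks, .by_group = TRUE) %>%
  mutate(candidate = row_number()) %>%
  filter(candidate == 1) %>%
  ungroup() %>%
  select(query, page)

# Built-in alternative for the top row (dplyr >= 1.0.0)
top_builtin <- df_demo %>%
  group_by(query) %>%
  slice_max(clicks, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(query, page)
```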
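Finally, join the candidate pages back to the original data to see every query each page ranks for:

```r
df %>%
  inner_join(pages, by = "page") %>%               # JOIN CANDIDATE PAGES
  mutate(ctr = (clicks / impressions) * 100) %>%   # STANDARDIZE CTR
  arrange(page, desc(impressions)) %>%             # by page, then impressions high to low
  distinct()
```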
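Here is a sketch of that final join on a few hypothetical rows (`toy`, `pages_toy`, and `report` are made-up names for illustration). Page "/z" is not a candidate, so its rows drop out; the rest are sorted by page and then by impressions, high to low, with CTR expressed as a percentage, so 30 clicks on 1,000 impressions becomes 3.

```r
library(dplyr)

# Hypothetical query rows for two candidate pages plus one non-candidate page
toy <- tibble(
  page        = c("/a", "/a", "/c", "/z"),
  query       = c("stale content", "content refresh", "seo audit", "other"),
  clicks      = c(30, 10, 6, 50),
  impressions = c(1000, 2500, 300, 800)
)
pages_toy <- tibble(page = c("/a", "/c"))   # deduplicated candidate pages

report <- toy %>%
  inner_join(pages_toy, by = "page") %>%
  mutate(ctr = (clicks / impressions) * 100) %>%   # CTR as a percentage
  arrange(page, desc(impressions)) %>%
  distinct()

report
```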
From here, take these keywords into the next phase: understanding keyword intent.
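If it helps to hand the results off as a per-page brief (for example, to paste into the word doc from John-Henry's process), one option is to collapse each page's keywords into a single string. This is a sketch on hypothetical data; `keyword_briefs` is an illustrative name, not part of the original script.

```r
library(dplyr)

# Hypothetical output of the final step: one row per page/query pair
report <- tibble(
  page        = c("/a", "/a", "/c"),
  query       = c("content refresh", "stale content", "seo audit"),
  impressions = c(2500, 1000, 300)
)

# One row per page, keywords listed from most to fewest impressions
keyword_briefs <- report %>%
  arrange(page, desc(impressions)) %>%
  group_by(page) %>%
  summarize(keywords = paste(query, collapse = ", "), .groups = "drop")

keyword_briefs
```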
- The full script can be found on GitHub
- If you enjoyed this post, you may be interested in my guide to getting started with R using Google Search Console data