TF-IDF Explained: With Help From US Presidents

by Christopher Yee on September 25, 2017

TF-IDF, or Term Frequency-Inverse Document Frequency, has long been used by search engines to score and rank a document’s relevance for any given search query.  In spite of this, I think it continues to be a misunderstood or under-the-radar concept in the broader SEO world because 1) “keyword density” is much easier to explain and 2) the name reads like word salad the very first time you see it.

With help from past US Presidents and their State of the Union addresses, I’ll attempt to explain this numerical statistic in its simplest form.

The Basics

At its core, TF-IDF is used to quantify how important a word is to a document when compared with a larger collection of text.  It gives less weight to a word that appears in many documents across a known text corpus, and more weight to a word that appears in few of them.  The beauty of this calculation is that it efficiently pushes commonly used words like “the”, “but” and “for” toward a score of zero, which distills the document down to its primary lexical components.
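
To make the calculation concrete, here is a minimal sketch in Ruby. It assumes one common formulation (a term's count divided by the document length, multiplied by the natural log of total documents over documents containing the term); other variants exist and this one is only an illustration. The three-document corpus and all method names are made up for the example.

```ruby
# Toy corpus (hypothetical; not the actual State of the Union text).
DOCS = {
  "doc_a" => "the people and the union of the states",
  "doc_b" => "the economy and the jobs of the people",
  "doc_c" => "the world and the peace of the nations"
}.freeze

# Split a document into lowercase word tokens.
def tokenize(text)
  text.downcase.scan(/[a-z']+/)
end

# Term frequency: share of a document's tokens that are `term`.
def tf(term, tokens)
  tokens.count(term).to_f / tokens.size
end

# Inverse document frequency: log(N / number of documents containing `term`).
def idf(term, tokenized_docs)
  containing = tokenized_docs.count { |tokens| tokens.include?(term) }
  return 0.0 if containing.zero?

  Math.log(tokenized_docs.size.to_f / containing)
end

def tf_idf(term, tokens, tokenized_docs)
  tf(term, tokens) * idf(term, tokenized_docs)
end

TOKENIZED = DOCS.transform_values { |text| tokenize(text) }

# Score every distinct word in one document, highest score first.
def top_terms(name, tokenized)
  tokens = tokenized.fetch(name)
  tokens.uniq
        .map { |term| [term, tf_idf(term, tokens, tokenized.values)] }
        .sort_by { |_, score| -score }
end

top_terms("doc_b", TOKENIZED).first(3).each do |term, score|
  puts format("%-8s %.3f", term, score)
end
```

In this toy corpus “the” appears in every document, so its inverse document frequency is log(1) = 0 and it scores zero no matter how often it is repeated, while words unique to one document rise to the top.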

For example, SEO of yesteryear dictated: “if you want to rank for that keyword then you need to mention it X times on the page.”  This obviously isn’t the case but let’s run with this as a working example.

If we take the State of the Union addresses from George Washington, Abraham Lincoln, Dwight D. Eisenhower, Bill Clinton, George W. Bush & Barack Obama, we can plot out their term frequencies to get something like this:

Common words among the last three US Presidents? America, American(s), people, tax and jobs.  “World” is a high frequency term after Eisenhower presumably because US foreign policy placed more emphasis on the international realm post-WW2, instead of its reclusive pre-war status.


Moz: Put Your Money Where Your [Diversity] Mouth Is

by Christopher Yee on August 13, 2017

I attended MozCon 2017 last month; it’s always a blast to reconnect with old colleagues and make new friends.  That being said, this isn’t going to be your typical feel-good post or conference recap.  Instead, it’s going to be an observation about conference diversity, specifically at MozCon.


I attended my first MozCon back in 2013 with SEOgadget (now BuiltVisible), which was also Moz’s first time hosting the conference at the Washington State Convention Center in Seattle.  I don’t recall the exact numbers but I’d venture a guess it was anywhere between 800 and 1,000 attendees.  It wasn’t so large that you’d get lost in the crowd; it still felt intimate.

Intimate to the point where you would be acutely aware of the fact that you’re a minority – not something I was used to as a San Francisco native.  In fact, the only other Asian I saw was Stephanie Chang who was working for Distilled at the time.  Why do I bring this up?  Because in 2013, the SEO industry was just starting to go “mainstream” so the demographic makeup naturally skewed toward the White Male, making it easier to remember other Asians.

Extracting Links from a Page with Ruby and Nokogiri

by Christopher Yee on February 13, 2014

Scraper is a pretty good Chrome extension I use on a regular basis to quickly extract links from a page. Unfortunately, there can be rare instances where it actually takes more effort to use.

For example, if I wanted to retrieve all links from Hewlett-Packard’s HTML sitemap, I would need to create multiple Google spreadsheets to capture that data because of the way the page is structured. In this particular case, I’d have to scrape the page a total of 14 times to account for the different sections.


A Year of Webkit2png

by Christopher Yee on January 1, 2014

When I joined SEOgadget last year, my first blog post was about using webkit2png for site audits, stalking and more.  What I didn’t mention was my 2013 New Year’s resolution – to track the home page of three websites for the entire year with webkit2png.

The following videos come from the home pages of Macy’s, Yahoo and Amazon with a year’s worth of images compiled together.  It’s nothing too crazy but feel free to turn on your favorite jam, sit back, relax and view them for your pleasure.

Enjoy and have an amazing 2014!  =]

Crayon Syntax Highlighter Themes

by Christopher Yee on October 21, 2013

If you write technical blog posts about optimizing source code for SEO or programming scripts, I highly recommend the Crayon Syntax Highlighter for WordPress users – it gives your examples a nice, snazzy look.  The plugin includes 25 default themes but I couldn’t find a good preview gallery for them anywhere so I decided to list them all out below.  Enjoy!


Ado

This is the "Ado" theme.

Arduino Ide

This is the "Arduino Ide" theme.

Cg Cookie

This is the "Cg Cookie" theme.

Classic

This is the "Classic" theme.

Eclipse

This is the "Eclipse" theme.

Updated aHrefs Link Analysis Script

by Christopher Yee on March 18, 2013

I updated my aHrefs bulk link analysis script to improve its functionality by adding two features.

  1. The script now returns the results in a CSV file called ahrefs_results.csv
  2. The script now uses Ruby’s .map method for a “cleaner” syntax
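
As a rough illustration of those two changes, the sketch below uses .map to turn a list of results into rows and builds CSV text with Ruby's standard csv library. The result hashes, their field names and the numbers are hypothetical placeholders, not the script's actual data or the aHrefs response format; only the file name ahrefs_results.csv comes from this post.

```ruby
require "csv"

# Hypothetical results; the real script would fill these from API responses.
RESULTS = [
  { url: "http://example.com/",  backlinks: 120, ref_domains: 15 },
  { url: "http://example.org/a", backlinks: 8,   ref_domains: 3 }
].freeze

HEADERS = %i[url backlinks ref_domains].freeze

# Build the CSV text: one header row, then one row per result.
def to_csv(results, headers)
  # .map converts each result hash into an array ordered like the headers.
  rows = results.map { |result| headers.map { |key| result[key] } }
  CSV.generate do |csv|
    csv << headers.map(&:to_s)
    rows.each { |row| csv << row }
  end
end

csv_text = to_csv(RESULTS, HEADERS)
puts csv_text

# To save the output to disk instead of printing it:
# File.write("ahrefs_results.csv", csv_text)
```

Nesting one .map inside another keeps the column order tied to a single HEADERS list, so adding a metric means changing one line.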

The source code for this Ruby script can be found at my GitHub repository.

My next task is defining individual functions to eliminate any code redundancy and ultimately speed up the API calls.  Stay tuned!

Joining the SEOgadget Family

by Christopher Yee on February 28, 2013

This post is super late but, if you didn’t know already, I ended my short tenure at Macy’s earlier this month and joined the SEOgadget family!

You can read my first SEOgadget blog post here.

I’m helping Laura Lippay expand the US office so I’ll be getting a taste of both agency and startup life.  Business is already booming and I’ve got so much work ahead that we are looking to hire another Organic Search Strategist.  Yes, that’s right – I need a partner in crime!

Analyze & Strategize SEO using Logarithmic Charts

by Christopher Yee on January 25, 2013

Just like the natural world, SEO traffic adheres to a power law, more commonly known as the long tail or, for you MBAs, the 80-20 rule.  Applied to search, it means approximately 80% of your organic traffic is attributed to the top 20% of your keywords.
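
A quick way to see this shape is to simulate it. The sketch below assumes visits fall off as 1/rank (a Zipf-style power law), which is a simplification chosen for illustration rather than anything measured from real traffic, and computes how much traffic the top 20% of keywords capture. The exact share depends on the exponent and the number of keywords; all names and parameters here are hypothetical.

```ruby
# Assumed model: the keyword at rank r gets visits proportional to 1 / r**exponent.
def simulated_visits(keyword_count, exponent: 1.0, head_visits: 10_000)
  (1..keyword_count).map { |rank| head_visits / (rank**exponent) }
end

# Fraction of total visits captured by the top `fraction` of keywords.
def head_share(visits, fraction: 0.2)
  sorted = visits.sort.reverse
  head_size = (sorted.size * fraction).ceil
  sorted.first(head_size).sum / sorted.sum.to_f
end

visits = simulated_visits(1_000)
puts format("Top 20%% of keywords: %.0f%% of visits", head_share(visits) * 100)
```

With these assumed parameters the share lands in the neighborhood of the 80% mark; a steeper exponent concentrates even more traffic in the head terms.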

When you visualize this type of data, though, an inherent problem occurs…

…the number of visits for the head terms far exceeds that of the torso and tail keywords, thus rendering the graph useless.  And if you wanted a YoY comparison of your SEO performance?

No actionable insight – a shortcoming of linear graphs.
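
The effect of switching to a logarithmic axis can be sketched with a few numbers. The visit counts below are invented for illustration; the point is only that log10 turns a range spanning 100,000 down to 10 into a handful of evenly spaced steps, while a linear axis squashes everything but the head term against zero.

```ruby
# Hypothetical monthly visits for head, torso and tail keywords.
VISITS = {
  "head term"  => 100_000,
  "torso term" => 1_000,
  "tail term"  => 10
}.freeze

# Position of each keyword on a linear axis (as a share of the maximum)
# versus a log10 axis.
def axis_positions(visits)
  max = visits.values.max.to_f
  visits.map do |keyword, count|
    { keyword: keyword, linear: count / max, log10: Math.log10(count) }
  end
end

axis_positions(VISITS).each do |row|
  puts format("%-10s linear=%.4f log10=%.1f", row[:keyword], row[:linear], row[:log10])
end
```

On the linear scale the tail term sits at one ten-thousandth of the axis height, which is why it disappears from the chart; on the log scale each tenfold change in visits moves a keyword by the same distance.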

Bulk Link Analysis with the aHrefs API using Ruby

by Christopher Yee on December 20, 2012

One of my top blog posts this year is the bulk URL checker, which has become my staple tool for HTTP checks en masse when I don’t want to fire up third party software.  This accomplishment got me hooked (on coding) and to keep my momentum going I decided to write a Ruby script which interfaces with the aHrefs API and emulates their batch analysis tool.

If you’re only interested in the bulk link analysis script then you can find it here.

It may look simple but it took me a couple hours to complete and needs some cleaning up on my part.  This file is a good start for anyone who needs a quick analysis of a list (big or small) of URLs and their links.  Regardless, I’ll continue to build upon it so it more closely resembles the aHrefs tool.

What Does It Do?

The script is built with the Ruby programming language and analyzes a list of URLs in bulk by querying the aHrefs API, then returns the appropriate data (exact match).

The current version returns the target URL and the following link metrics for the page: total backlinks, linking root domains, unique IP addresses, .COM links, .EDU links and .GOV links.  It will also indicate the remaining API calls in your account upon completion.

Google Zeitgeist 2012: Year in Review

by Christopher Yee on December 14, 2012

2012 was such an amazing year for me that I can’t even begin to describe how grateful I am for everything that has happened. Anyways, check out the “Year in Review” video by Google below…they always get me all teary-eyed.

Let’s hope for an even better 2013. =]

If you liked this video you may also want to check out the one from last year.