By Cory Collins
25 Apr 2016

Q&A with Paul Haahr, Ranking Engineer at Google, SMX West 2016

Recently the SEO industry received a rare bit of transparency and insight from a Google staff member. Specifically, a member of their ranking engineering division: a person who directly contributes to Google's algorithm.

Paul Haahr, a Software Engineer at Google for the last 14 years, gave a presentation at SMX West in March about how Google works, from his perspective as a Ranking Engineer. A few pieces of coverage:

This post will cover the Q&A that Danny Sullivan led with Paul after his presentation; Paul was joined by Gary Illyes, a Webmaster Trends Analyst at Google. Full video below:

In the Q&A 16 questions were asked. Let's take them one at a time.

Note: I'll be paraphrasing answers based upon my own understanding. If you want the full answers, I suggest you watch the video. Each question has the video embedded, cued to the point where Danny asks the question.

1. Shards are parts of the overall index in various places?

The index is the sum of all the shards together. 

Google has a large network of very big machines, and they pick shard sizes to fill those machines. It sounds like sharding is Google's way of dealing with the problem of scale, keeping the size of the index manageable.
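
To make the idea concrete, here's a minimal sketch of hash-based sharding, one common way to split a large index across machines. Everything in it (the shard count, the document IDs, the in-memory lists) is hypothetical; Google has not published how its shards are actually assigned or sized.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical; Google's real shard count and sizing aren't public

def shard_for(doc_id: str) -> int:
    """Assign a document to a shard by hashing its ID (illustrative only)."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Each shard holds only its slice of the index; the full index is all shards combined.
shards = [[] for _ in range(NUM_SHARDS)]
for doc_id in ["example.com/a", "example.com/b", "example.org/c"]:
    shards[shard_for(doc_id)].append(doc_id)

# A query would fan out to every shard, and the per-shard results would be merged.
print([len(shard) for shard in shards])
```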

2. How does RankBrain fit into all this?

 
  1. RankBrain is provided with certain subsets of the signals in the algorithm (it's unclear which signals).
  2. RankBrain is a machine learning system (Paul corrects himself: a deep learning system) that has its own ideas about how to combine signals and understand documents.
  3. Google understands how RankBrain works (after much effort), but they don't understand exactly what it's doing.
  4. RankBrain uses much of what Google has published about deep learning.
  5. One layer of what RankBrain does involves word2vec and word embeddings (see the sketch after this list).
  6. RankBrain kicks in late in the life of a query, after retrieval (Paul refers to this stage as a "box").
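
Paul doesn't describe RankBrain's internals, but word2vec-style word embeddings are publicly documented: each word is mapped to a dense vector, and words used in similar contexts end up with similar vectors. Below is a toy sketch of that idea using hand-made three-dimensional vectors; real embeddings are learned from large text corpora and typically have hundreds of dimensions, so none of the numbers mean anything beyond illustration.

```python
import math

# Toy, hand-made vectors purely for illustration; real word2vec embeddings are
# learned from text, not written by hand.
embeddings = {
    "car":    [0.90, 0.10, 0.00],
    "auto":   [0.85, 0.15, 0.05],
    "banana": [0.00, 0.20, 0.95],
}

def cosine_similarity(a, b):
    """Similarity of two vectors: near 1.0 means similar, near 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(embeddings["car"], embeddings["auto"]))    # high (~0.99)
print(cosine_similarity(embeddings["car"], embeddings["banana"]))  # low (~0.02)
```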

3. How does RankBrain know the authority of a page? How does it know the quality?

RankBrain has robust training data, meant to improve this functionality. It sees signals in addition to queries and web pages.

To me, it sounds as if RankBrain doesn't need to determine authority or quality of a page. It's fed that information already.

4. What conversion goals does Google have when testing ranking algorithm refinements? Are there consistent goals that all updates are measured against?

Google used to have a metric called "Next Page Rate," which basically measured how often people clicked through to the second page of results; the concept being that if they did, the first page's results weren't great. However, this could be easily gamed or manipulated.

Adding white space, specifically, would reduce the likelihood of people clicking through to page two of search.
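
As described, the metric itself is simple arithmetic: the share of search sessions in which the user clicked through to page two. Here's a minimal sketch of that calculation over made-up session records; the field names and data are invented for illustration and have nothing to do with Google's actual logging.

```python
# Made-up session records; "went_to_page_two" is an invented field name.
sessions = [
    {"query": "fertilizer", "went_to_page_two": False},
    {"query": "best running shoes", "went_to_page_two": True},
    {"query": "weather boise", "went_to_page_two": False},
    {"query": "obscure error code", "went_to_page_two": True},
]

def next_page_rate(sessions):
    """Fraction of sessions in which the searcher clicked to page two of results."""
    if not sessions:
        return 0.0
    return sum(s["went_to_page_two"] for s in sessions) / len(sessions)

print(next_page_rate(sessions))  # 0.5 for this toy data
```

The gaming problem Paul describes is that this number can move for reasons unrelated to result quality, like a layout tweak, without page one getting any better.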

[cut scene in the video]

Note: A similar question is asked later, at the 8:50 mark. Scroll down to question #8 to see the video and response. 

5. One of the first things you do is determine if a query contains an entity. Was that something you did five years ago?

That began the same time as the Knowledge Graph and Knowledge Panels. It's key to those processes.

It wasn't something Google was doing before they launched the Knowledge Graph in 2011.

6. If someone is logged into any Google app, do you differentiate by the information you gather? Can being in Google Now versus Google Chrome impact search?

The real question is whether you're logged in or not.

If you're logged in, Google brings in search personalization. Google wants to provide a consistent search experience for users, based upon your interests as well as what's being shown in your Google Now cards.

As long as you're logged in and haven't turned off search personalization, you will have personalization in your search experience.

You're more likely to have search personalization follow you across your devices than your bookmarks.

7. Does Google deliver different results for the same query at different times during the day? Local maps seem to change with business hours.

Neither Paul nor Gary was sure, although both seemed to think operating hours wouldn't affect a query.

Google would make a point to show closing hours (and operating hours) if the business was closed, but neither seemed to think hours would affect whether or not a map is present in the search results. 

Simply because a business is closed (or near closing) doesn't mean the searcher isn't interested in their physical location.

8. How does Google determine positive or negative changes in experiments with human raters? Is there a winners/losers report by queries?

Google has a summary report for each experiment showing how it performed according to a number of different metrics (which vary depending upon the experiment); the report includes every query involved.

There are classifications of wins and losses. In the previously mentioned example (the fertilizer query that displayed a map), the change was categorized as a win. Humans review the metrics and results; in this case, it was Paul himself who caught the poor result that had been reported as a win.

Paul makes a point to say human raters are great by and large, but do make mistakes. Specifically, human raters get excited about certain features, even if the features don't add value.

9. What's happening with Panda and Penguin?

Paul doesn't have an answer. He does make a point to say that Panda and Penguin are both factored into the scoring and retrieval "box".

Danny redirects the focus to Gary, who has become infamous for repeatedly saying over the last six months that Penguin was near launch. In fact, Gary said Penguin would launch before the new year (January 2016).

Obviously, Gary's predictions haven't panned out.

Gary reports he's given up on giving a timeframe for when Penguin will launch. He knows engineers are actively working on it, but after being wrong three times he's not willing to commit to a date or timeframe.

Paul mentions again the long iteration cycle of launching new ranking signals and algorithms.

10: You talked about a launch that took two years. Was that Penguin?

The two year launch Paul discussed was not Penguin.

The launch was a half-ranking, half-feature launch: their first attempt at spelling corrections that took over half of the SERP, showing results for a misspelling instead of a "did you mean" function.

The first launched iteration of that feature required considerable rewrites (presumably to fit into the algorithm).

11: You mention the expertise of a given author. How are you identifying and tracking the author authority for topics?

Paul can't go into any detail here. However, human raters in experiments are tasked with doing this manually for the pages they see. Google compares its own metrics to what the human raters find, thereby validating (or invalidating) those metrics.

12: Is author authority used as a direct or indirect ranking factor? 

There is no simple answer: Paul can't say yes or no. It's more complicated than the question implies.

13: Should we continue to bother with rel=author?

Gary says there's at least one team that continues to look at using the rel=author tag.

Gary wouldn't recommend creating the tag for new pages, but also wouldn't recommend pulling the rel=author tag from old pages. The tag doesn't hurt anything, and it may be used for something in the future.  

14: How do you avoid quality raters from having a brand familiarity bias?

Before the experiments, human raters are asked to do research, but Paul does acknowledge they often have a bias.

Paul says there are metrics in place which are intended to counteract that bias, and that those metrics are specifically not in the quality signal.

Interestingly, Paul says offhandedly: "I haven't begun to go through all the metrics we actually look at."

The implication, then, is that there are many metrics beyond relevance and quality that are looked at within experiments.

Paul makes a point to say there are many small sites that get good quality ratings, "because the raters do a thorough job. They seem to be good at figuring this out."

15: Is Click Through Rate (CTR) a ranking signal?

Paul confirms CTR is used in experiments, as well as in personalization.

The metric is challenging to use in any circumstance, though.

Gary chimes in to say even with controlled groups it's hard to correctly interpret engagement.

Paul agrees that many experiments have been done that had misleading live metrics. The examples he cites are snippets, as well as the "Next Page Rate" metric referred to in question #4.

Paul also cites a long-running live experiment that swapped results #2 and #4 in the search results. It was randomized and applied to only 0.02% of users. The result? Many more people clicked on the #1 result. Paul explains this:

"They see #1--they don't know whether they like it or not--they look at two, which is really much worse than #2 was, they give up because the result that should have been at #4 and was actually at #2 was so bad they click at #1."

— Paul Haahr at SMX West 2016, explaining a Google live experiment leading to unconventional click metrics.

Another interesting bias Paul cites is that position #10 gets "way more clicks" than positions #8 & #9 together. Why? Because it's the last result before the next page, and no one wants to click to the next page.

Even so, position #10 performs worse than position #7.

The point of all this? CTR is an extremely hard signal to use, often the result of odd biases and unpredictable human behavior.
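
To make the position-bias problem concrete, here's a minimal sketch of computing click-through rate per result position from an entirely made-up impression/click log. Nothing here reflects Google's internal tooling; it just illustrates why raw CTR mixes result quality with position effects like the #1 and #10 biases Paul describes.

```python
from collections import defaultdict

# Made-up impression/click log: (result_position, was_clicked).
log = [
    (1, True), (1, True), (1, False), (1, True),
    (2, False), (2, True), (2, False),
    (7, False), (7, True),
    (10, True), (10, False),
]

def ctr_by_position(log):
    """Clicks divided by impressions for each result position."""
    impressions = defaultdict(int)
    clicks = defaultdict(int)
    for position, clicked in log:
        impressions[position] += 1
        clicks[position] += int(clicked)
    return {pos: clicks[pos] / impressions[pos] for pos in sorted(impressions)}

print(ctr_by_position(log))
# A high CTR at position #1 or #10 may reflect where the result sits on the page,
# not how good it is, which is why CTR alone is a noisy ranking signal.
```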

16: What are you reading right now?

Paul reads "a lot of journalism and very few books." He also listens to a lot of audiobooks on his commute between San Francisco and Mountain View.

Books Paul mentions:

...and that's a wrap!

Questions? Comments? Thoughts? Leave them below!

Cory Collins

Cory Collins is the Business Development Manager at Page One Power and has been with the agency since 2012. Cory is an SEO strategist, writer, runner, and outdoor enthusiast residing in Boise, Idaho, with his wife, daughter, and (too) many pets.