Most people have no idea how Google works.
It’s easy to take technology for granted, ignoring the underlying nuts and bolts. The more widespread the technology, the more likely it is to be taken for granted. If you asked most people how Google actually worked, they’d be tempted to say ‘magic.’
Google’s mystique is threefold:
- The science, technology, economics, manpower, and physical hardware that go into Google are incredibly advanced. Seriously, it’s almost beyond belief what it takes to power Google. It’s hard to wrap your mind around how Google works and everything that must go into it.
- Search engines have been around since nearly the dawn of the internet. So, while not exactly prehistoric, people tend to equate search engines with the internet itself, when really they’re two separate entities.
- Google isn’t talking. Lately they’ve begun to communicate more, but a large part of what makes Google Google is their overwhelming stature and mystique. Allowing us a glimpse behind the red curtain might just give too much away, not to mention the amount of web manipulation that would be possible if they released their underlying data.
Many high-level, sophisticated, well-educated technology users have only the foggiest idea of how Google actually works. This is partly due to Google’s success; they work so well, so efficiently, and with so little fuss that no one ever wonders how they’re working. Very rarely does the average person need to expend any energy thinking about Google.
The Nuts And Bolts
To help demystify, explain, and hopefully educate you on how Google works, as well as why the SERPs display the results that they do, Google made this video:
To sum it up, free of all technical jargon:
- Google has software robots (typically referred to as ‘spiders’) that continually scan (or ‘crawl’) the web, indexing pages into giant data centers owned by Google. These are housed all across the world, with an emphasis on the United States.
- These data centers are essentially colossal warehouses full of servers, each center costing up to $600 million. They store Google’s index of the web.
- A user queries Google using a set phrase or keywords.
- Google takes those keywords, adds synonyms, and sends them to the nearest data center.
- The data center simultaneously sends the query to hundreds of servers, each scanning its index for keyword matches.
- Fun fact: You’re not actually searching the web when you ‘Google’ something; you’re searching Google’s index of the web. Since they’re continually refreshing their data, this isn’t really noticeable.
- The servers all return their indexed matches, which Google then filters using their famous algorithm.
- No one really knows what exactly goes into Google’s algorithm, although we do know the algorithm was originally based upon links.
- Google then removes all duplicate pages, promotes a few local websites to the top, and finds ads relevant to the search terms. Final results are displayed, with ads either on the sidebar or in the first two spots, depending upon the popularity of the keyword.
All this takes place in less than a second. Pretty astounding, when you consider everything that has to happen.
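To make that flow concrete, here’s a toy sketch in Python. None of this is Google’s actual code; the shard contents, function names, and the crude hit-count ranking are all invented for illustration. It just shows the scatter/gather pattern described above: fan the query out to many index shards, collect the matches, then merge and rank them.

```python
# Toy scatter/gather search -- an illustration, not Google's code.
from concurrent.futures import ThreadPoolExecutor

# Each "server" holds a shard of the index: {keyword: {page: hit_count}}.
SHARDS = [
    {"python": {"page-a.com": 3}, "spider": {"page-b.com": 1}},
    {"python": {"page-c.com": 5}},
    {"crawl": {"page-a.com": 2}, "python": {"page-d.com": 1}},
]

def search_shard(shard, keywords):
    """Scan one shard's index for any of the keywords."""
    hits = {}
    for word in keywords:
        for page, count in shard.get(word, {}).items():
            hits[page] = hits.get(page, 0) + count
    return hits

def google_like_search(query):
    keywords = query.lower().split()   # (real Google also adds synonyms)
    merged = {}
    # Fan the query out to every shard at once...
    with ThreadPoolExecutor() as pool:
        for hits in pool.map(lambda s: search_shard(s, keywords), SHARDS):
            for page, count in hits.items():   # ...then merge the results
                merged[page] = merged.get(page, 0) + count
    # Rank by the only signal this toy has: raw hit count.
    return sorted(merged, key=merged.get, reverse=True)

print(google_like_search("python crawl"))
# ['page-a.com', 'page-c.com', 'page-d.com']
```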
Now, that’s the basics behind a Google search. But let’s take a deeper look at Google and Google’s processes.
Google, Data Centers and Indexes
The first, and perhaps most important, thing to understand about Google is that Google isn’t searching the live web in response to queries. Instead, they’re scanning their own indexed information contained on their private servers.
This is what allows them to be so quick in their response time. They don’t have to go out and find new information; rather, they simply check their own indexed information.
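Here’s a minimal sketch of why answering from an index is so fast, assuming a toy inverted index (the URLs and data structures are invented, not Google’s). Pages are crawled once and inverted into a word-to-pages map, so serving a query becomes a dictionary lookup instead of a fresh scan of the web:

```python
# Building and querying a toy inverted index -- an assumption-laden
# illustration of the principle, not Google's implementation.
crawled_pages = {
    "example.com/a": "how google crawls the web",
    "example.com/b": "the web is big",
}

index = {}
for url, text in crawled_pages.items():
    for word in text.split():
        index.setdefault(word, set()).add(url)   # invert: word -> pages

# Query time: no crawling, just a lookup in the prebuilt index.
print(index.get("web", set()))   # {'example.com/a', 'example.com/b'}
```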
To be able to do this, though, they essentially have to scan, index, and file every page on the internet. Seriously, they try to scan and index every page. To hold this vast and sprawling information, they house giant data centers all across the US and the globe. Some interesting facts about these data centers:
- Google’s data centers are extremely energy efficient, each using between 50 and 100 megawatts of power.
- They’re often located near major sources of water, which Google uses for high-tech cooling purposes.
- The data centers are up to 500,000 square feet, worth up to $600 million each.
- The servers are housed in standard shipping containers, each holding approximately 1,100 servers.
- Each data center can house 45,000+ servers.
- In 2008 it was estimated they had somewhere around 200,000 total servers. At Google’s growth rate, that could easily be in the millions by now.
- Google prefers to over-invest in hardware, assuming that hardware malfunctions, crashes, and glitches can and will happen, but shouldn’t affect productivity.
Back in early 2000, when Google was really starting to take off, it took Google three full months to crawl the web. That means, depending on where Google was in their cycle, the indexes could have been months out of date, and new, exciting, fresh content could be hard to find.
Then Google launched an update called “Fritz”.
Update Fritz changed Google’s method of index updating. Prior to Fritz, they would take whole data centers down for a night to update their content. This was very trackable, since it affected entire regions and had a huge impact on the results pages. It was commonly known as ‘the Google Dance’.
Update Fritz changed that: now a part of each data center is taken down and updated each night. In effect, Google updates continually, causing an ‘ever-flux’.
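As a rough illustration of the idea (the shard names and simple rotation here are assumptions, not a description of Google’s infrastructure), a rolling refresh might look like this:

```python
# Rolling index refresh: update one slice per night while the rest
# keep serving -- a toy model of the post-Fritz "ever-flux" idea.
from itertools import cycle

shards = ["shard-0", "shard-1", "shard-2", "shard-3"]

def nightly_updates(nights):
    rotation = cycle(shards)
    for night in range(nights):
        updating = next(rotation)
        serving = [s for s in shards if s != updating]
        print(f"night {night}: refreshing {updating}; still serving {serving}")

nightly_updates(4)
```

The design point is availability: because only one slice is ever offline, users never see the whole index go stale or dark at once.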
A great metaphor for Google is a public library’s old referencing system: the library is the internet, and the referencing system is extremely up to date, fast, and effective. You query Google’s index, and they show you a list of relevant and popular sites where you might find more information.
After you query Google, the nearest data center is contacted, involving 700-1,000 servers. Really, a staggering number of servers. These servers search their indexes, finding matches for the keywords and their synonyms, and compile a list.
Google uses two types of indexes (that we know of):
- One index that stores page titles and link data
- One index that stores on-page content
Google searches both these indexes and compiles a list of any keyword matches.
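A small sketch of that two-index lookup follows. The split between a title/link index and an on-page content index comes from the description above; the data structures and numbers are invented for illustration:

```python
# Searching two separate indexes and compiling one match list.
title_link_index = {"widgets": {"shop.example": 2}}   # titles + link data
content_index = {"widgets": {"shop.example": 7, "blog.example": 4}}

def lookup(keyword):
    matches = {}
    for index in (title_link_index, content_index):   # search both indexes
        for page, hits in index.get(keyword, {}).items():
            matches[page] = matches.get(page, 0) + hits   # compile one list
    return matches

print(lookup("widgets"))   # {'shop.example': 9, 'blog.example': 4}
```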
That list is then filtered down and ranked using relevancy clues, authority, and 200 other ranking signals.
We know authority is a factor. Google uses their patented PageRank tool to quantify authority, which is determined largely by links. Any website linking to another website casts a so-called vote of confidence, thereby granting a portion of authority to the linked website.
The idea behind this is that relevant, useful, high-quality content will be linked to by other websites. Therefore, any website receiving a high number of links must be a great site. Obviously, there’s room for abuse here, which explains the need for relevancy along with the 200 other factors.
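The core mechanism here is public, from the original PageRank paper, so we can sketch a simplified version: each page splits its rank among the pages it links to, with the paper’s damping factor of 0.85. Google’s production signal is far more elaborate than this, but the ‘links as votes’ engine looks roughly like:

```python
# Simplified PageRank power iteration, after the original paper.
links = {                       # toy web: who links to whom
    "a.com": ["b.com", "c.com"],
    "b.com": ["c.com"],
    "c.com": ["a.com"],
}

def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1 - damping) / len(pages) for p in pages}
        for page, outlinks in links.items():
            share = rank[page] / len(outlinks)   # each link passes a share
            for target in outlinks:
                new_rank[target] += damping * share
        rank = new_rank
    return rank

print(pagerank(links))   # c.com ranks highest: it collects the most "votes"
```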
Relevancy is important because Google is actually a tool. A lot of webmasters and SEO practitioners tend to forget this. That’s why here at Page One Power we always make sure to practice our slogan FTBOM (we pronounce it foot bomb), which means “For the Betterment of Mankind”. The idea is to only create the best content possible, which will better mankind in some tangible way.
So, no matter how great a site is, or how much authority it has, if it isn’t relevant to the search term then the user will be dissatisfied. Relevancy is every bit as important as domain authority; perhaps even more so. One way Google has admitted to sorting for relevancy is by keyword searching. As Matt Cutts explains, keyword relevancy can be found by:
- Keyword proximity – how close are the two keywords to each other?
- Keyword frequency – although it shouldn’t be spammed, obviously the more often the keyword appears, the higher the likelihood that the page is relevant (a toy scoring sketch follows this list).
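A toy scorer combining those two signals might look like the following; this is purely an assumption for illustration, not Google’s algorithm:

```python
# Score a page on keyword frequency plus a proximity bonus.
def relevancy(text, kw1, kw2):
    words = text.lower().split()
    freq = words.count(kw1) + words.count(kw2)   # frequency signal
    pos1 = [i for i, w in enumerate(words) if w == kw1]
    pos2 = [i for i, w in enumerate(words) if w == kw2]
    if not pos1 or not pos2:
        return freq   # one keyword missing: no proximity bonus
    gap = min(abs(i - j) for i in pos1 for j in pos2)   # closest pair
    return freq + 1.0 / gap   # the nearer the keywords, the bigger the bonus

print(relevancy("link building guide: how link building works",
                "link", "building"))   # 5.0
```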
What about those other 200 signals? Well, you can be sure Google isn’t talking. That is their so-called “secret sauce”. And odds are it’s an ever-changing recipe.
For example, Google search quality engineer Patrick Riley said:
“On most Google queries, you’re actually in multiple control or experimental groups simultaneously… Essentially, all the queries are involved in some test.”
Google is a quickly evolving, nearly mythical company. Consider the sheer amount of traffic they receive, and their ability not only to cope with it, but to scan thousands of databases and filter that data into a list of highly relevant, authoritative websites in less than a second. It’s nothing short of magic. Well, magic and a whole lot of technology, science, economics, manpower, and creative energy.