Let's be honest. It's easy to take technology for granted, ignoring the underlying nuts and bolts. The more widespread the technology, the more likely it is to be taken for granted. If you asked most people how Google actually worked, they'd be tempted to say 'magic.' Magic, indeed.
Google's mystique is threefold:
• The science, technology, economics, manpower, and physical hardware that go into Google are incredibly advanced. Seriously, it’s almost not credible what it takes to power Google. It’s hard to wrap the mind around how Google works and everything that must go into it.
• Search engines have been around since nearly the dawn of the internet. So, while Google isn't exactly prehistoric, people tend to equate it with search itself, when really they're two separate things.
• Google isn't talking. Lately they've begun to communicate more, but much of what makes Google Google is their overwhelming stature and mystique. Allowing a glimpse behind the red curtain might just give too much away, not to mention the amount of web manipulation that would be possible if they released their underlying data.
Many high-level, sophisticated, well-educated technology users have only the foggiest idea of how Google actually works. This is partly due to Google's success; they work so well, so efficiently, and with so little fuss that no one ever wonders how they're working. Very rarely does the common person need to expend any energy thinking about Google.
To help demystify, explain, and hopefully educate you on how Google works, as well as why the SERPs display the results that they do, I refer to this video:
To sum it up in the easiest possible manner, free of all technical jargon, Google works like this:
• Google has software robots (typically referred to as 'spiders') that continually scan (or 'crawl') the web, indexing pages into giant data centers owned by Google. These are housed all across the world, with an emphasis in the United States.
• These data centers are essentially colossal warehouses full of servers, each center costing up to $600 million. They store Google's index of the web.
• A user queries Google using a set phrase or keywords.
• Google takes those keywords, adds synonyms, and sends them to the nearest data center.
• The data center simultaneously sends the query to hundreds of servers, which scan their indexes for keyword matches.
o Fun fact: You’re not actually searching the web when you ‘Google’, you’re searching Google’s index of the web. Since they’re continually refreshing their data, this isn’t really noticeable.
• The servers all return their indexed matches, which Google then filters using their famous algorithm.
o No one really knows exactly what goes into Google's algorithm. However, there have been thousands of theories and a few confirmed facts. The predetermined authority of the website is one factor; the page's relevancy to the keyword is another. Google has stated that over 200 different factors determine the rank each page receives on the results page.
• Google then removes all duplicate pages, promotes a few local websites to the top, and finds ads relevant to the search terms. Final results are displayed, with ads either on the sidebar or in the first two spots, depending upon the popularity of the keyword.
All this takes place in less than a second. Pretty astounding, when you consider everything that has to happen.
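The steps above can be sketched in a few lines of toy Python. This is purely illustrative: the synonym table, the shard contents, and the `search()` function are all invented here, and real Google fans queries out to hundreds of servers in parallel rather than looping over a list.

```python
# Toy sketch of the query pipeline: expand synonyms, fan out to
# index shards, collect matches. All data below is made up.

SYNONYMS = {"car": ["auto", "automobile"]}

def expand_synonyms(keywords):
    """Add known synonyms to the user's keywords."""
    terms = list(keywords)
    for kw in keywords:
        terms.extend(SYNONYMS.get(kw, []))
    return terms

# Each "server" holds a shard of the index: term -> set of page ids.
SHARDS = [
    {"car": {"pages/ford", "pages/tesla"}},
    {"auto": {"pages/tesla", "pages/repair"}},
]

def search(keywords):
    terms = expand_synonyms(keywords)
    matches = set()
    for shard in SHARDS:        # in reality, hundreds of servers in parallel
        for term in terms:
            matches |= shard.get(term, set())
    return sorted(matches)      # ranking and filtering would happen here

print(search(["car"]))          # → ['pages/ford', 'pages/repair', 'pages/tesla']
```

Note that the query for "car" also pulls in pages matched only via the synonym "auto", which is the point of the expansion step.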
Now, that’s the basics behind a Google search. But let’s take a deeper, in-depth look at Google and Google’s processes.
Here's a great infographic that explains the Google process.
The first, and perhaps most important thing to understand about Google, is that Google isn’t searching the live web in response to queries. Instead, they’re scanning their own indexed information contained on their private servers.
This is what allows them to be so quick in their response time. They don’t have to go out and find new information; rather, they simply check their own indexed information.
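A toy version of that indexing step, in Python, might look like the following. This is a minimal sketch, assuming pages have already been fetched; it just shows why a prebuilt inverted index makes lookups near-instant compared to scanning the live web.

```python
# Minimal inverted index: map each word to the set of pages containing it.
# The sample pages and URLs are invented for illustration.

def build_index(pages):
    """pages: dict of url -> text. Returns term -> set of urls."""
    index = {}
    for url, text in pages.items():
        for word in set(text.lower().split()):
            index.setdefault(word, set()).add(url)
    return index

pages = {
    "a.com": "fresh coffee beans",
    "b.com": "coffee brewing guide",
}
idx = build_index(pages)
print(sorted(idx["coffee"]))   # both pages mention "coffee"
```

Once the index exists, answering a query is a dictionary lookup rather than a crawl, which is the whole trick behind Google's response time.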
In order to do this, though, they essentially have to scan, index, and file every page on the internet. Seriously, they try to scan and index every page. To hold this vast and sprawling information, they house giant data centers across the US and the globe. Some interesting facts about these data centers:
• Google's data centers are extremely energy efficient, using between 50 and 100 megawatts of power.
• They're often located near major sources of water, which Google uses for high-tech cooling purposes.
• The data centers are up to 500,000 square feet, with a net worth of up to $600 million each.
• The servers are hosted in standard shipping containers, each holding approximately 1,100 servers.
• Each data center can house 45,000+ servers.
• In 2008, it was estimated they had somewhere around 200,000 total servers. At Google's growth rate, that could easily be in the millions by now.
o Google prefers to over-invest in hardware, assuming that hardware malfunctions, crashes, and glitches can and will happen, but shouldn’t affect productivity.
These data centers and massive arrays of servers are necessary because Google's spiders are continually crawling the web, updating and refreshing the indexes.
Back in 2000, when Google was really starting to take off, it took three full months to crawl the web. Depending on where Google was in its cycle, the indexes could have been months out of date, meaning new, exciting, and fresh content could be hard to find.
Then Google launched an update called “Fritz”.
Update Fritz changed Google's method of index updating. Prior to Fritz, they would take whole data centers down for a night to update their content. This was very trackable, since it affected entire regions and had a huge impact on the results page. It was commonly known as 'the Google Dance'.
Update Fritz changed that: now a part of each data center is taken down and updated each night. In effect, Google updates continually, causing what's known as 'everflux'.
A great metaphor for Google is a public library's card catalog, if the library were the internet and the catalog were extremely up to date, fast, and effective. You query Google's index, and they show you a list of relevant and popular sites where you might find more information.
When you query Google, it immediately communicates with the nearest data center, involving 700 to 1,000 servers. Really, a staggering number of servers. These servers search their indexes, finding matches for the keywords and their synonyms, and compile a list.
Google uses two types of indexes (that we know of):
• One index that stores page titles and link data
• One index that stores on-page content
Google searches both these indexes and compiles a list of any keyword matches.
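Here's a toy illustration of that two-index lookup in Python. The split between the two indexes, and all the data in them, is invented for illustration; it only shows the idea of compiling matches from both sources.

```python
# Hypothetical two-index lookup: one index for titles and link (anchor)
# data, one for on-page content. Sample data is made up.

title_index = {"python": {"docs.python.org"}}
content_index = {"python": {"docs.python.org", "blog.example.com"}}

def compile_matches(term):
    """Union the matches from both indexes into one candidate list."""
    return title_index.get(term, set()) | content_index.get(term, set())

print(sorted(compile_matches("python")))
```

A page can match in either index (or both), so the candidate list is the union of the two; ranking then decides the order.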
That list is then filtered down and ranked using relevancy clues, authority, and 200 other ranking signals.
We know authority is a factor. Google uses its patented PageRank algorithm to quantify authority, which is determined largely by links. Any website linking to another website casts a so-called vote of confidence, thereby granting a portion of its authority to the linked website.
The idea is that relevant, useful, high-quality content will be linked to by other websites; therefore, any website receiving a high number of links must be a great site. Obviously, there's room for abuse here, which explains the need for relevancy along with the 200 other factors.
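The 'votes of confidence' idea can be made concrete with a simplified version of the classic PageRank calculation. This is a textbook sketch, not Google's production code: three fake pages, a standard damping factor of 0.85, and repeated iteration until the ranks settle.

```python
# Simplified PageRank: each page splits its rank among the pages it
# links to, and a damping factor models a surfer who sometimes jumps
# to a random page. Link graph below is invented for illustration.

def pagerank(links, damping=0.85, iterations=50):
    """links: dict of page -> set of pages it links to."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new = {}
        for p in pages:
            # Sum the shares of rank flowing in from pages linking to p.
            incoming = sum(rank[q] / len(links[q]) for q in pages if p in links[q])
            new[p] = (1 - damping) / len(pages) + damping * incoming
        rank = new
    return rank

# Three pages: "a" and "b" both link to "c"; "c" links back to "a".
ranks = pagerank({"a": {"c"}, "b": {"c"}, "c": {"a"}})
# "c" collects the most votes, so it earns the highest rank.
```

In this tiny graph, "c" outranks "a" (one inbound link) and "b" (none), which is exactly the vote-of-confidence intuition described above.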
Relevancy is important because Google is actually a tool. A lot of webmasters and SEO practitioners tend to forget this. That’s why here at Page One Power we always make sure to practice our slogan FTBOM (we pronounce it foot bomb), which means “For the Betterment of Mankind”. The idea is to only create the best content possible, which will better mankind in some tangible way.
So, no matter how great a site is, or how much authority it has, if it isn't relevant to the search term then the user will be dissatisfied. Relevancy is every bit as important as domain authority; perhaps even more so. One way Google has admitted to sorting for relevancy is keyword analysis. As Matt Cutts explains, keyword relevancy can be gauged by:
• Keyword proximity – how close are the keywords to each other?
• Keyword frequency – although it shouldn't be spammed, the more often the keyword appears, the higher the likelihood the page is relevant.
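These two signals are easy to demonstrate with a toy scoring function in Python. The formula and weights here are invented purely for illustration; real relevancy scoring is vastly more sophisticated.

```python
# Toy relevancy score combining the two signals above:
# frequency (how often the keywords appear) and proximity
# (how close together they appear). The formula is made up.

def relevancy(text, kw1, kw2):
    words = text.lower().split()
    freq = words.count(kw1) + words.count(kw2)
    pos1 = [i for i, w in enumerate(words) if w == kw1]
    pos2 = [i for i, w in enumerate(words) if w == kw2]
    if pos1 and pos2:
        proximity = min(abs(i - j) for i in pos1 for j in pos2)
    else:
        proximity = len(words)      # keywords never appear together
    return freq / (1 + proximity)   # closer and more frequent = higher score

close = relevancy("best coffee grinder reviews", "coffee", "grinder")
far = relevancy("coffee is great and so is my grinder", "coffee", "grinder")
# Same keywords in both texts, but adjacency scores higher: close > far.
```

Both sample texts contain both keywords the same number of times; only proximity differs, so the adjacent pair wins.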
What about those other 200 signals? Well, you can be sure Google isn't talking. That's their so-called "secret sauce". And odds are it's an ever-changing recipe.
For example, Google search quality engineer Patrick Riley said:
“On most Google queries, you’re actually in multiple control or experimental groups simultaneously… Essentially, all the queries are involved in some test.”
Google is a quickly evolving, nearly mythical company. To think of the sheer amount of traffic they receive, and their ability not only to cope with it but to scan thousands of databases and filter that data into a list of highly relevant, authoritative websites in less than a second, is nothing short of magic. Well, magic and a whole lot of technology, science, economics, manpower, and creative energy.
Hopefully you've learned a little more about how Google works, and can better appreciate the nuances of its operation.
[author] [author_image timthumb='on']http://pageonepower.com/wp-content/uploads/2012/10/photo-11.jpg[/author_image] [author_info] Cory Collins is a head writer, web content developer and team leader at Boise’s Page One Power. Collins is passionate about SEO, link building, and other white hat practices, and writes about it for Page One Power’s Link Building News and countless online publications. Connect with Collins on Twitter or Google+.[/author_info] [/author]