Google doesn't work that way, unfortunately, and tends to hide how search really works from us mere mortals to keep us guessing. But occasionally, out of the mouths of babes and innocents, comes manna from Heaven (if I may indulge in a tag-team of Biblical idioms, writing this as I am over the Easter long weekend).
A couple of days ago was just such an occasion.
Google's Ranking Factors - Where Obfuscation is the State of the Nation
What we do know, and have known for many years, is that the list of what Google uses to rank websites is changing all the time, and expanding exponentially. So much so that even those of us who work in search full-time are constantly getting whiplash from trying to keep up with the speed of the changes flying past us.
Google's Big 3 Ranking Signals - Cutting to the Chase
To which the SEO industry as a whole went, 'Huh? RankWhat?'
Google did this, as Google is wont to do, in a totally off-the-cuff manner. Sergey Brin and Larry Page didn't call a press conference. The WebSpam team didn't thaw out Matt Cutts from his cryogenic slumber to tweet the news to the world. Nor did they so much as send out a press release. Rather, they introduced the world to RankBrain via a Bloomberg Business article titled: Google Turning Its Lucrative Web Search Over to AI Machines, published online on October 26, 2015.
RankBrain - What is it and Why Should I Care?
Still Confused? Don't Be. RankBrain (or RankBrian as I just accidentally typed - now there's a parapraxis if ever there was one!) is a machine-learning artificial intelligence system used by Skynet (err, sorry, Google) to process and interpret complicated or previously misunderstood search queries into a mathematical form it can understand. These misunderstood or 'new' search queries apparently make up about 15% of all search, which is a staggering amount when you think about how many searches are done on Google every day, around the world.
Now, so far as we know, RankBrain is self-learning rather than self-aware. Which is probably for the best, given that unleashing an A.I. program on the internet can potentially have disastrous consequences...
You Said RankBrain Was Part of Hummingbird?
This is where the Hummingbird algorithm comes into play. Hummingbird reads 'between the lines' of the search and answers the question, 'How do I get my small business website on the first page of Google?', by musing, 'Okay, this searcher wants to rank their website on the first page of Google. Who provides that service? Oh, yes. Search Engine Optimisation companies do. So they're looking for an SEO company.' And because the searcher hasn't specified a location, Hummingbird will assume the searcher is looking for an SEO company close to where they are. And Google works that out by pinging the device the search was made on (desktop, tablet or smartphone) and going, 'Ah, you're sitting in a Gloria Jeans cafe in North Sydney searching on your iPhone 6s Plus, so you must be looking for an SEO company close to North Sydney'.
Hummingbird, as it is understood by the SEO community, can now also be viewed as the over-arching name given to Google's core search technology. For years there was no over-arching name; and if anything Google search as a whole was incorrectly thought to be called PageRank. But in mid-2013 Google released Hummingbird. And at time of writing, it is the understanding of the SEO community that Hummingbird is both 'its own algorithm' and the name given to Google search as a whole. For example, Panda, Penguin, Pigeon, Payday, EMD, Pirate, Mobile Friendly and Top Heavy are all uber important algorithms in their own right, but are all ultimately part of Hummingbird. RankBrain similarly falls into this category. It uses artificial intelligence (or A.I.) to embed vast amounts of written language into mathematical entities known as 'Vectors'. These vectors allow Google to interpret words or phrases it isn't familiar with, and extrapolate data-sets. Or in layman's terms, it allows Google to work out what the hell you're asking, even if you've typed the question into Google like a drunk intern at a PR company who's just returned home after attending Fashion Week for the first time and can't spell for toffee because you've drunk 2 bottles of wine and 7 shots of Tequila.
RankBrain - The Technical View
RankBrain is Not Self-Learning After All
Illyes went on to say: “We’ll keep experimenting with and testing new models, and we’ll make updates as we come up with models that do a better job than the existing one. That could be about refreshing the data or developing new neural net architectures.”
Thought Vectors and the Word2vec Connection
Or to simplify it even further:
- CBOW predicts the word given its context.
- Skip-Gram predicts the context given a word.
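To make that difference concrete, here's a minimal sketch in plain Python of the training examples each approach would build from a single sentence. The function names, the window size and the sample sentence are all my own illustrative assumptions, and none of this claims to show what RankBrain does under the hood.

```python
def cbow_pairs(tokens, window=2):
    """Build (context_words, target_word) pairs: the context predicts the word."""
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        pairs.append((left + right, target))
    return pairs


def skipgram_pairs(tokens, window=2):
    """Build (input_word, context_word) pairs: the word predicts its context."""
    pairs = []
    for context, target in cbow_pairs(tokens, window):
        for word in context:
            pairs.append((target, word))
    return pairs


# Hypothetical sample query, split on whitespace.
sentence = "how do i rank my website".split()

print(cbow_pairs(sentence)[2])      # (['how', 'do', 'rank', 'my'], 'i')
print(skipgram_pairs(sentence)[:3]) # [('how', 'do'), ('how', 'i'), ('do', 'how')]
```

Same sentence, same window, but the two approaches slice it in opposite directions: CBOW hands the model the neighbours and asks for the missing word, while Skip-Gram hands it the word and asks for each neighbour in turn.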
Whether RankBrain is actually using Word2vec (or a variation of the methodology) or not, is ultimately by-the-by. All we really need to know is that RankBrain is converting words and phrases into vectors, which it then uses to allow for a deeper understanding of the data it's reviewing. The implication of all this is that if Google is able to convert a typed question into vectors to better understand the inherent meaning behind the question, then Google will be able to significantly improve the search results it shows.
In linguistics, the Distributional Hypothesis states that words that occur in similar contexts tend to have similar meanings. This is a theory born from Statistical Semantics. The hypothesis implies that the more semantically similar two words are, the more distributionally alike they’ll be, and the higher the probability that they occupy similar linguistic contexts.
Or to simplify it even further:
- Distributional Hypothesis states that a word is more often than not characterised by the company it keeps.
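A rough way to see 'the company it keeps' in action is to count which words appear near each other and then compare those counts. The sketch below does that in plain Python; the tiny corpus, the window size and the function names are made up for illustration, and real systems work from vastly more text.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=2):
    """Count, for every word, the words that appear within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            stop = min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy corpus, invented for this example.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases mice",
    "the dog chases cats",
    "google ranks websites",
]

vecs = context_vectors(corpus)
print(cosine(vecs["cat"], vecs["dog"]))     # high: they share most of their neighbours
print(cosine(vecs["cat"], vecs["google"]))  # 0.0: no neighbours in common
```

'Cat' and 'dog' never appear in the same sentence here, yet they score as similar because they hang around with the same words ('the', 'drinks', 'chases'). That is the hypothesis in a nutshell.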
Latent Semantic Indexing
Scalability in Vectors
Cliff Notes look out!
Professor Geoff Hinton and Artificial Neural Networks
(Ah...the plot thickens...)
Neural Net reporter Jack Clark of Bloomberg (who wrote the original Bloomberg Business article) attempted to get clarification on the whole Word2vec connection issue:
"They wouldn’t explicitly confirm that it (RankBrain) is Word2vec, but everything we discussed indicated it’s likely doing something roughly equivalent to Word2vec, and is also doing similar conversions for sequences which is likely connected to Sequence to Sequence learning (PDF: http://papers.nips.cc/paper/5346-sequence-to-sequence-learni…).
It also links to Geoff Hinton’s stuff on Thought Vectors which implicitly involves word2vec."
When pushed on the subject, a Google spokesperson said “It’s related to Word2vec in that it uses ’embeddings’ — looking at phrases in high-dimensional space to learn how they’re related to one another.” Which is about as much clarification on the subject as we could rightfully expect to get.
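To picture what 'looking at phrases in high-dimensional space' might mean, here's a toy sketch in Python. The word vectors are hand-made and three-dimensional purely so they fit on the page (real embeddings are learned from data and have hundreds of dimensions), averaging word vectors into a phrase vector is just one simple assumption among many possible approaches, and the function names are hypothetical. It is not a description of how RankBrain itself works.

```python
import math

# Hand-made word vectors, invented for illustration only.
WORD_VECTORS = {
    "seo":      (0.9, 0.1, 0.0),
    "ranking":  (0.8, 0.2, 0.1),
    "google":   (0.7, 0.3, 0.0),
    "pizza":    (0.0, 0.9, 0.2),
    "delivery": (0.1, 0.8, 0.3),
    "cheap":    (0.2, 0.3, 0.9),
    "flights":  (0.1, 0.1, 0.9),
}


def phrase_vector(phrase):
    """Average the vectors of the words we know; return None if we know none."""
    known = [WORD_VECTORS[w] for w in phrase.lower().split() if w in WORD_VECTORS]
    if not known:
        return None
    return tuple(sum(dim) / len(known) for dim in zip(*known))


def cosine(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def closest_known_query(query, known_queries):
    """Return the known query whose phrase vector sits closest to the new one."""
    query_vec = phrase_vector(query)
    if query_vec is None:
        return None
    candidates = [(k, phrase_vector(k)) for k in known_queries]
    candidates = [(k, v) for k, v in candidates if v is not None]
    if not candidates:
        return None
    return max(candidates, key=lambda kv: cosine(query_vec, kv[1]))[0]


known = ["seo ranking", "pizza delivery", "cheap flights"]
print(closest_known_query("google ranking help", known))  # seo ranking
```

The query 'google ranking help' has never been seen before, and 'help' isn't even in the vocabulary, but its vector still lands right next to 'seo ranking'. That is the basic appeal of embeddings for those never-before-seen searches.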
For those with a mind to dig deeper into RankBrain, there's a Patent Application by Google pending, which includes this:
Further RankBrain Reading
- Distilling the knowledge in a neural network, Hinton, G. E., Vinyals, O., and Dean, J. (PDF)
- Investigating Google RankBrain and Query Term Substitutions, by the always erudite Bill Slawski.
- Deep Learning, Nature, LeCun, Y., Bengio, Y. and Hinton, G. E. (PDF)
The 1st and 2nd Most Important Ranking Factors in Google Search
This we didn't need last week's announcement to answer; because I, and many other SEO types, have known the answer to this question for years. The first most important thing (in the post Panda world) is QUALITY ORIGINAL CONTENT. The second most important thing (in the post Penguin world) is QUALITY LINKS.
You'll note I added the word 'Quality' to both those Google ranking factors? Good. Because if you think all you have to do to rank in Google is flood your website with cheap content (no doubt sourced out of Eastern Europe), and point thousands of low value links at it (no doubt sourced via an el-cheapo company in India), then you're probably the proud owner of a time machine*, and are kickin' it back in 2010 listening to Ke$ha's mega-hit, 'TiK ToK', or else queuing up with the kids to see Toy Story 3 at the cinema for the umpteenth time. I say this as nobody with half a brain in their head would run that strategy today. Because if they did, their website would have the living crap kicked out of it by both Panda and Penguin, and find itself buried on page 35 of their particular Google search. And given that 92% of people don't go past the first page of Google...good luck with getting the phone to ring from there.
So Who Announced the New Number #1 and #2 Ranking Factors?
"I can tell you what they are. It's content. And it's links pointing at your site."
Andrey Lipattsev - The New Voice of Google Search?
And there you were thinking this Google business was complicated!