
Good Google - Writing For The Most
Powerful Robot In The World
by Jim Hedger
Google "...is big. Really Big.
You just won't believe how vastly
hugely mind-bogglingly big it is."
(excerpt from The Hitchhiker's Guide
to the Galaxy)
Google is the most powerful information
resource humans have ever constructed.
The power of any major search tool
boggles the mind but considering the
vastness of Google's complex simplicity
can truly hurt one's brain. With over
8 billion references in its rapidly
growing, organically generated index,
Google sets the standards other search
engines follow. Benefiting from a
three-year reign as the undisputed
leader of search, Google has had a very
good year
and looks poised to make 2005 an even
better year.
In 2004, Google introduced more new and
improved applications for its users than
any other tech company, posted one of
the most successful IPOs in business
history in a most unorthodox
Dutch-auction format, and met or
exceeded every challenge its rivals
threw at it.
Google is no longer just a search
engine; it is an advertising machine.
Drawing about 90% of its revenues from
paid advertising and contextual
ad-delivery, Google has had two major
focuses this quarter. The first is
increasing the number of places
paid-advertising might show up. The
second is to develop new products and
features that will retain current user
loyalty and win new users from the other
search firms. Both initiatives rely
heavily on Google's reputation for
delivering fast, free and relevant
search results. Google has the world's
largest database of indexed websites and
it acquires site information through its
spider GoogleBot.
GoogleBot is probably the best-known
spider working the web today.
It is also likely among the most
analyzed applications ever written. On
one level, GoogleBot is quite simple and
can be depended on to act in a very
specific manner. GoogleBot lives to
follow links. GoogleBot will often chase
down a link-path until it can no longer
work its way deeper into a site. It will
also work its way through any site
linked to from any other site. Google
finds the majority of new sites in its
index by following links from
established sites. If a link exists,
Google will (A) find it, (B) follow it,
(C) record every bit of information it
can possibly record, and (D) weigh that
information against a fairly rigid
algorithm to determine the perceived
topic or theme of a site for future
reference. If a site in Google's index
is modified, Google will re-spider the
site as quickly as it possibly can.
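To make that find, follow and record
loop concrete, here is a minimal Python
sketch of a link-following spider. It is
an illustration only, not GoogleBot's
actual code: the TOY_WEB dictionary, the
example addresses and the crawl()
function are all invented for the
example, and no real network requests
are made.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical toy "web": address -> HTML source. No network access is used.
TOY_WEB = {
    "a.example/": '<a href="a.example/about">About</a> '
                  '<a href="b.example/">Partner</a>',
    "a.example/about": '<a href="a.example/">Home</a>',
    "b.example/": '<a href="c.example/">New site</a>',
    "c.example/": "<p>No outgoing links here.</p>",
}


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(start, web):
    """Breadth-first crawl: find links, follow them, record each page once."""
    seen = {start}
    queue = deque([start])
    recorded = {}  # address -> list of outgoing links found on that page
    while queue:
        url = queue.popleft()
        html = web.get(url)
        if html is None:  # dead link: nothing to record
            continue
        parser = LinkExtractor()
        parser.feed(html)
        recorded[url] = parser.links
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return recorded


index = crawl("a.example/", TOY_WEB)
print(sorted(index))
```

Starting from a single established page,
the sketch reaches c.example/ only
because b.example/ links to it, which is
how a new site gets discovered.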
GoogleBot's mission is to create a
snap-shot of the World Wide Web and
store it across Google's network of data
centers around the world. When you
reference information from Google, the
results you see reflect Google's most
recent snap-shot of the web. Parts of
that snap-shot might be hours or even
weeks old but overall the index is
updating itself every minute of every
day, 24/7. The fastest way to see
exactly what Google views as the most
recent version of your site is to click
on the "Cached" link generally found
below the main link-reference Google
displays for
your site.
How GoogleBot behaves as it acquires
sites is one thing. What Google does
with the information its bot gathers is
another thing. Google's method of
ranking websites is extremely (and
increasingly) complex. To understand how
Google works today, a brief (and
oversimplified) explanation of the
principle
of PageRank is in order.
Google was originally developed as a
means of finding information in research
documents at Stanford University where
its inventors Larry Page and Sergey Brin
met as grad students. PageRank was
developed as the basic sorting algorithm
for their search tool (then known as
Backrub) and was based on a very simple
concept: trust.
Page and Brin understood that documents
on the Internet could be linked
together. They speculated that if
someone took the time to code a link (by
hand in those days) to another document
there was likely a relevance between the
two documents. Why else would one
researcher link to another researcher's
work? Simply put, the more incoming
links a particular document has, the
better it would rank when sorted by
PageRank. Given the environment in which
it was developed, Google's genesis
proved to be the perfect tool for
intelligent users. Transferring that
simplicity from a dorm room at Stanford
to practically every living room and
office space on Earth has been a great
challenge for Google's engineers. While
it is still somewhat based on the
original, "democratic" nature of
PageRank, Google's sorting algorithm has
become infinitely more complicated.
Google continues to weigh the number of
links directed towards a site as
positive indicators that there is
relevant information to be found there.
Since links are the veins and arteries
of the web, links continue to be the
most important factor influencing
Google's perception of the relevance of
a website. Because the Google index has
grown so rapidly over the past six
years, and search engine marketers have
learned how to use Google's behaviours
to influence rankings, Google now weighs
several other factors when considering
the relevance of a site, but the core of
the algorithm remains rooted in
PageRank.
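The "more incoming links, better rank"
idea can be sketched in a few lines of
Python. This is the simplified textbook
form of PageRank, using the 0.85 damping
factor suggested in Page and Brin's
original paper. It is not Google's
production algorithm, and the toy_links
graph and its page names are invented
for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of rank.

    links maps each page to the list of pages it links to.
    Every linked-to page must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal rank everywhere
    for _ in range(iterations):
        # Every page keeps a small baseline share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page splits its "vote" evenly among the pages it links to.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no links spreads its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: everyone links to "hub", and "c" also links to "a".
toy_links = {
    "hub": ["a", "b", "c"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "a"],
}
ranks = pagerank(toy_links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph "hub" collects the
most incoming links and ends up with the
highest score, while "a" outranks "b"
because it has one extra link pointing
at it.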
Not all Links
are Created Equal
Back in the good old days, seven or
eight years ago at Stanford, one link
could represent one positive vote. As
marketers learned to manipulate links,
Google learned to apply different
standards and measures when looking at
those links and the content of sites in
its index. Today, Google considers
different links in different ways. As a
matter of interest, our recent studies
show that Google displays fewer
back-links for sites than any other
search engine, leading us to conclude
that Google has become much stricter
about how it views and values incoming
links.
Google looks at a number of factors when
determining the value of a link. Where
the link originates from is as important
as where the link is directed in
Google's eyes. Google, like its rivals,
is trying to find relationships between
documents aside from obvious keywords.
Google has the ability to fundamentally
understand documents in its index and
determine the topic, theme or context of
those documents. This is an important
measure as Google is becoming
increasingly strict about
link-relevance. To receive a highly
positive response from Google, the pages
or sites linked together must somehow
relate to each other in topic as well as
by sharing similar keywords. An
excellent example would be in regional
tourism.
A local tourism bureau will almost
certainly have a website. That site will
link to the sites of member-clients in
its region. Each of those sites
represents a business dependent on
regional tourism, thus establishing
relevance between the sites. The tourism
bureau becomes the "hub" from which
Google follows links to other, topically
related websites. In this way, the Hub
site becomes a highly positive
link-reference in Google's eyes.
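Google's actual measure of topical
relevance is not public, so the
following Python sketch is only a way to
picture the idea. It assumes each page
can be reduced to a set of keywords and
scores a link by the overlap (Jaccard
similarity) between the two sets. The
keyword sets and the keyword_overlap()
function are hypothetical.

```python
def keyword_overlap(keywords_a, keywords_b):
    """Jaccard similarity between two keyword sets, from 0.0 to 1.0."""
    a, b = set(keywords_a), set(keywords_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Invented keyword sets for a tourism bureau "hub" and two sites it might link to.
bureau = {"victoria", "tourism", "hotels", "whale", "watching"}
whale_tours = {"victoria", "whale", "watching", "tours"}
car_parts = {"brake", "pads", "discount", "parts"}

relevant = keyword_overlap(bureau, whale_tours)
unrelated = keyword_overlap(bureau, car_parts)
print("bureau -> whale tours:", relevant)
print("bureau -> car parts:", unrelated)
```

Under this toy measure the link from the
bureau to the whale watching operator
scores well, while the same link pointed
at an unrelated parts dealer scores
nothing.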
The very best links, in Google's eyes,
come from "authority sites". An
authority site is one that is very well
established and respected, such as
mainstream news sites (CNN, TIME,
NYTimes, etc.), other search
directories, industry leaders
(Macromedia, HP, Pitney Bowes, Nike,
etc.), and other highly credible
sources such as the regional tourism
bureau mentioned above. While a website
doesn't necessarily have to represent a
large corporation to be considered an
authority site, the sheer number of
pages and references, combined with the
high visitor numbers generally
associated with large corporate sites,
helps. Some personal blogs, smaller
companies and alternative news sources
have also enjoyed "authority" status.
This status
is, in some ways, flexible and
situational. A link from the tourism
bureau mentioned above will not tend to
help a business outside of its region
unless a tangible relevancy factor is
somehow introduced.
In practical terms, the "authority"
status of a website is irrelevant for
SEOs as the vast majority of sites in
Google's index are just regular, run of
the mill websites run by regular, run of
the mill folks like us. Small
businesses, researchers, governments,
NGOs, musicians, artists, families,
hobbyists and others write websites to
offer the world access to their
information. Nearly all of these sites
contain links of some sort or another
and the vast majority of those links
lead to topically relevant documents.
Although these are not "authority"
sites, Google still considers their
links extremely important when sorting
and ranking
sites. Again, the stress is on topical
relevancy as Google places enormous
value in good, solid links.
Google does not
live on links alone
Much has been written in this article and
thousands of others about Google and
links. If links were the only factor
Google looks at, the SEO business would
not exist and Google's index would be as
off-kilter as a Batman set. As stated in
previous paragraphs, Google has the
ability to read sites and understand
what it is reading. Google is able to
reference a world of information when
figuring out the context of text used in
Titles, Meta Tags, Body Text and Anchor
Links. Since we know that Google is
actually reading and comprehending
content, we need to place specific
content in places we know GoogleBot
likes to look for it. Writing and
placing this information is where SEO
becomes an artful science that stems
from simple common sense. Think about
what Google knows about your website
before it even visits.
Google finds your site by following
links. Therefore it "assumes" your site
is topically relevant to the site that
linked to it.
Google knows the address of the site,
the URL. It also knows what anchor text
the original linking site used when
phrasing the link to your website.
Keyword enrichment of both elements is
beneficial with Google. In other words,
if you can, use a target keyword phrase
in the URL of your site, and request
that others linking to your site use
your target keyword phrases as the
anchor text of links directed to your
site.
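As a rough illustration of those two
signals, the Python sketch below pulls
the address and anchor text out of the
links on a hypothetical linking page and
checks whether a target phrase appears
in each. The page markup, the
example.com addresses and the phrase are
invented for the example, and nothing
here shows how Google actually weighs
these elements.

```python
from html.parser import HTMLParser


class AnchorTextCollector(HTMLParser):
    """Records (href, anchor text) for every <a href="..."> element."""

    def __init__(self):
        super().__init__()
        self.anchors = []
        self._href = None  # href of the link currently being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.anchors.append((self._href, "".join(self._text).strip()))
            self._href = None


# Invented markup standing in for a page that links to your site.
linking_page = (
    '<p>Visit <a href="http://www.example.com/whale-watching/">'
    'Victoria whale watching tours</a> or '
    '<a href="http://www.example.com/">click here</a>.</p>'
)
collector = AnchorTextCollector()
collector.feed(linking_page)
collector.close()

phrase = "whale watching"
for href, text in collector.anchors:
    in_anchor = phrase in text.lower()
    in_url = phrase.replace(" ", "-") in href.lower()
    print(href, "| phrase in anchor:", in_anchor, "| phrase in URL:", in_url)
```

The first link carries the phrase in
both the address and the anchor text.
The second, a bare "click here", carries
it in neither.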
Once Google hits your site, it learns a
lot more very quickly. It sees the
title, tags, text and links, and records
these elements as it moves through the
site. These are the basic elements SEOs
examine and modify when working on your
site.
The first thing GoogleBot sees is the
title of the site. Keyword enriched
titles are very useful but webmasters
are cautioned to be very conservative in
the number of keywords or phrases they
place in the title of a page. We
generally use two or three keyword
phrases when writing titles. Page titles
should be page specific with keywords
focused on the topic of the page. The
second (or third) keyword set in the
title is used to provide an overall
context to the site. For example,
Remember, links provide the pathway for
GoogleBot and other spiders. A final
element that should be included on all
pages is a text-based sitemap that links
to all pages in the site and is linked
to from the Home or INDEX page.
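A text-based sitemap is simple enough to
generate by script. The Python sketch
below builds a plain HTML list of links
from a list of pages. The
build_sitemap() function, the file names
and the page titles are made up for the
example, so adapt them to your own site.

```python
from html import escape


def build_sitemap(pages):
    """Return a plain HTML list linking to every (url, title) pair."""
    items = [
        '<li><a href="{}">{}</a></li>'.format(
            escape(url, quote=True), escape(title)
        )
        for url, title in pages
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


# Hypothetical pages of a small tourism site.
site_pages = [
    ("/index.html", "Home"),
    ("/tours.html", "Whale Watching Tours"),
    ("/rates.html", "Rates & Reservations"),
]
sitemap_html = build_sitemap(site_pages)
print(sitemap_html)
```

Save the output as its own page and link
to it from the home page so a spider can
reach every page from one place.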
In a nutshell, that's how GoogleBot
examines a site. Here is a quick rundown
of which elements GoogleBot is looking
for:
● Relevant Incoming Links
● Good URLs that are not too spammy
● Easy to follow link paths including a
  sitemap
● Keyword enriched titles
● Well written Description Meta Tag
● Well written Keywords Meta Tag (less
  important than Description)
● Topically focused Body Text
● Keyword Enriched Anchor Text
● NO SPAM
● Relevance, relevance, relevance
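Several items on that list can be
checked mechanically. The Python sketch
below is a hypothetical audit helper,
not a tool Google provides. It reports
whether a page has a title, a
Description Meta Tag, a Keywords Meta
Tag and at least one link. The sample
page is invented, and the script cannot
judge relevance or writing quality.

```python
from html.parser import HTMLParser


class PageAudit(HTMLParser):
    """Collects the title, named meta tags and link count of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.link_count = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""
        elif tag == "a" and attrs.get("href"):
            self.link_count += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def audit(html):
    """Return a dict of simple presence checks for one page."""
    parser = PageAudit()
    parser.feed(html)
    parser.close()
    return {
        "has_title": bool(parser.title.strip()),
        "has_description": bool(parser.meta.get("description", "").strip()),
        "has_keywords": bool(parser.meta.get("keywords", "").strip()),
        "has_links": parser.link_count > 0,
    }


# Invented sample page: it has a title, a description and a sitemap link,
# but no Keywords Meta Tag.
sample = """<html><head>
<title>Whale Watching Tours - Victoria BC Tourism</title>
<meta name="description" content="Daily whale watching tours from Victoria.">
</head><body><a href="/sitemap.html">Sitemap</a></body></html>"""

report = audit(sample)
for check, passed in report.items():
    print(check, passed)
```

For the sample page every check passes
except has_keywords, which flags the
missing Keywords Meta Tag.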
==================
About the author
Jim Hedger is a
writer, speaker and search engine
marketing expert based in Victoria,
BC. Jim works with a limited group
of clients and provides consultancy
services to
StepForth Search Engine Placement.
He has worked as an SEO for over 5
years and welcomes the opportunity
to share his experience through
interviews, articles and speaking
engagements. Hedger can be reached
at jim.hedger@gmail.com.