Getting into Google

Chapter 2
Getting into Google
In This Chapter
▶ Knowing the three steps to visibility in Google
▶ Meeting Google’s pet spider and understanding how the crawl works
▶ Keeping Google out of your site
This chapter is about getting your site to appear on Google search pages.
I’m not talking about the Google Directory, submission to which is a
simple matter also covered here. The challenge is to appear in search results
based on keywords related to your site. Chapters 3 and 4 focus on becoming
more prominently placed on those search results pages; this chapter is more
elementary but no less crucial for new sites.
The Three-Step Process
Many of the suggestions, tactics, and concepts discussed in this chapter and
Chapters 3 and 4 apply to both getting into Google (the first step) and
improving a site’s status in Google (an ongoing project). Understanding the Google
crawl (this chapter), networking your site (Chapter 3), and site optimization
(Chapter 4) are important topics for newcomers and veterans alike. There’s
no proper order in which to tackle these subjects — they are presented here
in a certain order, but the topics in these three chapters add up to a single
process that maximizes your site’s exposure in Google.
Here is a summary of the ground covered in these three chapters:
✓ Getting into Google (Chapter 2). Understand how the Google spider
crawls the Web and what the spider looks at. Judge whether to submit a
new page manually to the index or let the spider find it. Find out how to
keep material out of Google.
✓ Networking your site (Chapter 3). Develop a matrix of incoming links,
which is crucial for building a higher status in Google and effective for
getting into the index at the start.
06_571435 ch02.qxd 5/21/04 11:27 PM Page 21
✓ Optimizing your site for Google (Chapter 4). Create content, optimize
your page’s meta tags, and introduce keywords as the fundamental
building blocks of a highly ranked site. These are golden topics for the
serious Webmaster at all stages of business development, from conception
to customer interaction.
First things first. New sites must get into Google and then work to raise their
profiles. Getting into Google really means getting into the Google index, which
is a database of Web content. Google builds the index by crawling through
the Web collecting pages. When a user searches for keywords, Google doesn’t
actually search the Web — it searches its index.
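To make the index idea concrete, here is a minimal sketch in Python of a search that consults only a prebuilt index and never touches the live pages. The page texts, the URLs, and the function names build_index and search are made up for illustration; Google's real index is vastly more elaborate and its design is not public.

```python
from collections import defaultdict

# Hypothetical miniature "crawl": URL -> page text collected earlier.
CRAWLED_PAGES = {
    "example.com/": "handmade oak furniture and garden benches",
    "example.com/benches": "garden benches built from reclaimed oak",
    "example.org/": "guide to caring for oak trees",
}


def build_index(pages):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return URLs containing every query word, using only the index."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


index = build_index(CRAWLED_PAGES)
print(sorted(search(index, "oak benches")))
```

A page published after build_index runs cannot turn up in any search until the index is rebuilt, which is why getting crawled matters so much.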
If your site already appears in Google search results, you might feel tempted
to skip this chapter and head straight for Chapter 3. However, the next two
sections contain useful information about Google’s behavior and ways for
both new and existing sites to leverage its quirks.
Meet Google’s Pet Spider
All search engines operate in the same basic way: they crawl the Web with
automatic software robots called spiders or crawlers, which create searchable
indexes of Web content. Every engine allows visitors to search its index for
keywords and groups of keywords. Search results come in a variety of list for-
mats, but most display a bit of information about each Web page in the list
and a link to that page.
Each engine’s index is unique, thanks to the programming of its spider. The
main element of that programming is the engine’s algorithm, which ranks
pages in an index. This ranking determines the order in which search results
are presented.

Google’s central technology asset is its algorithm — the complex ranking for-
mula that gives people good search results and often seems to be reading
people’s minds when they Google something. The results of Google’s algo-
rithm are summarized in a single ranking statistic called PageRank. Google is
secretive about the software formula from which PageRank is derived, but
the company does promote the importance of PageRank, and offers
Webmasters broad hints for improving a site’s PageRank. Google displays a
general approximation of any page’s rank (on a 0-to-10 scale) in the Google
Toolbar, which is shown in Figure 2-1. Although the exact formulation of
PageRank is a well-protected secret, its basic ingredients are well-known (and
discussed in Chapter 3).
Part I: Meeting the Other Side of Google
The Google PageRank is like a carrot dangled before the ambitious gaze of
Webmasters, who devote considerable energy to inching their pages up to a
higher PageRank, thereby moving them up the search results list. Chapter 3 is
devoted to improving your site’s ranking and position on search results pages.
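Google's production formula is secret, but the basic idea its founders published can be sketched in a few lines of Python. What follows is a simplified illustration and not Google's actual ranking code: the four-page link graph is hypothetical, the function name pagerank is my own, and the damping value of 0.85 is the one commonly quoted from the published description. The raw scores here are fractions that add up to 1, not the 0-to-10 figure the Toolbar shows.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively score pages; links maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped score among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "a", "b", and "newcomer" all link to "hub";
# nothing links to "b" or "newcomer".
LINKS = {
    "hub": ["a"],
    "a": ["hub"],
    "b": ["hub"],
    "newcomer": ["hub"],
}

scores = pagerank(LINKS)
for page, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this toy graph the page with the most incoming links ends up on top, and the two pages that nobody links to share the bottom score.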
Figure 2-1: The Google Toolbar affords a rough glimpse of any site’s PageRank, on a scale of 0 to 10.
Chapter 2: Getting into Google

Search engine integrity
One reason pre-Google search engines declined
in usefulness and popularity as Web-content
portals was the emergence of paid listings.
Hungry for revenue, some engines sold positions
on the search results page to advertisers. This
dilution of objectivity polluted search results and
undermined the essential democracy of the
Web. The distinction blurred between search
engines, which supposedly located what you
wanted, and browser channels, which sent you
to the browser’s business affiliates. Even though
many search engines did not accept paid place-
ment, distrust grew among users.
Google started a renaissance of utility and trust.
Google’s integrity is symbolized by its gunk-free
home page, the spartan design of which lures
the user with the promise of search, and nothing
but search. To be sure, Google accepts adver-
tising, and Parts II and III of this book are all
about Google ads. But Google’s paid content is
clearly separated from search listings. Not
everyone agrees with the ranking of search
results in Google, but nobody thinks that a high
rank can be bought.
Timing Google’s crawl
Google crawls the Web at varying depths and on more than one schedule. The
so-called deep crawl occurs roughly once a month. This extensive reconnaissance
of Web content requires more than a week to complete and an undisclosed
length of time after completion to build the results into the index. For
this reason, it can take up to six weeks for a new page to appear in Google.
Brand new sites at new domain addresses that have never been crawled before
might not even be indexed at first, depending on considerations explained later
in this chapter.
If Google relied entirely on the deep crawl, its index would quickly become
outdated in the rapidly shifting Web. To stay current, Google launches various
supplemental fresh crawls that skim the Web more shallowly and frequently
than the deep crawl. These supplementary spiders do not update the entire
index, but they freshen it by updating the content of some sites. Google does
not divulge its fresh-crawling schedules or targets, but Webmasters can get an
indication of the crawl’s frequency through careful observation.
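One way to do that observing is to look through your own server logs for the spider's visits. The Python sketch below assumes that your server writes the common Apache "combined" log format and that the spider identifies itself with "Googlebot" in its user-agent string. The sample lines and the function name googlebot_visits are invented for illustration. Keep in mind that a user agent can be faked, so treat the counts as a rough indication only.

```python
import re
from collections import Counter

# Assumed sample lines in Apache "combined" log format (hypothetical data).
SAMPLE_LOG = """\
66.249.66.1 - - [03/May/2004:04:12:09 +0000] "GET / HTTP/1.1" 200 5120 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"
10.0.0.7 - - [03/May/2004:09:30:00 +0000] "GET /about.html HTTP/1.1" 200 2048 "-" "Mozilla/4.0"
66.249.66.1 - - [04/May/2004:04:15:44 +0000] "GET / HTTP/1.1" 200 5120 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"
66.249.66.1 - - [04/May/2004:04:15:50 +0000] "GET /products.html HTTP/1.1" 200 4096 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"
"""

# Captures the date, the requested path, and the user-agent field.
LINE_PATTERN = re.compile(
    r'\[(?P<day>[^:]+):[^\]]+\] "(?:GET|HEAD) (?P<path>\S+) [^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_visits(log_text):
    """Count requests per (day, path) whose user agent mentions Googlebot."""
    visits = Counter()
    for line in log_text.splitlines():
        match = LINE_PATTERN.search(line)
        if match and "googlebot" in match.group("agent").lower():
            visits[(match.group("day"), match.group("path"))] += 1
    return visits


visits = googlebot_visits(SAMPLE_LOG)
for (day, path), count in sorted(visits.items()):
    print(day, path, count)
```

Run against a few weeks of real logs, a tally like this shows which pages the spider touches and how often it comes back.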
Google has no obligation to touch any particular URL with a fresh crawl. Sites
can increase their chance of being crawled often, however, by changing their
content and adding pages frequently. Remember the shallowness aspect of
the fresh crawl; Google might dip into the home page of your site (the front
page, or index page) but not dive into a deep exploration of the site’s inner
pages. (More than once I’ve observed a new index page of my site in Google
within a day of my updating it, while a new inner page added at the same time
was missing.) But Google’s spider can compare previous crawl results with
the current crawl, and if it learns from the top navigation page that new con-
tent is added regularly, it might start crawling the entire site during its fre-
quent visits.
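Google doesn't say how its spider compares one crawl with the next, but the general idea is easy to sketch. The Python example below fingerprints each page's text and reports what is new, changed, unchanged, or gone between two visits. The snapshots and the function names fingerprint and diff_crawls are hypothetical, and real crawlers use far more forgiving comparisons than an exact hash.

```python
import hashlib


def fingerprint(text):
    """Return a short, stable digest of a page's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


def diff_crawls(previous, current):
    """Compare two crawls (URL -> page text) and classify each URL."""
    report = {"new": [], "changed": [], "unchanged": [], "gone": []}
    for url, text in current.items():
        if url not in previous:
            report["new"].append(url)
        elif fingerprint(previous[url]) != fingerprint(text):
            report["changed"].append(url)
        else:
            report["unchanged"].append(url)
    report["gone"] = [url for url in previous if url not in current]
    return {kind: sorted(urls) for kind, urls in report.items()}


# Hypothetical snapshots of the same small site on two visits.
LAST_VISIT = {"/": "Welcome. Latest: spring sale", "/about": "About us"}
THIS_VISIT = {
    "/": "Welcome. Latest: summer catalog",
    "/about": "About us",
    "/catalog": "Summer catalog",
}

print(diff_crawls(LAST_VISIT, THIS_VISIT))
```

A site whose report keeps showing new and changed pages is the kind of site a spider has reason to revisit often.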
The deep crawl is more automatic and mindlessly thorough than the fresh
crawl. Chances are good that in a deep crawl cycle, any URL already in the
main index will be reassessed down to its last page. However, Google does
not necessarily include every page of a site. As usual, the reasons and formu-
las involved in excluding certain pages are not divulged. The main fact to
remember is that Google applies PageRank considerations to every single
page, not just to domains and top pages. If a specific page is important to you
and is not appearing in Google search results, your task is to apply every
networking and optimization tactic described in Chapters 3 and 4 to that page. You may
also manually submit that specific page to Google (see the next section).
The terms deep crawl and fresh crawl are widely used in the online marketing
community to distinguish between the thorough spidering of the Web that
Google launches approximately monthly and various intermediate crawls run
at Google’s discretion. Google itself acknowledges both levels of spider activ-
ity, but is secretive about exact schedules, crawl depths, and formulas by
which the company chooses crawl targets. To a large extent, targets are
determined by automatic processes built into the spider’s programming, but
humans at Google also direct the spider to specific destinations for various
reasons, some of which are discussed in this chapter.
Earlier, I said that the Google index remains static between crawls. Technically,
that’s true. Google matches keywords against the index, not against live Web
content, so any pages put online (or modified) between visits from Google’s
spider remain excluded from (or out of date in) the search results until they are
crawled again. But two factors work against the index remaining unchanged for
long. First, the frequency of fresh crawls keeps the index evolving in a state
that Google-watchers call everflux. Second, some time is required to put crawl
results into the index on Google’s thousands of servers. The irregular heaving
and churning of the index that results from these two factors is called the
Google dance.
To submit or not to submit
You can get your site into the Google index in two simple ways:
✓ Submit the site manually
✓ Let the crawl find it
Neither method offers a guarantee. Google accepts URL submissions, but it
neither responds to them nor assures Webmasters that their submissions will
be added to the index. When Google decides to manually add a site, it does
so by sending the spider crawling to the submitted URL to take stock of the
site’s various pages. Characteristically, Google doesn’t inform the Webmaster
that the site has been accepted, and it doesn’t provide a schedule for crawl-
ing accepted sites.
Google’s hands-off operation
Google is a reasonably communicative com-
pany in certain departments, such as AdWords,
AdSense, and enterprise solutions. And Google
accepts URL submissions for the index, though
it doesn’t acknowledge them. But asking
humans at Google to interfere with the con-
struction of its index is an exercise in futility.
Google builds its index through robotic interac-
tion, for the most part, and prides itself on these
sophisticated automated processes. Google
does not correct a Webmaster’s outdated list-
ings or make any custom change to the index.
The company counts on time and thorough
crawling to solve problems. Google doesn’t
want to hear from you about your index issues.
The key to attracting Google’s spider is getting your page linked on other
sites. Google finds your content by following links to your pages. With no
incoming links (also called backlinks), you are an unreachable island as far as
the Google crawl is concerned. This isolated condition is the natural state of
any new site. Of course, anybody can reach you directly by entering the URL,
but you won’t pluck the spider’s web until you get some other sites to link to
you. See Chapter 3 for a detailed tutorial in creating a backlink network.
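The island picture can be shown with a small link-following sketch in Python. It assumes a made-up web of four sites and a hypothetical function named crawl that walks links breadth-first from a starting page; it is an illustration of the principle, not a description of how Google's spider is built.

```python
from collections import deque


def crawl(links, seeds, max_depth=None):
    """Breadth-first walk of a link graph; returns each reached page's depth."""
    depth = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        # Stop following links once the depth limit (if any) is reached.
        if max_depth is not None and depth[page] >= max_depth:
            continue
        for target in links.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


# Hypothetical web: page -> pages it links to.
WEB = {
    "portal.example": ["shop.example", "blog.example"],
    "blog.example": ["shop.example/sale"],
    "shop.example": ["shop.example/sale"],
    "newsite.example": ["portal.example"],  # links out, but nothing links in
}

reached = crawl(WEB, seeds=["portal.example"])
print("newsite.example" in reached)  # False: no backlink leads there

WEB["blog.example"].append("newsite.example")  # the first backlink
reached = crawl(WEB, seeds=["portal.example"])
print(reached["newsite.example"])
```

Linking out to other sites does nothing for the new site; it is found only after another page links to it. Passing a small max_depth mimics a shallow visit that stops before reaching a site's inner pages.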
Submitting a site might not be a ticket to instant success, but at least it’s
easy. Submit your URL at this address:
www.google.com/addurl.html
Fill in the form (see Figure 2-2) and click the Add URL button, keeping in mind
that the button is misnamed. You are not adding the URL, you are submitting
it. Only the spider can add your site, and only a Google human can tell it to.
If you add a page to a site already in the Google index, there’s no need to
submit the new page. Under most circumstances, Google will find the new
page the next time your site is crawled in its entirety.
Figure 2-2: Submitting a URL to Google could hardly be easier, but don’t expect acknowledgment or guaranteed results.