8.1.1 SEO Strengths and Weaknesses
Out of the box, WordPress provides great flexibility in terms of organizing and
managing your blog’s content. Much of this flexibility comes by way of WordPress’
category and tag architecture. Each and every post created on your blog may be
assigned to any number of both categories and tags.
Categories are meant to classify content according to broad definitions, while
tags are used to classify content more specifically. For example, a post about your
favorite movies may be categorized in the “Favorite Movies” category, while being
tagged for some of the movies featured in the article: “Star Wars,” “The Matrix,”
and “Blade Runner.”
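For those who want to see this at the code level, here is a minimal sketch using two core WordPress functions, wp_set_post_categories() and wp_set_post_tags(); the post ID and category ID shown are hypothetical:

// a sketch with hypothetical IDs: file the post under a broad
// category, then tag it with the specific movies it mentions
$post_id = 77; // hypothetical post ID
wp_set_post_categories($post_id, array(3)); // 3 = "Favorite Movies" category ID (hypothetical)
wp_set_post_tags($post_id, array('Star Wars', 'The Matrix', 'Blade Runner'));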
Beyond this central organizing principle, WordPress brings with it many strengths
and weaknesses in terms of how content is organized and made available to both
users and the search engines. Let’s examine some of these SEO factors before
digging into the fine art of optimizing your WordPress-powered site for the
search engines.
8.1.2 Strong Focus on Content
Content, as they say, is king. The Web exists because of it. Users are searching for
it. Search engines are built on it. In order to succeed on the Web, your site should
be focused on delivering useful content above all else. Awesomely, one of the main
goals of WordPress is to make publishing content as easy as possible.
Once WordPress is set up, getting your content online happens as fast as you can
create it. On the front end, there are hundreds of top-quality themes available,
each focused on organizing and presenting your content with both users and
search engines in mind.
8.1.3 Built-In “nofollow” Comment Links
Perhaps not as useful as originally conceived, nofollow attributes placed on commentator links have long been thought of as an effective method of improving the SEO quality of WordPress-powered sites. For those of you who may be unfamiliar with the whole “nofollow” thing, for now suffice it to say that nofollow attributes are placed on links to prevent search engines from following those links to their targets. Originally, this was intended to serve as a way to conserve valuable page rank, but after it was revealed that this method no longer works, nofollow commentator links may be a moot point. We’ll discuss this in more depth later in the chapter.
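To see what this looks like in practice: WordPress core applies the nofollow attribute to comment-author links itself, and filters the generated markup through its get_comment_author_link hook. Here is a minimal sketch (the function name is hypothetical) of how a blogger who considers the attribute a moot point could strip it back out:

// a sketch: remove the default rel="nofollow" that WordPress
// adds to comment-author links (function name is hypothetical)
function my_dofollow_comment_links($link) {
	return str_replace(' nofollow', '', $link);
}
add_filter('get_comment_author_link', 'my_dofollow_comment_links');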
8.1.4 Duplicate Content Issues
While the organizational strengths of WordPress are great for managing content, it
also comes with a price: duplicate content. Duplicate content is essentially identical
content appearing in more than one place on the Web. Search engines such as
Google are reported to penalize pages or sites associated with too much duplicate
content. Returning to our movie example for a moment, our WordPress-powered
site may suffer in the search rankings because identical copies of our movie article
are appearing at each of the following URLs:
• original article -> http://example.com/my-favorite-movies/
• category view -> http://example.com/category/favorite-movies/
• star-wars tag view -> http://example.com/tag/star-wars/
• the-matrix tag view -> http://example.com/tag/the-matrix/
• blade-runner tag view -> http://example.com/tag/blade-runner/

WordPress + nofollow
Check out Chapter 7.7.3 for more information on nofollow, WordPress, and the search engines.
Yikes! And if that weren’t bad enough, we also see the exact same post content appearing at these URLs:

• daily archive view -> http://example.com/2010/01/15/
• monthly archive view -> http://example.com/2010/01/
• yearly archive view -> http://example.com/2010/
• author archive view -> http://example.com/author/username/

Depending on your particular WordPress theme, this situation could be even worse. By default, all of your posts are available in identical form at each of the previous types of URLs. Definitely not good from a search-engine point of view. Especially if you are the type of blogger who makes heavy use of tags, the number of duplicated posts could be staggering.
8.2.1 Controlling Duplicate Content
Fortunately, WordPress’ poor handling of duplicate content is easily fixed. In fact, there are several ways of doing so. In a nutshell, we have plenty of tools and techniques at our disposal for winning the war on duplicate content:
• meta nofollow tags
• meta noindex tags
• nofollow attributes
• robots directives
• canonical link tags
• use excerpts for posts
So what does each of these sculpting tools accomplish, and how does it help us eliminate duplicate content? Let’s take a look at each of them.
8.2.2 Meta noindex and nofollow Tags
Meta noindex and nofollow tags are meta elements located in the <head> section of your WordPress pages. For example, in your blog’s “header.php” file, you may find something like this:
<meta name="googlebot" content="index,archive,follow" />
<meta name="msnbot" content="all,index,follow" />
<meta name="robots" content="all,index,follow" />
This code tells search engines – specifically, Google, MSN/Bing, and any other
compliant search engine – that the entire page should be indexed, followed, and
archived. This is great for single post pages (i.e., the actual “My Favorite Movies”
article posted in our example), but we can use different parameters within these
elements to tell the search engines not to index, follow, or archive our web pages.
Ideally, most bloggers want their main article to appear in the search results. The
duplicate content appearing on the other types of pages may be controlled with
this set of meta tags:
<meta name="googlebot" content="noindex,noarchive,follow" />
<meta name="robots" content="noindex,follow" />
<meta name="msnbot" content="noindex,follow" />
Here, we are telling the search engines not to include the page in the search engine results, while at the same time, we are telling them that it is okay to crawl the page and follow the links included on the page. This prevents the page from appearing as duplicate content while allowing link equity to be distributed throughout the linked pages.
throughout the linked pages. Incidentally, we may also tell the search engines to
neither index nor follow anything on the page by changing our code to this:
<meta name="googlebot" content="noindex,noarchive,nofollow" />
Love Juice
Anyone dipping into the murky
waters of SEO will inevitably
discover that there are many
ways to refer to the SEO value
of web pages. PR, Page rank,
link equity, link juice, page
juice, link love, love juice, rank
juice, and just about any other
combination of these terms
is known to refer to the same
thing: the success of a web
page in the search engines.
291
<meta name="robots" content="noindex,nofollow" />
<meta name="msnbot" content="noindex,nofollow" />
So, given these meta tags, what is the best way to use this method to control
duplicate content on your WordPress-powered site? We’re glad you asked. By using
a strategic set of conditional tags in the “header.php” file for your theme, it is
possible to address search-engine behavior for virtually all types of pages, thereby
enabling you to fine-tune the indexing and crawling of your site’s content. To see
how this is done, consider the following code:
<?php if(is_home() && (!$paged || $paged == 1) || is_single()) { ?>
<meta name="googlebot" content="index,archive,follow,noodp" />
<meta name="robots" content="all,index,follow" />
<meta name="msnbot" content="all,index,follow" />
<?php } else { ?>
<meta name="googlebot" content="noindex,noarchive,follow,noodp" />
<meta name="robots" content="noindex,follow" />
<meta name="msnbot" content="noindex,follow" />
<?php } ?>
Tell Search Engines not to Index a Specific Post

In this section we see how to disable search-engine indexing for categories, archives, pages, and other page views, but what if we want to prevent indexing of only one specific post? There are several SEO plugins that enable this functionality, but you don’t really need one to do it. All you need to do is get the ID of the post for which you would like to disable indexing. Then, open your theme’s header.php file and place this snippet within the <head> section:

<?php if ($post->ID == 77) { echo '<meta name="robots" content="noindex,noarchive">'; } ?>

Change the ID from “77” to the ID of your post and you’re done! With this in place, compliant search engines such as Google and MSN/Bing will neither index nor archive the specified post (ID #77 in this example).
The conditional PHP tags used in this example are effectively saying: “If the current
page is the home page or the single post page, then allow the search engines to
both index and follow the content of the page; otherwise, since the page is neither
the home page nor the single post page, it is probably a tag page, category page,
or other archive page and thus serves as duplicate content; therefore tell the search
engines to follow all links but not index the content.”
Of course, it isn’t always a bad thing to have some peripheral pages indexed in the
search engines. In addition to having their home page and single pages indexed (as
in our example above), many people prefer to have either tag pages or category
pages (or both!) indexed in the search engines as well. In reality, the types of pages
that you want indexed are completely up to you and your personal SEO strategy.
Prevent Duplicate Content Caused by Paginated Comments

Since WordPress 2.7, comments may be paginated, such that “x” number of comments appear on each page. While this is a step in the right direction, there may be a duplicate-content issue resulting from the fact that your post content will appear on every page of your paginated comments. To resolve this issue, place the following code into your functions.php file:

// prevent duplicate content for paginated comments
function noDuplicateContentforComments() {
	global $cpage, $post;
	// on comment pages 2 and up, point search engines at the original permalink
	if($cpage > 1) {
		echo "\n".'<link rel="canonical" href="'.get_permalink($post->ID).'" />'."\n";
	}
}
add_action('wp_head', 'noDuplicateContentforComments');

This code will generate canonical <head> links for all of your paginated comments. The search engines will then use this information to ensure that the original post permalink is attributed as the actual article.

Further Information
For more information and techniques on paged comments and duplicate content, check out Chapter 7.3.3.
So, if you would like to include both tag and category pages in the search results, you would simply modify the first line of our previous example like so:

<?php if(is_home() && (!$paged || $paged == 1) || is_category() || is_tag() || is_single()) { ?>
8.2.3 Nofollow Attributes
Another useful tool in the fight against duplicate WordPress content is the controversial “nofollow” attribute. The nofollow attribute is placed into hyperlinks like this:

<a href="http://example.com/" rel="nofollow">This is a "nofollow" hyperlink</a>

Links containing the nofollow attribute will not be “followed” by Google, but may still be indexed in the search results if linked to from another source. Because such links are not followed, use of the nofollow attribute is an effective tool in the reduction and prevention of duplicate content.
For an example of how nofollow can be used to help eliminate duplicate content, let’s look at a typical author archive. In the author-archive page view, you will find exact replicas of your original posts (unless you are using excerpts). This duplicate content is easily avoidable by simply “nofollowing” any links in your theme that point to the author-archive page view. Here is how the nofollow link would appear in your theme files:

<a href="http://example.com/author/username/" rel="nofollow">This link will not be followed to the author archive</a>
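Hard-coding the attribute works, but if your theme prints author links via the core template tag the_author_posts_link(), a small filter can add the nofollow everywhere at once. A minimal sketch (the function name is hypothetical):

// a sketch: add rel="nofollow" to every author-archive link
// printed by the_author_posts_link() (function name is hypothetical)
function my_nofollow_author_links($link) {
	return str_replace('<a ', '<a rel="nofollow" ', $link);
}
add_filter('the_author_posts_link', 'my_nofollow_author_links');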
Exclude Admin Pages from Search Engines

You may also replace or add other types of pages to your meta-tag strategy by using any of the following template tags:

• is_home()
• is_page()
• is_admin()
• is_author()
• is_date()
• is_search()
• is_404()
• is_paged()
• is_category()
• is_tag()

To target any date-based archive page (i.e., a monthly, yearly, daily, or time-based archive) that is being displayed, use this:

• is_archive()

Remember, there are more than just date-based archives. Other types of archive pages include sequential displays of category, tag, and author pages. is_archive() will target all of these page types.

And of course there are many more types of these conditional tags available to WordPress. See the WordPress Codex for more information: http://codex.wordpress.org/Conditional_Tags
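To illustrate how these conditional tags combine, here is one possible condition (a hypothetical variation patterned on the earlier header.php examples) that would noindex every archive view, search-results page, and 404 page in a single pass:

<?php if(is_archive() || is_search() || is_404()) { ?>
<meta name="robots" content="noindex,follow" />
<?php } ?>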
Exclude Specific Pages from Search Engines

Ever wanted to keep a few specific pages out of the search engines? Here’s how to do it using WordPress’ excellent conditional tag functionality. Just place your choice of these snippets into the <head> section of your header.php file, and all compliant search engines (e.g., Google, MSN/Bing, Yahoo!, et al.) will avoid the specified page(s) like the plague. This menu of snippets provides many specific-case scenarios that may be easily modified to suit your needs.
Exclude a specific post
<?php if(is_single('17')) { // your post ID number ?>
<meta name="googlebot" content="noindex,noarchive,follow" />
<meta name="robots" content="noindex,follow" />
<meta name="msnbot" content="noindex,follow" />
<?php } ?>
Exclude a specific page
<?php if(is_page('17')) { // your page ID number ?>
<meta name="googlebot" content="noindex,noarchive,follow" />
<meta name="robots" content="noindex,follow" />
<meta name="msnbot" content="noindex,follow" />
<?php } ?>

Exclude a specific category
<?php if(is_category('17')) { // your category ID number ?>
<meta name="googlebot" content="noindex,noarchive,follow" />
<meta name="robots" content="noindex,follow" />
<meta name="msnbot" content="noindex,follow" />
<?php } ?>
Exclude a specific tag
<?php if(is_tag('personal')) { // your tag name ?>
<meta name="googlebot" content="noindex,noarchive,follow" />
<meta name="robots" content="noindex,follow" />
<meta name="msnbot" content="noindex,follow" />
<?php } ?>
Exclude multiple tags
<?php if(is_tag(array('personal','family','photos'))) { ?>
<meta name="googlebot" content="noindex,noarchive,follow" />
<meta name="robots" content="noindex,follow" />
<meta name="msnbot" content="noindex,follow" />
<?php } ?>
Exclude posts tagged with certain tag(s)
<?php if(has_tag(array('personal','family','photos'))) { ?>
<meta name="googlebot" content="noindex,noarchive,follow" />
<meta name="robots" content="noindex,follow" />
<meta name="msnbot" content="noindex,follow" />
<?php } ?>
Granted, using nofollow links to control duplicate content is not 100% foolproof. If the author-archive URL is linked to from any “followed” links, that page may still be indexed in the search engines. For pages such as the author archives that probably aren’t linked to from anywhere else, nofollow may help prevent a potential duplicate-content issue.
8.2.4 Robots.txt Directives
Another useful and often overlooked method of controlling duplicate content
involves the implementation of a robots.txt file for your site. Robots.txt files are
plain text files generally placed within the root directory of your domain:

http://example.com/robots.txt

Robots.txt files contain individual lines of well-established “robots” directives
that serve to control the crawling and indexing of various directories and pages.
Search engines such as Google and MSN that “obey” robots directives periodically
read the robots.txt file before crawling your site. During the subsequent crawl of
your site, any URLs forbidden in the robots.txt file will not be crawled or indexed.
Keep in mind, however, that pages prohibited via robots directives continue to
consume page rank. So, duplicate content pages removed via robots.txt may still
be devaluing your key pages by accepting any link equity that is passed via
incoming links.
Even so, with other measures in place, taking advantage of robots.txt directives is
an excellent way to provide another layer of protection against duplicate content
and unwanted indexing by the search engines.
Let’s look at an example of how to make a useful robots.txt file. First, recall the default directory structure of a WordPress installation: the wp-admin and wp-includes directories, plus the various wp-*.php core files in the root.
For a typical WordPress installation located in the root directory, there is no reason
for search engines to index URLs containing any of the core WordPress files. So we
begin our robots.txt file by writing:
User-agent: *
Disallow: /wp-*
Disallow: *.php

The User-agent line declares that the rules apply to all crawlers, and the two Disallow lines tell compliant search engines to ignore any URL beginning with “http://example.com/wp-” or ending with “.php”. Thus, all of our core WordPress files are restricted and will not be crawled by compliant search engines.

Yahoo! Disobeys
Sadly, when it comes to search engines that comply with robots.txt directives, Yahoo! falls far short.

Not Foolproof
Pages blocked by robots.txt directives may still appear within the index if linked to by “trusted, third-party sources.”
Now, consider some of the types of WordPress-generated URLs that we don’t want the search engines to follow or index:

http://example.com/feed/ - your site's main feed
http://example.com/comments/feed/ - your site's comments feed
http://example.com/category/movies/feed/ - every other type of feed
http://example.com/my-favorite-movies/trackback/ - every trackback URL on your site
http://example.com/2010/01/15/ - archive views for every day
http://example.com/2010/01/ - archive views for every month
http://example.com/2010/ - archive views for every year
Of course, there are other types of pages which we may also wish to exclude
from the search engines, such as category and tag archives, but you get the idea.
To prohibit robots-compliant search engines from accessing and indexing the
miscellaneous pages listed above, we add these directives to our robots.txt file:
Disallow: */feed*
Disallow: */trackback*
Disallow: /20*
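Putting the pieces together, the complete robots.txt assembled in this section would look like the sketch below. Note that the wildcard (*) syntax is an extension honored by Google and Bing rather than part of the original robots.txt standard, so behavior may vary between crawlers; tailor the rules to your own indexing strategy.

User-agent: *
# block core WordPress files and directories (wp-admin, wp-includes, wp-*.php)
Disallow: /wp-*
Disallow: *.php
# block feeds and trackbacks
Disallow: */feed*
Disallow: */trackback*
# block date-based archive views (URLs beginning with /20..)
Disallow: /20*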
