Let's begin with a simple sitemaps.xml example:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.example.com/</loc>
<lastmod>2005-01-01</lastmod>
<changefreq>monthly</changefreq>
<priority>0.8</priority>
</url>
</urlset>
Example Definitions:
<?xml version="1.0" encoding="UTF-8"?>
- The XML declaration: states the XML version and character encoding. Sitemaps must use UTF-8 encoding. (Required)
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
- Encapsulates the file and references the current protocol standard. (Required)
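<url>
- Parent tag for each URL entry. (Required)
<loc>
- The full URL of the page, beginning with the protocol (e.g. http://). It must be less than 2,048 characters. (Required)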
<lastmod>
- The date the URL was last modified, in W3C Datetime format. The time portion may be omitted, leaving YYYY-MM-DD. (Optional)
<changefreq>
- How frequently the page is likely to change. Use "always" for documents that change every time they are accessed and "never" for archived URLs. (Optional)
Below are the accepted values:
- always
- hourly
- daily
- weekly
- monthly
- yearly
- never
WARNING: Sitemaps.org states, and I quote: "Please note that the value of this tag is considered a hint and not a command. Even though search engine crawlers may consider this information when making decisions, they may crawl pages marked 'hourly' less frequently than that, and they may crawl pages marked 'yearly' more frequently than that. Crawlers may periodically crawl pages marked 'never' so that they can handle unexpected changes to those pages."
<priority>
- The priority of this URL relative to other URLs on your site, from 0.0 to 1.0. If the tag is omitted, the default priority is 0.5. (Optional)
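The following minimal Python sketch shows how these tags fit together. It builds one <url> entry and checks the optional values against the rules above. The function url_entry and its parameters are hypothetical and are not part of any sitemap library. It assumes lastmod is a plain date (no time), and it writes priority with one decimal place.

```python
from datetime import date
from xml.sax.saxutils import escape

# Accepted <changefreq> values, as listed above.
CHANGEFREQ_VALUES = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}
# escape() handles &, < and >; this mapping adds the two quote characters.
QUOTE_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def url_entry(loc, lastmod=None, changefreq=None, priority=None):
    """Return one <url> element as text; optional tags are skipped when None.

    Hypothetical helper for illustration only. lastmod is a datetime.date.
    """
    parts = ["<url>", f"<loc>{escape(loc, QUOTE_ENTITIES)}</loc>"]
    if lastmod is not None:
        # date.isoformat() produces the W3C Datetime date form YYYY-MM-DD.
        parts.append(f"<lastmod>{lastmod.isoformat()}</lastmod>")
    if changefreq is not None:
        if changefreq not in CHANGEFREQ_VALUES:
            raise ValueError(f"invalid changefreq: {changefreq!r}")
        parts.append(f"<changefreq>{changefreq}</changefreq>")
    if priority is not None:
        if not 0.0 <= priority <= 1.0:
            raise ValueError(f"priority must be between 0.0 and 1.0, got {priority}")
        # Written with one decimal place, as in the examples above.
        parts.append(f"<priority>{priority:.1f}</priority>")
    parts.append("</url>")
    return "\n".join(parts)


print(url_entry("http://www.example.com/", date(2005, 1, 1), "monthly", 0.8))
```

Running it prints the same <url> entry as the example at the top of this article. Leaving out an optional argument simply omits that tag.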
Escaping Special Characters:
Understanding what kind of links you can have within your sitemap(s) is important. For example, if you use "&" within your URL, you will have to change it to "&amp;". Escaping characters within a sitemap is the same process as escaping them within any XML document. This doesn't mean that visitors will see "&amp;" in your URLs after search engines crawl your pages. The escaping simply lets crawlers read your sitemap without errors.
Here are the characters that must be escaped:
- Character / Escape Code
- Ampersand & / &amp;
- Single Quote ' / &apos;
- Double Quote " / &quot;
- Greater Than > / &gt;
- Less Than < / &lt;
Example (BAD):
http://www.slaldsj.net/?page=2&alpha=No
Example (GOOD):
http://www.slaldsj.net/?page=2&amp;alpha=No
Take a good look at the characters right after "page=2" in each URL.
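The escaping step can be handled by Python's standard xml.sax.saxutils.escape function. That function always converts &, < and >. The extra mapping passed here adds the two quote characters, so every entry in the table above is covered. The wrapper name escape_for_sitemap is hypothetical. This covers XML entity escaping only: characters that are not valid in a URL at all, such as spaces, still need URL encoding.

```python
from xml.sax.saxutils import escape

# escape() always handles &, < and >; this mapping adds ' and ".
QUOTE_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def escape_for_sitemap(url):
    """Entity-escape a URL so it can be placed inside <loc>...</loc>."""
    return escape(url, QUOTE_ENTITIES)


print(escape_for_sitemap("http://www.slaldsj.net/?page=2&alpha=No"))
# prints http://www.slaldsj.net/?page=2&amp;alpha=No
```

The ampersand is replaced first inside escape(), so the entities it produces are never escaped a second time.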
How Many Links Are Allowed Within One Sitemap?
Google states that it doesn't want to see more than 50,000 URLs in one sitemap, and no more than 1,000 sitemaps listed in one index sitemap file. (The current sitemaps.org protocol raises the index limit to 50,000 sitemaps.) The key is to combine multiple sitemaps to lay out your entire web site. You can do this by creating an index sitemap.
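The first step is to split a large list of URLs into groups that each fit under the 50,000-URL limit. A minimal Python sketch of that step follows. The function name split_into_sitemaps and the 120,000 generated example URLs are hypothetical.

```python
# Per-sitemap URL limit stated above.
MAX_URLS_PER_SITEMAP = 50_000


def split_into_sitemaps(urls, limit=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive groups of at most `limit` URLs."""
    return [urls[i:i + limit] for i in range(0, len(urls), limit)]


# Hypothetical site with 120,000 pages.
site_urls = [f"http://www.example.com/page{n}.html" for n in range(120_000)]
groups = split_into_sitemaps(site_urls)
print([len(g) for g in groups])  # [50000, 50000, 20000]
```

Each group would then be written to its own sitemap file, and those files are listed in the index sitemap. The file size limit also has to be respected, so long URLs may call for smaller groups.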
Creating "Index Sitemaps" For Larger Web Sites:
If your web site is too large for a single sitemap, you'll want to create an index sitemap. This tells search engines that you are splitting your content across multiple sitemaps for crawling.
A typical index sitemap is structured as follows:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>http://www.bogaboga.net/sitemaps/1.xml</loc>
</sitemap>
<sitemap>
<loc>http://www.bogaboga.net/sitemaps/2.xml</loc>
</sitemap>
</sitemapindex>
Its sole purpose is to list as many sitemaps as necessary to cover every URL on your web site. If you expect your site never to grow beyond 50,000 pages, don't worry about creating an index sitemap.
For those of you who need an index sitemap, simply add a new sitemap file for every section of your site. Keep each file within the 50,000-URL limit and under 10MB in size (the current sitemaps.org protocol allows up to 50MB uncompressed).
Example Definitions:
<?xml version="1.0" encoding="UTF-8"?>
- The XML declaration: states the XML version and character encoding. Sitemaps must use UTF-8 encoding. (Required)
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
- Encapsulates the file and references the current protocol standard. (Required)
These two lines declare the file as an XML document and identify it as a sitemap index.
<sitemap>
- Opens the entry for one sitemap.
<loc></loc>
- The full URL of that sitemap (sitemaps.xml, a .php script, or another format).
</sitemap>
- Closes the entry for that sitemap.
</sitemapindex>
- Closes the index file.
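The index structure described above can be generated from a plain list of sitemap URLs. Here is a minimal Python sketch that reproduces the bogaboga.net example. The function name sitemap_index is hypothetical, and each URL is entity-escaped in case it contains characters such as "&".

```python
from xml.sax.saxutils import escape


def sitemap_index(sitemap_urls):
    """Return a sitemap index document with one <sitemap> entry per URL.

    Hypothetical helper for illustration; each URL is entity-escaped for XML.
    """
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for url in sitemap_urls:
        lines += ["<sitemap>", f"<loc>{escape(url)}</loc>", "</sitemap>"]
    lines.append("</sitemapindex>")
    return "\n".join(lines)


print(sitemap_index([
    "http://www.bogaboga.net/sitemaps/1.xml",
    "http://www.bogaboga.net/sitemaps/2.xml",
]))
```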
LOCATION, LOCATION, LOCATION!
Where you place your sitemaps.xml is of great importance. A sitemap can only list URLs in the directory where it is located, or in that directory's subdirectories. Here are some examples:
Index sitemaps: place them in your root directory, e.g. http://www.ouofjjk.net/
To add another sitemap that lists only the URLs within a specific directory, place it in that directory, for example:
http://www.ouofjjk.net/info/sitemaps.xml
This sitemap may only list URLs within /info/ and its subdirectories. URLs outside that folder, such as pages in the site's root directory, cannot be included in it.
Sub-Domains:
Each sub-domain needs its own sitemap, placed on that sub-domain, to list its URLs. For example:
http://sub-domain.slkfjds.net/sitemaps.xml
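The two location rules above can be expressed as a single check. A page URL qualifies for a sitemap only if it uses the same protocol and host as the sitemap, and its path starts with the sitemap's directory. Here is a minimal Python sketch using the standard urllib.parse.urlsplit function. The function name in_sitemap_scope and the contact.html page are hypothetical.

```python
from urllib.parse import urlsplit


def in_sitemap_scope(sitemap_url, page_url):
    """Return True if page_url may be listed in the sitemap at sitemap_url.

    Hypothetical helper: the page must share the sitemap's protocol and host
    (so each sub-domain needs its own sitemap), and its path must sit in the
    sitemap's directory or one of its subdirectories.
    """
    sitemap, page = urlsplit(sitemap_url), urlsplit(page_url)
    if (sitemap.scheme, sitemap.netloc.lower()) != (page.scheme, page.netloc.lower()):
        return False
    # Directory of the sitemap, e.g. "/info/" for "/info/sitemaps.xml".
    directory = sitemap.path[: sitemap.path.rfind("/") + 1] or "/"
    return (page.path or "/").startswith(directory)


print(in_sitemap_scope("http://www.ouofjjk.net/info/sitemaps.xml",
                       "http://www.ouofjjk.net/info/contact.html"))  # True
```

The directory is compared with its trailing slash, so a sitemap in /info/ does not accidentally accept pages under /information/.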
PHP Outputting:
To generate a sitemap with PHP, replace the first line (the XML declaration) with the following:
<?php
header("Content-type: application/xml");
?>
<?php echo '<?xml version="1.0" encoding="UTF-8"?>';?>
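These lines tell the browser (and search engine crawlers) that the output of the .php file is an XML document, and then print the XML declaration. The declaration is echoed as a string so that PHP does not mistake "<?xml" for an opening PHP tag when short_open_tag is enabled. You can now build your file and save it as sitemaps.php instead of .xml for database-driven links.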
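The key point in the PHP lines above is that the server must send a Content-Type of application/xml before the XML itself. If your site runs Python rather than PHP, the same idea can be written as a minimal WSGI sketch. The sitemap_app name and the one-URL SITEMAP content are placeholders, not a recommended layout.

```python
# Placeholder sitemap content; a real site would generate this.
SITEMAP = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
    "<url>\n<loc>http://www.example.com/</loc>\n</url>\n"
    "</urlset>\n"
)


def sitemap_app(environ, start_response):
    """Minimal WSGI app: serve the sitemap with an XML Content-Type header."""
    body = SITEMAP.encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/xml; charset=UTF-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, the standard library's wsgiref.simple_server.make_server("", 8000, sitemap_app).serve_forever() serves it at http://localhost:8000/.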
MySQL Query:
If you want to pull the links directly from an existing database, you can use code along these lines:
=============================================
<?php
header("Content-type: application/xml");
echo '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";

// Connect to your database. mysqli replaces the old mysql_* functions,
// which were removed in PHP 7. Replace the placeholders with your own values.
$db = new mysqli("localhost", "username", "password", "database_name");
if ($db->connect_error) {
    die("Connection failed: " . $db->connect_error);
}

// Get the links from your tables (a sitemap may hold at most 50,000 URLs)
$main = $db->query("SELECT linkURL, date FROM YourTable WHERE condition='1' AND condition='2' ORDER BY yourlist DESC LIMIT 50000")
    or die($db->error);

while ($myLink = $main->fetch_assoc()) {
    // Make the date format (W3C Datetime: YYYY-MM-DD)
    $Date = date("Y-m-d", strtotime($myLink['date']));

    // Clean up unwanted characters. This is completely optional: it's your
    // database, so decide based on how it stores the URLs for your site.
    // Characters that form a URL's structure (":", "/", ".", "?", "&") are
    // left alone so the <loc> value stays a valid URL.
    $link = $myLink['linkURL'];
    $link = str_replace(" ", "-", $link);
    $link = str_replace(array(",", "@", "_", "(", ")", "|", "'", ";", "$", "©", "™"), "", $link);

    // Entity-escape the URL for XML (for example, "&" becomes "&amp;")
    $link = htmlspecialchars($link, ENT_QUOTES | ENT_XML1, "UTF-8");
?>
<url>
<loc><?php echo $link; ?></loc>
<lastmod><?php echo $Date; ?></lastmod>
<changefreq>monthly</changefreq>
<priority>0.5</priority>
</url>
<?php } $db->close(); ?>
</urlset>
=============================================
There you have it. Adjust the connection details, table, and column names to match your own database, and save the file with a .php extension (for example, sitemaps.php).
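For comparison, here is a Python sketch of the same database-driven approach. It swaps MySQL for the standard library's sqlite3 module so that it runs on its own. The table YourTable with columns linkURL and date mirrors the placeholders in the PHP code. The function name sitemap_from_db and the sample rows are hypothetical. Unlike the PHP version, it drops the WHERE conditions, orders by date, and entity-escapes each URL instead of stripping characters.

```python
import sqlite3
from datetime import datetime
from xml.sax.saxutils import escape

# escape() handles &, < and >; this mapping adds the two quote characters.
QUOTE_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def sitemap_from_db(conn, limit=50_000):
    """Build a urlset document from (linkURL, date) rows in YourTable."""
    rows = conn.execute(
        "SELECT linkURL, date FROM YourTable ORDER BY date DESC LIMIT ?", (limit,)
    )
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for link, stamp in rows:
        # Reduce a stored "YYYY-MM-DD HH:MM:SS" value to the YYYY-MM-DD form.
        lastmod = datetime.fromisoformat(stamp).date().isoformat()
        lines += [
            "<url>",
            f"<loc>{escape(link, QUOTE_ENTITIES)}</loc>",
            f"<lastmod>{lastmod}</lastmod>",
            "<changefreq>monthly</changefreq>",
            "<priority>0.5</priority>",
            "</url>",
        ]
    lines.append("</urlset>")
    return "\n".join(lines)


# Hypothetical in-memory database standing in for your own tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (linkURL TEXT, date TEXT)")
conn.executemany("INSERT INTO YourTable VALUES (?, ?)", [
    ("http://www.example.com/?page=2&alpha=No", "2007-03-15 10:30:00"),
    ("http://www.example.com/about.html", "2007-01-02 08:00:00"),
])
print(sitemap_from_db(conn))
```

The query passes the limit as a bound parameter rather than pasting it into the SQL string. The same habit applies to any values that come from user input.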
For a good reference on all the material discussed here, please visit:
http://www.sitemaps.org/protocol.php
In the next issue, titled "Google Sitemaps (Pt3) Structuring Large Sitemaps", I will go in depth on how to structure multiple sitemaps to index a larger, more complicated web site.
About The Author:
Martin Lemieux, president of the Adcidia network, has over 16 years of experience online while maintaining a network of over 35 web sites. For more tips like these, please visit:
Martin's Blog:
http://www.MartinLemieux.ca/
Internet Marketing Blog:
http://www.MartinLemieux.ca/internet-marketing/
Martin's RSS Feed:
http://www.MartinLemieux.ca/xml/
© Copyright, Martin Lemieux - All Rights Reserved. Reprints Accepted.