An XML sitemap is a file that lists all the important URLs on your website that you want search engines to crawl and index. It acts as a communication channel between you and Google, helping ensure your content gets discovered and indexed efficiently.
What is an XML Sitemap?
An XML sitemap is a structured XML file that provides search engines with a list of URLs on your website along with additional metadata about each URL. This metadata can include when the page was last updated, how important it is relative to other pages, and how frequently it changes.
Here's what a basic sitemap entry looks like:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://yoursite.com/page</loc>
    <lastmod>2026-04-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
Why Do You Need One?
While Google can discover pages through links, a sitemap provides several important benefits:
Improved Discovery
- Ensures Google knows about all your pages
- Particularly useful for new sites with few backlinks
- Helps pages with weak internal linking get found
Faster Indexing
- Signals new content immediately
- Indicates content changes through lastmod dates
- Prioritizes crawling of important pages
Rich Information
- Provides metadata search engines can use
- Supports specialized content (video, images, news)
- Enables language/region targeting with hreflang
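As an illustration of specialized content, Google's image sitemap extension adds an extra namespace and per-URL image entries. The URLs below are placeholders:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://yoursite.com/gallery</loc>
    <image:image>
      <image:loc>https://yoursite.com/photos/skyline.jpg</image:loc>
    </image:image>
  </url>
</urlset>
```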
"A sitemap is a way of organizing a website, identifying the URLs and the data under each section. It is especially beneficial for large websites where the crawlers might overlook new or recently updated pages."
Sitemaps.org Protocol
Sitemap Structure Explained
Understanding the XML structure helps you create better sitemaps:
Required Elements
| Element | Description | Required |
|---|---|---|
| `<urlset>` | Parent element encapsulating all URLs | Yes |
| `<url>` | Parent element for each URL entry | Yes |
| `<loc>` | The full URL of the page | Yes |
Optional Elements
| Element | Description | Recommended |
|---|---|---|
| `<lastmod>` | Date the page was last modified (W3C format) | Highly recommended |
| `<changefreq>` | How often the page changes (daily, weekly, etc.) | Optional (often ignored) |
| `<priority>` | Relative importance from 0.0 to 1.0 | Optional (often ignored) |
How to Create a Sitemap
Option 1: CMS Plugins
Most content management systems have built-in or plugin sitemap generation:
- WordPress: Yoast SEO, Rank Math, All in One SEO
- Shopify: Built-in sitemap at /sitemap.xml
- Wix: Automatically generated
- Squarespace: Auto-generated sitemap
- Drupal: XML Sitemap module
Option 2: Sitemap Generators
For static sites or custom needs:
- Screaming Frog SEO Spider
- XML-sitemaps.com (online generator)
- Sitemap Generator Pro
Option 3: Manual Creation
For small sites, you can create sitemaps manually:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://yoursite.com/</loc>
    <lastmod>2026-04-02</lastmod>
  </url>
  <url>
    <loc>https://yoursite.com/about</loc>
    <lastmod>2026-03-15</lastmod>
  </url>
  <url>
    <loc>https://yoursite.com/contact</loc>
    <lastmod>2026-02-20</lastmod>
  </url>
</urlset>
Option 4: Programmatic Generation
For dynamic sites, generate sitemaps with code. Here's a simple Python example:
from xml.etree.ElementTree import Element, SubElement, tostring

def generate_sitemap(urls):
    """Build a sitemap from a list of dicts with 'loc' and optional 'lastmod'."""
    urlset = Element('urlset')
    urlset.set('xmlns', 'http://www.sitemaps.org/schemas/sitemap/0.9')
    for url_data in urls:
        url = SubElement(urlset, 'url')
        loc = SubElement(url, 'loc')
        loc.text = url_data['loc']
        if 'lastmod' in url_data:
            lastmod = SubElement(url, 'lastmod')
            lastmod.text = url_data['lastmod']
    # tostring(..., encoding='unicode') omits the declaration, so prepend it.
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + tostring(urlset, encoding='unicode')
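Putting the generator to work might look like this; the page list is a placeholder standing in for whatever your database or CMS returns. This sketch repeats the function so it runs on its own:

```python
from xml.etree.ElementTree import Element, SubElement, tostring

def generate_sitemap(urls):
    # Same logic as above: build <urlset> and append one <url> per entry.
    urlset = Element('urlset')
    urlset.set('xmlns', 'http://www.sitemaps.org/schemas/sitemap/0.9')
    for url_data in urls:
        url = SubElement(urlset, 'url')
        SubElement(url, 'loc').text = url_data['loc']
        if 'lastmod' in url_data:
            SubElement(url, 'lastmod').text = url_data['lastmod']
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + tostring(urlset, encoding='unicode')

# Hypothetical pages; in practice these come from your content store.
pages = [
    {'loc': 'https://yoursite.com/', 'lastmod': '2026-04-02'},
    {'loc': 'https://yoursite.com/about'},
]
xml = generate_sitemap(pages)
with open('sitemap.xml', 'w', encoding='utf-8') as f:
    f.write(xml)
```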
Complement Your Sitemap with RSS
While sitemaps are great for discovery, RSS AutoIndex provides real-time notification to Google when you publish new content, ensuring the fastest possible indexation.
Optimization Best Practices
1. Include Only Indexable URLs
Your sitemap should only contain URLs that:
- Return 200 status code
- Are not blocked by robots.txt
- Don't have noindex tags
- Are the canonical versions
- Have valuable content
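One way to enforce these rules at generation time is a simple filter over your crawl metadata. The field names below are assumptions for illustration, not a standard schema:

```python
def is_sitemap_eligible(page):
    """Return True only for pages worth listing, per the checklist above."""
    return (
        page.get('status') == 200
        and not page.get('blocked_by_robots', False)
        and not page.get('noindex', False)
        # Only the canonical version of a page belongs in the sitemap.
        and page.get('canonical', page['url']) == page['url']
    )

# Hypothetical crawl data: a live page, a redirect, and a sorted variant.
pages = [
    {'url': 'https://yoursite.com/a', 'status': 200},
    {'url': 'https://yoursite.com/old', 'status': 301},
    {'url': 'https://yoursite.com/b?sort=price', 'status': 200,
     'canonical': 'https://yoursite.com/b'},
]
eligible = [p['url'] for p in pages if is_sitemap_eligible(p)]
# Only the live, canonical page survives the filter.
```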
2. Use Accurate lastmod Dates
Only update lastmod when content actually changes:
- Don't update for minor template changes
- Do update for content additions or modifications
- Use the actual date, not the current date
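One practical way to keep lastmod honest is to derive it from the content itself: store a content hash alongside each URL and bump the date only when the hash changes. A minimal sketch (the persistence layer is assumed):

```python
import hashlib
import datetime

def updated_lastmod(content, previous):
    """previous: dict with 'hash' and 'lastmod' from the last run, or None."""
    digest = hashlib.sha256(content.encode('utf-8')).hexdigest()
    if previous and previous['hash'] == digest:
        # Content unchanged: keep the old date instead of stamping today's.
        return {'hash': digest, 'lastmod': previous['lastmod']}
    # Content is new or changed: record today's date.
    return {'hash': digest, 'lastmod': datetime.date.today().isoformat()}

state = updated_lastmod('<h1>About us</h1>', None)
same = updated_lastmod('<h1>About us</h1>', state)
# Re-generating with identical content leaves lastmod untouched.
```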
3. Maintain Consistency
- Use the same URL format throughout (trailing slash or not)
- Match your canonical URLs exactly
- Use the same protocol as your site (https)
4. Keep It Clean
- Remove URLs returning errors
- Remove redirected URLs
- Exclude paginated URLs with no standalone search value (Google no longer uses rel="next"/"prev" as an indexing signal)
- Exclude filtered/sorted variations
5. Validate Your Sitemap
Before submitting, validate with:
- Google Search Console sitemap testing
- XML Sitemap Validator online tools
- Your browser (should display XML structure)
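A minimal local check is to confirm the file parses and uses the expected namespace before you submit it. This is only a well-formedness sketch, not a full protocol validator:

```python
from xml.etree import ElementTree

SITEMAP_NS = '{http://www.sitemaps.org/schemas/sitemap/0.9}'

def check_sitemap(xml_text):
    """Raise on malformed XML or a wrong root; return the <loc> values."""
    root = ElementTree.fromstring(xml_text)  # raises ParseError if broken
    if root.tag != SITEMAP_NS + 'urlset':
        raise ValueError('root element is not a sitemap <urlset>')
    return [loc.text for loc in root.iter(SITEMAP_NS + 'loc')]

sample = '''<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://yoursite.com/</loc></url>
</urlset>'''
locs = check_sitemap(sample)
```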
Using Sitemap Index Files
For sites that exceed the per-file limits (50,000 URLs or 50 MB uncompressed), use a sitemap index:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://yoursite.com/sitemap-posts.xml</loc>
    <lastmod>2026-04-02</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://yoursite.com/sitemap-pages.xml</loc>
    <lastmod>2026-03-28</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://yoursite.com/sitemap-products.xml</loc>
    <lastmod>2026-04-01</lastmod>
  </sitemap>
</sitemapindex>
Sitemap Organization Strategies
- By content type: posts, pages, products, categories
- By date: sitemap-2025.xml, sitemap-2026.xml
- By section: blog, shop, help
- By language: sitemap-en.xml, sitemap-es.xml
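Whichever strategy you pick, the split itself is easy to automate: partition the URL list into chunks under the 50,000-URL limit and emit a matching index entry per chunk. The base URL and file names below are illustrative:

```python
def plan_sitemap_files(urls, base='https://yoursite.com', per_file=50000):
    """Split urls into per-file chunks; return (chunks, index file URLs)."""
    chunks = [urls[i:i + per_file] for i in range(0, len(urls), per_file)]
    index_locs = ['%s/sitemap-%d.xml' % (base, n + 1)
                  for n in range(len(chunks))]
    return chunks, index_locs

# 120,000 hypothetical URLs split into three files of at most 50,000 each.
urls = ['https://yoursite.com/p%d' % i for i in range(120000)]
chunks, index_locs = plan_sitemap_files(urls)
```

Each chunk would then be fed to a generator like the one shown earlier, and the index file would list the URLs in `index_locs`.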
Common Mistakes to Avoid
1. Including Non-Canonical URLs
Don't include URLs that have canonical tags pointing elsewhere. This creates confusion for search engines.
2. Listing Blocked URLs
Including URLs blocked by robots.txt creates contradictory signals. Remove blocked URLs from sitemaps.
3. Incorrect lastmod Dates
Setting lastmod to the current date every time the sitemap is generated trains Google to ignore this signal.
4. Including Low-Value Pages
Don't include:
- Thin content pages
- Duplicate content
- Pages with no search value
- Internal search results
- Login/account pages
5. Forgetting to Submit
Creating a sitemap isn't enough. You must also:
- Submit to Google Search Console
- Reference in robots.txt
- Keep it updated
Ongoing Maintenance
A sitemap isn't a one-time task. To maintain it:
- Automate updates: Use dynamic generation that updates with your content
- Monitor in Search Console: Check the Sitemaps report regularly
- Audit periodically: Review URLs quarterly for accuracy
- Track discovered vs indexed: Investigate large discrepancies
- Update after major changes: Site migrations, redesigns, etc.
Reference in robots.txt
Always reference your sitemap in robots.txt:
User-agent: *
Allow: /
Sitemap: https://yoursite.com/sitemap.xml
With our RSS indexing solution, your content is automatically submitted to search engines.
Conclusion
A well-maintained XML sitemap is a fundamental part of technical SEO. It helps search engines discover your content efficiently and ensures your important pages get the attention they deserve.
Key takeaways:
- Include only indexable, canonical URLs
- Use accurate lastmod dates that reflect real changes
- Organize large sites with sitemap index files
- Submit to Search Console and reference in robots.txt
- Maintain and monitor your sitemap regularly
- Validate before submission
Beyond Sitemaps: Instant Indexation
While sitemaps help discovery, RSS AutoIndex takes it further by actively notifying search engines the moment you publish new content.