
How to Create a Perfect XML Sitemap for Google Indexation

An XML sitemap is your website's roadmap for search engines. Learn how to create, optimize, and maintain your sitemap for maximum SEO benefit.

An XML sitemap is a file that lists all the important URLs on your website that you want search engines to crawl and index. It acts as a communication channel between you and Google, helping ensure your content gets discovered and indexed efficiently.

What is an XML Sitemap?

An XML sitemap is a structured XML file that provides search engines with a list of URLs on your website along with additional metadata about each URL. This metadata can include when the page was last updated, how important it is relative to other pages, and how frequently it changes.

Here's what a basic sitemap entry looks like:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://yoursite.com/page</loc>
    <lastmod>2026-04-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>

A single sitemap file can hold a maximum of 50,000 URLs. Larger sites need multiple sitemaps with a sitemap index.

Why Do You Need One?

While Google can discover pages through links, a sitemap provides several important benefits:

Improved Discovery

  • Ensures Google knows about all your pages
  • Particularly useful for new sites with few backlinks
  • Helps pages with weak internal linking get found

Faster Indexing

  • Tells Google about new URLs without waiting for them to be found through links
  • Indicates content changes through lastmod dates
  • Points crawlers at the pages you consider important

Rich Information

  • Provides metadata search engines can use
  • Supports specialized content (video, images, news)
  • Enables language/region targeting with hreflang

"A sitemap is a way of organizing a website, identifying the URLs and the data under each section. It is especially beneficial for large websites where the crawlers might overlook new or recently updated pages."

Sitemaps.org Protocol

Sitemap Structure Explained

Understanding the XML structure helps you create better sitemaps:

Required Elements

Element     Description                              Required
<urlset>    Parent element encapsulating all URLs    Yes
<url>       Parent element for each URL entry        Yes
<loc>       The full URL of the page                 Yes

Optional Elements

Element        Description                                         Recommended
<lastmod>      Date the page was last modified (W3C format)        Highly recommended
<changefreq>   How often the page changes (daily, weekly, etc.)    Optional (often ignored)
<priority>     Relative importance 0.0 to 1.0                      Optional (often ignored)

Google ignores the changefreq and priority values. Focus on accurate lastmod dates, which Google uses when deciding what to recrawl, provided they are consistently and verifiably accurate.
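
Since lastmod is the value that matters, it helps to get its format right. W3C Datetime accepts either a plain date (YYYY-MM-DD) or a full timestamp with a timezone offset. The following is a minimal sketch, assuming your CMS hands you a Python datetime; the helper name to_lastmod and the choice to treat naive datetimes as UTC are illustrative assumptions, not part of any standard.

```python
from datetime import datetime, timezone

def to_lastmod(modified: datetime, date_only: bool = True) -> str:
    """Format a modification time as a W3C Datetime string for <lastmod>."""
    if date_only:
        return modified.date().isoformat()  # YYYY-MM-DD
    if modified.tzinfo is None:
        # Assumption: naive datetimes are already in UTC
        modified = modified.replace(tzinfo=timezone.utc)
    # Drop microseconds so the value looks like 2026-04-01T09:30:00+00:00
    return modified.replace(microsecond=0).isoformat()

print(to_lastmod(datetime(2026, 4, 1, 9, 30)))
print(to_lastmod(datetime(2026, 4, 1, 9, 30), date_only=False))
```

The date-only form is usually enough for pages that change at most a few times a week. The full timestamp is worth the extra care only if you publish several updates per day.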

How to Create a Sitemap

Option 1: CMS Plugins

Most content management systems have built-in or plugin sitemap generation:

  • WordPress: Yoast SEO, Rank Math, All in One SEO
  • Shopify: Built-in sitemap at /sitemap.xml
  • Wix: Automatically generated
  • Squarespace: Auto-generated sitemap
  • Drupal: XML Sitemap module

Option 2: Sitemap Generators

For static sites or custom needs:

  • Screaming Frog SEO Spider
  • XML-sitemaps.com (online generator)
  • Sitemap Generator Pro

Option 3: Manual Creation

For small sites, you can create sitemaps manually:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://yoursite.com/</loc>
    <lastmod>2026-04-02</lastmod>
  </url>
  <url>
    <loc>https://yoursite.com/about</loc>
    <lastmod>2026-03-15</lastmod>
  </url>
  <url>
    <loc>https://yoursite.com/contact</loc>
    <lastmod>2026-02-20</lastmod>
  </url>
</urlset>

Option 4: Programmatic Generation

For dynamic sites, generate sitemaps with code. Here's a simple Python example:

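from xml.etree.ElementTree import Element, SubElement, tostring

SITEMAP_NS = 'http://www.sitemaps.org/schemas/sitemap/0.9'

def generate_sitemap(urls):
    """Build sitemap XML from a list of dicts with 'loc' and optional 'lastmod'."""
    urlset = Element('urlset', xmlns=SITEMAP_NS)

    for url_data in urls:
        url = SubElement(urlset, 'url')
        SubElement(url, 'loc').text = url_data['loc']
        # lastmod is optional: emit it only when the caller supplies one
        if 'lastmod' in url_data:
            SubElement(url, 'lastmod').text = url_data['lastmod']

    # tostring() omits the XML declaration for encoding='unicode', so prepend it
    body = tostring(urlset, encoding='unicode')
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body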

Complement Your Sitemap with RSS

While sitemaps are great for discovery, RSS AutoIndex provides real-time notification to Google when you publish new content, ensuring the fastest possible indexation.


Optimization Best Practices

1. Include Only Indexable URLs

Your sitemap should only contain URLs that:

  • Return 200 status code
  • Are not blocked by robots.txt
  • Don't have noindex tags
  • Are the canonical versions
  • Have valuable content
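
The first four checks above can be automated. Here is a rough sketch, assuming you already have crawl records as dictionaries with url, status, noindex, and canonical keys (a hypothetical shape chosen for illustration). It uses the standard library's robots.txt parser. Judging whether content is valuable still needs a human.

```python
from urllib.robotparser import RobotFileParser

def is_indexable(page: dict, robots: RobotFileParser) -> bool:
    """Return True if a crawled page record belongs in the sitemap."""
    url = page["url"]
    return (
        page["status"] == 200                   # live page, not a redirect or error
        and robots.can_fetch("*", url)          # not blocked by robots.txt
        and not page.get("noindex", False)      # no noindex directive
        and page.get("canonical", url) == url   # the page is its own canonical
    )

# Hypothetical robots.txt rules and crawl records, for illustration only
robots = RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /admin/"])

pages = [
    {"url": "https://yoursite.com/", "status": 200},
    {"url": "https://yoursite.com/old", "status": 301},
    {"url": "https://yoursite.com/admin/login", "status": 200},
    {"url": "https://yoursite.com/draft", "status": 200, "noindex": True},
    {"url": "https://yoursite.com/shoes?color=red", "status": 200,
     "canonical": "https://yoursite.com/shoes"},
]

sitemap_urls = [p["url"] for p in pages if is_indexable(p, robots)]
print(sitemap_urls)
```

Only the homepage survives here: the other four records each fail exactly one of the checks.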

2. Use Accurate lastmod Dates

Only update lastmod when content actually changes:

  • Don't update for minor template changes
  • Do update for content additions or modifications
  • Use the actual date, not the current date
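
One way to follow these rules is to store a hash of each page's main content and bump lastmod only when that hash changes. This is a sketch under assumptions: the record shape and the update_lastmod helper are hypothetical, and content is assumed to be the extracted body text rather than the full rendered HTML, so template tweaks don't count as changes.

```python
import hashlib

def update_lastmod(record: dict, content: str, today: str) -> dict:
    """Bump lastmod only when the page's main content actually changed."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    if record.get("content_hash") == digest:
        return record  # unchanged: keep the existing lastmod
    return {**record, "content_hash": digest, "lastmod": today}

# Hypothetical stored record for one page
record = {
    "loc": "https://yoursite.com/about",
    "lastmod": "2026-03-15",
    "content_hash": hashlib.sha256(b"About us").hexdigest(),
}

same = update_lastmod(record, "About us", today="2026-04-02")
changed = update_lastmod(record, "About us and our team", today="2026-04-02")
print(same["lastmod"], changed["lastmod"])
```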

3. Maintain Consistency

  • Use the same URL format throughout (trailing slash or not)
  • Match your canonical URLs exactly
  • Use the same protocol as your site (https)
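
A small normalization step before writing the sitemap can enforce these rules. The sketch below assumes the site is served over https and that you pick one trailing-slash policy for every page URL. The function name normalize_url is illustrative, and the trailing-slash option is too naive for file-like paths such as /feed.xml.

```python
from urllib.parse import urlsplit, urlunsplit

def normalize_url(url: str, trailing_slash: bool = False) -> str:
    """Normalize a URL to one style: https, lowercase host, one slash policy."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if path != "/":
        path = path.rstrip("/") + ("/" if trailing_slash else "")
    # Force https, lowercase the host, and drop any fragment
    return urlunsplit(("https", parts.netloc.lower(), path, parts.query, ""))

print(normalize_url("http://YourSite.com/about/"))
print(normalize_url("https://yoursite.com"))
```

Whatever policy you choose, the output should match the canonical URL declared on the page itself. Otherwise the sitemap and the page send conflicting signals.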

4. Keep It Clean

  • Remove URLs returning errors
  • Remove redirected URLs
  • Exclude paginated URLs you don't want indexed (Google no longer uses rel=next/prev as an indexing signal)
  • Exclude filtered/sorted variations
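
Filtered, sorted, and paginated variations usually show up as query parameters, so they can be screened out with a simple check. This is a sketch only: the parameter names in EXCLUDED_PARAMS are assumptions and will differ from site to site.

```python
from urllib.parse import urlsplit, parse_qs

# Assumption: these parameter names mark filtered, sorted, or paginated views
EXCLUDED_PARAMS = {"sort", "order", "filter", "color", "page"}

def is_clean(url: str) -> bool:
    """Return False for URLs whose query string carries an excluded parameter."""
    params = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return EXCLUDED_PARAMS.isdisjoint(params)

urls = [
    "https://yoursite.com/shoes",
    "https://yoursite.com/shoes?sort=price",
    "https://yoursite.com/shoes?color=red&size=9",
    "https://yoursite.com/blog?page=2",
]
print([u for u in urls if is_clean(u)])
```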

5. Validate Your Sitemap

Check your sitemap before and after submitting it:

  • The Sitemaps report in Google Search Console (shows fetch and parsing errors once the sitemap is submitted)
  • XML Sitemap Validator online tools
  • Your browser (should display XML structure)
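
A few basic checks can also be scripted before you submit anything. The sketch below uses Python's standard XML parser to confirm the file is well-formed, uses the sitemap namespace, stays within the 50,000 URL limit, and lists only absolute URLs. The function name check_sitemap is illustrative, and this is not a full schema validation.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
MAX_URLS = 50_000

def check_sitemap(xml_text: str) -> list[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    try:
        root = ET.fromstring(xml_text.encode("utf-8"))
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    problems = []
    if root.tag != f"{{{NS['sm']}}}urlset":
        problems.append("root element is not <urlset> in the sitemap namespace")
    entries = root.findall("sm:url", NS)
    if len(entries) > MAX_URLS:
        problems.append(f"{len(entries)} URLs exceeds the {MAX_URLS} limit")
    for entry in entries:
        loc = (entry.findtext("sm:loc", default="", namespaces=NS) or "").strip()
        parts = urlsplit(loc)
        if not (parts.scheme in ("http", "https") and parts.netloc):
            problems.append(f"missing or relative <loc>: {loc!r}")
    return problems

sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://yoursite.com/</loc></url>
  <url><loc>/about</loc></url>
</urlset>"""
print(check_sitemap(sample))
```

The sample deliberately contains one relative URL, so the function reports a single problem for the /about entry.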

Using Sitemap Index Files

For sites with more than 50,000 URLs, use a sitemap index:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://yoursite.com/sitemap-posts.xml</loc>
    <lastmod>2026-04-02</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://yoursite.com/sitemap-pages.xml</loc>
    <lastmod>2026-03-28</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://yoursite.com/sitemap-products.xml</loc>
    <lastmod>2026-04-01</lastmod>
  </sitemap>
</sitemapindex>

Sitemap Organization Strategies

  • By content type: posts, pages, products, categories
  • By date: sitemap-2025.xml, sitemap-2026.xml
  • By section: blog, shop, help
  • By language: sitemap-en.xml, sitemap-es.xml
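
Each individual sitemap in an index can still contain at most 50,000 URLs and must be no larger than 50MB uncompressed. Gzip compression saves bandwidth but does not raise that limit: the file must still fit within 50MB once decompressed. The index file itself can list up to 50,000 sitemaps.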
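
Splitting a large URL list into child sitemaps and building the index can be sketched as follows. The file naming scheme and the helper names chunk and build_index are assumptions for illustration. The sketch enforces only the URL count, not the file size limit.

```python
from xml.etree.ElementTree import Element, SubElement, tostring

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file limit from the sitemap protocol

def chunk(urls: list[str], size: int = MAX_URLS) -> list[list[str]]:
    """Split a URL list into consecutive groups of at most `size` entries."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]

def build_index(sitemap_locs: list[str]) -> str:
    """Build a <sitemapindex> document pointing at each child sitemap."""
    index = Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in sitemap_locs:
        SubElement(SubElement(index, "sitemap"), "loc").text = loc
    body = tostring(index, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

# 120,000 hypothetical product URLs become three child sitemaps
urls = [f"https://yoursite.com/product/{n}" for n in range(120_000)]
groups = chunk(urls)

# Hypothetical naming scheme: sitemap-products-1.xml, sitemap-products-2.xml, ...
locs = [f"https://yoursite.com/sitemap-products-{i}.xml"
        for i in range(1, len(groups) + 1)]

print([len(g) for g in groups])
print(build_index(locs))
```

Each group would then be written out as its own urlset file at the matching location, with the index file submitted in place of the individual sitemaps.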

Common Mistakes to Avoid

1. Including Non-Canonical URLs

Don't include URLs that have canonical tags pointing elsewhere. This creates confusion for search engines.

2. Listing Blocked URLs

Including URLs blocked by robots.txt creates contradictory signals. Remove blocked URLs from sitemaps.

3. Incorrect lastmod Dates

Setting lastmod to the current date every time the sitemap is generated trains Google to ignore this signal.

4. Including Low-Value Pages

Don't include:

  • Thin content pages
  • Duplicate content
  • Pages with no search value
  • Internal search results
  • Login/account pages

5. Forgetting to Submit

Creating a sitemap isn't enough. You also need to:

  • Submit to Google Search Console
  • Reference in robots.txt
  • Keep it updated

Ongoing Maintenance

A sitemap isn't a one-time task. Maintain it by:

  1. Automate updates: Use dynamic generation that updates with your content
  2. Monitor in Search Console: Check the Sitemaps report regularly
  3. Audit periodically: Review URLs quarterly for accuracy
  4. Track discovered vs indexed: Investigate large discrepancies
  5. Update after major changes: Site migrations, redesigns, etc.

Reference in robots.txt

Always reference your sitemap in robots.txt:

User-agent: *
Allow: /

Sitemap: https://yoursite.com/sitemap.xml
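
To confirm that the reference is picked up, you can parse the file with Python's standard robots.txt parser (the site_maps() method needs Python 3.8 or later). This sketch parses an inline copy of the file. For a live site you would call set_url() and read() on the parser instead.

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Allow: /

Sitemap: https://yoursite.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# site_maps() returns the listed sitemap URLs, or None when there are none
sitemaps = parser.site_maps() or []
print(sitemaps)
```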

Conclusion

A well-maintained XML sitemap is a fundamental part of technical SEO. It helps search engines discover your content efficiently and ensures your important pages get the attention they deserve.

Key takeaways:

  • Include only indexable, canonical URLs
  • Use accurate lastmod dates that reflect real changes
  • Organize large sites with sitemap index files
  • Submit to Search Console and reference in robots.txt
  • Maintain and monitor your sitemap regularly
  • Validate before submission

Beyond Sitemaps: Instant Indexation

While sitemaps help discovery, RSS AutoIndex takes it further by actively notifying search engines the moment you publish new content.
