01 What an XML sitemap is for
An XML sitemap is a structured list of the URLs you want Google to crawl and index. It's not a magic ranking lever — including a URL in your sitemap doesn't make it rank — but it's a discovery aid that tells search engines, "These are the pages that matter on this site."
For small, well-linked sites, sitemaps are nice-to-have. For large sites, e-commerce catalogues, news publishers and any site with deep, sparsely-linked content, they're essential.
02 Anatomy of a sitemap
The simplest valid XML sitemap looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2026-04-12</lastmod>
  </url>
  <url>
    <loc>https://example.com/products</loc>
    <lastmod>2026-04-15</lastmod>
  </url>
</urlset>
Each <url> entry requires a <loc> (the URL itself) and can optionally include <lastmod>, <changefreq>, and <priority>. We'll come to which of those still matter in a moment.
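If you generate the file yourself rather than relying on a CMS plugin, the structure above can be produced with a few lines of Python. This is a minimal sketch using only the standard library; the function name build_sitemap and the (url, lastmod) input shape are assumptions made for illustration, not part of the sitemap protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML as bytes.

    `entries` is an iterable of (loc, lastmod) pairs; lastmod may be None,
    in which case the optional <lastmod> element is left out.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as & in the URL automatically.
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    # xml_declaration requires Python 3.8 or later.
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    xml_bytes = build_sitemap([
        ("https://example.com/", "2026-04-12"),
        ("https://example.com/products", "2026-04-15"),
    ])
    print(xml_bytes.decode("utf-8"))
```

Building the tree with an XML library instead of string concatenation means URLs containing characters like & are escaped correctly, which is an easy thing to get wrong by hand.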
03 Size and URL limits
Sitemap files have hard limits set by the protocol:
- 50,000 URLs maximum per sitemap file.
- 50 MB maximum uncompressed file size.
- UTF-8 encoded.
- URLs must be absolute, including the scheme.
If your site has more than 50,000 URLs, you need a sitemap index: a meta-sitemap that lists multiple individual sitemap files. The index has the same 50,000-entry limit, so a single sitemap index can theoretically cover 2.5 billion URLs (50,000 × 50,000). (You'd be surprised how many enterprise e-commerce sites genuinely need this.)
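The arithmetic is easy to check, and the same limit drives how you split a long URL list into files. The sketch below is illustrative only; the helper names chunk_urls and files_needed are made up for this example.

```python
MAX_URLS_PER_FILE = 50_000            # protocol limit per sitemap file
MAX_FILES_PER_INDEX = 50_000          # a sitemap index has the same entry limit
MAX_BYTES_UNCOMPRESSED = 50 * 1024 * 1024  # 50 MB


def chunk_urls(urls, size=MAX_URLS_PER_FILE):
    """Split a list of URLs into consecutive chunks of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def files_needed(url_count, size=MAX_URLS_PER_FILE):
    """Number of sitemap files required for `url_count` URLs (ceiling division)."""
    return -(-url_count // size)


if __name__ == "__main__":
    urls = [f"https://example.com/p/{n}" for n in range(120_000)]
    print([len(chunk) for chunk in chunk_urls(urls)])
    print(files_needed(len(urls)))
    print(MAX_URLS_PER_FILE * MAX_FILES_PER_INDEX)  # capacity of one index
```

Note that the URL count is only one of the two limits: a file with long URLs can reach the 50 MB size cap before it reaches 50,000 entries, so it is worth checking the byte length of each generated file as well.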
04 lastmod, changefreq, priority — what still matters
Of the three optional fields, only one really matters today.
lastmod — yes, use this
Google has been clear: lastmod is a useful signal when it's accurate. If you stamp every URL with today's date on every sitemap regeneration, Google stops trusting the field, because the dates no longer match when the pages actually changed. Use the timestamp of the last meaningful modification.
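One way to keep lastmod honest is to bump it only when the page's main content actually changes, rather than whenever the sitemap is rebuilt. The sketch below assumes you store a content fingerprint and the previous lastmod per URL; the function names and that storage model are hypothetical, and what counts as "main content" is up to your own pipeline.

```python
import hashlib
from datetime import datetime, timezone


def content_fingerprint(main_content: str) -> str:
    """Hash the part of the page that counts as a meaningful change."""
    return hashlib.sha256(main_content.encode("utf-8")).hexdigest()


def next_lastmod(stored_fingerprint, stored_lastmod, main_content, now):
    """Return (fingerprint, lastmod), moving lastmod only if content changed."""
    new_fingerprint = content_fingerprint(main_content)
    if new_fingerprint == stored_fingerprint:
        return stored_fingerprint, stored_lastmod
    return new_fingerprint, now


def lastmod_value(modified_at: datetime) -> str:
    """Format a timezone-aware datetime as a W3C Datetime string in UTC."""
    if modified_at.tzinfo is None:
        raise ValueError("modified_at must be timezone-aware")
    utc = modified_at.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S+00:00")


if __name__ == "__main__":
    published = datetime(2026, 4, 12, 9, 0, tzinfo=timezone.utc)
    rebuild = datetime(2026, 4, 15, 3, 0, tzinfo=timezone.utc)
    fp = content_fingerprint("Original article body")
    # Rebuilding the sitemap without editing the page keeps the old date.
    _, lastmod = next_lastmod(fp, published, "Original article body", rebuild)
    print(lastmod_value(lastmod))
```

Hashing only the main content (not the full HTML) avoids treating a changed footer or a rotated ad slot as a modification.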
changefreq — Google ignores this
It was once used as a hint about how often to recrawl a page, but Google no longer uses it. Bing and other crawlers may still consider it. It's generally not worth the effort.
priority — Google ignores this
The priority field (0.0 to 1.0) was supposed to tell crawlers which URLs are most important. Google has explicitly confirmed it ignores priority. Skip it.
05 When and how to split sitemaps
For sites approaching the 50,000-URL limit, split sitemaps by content type:
- /sitemap-pages.xml: static pages
- /sitemap-products.xml: products
- /sitemap-categories.xml: category pages
- /sitemap-blog.xml: blog posts
Then a sitemap index references them. Two entries are shown here; the others follow the same pattern:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-products.xml</loc>
    <lastmod>2026-04-15</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-blog.xml</loc>
    <lastmod>2026-04-12</lastmod>
  </sitemap>
</sitemapindex>
Splitting also helps you diagnose indexing problems: Search Console reports indexed-vs-submitted counts per sitemap, so you can see exactly which content type is having trouble getting indexed.
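A split like this can be automated by grouping URLs on their path and then writing an index over the resulting files. The following is a rough sketch, not a prescribed method: the path prefixes in SECTION_PREFIXES are hypothetical and need to match your own URL structure, and the sitemap-<name>.xml naming simply mirrors the file names used above.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Hypothetical mapping from path prefix to sitemap name; adjust to your site.
SECTION_PREFIXES = {
    "/products/": "products",
    "/categories/": "categories",
    "/blog/": "blog",
}


def section_for(url):
    """Pick the sitemap a URL belongs to, defaulting to 'pages'."""
    path = urlsplit(url).path
    for prefix, name in SECTION_PREFIXES.items():
        if path.startswith(prefix):
            return name
    return "pages"


def group_by_section(urls):
    """Group URLs into {section_name: [urls]}."""
    groups = {}
    for url in urls:
        groups.setdefault(section_for(url), []).append(url)
    return groups


def build_index(base_url, section_lastmods):
    """Return sitemap index XML bytes for {section_name: lastmod} entries."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for name, lastmod in section_lastmods.items():
        sitemap = ET.SubElement(root, "sitemap")
        ET.SubElement(sitemap, "loc").text = f"{base_url}/sitemap-{name}.xml"
        ET.SubElement(sitemap, "lastmod").text = lastmod
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    groups = group_by_section([
        "https://example.com/products/widget",
        "https://example.com/blog/hello",
        "https://example.com/about",
    ])
    print(groups)
    print(build_index("https://example.com",
                      {"products": "2026-04-15", "blog": "2026-04-12"}).decode())
```

If a single section grows past 50,000 URLs, it would need to be split further (for example sitemap-products-1.xml, sitemap-products-2.xml), which this sketch does not handle.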
06 Submitting your sitemap
Three places to register your sitemap:
- Reference it in robots.txt with a Sitemap: directive at the top.
- Submit in Google Search Console under Sitemaps. Resubmit when the file's structure changes (not on every URL update).
- Submit in Bing Webmaster Tools. Same idea, separate property.
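To confirm that the robots.txt reference is actually in place, you can parse the file and list the sitemaps it declares. This sketch uses Python's standard urllib.robotparser (the site_maps() method needs Python 3.8 or later); the wrapper name sitemaps_in_robots and the sample robots.txt content are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_in_robots(robots_txt: str):
    """Return the sitemap URLs declared in a robots.txt body (may be empty)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap: lines are present.
    return parser.site_maps() or []


if __name__ == "__main__":
    sample = """\
Sitemap: https://example.com/sitemap-index.xml

User-agent: *
Disallow: /cart/
"""
    print(sitemaps_in_robots(sample))
```

In a real check you would fetch https://your-domain/robots.txt first and pass the response body to this function.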
07 Common mistakes
- Including non-canonical URLs. Sitemaps should contain only canonical, indexable URLs — the same ones you want indexed. Including redirects, 404s, or noindexed pages confuses Google and gets your sitemap flagged.
- Faked lastmod dates. If every URL's lastmod equals "today," Google stops trusting the field. Stamp the real modification date.
- Forgetting to update. If your CMS doesn't auto-regenerate the sitemap on publish, you're shipping new content invisibly.
- Wrong absolute URLs. Mixing HTTP and HTTPS, or www and non-www, fragments your sitemap. Stick to one canonical host.
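The last mistake in particular is easy to catch automatically. As a minimal sketch, the function below flags any sitemap URL whose scheme or host differs from the one canonical origin you expect; the name off_canonical_urls and the example origin are hypothetical.

```python
from urllib.parse import urlsplit


def off_canonical_urls(urls, canonical_origin):
    """Return URLs whose scheme or host differs from `canonical_origin`.

    `canonical_origin` is something like "https://www.example.com".
    """
    expected = urlsplit(canonical_origin)
    expected_key = (expected.scheme.lower(), expected.netloc.lower())
    mismatched = []
    for url in urls:
        parts = urlsplit(url)
        if (parts.scheme.lower(), parts.netloc.lower()) != expected_key:
            mismatched.append(url)
    return mismatched


if __name__ == "__main__":
    urls = [
        "https://www.example.com/",
        "http://www.example.com/about",     # wrong scheme
        "https://example.com/products",     # missing www
    ]
    print(off_canonical_urls(urls, "https://www.example.com"))
```

Running a check like this as part of sitemap generation stops mixed-host entries from reaching the published file.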
08 How to audit your sitemap
A clean sitemap audit covers:
- Sitemap is reachable and returns 200.
- Every URL in the sitemap returns 200 (no redirects, no 404s).
- Every URL is indexable (not noindexed, not blocked by robots.txt).
- Every URL is canonical to itself.
- lastmod values are realistic.
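The per-URL checks in this list can be expressed as a small function. The sketch below deliberately leaves out the crawling itself: it assumes you have already collected the facts about each page (status code, noindex, robots.txt blocking, declared canonical) with your own crawler, and the PageFacts structure is a hypothetical container for them. "Realistic" lastmod is reduced here to "not in the future", which is only one possible test.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class PageFacts:
    """Facts about one sitemap URL, gathered by a crawler of your choice."""
    status: int
    noindex: bool
    blocked_by_robots: bool
    canonical: Optional[str]   # declared canonical URL, or None if absent
    lastmod: Optional[date]    # lastmod from the sitemap, or None if absent


def audit_entry(url: str, facts: PageFacts, today: date) -> List[str]:
    """Return a list of problems for one sitemap entry (empty means clean)."""
    problems = []
    if facts.status != 200:
        problems.append(f"returns {facts.status}, expected 200")
    if facts.noindex:
        problems.append("page is noindexed")
    if facts.blocked_by_robots:
        problems.append("blocked by robots.txt")
    if facts.canonical is not None and facts.canonical != url:
        problems.append(f"canonical points to {facts.canonical}")
    if facts.lastmod is not None and facts.lastmod > today:
        problems.append("lastmod is in the future")
    return problems


if __name__ == "__main__":
    facts = PageFacts(status=301, noindex=False, blocked_by_robots=False,
                      canonical="https://example.com/new", lastmod=date(2026, 4, 12))
    for problem in audit_entry("https://example.com/old", facts, date(2026, 4, 15)):
        print(problem)
```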
Smart SEO Audit's sitemap mode lets you point at a sitemap URL and audit every page it lists in one batch — invaluable for large sites where manually checking thousands of URLs isn't an option.
Frequently asked questions
Do I need an XML sitemap if my site is small?
Even small sites benefit, but sitemaps matter most for large sites, new sites with few backlinks, sites with deep or poorly-linked pages, and sites with rich media. A sitemap doesn't guarantee indexing — it helps Google discover URLs efficiently and signals which pages you consider important.
What should I exclude from my XML sitemap?
Only include canonical, indexable URLs that return a 200 status. Exclude noindexed pages, non-canonical duplicates, redirected URLs, 404s and parameter variants. A sitemap full of non-indexable or broken URLs wastes crawl budget and erodes Google's trust in the file — keep it clean and current.
How do I submit my sitemap to Google?
Add its URL to your robots.txt with a Sitemap: line, and submit it directly in Google Search Console under the Sitemaps report. Resubmit after major changes. Keep the sitemap automatically updated so new pages appear and removed pages drop off, rather than maintaining it by hand.