What an XML sitemap is
An XML sitemap is a machine-readable file, almost always named sitemap.xml, that lists the canonical URLs on a site you want search engines to discover, crawl, and index. It is not a ranking signal and it does not guarantee indexing. It is a discovery aid that tells Google and Bing *here are the pages that matter and when they last changed*.
Concretely, a sitemap is a list of <url> entries, each wrapping a <loc> (the absolute URL) and optionally a <lastmod> timestamp. Search engines read the file to find pages they might otherwise miss when following internal links on a limited crawl budget: deep pages, newly published posts, or sections with weak internal linking.
A minimal valid sitemap looks like this:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2026-06-10</lastmod>
  </url>
  <url>
    <loc>https://example.com/blog/how-to-create-an-xml-sitemap</loc>
    <lastmod>2026-06-10</lastmod>
  </url>
</urlset>
```

Two limits define the format. A single sitemap file can hold at most 50,000 URLs and must be under 50 MB uncompressed. Past either limit, you split into multiple sitemaps and tie them together with a sitemap *index* file. A sitemap is one piece of technical SEO, the layer that governs how machines crawl and parse a site rather than how humans read it.
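To make the splitting rule concrete, here is a minimal Python sketch that builds sitemap files from a list of URLs, chunks them at the 50,000-URL limit, and ties the chunks together with an index file. The function names (`build_urlset`, `split_entries`, `build_index`, `fits_size_limit`) are hypothetical helpers invented for this illustration, and the 50 MB figure is interpreted here as 50 × 1024 × 1024 bytes, which is an assumption.

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000            # per-file URL limit described above
MAX_BYTES_PER_FILE = 50 * 1024 * 1024  # assumed byte interpretation of "50 MB"


def build_urlset(entries):
    """Return sitemap XML bytes for a list of (loc, lastmod_or_None) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:  # lastmod is optional, so skip it when unknown
            ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


def split_entries(entries, chunk_size=MAX_URLS_PER_FILE):
    """Split the entries into consecutive chunks of at most chunk_size."""
    return [entries[i:i + chunk_size] for i in range(0, len(entries), chunk_size)]


def build_index(sitemap_urls):
    """Return sitemap index XML bytes pointing at each child sitemap URL."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for sitemap_url in sitemap_urls:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = sitemap_url
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)


def fits_size_limit(xml_bytes):
    """Check the uncompressed size against the assumed byte limit."""
    return len(xml_bytes) < MAX_BYTES_PER_FILE
```

A chunk can stay under the URL limit and still exceed the size limit if its URLs are very long, so a real generator would probably check `fits_size_limit` on each file and lower `chunk_size` when needed.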
What to include and what to exclude
An XML sitemap should list only the URLs you genuinely want indexed: one canonical, self-canonicalizing, 200 OK version of each page. The fastest way to confuse a crawler, and to invite the "Sitemap contains URLs which are blocked by robots.txt" warning, is to include pages you are simultaneously telling Google not to crawl or index.
Include in a sitemap:
- Important content — published posts, products, category pages, key landing pages.
- Pages with weak internal linking that crawlers might otherwise miss.
Exclude from a sitemap:
- Non-canonical duplicates — only the canonical URL belongs here. If you are unsure which version is canonical, see what is a canonical tag.
- robots.txt-blocked URLs, paginated infinite-scroll junk, faceted-search parameter URLs, and login or admin pages.
The governing rule: the sitemap is a statement of intent, not an inventory. Every URL in it should be one you would be happy to see ranking tomorrow. Mixing in noindex URLs, redirect chains, or canonicalized duplicates is the most common reason Search Console reports a sitemap full of warnings.
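The include and exclude rules above can be sketched as a simple filter. This is an illustration only: the `Page` record and its fields are hypothetical, and it assumes your crawler or CMS can already tell you each page's status code, declared canonical URL, noindex state, and robots.txt state.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical crawl record; field names are assumptions for this sketch."""
    url: str
    status: int
    canonical_url: str
    noindex: bool = False
    blocked_by_robots: bool = False


def sitemap_eligible(page):
    """True only for a self-canonical, indexable, crawlable 200 OK page."""
    return (
        page.status == 200
        and page.canonical_url == page.url
        and not page.noindex
        and not page.blocked_by_robots
    )


def select_for_sitemap(pages):
    """Return the URLs that belong in the sitemap, in their original order."""
    return [page.url for page in pages if sitemap_eligible(page)]
```

Running every candidate URL through one predicate like this keeps the sitemap a statement of intent: anything that fails a single condition is left out rather than flagged later in Search Console.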
How to generate, submit, and verify a sitemap
Creating an XML sitemap is a loop of four core actions: generate the file, host it at a stable URL, submit it to search engines, then verify that it processed cleanly. The checklist below adds two supporting steps, the robots.txt reference and ongoing monitoring. Most sites never reach the verify step, which is exactly where the silent failures hide.
- Generate the file. Produce sitemap.xml from your CMS, framework, or a crawler, listing only canonical 200 OK URLs.
- Host at a stable URL. Serve it at https://yourdomain.com/sitemap.xml with the correct XML content type.
- Reference in robots.txt. Add a Sitemap: line so any crawler discovers the file automatically.
- Submit to search engines. Enter the sitemap path in Google Search Console and the full URL in Bing Webmaster Tools.
- Verify processing. Return a day later and confirm the status reads Success with the expected discovered-URL count.
- Monitor and resubmit. Re-check after structural changes; resubmit only after migrations or large batches of new pages.
Generating the file depends on your stack. Most modern frameworks and CMS platforms produce one automatically: WordPress via Yoast or Rank Math, Next.js via a sitemap.ts route or next-sitemap, Shopify and Squarespace out of the box. For static or custom sites, a crawler like Screaming Frog can export a sitemap from a live crawl. Whatever the method, the output must be served at a stable, absolute URL — conventionally https://yourdomain.com/sitemap.xml.
Submitting means two things. First, reference the sitemap in robots.txt with a Sitemap: line so any crawler discovers it automatically:
```
Sitemap: https://example.com/sitemap.xml
```

Second, submit the URL explicitly in Google Search Console and Bing Webmaster Tools (covered in the next section). Verifying is the step people skip: open Search Console a day later, confirm the status reads *Success*, and check that the discovered URL count matches what you expected. To confirm the file is even reachable and well-formed before you submit, run the live URL through our free SEO + GEO audit at /, which checks the sitemap alongside 40+ other signals at /check.
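As a quick local check of the robots.txt step, the sketch below uses Python's standard `urllib.robotparser` to confirm two things: that a Sitemap: line declares the expected URL, and that no rule blocks a crawler from fetching it. The `check_robots` helper and the default `Googlebot` user agent are assumptions made for this example, and the function takes the robots.txt text as a string, so fetching it is left to you.

```python
from urllib.robotparser import RobotFileParser


def check_robots(robots_txt, sitemap_url, user_agent="Googlebot"):
    """Report whether robots.txt declares the sitemap and allows fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    declared = parser.site_maps() or []  # site_maps() returns None when absent
    return {
        "declared": sitemap_url in declared,
        "crawlable": parser.can_fetch(user_agent, sitemap_url),
    }


robots_txt = """\
User-agent: *
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
"""
print(check_robots(robots_txt, "https://example.com/sitemap.xml"))
```

This only validates the robots.txt side. It says nothing about whether the sitemap itself loads or parses, which still needs the Search Console verification described above.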
How to submit a sitemap to Google and Bing
Submitting a sitemap to Google means pasting its URL into Google Search Console; submitting to Bing means doing the same in Bing Webmaster Tools. Both processes take under a minute and both require a verified property for the domain first.
Google Search Console. Open the property for your domain, click Sitemaps in the left navigation, enter the path (e.g. sitemap.xml) in the *Add a new sitemap* field, and click Submit. Google fetches the file, parses it, and within hours to a few days reports a status of *Success*, *Has errors*, or *Couldn't fetch*, along with the number of discovered URLs.
Bing Webmaster Tools. Open your verified site, go to Sitemaps, click Submit sitemap, paste the full absolute URL (https://example.com/sitemap.xml), and submit. Bing also imports sitemaps automatically if you connect a verified Search Console account, which is the fastest path if Google is already set up.
| Step | Google Search Console | Bing Webmaster Tools |
|---|---|---|
| Where to go | Sitemaps section in the left nav | Sitemaps section, then Submit sitemap |
| What to enter | The path, e.g. sitemap.xml | The full absolute URL |
| Auto-import | No native import from Bing | Can import directly from a verified GSC account |
| Status reported | Success / Has errors / Couldn't fetch | Submitted with last-read date and URL count |
You do not need to resubmit a sitemap every time content changes. Once submitted, Google and Bing re-fetch it on their own schedule. Resubmit manually only after a major structural change — a domain migration, a new sitemap index, or a large batch of new pages you want crawled quickly. The Sitemap: line in robots.txt also keeps the file discoverable to crawlers that never touch your webmaster accounts.
Common sitemap errors and how to fix them
Most sitemap problems reduce to a handful of repeatable errors, and the two that waste the most time are the "Couldn't fetch" stale status and lastmod abuse. Both look like the sitemap is broken when the real cause is elsewhere.
"Couldn't fetch" rarely means the file is broken. It usually means Google could not reach it at the moment it tried.
"Couldn't fetch" (the stale-status trap). Search Console shows *Couldn't fetch* when Google's request for the file failed or timed out. The status is sticky — it can linger for days even after you fix the cause, which makes people re-edit a sitemap that was fine all along. Real culprits, in order of frequency: the sitemap URL returns a redirect or non-200 status; a slow or cold-start server timed out the fetch (common on serverless and free-tier hosting that sleeps); a robots.txt rule blocks the sitemap path; or the file is served with the wrong content type. Fix the root cause, confirm the file loads instantly in a browser, then resubmit — and ignore the stale status for a day or two while it clears.
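The culprits listed above can be turned into a small triage function. The sketch below is a hypothetical helper, not anything Search Console exposes: the `FetchResult` fields, the five-second slowness threshold, and the set of accepted content types are all assumptions chosen for illustration, and you would fill the record from whatever HTTP client you use.

```python
from dataclasses import dataclass

# Assumed set of acceptable media types for a sitemap response.
XML_CONTENT_TYPES = {"application/xml", "text/xml"}


@dataclass
class FetchResult:
    """Hypothetical summary of one request for the sitemap URL."""
    requested_url: str
    final_url: str          # URL after any redirects were followed
    status: int
    content_type: str       # raw Content-Type header value
    elapsed_seconds: float
    robots_allows: bool     # whether robots.txt permits the sitemap path


def diagnose_fetch(result, slow_after_seconds=5.0):
    """Return likely 'Couldn't fetch' causes, in the order the text lists them."""
    causes = []
    if result.status != 200 or result.final_url != result.requested_url:
        causes.append("redirect or non-200 status")
    if result.elapsed_seconds > slow_after_seconds:
        causes.append("slow or cold-start response")
    if not result.robots_allows:
        causes.append("blocked by robots.txt")
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = result.content_type.split(";")[0].strip().lower()
    if media_type not in XML_CONTENT_TYPES:
        causes.append("wrong content type")
    return causes
```

An empty list suggests the file is fine and the status in Search Console is simply stale, which is the case where re-editing the sitemap does more harm than waiting.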
lastmod abuse. The <lastmod> tag should reflect the genuine last meaningful change to a page. Setting every URL's lastmod to today's date on every build — a default in some generators — trains Google to distrust the field entirely. Google has stated it ignores lastmod when the values are obviously unreliable. Either populate lastmod accurately or omit it; a wrong timestamp is worse than none.
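One rough way to spot build-stamped timestamps is to check whether nearly every URL shares the same lastmod value. The following sketch does that with the standard library. The function name, the minimum of five URLs, and the 90% threshold are arbitrary assumptions for illustration, and a small site that genuinely changed everything on one day would trip it too.

```python
from collections import Counter
from xml.etree import ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def lastmod_looks_stamped(sitemap_xml, min_urls=5, threshold=0.9):
    """Heuristic: True when one lastmod value dominates the sitemap."""
    root = ET.fromstring(sitemap_xml)
    values = [
        element.text.strip()
        for element in root.findall("sm:url/sm:lastmod", NS)
        if element.text
    ]
    if len(values) < min_urls:
        return False  # too few timestamps to judge either way
    most_common_count = Counter(values).most_common(1)[0][1]
    return most_common_count / len(values) >= threshold
```

A True result is a prompt to inspect the generator's configuration, not proof of a problem.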
Other frequent errors:
- Non-200 / redirect / noindex URLs in the sitemap: strip anything that is not a canonical, indexable 200.
- Relative or wrong-protocol URLs: every <loc> must be a full absolute URL on the canonical host and protocol.
- Over the 50,000-URL or 50 MB limit: split into multiple files and reference them from a sitemap index.
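The relative-URL and wrong-protocol checks are easy to automate. Below is a minimal sketch using `urllib.parse`; the `loc_problems` helper is hypothetical, and the canonical scheme and host (`https` and `example.com`) are placeholder assumptions you would replace with your own.

```python
from urllib.parse import urlsplit


def loc_problems(loc, canonical_scheme="https", canonical_host="example.com"):
    """Return a list of problems with a <loc> value; empty means it looks fine."""
    parts = urlsplit(loc)
    if not parts.scheme or not parts.netloc:
        return ["not an absolute URL"]
    problems = []
    if parts.scheme != canonical_scheme:
        problems.append("wrong protocol")
    if parts.hostname != canonical_host:
        problems.append("wrong host")
    return problems


for loc in ["https://example.com/blog/", "/blog/", "http://www.example.com/blog/"]:
    print(loc, loc_problems(loc))
```

This checks only the shape of each URL. Whether the URL returns 200 and is indexable still has to be confirmed by fetching it.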
When the file looks correct but pages still are not indexed, the issue is often crawl access rather than the sitemap itself. A blocked AI or search crawler never reads your perfect file — confirm crawler access at /check/geo.aibots.blocked.