A poorly configured sitemap wastes crawl budget. Learn to generate a proper XML sitemap, avoid common validation errors, and submit it to Google Search Console so your important pages get indexed fast. We cover priority tags, lastmod usage, and real-world filtering strategies.
Most SEOs treat the sitemap as a checklist item: upload once, then forget. That is a mistake. Google uses the sitemap to discover URLs, but a sitemap entry does not guarantee indexing. A common situation we see: a site with 50,000 product pages submits a sitemap containing all of them, but only 200 get indexed. Why? Because the sitemap included URLs blocked by robots.txt, pages carrying a noindex tag, and pages with thin content. The official Google documentation on building a sitemap is clear: only include canonical, indexable URLs. Filter out everything else before you even generate the file.
In practice, when you create and submit an XML sitemap to Google, the most common bottleneck is not the submission itself but the quality of the URLs inside. If you dump every URL, you dilute your crawl budget. In our experience, a lean, validated sitemap with 5,000 high-priority pages usually outperforms a bloated one with 50,000 entries. The rest of this guide focuses on the decisions that separate effective sitemaps from noise.
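
A minimal sketch of that pre-filter, assuming the site has already been crawled into records. The `CrawledPage` fields and the example URLs are illustrative assumptions, not the export format of any particular crawler.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CrawledPage:
    """One row from a crawl export. Field names are hypothetical."""
    url: str
    status: int           # HTTP status code returned for the URL
    noindex: bool         # True if a robots meta tag or header says noindex
    robots_blocked: bool  # True if robots.txt disallows the URL
    canonical: str        # canonical URL declared by the page


def is_sitemap_worthy(page: CrawledPage) -> bool:
    """Keep only 200-status, indexable, self-canonical URLs."""
    return (
        page.status == 200
        and not page.noindex
        and not page.robots_blocked
        and page.canonical == page.url
    )


def filter_for_sitemap(pages: List[CrawledPage]) -> List[str]:
    return [p.url for p in pages if is_sitemap_worthy(p)]


pages = [
    CrawledPage("https://example.com/a", 200, False, False, "https://example.com/a"),
    CrawledPage("https://example.com/a?sort=price", 200, False, False, "https://example.com/a"),
    CrawledPage("https://example.com/old", 301, False, False, "https://example.com/old"),
    CrawledPage("https://example.com/cart", 200, True, False, "https://example.com/cart"),
    CrawledPage("https://example.com/search", 200, False, True, "https://example.com/search"),
]
kept = filter_for_sitemap(pages)
print(kept)
```

Thin content is not covered here because it needs a content-quality signal, which a status-and-tags crawl does not provide.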
| Generator Type | How It Works | Best For | Hidden Risk / Failure Mode |
|---|---|---|---|
| Static generator, e.g., Screaming Frog, XML-Sitemaps.com | Crawl site, generate flat XML file. Manual upload via FTP or CMS. | Small sites (< 500 pages) or one-time builds. Quick fix. | Stale data. If you add pages, the sitemap is outdated. No automatic update. Easy to accidentally include noindex or canonicalised URLs. |
| Dynamic generator (plugin) e.g., Yoast, Rank Math, SEOPress | Sits inside CMS. Regenerates on post publish. Adds lastmod automatically. | WordPress sites. Medium traffic. Content updates daily. | Plugin can conflict with caching. Lastmod may be wrong if post modified date is set to current time by cache. Can include pagination pages or tag archives. |
| Dynamic generator (custom script) e.g., Python + Flask, PHP cron job | Queries database directly. Filters by custom rules. Outputs gzipped XML. | Large sites (10k+ pages). Custom logic needed. Agency workflows. | Requires dev maintenance. Common bug: forgetting to exclude staging or test domains. Slow database queries can timeout on large catalogs. Gzip must be streamed, not buffered. |
| Dynamic generator (API-based) e.g., SpeedyIndex, JetOctopus | Uses API to fetch indexable URLs. Allows bulk submit. Some validate before adding. | Agencies managing multiple client sites. Bulk workflows. Scale. | Cost per API call. Vendor lock-in. If the API returns empty results due to a filter mismatch, you get a blank sitemap. Always validate the output before submission. |
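
For the custom-script row, "streamed, not buffered" can be sketched roughly as follows: rows are written into the gzip file one at a time, so the full XML never sits in memory. The function name, the row shape `(url, lastmod)`, and the temp-file path are assumptions for illustration. In a real script, the rows would come from a database cursor.

```python
import gzip
import os
import tempfile
from typing import Iterable, Iterator, Tuple
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def write_sitemap_gz(path: str, rows: Iterable[Tuple[str, str]]) -> int:
    """Stream (url, lastmod) rows into a gzipped sitemap; return the URL count."""
    count = 0
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        fh.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        fh.write(f'<urlset xmlns="{SITEMAP_NS}">\n')
        for url, lastmod in rows:
            # escape() turns &, < and > into XML entities, which <loc> requires
            fh.write(
                f"  <url><loc>{escape(url)}</loc>"
                f"<lastmod>{lastmod}</lastmod></url>\n"
            )
            count += 1
        fh.write("</urlset>\n")
    return count


def fake_rows() -> Iterator[Tuple[str, str]]:
    """Stand-in for a database cursor that yields one row at a time."""
    yield "https://example.com/p?id=1&color=red", "2024-05-01"
    yield "https://example.com/p?id=2", "2024-05-02"


out_path = os.path.join(tempfile.mkdtemp(), "sitemap.xml.gz")
written = write_sitemap_gz(out_path, fake_rows())

# Read the file back to confirm it decompresses to the expected XML
with gzip.open(out_path, "rt", encoding="utf-8") as fh:
    round_trip = fh.read()
print(written)
```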
Priority tags are at best a hint, not a directive, and Google's sitemap documentation states that it ignores `<priority>` values altogether. Lastmod, on the other hand, is useful, but only if it is accurate. A common edge case: your CMS sets lastmod to the current date every time a post is saved, even if nothing changed. This tricks Google into recrawling unchanged pages, wasting budget. Use the actual content modification date, not the database update timestamp.
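
One way to keep lastmod honest is to tie it to a hash of the content rather than to the save event. The sketch below assumes you can store a hash and a lastmod string per page. The function names and the storage model are hypothetical.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional, Tuple


def content_fingerprint(body: str) -> str:
    """Hash the indexable content, not the whole database row."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def resolve_lastmod(
    body: str,
    stored_hash: Optional[str],
    stored_lastmod: Optional[str],
    now: datetime,
) -> Tuple[str, str]:
    """Return (hash, lastmod); lastmod moves only when the content hash changes."""
    new_hash = content_fingerprint(body)
    if new_hash == stored_hash and stored_lastmod is not None:
        return new_hash, stored_lastmod
    return new_hash, now.astimezone(timezone.utc).isoformat(timespec="seconds")


# First publish
h1, mod1 = resolve_lastmod(
    "Blue widget, 20 cm", None, None,
    datetime(2024, 1, 10, 9, 0, tzinfo=timezone.utc),
)
# Re-save with no content change: lastmod stays put
h2, mod2 = resolve_lastmod(
    "Blue widget, 20 cm", h1, mod1,
    datetime(2024, 3, 1, 9, 0, tzinfo=timezone.utc),
)
# Real edit: lastmod moves
h3, mod3 = resolve_lastmod(
    "Blue widget, 25 cm", h2, mod2,
    datetime(2024, 4, 2, 9, 0, tzinfo=timezone.utc),
)
print(mod1, mod2, mod3)
```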
Changefreq is the least reliable tag. Google's sitemap documentation says it ignores changefreq values; it derives crawl frequency from its own crawl history. Do not waste time on it. Instead, focus on ensuring that your lastmod values are correct and that you only include URLs that are actually indexable. For a deeper look at how Google can still choose a different canonical URL than what you set, see this analysis of why Google sometimes picks a different canonical URL and how to fix it.
We had a client with 12,000 product URLs. Initial sitemap included everything. After one month, only 1,200 pages were indexed. We rebuilt the sitemap using these filters:
Final sitemap: 4,500 URLs. Within 10 days of resubmission, 3,800 of those were indexed. The rest had other issues (slow server, no backlinks), but the share of submitted URLs that got indexed jumped from 10% (1,200 of 12,000) to 84% (3,800 of 4,500). The key: do not just generate a sitemap. Audit it first.
Export all site URLs. Run bulk checks for indexability using a crawler or API. Identify duplicates, redirects, noindex, and thin content pages.
Keep only canonical, indexable, and high-value URLs. Priority is optional, and Google ignores the value. If you set it for your own tooling, base it on business value: 1.0 for home and core pages, 0.5 for standard posts, 0.3 for archives.
Use a dynamic generator or custom script. Ensure lastmod comes from the actual content change date. Validate XML syntax and size limits (each file may hold at most 50,000 URLs and be at most 50 MB uncompressed).
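
A minimal sketch of handling the URL-count limit: chunk the URL list at 50,000 and list the child files in a sitemap index. The child file names under `https://example.com/sitemaps/` are placeholders. This sketch does not check the 50 MB size limit, which would need a byte count per file.

```python
from typing import Iterator, List
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000


def chunk_urls(urls: List[str], size: int = MAX_URLS_PER_FILE) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_sitemap_index(sitemap_urls: List[str]) -> str:
    """Return sitemap index XML that points at each child sitemap file."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        f'<sitemapindex xmlns="{SITEMAP_NS}">',
    ]
    for url in sitemap_urls:
        lines.append(f"  <sitemap><loc>{escape(url)}</loc></sitemap>")
    lines.append("</sitemapindex>")
    return "\n".join(lines)


all_urls = [f"https://example.com/p/{i}" for i in range(120_000)]
chunks = list(chunk_urls(all_urls))
child_names = [
    f"https://example.com/sitemaps/sitemap-{n}.xml"  # placeholder naming scheme
    for n in range(1, len(chunks) + 1)
]
index_xml = build_sitemap_index(child_names)
print([len(c) for c in chunks])
```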
Open XML in browser. Check for namespace errors. Use Google's URL Inspection Tool on a sample of included URLs. Confirm they return 200 and are indexable.
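
The namespace check can also be scripted. The sketch below uses Python's standard `xml.etree.ElementTree` and only verifies that the file parses, that the root sits in the sitemap namespace, and that `<loc>` entries exist. It is not a full schema validation, and `check_sitemap_xml` is a name made up for this example.

```python
import xml.etree.ElementTree as ET
from typing import List

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def check_sitemap_xml(xml_text: str) -> List[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    try:
        root = ET.fromstring(xml_text.encode("utf-8"))
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    problems = []
    allowed_roots = (f"{{{SITEMAP_NS}}}urlset", f"{{{SITEMAP_NS}}}sitemapindex")
    if root.tag not in allowed_roots:
        problems.append(f"unexpected root element or namespace: {root.tag}")
    if not root.findall(f".//{{{SITEMAP_NS}}}loc"):
        problems.append("no <loc> entries found in the sitemap namespace")
    return problems


good_xml = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    f'<urlset xmlns="{SITEMAP_NS}">'
    "<url><loc>https://example.com/</loc></url>"
    "</urlset>"
)
missing_ns_xml = "<urlset><url><loc>https://example.com/</loc></url></urlset>"
broken_xml = "<urlset><url><loc>https://example.com/</url></urlset>"

good_result = check_sitemap_xml(good_xml)
missing_ns_result = check_sitemap_xml(missing_ns_xml)
broken_result = check_sitemap_xml(broken_xml)
print(good_result, missing_ns_result, broken_result)
```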
In Google Search Console, open the Sitemaps section and add the sitemap URL. Monitor the 'Submitted' vs 'Indexed' count daily for the first week. Watch for errors under 'Coverage' (labelled 'Pages' in the current Search Console interface).
Check weekly for sudden drops. If new pages are added, regenerate the sitemap. Use Google's Index Coverage report to spot patterns. After submission, <a href="https://teletype.in/@speedyindex/check-if-google-indexed">check if Google actually indexed your pages</a> using URL inspection.
When you create and submit an XML sitemap to Google, the Search Console will show errors if the file is invalid. The most frequent ones:
For news sites, regenerate the sitemap every time new content is published, and at least daily. Use dynamic generation that triggers on publish. Include only recent articles, not the entire archive: Google's News sitemap guidelines cap a file at 1,000 URLs and ask for articles published in the last two days. Submit via Google Search Console's Sitemaps section. Monitor the 'Submitted' count to ensure it updates within 24 hours.
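
A rough sketch of the selection step for a news sitemap. The 1,000-URL cap comes from the text above. The two-day freshness window is an assumption taken from Google's News sitemap guidelines, and the function name and data shape are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple

NEWS_SITEMAP_LIMIT = 1000      # cap on URLs per news sitemap
MAX_AGE = timedelta(days=2)    # assumption: only recently published articles


def select_news_urls(
    articles: List[Tuple[str, datetime]],
    now: datetime,
    limit: int = NEWS_SITEMAP_LIMIT,
    max_age: timedelta = MAX_AGE,
) -> List[str]:
    """articles: (url, published_at) pairs with timezone-aware datetimes."""
    fresh = [(url, ts) for url, ts in articles if now - ts <= max_age]
    fresh.sort(key=lambda pair: pair[1], reverse=True)  # newest first
    return [url for url, _ in fresh[:limit]]


now = datetime(2024, 5, 3, 12, 0, tzinfo=timezone.utc)
articles = [
    ("https://example.com/news/b", datetime(2024, 5, 2, 9, 0, tzinfo=timezone.utc)),
    ("https://example.com/news/c", datetime(2024, 4, 20, 9, 0, tzinfo=timezone.utc)),
    ("https://example.com/news/a", datetime(2024, 5, 3, 8, 0, tzinfo=timezone.utc)),
]
selected = select_news_urls(articles, now)
top_one = select_news_urls(articles, now, limit=1)
print(selected)
```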
The top errors: a missing namespace, an incorrect lastmod format (use W3C datetime with a timezone), URLs over 2048 characters (common with long product filters), and exceeding the limit of 50,000 URLs or 50 MB (uncompressed) per file. If you exceed either limit, split the sitemap into multiple files and list them in a sitemap index file. Also, ensure no URLs return 4xx or 5xx status codes.
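
Two of these checks are easy to automate per entry. The sketch below is a simplified shape check, not a full W3C datetime parser: it accepts a plain date or a date-time with a timezone, and it does not verify that the date itself is real (month 13 would pass). `entry_problems` is a hypothetical helper name.

```python
import re
from typing import List, Optional

# Simplified W3C datetime shape: YYYY-MM-DD, optionally followed by
# Thh:mm[:ss[.fraction]] and a mandatory timezone (Z or +hh:mm / -hh:mm).
W3C_DATETIME = re.compile(
    r"^\d{4}-\d{2}-\d{2}"
    r"(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?"
    r"(Z|[+-]\d{2}:\d{2}))?$"
)
MAX_LOC_LENGTH = 2048  # sitemaps.org: <loc> must be shorter than this


def entry_problems(loc: str, lastmod: Optional[str] = None) -> List[str]:
    """Return the problems found for a single <url> entry."""
    problems = []
    if len(loc) >= MAX_LOC_LENGTH:
        problems.append(f"loc is {len(loc)} characters long")
    if lastmod is not None and not W3C_DATETIME.match(lastmod):
        problems.append(f"lastmod {lastmod!r} is not in W3C datetime format")
    return problems


# A faceted URL with 400 filter parameters, well past the length limit
long_url = "https://example.com/shop?" + "&".join(f"f{i}=v{i}" for i in range(400))

ok_date = entry_problems("https://example.com/a", "2024-05-01")
ok_datetime = entry_problems("https://example.com/a", "2024-05-01T10:00:00+02:00")
bad_format = entry_problems("https://example.com/a", "01/05/2024")
no_timezone = entry_problems("https://example.com/a", "2024-05-01T10:00:00")
too_long = entry_problems(long_url)
print(bad_format, no_timezone)
```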
Yes. Use Google Search Console API or third-party tools like SpeedyIndex that support bulk submission. For each domain, you need ownership verification in GSC. The API allows submitting sitemap URLs programmatically. However, Google does not guarantee immediate indexing. The sitemap just signals existence; the pages must still pass quality and relevance checks.
Use a custom script that generates a separate sitemap for each domain. Exclude any URLs that link to the money site. Do not use the same IP or CMS pattern. Validate that no sitemap includes internal search pages or author pages that might expose relationships. Submit each sitemap individually to its respective Google Search Console account.
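Use Google Search Console's URL Inspection tool. Enter a sample of URLs from the sitemap. The tool shows 'URL is on Google' or 'URL is not on Google'. Also check the 'Indexed' count in the Sitemaps report. For a quick bulk check, use a third-party tool or script that runs your sitemap URLs through the URL Inspection API, which is part of the Search Console API. The Indexing API is not the right tool here: it is for notifying Google about new or removed pages and does not report index status.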
First, verify the sitemap is valid and accessible (200 status, no robots.txt block). Use URL Inspection to test a few pages—they may be noindex or blocked. Check the 'Coverage' report for errors like 'Submitted URL blocked' or 'Submitted URL has crawl issue'. Ensure your pages are not behind a login. If everything looks correct, wait 1-2 weeks; Google may index them slowly.
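
The robots.txt part of that check can be run offline with Python's standard `urllib.robotparser`. This is only a sketch: the standard-library parser does simple prefix matching and does not reproduce Google's wildcard handling, so treat the result as a first pass. The robots.txt content and URLs are made-up examples.

```python
from typing import List
from urllib.robotparser import RobotFileParser


def blocked_by_robots(robots_txt: str, urls: List[str], agent: str = "Googlebot") -> List[str]:
    """Return the sitemap URLs that the given robots.txt disallows for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]


robots_txt = """\
User-agent: *
Disallow: /cart/
Disallow: /search
"""
sitemap_urls = [
    "https://example.com/products/blue-widget",
    "https://example.com/cart/checkout",
    "https://example.com/search?q=widget",
]
blocked = blocked_by_robots(robots_txt, sitemap_urls)
print(blocked)
```

Any URL this returns should come out of the sitemap or out of the Disallow rules, depending on which one is the mistake.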
Audit all URLs for uniqueness (canonical tags, duplicate content). Exclude pages with fewer than 300 words of original content. Use canonical tags to consolidate duplicates, then include only the canonical URL in the sitemap. If you must include thin pages, set their priority low (0.3), but it is better to exclude them entirely. Regenerate and resubmit.
Yes. Use a custom script that loops through client sites, generates sitemaps via API or database query, validates XML, and submits to each site's Google Search Console via the API. Tools like SpeedyIndex offer a dashboard for multi-site management. Set up cron jobs to regenerate daily. Monitor error logs for each client to catch validation failures early.
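
The loop described above might look roughly like this. The `generate`, `validate`, and `submit` callables are hypothetical hooks that you would back with your own database query, XML checks, and Search Console API client. The demo versions below are stubs so the control flow can be run as-is.

```python
from typing import Callable, Dict, List


def run_batch(
    sites: List[str],
    generate: Callable[[str], str],
    validate: Callable[[str], List[str]],
    submit: Callable[[str, str], None],
) -> Dict[str, str]:
    """Generate, validate, and submit one sitemap per site; never abort the batch."""
    results: Dict[str, str] = {}
    for site in sites:
        try:
            xml_text = generate(site)
            problems = validate(xml_text)
            if problems:
                results[site] = "invalid: " + "; ".join(problems)
                continue  # do not submit a sitemap that failed validation
            submit(site, xml_text)
            results[site] = "submitted"
        except Exception as exc:  # log and move on to the next client
            results[site] = f"error: {exc}"
    return results


def demo_generate(site: str) -> str:
    if site.endswith("client-c.example"):
        raise TimeoutError("database timeout")
    if site.endswith("client-b.example"):
        return ""  # simulates a filter mismatch that yields a blank sitemap
    return (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        f"<url><loc>{site}/</loc></url></urlset>"
    )


def demo_validate(xml_text: str) -> List[str]:
    return [] if "<loc>" in xml_text else ["empty sitemap"]


submitted: List[str] = []


def demo_submit(site: str, xml_text: str) -> None:
    submitted.append(site)  # a real version would call the Search Console API


sites = [
    "https://client-a.example",
    "https://client-b.example",
    "https://client-c.example",
]
results = run_batch(sites, demo_generate, demo_validate, demo_submit)
print(results)
```

Catching errors per site keeps one client's slow database from blocking the others, and the results dict doubles as the error log to monitor.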
If you set priority at all (Google ignores the value), use 1.0 for the homepage and cornerstone articles, 0.8 for recent posts (last 30 days), and 0.5 for older posts. For lastmod, use the actual publication or substantial update date, not the date a comment was added. Do not use lastmod if your CMS cannot output a stable, correct date; omit the tag entirely. Google will still crawl based on its own signals.