Before you panic about penalties or algorithm updates, run through these 10 checks. Most indexing failures come from robots.txt blocks, noindex tags, sitemap errors, server hiccups, or crawl budget exhaustion. No fluff. Fixes only.
You publish content. Google crawls nothing. Or it crawls and drops pages into a black hole. The question 'why is Google not indexing my website' is usually answered by one of five root causes: a hard block (robots.txt or noindex), a broken sitemap, a server error, a crawl budget leak, or a weak page that Google deems unworthy.
In practice, when we audit a site for a client who swears they 'did everything right', the first thing we often find is a Disallow: / in robots.txt, added as a joke or left over from staging. A common situation: an agency launches a new site, copies over the old robots.txt, and that file still blocks the entire production domain. One line. Zero indexed pages. Three weeks of silence.
Below is the exact 10-check sequence we use. Start at check 1. Do not skip.
| Check # | What to Inspect | Expected State | Failure Mode & Risk |
|---|---|---|---|
| 1. robots.txt | URL: /robots.txt | Allow: / (no Disallow for critical paths) | Disallow: / blocks entire site. Risk: total deindexing for weeks |
| 2. noindex tags | View page source | No noindex meta tag | Developers add noindex to staging, forget to remove it. Risk: new pages stay invisible |
| 3. Sitemap format | Check XML validity | Valid XML, ≤ 50 MB uncompressed, ≤ 50,000 URLs | Sitemap over 50 MB or 50,000 URLs. Risk: Google may not process the oversized file |
| 4. Index coverage report | Google Search Console | Errors < 5% of total URLs | Submitted URLs marked 'Excluded' with 'Crawled - currently not indexed'. Risk: content quality issue |
| 5. Server response | curl -I or browser devtools | 200 OK within 2 seconds | 5xx errors or slow TTFB >3s. Risk: Google abandons crawl |
| 6. Canonical URL | Check canonical tag | Points to self or correct preferred version | Different canonical chosen by Google. Risk: wrong page indexed, original ignored. See canonical mismatch analysis. |
| 7. Crawl budget | GSC Crawl Stats | Crawl rate > 10 pages/day for small sites | Low crawl rate + high URL count = budget exhaustion. Risk: deep pages never indexed |
| 8. Internal linking | Check orphan pages | Every page has >= 1 internal link | Orphan pages with 0 internal links. Risk: Google never discovers them |
| 9. Page quality signals | Content length, uniqueness | Minimum 300 words, no thin content | Under 200 words, duplicate or auto-generated. Risk: 'Crawled - currently not indexed' |
| 10. Manual action / penalty | GSC Manual Actions report | No manual actions listed | Spam or unnatural links penalty. Risk: entire site or section deindexed |
The fastest way to stop guessing is to follow a linear flow. Here is the exact sequence we run for every 'why is Google not indexing my website' ticket. Each node has a single operational note so you can execute without context-switching.
1. Open /robots.txt. If Disallow: / exists, remove it. Confirm the fix in the robots.txt report in Google Search Console, which replaced the retired robots.txt Tester.
2. Search for 'noindex' in page source. Use Screaming Frog to bulk scan 200+ pages.
3. Submit the sitemap to GSC. Check for 'Couldn't fetch' or 'URL not accessible' errors.
4. Run curl -I https://yoursite.com. It must return 200 OK in under 2 seconds. Fix 5xx errors at host level.
5. In GSC > Crawl Stats, check 'Total crawl requests'. If it is below 10 per day and the site has 1,000+ URLs, fix server speed and internal linking.
6. Open GSC > Security & Manual Actions. If an action is listed, clean up and then submit a reconsideration request.
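The robots.txt step in the flow above can be scripted before anyone opens a browser. This is a minimal sketch using Python's standard `urllib.robotparser`; the two robots.txt bodies and the example.com URLs are made up for illustration. One caveat: Python's parser applies rules in file order, while Google uses the most specific matching rule, so treat the result as a first-pass signal and not as Googlebot's verdict.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text disallows `url` for `user_agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text directly, no network call
    return not parser.can_fetch(user_agent, url)


# Hypothetical staging leftover: one line blocks the whole site.
STAGING_LEFTOVER = "User-agent: *\nDisallow: /\n"

# Hypothetical production file: only the admin area is blocked.
PRODUCTION = "User-agent: *\nDisallow: /wp-admin/\n"

for label, body in [("staging leftover", STAGING_LEFTOVER), ("production", PRODUCTION)]:
    blocked = is_blocked(body, "https://example.com/products/widget")
    print(f"{label}: product page blocked = {blocked}")
```

To check a live site, fetch the file's text yourself and pass it to `is_blocked`, so the same function works on saved copies from before and after a launch.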
The scenario: An e-commerce client with 30,000 product pages submits a sitemap. Google indexes 0 pages after 3 weeks.
Step 1: Check robots.txt. Found Disallow: /. Removed it. Waited 5 days. Zero change.
Step 2: Check sitemap in GSC. Sitemap shows 'Submitted: 30,000 URLs, Indexed: 0'. Click 'See index coverage'. Filter: 'Excluded > Crawled - currently not indexed'. Count: 28,500 URLs.
Step 3: Spot-check 10 excluded URLs. Each one has a noindex meta tag injected by the theme. The vendor had hardcoded it.
Step 4: Remove noindex tag from theme template. Resubmit sitemap via GSC. After 8 days, 12,400 URLs indexed. After 21 days, 24,100 indexed.
Takeaway: Two blocks (robots.txt + noindex) stacked. Fixing one alone does nothing. Always check both.
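Since the second block in this case was a noindex directive, here is a rough sketch of how that spot-check could be automated. It looks in both places a noindex can hide: the robots meta tag and the X-Robots-Tag response header. The function name `find_noindex` and the inputs are illustrative assumptions; you supply the HTML and headers from whatever crawler or HTTP client you already use.

```python
from html.parser import HTMLParser


class _RobotsMetaParser(HTMLParser):
    """Collects the content of <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {k.lower(): (v or "") for k, v in attrs}
        if attr.get("name", "").lower() in {"robots", "googlebot"}:
            self.directives.append(attr.get("content", "").lower())


def find_noindex(html: str, headers: dict[str, str]) -> list[str]:
    """Return where a noindex directive was found; an empty list means none."""
    found = []
    parser = _RobotsMetaParser()
    parser.feed(html)
    if any("noindex" in directive for directive in parser.directives):
        found.append("meta tag")
    for name, value in headers.items():
        if name.lower() == "x-robots-tag" and "noindex" in value.lower():
            found.append("X-Robots-Tag header")
    return found
```

Run it across a sample of excluded URLs. If every page reports "meta tag", the template is the likely culprit, as it was here.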
Even after clearing blocks, sitemap formatting errors quietly kill indexing. Google's large sitemap guidelines state that a single sitemap must not exceed 50 MB (uncompressed) or 50,000 URLs. A common situation we see: a site with 80,000 URLs in one sitemap file. The file is over the limit, so Google may not process it fully, and the 30,000 URLs beyond the cap never get submitted. The fix: split it into two sitemaps and reference both from a sitemap index file.
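A small sketch of the split for the 80,000-URL case above. It chunks a URL list at the 50,000 limit and renders the child sitemaps plus the index file that points at them. The function names and the example.com URLs are assumptions for illustration, and the sketch checks only the URL count, not the 50 MB size limit.

```python
from xml.sax.saxutils import escape

MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit cited in the text
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def split_urls(urls: list[str], limit: int = MAX_URLS_PER_SITEMAP) -> list[list[str]]:
    """Split a URL list into chunks that each fit in one sitemap file."""
    return [urls[i:i + limit] for i in range(0, len(urls), limit)]


def build_sitemap(urls: list[str]) -> str:
    """Render one <urlset> sitemap file; URLs are XML-escaped."""
    entries = "".join(f"  <url><loc>{escape(u)}</loc></url>\n" for u in urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<urlset xmlns="{SITEMAP_NS}">\n{entries}</urlset>\n'
    )


def build_sitemap_index(sitemap_urls: list[str]) -> str:
    """Render the <sitemapindex> file that lists every child sitemap."""
    entries = "".join(
        f"  <sitemap><loc>{escape(u)}</loc></sitemap>\n" for u in sitemap_urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<sitemapindex xmlns="{SITEMAP_NS}">\n{entries}</sitemapindex>\n'
    )
```

With 80,000 URLs, `split_urls` yields one chunk of 50,000 and one of 30,000. Write each chunk with `build_sitemap`, then submit only the index file in GSC.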
Crawl budget becomes the bottleneck for large sites. If your server responds in 3 seconds, Google might crawl only 50 pages per day. For a 200,000-page site, that is 4,000 days, roughly 11 years, to cover everything once. Prune or improve thin product pages, consolidate weak pages, and use noindex on filter URLs to conserve budget for money pages.
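The arithmetic behind the 200,000-page example is worth keeping as a one-liner you can rerun with your own numbers from GSC Crawl Stats. This sketch assumes a constant crawl rate, which real crawling never is, so read the result as an order of magnitude.

```python
import math


def days_to_full_crawl(total_urls: int, pages_per_day: float) -> int:
    """Days needed for one full pass over the site at a constant crawl rate."""
    if pages_per_day <= 0:
        raise ValueError("pages_per_day must be positive")
    return math.ceil(total_urls / pages_per_day)


# The example from the text: 200,000 pages at 50 pages per day.
print(days_to_full_crawl(200_000, 50))
```

Halving the URL count or doubling the crawl rate each cut the figure in half, which is why pruning and server speed are the two levers that matter.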
Check robots.txt for Disallow: / or critical path blocks
Search page source for noindex meta tag and X-Robots-Tag header
Validate sitemap XML format, size, and number of URLs
Review GSC Index Coverage report for error types and counts
Test server response time with curl or webpagetest.org
Verify canonical URLs point to the correct version of each page
Analyze GSC Crawl Stats for low crawl rate
Map internal links to ensure every page has at least one inbound link
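The last item on the checklist, mapping internal links, comes down to a set difference once you have a crawl export. This sketch assumes you already have a mapping of each page to the pages it links to (for example from a crawler export). The `links` structure and the paths are hypothetical.

```python
def find_orphans(links: dict[str, list[str]], entry_points: set[str]) -> set[str]:
    """Return known pages that no other page links to.

    `links` maps each page to the internal pages it links out to.
    `entry_points` (such as the homepage) are excluded from the result.
    """
    linked_to = {
        target
        for source, targets in links.items()
        for target in targets
        if target != source  # a self-link does not count as an inbound link
    }
    all_pages = set(links) | linked_to
    return all_pages - linked_to - entry_points


# Hypothetical four-page site: nothing links to /old-landing.
site = {
    "/": ["/blog", "/products"],
    "/blog": ["/"],
    "/products": [],
    "/old-landing": ["/"],
}
print(find_orphans(site, entry_points={"/"}))
```

Note that this checks only for a direct inbound link. A page linked solely from another orphan still passes, so a stricter audit would walk the graph from the homepage.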
Sometimes none of the above work. Here are real edge cases we have debugged:
1. Staging environment indexed. A client had two sitemaps pointing to the same domain, one with staging URLs. Google indexed staging pages and treated production pages as duplicates. Fix: remove staging sitemap and add canonical tags on staging.
2. CDN cache poisoning. A misconfigured CDN served a cached 503 error for 48 hours. Google saw the 503, marked 10,000 URLs as 'Crawled - currently not indexed'. Fix: purge CDN cache and force recrawl with GSC URL Inspection tool.
3. JavaScript rendering failure. A React site loaded content via JS that Googlebot could not execute. Pages returned empty HTML. Fix: implement dynamic rendering or server-side rendering for critical content.
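For the JavaScript rendering case, a quick first test is to count the words in the raw HTML response, before any script runs. A near-zero count on a page that looks full in the browser suggests the content depends on client-side rendering. This is a rough heuristic sketched with the standard library, not a replica of how Googlebot renders pages, and `visible_word_count` is a name chosen for illustration.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text that sits outside <script>, <style>, and <noscript>."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.words: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.words.extend(data.split())


def visible_word_count(raw_html: str) -> int:
    """Count whitespace-separated words in the unrendered HTML."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()
    return len(parser.words)
```

Feed it the body you get from curl, not the DOM you see in devtools. If the count stays near zero across templates, server-side rendering is the fix to prioritize.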
For a quick check if a specific URL is indexed, use this lightweight index checker to confirm status before diving deeper.
Sitemap submission is not a guarantee. Check GSC Index Coverage report for errors like 'Submitted URL not found (404)' or 'Crawled - currently not indexed'. The latter means Google found the page but chose not to index it, often due to thin content or low perceived value. Fix content quality and ensure internal links point to the page.
'Crawled - currently not indexed' means Googlebot visited the URL and read the content, but decided not to add it to the index. This is common for blogs with short posts (<300 words), duplicate topics, or weak author authority. Fix by expanding content, adding unique insights, and building internal links from high-authority pages on your site.
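Open yourdomain.com/robots.txt in a browser. Look for 'Disallow: /', which blocks the whole site, or a Disallow rule covering a path you need indexed. A rule like 'Disallow: /wp-admin/' is normal and harmless. The robots.txt report in GSC shows the file Google last fetched, and the URL Inspection tool shows whether a specific URL is blocked. A common mistake is adding a Disallow for the entire site during development and forgetting to remove it before launch.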
Staging sites often have no robots.txt block or noindex tag. Google finds them through sitemaps or external links. Add a 'Disallow: /' in robots.txt on staging, or set a 'noindex' meta tag globally. Also ensure your live sitemap does not include staging URLs.
Use the GSC URL Inspection tool to request indexing. Ensure the page has at least one internal link from an already-indexed page. Submit the page URL via a sitemap. For time-sensitive content (news, product launches), use the 'Request Indexing' button after verifying the page is crawlable.
Crawl budget is the number of URLs Googlebot crawls per day. If your site has 100,000 URLs but only 50 get crawled daily, deep pages may never be discovered. Fix by improving server speed, removing thin or duplicate pages, and using noindex on low-value filter or tag pages.
Yes, a canonical tag can keep a page out of the index. If page A has a canonical tag pointing to page B, Google may index page B and drop page A entirely. This is common in e-commerce sites with multiple product URL variants. Use self-referencing canonicals or consolidate variants. For more, see [this canonical mismatch analysis](https://hackmd.io/@SpeedyIndex-Official/Why-Google-Chooses-Different-Canonical-URL-How-to-Fix).
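To see which of your product variants point away from themselves, you can compare each page's declared canonical with its own URL. The sketch below uses only the standard library; `canonical_status` and the example URLs are illustrative. It reports what the page declares, which may still differ from the canonical Google finally selects.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin, urlsplit


class _CanonicalParser(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = {k.lower(): (v or "") for k, v in attrs}
        if "canonical" in attr.get("rel", "").lower().split():
            self.canonical = attr.get("href") or None


def _normalize(url: str) -> tuple:
    """Compare on scheme, host, path, and query; the fragment is ignored."""
    parts = urlsplit(url)
    return (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query)


def canonical_status(page_url: str, html: str) -> str:
    """Return 'missing', 'self', or 'points elsewhere' for one page."""
    parser = _CanonicalParser()
    parser.feed(html)
    if parser.canonical is None:
        return "missing"
    target = urljoin(page_url, parser.canonical)  # resolve relative hrefs
    return "self" if _normalize(target) == _normalize(page_url) else "points elsewhere"
```

A variant URL such as `/product/blue-widget?color=blue` whose canonical is `/product/blue-widget` comes back as "points elsewhere". That is fine when it is intentional and a problem when the variant is the page you want indexed.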
Images need their own indexable URLs. Ensure images are not loaded via JavaScript or CSS background. Use descriptive alt text, submit an image sitemap, and check that your server does not block image crawling via robots.txt. Images in PDFs are rarely indexed.
This error usually means Googlebot tried to fetch the URL but encountered a server error (5xx) or timeout. Check your server logs for the specific URL. Verify the page loads in a browser without errors. If using a CDN, ensure it does not block Googlebot's user-agent.
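If you want to reproduce the server check without reading curl output by eye, a sketch along these lines works. The 2-second threshold comes from this article's checklist, not from any published Google limit. `probe` and `classify_response` are hypothetical names, and `probe` needs network access. Because `urlopen` follows redirects, the status you get is that of the final URL.

```python
import time
import urllib.error
import urllib.request


def classify_response(status: int, seconds: float, max_seconds: float = 2.0) -> str:
    """Label a response using the thresholds from this article's checklist."""
    if 500 <= status <= 599:
        return "server error"
    if status != 200:
        return f"unexpected status {status}"
    if seconds > max_seconds:
        return "slow"
    return "ok"


def probe(url: str, timeout: float = 10.0) -> str:
    """Send one HEAD request and classify the result (requires network access)."""
    request = urllib.request.Request(url, method="HEAD")
    start = time.monotonic()
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as error:
        status = error.code  # 4xx and 5xx responses are raised as exceptions
    except (urllib.error.URLError, TimeoutError):
        return "unreachable or timed out"
    return classify_response(status, time.monotonic() - start)
```

Run `probe` a few times across the day. A CDN that intermittently serves 503s will not show up in a single request.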
Check for a recent manual action or algorithm update. Review GSC Crawl Stats for a drop in crawl rate. Look for server errors or robots.txt changes. Sometimes a site-wide noindex tag is accidentally added after a theme update. Run the full 10-check diagnostic in this article.