Google Indexing API Setup for Large Websites

Why Automate With the Indexing API?

The Indexing API is not for every page. It's built for time-sensitive content, and Google documents it only for pages that carry JobPosting or BroadcastEvent (embedded in a VideoObject) structured data, such as job postings and livestream pages. You send a URL and a notification type (URL_UPDATED or URL_DELETED); the structured data lives on the page itself, and Google decides when to crawl and index. In practice, when you run a large site with 50,000 new job listings daily, waiting for a standard crawl means losing revenue. Automation changes that.

A common situation we see: a team implements the API, hits quota at 10:00 AM, and wonders why half their URLs are missing. The default quota is 200 publish requests per day per Google Cloud project. That's tight. You must prioritize. URLs with no qualifying structured data may be silently ignored. URLs already indexed get a 200 response but no re-crawl guarantee. Use an index status checker to verify before submission.

Indexing API Submission Flow

1. Verify Site in Search Console

Must have owner permission. Use domain property for API access.

2. Enable API & Create Credentials

Enable Indexing API in GCP console. Create service account with JSON key.

3. Add Service Account as Owner

In Search Console, add the service account email as an owner under Settings > Users and permissions.

4. Prepare URL + Structured Data

Embed JSON-LD for JobPosting or BroadcastEvent (inside a VideoObject). Validate with Rich Results Test.

5. POST to API Endpoint

Call the urlNotifications.publish method, one URL per call. Handle 429 and retry with exponential backoff.

6. Monitor Quota & Errors

Track daily usage on the Indexing API's Quotas page in the Google Cloud console, and track errors in your own submission logs. Adjust priority based on business impact.

Indexing API: Common Errors & How to Fix

Error Code	Error Message	Root Cause	Fix / Workaround
400	Bad request	Malformed request body, such as a missing or invalid url or type field	Validate the URL format and send type as URL_UPDATED or URL_DELETED.
403	Permission denied	Service account not added as owner in Search Console	Verify property ownership. Add the service account as an owner under Settings > Users and permissions.
429	Quota exceeded	More than 200 publish requests/day on the project, or request rate too high	Reduce request rate. Use exponential backoff. Prioritize high-value URLs.
500	Internal error	Google server-side issue (rare)	Retry with backoff (2s, 4s, 8s). If persists, submit to Google Issue Tracker.
409	URL already pending	Duplicate submission within brief window	Deduplicate queue. Skip URLs that returned 200 in last hour.
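
The table above reduces to a small decision rule per response. The following is a minimal sketch, assuming you already have the HTTP status code in hand; the function name `next_action` and the action labels are hypothetical and not part of any Google library.

```python
# Hypothetical helper: map an Indexing API HTTP status to a queue action.
RETRYABLE = {429, 500}   # back off, then try again
FIX_FIRST = {400, 403}   # retrying unchanged will fail the same way


def next_action(status: int) -> str:
    """Return 'done', 'retry', 'fix', or 'skip' for a response status."""
    if 200 <= status < 300:
        return "done"
    if status in RETRYABLE:
        return "retry"
    if status in FIX_FIRST:
        return "fix"
    # 409 and anything unexpected: log it and move on without retrying.
    return "skip"
```

A 429 caused by the daily cap will not clear until the quota resets, so it is worth bounding the number of retries rather than looping on "retry" indefinitely.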

Worked Example: Python Script for Job Listings

Assume a job board with 2,000 new listings per day. Quota is 200. You must choose. Priority: 1) Paid listings, 2) Listings from top 50 companies, 3) All others. Script below (Python) reads a CSV with columns: url, priority, title, datePosted. It sorts by priority, takes top 200, and sends each to the Indexing API.

    import csv
    import time

    from google.oauth2 import service_account
    from googleapiclient.discovery import build
    from googleapiclient.errors import HttpError

    SCOPES = ['https://www.googleapis.com/auth/indexing']
    DAILY_QUOTA = 200

    creds = service_account.Credentials.from_service_account_file('key.json', scopes=SCOPES)
    service = build('indexing', 'v3', credentials=creds)

    with open('jobs.csv', newline='') as f:
        # Priority 1 is the most important, so sort ascending; URL breaks ties.
        rows = sorted(csv.DictReader(f), key=lambda r: (int(r['priority']), r['url']))

    # Cap the number of attempts at the daily quota before sending anything.
    for row in rows[:DAILY_QUOTA]:
        body = {'url': row['url'], 'type': 'URL_UPDATED'}
        try:
            service.urlNotifications().publish(body=body).execute()
            print(f'OK {row["url"]}')
        except HttpError as e:
            print(f'FAIL {row["url"]}: {e}')
        time.sleep(1)  # crude throttle between requests

Edge case: if a URL returns 409 (already pending), log it and skip. Do not retry — you burn quota. After submission, run a check with the index status checker. Typical result: ~80% indexed within 2 hours, ~95% within 24 hours. The remaining 5% often have no structured data or are blocked by robots.txt.

Quota Limits & Operational Realities

The default quota is 200 publish requests per day per Google Cloud project. Batch requests cut HTTP overhead, but each URL in a batch still counts against that quota. A site with 10,000 pages takes 50 days to notify. That's not automation; that's a trickle. The canonical URL mismatch is a silent killer. Google might index a version with a trailing slash, and your API call uses the non-slash variant. Both get indexed, splitting signals. Canonical confusion wastes quota and dilutes ranking. Always canonicalize URLs before sending.
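
Canonicalizing before sending can be as small as one normalization function applied to every URL in the queue. The sketch below assumes the site's canonical policy is already known and reduces it to a single hypothetical `trailing_slash` flag; a real site may also need rules for query parameters, which are left untouched here.

```python
from urllib.parse import urlsplit, urlunsplit


def canonicalize(url: str, trailing_slash: bool = False) -> str:
    """Normalize a URL to one agreed form before submission.

    Assumes the site's canonical policy is known up front; here it is
    reduced to a single `trailing_slash` flag for illustration.
    """
    parts = urlsplit(url.strip())
    path = parts.path or "/"
    if path != "/":
        path = path.rstrip("/") + ("/" if trailing_slash else "")
    # Lowercase scheme and host, keep the query, drop the fragment.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )
```

Whatever form you choose, it should match the rel=canonical tag and the sitemap entry for the same page.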

Another operational failure: sending URLs that return 404 or redirect. The API accepts them, but Google treats them as low-quality. You burn quota on dead pages. Filter your list: resolve each URL server-side, check HTTP status 200, and discard non-200. Also remove URLs with noindex meta or X-Robots-Tag. Empty results from your CMS pipeline? That's a data pipeline bug, not an API problem.
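
The filter described above can be expressed as a single predicate over the fetched response. This is a rough sketch: the name `is_submittable` is hypothetical, the caller is assumed to fetch each URL without following redirects, and the regular expression only catches the common attribute order (name before content), so an HTML parser would be more robust.

```python
import re

# Matches <meta name="robots" ... content="...noindex..."> in the common order.
NOINDEX_META = re.compile(
    r'<meta[^>]+name=["\']robots["\'][^>]*content=["\'][^"\']*noindex',
    re.IGNORECASE,
)


def is_submittable(status: int, headers: dict, html: str) -> bool:
    """True only for a 200 page with no noindex signal.

    The caller is assumed to have fetched the URL without following
    redirects, so a 301/302 shows up here as a non-200 status.
    """
    if status != 200:
        return False
    lowered = {k.lower(): v for k, v in headers.items()}
    if "noindex" in lowered.get("x-robots-tag", "").lower():
        return False
    return NOINDEX_META.search(html) is None
```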

Pre-Submission Checklist

1. URL is live and returns HTTP 200 (not 3xx, 4xx, or 5xx).
2. Page contains valid structured data (JobPosting, or BroadcastEvent embedded in a VideoObject).
3. Canonical URL is consistent across sitemap, rel=canonical, and API call.
4. Page is not blocked by robots.txt or meta robots noindex.
5. Service account is added as owner in Search Console.
6. Quota not exceeded today (check the Indexing API's Quotas page in the Google Cloud console).
7. URL not already submitted in the last hour (deduplication cache).
8. Content is unique and substantial (Google may ignore thin pages).

FAQ

What is the Google Indexing API and how does it work for large websites?

The Indexing API lets you programmatically notify Google when a URL is added, updated, or removed. It's designed for time-sensitive content like job postings or livestream videos. You send a URL and a notification type; Google decides when to crawl. For large sites, the default quota is 200 publish requests per day per Google Cloud project, so you must prioritize high-value pages.

How do I set up the Google Indexing API for a multi-site agency?

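Create a separate GCP project per client or use a single project with multiple service accounts. Each property needs a service account email added as owner in Search Console. Watch out: the default quota is per Google Cloud project, not per property. If you manage 50 sites from one project, they share a single pool of 200 publish requests per day. Separate projects each carry their own default quota, and Google provides a quota increase request for projects that need more.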

What structured data types are required for the Indexing API to work?

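Google documents two qualifying types: JobPosting, and BroadcastEvent embedded in a VideoObject. The structured data must be embedded on the page, typically as JSON-LD. The API call itself carries only the URL and the notification type, so a URL without valid structured data is not checked at request time; the notification may simply be ignored. Validate with Google's Rich Results Test before submission.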

How do I handle Google Indexing API quota limits for bulk submissions?

The default quota is 200 publish requests per day per Google Cloud project. Batch requests exist but do not raise the limit, since each URL in a batch counts separately. To handle bulk, prioritize: assign scores to URLs (e.g., based on page traffic or business value) and submit only the top 200 daily. Track usage on the Indexing API's Quotas page in the Google Cloud console. If you need more, request a quota increase from Google.

What are common Google Indexing API errors and how do I fix them?

400: malformed request (fix the url or type field). 403: permission denied (add the service account as owner in Search Console). 429: quota exceeded (reduce rate and retry with backoff). 409: duplicate (deduplicate the queue). 500: server error (retry with exponential backoff). Always log and analyze errors per URL.

How do I check if Google indexed a URL after using the Indexing API?

Use the URL Inspection tool in Search Console, or call the Search Console API's urlInspection.index.inspect method. A faster method for bulk: use a third-party index status checker (https://teletype.in/@speedyindex/check-if-google-indexed). Expect ~80% indexed within 2 hours, ~95% within 24 hours if structured data is valid.

Why does Google Indexing API sometimes ignore my URL submission?

Common reasons: URL lacks required structured data, URL returns redirect or 404, page has noindex tag, content is thin or duplicate, canonical URL mismatch, or quota was exceeded earlier. Google also may ignore if the page is not deemed fresh enough. Validate each URL against the checklist before submission.

Can I use the Indexing API for backlinks or guest post indexing?

No. Google documents the Indexing API only for pages with JobPosting or BroadcastEvent structured data. It is not for generic content or backlinks. Using it for non-qualifying pages wastes quota, and the notifications may be ignored. For guest posts, use standard sitemap submission and internal linking.

What is the difference between URL_UPDATED and URL_DELETED in the Indexing API?

URL_UPDATED tells Google a page is new or changed. URL_DELETED indicates the page no longer exists (returns 404 or 410). Use URL_DELETED for removed job listings or expired events; it helps Google remove them from search faster without waiting for a recrawl. Both count toward the 200/day quota.
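
Choosing between the two notification types can be driven by the page's current HTTP status. The sketch below assumes you have already fetched the URL; the function name `notification_body` is hypothetical, while the `url` and `type` fields follow the request body used in the script earlier in this guide.

```python
def notification_body(url: str, http_status: int) -> dict:
    """Build the publish request body for a URL based on its current status."""
    if http_status in (404, 410):
        # The page is gone: ask Google to drop it.
        return {"url": url, "type": "URL_DELETED"}
    if http_status == 200:
        # The page is live: report it as new or changed.
        return {"url": url, "type": "URL_UPDATED"}
    # Redirects and server errors are neither; do not spend quota on them.
    raise ValueError(f"unexpected status {http_status} for {url}")
```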

How do I automate the Indexing API workflow for a real estate site with 1000 listings daily?

Not on the default quota, which is 200 publish requests per day per Google Cloud project. Splitting listings across subdomain properties does not help, because the quota belongs to the project rather than to the Search Console property. Request a quota increase from Google, and confirm first that the listing pages carry a qualifying structured data type. Then write a Python script that reads a DB, scores each listing, and submits the highest-priority URLs up to the daily limit.

Operational Failure: The Duplicate List Trap

We worked with a client who sent the same 200 URLs every day for a week. They thought the API was persistent. It's not. Each call is an independent notification. Google may ignore repeated submissions if content hasn't changed. Worse: they sent URLs with different query parameters (e.g., /job?id=123 and /job/123). Both were indexed, creating duplicate content. Canonical confusion wasted their quota. Fix: deduplicate at the source. Store canonical URLs in a DB column, and only submit unique ones that have actually changed. Use the canonical URL guide to normalize.
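
One way to submit only URLs that have actually changed is to keep a content hash per canonical URL. This is a minimal sketch in which two dictionaries stand in for the DB table; the function name `select_changed` and the choice of SHA-256 are assumptions for illustration.

```python
import hashlib


def select_changed(pages: dict, seen: dict) -> list:
    """Return canonical URLs whose content hash differs from the stored one.

    `pages` maps canonical URL -> current page content; `seen` maps
    canonical URL -> hash recorded earlier and is updated in place.
    Both stand in for a DB table in this sketch.
    """
    changed = []
    for url, content in pages.items():
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if seen.get(url) != digest:
            changed.append(url)
            seen[url] = digest
    return changed
```

In production you would probably record the new hash only after the submission succeeds, so that a failed call is retried on the next run.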

Another edge case: slow vendors. One API call takes ~500ms. 200 calls = 100 seconds. If your script runs synchronously, it blocks. Solution: use async I/O or parallel threads with rate limiting. Python's asyncio with aiohttp works well. But don't exceed 10 concurrent requests — Google returns 429 even within the 200 quota if you blast them. Throttle to 5 requests/second.
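
The throttling described above can be sketched with asyncio alone. Here `send` is a hypothetical coroutine that performs one API call (for example via aiohttp), and the default limits simply mirror the figures in the paragraph above; they are not official values.

```python
import asyncio


async def submit_all(urls, send, max_concurrent=10, per_second=5):
    """Call `send(url)` for each URL with bounded concurrency and start rate.

    `send` is a hypothetical coroutine that performs one API call.
    Results come back in the same order as `urls`; exceptions are
    returned in place rather than raised.
    """
    sem = asyncio.Semaphore(max_concurrent)

    async def worker(url):
        async with sem:  # never more than max_concurrent calls in flight
            return await send(url)

    tasks = []
    for url in urls:
        tasks.append(asyncio.create_task(worker(url)))
        await asyncio.sleep(1 / per_second)  # space out task creation
    return await asyncio.gather(*tasks, return_exceptions=True)
```

Run it with `asyncio.run(submit_all(urls, send))` from a synchronous entry point such as a cron script.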

Production Deployment Steps

Create a dedicated GCP service account with Indexing API enabled. Store the JSON key securely (e.g., Secret Manager, not in code).
Verify your domain in Search Console. Add the service account email as an owner via the property's user settings.
Build a URL pipeline: extract from CMS, filter by structured data presence, canonicalize, and deduplicate.
Implement rate limiting: max 5 requests/second, with exponential backoff on 429 errors (1s, 2s, 4s, max 30s).
Log all submissions: URL, timestamp, HTTP status, error message. Send logs to a monitoring system.
Schedule the script to run once daily, just after midnight PT when quota resets. Use a cron job or Cloud Scheduler.
Monitor quota usage on the Indexing API's Quotas page in the Google Cloud console. Alert when more than 180 of the 200 daily requests are used.
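
The backoff schedule from the rate-limiting step (1s, 2s, 4s, capped at 30s) can be sketched as a generator plus a small retry wrapper. Here `send` is a hypothetical callable that returns an HTTP status code, and `sleep` is injectable so the logic can be exercised without waiting.

```python
import time


def backoff_delays(base=1.0, cap=30.0):
    """Yield base, 2*base, 4*base, ... with each delay capped at `cap`."""
    delay = base
    while True:
        yield min(delay, cap)
        delay *= 2


def send_with_backoff(send, url, max_attempts=6, sleep=time.sleep):
    """Call `send(url)` until it returns something other than 429.

    `send` is a hypothetical callable returning an HTTP status code.
    Returns the last status seen after at most `max_attempts` calls.
    """
    delays = backoff_delays()
    status = send(url)
    for _ in range(max_attempts - 1):
        if status != 429:
            break
        sleep(next(delays))
        status = send(url)
    return status
```

Adding a small random jitter to each delay is a common refinement when several workers retry at once.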
