Audit Every Page on Your Site from sitemap.xml in One Command

By Codcompass Team·2026-05-12·5 min read

"Audit my site" almost never means one URL. It means the homepage, the pricing page, the top twenty blog posts, every product, every category, every location page. On any real site that's hundreds to thousands of URLs, and clicking each one through a free checker is not a workflow — it's a way to lose an afternoon.

This post walks through the workflow we recommend instead: pull your sitemap.xml, hand the URL list to the batch audit endpoint, and export a single CSV ranked by priority. It's ~40 lines of Python. It works on any site that publishes a sitemap. And the size of your site tells you which plan tier to start on.

## The whole script

import csv
import os
import xml.etree.ElementTree as ET
import requests

API_KEY = os.environ["SEOSCORE_API_KEY"]
SITEMAP_URL = "https://example.com/sitemap.xml"
BASE = "https://api.seoscoreapi.com"

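# 1. Fetch the sitemap (fail fast on HTTP errors instead of parsing an error page)
resp = requests.get(SITEMAP_URL, timeout=20)
resp.raise_for_status()
ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
root = ET.fromstring(resp.content)  # bytes, so the declared XML encoding is honored
urls = [loc.text.strip() for loc in root.findall(".//sm:url/sm:loc", ns) if loc.text]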

print(f"Found {len(urls)} URLs in sitemap")

# 2. Batch audit (chunked at 50 to stay under per-call limits)
rows = []
for i in range(0, len(urls), 50):
    chunk = urls[i:i + 50]
    r = requests.post(
        f"{BASE}/audit/batch",
        headers={"X-API-Key": API_KEY},
        json={"urls": chunk},
        timeout=180,
    )
    r.raise_for_status()
    for result in r.json()["results"]:
        rows.append({
            "url": result["url"],
            "score": result.get("score", 0),
            "grade": result.get("grade", "F"),
            "seo": result.get("categories", {}).get("seo", 0),
            "performance": result.get("categories", {}).get("performance", 0),
            "accessibility": result.get("categories", {}).get("accessibility", 0),
            "ai_readability": result.get("categories", {}).get("ai_readability", 0),
            "priority_issues": len(result.get("priority", [])),

        })

# 3. Sort worst-first and write CSV
rows.sort(key=lambda row: row["score"])
with open("audit-report.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)

print(f"Wrote {len(rows)} rows to audit-report.csv")
print(f"Worst page: {rows[0]['url']} ({rows[0]['score']})")



That's the whole thing. Drop it in an `audit.py`, set `SEOSCORE_API_KEY`, run it, and open the CSV in whatever spreadsheet you like.

## Why batch matters

Running 500 URLs through `GET /audit` one at a time means 500 round trips, 500 rate-limit hits, and 500 chances for a transient error to break the loop. `POST /audit/batch` accepts up to 50 URLs per call, runs them concurrently on our side, and returns a single response. For 500 URLs you do 10 batch calls instead of 500 sequential ones, and the whole audit finishes in two or three minutes.
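The call-count arithmetic above is easy to check with a small chunking helper. This is only a sketch: `chunked` is a hypothetical helper name, the URLs are placeholders, and the 50-URL limit is the one quoted in the paragraph above.

```python
def chunked(items, size=50):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Placeholder URLs standing in for a real sitemap.
urls = [f"https://example.com/page-{n}" for n in range(500)]
batches = list(chunked(urls))

print(len(batches))      # number of POST /audit/batch calls needed
print(len(batches[-1]))  # size of the final batch
```

When the URL count is not a multiple of 50, the last batch is simply shorter, so no URL is dropped or sent twice.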

Batch is not available on the free tier — it's the line where a free SEO checker stops being useful and an API starts paying for itself.

The cheapest tier that lets you re-audit at the cadence you actually want is the right tier. If your sitemap has 3,000 URLs and you want a Monday-morning snapshot every week, that's 12,000 audits/month — Pro covers it with room to spare.
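To size a tier for your own site, the same arithmetic can be written out as below. This is a rough sketch: `monthly_audits` is a hypothetical helper, and it assumes every URL counts as one audit on every run.

```python
def monthly_audits(url_count, runs_per_month):
    """Audits consumed per month when every URL is re-audited on each run."""
    return url_count * runs_per_month


print(monthly_audits(3000, 4))  # four weekly runs
print(monthly_audits(3000, 5))  # a month that happens to contain five Mondays
```

A weekly schedule lands on five Mondays in some months, so it is worth checking the larger figure against your plan's quota as well.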

## Handling sitemap indexes

Most large sites don't publish a flat `sitemap.xml`; they publish a sitemap _index_ that points at child sitemaps. The script above finds no `<url>` entries in an index, so it comes back with an empty URL list. A short recursive helper fixes it:

def collect_urls(sitemap_url):
    xml = requests.get(sitemap_url, timeout=20).text
    root = ET.fromstring(xml)
    # Sitemap index: recurse into each child sitemap
    if root.tag.endswith("sitemapindex"):
        urls = []
        for child in root.findall(".//sm:sitemap/sm:loc", ns):
            urls.extend(collect_urls(child.text))
        return urls
    # Regular sitemap
    return [loc.text for loc in root.findall(".//sm:url/sm:loc", ns)]

urls = collect_urls(SITEMAP_URL)



That's enough to handle WordPress (Yoast and Rank Math both publish indexes), Shopify (a single index with child sitemaps per resource type), and most enterprise CMS setups.
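If you want to see why the root-tag check works, the sketch below parses two inline documents with no network access. The XML samples and the helper names `is_sitemap_index` and `child_locs` are made up for illustration; only the namespace URI comes from the script above.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Illustrative documents, not taken from a real site.
INDEX_XML = f"""<sitemapindex xmlns="{NS}">
  <sitemap><loc>https://example.com/sitemap-posts.xml</loc></sitemap>
  <sitemap><loc>https://example.com/sitemap-pages.xml</loc></sitemap>
</sitemapindex>"""

URLSET_XML = f"""<urlset xmlns="{NS}">
  <url><loc>https://example.com/</loc></url>
</urlset>"""


def is_sitemap_index(xml_text):
    """True when the document's root element is <sitemapindex>."""
    root = ET.fromstring(xml_text)
    # ElementTree expands namespaces, so the tag reads "{namespace}sitemapindex".
    return root.tag == f"{{{NS}}}sitemapindex"


def child_locs(xml_text):
    """Return the text of every <loc> element, whichever root type it sits under."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(f"{{{NS}}}loc")]


print(is_sitemap_index(INDEX_XML), child_locs(INDEX_XML))
print(is_sitemap_index(URLSET_XML), child_locs(URLSET_XML))
```

The namespace-qualified tag is the reason the recursive helper can get away with `root.tag.endswith("sitemapindex")`.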

## Make the CSV actionable

Sorting by score gets you "worst pages first." That's a start, but the high-value moves are usually:

-   **High-traffic pages with mid-tier scores.** A blog post with 12,000 pageviews/month and a score of 72 is worth fixing before a product page with 40 pageviews and a score of 41.
-   **Pages with a low _category_ score even if overall is fine.** A product page scoring 85 overall but 58 in accessibility is an ADA risk you don't want to ignore.
-   **Pages that _just_ regressed.** That's where historical tracking comes in — see the [historical SEO score tracking post](https://dev.to/blog/historical-seo-score-tracking) for the `/history` endpoint that adds month-over-month deltas to each row.
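The second bullet above can be sketched as a filter over the report rows. Treat this as an illustration under assumptions: the threshold of 60 is an arbitrary cut-off, `weak_categories` is a hypothetical helper, and the sample rows are invented to mirror the columns the script writes.

```python
CATEGORIES = ("seo", "performance", "accessibility", "ai_readability")


def weak_categories(row, threshold=60):
    """Return the category names in `row` that score below `threshold`."""
    # float() so the same helper works on csv.DictReader rows, where values are strings.
    return [name for name in CATEGORIES if float(row[name]) < threshold]


# Invented sample rows using the column names from audit-report.csv.
rows = [
    {"url": "https://example.com/product", "score": 85,
     "seo": 92, "performance": 90, "accessibility": 58, "ai_readability": 88},
    {"url": "https://example.com/blog", "score": 72,
     "seo": 75, "performance": 70, "accessibility": 74, "ai_readability": 69},
]

flagged = {row["url"]: weak_categories(row) for row in rows if weak_categories(row)}
print(flagged)
```

A page only shows up in `flagged` when at least one category falls under the cut-off, whatever its overall score.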

To join your audit report to traffic data, export GA4 or Search Console to CSV and merge in pandas:

import pandas as pd

audit = pd.read_csv("audit-report.csv")
traffic = pd.read_csv("ga4-pages.csv")  # columns: url, pageviews
joined = audit.merge(traffic, on="url", how="left").fillna(0)
joined["impact"] = (100 - joined["score"]) * joined["pageviews"]
joined.sort_values("impact", ascending=False).head(50).to_csv("priority.csv")



That gives you a 50-row priority list ranked by _expected impact_ of a fix, not just by raw score. The pages that show up at the top are the ones that are bad _and_ matter.
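To see how the impact formula separates the two pages from the earlier example (the blog post at 12,000 pageviews and a score of 72, the product page at 40 pageviews and a score of 41), here is the same calculation in plain Python. The function name `impact` is just for illustration.

```python
def impact(score, pageviews):
    """Points missing from a perfect score, weighted by monthly pageviews."""
    return (100 - score) * pageviews


blog_post = impact(score=72, pageviews=12_000)
product_page = impact(score=41, pageviews=40)

print(blog_post, product_page)
```

The blog post comes out at 336,000 against 2,360 for the product page, so it ranks far higher even though its raw score is better.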

## Scheduling it

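Once the script works, the obvious next step is running it weekly. A `cron` line on any server does it. Cron does not load your shell profile, so define `SEOSCORE_API_KEY` in the crontab or in a wrapper script, otherwise the script exits with a `KeyError`: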

0 8 * * 1 /usr/bin/python3 /home/you/audit.py >> /var/log/seoaudit.log 2>&1



Or as a GitHub Action that uploads the report as a build artifact:

name: Weekly SEO audit
on:
  schedule:
    - cron: "0 8 * * 1"
jobs:
  audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: {python-version: "3.11"}
      - run: pip install requests
      - run: python audit.py
        env:
          SEOSCORE_API_KEY: ${{ secrets.SEOSCORE_API_KEY }}
      - uses: actions/upload-artifact@v4
        with:
          name: audit-report
          path: audit-report.csv



The agency-monitor-setup post has a [more involved Slack-alerting variant](https://dev.to/blog/agency-seo-monitor-setup) if you want to skip the artifact and just get pinged when scores drop.

## What you do not want to do

A few traps we see people fall into:

1.  **Auditing the homepage and assuming the rest follows.** Templates differ. Product pages and blog posts on the same site routinely score 15+ points apart. If you haven't sampled the long tail, you haven't audited the site.
2.  **Hammering the API with 500 separate `GET /audit` calls instead of using batch.** It's slower, it hits rate limits, and on Pro you'll eat through your monthly cap five times faster than you needed to.
3.  **Treating the CSV as a static deliverable.** The first audit is a baseline. The value compounds when you run it weekly and watch the trend — which is exactly what the [historical endpoints](https://dev.to/blog/historical-seo-score-tracking) are for.
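For the third trap, you don't need a separate endpoint to get a first look at the trend: two saved copies of the report are enough. The sketch below is one possible approach, not the `/history` endpoint. `regressions` is a hypothetical helper, the 5-point threshold is arbitrary, and the rows are invented; in practice they would come from `csv.DictReader` over two weekly snapshots.

```python
def regressions(previous, current, min_drop=5):
    """Return (url, old_score, new_score) for pages whose score fell by at
    least `min_drop` points between two runs, biggest drop first."""
    old_scores = {row["url"]: float(row["score"]) for row in previous}
    drops = []
    for row in current:
        old = old_scores.get(row["url"])
        new = float(row["score"])
        # Pages that are new this week have no baseline, so they are skipped.
        if old is not None and old - new >= min_drop:
            drops.append((row["url"], old, new))
    return sorted(drops, key=lambda d: d[2] - d[1])


# Invented rows standing in for last week's and this week's audit-report.csv.
last_week = [
    {"url": "https://example.com/a", "score": "80"},
    {"url": "https://example.com/b", "score": "70"},
    {"url": "https://example.com/c", "score": "90"},
]
this_week = [
    {"url": "https://example.com/a", "score": "78"},
    {"url": "https://example.com/b", "score": "58"},
    {"url": "https://example.com/c", "score": "81"},
    {"url": "https://example.com/d", "score": "50"},
]

for url, old, new in regressions(last_week, this_week):
    print(f"{url}: {old:.0f} -> {new:.0f}")
```

Keeping each week's CSV under a dated file name is all the state this needs.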

## Getting started

Pull your sitemap URL, copy the script above, set `SEOSCORE_API_KEY`, and run it. If your sitemap has more than a few hundred URLs, [grab a Basic key](https://dev.to/upgrade?tier=basic) so you've got room to re-run the audit on a weekly cadence. The first run gives you a baseline; the fourth run is where the trend becomes useful.

If you've got an enterprise sitemap with 10,000+ URLs and want help architecting the right batch size and cadence, the [Ultra tier](https://dev.to/upgrade?tier=ultra) ships with that headroom built in.
