How To Easily Find All Of The Sitemap.xml Files In Python

James Phoenix
James Phoenix

To effectively analyse websites, knowing how to download all of the sitemap.xml files for a particular website is an incredibly useful skill.

Forunately, there are python packages that allow us to easily download all of sitemap.xml file’s with brute force!


NB: If you’re using a standard python environment, then simply exclude the ! symbol. The reason for using !pip install is because this guide is written in a jupyter notebook.

!pip install ultimate-sitemap-parser
!pip install requests
from usp.tree import sitemap_tree_for_homepage
import requests

Download all of the Sitemap.xml files based upon the URL of the homepage:

After running the following method, we’ve found all of the sitemap files and have saved them to a variable called tree:

tree = sitemap_tree_for_homepage('https://website.understandingdata.com/')
print(tree)

sitemap_tree_for_homepage() returns a tree of AbstractSitemap subclass objects that represent the sitemap hierarchy found on a given website.

To find all of the pages we can simply do:

# all_pages() returns an Iterator
for page in tree.all_pages():
    print(page)

Also, you can save of the URLs to a new variable via a list comprehension:

Leanpub Book

Read The Meta-Engineer

A practical book on building autonomous AI systems with Claude Code, context engineering, verification loops, and production harnesses.

Continuously updated
Claude Code + agentic systems
View Book
urls = [page.url for page in tree.all_pages()]
print(len(urls), urls[0:2])

Conclusion

Now you’ll hopefully be able to easily find all of the sitemap.xml files and the web pages in just a few lines of python code!

Topics
Python For SEO

Newsletter

Become a better AI engineer

Weekly deep dives on production AI systems, context engineering, and the patterns that compound. No fluff, no tutorials. Just what works.

Join 306K+ developers. No spam. Unsubscribe anytime.


More Insights

Cover Image for DRY: Dev Utils Panels Beat Manual State Setup

DRY: Dev Utils Panels Beat Manual State Setup

Every repeated setup ritual is an undeclared API waiting to be formalised. Build the panel once, skip the ritual forever.

James Phoenix
James Phoenix
Cover Image for How to Protect Your Coding Harness from Longer Integration Test Runs

How to Protect Your Coding Harness from Longer Integration Test Runs

Integration test suites get slower under LLM-assisted development. Not because the tests themselves are fundamentally slow, but because agents have every incentive to bump the timeout when they hit it

James Phoenix
James Phoenix