How To Easily Find All Of The Sitemap.xml Files In Python

James Phoenix
James Phoenix

To effectively analyse websites, knowing how to download all of the sitemap.xml files for a particular website is an incredibly useful skill.

Forunately, there are python packages that allow us to easily download all of sitemap.xml file’s with brute force!


NB: If you’re using a standard python environment, then simply exclude the ! symbol. The reason for using !pip install is because this guide is written in a jupyter notebook.

!pip install ultimate-sitemap-parser
!pip install requests
from usp.tree import sitemap_tree_for_homepage
import requests

Download all of the Sitemap.xml files based upon the URL of the homepage:

After running the following method, we’ve found all of the sitemap files and have saved them to a variable called tree:

tree = sitemap_tree_for_homepage('https://website.understandingdata.com/')
print(tree)

sitemap_tree_for_homepage() returns a tree of AbstractSitemap subclass objects that represent the sitemap hierarchy found on a given website.

To find all of the pages we can simply do:

# all_pages() returns an Iterator
for page in tree.all_pages():
    print(page)

Also, you can save of the URLs to a new variable via a list comprehension:

Unleash Your Potential with AI-Powered Prompt Engineering!

Dive into our comprehensive Udemy course and learn to craft compelling, AI-optimized prompts. Boost your skills and open up new possibilities with ChatGPT and Prompt Engineering.

Embark on Your AI Journey Now!
urls = [page.url for page in tree.all_pages()]
print(len(urls), urls[0:2])

Conclusion

Now you’ll hopefully be able to easily find all of the sitemap.xml files and the web pages in just a few lines of python code!

TaggedPython For SEO


More Stories

Cover Image for Soft Skills for Programmers: Why They Matter and How to Develop Them

Soft Skills for Programmers: Why They Matter and How to Develop Them

Overview You need a variety of soft skills in addition to technical skills to succeed in the technology sector. Soft skills are used by software professionals to collaborate with their peers effectively and profitably. Finding out more about soft skills and how they are used in the workplace will help you get ready for the job if you are interested in a…

James Phoenix
James Phoenix
Cover Image for What Are Webhooks? And How Do They Relate to Data Engineering?

What Are Webhooks? And How Do They Relate to Data Engineering?

Webhooks are a simple and powerful method for receiving real-time notifications when certain events occur. They enable a whole host of automated and interconnected applications. Broadly speaking, your apps can communicate via two main ways: polling and webhooks.  Polling is like going to a shop and asking for pizza – you have to ask whenever you want it. Webhooks are almost the…

James Phoenix
James Phoenix