Hey everyone, I'm looking for the best approach to crawling a domain and identifying its PDFs along with their publish dates. I've had some success with Python, but I'm wondering if there's a better approach. It needs to crawl the entire domain even when sitemap.xml is not available.
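
For concreteness, a minimal sketch of the kind of Python crawl in question might look like the following. It is an illustration under assumptions, not the actual script: it follows `<a href>` links breadth-first from a start URL (so no sitemap.xml is needed), stays on the same host, and treats the HTTP `Last-Modified` header as a stand-in for the publish date, which may well differ from the date stored in the PDF's own metadata. The names `crawl_pdfs`, `http_fetch`, `LinkParser`, and the `max_pages` limit are all hypothetical, and only the standard library is used.

```python
from collections import deque
from email.utils import parsedate_to_datetime
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import Request, urlopen


class LinkParser(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def http_fetch(url):
    """Hypothetical fetcher: returns (lowercased headers, HTML body or '')."""
    req = Request(url, headers={"User-Agent": "pdf-crawler-sketch"})
    with urlopen(req, timeout=10) as resp:
        headers = {k.lower(): v for k, v in resp.headers.items()}
        # Only read HTML bodies; for PDFs the headers are enough here.
        is_html = "html" in headers.get("content-type", "")
        body = resp.read().decode("utf-8", "replace") if is_html else ""
    return headers, body


def crawl_pdfs(start_url, fetch=http_fetch, max_pages=500):
    """Breadth-first crawl of one host; returns {pdf_url: datetime or None}."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pdfs = {}
    pages = 0
    while queue and pages < max_pages:
        url = queue.popleft()
        pages += 1
        try:
            headers, body = fetch(url)
        except Exception:
            continue  # skip pages that fail to load
        is_pdf = (urlparse(url).path.lower().endswith(".pdf")
                  or "application/pdf" in headers.get("content-type", ""))
        if is_pdf:
            # Assumption: Last-Modified approximates the publish date.
            stamp = headers.get("last-modified")
            try:
                pdfs[url] = parsedate_to_datetime(stamp) if stamp else None
            except (TypeError, ValueError):
                pdfs[url] = None
            continue
        parser = LinkParser()
        parser.feed(body)
        for href in parser.links:
            # Resolve relative links and drop #fragments before deduplicating.
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pdfs
```

Calling `crawl_pdfs("https://example.com/")` would return a dict mapping each PDF URL found to a `datetime` (or `None` when the server sends no usable `Last-Modified`). A real crawl would probably also want robots.txt handling, rate limiting, and a PDF library to read the document's own creation date, none of which this sketch attempts.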