Any way to scrape data from a raw XML page?
You've caught us outside of our support hours (9am-9pm EST), but don't worry - we'll be back in touch within 24 hours (often sooner!). If you haven't already, please include the URL of your table in the thread below so that we can help you as quickly as possible!
Hey there, you have a few options. You can use any scraper API or our scrape website integration. Another option is Claygent with the following prompt:

Extract structured data from the XML located at /XML_page.
Method:
1. Access the XML URL and parse the complete XML structure.
2. Extract the key data points, such as "to", "from", "heading", and "body".
Output this information in JSON format, with keys matching each element name. Handle errors by returning "Data extraction failed at URL: https://www.w3schools.com/xml/note.xml" if the URL is inaccessible or the data cannot be parsed.

Let me know if you need further guidance or have any questions!
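For reference, here is roughly what that agent prompt asks for, written as a small Python sketch instead of a prompt. It assumes the XML looks like the w3schools note example (a root element with `to`, `from`, `heading` and `body` children); the function names `note_to_json` and `extract_from_url` and the sample values are illustrative, not part of any Clay feature.

```python
import json
import xml.etree.ElementTree as ET
from typing import Union
from urllib.request import urlopen

# Element names the prompt asks for; assumed to be direct children of the root.
FIELDS = ("to", "from", "heading", "body")


def note_to_json(xml_data: Union[str, bytes], source_url: str) -> str:
    """Parse a note-style XML document and return the requested fields as JSON."""
    try:
        root = ET.fromstring(xml_data)
    except ET.ParseError:
        # Same error string the prompt asks for when parsing fails.
        return f"Data extraction failed at URL: {source_url}"
    # findtext() returns None for a missing child, which becomes null in JSON.
    data = {name: root.findtext(name) for name in FIELDS}
    return json.dumps(data)


def extract_from_url(url: str) -> str:
    """Fetch the XML over HTTP, then hand it to note_to_json()."""
    try:
        with urlopen(url, timeout=30) as resp:
            xml_bytes = resp.read()
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return f"Data extraction failed at URL: {url}"
    return note_to_json(xml_bytes, url)


# Illustrative sample shaped like https://www.w3schools.com/xml/note.xml
SAMPLE = """<note>
  <to>Alice</to>
  <from>Bob</from>
  <heading>Reminder</heading>
  <body>Meeting at 10am</body>
</note>"""

if __name__ == "__main__":
    print(note_to_json(SAMPLE, "https://www.w3schools.com/xml/note.xml"))
    # → {"to": "Alice", "from": "Bob", "heading": "Reminder", "body": "Meeting at 10am"}
```

This only works when you already know which element names to pull; for a large document with unknown structure, a generic XML-to-JSON conversion is the more robust starting point.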
I've tried this, as well as prompts 10x more complicated, with no luck. This is the page: https://projects.propublica.org/nonprofits/download-xml?object_id=202401939349301015
Happy to take a look. Converting the XML to JSON might be the way to go, since our HTTP integration doesn't play well with XML files. You could try https://nocodeapi.com/docs/xml-parser-api/ for the conversion. One workaround could be sourcing the data directly from a particular nonprofit, if a URL is available for it. Another option is to use an automation tool like n8n to parse the data before sending it through a webhook. What else have you tried?
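If you go the n8n or custom-script route, the "convert, then send through a webhook" step could look something like this minimal Python sketch. It is a generic converter, not how nocodeapi or n8n implement it: namespace prefixes are stripped from tag names, attributes become `@name` keys, repeated tags become lists, and the resulting JSON is POSTed to a webhook. The webhook URL, the `User-Agent` header and all function names are hypothetical placeholders, and it is an assumption that the ProPublica link returns the raw XML directly to a script.

```python
import json
import xml.etree.ElementTree as ET
from typing import Any, Dict, Union
from urllib.request import Request, urlopen


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix: '{http://example.com/ns}Name' -> 'Name'."""
    return tag.rsplit("}", 1)[-1]


def element_to_dict(elem: ET.Element) -> Any:
    """Recursively convert an element into plain dicts, lists and strings.

    Attributes become '@name' keys, repeated child tags become lists, and the
    text of an element that also has attributes or children goes under '#text'.
    Tail text (mixed content) is ignored.
    """
    node: Dict[str, Any] = {f"@{local_name(k)}": v for k, v in elem.attrib.items()}
    for child in elem:
        key = local_name(child.tag)
        value = element_to_dict(child)
        if key not in node:
            node[key] = value
        elif isinstance(node[key], list):
            node[key].append(value)
        else:
            node[key] = [node[key], value]
    text = (elem.text or "").strip()
    if not node:
        return text  # plain leaf element: just its text
    if text:
        node["#text"] = text
    return node


def xml_to_json(xml_data: Union[str, bytes]) -> str:
    """Parse an XML document and return it as JSON keyed by the root tag."""
    root = ET.fromstring(xml_data)
    return json.dumps({local_name(root.tag): element_to_dict(root)})


def post_to_webhook(payload_json: str, webhook_url: str) -> int:
    """POST the JSON string to a webhook and return the HTTP status code."""
    req = Request(
        webhook_url,
        data=payload_json.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(req, timeout=30) as resp:
        return resp.status


if __name__ == "__main__":
    XML_URL = "https://projects.propublica.org/nonprofits/download-xml?object_id=202401939349301015"
    # Hypothetical placeholder: replace with your table's webhook URL.
    WEBHOOK_URL = "https://example.com/your-webhook"
    # Some servers reject Python's default User-Agent; this header is an assumption.
    fetch = Request(XML_URL, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(fetch, timeout=30) as resp:
        payload = xml_to_json(resp.read())
    print(post_to_webhook(payload, WEBHOOK_URL))
```

A filing like this can contain a large number of fields, so you may prefer to pick out only the sub-tree you need from the converted dict before posting it.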
We haven't heard back from you in a bit, so we're going to go ahead and close things out here - feel free to let us know if you still need something!