Best Practices for Processing Scraped School Website Data
Hello guys, I'd like to know what, in your opinion, would be the best practice for my use case.
1. I have a list of school websites.
2. I'm using Claygent to scrape the name of each school's headmaster (mentioned in the footer or found through a Google search).
3. Then I have a lot of "cleaning steps":
   - If it can't find a separate first name, it fills both the first-name and last-name fields with the full name.
   - I have to apply some "proper" function to get "Name" instead of "NAME".
   - I have to check manually that all the first names are filled, etc.
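For reference, the three cleaning steps above could be collapsed into one small function. This is a minimal, tool-agnostic Python sketch, not a Clay or Claygent feature. It assumes the first word of the full name is the first name and everything after it is the last name, and it flags doubtful rows for review instead of copying the full name into both fields. `Headmaster`, `proper_case` and `clean_headmaster` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Headmaster:
    first_name: str
    last_name: str
    needs_review: bool  # True when a human should check this row


def proper_case(name: str) -> str:
    """Turn 'JEAN-PIERRE   DUPONT' into 'Jean-Pierre Dupont'."""
    # str.title() capitalises each run of letters, so hyphenated names work,
    # but it mis-handles names like 'MCDONALD' (-> 'Mcdonald').
    return " ".join(word.title() for word in name.split())


def clean_headmaster(full_name: Optional[str],
                     first_name: Optional[str] = None,
                     last_name: Optional[str] = None) -> Headmaster:
    full = proper_case(full_name or "")
    first = proper_case(first_name or "")
    last = proper_case(last_name or "")

    # Assumption: first word = first name, the rest = last name.
    if full and not (first and last):
        parts = full.split()
        if len(parts) >= 2:
            first = first or parts[0]
            last = last or " ".join(parts[1:])

    # Flag instead of guessing, so only these rows need a manual check.
    needs_review = not first or not last or first == last
    return Headmaster(first, last, needs_review)


if __name__ == "__main__":
    rows = ["JEAN DUPONT", "DUPONT", "marie-claire de la tour"]
    cleaned = [clean_headmaster(r) for r in rows]
    flagged = sum(h.needs_review for h in cleaned)
    print(flagged, "of", len(cleaned), "rows need review")
```

With 5k rows, the `needs_review` flag narrows the manual pass to the rows where the split was ambiguous.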
And I have 5k schools to process. What's the best practice, in your opinion? Build a waterfall with all these different steps? Write one big Claygent prompt that does everything at once? Build a lot of separate formulas? I'd love to know what works best for you in this case.
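In the generic sense, a waterfall tries several sources in order and stops at the first one that returns a value. Here is a minimal, tool-agnostic Python sketch of that pattern. `waterfall`, `from_footer` and `from_search` are hypothetical, and the two sources are stubs standing in for a footer scrape and a search lookup. They are not Clay or Claygent APIs.

```python
from typing import Callable, Iterable, Optional

# A "source" takes a school URL and returns a headmaster's full name, or None.
Source = Callable[[str], Optional[str]]


def waterfall(url: str, sources: Iterable[Source]) -> Optional[str]:
    """Try each source in order; return the first non-blank result."""
    for source in sources:
        result = source(url)
        if result and result.strip():
            return result.strip()
    return None  # every source came back empty


# Hypothetical stubs standing in for real lookups.
def from_footer(url: str) -> Optional[str]:
    known = {"https://school-a.example": "JEAN DUPONT"}
    return known.get(url)


def from_search(url: str) -> Optional[str]:
    known = {"https://school-b.example": "Marie Martin"}
    return known.get(url)


if __name__ == "__main__":
    for url in ["https://school-a.example", "https://school-b.example",
                "https://school-c.example"]:
        print(url, "->", waterfall(url, [from_footer, from_search]))
```

In this layout, the waterfall only finds a raw name and a separate cleaning step formats it, so extraction and formatting can be debugged independently.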