Clay S. I'm using Claygent to deal with pagination. I know this is not ideal. I'm trying to scrape a database that spans multiple pages, for example the one here: https://connect.health.gov.bc.ca/assisted-living-residence Any recommendations on the prompt or method? Next, I also noticed that some of the data I get when using an AI scraper on the internet is not that accurate. What is the best way to find accurate data with a web scraper, or to verify data? For example, I'm in the healthcare niche; instead of pulling contact information for a care home in Canada, it pulls the information for a facility with a similar name in the United States. Is there any way to avoid this?
When dealing with web scraping, especially for healthcare data, here's a strategic approach to ensure accuracy:

1. First, evaluate if web scraping is the best approach for your needs:
• Consider if the data is available through other Clay enrichment methods
• Determine if web scraping will provide more reliable results for your specific use case

2. For healthcare data accuracy:
• Use multiple Apify actors in combination to cross-validate the data
• Create comprehensive datasets by combining different data sources
• Implement proper data validation and verification steps

3. To improve location-specific accuracy:
• Be explicit in your search parameters
• Use location-specific identifiers in your queries
• Cross-reference data points across multiple sources

Remember to approach web scraping responsibly and ethically, ensuring compliance with website terms of service and data protection regulations.

Sources:
Intro to Web Scraping [Limitless Research] - Clay University
Deep Dive: Apify Actors [Limitless Research] - Clay University
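To make point 3 concrete: if you post-process scraped records outside of Clay, you can reject rows whose location signals don't match the country you expect before they ever reach your table. A minimal sketch, assuming the record fields ("address", "country") and the heuristics are placeholders for whatever your scraper actually returns:

```python
# Hypothetical post-scrape validation: drop records whose location
# doesn't match the expected country. Field names are assumptions.
import re

# Canadian postal codes look like "V8W 9P1"
CANADIAN_POSTAL = re.compile(r"[A-Za-z]\d[A-Za-z]\s?\d[A-Za-z]\d")

def looks_canadian(record: dict) -> bool:
    """Heuristic cross-check: a BC care-home record should carry
    Canadian location signals, not US ones."""
    address = record.get("address", "")
    if CANADIAN_POSTAL.search(address):
        return True
    # A 5-digit ZIP is a strong signal the scraper drifted to a US facility
    if re.search(r"\b\d{5}(-\d{4})?\b", address):
        return False
    return "Canada" in address or record.get("country") == "Canada"

records = [
    {"name": "Maple Care Home", "address": "123 Fort St, Victoria, BC V8W 9P1"},
    {"name": "Maple Care Home", "address": "456 Main St, Portland, OR 97201"},
]
verified = [r for r in records if looks_canadian(r)]
print(verified)  # keeps only the Canadian match
```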
Looks like you're still looking for help. We're currently outside of our standard office hours (9 AM to 9 PM EST, Monday through Friday), so it may take a little longer for us to respond. If you're dealing with a specific table, drop the URL below so we can help you quicker. Otherwise, someone from our team will be in touch soon!
Hey Edward, thanks for reaching out. The pagination challenge is real and comes up across many website scrapers. It may be worth testing a website crawler from Apify, such as the Website Content Crawler. The prompt can only handle so much, especially for tasks like clicking the next page, collecting all of that data, and clicking again. To gather the most accurate data with Claygent specifically, providing as detailed context and steps as possible about the task you're trying to perform will help. Additionally, in this case it seems like we know the location, or at least the country, of each healthcare facility. This location could be another data point you add to the prompt (i.e., "use this company name and its location to find the website") or however your prompt is set up. Speaking of this, do you mind sending the link (URL) to the table so we can take a closer look and find some workarounds? https://downloads.intercomcdn.com/i/o/w28k1kwz/1347419411/65baf32e1d2a789ffba395c6d35c/CleanShot+2025-01-22+at+_42I1FzxXHf%402x.png?expires=1738701000&signature=9ebd3df9c9d2a4c6dca8d5d0cb221e6c2c1abe6a130b7b5271cce78c3d339c48&req=dSMjEc1%2FlIVeWPMW1HO4zVws4pkwDVWogvYjA82bedBmYA%3D%3D%0A
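If you want to try the Website Content Crawler outside of Clay first, Apify's Python client can run it directly. A rough sketch, assuming a placeholder API token and input values (check the actor's input schema in the Apify console before relying on specific field names):

```python
# Sketch: running Apify's Website Content Crawler via the apify-client
# Python package. Token and input values are placeholders.
from apify_client import ApifyClient

client = ApifyClient("<YOUR_APIFY_TOKEN>")

run = client.actor("apify/website-content-crawler").call(
    run_input={
        "startUrls": [{"url": "https://connect.health.gov.bc.ca/assisted-living-residence"}],
        "maxCrawlPages": 50,  # assumption: cap the crawl while testing
    }
)

# The crawler follows pagination/links itself and writes one item per
# page into the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item.get("url"), (item.get("text") or "")[:80])
```

This sidesteps the "prompt has to click next page" problem entirely: the crawler handles page traversal, and Claygent (or a formula) can work over the collected dataset instead.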
Here's the link
Hey there Edward, jumping in for Owen here. To clarify, these are all in Canada, correct? As Owen mentioned, it would help to include the location of these facilities in the prompt. Since we know all of these are in Canada, what we can do is create a formula column with the following formula in your table. https://downloads.intercomcdn.com/i/o/w28k1kwz/1365484634/d8b0f2eff19090ddfaaa867afaca/image.png?expires=1738712700&signature=067bea0b74f09283b8f0fa3e8002e338c6b16507343ed304f2fac0b55a4a995e&req=dSMhE812mYdcXfMW1HO4zQ6COVFis6VIZdVG%2F9wPna4k68Mi%2F%2F%2BRc7TmPVnG%0AN1NR%0A This gives the AI a country location so it can focus in on just Canada. Afterwards, we can use another formula to combine the "Facility Address" column, the "Facility City" column, and our Country formula above into a single address column to include in the prompt. From there, we can also use the AI prompt helper to rewrite the prompt into an input that the AI integration can better read and execute. The AI prompt helper can be found by selecting the "Help Me" option in the bottom right of the prompt menu. https://downloads.intercomcdn.com/i/o/w28k1kwz/1365488580/529120dd9fd5d4b4bbbaa45e3ee1/image.png?expires=1738712700&signature=3cceb41b8506ba5d79620c71a174f079633fb5a08543bc8b2efbd6204bb0fa0d&req=dSMhE812lYRXWfMW1HO4zaKNUzNKeD1VAj%2F3v2QFPFd2GEy23FQTDcKemxVE%0AUstM%0A
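Clay's formula columns generate the expression for you from a plain-English description, so you won't write this by hand; but conceptually the combined-address column does something like the sketch below. The Python here is only an illustration of the logic, not Clay formula syntax, and the column names are the ones from this table:

```python
# Illustration only: the logic the combined-address formula performs.
# "Facility Address" / "Facility City" come from the table; "Country"
# is the constant formula column created above.
def full_address(row: dict) -> str:
    parts = [row.get("Facility Address"), row.get("Facility City"), row.get("Country")]
    # Skip blanks so a missing city doesn't leave a dangling comma
    return ", ".join(p for p in parts if p)

row = {
    "Facility Address": "301 Maple Street",
    "Facility City": "Victoria",
    "Country": "Canada",
}
print(full_address(row))  # "301 Maple Street, Victoria, Canada"
```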
Hi team, that is a very impactful refinement, thanks for the insight. I believe this will already help. One other thing I'm noticing is that my column count is getting pretty high and the database is getting messier and messier. Do you have any suggestions for cleaning and organizing the database?
Hey there Edward, a suggestion I would have to clean up your table is to see if there are any columns you can remove because they are not needed. If this is solely because you want the database to look cleaner, my other suggestion would be to use formulas to combine any inputs that can be combined, just like we did above with address and city, and then to hide the columns we combined so that they are not visible and cluttering the table.
Acknowledged, thanks for the suggestion. I believe it is disorganized because I have both company data and people data in one table. In addition, I was trying to extract more people data from the organizations. The original data was extracted from directories or Claygent. Is there any way to merge the two databases of people data?
Hey there Edward, we do have a method to merge two different tables. The following Loom shows how this can be done. https://www.loom.com/share/1ae0bfb385094e9d98a8e22f7ed99c40
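For context, the heart of any such merge is a keyed union with de-duplication. If you ever export both tables and combine the people rows yourself, a rough sketch looks like this (using email as the merge key and the field names below are assumptions about your export format, not Clay behavior):

```python
# Sketch: merging two exported people tables, de-duplicating on email.
def merge_people(table_a: list[dict], table_b: list[dict]) -> list[dict]:
    merged: dict[str, dict] = {}
    for row in table_a + table_b:
        key = (row.get("email") or "").strip().lower()
        if not key:
            continue  # rows without a usable key need manual review
        # Later rows fill in fields the earlier source left blank
        existing = merged.setdefault(key, {})
        for field, value in row.items():
            if value and not existing.get(field):
                existing[field] = value
    return list(merged.values())

a = [{"email": "jane@carehome.ca", "name": "Jane Doe"}]
b = [{"email": "Jane@carehome.ca", "name": "", "title": "Director"}]
print(merge_people(a, b))  # one merged record with both name and title
```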
Much appreciated, thanks for your help. I hope to become an intermediate user soon.
Thanks! We've reopened this thread. You can continue to add more detail directly in this thread.
