Hey everyone, I'm trying to extract data from a website using Claygent, but I'm struggling to find the right prompt. The websites are always built the same way, so theoretically all the information can be pulled from the same div class in the HTML. The target website looks like this. The expected output would be the website, email and phone number from the right-hand side (see screenshot). Can somebody help me with the right prompt? (I will be using 4o or 4o mini) 🙏
what's your current prompt?
Extract the following three pieces of information from the given webpage:
1. Website URL
2. Email Address
3. Telephone Number
Rules:
• Focus only on information near or within sections labeled with “Kontakt”, “Contact”, or near the company name/description.
• Ignore any values found in the footer, header, or social media links (e.g., Twitter, LinkedIn).
• Do not use any contact information from sections that include “Follow Us”, “Newsletter”, or generic terms like “Support”.
• Prefer values located within the main body of the page rather than sidebars or peripheral elements.
Return the output in this format:
• Website: [URL]
• Email: [Email Address]
• Telefon: [Text]
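Since the pages reportedly share one layout, the same three fields could also be pulled deterministically from the contact div instead of via a prompt. Below is a minimal sketch using only the Python standard library. The class name `exhibitor-contact`, the sample HTML, and the regular expressions are assumptions for illustration, not taken from the real page; inspect the actual markup and adjust them.

```python
import re
from html.parser import HTMLParser

CONTACT_CLASS = "exhibitor-contact"  # hypothetical class name; check the real page


class ContactBlockParser(HTMLParser):
    """Collects text and link targets found inside divs with the target class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0  # > 0 while we are inside the target div
        self.text = []
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth:
            if tag == "div":
                self.depth += 1  # track nested divs so we know when the block ends
            if tag == "a" and attrs.get("href"):
                self.hrefs.append(attrs["href"])
        elif tag == "div" and self.css_class in (attrs.get("class") or "").split():
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == "div":
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.text.append(data)


def extract_contact(html, css_class=CONTACT_CLASS):
    parser = ContactBlockParser(css_class)
    parser.feed(html)
    text = " ".join(parser.text)
    # Simple illustrative patterns; real-world emails and phone numbers vary more.
    email = re.search(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", text)
    phone = re.search(r"\+?\d[\d\s()/-]{6,}\d", text)
    website = next((h for h in parser.hrefs if h.startswith("http")), None)
    return {
        "Website": website,
        "Email": email.group(0) if email else None,
        "Telefon": phone.group(0) if phone else None,
    }


# Made-up sample page: the footer values should be ignored.
SAMPLE = """
<div class="footer"><a href="https://twitter.com/x">Twitter</a> footer@example.com</div>
<div class="exhibitor-contact">
  <h3>Kontakt</h3>
  <div><a href="https://www.example.com">www.example.com</a></div>
  <div><a href="mailto:info@example.com">info@example.com</a></div>
  <div>Tel: +49 30 1234567</div>
</div>
"""

print(extract_contact(SAMPLE))
```

Restricting the search to one div is what keeps footer and social media values out, which is the same goal the prompt rules above try to achieve in words.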
Actually, there might be a different issue: the content on the target URL sits behind an age check, so Claygent might not be able to access the data?
What's the current output with this?
Yes, if it has an age check it will not be able to proceed. The same happens with websites requiring logins.
it does work for some rows tho - super weird
is there a way to have Clay click the ‘over 18’ CTA?
Maybe the website doesn't show the age check prompt for the rows that got filled
So it could access the info
Try the same prompt in other websites without an age check
If it works, you know it's that. If it doesn't, you gotta refine your prompt
it's always the same website, it's the exhibitor detail page from a conference site. so it will always be the same URL except for the exhibitor ID
Got you. Try on a different website without an age check prompt at all.
If Claygent doesn't give you a specific error, it might be because the prompt ran successfully that time
But it usually either returns something, or gives out an error
yeah my prompts usually work on different websites
is there any way to bypass it for this table?
Does the website URL change after the age prompt? If so, you can use a quick formula to build the post-age-prompt URL and start scraping from there
unfortunately not, they use cookies for that
can I train Claygent to accept the popup and get the cookie?
Not that I'm aware of. Claygent simply scrapes what it can see on the website, with the added benefit of using AI so it can run logic
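If the age check really is cookie-based, one option outside Claygent is to fetch the page yourself and send that cookie with the request. The sketch below is an assumption-heavy illustration: the URL pattern, the cookie name and value (`age_verified=1`), and the helper names are all hypothetical. The real cookie would have to be copied from the browser's developer tools after clicking through the age prompt, and this only works if the site accepts that cookie on a fresh request.

```python
import urllib.request

# Hypothetical URL pattern: same URL for every exhibitor except the ID.
BASE_URL = "https://example.com/exhibitors/{exhibitor_id}"
# Hypothetical cookie; replace with the one your browser stores after the age check.
AGE_COOKIE = "age_verified=1"


def build_request(exhibitor_id):
    """Build a GET request that carries the age-check cookie."""
    url = BASE_URL.format(exhibitor_id=exhibitor_id)
    request = urllib.request.Request(url)
    request.add_header("Cookie", AGE_COOKIE)
    request.add_header("User-Agent", "Mozilla/5.0")
    return request


def fetch_page(exhibitor_id, timeout=10):
    """Download the exhibitor page and return its HTML as text."""
    with urllib.request.urlopen(build_request(exhibitor_id), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

The returned HTML could then be parsed separately or passed to a scraping enrichment that accepts raw content, if one is available in your setup.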
Hi Julian, thanks for reaching out! Appreciate you Oriol for jumping in here and sharing this great Claygent prompt! How is that working for you now? Another solution you can try here is also using the "Run Zenrows Scrape" enrichment with the auto-parse setting enabled. This will automatically parse the website scrape for website urls, emails, and telephone numbers. Feel free to give this a try for the rows that don't work with Claygent and let me know if this resolves your issues!
Zenrows is a good option, but is there a way to scrape only certain types of links on a page (ones that match a certain URL pattern)?
Hi Alberto! Yes, Zenrows does provide a way to scrape specific types of links on a page, especially when used together with tools like BeautifulSoup. You can find more detailed information on how to do this in the Zenrows documentation. Let me know if you need further clarification or help! 😊
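As a rough illustration of filtering links by URL pattern once you have the page HTML, here is a minimal sketch. It uses Python's standard-library `html.parser` as a stand-in for BeautifulSoup so it runs without extra installs. The function name `links_matching`, the sample HTML, and the `/exhibitors/<id>` pattern are assumptions for the example, not part of Zenrows.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in the document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def links_matching(html, pattern, base_url=""):
    """Return absolute link URLs whose text matches the regex `pattern`."""
    collector = LinkCollector()
    collector.feed(html)
    regex = re.compile(pattern)
    # Resolve relative links against the page URL before matching.
    absolute = (urljoin(base_url, href) for href in collector.links)
    return [url for url in absolute if regex.search(url)]


# Made-up sample page with one relative match, one non-match, one absolute match.
html = (
    '<a href="/exhibitors/42">A</a>'
    '<a href="/about">B</a>'
    '<a href="https://example.com/exhibitors/7?x=1">C</a>'
)
print(links_matching(html, r"/exhibitors/\d+", "https://example.com"))
```

With BeautifulSoup installed, the collector class could be replaced by `soup.find_all("a", href=True)`; the regex filter would stay the same.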
We haven't heard back from you in a bit, so we're going to go ahead and close things out here - feel free to let us know if you still need something!