
⚡ Scraping Websites just got better ⚡

Happy Friday, everyone! I just shipped a change to Clay's Scrape Website integration that makes scraping more reliable and enables some new features. We've added:

  • The ability to delay results (for websites that take a long time to load)

  • The ability to add custom regex to find things we might not track natively (like Wikipedia URLs)

  • The ability to automatically retry when a scrape fails, so you get the results you need
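For the custom regex option, here is a minimal Python sketch of a pattern that could pick Wikipedia article URLs out of scraped text. This post doesn't say which regex flavor Clay's field accepts, so the pattern is an assumption written in common Python/PCRE-style syntax. The helper `find_wikipedia_urls` is hypothetical and only illustrates the matching.

```python
import re

# Hypothetical pattern: article links on wikipedia.org or any of its
# subdomains (en., de., en.m., ...), e.g. https://en.wikipedia.org/wiki/Web_scraping.
# The path stops at whitespace, quotes, or angle brackets, so URLs inside
# href="..." attributes are cut off cleanly. Trailing punctuation such as a
# sentence-ending period would still be captured.
WIKIPEDIA_URL = re.compile(
    r"https?://(?:[a-z0-9-]+\.)*wikipedia\.org/wiki/[^\s\"'<>]+",
    re.IGNORECASE,
)


def find_wikipedia_urls(page_text: str) -> list[str]:
    """Return unique Wikipedia article URLs in the order they first appear."""
    seen: dict[str, None] = {}  # dicts preserve insertion order
    for match in WIKIPEDIA_URL.finditer(page_text):
        seen.setdefault(match.group(0), None)
    return list(seen)


sample = (
    'See <a href="https://en.wikipedia.org/wiki/Web_scraping">this</a> and '
    "https://de.wikipedia.org/wiki/Screen_Scraping for more."
)
print(find_wikipedia_urls(sample))
# → ['https://en.wikipedia.org/wiki/Web_scraping', 'https://de.wikipedia.org/wiki/Screen_Scraping']
```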

We've also fixed a few issues: "www." being removed from URLs when scraping, and JavaScript not rendering properly when extracting the body text. Please let me know if you run into any issues with our Scrape Website action. I'm also excited for you all to see the improvements we have in the pipeline! This is only the beginning 😉.

  • Ajay G.

    Hey, what's the difference between the Scrape Website action and Claygent's scrape?

  • Akshat K.

    Thanks! I also wish we could scrape the top 5 pages of a website all at once, rather than just one page.

  • Eric E.

    Ajay G.: the difference is that Scrape Website gets you the body text and specific pieces of information from a website, and you decide what to do with it from there. Claygent then feeds those scraped data points to AI to answer questions automatically.

    Akshat K.: how would you define the top 5 pages of a website? Is this more for scraping lists, or would you just want to look for common pages like about-us, contact-us, careers, etc.?

  • Ajay G.

    Eric E., I thought Claygent used GPT-4's native website scraping? Or does it use Clay's scrape plus GPT?

  • Akshat K.

    Eric E., for common pages like About Us, blogs, case studies, testimonials, etc.
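One possible reading of "top 5 pages", following Akshat's answer, is: take the links found on a homepage and keep the same-site ones whose paths mention common sections. The sketch below is hypothetical and not a Clay feature; the function `pick_top_pages` and its keyword list (drawn from the pages named in this thread) are assumptions. Each selected URL would then be scraped on its own.

```python
from urllib.parse import urldefrag, urljoin, urlparse

# Hypothetical priority list, based on the pages mentioned in this thread.
# Substring matches, so "case-stud" covers "case-study" and "case-studies".
PRIORITY_KEYWORDS = ["about", "contact", "careers", "blog", "case-stud", "testimonial"]


def pick_top_pages(base_url: str, hrefs: list[str], limit: int = 5) -> list[str]:
    """Return up to `limit` same-site URLs whose path contains a priority keyword,
    ordered by keyword priority and then by where the link appeared on the page."""
    base_host = urlparse(base_url).netloc
    candidates: list[tuple[int, int, str]] = []
    seen: set[str] = set()
    for position, href in enumerate(hrefs):
        # Resolve relative links and drop #fragments so duplicates collapse.
        url, _fragment = urldefrag(urljoin(base_url, href))
        parsed = urlparse(url)
        if parsed.netloc != base_host or url in seen:
            continue  # skip external links and pages already picked
        path = parsed.path.lower()
        for rank, keyword in enumerate(PRIORITY_KEYWORDS):
            if keyword in path:
                candidates.append((rank, position, url))
                seen.add(url)
                break
    candidates.sort()
    return [url for _rank, _position, url in candidates[:limit]]


homepage_links = [
    "/", "/pricing", "/about-us", "https://twitter.com/acme", "/blog/post-1",
    "/contact", "/careers", "/about-us#team", "/case-studies/acme", "/testimonials",
]
print(pick_top_pages("https://www.example.com/", homepage_links))
```

Ranking by keyword first means an About page linked in the footer still beats a blog post linked near the top; comparing hosts exactly also means `example.com` and `www.example.com` are treated as different sites, which a real version would probably normalize.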

  • Nicholas R.

    Solid update. For me, I'd love to be able to scrape articles in a structured format: the title, the meta description, and then each header with the text in that header's section. Would that be possible with regex or another feature in Clay?
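Outside of Clay, the structure Nicholas describes can be sketched with Python's standard-library `html.parser`, which copes with nested tags more reliably than a single regex. This is a minimal sketch under stated assumptions, not how Clay's scraper works: the class `ArticleParser` is hypothetical, headings are assumed to be `h1` to `h6`, a section is the text between one heading and the next, and any text before the first heading is ignored.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
SKIP_TAGS = {"script", "style"}  # their contents are not article text


class ArticleParser(HTMLParser):
    """Collects <title>, the meta description, and each heading with the text after it."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.meta_description = ""
        self.sections: list[dict[str, str]] = []
        self._in_title = False
        self._in_heading = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr_map = dict(attrs)  # attribute values may be None
            if (attr_map.get("name") or "").lower() == "description":
                self.meta_description = (attr_map.get("content") or "").strip()
        elif tag in HEADING_TAGS:
            self._in_heading = True
            self.sections.append({"level": tag, "heading": "", "text": ""})

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False
        elif tag in HEADING_TAGS:
            self._in_heading = False

    def handle_data(self, data):
        if self._skip_depth:
            return
        text = " ".join(data.split())  # collapse whitespace (approximate spacing)
        if not text:
            return
        if self._in_title:
            self.title = (self.title + " " + text).strip()
        elif self._in_heading:
            self.sections[-1]["heading"] = (self.sections[-1]["heading"] + " " + text).strip()
        elif self.sections:  # text before the first heading is dropped
            self.sections[-1]["text"] = (self.sections[-1]["text"] + " " + text).strip()


html_doc = """<html><head><title>My Post</title>
<meta name="description" content="A short summary."></head>
<body><h1>Intro</h1><p>First paragraph.</p>
<h2>Details</h2><p>More <b>text</b> here.</p><script>var x = 1;</script></body></html>"""

parser = ArticleParser()
parser.feed(html_doc)
parser.close()
print(parser.title, "|", parser.meta_description)
for section in parser.sections:
    print(section["level"], section["heading"], "->", section["text"])
```

Because text is joined with single spaces, punctuation that directly follows an inline tag (for example `<b>text</b>,`) gains an extra space; that's acceptable for a sketch but worth refining for production use.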