Hi, I'm looking for an easy scraper. With Instant Data Scraper I can scrape any site, but the one feature it lacks is scraping websites that have no "next page" button and instead paginate through a parameter in the URL, e.g. https://contactout.com/dashboard/search?keyword=AI&location=New%20York%2C%20New%20Yo[…]page=2&title=Software%20engineer, where I could move to the next page with page=3 and so on. Any tips on tools that support this and have a similarly simple setup to Instant Data Scraper (i.e. without selecting every column one by one)?
Thanks for reaching out! Just to confirm, you're looking to scrape paginated data, correct? For this type of pagination-based scraping, you'll actually want to use a crawler rather than a standard scraper. I'd recommend checking out Apify's Website Content Crawler, which can handle this kind of URL-based pagination and lets you specify how many pages to crawl. One important heads-up, though: while you can use it as an enrichment in Clay tables, you might hit the cell size limit (around 8KB, or roughly 8,000 characters). For larger datasets, you'd probably want to run the crawler directly in Apify and send the data to a webhook instead.
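To make the URL-parameter pagination concrete: a crawler essentially visits the same URL repeatedly with the page number incremented. The minimal Python sketch below only generates those URLs; it does not fetch or parse anything. It assumes the page number lives in a query parameter called `page`, and the example URL is a hypothetical placeholder (the real one in the question is truncated). `paginated_urls` is an illustrative helper, not part of Apify or Clay.

```python
from urllib.parse import parse_qsl, quote, urlencode, urlsplit, urlunsplit


def paginated_urls(base_url: str, first_page: int, last_page: int, param: str = "page") -> list[str]:
    """Return one URL per page, setting `param` to each page number in turn.

    Other query parameters keep their original order; the page parameter
    is moved to the end of the query string.
    """
    parts = urlsplit(base_url)
    # Drop any existing page value but keep every other parameter.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != param]
    urls = []
    for page in range(first_page, last_page + 1):
        # quote_via=quote encodes spaces as %20 (like the original URL) instead of "+".
        new_query = urlencode(query + [(param, str(page))], quote_via=quote)
        urls.append(urlunsplit(parts._replace(query=new_query)))
    return urls


if __name__ == "__main__":
    # Hypothetical search URL; substitute your own.
    base = "https://example.com/search?keyword=AI&location=New%20York&page=1"
    for url in paginated_urls(base, 1, 3):
        print(url)
```

A crawler configured with a page limit is doing roughly this loop for you, plus fetching and extracting each page.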
Hey there - just wanted to check in here to see if you needed anything else! Feel free to reply back here if you do.
We haven't heard back from you in a bit, so we're going to go ahead and close things out here - feel free to let us know if you still need something!
Thanks a lot, Bo! Any tips on how to deal with data that exceeds the cell size limit in general?
Hey! Yes, absolutely. Here's how:
- Target only the fields you need
- Break large requests into smaller chunks (or separate enrichments)
- Set field-specific filters
Basically, the only way is to ask for less data. Depending on what you'd like to extract, if you run into this a lot, it's better to work around it by being more selective.
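The first two tips can be sketched in Python, assuming you have the records in hand before they reach a cell. This is a hedged illustration: it uses the length of the compact JSON string as a rough stand-in for cell size (Clay's exact measurement is an assumption here), and the profile fields are hypothetical. `select_fields` keeps only the keys you need; `chunk_under_limit` greedily groups records so each group stays under about 8,000 characters.

```python
import json

CELL_LIMIT = 8000  # approximate cell limit mentioned earlier (~8KB / ~8,000 characters)


def select_fields(record: dict, fields: list[str]) -> dict:
    """Keep only the top-level keys we actually need."""
    return {k: record[k] for k in fields if k in record}


def serialized_size(value) -> int:
    """Length of the compact JSON string, used as a rough proxy for cell size."""
    return len(json.dumps(value, separators=(",", ":"), ensure_ascii=False))


def chunk_under_limit(items: list, limit: int = CELL_LIMIT) -> list[list]:
    """Greedily group items so each group's JSON stays within `limit` characters.

    Raises ValueError if a single item is too large on its own.
    """
    chunks, current = [], []
    for item in items:
        if serialized_size([item]) > limit:
            raise ValueError("single item exceeds the limit; select fewer fields")
        # Start a new chunk if adding this item would push the current one over.
        if current and serialized_size(current + [item]) > limit:
            chunks.append(current)
            current = []
        current.append(item)
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    # Hypothetical profile records; field names are illustrative only.
    profiles = [{"name": f"Person {i}", "headline": "Engineer", "about": "x" * 3000} for i in range(5)]
    slim = [select_fields(p, ["name", "headline"]) for p in profiles]
    print(serialized_size(profiles), serialized_size(slim))
    print([len(c) for c in chunk_under_limit(profiles)])
```

Field selection is usually the bigger win: dropping one long free-text field can shrink a record far more than chunking ever will.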
Okay, makes sense, but then we would need to do that outside Clay, right? The LinkedIn enrichment API we use returns JSON responses longer than 8KB, and we can't process or shorten them in Clay. Any best practices for where we should call the API (other than locally), process the response, and then send it to Clay?
Hey there, jumping in for Bo here. To clarify, Bo is saying that when you run this call inside Clay, it's best to limit the amount of data you request as much as you can to avoid hitting the cell size limit. For now, this is the method we recommend while our team looks into other ways of addressing this issue.
Okay, thanks. The problem is that if the API returns more data than that, I can't process it in Clay to get it under 8KB. I'll look into other tools as well.
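For the workflow asked about here (call the API outside Clay, trim the response, then send it on), a minimal Python sketch of the "trim and send" step might look like this. It assumes the destination exposes a webhook URL that accepts JSON POST requests; the URL is a hypothetical placeholder, and `build_webhook_request` and `send` are illustrative helpers, not a Clay or Apify API. The size check is in bytes against the ~8KB figure from this thread.

```python
import json
import urllib.request

CELL_LIMIT = 8000  # ~8KB cell limit mentioned earlier in this thread


def build_webhook_request(webhook_url: str, payload: dict, keep: list[str], limit: int = CELL_LIMIT) -> urllib.request.Request:
    """Trim `payload` to the `keep` keys and wrap it in a JSON POST request."""
    trimmed = {k: payload[k] for k in keep if k in payload}
    body = json.dumps(trimmed, separators=(",", ":"), ensure_ascii=False).encode("utf-8")
    if len(body) > limit:
        raise ValueError(f"trimmed payload is still {len(body)} bytes; keep fewer fields")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status
```

Usage, with a hypothetical webhook URL: build the request with `build_webhook_request("https://example.com/hypothetical-webhook", api_response, keep=["headline"])` and pass it to `send`. Where this runs (a scheduled job, a serverless function, or an automation tool) is up to you; the trimming logic stays the same.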
Hey! I'm back! :) For LinkedIn enrichment data over 8KB, you'll need to use field path filtering to specify exactly which data you want, similar to JSON dot notation. You can find this setting at the bottom of the HTTP API enrichment. For example: experience.company_name, experience.title, education.school_name. This way, you only pull the specific fields you need and keep responses under the size limit. What were you trying to get back from the enrichment? [screenshot attached]
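To show what dot-notation field paths do to a nested response, here is a minimal Python sketch. It mimics the idea only and is not Clay's implementation: in particular, the choice to apply the rest of the path to every element when a step lands on a list (so `experience.company_name` returns one value per job) is an assumption about how such paths are commonly resolved. The profile data is hypothetical.

```python
def extract_path(data, path: str):
    """Resolve a dot path such as "experience.company_name" against nested JSON.

    When a step lands on a list, the rest of the path is applied to every
    element and the results are returned as a list. Missing keys give None.
    """
    current = data
    keys = path.split(".")
    for i, key in enumerate(keys):
        if isinstance(current, list):
            rest = ".".join(keys[i:])
            return [extract_path(item, rest) for item in current]
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def filter_paths(data, paths: list[str]) -> dict:
    """Build a small dict containing only the requested paths."""
    return {p: extract_path(data, p) for p in paths}


if __name__ == "__main__":
    # Hypothetical LinkedIn-style profile.
    profile = {
        "headline": "Software Engineer",
        "about": "A long free-text section that would eat into the cell limit...",
        "experience": [
            {"company_name": "Acme", "title": "SWE"},
            {"company_name": "Globex", "title": "Intern"},
        ],
        "education": [{"school_name": "NYU"}],
    }
    print(filter_paths(profile, ["experience.company_name", "experience.title", "education.school_name"]))
```

The output keeps three short lists and drops everything else, which is why filtering by path keeps responses small.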
Thanks, Bo 🙂 I'm not sure I understand correctly. With most of the affordable LinkedIn APIs I can use, the output seems too long to fit into one field (unless it's imported as a JSON object). I tried the stringify function to put it into one column, {{LinkedIn API Stringify (2)}}. It contains the field "headline". Could I also use a formula like {{LinkedIn API Stringify}}?.headline? Or do I need to restrict the data already when making the API call, and if so, how could I extract nested items from the JSON?
Hey! I'd need to see your actual table setup to help properly (please send the table URL), but here's the general idea for accessing API data:
- For a top-level field, direct access is just headline
- For nested data, use the data.headline format
- You shouldn't need to use {{LinkedIn API Stringify}}
Could you share:
1. Your table URL
2. An example of a single row that works (if you have one)
This will help me give you specific guidance! :)
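The distinction above between a top-level field (`headline`) and a nested one (`data.headline`), and why a stringified column is awkward, can be illustrated in plain Python. This is a sketch under stated assumptions, not a Clay formula: `read_field` is a hypothetical helper, the JSON shapes are invented, and it assumes a stringified cell holds valid JSON text. The general point is that a stringified value is just text, so it has to be parsed back into an object before a field like `headline` can be read.

```python
import json


def read_field(cell_value, path: str):
    """Read a dot path from a value that may be a JSON object or a JSON string.

    Returns None if any step along the path is missing.
    """
    # A stringified cell is plain text: parse it back into an object first.
    data = json.loads(cell_value) if isinstance(cell_value, str) else cell_value
    for key in path.split("."):
        if not isinstance(data, dict):
            return None
        data = data.get(key)
    return data


if __name__ == "__main__":
    top_level = {"headline": "Software Engineer"}
    nested_as_text = json.dumps({"data": {"headline": "Software Engineer"}})
    print(read_field(top_level, "headline"))            # top-level access
    print(read_field(nested_as_text, "data.headline"))  # nested access after parsing
    print(read_field(nested_as_text, "headline"))       # wrong level
```

Reading `headline` when the value actually sits under `data` yields nothing, which is a common reason a field path appears to return empty.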
Hey there - just wanted to check in here to see if you needed anything else! Feel free to reply back here if you do.
We haven't heard back from you in a bit, so we're going to go ahead and close things out here - feel free to let us know if you still need something!