Hi all, I was wondering if someone could help me out. I’m currently putting together a target list where I’d like to include the number of products on each website. Right now, I’m trying this with a Claygent website scrape using the Argon model, but I have some doubts about the reliability of the results. Does anyone know of any other or better ways to approach this?
What type of websites/businesses are you targeting?
We’re looking to target retailers and e-commerce businesses within the Benelux. We’re currently trying to assess the TAM for one of our clients, so the benchmark for the number of products still needs to be defined. That said, we can only set this benchmark if we’re confident that we can accurately determine the number of products on each website. The benchmark will probably be on the higher end in terms of product count, since that makes more sense ROI-wise.
That’s tough. I can’t think of a way to get that information without guessing (an educated guess, but still maybe not 100% accuracy). Claygent would be fine at grabbing it, but most e-com sites have several pages, categories, etc. You can ask Claygent to review how many categories they have, how many pages per category, and then how many products on each page, then run a formula to do the math (see the sketch below). You’ll probably want to split this into two agents: one that grabs all the info, and another that makes sense of it. I believe it can be done, but it’ll take some trial and error on the Claygent prompting. You might also want to find someone who’s excellent at scraping and see if they can figure this out.
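To make the formula step concrete, here’s a minimal sketch of the math, assuming the first agent returns counts in roughly this shape (field names are hypothetical, and the per-category averages will be rough):

```python
# Hypothetical shape of the first agent's output: raw counts scraped per site.
# The second agent (or a plain Clay formula column) just does the arithmetic.
scrape_result = {
    "categories": 12,             # from "how many categories does the site have?"
    "avg_pages_per_category": 4,  # from "how many pages per category?"
    "products_per_page": 24,      # from "how many products on each page?"
}

estimated_products = (
    scrape_result["categories"]
    * scrape_result["avg_pages_per_category"]
    * scrape_result["products_per_page"]
)
print(estimated_products)  # 1152, a rough estimate rather than an exact count
```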
Alright, thank you! I’ll give this some trial and error with the existing customer base to see if the numbers match up.
Website scrapes with Claygent can vary in accuracy. A better approach is to use site-specific APIs or structured data sources if available; otherwise, manually verify a sample of results to confirm the model’s consistency.
Good question! If you’re trying to count the number of products on each site, Claygent with Argon is one valid route, but you’re right that it’s not 100% reliable, especially for sites with dynamic content or e-commerce platforms like Shopify/WooCommerce where products load via JavaScript. A few better approaches you can try:
- BuiltWith / StoreLeads / SimilarTech: these can detect the e-commerce platform and sometimes return product counts.
- Google Shopping / sitemap method: use Claygent to look for /sitemap_products.xml or /collections and count the entries; much more accurate for Shopify sites (rough sketch after this list).
- Apify e-commerce scrapers: they have actors for Shopify, WooCommerce, and generic product scrapers you can trigger via API inside Clay (example call further down).
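For the sitemap route, you don’t strictly need Claygent to do the counting. Shopify stores expose a sitemap index at /sitemap.xml that links to sitemap_products_N.xml files, and counting the url entries in those gets you close to the product count. A rough Python sketch (the domain is a placeholder, and some stores pad the product sitemap with the storefront root, so treat the result as approximate):

```python
import requests
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def count_products(domain: str) -> int:
    """Count product URLs listed in a Shopify-style product sitemap."""
    # The sitemap index at /sitemap.xml links to one or more
    # sitemap_products_N.xml files on Shopify stores.
    index = ET.fromstring(
        requests.get(f"https://{domain}/sitemap.xml", timeout=15).content
    )
    product_maps = [
        loc.text
        for loc in index.findall(".//sm:loc", NS)
        if "sitemap_products" in (loc.text or "")
    ]
    total = 0
    for url in product_maps:
        tree = ET.fromstring(requests.get(url, timeout=15).content)
        total += len(tree.findall(".//sm:url", NS))
    return total

print(count_products("example-store.com"))  # placeholder domain
```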
If you want to stay fully native, you can run Claygent with a strict output format (a JSON schema with a “product_count” field) and ask it to “count listed products visible on the page, excluding navigation, blog, or category items.” That improves consistency (example schema below).
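One possible shape for that strict output schema, expressed here as a Python dict (the field names are illustrative, not a Claygent requirement):

```python
# Illustrative strict-output schema for the Claygent column.
PRODUCT_COUNT_SCHEMA = {
    "type": "object",
    "properties": {
        "product_count": {
            "type": "integer",
            "description": (
                "Distinct products visible on the page, excluding "
                "navigation, blog, and category links"
            ),
        },
        "count_basis": {
            "type": "string",
            "description": "Where the count came from, e.g. 'collection grid'",
        },
    },
    "required": ["product_count"],
}
```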
So, reliability depends on your target stack: if it’s mostly Shopify or WooCommerce, the sitemap or Apify route will be most accurate.
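And if you go the Apify route, triggering an actor looks roughly like this with Apify’s official Python client (the actor ID below is a placeholder; substitute a real Shopify/product scraper actor from the Apify store, and note that the run_input shape varies per actor):

```python
from apify_client import ApifyClient  # pip install apify-client

client = ApifyClient("<YOUR_APIFY_TOKEN>")

# Placeholder actor ID; swap in an actual product-scraper actor.
run = client.actor("someuser/shopify-product-scraper").call(
    run_input={"startUrls": [{"url": "https://example-store.com"}]}
)

# Each dataset item is typically one scraped product, so the item
# count approximates the product count.
items = list(client.dataset(run["defaultDatasetId"]).iterate_items())
print(f"~{len(items)} products")
```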
