Strategies to Scrape Google Search Results from Deeper Pages | Clay

Hey, I want to scrape Google results based on search queries. The issue is that every search starts from the first page, but I need to start much further back (e.g. on page 5) so I don't always get the same results. Is there any way to solve this, e.g. with a different approach?

Mustafa A.
It seems to work via Claygent by visiting the search query URL. But I keep getting very long queue times (about 1 minute of waiting, after which the run errors out). What could fix that? 😞
Bo (.
Hey M, thanks for reaching out! 😊 It sounds like you're trying to scrape Google results but every run starts from the first page. Instead of using our built-in search to create a table, I would create a new page, paste in your current search, and then:
1. Adjust the search parameters: modify the query URL to begin at a later page. Google's start parameter is a zero-based result offset, so at 10 results per page, page 5 is &start=40. You can do this in Claygent by customizing the search query URL, so you're not fetching the same results repeatedly.
2. Address the long queue times: the speed issue might be related to the model you're using in Claygent. Models differ in processing speed, and some are slower depending on the complexity of the task. Try switching models or simplifying the run or prompt.
Feel free to send us the table URL as well so we can see what's wrong. Let me know how it goes! 😊
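For readers who want to generate the paged URL programmatically, here is a minimal sketch of the offset arithmetic. It assumes Google's default of 10 results per page, so that start = (page - 1) * 10; the function name google_search_url and the RESULTS_PER_PAGE constant are made up for illustration and are not part of Clay or Claygent.

```python
from urllib.parse import urlencode

RESULTS_PER_PAGE = 10  # assumption: Google's default page size


def google_search_url(query: str, page: int, results_per_page: int = RESULTS_PER_PAGE) -> str:
    """Build a search URL that starts at the given 1-based results page."""
    if page < 1:
        raise ValueError("page must be >= 1")
    # start is a zero-based result offset: page 1 -> 0, page 5 -> 40
    start = (page - 1) * results_per_page
    return "https://www.google.com/search?" + urlencode({"q": query, "start": start})


# Example: the query used later in this thread, starting at page 5
print(google_search_url("gesundheit site:digistore24.com", page=5))
```

The resulting URL can then be pasted into the Claygent prompt or stored in a table column, one URL per page you want to cover.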
Mustafa A.
Hey Bo, thank you for your answer 🙂 Scraping a given link via Claygent works okay, at least with the Claygent Neon model. But I notice that the results differ (sometimes quite a lot) from what a manual search returns. Maybe I need to add certain parameters, like searching from Berlin, Germany? Overall, the idea is to automate the process of scraping data via search queries rather than searching for the information manually. The flow goes from this table, where we scrape the Google results and send them on, to this table, where we read out the data and cross-check against our SF to see whether the lead already exists.
Mustafa A.
Maybe I'm overcomplicating things, but overall it seems to work quite well (even so, it would be great to know how many results we can scrape, or to have some kind of cutoff point for when the results get too far off).
LuisArturo
Hey there Mustafa, thanks for reaching out. One thing you can try in this scenario is to prompt Claygent to return "x number of results from the page". That way Claygent returns a specific number of results to your table. You can test a handful of rows at a time to make sure the results you are bringing in are still on topic for what you are looking for, and once you find a good balance between number of results and quality of results, you can run it for all rows in the table. That said, letting Claygent know how many results it should scrape, or setting a cutoff on the number of results to find, without having to include it in the prompt does sound very useful. I'll bring this up with the team to see if something like this could be included.
Mustafa A.
Hey Luis, thank you for the tip about specifying the number of results. But what can I do about getting different results when scraping via Claygent versus searching manually? 😞
Daniela D.
Hey Mustafa! Thanks for the reply. Can you share an example row in your table that returned different results? One tip for getting more accurate results is to check the model's reasoning (see screenshot) for the results returned and update the prompt instructions based on that. If you can share some example rows, we're happy to take a further look. https://downloads.intercomcdn.com/i/o/1187381129/2a6c89f017bc01464b1f79d2/CleanShot+2024-09-20+at+16_55_30%402x.png?expires=1726850700&signature=ea47e5c3ddcaca8d6fcda3728d57c6bff6bad6670526c51bcf7f8234f23942d9&req=dSEvEcp2nIBdUPMW1HO4zV%2FcPAtQAefuVXRYLABEfQok8UAWEe7yJqp9wqKR%0A0fBk%0A
Mustafa A.
Hey Daniela, what I mean is, for example, row #2 in the table: I have the results from the URL in Clay (screenshot 1) and the ones from scraping manually (going from top to bottom, results #21 to #30). I used the same link https://www.google.com/search?q=gesundheit+site:digistore24.com&uule=w+CAIQICIGQmVybGlu&start=20 for both. Probably the results differ not just because of the region I force Claygent to search from, but also more generally because of the cookies in my browser etc.?
Bo (.
Hey Mustafa, yes, you're correct. The results can vary due to a couple of factors, even though you're using the same link:
1. Region settings: Claygent and your browser might have different location preferences, even if you're forcing a region in the search.
2. Cookies and personalization: your browser may display different results based on your browsing history, cookies, and other personalized data, which Claygent won't take into account.
Let me know if you need more clarification! 😊
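To check what a search URL like the one shared above actually requests, it can help to take it apart. The sketch below reads the query, the result range implied by start (assuming 10 results per page), and the place name carried in uule. Note that the uule layout is not officially documented by Google; the layout assumed here (a fixed "w+CAIQICI" prefix, one length character, then the base64-encoded place name) is the commonly described one, and describe_search_url is a hypothetical helper, not a Clay feature.

```python
import base64
from urllib.parse import parse_qsl, urlsplit

URL = ("https://www.google.com/search?q=gesundheit+site:digistore24.com"
       "&uule=w+CAIQICIGQmVybGlu&start=20")

# Assumed uule prefix; the "+" in the raw query string decodes to a space.
UULE_PREFIX = "w CAIQICI"


def describe_search_url(url: str, results_per_page: int = 10) -> dict:
    """Summarize the query, result range, and location hint of a search URL."""
    params = dict(parse_qsl(urlsplit(url).query))
    start = int(params.get("start", 0))
    info = {
        "query": params.get("q"),
        "first_result": start + 1,
        "last_result": start + results_per_page,
        "location": None,
    }
    uule = params.get("uule", "")
    if uule.startswith(UULE_PREFIX):
        # Skip the single length character, then base64-decode the place name.
        encoded = uule[len(UULE_PREFIX) + 1:]
        encoded += "=" * (-len(encoded) % 4)  # restore any stripped padding
        info["location"] = base64.urlsafe_b64decode(encoded).decode("utf-8")
    return info


print(describe_search_url(URL))
```

For the link in this thread, the sketch reports results 21 to 30 and a location hint of Berlin, which matches what Mustafa describes. As noted above, the location is only a hint: cookies and personalization in a browser can still change what a manual search shows.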
Mustafa A.
got it! 👍 🙂