Hello Team, I am working on a web scraper tool in Clay and trying to pull the company names and websites from the given portfolio link, but it is not working properly.
I am using the prompt below: Act as an expert web data extractor focused on retrieving company information. Access the URL provided in the Company Portfolio link and extract all company names and their corresponding websites. Ensure to navigate through additional pages if the portfolio spans multiple pages. Compile the findings into a structured JSON format with fields: "company_name" and "website". If specific data retrieval fails, record "Company or website not found" for those entries.
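For reference, the output shape the prompt above asks for can be sketched in Python. This is only an illustration of the JSON structure and the fallback rule described in the prompt, not something Clay runs; the helper name `normalize_entries` and the sample company are hypothetical.

```python
import json

# Fallback text taken from the prompt above.
FALLBACK = "Company or website not found"


def normalize_entries(raw_json: str) -> list[dict]:
    """Parse a JSON array of entries and fill missing fields with the fallback text."""
    entries = json.loads(raw_json)
    normalized = []
    for entry in entries:
        # Treat absent, null, or blank values as missing.
        name = (entry.get("company_name") or "").strip()
        site = (entry.get("website") or "").strip()
        normalized.append({
            "company_name": name or FALLBACK,
            "website": site or FALLBACK,
        })
    return normalized


if __name__ == "__main__":
    # Hypothetical sample: the second entry has no website.
    sample = '[{"company_name": "Acme", "website": "https://acme.example"}, {"company_name": "NoSite"}]'
    print(json.dumps(normalize_entries(sample), indent=2))
```

Checking the row count of the parsed list against the number of companies visible on the portfolio page is a quick way to spot a partial extraction.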
To properly scrape company information using Clay, follow these steps:
1. Install the Clay Chrome Extension from the Chrome Web Store if you haven't already.
2. Once on your target portfolio website:
• Click the Clay Chrome Extension icon in your toolbar
• Select the "Select Data" option
• Hover over the company information you want to extract
• Click on two examples of company names to help Clay match the data points
• Name your list (e.g., "Portfolio Companies")
3. For specific data extraction:
• Choose the attributes you want to scrape (Company Name, Website)
• Create and save your list
• Use "Add to Workspace" to import the data into Clay
Note: For popular websites, Clay often pre-maps data points automatically to make the process easier.
Source: Get transcript of any YouTube video in seconds | Clay
Looks like you're still looking for help. We're escalating this over to our support team and someone should be in touch soon. In the meantime, feel free to include any other information that may be helpful in solving your issue!
[Screenshot attached] Hey, thanks for reaching out - Which rows were you saying weren't working properly? It seems like it's working on my end.
Hey there - just wanted to check in here to see if you needed anything else! Feel free to reply back here if you do.
Hi Bo, in the second row (VentureFriends), the portfolio link has 55 companies, but it pulled only 3. Kindly review the portfolio link below: https://www.venturefriends.vc/portfolio
I want all the companies and their websites to be pulled.
I see - For this use case, you would need to approach this in a different way. Since each page has sub-pages, and those sub-pages have more information on each company, you would need to do it like this:
1. Use our Chrome Extension to scrape the page and collect all the sub-page links, then export them to a new Clay table.
2. Use our "Find Data from Page" enrichment and the recipe within our Chrome Extension to extract the data from each sub-page.
This will allow you to get all the company information from their pages. It seems like they don't have a domain listed, so I would also recommend using a "Domain" waterfall, found in the enrichment panel by clicking the "add enrichment" button at the top right of the page.
Thanks for explaining the process, Bo. So in this process I need to open each portfolio URL and use the Clay extension, which is very similar to the manual work we are currently doing. We have hundreds of portfolio URLs. Is there any automation or recipe through which I can automate this in the Clay table itself?
Let me know what you think of doing it like this instead:
1. First table: Scrape the portfolio pages and use "Write to Other Table" to collect all URLs and their source companies (make sure to send a value that can be matched in the "other data from this table" part).
2. Second table: Add "Scrape Website" and set a rule to scrape only non-root domains (URLs not in domain.com format). That can be specified in the Run Conditions with the AI Formula, e.g. don't run if the URL is a root domain. Use "Write to Other Table" again to:
- Send root domains to table 3
- Include source company data
3. Third table: You'll have your collection of root domains with data - filter and clean as needed.
4. Return to table 1 and use "Lookup" to match companies from table 3, bringing in all discovered root URLs.
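The logic behind the second table's rule, telling root-domain URLs apart from deeper page URLs, can be sketched in Python as follows. This is a minimal illustration of the idea only: in Clay the rule is set in the Run Conditions with the AI Formula, and the function names `is_root_domain_url` and `split_urls` are hypothetical.

```python
from urllib.parse import urlparse


def is_root_domain_url(url: str) -> bool:
    """Return True when the URL has no path, query, or fragment beyond an optional trailing slash."""
    # Add a scheme when it is missing so urlparse fills in netloc correctly.
    parsed = urlparse(url if "://" in url else "https://" + url)
    return (
        bool(parsed.netloc)
        and parsed.path in ("", "/")
        and not parsed.query
        and not parsed.fragment
    )


def split_urls(urls: list[str]) -> tuple[list[str], list[str]]:
    """Split URLs into (root domains, deeper pages that still need scraping)."""
    roots, to_scrape = [], []
    for url in urls:
        (roots if is_root_domain_url(url) else to_scrape).append(url)
    return roots, to_scrape


if __name__ == "__main__":
    roots, to_scrape = split_urls([
        "https://www.venturefriends.vc/portfolio",  # portfolio page from this thread
        "https://example.com/",                     # placeholder root domain
    ])
    print("root domains:", roots)
    print("pages to scrape:", to_scrape)
```

In this sketch, root domains correspond to the rows sent on to table 3, while the remaining URLs are the ones "Scrape Website" would run on.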
We haven't heard back from you in a bit, so we're going to go ahead and close things out here - feel free to let us know if you still need something!
Hi Bo, thanks for sharing the process. I have tried it but was not able to implement it properly. Could you please implement it in the table below so I can review and understand it? https://app.clay.com/workspaces/289883/workbooks/wb_MGwqExw24fBz/tables/t_JSRnNMMqYtjr/views/gv_CvdobhQZEjXD
Hey Sachin! Thanks for reaching out. Happy to help✨. We'll add an example and get back as soon as it's ready. Just confirming what you'd like to set up here:
1. Extract the companies listed on the company portfolio page
2. Enrich those companies and clean up their URLs
Hi Daniela D. I just want to extract the companies and their websites from the portfolio page (and enrich the website if it is not there).
Hey Sachin. Thanks for the reply! I added an example workflow to your table and made a quick video walking through the steps we took: https://www.loom.com/share/fc275a6593424c7f9727f4458fd5d952 Let me know if you have any questions!
This thread was picked up by our in-app web widget and will no longer sync to Slack. If you are the original poster, you can continue this conversation by logging into https://app.clay.com and clicking "Support" in the sidebar. If you're not the original poster and require help from support, please post in 02 Support.