A way to pull all of the HTML from a webpage into Clay? Looking for a way to pull the entire HTML from a webpage into Clay! The reason for doing this is to identify the tools used on the page by searching for the names of vendors in the div components. For example, on this site I am looking for "ocelot_ai".
You can use Clay's Scrape Website action to extract the entire content from a webpage, including all HTML elements. This will allow you to search through the page content for specific vendor names or div components. To do this:
1. Use the "Scrape Website" action in your Clay workflow
2. Input the target URL you want to analyze
3. The action will retrieve all webpage content, including body text, links, and HTML elements
4. You can then search through this data to identify specific vendor tools by looking for their names in the div components
Source: Scrape Website with Clay
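For step 4, a minimal sketch of what that search could look like once the scraped content is in a column you can pass to a code or formula step (the vendor list and function name are placeholders; "ocelot_ai" comes from the original question):

```typescript
// Sketch: flag which known vendor signatures appear anywhere in the scraped page content.
const VENDOR_SIGNATURES = ["ocelot_ai", "intercom", "drift"]; // placeholder list

function detectVendors(pageContent: string): string[] {
  const haystack = pageContent.toLowerCase();
  return VENDOR_SIGNATURES.filter((sig) => haystack.includes(sig.toLowerCase()));
}

// e.g. detectVendors(scrapedContent) -> ["ocelot_ai"] if that signature survives in the text
```

Note this only helps if the signature survives in the returned content; as the next messages show, the default output strips the div markup where those IDs usually live.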
Yeah, I tried that. However, this was the output. It doesn't include elements like divs and their IDs: ***** Skip to main content UC Berkeley Cal Student Central Toggle navigation Services & Support Student Information Access Student Records Delegate Access Financial Aid Financial Aid Types Financial Aid Support Billing, Fees & Payment Billing Payment & Refunds Waivers & Other Resources Registration & Academic Records Registration & Enrollment Academic Records Chatbot Home / Chatbot Have questions? We've got answers! Welcome to Berkeley's new artificially intelligent chatbot, that is available to Bears 24/7/365. Get your financial aid, registration, billing, payment, and rec sports questions answered through text and video in English, Spanish, Simplified Chinese and Vietnamese. Disclaimer: This chatbot service is not intended for private or confidential information. To Use the Chatbot (It's As Easy As 1-2-3!): Click on the box on the bottom left hand corner of your screen. The box will expand. Ask your question on the line provided and hit "Send." (Ex: "When is the financial aid deadline?") Follow the chatbot's prompts to get your suggested answers to your questions. Try it today Chatbot Special Features To change the language, click on the box on the left hand corner of your screen. The box will expand. Click on the globe on the upper right hand corner and pick the language you would like to use (English, Spanish, Simplified Chinese or Vietnamese). To download a transcript of your conversation, click to expand the box and click on the three circles on the top right hand corner of your screen. Select "Download Transcript" and it will automatically download as an HTML file. To clear the conversation when you are done, click to expand the box and click on the three circles on the top right hand corner of your screen. Select "Clear Conversation" and it will erase automatically.
Services & Areas of Support Student Information AccessStudent Records Delegate AccessFinancial AidFinancial Aid Types Financial Aid SupportBilling, Fees, & PaymentsBilling Payment & Refunds Waivers & Other ResourcesRegistration & Academic RecordsRegistration & Enrollment Academic RecordsSearch Chatbot facebook twitter instagram youtube snapchat Contact Us Report Web Technical Issues Privacy Policy Accessibility Nondiscrimination Copyright © 2024 UC Regents; all rights reserved window.RS_MODULES = window.RS_MODULES || {}; window.RS_MODULES.modules = window.RS_MODULES.modules || {}; window.RS_MODULES.waiting = window.RS_MODULES.waiting || []; window.RS_MODULES.defered = false; window.RS_MODULES.moduleWaiting = window.RS_MODULES.moduleWaiting || {}; window.RS_MODULES.type = 'compiled'; var ajaxRevslider;function rsCustomAjaxContentLoadingFunction(){ajaxRevslider=function(obj){var content='',data={action:'revslider_ajax_call_front',client_action:'get_slider_html',token:'384765aff4',type:obj.type,id:obj.id,aspectratio:obj.aspectratio};jQuery.ajax({type:'post',url:'https://studentcentral.berkeley.edu/wp-admin/admin-ajax.php',dataType:'json',data:data,async:false,success:function(ret,textStatus,XMLHttpRequest){if(ret.success==true)content=ret.data;},error:function(e){console.log(e);}});return content;};var ajaxRemoveRevslider=function(obj){return jQuery(obj.selector+' .rev_slider').revkill();};if(jQuery.fn.tpessential!==undefined)if(typeof(jQuery.fn.tpessential.defaults)!=='undefined')jQuery.fn.tpessential.defaults.ajaxTypes.push({type:'revslider',func:ajaxRevslider,killfunc:ajaxRemoveRevslider,openAnimationSpeed:0.3});}var rsCustomAjaxContent_Once=false;if(document.readyState==="loading")document.addEventListener('readystatechange',function(){if((document.readyState==="interactive"||document.readyState==="complete")&&!rsCustomAjaxContent_Once){rsCustomAjaxContent_Once=true;rsCustomAjaxContentLoadingFunction();}});else{rsCustomAjaxContent_Once=true;rsCustomAjaxContentLoadingFunction();} jQuery( document ).ready(function() { String.prototype.filename=function(extension){ var s= this.replace(/\\/g, '/'); s= s.substring(s.lastIndexOf('/')+ 1); return extension? s.replace(/[?#].+$/, ''): s.split('.')[0]; } jQuery("img").each(function () { var alt = jQuery(this).attr('alt'); if(alt === undefined){ var namefile = jQuery(this).attr('src').filename(); var rep = namefile.replace(/[_#?%*$@!=]/g,'-'); var temp = rep.split('-'); var alt =''; for (i = 0; i < temp.length; i++) { alt = alt+' '; } jQuery(this).attr('alt',alt); } }); });#rs-demo-id {}/* *//* */
By default, the Scrape Website action removes scripts, styles, and other non-text elements from the returned content. However, you can keep these HTML elements in the output by setting the "Keep Non-Text in Body" parameter to true when configuring the action. Alternatively, if you need more advanced scraping capabilities, especially for extracting specific HTML elements, you may want to consider using our Zenrows integration, which provides more powerful scraping options.
Sources: Scrape Website with Clay · Run Zenrows Scrape with Zenrows
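For reference, if you end up calling ZenRows outside of the built-in integration, a rough sketch of fetching the raw, unstripped HTML over its HTTP API (endpoint and parameter names as they appear in ZenRows' public docs; verify them against your account, and the `js_render` flag is an assumption you may not need):

```typescript
// Sketch: fetch raw HTML for a page through the ZenRows API.
// ZENROWS_API_KEY is a placeholder environment variable.
async function fetchRawHtml(targetUrl: string): Promise<string> {
  const params = new URLSearchParams({
    apikey: process.env.ZENROWS_API_KEY ?? "",
    url: targetUrl,
    js_render: "true", // render JavaScript so dynamically injected chatbot divs are included
  });
  const res = await fetch(`https://api.zenrows.com/v1/?${params.toString()}`);
  if (!res.ok) throw new Error(`ZenRows request failed: ${res.status}`);
  return res.text();
}
```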
To find specific software types on a webpage using Apify and AI in Clay, you can follow this process:
1. Use Apify's integration in Clay to scrape the webpage:
• Select the appropriate Apify actor for your needs
• Configure the scraper settings with your target URL
• Run the scraper to extract the webpage data
2. Import the scraped data into Clay using either:
• The "Import data from Apify Actor" action
• The "Run Apify Actor" action, which executes Apify actors directly within Clay workflows
3. Once you have the scraped HTML data in Clay, you can use AI tools to analyze the content and search for specific software signatures, such as div components or vendor names.
This approach is particularly effective for websites that are difficult to scrape using standard methods, and allows you to process the data at scale.
Sources: Deep Dive: Apify Actors [Limitless Research] - Clay University · Apify x Clay integration | Clay.com
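If you prefer to hit Apify directly rather than through the Clay actions above, a rough sketch of running an actor synchronously and pulling its dataset items via Apify's REST API (the actor ID, token variable, and input shape are placeholders; check the specific actor's input schema):

```typescript
// Sketch: run an Apify actor and return its dataset items in a single call.
// APIFY_TOKEN is a placeholder environment variable; the input shape depends on the actor.
async function runActor(actorId: string, input: object): Promise<unknown[]> {
  const url =
    `https://api.apify.com/v2/acts/${actorId}/run-sync-get-dataset-items` +
    `?token=${process.env.APIFY_TOKEN}`;
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(input),
  });
  if (!res.ok) throw new Error(`Apify request failed: ${res.status}`);
  return res.json();
}

// e.g. runActor("apify~website-content-crawler", { startUrls: [{ url: "https://studentcentral.berkeley.edu" }] })
```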
Hi Samuel, thank you for reaching out. People have had success using Zenrows to pull the entire HTML from a webpage. Once you have the HTML, you can use AI to clean the output and extract specific information like vendor names in the div components. Another option is Apify, which has various scrapers you can use depending on what task you are trying to complete. One of the benefits of Zenrows is that it has a direct integration with Clay, which could simplify the process for you. Let me know if you need more details on setting up either of these!
Samuel H. this one: https://apify.com/saswave/website-dns-scraper
You can see in this table: https://app.clay.com/workspaces/313555/tables/t_cVPRhDnQrPQ7/views/gv_2tXUdmx5yDFV. It's the JavaScript option :)
I see - Which tool from Zenrows are you using to see this? Is this a custom function?
https://app.zenrows.com/builder The objective is just to get the raw HTML, I think, because I am trying to catch an ID that exists on the divs for chatbots vs. the predefined HTML output fields. Lmk if you have any other ideas on ways to approach this Bo (.!
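One possible way to do that last step once the raw HTML is in hand, sketched with cheerio as an assumed HTML parser (the signature list is a placeholder): check the div id/class attributes directly instead of the flattened text output.

```typescript
import * as cheerio from "cheerio";

// Sketch: look for chatbot vendor signatures in div id/class attributes of raw HTML.
const CHATBOT_SIGNATURES = ["ocelot_ai"]; // placeholder list; extend with other vendors

function findChatbotDivs(rawHtml: string): string[] {
  const $ = cheerio.load(rawHtml);
  const hits: string[] = [];
  $("div").each((_, el) => {
    const idAndClass = `${$(el).attr("id") ?? ""} ${$(el).attr("class") ?? ""}`.toLowerCase();
    if (CHATBOT_SIGNATURES.some((sig) => idAndClass.includes(sig))) {
      hits.push(idAndClass.trim());
    }
  });
  return hits;
}
```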
Amazing. Keep us posted!! :)
Hi Bo (. - this was the solution I went with in the end and the results were great (much more effective than BuiltWith). Thanks Piers! https://clayrunhq.slack.com/archives/C071WRJFKMH/p1735760554964759