A way to pull all of the HTML from a webpage into Clay? Looking for a way to pull the entire HTML from a webpage into Clay! The reason for doing this is to identify the tools used on the page by searching for the names of vendors in the div components. For example, on this site I am looking for "ocelot_ai".
You can use Clay's Scrape Website action to extract the entire content from a webpage, including all HTML elements. This will allow you to search through the page content for specific vendor names or div components. To do this:
1. Use the "Scrape Website" action in your Clay workflow
2. Input the target URL you want to analyze
3. The action will retrieve all webpage content, including body text, links, and HTML elements
4. You can then search through this data to identify specific vendor tools by looking for their names in the div components
Source: Scrape Website with Clay
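For step 4, a minimal sketch of what that search could look like once the scraped content is in a column you can pass to a code or formula step (the vendor list and function name are placeholders; "ocelot_ai" comes from the original question):

```typescript
// Sketch: flag which known vendor signatures appear anywhere in the scraped page content.
const VENDOR_SIGNATURES = ["ocelot_ai", "intercom", "drift"]; // placeholder list

function detectVendors(pageContent: string): string[] {
  const haystack = pageContent.toLowerCase();
  return VENDOR_SIGNATURES.filter((sig) => haystack.includes(sig.toLowerCase()));
}

// e.g. detectVendors(scrapedContent) -> ["ocelot_ai"] if that signature survives in the text
```

Note this only helps if the signature survives in the returned content; as the next messages show, the default output strips the div markup where those IDs usually live.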
Yeah, I tried that. However, this was the output. It doesn't include elements like divs and their IDs: ***** Skip to main content UC Berkeley Cal Student Central Toggle navigation Services & Support Student Information Access Student Records Delegate Access Financial Aid Financial Aid Types Financial Aid Support Billing, Fees & Payment Billing Payment & Refunds Waivers & Other Resources Registration & Academic Records Registration & Enrollment Academic Records Chatbot Home / Chatbot Have questions? We've got answers! Welcome to Berkeley's new artificially intelligent chatbot, that is available to Bears 24/7/365. Get your financial aid, registration, billing, payment, and rec sports questions answered through text and video in English, Spanish, Simplified Chinese and Vietnamese. Disclaimer: This chatbot service is not intended for private or confidential information. To Use the Chatbot (It's As Easy As 1-2-3!): Click on the box on the bottom left hand corner of your screen. The box will expand. Ask your question on the line provided and hit "Send." (Ex: "When is the financial aid deadline?") Follow the chatbot's prompts to get your suggested answers to your questions. Try it today Chatbot Special Features To change the language, click on the box on the left hand corner of your screen. The box will expand. Click on the globe on the upper right hand corner and pick the language you would like to use (English, Spanish, Simplified Chinese or Vietnamese). To download a transcript of your conversation, click to expand the box and click on the three circles on the top right hand corner of your screen. Select "Download Transcript" and it will automatically download as an HTML file. To clear the conversation when you are done, click to expand the box and click on the three circles on the top right hand corner of your screen. Select "Clear Conversation" and it will erase automatically.
Services & Areas of Support Student Information AccessStudent Records Delegate AccessFinancial AidFinancial Aid Types Financial Aid SupportBilling, Fees, & PaymentsBilling Payment & Refunds Waivers & Other ResourcesRegistration & Academic RecordsRegistration & Enrollment Academic RecordsSearch Chatbot facebook twitter instagram youtube snapchat Contact Us Report Web Technical Issues Privacy Policy Accessibility Nondiscrimination Copyright © 2024 UC Regents; all rights reserved window.RS_MODULES = window.RS_MODULES || {}; window.RS_MODULES.modules = window.RS_MODULES.modules || {}; window.RS_MODULES.waiting = window.RS_MODULES.waiting || []; window.RS_MODULES.defered = false; window.RS_MODULES.moduleWaiting = window.RS_MODULES.moduleWaiting || {}; window.RS_MODULES.type = 'compiled'; var ajaxRevslider;function rsCustomAjaxContentLoadingFunction(){ajaxRevslider=function(obj){var content='',data={action:'revslider_ajax_call_front',client_action:'get_slider_html',token:'384765aff4',type:obj.type,id:obj.id,aspectratio:obj.aspectratio};jQuery.ajax({type:'post',url:'https://studentcentral.berkeley.edu/wp-admin/admin-ajax.php',dataType:'json',data:data,async:false,success:function(ret,textStatus,XMLHttpRequest){if(ret.success==true)content=ret.data;},error:function(e){console.log(e);}});return content;};var ajaxRemoveRevslider=function(obj){return jQuery(obj.selector+' .rev_slider').revkill();};if(jQuery.fn.tpessential!==undefined)if(typeof(jQuery.fn.tpessential.defaults)!=='undefined')jQuery.fn.tpessential.defaults.ajaxTypes.push({type:'revslider',func:ajaxRevslider,killfunc:ajaxRemoveRevslider,openAnimationSpeed:0.3});}var rsCustomAjaxContent_Once=false;if(document.readyState==="loading")document.addEventListener('readystatechange',function(){if((document.readyState==="interactive"||document.readyState==="complete")&&!rsCustomAjaxContent_Once){rsCustomAjaxContent_Once=true;rsCustomAjaxContentLoadingFunction();}});else{rsCustomAjaxContent_Once=true;rsCustomAjaxContentLoadingFunction();} jQuery( document ).ready(function() { String.prototype.filename=function(extension){ var s= this.replace(/\\/g, '/'); s= s.substring(s.lastIndexOf('/')+ 1); return extension? s.replace(/[?#].+$/, ''): s.split('.')[0]; } jQuery("img").each(function () { var alt = jQuery(this).attr('alt'); if(alt === undefined){ var namefile = jQuery(this).attr('src').filename(); var rep = namefile.replace(/[_#?%*$@!=]/g,'-'); var temp = rep.split('-'); var alt =''; for (i = 0; i < temp.length; i++) { alt = alt+' '; } jQuery(this).attr('alt',alt); } }); });#rs-demo-id {}/* *//* */
By default, the Scrape Website action removes scripts, styles, and other non-text elements from the returned content. However, you can keep these HTML elements in the output by setting the "Keep Non-Text in Body" parameter to true when configuring the action. Alternatively, if you need more advanced scraping capabilities, especially for extracting specific HTML elements, you may want to consider using our Zenrows integration, which provides more powerful scraping options.
Sources: Scrape Website with Clay · Run Zenrows Scrape with Zenrows
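For reference, if you end up calling ZenRows outside of the built-in integration, a rough sketch of fetching the raw, unstripped HTML over its HTTP API (endpoint and parameter names as they appear in ZenRows' public docs; verify them against your account, and the `js_render` flag is an assumption you may not need):

```typescript
// Sketch: fetch raw HTML for a page through the ZenRows API.
// ZENROWS_API_KEY is a placeholder environment variable.
async function fetchRawHtml(targetUrl: string): Promise<string> {
  const params = new URLSearchParams({
    apikey: process.env.ZENROWS_API_KEY ?? "",
    url: targetUrl,
    js_render: "true", // render JavaScript so dynamically injected chatbot divs are included
  });
  const res = await fetch(`https://api.zenrows.com/v1/?${params.toString()}`);
  if (!res.ok) throw new Error(`ZenRows request failed: ${res.status}`);
  return res.text();
}
```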
To find specific software types on a webpage using Apify and AI in Clay, you can follow this process:
1. Use Apify's integration in Clay to scrape the webpage:
• Select the appropriate Apify actor for your needs
• Configure the scraper settings with your target URL
• Run the scraper to extract the webpage data
2. Import the scraped data into Clay using either:
• The "Import data from Apify Actor" action
• The "Run Apify Actor" action, which executes Apify actors directly within Clay workflows
3. Once you have the scraped HTML data in Clay, you can use AI tools to analyze the content and search for specific software signatures, such as div components or vendor names.
This approach is particularly effective for websites that are difficult to scrape using standard methods, and allows you to process the data at scale.
Sources: Deep Dive: Apify Actors [Limitless Research] - Clay University · Apify x Clay integration | Clay.com
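If you prefer to hit Apify directly rather than through the Clay actions above, a rough sketch of running an actor synchronously and pulling its dataset items via Apify's REST API (the actor ID, token variable, and input shape are placeholders; check the specific actor's input schema):

```typescript
// Sketch: run an Apify actor and return its dataset items in a single call.
// APIFY_TOKEN is a placeholder environment variable; the input shape depends on the actor.
async function runActor(actorId: string, input: object): Promise<unknown[]> {
  const url =
    `https://api.apify.com/v2/acts/${actorId}/run-sync-get-dataset-items` +
    `?token=${process.env.APIFY_TOKEN}`;
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(input),
  });
  if (!res.ok) throw new Error(`Apify request failed: ${res.status}`);
  return res.json();
}

// e.g. runActor("apify~website-content-crawler", { startUrls: [{ url: "https://studentcentral.berkeley.edu" }] })
```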
Hi Samuel, thank you for reaching out. People have had success using Zenrows to pull the entire HTML from a webpage. Once you have the HTML, you can use AI to clean the output and extract specific information like vendor names in the div components. Another option is Apify, which has various scrapers you can use depending on what task you are trying to complete. One of the benefits of Zenrows is that it has a direct integration with Clay, which could simplify the process for you. Let me know if you need more details on setting up either of these!
Samuel H. this one: https://apify.com/saswave/website-dns-scraper
You can see in this table: https://app.clay.com/workspaces/313555/tables/t_cVPRhDnQrPQ7/views/gv_2tXUdmx5yDFV. It's the JavaScript option :)
I see - Which tool from Zenrows are you using to see this? Is this a custom function?
https://app.zenrows.com/builder The objective is just to get the raw HTML, I think, because I am trying to catch an ID that exists on the divs for chatbots vs. the predefined HTML output fields. Lmk if you have any other ideas on ways to approach this Bo (.!
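One possible way to do that last step once the raw HTML is in hand, sketched with cheerio as an assumed HTML parser (the signature list is a placeholder): check the div id/class attributes directly instead of the flattened text output.

```typescript
import * as cheerio from "cheerio";

// Sketch: look for chatbot vendor signatures in div id/class attributes of raw HTML.
const CHATBOT_SIGNATURES = ["ocelot_ai"]; // placeholder list; extend with other vendors

function findChatbotDivs(rawHtml: string): string[] {
  const $ = cheerio.load(rawHtml);
  const hits: string[] = [];
  $("div").each((_, el) => {
    const idAndClass = `${$(el).attr("id") ?? ""} ${$(el).attr("class") ?? ""}`.toLowerCase();
    if (CHATBOT_SIGNATURES.some((sig) => idAndClass.includes(sig))) {
      hits.push(idAndClass.trim());
    }
  });
  return hits;
}
```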
Amazing. Keep us posted!! :)
Hi Bo (. - this was the solution I went with in the end and the results were great (much more effective than BuiltWith). Thanks Piers! https://clayrunhq.slack.com/archives/C071WRJFKMH/p1735760554964759