I am using an HTTP request to jina.ai to pull markdown versions of websites. Then I am using OpenAI to summarize those websites and compare them to my ICP to run an ICP checker. Is it possible to truncate the content of the jina.ai markdown when sending it to OpenAI? I am spending a lot of credits summarizing each website.
Yes, you can use a formula to truncate the text. Use a formula like "return first x characters from /text". This will only return a limited amount of text, which you can pass to OpenAI.
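A minimal sketch of what a "return first x characters" formula amounts to in JavaScript. The helper name `truncateText` and the sample text are hypothetical, and how the column value reaches the formula is an assumption, not something the thread confirms.

```javascript
// Hypothetical helper: keep at most `maxChars` characters of `text`.
function truncateText(text, maxChars) {
  // Treat empty or missing cells as an empty string so the call never throws.
  const safe = typeof text === "string" ? text : "";
  return safe.slice(0, maxChars);
}

// Illustrative input only; in practice this would be the scraped markdown.
const sample = "Acme builds payroll software for mid-market retailers.";
console.log(truncateText(sample, 20));
```

`slice` counts characters (UTF-16 code units), not bytes, so the limit here is a character limit.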
And is there a formula I can use to get rid of links, HTML, etc. from the markdown?
Yes, you can use an AI Formula.
Tell it what to do and it will do it.
Yes, you can optimize your credit usage for website summarization. Consider these approaches:
* Use AI Formulas first: These are completely free and can truncate or clean your markdown content before sending to OpenAI. You can use AI Formulas to extract specific sections, remove unnecessary content, or limit character count.
* Connect your own OpenAI API key: This cuts costs by up to 90%, and 90% of Clay users input their own AI API keys.
* Test with small samples: Start with 10 rows to refine your prompts and truncation approach before running the full enrichment.
For truncation, create an AI Formula column that processes your markdown content to keep only relevant sections (like first few paragraphs, specific headings, or key content areas) before passing it to your OpenAI summarization step.
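As a rough illustration of "keep only the first few paragraphs", here is a sketch in plain JavaScript. The function name `firstParagraphs` is hypothetical, and it assumes paragraphs in the scraped markdown are separated by blank lines.

```javascript
// Hypothetical helper: keep only the first `maxParagraphs` paragraphs.
function firstParagraphs(markdown, maxParagraphs) {
  return String(markdown ?? "")
    .split(/\n\s*\n/)              // assumption: blank lines separate paragraphs
    .map((p) => p.trim())
    .filter((p) => p.length > 0)   // drop empty chunks
    .slice(0, maxParagraphs)
    .join("\n\n");
}

// Illustrative input only.
const page = "# Title\n\nIntro text.\n\n\nMore.\n\nFooter";
console.log(firstParagraphs(page, 2));
```

Cutting on paragraph boundaries avoids ending the excerpt mid-sentence, at the cost of a less predictable length than a fixed character limit.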
Yes! You can use AI Formulas to clean HTML and links from your markdown content. For cleaning HTML/links, create an AI Formula with instructions like:
* "Remove all HTML tags and links from /markdown_column"
* "Extract only plain text from /markdown_column, removing HTML tags and URLs"
* "Clean /markdown_column by removing HTML formatting and hyperlinks"
AI Formulas are credit-free and perfect for data cleaning tasks like removing HTML tags, formatting text, and extracting specific patterns. You can combine both truncation and cleaning in one formula or use separate columns for each step.
Exactly! AI Formulas are incredibly versatile and credit-free. You can give them natural language instructions like:
* "Remove all HTML tags and links from /markdown_column"
* "Return first 500 characters from /text"
* "Extract only plain text, removing HTML formatting and URLs"
They're perfect for data cleaning, formatting, and transformations without consuming credits. Simply describe what you want, reference your columns with /column_name, and the AI Formula will generate the code to do it.
Spencer H. Yep! You can clean up the markdown before summarizing. Try a formula like this:
.replace(/<[^>]*>/g, '') // remove HTML tags
.replace(/\[.*?\]\(.*?\)/g, '') // remove markdown links
.replace(/https?:\/\/\S+/g, '') // remove raw URLs
.slice(0, 3000) // truncate to 3000 characters (adjust as needed)
This will clean and truncate your content before sending it to OpenAI, which saves credits and keeps context sharp.
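For testing the chain outside Clay, it can be wrapped in a standalone function. This is only a sketch: the name `cleanMarkdown`, the default limit, and the sample string are hypothetical, and the way a Clay formula references the column is not shown here because the thread does not spell it out.

```javascript
// Hypothetical wrapper around the replace chain from the thread.
function cleanMarkdown(markdown, maxChars = 3000) {
  return String(markdown ?? "")       // guard against empty cells
    .replace(/<[^>]*>/g, "")          // remove HTML tags
    .replace(/\[.*?\]\(.*?\)/g, "")   // remove markdown links (link text is removed too)
    .replace(/https?:\/\/\S+/g, "")   // remove raw URLs
    .slice(0, maxChars);              // truncate last, so the limit applies to cleaned text
}

// Illustrative input only.
const raw = "<p>Acme</p> sells [widgets](https://acme.test/w) at https://acme.test today";
console.log(cleanMarkdown(raw));
```

Note that the markdown-link pattern drops the link text along with the URL. If the anchor text matters for the summary, replacing with the captured text instead (for example `.replace(/\[(.*?)\]\(.*?\)/g, "$1")`) keeps it.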
Perfect! That's exactly how AI Formulas work - you can give them JavaScript-style instructions to clean and truncate your markdown content. Your formula approach is spot-on for optimizing OpenAI costs. AI Formulas are completely credit-free, so you can process unlimited rows without spending credits on data cleaning. You can also use natural language instructions like "Remove HTML tags and links from /markdown_column, then truncate to 3000 characters" - the AI Formula generator will create the appropriate code for you. This preprocessing step before OpenAI will significantly reduce your credit usage while keeping the most relevant content for summarization.
Sobia J. awesome thank you - is that by adding enrichment -> use AI? Or would it be just a formula? Given the use AI bit was going to use up credits
Use Formula (not "Use AI" enrichment). AI Formulas are completely credit-free - they don't consume any credits at all. Add a new column, select Formula, then describe what you want to do in natural language like "Remove HTML tags and links from /markdown_column, then truncate to 3000 characters." The AI Formula generator will create the code for you without using credits. "Use AI" enrichments consume credits, while AI Formulas are free for data cleaning and formatting tasks.
Spencer H. You’d use this as a formula column before the AI step. That way you’re cleaning and trimming the content before it ever hits OpenAI, which helps cut costs and keeps your prompt sharp.
Exactly! That's the perfect workflow - use a Formula column (not "Use AI" enrichment) to clean and truncate your markdown content before it reaches OpenAI. AI Formulas are completely credit-free and ideal for data cleaning tasks like removing HTML tags, links, and truncating text. This preprocessing step significantly reduces your OpenAI credit usage while keeping the most relevant content for summarization. Add a Formula column, describe what you want (like "Remove HTML tags and links from /markdown_column, then truncate to 3000 characters"), and it'll generate the code automatically.
I ran the formula to get rid of HTML, raw URLs, and markdown links, and truncated the text to 4056 characters. However, some of the rows have errors (see screenshot) saying cell data size exceeded 8kb. Why is this?
The error means a cell went over Clay's limit of about 8,000 characters (8KB) per cell. Since you're truncating to 4,056 characters, the output should fit, so the likely cause is that your formula isn't working properly on some rows: the original content may still be present, or the truncation isn't applying correctly. Check whether your formula is actually processing all rows or whether some are bypassing the truncation step. Try reducing your truncation limit to 3,000 characters to stay well under the 8KB limit, and make sure your formula is applied to all rows.
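One thing worth checking, as an assumption rather than confirmed Clay behavior: if the 8KB limit is measured in bytes, a 4,056-character string can still exceed it when it contains multi-byte characters (non-Latin scripts, emoji). The sketch below measures UTF-8 size and truncates by bytes instead of characters. The helper names are hypothetical, and `TextEncoder` is a standard API in browsers and Node.js that may or may not be available inside a Clay formula.

```javascript
const encoder = new TextEncoder();

// UTF-8 size of a string in bytes.
function byteSize(text) {
  return encoder.encode(String(text ?? "")).length;
}

// Hypothetical helper: truncate so the UTF-8 size stays within `maxBytes`,
// without cutting a character in half.
function truncateToBytes(text, maxBytes) {
  let out = "";
  let used = 0;
  for (const ch of String(text ?? "")) {   // iterates by code point
    const size = encoder.encode(ch).length;
    if (used + size > maxBytes) break;
    out += ch;
    used += size;
  }
  return out;
}

// Illustrative input: 3,000 characters that take 9,000 bytes in UTF-8.
const cjk = "日本語".repeat(1000);
console.log(cjk.length, byteSize(cjk));
```

Running `byteSize` on a failing row's cleaned text outside Clay would show whether byte size, rather than character count, is what trips the limit.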