Claygent isn't sticking to the defined output structure at scale. When I set Sonnet 4 to investigate a website and return either "Valid" or "Invalid" - it works fine when rows are run individually - but gets very confused when hundreds of rows are run. It will start ignoring the output structure completely. This is very frustrating, as it's been a recurring issue across other tables and different models. https://app.clay.com/workspaces/223473/workbooks/wb_0t6sw90Zb35DKX93pJf/tables/t_0t8aefk36Zg2vYU5ewS/views/gv_0t8aefk7TeKJBBixK2c
This is a known issue with AI models at scale - they tend to drift from structured outputs when processing large batches. Here are some solutions:
Immediate fixes:
Break your data into smaller batches (50-100 rows at a time) instead of running hundreds at once
Test on a few rows first, then apply to larger datasets [1]
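If you manage the row list outside Clay, the batching idea above can be sketched roughly as follows. This is only an illustration: the `batches` helper and the `row_ids` list are hypothetical names, not part of Clay, and inside the table you would more likely just select a limited number of rows per run.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batches(rows: Sequence[T], size: int = 50) -> Iterator[List[T]]:
    """Yield consecutive slices of `rows`, each with at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(rows), size):
        yield list(rows[start:start + size])


# Hypothetical row IDs standing in for the rows of a table.
row_ids = list(range(230))

for number, batch in enumerate(batches(row_ids, size=100), start=1):
    # In practice you would run the enrichment on `batch` here and
    # check the results before moving on to the next batch.
    print(f"batch {number}: {len(batch)} rows")
```

Running one batch, checking it, and only then starting the next makes it easier to spot where the output starts to drift.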
Better structure:
Use Claygent's column outputs feature to define exactly where each piece of data should go [2]
In your prompt, explicitly tell Claygent which information goes to which output field
Try Sculptor first:
Before we dig deeper, I'd recommend trying our built-in AI tool Sculptor (top-right corner). It can help optimize your Claygent setup and might solve this structure issue automatically. Give it a shot and let me know if that helps!
If Sculptor doesn't resolve it and you're still seeing the drift at scale, we can explore other solutions.
Yes, JSON schema is exactly what you need for consistent structured outputs at scale!
In your Claygent setup:
Select JSON schema as the output format instead of text
Define your schema to specify exactly what fields you want (for example, a single field whose only allowed values are "Valid" and "Invalid")
Set up the schema structure to constrain the AI's responses
This forces the model to return data in your exact format, preventing the drift you're seeing when processing hundreds of rows. [1]
You can also use column outputs in Claygent to automatically split JSON results into separate columns, making the data even more structured and usable. [2]
JSON schemas are particularly effective because they create hard constraints on AI outputs rather than relying on prompt instructions alone, which models tend to ignore at scale. [3]
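As a rough sketch of what such a schema could look like for this use case, here is one written as a Python dict together with a small hand-rolled check. The field name `status`, the `VERDICT_SCHEMA` name, and the `conforms` helper are assumptions for illustration; the exact schema keywords Claygent accepts may differ, so treat this as a starting point rather than the required format.

```python
import json

# Hypothetical schema: one required field, "status", limited to two values.
VERDICT_SCHEMA = {
    "type": "object",
    "properties": {
        "status": {"type": "string", "enum": ["Valid", "Invalid"]},
    },
    "required": ["status"],
    "additionalProperties": False,
}


def conforms(raw: str) -> bool:
    """Return True if `raw` is a JSON object of the form {"status": "Valid" | "Invalid"}.

    This is a hand-written check for this one schema, not a general
    JSON Schema validator.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    allowed = VERDICT_SCHEMA["properties"]["status"]["enum"]
    return set(data) == {"status"} and data["status"] in allowed


# Print the schema as JSON so it can be pasted into a schema field.
print(json.dumps(VERDICT_SCHEMA, indent=2))
```

The `enum` keyword is what limits the value to the two allowed strings; a plain `"type": "string"` would still let the model return free text.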
Try this approach and let me know if it resolves the consistency issues!
If you still need help with ensuring consistent output structure at scale, I’m here to assist. Would you like to provide more details about what you’re trying to achieve or any specific challenges you’re facing?
Thanks Muhammad S., it's helped but not solved entirely. Going to try rewording the prompt...
Good approach! When rewording your prompt, focus on these key areas:
Be extremely specific about output format - Instead of just saying "Valid" or "Invalid", try: "Return exactly one word: either 'Valid' or 'Invalid' with no additional text, punctuation, or explanation." [1]
Add clear step-by-step instructions - Break down what you want Claygent to do: "1. Analyze the website, 2. Determine if it meets criteria X, 3. Return only 'Valid' or 'Invalid'" [2]
Test on small batches first - Run your reworded prompt on 5-10 rows to verify it maintains consistency before scaling up. [3]
The combination of JSON schema + very explicit prompt instructions should give you much better consistency at scale. Let me know how the rewording goes!
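If you export the results of a small test batch, a quick check like the one below can show which rows drifted from the expected one-word answer. It is a minimal sketch: the `find_drift` helper and the `sample` outputs are made up for illustration and are not real results from your table.

```python
ALLOWED = {"Valid", "Invalid"}


def find_drift(outputs):
    """Return (index, value) pairs for outputs that are not exactly 'Valid' or 'Invalid'."""
    return [(i, out) for i, out in enumerate(outputs) if out not in ALLOWED]


# Made-up outputs from a hypothetical 5-row test run.
sample = ["Valid", "Invalid", "Valid.", "The site appears valid", "Invalid"]

for index, value in find_drift(sample):
    print(f"row {index}: unexpected output {value!r}")
```

The comparison is deliberately strict, so trailing punctuation or extra words count as drift, which matches the "exactly one word" instruction in the prompt.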
If you still need additional help with refining your prompt or ensuring consistent output at scale, I’m here to assist. Would you like to share more about what you’re trying to resolve or any specific issues you’re encountering?
