Hey all, I’m working on an account signals workflow that pulls in relevant news for specific accounts and then runs custom analysis / AI prompts on it.
The problem is that Clay Signals will pull in multiple (~3-8) articles from different sources that all cover the same single event. I want to dedupe these so that, for each event, only one article is surfaced for analysis (which also avoids wasting credits).
I'm trying to solve this by generating a deterministic Topic ID with Claygent, and I'd love to hear community best practices or how others have solved it. Currently, I have a Claygent/AI prompt that strictly categorizes each article into 1 of 10 topics and attaches the company name. For example:
Then I combine this “Topic ID” with the article's publish date and dedupe on that combined key.
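To make the logic concrete, here is a minimal Python sketch of the Topic ID plus publish date dedupe as it would look outside Clay. The field names (`company`, `topic`, `published`, `url`), the topic labels, and the sample rows are all hypothetical, not Clay's actual schema or API. It assumes the publish timestamp has already been reduced to a calendar date.

```python
from datetime import date


def dedupe_by_topic_key(articles):
    """Keep the first article seen for each (company, topic, publish date) key."""
    seen = set()
    kept = []
    for article in articles:
        # Normalize case and whitespace so "Acme" and "acme " produce the same key.
        key = (
            article["company"].strip().lower(),
            article["topic"].strip().lower(),
            article["published"],
        )
        if key in seen:
            continue  # same company, topic, and day already surfaced
        seen.add(key)
        kept.append(article)
    return kept


# Hypothetical sample rows: two sources covering one event, plus a separate event.
articles = [
    {"company": "Acme", "topic": "funding", "published": date(2024, 5, 1), "url": "https://example.com/a"},
    {"company": "acme", "topic": "Funding", "published": date(2024, 5, 1), "url": "https://example.com/b"},
    {"company": "Acme", "topic": "leadership_change", "published": date(2024, 5, 1), "url": "https://example.com/c"},
]

unique = dedupe_by_topic_key(articles)
```

One limitation this makes visible: an exact-date key treats two articles about the same event as different if they were published a day apart, so bucketing by week or allowing a date window may be worth considering.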
I’m curious what the community is doing to combat this problem. Is this forced classification into a limited list generally considered the most efficient/reliable way to dedupe articles within Clay's ecosystem?
Has anyone found a better method? (e.g., comparing the original news snippets with a different AI function, or relying on a specific news source integration's built-in dedupe feature?)
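As a rough illustration of the snippet-comparison idea, here is a minimal sketch that uses plain token-overlap (Jaccard) similarity instead of an AI function. This is a swapped-in stand-in technique, not a Clay feature. The function names, the 0.5 threshold, and the sample snippets are assumptions for illustration only.

```python
import re


def tokens(text):
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Share of tokens the two sets have in common (0.0 to 1.0)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def dedupe_by_similarity(snippets, threshold=0.5):
    """Greedy pass: drop a snippet if it is close enough to one already kept."""
    kept = []  # list of (snippet, token set) pairs
    for snippet in snippets:
        snippet_tokens = tokens(snippet)
        if any(jaccard(snippet_tokens, kept_tokens) >= threshold for _, kept_tokens in kept):
            continue  # near-duplicate of an earlier snippet
        kept.append((snippet, snippet_tokens))
    return [snippet for snippet, _ in kept]


# Hypothetical headlines: the first two describe the same event.
snippets = [
    "Acme raises $50M Series B led by Example Ventures",
    "Acme raises $50M in Series B round led by Example Ventures",
    "Acme appoints new chief financial officer",
]

unique_snippets = dedupe_by_similarity(snippets)
```

The threshold would need tuning on real data: set too low, it merges distinct events about the same company; set too high, it misses rewrites that share few exact words.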
Any feedback or examples would be massively helpful. Thanks!