Hi Stefano - Thanks for getting back to me.
Our specific use case for our current client is as follows:
We are tracking signals at fintech companies that imply a hiring need for CISOs (regulatory approvals, funding rounds, leadership changes, cross-border expansions, etc.) in order to find high-intent accounts for our client in recruitment.
In our initial sample for this client, we found that roughly 20-30% of every 100 signals were duplicates, which becomes a fairly costly overhead as we scale campaigns up to meet demand.
Currently we are deleting duplicates manually, because some of the returned responses are more detailed than others even though they come from essentially the same source and cover the same content.
We want to keep whichever duplicate gives the most context/information, since that provides more data for later steps in the workflow, such as personalisation.
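
To make that concrete, here is a rough sketch of the kind of logic we have in mind (purely illustrative: the field names, the URL-based grouping key, and the length-based "richness" score are placeholders, not our actual pipeline):

```python
from urllib.parse import urlparse

# Hypothetical signal records - field names are assumptions for illustration only.
signals = [
    {"company": "AcmePay",
     "source_url": "https://example.com/news/acmepay-series-b?utm=x",
     "summary": "AcmePay raises Series B."},
    {"company": "AcmePay",
     "source_url": "https://example.com/news/acmepay-series-b",
     "summary": "AcmePay raises $40M Series B and plans EU expansion, "
                "hinting at new security leadership."},
]

def dedupe_keep_richest(records):
    """Group records sharing a company and source page; keep the most detailed one."""
    best = {}
    for rec in records:
        url = urlparse(rec["source_url"])
        # Normalise the source by dropping query strings so near-identical links collapse.
        key = (rec["company"].lower(), url.netloc, url.path.rstrip("/"))
        # "Most detailed" here is simply the longest summary - a stand-in for whatever
        # richness score (field count, named entities, etc.) we end up using.
        if key not in best or len(rec["summary"]) > len(best[key]["summary"]):
            best[key] = rec
    return list(best.values())

print(dedupe_keep_richest(signals))  # keeps only the richer AcmePay record
```

In short, rather than dropping duplicates arbitrarily, we would like the dedupe step to collapse near-identical signals and retain the one carrying the most usable detail.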