Auto-dedupe on Keep Newest Rows is not working. I can still see multiple duplicates in the table. Can you please help? Here's the table URL - https://app.clay.com/workspaces/479671/workbooks/wb_0t7thshuyAT25sbDBqD/tables/t_0t7v7kkC4NJVbY2Qwcy/views/gv_0t7v7kkXcHTuuXXegP7
Hey,
Just to check, when did you add the auto dedupe setting, and can you walk me through the exact steps you took to activate Keep Newest Rows? Was dedupe enabled before the duplicates were created?
We can also try re-activating it to see if that clears the duplicates.
Let me know if you have more questions.
Hey there - just wanted to check in here to see if you needed anything else. Feel free to reply here if you do and someone will jump in to help out
We haven't heard back from you in a bit, so we're going to close things out here.
If you need anything else, feel free to reply back here and someone will jump in to help out!
Hey Bo - Here's what's happening:
I'm sending data from one table to another, let's assume from Table A to Table B.
I have switched on auto-dedupe with Keep Newest Rows on one particular column (which has URLs) in Table B.
Now when I resend data from Table A to Table B (using the Send Table Data function), I want only the newest row for each URL to stay, but instead the data is written to Table B and all the duplicate rows remain there.
So the auto-dedup function isn't working.
I did try to switch it off and on again, but nothing changes.
Can you please help?
Hey Daniela D., can you please help?
Hey there. I see the table is set up to de-dupe by the 'Share Url' column. But there are no duplicates in that column right now:
Was this recently de-duped manually? (I see at least 6 duplicates were removed within the last couple weeks.)
Mark L. Hey, yes these were manually removed.
Thank you. De-dupe is more difficult when the records are created at ~the same time. As in, when a given row checks for duplicates, a duplicate that's arriving at the same time may not be detected yet. (Like two divers jumping into the pool at the same time – they check the water to see if a diver is already there, and if not, jump.)
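To make the diver analogy concrete, here's a tiny sketch (plain Python, purely illustrative – this is not Clay's actual implementation) of why two rows arriving at the same time can both pass the duplicate check:

```python
# Simplified illustration of a check-then-insert race: two rows with the same
# key arrive at ~the same time, each checks for an existing duplicate before
# the other has been written, so both end up in the table.
import threading
import time

table = []  # stands in for the destination table

def insert_row(share_url):
    # step 1: look for an existing duplicate
    already_there = share_url in table
    time.sleep(0.01)  # the other row is doing its own check in this window
    # step 2: insert – both threads saw no duplicate, so both insert
    if not already_there:
        table.append(share_url)

t1 = threading.Thread(target=insert_row, args=("https://example.com/post/1",))
t2 = threading.Thread(target=insert_row, args=("https://example.com/post/1",))
t1.start(); t2.start()
t1.join(); t2.join()
print(table)  # both copies got in: ['https://example.com/post/1', 'https://example.com/post/1']
```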
I'm thinking another way to help to avoid duplicate entries in the table is to stop sending duplicate records to the table.
The records that arrive in this table...
...are coming from this one:
Do you know why some URLs (like activity 7376200575060975616, which came back in row 4) are included multiple times from the provider you're using in the HTTP API column? Is there any way to adjust that to exclude duplicates, as an extra safeguard?
Hey Mark L. Thanks for that context. The URLs that need to be deduped are unique LinkedIn post URLs, so each one should only come through once when the data is sent to the other table. I want to run the post extraction column on a schedule, where it sends the newly extracted posts to the other table, and there they should get auto-deduped to retain only the new ones and delete the old ones (because I want updated reactions as well). Since the data is sent on a schedule, there isn't the problem of deduping records that arrive at the same time. Yet even when I send the data after some time, the dedupe doesn't work. Please let me know if I can help clarify further.
Thank you. What I was trying to share is that the results coming back from the HTTP API column are not unique. This cell, for example, has two sets of duplicates in the list of 50 results:
So when sending a row for each item in the list of 50...
... each of those results gets a row – that cell sent 4 rows for those 2 unique share URLs. (The Send Table Data action will create them ~simultaneously.) I'm not sure why the HTTP API column would return those duplicates, but to address it at the root, that's where I'd like to learn more.
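As an extra safeguard, one idea is to drop the duplicates from the provider's results before they're fanned out into rows. Here's a rough sketch of that idea (the "share_url" field name is an assumption – adjust it to whatever the HTTP API column actually returns):

```python
# Rough sketch: keep only the first occurrence of each share URL in the
# provider's result list, so duplicate items never become duplicate rows.
def dedupe_results(results):
    seen = set()
    unique = []
    for item in results:
        key = item.get("share_url")
        if key and key not in seen:
            seen.add(key)
            unique.append(item)
    return unique

results = [
    {"share_url": "https://www.linkedin.com/posts/activity-7376200575060975616"},
    {"share_url": "https://www.linkedin.com/posts/activity-7376200575060975616"},  # repost duplicate
    {"share_url": "https://www.linkedin.com/posts/activity-7380000000000000000"},
]
print(len(dedupe_results(results)))  # 2 – only unique share URLs get sent onward
```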
Thanks for the clarification, Mark L. I checked these out, and it turns out these are edge cases (where the person has reposted the same post twice), and it's OK if those get duplicated. But in most cases (more than 95% of the time) the Share URLs will be different because they are for different posts, and the dedupe doesn't work there either. To show you the issue, I've resent the Send Table Data from Row 1. In an ideal scenario, it should keep the new rows and remove the old ones, but you can see in the table that all the rows appear as is.
Hey,
Sorry about this. I’ve flagged this as a bug and escalated it to the engineering team.
Right now, the workaround is to manually dedupe the table while the team investigates why the existing rows aren’t being replaced when Send Table Data runs again. This isn’t expected behavior, and I understand how it blocks your workflow given your setup.
We’ll follow up here as soon as we have an update from engineering.
Let me know if you have more questions.
Thanks Bo.
Hey,
The rows aren’t being de-duped because we have a hard limit of 200 characters on the value used for deduplication. In this case, the URLs in rows 17 and 18 are 205 characters long, so they fall outside that limit and won’t dedupe automatically, even though they look identical.
A good workaround is to shrink the value used for deduping. You can do this with a formula by trimming or transforming the URL into a shorter, consistent version. For example, you could remove query parameters or extract only the stable part of the URL, then use that shortened output as the dedupe key instead of the full URL.
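To illustrate the idea (outside of Clay – the same logic can be expressed in a formula column), here's a rough sketch of shrinking a long URL into a short, stable dedupe key by dropping the query parameters:

```python
# Sketch of building a dedupe key under the 200-character limit by keeping
# only the stable part of the URL (scheme + host + path) and dropping the
# query string (tracking parameters, share tokens, etc.).
from urllib.parse import urlsplit

def dedupe_key(url: str) -> str:
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}{parts.path}"

long_url = ("https://www.linkedin.com/posts/activity-7376200575060975616"
            "?utm_source=share&utm_medium=member_desktop&rcm=ACoAA")
print(dedupe_key(long_url))
# https://www.linkedin.com/posts/activity-7376200575060975616 – well under 200 characters
```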
That should get these rows to dedupe as expected.
Let me know if you have more questions.
