I'm pulling in data from Phantombuster. Each run should be about 300 records, but the table is adding thousands of duplicate records each time I add one of my runs. These duplicates are being added before the new data is pulled in.
I've tried adding the same data to a new table and the problem doesn't occur there.
This has used loads of my GPT credits, as I only noticed my table had gone over 50,000 records after about 5 imports (Phantombuster does 6 runs per day).
Just ran another test on this: when records already exist in the table and I add a new Phantombuster import, it automatically duplicates everything before pulling in the new results from the requested run.
Hi Callum, thank you for reaching out. To make sure that your table does not continuously get updated, I recommend disabling the auto-update function. This will ensure that your table does not constantly refresh and use up your credits if you do not want it to. To do so, select the following button:
https://downloads.intercomcdn.com/i/o/w28k1kwz/1249007595/c04d933347afe7ab1e59055e9e32/CleanShot+2024-11-12+at+10_41_37%402x.png?expires=1731428100&signature=44a7da9ce36426de8ca4b3b27dbd379ae3609559eb223a54ddd0254ea46ca252&req=dSIjH8l%2BmoRWXPMW1HO4zVO3wx6vJlf5baPiyHhmvDOYKeII6ZfoxEYUK%2F9U%0AaXAh%0A
Then press "Turn Off" to stop the auto-update function. Secondly, it is also possible to prevent duplicate posts from appearing in your table by selecting the button next to "auto-update" and choosing the following criteria:
https://downloads.intercomcdn.com/i/o/w28k1kwz/1249012358/4a06bcd51f402c825dc85f01f0e2/CleanShot+2024-11-12+at+10_45_06%402x.png?expires=1731428100&signature=f5ae736c331e0a844b1aa93c1b9f055f0f950168876519be30161eb7ce7f68d3&req=dSIjH8l%2Fn4JaUfMW1HO4zZY8Fag95%2Bff9mzBtfzlHnidsQjltDMM%2Bcvrg1G0%0A%2F5kK%0A
This should ensure that no posts with the same exact content are duplicated in the table. Let me know if this helps.
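Conceptually, that criterion behaves like dropping any incoming row whose content exactly matches a row already in the table. A rough Python sketch of the idea (not Clay's actual implementation, just an illustration with made-up fields):

```python
import json

def dedupe_exact(rows):
    """Keep only the first occurrence of each exact-content row."""
    seen, kept = set(), []
    for row in rows:
        # Fingerprint the whole row; identical content -> identical key.
        key = json.dumps(row, sort_keys=True)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept

# Example: the second record is an exact duplicate and gets dropped.
rows = [{"post": "a", "user": "x"}, {"post": "a", "user": "x"}, {"post": "a", "user": "y"}]
print(dedupe_exact(rows))  # [{'post': 'a', 'user': 'x'}, {'post': 'a', 'user': 'y'}]
```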
How do I set auto-dedupe in this table? https://app.clay.com/workspaces/304087/workbooks/wb_S9t3Aj2QnjUp/tables/t_YcyoV26weuZY/views/gv_gzqEPQCubN2w
Also worth noting that I didn't want to set this, as there might be different actions on the same post across different users
I only did it at a column level because it started duplicating everything
Also this doesn't address what was going wrong
I've now turned off the column, but I wasn't expecting it to randomly duplicate everything in the table
Hi Callum, thanks for reaching out! The issue is that you're running 39 sources within the same table, many of which are most likely duplicates, and that causes multiple sources to fire at the same time. To make this more efficient, I'd recommend creating a new table and setting it up as a webhook from Phantombuster to Clay. Here are some resources to guide you:
• Phantombuster Webhooks Documentation
• Clay University: Intro to Webhooks
For deduplication, you can activate Auto-Dedupe at the bottom right of the table. Let me know if you need help setting this up or have any other questions!
https://downloads.intercomcdn.com/i/o/w28k1kwz/1249324481/3ee0716fad49a000e4ec421db631/CleanShot+2024-11-12+at+_22084Z4OVU%402x.png?expires=1731441600&signature=0ac51d3ade167754c6d4de7f4cd9ff66393ca251eecb63adddb8d2480c696e79&req=dSIjH8p8mYVXWPMW1HO4zSPYsVn%2BOv7%2FtRFsJrgSFsn6ZsGg%2BZHKVoKSDlt6%0As%2Bqc%0A
https://downloads.intercomcdn.com/i/o/w28k1kwz/1249323399/80646db67c9cc0ca41a48171d5e5/CleanShot+2024-11-12+at+_398fcAc1nF%402x.png?expires=1731441600&signature=1f5e659a81866a1ec1f9f56debd7c2abd969daedad348dcaed0841eec889e9c0&req=dSIjH8p8noJWUPMW1HO4zb%2BHXjLOzdluA7uTxWNdm%2FTjyBOpjLwYRF%2F2diAb%0AtKs%2B%0A
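For reference, the webhook flow would look roughly like this. This is only a sketch: CLAY_WEBHOOK_URL is a placeholder for the URL Clay shows when you add a webhook source to the new table, and the record fields are invented examples.

```python
import requests

# Placeholder: the URL Clay displays when you add a webhook
# source to the new table (not a real endpoint).
CLAY_WEBHOOK_URL = "https://api.clay.com/.../your-webhook-id"

def send_to_clay(record: dict) -> None:
    """POST one scraped record into the Clay table via its webhook source."""
    resp = requests.post(CLAY_WEBHOOK_URL, json=record, timeout=30)
    resp.raise_for_status()

# Example record with invented fields, matching one result row.
send_to_clay({"postUrl": "https://example.com/post/1", "action": "like"})
```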
Don't think that's the case. These are not duplicate runs or records, and as I said, when I tested this on a table with only one source it duplicated everything in that table and then added the new results. I haven't had this before, so it seems like something on the Clay side is going wrong.
I also think the webhook just sends info on the run, not the results of that run.
{ "agentId": "5027055349780535", "agentName": "Test Script", "containerId": "3358014727012763", "script": "test_script_50516785467.js", "scriptOrg": "phantombuster", "branch": "test-branch-0545107204", "launchDuration": 121, "runDuration": 1850, "resultObject": {...}, "exitMessage": "finished", "exitCode": 0 }
Maybe I'm missing something with the webhook, but I've tested it before and couldn't get the data.
The record duplication, though, is very strange and seems like a bug on Clay's side that ended up using a lot of my ChatGPT "credits"
Really hoping to get this sorted, so please let me know if you're looking into this
Hey there Callum, thanks for reaching out. Jumping in for Bo here. Could you give us a couple of examples of where you are seeing records being duplicated so that we can see what's going on here?
I can create you a Loom tomorrow morning (UK time). What’s happening is: if you create a table with Phantombuster records and then do more than one import, it automatically duplicates all the records already in the table before adding the new results
Can you confirm if I’m right or wrong about the Phantombuster webhook?
Maybe try doing your own test based on my description. In general, it’s slightly annoying when I describe a problem and it automatically gets handled by sending over basic help docs. I’ve had this a lot recently, and it’s not how the support used to function.
Hey Callum, quick question: when you are importing these Phantombuster scrapes, are you toggling on the "Only Fetch Latest Container" option or specifying which container to use? If not, this would be the cause of the duplicates, as it would cause the integration to pull all of the results from the Phantom regardless of whether they already exist in your table or not.
https://downloads.intercomcdn.com/i/o/w28k1kwz/1250886606/cae0862bf72c552e3e8145a4a45e/image.png?expires=1731532500&signature=0f6df8625a8f0b851d9c77af3ac795feb3b146f9bbc10858cf0e8df9bffdf67f&req=dSIiFsF2m4dfX%2FMW1HO4zSqVLgju7J260DAWlw2Q6nk2TaDbHmGN4TXvIUuH%0AzVRV%0A
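To illustrate with invented numbers: if the Phantom has accumulated several runs, an import without that toggle pulls every run's results again, which would match the "duplicates everything, then adds the new run" behavior you described. A toy sketch of the difference:

```python
# Toy illustration with invented data (two runs of 300 records each).
containers = {
    "run-1": [{"id": i} for i in range(300)],
    "run-2": [{"id": i} for i in range(300, 600)],
}

def import_all_containers(containers):
    # Without "Only Fetch Latest Container": every run's results,
    # every time -> existing rows get re-added on each import.
    return [row for rows in containers.values() for row in rows]

def import_latest_container(containers):
    # With the toggle on: only the newest run's results.
    latest = max(containers)  # "run-2" sorts last here
    return containers[latest]

print(len(import_all_containers(containers)))    # 600 per import
print(len(import_latest_container(containers)))  # 300, just the new run
```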
Hey there - just wanted to check in here to see if you needed anything else! Feel free to reply back here if you do.
We haven't heard back from you in a bit, so we're going to go ahead and close things out here - feel free to let us know if you still need something!