Hey team, I am pulling data from PhantomBuster, but it is not pulling the complete container.
If PhantomBuster is not pulling the complete container of data, check these settings:
- Ensure you have specified the correct Agent ID.
- If you want all data from a specific run, disable the Only Fetch Latest Container option and provide the exact Container ID you wish to pull.
- If you only want the most recent data, enable Only Fetch Latest Container.
Review your configuration in the source panel or import settings to make sure the right options are selected. Adjusting these should help you retrieve the full container data you need.
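(Editor's note: if it helps to verify what a single container actually returns outside the Clay UI, here is a minimal sketch. It assumes PhantomBuster's v2 REST API exposes a fetch-result-object endpoint authenticated with an X-Phantombuster-Key-1 header; check both against the current PhantomBuster API docs before relying on it. The API key is a placeholder.)

```python
import requests

# Assumptions: PhantomBuster's v2 API has a containers/fetch-result-object
# endpoint and accepts the X-Phantombuster-Key-1 header; verify in their docs.
PHANTOMBUSTER_API_KEY = "YOUR_API_KEY"   # placeholder
CONTAINER_ID = "7234839676679712"        # the container ID from the source settings

resp = requests.get(
    "https://api.phantombuster.com/api/v2/containers/fetch-result-object",
    params={"id": CONTAINER_ID},
    headers={"X-Phantombuster-Key-1": PHANTOMBUSTER_API_KEY},
    timeout=30,
)
resp.raise_for_status()

# Printing the response shows exactly what this one container holds,
# independent of any Clay-side settings or deduplication.
print(resp.json())
```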
Hello Aman! Thanks for reaching out! Can you tell me more about this situation? What do you mean by it not pulling in the entire container? Do you mean it is not pulling in all the fields you can currently see? Any chance you set up those fields after creating this source and pulling in the data? I created this Loom talking through my thought process further; let me know!
So basically I am not getting all the leads. When I export the list directly from PhantomBuster it gives complete results, but when pulling here in Clay it is deduping the data between the containers, and it is not getting all the leads.
Hey Aman! Can you share a Loom video of you using both methods? Since your auto-dedupe is disabled, it is unusual for these leads to get deduped. Also, let me know more about what you mean by it deduping between containers. Since we do not have access to your PhantomBuster account, it is a little difficult to dig into the deduping from our side.
Hey — just to clarify, you’re currently specifying a single container ID (7234839676679712), which is why you’re not seeing all the leads. Clay will only pull data from that specific container, so anything outside of it won’t be included. Also, for more reliable and real-time results, it’s better to use webhooks with PhantomBuster — that way, the data will always come through instantly. Let me know if you need help setting that up.
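(Editor's note: as a rough illustration of the webhook route, not an official Clay or PhantomBuster recipe, a small relay service could accept PhantomBuster's end-of-run POST and forward each record to a Clay "Import from Webhook" source. The endpoint path, payload shape, and both URLs below are assumptions and placeholders.)

```python
# A minimal relay sketch, assuming PhantomBuster is configured to POST its
# results (or an end-of-run notification) to this endpoint, and that you have
# a Clay webhook source URL to forward rows to. The payload shape is assumed.
from flask import Flask, request
import requests

app = Flask(__name__)
CLAY_WEBHOOK_URL = "https://..."  # placeholder: your Clay webhook source URL


@app.post("/phantombuster-webhook")
def relay():
    payload = request.get_json(force=True, silent=True) or {}
    # If PhantomBuster sends a list of result rows, forward them one by one;
    # otherwise forward the payload as a single record.
    rows = payload if isinstance(payload, list) else [payload]
    for row in rows:
        requests.post(CLAY_WEBHOOK_URL, json=row, timeout=30)
    return {"forwarded": len(rows)}, 200


if __name__ == "__main__":
    app.run(port=8000)
```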
Hi Support Team, I’ve been troubleshooting the data pulling issue and wanted to share my findings and concerns:
Even though I’ve been pulling data using specific containers, I’m still not receiving the complete dataset. I’ve tried pulling from each container individually, but the issue persists.
From my observation, the problem seems to stem from the large volume of data. As containers age, they appear to merge into larger files. This causes some data overlap between containers, which is fine — we can handle deduplication automatically.
The main issue is that pulling from a container dated, for example, the 18th, doesn’t yield data from that date. Instead, it starts pulling from a later date (e.g., the 26th). I understand this happens because data from earlier dates gets merged into newer containers. However, the pulling function still doesn’t retrieve the complete dataset, which results in missing leads — including those from the 18th.
When exporting the CSV file directly from Phantom, I can see a higher volume of leads, including data from the missing dates. But Clay isn't fetching all of this when using the pulling method.
Regarding the webhook: Phantom provides a CSV URL, which currently requires manual downloading and importing into Clay. What I need is full automation of this step. Thank you, Aman.
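(Editor's note: the manual step described above could be scripted roughly as follows: download the CSV from the result link, drop duplicate leads, and push each row to a Clay webhook source. The CSV URL, the Clay webhook URL, and the profileUrl dedupe column are all placeholders and assumptions, not confirmed names.)

```python
import csv
import io

import requests

CSV_URL = "https://..."            # placeholder: the result CSV link PhantomBuster provides
CLAY_WEBHOOK_URL = "https://..."   # placeholder: a Clay "Import from Webhook" source URL
DEDUPE_KEY = "profileUrl"          # assumption: the column that uniquely identifies a lead

# Download the CSV that is currently being saved by hand.
csv_text = requests.get(CSV_URL, timeout=60).text
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Drop duplicates that overlap across containers before importing.
seen, unique_rows = set(), []
for row in rows:
    key = row.get(DEDUPE_KEY, "")
    if key and key not in seen:
        seen.add(key)
        unique_rows.append(row)

# Push each remaining lead to the Clay table's webhook source.
for row in unique_rows:
    requests.post(CLAY_WEBHOOK_URL, json=row, timeout=30)

print(f"Imported {len(unique_rows)} unique leads out of {len(rows)} rows")
```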
Thanks for flagging this! It's possible the missing data is due to deduplication logic; some entries may have been filtered out if they were identified as duplicates. That said, could you confirm whether deduplication is enabled on your end? I also noticed two columns that contain dates. Could you clarify which one you're referring to? That would help narrow down the issue. If possible, could you record a quick video showing the data inside that container as well as what you're seeing in PhantomBuster? That will give us a clearer picture. From what I can see, re-running the rows successfully pulls in new data; let me know if that's consistent on your end. Those may be the records that were previously missing and are now being pulled in. Looking forward to your reply!
Check the timestamp column. Instead of a Loom, I would prefer a short Zoom/Gmeet call so I can explain better and you can also help me figure it out.
Hey Aman, we're not offering 1:1 calls at the moment, but we're happy to help here. From what I understand, the main issue is missing data. A few things we should check:
1. After we updated the source, are you still seeing gaps in the dataset? There were 801 rows before the refresh; now there are 1,602.
2. I noticed the timestamp shows the 18th, but the data starts on the 20th. It might be worth asking PhantomBuster if that's expected, especially since your observation about containers merging sounds spot on. Is that something they told you?
3. Could you try uploading the CSV you exported directly from PhantomBuster into Clay? That will help us compare and isolate what's not coming through via the automated pull, and flag it to our team as a potential bug.
Let me know what you find and we'll keep digging.
1. 800 duplicates means it was a rerun; there are no new leads from that container (checked via the Job URL).
2. I did not completely get your point. When I download an earlier container from PhantomBuster, it also has the full data, including all the containers. Only if I download the latest one do I get just that latest container's results. Also, if I download directly from Phantom in general, it downloads the file completely.
3. See the table Phantom Buster Scraping Jobs Daily | LinkedIn Search Export (Jeffs Version); that goes up to the 23rd (the 25th is not downloaded and uploaded yet; it is available via the CSV links webhook).
4. The solution I am using now is manually downloading the CSV from the CSV link webhook coming from Phantom and uploading it to the Phantom Buster Scraping Jobs Daily | LinkedIn Search Export (Jeffs Version) table.
Hey, thanks for the video! It looks like Clay isn't able to pull the full CSV data from the PhantomBuster container in this case, likely due to how the data is paginated. I noticed our team has already flagged this to engineering as a potential bug. For now, the best workaround is to reduce the scraping batch size (to under 800) and run it more frequently to help with consistency. This pagination issue has been added to our backlog; our team prioritizes fixes based on severity and impact, so I'll reach back out as soon as there's an update. Let me know if you have more questions.