Week 2: Building in Public

The CSV situation got worse before it got better. What started as a few files turned into this:

1. Live job board scrape coming through n8n
2. Do Not Contact list
3. Contacts assigned to reps
4. Contacts not assigned to reps
5. No-relationship companies
6. Past clients/companies

It looks like the problem is growing, not shrinking. But this was intentional. The only way to clean the data was to separate it first:

Contacts → Companies → Relationship status

Not elegant. But necessary.

Once that was clear, the next step was obvious: merge everything back into one system.

That's where the real challenge showed up. Not enrichment. Not structure. Deduplication, without losing context.

Because this is what the data actually looks like:

→ Multiple contacts per company
→ Multiple reps per company
→ New contacts with new companies
→ New reps getting auto-assigned
→ And trying to reuse V1 enrichments to save credits

Everything overlaps.

If you dedupe too aggressively → you lose context
If you don't dedupe enough → you waste credits and create noise

I used a single-row lookup in Clay, with normalised URLs as the anchor. It works. But I wouldn't call it "solved."

I can already see the tradeoff:

→ Save credits
→ Or preserve quality

And right now, it feels like you can't fully have both.

I'm going to explore Claude for this on Sameer's recommendation.

Do you optimise for cost or for data integrity when deduping messy GTM systems?

I've shared screenshots of the V1 and V3 overviews. More soon.

---------------------

Mansoor
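A minimal sketch of the normalised-URL anchor idea from the post, outside Clay. The field names and helper functions here are hypothetical, and the merge rule (keep the first row per domain, fold later contacts into it) is one possible way to dedupe without dropping context; it is not the actual Clay setup.

```python
from urllib.parse import urlparse

def normalise_url(url: str) -> str:
    """Reduce a company URL to a bare domain so variants collapse to one key."""
    url = url.strip().lower()
    if "://" not in url:
        url = "https://" + url  # urlparse needs a scheme to find the host
    host = urlparse(url).netloc
    if host.startswith("www."):
        host = host[4:]
    return host

def dedupe_companies(rows):
    """Single-row lookup: the first row per normalised domain wins,
    but contacts from later duplicate rows are merged in, not discarded."""
    index = {}
    for row in rows:
        key = normalise_url(row["company_url"])
        if key not in index:
            index[key] = {**row, "contacts": list(row.get("contacts", []))}
        else:
            # merge instead of dropping: keep every contact we have seen
            for contact in row.get("contacts", []):
                if contact not in index[key]["contacts"]:
                    index[key]["contacts"].append(contact)
    return list(index.values())

rows = [
    {"company_url": "https://www.acme.com/", "contacts": ["dana"]},
    {"company_url": "acme.com", "contacts": ["lee"]},
    {"company_url": "http://other.io", "contacts": ["sam"]},
]
merged = dedupe_companies(rows)  # 3 rows collapse to 2 companies
```

The cost/quality tradeoff shows up in the merge rule: dropping later rows outright saves the most credits, while merging their fields preserves context at the price of more processing.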
