Has anyone tested GPT models (e.g. 4o) against clay models (e.g. Argon)? I'd love to hear what you found. In my experience, 4o seems to be more cooperative than Argon.