Has anyone ever done a side-by-side of all the AI models on a few tasks to really compare them? It's so difficult to know what to use when.
For me, of the ones I have used so far, here is how I would say they are best used:
4o mini - cheap, good for general tasks, can also scrape the web, but it is sometimes inaccurate.
4.1 mini - newer version of 4o mini, same use case but with better reasoning. I've noticed that in some cases it costs very slightly more in Clay credits, and in other cases it doesn't.
4.1 / GPT-5 - for the most complex reasoning tasks. Does most of the work you want it to do, but very costly.
4o mini is usually my go-to for everyday tasks and scraping.
Tim P. I'd agree with the above post, but a lot of it comes down to personal preference and testing. For example, I found Gemini to be better at parsing call transcripts for compliance purposes than Claude or ChatGPT, and I prefer Claude's writing style over Gemini or ChatGPT. Also, since the space keeps expanding and changing, it's hard to predict which new models will be better for certain tasks. I recommend coming up with a testable use case (one where you already know the correct answer) and testing different models against it using the same prompt. This will give you a better feel for what works best for certain tasks.
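If it helps, here is a rough sketch of what that kind of test could look like in Python. Everything in it is an assumption for illustration: `score_models`, the two `fake_model_*` functions, the test cases, and the prompt are all made up, and the fake models are stand-ins for whatever real model calls you would plug in. The idea is just to run the same prompt over cases with known answers and count how often each model gets it right.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical convention: a "model" is any function that takes a prompt
# string and returns the model's text answer.
ModelFn = Callable[[str], str]


def score_models(
    models: Dict[str, ModelFn],
    cases: List[Tuple[str, str]],
    prompt_template: str,
) -> Dict[str, float]:
    """Run the same prompt over every case and return accuracy per model."""
    results: Dict[str, float] = {}
    for name, ask in models.items():
        correct = 0
        for input_text, expected in cases:
            prompt = prompt_template.format(input=input_text)
            answer = ask(prompt).strip().lower()
            if answer == expected.strip().lower():
                correct += 1
        results[name] = correct / len(cases)
    return results


# Stand-in models (assumptions, not real APIs). Replace these with
# functions that call the models you actually want to compare.
def fake_model_a(prompt: str) -> str:
    # Only look at the transcript part, which follows the blank line.
    transcript = prompt.split("\n\n", 1)[-1]
    return "yes" if "refund" in transcript.lower() else "no"


def fake_model_b(prompt: str) -> str:
    return "yes"  # always answers yes, so it misses the negative case


# Made-up test cases where the correct answer is already known.
cases = [
    ("Customer asked for a refund.", "yes"),
    ("Customer asked about shipping times.", "no"),
]
template = "Does this transcript mention a refund? Answer yes or no.\n\n{input}"

scores = score_models(
    {"model_a": fake_model_a, "model_b": fake_model_b}, cases, template
)
print(scores)
```

Exact string matching only works for short answers like yes/no or a category label. For things like writing style you would still need to read the outputs side by side.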
