Understanding AI Run Consistency and Output Variability
Hey team, quick question about AI run consistency. When I run my agent to identify/confirm a company name, I often get a mix of high-confidence positives and low-confidence/unknowns. What's confusing is that when I rerun the low-confidence rows, a meaningful portion of them flip to a positive match, even though I haven't changed the prompt or the inputs. Can you help me understand what causes the output to vary between runs? A few specific things I'm trying to learn:
Is the agent doing live web search each run (and therefore seeing different pages/snippets), or reusing cached results?
Are there any non-deterministic factors (model sampling/temperature, different retrieval paths, rate limits, timeouts) that can lead to lower confidence on one run and higher confidence on another?
When a row returns “low confidence,” is that typically because of insufficient evidence retrieved, parsing issues, or ambiguity between entities?
Any best practices to reduce variance (e.g., forcing citations / requiring 2+ sources, constraining evidence to the homepage/title/footer, a rerun strategy, etc.)?
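On the "require 2+ sources" idea, here's the kind of post-hoc gate I was imagining — a pure-Python sketch, where `Evidence` and the count-distinct-domains rule are my own assumptions, not anything the agent does today:

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Evidence:
    url: str           # page the snippet came from
    matched_name: str  # company name the agent extracted there


def confirm_match(candidate: str, evidence: list[Evidence], min_sources: int = 2) -> bool:
    """Accept a match only if >= min_sources distinct domains agree on the name."""
    agreeing_domains = {
        urlparse(e.url).netloc
        for e in evidence
        if e.matched_name.strip().lower() == candidate.strip().lower()
    }
    return len(agreeing_domains) >= min_sources


ev = [
    Evidence("https://acme.com/about", "Acme Corp"),
    Evidence("https://acme.com/contact", "Acme Corp"),       # same domain, counts once
    Evidence("https://news.example.org/acme", "Acme Corp"),  # second independent domain
]
print(confirm_match("Acme Corp", ev))  # True: two distinct domains agree
```

The point of deduplicating by domain is that two pages from the same site aren't really independent confirmation.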
Happy to share an example table/prompt + a few row IDs where this happens if that helps. 🧵
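For the rerun strategy specifically, I was picturing simple self-consistency: rerun each low-confidence row k times, majority-vote the answers, and treat the agreement fraction as a confidence score. A minimal sketch (`run_agent` is a hypothetical stand-in for the actual agent call):

```python
from collections import Counter
from typing import Callable


def self_consistent_label(run_agent: Callable[[str], str], row: str, k: int = 5) -> tuple[str, float]:
    """Rerun the agent k times on the same row; return (majority label, agreement fraction)."""
    votes = Counter(run_agent(row) for _ in range(k))
    label, count = votes.most_common(1)[0]
    return label, count / k


# Toy stand-in: a flaky "agent" that answers "Acme Corp" on 3 of 5 runs.
answers = iter(["Acme Corp", "unknown", "Acme Corp", "Acme Corp", "unknown"])
label, conf = self_consistent_label(lambda row: next(answers), "row-42", k=5)
print(label, conf)  # Acme Corp 0.6
```

If the flips I'm seeing are mostly sampling noise, I'd expect the agreement fraction to separate genuinely ambiguous rows from ones that just got unlucky on a single run — but I'd love a sanity check on that assumption.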
