How do people "version control" AI prompts? Say I'm refining a prompt and testing it on many cells, each with a different input. It works on most inputs but does badly on others, so I refine it, and then it gets better on some inputs and worse on ones that worked before. Because I keep editing the same AI column, I can't effectively keep track of the changes I've made. And because re-running the cells overwrites the previous output, I can't compare the output of one prompt version against another.
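
To make the ask concrete, here is a minimal Python sketch of the kind of tracking I mean. It is not tied to any particular tool, and `PromptLog`, `fake_model`, and the hash-based version ids are all hypothetical names and choices for illustration. The idea is that each prompt text gets its own version id, and outputs are stored per (version, input) pair so that a re-run never overwrites an earlier result.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class PromptLog:
    """Hypothetical in-memory store: prompt versions plus outputs per (version, input)."""
    versions: dict = field(default_factory=dict)  # version id -> prompt text
    outputs: dict = field(default_factory=dict)   # (version id, input) -> output

    def add_version(self, prompt: str) -> str:
        # Use a content hash as the version id, so identical prompt text maps to the same id.
        vid = hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:8]
        self.versions[vid] = prompt
        return vid

    def record(self, vid: str, cell_input: str, output: str) -> None:
        # Keyed by (version, input): running a new prompt version keeps the old outputs.
        self.outputs[(vid, cell_input)] = output

    def compare(self, vid_a: str, vid_b: str) -> dict:
        # Return the inputs on which the two versions produced different outputs.
        inputs_a = {i for (v, i) in self.outputs if v == vid_a}
        inputs_b = {i for (v, i) in self.outputs if v == vid_b}
        diffs = {}
        for i in sorted(inputs_a & inputs_b):
            a = self.outputs[(vid_a, i)]
            b = self.outputs[(vid_b, i)]
            if a != b:
                diffs[i] = (a, b)
        return diffs


def fake_model(prompt: str, cell_input: str) -> str:
    # Stand-in for the real AI call (an assumption); deterministic so the demo is repeatable.
    return f"{len(prompt)}:{cell_input.upper()}"


if __name__ == "__main__":
    log = PromptLog()
    v1 = log.add_version("Summarize: ")
    v2 = log.add_version("Summarize briefly: ")
    for cell in ["first input", "second input"]:
        log.record(v1, cell, fake_model(log.versions[v1], cell))
        log.record(v2, cell, fake_model(log.versions[v2], cell))
    for cell, (old, new) in log.compare(v1, v2).items():
        print(cell, "|", old, "->", new)
```

In a spreadsheet-style tool the same idea would presumably mean one output column per prompt version instead of one column that gets overwritten, but I don't know whether that is what people actually do.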