Constant Eval in Production, the idea
pablofernandez.tech·2d

I’m sure this is not my idea, so I’m not claiming it to be. I’ve been wanting to do a sort of continuous AI eval in production for a while, but the situation never presented itself at work. It was a mixture of having the data to do the eval offline and wanting to avoid the risks of doing it in prod. But now I’m going to do it for a side project.

I don’t want to reveal what my side project is yet, so I’ll keep it vague. I’m very excited about this part, so I wanted to share it early. And I’m hoping that the Internet will tell me, as it usually does, if this is a bad idea.

I have a task that will be done by an AI, and I can measure how successfully it was done, but only 2 to 7 days after the task is completed, once I can see the result out there in the world. I will gather some successful examples t…
