I’m sure this is not my idea, so I’m not claiming it to be. I’ve been wanting to do a sort of continuous AI eval in production for a while, but the situation never presented a work. It was a mixture of having the data to do the eval off line, and wanting to avoid the risks of doing it in prod. But now I’m going to do it for a side project.

I don’t want to reveal what my side project is yet, so I’ll keep it vague. I’m very excited about this part, so I wanted to share it early. And I’m hoping that the Internet will tell me if, as it usually does, if this is a bad idea.

I have a task that will be done by an AI and I can measure how successful it was done but only 2 to 7 days after the task was completed and seeing it out there, in the world. I will gather some successful examples t…

Similar Posts

Loading similar posts...

Keyboard Shortcuts

Navigation
Next / previous item
j/k
Open post
oorEnter
Preview post
v
Post Actions
Love post
a
Like post
l
Dislike post
d
Undo reaction
u
Recommendations
Add interest / feed
Enter
Not interested
x
Go to
Home
gh
Interests
gi
Feeds
gf
Likes
gl
History
gy
Changelog
gc
Settings
gs
Browse
gb
Search
/
General
Show this help
?
Submit feedback
!
Close modal / unfocus
Esc

Press ? anytime to show this help