AI Snake Oil (Part 3): Evaluation

In the last post, I discussed training data: mostly that you ought to have it, or a way of getting it. If someone pitches you an idea without even a reasonable vision of what the training data would be, they’ve got a lot less credibility. More to the point, if you can’t even envision training data for a given task, the task itself may be impractical.

A ridiculous image to get the idea of “AI evaluation” into your head. (No offense to the creators of this actual robot; the giant red letter “F” is not an actual evaluation of your robot. I just needed an image with a CC license. I hope it ping-ponged well.)

Next, let’s talk about evaluation in the context of application development. If someone pitches an AI application idea to you:

Question: Do they have an evaluation procedure built into their application development process?

Arguably, evaluation is more important than training data. I chose to discuss training data first because thinking in terms of training data gives you intuitions about what’s possible. It eliminates the infinite, but still leaves you with dreams. Evaluation is where your dreams are torn to shreds, whether or not you have training data.

Fundamentally, I want to cover three things here: why we evaluate, how we evaluate, and how we score the results. Understanding these three things is essential to understanding what makes a suitable evaluation; a crappy evaluation sows false confidence, which is worse than no evaluation at all.
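To make that concrete, here’s a minimal sketch of what a built-in evaluation procedure might look like: hold out data the model never trains on, then score its predictions with an explicit metric. The dataset, model, and metric below are illustrative placeholders, not anything specific to this series.

```python
# A minimal sketch of an evaluation procedure (illustrative placeholders only).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)

# Why we evaluate: to estimate performance on data the model hasn't trained on.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# How we evaluate: fit only on the training split.
model = LogisticRegression(max_iter=5000).fit(X_train, y_train)

# How we score: one explicit metric, computed on the held-out split.
print(f"held-out accuracy: {accuracy_score(y_test, model.predict(X_test)):.3f}")
```

The specifics will differ for every application, but the shape is the same: some data is set aside, the system never touches it during development, and a scoring rule turns its behavior on that data into a number you can argue about.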
