Yeah, the problem is how to sanitise effectively. You need some way to automatically strip the “bad” things out of your training data (an “oracle”). But if you already had that oracle, you could just slap it on your final product (e.g. Search) as a filter and make all the “bad” things disappear before they hit the user.
Favourite part of the whole article:
“You never saw what you thought you saw. And even if you did, it was entirely justified and your interpretation was extreme.”