Consider a spam email detection system where only 1% (ground truth) of emails are spam.
(a) Which evaluation metric is considered the best choice: "precision", "recall", or "f1-score"?
(b) Explain the benefits of the best metric.
(c) Criticize the remaining metrics.
(a) f1-score.
(b) The f1-score is the harmonic mean of precision and recall, so a high f1-score requires that both precision and recall be high.
A high precision score means that few of the emails flagged as spam are actually non-spam (few false positives).
A high recall score means that few spam emails are incorrectly classified as non-spam (few false negatives).
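To make the definitions concrete, here is a minimal sketch that computes the three scores from confusion-matrix counts. The function name `precision_recall_f1` and the counts (10,000 emails, 100 of them spam, with 80 caught, 20 missed, and 40 false alarms) are made up for illustration and are not part of the question.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, f1) from confusion-matrix counts."""
    # Precision: share of emails flagged as spam that really are spam.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actual spam emails that were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 10,000 emails, 100 of them spam (1%).
tp, fn, fp = 80, 20, 40
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
# prints: precision=0.667 recall=0.800 f1=0.727
```

Note that the 9,860 correctly classified non-spam emails (true negatives) do not appear in any of the three formulas, which is why these metrics stay informative when the negative class dominates.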
(c) Precision: A high precision score alone does not rule out that many spam emails are incorrectly classified as non-spam (low recall).
Recall: A high recall score alone does not rule out that many non-spam emails are incorrectly classified as spam (low precision).
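The two criticisms can be illustrated with a pair of extreme classifiers. This is only a sketch under assumed numbers: the 10,000-email dataset with 100 spam emails, the helper `scores`, and both classifiers are hypothetical.

```python
def scores(tp, fp, fn):
    """Return (precision, recall, f1) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical dataset: 10,000 emails, 100 spam and 9,900 non-spam.

# "Cautious" classifier: flags a single email, which happens to be spam.
cautious_p, cautious_r, cautious_f1 = scores(tp=1, fp=0, fn=99)

# "Flag everything" classifier: marks every email as spam.
all_p, all_r, all_f1 = scores(tp=100, fp=9900, fn=0)

print(f"cautious: precision={cautious_p:.2f} recall={cautious_r:.2f} f1={cautious_f1:.3f}")
print(f"flag all: precision={all_p:.2f} recall={all_r:.2f} f1={all_f1:.3f}")
# prints: cautious: precision=1.00 recall=0.01 f1=0.020
# prints: flag all: precision=0.01 recall=1.00 f1=0.020
```

Each classifier scores perfectly on one metric while being useless in practice, and the f1-score is low in both cases.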