The loss function given by L(spam, nospam) = 1, L(nospam, spam) = 10 (p. 711) makes sense.
But the different weightings seem to disappear when L(y, y^) is discussed later.
Why can't the loss be minimized by classifying every instance as nospam? Then the costly errors of classifying nospam as spam could never occur.
Also, how are expressions such as |y - y^| to be interpreted? Are the y's numerical variables, such as 1 and 0?
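For concreteness, here's a minimal sketch (my own Python, not from the book) of how I currently understand the asymmetric-loss decision rule. If it's right, then classifying everything as nospam is only optimal when P(spam) is below the break-even point 10/11, not universally, which is part of what confuses me:

```python
# Sketch of deciding under the asymmetric loss
# L(spam, nospam) = 1, L(nospam, spam) = 10.
# My own reading of the setup, not code from AIMA.

def decide(p_spam):
    """Pick the label with the smaller expected loss, given P(spam)."""
    # Predicting "spam" is wrong when the mail is really nospam: cost 10.
    exp_loss_spam = (1 - p_spam) * 10
    # Predicting "nospam" is wrong when the mail is really spam: cost 1.
    exp_loss_nospam = p_spam * 1
    return "spam" if exp_loss_spam < exp_loss_nospam else "nospam"

print(decide(0.5))   # below 10/11, prints "nospam"
print(decide(0.95))  # above 10/11, prints "spam"
```

So under this reading the weightings don't disappear; they just move the decision threshold. Is that what the text intends?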
Reading over Secs. 5.6 and 5.7 of Data Mining, 3rd ed. (Witten et al.) shows how complex these techniques can be. I'm not sure the AIMA discussion gets the basics across.
- Bob Futrelle