10 events
when | what | by | license | comment
May 2, 2019 at 20:54 comment added LSC Not to be political, but there is often a lack of statistical knowledge among those teaching "machine learning" and "big data" courses where statistical methods like logistic regression are employed and somewhat abused; that is, the statisticians who deeply understand the methodologies and how to employ them in new scenarios aren't the ones teaching these subjects. It's hard to imagine a case where accuracy is all that matters. I second reading Frank Harrell's blog posts about this, as referenced in the "answer" above.
Apr 26, 2019 at 17:12 comment added StatsSorceress Fair enough, Wayne! Personally, I've found that in coursework the objective was to minimize misclassification error, not to worry about precision/recall, so I think the comment still has some merit for those in a course-based setting who are wondering why the two separate steps are necessary if all we want to do is "get the class right".
Apr 26, 2019 at 12:10 comment added Wayne @StatsSorceress "... sometimes in machine learning classification ...". There should be a big emphasis on sometimes. It's hard to imagine a project where accuracy is the correct answer. In my experience, it always involves precision and recall of a minority class.
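Wayne's point about accuracy on imbalanced data can be illustrated with a small numeric sketch (the counts below are made up for illustration): a classifier that always predicts the majority class scores high accuracy while never finding the minority class.

```python
# Hypothetical confusion counts for a 95/5 class split where the model
# always predicts the majority (negative) class.
tn, fp, fn, tp = 95, 0, 5, 0

accuracy = (tp + tn) / (tp + tn + fp + fn)      # looks great on its own
recall = tp / (tp + fn) if (tp + fn) else 0.0   # minority class is never found
precision = tp / (tp + fp) if (tp + fp) else 0.0

print(accuracy, recall, precision)  # 0.95 0.0 0.0
```

This is why, on a minority-class problem, precision and recall (or a proper scoring rule on the predicted probabilities) tell you far more than raw accuracy does.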
Apr 26, 2019 at 11:10 history edited gung - Reinstate Monica CC BY-SA 4.0
added 32 characters in body
Apr 25, 2019 at 20:11 vote accept StatsSorceress
Apr 25, 2019 at 18:13 comment added gung - Reinstate Monica As I said, you very much can set up your own custom optimization that will train the model & select the threshold simultaneously. You just have to do it yourself & the final model is likely to be poorer by most standards.
Apr 25, 2019 at 16:29 comment added StatsSorceress Hmm. I read the accepted answer in the related question here, and I agree with it in theory, but sometimes in machine learning classification applications we don't care about the relative error types, we just care about "correct classification". In that case, could you train end-to-end as I describe?
Apr 25, 2019 at 16:02 comment added gung - Reinstate Monica You certainly could (@Sycorax's answer speaks to that possibility). But because that isn't what LR itself is, but rather some ad hoc augmentation, you would need to code up the full optimization scheme yourself. Note, BTW, that Frank Harrell has pointed out that this process will lead to what might be considered an inferior model by many standards.
Apr 25, 2019 at 15:55 comment added StatsSorceress Okay, I understand that part of the theory (thank you for that eloquent explanation!) but why can't we incorporate the classification aspect into the model? That is, why can't we find p, then find the threshold, and train the whole thing end-to-end to minimize some loss?
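The two-step procedure discussed in this thread (fit logistic regression to estimate p, then choose a decision threshold as a separate step) can be sketched as follows. This is a minimal illustration with numpy only; the synthetic 1-D data, the learning rate, and the threshold grid are all assumptions, not anything from the thread.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D data (assumed for illustration): class 1 has a higher mean.
n = 500
y = rng.integers(0, 2, n)
x = rng.normal(loc=y * 1.5, scale=1.0)

# Step 1: fit logistic regression p(y=1|x) = sigmoid(w*x + b)
# by gradient descent on the log-loss.
w, b = 0.0, 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(w * x + b)))
    w -= 0.1 * np.mean((p - y) * x)
    b -= 0.1 * np.mean(p - y)

# Step 2: separately, sweep candidate thresholds on the fitted
# probabilities and keep the one that maximizes accuracy.
p = 1.0 / (1.0 + np.exp(-(w * x + b)))
thresholds = np.linspace(0.05, 0.95, 19)
accs = [np.mean((p >= t) == y) for t in thresholds]
best_t = thresholds[int(np.argmax(accs))]
print(best_t, max(accs))
```

Note that step 1 optimizes the log-loss (a proper scoring rule for p), while step 2 optimizes misclassification rate; merging them into a single end-to-end objective is exactly the "ad hoc augmentation" gung describes, and would have to be coded by hand.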
Apr 25, 2019 at 15:43 history answered gung - Reinstate Monica CC BY-SA 4.0