MLE and Cross Entropy

Log likelihood for binary classification. Note that we can view this as caring only about the likelihood of the true class.
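The formula itself appears to be missing here; a standard form, assuming $y_i \in \{0, 1\}$ is the label and $\hat{y}_i$ the predicted probability of the positive class, is

$$\log L = \sum_{i=1}^{N} \left[ y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i) \right]$$

For each example exactly one of the two terms is nonzero, which is why only the probability assigned to the true class matters.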
Log likelihood for multiclass classification.
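Again the formula appears to be missing; a standard form, assuming one-hot labels $y_{i,k}$ over $K$ classes and predicted probabilities $\hat{y}_{i,k}$, is

$$\log L = \sum_{i=1}^{N} \sum_{k=1}^{K} y_{i,k} \log \hat{y}_{i,k}$$

Since $y_{i,k} = 1$ only for the true class, each example again contributes only the log probability of its true class.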

Because maximizing the log likelihood is equivalent to minimizing the negative log likelihood, the negative log likelihood can be viewed as a loss function.
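In symbols, with $\theta$ denoting the model parameters:

$$\hat{\theta} = \arg\max_{\theta} \log L(\theta) = \arg\min_{\theta} \big[ -\log L(\theta) \big]$$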

Given two probability distributions p and q, their cross-entropy is defined as follows.
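The definition referenced above appears to have dropped out; for a discrete support it is standardly written as

$$H(p, q) = -\sum_{x} p(x) \log q(x)$$

When $p$ is the one-hot distribution of the true label and $q$ is the model's predicted distribution, this sum reduces to the negative log likelihood of the true class, which is the link between MLE and cross entropy.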
  • An important distinction between PyTorch's Negative Log Likelihood Loss (NLLLoss) and Cross Entropy Loss (CE) is that CE implicitly applies a softmax activation followed by a log transformation to its input, while NLLLoss does not and expects log-probabilities directly, as the sketch below illustrates.
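A minimal PyTorch sketch of this equivalence (the tensor shapes and values are illustrative):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Raw, unnormalized scores (logits) for 4 samples over 3 classes.
logits = torch.randn(4, 3)
targets = torch.tensor([0, 2, 1, 2])

# cross_entropy takes raw logits and applies log-softmax internally.
ce = F.cross_entropy(logits, targets)

# nll_loss expects log-probabilities, so we apply log_softmax ourselves.
nll = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(torch.allclose(ce, nll))  # True: the two losses agree
```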
