I think most of the replies, here and on Stack Exchange, are answering slightly the wrong question. It is fair to ask why the likelihoods are useful if they are so small, and it's not a good answer to talk about how they could be expressed as logs, or even to talk about the properties of continuous distributions. Yes, individual likelihoods are so small that even the MLE solution is extremely unlikely to be correct.

However, the idea is that often a lot of the probability mass - an amount that is not small - will be concentrated around the maximum likelihood estimate, and so that's why it makes a good estimate, and worth using. Much like how the average is unlikely to be the exact value of a new sample from the distribution, but it's a good way of describing what to expect. (And it gets better if you augment it with some measure of dispersion, and so on.) (If the distribution is very dispersed, then while the average is less useful as an idea of what to expect, it still minimises prediction error under some loss - but that's a different thing, and I think less relevant here.)

Right - I think this is what's at the heart of the original question. I know they asked with a continuous example, but I don't interpret their question as limited to continuous cases, and I think it's easier to address using a discrete example, as we avoid the issue of each exact parameter value having infinitesimal mass, which occurs in a continuous setting. Let's imagine the parameter we're trying to estimate is discrete and has, say, 500 different possible values. Let's say the parameter can take integer values between 1 and 500, and most of the mass is clustered in the middle, between 230 and 270. Given some data, it would actually be possible that the MLE would come up with the exact value, say 250. But maybe, given the data, a range of values between 240 and 260 is also very plausible, so exactly 250 has a fairly low probability. The original poster is confused, because they are basically saying: well, if the actual probability is so low, why is this MLE stuff useful?

You are pointing out they should really frame things in terms of a range and not a point estimate. You are right, but I think their question is still legitimate, because often in practice we do not give a range, and just give the maximum likelihood estimate of the parameter.

My answer would be: well, that's because for many posterior distributions, a lot of the probability mass will be near the MLE, if not exactly at it - so knowing the MLE is often useful, even if the probability of that exact value of the parameter is low. (And also, separately, in a discrete parameter setting, a specific parameter value could have substantial mass.)

E.g., you flip a possibly-biased coin 20 times and get half heads, half tails. Under the model where the bias is 0.5 - a fair coin - the probability of that sequence is (0.5)^20, or about one in a million. In fact, under that model the probability of any particular sequence you could observe is about one in a million. Under the model where the bias is 0.4, the probability is (0.4)^10 × (0.6)^10, or about one in 1.6 million. That is, the sequence we observed supplies about 1.5 times as much evidence in favor of bias = 0.5 as compared with bias = 0.4 - this is likelihood. The more complex the event you're predicting (the rarer the typical observed result), the smaller the associated likelihoods will tend to be. It's possible that every observed result has a tiny probability under every model you're considering. Nonetheless, it makes sense to use the ratios of these numbers to compare the models. This has nothing to do with probability densities or logarithms, though the fact that we often work with densities also makes absolute likelihood values relative to the choice of units. You could summarize the sequence with the number of heads or tails, and then the likelihood values would be larger, but the ratios would remain the same (it's a sufficient statistic). Similarly, in the CrossValidated question, one could summarize the data with the mean and sum of squares.

The weirdest “likelihood” conversation I ever had: the putative team lead didn't want to change priorities to fix a bug because, “how often does that happen?”
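The coin-flip arithmetic can be checked directly. A minimal sketch in Python (the function and variable names are my own): both likelihoods are tiny, and switching to the sufficient statistic (the number of heads) multiplies both by the same binomial coefficient, so the values grow but the ratio is untouched.

```python
from math import comb

n, heads = 20, 10

def seq_likelihood(p):
    """Probability of one specific sequence with `heads` heads in n flips."""
    return p**heads * (1 - p)**(n - heads)

def count_likelihood(p):
    """Probability of observing `heads` heads in any order (binomial)."""
    return comb(n, heads) * seq_likelihood(p)

L_fair = seq_likelihood(0.5)  # ~9.5e-07: about one in a million
L_bias = seq_likelihood(0.4)  # ~6.3e-07: about one in 1.6 million

# Both numbers are minuscule, but their ratio is what compares the models:
print(L_fair / L_bias)  # ~1.50

# Summarizing by the head count scales both likelihoods by C(20, 10) = 184756,
# so the absolute values are much larger while the ratio is unchanged:
print(count_likelihood(0.5) / count_likelihood(0.4))  # ~1.50
```

This is also why the absolute likelihood values carry little meaning on their own: the same data summarized two ways gives values differing by a factor of nearly 200,000, yet identical ratios.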
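The discrete 500-value example can be sketched numerically too. The bell shape and its width below are assumptions purely for illustration (the discussion only says the mass is clustered between 230 and 270): the single most likely value gets only a few percent of the mass, while the surrounding range holds most of it.

```python
import math

# Hypothetical discrete distribution over parameter values 1..500,
# with mass concentrated around 250 (shape and spread are made up).
values = range(1, 501)
weights = {k: math.exp(-((k - 250) ** 2) / (2 * 10**2)) for k in values}
total = sum(weights.values())
posterior = {k: w / total for k, w in weights.items()}

mle = max(posterior, key=posterior.get)               # the single best value: 250
p_exact = posterior[mle]                              # ~0.04: the exact value is unlikely
p_range = sum(posterior[k] for k in range(240, 261))  # ~0.71: but the mass sits nearby

print(mle, round(p_exact, 3), round(p_range, 3))
```

So even though the point estimate itself has low probability, it marks where the bulk of the mass is, which is why reporting it (ideally with a range) is useful.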