StatQuest(ML) Odds Ratios and Log(Odds Ratios)

When people say "odds ratio", they are talking about a "ratio of odds". So we've got

\[\frac{a\ ratio\ of\ one\ odds...}{...to\ another\ odds.}\]

Just like when we calculate the odds of something, if the denominator is larger than the numerator, the odds ratio will be somewhere between 0 and 1, and if the numerator is larger than the denominator, then the odds ratio will be somewhere between 1 and infinity (and beyond!)…

…and just like the odds, taking the log of the odds ratio (i.e. $log(odds ratio)$) makes things nice and symmetrical.
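To make the symmetry concrete, here is a minimal sketch. The two odds ratios (1/6 and 6/1) are made-up values chosen for illustration, not numbers from the example below, and `log` here means the natural log.

```python
import math

# Hypothetical odds ratios: the same strength of relationship,
# just pointing in opposite directions.
ratio_below_one = 1 / 6   # the denominator odds are 6 times larger
ratio_above_one = 6 / 1   # the numerator odds are 6 times larger

log_below = math.log(ratio_below_one)
log_above = math.log(ratio_above_one)

# The raw odds ratios are squished between 0 and 1 on one side and
# stretched from 1 to infinity on the other...
print(f"odds ratios:      {ratio_below_one:.3f} and {ratio_above_one:.3f}")
# ...but the logs sit the same distance from 0, on opposite sides.
print(f"log(odds ratios): {log_below:.3f} and {log_above:.3f}")
```

The two logs have the same magnitude and opposite signs, which is what "nice and symmetrical" means here.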

Odds Ratio in Action

                                Has Cancer
                                Yes     No
Has the mutated gene    Yes     23      117
                        No      6       210

We can use an "odds ratio" to determine if there is a relationship between the mutated gene and cancer. If someone has the mutated gene, are the odds higher that they will get cancer?

\[\begin{aligned} odds\ ratio&=\frac{\frac{23}{117}}{\frac{6}{210}}=6.88 \\ log(odds\ ratio)&=log(6.88)=1.93 \end{aligned}\]
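The arithmetic above can be double-checked with a few lines of Python. This is just a sketch of the calculation; the variable names are made up for illustration, and `math.log` is the natural log, which is what gives 1.93.

```python
import math

# Counts from the mutated gene / cancer table.
gene_cancer, gene_no_cancer = 23, 117
no_gene_cancer, no_gene_no_cancer = 6, 210

# Odds of cancer within each group.
odds_with_gene = gene_cancer / gene_no_cancer
odds_without_gene = no_gene_cancer / no_gene_no_cancer

# The odds ratio is one odds divided by the other.
odds_ratio = odds_with_gene / odds_without_gene
log_odds_ratio = math.log(odds_ratio)

print(round(odds_ratio, 2), round(log_odds_ratio, 2))  # prints: 6.88 1.93
```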

The odds ratio and the log(odds ratio) are like R-squared; they indicate a relationship between two things (in this case, a relationship between the mutated gene and cancer), and just like R-squared, the values correspond to effect size.

  • Larger values (an odds ratio far above 1, or a log(odds ratio) far above 0) mean that the mutated gene is a good predictor of cancer.
  • Values close to 1 (or a log(odds ratio) close to 0) mean that the mutated gene is not a good predictor of cancer.

There are 3 ways to determine if an odds ratio (or log(odds ratio)) is statistically significant.

  1. Fisher's Exact Test
  2. Chi-Square Test
  3. The Wald Test
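As a rough illustration of the first two tests, here is a sketch that uses only Python's standard library on the mutated gene counts. It assumes a two-sided Fisher's Exact Test (adding up every table that is no more likely than the observed one) and a Chi-Square Test without a continuity correction; statistics packages may use different defaults, so treat the details as assumptions rather than the one true recipe.

```python
import math

# Counts from the mutated gene / cancer table.
a, b = 23, 117   # has the gene: cancer, no cancer
c, d = 6, 210    # no gene:      cancer, no cancer

row1, row2 = a + b, c + d
col1, col2 = a + c, b + d
n = row1 + row2

def table_probability(x):
    """Hypergeometric probability of seeing x in the top-left cell,
    holding the row and column totals fixed."""
    return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(n, col1)

# Fisher's Exact Test (two-sided): add up the probabilities of every
# possible table that is no more likely than the one we observed.
p_observed = table_probability(a)
lowest = max(0, col1 - row2)
highest = min(row1, col1)
all_probabilities = [table_probability(x) for x in range(lowest, highest + 1)]
# The small tolerance guards against floating-point ties.
fisher_p = sum(p for p in all_probabilities if p <= p_observed * (1 + 1e-9))

# Chi-Square Test (1 degree of freedom, no continuity correction).
chi_square = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
# With 1 degree of freedom, the upper tail of the chi-square distribution
# can be written with the complementary error function.
chi_square_p = math.erfc(math.sqrt(chi_square / 2))

print(f"Fisher's Exact Test p-value: {fisher_p:.2e}")
print(f"Chi-Square statistic: {chi_square:.2f}, p-value: {chi_square_p:.2e}")
```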

There is no general consensus on which method is best and people often mix and match.

  • Some people will use Fisher's Exact Test or the Chi-Square Test to calculate the p-value, and use the Wald Test to calculate a confidence interval.
  • Some people are happy to let Wald do all the work: calculate both the p-value and the confidence interval.
  • Letting Wald do all the work ensures that the p-value and confidence interval will always be consistent, but check and see what other folks do in your field to find out what is most acceptable.
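Here is a sketch of what "letting Wald do all the work" might look like on the mutated gene counts. It assumes the usual large-sample setup, which is not spelled out in the text above: the log(odds ratio) is treated as roughly normal, with a standard error of sqrt(1/a + 1/b + 1/c + 1/d) computed from the four cell counts, and the confidence interval is set to 95%.

```python
import math
from statistics import NormalDist

# Counts from the mutated gene / cancer table.
a, b = 23, 117   # has the gene: cancer, no cancer
c, d = 6, 210    # no gene:      cancer, no cancer

log_odds_ratio = math.log((a / b) / (c / d))

# Assumed standard error of the log(odds ratio).
standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

# Wald p-value: how many standard errors is the log(odds ratio) from 0?
z = log_odds_ratio / standard_error
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Wald 95% confidence interval, built on the log scale and then
# converted back to the odds ratio scale.
z_critical = NormalDist().inv_cdf(0.975)
ci_log = (log_odds_ratio - z_critical * standard_error,
          log_odds_ratio + z_critical * standard_error)
ci_odds_ratio = (math.exp(ci_log[0]), math.exp(ci_log[1]))

print(f"z = {z:.2f}, p-value = {p_value:.2e}")
print(f"95% CI for the odds ratio: "
      f"{ci_odds_ratio[0]:.2f} to {ci_odds_ratio[1]:.2f}")
```

Because the p-value and the interval come from the same z-score and the same standard error, they cannot disagree: the 95% interval for the log(odds ratio) excludes 0 exactly when the p-value is below 0.05.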