Perspectives on the Bayes factor

class: left, bottom, inverted, title-slide

.title[
# Perspectives on the Bayes factor
]
.author[
### Jorge N. Tendeiro<br>Hiroshima University<br><a href="mailto:tendeiro@hiroshima-u.ac.jp" class="email">tendeiro@hiroshima-u.ac.jp</a><br>Materials: <a href="https://www.jorgetendeiro.com/talk/2023_unilisboa/">https://www.jorgetendeiro.com/talk/2023_unilisboa/</a>
]
.date[
### 01 March 2023 
]

---

background-image: url(Figures/pexels-max-fischer-5212343.png)
background-size:  cover

# Outline

The Bayes factor:

1. Introduction.

2. In practice.

3. Properties.

4. In applied research.

5. Conclusions, next steps.

The contents of this talk include materials that I recently presented at a conference:
[https://www.jorgetendeiro.com/talk/2023_csp/](https://www.jorgetendeiro.com/talk/2023_csp/)

---
# Setting

For this talk, I do _not_ assume that everyone is...

- ... acquainted with the Bayesian framework.

- ... acquainted with the Bayes factor.

- ... familiar with R or JASP.

--

I included more material than I can discuss in today's talk, _on purpose_.
 
Those interested should have enough info to follow up afterwards!

---
class: center, middle
background-image: url(Figures/pexels-ryutaro-tsukata-5191373_SEP.png)
background-size:  cover
# 1. Bayes factor — Introduction

---
## Bayes factor
Bayes factors are increasingly advocated as a better alternative to _null hypothesis significance testing_ (NHST).<sup>1,2,3,4,5</sup>

.footnote[
<sup>1</sup> Jeffreys (1961) 
<sup>2</sup> Wagenmakers et al. (2010) 
<sup>3</sup> Vanpaemel (2010) 
<sup>4</sup> Masson (2011) 
<sup>5</sup> Dienes (2014)
]

---
## Bayes factor — Definition
The Bayes factor<sup>1,2</sup> quantifies the change from prior odds to posterior odds due to the observed data.
 
Consider:

- Two hypotheses (or models) to compare, `$\mathcal{H}_0$` _vs_ `$\mathcal{H}_1$`.
- Data `$D$`.

Assume that either `$\mathcal{H}_0$` or `$\mathcal{H}_1$` must hold true.
 
Then by Bayes’ rule (`$i=0, 1$`):
`$$p(\mathcal{H}_i|D) = \frac{p(\mathcal{H}_i)p(D|\mathcal{H}_i)}
{p(\mathcal{H}_0)p(D|\mathcal{H}_0) + p(\mathcal{H}_1)p(D|\mathcal{H}_1)},$$`

and dividing the expression for `$\mathcal{H}_0$` by the one for `$\mathcal{H}_1$` leads to

`$$\underset{\text{prior odds}}{\underbrace{\frac{p(\mathcal{H}_0)}{p(\mathcal{H}_1)}}}
\times 
\underset{\color{#A97F12}{\text{Bayes factor, }BF_{01}}}{\underbrace{\frac{p(D|\mathcal{H}_0)}{p(D|\mathcal{H}_1)}}} = 
\underset{\text{posterior odds}}{\underbrace{\frac{p(\mathcal{H}_0|D)}{p(\mathcal{H}_1|D)}}}.$$`

.footnote[
<sup>1</sup> Jeffreys (1939) 
<sup>2</sup> Kass and Raftery (1995) 
]

---
## Bayes factor — Interpretation (1/2)
`$$\boxed{
BF_{01} = \frac{p(D|\mathcal{H}_0)}{p(D|\mathcal{H}_1)}
}$$`

For instance, `$BF_{01} = 5$`:

> _The data are five times more likely to have occurred under `$\mathcal{H}_0$` than under `$\mathcal{H}_1$`._

---
## Bayes factor — Interpretation (2/2)
`$$\boxed{
\underset{\text{prior odds}}{\underbrace{\frac{p(\mathcal{H}_0)}{p(\mathcal{H}_1)}}}\times \underset{\color{#A97F12}{\text{Bayes factor, }BF_{01}}}{\underbrace{\frac{p(D|\mathcal{H}_0)}{p(D|\mathcal{H}_1)}}} = 
\underset{\text{posterior odds}}{\underbrace{\frac{p(\mathcal{H}_0|D)}{p(\mathcal{H}_1|D)}}}
}$$`

For instance, `$BF_{01} = 5$`:
> _After observing the data, my relative belief in `$\mathcal{H}_0$` over `$\mathcal{H}_1$` increased by a factor of 5._

--

This holds regardless of the initial relative belief (i.e., prior odds) of a rational agent.

---
## Bayes factor — Possible values
`$BF_{01}=\frac{p(D|\mathcal{H}_0)}{p(D|\mathcal{H}_1)} \in [0, \infty)$`:

- `$BF_{01} > 1 \longrightarrow$` Evidence in favor of `$\mathcal{H}_0$` over `$\mathcal{H}_1$`.
- `$BF_{01} = 1 \longrightarrow$` Equal support for either model.
- `$BF_{01} < 1 \longrightarrow$` Evidence in favor of `$\mathcal{H}_1$` over `$\mathcal{H}_0$`.

--

Some qualitative cutoff labels have been suggested.<sup>1,2,3</sup>

Here's Kass and Raftery's classifier:
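
As a rough sketch, the Kass and Raftery (1995) categories can be written as a small function. The function name, the handling of values below 1 (taking the reciprocal), and the choice to send boundary values to the higher category are assumptions made here for illustration only.

```python
def kass_raftery_label(bf01: float) -> str:
    """Verbal label for BF01 using the Kass and Raftery (1995) cutoffs.

    Assumption: boundary values (3, 20, 150) fall in the higher category.
    """
    if bf01 <= 0:
        raise ValueError("A Bayes factor must be positive.")
    if bf01 == 1:
        return "equal support for either model"

    # Express the evidence as a ratio above 1 for the better-supported model.
    if bf01 > 1:
        favored, bf = "H0", bf01
    else:
        favored, bf = "H1", 1 / bf01

    if bf < 3:
        strength = "not worth more than a bare mention"
    elif bf < 20:
        strength = "positive"
    elif bf < 150:
        strength = "strong"
    else:
        strength = "very strong"
    return f"{strength} (favoring {favored})"


for value in (0.01, 0.5, 2.9, 3.1, 25, 200):
    print(value, "->", kass_raftery_label(value))
```

Note that `2.9` and `3.1` receive different labels even though they express nearly the same strength of evidence.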

.footnote[
<sup>1</sup> Jeffreys (1939) 
<sup>2</sup> Kass and Raftery (1995) 
<sup>3</sup> Lee and Wagenmakers (2013) 
]

---
## Bayes factor — Computation
`$$\boxed{BF_{01} = \frac{p(D|\mathcal{H}_0)}{p(D|\mathcal{H}_1)}}$$`

--

Essentially, any two statistical models that make predictions are in theory eligible to be compared via the Bayes factor.

We "just" need to evaluate each model's marginal likelihood:

`$$p(D|\mathcal{H}_i) = 
\displaystyle\int_{\Theta_i} \underbrace{p(D|\theta, \mathcal{H}_i)}_{\text{likelihood}}\underbrace{p(\theta|\mathcal{H}_i)}_{\text{prior}}d\theta.$$`
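
To make the integral concrete, here is a minimal sketch that approximates a marginal likelihood with a midpoint rule. The binomial model, the uniform prior, the data (60 successes in 100 trials), and all function names are illustrative assumptions, chosen because the exact answer, `1 / (n + 1)`, is known in closed form.

```python
from math import comb


def binomial_likelihood(theta: float, k: int, n: int) -> float:
    """p(D | theta): probability of k successes in n trials."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


def marginal_likelihood(k: int, n: int, prior_density, grid_size: int = 10_000) -> float:
    """Midpoint-rule approximation of the integral of likelihood x prior over (0, 1)."""
    width = 1 / grid_size
    total = 0.0
    for j in range(grid_size):
        theta = (j + 0.5) * width  # midpoint of the j-th sub-interval
        total += binomial_likelihood(theta, k, n) * prior_density(theta)
    return total * width


def uniform_prior(theta: float) -> float:
    """Density of a Uniform(0, 1) prior, i.e. Beta(1, 1)."""
    return 1.0


approx = marginal_likelihood(60, 100, uniform_prior)
exact = 1 / 101  # closed form for a binomial likelihood under a uniform prior
print(approx, exact)
```

For realistic models the parameter space has many dimensions and a grid like this is not feasible, which is why dedicated procedures are needed.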

There are various numerical procedures for this.<sup>1,2,3,4,5,6,7,8</sup>
 
Recently, bridge sampling<sup>7</sup> has been of great practical use (in combination with JAGS, Stan, or NIMBLE).

.footnote[
<sup>1</sup> Berger and Pericchi (2001) 
<sup>2</sup> Carlin and Chib (1995) 
<sup>3</sup> Chen, Shao, and Ibrahim (2000) 
<sup>4</sup> Gamerman and Lopes (2006) 

<sup>5</sup> Gelman and Meng (1998) 
<sup>6</sup> Green (1995) 
<sup>7</sup> Gronau et al. (2017) 
<sup>8</sup> Kass and Raftery (1995) 
]

---
## Bayes factor — Computation
`$$\boxed{BF_{01} = \frac{p(D|\mathcal{H}_0)}{p(D|\mathcal{H}_1)}}$$`

For simpler models there are a few R packages available to assist with the computations:

- `BayesFactor`<sup>1</sup> (the most widely used).

- `bain`<sup>2</sup>.

- `easystats`<sup>3</sup>.

- `bayestestR`<sup>4</sup>.

- `brms`<sup>5</sup> and `rstanarm`<sup>6</sup>, relying on the `bridgesampling`<sup>7</sup> package.

There is also [JASP](https://jasp-stats.org/), a handy, open-source GUI.

.footnote[
<sup>1</sup> Morey and Rouder (2022) 
<sup>2</sup> Gu et al. (2021) 
<sup>3</sup> Lüdecke et al. (2022) 
<sup>4</sup> Makowski, Ben-Shachar, and Lüdecke (2019) 

<sup>5</sup> Bürkner (2021) 
<sup>6</sup> Goodrich et al. (2022) 
<sup>7</sup> Gronau, Singmann, and Wagenmakers (2020)
]

---
class: center, middle
background-image: url(Figures/pexels-ryutaro-tsukata-5191373_SEP.png)
background-size:  cover
# 2. Bayes factor — In practice

---
## Bayes factor — In JASP

---
## Bayes factor — In R

---
## Bayes factor — Default priors

---
class: center, middle
background-image: url(Figures/pexels-ryutaro-tsukata-5191373_SEP.png)
background-size:  cover
# 3. Bayes factor — Properties

---
## Bayes factor — Critical appraisal
Bayes factors have been praised in many instances.<sup>1,2,3,4,5</sup>

But, surprisingly, I could not find many sources with critical appraisals of the Bayes factor.

--

I have been doing this for a few years now.<sup>6,7,8,9</sup>

.footnote[
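<sup>1</sup> Dienes (2011) 
<sup>2</sup> Dienes (2014) 
<sup>3</sup> Masson (2011) 
<sup>4</sup> Vanpaemel (2010) 
<sup>5</sup> Wagenmakers et al. (2018) 
<sup>6</sup> Tendeiro and Kiers (2019) 

<sup>7</sup> Tendeiro, Kiers, and Ravenzwaaij (2022) 
<sup>8</sup> Tendeiro and Kiers (2023a) 
<sup>9</sup> Tendeiro and Kiers (2023b)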
]

---
## Bayes factor — Some properties

- Bayes factors are not posterior odds!

- Bayes factors are (at least _can be_) sensitive to priors!

- Bayes factors are a measure of relative evidence!

- Bayes factors cannot establish absence/presence!

- Bayes factors are not an effect size measure!

- Inconclusive evidence is not evidence of absence!

- Bayes factors are a continuous measure of relative evidence!

---
## Bayes factor — Some properties
For the rest of this presentation, I will:

- Present the results of a study on how often these misconceptions occur in the literature.

- Explain each misconception.

- Speculate on why these misconceptions come about.

---
class: center, middle
background-image: url(Figures/pexels-ryutaro-tsukata-5191373_SEP.png)
background-size:  cover
# 4. Bayes factors — In applied research

---
## Bayes factors — In applied research
Until recently, there was no characterization of the use of the Bayes factor in applied research.
 
Wong and colleagues<sup>1</sup> were the first to start unveiling the current state of affairs.

--

In an ongoing effort, I am currently extending the work of Wong et al.
 
Here I report the details and main findings of my study.
 
Work with [Henk Kiers](https://www.rug.nl/staff/h.a.l.kiers/research?lang=en), [Rink Hoekstra](https://www.rug.nl/staff/r.hoekstra/), [Tsz Keung Wong](https://hk.linkedin.com/in/tsz-keung-wong-a93738161?trk=people_directory), and [Richard Morey](https://richarddmorey.com/).

Preprint (under review):
 
[https://psyarxiv.com/du3fc/](https://psyarxiv.com/du3fc/)

.footnote[
<sup>1</sup> Wong, Kiers, and Tendeiro (2022)
]

---
## Context
**Background:**
 
Social Sciences.

**Target:**
 
Null hypothesis Bayesian testing (NHBT), and the Bayes factor in particular.

**Motivation:**
 
Bayes factors have been regularly used since, say, 2010.
 
It is very recent. 
 
Not many researchers have received formal training.
 
It is unclear how things are working out.

---
## Advanced literature search
_Google Scholar_ (2010—):

> `$\texttt{("bayes factor" AND "bayesian test" AND psychol)}$`

_Web of Science_:

> `$\texttt{(TI=((bayes factor OR bayes* selection OR bayes* test*) AND psycho*) OR}$`
> `$\texttt{AB=((bayes factor OR bayes* selection OR bayes* test* OR bf*) AND psychol*) OR}$`
> `$\texttt{AK=((bayes factor OR bayes* selection OR bayes* test* OR bf*) AND psychol*))}$`
> `$\texttt{AND PY=(2010-2022)}$`

`$109 + 58 = 167$` papers (after selection).

---
## Grading criteria
<center>
<img src="Figures/BF_QRIPs.png" alt="QRIPs." style="width:90%;"/>
</center>

---
## Results
<center>
<img src="Figures/BF_results.png" alt="Bayes factor study results" style="width:53%;"/>
</center>

---
## Results

Overall:

- 149 papers (89.2%) displayed at least one QRIP.

- 104 papers (62.3%) displayed at least two QRIPs.

---
## Discussion of the results

We considered possible reasons behind the problems that we found.

Below is a selection of our considerations.

---
class: center, middle
background-image: url(Figures/pexels-ryutaro-tsukata-5191373_SEP.png)
background-size:  cover
# 4. Bayes factors — In applied research
## Bayes factors are _not_ posterior odds

---
## Bayes factors are _not_ posterior odds — _Explanation_
`$$\underset{\text{prior odds}}{\underbrace{\frac{p(\mathcal{H}_0)}{p(\mathcal{H}_1)}}}
\times 
\underset{\color{#A97F12}{\text{Bayes factor, }BF_{01}}}{\underbrace{\frac{p(D|\mathcal{H}_0)}{p(D|\mathcal{H}_1)}}} = 
\underset{\color{#A97F12}{\text{posterior odds}}}{\underbrace{\frac{p(\mathcal{H}_0|D)}{p(\mathcal{H}_1|D)}}}.$$`

--

Say that `$BF_{01} = 32$`; what does this mean?

> _After looking at the data, we revise our relative belief in `$\mathcal{H}_0$` over `$\mathcal{H}_1$` by a factor of 32._

--

**Q:** What does this imply concerning the probability of each model, given the observed data?
 
**A:** On its own, nothing at all!

--

Bayes factors `$=$` rate of _change_ of belief, not the _updated_ belief.<sup>1</sup>
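
A small sketch makes the point: the same `$BF_{01} = 32$` leads to very different posterior probabilities depending on the prior. The function name and the grid of prior probabilities are illustrative assumptions; the only other assumption is the one already made in the definition, namely that exactly one of the two hypotheses holds.

```python
def posterior_prob_h0(bf01: float, prior_prob_h0: float) -> float:
    """Posterior probability of H0, assuming exactly one of H0 and H1 holds."""
    prior_odds = prior_prob_h0 / (1 - prior_prob_h0)
    posterior_odds = prior_odds * bf01  # prior odds x Bayes factor
    return posterior_odds / (1 + posterior_odds)


for prior in (0.5, 0.1, 0.01, 0.001):
    print(f"p(H0) = {prior:<5} ->  p(H0 | D) = {posterior_prob_h0(32, prior):.3f}")
```

With equal prior probabilities the posterior probability of `$\mathcal{H}_0$` is `32/33`, about 0.97; with a prior probability of 0.001 it is still only about 0.03.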

.footnote[
<sup>1</sup> Edwards, Lindman, and Savage (1963)
]

---
## Bayes factors are _not_ posterior odds — _What we found..._

> *"The alternative hypothesis is 2 times more likely than the null hypothesis ( `$B_{+0}=2.46$`; Bayesian 95% CI [0.106, 0.896])."*

.pull-right-30[
 

Incidence:
 
- 13.2% as definition
 
- 20.4% as interpretation


]

**Possible explanations:**

- Principle of indifference.

- Overselling Bayes as the _theory of inverse probability_.<sup>1</sup>

- Cognitive dissonance.

.footnote[
<sup>1</sup> Jeffreys (1961)
]

---
## Bayes factors are (at least can be) _sensitive_ to priors — _Explanation_
This is very well known.<sup>1,2,3,4,5</sup>

`$$\boxed{p(D|\mathcal{H}_i) = 
\displaystyle\int_{\Theta_i} p(D|\theta, \mathcal{H}_i)\color{#A97F12}{p(\theta|\mathcal{H}_i)}d\theta}$$`

**Example: Bias of a coin**<sup>6</sup>

- `$\mathcal{H}_0: \theta = .5$` _vs_ `$\mathcal{H}_1: \theta \not= .5$`

- Data: 60 successes in 100 throws.

- Four within-model priors; all `$Beta(a, b)$`.
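
The coin example can be sketched with closed-form Beta functions. The four `$(a, b)$` pairs below are illustrative assumptions, not necessarily the priors used in the original example, and `bf01_coin` is a hypothetical helper name.

```python
from math import exp, lgamma, log


def log_beta(a: float, b: float) -> float:
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def bf01_coin(k: int, n: int, a: float, b: float, theta0: float = 0.5) -> float:
    """BF01 for H0: theta = theta0 versus H1: theta ~ Beta(a, b).

    The binomial coefficient appears in both marginal likelihoods and cancels.
    """
    log_m0 = k * log(theta0) + (n - k) * log(1 - theta0)
    log_m1 = log_beta(k + a, n - k + b) - log_beta(a, b)
    return exp(log_m0 - log_m1)


# Assumed priors under H1, for illustration only.
priors = [(0.5, 0.5), (1, 1), (10, 10), (100, 100)]
for a, b in priors:
    print(f"Beta({a}, {b}):  BF01 = {bf01_coin(60, 100, a, b):.3f}")
```

The data are the same in every row (60 successes in 100 throws); only the prior under `$\mathcal{H}_1$` changes, and the Bayes factor changes with it.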

.footnote[
<sup>1</sup> Kass (1993) 
<sup>2</sup> Gallistel (2009) 
<sup>3</sup> Vanpaemel (2010) 
<sup>4</sup> Robert (2016) 
<sup>5</sup> Withers (2002) 
<sup>6</sup> Liu and Aitkin (2008) 
]

---
## Bayes factors are (at least can be) _sensitive_ to priors — _What we found..._
Reporting nothing at all (29.9%) or relying on software defaults (35.3%) was quite common.

--

**Possible explanations:**

- Lack of awareness.

- Economical writing style.

- Default priors to...
 
... ease comparison, avoid specification, meet 'objectivity'.
 
Also: improve peer-review chances, principle of indifference, preregistration.

---
## Bayes factors are a measure of _relative_ evidence — _Explanation_
Say that `$BF_{01} = 100$`; what does this mean?

> *The observed data are 100 times more likely under `$\mathcal{H}_0$` than under this particular `$\mathcal{H}_1$`.*

--

- Evidence is _relative_.<sup>1</sup>

- A model may actually be dreadful, but simply less so than its competitor.<sup>2,3</sup>

- Little is known as to how Bayes factors behave under model misspecification (but see<sup>4</sup>).

.footnote[
<sup>1</sup> Morey, Romeijn, and Rouder (2016) 
<sup>2</sup> Rouder (2014) 
<sup>3</sup> Gelman and Rubin (1995) 
<sup>4</sup> Ly, Verhagen, and Wagenmakers (2016) 
]

---
## Bayes factors are a measure of _relative_ evidence — _What we found..._

> *"With this 'stronger' VB05 prior, we found strong evidence for the null hypothesis ( `$\text{BFs}_\text{null}$` ranging from 12.7 to 22.7 for the 5 ROIs)."*

.pull-right-30[
 

Incidence 62.3%


]

**Possible explanations:**

- Writing style.

- Implicitly assumed.

- Increased impact.

---
## Bayes factors can _not_ establish absence/presence — _Explanation_

Say that `$BF_{01} = 100$`, for `$\mathcal{H}_0: \mu=0$` vs `$\mathcal{H}_1: \mu\not=0$`.

> *This does not imply that `$\mu=0$`.*

--

- First of all, the Bayes factor (like the `$p$`-value) is a stochastic quantity, not a factual proof.

- Furthermore, the Bayes factor provides a relative assessment of the likelihood of the observed data, not of the entertained hypotheses.

---
## Bayes factors can _not_ establish absence/presence — _What we found..._

> *"For 6-year-olds, there was no difference between environments ( `$M_\textit{smooth} = 2.11$` vs. `$M_\textit{rough} = 1.93$`, `$t(52) = 1.0$`, `$p = 0.31$`, `$d = 0.3$`, `$BF = .42$`)."*

.pull-right-30[
 

Incidence 35.3%


]

**Possible explanations:**

- Increased impact.

- Avoid uncertainty.

- Writing style.

- Influence from NHST.

- Decision making.

---
## Bayes factors are _not_ an effect size measure — _Explanation_
**Example:**

- Bayesian one sample `$t$`-test:
 
`$\mathcal{H}_0: \mu=0$` vs `$\mathcal{H}_1: \mu\not=0$`.

- JZS default prior ( `$r=.707$`).

- `$\overline{x}=0.1$`, `$sd=1$` at each sample size (thus, the effect size is fixed throughout).
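
The setup above can be sketched numerically. The code assumes the JZS Bayes factor integral as given by Rouder et al. (2009), a source not cited in these slides, with the scale `$r$` entering through `$N g r^2$`. The integral over `$g$` is rewritten with the substitution `$g = 1/z^2$`, where `$z$` is standard normal, and evaluated with a trapezoid rule. The function name, the integration settings, and the grid of sample sizes are illustrative assumptions.

```python
from math import exp, pi, sqrt


def jzs_bf01(t: float, n: int, r: float = 0.707,
             steps: int = 20_000, z_max: float = 10.0) -> float:
    """BF01 for a one-sample t-test with a Cauchy(0, r) prior on the effect size."""
    nu = n - 1
    # Marginal likelihood term under H0 (constants shared with H1 are dropped).
    null_term = (1 + t**2 / nu) ** (-(nu + 1) / 2)
    a = n * r**2

    def integrand(z: float) -> float:
        shrink = z**2 / (z**2 + a)  # equals 1 / (1 + n * g * r^2) with g = 1 / z^2
        half_normal_pdf = 2 * exp(-(z**2) / 2) / sqrt(2 * pi)
        return half_normal_pdf * sqrt(shrink) * (1 + shrink * t**2 / nu) ** (-(nu + 1) / 2)

    # Trapezoid rule on [0, z_max]; the integrand is negligible beyond z_max.
    h = z_max / steps
    total = 0.5 * (integrand(0.0) + integrand(z_max))
    for j in range(1, steps):
        total += integrand(j * h)
    alt_term = total * h
    return null_term / alt_term


mean, sd = 0.1, 1.0  # fixed observed effect size, as in the example
for n in (10, 50, 200, 1000, 2000):
    t = mean / (sd / sqrt(n))
    print(f"n = {n:>4}   t = {t:5.2f}   BF01 = {jzs_bf01(t, n):.4f}")
```

The observed effect size is identical in every row, yet the Bayes factor moves from favoring `$\mathcal{H}_0$` at small `$n$` to favoring `$\mathcal{H}_1$` at large `$n$`.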

---
## Bayes factors are _not_ an effect size measure — _What we found..._

> *"Pupil size was larger in a higher tracking load (...). However, the Bayesian test showed only positive, but smaller, effect of Load on tracking pupil size ( `$BF_\text{incl.} = 7.506$`)."*

.pull-right-30[
 

Incidence 4.2%


]

**Possible explanations:**

- Recreating a similar misconception based on `$p$`-values.

- Bayes factor labels in use.

---
class: center, middle
background-image: url(Figures/pexels-ryutaro-tsukata-5191373_SEP.png)
background-size:  cover
# 4. Bayes factors — In applied research
## Inconclusive evidence is _not_ evidence of absence

---
## Inconclusive evidence is _not_ evidence of absence — _Explanation_
`$$\boxed{BF_{01} = \frac{p(D|\mathcal{H}_0)}{p(D|\mathcal{H}_1)}\color{#A97F12}{=1}}$$`

> _Data are equally likely under either model._

--

Data are perfectly uninformative.

This does not equate to "_there is nothing to be found_".

---
## Inconclusive evidence is _not_ evidence of absence — _What we found..._

> *"In contrast there was no difference in meaning between the thinking without examples and planning conditions; the Bayes factor provided anecdotal evidence in favor of the null ( `$BF_{10} = .86$`)."*

.pull-right-30[
 

Incidence 3.6%


]

**Possible explanations:**

- Recreating a similar misconception based on `$p$`-values.

- Absence as default.

- Dichotomization.

- Increased impact.

- Preference for parsimony.

---
## Bayes factors are a _continuous_ measure of relative evidence — _Explanation_
Bayes factors are a continuous measure of evidence in `$[0, \infty)$`.
 
For instance, if `$BF_{01} > 1$` then

- The observed data are more likely under `$\mathcal{H}_0$` than under `$\mathcal{H}_1$`.

- The larger `$BF_{01}$`, the stronger the evidence for `$\mathcal{H}_0$` over `$\mathcal{H}_1$`.

--

**Q:** Can "_more likely than_" be qualified?
 
**A:** Several categorizations of strength of evidence (what counts as weak, moderate, or strong?) exist.<sup>1,2,3,4</sup>

But this is problematic in various ways.

.footnote[
<sup>1</sup> Jeffreys (1961) 
<sup>2</sup> Kass and Raftery (1995) 
<sup>3</sup> Lee and Wagenmakers (2013) 
<sup>4</sup> Dienes (2016) 
]

---
## Bayes factors are a _continuous_ measure of relative evidence — _What we found..._

> *"(...) In terms of Bayes factor ( `$BF$`), evidence for greater disgust in the experimental group was strong ( `$BF_{10} > 10$`), but there was only weak evidence for a difference in other emotions ( `$BF_{10}\text{’s} < 3$`)."*

.pull-right-30[
 

Incidence 5.4%


]

**Possible explanations:**

- Summary.

- Seeking authority.

- Avoiding criticism.

- Borrowing from the literature and JASP.

- NHST ('significant', 'not significant').

---
class: center, middle
background-image: url(Figures/pexels-ryutaro-tsukata-5191373_SEP.png)
background-size:  cover
# 5. Conclusions, next steps

---
## Conclusions (1/2)

I think that, concerning testing:

- Model comparison (including hypothesis testing) is really important.

- However, and clearly, researchers test _way_ too much.

- Testing says very little about how well a model fits the data.

---
## Conclusions (2/2)
And what about estimation?

I think that:

- Testing need not be a prerequisite for estimation, unlike what some advocate.<sup>1</sup>

- Estimation quantifies uncertainty in ways that Bayes factors simply cannot.

- Estimating effect sizes (direction, magnitude) is crucial. Bayes factors ignore this!

- Avoiding the dichotomous reasoning underlying Bayes factors can help.

Bayes factors can be very useful (I use them!).
But they should not always be the end of our inference.

.footnote[
<sup>1</sup> Wagenmakers et al. (2018) 
]

---
## What’s next?

A follow-up study is in preparation.

- Create and deploy a Shiny app that illustrates correct and incorrect usage of the Bayes factor.

- Assess the efficacy of this app by means of an experiment.

---
class: center, middle
background-image: url(Figures/pexels-ryutaro-tsukata-5191373_SEP.png)
background-size:  cover
# Questions?