I have been thinking lately about how to present modelled results from \(\text{log}_{10}\) transformed data (e.g., most immunological outcomes), but the ideas really apply to any non-linear transformation.

Geometric means

I would like to clarify, in my mind (and hopefully yours), what exactly the geometric mean is and how it should be interpreted. As discussed in this article online, the geometric mean of a set of \(n\) numbers is defined as the \(n^\text{th}\) root of the product of the set (as opposed to the sum of the set divided by \(n\) for the arithmetic mean). Intuitively, the geometric mean is a measure of central tendency for numbers that combine multiplicatively, such as values spanning several orders of magnitude. Although the intuition makes sense, I would still struggle to interpret a given geometric mean, but nevertheless, let’s push on.
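As a concrete example, the geometric mean of \(1\), \(10\) and \(100\) is \((1 \times 10 \times 100)^{1/3} = 10\), whereas the arithmetic mean is \(37\); the geometric mean is pulled far less towards the largest value.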

Simulated data

To help ground our understanding, let’s simulate some generic immunological data to use as an example for the remainder of the discussion. Let’s assume the data is approximately normally distributed on the \(\text{log}_{10}\) scale with \(\text{E}[X] = 4\) and \(\text{Var}[X] = 0.25\).

library(ggplot2)    # plotting
library(latex2exp)  # provides TeX() for the axis labels used below

N <- 100
mu <- 4
sigma2 <- 0.25
# simulate outcomes on the log10 scale
x <- rnorm(n = N, mean = mu, sd = sqrt(sigma2))
(sample_mean_x <- mean(x))
## [1] 3.938046
(sample_var_x <- var(x))
## [1] 0.2922701
# geometric mean: the Nth root of the product
(geometric_mean_x <- prod(x)^(1/N))
## [1] 3.900136
ggplot() + 
  geom_density(aes(x = x)) + 
  geom_vline(aes(xintercept = sample_mean_x)) +
  geom_vline(aes(xintercept = geometric_mean_x), linetype = "dashed")

As you can see from the simulated data, the geometric mean is approximately equal to the arithmetic mean. This is because the spread of the data (a standard deviation of roughly 0.5) is small relative to its mean of roughly 4; the gap between the arithmetic and geometric means grows with the variance relative to the squared mean.
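A rough second-order Taylor approximation makes the point: \(\text{GM} \approx \text{AM} \times e^{-\text{Var}/(2\,\text{AM}^2)}\), which we can check on the sample:

# approximate the geometric mean from the arithmetic mean and variance
sample_mean_x * exp(-sample_var_x / (2 * sample_mean_x^2))  # roughly 3.90, close to geometric_mean_x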

Relationship with the logarithm

Below I reformulate the geometric mean in terms of the logarithm (base 10). Recall the following properties of the logarithm, \(a^{\text{log}_a(b)}=b\), \(\text{log}_a(b^c) = c \times \text{log}_a(b)\) and \(\text{log}_a(\prod_{i=1}^n b_i) = \sum_{i=1}^n \text{log}_a(b_i)\), and suppose that we have a set of \(n\) data points \((x_1,x_2,\ldots,x_n)\). The geometric mean is:

\[\begin{align*} \left(\prod_{i=1}^n x_i\right)^\frac{1}{n} &= 10^{\text{log}_{10}\left(\left(\prod_{i=1}^n x_i\right)^\frac{1}{n}\right)} \\ &= 10^{\frac{1}{n} \text{log}_{10}(\prod_{i=1}^n x_i)} \\ &= 10^{\frac{1}{n}\sum_{i=1}^n \text{log}_{10}(x_i)} \end{align*}\]

We see that this reformulation is consistent with our simulated data:

geometric_mean_x
## [1] 3.900136
10^(mean(log(x, base = 10)))
## [1] 3.900136

What does this have to do with presenting modelled results?

Good question. As you may know, the reason we \(\text{log}_{10}\) transform the raw data is that it is often extremely positively skewed. Let’s have a look at the simulated data on the untransformed scale:

ggplot() + geom_density(aes(x = 10^x)) + xlab(TeX("$10^x$"))

Now, given this skewness, we prefer to model the data on the \(\text{log}_{10}\) transformed scale. This means that our modelled results (parameter estimates) are also on the \(\text{log}_{10}\) scale, which can be problematic given that most readers will be more familiar with the untransformed scale.

In my mind, the first set of results to present should be the posterior distributions of the mean parameters on the model scale (\(\text{log}_{10}\)). With this, one could directly interpret the distributions of the means (whether or not one makes the correct interpretation of the posterior distributions of the means is another story). Here is the posterior distribution for \(\hat{\mu}\) (these are pseudo posterior samples, used to avoid specifying a particular model; the exact model is unimportant here):

# pseudo posterior samples for mu, centred at the true value
mu_post <- rnorm(n = 10000, mean = mu, sd = 0.2)
ggplot() + 
  geom_density(aes(x = mu_post)) + 
  geom_vline(aes(xintercept = mu)) +
  xlab(TeX("$\\hat{\\mu}$"))

Great. As you can see, our posterior distribution is nice and close to the “true” \(\mu\) (the value used for data generation). Notice that the posterior distribution is on the model scale (\(\text{log}_{10}\)) as mentioned above. Readers will then naturally want to see the responses on the untransformed scale. Fair enough, but easier said than done. This is where the rubber hits the road for me. What, specifically, do they actually want to see?

What does the mean of the mean, mean?

I want to attempt to disentangle the interpretation of the posterior distribution of \(\hat{\mu}\). We have defined \(\mu\) to be the mean of the distribution of an immunological outcome. The posterior distribution of \(\hat{\mu}\) then captures the uncertainty that we have in what this value actually is (presumably it has some true value, doesn’t it?) conditional on the observed data. In theory, with infinite data, we would be able to precisely estimate what the true value of \(\mu\) is (in our simulated example the posterior distribution would be a large spike at \(\mu = 4\)).

Anyways, we do not have infinite data, so instead we summarise our posterior knowledge of \(\mu\) with the posterior mean and a credible interval of some sort (or by presenting the entire distribution as I did in the previous section). In this case, the posterior mean is the mean of the posterior distribution of \(\hat{\mu}\), which is itself the mean of the distribution of the immunological outcome. So there you go, that’s what the mean of the mean, means.
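For example, a simple summary of the pseudo posterior samples (the exact numbers will vary from run to run since no seed was set):

mean(mu_post)                       # posterior mean, roughly 4
quantile(mu_post, c(0.025, 0.975))  # equal-tailed 95% credible interval, roughly (3.6, 4.4)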

Non-linear transformations

Now, back to answer the question about what readers may be looking for by requesting results presented on the untransformed scale. To start to answer this question, let’s assume we have infinite data and have perfectly estimated \(\mu = 4\) (i.e., with no uncertainty). One may think that we could simply transform this value (i.e., \(10^4 = 10,000\)) to retrieve the mean on the untransformed scale. Unfortunately, this transformation is convex, so by Jensen’s inequality \(10^\mu = 10^{\text{E}[X]} < \text{E}[10^X]\) (in fact, in our example, \(\text{E}[10^X] \approx \frac{1}{n}\sum_{i=1}^n 10^{x_i} \approx 19,090 > 10,000\)), meaning that this solution would fail us.
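We can see the inequality directly in the simulated data:

mean(10^x)   # arithmetic mean on the untransformed scale, about 19,090 here
10^mean(x)   # back-transformed sample mean, about 8,670 (much smaller)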

So what does the transformation of \(\mu\) (i.e., \(10^4 = 10,000\)) really tell us? Well, as you may have guessed, this is the geometric mean of the data on the untransformed scale. Using the reformulated definition of the geometric mean above and defining the data on the untransformed scale as \(x_i^\ast=10^{x_i}\), the geometric mean of the data on the untransformed scale is:

\[\begin{align*} 10^{\frac{1}{n}\sum_{i=1}^n \text{log}_{10}(x_i^\ast)} &= 10^{\frac{1}{n}\sum_{i=1}^n \text{log}_{10}(10^{x_i})} \\ &= 10^{\frac{1}{n}\sum_{i=1}^n x_i} \\ &= 10^{\bar{x}} \approx 10^\mu \end{align*}\]
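A quick check on the simulated data (using the sample mean of \(x\) in place of \(\mu\)):

exp(mean(log(10^x)))  # geometric mean of the untransformed data
10^mean(x)            # identical by the derivation above, roughly 8,670 here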

Let’s have a look at the corresponding posterior distribution, obtained by transforming the posterior samples of \(\hat{\mu}\):

# posterior mean of the geometric mean on the untransformed scale
(model_geometric_mean_10mu <- mean(10^mu_post))
## [1] 11122.36
ggplot() + 
  geom_density(aes(x = 10^mu_post)) + 
  geom_vline(aes(xintercept = model_geometric_mean_10mu)) + 
  xlab(TeX("$10^{\\hat{\\mu}}$"))

Okay, so the transformed posterior distribution of \(\hat{\mu}\) is the posterior distribution of the geometric mean of the immunological outcomes. Great. Although, once again, given that I struggle to interpret the geometric mean itself, I do wonder how well the posterior distribution of the geometric mean will be interpreted once it is presented.

Should we present anything else?

I’m glad you asked. My personal preference would be to also present the posterior predictive distributions. I have a feeling, and I may be completely off base, that when we present posterior distributions, some readers interpret the distribution as a description of the possible values of the immunological outcomes. Of course, this would be incorrect, as those distributions describe our uncertainty about the mean of the immunological outcomes, not their full distribution. To get an idea of the latter, we would need to incorporate the estimated variability in the responses, hence the suggestion to present posterior predictive distributions! Let’s have a look:

# posterior predictive draws: one new observation per posterior draw of mu
# (only the first N posterior draws are used here; sd = 0.5 matches the data-generating sd)
x_pred <- rnorm(n = N, mean = mu_post, sd = 0.5)
(model_mean_x_pred <- mean(x_pred))
## [1] 3.990711
ggplot() + 
  geom_density(aes(x = x_pred)) + 
  geom_vline(aes(xintercept = model_mean_x_pred)) +
  xlab(TeX("$x_{pred}$"))

This looks extremely similar to our raw data, but that shouldn’t be surprising. Nevertheless, this describes how one could predict what a future immunological outcome may look like. Again, it may be more informative to have this on the untransformed scale. However, given that we are now working with predicted values and no longer the mean \(\mu\), we may freely transform each prediction (and only then summarise) without running into the arithmetic mean vs geometric mean problem. See below:

# arithmetic mean of the predictions on the untransformed scale
(model_mean_10x_pred <- mean(10^x_pred))
## [1] 23237.2
ggplot() + 
  geom_density(aes(x = 10^x_pred)) + 
  geom_vline(aes(xintercept = model_mean_10x_pred)) +
  xlab(TeX("$10^{x_{pred}}$"))

In this case, the mean we computed truly is the arithmetic mean, because we computed it directly on the (predicted) data, and therefore its interpretation is natural. This predicted distribution answers the question that I think is of primary interest: how we could expect future immunological outcomes to behave.

Conclusion

We saw above that if we transform the posterior distribution of the mean from the model scale to the untransformed scale, then its interpretation changes to that of the geometric mean (which, once again, I find confusing). Alternatively, we could present the posterior predictive distributions on either the model scale or the untransformed scale to aid in the interpretation of predictions.

Bonus

The night after writing this document I couldn’t sleep. I thought to myself, you know, it’s great that we have the geometric mean of the distribution of immunological outcomes, but I really want the mean! There must be a way to compute the posterior distribution of the mean on the untransformed scale. As it turns out, there is!

Mathematically, I want the following:

\[\begin{align*} \text{E}[10^X] &= \int^\infty_{-\infty}10^x \times \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{1}{2\sigma^2}(x-\mu)^2} dx \end{align*}\]

This expression is the mean of a log-normal distribution specified in base 10. As I later found out, the familiar closed form \(e^{\mu + \sigma^2/2}\) applies directly only when the model scale uses the natural logarithm (curse the scientists that decided \(\text{log}_{10}\) was a good idea!); with a base-10 model scale, a change of base gives \(\text{E}[10^X] = 10^{\mu}\,e^{(\sigma \ln 10)^2/2}\). Anyways, fear not, for transformations without a convenient closed form we can always use numerical approximation (e.g., quadrature methods) to estimate this integral. See below:

# integrand for E[10^X] when X ~ Normal(mu, 0.5^2) on the log10 scale
integrand <- function(mu){
  f <- function(x) 10^x*(1/(0.5*sqrt(2*pi)))*exp(-((x - mu)^2)/(2*0.5^2))
  return(f)
}

# integrate numerically over a range wide enough to cover essentially all of the mass
integrand_eval <- function(mu) pracma::quad(integrand(mu), xa = 0, xb = 20)

Now, let’s test it out assuming we knew \(\mu = 4\) and compare it against a brute-force Monte Carlo estimate of the mean:

integrand_eval(mu = mu)
## [1] 19400.96
mean(10^rnorm(1000000, mean = mu, sd = sqrt(sigma2)))
## [1] 19374.55

Looks like it works!
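As a further check, the closed form from the change of base noted above gives essentially the same number:

10^mu * exp(sigma2 * log(10)^2 / 2)  # about 19,401, matching the quadrature result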

Okay, so we now have a function that, for any given value of \(\mu\), computes the approximate mean of the associated distribution on the untransformed (immunological outcome) scale. Given that we have a posterior distribution for \(\mu\), we can capture the uncertainty in the estimation by simply passing the full posterior through the function. Let’s see:

# run the quadrature for every posterior draw (this can take a little while)
mu_post_t <- sapply(mu_post, function(m) integrand_eval(mu = m))
(mean_mu_post_t <- mean(mu_post_t))
## [1] 21578.45
ggplot() + 
  geom_density(aes(x = mu_post_t)) + 
  geom_vline(aes(xintercept = mean_mu_post_t)) +
  xlab(TeX("$E[10^X]$"))

There you go, we have the posterior distribution of the arithmetic mean immunological outcome on the untransformed scale!
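Incidentally, the closed form above also makes explicit how this posterior relates to the geometric-mean posterior from earlier: in this normal/\(\text{log}_{10}\) setting, the arithmetic mean is simply the geometric mean scaled by the constant \(e^{(\sigma \ln 10)^2/2} \approx 1.94\), so (up to quadrature error):

head(mu_post_t / 10^mu_post)  # each ratio is approximately exp((0.5 * log(10))^2 / 2), about 1.94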

The general case

Let’s formalise this idea for the general case. Suppose we have a random variable \(X\) and we are interested in making inference on the mean (i.e., \(\text{E}[X]\)). We collect data \(\boldsymbol{x}\) but, and this is the key, we transform the data via some well-behaved non-linear function \(Y = g(X)\) to a more convenient scale for analysis (i.e., \(\boldsymbol{y} = g(\boldsymbol{x})\)). We specify a likelihood function \(f(\boldsymbol{y}|\boldsymbol{\beta})\), where \(\boldsymbol{\beta}\) is our unknown parameter vector, along with some prior distribution and generate the posterior distribution \(\boldsymbol{\beta} | \boldsymbol{y}\).

The correct way to derive the posterior distribution of \(\text{E}[X]\) is to recognise that \(\text{E}[X] = \text{E}[g^{-1}(Y)] = \int_Y g^{-1}(y)f(y|\boldsymbol{\beta})dy\). Clearly, \(\text{E}[X]\) is a function of the unknown parameters \(\boldsymbol{\beta}\), but it is quite possibly very complicated and may not have an analytical form (although we can always use numerical approximation). Now, all we need to do is push the posterior distribution of \(\boldsymbol{\beta}\) through this expression to recover the posterior distribution of \(\text{E}[X]\).
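To make the recipe concrete, here is a minimal sketch in R, assuming a normal likelihood on the transformed scale with a known residual standard deviation; the function name and arguments are placeholders of my own choosing:

# posterior of E[X] for a transformation Y = g(X) with a normal likelihood on the Y scale
# beta_post: posterior draws of the mean on the transformed scale
# g_inv:     the inverse transformation g^{-1}
# sd:        residual standard deviation on the transformed scale (assumed known)
posterior_mean_original_scale <- function(beta_post, g_inv, sd, lower, upper) {
  sapply(beta_post, function(b) {
    integrand <- function(y) g_inv(y) * dnorm(y, mean = b, sd = sd)
    integrate(integrand, lower = lower, upper = upper)$value
  })
}

# reproduces the log10 example above, where g^{-1}(y) = 10^y
ex_post <- posterior_mean_original_scale(mu_post, g_inv = function(y) 10^y,
                                         sd = 0.5, lower = 0, upper = 20)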