Benchmarking my perception of sound
15 December 2024
Learning about Bayesian statistics and the Jeffreys prior
I’ve been learning about Bayesian statistics and computational methods recently, so I want to work on a few small projects to put what I have been learning into practice.
Today I decided to benchmark my perception of sound: I created a game where I’m played a random tone and I have to respond with a number that is as close as possible to the frequency that was played. I sample frequencies uniformly at random between 131Hz and 494Hz.
I wanted to do this experiment as I came up with two models for how I’d perform at this game. If we let $r$ be the frequency I report and $f$ be the frequency that I actually heard, my first model is that I will give the frequency that I heard plus some normally distributed error term, that is

$$r = f + \varepsilon$$
where $\varepsilon \sim \mathcal{N}(\mu, \sigma^2)$. If this is the case, I’d expect $\mu$ to be zero as I don’t expect to have a consistent bias towards guessing higher or lower than the actual frequency.
My second model is motivated by the fact that human pitch perception is inherently logarithmic and musical intervals are actually frequency ratios. For example, playing C3 then E3 sounds the same as playing C4 then E4, just an octave higher; however, the difference between C3 and E3 is about 35Hz while the difference between C4 and E4 is about 70Hz. In fact, an octave is a doubling of the frequency, not a fixed difference in Hz. My second model hypothesises that the difference between the logarithms of the frequencies, measured in “cents”, is normally distributed:

$$\log r = \log f + \varepsilon$$
where $\varepsilon \sim \mathcal{N}(\mu, \sigma^2)$ is again a normally distributed error term.
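For reference, a “cent” is conventionally one hundredth of a semitone, so the interval between two frequencies $f_1$ and $f_2$ measured in cents is

$$c = 1200 \log_2\left(\frac{f_2}{f_1}\right) = \frac{1200}{\ln 2}\left(\ln f_2 - \ln f_1\right)$$

which is just a rescaled difference of logarithms, so saying the error in cents is normally distributed is the same as saying the error in log frequency is.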
I decided to test these two models as I hypothesised that my absolute error in frequency would become worse for higher tones, which is a consequence of the second model but not the first. This came about because, earlier this week, I began teaching myself to report numerical frequencies as opposed to musical note names: I wanted to teach myself to respond with “440 Hertz” as opposed to “A4” when hearing a tone played at 440Hz.
I quickly created a Python script to collect this data. It looks like this
import random

import pysine

# Frequency range: roughly C3 (131Hz) to B4 (494Hz).
LOWEST = 131
HIGHEST = 494

should_continue = True
played = []
heard = []
iteration = 0

try:
    while should_continue:
        iteration += 1
        # Sample a frequency uniformly at random from the range.
        tone = LOWEST + random.random() * (HIGHEST - LOWEST)
        read = None
        while read is None:
            pysine.sine(tone)
            entered = input(f"{iteration}>> ").lower().strip()
            if entered == "q":  # quit the game
                should_continue = False
                break
            if entered == "r":  # replay the tone
                continue
            try:
                read = float(entered)
            except ValueError:
                print("Unable to read number")
        if read is None:
            break  # quit before a valid answer was entered
        print(f"Played {tone}, you heard {read}")
        played.append(tone)
        heard.append(read)
except (KeyboardInterrupt, EOFError):
    pass

# Dump the collected data as CSV.
print("tone,heard")
for x, y in zip(played, heard):
    print(f"{x},{y}")
Here are the results from my first playthrough of the game, where I answered 20 questions.
At a glance, both of these models seem fairly sensible: my reported frequency appears to closely track the frequency that I actually heard. The error plot for my first game looks like this
It’s hard to get any insights from this by just analysing the data visually, so I turn to statistics to compare how well these models explain the data.
The first thing I do is play another round of the game, but this time I listen to 50 tones instead of just 20. The sine waves were also pretty boring to listen to, so I synthesised my own instrument. However, on a quick runthrough of the game with the synthesiser I made, I noticed a clear upwards bias in the frequencies I reported. I reverted to using sine waves and the bias disappeared, which indicates it was an artefact of how I wrote my synthesiser. I added harmonics but not subharmonics to my synth, so this may have caused me to perceive the notes from the synthesiser as being higher than they actually were. That’s the euphemistic way of explaining that I wrote a bad synth.
After going back to using sine waves, the data I’m analysing looks like this
The error looks like this
Now, the absolute error looks like this
This final graph is important as it shows that the error term is clearly heteroskedastic, which means that the variance of the error term is not constant: the error term has much larger variance for higher frequencies.
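One quick way to put a number on this impression is to check that the absolute error correlates positively with the played frequency. A minimal sketch, assuming the game’s CSV output has been saved as results.csv:

import numpy as np
import pandas as pd

# Load the CSV printed by the game script (hypothetical filename).
df = pd.read_csv("results.csv")

# Under heteroskedasticity that grows with frequency, the absolute
# error should correlate positively with the played tone.
abs_err = (df["heard"] - df["tone"]).abs()
print(np.corrcoef(df["tone"], abs_err)[0, 1])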
This matters because the first model predicts a homoskedastic residual term, which does not match the data. My second model predicts heteroskedasticity in frequency but also predicts that the error in log frequency is homoskedastic. The plot for the absolute error between log frequencies looks like this
Visually, this is homoskedastic, which is really good for the second model and is strong evidence that the second model better explains the data. Now, let’s quantitatively examine these models using Bayesian statistics.
Bayesian statistics centres itself around Bayes’ rule, which states that

$$p(\theta \mid x) \propto p(\theta)\, p(x \mid \theta)$$

or that the likelihood of $\theta$ after having observed $x$ is proportional to the prior likelihood of $\theta$ multiplied by the probability of observing $x$ with parameter $\theta$.
What does this mean? If we have a distribution representing our beliefs about the value of $\theta$, Bayes’ rule tells us how to update our beliefs after observing new data. The distribution from before observing the data is called our “prior” and the updated distribution after observing the data is called the “posterior”.
If we fix $\mu = 0$ for both models then both models become single-parameter models as we have $\theta = \sigma$. What is my prior distribution for $\sigma$?
For the sake of learning I’d like to use an uninformative prior: specifically, the Jeffreys prior, named after Harold Jeffreys.
The Jeffreys prior gives us a prior distribution that is invariant under monotone transformations of the parameter. For example, if I took a uniform distribution as my prior for $\sigma$ between 0 and 1 then the implied distribution of $\sigma^2$ would also be between 0 and 1 but it would not be uniform. We can contrast this with the Jeffreys prior, where the Jeffreys prior for $\sigma$ aligns with the Jeffreys prior for $\sigma^2$.
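A quick simulation makes the uniform-prior example concrete; this sketch shows that a uniform prior on $\sigma$ over $(0, 1)$ does not push forward to a uniform prior on $\sigma^2$:

import numpy as np

rng = np.random.default_rng(0)
sigma = rng.uniform(0.0, 1.0, 1_000_000)  # uniform prior draws for sigma

# If sigma^2 were uniform on (0, 1) this would print roughly 0.25,
# but the pushforward of the uniform prior on sigma gives about 0.5.
print(np.mean(sigma**2 < 0.25))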
The Jeffreys prior is specified as proportional to the square root of the determinant of the Fisher information matrix:

$$p(\theta) \propto \sqrt{\det \mathcal{I}(\theta)}$$
Our normal distribution with mean 0 looks like

$$p(x \mid \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{x^2}{2\sigma^2}\right)$$
We have a single-parameter model so we get

$$\mathcal{I}(\sigma) = -\mathbb{E}\left[\frac{\partial^2}{\partial \sigma^2} \log p(x \mid \sigma)\right] = \frac{2}{\sigma^2}, \qquad p(\sigma) \propto \sqrt{\frac{2}{\sigma^2}} \propto \frac{1}{\sigma}$$
Unfortunately, this is an improper prior: its integral across the support is not finite, so it does not form a true probability distribution. I’m too scared to use an improper prior after reading about how they can lead to nonsense inference results. Analysing whether or not they’re appropriate to use appears to be very complicated so I’ll stay away from improper priors for a few years.
Fortunately, I can create my own uninformative prior. I want to feign ignorance about both the variance and the standard deviation, so I’d like a prior that gives me transformational invariance between $\sigma$ and $\sigma^2$. I can get this with any pair of distributions with CDFs $F_\sigma$ and $F_{\sigma^2}$ that satisfy

$$F_\sigma(t) = F_{\sigma^2}(t^2)$$

So we now have

$$\int_a^t f_\sigma(s)\,\mathrm{d}s = \int_{a^2}^{t^2} f_{\sigma^2}(s)\,\mathrm{d}s$$

and differentiating both sides with respect to $t$ gives the functional equation

$$f_\sigma(t) = 2t\, f_{\sigma^2}(t^2)$$

where $F_\sigma$ denotes the CDF for $\sigma$ and $a^2$ lower bounds the variance in this distribution. Unfortunately, I do not want to solve this functional equation so I’ll just use the Jeffreys prior.
Using the Jeffreys prior for the linear model gives me the following posterior distribution

$$p(\sigma \mid e_1, \ldots, e_n) \propto \frac{1}{\sigma^{n+1}} \exp\left(-\frac{\sum_{i=1}^{n} e_i^2}{2\sigma^2}\right)$$

where $e_i = r_i - f_i$ are the observed errors in frequency.
The Jeffreys prior for the log-linear model gives me the following posterior distribution

$$p(\sigma \mid \ell_1, \ldots, \ell_n) \propto \frac{1}{\sigma^{n+1}} \exp\left(-\frac{\sum_{i=1}^{n} \ell_i^2}{2\sigma^2}\right)$$

where $\ell_i = \log r_i - \log f_i$ are the observed errors in log frequency.
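Since each model has only a single parameter, these posteriors are easy to evaluate numerically on a grid. A sketch for the log-linear model, again assuming results.csv (the grid bounds are arbitrary choices wide enough to cover the posterior mass):

import numpy as np
import pandas as pd

df = pd.read_csv("results.csv")  # columns: tone, heard
errors = np.log(df["heard"]) - np.log(df["tone"])
n = len(errors)

# Unnormalised log posterior: Jeffreys prior 1/sigma times the normal
# likelihood of the observed log-frequency errors.
sigmas = np.linspace(1e-4, 0.5, 10_000)
log_post = (-np.log(sigmas)
            - n * np.log(sigmas * np.sqrt(2.0 * np.pi))
            - np.sum(errors**2) / (2.0 * sigmas**2))

# Exponentiate stably and normalise with a simple Riemann sum.
post = np.exp(log_post - log_post.max())
post /= post.sum() * (sigmas[1] - sigmas[0])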
After taking the MLE of $\sigma$ for each of these models I can find a point estimate for the Bayes factor between the two models. The Bayes factor allows me to quantify how much better one model explains the data than the other.
My R code for calculating the likelihoods for each of the models looks like this
# MLE for sigma (mean fixed at zero) is the root mean square of the residuals
sigma_linear <- sqrt(mean((df$heard - df$tone)^2))
sigma_log <- sqrt(mean((log(df$heard) - log(df$tone))^2))
likelihood_linear <- prod(dnorm(df$heard - df$tone, mean=0, sd=sigma_linear))
likelihood_log <- prod(dnorm(log(df$heard) - log(df$tone), mean=0, sd=sigma_log))
likelihood_log / likelihood_linear
The likelihood for the first model is about , whereas the likelihood for the second model comes out to be around . This gives us a ratio of about , and numerically integrating over the posterior distributions to approximate the true Bayes factor gives me about the same result, which is a decisive victory for the log-linear model.
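For completeness, here is a sketch of that numerical integration, assuming results.csv as before. The Jeffreys prior is improper, so in this sketch I truncate it to a finite interval and normalise it there; using the same interval for both models means the normalising constants cancel in the ratio:

import numpy as np
import pandas as pd

df = pd.read_csv("results.csv")  # columns: tone, heard

def log_marginal_likelihood(errors, lo, hi, n_grid=1_000_000):
    # Jeffreys prior truncated to [lo, hi] and normalised there.
    sigmas = np.linspace(lo, hi, n_grid)
    log_prior = -np.log(sigmas) - np.log(np.log(hi / lo))
    log_lik = (-len(errors) * np.log(sigmas * np.sqrt(2.0 * np.pi))
               - np.sum(errors**2) / (2.0 * sigmas**2))
    log_joint = log_prior + log_lik
    m = log_joint.max()
    # log of the integral over sigma via a stabilised Riemann sum
    return m + np.log(np.exp(log_joint - m).sum() * (sigmas[1] - sigmas[0]))

lo, hi = 1e-3, 100.0  # same truncation for both models
log_bf = (log_marginal_likelihood(np.log(df["heard"]) - np.log(df["tone"]), lo, hi)
          - log_marginal_likelihood(df["heard"] - df["tone"], lo, hi))
print(np.exp(log_bf))  # Bayes factor in favour of the log-linear model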
I learned a lot through this project; I really enjoyed surveying Bayesian statistics while doing the background reading, and it was super fun all around.