Design Approach | Aca-coustics: Analyzing vowel formants with waveforms

Equations

I plan to use basic statistics to characterize the distribution of frequencies for each formant. For example, if I analyze each member of my group singing a particular vowel, I can collect the frequency of each formant and build a distribution to find the mean frequency of each formant for that vowel. Since men and women have slightly different formants, I will also likely need to calculate the differences to see whether blend among women differs from blend among men.

I will likely use these equations, which I’ve sourced from http://integral-table.com/downloads/stats.pdf:

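Mean: $\mu = \frac{1}{n} \sum_{i=1}^{n} x_i$

Std. Deviation: $s_x = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$

z-score: $z = \frac{x - \mu}{\sigma}$

Correlation: $r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$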
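
As a rough sketch of how these four formulas could be computed in Python, the snippet below uses only the standard library. The function names and the two short lists of formant values are hypothetical placeholders chosen for illustration; they are not measurements from my group.

```python
import math


def mean(xs):
    """Arithmetic mean: (1/n) * sum of x_i."""
    return sum(xs) / len(xs)


def sample_std(xs):
    """Sample standard deviation with the 1/(n-1) normalization."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma


def correlation(xs, ys):
    """Pearson correlation r between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sx, sy = sample_std(xs), sample_std(ys)
    n = len(xs)
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / (n - 1)


# Placeholder formant values in Hz (made up for illustration only).
f1_hz = [700.0, 720.0, 740.0, 760.0, 780.0]
f2_hz = [1100.0, 1120.0, 1150.0, 1160.0, 1200.0]

f1_mean = mean(f1_hz)
f1_std = sample_std(f1_hz)
print("F1 mean:", f1_mean)
print("F1 std dev:", f1_std)
print("z-score of the highest F1:", z_score(max(f1_hz), f1_mean, f1_std))
print("F1/F2 correlation:", correlation(f1_hz, f2_hz))
```

With real recordings, each list would hold one formant measurement per singer for the same vowel, and a separate pair of lists could be kept for the men and the women.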

Tools

Praat

To analyze the frequencies present when a speaker says or sings a certain vowel, I will be using a program called Praat. Developed at the University of Amsterdam, this software lets a user record a vocal sound and then analyze the frequencies it contains. I plan to use the view that shows a waveform together with a spectrogram. The spectrogram not only shows dark bands at the formants, but Praat can also superimpose the estimated formant tracks so the user can better visualize how each formant changes over time. There is also a feature that lets the user segment and label the speech signal by phoneme, which produces readily understandable images to present with the data.

[Screenshot: analyzing a word in Praat]

Audacity

I plan to further analyze the data I collect in Audacity. Audacity allows more intricate audio editing than Praat, so I will likely use it for filtering and for trying to shift formants in order to fix the blend in audio segments.

Python

This part is still in the works, but I hope to have a component of my project that takes in audio samples and returns which vowel is being produced. If that is too difficult or not applicable enough to my project, I hope to find another way to use Python to analyze formants and calculate blend, so that a user who knows nothing about formants or the science of waves can still analyze blend. This will likely be in a Jupyter notebook, since a notebook can show code, plots, and explanations side by side.
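
One simple way the vowel-identification step might work is a nearest-neighbor lookup on the first two formants. The sketch below is only an illustration under stated assumptions: the function name `classify_vowel`, the vowel labels, and the reference table are hypothetical, and the reference values are approximate averages for adult male speakers of the kind quoted in phonetics textbooks. Real singers, and women in particular, would need their own reference values.

```python
import math

# Approximate (F1, F2) averages in Hz for adult male speakers.
# These are assumed reference values for illustration, not my own data.
REFERENCE_FORMANTS_HZ = {
    "ee": (270.0, 2290.0),  # as in "beet"
    "eh": (530.0, 1840.0),  # as in "bet"
    "ah": (730.0, 1090.0),  # as in "father"
    "aw": (570.0, 840.0),   # as in "bought"
    "oo": (300.0, 870.0),   # as in "boot"
}


def classify_vowel(f1, f2, reference=REFERENCE_FORMANTS_HZ):
    """Return the label whose reference (F1, F2) is closest to the input.

    Distance is measured on log-frequency ratios, so a 10% error in F1
    counts the same as a 10% error in F2.
    """
    def distance(label):
        ref_f1, ref_f2 = reference[label]
        return math.hypot(math.log(f1 / ref_f1), math.log(f2 / ref_f2))

    return min(reference, key=distance)


# Hypothetical measurement: F1 = 710 Hz, F2 = 1100 Hz.
print(classify_vowel(710.0, 1100.0))
```

The formant values passed in would come from Praat or a similar tool; this sketch only covers the final lookup, not the formant extraction itself.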

7 thoughts on “Design Approach”

Make sure you look at the page that we created based on Jim’s speech lecture. He has good tools there that focus on speech that may give you some further ideas on what to analyze: https://canvas.harvard.edu/courses/34558/pages/links-from-jims-speech-lesson.
Though Praat sounds amazing! I looked at a tool called Sonic Visualiser that has a lot of plugins that may be of interest to you as well, take a look here: https://www.vamp-plugins.org/download.html

For signal processing we look at things like (auto) correlation and power spectrum to determine the distribution, if you wish, of different frequencies. Let’s talk some more about this and also talk to Jim who is most knowledgeable in speech in our group.

I’m wondering what specific kinds of data you plan to analyze. Also, in analyzing the mean frequency for men/women of a specific vowel, will you need to record each person separately, or will the software be able to capture individuals’ voices? Also, I understand the concept of blend in a cappella; it’s the same in classical music. However, is there a way you plan to quantify blend, or do you rather plan to correlate it with a specific trend in the frequency content you find? Both ways seem really interesting.

ogphillips on April 24, 2018 at 7:41 pm said:

Great questions! I was going to consider more of a relative correlation in frequencies and how individuals’ formants matched up when blending. However, I’ve since altered my project a bit, which I will be updating shortly. My project will now be more focused on generating a response based on an audio input. When a user sings a vowel, the program will be able to tell which vowel it is, and hopefully which note is sung as well. Thus, the data I’ll be analyzing will be the frequency of the different formants of the inputted audio as well as the fundamental in order to identify the note sung.
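
As a rough sketch of the note-identification part, the fundamental could be estimated with autocorrelation and then mapped to the nearest equal-tempered note. Everything here is an assumption for illustration: the function names, the 80 to 500 Hz search range, and the synthetic test tone standing in for a recorded vowel.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def estimate_fundamental(samples, sample_rate, fmin=80.0, fmax=500.0):
    """Estimate the fundamental by finding the lag with the largest autocorrelation."""
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Autocorrelation at this lag: sum of x[i] * x[i + lag].
        value = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag


def note_name(freq_hz, a4_hz=440.0):
    """Map a frequency to the nearest equal-tempered note name, e.g. 'A4'."""
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"


# Synthetic stand-in for a sung note: 220 Hz plus one overtone, 50 ms at 8 kHz.
sample_rate = 8000
samples = [
    math.sin(2 * math.pi * 220 * n / sample_rate)
    + 0.5 * math.sin(2 * math.pi * 440 * n / sample_rate)
    for n in range(400)
]

f0 = estimate_fundamental(samples, sample_rate)
print(round(f0, 1), "Hz is closest to", note_name(f0))
```

Because the lag is a whole number of samples, the estimate is only as fine as the sample rate allows; a real recording would also need a longer window and some handling of silence and noise.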

Praat looks really cool — would you be able to integrate it with Python at all, or is it more of a stand-alone program?

Also curious what sorts of audio processing you intend to use Audacity for — are you going to try signal processing to actually enhance ‘blend’ as opposed to getting singers to sing differently (or try to digitally replicate the change in sound from a different method of singing)?

ogphillips on April 25, 2018 at 11:44 pm said:

Yes! I’ve actually found some code on GitHub to assist me in pulling the formants from the files I record in Praat. I’m excited about this, because I really enjoy working with Praat.

I was previously considering how to alter audio signals in order to make them sound more like a particular vowel, but I’ve since updated my project a bit. I will shortly be updating my website to reflect these changes, but in short I plan on using a combination of Praat and Python to extract formants from vowels and report to the user which vowel the audio signal contains.

Salma on April 30, 2018 at 1:18 pm said:

This is great. As you discuss the goals of your analysis, can you bring them back to your original interest, namely how this relates to your a cappella group and that type of performance?

Olivia, this looks super interesting! I was wondering, does Praat account for differences in accents or general enunciation that may exist between different singers singing the same words? Is that part of what you’ll be studying, or will it make it harder to study what you intend on looking at?
