Project Update – April 25th

A New Direction

Since my first post, I have further narrowed down what I hope to accomplish in my project. The following is an update on what my new goal is and how I will be working to accomplish that goal.


My new goal is to create a user-friendly program that analyzes vowel formants in an audio signal and shows the user the IPA symbol of the vowel being produced. The program will consider the most commonly found vowels in English: /i/, /ɪ/, /ɜ/, /ɝ/, /æ/, /u/, /o/, /ʌ/, /ə/, /ɚ/, and /ɑ/. If possible, I would love to also be able to analyze the English diphthongs: /eɪ/, /aɪ/, /oʊ/, /ɔɪ/, and /aʊ/. The diphthongs will be more challenging, since the vowel formants change partway through these sounds.

The program should also present the fundamental frequency of the sung or spoken vowel. That way, a singer can see on the screen both the vowel they’re singing and the pitch of that vowel.
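As a rough illustration of how the pitch readout could work, here is a minimal sketch that estimates the fundamental frequency by finding the lag at which a signal best matches a shifted copy of itself (autocorrelation). This is not what Praat does internally, and the function name `estimate_f0` and the search range `fmin`/`fmax` are my own assumptions for the example.

```python
import math

def estimate_f0(samples, sample_rate, fmin=75.0, fmax=1000.0):
    """Estimate the fundamental frequency in Hz from the autocorrelation peak.

    fmin and fmax are assumed bounds on the pitch of a sung or spoken vowel.
    """
    # Remove any constant offset so it does not dominate the correlation.
    mean = sum(samples) / len(samples)
    x = [s - mean for s in samples]

    # A pitch of f Hz repeats every sample_rate / f samples.
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(x) - 1)

    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Correlate the signal with a copy of itself shifted by `lag` samples.
        score = sum(x[i] * x[i + lag] for i in range(len(x) - lag))
        if score > best_score:
            best_lag, best_score = lag, score

    return sample_rate / best_lag if best_lag else None

# Synthetic test tone standing in for a recorded vowel: 200 Hz at 8 kHz sampling.
rate = 8000
tone = [math.sin(2 * math.pi * 200 * n / rate) for n in range(1024)]
f0 = estimate_f0(tone, rate)
```

Real voices are noisier than a pure tone, so a simple estimate like this would probably need smoothing over several short windows before it is shown to a singer.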


I’ve been studying phonetics and phonology this semester, and the tools and methods I’ve learned will be very helpful in this project. Acoustic phonetics studies the acoustic signal of speech sounds, and one area of it considers the formants produced when speaking different vowels. When a speaker produces a sound, the fundamental frequency gives the pitch that we hear; for example, this is the pitch a singer sings to match the intonation of a song. The peaks in loudness at other frequencies are the formants of the vowel sound, and the frequencies at which these formants occur determine the quality of the vowel. The most important formants to consider are the first and second, which correlate with where in the vowel space the sound is being produced and therefore correspond to particular vowels. The third formant mostly corresponds to the rhoticity of a vowel. These first three formants, especially the first and second, will be the main parameters I will look at to distinguish the vowels. More information about this can be found here:

Updated Methods and Tools


I will be using Praat to extract the formants from the audio recordings. Praat allows a user to record and/or upload audio, and from this audio, the program analyzes the acoustic signal and creates a spectrogram showing the formants. One feature of the program even superimposes lines to highlight where the formants are, and using tools in Praat one can read off the numeric frequency value of each formant. After recording vowel sounds, I will use these formant values to set the cutoffs in my program that tell the user which vowel is being produced.

Python/Jupyter Notebooks

I will use a Jupyter notebook to create an accessible interface where a user can upload their audio file and easily find out the IPA representation and fundamental frequency of that vowel. The Jupyter interface should make the program easier to use for people unfamiliar with Python.

I plan on coding the necessary functions in Python. These functions will compare the measured formant frequencies against a dictionary of reference frequency values and tell the user which phoneme is being produced.
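Here is a minimal sketch of what that dictionary lookup could look like, assuming the first two formants (F1 and F2) have already been measured in Hz. The name `classify_vowel` is hypothetical, and the reference values in the table are rough placeholders of the order usually quoted for adult male speakers, not my measurements. I would replace them with the values I collect in Praat.

```python
import math

# Placeholder (F1, F2) targets in Hz. These are illustrative values only
# and should be replaced with frequencies measured from real recordings.
VOWEL_FORMANTS = {
    "i": (270, 2290),
    "ɪ": (390, 1990),
    "æ": (660, 1720),
    "ɑ": (730, 1090),
    "ʌ": (640, 1190),
    "u": (300, 870),
}

def classify_vowel(f1, f2, table=VOWEL_FORMANTS):
    """Return the IPA symbol whose (F1, F2) target is closest to the input."""
    # Straight-line distance in the F1/F2 plane; the nearest target wins.
    return min(table, key=lambda vowel: math.dist((f1, f2), table[vowel]))

# A measurement with low F1 and high F2 lands nearest the /i/ target.
print(classify_vowel(280, 2250))
```

Picking the nearest target avoids hand-writing a cutoff for every pair of vowels, but it treats a 100 Hz difference in F1 the same as a 100 Hz difference in F2, which may not match how the vowels are actually distributed.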


In order to extract the formants from the data in Praat, I plan on using code from GitHub.

The above program interacts with Praat in order to find the formants at a particular point or interval in time.

Design Approach


I plan on using statistical equations to describe the distribution of frequencies for each formant. For example, if I analyze each member of my group singing a particular vowel, I can collect the data for each formant and create a distribution to find the mean frequency of each formant of that vowel. Since men and women have slightly different formant frequencies, I will also likely need to calculate the differences to see if blend among women differs from blend among men.

I will likely use these equations, which I’ve sourced from

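Mean:          $\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$

Std. Deviation:          $s_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$

z-score:          $z = \frac{x - \mu}{\sigma}$

Correlation:          $r = \frac{1}{n-1}\sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$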
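To check my understanding of these four formulas, here is a small sketch of them in plain Python. The function names and the sample first-formant measurements are made up for illustration. They are not real data from my group.

```python
import math

def mean(xs):
    """Arithmetic mean: the sum of the values divided by n."""
    return sum(xs) / len(xs)

def sample_std(xs):
    """Sample standard deviation, dividing by n - 1."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

def z_score(x, center, spread):
    """How many standard deviations x lies from the center."""
    return (x - center) / spread

def correlation(xs, ys):
    """Pearson correlation r computed from standardized values."""
    mx, my = mean(xs), mean(ys)
    sx, sy = sample_std(xs), sample_std(ys)
    total = sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

# Hypothetical F1 measurements (Hz) of five singers on the same vowel.
f1_values = [300, 280, 320, 310, 290]
center = mean(f1_values)
spread = sample_std(f1_values)

# A singer whose F1 is 330 Hz sits this many standard deviations away.
z = z_score(330, center, spread)
```

A large z-score for one singer's formant could be one way to flag a vowel that is sticking out from the rest of the group.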




In order to analyze the different frequencies present when a speaker says or sings a certain vowel, I will be using a program called Praat. Developed in Europe, this software allows a user to record a vocal sound and then analyze the different frequencies produced. I plan to use the function that creates a spectrogram and a waveform. It not only shows dark bands for the different formants, but can also superimpose a line of best fit so the user can better visualize each formant’s change over time. There is also a feature that allows the user to break up the speech signal into phonemes, which creates readily understandable images to be presented with the data.

Here’s a screenshot of analyzing a word in Praat:


I plan on further analyzing the data I collect using Audacity. Audacity also allows for more intricate audio editing than Praat, so I will likely use it for filtering and trying to change formants in order to fix the blend on audio segments.


Things are still in the works, but I hope to have a component of my project that takes in audio samples and returns which vowel is being produced. If that proves too difficult or not applicable enough to my project, I hope to find another way to use Python to analyze formants and calculate blend, so that a user who knows nothing about formants or the science of waves can still analyze blend. This will likely be in a Jupyter notebook, since it is approachable for users unfamiliar with Python.

Introduction and Background

My project will be considering the science behind a cappella. I will be studying the waveforms of different voices individually and combined in order to consider the effects of frequency and vowel/phoneme quality on blend. The motivation for this project comes from experience in my a cappella group, Under Construction, in combination with my experience in my phonetics and phonology course. 

When my group rehearses our music, one of the most challenging things to do (besides staying in key) is blending with the other members of the group. Even when we’re all singing the correct notes, if someone’s vowel differs from the rest, they stick out, and the piece sounds bad. We need to blend on high notes as well as low notes: basses need to be able to blend with sopranos. Similarly, we need to sound uniform when we’re singing the same vowels/phones, but I wonder if some phonemic sounds are more conducive to blend based solely on the phonetic properties present in their waveforms.

I plan to complete this analysis by first recording members of my a cappella group in Audacity. I will then import files into a phonetics program called Praat, in which I will create graphs of and analyze the formants present in different sounds and how these differ when there are multiple people singing, when those people are or are not attempting to blend, etc.

Some research has been done to scientifically study choral blend. I found an article in the Journal of Research in Music Education called “An Acoustical Study of Individual Voices in Choral Blend” that talks about how singers’ formants differ when they’re singing a solo piece versus trying to blend with a choir.

Here is a link to my a cappella group performing at a concert at Yale in the hall. This example is especially interesting because we’d never performed in this space before. The room acoustics made it difficult to hear one another, which likely had a huge effect on our blend.

Hopefully through this project, I can analyze video/audio clips like the above and figure out how to have better blend in a cappella music by going down to the level of the waves to understand the music better.