Project Update – May 5th, 2018

What I’ve Been Working On

Since my last blog post, I spent a few hours trying to work out the code for my project. I searched GitHub and found several users who had written code to integrate parts of Praat with Python. However, some of the best examples were written for an older version of Python and had not been updated. I also began to realize that even once my code was working, vowel formants differ so much between speakers that I would have to tailor the program to my own voice, because supporting any user would be beyond the scope of this project.

Additionally, my original goal was to complete a project with direct implications for my a cappella group, and this program would not be beneficial to my group, since we can already hear which vowel we’re trying to match, and pitch can easily be detected in real time using an application like iAudioTool.

What does this mean?

I’ve decided to return to my original idea of analyzing blend in a cappella music based on vowel formants. The following is what I’ve been working on so far and how I’ll be implementing this project this week.

Research question

How does blend differ between choral music and pop music?


I hypothesize that my participants will show a greater degree of blend to rounded vowels than to unrounded vowels. Since choral music consists of more rounded vowels than pop music, I hypothesize that blending to choral music is articulatorily easier than blending to pop music.


As Jim discussed in class, human speech is made up of different formants. A formant is a peak in amplitude at a certain frequency in a sample of human speech. There are several formants, numbered from lowest to highest frequency, and each vowel has a unique distribution of formants across the range of possible frequencies. This frequency distribution is the fingerprint of the vowel, and it is how we hear the difference between vowels. F0, the fundamental frequency, corresponds to the perceived pitch of a vowel; F1 corresponds to vowel height (the higher the vowel, the lower the F1); and F2 corresponds to the backness of a vowel (the further back the vowel, the lower the F2). The phoneme /i/ (the vowel in the English word “feet”) has a low F1 and a high F2, while the phoneme /u/ (the vowel in the English word “you”) has a low F1 and a low F2.

In a cappella music, vowels are extremely important, because when two singers are supposed to be singing the same vowel and they’re forming the sound differently, the differing pronunciations detract immensely from the music. This vowel matching, called blend, is therefore not just a matter of audience perception, but can be considered in light of the aforementioned vowel formants.

As I referenced in a previous blog post, some research has been done to scientifically study choral blend. I found an article in the Journal of Research in Music Education called “An Acoustical Study of Individual Voices in Choral Blend” (…). The study compares singers’ vowel formants when they sing in a solo style versus a choral style in which they’re attempting to blend with a choir.


For this study, participants will be asked to sing a series of vowels at a pitch that I will give on the piano app on my phone. The previously cited experiment tested the vowels [a], [o], [u], [e], and [i]. I will be using the same vowels, except that I will substitute [ɑ] for [a], since [ɑ] is a more common English phoneme. As in the study I read about, I will be testing these vowels at three different pitches. First, participants will sing alone. After this recording, I will play a clip of myself singing these same vowels, then play it again and ask the participants to sing along and blend with the recording.

I will also be adding a part to the study in which participants will listen to an audio clip of a song, then be asked to sing along to that song while blending with the singer in the audio. This component was not in the original experiment, and I think it will be interesting to hear the effects on blend of singing actual lyrics besides “oohs” and “ahs.” This part will also give me the opportunity to analyze blend on diphthongs, such as /aɪ/ and /eɪ/, the sounds in the English words “high” and “way,” respectively.

Another fascinating difference in my project is that I will be analyzing both male and female voices, while the referenced study considered only sopranos. This is super applicable for coed groups such as Under Construction, the one I’m part of.


For each participant, I will use Praat to create a spectrogram of each of the 5 aforementioned vowels, as well as selected vowels from the song the participants sing. I will use the “get formants” feature in Praat to extract the frequency of formants 0 through 3, and I will then enter this data into two .txt files, one for non-blended singing and one for blended singing. The first line of the file will include the measurements of the recording.

I will then use Python in order to analyze this data. I will find the differences in formants between unblended and blended vowels, and between participants and the recording both without and with blending. I will find the mean of these differences and analyze which vowels are the closest to the standard. I will create scatterplots of these data as well using PyLab. Hopefully following this data analysis, I will be able to see which vowels have the greatest potential in blending and can therefore show differences between blending in choral and pop music.
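The comparison described above could be sketched roughly as follows. This is a minimal sketch, not a finished analysis: the dictionary layout, the function names, and every number in it are placeholders I made up for illustration, not measurements from any participant.

```python
from statistics import mean

# Hypothetical formant measurements in Hz: vowel -> (F1, F2).
# Placeholder numbers only, not real recordings.
reference = {"i": (300.0, 2300.0), "u": (320.0, 900.0)}   # the prerecorded clip
solo = {"i": (340.0, 2150.0), "u": (350.0, 1000.0)}       # participant singing alone
blended = {"i": (310.0, 2260.0), "u": (330.0, 930.0)}     # participant trying to blend


def formant_distance(a, b):
    """Mean absolute difference in Hz across the formants of one vowel."""
    return mean([abs(x - y) for x, y in zip(a, b)])


def distances_to_reference(take, ref):
    """Distance from each vowel in a take to the same vowel in the reference."""
    return {v: formant_distance(take[v], ref[v]) for v in take if v in ref}


solo_dist = distances_to_reference(solo, reference)
blend_dist = distances_to_reference(blended, reference)

# A positive value means the blended take moved closer to the reference.
improvement = {v: solo_dist[v] - blend_dist[v] for v in solo_dist}

# The vowel whose blended take ends up nearest the reference.
closest_vowel = min(blend_dist, key=blend_dist.get)
```

With real data, the same dictionaries would be filled from the two .txt files instead of being typed in by hand.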

One thing I anticipate is needing to use a metric other than absolute frequency for the formants, since according to what I’ve learned in my phonetics course, F1-F3 may differ between individuals. I’m curious to see whether this holds for singing, when blend is being attempted. If the formants are radically different, I may consider relative frequencies to each participant’s baseline rather than immediately comparing to my prerecording.
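One possible relative metric is the shift in semitones from each participant’s own baseline. To be clear, the semitone scale is an assumption on my part (a plain ratio or a per-speaker z-score would be other options), and the numbers below are invented for illustration.

```python
import math


def semitone_shift(freq_hz, baseline_hz):
    """Signed distance in semitones from a baseline frequency to freq_hz."""
    return 12 * math.log2(freq_hz / baseline_hz)


# Hypothetical F1 values in Hz for two participants (made-up numbers).
baseline_f1 = {"p1": 300.0, "p2": 400.0}   # singing alone
blended_f1 = {"p1": 330.0, "p2": 440.0}    # singing while blending

shifts = {p: semitone_shift(blended_f1[p], baseline_f1[p]) for p in baseline_f1}
```

In this made-up example the two participants move by 30 Hz and 40 Hz in absolute terms, but both raise F1 by the same ratio, so the relative metric treats their adjustments as equal.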

Project Update – April 25th

A New Direction

Since my first post, I have further narrowed down what I hope to accomplish in my project. The following is an update on what my new goal is and how I will be working to accomplish that goal.


My new goal is to create a user-friendly program that analyzes vowel formants in an audio signal and presents to the user the IPA symbol of the vowel being produced. The program will consider the most commonly found vowels in English: /i/, /ɪ/, /ɜ/, /ɝ/, /æ/, /u/, /o/, /ʌ/, /ə/, /ɚ/, and /ɑ/. If possible, I would love to also be able to analyze the English diphthongs, /eɪ/, /aɪ/, /oʊ/, /ɔɪ/, and /aʊ/. The diphthongs will be more challenging, since the vowel formants change partway through these sounds.

The program should also present the fundamental frequency of the sung or spoken vowel. Therefore, a singer can see on the screen both the vowel they’re singing and the pitch of that vowel.
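As a small sketch of how a fundamental frequency could be turned into a pitch name for display, assuming equal temperament with A4 tuned to 440 Hz (the function name and that tuning choice are my assumptions):

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def hz_to_note(freq_hz, a4_hz=440.0):
    """Return the nearest equal-tempered note name, such as 'A4'."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    # MIDI note numbers place A4 at 69, with 12 semitones per octave.
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    octave = midi // 12 - 1
    return f"{NOTE_NAMES[midi % 12]}{octave}"
```

For example, `hz_to_note(440.0)` gives `"A4"`, and a sung pitch of about 261.6 Hz comes out as `"C4"` (middle C).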


I’ve been studying phonetics and phonology this semester, and the tools and methods I’ve learned will be very helpful in this project. Acoustic phonetics studies the sound signal of speech sounds, and one area of this considers the formants produced when speaking different vowels. When a speaker produces a sound, the fundamental frequency gives the pitch that we hear the speaker produce; for example, this is the pitch a singer sings to match the intonation of a song. The peaks in loudness at other frequencies are the different formants of this vowel sound, and the frequencies at which these formants occur determine the quality of the vowel. The most important formants to consider are the first and second, which correlate to where in the vowel space the sound is being produced and therefore correspond to particular vowels. The third formant mostly corresponds to the rhoticity of a vowel. These first three formants, especially the first and second, will be the main parameters I will look at to distinguish the vowels. More information about this can be found here:

Updated Methods and Tools


I will be using Praat to extract the formants from the audio recordings. Praat allows a user to record and/or upload audio, and from this audio, the program analyzes the acoustic signal and creates a spectrogram showing the formants. One aspect of the program even superimposes lines to highlight where the formants are, and using tools in Praat one can find the decimal values of each formant. After recording vowel sounds, I will use the decimal values of each formant when creating the cutoffs in my program that tell the user which vowel is being produced.

Python/Jupyter Notebooks

I will use a Jupyter notebook in order to create an accessible interface where a user can upload their audio file and easily find out what the IPA representation and fundamental frequency of that vowel are. By utilizing the Jupyter interface, users unfamiliar with Python will have an easier time with this program.

I plan on coding the necessary functions in Python in order to use the formant frequency data to choose from a dictionary of frequency values and share with the user which phoneme is being produced.
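A minimal sketch of such a lookup is below. Note that it picks the nearest entry in the dictionary rather than using hand-tuned cutoffs, which is a simplification I am swapping in for illustration. The table holds rough (F1, F2) values of the kind often quoted for adult male speakers of American English. They are illustrative only, are not measured from my own recordings, and would need to be replaced with values taken from Praat.

```python
import math

# Rough reference (F1, F2) values in Hz. Illustrative assumptions only;
# real values differ between speakers and should come from measurements.
VOWEL_FORMANTS = {
    "i": (270, 2290),
    "ɪ": (390, 1990),
    "æ": (660, 1720),
    "ɑ": (730, 1090),
    "ʌ": (640, 1190),
    "u": (300, 870),
}


def classify_vowel(f1, f2, table=VOWEL_FORMANTS):
    """Return the IPA symbol whose reference (F1, F2) is nearest in Hz."""
    return min(table, key=lambda v: math.dist((f1, f2), table[v]))
```

For instance, `classify_vowel(280, 2200)` returns `"i"` with this table. Because F2 spans a much wider range than F1, plain distance in Hz weights F2 more heavily, so scaling the two formants before comparing them may be worth trying.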


In order to extract the formants from the data in Praat, I plan on using code from GitHub.

The above program interacts with Praat in order to find the formants at a particular point or interval in time.

Design Approach


I plan on using equations in order to determine the standard distribution of frequencies that form a particular formant. For example, if I analyze each member of my group singing a particular vowel, I can collect the data of each formant, and create a distribution to analyze what the mean frequency is for each formant of that vowel. Since men and women have slightly different formants, I will also likely need to calculate the differences to see if blend among women differs from blend among men.

I will likely use these equations, which I’ve sourced from

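Mean:          $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

Std. Deviation:          $s_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$

z-score:          $z = \frac{x - \bar{x}}{s_x}$

Correlation:          $r = \frac{1}{n-1}\sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$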
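These four formulas translate almost directly into Python. The sketch below uses the sample (n-1) versions throughout, including for the z-score, and the formant values at the bottom are hypothetical numbers chosen so the results are easy to check by hand.

```python
import math


def mean(xs):
    """Arithmetic mean of a list of numbers."""
    return sum(xs) / len(xs)


def sample_std(xs):
    """Sample standard deviation, dividing by n - 1."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))


def z_score(x, xs):
    """How many sample standard deviations x lies from the mean of xs."""
    return (x - mean(xs)) / sample_std(xs)


def correlation(xs, ys):
    """Pearson correlation coefficient between paired lists xs and ys."""
    mx, my = mean(xs), mean(ys)
    sx, sy = sample_std(xs), sample_std(ys)
    total = sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)


# Hypothetical F1 and F2 values in Hz for three singers on one vowel.
f1_values = [300.0, 320.0, 340.0]
f2_values = [2300.0, 2250.0, 2200.0]
```

With these made-up values the mean F1 is 320 Hz, its standard deviation is 20 Hz, the singer at 340 Hz has a z-score of 1, and F1 and F2 are perfectly negatively correlated.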




In order to analyze the different frequencies present when a speaker says or sings a certain vowel, I will be using a program called Praat. Developed in Europe, this software allows a user to record a vocal sound and then analyze the different frequencies produced. I plan to utilize the function that creates a spectrogram and a waveform. It not only shows dark bands for different formants, but it also has the functionality to superimpose a line of best fit so the user can better visualize the formant’s change over time. There is also a feature that allows the user to break up the speech signal into phonemes, which creates readily understandable images to be presented with the data.

Here’s a screenshot of analyzing a word in Praat:


I plan on further analyzing the data I collect using Audacity. Audacity also allows for more intricate audio editing than Praat, so I will likely use it for filtering and trying to change formants in order to fix the blend on audio segments.


Things are still in the works, but I hope to have a component of my project that takes in audio samples and returns which vowel is being produced. If that is too difficult or not applicable enough to my project, I hope to find another way to be able to utilize Python in the way I analyze formants and calculate blend, in order to create a user interface in which a user who knows nothing about formants or the science of waves can still analyze blend. This will likely be in a Jupyter notebook, since that is such a great user interface.

Introduction and Background

My project will be considering the science behind a cappella. I will be studying the waveforms of different voices individually and combined in order to consider the effects of frequency and vowel/phoneme quality on blend. The motivation for this project comes from experience in my a cappella group, Under Construction, in combination with my experience in my phonetics and phonology course. 

When my group rehearses our music, one of the most challenging things to do (besides staying in key) is blending with the other members of the group. Even when we’re all singing the correct notes, if someone’s vowel differs from the rest, they stick out, and the piece sounds bad. We need to blend on high notes as well as low notes: basses need to be able to blend with sopranos. Similarly, we need to sound uniform when we’re singing the same vowels/phones, but I wonder if some phonemic sounds are more conducive to blend based solely on the phonetic properties present in their waveforms.

I plan to complete this analysis by first recording members of my a cappella group in Audacity. I will then import files into a phonetics program called Praat, in which I will create graphs of and analyze the formants present in different sounds and how these differ when there are multiple people singing, when those people are or are not attempting to blend, etc.

Some research has been done to scientifically study choral blend. I found an article in the Journal of Research in Music Education called “An Acoustical Study of Individual Voices in Choral Blend” that talks about how singers’ formants differ when they’re singing a solo piece versus trying to blend with a choir (…).

Here is a link to my a cappella group performing at a concert at Yale in the hall. This example is especially interesting because we’d never performed in this space before, and the room acoustics were such that it was difficult to hear one another, which likely had a huge effect on our blend.

Hopefully through this project, I can analyze video/audio clips like the above and figure out how to have better blend in a cappella music by going down to the level of the waves to understand the music better.