Project Update – April 25th

A New Direction

Since my first post, I have further narrowed down what I hope to accomplish in my project. The following is an update on what my new goal is and how I will be working to accomplish that goal.

Goal

My new goal is to create a user-friendly program that analyzes vowel formants in an audio signal and presents to the user the IPA symbol of the vowel being produced. The program will consider the most commonly found vowels in English: /i/, /ɪ/, /ɜ/, /ɝ/, /æ/, /u/, /o/, /ʌ/, /ə/, /ɚ/, and /ɑ/. If possible, I would also love to analyze the English diphthongs /eɪ/, /aɪ/, /oʊ/, /ɔɪ/, and /aʊ/. The diphthongs will be more challenging, since the vowel formants change partway through these sounds.

The program should also present the fundamental frequency of the sung or spoken vowel. That way, a singer can see on the screen both the vowel they’re singing and the pitch of that vowel.
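To show a singer a pitch rather than a raw frequency, the fundamental frequency can be converted to the nearest note name. The following is a minimal sketch, assuming twelve-tone equal temperament with A4 tuned to 440 Hz; the function name `frequency_to_note` is hypothetical and not part of Praat or any library.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def frequency_to_note(freq_hz, a4_hz=440.0):
    """Return (nearest note name with octave, offset in cents) for a frequency.

    Assumes equal temperament; a4_hz is the reference tuning for A4.
    """
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    # MIDI note numbers place A4 at 69, with 12 semitones per octave.
    midi_float = 69 + 12 * math.log2(freq_hz / a4_hz)
    midi = round(midi_float)
    # How far the input is from the nearest note, in hundredths of a semitone.
    cents = round((midi_float - midi) * 100)
    name = NOTE_NAMES[midi % 12]
    octave = midi // 12 - 1
    return f"{name}{octave}", cents


print(frequency_to_note(440.0))  # ('A4', 0)
print(frequency_to_note(220.0))  # ('A3', 0)
```

The cents offset could be displayed next to the note name so a singer can see whether they are sharp or flat.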

Background

I’ve been studying phonetics and phonology this semester, and the tools and methods I’ve learned will be very helpful in this project. Acoustic phonetics studies the acoustic signal of speech sounds, and one area of it considers the formants produced when speaking different vowels. When a speaker produces a sound, the fundamental frequency gives the pitch that we hear the speaker produce; for example, this is the pitch a singer sings to match the intonation of a song. The peaks in loudness at other frequencies are the formants of the vowel sound, and the frequencies at which these formants occur determine the quality of the vowel. The most important formants to consider are the first and second, which correlate with where in the vowel space the sound is being produced and therefore correspond to particular vowels. The third formant mostly corresponds to the rhoticity of a vowel. These first three formants, especially the first and second, will be the main parameters I will look at to distinguish the vowels. More information about this can be found here: http://hyperphysics.phy-astr.gsu.edu/hbase/Music/vowel.html

Updated Methods and Tools

Praat

I will be using Praat to extract the formants from the audio recordings. Praat allows a user to record and/or upload audio, and from this audio, the program analyzes the acoustic signal and creates a spectrogram showing the formants. One feature of the program even superimposes lines to highlight where the formants are, and using tools in Praat one can find the decimal value (the frequency in hertz) of each formant. After recording vowel sounds, I will use the decimal values of each formant when creating the cutoffs in my program that tell the user which vowel is being produced.
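One way to turn several Praat readings of the same vowel into a single reference value is to average them. The sketch below assumes that approach; the numbers in `measurements` are made-up placeholders, not readings from my recordings, and the function name `average_formants` is hypothetical.

```python
from statistics import mean

# Placeholder (F1, F2) readings in Hz for repeated recordings of each vowel.
# These are illustrative numbers only, not real measurements.
measurements = {
    "i": [(280, 2250), (300, 2300), (290, 2200)],
    "ɑ": [(720, 1100), (740, 1080), (700, 1120)],
}


def average_formants(readings):
    """Collapse repeated (F1, F2) readings into one mean pair per vowel."""
    return {
        vowel: (mean(f1 for f1, _ in pairs), mean(f2 for _, f2 in pairs))
        for vowel, pairs in readings.items()
    }


reference = average_formants(measurements)
print(reference["i"])
print(reference["ɑ"])
```

Averaging is only one option; the spread of the readings could also be kept to decide how wide each vowel's cutoff region should be.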

Python/Jupyter Notebooks

I will use a Jupyter notebook to create an accessible interface where a user can upload their audio file and easily find out the IPA representation and fundamental frequency of that vowel. By utilizing the Jupyter interface, users unfamiliar with Python will have an easier time with this program.

I plan to write Python functions that compare the measured formant frequencies against a dictionary of reference frequency values and report to the user which phoneme is being produced.
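The dictionary lookup could be sketched roughly as follows. This is a simplification under stated assumptions: it picks the reference vowel whose (F1, F2) pair is nearest to the measurement instead of using hand-set cutoffs, it ignores the third formant, and the values in `REFERENCE_FORMANTS` are rough placeholders that would need to be replaced with values measured in Praat. The names `REFERENCE_FORMANTS` and `classify_vowel` are hypothetical.

```python
import math

# Placeholder reference values: vowel -> (F1, F2) in Hz.
# Illustrative only; replace with values measured from real recordings.
REFERENCE_FORMANTS = {
    "i": (270, 2290),
    "ɪ": (390, 1990),
    "æ": (660, 1720),
    "ɑ": (730, 1090),
    "ʌ": (640, 1190),
    "u": (300, 870),
}


def classify_vowel(f1, f2, reference=REFERENCE_FORMANTS):
    """Return the IPA symbol whose reference (F1, F2) is closest to the input."""

    def distance(vowel):
        ref_f1, ref_f2 = reference[vowel]
        # Straight-line distance in the F1/F2 plane, in Hz.
        return math.hypot(f1 - ref_f1, f2 - ref_f2)

    return min(reference, key=distance)


print(classify_vowel(280, 2200))  # i
print(classify_vowel(700, 1100))  # ɑ
```

A nearest-match rule always returns some vowel, even for a poor measurement, so a real version would probably also need a maximum distance beyond which it reports no match.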

GitHub

To extract the formants from the data in Praat, I plan on using code from GitHub: https://github.com/mwv/praat_formants_python

The above program interacts with Praat in order to find the formants at a particular point or interval in time.

6 thoughts on “Project Update – April 25th”

  1. What do you mean ‘decimal’ values? In writing this blog please make sure that you assume your audience knows very little about this area of interest. We (students and staff) are interested in learning from you about this area that interests you.

  2. Can you elaborate on how your work would benefit your Acapella group? Is there a real-time application of your work, say, when you are rehearsing together?

  3. I think you’re narrowing down on a really cool idea. To echo Salma, I’m also wondering about the potential applications of your project to your acapella group. Perhaps more analysis would be necessary to determine what specific combinations of frequencies of fundamentals and formants would produce the most sonorous blend? How do you plan on using this program to benefit your group?

  4. Are you going to get the frequency / pitch information by doing something similar to what we did in lab, i.e. calculating a power spectrum and trying to infer the fundamental from that?

    Also, have you thought about what margin of error there is in distinguishing one formant from another? Or does Praat take care of that automatically (even so, it should be hard if the vowels are slurred or said with different accents, I’d think).

    • I believe Praat can also retrieve the fundamental frequency of a sound, so I will likely use that function.

      There will definitely be a margin of error, and I expect this program will have its quirks, since I plan on creating it based on my own pronunciation of different vowels. I will do my best to generalize the program, but since voices vary immensely, I’ll begin by trying to map my own vowels to the program.

  5. Similar to my comment on your previous post, do you think the program will be able to account for differences in pronunciation or accents that may make one syllable sound like a different syllable spoken by a different person? Also, as someone not as well versed on phonetics, what exactly do you mean when you say that “the vowel formants change partway through these sounds”, and how does that make analyzing diphthongs more challenging?
