“Voice of a 3,000-year-old Egyptian mummy reproduced by 3-D printing a vocal tract.” No, that’s not the plot to a cheesy horror movie on the Syfy channel. It’s the headline of a real news article I recently read online.
Headlines can often be misleading, sometimes to the degree that they directly contradict the story, but this talking mummy headline was a bit different. The story actually seemed to deliver what the headline had promised. The twist this time was that they were both wrong.
I’m always amazed at the discoveries that are being made in the realm of science and technology, but this headline made a pretty wild claim, even in the 21st century. We could end this right here by simply acknowledging that a 3,000-year-old mummy (or even a fresh mummy for that matter) cannot utter a sound, so dead silence would be the only accurate reproduction of its ‘voice.’
As I read the article, I began to realize that the facts didn’t quite add up to support the headline’s claim. As suspected, the mummy did not speak.
The article examined a research project that was the brainchild of David Howard, professor of electrical engineering at the University of London. In his published findings, he did indeed claim that he had reproduced the voice of a mummy.
But did he?
Several news outlets reported on the project as if it were some major scientific breakthrough. Each of them repeated the researchers’ claims without question. Am I the only one who saw through the flimsy linen that wrapped Howard’s bold—and published—assertions?
The mummy in question was the 3,000-year-old remains of the Egyptian priest, Nesyamun, who now resides in England and is on display at the Leeds City Museum.
Nesyamun lived in Thebes (now Luxor) in southern Egypt during the volatile reign of Pharaoh Ramses XI. Ramses XI ruled his kingdom for nearly 30 years during very troubled times marked by civil wars.
For historical reference, Cleopatra lived about a thousand years later than Ramses XI, and King Tutankhamun’s reign was about 300 years earlier. The last of the great pyramids was completed over a thousand years before Ramses XI’s reign, so none of these rulers had pyramids built in their honor. Like the majority of Egypt’s ancient rulers, they were laid to rest in underground tombs.
Nesyamun’s official religious services for Ramses XI would have been very important to the Pharaoh and his kingdom, important enough to have earned him the deluxe mummification and entombment package upon his death. Although mummification was desired by everyone to assure their comfort in the afterlife, not everyone could afford it, and only the rich and important people had access to the top-shelf preservatives.
Nesyamun died in 1069 B.C., and it’s interesting to note that an inscription on his tomb states his desire to have a voice in the afterlife. Was that wish granted with Howard’s project?
Professor Howard had heard about a process that researchers were using on living people, where their vocal tracts were CT-scanned and then recreated in plastic. He apparently thought it would be fun to try it on a mummy. The results of his frivolous detour from genuine science were published in the journal Scientific Reports, which is apparently the National Enquirer of the science world.
As it turns out, contrary to the far-fetched headline and his own claims, Professor Howard admits he had attempted only to simulate, or synthesize, the mummy’s voice, with the eventual goal of reproducing the priest’s voice as it would have sounded while he was living.
That is very different, but still quite impossible. It would be unverifiable as well; there are no recordings, and no one is still living who has heard the priest’s voice.
As we shall see, Nesyamun’s dream of a voice in the afterlife has not yet come true, at least not through this project. The researchers reproduced neither the mummy’s voice nor the priest’s voice.
Here is how this all unfolded, from ill-conceived idea, through shallow research, sloppy experimentation, publication of an unworthy paper, all the way to the story that I read. And of course, inevitable and unavoidable, my take on the whole sordid affair.
For this project, Professor Howard’s team subjected Nesyamun’s mummy to a CT scan. Using a computer analysis of the results of that scan, the researchers determined all the measurements they would need to map out the structures of the mummy’s vocal tract. Using 3-D printing technology, they were able to produce an accurate, albeit static, plastic model.
The vocal tract is a passageway that travels from the larynx up through the throat and mouth, over the tongue, past the gums and teeth, and finally through the lips. All those different structures along the route alter the sound as it passes through them.
While those structures along the vocal tract are essential in shaping the sound into a recognizable human voice, they do not create the sound.
For whatever reason, Howard’s model did not include the larynx (the voice box), the organ where the actual sound of the human voice originates. Since their vocal tract model was useless without a voice box to push sound through it, it was necessary to attach a sound source. That source came in the form of an artificial larynx. It was that sound that was then projected through their 3-D-printed vocal tract to lend the mummy a voice.
The artificial larynx has been around for about 100 years. It’s a small, hand-held device that enables people who have lost their voice boxes through disease or injury to have a voice. Early devices were simple mechanical buzzers. Many improvements were made over the decades. Modern devices are electronic and much more sophisticated in their functions.
A century later, with all those improvements, it’s still essentially just a buzzer. You hold it against your throat and it buzzes while you make word shapes with your mouth. It does enable ‘speech,’ but it doesn’t sound much like a human voice, at least not the humans I know. The artificial larynx makes the same sound you get after an incorrect response on a game show. That seems strangely appropriate here. There are no parting gifts, but thanks for playing.
Professor Howard states: “The team were able to accurately reproduce a single sound, which sounds a bit like a long, exasperated ‘meh’ without the ‘m.’”
There is a lot wrong in just that one, short sentence.
Note the professor’s use of the word, “accurately.” Without ever having heard the priest’s actual voice, how does one determine accuracy? And to be accurate also assumes they were shooting for that specific sound, when in fact they had no way of knowing beforehand what sound would emerge.
The sound of the buzzer was shaped into “eh” as it passed through the vocal tract model. If they had used any other sound source, it too would have been shaped by the model into the same vowel sound. The priest’s vocal tract assumed the “eh” position during 3,000 years of decay. It could have been any vowel sound (consonants require moving parts). It was dependent upon how he was positioned when and after he died, and how the tissues randomly dried, shriveled and decayed over the centuries.
“Long.” The length of the sound has nothing to do with the vocal tract. In life, the length of the sound would have been determined by Nesyamun himself. The duration of the model’s sound was determined by how long the researcher held down the button on the buzzer.
“Exasperated.” That is a description of an emotion. There is no emotion in plastic parts, no matter how accurately they were modeled. There is no emotion that can be squeezed from electronic buzzers. Any hint of exasperation in the sound was an emotional interpretation by the researchers. It did not come from the model.
“…‘meh’ without the ‘m.’” If the “m” was not there, why mention it? It’s not like ‘meh’ is an actual word that helps you identify the sound. Perhaps I’m just being picky on this one, but it’s the little things that differentiate between an objective, professional analysis and, well…this.
So the team recorded an “eh” sound. Good job.
The article provided a link to an audio clip. Do you recall the default electronic message on the old telephone answering machines? It sounded like an old man speaking from just below the surface of his bubble bath. “Please…record…a message.” The researchers’ “eh” sounded like the “e” in “message.”
I imagined the sound to be Nesyamun’s reaction after his wife told him to go out and sweep the pyramid after the last sand storm. “eh…”
The paper’s authors made other dubious claims, including this one, where they gave themselves credit for what would have been a profound achievement: “The synthesis of his vocal function allows us to make direct contact with ancient Egypt by listening to a sound from a vocal tract that has not been heard for over 3,000 years.” Wow! It sounds lofty. It’s just not true.
The real synthesis here was that overreaching statement. The sound they coaxed from their model was never heard 3,000 years ago. It is a brand new, synthetic sound. It has no relationship to the living Nesyamun.
No one knows what Nesyamun’s voice sounded like. There is no way for them to determine if they were anywhere close to replicating it. Ancient answering machine messages aside, there are no recordings. It truly is an impossible undertaking. Perhaps more importantly, science has nothing to gain.
Over the next few pages we’ll take an educational stroll through some basic science and try to shed some light on the subject. We’ll share a physiology lesson or two, and I will offer a few simple experiments you can conduct yourself, while you’re reading. No lab equipment is required. These experiments will help you understand what it takes to produce an individual human voice. You’ll have a greater understanding of physiology than the researchers demonstrated. You will be doing real science.
I will also present a couple of hypothetical experiments. Those experiments will never need to be conducted, as just reading the premise will reveal the folly of Professor Howard’s methods and conclusions.
The starting point in all of this was a mummy. In this case, it was the poorly preserved remains of a priest who died over 3,000 years ago. Several structures in the vocal pathway were missing, notably the tongue and lips, and what was left was in varying stages of decomposition. Even with the deluxe Egyptian burial package, 3,000 years is a long, long time.
So, right from the start, the model that the researchers made was based on incomplete and inaccurate data. The larynx itself was not part of the model, nor were the lungs. The larynx, along with the air that’s being pushed through it by the lungs, is responsible for the actual sound that gets produced, while the rest of the vocal tract simply serves to shape that sound into what we recognize as a human voice and, of course, speech.
Also overlooked by the researchers was the considerable contribution to the sound of a voice that’s made by the nose, and indeed the entire nasal cavity. You have likely heard how a stuffy nose can dramatically alter a person’s voice. A fully functioning nasal cavity adds rich resonance to a voice. That organ was not represented at all. Its resonance and sonic characteristics, unique to each of us, were absent from the model and the recording.
If you want to reproduce an individual’s voice, you need to accurately reproduce all the structures that contribute to that voice, as each of those parts is unique to that individual. The vocal tract is but one small part.
A whole complex of seemingly unrelated resonating structures was absent from the model. When you’re speaking, everything in your upper body vibrates—head, neck and chest—and that translates to audible resonation that makes a huge contribution to the overall sound of the voice. Place your hand over your heart and read the next couple sentences aloud. Can you feel the vibration? That resonance is part of the sound of your voice.
Your voice did not make the vibrations; the vibrations made your voice. All those vibrations are the very essence of the sound of your voice. They are why you sound like you.
You may not have realized it, but we just conducted our first experiment…while you were reading. It was more of a demonstration, but we easily verified the presence of those overlooked vibrations. That’s a lesson for the research team: do the simple stuff first.
Summing up Professor Howard’s project, a sound was artificially generated with an electronic larynx—a buzzer. That sound was projected through a 3-D plastic model that was made with the CT-scanned measurements from a small portion of the mummy’s badly decomposed vocal tract, without any of the associated tissues and structures that are essential to the character of a voice. They got “eh…”
Compelling!
In humans, the larynx—our voice box—houses the vocal cords. That’s where the voice originates, and like any other human trait, it is different in everyone, so everyone’s voice box generates a different sound. A real larynx is much more complex than even the most advanced buzzer. It’s also highly personalized, and unique to each person.
Some people have deep voices, some high-pitched, some smooth, some raspy. Those qualities and others originate in each person’s larynx. It’s also worth remembering that our voices change significantly as we age. Imagine the changes that can occur in 3,000 years!
The scientists did not have the priest’s larynx. They did not have the priest’s lungs. And I think it is safe to say the priest did not have an electronic larynx over three millennia ago. The most important organ in the vocal pathway, maybe the single most defining physical element of the voice’s sound, was missing.
Even if the 3-D model were an accurate model of healthy, living tissue, it failed because it was constructed of hard plastic, the same plastic that Legos are made of. If you play your buzzer through a rigid, hard plastic box, it will sound completely different than when played through a box made of soft, moist, flexible tissues. And if you minutely alter any component of the pathway—size, shape, texture—it alters the sound. That’s how we make different sounds for speech.
Also, when you speak, air from the lungs is moving through the entire pathway, escaping through the mouth and nose. Unlike a buzzer, the vocal cords do not vibrate without that air moving past them.
To demonstrate this, let’s try another little experiment. Start with the “eh” sound the researchers got by playing the game show buzzer through their Lego model. While saying the sound, close your mouth. You will hear the sound change to a hum. It is not the “eh” sound any longer. That’s because the vocal tract has changed, and there is no more air coming out of your mouth.
But you’re still making a sound, so the air is still moving. Where is it going?
While still humming, pinch your nose. What happens? That’s right, the humming stops.
You may not even have noticed the small flow of air coming out of your mouth while it was open, but that airflow shifted to the nose when you closed your mouth. It had to go somewhere. Then, when you pinched your nose, you cut off the airflow completely. With no airflow to stimulate the vocal cords, it is virtually impossible to make a vocal sound.
In a human voice, air from the lungs stimulates the vocal cords in the larynx. With both the mouth and nose closed, no air can move through the vocal tract, so no sound can be made.
Blocking the breath’s main exit made a big difference in the sound. More subtle changes anywhere along the pathway make more subtle differences.
In the researchers’ model, there was no breath being pushed through the fake larynx or any part of the vocal tract model. The artificial larynx vibrated to make a sound, but no air was being expelled. That’s another variable that affects the sound of a person’s voice: the actual sound of the breath itself.
The sound of a human voice is determined by, among other variables, the complex interactions of many components of organic tissue: live, flexible, moving tissue. Minute changes to the shapes of the various parts, all working in harmony, produce the human voice. Every little shape and every little change to those shapes makes an audible difference.
The audio clip of the recording the researchers made features the “eh” sound falling in pitch over its very short duration. In a human voice, that falling pitch is the result of many structures working in concert. The muscles in the larynx stretch or relax the vocal cords to change pitch, but there are also changes to the shape of the throat and other structures that contribute.
Here is one more experiment you can do to demonstrate this function. Make a continuous “eh” sound. Start with a high pitch, and then lower the pitch as you continue. You can feel the vibration at the top of your throat when you start. As the pitch lowers, you can feel the throat opening wider as the vibration moves back down your throat. If you take the note low enough, you will feel the sound resonating deep down in your chest. That resonation is audible. It contributes to the voice, and it was totally missing from the model.
And just try talking without a movable jaw!
The researchers’ model allowed no movement of any of those structures. It didn’t even have some of those structures. That change in pitch the model demonstrated was accomplished solely with the artificial larynx.
There are 7 billion people on the planet right now, and every single one of them has a different-sounding voice resulting from individual differences, large and small, in all these structures. There have been billions of people who lived and died before us, and each of them had a unique voice. Many are very similar, but there are differences, no matter how small.
What are the chances that a scientist can get every variable exactly right to accurately reproduce the sound of a specific person’s unique voice? And if that person spoke his last word over 3,000 years ago, how would they ever know if it was accurate? Cue the game show buzzer.
Professor Howard’s team modeled a small fraction of the parts they would need to accurately reproduce a specific person’s voice. Most of the parts they did have were modeled from inaccurate samples. They failed to account for hundreds of other vital variables.
Compounding their failure, the researchers suggested that they wanted to use the artificial voice to record the priest singing one of his worship chants so we could hear how he sounded 3,000 years ago. That goes well beyond ridiculous. You can randomly choose anyone alive today to sing the chants, and even accounting for all the personal differences, it will be closer to sounding like Nesyamun than any buzzer-actuated, 3-D printed, Lego-plastic parts.
Steve Martin’s rendition of “King Tut” was closer than they’ll ever get.
Professor Howard might just as well have tried to build a complete, functional eyeball using only the information from a pencil sketch of an eyelash.
If the team had conducted some simple experiments first—like the ones you just performed—they could have saved lots of time and money. If you’re trying to prove something with an experiment, it helps to know if your experimental methods work in a situation where you already know the expected outcome. Instead, we have a short audio recording, just a novelty, and whatever the researchers did manage to accomplish can never be verified.
In that spirit, I would like to propose a verifiable experiment. This is going to sound gruesome. You’ll need some tools the researchers didn’t use, a pick and shovel at the very least, and you may need a permit. But again, it would be verifiable.
For this experiment, you will dig up Frank Sinatra. For obvious reasons, this part should be done under a full moon, on a cool evening with a low, ground-hugging fog. Don’t forget the tarp, trench coat and trilby. You’ll want to be fashionable. You’re not a grave robber after all. You’re just borrowing a corpse.
Once safely back in the lab with Sinatra in tow, you will replicate his vocal tract using the CT scanning and 3-D printing technology.
Connect your buzzer to the model you produce and use the resulting voice to rerecord one of Sinatra’s hits—your choice. Compare it to the original recording. You have a lot invested in this, and you really want it to be a success, but be honest. Can you tell the difference?
Of course you can, because there is only one vowel. You will need many static models to represent each of the myriad sounds a human can make, or one accurate, highly controllable, flexible model.
This is relatively new ‘science,’ so let’s play along and pretend that researchers have developed all the necessary technology. They have the exact measurements. They have the proper materials. And most importantly, they have respectfully replanted Frank Sinatra.
They have built a perfect replica of Sinatra’s vocal tract, including functional lungs, larynx, throat, mouth, tongue, teeth, lips, jaw and nasal airways. It’s all housed in an accurate model of the head, neck and chest. Everything that vibrates and resonates, everything that moves, everything that physically contributes even minutely to the sound of the voice is included.
It still doesn’t sound like Frankie. Who’s surprised? It’s disappointing, but not unexpected. They are obviously still missing something. Again, that isn’t surprising. There’s a lot going on with a voice.
But this is exactly what Professor Howard hoped to achieve: record the authentic sound of Nesyamun’s living voice singing, using a decayed corpse’s vocal tract.
Where do we go from here? Professor Howard’s team seems clueless about the immense hurdles they face. They will surely forge ahead to the inevitable dead end.
There is at least one variable we have not yet touched upon, and it’s another of the many variables the researchers completely ignored. It’s an aspect that affects the character of a voice as much as any other.
For an analogy of a vocal tract, let’s look at a saxophone and maybe we can discover what went wrong.
The saxophone’s voice box, the reed, vibrates as air from the player’s lungs pushes by it. That vibration can be heard as sound. As that sound passes through the horn it is shaped by the brass material, the bends of the horn, and the holes and keys. The entire length and architecture of the sax is the vocal tract.
But it will not play itself. It needs a musician’s lungs and fingers.
That saxophone, played by John Coltrane or Charlie Parker, would make some sweet music. Played by me, that same sax’s sound would not even qualify as music. Why not? It’s the same saxophone, and I’m a musician with lungs and fingers.
Similarly, our high-tech Frank Sinatra vocal tract is as perfect a reproduction as science and technology permit. So what is it missing?
Frank Sinatra’s brain. We modeled the physical structure of his brain as part of our model simply for the resonance, so it vibrates as it would in life, but it doesn’t function. It is not an operating brain. It is not Frank Sinatra’s living brain.
I stated earlier that the actual sound of the voice originates in the voice box, but the electrical signal that tells the lungs, voice box and all the regions of the vocal tract what to do originates in and is processed by the brain.
Beyond all the physical conditions we have studied, the brain is the organ that actually controls the muscles and directs the movements that are responsible for making the sound and shaping it on its way through the vocal tract. I chose Frank Sinatra specifically to illustrate this point. He had a very distinctive way of phrasing his lyrics. He sang a song the way he felt it. And he felt it differently every time. The brain is responsible for that feeling.
Simply put, my brain controls my speech differently than your brain controls your speech. A human brain controls a voice with finesse. It can produce a gentle whisper, a blood-curdling scream, and everything in between. It controls a voice with intellect and with feeling, with seriousness, and humor.
It controls with soul!
To complicate things even more, those feelings and emotions, already unique to each of us, change second by second, thought by thought, phrase by phrase. Frank’s vocal inflections would have been just a bit different at each night’s performance. It’s what made him Frank. It’s what makes your voice uniquely yours.
It’s also why my sax performance will never sound like Charlie Parker’s. Well, that and I’m a drummer. That’s why, without Sinatra’s brain controlling it, even his own vocal tract will not sound like Sinatra.
With that in mind, there is one more experiment that we need to consider. It’s a bit less grisly than our moonlight date with Frank Sinatra. It would best be performed in a dank castle against the backdrop of a small, 19th-century European village. This experiment should also be performed at night. The angry villagers’ torches and pitchforks appear much more menacing in the dark.
In this experiment, Professor Howard and I will exchange our brains. We will surgically place his brain into my head, and mine into his.
Will my voice sound the same with Professor Howard’s brain controlling it? It should be similar, tonally, because all the physical vocal parts are still there. But with his brain now in control, the first thing you’ll probably notice is Howard’s English accent, taking my American vocal tract to places it has never been. His vocal inflections and vocabulary are completely different than mine. The way he puts thoughts to words is different. He’ll say, “the team are” instead of “the team is.” It does not sound like me.
How will Professor Howard’s voice sound with my brain controlling it? My brain doesn’t know how to sound British, so he won’t have that accent. And it would not sound like me, either, except for that familiar ring of cynicism.
Without even performing the experiment, it is evident that there would be two brand new voices resulting from the two new unique combinations of parts. Our original voices would no longer exist.
Scientists will never be able to get even close to replicating what that old dead Egyptian guy sounded like. The sound and character of a human voice result from the combination of a unique system of physical parts, controlled by a unique brain. It takes both body and soul. The scientists will never have either of those essential qualities of Nesyamun’s voice. With only the information from a mummy that has suffered the ravages of time, an accurate physical model is impossible.
And if it were possible, where are we going to get a copy of his soul?
All in all, it really doesn’t matter what Nesyamun sounded like. This was a novelty project that was started on a whim. It’s vanity research. I’m sure they had fun, but there is no value to archaeology. It isn’t science, at least not good science. I’m glad it’s being done in England and it’s not my tax dollars funding it.
It doesn’t appear that the research team had any idea how to conduct research or a meaningful experiment. That’s what happens when you allow an electrical engineer into the archaeology lab and give him free rein to perform experiments with expensive equipment in areas he does not understand.
He should stick to changing light bulbs and repairing toasters.
He and his team spent hundreds of thousands of dollars on the CT scan, Lego model and researchers’ salaries. For what? It brought them no closer to the mummy’s voice. It brought them no closer to Nesyamun’s voice. What they have achieved is nothing more than electronic ventriloquism, and we can see Professor Howard’s lips moving.
Who really cares, anyway? I would be much more interested in what that priest had to say than how his voice sounded. Family relationships, culture, government, current events—those things were all very different in his time. What did he talk about in everyday conversations with his peers? Healthcare? Taxes? Did he complain about his boss, the Pharaoh?
Better yet, if he were alive today what would that old priest Nesyamun have to say?
Old Egyptian priest walks into a bar. He sits down and quietly enjoys a beer while watching the TV news. Finally, he speaks. “Holy dung beetle! This is your president? And I thought Pharaoh was an asshole!”
Scott Wright © 2020