
There’s a growing consensus among physicians and government regulators that pulse oximeters measure oxygen levels less accurately in patients with darker skin and need to be fixed.

There’s another problem, however, that needs to be fixed first. Much of the work and research to understand the devices’ shortcomings and devise solutions is focused on race. But the issue with pulse oximeters is not one of race — it’s very clearly one of skin tone. The light used in the devices to detect oxygenated blood can be blocked by melanin in the skin.


You might think the two are the same, or at least so similar as not to matter. Many do. Indeed, it was largely race and not skin pigment that was discussed when a Food and Drug Administration panel met last month to advise the agency on what is needed to improve the devices.

And race has long been a proxy for skin tone in research studies because it’s something that’s recorded in both medical and census records, while skin tone is not. The medical studies that have come out highlighting that the devices missed dangerously low oxygen levels in patients with darker skin used race as well.

But the two are very different. People who are Black can have a huge variety of skin tones, ranging from very dark to very light. And some people who are Asian, Hispanic, or Indigenous have darker skin than people who are Black.

This can seem obvious. Yet there hasn’t been a good way to characterize these differences in skin tone in medical research, especially for those whose skin is of darker shades. It’s something Ellis Monk wants to fix. “In order to do the foundational research that we’re going to need to get this right … to get pulse oximetry that works for everyone, we’re going to need to think very deeply about skin tone,” Monk told STAT.


An associate professor of sociology at Harvard, Monk has spent more than a decade studying colorism, a form of discrimination based on skin color that tends to favor lighter-skinned people over darker-skinned people. He’s among the social scientists who have published a stream of studies showing that skin color, not just race, is a major factor in health and other disparities. People with darker skin are more likely to receive the death penalty, to earn less, and to have poorer health.

But as with pulse oximeters, Monk says a limitation in colorism research has been the lack of a reliable way to measure skin tone. So, working with Google, he developed the Monk Skin Tone Scale, which includes a fuller range of darker skin tones than the cruder tools currently used. Such a scale, he said, is essential to ensure pulse oximeters — and many other technologies and medical devices — work equally well for all people.

Monk’s scale, with its broader range of standardized colors, has advantages over other scales, said Michael Lipnick, an associate professor of anesthesia at the University of California, San Francisco, and an investigator at UCSF’s Hypoxia Lab. The lab has started the Open Oximetry Project, which is working to help the FDA and other stakeholders to assess the limitations of pulse oximeters. Other scales, he said, “leave too much room for subjectivity and may not adequately account for color at the site of pulse oximeter measurement.”

The Monk Skin Tone Scale has 10 shades, including six to represent medium and darker shades. Courtesy Ellis Monk

The scale most commonly used in labs is the Fitzpatrick scale, which by design is skewed toward lighter skin. Harvard dermatologist Thomas B. Fitzpatrick developed the widely used scale in 1975 to assess both sunburn risk and the risk of skin damage during medical treatments with UV light for conditions like psoriasis or eczema.

Because lighter skin has less melanin to filter out harmful UV rays, it is considered more susceptible to damage. The original scale had just four shades, all light. It wasn’t until 10 years later that two more shades were added, one for brown skin tones and one for black skin tones — woefully inadequate to represent the almost infinite shades of skin tone in the real world. The scale is increasingly seen as inadequate for dermatology as well because it does not contain enough dark tones, implies that darker skin doesn’t burn, and is often used by physicians to conflate skin color and race or ethnicity.

But because it was there, the Fitzpatrick scale became the de facto standard for engineers and researchers who needed to measure skin tone, Monk said. It’s also been the basis for the six skin colors used in emojis and the standard used in developing machine learning algorithms for a range of technologies. The paucity of skin tones used in machine learning has become abundantly clear in work from such scholars as MIT’s Joy Buolamwini and Princeton’s Ruha Benjamin, who have pointed out racist algorithms that lead automatic light switches to stay off when people with darker skin walk into a room, faucets to stay dry when darker-skinned hands are placed beneath them, and self-driving cars to not detect and stop for people with darker skin.


“That’s computer vision using light to sense whether there’s a hand there. With dark skin, not enough light came back to the sensor. That means they didn’t test whether their sensor worked with enough skin colors,” Monk said.

Monk’s scale has 10 shades compared to Fitzpatrick’s six. Both scales contain four swatches of light skin shades, but Monk’s has six to represent medium and darker shades. The scale, he believes, is a sweet spot between too few and too many shades, and was developed based on his work on colorism in two countries that have populations that are highly racially mixed, the U.S. and Brazil. (Another scale, the Massey-Martin scale developed for use in immigrant surveys in 2003, has 10 shades, but did not take hold widely in research labs and has been criticized by some because the darker shades are too similar.)

Ten shades may not seem enough: Some racially aware cosmetics companies offer hundreds of shades to customers choosing foundation, and Crayola now offers 24 skin-tone crayons. But in medical research, an exact match is not as important as practicality. “You can’t have more than 10 or 12, or at a certain point trying to pick out differences gets really hard to do,” Monk said. The scale, he said, involved “making some hard choices because no scale, even one with 150 points, can represent every skin tone out there.”

For those developing improved pulse oximeters, a more diverse scale can help determine how well the devices work on people with a range of skin colors by allowing more precise ratings of the skin color of test subjects. It could also facilitate the creation of guidelines that require manufacturers to test their devices on a range of skin tones, including those that are very dark.

Current FDA guidelines for pulse oximeter approval state merely that two “darkly pigmented” subjects must be included in testing. “Two darkly pigmented people? You can interpret that however you want,” said Grace Wickerson, a policy entrepreneurship fellow at the Federation of American Scientists who has been pushing for stronger regulation of medical devices, more diversity in populations that are tested, and more objective measures of skin tone such as Monk’s scale. “This is a scale that’s about skin pigmentation. It’s not a scale that’s about UV exposure,” Wickerson said.

Monk is teaming with Robert H. Wilson, a physicist and optics expert at the University of California, Irvine, to develop a better device, and was recently awarded a $2.5 million NIH Director’s New Innovator Award to assess and try to fix biases in the algorithms used in pulse oximeters. That grant will also fund a longitudinal survey to examine how skin tone, colorism, and social stress affect mental and physical health among Black Americans.

Many device manufacturers have said that their pulse oximeters work better on darker-skinned test subjects than the recent medical studies conducted on hospitalized patients suggest. This could be because the devices worked better in idealized lab conditions than in the real world, but could also be, Monk said, because researchers using color scales with few choices found it easy to rate subjects as having darker skin than they actually do.

Another way to measure skin color would be to use highly precise devices such as spectrophotometers. But these machines never took hold in dermatology offices because they are expensive and inconvenient, and may, ironically, be less accurate than simple paper or digital color scales because they’re influenced by features such as vascularity and erythema that can darken a patient’s skin tone. “In being so precise in measuring the skin, some of these objective measures actually end up bringing in confounders,” Monk said.

Monk didn’t set out to fix pulse oximeters. His project got off the ground when Google reached out to him nearly three years ago in an attempt to solve problems with its smartphone cameras, which did not work as well on darker skin; with its Google Photos app, which now includes filters to enhance images of darker skin; and with search algorithms that often spit out image collections that only include lighter-skinned people. The company apologized in 2015 when its newly released Photos app labeled Black people as gorillas.

In a series of product updates, the company said it was using Monk’s skin tone scale to “better understand representation in imagery, as well as evaluate whether a product or feature works well across a range of skin tones,” something critically important for computer vision work.


Google engineers say their collaboration with Monk has already resulted in more diverse imagery when running a search such as “bridal beauty looks.” Indeed, that particular search yields a number of medium-skinned and darker-skinned brides. But it’s a work in progress: In a search for “cute babies,” the first dozen babies mostly have light skin.

Monk, who has a salaried position as a visiting research professor for Google, said he has been impressed with how open Google has been in admitting its past mistakes and how quickly Google has moved to improve the use of skin tone within the company. “I know a lot of people have a mistrust of business and I get it, I do, but I see this as a hugely positive project,” he said. Google has made the scale open-source so anyone can use it.

Monk has been working for years to understand the impact of colorism on health. It’s been a hard topic to study, and often a taboo one because it is tied historically to the sexual violence of white slave owners against Black women and also because the issue of colorism is not only one of tension between people who are white and those with darker skin, but also an issue within Black, Hispanic, and Asian communities.

For decades, many Black churches and social clubs employed something known as the “paper bag test” in which people whose skin was darker than the bag were denied entry. The global skin-lightening cream industry takes in more than $7 billion a year. But this within-race colorism is something many don’t want to acknowledge, said Monk, who describes the issue of colorism as “a complicated and unpalatable target” and a “blind spot in our civil rights framework.” In academic papers, he calls the U.S. “a pigmentocracy.”

Early in his career, as an assistant professor of sociology at the University of Chicago, Monk worked on colorism with colleagues conducting the National Social Life, Health, and Aging Project. That work was stymied by the lack of a skin color scale. “They said, ‘We want to do this, but we need a better scale of skin tone,’” he said. The group there now uses Monk’s scale in their work.

Many people first learned that pulse oximeters were less accurate for people with darker skin during the Covid-19 pandemic, when the devices became indispensable for determining who might need hospitalization or supplemental oxygen. But the fact that the devices he’s now trying to help fix didn’t always work well on people with darker skin was no surprise to Monk. “My mother had a lung condition,” he said. “So I knew they were problematic.”

This is part of a series of articles exploring racism in health and medicine that is funded by a grant from the Commonwealth Fund.
