The US of Bey

May 20, 2018 00:00 · 3064 words · 15 minute read R data tidying stringr dataviz ggplot gganimate maps beyonce

We stan a legend.


Beyoncé Knowles-Carter is an icon who needs no introduction. In her 36 years, she has become the top selling artist of the 21st century, the most-nominated woman in Grammy history, and the highest-paid black artist of all time. Time put her on the cover of its 100 most influential people issue, effectively crowning her the Most Influential Person. Other incredibly famous performers LOSE IT when they see her. Millionaire pop stars go nuts just watching her on TV. Even her hair adoringly does her bidding. (I’ve been convinced she owns a free-range ponytail farm in a country that only rich people know about, where Blue Ivy hand-feeds the ponytails organic apples until Bey has to select one for a gala or performance.) She has surpassed fame.

iconic.


Like other legends (Martin Luther, Barack Obama, Arya Stark, etc.), Queen Bey’s prominence has spawned a wave of namesakes in the United States. Mrs. Knowles-Carter needs no last name to be recognized - but what of the Beyoncés who do? Where are they? Is the name getting more popular as her fame grows? What’s it like to be named Beyoncé in high school in 2015?

Thanks to the babynames package in R - along with some additional open data from the Social Security Administration - it’s pretty easy to answer the first few questions. For the last, we have Humans of New York.

This post looks at baby name trends in the US, focusing on babies named after Queen Bey. It includes all the data and code necessary to generate these graphs and maps. Note that many of the plots use themes from my own bundle of ggplot convenience functions on Github, and will not work without running devtools::install_github("brooke-watson/bplots") and installing the Raleway font from Google. However, these functions (theme_raleway() and theme_fancy(), for example) can be omitted or approximated with other calls to theme().

library(tidyverse) 
library(janitor)
library(stringr)
library(here)
library(scales)
library(babynames)
library(bplots) # devtools::install_github("brooke-watson/bplots")
library(viridis)
library(gganimate) # devtools::install_github("dgrtwo/gganimate")

About the data

The United States Social Security Administration releases yearly summaries of baby names reported on Social Security card applications, sorted by sex, dating back to 1910. The data is aggregated at both the national and state level. To safeguard privacy, the SSA restricts the list of names to those with at least 5 occurrences per year, per sex. The same restriction applies at the state level: a name given to fewer than 5 (but more than 0) babies of one sex in a given state and year is left out of that state’s counts, so the sum of the state totals can be less than the national total.

This is why no Beyoncés show up in the data in 1981, the year Queen Bey was born.
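
To make the thresholding concrete, here is a minimal sketch in base R with made-up counts (the states, the numbers, and the `threshold` variable are illustrative, not SSA data). It shows how a name can clear the 5-occurrence bar nationally while dropping out of some state files.

```r
# Toy illustration (made-up counts, not SSA data) of how the "at least 5"
# rule makes state counts sum to less than the national count.
births <- data.frame(
  state = c("TX", "CA", "NY", "VT"),
  n     = c(12, 9, 4, 2)   # hypothetical babies given one name in one year
)

threshold <- 5

# The national file sees the pooled count, which clears the threshold.
national_n <- sum(births$n)
national_reported <- if (national_n >= threshold) national_n else NA

# Each state applies the threshold separately, so NY and VT drop out.
state_reported <- births[births$n >= threshold, ]

national_reported        # 27
sum(state_reported$n)    # 21
```

By the same logic, a name with fewer than 5 occurrences nationwide in a given year would not appear in either file for that year.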

Where they at? Where they at?
data(babynames)
# keep every name that contains "yonce" (case-sensitive)
yonces <- babynames %>%
  filter(str_detect(name, "yonce")) 
rm(babynames)
knitr::kable(head(yonces))
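| year | sex | name     |   n |     prop |
|-----:|:----|:---------|----:|---------:|
| 1998 | F   | Beyonce  |  18 | 9.30e-06 |
| 1999 | F   | Beyonce  |  67 | 3.44e-05 |
| 2000 | F   | Beyonce  | 197 | 9.88e-05 |
| 2000 | F   | Keyonce  |   7 | 3.50e-06 |
| 2000 | F   | Deyonce  |   6 | 3.00e-06 |
| 2000 | F   | Beyoncee |   5 | 2.50e-06 |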
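
A small aside on the filter above: the pattern "yonce" is unanchored and case-sensitive, so it keeps any name containing those letters, including Beyoncee. The sketch below uses base R's `grepl()`, which should behave like `str_detect()` for a plain pattern like this one. The name "Yoncelle" is hypothetical, added only to show case sensitivity.

```r
# Names from the table above, plus one hypothetical name ("Yoncelle").
candidates <- c("Beyonce", "Keyonce", "Deyonce", "Beyoncee", "Yoncelle")

# Unanchored: matches lowercase "yonce" anywhere in the name.
anywhere <- grepl("yonce", candidates)

# Anchored with "$": matches only names that end in "yonce".
at_end <- grepl("yonce$", candidates)

data.frame(candidates, anywhere, at_end)
```

With the unanchored pattern, the first four names match and "Yoncelle" does not, because of its capital Y. The anchored pattern also drops "Beyoncee".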

It’s possible that Beyoncé was not the first of her name - all we know is that there are no recorded years before she was born in which 5 or more baby Beyoncés were born in the US. While a few scattered Yoncés could have trickled, unaggregated, through Social Security records throughout the 20th century, the first sizable cohort - 18 Beyoncés - was born in 1998, when Queen Bey was only 17 years old. That was the year of Destiny’s Child’s self-titled debut album, and the year the girl group exploded onto the pop music scene.

wrap90 <- wrap_format(90)
wrap95 <- wrap_format(95)

g1 <- yonces %>% 
  filter(name == "Beyonce") %>% 
  ggplot(aes(x = year, y = n)) +
  # constant colours go outside aes(); inside aes() the hex string would be
  # treated as a data value and mapped to a default palette colour
  geom_line(colour = "#e41a1c") + 
  # mark Beyoncé Knowles' own birth year, which is absent from the SSA data
  annotate("point", x = 1981, y = 1, colour = "#e41a1c") +
  annotate("text", x = 1965, y = 30, label = "Beyoncé Knowles", family = "Raleway") +
  xlim(1910, 2015) + 
  ggtitle("Birth of a Beyhive") +
  labs(subtitle = "Children named Beyoncé born in the US, 1910-2015", 
       caption = wrap95("Data from the Social Security Administration. To protect privacy and prevent identification, if a name has fewer than 5 occurrences for a year of birth, those names are not recorded. Thus, Beyoncé Knowles' birth (in 1981) is not actually included in the public SSA data."), 
       y = "Annual Beyoncés born in the US") +
  theme_raleway() + 
  theme_fancy()

# pass the plot explicitly rather than relying on the last plot created
ggsave(filename = here::here("content/figs/2018-05-20-the-us-of-bey/beyhive.png"), plot = g1, width = 8, height = 6)

The peak of the Beyoncé name wave came in 2001, followed by a much smaller splintering of Alternative Yoncés in the early 2000s. How does this map onto Queen Bey’s discography?

# album release years, to join onto the yearly name counts
album_join <- tibble(
  name = "Beyonce", 
  year = c(1998, 1999, 2001, 2003, 2006, 2008, 2011, 2013), 
  album = c("Destiny's Child", "The Writing's on the Wall", "Survivor", "Dangerously in Love", 
            "B'Day", "I Am... Sasha Fierce", "4", "Beyoncé")
)

yonces <- 
  left_join(yonces,album_join, by = c("name", "year") )

# add album annotations 
# only the name-year rows that coincide with an album release
albums <- filter(yonces, !is.na(album))

g2 <- yonces %>%  
  select(-sex, -prop) %>% 
  ggplot(aes(x = year, y = n)) +
  geom_line(aes(col = name)) + 
  geom_point(data = albums) + 
  ggrepel::geom_label_repel(data = albums, aes(label = album), 
                            family = "Raleway") + 
  xlim(1995, 2015) + 
  ggtitle("Destiny's Children") +
  scale_color_manual(values = c("#e41a1c", "#377eb8", "#4daf4a", "#984ea3", "#ff7f00", "#ffff33", "#a65628", "#f781bf", "#999999", "#999999")) +
  theme_raleway() + 
  theme_fancy() +   
  labs(subtitle = "Children born in the US whose names contain 'yonce' \n vs Destiny's Child and Beyoncé albums, 1995-2015", 
       caption = wrap90("Naming a child Beyoncé reached its peak in 2001, around the release of the Destiny's Child album 'Survivor'. 2000 saw the first cohort of Alternative Yoncés."), 
       y = "Annual Yoncés born in the US")

ggsave(filename = here::here("content/figs/2018-05-20-the-us-of-bey/albums.png"), plot = g2, width = 8, height = 6)

Somewhat surprisingly, the trend of naming a child Beyoncé peaked quickly in 2001, around the release of the Destiny’s Child album Survivor. This was before her solo career began, let alone before she took over the world. She’s only grown in prominence since then - why the decrease in BaBeys1?

Perhaps in 2001, it was easier to think of “Beyoncé” as merely a beautiful, unique name that was growing in familiarity on local radios - much like the names Belcalis or Kehlani may be today. Naming a child Beyoncé in the year of Beychella seems like a lot of pressure. We likely haven’t seen the last Beyoncé baby, but the wave hasn’t continued to swell. I wouldn’t be surprised if we start to see spikes in babies named Blue, Rumi, and Sir in its stead.

young icon.

Blue Ivy Carter, the world’s most powerful six-year-old, politely silencing two of the most successful Americans alive.

Alternative Yoncés

By 2001, names that rhyme with Beyoncé were also picking up steam. They get massively overshadowed in the larger graph, but there are quite a few of these Alternative Yoncés, and the cohorts persist over several years before petering out around 2007. Which ones are most popular?

g1b <- yonces %>% 
    filter(name != "Beyonce") %>% 
    ggplot(., aes(x = reorder(name, n), y = n, fill = cumsum(n), frame = year)) +
    geom_col(aes(cumulative = TRUE), width = .2) +
    coord_flip() + 
    ggtitle("Cumulative Growth of Alternative Yoncés: ") +
    scale_fill_viridis(option = "C") +  
    theme_raleway(plot_title_size = 26, base_size = 16, axis_text_size = 16, strip_text_size = 16) + 
    theme_fancy() +
    theme(axis.title.y = element_blank()) + 
    guides(fill=guide_legend(title="Total"))

gganimate::gganimate(g1b, "../figs/2018-05-20-the-us-of-bey/alt-yonces.gif", ani.width = 800, ani.height = 600)

Perhaps the hundreds of parents of Deyonce, Veyonce and Keyonce saw the approaching Beyoncé wave, and decided to make the set of syllables even more unique for their own newborns with a different leading consonant. Bonus points for the parents of those seven male Deyonces - innovation is born from connecting the dots that others won’t or can’t, and the bold creativity in these families is huge.

fig1 <- yonces %>% 
  group_by(name, sex) %>% 
  summarise(n = sum(n)) %>% 
  mutate(n = ifelse(sex == "F", n, -n)) 

g3 <- ggplot(fig1, aes(x = reorder(name, n), y = n, 
                      fill = sex, col = sex)) + 
  geom_point() + 
  geom_col(width = .2) +
  theme_raleway() + 
  coord_flip() +
  ylim(-200, 2500) +
  geom_text(data = filter(fig1, sex == "F"), aes(reorder(name, n), y = n, label = n), hjust = -.5) +
  geom_text(data = filter(fig1, sex != "F"), aes(reorder(name, n), y = n, label = -n), hjust = 1.5) +
  labs(x = "", y = "n") + 
  ggtitle("American Yoncés, 1995-2015.") +
  labs(caption = wrap90("Data from the Social Security Administration. To protect privacy and prevent identification, if a name has less than 5 occurrences for a year of birth, those names are not recorded.")) + 
  theme_fancy()

ggsave(filename = here::here("content/figs/2018-05-20-the-us-of-bey/alt-yonces-still.png"), plot = g3, width = 8, height = 6)

The US of Bey

We’ve looked at how these patterns vary over time. But how do they vary spatially? Are there more Beyoncés in Texas, her home state? Did the south catch on earlier than the north, or the east before the west?

Because of the privacy protections on the public SSA data, some of the counts get lost when the data is disaggregated to the state level. However, there are still some name patterns that can be picked up in the peak years.

I wrapped the state level data up in a Github package, so that’s what I’ll use to find statewide patterns. It’s available for anyone to install, but be aware - the package and related dataset is pretty big (17.1 MB, well over the 5 MB CRAN limit).

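# install the package from GitHub if it is missing, then attach it and load the data
if(!requireNamespace("statebabynames", quietly = TRUE)) {
  devtools::install_github("brooke-watson/statebabynames") } 
library(statebabynames)
data("statebabynames")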

# filter out just the yonces by state 
ybs <- statebabynames %>% 
    filter(str_detect(name, "yonce")) 
unique(ybs$name) 
## [1] "Beyonce"
sort(unique(ybs$year)) 
##  [1] 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012
## [15] 2013

Unsurprisingly, all of the less common variants - Keyonce, Deyonce, and Breyonce - are masked in the state-level data. There are just too few in each year in each state. However, there are enough Beyoncés left to pick up some patterns across states.

yikes
# get a data frame of all the state-year combos to fill in missing NAs, starting in 1995 and ending in 2015 to match with the earlier data 
big <- statebabynames %>% 
    select(state, year) %>% 
    unique() %>% 
    filter(year >= 1995 & year < 2016)

# get the total number of applications per year for later use 
totals <- statebabynames %>%
  group_by(state, year) %>% 
  filter(year >= 1995 & year < 2016) %>% 
  summarise(total = sum(n))

# join both to fill the empty state-year combos with NAs 
ybs <- full_join(ybs, big, by = c("state", "year"))
wrap120 = wrap_format(120)

# match state abbreviations with "region" names in the usa map
state_lookup = tibble(state = state.abb, 
                      region = tolower(state.name))

# load state map data 
usa_map <- map_data("state") 

# full join the DFs to get NAs for all missing year-state combos
state_map <- ybs %>% 
    full_join(., state_lookup, by = "state") %>% 
    full_join(usa_map, by = "region")  

# plot the data 
g <- ggplot() + 
    geom_polygon(data = state_map, aes(fill = n, x = long, y = lat, group = group, frame = year)) +
    scale_fill_gradient(low = "lightblue", high = "darkblue", na.value = "lightgray", guide = "legend") + 
    ggtitle( "The US of Bey:" ) +
    labs(subtitle = paste0("Children named Beyonce born in each state, 1995-2015"), 
         caption = wrap120("Data from the Social Security Administration. To protect privacy, if a name has less than 5 occurrences for a year of birth in any state, those names are not recorded. The sum of the state counts is thus less than the national count.")) + 
    bplots::theme_blank(element_text(family = "Raleway")) + 
    theme(
        plot.title = element_text(size = 28, hjust = .5, 
        margin = margin(b = -0.1, t = 0.4, l = 2, unit = "cm")),
        plot.subtitle = element_text(size = 16, hjust = .5, vjust = -2, color = "#4e4d47"), 
        axis.text = element_blank()
    ) + theme_fancy()

gganimate::gganimate(g, "../figs/2018-05-20-the-us-of-bey/USofBey.gif", ani.width = 800, ani.height = 600, interval = .5)
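
The full joins above exist to give every state-year combination a row, with NA where no count was reported, so that states without data still draw on the map in the `na.value` color. Here is a minimal sketch of the same idea with made-up counts, using base R's `merge(..., all = TRUE)` as a stand-in for `full_join()`. The `observed` and `grid` data frames are hypothetical.

```r
# Hypothetical counts: only two state-year combinations have data.
observed <- data.frame(
  state = c("TX", "CA"),
  year  = c(2000, 2001),
  n     = c(25, 14)
)

# Every state-year combination we want to appear on the map.
grid <- expand.grid(state = c("TX", "CA", "VT"), year = c(2000, 2001),
                    stringsAsFactors = FALSE)

# all = TRUE keeps rows from both sides, like dplyr::full_join();
# combinations with no observed count get n = NA.
filled <- merge(observed, grid, by = c("state", "year"), all = TRUE)
filled

sum(is.na(filled$n))   # 4 of the 6 combinations have no reported count
```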

In state-wide trends over time, Texas pops out. It would be nice to believe that this is because Houston stans its hometown legend, but it’s most likely because of the state’s large population. California, New York, and Florida stand out too, as do other states with large cities, like Illinois and Georgia.

To really look at the difference in rates, we have to standardize the data. To calculate Beyoncés per Million, I divided the annual number of babies named Beyoncé in each state by the state-level intercensal population estimates from census.gov.
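
As a quick worked example of that standardization, here is a sketch with two hypothetical states (the names, counts, and populations are invented for illustration). The formula matches the one used in the code below: count times one million, divided by population.

```r
# Hypothetical states and numbers, chosen only to show the arithmetic.
counts <- data.frame(
  state = c("Big State", "Small State"),
  n     = c(30, 12),       # babies named Beyoncé in one year
  pop   = c(20e6, 3e6)     # state population estimate for that year
)

# Beyoncés per Million residents
counts$bpm <- counts$n * 1e6 / counts$pop
counts
```

The larger state has more Beyoncés in raw terms (30 vs 12), but the smaller one has the higher rate (4 vs 1.5 per million), which is the kind of difference the raw-count map hides.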

# if the population by state data has not been generated yet, run the file that generates it 
# if it has, load the data into memory 
if(!file.exists(here::here("content/data/2018-05-20-the-us-of-bey/population_by_state.RDS"))) {
  source(here::here("content/R/2018-05-20-the-us-of-bey/get_pop_data.R"))
} else {
    nbipop <- readRDS(here::here("content/data/2018-05-20-the-us-of-bey/population_by_state.RDS"))
    }

# combine with the state pop to plot the normalized map 
nbi <- full_join(state_map, nbipop, by = c("region", "year")) %>% 
  mutate(bpm = n*1e6/pop)
g <- ggplot() + 
    geom_polygon(data = nbi, aes(fill = bpm, x = long, y = lat, group = group, frame = year)) +
    scale_fill_gradient(low = "lightblue", high = "darkblue", na.value = "lightgray", guide = "legend") + 
    ggtitle("BPM (Beyoncés per Million):") +
    labs(subtitle = paste0("Children named Beyoncé born annually per 1,000,000 residents in each state, 1995-2015"), 
         caption = wrap120("Data from the Social Security Administration. To protect privacy, if a name has less than 5 occurrences for a year of birth in any state, those names are not recorded. The sum of the state counts is thus less than the national count.")) + 
    bplots::theme_blank(element_text(family = "Raleway")) + 
    theme(
        plot.title = element_text(size = 28, hjust = .5, 
        margin = margin(b = -0.1, t = 0.4, l = 2, unit = "cm")),
        plot.subtitle = element_text(size = 16, hjust = .5, vjust = -2, color = "#4e4d47"), 
        axis.text = element_blank()
    ) + theme_fancy()

gganimate::gganimate(g, here::here("content/figs/2018-05-20-the-us-of-bey/USofBey_normalized.gif"), ani.width = 800, ani.height = 600, interval = .5)

This metric isn’t perfect - annual births and total population are certainly correlated within a state, but there is some variance. A better denominator would be the number of babies born in each state each year, rather than the whole population. That gives the proportion of Beys in a birth cohort, rather than in the entire population as BPM (Beyoncés per Million) does.
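
A minimal sketch of the birth-cohort denominator, in base R with made-up rows shaped like the state-level name counts (the states "AA" and "BB" and all numbers are hypothetical). One caveat, following from the privacy rule described earlier: summing name counts only includes names reported at least 5 times, so the cohort totals are likely a slight undercount of actual births.

```r
# Hypothetical rows in the same shape as state-level name counts.
toy <- data.frame(
  state = c("AA", "AA", "AA", "BB", "BB"),
  year  = 2001,
  name  = c("Beyonce", "Emily", "Jacob", "Beyonce", "Emily"),
  n     = c(10, 4990, 5000, 10, 1990)
)

# Births recorded per state and year: the sum of all name counts.
totals <- aggregate(n ~ state + year, data = toy, FUN = sum)
names(totals)[names(totals) == "n"] <- "total"

# Keep the Beyoncé rows and attach each state's cohort size.
bey <- merge(toy[toy$name == "Beyonce", ], totals, by = c("state", "year"))
bey$percent_cohort <- bey$n * 100 / bey$total
bey[, c("state", "year", "n", "total", "percent_cohort")]
```

Both toy states have 10 Beyoncés, but they make up 0.1% of the cohort in "AA" and 0.5% in "BB".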

(Working on this in the dark at 1:30 AM on a Wednesday, I furiously but unsuccessfully Googled state-level births per year from US census data. After longer than I am proud to admit, it dawned on me that I was Doing Too Much. I went to bed.

The next morning, I realized that I had obviously had the state-level annual number of births all along, in the original Social Security Application data. Rest, people.)

yikes

Anyways, below is the state-level number of Beyoncés per birth cohort.

totals <- full_join(state_lookup, totals, by = "state") %>% 
  select(-state) 

normalized <- full_join(state_map, totals, by = c("region", "year")) %>%
  mutate(percent_cohort = n*100/total) 

g <- ggplot() + 
    geom_polygon(data = normalized, aes(fill = percent_cohort, x = long, y = lat, group = group, frame = year)) +
    scale_fill_gradient(low = "lightblue", high = "darkblue", na.value = "lightgray", guide = "legend") + 
    ggtitle("Destiny's Children: Beyoncés per Birth Cohort, ") +
    labs(subtitle = paste0("Percent of children born annually in each state who were named Beyoncé, 1995-2015"), 
         caption = wrap120("Data from the Social Security Administration. To protect privacy, if a name has less than 5 occurrences for a year of birth in any state, those names are not recorded. The sum of the state counts is thus less than the national count.")) + 
    bplots::theme_blank(element_text(family = "Raleway")) + 
    theme(
        plot.title = element_text(size = 28, hjust = .5, 
        margin = margin(b = -0.1, t = 0.4, l = 2, unit = "cm")),
        plot.subtitle = element_text(size = 16, hjust = .5, vjust = -2, color = "#4e4d47"), 
        axis.text = element_blank()
    ) + theme_fancy()

gganimate::gganimate(g, "../figs/2018-05-20-the-us-of-bey/USofBey_birth_cohort.gif", ani.width = 800, ani.height = 600, interval = .5)

0.084% of children born in Mississippi in 2001 were named Beyoncé! That’s not a ton, but it’s more than I would’ve expected. Those kids are juniors and seniors in high school now, carrying the weight of Lemonade on their shoulders. My fondest congratulations to all of 2018’s Beyoncé grads.

Beyond Beyoncé

This is a particularly fun analysis to do for Beyoncé, because not only is she an icon, her first name is uniquely associated with one individual. Parsing the impact of Justin Bieber, for example, on baby names after his rise to fame is trickier, because the name is much more common in America.

However, many names in America go through cyclical trends. The popularity of boys’ names seems to fluctuate less than that of girls’ names - James, John, Robert, and Michael remain popular throughout the decades, while Stacy, Linda, Jessica and Jennifer go through phases of popularity. My own name, Brooke, spiked at the start of the 80s and really hit its stride in the late 90s and early 2000s, before starting to dip out of fashion in the past decade or so.

By wrapping most of this into a function, we can make this map for any name that is listed in Social Security data. The map_babynames function in the statebabynames package plots the popularity of a baby name throughout the years. Again, you’ll need the babynames dataset by state, which is hosted on Github as an .rda file in the same package.

# load the state-level babynames data from the package, if not loaded already 
if(!exists("statebabynames")) { 
  data("statebabynames", package = "statebabynames")
}

statebabynames::map_babynames(nam = "Brooke", filename = here::here("content/figs/2018-05-20-the-us-of-bey/Brooke.gif"), start_year = 1950, stop_year = 2016)

Post Script

statebabynames didn’t need to be a package, but in the continued spirit of Doing Too Much, I hope people have fun with it. The package is huge and riddled with dependencies, so I’d happily accept pull requests that make it cleaner. This post was partially an excuse to play with the babynames and gganimate packages, and partially an excuse to scour the internet for my favorite Beyoncé gifs, in the service of ✨ data science ✨ . I’m not sorry.

sorry


notes

1 I recognize that “BaBeys” is a terrible pun. However, it felt like an appropriate place to remind the reader that in the context of Beyoncé, “Bey” is not pronounced “Bay.” It’s pronounced “Bee”, like a Queen Bee, like the first half of Be-Yoncé’s name. I don’t want to hear any “but it’s spelled like ‘Hey!’” commentary. We put up with a lot of strange spellings in the English language. Please don’t subject yourself to the ridicule of the Beyhive. I don’t make the rules.
