Developing Race and Gender Estimates for US Law Enforcement Leadership
A Use-Case for the predictrace Package
Introduction
Researchers might be interested in developing a descriptive understanding of the gender and race composition of a particular industry, organization, or other institution. Oftentimes this is done with sampling from a population. This is the case in law enforcement. With approximately 18,000 sub-federal law enforcement agencies in the United States, and somewhere around 800,000 officers, it can be a challenging environment for researchers. Given the huge variation in agency type, size, composition, etc., generalizing across “law enforcement” is tricky at best.
In this preliminary analysis, I attempt a population-level inference for US law enforcement agencies, to develop estimates of race and gender proportions in the “chief executive” spot. The chief executive for a sheriff’s office is the Sheriff (often elected), while in a state-level agency it might be an Executive Director - there is a lot of variation.
Gender and Race in US Law Enforcement
John Shjarback and Natalie Todak (2019) use data from the 2013 Law Enforcement Management and Administrative Statistics (LEMAS) survey to analyze correlates of women in supervisory, mid-level, and chief executive roles in 2,826 municipal police, sheriff’s offices, and primary state law enforcement agencies. The 2013 LEMAS was the first national survey to report data at this level, and just 2.7% of the agencies were led by women. My goal here is to see how estimates from a commercial database covering a much larger set of agencies, combined with a probabilistic estimate of gender and race, compare to the estimates from the 2013 LEMAS.
The 2016 LEMAS estimates that for chiefs across all sizes of local agencies, 89.6% were White, 4% Black, 3.1% Hispanic, and 2.4% other. It also estimates that in those same agencies, just 2.6% of chiefs were female. However, the 2016 sample design results in 2,612 local agencies (rather than the larger sample of all agencies), and uses stratified sampling that intentionally oversamples the largest agencies (100+ full-time officers).
But another method might be obtaining population-level information and inferring race and gender for the individuals based on that information. Jacob Kaplan has developed the predictrace package to do just that. The package develops a probability of race and gender based on the first name of a subject. This is from the package’s introduction:
The goal of predictrace is to predict the race of a surname or first name and the gender of a first name. This package uses U.S. Census data which says how many people of each race has a certain surname. For first name data, this package uses data from Tzioumis (2018). From this we can predict which race is mostly likely to have that surname or first name. The possible races are American Indian, Asian, Black, Hispanic, White, or two or more races. For the gender of first names, this package uses data from the United States Social Security Administration (SSA) that tells how many people of a given name are female and how many are male (no other genders are included). I use this to determine the proportion of each gender a name is, and use the gender with the higher proportion as the most likely gender for that name. Please note that the Census data on the race of first names is far smaller than the SSA data on the gender of first names, so you will match far fewer first names to race than to gender.
Data
In this short demonstration, I will attempt to develop race and gender estimates for individuals who lead US law enforcement agencies. To do so, I will rely on a commercial dataset from the National Directory of Law Enforcement Administrators (NDLEA). The dataset contains just over 37,000 listings for the chief administrator of law enforcement organizations at every level of the US system - from municipal police to heads of major federal agencies like the FBI, and everything in-between. The company that puts this database together commits to contacting every agency on the list at least once a year, and the company representative I spoke to said they are closer to once every three months. In my experience the dataset has been very reliable when I need to contact a head administrator directly.
However, in order to constrain the analysis, I will just be looking at Campus Law Enforcement, County Sheriffs, and Municipal Law Enforcement agencies (n=17,104). Because I look at some correlations later with population, I drop any observations missing that information (missing n= 204), leaving a total of 16,900 observations. I’ll also reduce this to a simpler dataset by retaining only the department type, first name of administrator, state, and population served.
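The two filtering steps described above can be sketched in base R as follows. This is only an illustration on invented rows: the column names follow the table shown in this post, while the data frame name `ndlea`, the "State Law Enforcement" label, and the use of `NA` to mark missing population are assumptions, not the actual NDLEA layout.

```r
# Toy stand-in for the NDLEA listings (rows invented for illustration).
ndlea <- data.frame(
  DeptType     = c("Municipal Law Enforcement", "County Sheriffs",
                   "Campus Law Enforcement", "State Law Enforcement",
                   "Municipal Law Enforcement"),
  FirstName    = c("Paul", "Maria", "Dave", "Karl", "Troy"),
  MailingState = c("VA", "TX", "MA", "OH", "MO"),
  Population   = c("<25,000", "100k-1M", "<25,000", "1M-10M", NA),
  stringsAsFactors = FALSE
)

keep_types <- c("Campus Law Enforcement", "County Sheriffs",
                "Municipal Law Enforcement")

# Step 1: restrict to the three agency types of interest.
in_scope <- ndlea[ndlea$DeptType %in% keep_types, ]

# Step 2: drop listings with no population information and keep
# only the four columns used in the rest of the analysis.
analysis <- in_scope[!is.na(in_scope$Population),
                     c("DeptType", "FirstName", "MailingState", "Population")]

nrow(in_scope)  # listings of the three agency types
nrow(analysis)  # listings that also have a population value
```

On the real data, the same two steps correspond to going from 17,104 listings to 16,900 after dropping the 204 with missing population.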
Let’s check and see if that looks right.

DeptType | FirstName | MailingState | Population |
---|---|---|---|
Campus Law Enforcement | Dave | MA | <25,000 |
Municipal Law Enforcement | Paul | VA | <25,000 |
Municipal Law Enforcement | Justin | AK | 100k-1M |
Municipal Law Enforcement | Thomas | MI | 25k-50k |
Municipal Law Enforcement | Troy | MO | <25,000 |
Municipal Law Enforcement | Ronald | MA | <25,000 |
Municipal Law Enforcement | Julian | CA | 100k-1M |
Municipal Law Enforcement | John | OK | <25,000 |
Municipal Law Enforcement | David | MI | <25,000 |
Municipal Law Enforcement | Matt | PA | <25,000 |
Looks like population data is pretty spotty (there’s an outlier from a typo that had the population of Shelby County, TN, at over 93 million! I fixed it behind the scenes here), but that’s not our main focus here today. Overall, it’s looking pretty good!
Inferring Race and Gender from First Name Data
Kaplan’s predictrace package will derive a gender and race classification for the first names contained within our dataset. First we’ll use the predict_gender function, and then the predict_race function, to build the initial lists.
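A minimal sketch of those two calls is below, not necessarily the exact code behind this post. It assumes the predictrace package is installed, and it assumes (as I understand the package interface) that predict_race needs `surname = FALSE` to look names up as first names rather than surnames. The four names are just examples.

```r
first_names <- c("Steve", "Hector", "Kelly", "Michael")

if (requireNamespace("predictrace", quietly = TRUE)) {
  # Gender of each first name, based on the SSA name counts.
  gender_tbl <- predictrace::predict_gender(first_names)

  # Race of each first name; surname = FALSE is assumed here to switch
  # from the default surname lookup to the first-name lookup.
  race_tbl <- predictrace::predict_race(first_names, surname = FALSE)

  print(gender_tbl)
  print(race_tbl)
} else {
  message("predictrace is not installed; try install.packages('predictrace')")
}
```

The guard around the calls just keeps the sketch from failing on a machine without the package.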
As you can see, the package reports probabilities for each entry, and gives a best guess (likely_gender and likely_race) given those probabilities.
name | match_name | likely_race | probability_american_indian | probability_asian | probability_black | probability_hispanic | probability_white | probability_2races |
---|---|---|---|---|---|---|---|---|
Steve | steve | white | 0.0024 | 0.0721 | 0.0221 | 0.0483 | 0.8540 | 0.0010 |
Eliezer | eliezer | NA | NA | NA | NA | NA | NA | NA |
Hector | hector | hispanic | 0.0000 | 0.0135 | 0.0045 | 0.9270 | 0.0550 | 0.0000 |
Ron | ron | white | 0.0034 | 0.0469 | 0.0402 | 0.0235 | 0.8844 | 0.0017 |
James | james | white | 0.0012 | 0.0147 | 0.0328 | 0.0100 | 0.9402 | 0.0012 |
Desiree | desiree | white | 0.0030 | 0.0334 | 0.1246 | 0.1155 | 0.7143 | 0.0091 |
Kevin | kevin | white | 0.0006 | 0.0324 | 0.0284 | 0.0082 | 0.9296 | 0.0009 |
Rick | rick | white | 0.0029 | 0.0284 | 0.0073 | 0.0277 | 0.9314 | 0.0022 |
Christopher | christopher | white | 0.0013 | 0.0140 | 0.0200 | 0.0179 | 0.9454 | 0.0014 |
Karl | karl | white | 0.0007 | 0.0260 | 0.0281 | 0.0070 | 0.9374 | 0.0007 |

name | match_name | likely_gender | probability_female | probability_male |
---|---|---|---|---|
Michael | michael | male | 0.0049518 | 0.9950482 |
Berkley | berkley | female | 0.6417722 | 0.3582278 |
Kelly | kelly | female | 0.8523312 | 0.1476688 |
Donald | donald | male | 0.0039238 | 0.9960762 |
Alfonzo | alfonzo | male | 0.0000000 | 1.0000000 |
Joseph | joseph | male | 0.0040515 | 0.9959485 |
Scott | scott | male | 0.0033662 | 0.9966338 |
Christopher | christopher | male | 0.0046306 | 0.9953694 |
Dennis | dennis | male | 0.0042935 | 0.9957065 |
Donald | donald | male | 0.0039238 | 0.9960762 |

Next, we can join the results from the predictrace package back to our original data, and quickly get a feel for the overall distribution of gender and race.
Variable | N | Percent |
---|---|---|
DeptType | 16900 | |
… Municipal Law Enforcement | 11697 | 69.2% |
… Campus Law Enforcement | 2038 | 12.1% |
… County Sheriffs | 3165 | 18.7% |
Population | 16900 | |
… <25,000 | 13614 | 80.6% |
… 25k-50k | 1562 | 9.2% |
… 50k-100k | 868 | 5.1% |
… 100k-1M | 797 | 4.7% |
… 1M-10M | 59 | 0.3% |
gender | 16619 | |
… male | 15583 | 93.8% |
… female | 1036 | 6.2% |
race | 16175 | |
… white | 15844 | 98% |
… black | 67 | 0.4% |
… hispanic | 234 | 1.4% |
… hispanic, white | 2 | 0% |
… asian | 27 | 0.2% |
… asian, white | 1 | 0% |
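The join behind a summary like the one above can be sketched with base R's merge. The rows here are invented, and `gender_lookup` is a hypothetical stand-in shaped like the gender output shown earlier. The point is that a left join keeps agencies whose first name found no match, which is why the gender and race rows in the summary have fewer than 16,900 observations.

```r
agencies <- data.frame(
  DeptType  = c("Municipal Law Enforcement", "County Sheriffs",
                "Campus Law Enforcement", "Municipal Law Enforcement"),
  FirstName = c("Michael", "Kelly", "Eliezer", "Michael"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup with one row per matched first name.
gender_lookup <- data.frame(
  name          = c("Michael", "Kelly"),
  likely_gender = c("male", "female"),
  stringsAsFactors = FALSE
)

# Left join: all.x = TRUE keeps every agency, with NA for unmatched names.
joined <- merge(agencies, gender_lookup,
                by.x = "FirstName", by.y = "name", all.x = TRUE)

# Counts including the unmatched rows.
counts <- table(joined$likely_gender, useNA = "ifany")

# Percentages among matched names only (NA rows leave the denominator).
shares <- round(100 * prop.table(table(joined$likely_gender)), 1)

print(counts)
print(shares)
```

In this toy case one of four agencies has no gender match, so the shares are computed over three rows rather than four.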
Results
Let’s break down the race and gender estimates by the population of the area served by the agency. Because of the very low counts in the Hispanic/White and Asian/White categories, I’m going to collapse those into the Hispanic and Asian categories, respectively. As population data for very small areas (<1000 pop.) can be spotty in the NDLEA, we lose some observations.
Variable | Overall, N = 16,900 | <25,000, N = 13,614 | 25k-50k, N = 1,562 | 50k-100k, N = 868 | 100k-1M, N = 797 | 1M-10M, N = 59 |
---|---|---|---|---|---|---|
race | | | | | | |
White | 15,844 (97.95%) | 12,796 (98.13%) | 1,483 (97.95%) | 793 (97.42%) | 720 (95.87%) | 52 (92.86%) |
Black | 67 (0.41%) | 50 (0.38%) | 6 (0.40%) | 6 (0.74%) | 4 (0.53%) | 1 (1.79%) |
Hispanic | 236 (1.46%) | 175 (1.34%) | 23 (1.52%) | 14 (1.72%) | 21 (2.80%) | 3 (5.36%) |
Asian | 28 (0.17%) | 19 (0.15%) | 2 (0.13%) | 1 (0.12%) | 6 (0.80%) | 0 (0.00%) |
Unknown | 725 | 574 | 48 | 54 | 46 | 3 |
gender | | | | | | |
male | 15,583 (93.77%) | 12,569 (93.85%) | 1,456 (94.12%) | 791 (93.94%) | 718 (92.17%) | 49 (83.05%) |
female | 1,036 (6.23%) | 823 (6.15%) | 91 (5.88%) | 51 (6.06%) | 61 (7.83%) | 10 (16.95%) |
Unknown | 281 | 222 | 15 | 26 | 18 | 0 |

Cell values are n (%).
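The collapsing of the two tied labels can be sketched as follows, using the overall race counts reported in the earlier summary table. The recoding rule (strip a trailing ", white") is my own illustration of one way to do it, not necessarily how the table above was produced.

```r
# Overall counts by inferred race label, before collapsing.
race_counts <- c("white" = 15844, "black" = 67, "hispanic" = 234,
                 "hispanic, white" = 2, "asian" = 27, "asian, white" = 1)

# Recode the tied labels: "hispanic, white" becomes "hispanic", and so on.
collapsed_label <- sub(", white$", "", names(race_counts))

# Sum the counts within each collapsed label.
collapsed <- sapply(split(unname(race_counts), collapsed_label), sum)

# Percentages among names that received a race classification.
pct <- round(100 * collapsed / sum(collapsed), 2)

print(collapsed)
print(pct)
```

The collapsed counts (236 Hispanic, 28 Asian) and the percentages line up with the "Overall" column of the table, which confirms that the percentages there exclude the 725 unknowns from the denominator.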
Perhaps unsurprisingly, law enforcement agencies are predominantly led by men. However, there may have been progress over the past decade or so. Compared to the 2013 LEMAS data, which estimated that just 2.7% of agencies were led by women, my analysis estimates that 6.2% of agencies overall are led by women. The proportion of women-led agencies is stable at around 6% until we get to the larger population centers, and in the largest (between 1M and 10M pop.), 17% of the agencies are led by women. This is much larger than the 8.5% suggested by the 2016 LEMAS, though the largest category there is 250,000+ population.
In terms of racial characteristics, this analysis suggests that, overall, 98% of agencies are led by White chief executives. This percentage is negatively correlated with population. In other words, the percentage of White chief executives tends to decrease as the size of population served increases. Even at the top-end of population size, however, these positions are heavily skewed, as seen in the largest (1M to 10M) areas, where 93% of chief executives are estimated to be White.
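As a rough check on the pattern described above, the shares can be recomputed from the counts in the population table. This sketch only re-derives the reported percentages and adds a rank correlation across the five population bands. With five aggregated points that correlation is descriptive only, and the variable names are my own.

```r
bands <- c("<25,000", "25k-50k", "50k-100k", "100k-1M", "1M-10M")

# Rows are inferred race, columns are population bands (counts from the table).
race_n <- rbind(
  White    = c(12796, 1483, 793, 720, 52),
  Black    = c(50, 6, 6, 4, 1),
  Hispanic = c(175, 23, 14, 21, 3),
  Asian    = c(19, 2, 1, 6, 0)
)
colnames(race_n) <- bands

female_n <- c(823, 91, 51, 61, 10)
male_n   <- c(12569, 1456, 791, 718, 49)

# Shares use classified names only; the "Unknown" rows are excluded.
white_share  <- round(100 * race_n["White", ] / colSums(race_n), 2)
female_share <- round(100 * female_n / (female_n + male_n), 2)
names(female_share) <- bands

print(white_share)
print(female_share)

# Spearman rank correlation between band order and White share.
rho <- cor(seq_along(bands), white_share, method = "spearman")
print(rho)
```

The White share falls at every step up in population band, so the rank correlation is -1, while the female share stays near 6% until the two largest bands.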
Let’s see if the proportions hold across agency types as well.
Variable | Overall, N = 16,900 | Municipal Law Enforcement, N = 11,697 | Campus Law Enforcement, N = 2,038 | County Sheriffs, N = 3,165 |
---|---|---|---|---|
race | | | | |
White | 15,844 (97.95%) | 11,035 (98.12%) | 1,866 (96.73%) | 2,943 (98.10%) |
Black | 67 (0.41%) | 40 (0.36%) | 15 (0.78%) | 12 (0.40%) |
Hispanic | 236 (1.46%) | 155 (1.38%) | 45 (2.33%) | 36 (1.20%) |
Asian | 28 (0.17%) | 16 (0.14%) | 3 (0.16%) | 9 (0.30%) |
Unknown | 725 | 451 | 109 | 165 |
gender | | | | |
male | 15,583 (93.77%) | 10,910 (94.56%) | 1,729 (86.71%) | 2,944 (95.37%) |
female | 1,036 (6.23%) | 628 (5.44%) | 265 (13.29%) | 143 (4.63%) |
Unknown | 281 | 159 | 44 | 78 |

Cell values are n (%).
As you can see, based on these results, agency type does not seem to be correlated with higher percentages of non-White chief executives. However, campus law enforcement agencies are much more likely than other agency types to be led by women - over 13% compared to the average of 6.2% overall.
Conclusion
There is a lot of investigation needed before relying on these estimates, as they are even more overwhelmingly White than previous reporting would suggest. Recall that the 2016 LEMAS estimated that among local agency chiefs, 89.6% were White, 4% Black, 3.1% Hispanic, and 2.4% other race. The differences here suggest more analysis is needed, but several obvious explanations present themselves. It may be that there are substantial gaps between the sampling in the LEMAS and a population-level estimate. Alternatively, the probabilities themselves may be skewing towards White likelihoods. The inclusion of more than just local agencies in this analysis also deserves some thought, as there may be agency characteristics that lead to higher proportions of non-Whites being selected for the top job.
Some of the gaps are too large to comfortably chalk up to sampling or research design. The 2016 LEMAS estimated that in agencies serving over 250,000 people, just 65% of chiefs were White, while the current analysis would suggest this number is between 92% and 96%. A gap that large strongly suggests that the inference of race for this population is questionable. On the other hand, the gender inferences seem much more stable across this analysis and previous ones.
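One way a skew towards White could arise is from assigning each name its single most likely race. The sketch below uses only the nine matched names from the example race table earlier in this post (Eliezer had no match). It compares the White share you get from counting likely_race labels with the share you get from averaging probability_white. Those ten rows were only a display sample, so the numbers illustrate the mechanism and are not an estimate for the full dataset.

```r
# probability_white and likely_race for the nine matched example names.
p_white <- c(Steve = 0.8540, Hector = 0.0550, Ron = 0.8844, James = 0.9402,
             Desiree = 0.7143, Kevin = 0.9296, Rick = 0.9314,
             Christopher = 0.9454, Karl = 0.9374)
likely_race <- c("white", "hispanic", "white", "white", "white",
                 "white", "white", "white", "white")

# Share White when every name is assigned its most likely race.
share_by_label <- mean(likely_race == "white")

# Expected share White when each name contributes its probability instead.
share_by_prob <- mean(p_white)

round(share_by_label, 3)  # 0.889
round(share_by_prob, 3)   # 0.799
```

In this small sample the label-based share is about nine points higher than the probability-based one, because a name that is, say, 71% White counts as fully White once it is labeled.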
As always, lots of warnings here about how seriously we should take these estimates. They are, after all, based on probabilistic inferences about race and gender given only a first name. There are lots of weaknesses to consider in that approach. On the other hand, this gives a much broader look at nearly the entire population of US law enforcement agencies in their respective categories (municipal, sheriff’s, and campus law enforcement).
Many thanks to Jacob Kaplan, who developed the predictrace package for R, as this quick analysis would not be possible without his hard work.