Media Resource

Race and Ethnicity Keyword Thesaurus for Chronicling America

Three men, two reading newspapers, standing in front of Office of Reports, Free Press. Mountains in distance.

Manzanar Free Press editor Roy Takeno (left), Yuichi Hirata, and Nabou Samamura in front of the newspaper office at the Manzanar Relocation Center, California, 1943.

Created by partners in the National Digital Newspaper Program, this resource hopes to serve researchers at all levels through demonstrations and explanations of search terms related to race and ethnicity in Chronicling America. Established in 2005 through a partnership between the National Endowment for the Humanities and the Library of Congress, Chronicling America contains nearly 20 million pages of newspapers supported by the work of partners in 50 states, the District of Columbia, Puerto Rico, and the U.S. Virgin Islands.  

Searching a database of this size can seem like an impossible feat at the outset. To help, Chronicling America features two search functions that allow for search terms, or keywords, to be run through the texts of newspaper pages. This is an immensely efficient and effective way to search millions of pages of content, yet keyword searches still do not always produce the results we want or need. Developing the most useful keywords requires knowing what terms were used in the period you are searching, rather than what the common, contemporary terms might be.

Identifying keywords can be particularly challenging when searching for news about race and ethnicity, since much of the language describing such communities has changed over the centuries, and its meaning may vary depending on who is using the terms and the context in which they are using them. In some cases, those who are part of a group (what the Thesaurus refers to as “insiders”), whether identifying by race, ethnicity, region, gender, or class, and those outside of that group (what the Thesaurus refers to as “outsiders”) use the same terms differently, and in such instances, we have tried to capture that distinction.

The pages in this Race and Ethnicity Keyword Thesaurus serve as a guide to searching topics of race and ethnicity in Chronicling America, including lists of words used in the past that may help produce more results, as well as strategies for navigating the database. When using this resource, keep in mind that historical newspapers, like all primary documents from the past, use the language of the time they were written, which may include terms considered offensive today. While efforts have been made to include and increase the ethnic press content in Chronicling America, most of the newspapers currently in the database are English-language and produced by white publishers and editors. In this iteration, this Thesaurus is intended primarily to assist researchers in identifying terms related to race and ethnicity in the English-language press. However, efforts are underway to expand the Thesaurus to include terms in Spanish; you will see, for example, Spanish-language content under the entries “Indígena” and “Negro/a.” Researchers should be mindful that race and ethnicity do not exist independently but intersect with other aspects of society like community size, class, and gender. Researchers should also be mindful that Chronicling America is not a comprehensive source of historical U.S. newspapers. Explore Chronicling America's newspapers on an interactive map and timeline interface and view additional data visualizations to better understand the scope and coverage of newspapers currently available in the collection.

EDSITEment also offers a thesaurus for searching keywords related to immigration and citizenship.

Starting Your Search

Keyword Searching

When searching on the web, you can enter your complete research question in the search box and get a number of results. However, if you enter your entire research question in the Chronicling America search bar, you probably won't get any results. Keyword searches rely on the exact words that you enter in the search box, so if the database can't find all of those words in the information it holds about an article, it won't bring back any results.
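The behavior described above can be illustrated with a minimal sketch. This is not how Chronicling America is actually implemented; it is a simplified model that assumes every word in the search box must appear in the page text. The function names and the sample page text are hypothetical, invented for illustration.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into words and numbers."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches_all(query, page_text):
    """Return True only if every word in the query occurs in the page text."""
    page_words = set(tokenize(page_text))
    return all(word in page_words for word in tokenize(query))

# Hypothetical page text, invented for this example.
page = "The Manzanar Free Press published a new edition this week."

# A short keyword query: every word is on the page.
print(matches_all("Manzanar Free Press", page))

# A full research question: words like "who" and "edited" are not on the page,
# so the whole search fails.
print(matches_all("Who edited the Manzanar Free Press in 1943?", page))
```

Under this model, the short keyword query matches and the full question does not, because one missing word is enough to rule out a page.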

Keywords, also commonly called search terms, are the words that you enter into the database search boxes. They represent the main concepts of your research topic and are the words used in everyday life to describe the topic, often specific to a time and a place. Without the right keywords, you may have difficulty finding the articles that you need. 

The Chronicling America help section offers advice for performing basic and advanced searches, as well as general tips for using the database.  

Choosing your Keywords

Usually, the nouns and adjectives of your research question will give you a good idea of what your keywords will be. From these keywords, make a list of synonyms to use as alternatives. Since different writers will describe the same thing using different words, creating a variety of keywords to use in your search will increase the number of results. This thesaurus aims to assist you with synonyms for researching questions around race and ethnicity.  

Because historical primary documents like newspapers use the language of the past that is often no longer common parlance, some of today’s language will not make for good keywords to search on Chronicling America. For example, searching the phrase “African American” in Chronicling America produces only 671 results, with the earliest result occurring in 1814. This search produces zero results from the eighteenth century, so almost 40 years (1777-1814) of print is missed by using this phrase. If you search the term “Negro,” however, 2,686,588 results, from May 29, 1777 to December 31, 1963, are returned. While “African American” is a phrase we are familiar with in our current time, these results, when compared, show how infrequently it was used in previous centuries. This thesaurus will assist you with related terms from the past that were used to describe race and ethnicity. Even with the suggested keywords, you should be prepared to perform multiple searches to determine which keywords work best for your topic. Eventually, trial and error will help you find the materials you need.
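The coverage gap in this example can be worked out with a few lines of arithmetic. The sketch below uses only the earliest result years reported above; the dictionary and function names are hypothetical, and the approach is just one way to compare terms as you try them.

```python
# Earliest result years reported in the text for each search term.
earliest_year = {"African American": 1814, "Negro": 1777}

def coverage_gap(later_term, earlier_term, earliest):
    """Return the number of years missed by searching only the later term."""
    return earliest[later_term] - earliest[earlier_term]

gap = coverage_gap("African American", "Negro", earliest_year)
print(gap)  # 37, the "almost 40 years" described above
```

Recording the earliest and latest result dates for each keyword you try is a simple way to see which periods a given term does and does not reach.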

OCR Considerations

Optical Character Recognition, or OCR, is a technology that transforms the images of the pages into the text that we, and by extension the software, read. Chronicling America uses OCR to process images of newspaper pages by locating and recognizing characters, such as letters, numbers, and symbols, and presenting them as words that keyword searches identify and return as search results.

OCR works best when the images are clear and free of imperfections, but documents from the past do not always fit this description. Historic newspapers are usually digitized from microform copies of the original pages, which makes for some messy images full of smudges, blurs, and imperfections.  

Unfortunately, OCR software is unable to distinguish intentional characters, like the individual letters of a word, from the unintentional marks found on the page, like the smudges or blurs. For example, a blurry “c” might be read as an “o” by the OCR software, which would make a word like “cat” read as “oat.”  

To account for these errors, researchers must adjust some keywords to search for the correct term as well as the “OCR possibles.” Because of this, when you perform a search for “African American,” it may also be useful to perform a search for “Atrican American” to get any results that may have errors in the OCR text.  

The resources in this Race and Ethnicity Keyword Thesaurus include potential OCR blunders for each of the keywords provided, under the section “OCR Considerations,” to encourage you to consider how OCR might be determining your search results.

Working with the Thesaurus

Created by nationwide partners in the National Digital Newspaper Program, this resource hopes to serve researchers at all levels through demonstrations and explanations of search terms related to race and ethnicity in Chronicling America. Launched in September 2022, this site hopes to expand its coverage and reach in the years ahead.

Organized around seven broad categories for race and ethnicity, the Thesaurus offers keywords that were used in the past to describe each category and therefore will likely be helpful in searching for articles about a given group. For each keyword under the umbrella category, the Thesaurus offers the following information:

Related Terms 

Here you will find other terms to try in your searches. Some of these terms are synonyms, while others come from different periods and can help when searching across time periods.

Definitions 

Provided are definitions of these terms throughout the years. This section also includes how various groups of people have used these terms. Many of these definitions have been sourced from the Oxford English Dictionary.

Contextual Considerations, or "How these Terms were Used"

Contextual considerations provide examples of how various groups of people have used these terms, as well as some of the historical reasons surrounding their use.

Insider / Outsider Use of the Term

Meanings may vary depending on who is using the terms and the context in which they are using them. In some cases, those who are part of a group (what the Thesaurus refers to as “insiders”), whether identifying by race, ethnicity, region, gender, or class, and those outside of that group (what the Thesaurus refers to as “outsiders”) use the same terms differently, and in such instances, we have tried to explain that distinction.

Examples from Chronicling America 

The images provided in this section show examples of these terms being used in newspapers found on Chronicling America. As you will notice, the examples selected illuminate the “Contextual Considerations” described in the previous section. If a distinction between “insider” and “outsider” uses of a term applies, then an additional example from Chronicling America will be provided that captures the “insider” use of the term. Please note that, at this time, such “insider” examples are limited to English-language newspapers, and many more may be found in non-English language newspapers. We encourage you to explore!

OCR Considerations, or "How the Computer Sees it"

Also provided are a few variants that consider the errors made in the OCR. You may want to search a few of these in addition to the correctly spelled term. This section does not aim to be comprehensive; instead, it features a few examples of how the OCR may incorrectly transform the page image into text.  

Thesaurus Categories and Keyword Lists

Each of the links below will open a new page in the Thesaurus organized around a broad category of race and ethnicity. The keywords suggested on each page represent related race and ethnicity terms that were used by newspapers in various historical periods and may improve your Chronicling America search results. 

 

A Note on Harmful Language

On each keyword list page, a pop-up will appear with the following message about harmful language:

The following resource presents a list of terms that may reflect racist and xenophobic opinions and attitudes. In providing this list of keywords, our goal is to support research into the lives and experiences of various communities, rather than to propagate the use of derogatory or harmful language.

When researching historical materials, especially newspapers, it is often necessary to use language in common use at the time of publication. Historical newspapers reflect the opinions and attitudes of their time. Their pages often contain biased, offensive, and outdated words and images that are now understood to be harmful.    

As responsible researchers, we should acknowledge and be mindful of how these terms oppressed groups of people, and take great care in using these terms to conduct research in the present. 

This pop-up reminds users to take a physical and mental pause when reading these words to consider the ways this language has been used to oppress communities of people. The terms themselves detail the complicated and painful relationship that the country has had with understanding race and ethnicity, and they remind us that language holds power in the ways it can support or dismantle systems of oppression.

Acknowledgements

The work for our Race and Ethnicity Keyword Thesaurus for Chronicling America was truly collaborative. The National Digital Newspaper Program’s Working Group on Race and Ethnicity, including Mary Feeney (AZ), Sarah Lynn Fisher (TX), Molly Hardy (NEH), Melissa Jerome (FL), Ana Krahmer (TX), Sheila McAlister (GA), Sativa Peterson (AZ), Katherine Poland (IL), Ann Sneesby-Koch (CO), Randi Ramsden (WI), and William Schlaack (IL), initiated the project. The group's ideas became realized thanks to the skillful work of NEH’s Pathways Intern Samantha Gilmore. We are very grateful for the assistance of scholars Hannah Alpert-Abrams, Jim Casey, Sarah Salter, Kimberly Toney, Edlie Wong, and Jewon Woo, as well as the advice of Ahmed Johnson and Jaime Mears from the Library of Congress. We would also like to thank the previous NEH Division of Preservation and Access interns, Jeanette Schollaert and Joshua Ortiz-Baco, for their support and suggestions.