
Introductory Text Analysis with Google’s ngram Viewer

Posted in Digital Humanities, and Percolating Ideas

Simple curiosity motivated the creation of these few graphs, but they will also be used in a graduate class on text analysis for humanists as examples of the kinds of questions we can ask and answer with simple and accessible tools.

Is there a correlation between uses of the words “frontier,” “settler,” and “indian” in published works between 1750 and 1800? What do the frequencies of these words tell us about how contemporaries viewed the conquest and settlement of the American Midwest?
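For readers who want to put a number on the word "correlation," here is a minimal sketch of a Pearson correlation between two yearly frequency series. The function name `pearson` and the sample values are my own assumptions for illustration; the values are placeholders, not frequencies taken from the ngram Viewer.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the mean (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Placeholder values only, NOT real ngram frequencies.
settler = [1.0, 2.0, 3.0, 4.0]
indian = [2.0, 4.0, 6.0, 8.0]
r = pearson(settler, indian)
```

A value near 1 would mean the two words rise and fall together, a value near -1 that one rises as the other falls. Even a strong coefficient only describes co-occurrence in the corpus; it does not explain it.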

It is clear that the word “frontier” is used far more frequently than the other two. As a result, the scale required to accommodate the difference makes it difficult to analyze the concomitant uses of “settler” and “indian.” In the next chart, I remove the word “frontier” so I can take a closer look at the other two words in relation to each other and then compare these patterns with the first chart.
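Dropping a term from the chart amounts to changing the query sent to the viewer. The sketch below builds the chart URLs with and without “frontier.” The helper `ngram_url` is hypothetical, the query parameter names reflect my understanding of the viewer's URL format, and the smoothing value is an assumption, since the settings used for these charts are not recorded here.

```python
from urllib.parse import urlencode

# Base address of the ngram Viewer chart page.
NGRAM_GRAPH_URL = "https://books.google.com/ngrams/graph"


def ngram_url(terms, year_start=1750, year_end=1800, smoothing=0):
    """Build a chart URL for a list of search terms (hypothetical helper)."""
    params = {
        "content": ",".join(terms),  # comma-separated terms, one line each
        "year_start": year_start,
        "year_end": year_end,
        "smoothing": smoothing,  # assumed value; not stated in the post
    }
    return NGRAM_GRAPH_URL + "?" + urlencode(params)


all_three = ngram_url(["frontier", "settler", "indian"])
without_frontier = ngram_url(["settler", "indian"])
```

Because the viewer scales the vertical axis to the most frequent term, removing “frontier” lets the axis rescale to the two less frequent words.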


As the chart above reveals, the most dramatic peak in use of the word “indian” in published works occurs in 1754, just as the French and Indian War ignites conflict in the backcountry. During the war, in 1758 and 1760, there are two smaller peaks in the usage of “indian,” but they are quickly overtaken by use of the word “settler” in 1763, when Great Britain establishes a territorial boundary along the Allegheny Mountains to keep settlers on the east coast and separated from the Native communities. A second peak in the usage of “settler” occurs in 1773-1774 as more settlers begin crossing the border established by the Proclamation of 1763 and entering territory that would become Kentucky. As more settlers encroached on Native lands, tensions rose, exploding in violence in Lord Dunmore’s War in 1774. If we ask Google to search for the plural, “settlers,” we see the most dramatic spike in frequency:

In 1795, both “settler” and “indian” appear more frequently, but “settler” appears more than twice as often as “indian” in this corpus. In August 1794 an American army under Anthony Wayne finally defeated a confederated Native force at the Battle of Fallen Timbers, marking the first American victory in the Northwest Territory. Following the military rout, the Native militants were forced to sign the Treaty of Greenville in 1795, which transferred much of their land to the United States. With this victory and the treaty, the American conquest began in earnest, as the increase in uses of both “indian” and “settler” after 1795 may indicate.
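The peak years named in the paragraphs above were read off the charts by eye. A rough way to check such readings, assuming the yearly frequencies have been exported into a dictionary keyed by year, is to look for years whose value exceeds both neighbors. The function `local_peaks` is a hypothetical helper, and the numbers below are invented to keep the example easy to follow; they are not values from the viewer.

```python
def local_peaks(freq_by_year):
    """Return the years whose frequency is higher than both adjacent years."""
    years = sorted(freq_by_year)
    peaks = []
    # Slide a three-year window across the series.
    for prev, cur, nxt in zip(years, years[1:], years[2:]):
        if freq_by_year[cur] > freq_by_year[prev] and freq_by_year[cur] > freq_by_year[nxt]:
            peaks.append(cur)
    return peaks


# Invented placeholder values, NOT real ngram frequencies.
toy = {
    1752: 1.0, 1753: 2.0, 1754: 5.0, 1755: 3.0,
    1756: 2.5, 1757: 2.0, 1758: 3.0, 1759: 1.5,
}
peaks = local_peaks(toy)
```

This simple test ignores the first and last years and treats every small bump as a peak, so unsmoothed data will likely need a minimum-height threshold before the results are worth interpreting.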

The commentary above may be interesting, but it confirms patterns of which we were already aware. However, a comparison of the first three charts indicates the greater attention given to the land in the backcountry “frontier” than to the people who lived or had moved there. Instances of “frontier” in the published literature between 1750 and 1800 dwarf the occurrences of “settler” and “indian*”. By adding “settlers” to our analysis, we see that the frequency of this plural term often keeps pace with that of “frontier.” This suggests that while eighteenth-century authors’ attention was frequently directed at the land, the settlers, in aggregate rather than as individuals, were equally important.


Here, we see tantalizing hints that the development of settler colonies in the American Midwest became an intentional project over time, rather than mere happenstance. However, more research needs to be conducted with a corpus that includes not only published public domain books, but also newspapers, correspondence, and Congressional records.

To explore this data further, follow the links below to search in Google Books:

1750 – 1760 1761 – 1773 1774 1775 – 1795 1796 – 1799 settler English
1750 – 1760 1761 – 1773 1774 1775 – 1796 1797 – 1800 settlers English
1750 – 1752 1753 – 1759 1760 1761 – 1800 1801 – 1800 indian English
1750 – 1759 1760 1761 – 1787 1788 – 1800 1801 – 1800 indians English
1750 – 1757 1758 – 1787 1788 – 1790 1791 – 1795 1796 – 1800 frontier English

Culturomics’ corollary project, Bookworm, allows us to take a deeper dive. Using Bookworm with HathiTrust’s extensive digital library reveals the individual texts in which the words of interest appear. Since the corpus is different from that used in Google’s ngram viewer, a contrasting picture emerges. The word “indian” appears with much greater frequency, which reveals the importance of the corpus selected and serves as a warning to those who unwittingly place much stock in the results of these tools without digging into the underlying sources.
