Word Frequency Comparison

Drag in two plain-text files to compare their vocabularies

Load two texts and click Analyse to begin.

Shared words coloured by which text uses them more — hover for details

More characteristic of Text A

More characteristic of Text B

Top 100 unique words by frequency — hover for details

Unique to Text A

Unique to Text B

Unique hapax legomena — Text A

Unique hapax legomena — Text B

This app aims to find interesting word usage differences between two texts.

Each text's words are ranked by count, restricted to the words the two texts have in common, and normalized to the same ranking scale. The words whose normalized ranks differ most between the two texts are displayed on the dot plot. Because the comparison uses ranks rather than raw counts, texts of very different lengths can be meaningfully contrasted.
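
The ranking step described above can be sketched roughly as follows. This is a minimal illustration and not the app's actual code: the tokenizer, the function names (`tokenize`, `normalized_ranks`, `rank_differences`), the 0-to-1 rank scale, the alphabetical tie-breaking, and the choice to rank within the shared vocabulary are all assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase, keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def normalized_ranks(counts, vocab):
    """Score each word in vocab from 1.0 (most frequent) down to 0.0."""
    # Ties in count are broken alphabetically (an arbitrary choice).
    ordered = sorted(vocab, key=lambda w: (-counts[w], w))
    if len(ordered) == 1:
        return {ordered[0]: 1.0}
    last = len(ordered) - 1
    return {w: 1.0 - i / last for i, w in enumerate(ordered)}


def rank_differences(text_a, text_b, top_n=20):
    """Return (word, rank_a - rank_b) pairs, largest absolute gap first."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    shared = set(counts_a) & set(counts_b)  # words common to both texts
    if not shared:
        return []
    ranks_a = normalized_ranks(counts_a, shared)
    ranks_b = normalized_ranks(counts_b, shared)
    diffs = {w: ranks_a[w] - ranks_b[w] for w in shared}
    # Positive values lean toward Text A, negative toward Text B.
    return sorted(diffs.items(), key=lambda kv: (-abs(kv[1]), kv[0]))[:top_n]
```

In this sketch a word that is the most frequent shared word in Text A and the least frequent in Text B scores 1.0, regardless of how long either text is.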

The preloaded examples come from the Bible (American Standard Version, 1901), which is copyright-free and contains some really interesting differences: New Testament vs. Old Testament, and the Gospel of Mark vs. the Gospel of John.

The Top N filter controls how many of the most disproportionately used words are shown. The Stopwords filter, if enabled, removes the most common short function words in English. (These are usually filtered out anyway, because they rank similarly high in both texts, but there are exceptions.) The Count diff filter removes words whose counts in the two texts differ by less than N (ten by default). The Sign consistency filter is on by default; it removes words that are, for example, characteristic of Text A despite having a higher count in Text B, which can happen when Text B is much longer than Text A.

The unique words menu shows the top 100 words that appear in only one of the two texts. Hapax legomena are words that occur only once in an entire text.
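
As a rough sketch of how these filters and word lists might be expressed, assuming per-text word counts are already available as `Counter` objects: the names `keep_word`, `unique_words`, and `unique_hapaxes` are hypothetical, as is the convention that a positive rank difference means "characteristic of Text A". The app's real implementation may differ.

```python
from collections import Counter


def keep_word(word, rank_diff, count_a, count_b,
              stopwords=frozenset(), min_count_diff=10,
              sign_consistency=True):
    """Decide whether a shared word survives the filters.

    rank_diff > 0 is assumed to mean "characteristic of Text A".
    """
    if word in stopwords:  # Stopwords filter
        return False
    count_diff = count_a - count_b
    if abs(count_diff) < min_count_diff:  # Count diff filter
        return False
    # Sign consistency: drop words whose rank difference points at one
    # text while the raw count is higher in the other.
    if sign_consistency and rank_diff * count_diff < 0:
        return False
    return True


def unique_words(counts_a, counts_b, top=100):
    """Most frequent words that occur in A but never in B."""
    only_a = [(w, c) for w, c in counts_a.items() if w not in counts_b]
    return sorted(only_a, key=lambda wc: (-wc[1], wc[0]))[:top]


def unique_hapaxes(counts_a, counts_b):
    """Words that occur exactly once in A and never in B."""
    return sorted(w for w, c in counts_a.items()
                  if c == 1 and w not in counts_b)
```

Swapping the two count arguments gives the corresponding lists for Text B.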

A writer's tics and tonal shifts between books are not always obvious at a granular level on a casual read-through, but they can really pop when you run the texts through this tool.