About

Who we are

This resource is provided by Jonah Berger, Garrick Sherman, and Lyle Ungar of the University of Pennsylvania. Please send comments, suggestions, and bug reports to info@TextAnalyzer.org. Please cite this tool as follows:

Berger, J., Sherman, G., & Ungar, L. (2020). TextAnalyzer. Retrieved from http://textanalyzer.org.

Please also cite the lexica that you use. These citations can be found here.

Technical details

Upload a .csv or .xlsx file in which each row is a single entry of English-language text, like this one. The first column must be an ID for that entry, and the second column must be the text itself. Any other columns will be ignored.

The uploaded text is first tokenized (split into individual words and punctuation marks) using the NLTK tokenizer. We then return a score for each lexicon for each entry (row). The score is the sum of the weights of the words in the entry, divided by the number of tokens in that entry. Words not in the lexicon contribute 0, and in lexica that lack weights every listed word has a weight of 1. For example, for a lexicon containing the words “feel” and “odd” with weights 1 and 2, respectively, the entry “I feel odd!” would yield a score of (1+2)/4 = 0.75. (Note that punctuation marks count as words, so the “!” is included in the count of 4.)
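The scoring rule above can be sketched in a few lines of Python. This is an illustration only, not the tool's actual code: the function names `tokenize` and `score_entry` are hypothetical, a simple regular expression stands in for the NLTK tokenizer, words are matched exactly as written (case handling is not specified here), and returning 0 for an empty entry is an assumption.

```python
import re


def tokenize(text):
    """Stand-in for the NLTK tokenizer (assumption): split text into
    runs of word characters and individual punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def score_entry(text, lexicon):
    """Sum the lexicon weights of the tokens, divided by the token count.

    `lexicon` maps each word to its weight; tokens that are not in the
    lexicon contribute 0. Punctuation marks count toward the token total.
    """
    tokens = tokenize(text)
    if not tokens:
        return 0.0  # assumption: an empty entry scores 0
    total = sum(lexicon.get(token, 0.0) for token in tokens)
    return total / len(tokens)


# Weighted lexicon from the example: "feel" = 1, "odd" = 2.
weighted = {"feel": 1.0, "odd": 2.0}
print(score_entry("I feel odd!", weighted))    # 0.75

# A lexicon without weights: every listed word gets weight 1.
unweighted = dict.fromkeys(["feel", "odd"], 1.0)
print(score_entry("I feel odd!", unweighted))  # 0.5
```

With the unweighted lexicon, the same entry scores 2/4 = 0.5, since two of its four tokens appear in the lexicon.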

The resulting output is a tabular data file in which each row corresponds to a row in your uploaded text file, and each column is the score computed for that row using a given lexicon category.
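The shape of that output can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration: the function name `build_output`, the `id` column header, the lexicon name `demo`, the CSV output format, and the regular expression used in place of the NLTK tokenizer. The tool's real column names and file format may differ.

```python
import csv
import io
import re


def tokenize(text):
    """Stand-in for the NLTK tokenizer (assumption)."""
    return re.findall(r"\w+|[^\w\s]", text)


def score(tokens, lexicon):
    """Sum of lexicon weights divided by the number of tokens."""
    if not tokens:
        return 0.0  # assumption: an empty entry scores 0
    return sum(lexicon.get(t, 0.0) for t in tokens) / len(tokens)


def build_output(rows, lexica):
    """Return CSV text with one row per entry and one column per lexicon.

    `rows` is a list of (id, text) pairs; `lexica` maps a lexicon name
    to a {word: weight} dictionary. Both layouts are hypothetical.
    """
    names = sorted(lexica)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id"] + names)  # header: ID column, then lexicon names
    for entry_id, text in rows:
        tokens = tokenize(text)
        writer.writerow([entry_id] + [score(tokens, lexica[n]) for n in names])
    return buffer.getvalue()


rows = [("a1", "I feel odd!"), ("a2", "Fine.")]
lexica = {"demo": {"feel": 1.0, "odd": 2.0}}
print(build_output(rows, lexica))
```

Each uploaded row yields exactly one output row, so the two tables can be joined on the ID column.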