As a teenager I was introduced to the writings of Edgar Allan Poe, both his poetry and his short stories. While his poems are iconic, his short stories had a larger impact on my understanding of language and writing. In just a few pages, Poe takes his reader on journeys that end, often tragically, but always with lessons on humanity, the frailty of life, and the importance of taking chances. In so few words, he clearly communicates not just the life lessons but also the feelings and imagery of his stories. In a course focused on the importance of clear communication and visualization, what could be more appropriate to study? Poe’s works were the perfect focus for this foray into textual data analysis and the use of Voyant Tools.
Textual analysis is an important tool for researchers, particularly for those who use qualitative methods or who work with audiences requiring both stories and numerical data. Voyant Tools allows a researcher to upload a body of work, or corpus, to set parameters (such as removing commonly used but irrelevant words, e.g., “is”), and to visualize analyses such as word count, frequency, correlation, trends, and more. These visualizations may offer insight into an author’s frame of mind or frame of reference, or reveal an unintended focus on the author’s part (possibly through repetition of terms or phrases, or through correlations in term usage). While tools like Voyant make textual analysis possible, it is important to remember that the analysis depends on the researcher’s understanding of the text being analyzed and that it reflects their own assumptions.
One of the simplest tools in Voyant is the data summary. This provides an overview of the corpus broken down by document. This project reflects an analysis of five of Poe’s short stories, listed here in order of length: The Fall of the House of Usher, The Pit and the Pendulum, The Masque of the Red Death, The Cask of Amontillado, and The Tell-Tale Heart. The summary reports the total word count of the corpus (20,307) and the total number of unique word forms (3,849). After filtering out the common stop words and adding words such as ‘said’ and ‘Usher’ to the stop list, we are shown the most frequently used words. This summary gives the researcher a place to start looking for more in-depth information and allows for comparisons to be drawn between the included documents.
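For readers curious about what a summary like this involves, here is a minimal Python sketch of the same idea: counting total words, unique word forms, and the most frequent terms after stop-word filtering. It is not Voyant's implementation. The `tokenize` and `summarize` helpers, the tiny stop list, and the two sample sentences are all assumptions made up for illustration (the sentences are not quotations from Poe).

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop list; a real one would be much longer.
# 'said' and 'usher' are included to mirror the custom additions described above.
STOP_WORDS = {"the", "a", "an", "and", "of", "is", "was", "in", "to", "it",
              "said", "usher"}

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def summarize(documents, stop_words=STOP_WORDS, top_n=5):
    """Return total tokens, unique word forms, and top terms after filtering."""
    tokens = [tok for text in documents.values() for tok in tokenize(text)]
    counts = Counter(tok for tok in tokens if tok not in stop_words)
    return {
        "total_words": len(tokens),        # every token, stop words included
        "unique_forms": len(set(tokens)),  # distinct word forms
        "top_terms": counts.most_common(top_n),
    }

# Invented sample text standing in for the real corpus.
sample = {
    "doc_a": "The long night was long and the heart said nothing.",
    "doc_b": "A long hour passed in the house of Usher.",
}
summary = summarize(sample)
print(summary)
```

Swapping the sample dictionary for the full text of each story would give figures comparable to, though not necessarily identical with, the ones Voyant reports, since tokenization rules differ between tools.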
The cirrus, or word cloud, is a fun and simple visualization that depicts the most frequently used terms in the corpus. The size of each term reflects its frequency within the corpus. The interactive display ranges from 25 to 500 words, though I find that anything over 50 words is distracting and unreadable. This tool can help the researcher pinpoint additional stop words to filter out of the analysis. This is where I realized that ‘Usher’ and other names were throwing off an analysis of imagery and needed to be removed.
The trends tool allows the investigator to visualize how often specific words or phrases are used across either arbitrary or defined sections of the corpus. For this corpus, each section was defined as an individual short story. Of note, “death,” often associated with Edgar Allan Poe’s work, is rarely mentioned in most of the corpus, being confined primarily to The Pit and the Pendulum and The Masque of the Red Death. Instead, Poe’s works are more focused on the way a person perceives the world. “Thought” underlies most of the works, with the thoughts being informed by the senses. Poe uses anatomical rather than abstract terms to describe how the world is interpreted: hands, eyes, and hearts. Focusing on the gross anatomy, Poe brings his audience back to the literal corpus, celebrating the mundane body and its tactile wonders. (It is important to note that the visualization below has limitations. In particular, the color palette may be difficult to interpret, especially as more words are added to the visualization. The interactive nature of the tool helps to clarify the display, but this is of no use in printed or static formats.)
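The comparison behind a trends view can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it treats each document as one section and reports relative frequency (a term's count divided by the document's word count) so that stories of different lengths can be compared. The `relative_frequencies` helper and the sample text are hypothetical and are not drawn from Voyant or from Poe.

```python
import re

def relative_frequencies(documents, terms):
    """For each document, return each term's count divided by the token count."""
    trends = {}
    for title, text in documents.items():
        tokens = re.findall(r"[a-z']+", text.lower())
        total = len(tokens) or 1  # guard against empty documents
        trends[title] = {term: tokens.count(term) / total for term in terms}
    return trends

# Invented sample text standing in for two stories.
sample = {
    "story_one": "death came and death stayed",
    "story_two": "i thought of my hands and my eyes",
}
trends = relative_frequencies(sample, ["death", "thought"])
for title, values in trends.items():
    print(title, values)
```

Dividing by document length matters here because the longest story in the corpus would otherwise dominate any raw count.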
The contexts tool goes beyond mere word counts, showing how an individual word is used in context throughout the corpus. Throughout every short story, Poe uses “long” more often than any other word. While this is apparent from the cirrus, the word cloud does not provide any context beyond the raw number of repetitions. The contexts tool provides more insight into the term’s use. “Long” is used not only as a measure of length or duration (“the long night”) but also to express desire (“for whom I long”). “Long” was also highly associated with time (“long minutes”, “long ages”, “long eons”). This context is understandable, especially when considering the cirrus, in which many words representing time appear. While the contexts tool is useful, it becomes considerably more powerful when coupled with the other tools of Voyant.
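The idea behind a contexts view is usually called keyword-in-context (KWIC): every occurrence of a term is listed with a few words on either side. A minimal Python sketch follows, assuming a simple word-based window; the `contexts` function, its `window` parameter, and the sample sentence are hypothetical and invented for illustration, not taken from Voyant or from Poe's text.

```python
import re

def contexts(text, keyword, window=3):
    """Return (left, keyword, right) tuples with `window` words on each side."""
    tokens = re.findall(r"[A-Za-z']+", text)
    rows = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            rows.append((left, tok, right))
    return rows

# Invented sample sentence.
sample = "It was a long night and I waited long minutes for the dawn"
rows = contexts(sample, "long", window=2)
for left, word, right in rows:
    print(f"{left:>12} | {word} | {right}")
```

Reading the right-hand column is what distinguishes “long” as duration from “long” as desire, which a frequency count alone cannot do.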
Voyant is best used as a suite of tools rather than as individual components. Word clouds or summaries are a good place to start, but it is only when combining insights from all the tools that you get deeper textual analysis. Additionally, Voyant is helpful for pointing out what isn’t there. I expected “death” or “darkness” or other tropes associated with Poe to appear prominently in the analysis, but they did not. Their absence forced me to confront my own biases about the material. Often, what we expect to find but is missing is as important as what is present. As we move into our capstone process, it will be interesting to see whether we can use this tool to help analyze data from multiple focus groups or participant reflections. Further, the data visualization tools may make the data more accessible to our clients.