(This article was originally published at Statistical Modeling, Causal Inference, and Social Science, and syndicated at StatsBlogs.)
Lucas Estevem set up this website in d3 as his final project in our statistical communication and graphics class this spring.
Copy any text into the window, push the button, and you get this clean and attractive display showing the estimated positivity or negativity of each sentence. The length of each bar is some continuously-scaled estimate of the sentiment, and the width is proportional to the length of the sentence.
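To make that encoding concrete, here is a minimal Python sketch of how a text could be turned into one bar per sentence. It is not the code behind the site: the tiny lexicon, the averaging rule, and the function names are all made up for illustration, and measuring sentence length in words rather than characters is an assumption.

```python
import re

# Hypothetical toy lexicon, for illustration only; not the site's analyzer.
TOY_LEXICON = {"great": 1.0, "good": 0.5, "clean": 0.5, "bad": -0.5, "terrible": -1.0}


def split_sentences(text):
    """Naive sentence splitter: break after ., !, or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def count_words(sentence):
    """Number of alphabetic word tokens in a sentence."""
    return len(re.findall(r"[a-z']+", sentence.lower()))


def sentence_score(sentence):
    """Average toy-lexicon score over the words of a sentence, in [-1, 1]."""
    words = re.findall(r"[a-z']+", sentence.lower())
    if not words:
        return 0.0
    return sum(TOY_LEXICON.get(w, 0.0) for w in words) / len(words)


def bar_geometry(text):
    """One bar per sentence: signed length = sentiment, width = share of all words."""
    sentences = split_sentences(text)
    counts = [count_words(s) for s in sentences]
    total = sum(counts) or 1  # avoid dividing by zero on empty input
    return [
        {"sentence": s, "length": sentence_score(s), "width": n / total}
        for s, n in zip(sentences, counts)
    ]


for bar in bar_geometry("This is great. This is a terrible, terrible idea."):
    print(f"{bar['length']:+.2f}  {bar['width']:.2f}  {bar['sentence']}")
```

A real analyzer would replace sentence_score with a trained model; the point is only that each sentence yields two numbers, one for the bar's length and one for its width.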
But what’s it for?
This is great. And it also leads to the surprisingly subtle question: What’s the use of this tool?
The most obvious answer is, Duh, you use it to visualize a text sentiment analysis.
But I don’t think that’s the right answer. To see why, we must first ask why we want to estimate text sentiments in the first place. Why would someone want this tool? It’s not to help us read texts. No. I think the reason you’d want to estimate the sentiment of each sentence is that you need to classify a large number of documents and get a quick summary of each. In which case, what do you get out of a visualization? It won’t be particularly useful as part of a big automated loop that nobody is looking at.
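No. What you get out of visualization is model checking, as described in my 2003 article on the Bayesian foundations of exploratory data analysis. The value of a display such as the one above is that it lets us compare the analyzer’s estimates with our own reading of the text and see where the two disagree.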
An interactive display is particularly valuable because we can try out different texts, or even alter the existing document word by word, in order to reverse-engineer the sentiment analyzer and see how it works. The sentiment analyzer is far from perfect, and being able to look inside in this way can give us insight into where it will be useful, where it might mislead, and how it might be improved.
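The word-by-word probing described here can also be done in a few lines of code. The sketch below is one hedged way to do it in Python: drop each word in turn and record how much the score moves. The scorer is a made-up toy lexicon standing in for whatever black-box analyzer you want to inspect, and the names toy_score and word_influence are hypothetical.

```python
import re

# Hypothetical toy lexicon standing in for a real sentiment analyzer.
TOY_LEXICON = {
    "great": 1.0,
    "attractive": 0.5,
    "clean": 0.5,
    "misleading": -0.5,
    "terrible": -1.0,
}


def toy_score(text):
    """Sum of toy-lexicon scores over the words in the text."""
    return sum(TOY_LEXICON.get(w, 0.0) for w in re.findall(r"[a-z']+", text.lower()))


def word_influence(sentence, score=toy_score):
    """Drop each word in turn and record how much the score changes.

    A positive value means the word pushed the score up; a negative value
    means it pushed the score down. `score` can be any text -> number function.
    """
    words = sentence.split()
    baseline = score(sentence)
    influence = []
    for i, word in enumerate(words):
        reduced = " ".join(words[:i] + words[i + 1:])
        influence.append((word, baseline - score(reduced)))
    return influence


for word, delta in word_influence("A clean but terrible display"):
    print(f"{word:>10s} {delta:+.2f}")
```

With a lexicon-based scorer the result is unsurprising, since each word contributes its own score. The exercise becomes informative when score is an analyzer whose internals you cannot see, because words with unexpectedly large or wrongly signed influence are exactly the anomalies worth looking at.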
Visualization. It’s not just about showing off. It’s a tool for discovering and learning about anomalies.
P.S. It would also be good to have a link to the source code of the sentiment analyzer, along with a document explaining how it works and describing the data that were used to train it.