Beyond Dante: Software being developed to monitor US citizen sentiment

Sunday, October 08, 2006

An article in the New York Times last week outlined plans by the Department of Homeland Security to develop software that monitors the opinions of US citizens toward their government. The article points out how Orwellian this process could become and states that the program could take "several years" to roll out.

My guess is that they are much closer than they are letting on, and that the software will be used long before it is accurate enough for incrimination. The accuracy problem lies in understanding context and detecting satire, both of which are very tough problems to crack.

Examples:

In a recent review of the movie "The Queen" posted on Rotten Tomatoes, the reviewer writes, "Brilliant dissection of the rot at the top of British high society and politics." The adjective "brilliant" expresses positive sentiment about the overall subject, the dissection itself. But the subject also contains sentiment of its own, this time negative: "rot at the top of British high society and politics."

Or, as described in Bo Pang and Lillian Lee's paper, "Thumbs up? Sentiment Classification using Machine Learning Techniques", the sentence "How could anyone sit through this movie?" contains no single word that is obviously negative.

Or take the following case, which most algorithms would consider ambiguous: "This film should be brilliant. It sounds like a great plot, the actors are first grade, and the supporting cast is good as well, and Stallone is attempting to deliver a good performance. However, it can't hold up."
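To see why these three examples are hard, here is a minimal Python sketch of the simplest possible approach: counting positive and negative words. The word lists are hypothetical and hand-picked for illustration only; they do not come from the article or from any real system.

```python
import re

# Hypothetical, hand-picked word lists (an assumption for illustration only).
POSITIVE = {"brilliant", "great", "good"}
NEGATIVE = {"rot", "bad", "awful", "boring"}

def lexicon_score(text):
    """Return (number of positive words) minus (number of negative words)."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(w in POSITIVE for w in words)
    negatives = sum(w in NEGATIVE for w in words)
    return positives - negatives

queen = ("Brilliant dissection of the rot at the top of "
         "British high society and politics")
sit_through = "How could anyone sit through this movie?"
stallone = ("This film should be brilliant. It sounds like a great plot, "
            "the actors are first grade, and the supporting cast is good as well, "
            "and Stallone is attempting to deliver a good performance. "
            "However, it can't hold up")

print(lexicon_score(queen))        # 0: "brilliant" and "rot" cancel out
print(lexicon_score(sit_through))  # 0: no listed word appears at all
print(lexicon_score(stallone))     # 4: four positive words, yet the review is negative
```

The positive review scores as neutral, the negative question scores as neutral, and the negative Stallone review scores as strongly positive. Word counting alone sees none of the context.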

My company has been working on sentiment models like these for a couple of years now, and much of the progress has come from statistical language processing algorithms. Some of these are similar to the algorithms used in email spam detection. Depending on the problem domain, I've seen upwards of 90% accuracy. This is acceptable in many commercial applications, but when perfect accuracy is required, it takes a human safety net to cover the last mile.

I agree with the writer: it is a bit Orwellian, and it reflects the times in which we live. This may help in the war on terrorism, but my guess is it will only catch the groups too stupid to cover their tracks.
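For readers curious about the spam-filter comparison above, here is a minimal sketch of a naive Bayes text classifier, a statistical technique commonly used in spam detection. It is an illustration only, not the models described in this post: the class name, the labels, and the four training sentences are all made up, and a real system would train on thousands of labeled documents.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of word frequencies
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocab = set()           # every word seen during training

    def train(self, text, label):
        self.doc_counts[label] += 1
        counts = self.word_counts.setdefault(label, Counter())
        for word in tokenize(text):
            counts[word] += 1
            self.vocab.add(word)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior: how common this label was in training.
            score = math.log(self.doc_counts[label] / total_docs)
            # Smoothed log likelihood of each word under this label.
            denom = sum(counts.values()) + len(self.vocab)
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Made-up training data, for illustration only.
clf = NaiveBayes()
clf.train("a brilliant and moving film", "pos")
clf.train("great plot and a good cast", "pos")
clf.train("a dull and boring mess", "neg")
clf.train("how could anyone sit through this", "neg")

print(clf.classify("a brilliant cast"))  # pos
print(clf.classify("boring mess"))       # neg
```

Unlike a fixed word list, the classifier learns from labeled examples which words go with which label, so a phrase like "sit through" can pick up negative weight even though neither word is negative on its own. It still treats text as a bag of words, which is why context and satire remain out of reach.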