Party Leader Speeches
With conference season looming, I thought it would be interesting to do a textual analysis of the two main party leaders’ speeches to their annual conferences over the last few years. Sourcing the text from http://www.britishpoliticalspeech.org, I’ve taken the last 10 years’ worth. I picked this period firstly because it’s a nice, round number, but also because in 2006 David Cameron made two speeches to conference, which would complicate things – how should I treat those two speeches? Should I merge them into one and take the mean score of that? Should they be independent entries? Should I only take the first? Or the second? It was simpler to start after that year – since it was a short-lived experiment, it clearly wasn’t seen as a success.
For those who are interested in the mechanics, the work was all done in RStudio (a tool for developing in R, a statistical programming language) and I’ve used the ‘afinn’ sentiment dictionary. This dictionary assigns a sentiment rating to individual words on a scale ranging from -5 for the most negative to +5 for the most positive. At its simplest level we can then take the mean word score for the body of text to express the overall sentiment.
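To make the mean word score concrete, here is a minimal sketch in Python rather than the R I actually used. The lexicon is a hand-made toy with illustrative scores on the same -5 to +5 scale, not the real ‘afinn’ dictionary, and the function name mean_sentiment is made up for this example. It also assumes the mean is taken over only those words that appear in the lexicon, which is one reasonable reading of "mean word score".

```python
import re

# Toy lexicon: illustrative scores on an AFINN-style -5..+5 scale.
# These are stand-in values, NOT the real 'afinn' dictionary.
TOY_LEXICON = {"good": 3, "great": 3, "hope": 2, "bad": -3, "crisis": -3, "fail": -2}


def mean_sentiment(text, lexicon=TOY_LEXICON):
    """Mean score of the words in `text` that appear in `lexicon`.

    Assumption: words missing from the lexicon are ignored rather than
    counted as zero. Returns 0.0 when no word is scored.
    """
    # Lower-case and split into unigrams, dropping punctuation.
    words = re.findall(r"[a-z']+", text.lower())
    scores = [lexicon[w] for w in words if w in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


speech = "We face a crisis, but there is hope and a great future."
print(mean_sentiment(speech))
```

Here only "crisis", "hope" and "great" are scored, so the result is the mean of -3, 2 and 3.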
The most interesting results came from displaying the scores by specific leader, rather than simply by party or by government vs opposition. This is because there is a clear difference in tone between individual party leaders which doesn’t correlate with party or with whether they were in power at the time.
In terms of tone, David Cameron tended towards a less positive tone, as has Jeremy Corbyn more recently. On the other hand, Ed Miliband and Theresa May have both tended to strike more positive tones, as did Gordon Brown. One thing of note is the trend in David Cameron’s tone over time: it started off more negative and moved towards neutral over his career.
For those of a geekier disposition, I’ve done all this at a unigram level (on a word-by-word basis) rather than with longer n-grams (taking consecutive words together). While this might lower the accuracy of the sentiment analysis, the formatting of the raw text was a bit mixed. Early on in the data wrangling process I stripped out punctuation, so n-grams would not work effectively since they would artificially span sentences.
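The sentence-spanning problem is easy to see in a small Python sketch (again, an illustration and not my R code; the helper names tokens and bigrams are hypothetical). Once punctuation is gone, the last word of one sentence is paired with the first word of the next.

```python
import re


def tokens(text):
    """Lower-case the text and drop punctuation, leaving a flat list of words."""
    return re.findall(r"[a-z']+", text.lower())


def bigrams(words):
    """Pair each word with the one that follows it."""
    return list(zip(words, words[1:]))


text = "The plan will not fail. Great things lie ahead."
pairs = bigrams(tokens(text))
print(pairs)
# The pair ("fail", "great") joins two separate sentences,
# because the full stop between them has been stripped.
```

The same example hints at what unigrams might lose: "not fail" only reads as positive when the two words are taken together.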
Most tutorials and articles on sentiment analysis don’t use the mean score. Instead they subtract the count of negative words from the count of positive words. I’ve chosen the mean because it allows for differing lengths of text. Under the simpler count-based method, if one speech had a generally negative tone but was longer than its counterpart, it would generate a lower sentiment score purely by dint of its length.
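A small Python sketch shows the length effect (illustrative only: the lexicon is a toy with made-up scores rather than the real ‘afinn’ dictionary, and net_count and mean_score are names invented for this example). The longer "speech" is just the short one repeated four times, so its tone is identical.

```python
# Toy lexicon with illustrative scores, NOT the real 'afinn' dictionary.
TOY_LEXICON = {"good": 3, "bad": -3, "crisis": -3}


def word_scores(text, lexicon=TOY_LEXICON):
    """Scores of the words in `text` that appear in the lexicon."""
    return [lexicon[w] for w in text.lower().split() if w in lexicon]


def net_count(text):
    """Count of positive words minus count of negative words."""
    scores = word_scores(text)
    return sum(1 for s in scores if s > 0) - sum(1 for s in scores if s < 0)


def mean_score(text):
    """Mean score of the scored words (0.0 if none are scored)."""
    scores = word_scores(text)
    return sum(scores) / len(scores) if scores else 0.0


short_speech = "bad crisis good"
long_speech = " ".join([short_speech] * 4)  # same tone, four times the length

print(net_count(short_speech), net_count(long_speech))
print(mean_score(short_speech), mean_score(long_speech))
```

The net count falls from -1 to -4 as the text gets longer, while the mean stays at -1.0 for both.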