Political Party Demographics
I started this analysis with a simple question – can you determine the likely vote share of each party based on the underlying character of the country? How important are factors like demographics and the economic success of voters?
In the US, when modelling and forecasting elections, the demographics of each individual state are taken into account; while not the only factor, they are an important driver in these models. Can the same approach apply to UK elections?
To this end I gathered together publicly available demographic data and ran analysis comparing this with vote share of the three main parties at the last three elections. For more detail on the exact approach, please see the Inspector Notes at the end. However, it is worth saying that the analysis has been confined to England only. This is because the different nationalist parties in Northern Ireland, Scotland and Wales would have confused the analysis.
The following charts compare each party’s % vote share in a constituency with the % of that constituency’s population falling into a given demographic, economic or industry category. Each result is a correlation on a scale from -1 to +1. The nearer to -1, the more negative the correlation – i.e. the higher the demographic, economic or industry number, the lower that party’s vote. Nearer +1 is the opposite – the higher the demographic, the higher the vote. Nearer to 0 shows no relationship between them.
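As a rough illustration of how a single figure on that -1 to +1 scale is produced, here is a minimal sketch of a Pearson correlation in Python. The analysis itself was built in R and the exact correlation method is not stated, so treating it as Pearson is an assumption. The percentages and party names below are made up purely to show the two extremes of the scale.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-movement of the two series around their means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical figures for five constituencies (not real data)
pensioner_pct = [12.0, 15.0, 18.0, 21.0, 24.0]
party_a_share = [30.0, 36.0, 42.0, 48.0, 54.0]  # rises with the demographic
party_b_share = [50.0, 44.0, 38.0, 32.0, 26.0]  # falls with the demographic

corr_a = pearson(pensioner_pct, party_a_share)  # close to +1
corr_b = pearson(pensioner_pct, party_b_share)  # close to -1
```

Real constituency data would land somewhere between these extremes, which is what the charts summarise.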
The first really obvious thing is that there are clear correlations for both the Labour and Conservative parties, but far less correlation for the Liberal Democrats. This suggests that demographics do not explain Lib Dem vote share.
The second thing is that for Labour and Conservative it is an either/or correlation. Where one is stronger the other is weaker, and the strength of the correlation is of similar magnitude.
The Conservatives show their strongest vote shares in areas with the highest pensioner, White and Christian populations. Labour shows a stronger vote share in areas with higher ethnic minority and minority religious populations. The highest correlation for the Lib Dems is for those with no religion.
For this collection of economic indicators we again see a similar either/or pattern for Labour and Conservative, but there are clearer correlations for the Lib Dem share than there were for demographics.
In general, areas with stronger economic indicators have voted Conservative. Indicators of poverty are more associated with a strong Labour vote.
The Lib Dem correlations are a little inconsistent in comparison with the two main parties, and they are also weaker. Having a positive correlation for unemployment but a negative one for incapacity benefit is counter-intuitive.
One of the main differences between the data included here and the demographic section is that I’ve included both the constituency % and the % for the region that constituency belongs to. What does jump out is that the correlation is generally stronger for the Lib Dems at a regional level than at constituency level. This seems to suggest that the Lib Dem correlation is actually with a constituency’s strength relative to its region rather than to the country as a whole – i.e. while the region as a whole is poorer, the constituency has a stronger economic performance than the surrounding constituencies.
This shows the proportion of people who have their ‘main job’ in a given industry.
For this dataset there are fewer correlating variables and again there is a slightly weaker correlation for the Lib Dems than for the two main parties. Again, in general we see strong correlations for the Conservatives being inverted for Labour.
I suspect that the strong Conservative share for Agriculture, Forestry and Fishing reflects the rural vs urban split one would expect, though Construction may run counter to that.
Labour appears stronger in blue-collar and public sector employment areas.
The Liberal Democrats don’t have any correlations as strong as the other two parties. Their figures probably support the perception of university towns as Lib Dem strongholds, with vote share rising as the proportion in Education, Research and IT increases. They also have a likely rural slice of support.
The first, and obvious, point is – correlation is not causation. However, the correlation results would support the general perception of the parties. The Conservatives are stronger in richer, whiter and less urban areas. Labour are stronger in urban, poorer and mixed ethnicity areas. The Liberal Democrats have a patchwork of support but for the most part this data does not really explain their levels of support.
Beyond this correlation, I have also created Machine Learning models using this data to attempt to predict the vote share for each party in each constituency. Validating these models also supports these assertions. For both Labour and the Conservatives the demographic and economic factors can explain 75%-85% of the results, depending on which algorithm is used. For the Lib Dems it is much, much lower – to the point that the model is of limited value.
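The metric behind "can explain 75%-85% of the results" is not stated. One common reading is the coefficient of determination (R squared), which measures how much of the variation in actual vote share a model's predictions account for. The sketch below assumes that reading; the function name and all figures are hypothetical.

```python
def r_squared(actual, predicted):
    """Share of the variation in `actual` accounted for by `predicted`."""
    mean_actual = sum(actual) / len(actual)
    # Error left over after using the model's predictions
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Error from simply guessing the average every time
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Hypothetical held-out constituencies: actual vs predicted vote share (%)
actual_share = [30.0, 40.0, 50.0, 60.0]
predicted_share = [32.0, 38.0, 52.0, 58.0]

score = r_squared(actual_share, predicted_share)
```

A score near 1 means the predictions track the results closely; a score near 0 means the model does little better than guessing the average, which is roughly the situation described for the Lib Dems.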
My suspicion is that the history of the constituency will be the most important factor for the Lib Dems – i.e. their vote share in a given constituency is dramatically higher if they have been the winner or runner-up in the previous couple of elections. They will struggle where they have no history of being competitive.
Comparing the correlation between parties and demographics, the Lib Dem vote is very slightly closer in pattern to the Conservatives but this isn’t particularly pronounced.
If I were part of the party machine of either of the two main parties I would consider (purely based on the data):
- Measuring the success of a candidate based on their performance against a demographically calculated baseline rather than in absolute terms
- The data suggests that there is a clear inherent vote share based on demographics. I would pick candidates who can appeal beyond their core demographic, particularly in marginal constituencies. This would mean minority candidates for the Conservatives and non-minority candidates for Labour
- There is a clear and unanswered question – if the character of the constituency explains 80% of the vote share, what explains the other 20%?
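The first suggestion in the list above, judging a candidate against a demographic baseline, could be sketched as the gap between the actual vote share and the share a model predicts from demographics alone. This is only an illustration: the function name, seat names and numbers are all hypothetical.

```python
def performance_vs_baseline(actual_share, baseline_share):
    """Percentage-point gap between actual and demographically predicted share.

    Both arguments map a constituency name to a vote share in %.
    A positive gap means the candidate beat the demographic baseline.
    """
    return {
        seat: round(actual_share[seat] - baseline_share[seat], 1)
        for seat in actual_share
    }

# Hypothetical seats (not real results)
actual = {"Seat A": 38.0, "Seat B": 31.0}
baseline = {"Seat A": 42.5, "Seat B": 27.0}

gaps = performance_vs_baseline(actual, baseline)
```

On these made-up numbers the candidate in Seat B outperforms despite the lower absolute share, which is the point of measuring against a baseline rather than in absolute terms.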
It is far less clear what to draw from this for the Liberal Democrats. They do not have an obvious cohort of support in the same way as the other parties. Much of the further development suggested below is aimed at addressing how poorly this data explains their performance.
Here are some of the things I’d like to do with the model. However, as I have a day job to do it’ll be a while.
- Rerun only looking at the top two results in each constituency to try and get a clearer view of the Lib Dems’ demographics
- Include history as a variable. What was the vote share in the previous election – or has the party won that seat in the previous few elections?
- Extend beyond just the last 3 elections. Has there been a change over time? In this period Labour has not been in government – did the same pattern exist when they were winning elections?
- Is there a tendency against an incumbent government? Looking over the last 30 years there was a swing towards the Conservatives under Thatcher which then reduced over time. A similar pattern was seen for Labour under Blair with a big swing that was slowly chipped away. I would expect economic under-performance to depress the vote share of the party in power at the next election
- Factor in more macroeconomic figures. Are the figures different if the economy is booming or in recession?
- Is there a correlation between party support and voting Leave or Remain in the referendum? Is there a demographic correlation with the Leave/Remain vote?
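The idea of adding history as a variable could look something like the sketch below, which attaches a party's share at the previous election to each row. This has not been done in the analysis; the row layout, field names and election labels are assumptions for illustration only.

```python
def add_previous_share(rows, election_order):
    """Return copies of rows with a `previous_share` field added.

    rows: dicts with "constituency", "election" and "share" keys (assumed layout).
    election_order: election labels from oldest to newest.
    """
    # Look up any share by (constituency, election)
    lookup = {(r["constituency"], r["election"]): r["share"] for r in rows}
    # Map each election to the one immediately before it
    previous = dict(zip(election_order[1:], election_order))
    result = []
    for r in rows:
        prev_election = previous.get(r["election"])
        # None when there is no earlier election in the data
        prev_share = lookup.get((r["constituency"], prev_election))
        result.append({**r, "previous_share": prev_share})
    return result

# Hypothetical rows for one seat across three elections
rows = [
    {"constituency": "Seat A", "election": "e1", "share": 22.0},
    {"constituency": "Seat A", "election": "e2", "share": 25.0},
    {"constituency": "Seat A", "election": "e3", "share": 19.0},
]
with_history = add_previous_share(rows, ["e1", "e2", "e3"])
```

The earliest election has no predecessor, so its `previous_share` is left as `None` and would need handling before modelling.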
The demographic data has been compiled predominantly from the 2011 census, made available online by the fabulous Office for National Statistics, and supplemented where available with more current economic data taken from the awesome data.gov.uk website.
I have selected a sub-set of demographic data from the census as there is a clear overlap for some and for others there was no clear benefit for this analysis. Given more time to compile I would have liked to include all and extended further, but building the dataset was a very manual activity.
All the demographic data was available on a per-constituency basis. I decided early on that the constituency was the most sensible grain to operate at for two reasons. Firstly, it is the most common unit across datasets. Secondly (and with future analysis in mind), I suspected that the history of an individual constituency would be an important factor which would need to be included. I expected that the Lib Dems would have less of a direct correlation with demographics, but are likely to have historic bastions where history is more important than demographics – i.e. this was a historical Liberal seat and the tradition of them being the first or second party is self-sustaining.
The election data was the last 3 results for England alone. I removed the constituency of the Speaker on the grounds this would be an anomaly for vote share. I also removed any constituencies won by an independent for the same reason.
I built the analysis in R, running a simple correlation analysis on which the above is based. I also built machine learning models for each of the parties to predict their vote share as a regression analysis. While the model results are not included directly in this article, the insight gained from them is.
There were a couple of gaps in the data where one of the main parties did not field a candidate. In these instances I imputed a consistent value, as most models cannot deal with gaps.
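A minimal sketch of that kind of gap-filling is shown below. The actual value used is not stated here, so the default of 0.0 (reading "no candidate" as "no votes") is purely an assumption, as are the function name and the example figures.

```python
def impute_missing(shares, fill_value=0.0):
    """Replace missing vote shares (None) with one consistent value.

    fill_value=0.0 is an assumed choice, not the value used in the analysis.
    """
    return [fill_value if share is None else share for share in shares]

# Hypothetical shares where the party did not stand in the second seat
raw_shares = [34.2, None, 12.8]
filled_shares = impute_missing(raw_shares)
```

Whatever value is chosen, applying the same one everywhere keeps the treatment of these seats consistent across parties and elections.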
The most annoying part of this was the lack of a consistently applied key to join this data together. Not all data is supplied with the unique key and, worse, the full names of the constituencies were inconsistent. Sometimes ‘and’, sometimes ‘&’. Sometimes ‘West Cityname’, sometimes ‘Cityname West’. Sometimes ‘Area1 and Area2’, sometimes ‘Area2 and Area1’. So the most time-consuming thing of all was building and cleaning the datasets.
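One way to paper over those naming differences is to reduce each name to an order-independent key before joining. The sketch below is an illustration under that assumption, not the cleaning process actually used (which was largely manual); the function name is hypothetical.

```python
import re

def constituency_key(name):
    """Build a join key that ignores case, '&' vs 'and', and word order."""
    # Treat '&' the same as the word 'and'
    text = name.lower().replace("&", " and ")
    # Keep only runs of letters and digits, dropping punctuation
    words = re.findall(r"[a-z0-9]+", text)
    # Drop 'and' and sort so that word order no longer matters
    return " ".join(sorted(word for word in words if word != "and"))

key_west_first = constituency_key("West Cityname")
key_west_last = constituency_key("Cityname West")
key_ampersand = constituency_key("Area1 & Area2")
key_reversed = constituency_key("Area2 and Area1")
```

Sorting the words is a blunt instrument: two genuinely different constituencies made up of the same words would collide, so the keys should be checked for duplicates before being trusted in a join.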