Using Data Visualisation To Analyse Refugee Movement
Introduction
Given the recent refugee crisis, I wanted to analyse how the scale and frequency of refugee movement have changed over the years, to better understand how many people are affected and displaced.
Since 1956 the United Nations High Commissioner for Refugees (UNHCR) has recorded and published data on refugees. The data consists of the country of origin, the country of destination and the number of refugees. In the first decade the data is relatively limited and countries are often listed as “Various”. From 1965 onward, however, the data is consistently documented, up to the most recent release, which runs to the end of 2017.
Continent Of Origin
The first metric that I investigated was the total number of refugees on a yearly basis aggregated by continent of origin.

Looking at the above visualisation, we can see a very sharp rise in the late 70s and early 80s. This could be due to a number of factors, e.g. data being less consistently recorded or fewer social programmes offering refugees asylum. From the early 90s to the mid 00s there was a consistent overall decrease in the total number of global refugees.
In 1990 the total number of refugees was 14.5 million, the highest it had ever been, and it wasn’t until 2015 that this figure was surpassed. The largest number of recorded refugees in the data is in 2017, at almost 19 million. Given the sharp upward trend from 2013 to 2017, and the fact that UNHCR has released overall 2018 figures of approximately 25 million refugees, it is not inconceivable that 2019’s figures will be significantly higher.
It is also clear that over the last 10 years Europe has seen a steady decrease compared to previous decades whilst South America has had a significant increase in volume.
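As a rough illustration of the yearly aggregation by continent of origin described in this section, here is a minimal sketch in plain Python. The rows, the tuple layout and the function name are assumptions made for the example; the numbers are invented and are not UNHCR figures.

```python
from collections import defaultdict

# Invented illustrative rows: (year, continent of origin, refugees).
# These are NOT real UNHCR figures.
rows = [
    (1990, "Asia", 500),
    (1990, "Asia", 250),
    (1990, "Africa", 300),
    (1991, "Asia", 400),
    (1991, "Europe", 100),
]


def yearly_totals_by_continent(records):
    """Sum refugee counts for each (year, continent of origin) pair."""
    totals = defaultdict(int)
    for year, continent, refugees in records:
        totals[(year, continent)] += refugees
    return dict(totals)


totals = yearly_totals_by_continent(rows)
for (year, continent), count in sorted(totals.items()):
    print(year, continent, count)
```

Each (year, continent) total would then become one segment of the chart for that year.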
Country Of Origin – Top 10
Because the data covers so many years and countries, a single static image is difficult to digest. I therefore created some animated graphs so that the change over time can be understood and interpreted.

As can be seen in the above visualisation, Afghanistan went through a prolonged period of instability from the 80s through to the 00s and consistently appeared in the top 10 countries by volume, often being the largest.
The crisis in Syria is also clearly evident, and the figure for 2017 is the largest number of refugees from any country ever recorded in a given year.
Whilst the above visualisation provides some insight and gives an idea of how devastating the scale of these crises is, the figures may be slightly skewed by population growth over the years at both a country and global level.
The following visualisation looks at the number of refugees by country of origin as a percentage of that country’s population. Doing so controls for the overall global increase in population.

The above visualisation tells quite a different story compared to the previous one. Many countries with smaller populations appear on this list, which gives visibility to refugee crises that seem smaller at a glance when looking at raw numbers but have clearly affected the country greatly.
We can also see that for a period of time approximately 50% of Afghanistan’s citizens were classified as refugees, which is the highest percentage of population for any country in a given year. The current Syrian crisis had also affected a very large proportion of the population as of 2017 and seems to have continued to rise in 2018 (as documented in the high-level figures provided by the UNHCR for 2018).
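To make the difference between the two rankings concrete, here is a small sketch that ranks the same single year of data twice, once by raw refugee numbers and once by refugees as a percentage of population. The country names, values and function names are all hypothetical and chosen only so that the two lists disagree.

```python
# Hypothetical single-year data: country -> (refugees, population).
# Values are invented for illustration only.
year_data = {
    "Country A": (2_000_000, 40_000_000),
    "Country B": (500_000, 1_000_000),
    "Country C": (1_000_000, 100_000_000),
    "Country D": (50_000, 200_000),
}


def top_n_by_count(data, n):
    """Rank countries by raw refugee numbers, largest first."""
    ranked = sorted(data, key=lambda country: data[country][0], reverse=True)
    return ranked[:n]


def top_n_by_share(data, n):
    """Rank countries by refugees as a percentage of population."""
    share = {
        country: 100 * refugees / population
        for country, (refugees, population) in data.items()
    }
    ranked = sorted(share, key=share.get, reverse=True)
    return [(country, share[country]) for country in ranked[:n]]


print(top_n_by_count(year_data, 2))  # ['Country A', 'Country C']
print(top_n_by_share(year_data, 2))  # [('Country B', 50.0), ('Country D', 25.0)]
```

The smaller countries B and D do not make the top two by volume, yet they lead the list once population is taken into account.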
World Map Visualisations
The final analysis I wanted to do was to pull in latitude and longitude coordinates for each country, as this allows for map-based (geo-analytic) visualisations.

The first version seen above shows all refugee movement based on country of origin regardless of how small or large the volume is.

The second version seen above shows all refugee movement based on country of destination regardless of how small or large the volume is.
Together, the maps above give us some visibility of which countries take in the most refugees and how this has changed over time.
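The two map versions differ only in which side of each movement is totalled. A minimal sketch of that idea follows, assuming a hypothetical list of flows and a hypothetical coordinate lookup; neither the names nor the numbers come from the project data.

```python
from collections import defaultdict

# Hypothetical flows for one year: (origin, destination, refugees).
flows = [
    ("Country A", "Country B", 1000),
    ("Country A", "Country C", 400),
    ("Country D", "Country B", 250),
]

# Hypothetical lookup of (latitude, longitude) per country.
# "Country D" is deliberately missing.
coords = {
    "Country A": (34.0, 66.0),
    "Country B": (30.0, 70.0),
    "Country C": (32.0, 53.0),
}


def totals_with_coords(flows, coords, by="origin"):
    """Total refugees per country of origin or destination, with coordinates."""
    if by not in ("origin", "destination"):
        raise ValueError("by must be 'origin' or 'destination'")
    index = 0 if by == "origin" else 1
    totals = defaultdict(int)
    for flow in flows:
        totals[flow[index]] += flow[2]
    # Countries without coordinates are skipped rather than plotted
    # at a guessed location.
    return {c: (coords[c], total) for c, total in totals.items() if c in coords}


by_origin = totals_with_coords(flows, coords, by="origin")
by_destination = totals_with_coords(flows, coords, by="destination")
print(by_origin)
print(by_destination)
```

Skipping countries with no coordinates is a design choice for the sketch; in practice it would be worth listing them so that gaps in the lookup can be filled.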
Conclusion
I initially wanted to discover how refugee volume has changed over the years. Once the magnitude was clear, I wanted to create some easily digestible visualisations so that anyone could understand the figures.
It’s clear from the figures that the disasters that cause these refugee crises displace and affect a large number of people every year. If you would like to find out how you can help, visit –
https://www.unhcr.org/uk/get-involved.html
Inspectors’ Notes
Harib (Data Science) – The analysis and visualisation were done in R and RStudio. The packages used are ggplot2 (the leading data visualisation package), ggmap (GCP map API), gganimate (an excellent new package for animating ggplot outputs) and a number of common functions from the tidyverse.
Due to inconsistent data before 1970, those years were excluded from the analysis. The data is volatile, with countries seeing significant change year on year, which made visualising the figures difficult. Because of this I decided to fix the y axis rather than allow it to rescale. I also prevented the numbers from changing every second (tweening), so that only the actual figures released by UNHCR are displayed.
Some of the figures seem implausibly high when looking at the percentage of population, but both the population and refugee figures have been manually checked and validated. A country can have a high percentage of refugees for multiple years. I believe this is because refugees are moved between multiple countries over a few years if the crisis is ongoing. Millions of refugees may be moved from country A to country B in one year; if the same refugees are moved from country B to country C the next year, the data records them as refugees moving from country A to country C. This makes logical sense and can explain how some countries remained at a high percentage of their population for many years in a row.
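The country A, B and C reasoning above can be shown with a toy example. The records, the constant population and the function name are assumptions made purely for illustration; the point is only that the share attributed to the country of origin stays the same while the destination changes.

```python
from collections import defaultdict

# Invented records: (year, origin, destination, refugees).
records = [
    (2001, "Country A", "Country B", 1_000_000),
    # The following year the same people move on from B to C, but the
    # record still lists Country A as their origin.
    (2002, "Country A", "Country C", 1_000_000),
]

POPULATION_A = 4_000_000  # hypothetical, held constant for simplicity


def share_by_origin(records, origin, population):
    """Refugees from `origin` as a percentage of its population, per year."""
    totals = defaultdict(int)
    for year, rec_origin, _destination, refugees in records:
        if rec_origin == origin:
            totals[year] += refugees
    return {year: 100 * total / population for year, total in totals.items()}


shares = share_by_origin(records, "Country A", POPULATION_A)
print(shares)  # {2001: 25.0, 2002: 25.0}
```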
Jordan (Data Engineering) – The refugee data for this project was taken from https://www.unhcr.org and wrangled in Python using pandas and NumPy. The data was subset by applying the ‘refugees (incl. refugee-like situations)’ filter, which excludes all other categories.
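A minimal sketch of that subsetting step is shown here in plain Python rather than pandas, so that it runs without dependencies. The column names (`population_type`, `value`), the other category labels and the sample rows are assumptions for the example, not the actual layout of the UNHCR export.

```python
# The category named in the text; compared case-insensitively below.
REFUGEE_CATEGORY = "refugees (incl. refugee-like situations)"

# Invented rows with hypothetical column names.
rows = [
    {"year": 2017, "origin": "Country A",
     "population_type": "Refugees (incl. refugee-like situations)", "value": 1200},
    {"year": 2017, "origin": "Country A",
     "population_type": "Asylum-seekers", "value": 300},
    {"year": 2017, "origin": "Country B",
     "population_type": "Internally displaced persons", "value": 900},
    {"year": 2017, "origin": "Country B",
     "population_type": "refugees (incl. refugee-like situations)", "value": 50},
]


def keep_refugees(records, category=REFUGEE_CATEGORY):
    """Keep only rows whose population type matches the refugee category."""
    wanted = category.casefold()
    return [r for r in records if r["population_type"].strip().casefold() == wanted]


refugees_only = keep_refugees(rows)
print([r["value"] for r in refugees_only])  # [1200, 50]
```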
The continent and yearly population of each country were then added to each row where applicable, using the package fuzzywuzzy to create lookup tables. The population data was taken from https://data.worldbank.org
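The idea of a fuzzy-matched lookup table can be sketched as follows. Note that this example swaps in `difflib` from the Python standard library instead of fuzzywuzzy, so that it runs without extra packages; the country spellings, the 0.8 cutoff and the function name are assumptions for illustration and not the project's actual settings.

```python
import difflib


def build_lookup(source_names, reference_names, cutoff=0.8):
    """Map each source name to its closest reference name, or None."""
    by_lower = {name.lower(): name for name in reference_names}
    lookup = {}
    for name in source_names:
        matches = difflib.get_close_matches(
            name.lower(), list(by_lower), n=1, cutoff=cutoff
        )
        # Names with no sufficiently close match are left as None so
        # they can be reviewed by hand instead of being mis-assigned.
        lookup[name] = by_lower[matches[0]] if matches else None
    return lookup


# Hypothetical spellings as they might differ between two sources.
refugee_names = ["Syrian Arab Rep.", "Viet Nam", "Afghanistan", "Various"]
population_names = ["Syrian Arab Republic", "Vietnam", "Afghanistan"]

lookup = build_lookup(refugee_names, population_names)
for source, matched in lookup.items():
    print(source, "->", matched)
```

A fairly strict cutoff keeps placeholder entries such as “Various” from being matched to a real country, at the cost of leaving more names for manual review.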
The new 2018 figures will be added to the data and visualisation when released.
GitHub
All scripts, data sources and outputs can be found at –
https://github.com/DataInspector/RefugeeAnalysis