Detailed Guide For Setting Up Jupyter On AWS EC2
Introduction
With the data industry’s ever-growing demand for cloud services and specialised environments, Data Scientists and Data Engineers need to be able to access these environments remotely. Jupyter Notebook is a browser-based development environment commonly used in Data Science, and most DevOps Engineers should know how to securely set up a remote Jupyter environment for their team. I have been setting up The Data Inspector’s infrastructure for a while now and thought it might be useful for others to have this information readily available. When I started learning to build this environment, I found that some of the tutorials online were hard to follow or a bit outdated, so I have compiled this step-by-step guide to make the process as straightforward as possible.
What You Will Need
You will need a basic understanding of what we are trying to accomplish here, as well as some basic computing skills. To begin, you will need an Amazon Web Services account; if you haven’t got one already, it is simple to set up, and you can use a Free Tier account to avoid cost. If you are working from a Windows PC like me, you will also need to download an SSH client; I used PuTTY/PuTTYgen. This is what will link your PC to the server you launch on AWS.
(AWS Free Tier Acc Link – https://aws.amazon.com/free/)
(PuTTY/PuTTYGen Link – https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html)
The Set Up
Once you have created your Free Tier account, head over to the EC2 section of the AWS Management Console. This is where you will launch the server that will host your Jupyter notebook.
(The AWS instructions for launching a Free Tier Ubuntu instance also explain how to create a new key pair; Amazon provides its private key as a .PEM file for you to download when you launch the instance.)
Once you have launched an EC2 instance, you can connect to it from your PC over SSH. This gives you a Linux terminal on the server, from which you can install software and make changes.
(Mac SSH Instructions Coming Soon)
PuTTY (Windows)
- Download PuTTY and PuTTYgen using the link above.
- Now use PuTTYgen to convert your .PEM file to a .PPK file:
- Start PuTTYgen and select “Load”.
- Select your .PEM file (you may need to change the file-type filter to show all files).
- PuTTYgen will convert the .PEM format to .PPK format.
- Select “Save private key”. A passphrase is not required, but you can set one for additional security.
Time To SSH
Now you should have an EC2 instance running on your AWS account, as well as a key pair saved on your PC that has been converted into a .PPK file.
Open PuTTY, copy the Public IPv4 address listed in your instance details on the AWS website, and paste it into the Host Name (or IP Address) field on PuTTY’s configuration page. Next, in the Category tree on the left, expand Connection, then SSH, and select Auth (in newer versions of PuTTY, expand Auth and select Credentials).
Browse your PC for the .PPK file you prepared in PuTTYgen earlier.
Select Open and log in with the username:
ubuntu
Well done, you have now successfully set up a virtual machine in the cloud and connected to it from your PC. From here, you will need to install a number of packages to build up the environment that you will soon access from a browser.
Install Anaconda
Let’s continue by installing Anaconda3, which simplifies installing packages like Python, R and Jupyter Notebook. In your Linux terminal, use the ‘wget’ command to download the Anaconda3 installer. Be sure to use the latest version; at the time of writing this article, my command was as follows:
wget https://repo.anaconda.com/archive/Anaconda3-2019.03-Linux-x86_64.sh
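The installer you downloaded is saved under the file name at the end of the URL, and that is the name you pass to ‘bash’. As a small illustration, this standard-library Python sketch pulls that name out of a download URL; the function name installer_filename is hypothetical, and it assumes a plain link without a query string, like the Anaconda archive links.

```python
import posixpath
from urllib.parse import urlparse


def installer_filename(url: str) -> str:
    """Return the file name at the end of a download URL's path (hypothetical helper)."""
    path = urlparse(url).path        # keep only the path part of the URL
    name = posixpath.basename(path)  # everything after the final '/'
    if not name:
        raise ValueError(f"URL has no file name: {url!r}")
    return name


print(installer_filename(
    "https://repo.anaconda.com/archive/Anaconda3-2019.03-Linux-x86_64.sh"))
# prints: Anaconda3-2019.03-Linux-x86_64.sh
```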
Once the download has finished, take the file name from the end of the link address you used with the ‘wget’ command and run it with the ‘bash’ command, like this:
bash Anaconda3-2019.03-Linux-x86_64.sh
The installer will first ask you to review the licence terms; hit Enter. Next, you can either hit Enter repeatedly to scroll through the terms to the next confirmation, or press the space bar to page through them. Either way, make sure you type ‘yes’ to approve the terms rather than just hitting Enter. Answer the remaining prompts, typing ‘yes’ wherever you are asked for approval and again being careful not to proceed without giving it. The installation will take a few minutes; when it has finished, enter the following command to confirm that Anaconda’s Python is now the default:
which python
You should receive an output which looks similar to this:
/home/ubuntu/anaconda3/bin/python
Don’t worry if you see a different path, such as /usr/bin/python. Reload your shell configuration so that any changes the installer made to .bashrc take effect:
source ~/.bashrc
Now run ‘which python’ again and you should see the Anaconda path.
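The same check can be done from Python, which is handy if you later script your server set-up. This sketch uses shutil.which() to search PATH much as ‘which’ does, then tests whether the result lives under the Anaconda install directory. The function name is hypothetical, and the default ~/anaconda3 location is an assumption based on the path shown above.

```python
import os
import shutil
from typing import Optional


def is_anaconda_python(path: Optional[str], install_dir: str = "~/anaconda3") -> bool:
    """Return True if `path` sits inside the (assumed) Anaconda install directory."""
    if path is None:  # shutil.which() returns None when nothing is found on PATH
        return False
    base = os.path.abspath(os.path.expanduser(install_dir))
    target = os.path.abspath(os.path.expanduser(path))
    # commonpath() compares whole path components, so ~/anaconda3-old does not match ~/anaconda3
    return os.path.commonpath([base, target]) == base


if __name__ == "__main__":
    found = shutil.which("python")  # searches PATH, much like the `which` command
    print(found, "->", "Anaconda" if is_anaconda_python(found) else "not Anaconda")
```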
Create Your Jupyter Password
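Now we will create a password for your Jupyter Notebook. We can do this using IPython in the terminal, as follows (if the import fails on a newer installation, try ‘from notebook.auth import passwd’, or ‘from jupyter_server.auth import passwd’ on recent Jupyter releases):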
ipython
from IPython.lib import passwd
passwd()
Now type a password and enter it again to verify it. IPython will then print a hashed version of the password, usually a string beginning with ‘sha1:’; save it somewhere, as you will need it for the config file shortly.
exit()
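The ‘sha1:’ string is a salted hash rather than an encrypted password: a random salt and the SHA-1 digest of the password followed by that salt, joined by colons. The sketch below uses only Python's standard library to reproduce that format, showing what the string contains and how a login attempt can be checked against it. It is an illustration, not IPython's own code; the function names are hypothetical, and newer Jupyter releases may produce an ‘argon2:’ hash instead. Use passwd() itself to generate the value for your config.

```python
import hashlib
import hmac
import secrets
from typing import Optional


def sha1_passwd(passphrase: str, salt: Optional[str] = None) -> str:
    """Build a 'sha1:<salt>:<hexdigest>' string (illustrative sketch of the older format)."""
    if salt is None:
        salt = secrets.token_hex(6)  # 6 random bytes -> 12 hex characters
    digest = hashlib.sha1(passphrase.encode("utf-8") + salt.encode("ascii")).hexdigest()
    return f"sha1:{salt}:{digest}"


def sha1_passwd_check(hashed: str, passphrase: str) -> bool:
    """Re-hash `passphrase` with the stored salt and compare digests in constant time."""
    algorithm, salt, expected = hashed.split(":", 2)
    if algorithm != "sha1":
        raise ValueError(f"unsupported algorithm: {algorithm}")
    actual = sha1_passwd(passphrase, salt).split(":", 2)[2]
    return hmac.compare_digest(actual, expected)
```

For example, sha1_passwd_check(sha1_passwd("my-secret"), "my-secret") returns True, while any other passphrase returns False.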
Generate Jupyter Config File
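Next, generate a default configuration file, which Jupyter saves as ~/.jupyter/jupyter_notebook_config.py:
jupyter notebook --generate-config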
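Jupyter takes its config directory from the JUPYTER_CONFIG_DIR environment variable if it is set, and otherwise uses ~/.jupyter, which is why the following steps change into that directory. Below is a simplified Python sketch of that lookup; the real logic lives in jupyter_core and handles more cases, and notebook_config_path is a hypothetical helper. On the server, ‘jupyter --config-dir’ prints the directory Jupyter actually uses.

```python
import os
from typing import Mapping, Optional


def notebook_config_path(env: Optional[Mapping[str, str]] = None) -> str:
    """Simplified sketch of where the generated notebook config file ends up."""
    env = os.environ if env is None else env
    # JUPYTER_CONFIG_DIR, when set, overrides the default ~/.jupyter directory
    config_dir = env.get("JUPYTER_CONFIG_DIR") or os.path.join(os.path.expanduser("~"), ".jupyter")
    return os.path.join(config_dir, "jupyter_notebook_config.py")
```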
Create SSL Certification
Now, we are going to create a directory in your home folder to store our SSL certificate:
mkdir ~/certs
cd ~/certs
Now use the following command to create a self-signed SSL certificate. It generates a new 2048-bit RSA key (‘-newkey rsa:2048’), leaves the key unencrypted so that Jupyter can read it without a passphrase (‘-nodes’), and makes the certificate valid for 365 days (‘-days 365’):
openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout mycert.pem -out mycert.pem
We are now prompted to give some details for the certificate, just fill in the fields as requested.
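Because ‘-keyout’ and ‘-out’ both point at mycert.pem, that single file ends up holding both the private key and the certificate, so the Jupyter config only needs to name it as the certfile. This standard-library Python sketch lists the PEM blocks in such a file so you can confirm both are present; the function names are hypothetical.

```python
import re
from typing import List

# Matches the header line of each PEM block, e.g. "-----BEGIN CERTIFICATE-----"
PEM_HEADER = re.compile(r"-----BEGIN ([A-Z0-9 ]+)-----")


def pem_block_types(pem_text: str) -> List[str]:
    """List the block types found in PEM text, in order of appearance."""
    return PEM_HEADER.findall(pem_text)


def has_key_and_cert(pem_text: str) -> bool:
    """True if the text holds both a certificate and a private key."""
    types = pem_block_types(pem_text)
    # Depending on the OpenSSL version, the key may be labelled "PRIVATE KEY" or "RSA PRIVATE KEY"
    has_key = any(t.endswith("PRIVATE KEY") for t in types)
    return "CERTIFICATE" in types and has_key
```

On the server, reading /home/ubuntu/certs/mycert.pem and passing its contents to has_key_and_cert should return True.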
Back To The Config
Going back to the Jupyter Configuration, change to its directory with:
cd ~/.jupyter/
Then, use this to open and edit the config file:
vi jupyter_notebook_config.py
When you open the config file you might feel like you have stepped into another dimension, but keep your bearings and this will be over quickly. You can move the cursor with the arrow keys; press “i” to enter insert (edit) mode, then paste the following block of code at the top of the file:
c = get_config()
# Kernel config
c.IPKernelApp.pylab = 'inline'  # this enables plotting support in the notebook
# Notebook config
c.NotebookApp.certfile = '/home/ubuntu/certs/mycert.pem'  # your certificate file's location (it also holds the private key)
c.NotebookApp.ip = '0.0.0.0'  # listen on all network interfaces, so the server is reachable from outside the instance
c.NotebookApp.open_browser = False  # stops Jupyter from trying to open a browser on the server
c.NotebookApp.password = 'sha1:225198744aa2:7bdb24d96fb69b4ee3fa11c75........'  # replace with the hashed password you generated previously
# Set the port to 8888 or a port of your choice
c.NotebookApp.port = 8888
Once you have adjusted the config file as above, replacing the sha1 hash with the one you generated when creating your Jupyter Notebook password, press Esc to leave insert mode, then save and quit by entering:
:wq
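A common slip at this stage is pasting a truncated hash or mistyping the certificate path. The hypothetical helper config_problems below sketches a quick sanity check of the three values you pasted in, using only Python's standard library. It assumes a ‘sha1:’ hash like the one generated earlier (an ‘argon2:’ hash from a newer Jupyter would be flagged even though Jupyter accepts it), and the certificate check only makes sense when run on the server itself.

```python
import os
import re
from typing import List

# 'sha1:' + 12 hex salt characters + ':' + 40 hex digest characters
SHA1_HASH = re.compile(r"sha1:[0-9a-f]{12}:[0-9a-f]{40}")


def config_problems(password_hash: str, certfile: str, port: int) -> List[str]:
    """Return human-readable problems with the values pasted into the config."""
    problems = []
    if not SHA1_HASH.fullmatch(password_hash):
        problems.append("password is not a complete 'sha1:<salt>:<digest>' hash")
    if not os.path.isfile(certfile):
        problems.append(f"certificate file not found: {certfile}")
    if not 1024 <= port <= 65535:
        # ports below 1024 need root privileges on Linux
        problems.append(f"port {port} is outside 1024-65535")
    return problems
```

On a correctly set-up server, calling it with your hash, ‘/home/ubuntu/certs/mycert.pem’ and 8888 should return an empty list.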
Launching The Notebook
Next, if you haven’t already, create a directory called notebooks and switch to that directory:
cd ~
mkdir notebooks
cd notebooks
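Then, start a ‘screen’ session in the terminal with:
screen
Inside screen, press Ctrl-A then C to create a second window, and Ctrl-A then N to switch between the windows you have open. Make sure you have two windows open so that you can switch away from the one the notebook is running in; you can always stop the server from its window with Ctrl-C. Pressing Ctrl-A then D detaches from screen altogether, leaving the notebook running after you close PuTTY; reattach later with ‘screen -r’.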
Finally, we can launch the notebook:
jupyter notebook
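Hopefully you are seeing green lights. Now collect your instance’s Public IPv4 address from AWS and use it to open your Jupyter instance in a browser:
https://input-IPv4-Address:8888
Use port 8888 or whatever port you specified in your config file. Because the certificate is self-signed, your browser will warn you that the connection is not private; open the advanced settings on the warning page, choose to proceed to the website, and then enter your password. If the page does not load at all, check that your instance’s security group allows inbound TCP traffic on that port.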
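When the page will not load, it helps to check from your own PC whether the port is reachable at all before digging into Jupyter itself. The sketch below is standard-library Python meant to run on your PC, not the server; it builds the notebook URL and tries a plain TCP connection to the port. The function names are hypothetical, and 203.0.113.10 is a documentation placeholder, not a real instance address.

```python
import ipaddress
import socket


def notebook_url(ipv4: str, port: int = 8888) -> str:
    """Build the HTTPS address of the notebook server."""
    ipaddress.IPv4Address(ipv4)  # raises ValueError if this is not a valid IPv4 address
    return f"https://{ipv4}:{port}"


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Try a plain TCP connection; False means nothing answered in time or it was refused."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, unreachable hosts and timeouts
        return False


print(notebook_url("203.0.113.10"))  # prints: https://203.0.113.10:8888
```

A False result from port_is_open for your instance’s address and port points at the network path, such as the security group, rather than at Jupyter.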
Splendid!
You are now working from a remote Jupyter notebook in the cloud. I will update this tutorial soon to improve the security and provide details on how to customize your Jupyter workspace with themes and scripting tools/features.
Do let me know in the comments if this guide has helped you or if you think it could be improved, thanks for reading.