
    Installing JupyterHub on Amazon AWS

    Ben Butchart / October 7, 2015
    Recently, I've been evaluating JupyterHub, an open source multi-user platform for hosting Jupyter Notebooks.

    To quote the Jupyter.org strapline: "The Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text with rich text and media."

    The way I like to explain it is that it's a bit like a wiki: you can create documents with Markdown and publish them to the web. But the cool thing about these documents is that you can actually execute the code snippets in them.

    And even better, you have over 40 languages to choose from, including Python, R and Julia. This makes Jupyter a perfect platform for collaborative data science and scientific computing.

    The documentation on the JupyterHub GitHub page is pretty good, but there are a couple of gotchas if, like me, you're just getting started.

    I wanted to install JupyterHub on Amazon AWS so that I can experiment later with scaling features such as the AWS EC2 Container Service.

    So here are my notes on how to install JupyterHub on a bare-bones AWS EC2 instance.

    1. Launch a new EC2 instance, choosing the default Amazon Linux AMI.


    Note to self: if you are using a CentOS-based AMI instead of the default Amazon Linux AMI, and you are trying to proxy JupyterHub behind Apache with CoSign authentication, you might find that Apache cannot read the private key. You will see a message like the following in /var/log/audit/audit.log:
    type=AVC msg=audit(1460559998.163:4977): avc:  denied  { read } for  pid=25300 comm="httpd" name="my-private-key.pem" dev="xvda1" ino=17345260 scontext=system_u:system_r:httpd_t:s0 tcontext=system_u:object_r:var_t:s0 tclass=file 

    and the following error when starting Apache:

    sudo systemctl status httpd.service
    httpd[26243]: AH00526: Syntax error on line 50 of /etc/httpd/conf.d/jupyterhub.conf:


    This is due to restrictions imposed by SELinux. You can either disable SELinux in /etc/sysconfig/selinux or apply the following policy as root:

    semanage fcontext -a -t httpd_sys_content_t "/var/cosign/certs(/.*)?"
    semanage fcontext -a -t httpd_sys_rw_content_t "/var/cosign/filter(/.*)?"
    restorecon -Rv /var/cosign
    setsebool -P httpd_can_network_connect=1
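    A minimal sketch of the same policy steps as a script that only changes anything when SELinux is actually enforcing, so it does nothing on an instance where SELinux is disabled or permissive. It assumes the CoSign files live under /var/cosign, as above, and must be run as root; the function names are illustrative, not part of any tool.

```shell
#!/usr/bin/env bash
# Sketch: apply the SELinux labels and boolean from the note above only
# when SELinux is enforcing. Assumes CoSign files live under /var/cosign.
# Function names are hypothetical helpers.

selinux_enforcing() {
  # getenforce prints Enforcing, Permissive or Disabled; a missing tool is
  # treated as "not enforcing".
  command -v getenforce >/dev/null 2>&1 && [ "$(getenforce)" = "Enforcing" ]
}

apply_cosign_selinux_policy() {
  # Certificates: read-only content for httpd. Filter dir: read/write content.
  semanage fcontext -a -t httpd_sys_content_t "/var/cosign/certs(/.*)?"
  semanage fcontext -a -t httpd_sys_rw_content_t "/var/cosign/filter(/.*)?"
  # Relabel files that already exist according to the new rules.
  restorecon -Rv /var/cosign
  # -P makes the boolean persist across reboots.
  setsebool -P httpd_can_network_connect 1
}

if selinux_enforcing; then
  apply_cosign_selinux_policy
else
  echo "SELinux is not enforcing; skipping policy changes"
fi
```

    If the fcontext rules already exist, semanage fcontext -a will refuse to add them again; semanage fcontext -m modifies an existing rule instead.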
    


    2. SSH into the new EC2 instance, open a root shell and change to root's home directory:

    sudo -s
    cd

    3. Install the Python 3 package (the Amazon Linux AMI comes with Python 2, but JupyterHub requires Python 3):

    yum install python34-devel.x86_64

    4. Install pip for Python 3 (note that this enables the EPEL repository):

    yum install python34-pip.noarch --enablerepo=epel

    5. Install the nodejs and npm packages, again from EPEL:

    yum install nodejs npm --enablerepo=epel

    6. Install JupyterHub with pip (make sure to use the Python 3 pip, pip-3.4):

    pip-3.4 install jupyterhub

    7. Install configurable-http-proxy, the Node.js proxy that JupyterHub uses to route requests:

    npm install -g configurable-http-proxy

    8. To run notebooks, we also need to install the IPython notebook:

    pip-3.4 install "ipython[notebook]"
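    Before moving on, it can help to check that everything installed in steps 3 to 8 can actually be found on root's PATH. A minimal sketch; the function name check_hub_prereqs and its default command list are assumptions based on the packages above, not part of JupyterHub:

```shell
#!/usr/bin/env bash
# Sketch: report which commands cannot be found on the current PATH.
# The default list is an illustrative assumption based on steps 3-8.
check_hub_prereqs() {
  local missing=0 cmd
  # Use the arguments if given, otherwise fall back to the default list.
  [ "$#" -gt 0 ] || set -- pip-3.4 node npm jupyterhub configurable-http-proxy
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd"
      missing=1
    fi
  done
  # Non-zero exit status if anything was missing.
  return "$missing"
}
```

    Run check_hub_prereqs with no arguments to test the default list; it prints one "missing:" line per command it cannot find. If any are reported missing, the PATH fix in step 9 is worth trying first.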


    9. Add /usr/local/bin to root's PATH so that the newly installed commands can be found:

    PATH=$PATH:/usr/local/bin
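    That assignment appends /usr/local/bin every time it runs. A slightly more careful sketch adds the directory only if it is not already on PATH; path_append is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: add a directory to PATH only if it is not already present,
# so re-running the step does not keep growing PATH.
path_append() {
  case ":$PATH:" in
    *":$1:"*) ;;                     # already on PATH: nothing to do
    *) PATH="${PATH:+$PATH:}$1" ;;   # append, avoiding a leading ':' if PATH is empty
  esac
  export PATH
}

path_append /usr/local/bin
```

    Either way, the change only lasts for the current shell; to keep it for future root shells you would add the line to a startup file such as root's ~/.bashrc.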

    10. Run JupyterHub, which listens on HTTP port 8000 by default:

    jupyterhub

    You should see something like this (press Ctrl+C to stop it):

    jupyterhub
    [I 2015-10-07 16:24:55.476 JupyterHub app:518] Loading cookie_secret from /root/jupyterhub_cookie_secret
    [W 2015-10-07 16:24:55.503 JupyterHub app:257] 
        Generating CONFIGPROXY_AUTH_TOKEN. Restarting the Hub will require restarting the proxy.
        Set CONFIGPROXY_AUTH_TOKEN env or JupyterHub.proxy_auth_token config to avoid this message.
        
    [I 2015-10-07 16:24:55.509 JupyterHub app:654] Not using whitelist. Any authenticated user will be allowed.
    [I 2015-10-07 16:24:55.518 JupyterHub app:1032] Hub API listening on http://localhost:8081/hub/
    [I 2015-10-07 16:24:55.522 JupyterHub app:789] Starting proxy @ http://*:8000/
    16:24:55.628 - info: [ConfigProxy] Proxying http://*:8000 to http://localhost:8081
    16:24:55.631 - info: [ConfigProxy] Proxy API at http://localhost:8001/api/routes
    [I 2015-10-07 16:24:55.726 JupyterHub app:1055] JupyterHub is now running at http://localhost:8000/
    ^C
    Interrupted
    [I 2015-10-07 16:25:40.453 JupyterHub app:924] Cleaning up single-user servers...
    [I 2015-10-07 16:25:40.455 JupyterHub app:935] Cleaning up proxy[2742]...
    [I 2015-10-07 16:25:40.455 JupyterHub app:961] ...done
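    The warning in the log suggests setting CONFIGPROXY_AUTH_TOKEN. A minimal sketch that generates a random token once, stores it in a file and exports it before the Hub starts, so the token stays the same between runs. The file location and the function name load_proxy_token are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: keep a persistent CONFIGPROXY_AUTH_TOKEN, as the log warning
# suggests. The token file path passed in is a hypothetical choice.
load_proxy_token() {
  local token_file="$1"
  if [ ! -s "$token_file" ]; then
    # 32 random bytes as 64 hex characters; umask 077 in a subshell keeps
    # the new file readable by its owner only.
    ( umask 077 && od -An -tx1 -N32 /dev/urandom | tr -d ' \n' > "$token_file" ) || return 1
  fi
  CONFIGPROXY_AUTH_TOKEN="$(cat "$token_file")"
  export CONFIGPROXY_AUTH_TOKEN
}

# Example usage (hypothetical path):
# load_proxy_token /root/configproxy_auth_token && jupyterhub
```

    Keeping the file readable only by root matters because the token is what authorises calls to the proxy's API (http://localhost:8001/api/routes in the log above).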

    11. In the AWS Console, open port 8000 by adding an inbound TCP rule to the instance's security group.
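    If you prefer the command line to the console, the same rule can be added with the AWS CLI's authorize-security-group-ingress command. A minimal sketch, assuming the AWS CLI is installed and configured with credentials allowed to modify the instance's security group; the function name, group ID and CIDR are placeholders:

```shell
#!/usr/bin/env bash
# Sketch: open TCP port 8000 on a security group via the AWS CLI.
# Assumes a configured AWS CLI; arguments are a group ID and a source CIDR.
open_hub_port() {
  local group_id="$1" cidr="$2"
  aws ec2 authorize-security-group-ingress \
    --group-id "$group_id" \
    --protocol tcp \
    --port 8000 \
    --cidr "$cidr"
}

# Example (placeholder values): allow only your own address
# open_hub_port sg-0123456789abcdef0 203.0.113.10/32
```

    Limiting the rule to your own address (a /32) rather than 0.0.0.0/0 is a sensible precaution while the Hub still runs over plain HTTP.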

    12. Start jupyterhub again (if you stopped it) and you should now be able to reach the login page at http://ec2-xx-xx-xx-x.eu-west-1.compute.amazonaws.com:8000/hub/ (using your instance's public DNS name).
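    If the page does not load, it helps to know whether the Hub itself is answering. A small sketch using curl to print the HTTP status code for a URL; hub_http_code is a hypothetical helper name, and it assumes curl is installed:

```shell
#!/usr/bin/env bash
# Sketch: print the HTTP status code returned by a URL.
# curl prints 000 when nothing answers; --max-time avoids hanging.
hub_http_code() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1"
}

# Example usage: on the instance itself, then from your own machine
# hub_http_code http://localhost:8000/hub/
# hub_http_code http://ec2-xx-xx-xx-x.eu-west-1.compute.amazonaws.com:8000/hub/
```

    Any 2xx or 3xx code means something answered (the Hub may redirect /hub/ to its login page); 000 means nothing did. If it works on the instance but not from outside, the security group rule from step 11 is the likely culprit.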

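    13. Now create a test system user. By default JupyterHub authenticates against the system's user accounts, so this is the account you will log in with: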

    useradd testuser
    passwd testuser
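    The log in step 10 also reported "Not using whitelist. Any authenticated user will be allowed.", so any system account with a password can log in. A minimal sketch for limiting the Hub to named users by appending a line to a JupyterHub config file. It assumes a JupyterHub release that still calls the option Authenticator.whitelist, as the 2015 log does (releases from 1.2 onwards call it Authenticator.allowed_users); the helper name allow_users is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: append an Authenticator.whitelist line for the given users to a
# JupyterHub config file. Assumes a pre-1.2 JupyterHub; later releases use
# c.Authenticator.allowed_users. User names are not escaped, so only pass
# plain system user names.
allow_users() {
  local config="$1" users="" u
  shift
  # Refuse an empty list: {} would be a Python dict, not a set.
  [ "$#" -gt 0 ] || return 1
  for u in "$@"; do
    users="${users}'${u}', "
  done
  # Strip the trailing ", " and write a Python set literal.
  printf 'c.Authenticator.whitelist = {%s}\n' "${users%, }" >> "$config"
}

# Example usage, from root's home directory:
# jupyterhub --generate-config          # writes jupyterhub_config.py
# allow_users jupyterhub_config.py testuser
# jupyterhub -f jupyterhub_config.py
```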


    14. Now log in with the testuser credentials.

    Note that there are a couple of security issues we still need to sort out:

    1. We want to use HTTPS instead of HTTP.

    2. We don't really want to run the web server as root.

    I'll try to cover these problems in a subsequent blog post.