
    Sweetness when developing, lesson number 427

    Ian Stuart / December 6, 2018
    The Scenario

    You are developing a plugin or extension for a product, and you need to test functionality as you write.

    The Practical Example

    Let us assume we are developing an external exchange service for the NBGrader Extension (https://github.com/jupyter/nbgrader/) to Jupyter Notebooks (https://github.com/jupyter/notebook), running in a JupyterHub environment (https://github.com/jupyterhub/jupyterhub).

    Let us further assume that our work will require changes to NBGrader as well as our own code, and that we are developing to a private Git repository until we are ready to release.

    What we know we will need

    • We know we will need a dummy Jupyterhub environment to be able to launch Notebooks with the NBGrader extension enabled
    • We will need a copy of NBGrader, because we know we need to make changes to enable our plugin
    • We will need code for our plugin.

    These are three separate, and independent, repositories.

    NBGrader & our NBExchange service are Python modules, installed using pip.

    The Final Service Solution

    I find it good to know where I want to end up... and this is generally a pip install from the source repo - so something like:

        FROM jupyterhub/jupyterhub:0.9.2
        ### stuff
        RUN pip install -e git+https://gitlab.example.com/mygroup/nbexchange#egg=nbexchange
        RUN pip install -e git+https://gitlab.example.com/mygroup/nbgrader#egg=nbgrader
        ### more stuff
        RUN jupyter nbextension enable --sys-prefix formgrader/main --section=tree
        RUN jupyter serverextension enable --sys-prefix nbgrader.server_extensions.formgrader
        ### even more stuff
    

    Build & deploy the Docker container.... and Robert is, as they say, your father's brother

    The Naive Solution

    The naive solution is to simply use this final solution, perhaps with Deploy Tokens & branch-specific pull-points, and work that way.
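
    As a rough sketch of how that could look (the Deploy Token username & value, and the my-feature branch name, are placeholders, not real values from the projects):

        FROM jupyterhub/jupyterhub:0.9.2
        ### stuff
        ### Deploy Token credentials baked into the clone URLs (placeholder values)
        RUN pip install -e git+https://gitlab+deploy-token-1:TOKEN_VALUE@gitlab.example.com/mygroup/nbexchange@my-feature#egg=nbexchange
        RUN pip install -e git+https://gitlab+deploy-token-1:TOKEN_VALUE@gitlab.example.com/mygroup/nbgrader@my-feature#egg=nbgrader
        ### more stuff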

    The downside is that you need to push all the code you want to test to Git, and that means a plethora of commits.... which is both time-consuming and leaves a horrible mess in the commit history (every typo, every experiment, every failure, ...)

    The Tree Solution

    What you really want to do is do pip installs from local directories:

        FROM jupyterhub/jupyterhub:0.9.2
        ### stuff
        COPY nbexchange /tmp/nbexchange
        RUN pip install -e /tmp/nbexchange
        COPY nbgrader /tmp/nbgrader
        RUN pip install -e /tmp/nbgrader
        ### more stuff
        RUN jupyter nbextension enable --sys-prefix formgrader/main --section=tree
        RUN jupyter serverextension enable --sys-prefix nbgrader.server_extensions.formgrader
        ### even more stuff
    

    This is probably the best way to actually develop: you can code & test & build & recode & rebuild until you are ready to commit a known-good piece of work [by whatever granularity you & your team choose to work in].
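
    The edit/build/test loop is then all local; something like this (assuming the JupyterHub default port of 8000):

        docker build -t dummy-jupyterhub . && docker run --rm -p 8000:8000 dummy-jupyterhub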

    The problem here is that nbexchange/* and nbgrader/* are subdirectories of the dummy Jupyterhub repo, which means you need to add them to .gitignore (which is itself in Git)... and you now have different project repositories spread over different levels of your filesystem, making them trickier to find.
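
    For reference, the .gitignore entries in the dummy Jupyterhub repo would be something along these lines:

        # local working copies of the other projects - not part of this repo
        nbexchange/
        nbgrader/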

    The Copy Solution

    The step up from The Tree Solution is to have the working code & the git repo parallel to the rest of the project repos, and copy the files across to the dummy Jupyterhub tree for actual test-builds & trials.
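
    So on disk, the layout ends up something like this (the names are illustrative):

        mygroup/
        ├── dummy-jupyterhub/    # the Dockerfile & build context
        │   ├── nbexchange/      # throw-away copy, in .gitignore
        │   └── nbgrader/        # throw-away copy, in .gitignore
        ├── nbexchange/          # the real working repo
        └── nbgrader/            # the real working repo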

    This gives you a build command something like:

        cp -r ../nbgrader/* nbgrader/ && cp -r ../nbexchange/* nbexchange/ && docker build -t dummy-jupyterhub .
    

    To me, this gives the best solution:

    • All of the project repos in the group are parallel, making the directory organisation sensible
    • The build & test framework happens without any need to mess up the commit-log
    • It is a clean & understandable solution

    The Rsync Refinement

    This refinement came from a colleague: the downside to a copy is that you're copying every file, and when projects get big, that's a lot of work. rsync only copies over changes:

        rsync -r ../nbgrader/* nbgrader/ && rsync -r ../nbexchange/* nbexchange/ && docker build -t dummy-jupyterhub .
    

    The other niggle is that the copy could well be taking across a bunch of virtual-environment files installed for testing the individual projects. rsync will allow you to exclude a bunch of stuff not needed in the Docker image:

        rsync -rq --exclude=venv --exclude=tests --exclude=__pycache__ ../nbgrader/* nbgrader/ && rsync -rq --exclude=venv --exclude=tests --exclude=__pycache__ ../nbexchange/* nbexchange/ && docker build -t dummy-jupyterhub .
    
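
    One further tweak, if you switch from the shell glob to rsync's trailing-slash form: adding --delete (a standard rsync option) also removes files from the staging copies when you delete them from the real repos, so the two trees never drift apart:

        rsync -rq --delete --exclude=venv --exclude=tests --exclude=__pycache__ ../nbgrader/ nbgrader/ && rsync -rq --delete --exclude=venv --exclude=tests --exclude=__pycache__ ../nbexchange/ nbexchange/ && docker build -t dummy-jupyterhub .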

    Footnote

    OK, so I just made the 427 number up - it just indicates that there are a lot of lessons we can learn from others.