Archive for the category R-Bloggers

R Plumber API in a Docker container? Of course, but security matters…

Created on: Tuesday, June 25, 2019 by Martin Hanewald

This guide is for you, if:

  • You are a data scientist and want to quickly publish a training or scoring function to your peers with a plumber API.
  • Your IT demands that connections be encrypted and password protected.
  • You are familiar with docker containers.
  • You are lazy and do not want to dive into the technical details of dealing with NGINX or another reverse proxy.

The open-source deployment framework AHUB has undergone a major rework, making it even easier to set up a secured R-based API deployment in seconds. The only prerequisite is that you have built a Docker container image with your plumber app listening on port 8000. If you need help doing so, please refer to this tutorial.

AHUB is a framework for deploying analytical applications inside docker containers using Docker swarm.

The framework aims at providing a unified approach to run scripts in any language (R, Python, etc.) while offering common services for most deployment scenarios:

  • a graphical user interface
  • access control (via Basic Authentication, Active Directory or Cloud Identity Providers)
  • process management and logging functionality
  • easy scalability

Docker swarm is able to run a collection of containers simultaneously such that they can communicate with each other over a shared virtual network. Docker swarm has a multitude of features which make it a powerful tool even in large-scale deployments. AHUB provides a pre-configured swarm setup to deploy containers (based on R or any other language) with minimal effort.

Get started…

Clone the AHUB repo (https://github.com/qunis/ahub) to a folder of your choice.

Generating user credentials

For the simple configuration with HTTP Basic Authentication, AHUB comes with a pre-generated password file (username ahub, password ilikebigwhales). But of course you will want to create your own. All you need to do is run the following command in your cloned folder (fill in your username and password; note that $pwd resolves to the current directory in PowerShell, while in Bash you should use $PWD). This will create a .htpasswd file containing the MD5-hashed credentials for your user.

docker run --mount type=bind,src=$pwd,dst=/var qunis/htpasswd <username> <password>

Configuring the stack

Docker swarm operates with a recipe telling it which containers to spin up, which ports to publish, which volumes to mount, et cetera. Everything you would normally configure in a single "docker run …" statement for a singular container instance, we instead write down in the so-called Compose file when working with Docker swarm. For a more detailed introduction see here.

Please inspect the demo compose file in the main folder:

version: '3.3'
services:

# -------------------------------------------
# NODE STACK (add analytical modules here)
# -------------------------------------------
# For compatibility with AHUB, container images
# need to comply with the following:
# - publish a REST API on port 8000
# - provide a swagger.json file in the "/" path (if you want to use the GUI)
# -------------------------------------------

  node1: 
    image: qunis/ahub_rnode:2.0  # this is a demo container showcasing the R package ahubr

# -------------------------------------------

  node2:
    image: qunis/plumberdemo  # this is a demo container showcasing R plumber

# -------------------------------------------

  node3:
    image: qunis/prophetdemo  # this is a demo container showcasing Facebook's prophet

# -------------------------------------------
# SERVICE STACK (DO NOT TOUCH)
# -------------------------------------------

  boss:
    image: qunis/ahub_boss:2.0
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock"
    configs:
      - source: main_config
        target: /app/config.yaml

# -------------------------------------------
# CONFIGS & SECRETS
# -------------------------------------------

configs:
  main_config:
    file: ./config.yaml

secrets:
  htpasswd:
    file: ./.htpasswd

The first block defines the node stack. Here you can add as many container images as you like or exchange the existing ones. For compatibility with AHUB it is only required that plumber (or any other API) publishes on port 8000 and provides the Swagger definition file (if you want to use the GUI functionality). The latter is achieved by running the plumber $run command with the parameter swagger=TRUE, as sketched below.
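
A minimal start script enabling this could look as follows (a sketch, assuming your endpoint definitions live in a file api_functions.R, as in the basic plumber tutorial of this series):

library(plumber)
r <- plumb("api_functions.R")
# swagger = TRUE serves the swagger.json definition on the root path,
# which the AHUB GUI uses to discover the endpoints
r$run(port = 8000, host = "0.0.0.0", swagger = TRUE)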

Important: The analytical nodes do not have to be R-based. A Python node running a combination of flask/flasgger would be compatible as well.

The second block is the companion container for running AHUB. This container takes care of ramping up all the sidecar containers of the service stack (nginx as a reverse proxy, certbot for SSL certificates, a ReactJS web GUI, etc.) and configures them accordingly.

The third block references the main configuration file for AHUB (see next section) and your previously generated .htpasswd file.

For now you can either leave the demo compose file as is or add/substitute your own container images in the node stack! Please note that your container image needs to be published in a container registry. If you are using a private registry instead of Docker Hub, you need to log in with your credentials first via "docker login …" before proceeding with the next steps.

Configuring AHUB

Please inspect the second YAML file in the main folder:

# --------------------------------------------------------------------
# --------------------------------------------------------------------
# AHUB CONFIG FILE
# --------------------------------------------------------------------
# --------------------------------------------------------------------
VERSION: 2.0

# --------------------------------------------------------------------
# TLS / SSL (Encryption)
# --------------------------------------------------------------------
# Currently the following encryption methods are supported
#     self-signed: A self-signed certificate will be created by ahub.
#                  This leads to a browser warning, which can be skipped.
#     letsencrypt: AHUB will automatically apply for a certificate from
#                  Let's Encrypt certificate authority. For this to work
#                  you need to deploy AHUB on a public machine and provide
#                  a fully qualified domain name (FQDN).
TLS_TYPE: self-signed

# -- optional -- (for TLS_TYPE: letsencrypt)
#TLS_HOST: myserver.cloud.com # the public domain name (FQDN)
#                               you want to apply the certificate for
#TLS_EMAIL: me@cloud.com # contact email-address

# --------------------------------------------------------------------
# Authentication
# --------------------------------------------------------------------
# Currently the following authentication methods are supported
#     none: Authentication disabled
#     basic: HTTP Basic Authentication with username and password
#     aad: Authentication via Azure Active Directory
# when choosing 'basic' you need to provide a file '.htpasswd' in the main folder.
AUTH_TYPE: basic

(...continues...)

This file contains the configuration for authentication and encryption options. For Basic Authentication with username and password (with your previously generated .htpasswd file) and a self-signed certificate you can leave this as is. Otherwise change the settings according to the description in the config file.

Launching the stack

Before we launch AHUB we need to prepare the docker daemon to run in swarm mode:

> docker swarm init
Swarm initialized: current node (XXX) is now a manager.

The whole stack can be launched by docker in swarm mode with the following command:

docker stack deploy -c ./ahub.yaml mystack

This command references the Compose file ahub.yaml to deploy a stack called mystack. Of course you can change the name of your stack to your liking. You should see the following output on the shell:

Creating network mystack_default
Creating secret mystack_htpasswd
Creating config mystack_main_config
Creating service mystack_node2
Creating service mystack_node3
Creating service mystack_boss
Creating service mystack_node1

>

AHUB comes with an instance of Portainer, a very powerful browser-based container management tool. We can check whether everything ramped up fine by navigating to http://localhost/portainer/ (the trailing slash is important!).

As you are starting this cluster for the first time, you need to set up an admin account and then choose the Local mode. After that you get to the Portainer main page, where you can click through the items Services, Containers and whatever else piques your interest. With Portainer you can do almost anything you can do from the Docker command line interface.

Under the Services tab you should see nine services if you stuck to the demo file, three of them being the node stack comprising node1, node2 and node3. Everything is fine when you see a 1/1 behind each service.

Checking the API endpoints

You can now navigate to your API endpoints via https://localhost/<nodename>/<endpoint>?<parameters>, for example https://localhost/node2/plot or https://localhost/node3/?n=24. You will be warned by your browser about the insecure certificate (since we self-signed it, you can skip this warning) and be asked for the user credentials you created earlier.
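
You can also test an endpoint from an R session; a minimal sketch with httr, using the pre-generated demo credentials (certificate verification is disabled because the certificate is self-signed):

library(httr)
res <- GET(
  "https://localhost/node2/plot",
  authenticate("ahub", "ilikebigwhales"),  # HTTP Basic Authentication credentials
  config(ssl_verifypeer = FALSE)           # accept the self-signed certificate
)
status_code(res)  # 200 means everything is wired up correctly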

There is also a rudimentary GUI at https://localhost/gui/ (still under development) showing you the various nodes and their endpoints so you can manually trigger a GET request for testing purposes.


Troubleshooting

If something does not work as expected, you can always check the logs of the boss service. You can list all services with the command:


docker service ls

Then copy the ID of the boss service and retrieve the logs with:


docker service logs <boss id>

I hope you enjoy deploying with AHUB. Please give me a shout on GitHub if you experience any problems or have any suggestions. I would love to hear about your scenarios.

So you want to deploy multiple containers running different R models?

Created on: Tuesday, April 2, 2019 by Martin Hanewald

This tutorial is the second part of a series on professional R deployment. Please find the previous part here (How to make a dockerized plumber API secure with SSL and Basic Authentication).

If you followed the first part in this tutorial series, you have achieved the following things:

  • running your R code with a plumber API inside a docker container
  • activating HTTP Basic authentication and SSL encryption via the NGINX reverse proxy

For a single container deployment the approach we have seen is absolutely feasible, but once you repeat this process you will notice the redundancy. Why should we install and configure NGINX for every single container instance we are running in our environment? The logical thing to do is to configure this common task of authentication and encryption only once for all deployed containers. This can be achieved by utilizing docker swarm (https://docs.docker.com/engine/swarm/).

Docker swarm is able to run a collection of containers simultaneously such that they can communicate with each other over a shared virtual network. Docker swarm has a multitude of features which makes it a powerful tool even in large scale deployment, but in this tutorial we will apply a minimal setup to provide secured access for two or more containers running plumber APIs. To achieve this we will make use of the AHUB deployment framework (AHUB on Github). AHUB provides a pre-configured swarm setup to deploy analytical containers (based on R or any other language) with minimal effort.

The setup we want to achieve is the following:

Multiple containers running plumber APIs will be accessed via a single NGINX instance. The principle of AHUB is to provide a service stack for common tasks like access control and encryption, plus a GUI for manual API execution. The node stack contains multiple containers, each dedicated to a different analytical task. In this example we will run three node stack containers:

  • qunis/ahub_rnode: A minimal R plumber configuration with the endpoints /thread and /batch showcasing the use of the ahubr package
  • qunis/plumberdemo: A minimal R plumber configuration with the endpoints /echo, /plot and /sum (see the basic example on https://www.rplumber.io/)
  • qunis/prophetdemo: A more elaborate container running a timeseries forecast with the fantastic prophet library and producing an interactive dyplot 

Let's start!

First you need to clone the content of the AHUB repository (AHUB on GitHub) to your local machine and open a Bash or PowerShell session in the cloned folder.

Generating certificates and user credentials

AHUB comes with a pre-generated certificate and password file. But of course you want to change these. This is quickly done with two little helper containers. All you need to do is navigate to the subfolder ./configs and run the following commands (fill in your username and password; note that $pwd resolves to the current directory in PowerShell, while in Bash you should use $PWD). This will create a new SSL certificate and key, along with a .htpasswd file containing the MD5-hashed credentials for your user, in the subfolder ./configs.

docker run --mount type=bind,src=$pwd,dst=/var qunis/openssl
docker run --mount type=bind,src=$pwd,dst=/var qunis/htpasswd <username> <password>

Configuring the stack

Docker swarm operates with a recipe telling it which containers to spin up, which ports to publish, which volumes to mount, et cetera. Everything you would normally configure in a single "docker run …" statement for a singular container instance, we instead write down in the so-called Compose file when working with Docker swarm. For a more detailed introduction see here.

Please inspect the demo file in the main folder:


version: '3.3'
services:

# -------------------------------------------
# NODE STACK (add analytical modules here)
# -------------------------------------------
# For compatibility with AHUB, container images
# need to comply with the following:
#   - publish a REST API on port 8000
#   - provide a swagger.json file in the "/" path (only for GUI)
# -------------------------------------------

  node1:
    image: qunis/ahub_rnode:1.0

# -------------------------------------------
  
  node2:
    image: qunis/plumberdemo

# -------------------------------------------
  
  node3:
    image: qunis/prophetdemo
    
# -------------------------------------------
# SERVICE STACK (DO NOT TOUCH)
# -------------------------------------------

  nginx:
    image: nginx
    ports:
      - "80:80"
      - "443:443"
    configs:
      - source: nginx_template.conf
        target: /etc/nginx/nginx.conf
    secrets:
      - source: server.crt
        target: server.crt
      - source: server.key
        target: server.key
      - source: htpasswd
        target: .htpasswd
    deploy:
      placement:
        constraints: [node.role == manager]

...(continues)...

The first block defines the node stack. Here you can add as many container images as you like. For compatibility with AHUB it is only required that plumber (or any other API) publishes on port 8000 and provides the Swagger definition file (if you want to use the GUI functionality). The latter is achieved by running the plumber $run command with parameter swagger=TRUE.

IMPORTANT: If you want to add your own images, you need to make sure that they are hosted in a container registry, otherwise Docker swarm will not be able to find them. Either use the public Docker Hub or set up a private registry with one of the cloud providers (Google, Azure or AWS).

The analytical nodes do not have to be R-based. A Python node running a combination of flask/flasgger would be compatible as well.

The second block constitutes the service stack and does not need to be changed if you stick to the basic scenario with self-signed certificates and basic authentication. Changes here are only needed if you want to use more elaborate functionality like auto-refreshing Let's Encrypt certificates or Active Directory authentication. These use cases will be covered in future tutorials.

For now you can either leave the demo file as is or add/substitute your own container images in the node stack! Note: There is no need to configure nginx when adding containers in the node stack. This is all taken care of by AHUB.

Ramping up the swarm

Before we launch AHUB we need to prepare the docker daemon to run in swarm mode:

> docker swarm init
Swarm initialized: current node (XXX) is now a manager.

Then the whole stack can be launched by docker in swarm mode with the following command:

docker stack deploy -c ./ahub.yaml mystack

This command references the Compose file ahub.yaml to deploy a stack called "mystack". Of course you can change the name of your stack to your liking.

You should see the following output on the shell:

> docker stack deploy -c ./ahub.yaml mystack

Creating network mystack_default
Creating secret mystack_server.key
Creating secret mystack_htpasswd
Creating secret mystack_server.crt
Creating config mystack_location_template.conf
Creating config mystack_nginx_template.conf
Creating config mystack_upstream_template.conf
Creating service mystack_portainer
Creating service mystack_node1
Creating service mystack_node2
Creating service mystack_node3
Creating service mystack_nginx
Creating service mystack_redis
Creating service mystack_boss
Creating service mystack_gui
Creating service mystack_updater
>

Here is a quick intro to the components of the service stack (the code and Dockerfiles for them are located in the subfolder ./modules):

  • boss: The central management service. Among other things, it handles the detection of the node stack and adapts the nginx configuration accordingly.
  • nginx: Our reverse proxy handling authentication, encryption and endpoint routing
  • redis: A NoSQL DB serving as a process repository. Of no real importance for this minimal setup; it comes into play when using the ahubr package and activating the advanced logging functionality of AHUB. I will dive into that in a subsequent tutorial.
  • gui: Provides a very basic GUI for interaction with the API endpoints. You can manually set parameters and view the output of an API call (only JSON output currently supported).
  • updater: Time-based triggering of certain activities, like node stack discovery or certificate renewal. This will be covered in future tutorials as well.
  • and finally Portainer

Portainer is a very powerful browser-based container management tool. We can check whether everything ramped up fine by navigating to http://localhost:9000 (Portainer listens on this port). As you are starting this cluster for the first time, you need to set up an admin account and then choose the Local mode. After that you get to the Portainer main page, where you can click through the items Services, Containers and whatever else piques your interest. With Portainer you can do almost anything you can do from the Docker command line interface.

Under the Services tab you should see nine services if you stuck to the demo file, three of them being the node stack comprising node1, node2 and node3. Everything is fine when you see a 1/1 behind each service.

Checking the API endpoints

You can now navigate to your endpoints via https://localhost/{nodename}/{endpoint}?{parameters}, for example https://localhost/node2/plot or https://localhost/node3/?n=24. You will be warned by your browser about the insecure certificate (since we self-signed it, you can skip this warning) and be asked for the user credentials.
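
The endpoints can of course also be called programmatically. Here is a sketch of a POST request against the /sum endpoint of node2 with httr (fill in the credentials you generated earlier; certificate verification is disabled because of the self-signed certificate):

library(httr)
res <- POST(
  "https://localhost/node2/sum",
  body = list(a = 1, b = 2),
  encode = "json",
  authenticate("<username>", "<password>"),  # your generated credentials
  config(ssl_verifypeer = FALSE)             # accept the self-signed certificate
)
content(res)  # should return 3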

There is also a rudimentary GUI at https://localhost (still under development) showing you the various nodes and their endpoints so you can manually trigger a GET request for testing purposes.

Finally, if you want to tear down the whole stack, you can do so with the following command:


docker stack rm mystack

Summary

We were able to ramp up a set of analytical containers providing RESTful APIs by just changing a few lines in a Docker Compose file. The AHUB deployment framework takes care of common tasks like encryption, authentication and a graphical user interface for the endpoints. In the next installment of this tutorial series, I will show how to run a stack on a cloud-based virtual machine with a public DNS address and retrieve a proper SSL certificate from Let's Encrypt.

I am still looking for contributors to the AHUB project, especially a frontend developer. So if you are keen on ReactJS, give me a shout.

Securing a dockerized plumber API with SSL and Basic Authentication

Created on: Friday, March 15, 2019 by Martin Hanewald

By now, the use of Docker containers is a well-established technique for making the deployment of R scripts to a stable environment incredibly easy and reliable. When you dockerize a Shiny app or provide a REST API with plumber, it is often mandatory in a corporate environment to restrict access to the resource. This tutorial shows you the simplest solution: running an NGINX reverse proxy alongside the R application. NGINX handles SSL encryption and simple username/password authentication (HTTP Basic Authentication). A follow-up tutorial will show a more elaborate approach that separates these tasks into multiple containers in a Docker swarm, utilizing the AHUB open-source framework (github.com/qunis/ahub).

The sample files for this tutorial can be found on github.com/MartinHanewald/rbloggerstutorial1.

This tutorial assumes that you are already familiar with the concept of Docker and have built an R-based container with a Dockerfile at least once. If not, please refer to the excellent posts by Colin Fay (www.r-bloggers.com/an-introduction-to-docker-for-r-users) and Oliver Guggenbühl (www.r-bloggers.com/running-your-r-script-in-docker).

The following diagram shows the setup we want to achieve:

The plumber API listens on port 8000, which we do not make available outside the container. Instead we install the very lean NGINX HTTP server listening on port 80 and route all traffic through it. Both encryption and password authentication can be enabled for NGINX with minimal configuration effort.

Basic setup

As a basis for a plumber application we use the simple example from the plumber main page (https://www.rplumber.io/).

start_api.R


library(plumber)
r <- plumb("api_functions.R")
r$run(port=8000, host='0.0.0.0')

api_functions.R

# plumber.R

#* Echo back the input
#* @param msg The message to echo
#* @get /echo
function(msg=""){
    list(msg = paste0("The message is: '", msg, "'"))
}

#* Plot a histogram
#* @png
#* @get /plot
function(){
    rand <- rnorm(100)
    hist(rand)
}

#* Return the sum of two numbers
#* @param a The first number to add
#* @param b The second number to add
#* @post /sum
function(a, b){
    as.numeric(a) + as.numeric(b)
}

When preparing an unsecured container we would use the following Dockerfile.

Dockerfile

FROM rocker/r-base

# install R packages
RUN install2.r \
plumber

EXPOSE 8000

ADD . /app
WORKDIR /app

CMD R -e "source('start_api.R')"

Build and run it with the following commands in Bash or PowerShell:

docker build -t plumber_insecure .
docker run --rm -p 8000:8000 plumber_insecure

Now we can access the resource by browsing to http://localhost:8000/plot.
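
Alternatively, the endpoints can be called from an R session, for example with httr (a minimal sketch):

library(httr)

# GET the /echo endpoint with a query parameter
content(GET("http://localhost:8000/echo?msg=hello"))

# POST two numbers to the /sum endpoint
content(POST("http://localhost:8000/sum", body = list(a = 1, b = 2), encode = "json"))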

Adding authentication

As a first step toward securing our API we want to install NGINX and route all traffic through it. For this we change our Dockerfile as follows.

Dockerfile


FROM rocker/r-base

# install R packages
RUN install2.r \
plumber

# setup nginx
RUN apt-get update && \
apt-get install -y nginx apache2-utils && \
htpasswd -bc /etc/nginx/.htpasswd test test

ADD ./nginx.conf /etc/nginx/nginx.conf

EXPOSE 80

ADD . /app
WORKDIR /app

CMD service nginx start && R -e "source('start_api.R')"

Note the following changes:

  • We added some apt-get commands for installing nginx and its dependencies
  • We created a password file with the username test and password test. Of course you should change these for your application. You can also add more user credentials by repeating the htpasswd call with different values (drop the -c flag after the first call, since -c creates the file anew and would overwrite existing entries).
  • We copy a configuration file nginx.conf to the appropriate location
  • We publish port 80 instead of 8000. Plumber is therefore no longer reachable directly from outside the container.
  • We changed the CMD to first start the nginx service before running the R script

Now we need to prepare the configuration file nginx.conf, telling the proxy to listen on port 80 and relay all requests to localhost:8000.

nginx.conf

user www-data;
worker_processes auto;

events {
  worker_connections 1024;
}

http {
  server {
    listen 80;
    auth_basic "Username and Password are required";
    auth_basic_user_file /etc/nginx/.htpasswd;
    
    location / {
      proxy_pass http://127.0.0.1:8000/;
    }
  } 
}

Save this file as nginx.conf in the main folder. Your directory should now contain the following files:


start_api.R

api_functions.R

Dockerfile

nginx.conf

Now we build the container again and run it while mapping port 80.

docker build -t plumber_auth .
docker run --rm -p 80:80 plumber_auth

The authentication popup should appear when navigating to http://localhost/plot (depending on your browser, this might look different).

After entering the credentials you are given access to the resource.
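
The same works from R by passing the credentials with httr's authenticate helper (a sketch, using the test/test credentials baked into the Dockerfile):

library(httr)
res <- GET("http://localhost/echo?msg=hello", authenticate("test", "test"))
content(res)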

Adding SSL encryption

Since sending your credentials over an unencrypted connection is not very secure, we proceed with the next step: activating SSL.

We change the Dockerfile as follows:

Dockerfile

FROM rocker/r-base

# install R packages
RUN install2.r \
   plumber

# setup nginx
RUN apt-get update && \
	apt-get install -y nginx apache2-utils && \
	htpasswd -bc /etc/nginx/.htpasswd test test 
    
RUN openssl req -batch -x509 -nodes -days 365 -newkey rsa:2048 \
		-keyout /etc/ssl/private/server.key \
		-out /etc/ssl/private/server.crt

ADD ./nginx.conf /etc/nginx/nginx.conf

EXPOSE 80 443

ADD . /app
WORKDIR /app

CMD service nginx start && R -e "source('start_api.R')"

Note the changes:

  • We added a command to create a self-signed certificate and key and store both files in the folder /etc/ssl/private
  • We additionally expose port 443, which is the default HTTPS port

Finally, an update to the nginx.conf:

nginx.conf

user www-data;
worker_processes auto;

events {
  worker_connections 1024;
}

http {
    server {
        listen 80;
        return 301 https://$host$request_uri;
    }

    server {
        listen 443 ssl;
        ssl_certificate     /etc/ssl/private/server.crt;
        ssl_certificate_key /etc/ssl/private/server.key;

        auth_basic "Username and Password are required";
        auth_basic_user_file /etc/nginx/.htpasswd;
    
        location / {
            proxy_pass http://127.0.0.1:8000/;
        }
  } 
}

Here we have created two servers:

  • The first listens on port 80 and redirects all traffic to https://… on port 443
  • The second listens on the SSL port 443 and points to the key and certificate files we have created in the Dockerfile.

Now we build and run the container again. This time we need to publish the SSL port as well:

docker build -t plumber_auth_ssl .
docker run --rm -p 80:80 -p 443:443 plumber_auth_ssl

When navigating to http://localhost/plot we get transferred directly to https://localhost/plot and a browser warning should appear. This is because we signed the certificate ourselves rather than obtaining an official certificate from a certificate authority. But the connection is encrypted either way, and we can skip this warning, since we can trust ourselves.
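
To call the encrypted endpoint from R, you additionally have to tell curl to accept the self-signed certificate, for example (a sketch that stores the histogram PNG to disk):

library(httr)
GET(
  "https://localhost/plot",
  authenticate("test", "test"),            # Basic Authentication credentials
  config(ssl_verifypeer = FALSE),          # accept the self-signed certificate
  write_disk("plot.png", overwrite = TRUE)
)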

Tadaa! Now we have secured our resource with encryption and password protection, all with minimal additional configuration.

In the next tutorial in this series, I will show how to enable security for multiple containers in a Docker swarm scenario.