Linear optimization with or-tools: containerizing a gunicorn web application

Previously, we left our app working with our local python+gunicorn+nginx installation. Getting there took quite a bit of configuration, and if we wanted to deploy the app to a server or send it to a friend, we would have to repeat a very error-prone process, subject to version changes and missing libraries. A potential nightmare if we contemplate switching from one operating system to another. Is there a way to combine our code and configuration into a single, easy-to-deploy, multi-platform package?

Get the code here

One solution is to create a single Docker container that, when run, builds the environment and deploys our code in a controlled, reproducible way.

On Docker Hub you will find thousands of preconfigured containers. The best way to start is to find the one closest to your needs and customize it. That way you avoid laying the groundwork and just focus on the specifics of your application.

I tend to trust the containers built by larger vendors, organizations or open-source projects, because I find that they usually keep their containers up to date and —most importantly— they are heavily battle-tested in dev and production.

In this case, I chose a gunicorn container created by the Texas Tribune. To start, you download and install Docker, and then download your chosen container to your machine.

The way to customize a Docker container is to edit the Dockerfile. There you specify commands to install, copy, or run files specific to your project. In our case, I added an installation of python-dev, falcon and Google's or-tools:

#install what's necessary for or-tools
RUN pip install --upgrade pip
RUN pip install --upgrade wheel setuptools virtualenv
RUN apt-get -y install flex bison
RUN apt-get -y --fix-missing install autoconf libtool zlib1g-dev texinfo help2man gawk g++ curl texlive cmake subversion

#install gunicorn and falcon for providing API endpoints
RUN pip install gunicorn==19.6
RUN pip install falcon

#install or-tools
#https://github.com/google/or-tools/releases/download/v5.0/or-tools_python_examples_v5.0.3919.tar.gz
ADD or-tools_python_examples_v5.0.3919.tar.gz /app
RUN mv /app/ortools* /app/ortools && cd /app/ortools/ && python setup.py install --user
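To confirm that the or-tools build inside the container actually works, a quick import test does the job. This is only a sketch (the file name sanity_check.py is hypothetical): run it with the container's python and expect it to fail loudly if the native libraries are missing.

# sanity_check.py -- run inside the container to verify the or-tools install
from ortools.linear_solver import pywraplp

# creating a GLOP solver fails immediately if the or-tools install is broken
solver = pywraplp.Solver('sanity_check', pywraplp.Solver.GLOP_LINEAR_PROGRAMMING)
print('or-tools is installed and working')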

 

Then I created separate configuration files for gunicorn and nginx, plus a couple of supervisor configurations. Supervisor will restart the services if one of them goes down, which might happen if I introduce an unrecoverable error in the python script:

#copy configuration files
ADD gunicorn_conf.py /app/
ADD gunicorn.supervisor.conf /etc/supervisor/conf.d/
ADD nginx.conf /app/
ADD nginx.supervisor.conf /etc/supervisor/conf.d/


After the initial configuration, we build using the docker build command:

docker build --no-cache -t danielpradilla/or-tools .

And then, we run the container as a daemon:

docker run -p 5000:80 -d --name or-tools-gunicorn danielpradilla/or-tools

The web server port is specified as a parameter: this maps port 5000 on localhost to port 80 inside the container.

Now, time to install our code. You can copy your code into the Docker container, but I prefer to keep my code in a local folder on my machine, outside of the Docker container. That way, I don't need to copy the code to the container every time I change it, and I keep a single, unmistakable copy of the code.

To do this, you mount the local folder as an extra folder inside the container. Change the Dockerfile and add:

VOLUME ["/app/logs","/app/www"]

And then, when you run the container, you specify the location of your local folder (shown as <path-to-your-code> below):

docker run -v <path-to-your-code>:/app/www -p 5000:80 -d --name or-tools-gunicorn danielpradilla/or-tools

This will allow you to experiment with multiple versions of the code (production and development) with a simple parameter change. You can run two docker containers pointing to different folders and opening different ports, and then compare the results side by side!

 

Get the code here

10 things I learned while setting up five masternodes

Photo by Denys Nevozhai on Unsplash

Over the past few weeks, I’ve been experimenting with masternodes as alternatives/replacements to traditional crypto mining rigs. Like with many other crypto-related things, I was surprised to find such a huge community and wealth of options. It’s akin to opening a window into another world.

What interests me the most is to learn to what extent Proof of Stake has the potential to replace Proof of Work, and the best way to learn —apart from formal reading— is to set up your own.

Masternodes deliver on the promise of making you an enabler of a decentralized network of value exchange: you lock, or “stake,” a fixed amount of coins in exchange for the privilege of transmitting or verifying transactions. Basically, you buy a fixed amount of coin, say 1000, and lock it in a masternode.

I picked 5 projects at different price points: ALQO (XLQ), Ellerium Project (ELP), Rampant (RCO), High Temperature Coin (HTRC), and Madcoin (MDC). Just by the names, it sounded like a bad idea, but I cannot afford, and will never afford, dumping $400K into a Dash masternode. Also, these… um, “coins” offered the promise of a high risk/reward investment and the always underestimated chance of learning something by making a fool out of myself.

I had low expectations: I wanted some education, plus the possibility for the experiment to pay for itself with the rewards from HODLing the coins.

So, what did I learn?

1 You can set up a masternode anywhere, but it’s best if you get a VPS

The masternode can be any machine connected to the internet, but you need a fixed IP address. Exposing your home network to attackers is a bad idea, so the standard procedure is to get a VPS from a hosting provider and set up the masternode there.
I got a VPS from OVH, just because they had an offer for a year-long 2GB/10GB plan at €2.5/month.

 

2 It’s a scammers free-for-all

In an industry already filled with pyramid schemes, masternodes offer scammers an almost-frictionless way of stealing our money. See this article for a lengthy description of the different scamming methods.

 

3 It’s all —almost— the same code base

This one was quite surprising. All the clients I tested come from the same origin; I believe it's either the Bitcoin or the Dash client (I haven't checked). They all have the same names for their command-line tools and the same options.

However, I noticed some code smells: the clients for High Temperature Coin and Madcoin consist of a single application to both run the daemon and query the status of the masternode, whereas ALQO uses the more sensible split of alqod as a daemon and alqo-cli for client-related queries.

I guess this makes it even easier to swindle a couple of hundred people.

 

4 Cheaper coins are harder to set up

You want the easiest setup procedure? There's a markup for that. The best developers/marketers flock to the most popular projects. They are better debuggers, keen on following up on errors and on writing good documentation.

ALQO, the “premium” coin in this case, has a flawless setup procedure. They also offer a monitored VPS themselves for $9.99 with minimal setup effort, a clever move by the team, given the hefty markup they charge. But it's also worth it if you don't want to invest a few hours tinkering with settings and another chunk of your time monitoring whether the masternode is still up.

My lifesaver was Nodemaster, an excellent tool that lets you install around 60 different masternodes by just running a script.

 

5 You can set up more than one masternode per server

It kinda defeats the purpose of a supposedly decentralized network, but some masternode coins allow you to start more than one daemon per machine, provided you configure the ports correctly and have extra IP addresses. As long as you don't reuse the same IP and ports, you can start as many daemons as your memory allows. Each daemon consumes around 250 to 400MB.

This is a cheap way to hedge your bets: get onto several cheap-ish coins, find a high-memory VPS, and load it up as much as you can. A quick back-of-the-envelope calculation like the one below tells you how far you can go.
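A rough sketch of that estimate. The per-daemon figure comes from the 250-400MB range mentioned above; the VPS size and OS headroom are made-up numbers.

# rough capacity estimate for masternode daemons on a single VPS
vps_memory_mb = 8 * 1024       # a hypothetical 8GB VPS
daemon_memory_mb = 400         # worst-case memory per daemon (see above)
os_headroom_mb = 512           # leave some room for the OS itself

max_daemons = (vps_memory_mb - os_headroom_mb) // daemon_memory_mb
print('This box fits up to %d daemons' % max_daemons)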

 

6 I found a use case for Discord

No amount of customization will make me choose Slack over the traceability of a 15-year-old email inbox. But I found that almost all of these coins use Discord for their community engagement and support and, turns out, it works extremely well. I was able to get responses to my queries within minutes without the noise that Twitter brings. It works just like IRC did 20 years ago 😉

 

7 Decentralized exchanges offer the future now

Most of these coins need to be bought at decentralized exchanges. Learning how these exchanges work was worth all the trouble. They are one of the best representations of how we can become fully independent of banks and clearinghouses… or maybe we’ll never get there, but decentralized exchanges sure are extremely efficient and automated intermediaries.

 

8 It works!

I was amazed when I received my first reward. Mere cents, but satisfying nevertheless, because it is, essentially, free money (after costs).

 

If you want to get into this, I have two recommendations… and this is NOT investment advice:

9 Check your expectations about how long-term this can be

If you are doing it for learning purposes, don't overthink it. But if you're planning medium term (months) or more, you need an exit strategy. Setting up a masternode might take you anywhere from 30 minutes to 8 hours, depending on the transaction time, network speed, who your VPS provider is, and how good the coin developers are. Make it worth your while. The majority of the masternodes I've seen are short-term scams looking to make a million or two. You have to ask yourself when and how you are going to shut down the masternode, and stick to that plan. Don't be the last dummy holding; keep checking the volume on the exchanges where the coin is available.

 

10 How to pick the right coin

Check the coin’s Discord or Telegram channel. Look for signs of trouble in the support area and look at how lively the community is. Do the team members write in a language they understand? Do they write at all? Spam and shitposting are signs of a badly-maintained community. The developers might be in the Bahamas by now.

Check other social proof: how many followers do they have on Twitter, and are they real or purchased? Do they seem to know what they are talking about? How many committers does the project have on GitHub? A not-so-surprising majority of these projects have only one committer on GitHub. Either he has an earth-shattering idea or he's in for a quick win.
Also important, but weirdly enough not so much: does this coin have a purpose? Is it filling a real-world need?

On masternodes.online you will find a listing of many, if not all, of these coins. One of the elements of this list is the ROI (Return on Investment). Don't fall for it. ROI can be made into whatever the team wants with the right monetary policy and price manipulation, especially in “young” coins with few masternodes. Check the daily volume (the total value of daily trades) and the total market capitalization, take the two numbers and divide them, take the masternode worth (the amount of US$ in coin you need to stake), and find a combination of the three numbers you like. Open Excel and do your own research, or use a few lines of Python like the sketch below.
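A minimal sketch of that back-of-the-envelope check. The numbers are made up; plug in the figures you read off masternodes.online.

# the three numbers suggested above, with hypothetical values
daily_volume_usd = 15000.0        # total value of daily trades
market_cap_usd = 2000000.0        # total market capitalization
masternode_worth_usd = 5000.0     # US$ in coin you need to stake

# daily volume as a fraction of market cap: a crude liquidity measure
liquidity_ratio = daily_volume_usd / market_cap_usd
# how many days of the coin's entire volume your stake represents
days_to_liquidate = masternode_worth_usd / daily_volume_usd

print('Liquidity ratio: %.4f' % liquidity_ratio)
print('Days of volume to liquidate one masternode: %.1f' % days_to_liquidate)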

The volume tells you the most brutal of truths: the price might be attractive, but if you cannot get your money out, it’s worthless.

Again, this is NOT investment advice. In fact, I guarantee that you will gain experience and lose whatever you invest! That should be your default expectation.

 

Pentaho is slow for servers with too many home directories

Over the course of two years, browsing solutions on our Pentaho 5.4 server became progressively slower. It got to the point where you had to wait 2-3 minutes to see the list of solutions in the Pentaho User Console.

The catalina log didn't say much, and we didn't have too many solutions (around 200), so I thought it was perhaps a database bottleneck. It all came to a screeching halt on a Friday afternoon (as usual) when, after a restart, the Pentaho User Console simply stopped responding.

I turned all logging on and found that Pentaho was complaining about a lot of invalid users. Googling around, I found that 5.4 performs user permission checks on first login, calling UserDetailService for each home directory owner in the Home directory. Examining the logs, we had over 4,000 folders in there, accumulated over two authentication scheme changes. I could not even open the Home folder in the user console.

Pentaho versions 6.1 and over have a config flag to skip this user verification. It's called skipUserVerificationOnPrincipalCreation and lives in pentaho-solutions/system/jackrabbit/security.properties.

More info at Jackrabbit Repository Performance Tuning

 

All fine and dandy, but what to do with a Pentaho 5.4 server? Or, better yet, how do you fix this after your PUC becomes unresponsive?

I thought that the Pentaho REST API might help and, sure enough, we can delete folders with it. In our case, our users don’t save anything in their home folders, so all we needed to do was to delete these 4000+ folders.

This is a nuclear option, so don't run it unless you know what you are doing. If your users have solutions saved in their home folders, you need to extend the following script to check for that and back up the solutions and/or skip the deletion.

Open up any browser javascript console, replace <server_url> with your Pentaho server URL, and run:

$.getJSON("http://<server_url>:8080/pentaho/api/repo/files/:home/children", function(data){
    //walk every node list returned for the home folder
    $.each(data, function(i, nodes){
        nodes.forEach(function(node){
            console.log(node.path, node.id);
            //permanently delete the folder by its repository id
            jQuery.ajax({
                async: false,
                type: "PUT",
                url: "http://<server_url>:8080/pentaho/api/repo/files/deletepermanent",
                data: node.id
            });
        })
    })
});

You can fork the gist for this at https://gist.github.com/danielpradilla/72a603a5d0de71771e0b5836bde05479
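If you'd rather run the cleanup outside a browser, here is a rough Python equivalent. It is a sketch only: it assumes Python 2 (urllib2), basic-auth credentials (admin:password is a placeholder), and the same JSON responses the browser script relies on. The same nuclear-option caveats apply.

# WARNING: permanently deletes every folder under /home in the repository.
# Assumptions: Python 2 (urllib2), basic-auth credentials, JSON responses.
import base64
import json
import urllib2

BASE = 'http://<server_url>:8080/pentaho/api/repo/files'
AUTH = 'Basic ' + base64.b64encode('admin:password')  # replace with real credentials

def call(url, method='GET', body=None):
    req = urllib2.Request(url, data=body)
    req.add_header('Authorization', AUTH)
    req.add_header('Accept', 'application/json')
    req.get_method = lambda: method
    return urllib2.urlopen(req)

# fetch the children of the home folder, then delete each one permanently
children = json.load(call(BASE + '/:home/children'))
for nodes in children.values():
    for node in nodes:
        print('%s %s' % (node['path'], node['id']))
        call(BASE + '/deletepermanent', method='PUT', body=node['id'])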

 

 

 

Linear Optimization with or-tools — building a web front-end with falcon and gunicorn

In a previous post, I put together a script for solving a linear optimisation problem using Google’s OR-tools. This python script is callable from the command line and you kinda need to know what you are doing and how to organize the parameters.

In order to address this difficulty, I wanted to build a web front-end for the script, so that regular humans can interact with it.

We can generalize this problem as how to build a web interface for a python script.

First of all, we can split the concerns. What we need is a web page that serves as a user interface, and a web API connecting the webpage to the python backend.

Ideally, this API should be as light as possible. After all, the heavy lifting is going to be performed by the backend. Using Django would be easy but also overkill. I was looking for one of those microframeworks, you know, like Flask. And that's how I got to Falcon.

Falcon is something like 10 times faster than Flask, and it is as bare-bones as you can get, inasmuch as you need to bring your own WSGI server, like gunicorn (you can also use Waitress on Windows, or uWSGI).

 

TL;DR

Get the code here

 

1 Installing the dependencies

pip install falcon cython gunicorn

2 Creating a JSON output for the script

My plan was to use JSON to exchange data between the API and the webpage. So I needed a JSON response builder. I could add this functionality to the previously created python script, but I prefer to have it in a separate file which basically calls the main method of the solver script and returns a JSON payload.

Source code

import json
import interview_grocery_startup as igs


def get_json_response(cfg, what):
    results = igs.main(cfg, what)

    solver = results['solver']


    if results['result_status'] == solver.OPTIMAL:
        response = {'result_status': 'optimal answer'}

        variable_list = results['variable_list']

        print(solver.wall_time())
        print(solver.Objective().Value())

        response['wall_time'] = solver.wall_time()
        response['objective_value'] = solver.Objective().Value()

        response['variables'] = dict()

        response['variables_sum'] = 0
        for variable in variable_list:
            response['variables'][variable.name()] = variable.solution_value()
            response['variables_sum'] += variable.solution_value()

    elif results['result_status'] == solver.INFEASIBLE:
        response = {'result_status': 'infeasible'}
    elif results['result_status'] == solver.POSSIBLE_OVERFLOW:
        response = {'result_status': 'overflow'}
    else:
        #fallback so that response is always defined
        response = {'result_status': 'not solved'}

    json_response = json.dumps(response, sort_keys=True)

    return json_response

def main(cfg, what):
    json_response = get_json_response(cfg, what)
    print(json_response)
    return json_response
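A quick smoke test for the builder: call it directly with a config dict. This assumes the file is saved as interview_grocery_startup_json.py, which is how the API section imports it; the cfg below is the same sample payload used there.

import interview_grocery_startup_json as igsj

#the sample cfg from the API section, as a plain dict
cfg = {'what': 'cost', 'maxWeight': 10, 'maxCost': 100,
       'minCals': 14000, 'minShop': 0.25, 'total_cost': 0,
       'food': [['ham', 650, 4], ['lettuce', 70, 1.5], ['cheese', 1670, 5],
                ['tuna', 830, 20], ['bread', 1300, 1.20]]}

print(igsj.main(cfg, cfg['what']))  #prints and returns the JSON payload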

 

3 Coding the API

The principle for creating an API in Falcon is very easy to understand: you define a class and then instantiate it and link it to a route. This ends up being very convenient and clear. You can define the routes and the methods as you please.
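As a minimal illustration of that pattern, here is a toy resource (not part of this project):

import falcon

class PingResource(object):
    def on_get(self, req, resp):
        #respond to GET /ping with a plain-text body
        resp.body = 'pong'

app = falcon.API()                       #a callable WSGI application
app.add_route('/ping', PingResource())   #link the route to an instance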

Source code

import falcon
import json
import interview_grocery_startup_json as igsj 

#default config for GET requests (assumption: the original script defined
#something similar; this mirrors the sample "cfg" payload shown below)
DEFAULT_CFG = {'what': 'cost', 'maxWeight': 10, 'maxCost': 100,
               'minCals': 14000, 'minShop': 0.25, 'total_cost': 0,
               'food': [['ham', 650, 4], ['lettuce', 70, 1.5], ['cheese', 1670, 5],
                        ['tuna', 830, 20], ['bread', 1300, 1.20]]}

class InterviewGroceryStartupResource(object):
    def on_get(self, req, resp):
        #Handles GET requests by solving the default problem

        resp.status = falcon.HTTP_200  # This is the default status

        resp.body = igsj.main(DEFAULT_CFG, DEFAULT_CFG['what'])

    def on_post(self, req, resp):
        try:
            body = req.stream.read()
            body_json = json.loads(body.decode('utf-8'))
            cfg = body_json["cfg"]
        except (ValueError, KeyError):
            raise falcon.HTTPBadRequest(
                'Missing Config',
                'A valid config (cfg) must be submitted in the request body.')

        resp.status = falcon.HTTP_200
        resp.body = igsj.main(cfg,cfg['what'])

# falcon.API instances are callable WSGI apps
app = application = falcon.API()

# Resources are represented by long-lived class instances
igsapi = InterviewGroceryStartupResource()

# igsapi will handle all requests to the '/igsapi' URL path
app.add_route('/igsapi', igsapi)

As you may see in the class, I added two methods. on_get is not doing much; the interesting one is on_post. On each POST to the specified route, the script decodes the body, extracts a JSON object, looks for the property cfg (config), and sends that to the JSON response builder.

Yes, this means that whenever you POST, you need to send a JSON object in the body with a “cfg” attribute that looks more or less like this:

{
    "cfg": {
        "what": "cost",
        "maxWeight": 10,
        "maxCost": 100,
        "minCals": 14000,
        "minShop": 0.25,
        "total_cost": 0,
        "food": [["ham", 650, 4],
                 ["lettuce", 70, 1.5],
                 ["cheese", 1670, 5],
                 ["tuna", 830, 20],
                 ["bread", 1300, 1.20]]
    }
}

If you are running the API on a different machine than the one serving the webpage, you may run into trouble with the same-origin policy. To address this, you can enable cross-origin resource sharing (CORS).

Add the following class to the code above

ALLOWED_ORIGINS = ['http://localhost']

class CorsMiddleware(object):
    def process_request(self, request, response):
        origin = request.get_header('Origin')
        #echo the origin back only if it is in the whitelist
        if origin is not None and origin in ALLOWED_ORIGINS:
            response.set_header('Access-Control-Allow-Origin', origin)

And plug an instance of it in when instantiating the API:

app = application = falcon.API(middleware=[CorsMiddleware()])

Pluggable magic!

You may run this API on port 18000 (or whichever port you please) by calling gunicorn:

gunicorn interview_grocery_startup_api -b :18000 --reload

Check the gunicorn docs for more options

The Falcon documentation is here.

 

4 Testing the API

I use the wonderful Postman for testing all the APIs that I make. If you prefer the command line, the sketch below does the same job.
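This is a sketch only: it assumes Python 2 (urllib2) and the API running locally on port 18000, as configured above.

import json
import urllib2

#POST the sample cfg to the endpoint and print the solver's answer
payload = {'cfg': {'what': 'cost', 'maxWeight': 10, 'maxCost': 100,
                   'minCals': 14000, 'minShop': 0.25, 'total_cost': 0,
                   'food': [['ham', 650, 4], ['lettuce', 70, 1.5], ['cheese', 1670, 5],
                            ['tuna', 830, 20], ['bread', 1300, 1.20]]}}

req = urllib2.Request('http://localhost:18000/igsapi',
                      data=json.dumps(payload),
                      headers={'Content-Type': 'application/json'})
print(urllib2.urlopen(req).read())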

 

5 Coding the interface

The web interface can easily be an HTML+JS+CSS thingie. I tried to keep it simple, creating a single page with 3 parts: a formulation, a data table, and a fat green button to get the results.

From the functional perspective, the only thing you need to do is to perform POSTs to the /gunicorn/igsapi endpoint defined in the API script, and then process the response.

Here you can see the javascript file that does everything.

I keep a variable (igs.payload) constantly updated with the JSON payload I’m going to send. And then I just POST whenever I please:

    jQuery.ajax({
      type: "POST",
      url: "/gunicorn/igsapi",
      dataType: "json",
      data: JSON.stringify(igs.payload),

      error:function (xhr, ajaxOptions, thrownError){
          console.log(thrownError);
      },

      success:function(data, textStatus, jqXHR){
        console.log(data);
        igs.fillAnswerTable(data);
      }
    })

The result is sent to a very dirty fillAnswerTable function which builds and re-builds the HTML code for the solution table.

For the UI look and feel I used Semantic UI, my current favorite Bootstrap alternative.

I’m also using Bret Victor’s Tangle library to provide direct manipulation of the variables. Each manipulation fires an event that updates the igs.payload variable.

 

Next steps

We got our little web app, but it has so many moving pieces. Wouldn’t it be nice to package it in a way that it always works? This will be the subject of a future post.

Containerizing the solution with docker

 

Linear Optimization with or-tools

 

Getting started

Over the last couple of months I've been getting my feet wet with linear programming and mathematical optimisation. I got a sense of how it all worked from this Discrete Optimisation course on Coursera and, googling around, I discovered that there are a ton of tools out there to help you solve optimisation problems. Makes sense: why would you want to implement a solving algorithm from scratch when some of the best minds in the history of mankind have already given it a shot?

Solvers, as these tools are often called, can cost hundreds of thousands of dollars… and they are worth every penny! But since I wanted to play with one without forking over the dough, I narrowed my search down to open-source options.

Hans Mittelmann from Arizona State University runs regular automated benchmarks of different mathematical optimisation tools and publishes the results at http://plato.asu.edu/bench.html. Based on what I read there, I picked Google Optimization Tools (OR-Tools) because it performed fairly well and… well, because if you're looking for a sinister tool to model and solve hard problems in the human world, you can rarely go wrong with Google.

GLOP, the linear solver that ships with OR-Tools, has C++ and Python APIs. I'm better at Python, and I expected to quickly put together a web front-end for this, so I picked the latter.

TL;DR. Just gimme the code

You can get the full code here.

 

The Problem

I picked a variation of the knapsack problem: a grocery-shopping example in which you try to maximise the number of calories you can buy with a limited budget. I got the problem from this blog post:

http://www.jasq.org/just-another-scala-quant/new-agey-interviews-at-the-grocery-startup

Basically: You walk into a grocery store with a grocery bag and some cash, to buy groceries for a week. You need to follow these rules:

1. Your bag can hold ten pounds.
2. You have $100.
3. You need about 2,000 calories a day, so a weekly shopping trip is about 14,000 calories.
4. You must purchase at least 4 ounces of each grocery item.

These are the groceries you can buy and their price per pound:

Ham:     650 cals,  $4
Lettuce:  70 cals,  $1.5
Cheese: 1670 cals,  $5
Tuna:    830 cals, $20
Bread:  1300 cals,  $1.20
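In linear-programming terms, with x_i standing for the pounds of item i, the whole thing reads roughly like this (a sketch of the formulation we will build below; 4 ounces is 0.25 pounds):

\begin{aligned}
\text{maximize}\quad & \sum_i \text{cal}_i\, x_i \\
\text{subject to}\quad & \sum_i x_i \le 10 \\
& \sum_i \text{price}_i\, x_i \le 100 \\
& \sum_i \text{cal}_i\, x_i \ge 14000 \\
& x_i \ge 0.25 \quad \forall i
\end{aligned}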

 

Installing OR-Tools

Follow the instructions at https://developers.google.com/optimization/introduction/installing. You will need Python and Python setuptools installed on your machine.

 

Coding the problem

We can split the coding of the problem into 6 elements:

1. Identify the problem

We tell or-tools that we are attempting to solve a linear programming problem. We create a solver variable that is going to contain all the necessary items to solve the problem.

from ortools.linear_solver import pywraplp
solver = pywraplp.Solver('SolveSimpleSystem',pywraplp.Solver.GLOP_LINEAR_PROGRAMMING)

 

2. Ingest the input

We are going to send our table of possible groceries, calories and prices as a nested list:

food = [['ham',650, 4],
 ['lettuce',70,1.5],
 ['cheese',1670,5],
 ['tuna',830,20],
 ['bread',1300,1.20]]

 

3. Configure the decision variables

We need 5 decision variables, each holding how many pounds of one product you are going to buy. Instead of creating 5 separate variables, we can create a list of size 5 (the size of the food list). Each item in the list will contain a decision variable.

Each decision variable will be created with a call to the NumVar method of the solver variable, passing the minimum amount of groceries we can buy, the maximum (infinity), and a unique name for the variable (contained in the previously-defined food list).

    #food is a list of groceries, calories and prices
    variable_list = [None] * len(food)
    for i in range(0, len(food)):
        #you must buy at least minShop of each
        variable_list[i] = solver.NumVar(minShop, solver.infinity(), str(food[i][0]))

The pythonic way of writing that loop is a list comprehension. However, I'm keeping the loop for readability.

 
    #same thing but with comprehension
    variable_list=[solver.NumVar(minShop, solver.infinity(), str(food[i][0])) for i in range(0, len(food))]

 

4. Configure the constraints

This is where most of the magic happens. We will create one constraint per “rule” specified in the problem description.

In linear programming, each constraint is specified as a bounded weighted sum of the decision variables:

lower_bound <= c1*var1 + c2*var2 + c3*var3 + … <= upper_bound

In the knapsack problem, the conversion is pretty straightforward. In some other cases, you have to re-think and re-model your problem in these terms.

We will create a list of 3 constraints, calling Constraint(lower bound, upper bound) for each one, and then walk the variable list and call SetCoefficient for each of the variables.

 
    #Define the constraints    
    constraint_list=[]
    #Constraint 1: totalWeight<maxWeight
    #ham + lettuce + cheese + tuna + bread <= maxWeight
    constraint_list.append(solver.Constraint(0, maxWeight))
    for i in range(0, len(food)):
        constraint_list[0].SetCoefficient(variable_list[i],1)

    #Constraint 2: totalPrice<=maxCost 
    constraint_list.append(solver.Constraint(0, maxCost)) 
    for i in range(0, len(food)): 
        constraint_list[1].SetCoefficient(variable_list[i],food[i][2]) 

    #Constraint 3: totalCalories>=minCals
    constraint_list.append(solver.Constraint(minCals, minCals + 100))
    for i in range(0, len(food)):
        constraint_list[2].SetCoefficient(variable_list[i],food[i][1])

Note that the 4th rule of the problem, “You must purchase at least 4 ounces of each grocery item,” is already coded in the variables definition.

 

5. Configure the objective function

Similar to the constraint definition, the objective function is specified as a weighted sum of the decision variables:

goal = Maximize/Minimize (c1*var1 + c2*var2 + c3*var3 + …)

If we wish to minimize cost, we walk our variable list, get the price from the food list, and set the objective.

 
        objective = solver.Objective()  #grab the solver's objective object
        for i in range(0, len(variable_list)):
            objective.SetCoefficient(variable_list[i], food[i][2])
        objective.SetMinimization()

Say we wanted to maximize calorie intake. We would do the same, but taking the calories value from the food list and setting a maximization goal:

# Define our objective: maximizing calories
objective = solver.Objective()
for i in range(0, len(food)):
    objective.SetCoefficient(variable_list[i], food[i][1])
objective.SetMaximization()

 

6. Solve!

After all these configuration steps, we just call the Solve method on the solver and print out the solution if we find one.

 
    result_status = solver.Solve()

    if result_status == solver.OPTIMAL:
        print('Successful solve.')
        # The problem has an optimal solution.
        print('Problem solved in %f milliseconds' % solver.wall_time())
        # The objective value of the solution.
        print('Optimal objective value = %f' % solver.Objective().Value())
        # The value of each variable in the solution.
        var_sum = 0
        for variable in variable_list:
            print('%s = %f' % (variable.name(), variable.solution_value()))
            var_sum += variable.solution_value()
        print('Variable sum = %f' % var_sum)

        print('Advanced usage:')
        print('Problem solved in %d iterations' % solver.iterations())

        for variable in variable_list:
            print('%s: reduced cost = %f' % (variable.name(), variable.reduced_cost()))

        activities = solver.ComputeConstraintActivities()
        for i, constraint in enumerate(constraint_list):
            print('constraint %d: dual value = %f\n'
                  '               activity = %f' %
                  (i, constraint.dual_value(), activities[constraint.index()]))

    elif result_status == solver.INFEASIBLE:
        print('No solution found.')
    elif result_status == solver.POSSIBLE_OVERFLOW:
        print('Some inputs are too large and may cause an integer overflow.')

 

You can get the full code here.

 

Next steps

We got a solution, but you need to know a little bit of python to run this program, change its inputs or read the solution. Wouldn’t it be nice to have some sort of user-friendly UI? This will be the subject of future posts.

Building a web front-end with falcon and gunicorn

Containerizing the solution with docker