Pentaho is slow for servers with too many home directories

Over the course of two years, browsing solutions on our Pentaho 5.4 server became progressively slower. It got to the point where you had to wait 2-3 minutes to see the list of solutions in the Pentaho User Console.

The catalina log didn’t say much and we didn’t have too many solutions (around 200), so I thought it was perhaps a database bottleneck. It all came to a screeching halt on a Friday afternoon (as usual) when, after a restart, the Pentaho Console simply stopped responding.

I turned all logging on and found that Pentaho was complaining about a lot of invalid users. Googling around I found that 5.4 performs user permissions tests on first login, calling UserDetailService for each home directory owner in the Home directory. Examining logs, we had over 4000 folders in there, accumulated from two authentication scheme changes. I could not even open the Home folder in the user console.

Pentaho versions 6.1 and later have a config flag to skip this user verification. It’s called skipUserVerificationOnPrincipalCreation, inside pentaho-solutions/system/jackrabbit/

More info at Jackrabbit Repository Performance Tuning


All fine and dandy, but what do you do with a Pentaho 5.4 server? Or, for that matter, how do you fix this after your PUC becomes unresponsive?

I thought that the Pentaho REST API might help and, sure enough, we can delete folders with it. In our case, our users don’t save anything in their home folders, so all we needed to do was to delete these 4000+ folders.

This is a nuclear option, so don’t run this unless you know what you are doing. If your users have solutions saved in their home folders, you need to extend the following script to check for that and back up the solution and/or avoid deletion.
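As a rough sketch of that check, the decision of which home folders are safe to delete can be kept separate from the deletion itself. The Python below is not part of the original script: the 'id' and 'path' field names and the list_children callback are assumptions standing in for whatever your repository listing actually returns.

```python
def select_empty_home_folders(home_folders, list_children):
    """Return the ids of home folders that have no children.

    home_folders: list of dicts with hypothetical 'id' and 'path' keys.
    list_children: callable taking a folder path and returning its children
                   (hypothetical; in practice it would query the repository).
    """
    empty_ids = []
    for folder in home_folders:
        if not list_children(folder["path"]):
            empty_ids.append(folder["id"])
    return empty_ids


# Stand-in data, for illustration only
sample_folders = [
    {"id": "a1", "path": "/home/alice"},
    {"id": "b2", "path": "/home/bob"},
]
sample_contents = {"/home/alice": [], "/home/bob": ["report.prpt"]}

deletable = select_empty_home_folders(
    sample_folders, lambda path: sample_contents.get(path, []))
print(deletable)
```

Only the ids returned by such a filter would then be handed to the delete call; everything else is left for a manual backup.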

Open up any browser JavaScript console, replace <server_url> with your Pentaho URL and run:

$.getJSON("http://<server_url>:8080/pentaho/api/repo/files/:home/children", function(data){
    $.each(data, function(i, nodes){
                type: "PUT",
                url: "http://<server_url>:8080/pentaho/api/repo/files/deletepermanent",

You can fork the gist for this at




Linear Optimization with or-tools — building a web front-end with falcon and gunicorn

In a previous post, I put together a script for solving a linear optimisation problem using Google’s OR-tools. This python script is callable from the command line and you kinda need to know what you are doing and how to organize the parameters.

So, in order to address this difficulty, I wanted to build a web front-end for this script, so that regular humans can interact with it.

We can generalize this problem as how to build a web interface for a python script.

First of all, we can split the concerns. What we need is a web page that serves as a user interface, and a web API connecting the webpage to the python backend.

Ideally, this API should be as light as possible. After all, the heavy lifting is going to be performed by the backend. Using Django would be easy but also overkill. I was looking for one of these microframeworks, you know, like Flask. And that’s how I got to Falcon.

Falcon is reportedly around 10 times faster than Flask. It is as bare-bones as you can get, to the point that you need to bring your own WSGI server, such as gunicorn (or uWSGI, or Waitress if you are on Windows).
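To make the "bring your own WSGI server" point concrete, here is a minimal sketch of what a WSGI application is: a callable that receives the request environment and a start_response function. This is plain standard-library Python, not Falcon code, and the hello_app and call_app names are made up for illustration.

```python
def hello_app(environ, start_response):
    """A bare WSGI application: any WSGI server can call this."""
    body = b'{"status": "ok"}'
    start_response('200 OK', [
        ('Content-Type', 'application/json'),
        ('Content-Length', str(len(body))),
    ])
    return [body]


def call_app(app, method='GET', path='/'):
    """Call a WSGI app directly, the way a server would, and collect the reply."""
    captured = {}

    def start_response(status, headers):
        captured['status'] = status
        captured['headers'] = dict(headers)

    environ = {'REQUEST_METHOD': method, 'PATH_INFO': path}
    chunks = app(environ, start_response)
    captured['body'] = b''.join(chunks)
    return captured


reply = call_app(hello_app)
print(reply['status'], reply['body'])
```

A server such as gunicorn does the job of call_app for real HTTP traffic, and a Falcon API object is a callable of this same shape.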



Get the code here


1 Installing the dependencies

pip install falcon cython gunicorn

2 Creating a JSON output for the script

My plan was to use JSON to exchange data between the API and the webpage. So I needed a JSON response builder. I could add this functionality to the previously created python script, but I prefer to have it in a separate file which basically calls the main method of the solver script and returns a JSON payload.

Source code

import json
import interview_grocery_startup as igs

def get_json_response(cfg, what):
    results = igs.main(cfg, what)

    solver = results['solver']

    if results['result_status'] == solver.OPTIMAL:
        response = {'result_status': 'optimal answer'}

        variable_list = results['variable_list']

        # Parenthesised print works on both Python 2 and Python 3
        print(solver.wall_time())
        print(solver.Objective().Value())

        response['wall_time'] = solver.wall_time()
        response['objective_value']= solver.Objective().Value()

        response['variables'] = dict()

        for variable in variable_list:
            # Key each solution value by the variable's name
            response['variables'][variable.name()] = variable.solution_value()

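    elif results['result_status'] == solver.INFEASIBLE:
        response = {'result_status': 'infeasible'}
    elif results['result_status'] == solver.POSSIBLE_OVERFLOW:
        response = {'result_status': 'overflow'}
    else:
        # Any other solver status, so that response is always defined
        response = {'result_status': 'unknown'}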

    json_response = json.dumps(response, sort_keys=True)

    return json_response

def main(cfg, what):
    json_response = get_json_response(cfg, what)
    return json_response
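For reference, this is roughly how a consumer could read the payload produced above. It is only a sketch: the numbers in sample are placeholders, not actual solver output, and summarize is a hypothetical helper.

```python
import json

# Placeholder payload with the same keys the builder emits; values are made up
sample = json.dumps({
    'result_status': 'optimal answer',
    'wall_time': 12,
    'objective_value': 42.0,
    'variables': {'ham': 0.25, 'bread': 1.5},
}, sort_keys=True)


def summarize(json_response):
    """Turn the JSON payload into a list of printable lines."""
    data = json.loads(json_response)
    lines = ['status: %s' % data['result_status']]
    # 'variables' is only present when an optimal answer was found
    for name, amount in sorted(data.get('variables', {}).items()):
        lines.append('%s: %.2f' % (name, amount))
    return lines


for line in summarize(sample):
    print(line)
```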


3 Coding the API

The principle for creating an API in Falcon is very easy to understand: you define a class and then instantiate it and link it to a route. This ends up being very convenient and clear. You can define the routes and the methods as you please.

Source code

import falcon
import json
import interview_grocery_startup_json as igsj 

class InterviewGroceryStartupResource(object):
    def on_get(self, req, resp):
        #Handles GET requests
        resp.status = falcon.HTTP_200  # This is the default status

        resp.body =  igsj.main(cfg,cfg['what'])

    def on_post(self, req, resp):
        try:
            # Read the raw request body and parse it as JSON
            body = req.stream.read()
            body_json = json.loads(body.decode('utf-8'))
            cfg = body_json["cfg"]
        except KeyError:
            raise falcon.HTTPBadRequest(
                'Missing Config',
                'A config (cfg) must be submitted in the request body.')

        resp.status = falcon.HTTP_200
        resp.body = igsj.main(cfg,cfg['what'])

# falcon.API instances are callable WSGI apps
app = application = falcon.API()

# Resources are represented by long-lived class instances
igsapi = InterviewGroceryStartupResource()

# igsapi will handle all requests to the '/igsapi' URL path
app.add_route('/igsapi', igsapi)

As you can see in the class, I added two methods. on_get is not doing much; the interesting one is on_post. On each POST to the specified route, the script decodes the body, extracts a JSON object, looks for the property cfg (config) and sends that to the JSON response builder.

Yes, this means that whenever you POST, you need to send a JSON object in the body with a “cfg” attribute that looks more or less like this:

                    "cfg": {"what": "cost",
                            "maxWeight": 10,
                            "maxCost": 100,
                            "minCals": 14000,
                            "minShop": 0.25,
                            "total_cost": 0,
                            "food":  [["ham",650, 4],

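A sketch of building that request body from Python instead of a browser, using the key names shown above. The build_payload and post_cfg helpers and the URL in the final comment are hypothetical, and post_cfg is defined but not called here because it needs a running server. Only the first row of the food list is included.

```python
import json
import urllib.request


def build_payload(what, food, max_weight=10, max_cost=100,
                  min_cals=14000, min_shop=0.25):
    """Wrap the solver settings in the {"cfg": {...}} envelope the API expects."""
    cfg = {
        'what': what,
        'maxWeight': max_weight,
        'maxCost': max_cost,
        'minCals': min_cals,
        'minShop': min_shop,
        'total_cost': 0,
        'food': food,
    }
    return json.dumps({'cfg': cfg})


def post_cfg(url, payload):
    """POST the payload as the raw request body and return the decoded reply."""
    request = urllib.request.Request(
        url, data=payload.encode('utf-8'),
        headers={'Content-Type': 'application/json'}, method='POST')
    with urllib.request.urlopen(request) as reply:
        return json.loads(reply.read().decode('utf-8'))


payload = build_payload('cost', [['ham', 650, 4]])
print(payload)
# Example call, assuming the API from this post is listening locally:
# post_cfg('http://localhost:18000/igsapi', payload)
```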
If you are running the API on a different machine from the one serving the webpage, you may have trouble with the same-origin policy. To address this, you can enable cross-origin resource sharing (CORS).

Add the following class to the code above

ALLOWED_ORIGINS = ['http://localhost']

class CorsMiddleware(object):
    def process_request(self, request, response):
        origin = request.get_header('Origin')
        if origin is not None and origin in ALLOWED_ORIGINS:
            # Only origins on the allow-list get the CORS header
            response.set_header('Access-Control-Allow-Origin', origin)

And call this class during the API instantiation:

app = application = falcon.API(middleware=[CorsMiddleware()])

Pluggable magic!

You can run this API on port 18000 (or any other port you like) by calling gunicorn:

gunicorn interview_grocery_startup_api -b :18000 --reload
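The same options can also live in a gunicorn configuration file, which is itself a Python module. A minimal sketch, assuming you save it as gunicorn.conf.py next to the API script (the worker count is an arbitrary choice, not something from the original setup):

```python
# gunicorn.conf.py
bind = ':18000'   # same as -b :18000
reload = True     # same as --reload; meant for development only
workers = 2       # assumption: an arbitrary small number of worker processes
```

You would then start the server with: gunicorn -c gunicorn.conf.py interview_grocery_startup_api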

Check the gunicorn docs for more options

The Falcon documentation is here.


4 Testing the API

I use the wonderful Postman for testing all the APIs that I make.


5 Coding the interface

The web interface can easily be an HTML+JS+CSS thingie. I tried to keep it simple creating a single page with 3 parts: a formulation, a data table and a fat green button to get the results.

From the functional perspective, the only thing you need to do is to perform POSTs to the /gunicorn/igsapi endpoint defined in the API script, and then process the response.

Here you can see the javascript file that does everything.

I keep a variable (igs.payload) constantly updated with the JSON payload I’m going to send. And then I just POST whenever I please:

      type: "POST",
      url: "/gunicorn/igsapi",
      dataType: "json",
      data: JSON.stringify(igs.payload),

      error:function (xhr, ajaxOptions, thrownError){

      success:function(data, textStatus, jqXHR){

The result is sent to a very dirty fillAnswerTable function which builds and re-builds the HTML code for the solution table.

For the UI look and feel I used Semantic UI, my current favorite Bootstrap alternative.

I’m also using Bret Victor’s Tangle library to provide direct manipulation of the variables. Each manipulation fires an event that updates the igs.payload variable.


Next steps

We got our little web app, but it has so many moving pieces. Wouldn’t it be nice to package it in a way that it always works? This will be the subject of a future post.

Containerizing the solution with docker


Linear Optimization with or-tools


Getting started

Over the last couple of months I’ve been getting my feet wet with linear programming and mathematical optimisation. I got a sense of how it all worked from this Discrete Optimisation course on Coursera and, googling around, I discovered that there are a ton of tools out there to help you solve optimisation problems. Makes sense: why would you want to implement a solving algorithm from scratch when some of the best minds in the history of mankind have already given it a shot?

Solvers, as these tools are often called, can cost hundreds of thousands of dollars… and they are worth every penny! But since I wanted to play with one without forking over the dough, I narrowed my search down to open-source options.

Hans Mittelmann from Arizona State University runs regular automated benchmarks on different mathematical optimisation tools and publishes the results. Following what I read in them, I picked Google Optimization Tools (OR-Tools) because it performed fairly well and… well, because if you’re looking for a sinister tool to model and solve hard problems in the human world, you can rarely go wrong with Google.

GLOP has C++ and Python APIs. I’m better at Python and I expected to quickly put together a web front-end for this, so I picked the latter one.

TL;DR. Just gimme the code

You can get the full code here.


The Problem

I picked a variation of the knapsack problem: a grocery-shopping example in which you try to maximise the number of calories you can buy with a limited budget. I got the problem from this blog post:

Basically: You walk into a grocery store with a grocery bag and some cash, to buy groceries for a week. You need to follow these rules:

1. Your bag can hold ten pounds.
2. You have $100
3. You need about 2000 calories a day, so a weekly shopping trip is about 14,000 calories.
4. You must purchase at least 4 ounces of each grocery item.

These are the groceries you can buy, with their calories and price per pound:

Ham:     650 cals,  $4
Lettuce:  70 cals,  $1.5
Cheese: 1670 cals,  $5
Tuna:    830 cals, $20
Bread:  1300 cals,  $1.20
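To make the rules concrete, here is a small sketch that checks a candidate basket against them by hand, before any solver is involved. The check_basket helper and the sample basket are made up for illustration; the table values are the ones listed above.

```python
# (name, calories per pound, price per pound), from the table above
FOOD = [('ham', 650, 4), ('lettuce', 70, 1.5), ('cheese', 1670, 5),
        ('tuna', 830, 20), ('bread', 1300, 1.20)]


def check_basket(pounds, max_weight=10, max_cost=100,
                 min_cals=14000, min_shop=0.25):
    """Return totals for a basket and whether it satisfies the four rules.

    pounds: pounds bought of each item, in the same order as FOOD.
    """
    weight = sum(pounds)
    cost = sum(p * price for p, (_, _, price) in zip(pounds, FOOD))
    cals = sum(p * c for p, (_, c, _) in zip(pounds, FOOD))
    ok = (weight <= max_weight and cost <= max_cost
          and cals >= min_cals and all(p >= min_shop for p in pounds))
    return {'weight': weight, 'cost': cost, 'calories': cals, 'feasible': ok}


# An arbitrary basket: the minimum of ham, lettuce and tuna, mostly cheese and bread
basket = [0.25, 0.25, 5.0, 0.25, 4.25]
print(check_basket(basket))
```

A basket like this one satisfies every rule without being the cheapest or the most calorific option, which is exactly the gap the solver is meant to close.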


Installing OR-Tools

Follow the official OR-Tools installation instructions. You will need Python and Python setuptools installed on your machine.


Coding the problem

We can split the coding of the problem into 6 elements:

1. Identify the problem

We tell or-tools that we are attempting to solve a linear programming problem. We create a solver variable that is going to contain all the necessary items to solve the problem.

from ortools.linear_solver import pywraplp
solver = pywraplp.Solver('SolveSimpleSystem',pywraplp.Solver.GLOP_LINEAR_PROGRAMMING)


2. Ingest the input

We are going to send our table of possible groceries, calories and prices as a nested list:

food = [['ham', 650, 4],
        ['lettuce', 70, 1.5],
        ['cheese', 1670, 5],
        ['tuna', 830, 20],
        ['bread', 1300, 1.20]]


3. Configure the decision variables

We need 5 decision variables which contain how many pounds of each product you are going to buy. Instead of creating 5 variables, we can create a list of size 5 (the size of the food list). Each item in the list will contain a decision variable.

Each decision variable will be created with a call to the NumVar method of the solver variable, passing the minimum amount of groceries we can buy, the maximum (infinity), and a unique name for the variable (contained in the previously-defined food list).

    #food is a list of groceries, calories and prices
    variable_list = [[]] * len(food)
    for i in range(0, len(food)):
        #you must buy at least minShop of each
        variable_list[i] = solver.NumVar(minShop, solver.infinity(), str(food[i][0]))

The Pythonic way of writing that loop is a list comprehension. However, I’m using the loop for readability.

    #same thing but with comprehension
    variable_list=[solver.NumVar(minShop, solver.infinity(), str(food[i][0])) for i in range(0, len(food))]


4. Configure the constraints

This is where most of the magic happens. We will create one constraint per “rule” specified in the problem description.

In linear programming, each constraint is specified as a weighted sum of the decision variables, kept between two bounds:

lower bound <= c1*var1 + c2*var2 + c3*var3… <= upper bound

In the knapsack problem, the conversion is pretty straightforward. In some other cases, you have to re-think and re-model your problem in these terms.
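As a plain-Python illustration of that form (no solver involved), a constraint is a pair of bounds around a sum in which each variable gets a coefficient. The constraint_holds helper and the sample amounts are hypothetical.

```python
def constraint_holds(lower, coefficients, values, upper):
    """Check lower <= sum(coefficient * value) <= upper."""
    total = sum(c * v for c, v in zip(coefficients, values))
    return lower <= total <= upper


# Pounds of ham, lettuce, cheese, tuna and bread in some candidate basket
pounds = [1, 1, 2, 1, 3]

# Weight rule: every coefficient is 1, so the sum is the total weight
weight_ok = constraint_holds(0, [1, 1, 1, 1, 1], pounds, 10)

# Cost rule: the coefficients are the prices per pound
cost_ok = constraint_holds(0, [4, 1.5, 5, 20, 1.20], pounds, 100)

print(weight_ok, cost_ok)
```

The solver works with the same ingredients: two bounds per constraint and one coefficient per variable.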

We will create a list of 3 constraints, calling Constraint(lower bound, upper bound) for each one, and then walk the variables list and call SetCoefficient for each of the variables.

    #Define the constraints
    constraint_list = []

    #Constraint 1: totalWeight<=maxWeight
    #ham + lettuce + cheese + tuna + bread <= maxWeight
    constraint_list.append(solver.Constraint(0, maxWeight))
    for i in range(0, len(food)):
        constraint_list[0].SetCoefficient(variable_list[i], 1)

    #Constraint 2: totalPrice<=maxCost
    constraint_list.append(solver.Constraint(0, maxCost))
    for i in range(0, len(food)):
        constraint_list[1].SetCoefficient(variable_list[i], food[i][2])

    #Constraint 3: totalCalories>=minCals
    constraint_list.append(solver.Constraint(minCals, minCals + 100))
    for i in range(0, len(food)):
        constraint_list[2].SetCoefficient(variable_list[i], food[i][1])

Note that the 4th rule of the problem, “You must purchase at least 4 ounces of each grocery item,” is already coded in the variables definition.
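The link between that rule and the variable bounds is a unit conversion: the decision variables are in pounds, and there are 16 ounces in a pound. A tiny sketch (the constant name is made up; minShop is the lower bound passed to NumVar):

```python
OUNCES_PER_POUND = 16

# "At least 4 ounces of each item", expressed in pounds
minShop = 4.0 / OUNCES_PER_POUND
print(minShop)
```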


5. Configure the objective function

Similar to the constraint definition, the goal function is specified as a weighted sum of the decision variables:

goal = Maximize/Minimize (c1*var1 + c2*var2 + c3*var3…)

If we wish to minimize cost, we walk our variable list, get the price from the food list, and set the objective.

        objective = solver.Objective()
        for i in range(0, len(variable_list)):
            objective.SetCoefficient(variable_list[i], food[i][2])
        objective.SetMinimization()

Say we wanted to maximize calories intake. We would do the same, but taking the calories value from the food list, and setting a maximization goal

# Define our objective: maximizing calories
objective = solver.Objective()
for i in range(0, len(food)):
    objective.SetCoefficient(variable_list[i], food[i][1])
objective.SetMaximization()


6. Solve!

After all these configuration steps, we just call the solve method against the solver variable and print out a solution if we find it.

    result_status = solver.Solve()

    if result_status == solver.OPTIMAL:
        print('Successful solve.')
        # The problem has an optimal solution.
        print(('Problem solved in %f milliseconds' % solver.wall_time()))
        # The objective value of the solution.
        print(('Optimal objective value = %f' % solver.Objective().Value()))
        # The value of each variable in the solution.
        var_sum = 0
        for variable in variable_list:
            print(('%s = %f' % (variable.name(), variable.solution_value())))
            var_sum += variable.solution_value()
        print(('Variable sum = %f' % var_sum))

        print('Advanced usage:')
        print(('Problem solved in %d iterations' % solver.iterations()))

        for variable in variable_list:
            print(('%s: reduced cost = %f' % (variable.name(), variable.reduced_cost())))
        activities = solver.ComputeConstraintActivities()
        for i, constraint in enumerate(constraint_list):
            print(('constraint %d: dual value = %f\n'
              '               activity = %f' %
              (i, constraint.dual_value(), activities[constraint.index()])))

    elif result_status == solver.INFEASIBLE:
        print('No solution found.')
    elif result_status == solver.POSSIBLE_OVERFLOW:
        print('Some inputs are too large and may cause an integer overflow.')


You can get the full code here.


Next steps

We got a solution, but you need to know a little bit of python to run this program, change its inputs or read the solution. Wouldn’t it be nice to have some sort of user-friendly UI? This will be the subject of future posts.

Building a web front-end with falcon and gunicorn

Containerizing the solution with docker


A hub to end all dongles

When I got the 2016 MacBook I knew that I was condemning myself to the dongle life. However, instead of complaining, or getting with the program and buying 2 or 10 dongles, I researched a bit and found the perfect solution for my desktop.

I wanted a docking station to connect two monitors (one DVI and one HDMI), an Ethernet cable and several USB devices, and still have a single USB-C cable going to my computer. This is exactly what this USB dock by KKmoon provides. You may find it under several names: KKMoon USB 3.1 dock, Lapdock or WavLink. It uses a technology called DisplayLink, a standard to connect computers and displays using USB and, unlike many, many other dongles, it’s able to drive multiple displays.

If you need something more portable, Cable Matters has a good alternative. However, it doesn’t allow you to drive two monitors at the same time.

If for some reason, you still want the dongle life, I recommend, a website that recommends you the best dongle for your need.



Stagehand, cool platformer for iOS

I have a lot of respect for Big Bucket, the mobile game company that set the bar on how to develop a high quality retro-inspired game. I wasted (sorry, invested) a lot of hours in the Barcelona metro playing The Incident, and a few more with Space Age, a game I recommended to all my friends who played adventure games back in the day.

Stagehand is their latest creation. On the surface it looks like your standard platformer, but in reality it is a reinterpretation of the genre’s basic mechanics. Instead of controlling the character to jump over pits and spikes, you have a constantly-running character and you manipulate the environment, dragging platforms and pieces of the stage in order to navigate the obstacles. Hence the tagline “A Reverse Platformer”.

You can download Stagehand by following this link.