Recommender system for finding subject matter experts using the Enron email corpus

This is a little project to create a recommender system to find mentors inside an organization, using Natural Language Processing. It started as an excuse to build a data visualization I had in mind: an interactive word cloud that… did something. When I started, I didn’t know anything about Topic Modeling, Topic Extraction, or Natural Language Processing, and I fell head first into a rabbit hole.

TL;DR:

Topic extraction is deep and potentially rewarding. Sanitize properly. SpaCy and Gensim are your friends. Search YouTube for knowledge. This is related to “Topic Extraction from Scientific Literature for Competency Management” and “The Author-Topic Model for Authors and Documents”. Get the code of this project at https://github.com/danielpradilla/enron-playground

Getting IP location information with Angular 7

Using Angular Maps Components and a new service called ipapi, you can quickly put together an app that gets a client’s IP location information and puts it on a map.

Angular Maps Components is really great, and the setup with ipapi is a no-brainer (they have a free tier for 30,000 requests or under). It literally took me more time to wait for the Angular project to be set up than to implement the whole thing!

The code is on GitHub: https://github.com/danielpradilla/angular-ipapi

How to connect to SAP HANA using JDBC

Recently I had to connect a Java application to SAP HANA and I made some notes along the way:

The first step is to get the SAP HANA JDBC driver, a file called ngdbc.jar. The quickest way is to download the SAP HANA Cloud Platform SDK from here: https://tools.hana.ondemand.com/#cloud

Choose the latest “Java Web Tomcat 8” from the download section (a package starting with neo-).

Unzip the archive to any location on your machine.

Extract the JDBC driver (ngdbc.jar) from the archive. You will find it inside a hidden folder, at repository/.archive/lib/ngdbc.jar

Use the driver with the connection string

jdbc:sap://<server>:<port>

Where the port is

3<instance_number>15

So if your instance number is 10, the port would be 31015.

The custom driver class name is

com.sap.db.jdbc.Driver
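The pieces above could be wired together roughly like this. It is only a minimal sketch: the class name HanaJdbcUrl, the host hana.example.com and the method names are made up for illustration, and it assumes a two-digit instance number and that ngdbc.jar is on the classpath when you actually open a connection.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper (not part of any SAP library): builds the JDBC URL
// from a host name and an instance number.
final class HanaJdbcUrl {

    // Port rule from the notes above: 3<instance_number>15,
    // with the instance number written as two digits (00 to 99).
    static int port(int instanceNumber) {
        if (instanceNumber < 0 || instanceNumber > 99) {
            throw new IllegalArgumentException("instance number must be between 0 and 99");
        }
        return 30015 + instanceNumber * 100;
    }

    static String url(String host, int instanceNumber) {
        return "jdbc:sap://" + host + ":" + port(instanceNumber);
    }

    // Opens a connection. This needs ngdbc.jar on the classpath and a reachable server.
    static Connection connect(String host, int instanceNumber, String user, String password)
            throws SQLException, ClassNotFoundException {
        // Explicit load of the driver class named above; JDBC 4 drivers usually self-register.
        Class.forName("com.sap.db.jdbc.Driver");
        return DriverManager.getConnection(url(host, instanceNumber), user, password);
    }

    public static void main(String[] args) {
        // Only prints the URL, so it runs without the driver or a server.
        System.out.println(url("hana.example.com", 10)); // jdbc:sap://hana.example.com:31015
    }
}
```

To actually connect, you would call connect(...) with your own host, instance number and credentials, ideally inside a try-with-resources block so the connection gets closed.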

How to build an API for SAP HANA using strongloop/loopback

One of the aspects I like the most about SAP HANA is the cloud app development environment that allows you to quickly put together a data-entry app using Fiori.

Recently, I wanted to build a small JavaScript app for data querying and entry using the awesome ag-grid. The data was in SAP HANA but the prospect of building and testing a secure API was quite daunting (is it worth it? How long is it going to take? Who’s going to maintain it?) It was actually easier to switch to MongoDB, use Express or Parse and add an ETL process to sync the databases. Wouldn’t it be great if there was a way to create some sort of automatic API through configuration?

There is.

The Loopback component of Strongloop offers the possibility to quickly create secure APIs for CRUD operations against MySQL, Postgres, Oracle and other databases. In many cases, it allows you to completely bypass the development of boring, commoditized backend stuff. Using a convention-over-configuration approach, you can create endpoints for each of your tables in a matter of minutes.

But, can it connect to SAP HANA?

I googled and found a connector for HANA.

The best way to set this up is to containerize the solution: create a docker container with an installation of strongloop and link a directory on the host machine to the working directory of the container. That way you keep the configuration outside of the container, where you can quickly modify it, and you can easily switch or upgrade the container.

I started from the official node image (you can also start from the official strongloop image) and created my own, which you can use. Here is a link to my image.

FROM node

MAINTAINER Daniel Pradilla <info@danielpradilla.info>

RUN npm -g config set user root

RUN npm install -g --unsafe-perm strongloop 

RUN npm install loopback-datasource-juggler

RUN npm install loopback-connector-saphana

WORKDIR /app
EXPOSE 3000

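Once you start the container, you need to create a connection to SAP HANA. (The --rm flag removes the container on exit, so the name loopback can be reused by the next command.)

docker run --rm --name loopback -p 3000:3000 -v `pwd`:/app/ -t -i danielpradilla/loopback slc loopback:datasource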

Then, you have to edit your server/datasources.json file and manually specify the schema name. (This is something that you don’t need to do with other databases, and you may get stuck on a “table not found” error if you don’t do it.)

{
  "db": {
    "name": "db",
    "connector": "memory"
  },
  "hana": {
    "host": "my_server_address",
    "port": my_server_port,
    "database": "MY_DATABASE_NAME",
    "name": "hana",
    "user": "my_HANA_user",
    "password": "my_hana_password",
    "schema": "MY_SCHEMA_NAME",
    "connector": "saphana"
  }
}

Then you create an endpoint for the table using the wizard:

docker run --rm --name loopback -p 3000:3000 -v `pwd`:/app/ -t -i danielpradilla/loopback slc loopback:model

Or with Arc, Strongloop’s graphical interface:

docker run --rm --name loopback -p 3000:3000 -v `pwd`:/app/ -t -i danielpradilla/loopback slc arc

And after that, you are ready to experience the awesomeness of having all the API endpoints created for you:

docker run --rm --name loopback -p 3000:3000 -v `pwd`:/app/ -t -i danielpradilla/loopback slc run .
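With the container running, the generated endpoints can be called from any HTTP client. Here is a rough Java sketch (Java 11 or later) of what that might look like. It assumes Loopback’s default REST root of /api and a model exposed under the plural name Customers; that model name, the LoopbackClient class and its methods are hypothetical, so adjust them to whatever the wizard generated for you.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical client for an endpoint generated by Loopback.
final class LoopbackClient {

    // Builds the URI for listing rows of a model, capped with Loopback's filter[limit]
    // query parameter. The brackets are percent-encoded to keep the URI strictly valid.
    static URI listUri(String baseUrl, String pluralModel, int limit) {
        String key = URLEncoder.encode("filter[limit]", StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/api/" + pluralModel + "?" + key + "=" + limit);
    }

    // Sends a GET request and returns the response body (JSON, if the endpoint exists).
    static String fetch(URI uri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri)
                .header("Accept", "application/json")
                .GET()
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        URI uri = listUri("http://localhost:3000", "Customers", 5);
        System.out.println(uri);
        // Pass any argument to perform the request; this needs the container to be running.
        if (args.length > 0) {
            System.out.println(fetch(uri));
        }
    }
}
```

The port 3000 matches the one published by the docker run commands above.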

The next step would be to secure the API using microgateway for API key validation, OAuth 2.0 and rate limiting.