Readability scoring of the United Nations Corpus

Imagine you could estimate how hard a document would be to read before reading it. Imagine you could do it for entire batches of documents you need to process. Imagine you could have a recommender system that would help you prioritize unread documents according to their difficulty. A bit of experimentation with the public United…

Recommender system for finding subject matter experts using the Enron email corpus

This is a little project to create a recommender system to find mentors inside an organization, using Natural Language Processing. It started as an excuse to build a data visualization I had in mind: an interactive word cloud that did something. When I started, I didn’t know anything about Topic Modeling, Topic Extraction, or Natural…

How to connect to SAP HANA using JDBC

Recently I had to connect a Java application to SAP HANA, and I made some notes along the way. The first step is to get the SAP HANA JDBC driver, a file called ngdbc.jar. The quickest way is to download the SAP HANA Cloud Platform SDK from https://tools.hana.ondemand.com/#cloud and choose the latest “Java Web Tomcat 8”…
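
Once ngdbc.jar is on the classpath, opening a connection might look roughly like the following. This is a minimal sketch, not the code from the post: the class name HanaJdbcSketch, the helper names, and the host, port, and credentials passed in are all placeholders chosen for illustration. It assumes the usual jdbc:sap://host:port URL form and relies on the driver registering itself with DriverManager.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper class, for illustration only.
final class HanaJdbcSketch {

    // Builds a JDBC URL of the form jdbc:sap://host:port.
    // The host and port are supplied by the caller; nothing here is specific
    // to any real system.
    static String buildUrl(String host, int port) {
        return "jdbc:sap://" + host + ":" + port;
    }

    // Opens a connection through DriverManager. This only works if ngdbc.jar
    // is on the classpath; otherwise DriverManager throws an SQLException
    // because it finds no suitable driver for the URL.
    static Connection connect(String host, int port, String user, String password)
            throws SQLException {
        return DriverManager.getConnection(buildUrl(host, port), user, password);
    }
}
```

A caller would typically wrap connect(...) in a try-with-resources block so the connection is closed automatically, and would read the host, port, and credentials from configuration instead of hard-coding them.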