Machine learning made easy

Loading data in Torch (is a mess)

Torch is a GPU-accelerated deep learning framework. It had been rather obscure until its adoption by Facebook and DeepMind brought it publicity. This entirely anecdotal article describes our experience trying to load some data in Torch. In short: it’s impossible, unless you’re dealing with images.

Interactive in-browser 3D visualization of datasets

In this post we’ll look at 3D visualization of various datasets using data-projector from Datacratic. The original demo didn’t impress us as much as it could have, maybe because the data is synthetic: it shows a bunch of small spheres in rainbow colors. Real datasets look better.

How to run external programs from Python and capture their output

Python, being a general-purpose programming language, lets you run external programs from your script and capture their output. This is useful for many machine learning tasks where one would like to use a command-line application in a Python-driven pipeline. As an example, we investigate how Vowpal Wabbit’s hash table size affects validation scores.
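Here is a minimal sketch of the idea, assuming Python 3.7 or later and the standard subprocess module. subprocess.run with capture_output=True and text=True returns the program's stdout and stderr as strings. The "average loss =" pattern and the hypothetical vw_command helper are illustrative assumptions. The helper uses VW's -b option (hash table of 2**bits entries) and -d option (data file). Check what your own tool prints, and to which stream. The demo swaps in a child Python process so that it runs without VW installed.

```python
import re
import subprocess
import sys


def run_and_capture(cmd):
    """Run an external program and return (returncode, stdout, stderr) as text."""
    # capture_output=True collects both streams; text=True decodes bytes to str.
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr


def parse_average_loss(output):
    """Return the number after 'average loss =' in the text, or None if absent.

    Assumption: the tool reports its score on a line like 'average loss = 0.25'.
    """
    match = re.search(r"average loss\s*=\s*([0-9.eE+-]+)", output)
    return float(match.group(1)) if match else None


def vw_command(bits, data_file):
    """Hypothetical VW invocation: -b sets the hash table size to 2**bits entries."""
    return ["vw", "-b", str(bits), "-d", data_file]


if __name__ == "__main__":
    # Stand-in for a real tool: a child Python process that writes a score
    # line to stderr, so the demo runs without Vowpal Wabbit installed.
    demo = [sys.executable, "-c",
            "import sys; sys.stderr.write('average loss = 0.25\\n')"]
    code, out, err = run_and_capture(demo)
    print(code, parse_average_loss(err))
```

To sweep the hash table size, you could loop over bits (say 18 to 24), call run_and_capture(vw_command(bits, "train.vw")) and parse the score from stderr, where VW normally writes its progress report. The name "train.vw" is a hypothetical file.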

Geoff Hinton’s Dark Knowledge

Geoff Hinton had been silent since he went to work for Google. Recently, however, Geoff has come out and started talking about something he calls dark knowledge. Maybe some questions shouldn’t be asked, but what does he mean by that?
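As we understand Hinton's talks, "dark knowledge" is the information hidden in the small probabilities a trained network assigns to the wrong classes. For example, it shows that a dog image looks more like a cat than a car. Dividing the logits by a temperature above 1 before the softmax makes these probabilities visible, so a smaller model can learn from them. The sketch below only illustrates the temperature softmax. The logits and class names are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax of logits / temperature; higher temperature gives softer probabilities."""
    scaled = [z / temperature for z in logits]
    # Subtract the max before exponentiating for numerical stability.
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits from a trained network for the classes [dog, cat, car]:
# the model is confident the image shows a dog.
logits = [9.0, 5.0, 1.0]
print([round(p, 3) for p in softmax_with_temperature(logits, 1.0)])
print([round(p, 3) for p in softmax_with_temperature(logits, 5.0)])
```

At temperature 1 the cat and car probabilities are tiny. At temperature 5 they become clearly visible, and cat still ranks well above car. That ranking is the "knowledge" about similarity between classes.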

ICLR 2014 tidbits

We watched a few videos from the 2014 International Conference on Learning Representations. Here are some things we found interesting: predicting class labels not seen in training, benchmarking stochastic optimization algorithms and symmetry-based learning.

Comparing large-scale linear learners

Recently we’ve been browsing papers about out-of-core linear learning on a single machine. While for us this task is basically synonymous with Vowpal Wabbit, it turns out that there are other options.
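To make "out-of-core" concrete, here is a minimal sketch of the two ideas behind tools like Vowpal Wabbit. The first is streaming stochastic gradient descent, which reads one example at a time so the data never has to fit in memory. The second is the hashing trick, which keeps the weight vector at a fixed size. This is not VW's implementation. The input format ("label name:value name ...", label 0 or 1) is a simplified, hypothetical one. zlib.crc32 is our choice of hash, because Python's built-in hash() of strings changes between runs. VW uses its own hash function.

```python
import math
import zlib


def hash_features(tokens, bits):
    """Hashing trick: map each 'name' or 'name:value' token to one of 2**bits slots."""
    size = 1 << bits
    features = {}
    for token in tokens:
        name, _, value = token.partition(":")
        index = zlib.crc32(name.encode("utf-8")) % size
        features[index] = features.get(index, 0.0) + (float(value) if value else 1.0)
    return features


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_streaming(lines, bits=18, learning_rate=0.1):
    """One pass of SGD for logistic regression over an iterable of text lines.

    Only the weight vector (2**bits floats) is held in memory.
    """
    weights = [0.0] * (1 << bits)
    for line in lines:
        label_str, *tokens = line.split()
        label = float(label_str)
        x = hash_features(tokens, bits)
        prob = sigmoid(sum(weights[i] * v for i, v in x.items()))
        # Gradient of log loss w.r.t. each weight is (prob - label) * feature value.
        for i, v in x.items():
            weights[i] -= learning_rate * (prob - label) * v
    return weights


def predict(weights, tokens, bits=18):
    """Predicted probability of label 1 for a list of feature tokens."""
    x = hash_features(tokens, bits)
    return sigmoid(sum(weights[i] * v for i, v in x.items()))
```

Because a Python file object yields lines lazily, something like "with open('train.txt') as f: weights = train_streaming(f)" trains on a file of any size. The file name is hypothetical. Memory use depends only on bits.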

Math for machine learning

Sometimes people ask what math they need for machine learning. The answer depends on what you want to do, but in short, we think it’s good to have some familiarity with linear algebra and multivariate differentiation.
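A small example of where the two meet. For least squares, the loss L(w) = 0.5 * ||Xw - y||^2 has the gradient X^T (Xw - y). Getting that formula takes multivariate differentiation, and using it takes matrix-vector products. The sketch below checks the formula against central finite differences in plain Python. The data X, y and weights w are made-up numbers chosen for illustration.

```python
def matvec(X, w):
    """Matrix-vector product Xw, with X given as a list of rows."""
    return [sum(xij * wj for xij, wj in zip(row, w)) for row in X]


def loss(X, y, w):
    """Least-squares loss 0.5 * ||Xw - y||^2."""
    r = [p - t for p, t in zip(matvec(X, w), y)]
    return 0.5 * sum(ri * ri for ri in r)


def gradient(X, y, w):
    """Analytic gradient X^T (Xw - y)."""
    r = [p - t for p, t in zip(matvec(X, w), y)]
    return [sum(X[i][j] * r[i] for i in range(len(X))) for j in range(len(w))]


def numerical_gradient(X, y, w, eps=1e-6):
    """Central finite differences, perturbing one coordinate at a time."""
    grad = []
    for j in range(len(w)):
        w_plus = list(w)
        w_plus[j] += eps
        w_minus = list(w)
        w_minus[j] -= eps
        grad.append((loss(X, y, w_plus) - loss(X, y, w_minus)) / (2 * eps))
    return grad


X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
y = [1.0, 0.0, 1.0]
w = [0.5, -0.25]
print(gradient(X, y, w))  # → [0.5, 0.0]
print(numerical_gradient(X, y, w))
```

The two gradients should agree to about six decimal places. A check like this is a cheap way to catch mistakes in a hand-derived formula.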