Example: Use Scikit-Learn and PySpark ML Models in Java Using MLeap


Many of the most popular machine learning frameworks are Python-based, while Java has long been a preferred language for backend development. One way to bridge the two is to expose the ML models as APIs. The downside is that you have to manage another service and make extra network calls that could otherwise be avoided.

So, the question: how do we use scikit-learn and PySpark models in Java?


This example uses MLeap to demonstrate how to load an ML model. First, here are the steps involved in using ML models trained with scikit-learn or PySpark in Java.

  1. Use scikit-learn or PySpark to train a model (for example, logistic regression or random forest) and export it using MLeap. I will write a separate post to show how to export an ML model. Refer to this, a nice example demonstrating the export of an ML model.
  2. Load the model using the Scala interface provided by MLeap. Since both Scala and Java run on the JVM, we can call Scala methods from Java.
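
The point in step 2 about calling Scala from Java can be sketched as follows. This is a minimal illustration, not MLeap code: the names `PriceModel` and `predict` are hypothetical, and the scoring body is a stand-in with made-up weights so that the snippet runs without a model file. In a real setup the body would delegate to the MLeap calls shown later in this post.

```scala
// Hypothetical facade: PriceModel and predict are illustrative names,
// and the arithmetic below is a stand-in for a real model call.
object PriceModel {
  // Stand-in scoring logic with made-up weights (not taken from any trained model).
  def predict(bedrooms: Double, bathrooms: Double): Double =
    100.0 * bedrooms + 50.0 * bathrooms
}

object Demo {
  def main(args: Array[String]): Unit = {
    // From Scala the call is simply PriceModel.predict(...).
    // From Java the same method can be called as:
    //   double p = PriceModel.predict(2.0, 1.0);
    // because the Scala compiler emits a static forwarder for methods
    // of a top-level object. PriceModel$.MODULE$.predict(2.0, 1.0) also works.
    println(PriceModel.predict(2.0, 1.0))
  }
}
```

Keeping the Scala-specific types inside such an object means the Java side only sees plain arguments and a `double` return value.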

Step 1: The data and the model

We will use the example given here and download the model it generates from here.

The model is a linear regression trained on the Airbnb data. Download the data from here. The data contains the following information about Airbnb accommodations:

['id', 'name', 'price', 'bedrooms', 'bathrooms', 'room_type', 'square_feet', 'host_is_superhost', 'state', 'cancellation_policy', 'security_deposit', 'cleaning_fee', 'extra_people', 'number_of_reviews', 'price_per_bedroom', 'review_scores_rating', 'instant_bookable']

Here, features like bedrooms, bathrooms, and square_feet are extracted, and linear regression is applied to learn the relation between these features and the price. The resulting model is then exported using MLeap.

Step 2: Loading the ML model in Scala

The following Scala code loads the model and runs a sample test.

import resource._
import ml.combust.bundle.BundleFile
import ml.combust.mleap.runtime.MleapContext.defaultContext
import ml.combust.mleap.runtime.MleapSupport._
import ml.combust.mleap.runtime.serialization.FrameReader

object HelloWorld {
  def main(args: Array[String]): Unit = {
    // Load the model from the zip file and deserialize it
    val mleapTransformerLr = (for (bf <- managed(BundleFile("jar:file:/Users/harshvardhan/Downloads/airbnb.model.lr.zip"))) yield {
      bf.loadMleapBundle().get.root
    }).opt.get

    // Test data
    val s = scala.io.Source.fromURL("https://s3-us-west-2.amazonaws.com/mleap-demo/frame.json").mkString
    val bytes = s.getBytes("UTF-8")

    // Run the test data against the model to get a prediction
    for (frame <- FrameReader("ml.combust.mleap.json").fromBytes(bytes);
         frameLr <- mleapTransformerLr.transform(frame);
         frameLrSelect <- frameLr.select("price_prediction")) {
      println("Price LR: " + frameLrSelect.dataset(0).getDouble(0))
    }
  }
}

Running this code gives the following output:

Price LR: 232.62463916840318
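
To compile the Scala code above, the MLeap runtime and the scala-arm library (which provides `managed` via `import resource._`) need to be on the classpath. A minimal sbt sketch is shown below. The version numbers are assumptions, not taken from this post, so pick versions that can read the bundle you downloaded.

```scala
// build.sbt (sketch; the Scala and library versions below are assumptions)
scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  // MLeap runtime: BundleFile, MleapSupport, FrameReader
  "ml.combust.mleap" %% "mleap-runtime" % "0.16.0",
  // scala-arm: managed(...) resource handling used when opening the bundle
  "com.jsuereth" %% "scala-arm" % "2.0"
)
```

A Java project would declare the same artifacts in Maven or Gradle, using the artifact name with the Scala version suffix (for example `mleap-runtime_2.11`).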

All code is adapted from the MLeap documentation.

Refer to an example of a scikit-learn model here.


