Many of the most popular machine learning frameworks are Python-based, while Java has long been the preferred language for backend development. One option is to expose the ML models as APIs, but the downside is having to manage another service and paying for extra network calls that could otherwise be avoided.
So, the question: how do we use scikit-learn or pyspark based models in Java?
This example uses mleap to demonstrate how to load an ML model. First, let's list the steps involved in using models trained with scikit-learn or pyspark from Java.
- Use scikit-learn or pyspark to export the ML model (for example, Logistic Regression or Random Forest) using mleap. I will write a separate post showing how to export an ML model. Refer this, a nice example demonstrating the export of an ML model.
- Load the model using the Scala interface provided by mleap. Since both Scala and Java run on the JVM, we can call Scala methods from Java.
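For reference, exporting a fitted Spark pipeline to an mleap bundle looks roughly like the sketch below. This is not the full export walkthrough (see the linked example for that); `pipelineModel`, `trainingDf`, and the output path are placeholders assumed to exist from a prior training step, and the exact API may differ between mleap versions.

```scala
// Sketch: serializing a fitted Spark PipelineModel to an mleap bundle (zip).
// `pipelineModel` and `trainingDf` are assumed to come from an earlier training step.
import ml.combust.bundle.BundleFile
import ml.combust.mleap.spark.SparkSupport._
import org.apache.spark.ml.bundle.SparkBundleContext
import resource._

// The bundle context needs a transformed dataset so mleap can capture the schema
val context = SparkBundleContext().withDataset(pipelineModel.transform(trainingDf))

// Write the whole pipeline into a single zip file that the mleap runtime can load
for (bf <- managed(BundleFile("jar:file:/tmp/airbnb.model.lr.zip"))) {
  pipelineModel.writeBundle.save(bf)(context).get
}
```

Once serialized like this, the zip bundle no longer needs Spark at all; the lightweight mleap runtime can load and score it, which is what makes the JVM deployment story work.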
Step 1: The Data and the Model
The model is a logistic regression trained on Airbnb data. Download the data from here. The data contains the following information about Airbnb accommodations:
['id', 'name', 'price', 'bedrooms', 'bathrooms', 'room_type', 'square_feet', 'host_is_superhost', 'state', 'cancellation_policy', 'security_deposit', 'cleaning_fee', 'extra_people', 'number_of_reviews', 'price_per_bedroom', 'review_scores_rating', 'instant_bookable']
Here, we extracted features such as bedrooms, bathrooms, and square_feet, then applied logistic regression to model the relation between the features and price, and finally exported the model using mleap. Download the generated model from here.
Step 2: Loading the ML Model in Scala
Scala code demonstrating how to load the model and run a sample prediction:
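A minimal sketch of that loading code is below, assuming the downloaded bundle sits at `/tmp/airbnb.model.lr.zip`. The feature values and the output column name `price_prediction` are illustrative assumptions; match them to the schema of the bundle you actually exported.

```scala
// Sketch: loading an mleap bundle and scoring a single row, without Spark.
// Bundle path, feature values, and the output column name are placeholders.
import ml.combust.bundle.BundleFile
import ml.combust.mleap.runtime.MleapSupport._
import ml.combust.mleap.runtime.frame.{DefaultLeapFrame, Row}
import ml.combust.mleap.core.types._
import resource._

object PricePrediction extends App {
  // Load the serialized pipeline from the zip bundle
  val bundle = (for (bf <- managed(BundleFile("jar:file:/tmp/airbnb.model.lr.zip"))) yield {
    bf.loadMleapBundle().get
  }).opt.get
  val pipeline = bundle.root

  // Build a one-row LeapFrame with the feature columns the model expects
  val schema = StructType(
    StructField("bathrooms", ScalarType.Double),
    StructField("bedrooms", ScalarType.Double),
    StructField("security_deposit", ScalarType.Double),
    StructField("cleaning_fee", ScalarType.Double),
    StructField("extra_people", ScalarType.Double),
    StructField("number_of_reviews", ScalarType.Double),
    StructField("square_feet", ScalarType.Double),
    StructField("review_scores_rating", ScalarType.Double)
  ).get
  val frame = DefaultLeapFrame(schema,
    Seq(Row(2.0, 3.0, 50.0, 30.0, 2.0, 56.0, 1250.0, 90.0)))

  // Run the pipeline and read back the predicted price
  val scored = pipeline.transform(frame).get
  // "price_prediction" is an assumed output column name
  val price = scored.select("price_prediction").get.dataset.head.getDouble(0)
  println(s"Price LR: $price")
}
```

Because this runs entirely on the mleap runtime, the same bundle-loading calls can be invoked from plain Java code via the JVM interop described above.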
Running this code will give the following output:
Price LR: 232.62463916840318
All code is adapted from the mleap documentation.
Refer to the example of a scikit-learn model here.