## Introduction

Optimal Substructure : A problem is said to be a optimal substructure if an optimal solution can be constructed from optimal solution of it’s subproblem.

Overlapping subproblems : A problem is said to have overlapping subproblems if the problem can be broken down into repetitive subproblems.

Greedy Algorithm is used to solve problems having optimal substructure. But if the same problem has overlapping subproblems, dynamic programming is used in such cases.

## The Problem

Longest Common Subsequence : Let’s say, we have two strings  ABXC and ADYC.

Here, the longest common subsequence(lcs) is AC. But how? If you see the strings backward, you will realise that both have ‘C’ at the end. That effectively means that our longest common subsequence has ‘C’ at the end and now the problem has reduced to find the longest common subsequence of ABX and ADY.

ABX and ADY don’t have anything common at the end, so what should will be the lcs here. Either it will lcs of AB and ADY or it will be lcs of ABX and AD; depending upon which one is greater.

LCS of AB and ADY in turn depends on A and ADY or AB and AD. Similarly we could go on recursively to see following pair of substrings evaluated for lcs between them

Recursive pairs are : A-A
Recursive pairs are : –
Recursive pairs are : AB-A
Recursive pairs are : A-A
Recursive pairs are : AB-
Recursive pairs are : ABX-A
Recursive pairs are : AB-A
Recursive pairs are : ABX-

You will notice that this problem can be solved recursively. Here’s the code for that:

This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.

 import java.util.Arrays; import java.util.Date; import java.util.HashMap; import java.util.Map; public class LongestCommonSubsequence { public static void main(String args[]) { long timestamp1 = System.currentTimeMillis(); System.out.println("—————Longest Common Subsequence Using Recursive Method—————–"); String string1 = "AGGTABNDSV"; String string2 = "GXTXAYBDFGHOI"; String longestCommonSubsequence = findLongestSubsequence(string1, string2); System.out.println("LCS is "+new StringBuilder(longestCommonSubsequence).reverse().toString()); long timestamp2 = System.currentTimeMillis(); System.out.println(" Time Taken : "+(timestamp2–timestamp1)); } private static String findLongestSubsequence(String string1, String string2) { char[] charArray1 = string1.toCharArray(); char[] charArray2 = string2.toCharArray(); int arraylength1 = string1.length() – 1; int arraylength2 = string2.length() – 1; String longestCommonSubsequence = ""; if (arraylength1 < 0 || arraylength2 < 0) return longestCommonSubsequence; String newStr1 = String.valueOf(Arrays.copyOfRange(charArray1, 0, arraylength1)); String newStr2 = String.valueOf(Arrays.copyOfRange(charArray2, 0, arraylength2)); if (charArray1[arraylength1] == charArray2[arraylength2]) { longestCommonSubsequence = longestCommonSubsequence + charArray1[arraylength1]; longestCommonSubsequence = longestCommonSubsequence + findLongestSubsequence(newStr1, newStr2); } else { String res1 = findLongestSubsequence(newStr1, String.valueOf(charArray2)); String res2 = findLongestSubsequence(String.valueOf(charArray1), newStr2); if (res1.length() > res2.length()) { longestCommonSubsequence = longestCommonSubsequence + res1; } else longestCommonSubsequence = longestCommonSubsequence + res2; } return longestCommonSubsequence; } }

The output for this code is:

—————Longest Common Subsequence Using Recursive Method—————–
LCS is GTABD
Time Taken : 192 ms

However, if you notice the recursive pairs that I mentioned above, there are many pairs in duplicate whose computation we can avoid with little space. We can store the recursive pair and lcs for that in a hashmap and check that before going for the recursive approach. Here’s the code for that:

The output for this code is :

—————Longest Common Subsequence Using Recursive Method And Memoization—————–
LCS is GTABD
Time Taken : 1 ms

Did you notice the difference in time taken in both case? Memoization increased the speed by 190 times.

## Introduction:

Many of the most popular machine learning frameworks are based in python. The other fact is that java has been around for quite some time as preferred language for backend development. One way could be to expose ml models as APIs. Downside being need to manage another service and extra calls over network which could have been saved.

So, the question: How do we use scikit, pyspark based models in java?

## Steps:

This example uses mleap to demonstrate how to load the ml model. We will first write all the steps involving using ml models trained by scikit-learn or pyspark in java.

1. Use scikit-learn or pyspark to export the ml models using mleap(for example: Logistic Regression or Random Forrest) using mleap. I will write some other post to show how to export a ml model. Refer this, a nice example demonstrating the export of ml model.
2. We will load the data using the scala interface provided by mleap. Since both scala and java works on JVM, we can call scala methods in java.

## Step 1: The Data and the model

Taking the example given here, we will download the model generated by it from here.

The model is logistic regression done on the airbnb data. Download the data from here.  The data contains following information about airbnb accommodations:

[‘id’, ‘name’, ‘price’, ‘bedrooms’, ‘bathrooms’, ‘room_type’, ‘square_feet’, ‘host_is_superhost’, ‘state’, ‘cancellation_policy’, ‘security_deposit’, ‘cleaning_fee’, ‘extra_people’, ‘number_of_reviews’, ‘price_per_bedroom’, ‘review_scores_rating’, ‘instant_bookable’]

Here, we have extracted features like bedrooms, bathrooms, square_feet etc. We then applied logistic regression to get relation between the features and price. And later exported the model using mleap. We will download the model generated by it from here.

A scala code demonstrating the loading of model and running the sample test.

This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.

 import resource._ import ml.combust.bundle.BundleFile import ml.combust.mleap.runtime.MleapContext.defaultContext import ml.combust.mleap.runtime.MleapSupport._ import ml.combust.mleap.runtime.serialization.FrameReader object HelloWorld { def main(args: Array[String]) { /** * Loading the model in zip file; deserializing it */ val mleapTransformerLr = (for (bf <– managed(BundleFile("jar:file:/Users/harshvardhan/Downloads/airbnb.model.lr.zip"))) yield { bf.loadMleapBundle().get.root }).tried.get /** * Test Data */ val s = scala.io.Source.fromURL().mkString val bytes = s.getBytes("UTF-8") /** * Running the test data against the model to get a prediction */ for (frame <– FrameReader("ml.combust.mleap.json").fromBytes(bytes); frameLr <– mleapTransformerLr.transform(frame); frameLrSelect <– frameLr.select("price_prediction")) { println("Price LR: " + frameLrSelect.dataset(0).getDouble(0)) } } }

Running this code will give the following output:

Price LR: 232.62463916840318

All codes are inspired from mleap documenation.

Refer example of scikit model here.

## An example : Unit test using Argument Captor

How do you write unit test for the argument of a method call inside the method which returns void?

Have a look at the code snippet below:

This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.

 public class MainClass { Utility utility; public MainClass(Utility utility){ this.utility = utility; } /** * Counts the occurrences of given character ch * in the given text * @param text * @param ch */ public void countNumberCharacter(String text, char ch) { char[] charArray = text.toCharArray(); int count = 0; int i = 0; while(i < charArray.length){ if(charArray[i] == ch) count += 1; i += 1; } utility.logToConsole(count); } }

view raw

gistfile1.txt

hosted with ❤ by GitHub

As mentioned in the comment, the method countNumberCharacter simply counts the number of occurrences of character ch in the given string text. The count is then logged to the console via a utility method called log to console.

What if you want to write the unit test for the count?

Fortunately, Mockito comes with a Class called ArgumentCaptor. Argument Captor provides an opportunity to capture argument values for assertion.

We need to first add dependency of Mockito before using the Arguement Captor.  Link to Mockito dependency.

Refer to the code snippet of junit test of the class shown above to see an example of Argument Captor at work.

This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.

 public class MainClassTest { Utility utility = mock(Utility.class); @Test public void testCountofCharacter(){ String testText = "Harvey : I don’t play the odds, I play the man."; MainClass mainClass = new MainClass(utility); mainClass.countNumberCharacter(testText, 'e'); ArgumentCaptor actualCount = ArgumentCaptor.forClass(Integer.class); verify(utility, times(1)).logToConsole(actualCount.capture()); //Expected Number of 'e' in test text. //Harv'e'y : I don't play th'e' odds, I play th'e' man. Assert.assertEquals(3, (long)actualCount.getValue()); } }

view raw

gistfile1.txt

hosted with ❤ by GitHub