pandas is a popular Python library for exploring and manipulating data.
scikit-learn is a popular machine learning library for Python.
Regression is the process of finding the relationship between a dependent variable and one or more independent variables. There are many regression techniques, such as linear regression and ordinary least squares (OLS) regression, to name a few.
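As a small illustration of what "finding the relationship" means, the sketch below fits scikit-learn's `LinearRegression` (which uses ordinary least squares) to hypothetical toy data where the dependent variable `y` follows `y = 2x + 1` exactly. The data is made up for this example and is not from the tutorial's dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: one independent variable x and a dependent variable y
# that follows y = 2x + 1 exactly (no noise), purely for illustration.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])  # scikit-learn expects a 2-D feature array
y = 2 * X.ravel() + 1

model = LinearRegression()  # fits the line by ordinary least squares
model.fit(X, y)

print(model.coef_, model.intercept_)  # slope close to 2, intercept close to 1
print(model.predict([[5.0]]))         # prediction close to 11
```

The fitted slope and intercept recover the relationship between `x` and `y`, which can then be used to predict `y` for unseen values of `x`.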
Decision Tree Regression
Suppose you have the following data about the number of bathrooms in a house and its price:
| Number of Bathrooms | Price |
You might infer from the data above that whenever a house has fewer than three bathrooms, its price is 10000; otherwise it is 30000. The same inference can be expressed in the following way.
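The inferred rule is simply a single if/else on the number of bathrooms. A minimal sketch, where the function name `predict_price` is hypothetical:

```python
def predict_price(num_bathrooms: int) -> int:
    """Hand-written version of the rule: one split at 3 bathrooms."""
    if num_bathrooms < 3:
        return 10000
    return 30000

for n in [1, 2, 3, 4, 7]:
    print(n, predict_price(n))
# 1 10000
# 2 10000
# 3 30000
# 4 30000
# 7 30000
```

Note that a house with 7 bathrooms still gets 30000, because the rule has no way to distinguish it from a house with 3 bathrooms.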
This is an example of a decision tree, albeit a very crude one with a depth of one. It has only two leaves. Because there is so little data, this tree would predict that a house with 7 bathrooms still costs 30000. Decision trees in scikit-learn can grow much deeper: a tree of depth 10 can have up to 2^10 = 1024 leaves, which lets it capture far finer patterns than this one (although very deep trees can overfit the training data). We used decision tree regression to predict house prices in Melbourne.
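The sketch below shows that scikit-learn's `DecisionTreeRegressor` with `max_depth=1` learns the same two-leaf tree. The bathroom/price rows are hypothetical toy data chosen to match the rule described above, since the original table values are not reproduced here. The second part fits a depth-10 tree on synthetic random data to show that its leaf count stays within the 2^10 = 1024 bound.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical toy data consistent with the rule in the text:
# fewer than 3 bathrooms -> 10000, otherwise -> 30000.
bathrooms = np.array([[1], [1], [2], [2], [3], [3], [4], [4]])
prices = np.array([10000, 10000, 10000, 10000, 30000, 30000, 30000, 30000])

# A depth-1 tree (a "decision stump") makes exactly one split and has two leaves.
stump = DecisionTreeRegressor(max_depth=1, random_state=0)
stump.fit(bathrooms, prices)

print(stump.get_n_leaves())      # 2
print(stump.tree_.threshold[0])  # 2.5, i.e. the split "bathrooms <= 2.5"
print(stump.predict([[7]]))      # [30000.]: 7 bathrooms lands in the right leaf

# A deeper tree on larger synthetic data: depth 10 allows at most 2**10 = 1024 leaves.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(5000, 1))
y = 1000 * X.ravel() + rng.normal(0, 500, size=5000)
deep = DecisionTreeRegressor(max_depth=10, random_state=0).fit(X, y)
print(deep.get_depth(), deep.get_n_leaves())  # depth <= 10, leaves <= 1024
```

The exact number of leaves in the deep tree depends on the data; `max_depth` only sets an upper bound.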
Please download the dataset from here.
The example below has comments at each step. Go through the code once and build your first ML model. The example was written in a Jupyter notebook, so ignore comments like #In
You can find out how to implement an ML model in Java here.
You can also see DanB's tutorial on Kaggle here.