## Prerequisites

You need Python, scikit-learn and pandas installed on your laptop. It is easier to install conda (for example through the Anaconda distribution), since it ships with the required libraries. Install Jupyter too; it really helps when writing Python code.

## Pandas

pandas is a popular Python library for exploring and manipulating data.
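As a quick illustration of what "explore and manipulate" means in practice, here is a minimal sketch. The DataFrame and its column names (`bathrooms`, `price`) are made up for this example and mirror the small table used later in this post.

```python
import pandas as pd

# Toy data (column names are assumptions chosen for this sketch).
houses = pd.DataFrame({
    "bathrooms": [1, 2, 3],
    "price": [10000, 10000, 30000],
})

# Explore: summary statistics (count, mean, min, max, ...) per numeric column.
summary = houses.describe()
print(summary)

# Manipulate: keep only the houses with fewer than three bathrooms.
small_houses = houses[houses["bathrooms"] < 3]
print(small_houses)
```

`describe()` is usually a good first call on any new dataset, since it quickly reveals the range and scale of each column.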

## Scikit-learn

scikit-learn is a popular machine learning framework for Python.

## Regression

Regression is the process of finding the relationship between one dependent variable and one or more independent variables. There are many regression techniques, such as linear regression, simple regression and ordinary least squares, to name a few.
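To make the idea concrete, here is a minimal sketch of a linear regression fitted by ordinary least squares with scikit-learn's `LinearRegression`. The data is made up for illustration and follows `y = 2x + 1` exactly, so the fitted slope and intercept should come out very close to 2 and 1.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data: x is the independent variable, y is the dependent variable.
x = np.array([[1.0], [2.0], [3.0], [4.0]])  # 2D: one row per sample
y = np.array([3.0, 5.0, 7.0, 9.0])          # generated from y = 2x + 1

# LinearRegression fits the line by ordinary least squares.
model = LinearRegression()
model.fit(x, y)

slope = model.coef_[0]
intercept = model.intercept_
print(slope, intercept)
```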

## Decision Tree Regression

Suppose you have the following data about the number of bathrooms in a house and its price:

| Number of Bathrooms | Price |
| --- | --- |
| 1 | 10000 |
| 2 | 10000 |
| 3 | 30000 |

You might infer from the data above that whenever the number of bathrooms in a house is less than three, the price is 10000; otherwise it is 30000. The same inference can be written as a single rule: if bathrooms < 3, predict 10000, else predict 30000.

This is an example of a decision tree, albeit a very crude one with a single level. Here, we have only two leaves. Because of the lack of data, the tree concludes that a house with 7 bathrooms will still cost 30000. Now consider that scikit-learn models can go up to depth 10, which allows up to 2^10 = 1024 leaves; given enough data, such a model will usually be more accurate than this one. We use Decision Tree Regression to predict house prices in Melbourne.
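The toy tree above can be reproduced with scikit-learn's `DecisionTreeRegressor`. This is a minimal sketch on the three-row table; the column names `bathrooms` and `price` are assumptions chosen for this example. It also shows that the tree cannot extrapolate: a house with 7 bathrooms lands in the same leaf as one with 3.

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# The three-row table from above (column names are assumptions).
houses = pd.DataFrame({
    "bathrooms": [1, 2, 3],
    "price": [10000, 10000, 30000],
})

# max_depth=1 limits the tree to a single split, i.e. two leaves.
tree = DecisionTreeRegressor(max_depth=1, random_state=0)
tree.fit(houses[["bathrooms"]], houses["price"])

# Predict for a house seen in training (2) and one far outside it (7).
query = pd.DataFrame({"bathrooms": [2, 7]})
predictions = tree.predict(query)
print(predictions)

# Number of leaves in the fitted tree.
n_leaves = tree.get_n_leaves()
print(n_leaves)

# Upper bound on leaves for a binary tree of depth 10.
max_leaves_depth_10 = 2 ** 10
print(max_leaves_depth_10)
```

The `max_depth` parameter is the knob that controls how many levels the tree may grow; leaving it unset lets the tree grow until every leaf is pure.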

## Dataset

Please download the dataset from here.

## Example

In the example shown below, comments are added at each step. Please go through the code once and build your first ML model. The example was written in a Jupyter notebook, so ignore comments like `#In[6]`.
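A minimal sketch of such a first model follows. It is an illustration under stated assumptions, not the original notebook: the small DataFrame is made-up stand-in data so the code runs on its own, and the column names (`Rooms`, `Bathroom`, `Landsize`, `Price`) and the file name in the comment are hypothetical and may differ from the dataset you download.

```python
import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Stand-in data (made up). With the real file you would load it instead, e.g.
# data = pd.read_csv("melbourne_housing.csv")  # hypothetical file name
data = pd.DataFrame({
    "Rooms": [2, 3, 4, 3, 2, 5, 4, 3],
    "Bathroom": [1, 1, 2, 2, 1, 3, 2, 1],
    "Landsize": [150, 200, 400, 350, 120, 600, 450, 220],
    "Price": [500000, 650000, 900000, 820000, 480000, 1300000, 950000, 640000],
})

# Drop rows with missing values (a no-op here, but real data often has gaps).
data = data.dropna()

# Choose the input columns (features) and the column to predict (target).
features = ["Rooms", "Bathroom", "Landsize"]
X = data[features]
y = data["Price"]

# Hold out part of the data to check the model on houses it has not seen.
train_X, val_X, train_y, val_y = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the decision tree on the training split only.
model = DecisionTreeRegressor(random_state=0)
model.fit(train_X, train_y)

# Predict prices for the held-out houses and measure the average error.
val_predictions = model.predict(val_X)
mae = mean_absolute_error(val_y, val_predictions)
print(mae)
```

Evaluating on a held-out split rather than on the training rows matters for decision trees in particular, because a fully grown tree can reproduce its training prices exactly.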

Find out how to implement an ML model in Java here.

You can also see DanB's tutorial on Kaggle here.