Commit d5d6794

edit README
1 parent 1dcd75f commit d5d6794

2 files changed: +9 -5 lines changed

KaggleTaxiTrip/README.md

Lines changed: 9 additions & 5 deletions
@@ -7,13 +7,17 @@ The first part is to analyze the dataframe and observe correlation between varia
 ![Distributions](https://github.com/alexattia/Data-Science-Projects/blob/master/KaggleTaxiTrip/pic/download.png)
 ![Rush Hour](https://github.com/alexattia/Data-Science-Projects/blob/master/KaggleTaxiTrip/pic/rush_hour.png)
 
-### Second part - Cleaning and feature selection
+### Second part - Clustering
+The goal of this playground is to predict the trip duration of test set. We know that some neighborhoods are more congested. So, I used K-Means to compute geo-clusters for pickup and drop off.
+![Cluster](https://github.com/alexattia/Data-Science-Projects/blob/master/KaggleTaxiTrip/pic/nyc_clusters.png)
+
+### Third part - Cleaning and feature selection
 I have found some odd long trips : one day trip with a mean spead < 1km/h.
 ![Outliners](https://github.com/alexattia/Data-Science-Projects/blob/master/KaggleTaxiTrip/pic/outliners.png)
 I have removed these outliners.
 
-I also added two features from the data available : Haversine distance and Manhattan distance.
+I also added features from the data available : Haversine distance, Manhattan distance, means for clusters, PCA for rotation.
 
-### Third part - Prediction
-I am currently using a Random Forest.
-Current Root Mean Squared Logarithmic error : 0.45
+### Forth part - Prediction
+I compared Random Forest and XGBoost.
+Current Root Mean Squared Logarithmic error : 0.391
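
The README change above mentions Haversine and Manhattan distance features and scores the model with Root Mean Squared Logarithmic Error. As a rough illustration of what those terms compute, here is a minimal sketch in plain Python. It is not the repository's code: the function names, the Earth radius constant, and the choice to build the Manhattan distance from two Haversine legs are all assumptions made for this example.

```python
import math

# Assumed constant: mean Earth radius in kilometres.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def manhattan_km(lat1, lon1, lat2, lon2):
    """Approximate grid distance: an east-west leg at the pickup latitude
    plus a north-south leg at the pickup longitude (hypothetical definition)."""
    east_west = haversine_km(lat1, lon1, lat1, lon2)
    north_south = haversine_km(lat1, lon1, lat2, lon1)
    return east_west + north_south


def rmsle(y_true, y_pred):
    """Root Mean Squared Logarithmic Error over two equal-length sequences."""
    squared = [(math.log1p(p) - math.log1p(t)) ** 2
               for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))
```

Because the metric compares `log1p` values, it penalises relative rather than absolute errors, which suits trip durations that range from a few minutes to several hours.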