Timeline for Nice real data sets for testing DBSCAN?
Current License: CC BY-SA 4.0
9 events
| when | what | by | license | comment | |
|---|---|---|---|---|---|
| May 12, 2020 at 4:56 | comment | added | C8H10N4O2 | Another repository of datasets that comes to mind is Movebank, a data repository of animal movement datasets. Clustering comes into play when trying to distinguish commuting from foraging, for example bats flying to a lake to forage. The site is here: datarepository.movebank.org | |
| May 11, 2020 at 0:13 | comment | added | math_lover | @C8H10N402: thanks for the suggestion. I've done the Iris dataset, it's super simple. Covid data turns out to be less evident! I was also hoping to compare strongly connected component algorithms such as Tarjan's with DBSCAN on a real dataset. This would require the dataset to be in the form of a graph, with distances between nodes (edge weights, for example). But I can't think of any real data sets that would take that form... | |
| May 10, 2020 at 6:02 | comment | added | C8H10N4O2 | In that case the Iris dataset would be a simpler start; there are two very clear clusters (two species will be in one cluster, the third in its own cluster). The Iris dataset is available on Kaggle and elsewhere. | |
| May 9, 2020 at 3:55 | comment | added | math_lover | @C8H10N402 : those data sets, as well as the ones on Kaggle, are excellent. However, I'm not sure what quantities I should be using to do cluster analysis. I thought about doing a 3-D cluster analysis to determine clusters of countries with many coronavirus cases using latitude, longitude, and total # cases. However, this would require me to define a distance function (a weighted combination of great-circle distance and the difference in total # cases), which is somewhat arbitrary. Any ideas for something simpler? | |
| May 7, 2020 at 21:20 | history | edited | C8H10N4O2 | CC BY-SA 4.0 | elaborated on COVID |
| May 7, 2020 at 12:45 | comment | added | math_lover | @C8H10N402 : I would love to use a Covid-19 dataset. Could you elaborate? | |
| May 7, 2020 at 8:11 | history | edited | ebrahimi | CC BY-SA 4.0 | edited body |
| May 7, 2020 at 5:33 | review | First posts | |||
| May 7, 2020 at 8:11 | |||||
| May 7, 2020 at 5:33 | history | answered | C8H10N4O2 | CC BY-SA 4.0 |
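
The distance function discussed in the May 9 comment (a weighted combination of great-circle distance and the difference in total cases) could look roughly like the sketch below. Everything here is an assumption for illustration: the function names, the weights `w_geo` and `w_cases`, the scale factors, and the sample points are hypothetical and not taken from the thread. Choosing the weights and scales is exactly the arbitrary part the comment worries about.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, used by the haversine formula


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def combined_distance(p, q, w_geo=1.0, w_cases=1.0, geo_scale=1000.0, case_scale=10000.0):
    """Weighted sum of scaled geographic distance and scaled case-count difference.

    p and q are (latitude, longitude, total_cases) tuples.
    The weights and scales are hypothetical knobs, not recommended values.
    """
    geo = great_circle_km(p[0], p[1], q[0], q[1]) / geo_scale
    cases = abs(p[2] - q[2]) / case_scale
    return w_geo * geo + w_cases * cases


def distance_matrix(points, **kwargs):
    """Full pairwise distance matrix as a list of lists."""
    return [[combined_distance(p, q, **kwargs) for q in points] for p in points]


# Made-up placeholder points (lat, lon, total_cases); not real data.
points = [(0.0, 0.0, 100), (0.0, 1.0, 150), (45.0, 90.0, 5000)]
matrix = distance_matrix(points)
```

A matrix like this can be handed to a DBSCAN implementation that accepts precomputed distances; scikit-learn's `DBSCAN`, for instance, takes `metric="precomputed"`. The `eps` threshold then has to be chosen in the same scaled units as `combined_distance`.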