9 events
when what by license comment
May 12, 2020 at 4:56 comment added C8H10N4O2 Another repository of datasets that comes to mind is Movebank, a data repository of animal movement datasets. Clustering comes into play when trying to distinguish commuting from foraging, for example bats flying to a lake to forage. The site is here: datarepository.movebank.org
May 11, 2020 at 0:13 comment added math_lover @C8H10N402: thanks for the suggestion. I've done the Iris dataset, it's super simple. Covid data turns out to be less straightforward! I was also hoping to compare strongly connected component clustering algorithms such as Tarjan's with DBSCAN on a real dataset. This would require the dataset to be in the form of a graph, with distances between nodes (edge weights, for example). But I can't think of any real datasets that take that form...
May 10, 2020 at 6:02 comment added C8H10N4O2 In that case the Iris dataset would be a simpler start; there are two very clear clusters (two species will be in one cluster, the third in its own cluster). The Iris dataset is available on Kaggle and elsewhere.
May 9, 2020 at 3:55 comment added math_lover @C8H10N402: those datasets, as well as the ones on Kaggle, are excellent. However, I'm not sure what quantities I should be using to do cluster analysis. I thought about doing a 3-D cluster analysis to determine clusters of countries with many coronavirus cases using latitude, longitude, and total # cases. However, this would require me to define a distance function (a weighted combination of great-circle distance and the difference in total # cases), which is somewhat arbitrary. Any ideas for something simpler?
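For illustration only, the weighted distance described in the comment above could be sketched roughly as follows. This is not code from the thread: the function names, the weights `w_geo` and `w_cases`, and the scaling constants `geo_scale` and `case_scale` are all hypothetical, and choosing them is exactly the arbitrary part the comment worries about.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, used by the haversine formula


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def combined_distance(a, b, w_geo=1.0, w_cases=1.0, geo_scale=1000.0, case_scale=10000.0):
    """Weighted sum of geographic distance and case-count difference.

    a and b are (latitude, longitude, total_cases) tuples.
    The weights and scales are assumptions made for this sketch, not
    recommended values; they put the two terms on comparable footing.
    """
    geo = great_circle_km(a[0], a[1], b[0], b[1]) / geo_scale
    cases = abs(a[2] - b[2]) / case_scale
    return w_geo * geo + w_cases * cases


def distance_matrix(points, **kwargs):
    """Full pairwise matrix of combined distances, as a list of lists."""
    return [[combined_distance(p, q, **kwargs) for q in points] for p in points]
```

A matrix built this way could then be handed to a clustering routine that accepts precomputed distances, for example scikit-learn's `DBSCAN` with `metric="precomputed"`.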
May 7, 2020 at 21:20 history edited C8H10N4O2 CC BY-SA 4.0
elaborated on COVID
May 7, 2020 at 12:45 comment added math_lover @C8H10N402 : I would love to use a Covid-19 dataset. Could you elaborate?
May 7, 2020 at 8:11 history edited ebrahimi CC BY-SA 4.0
edited body
May 7, 2020 at 5:33 review First posts
May 7, 2020 at 8:11
May 7, 2020 at 5:33 history answered C8H10N4O2 CC BY-SA 4.0