Chapter 4: Big Stimulus Sets
This chapter focuses on how to create big datasets by thinking like a data scientist. It begins with examples of impactful open-access datasets. It then teaches the basics of data scraping so that readers can create their own datasets, including an introduction to client-side web coding. The chapter concludes with a discussion of the ethical questions around data scraping and of current Open Science practices for making your datasets publicly available.
- Collection of large datasets
  - Scene Understanding Database (SUN) - 16,000+ scene images
  - ImageNet - 1.4+ million object images
  - THINGS Database - 26,000+ object images sampling most concrete objects
  - YouTube-8M - 8+ million video clips
  - Moments in Time - 1+ million short video clips
  - 10k US Adult Faces - 10,000+ diverse and naturalistic face images
  - Chicago Face Database - ~600 posed model face images
  - Human Motion Database - ~7000 video clips of actions
  - Quick Draw Dataset - 50+ million drawings of 345 object categories
  - Visual Experience Dataset - 240 hours of video from 58 people recording their daily lives
  - GazeCapture - 2.5+ million frames of eye gaze data from 1,450 people
  - TIMIT Acoustic-Phonetic Continuous Speech Corpus - speech data from 630 people reading phonetically varied sentences
  - Million Song Dataset - metadata for 1+ million songs
  - Choices13k Dataset - 13,000 risky choice problems
  - Google Books Dataset - 10+ million volumes of written text
  - Psych-101 - trial-by-trial data from 60,000 participants making 10 million choices in 160 psychology experiments
- Learn to code a webpage
  - Example HTML, JavaScript, and CSS code
- Learn to scrape from a website
- Learn about regular expressions
- Learn about Bayesian statistics
The list above collects big stimulus sets used in the field of psychology. I will update it as I come across more datasets.
A fantastic resource to learn HTML, JavaScript, and CSS is W3Schools.
I am also a big fan of HTML Goodies, which I used even back in the 90s (!!!) to learn how to make webpages.
If you are getting comfortable with coding, but need help making pretty webpages, HTML5 UP has beautiful free-to-use CSS templates (I used one for this site!)
Here is code for a simple webpage that combines HTML, JavaScript, and CSS to make a page where a button click changes the appearance of a piece of text. I coded this all in a simple text editor (Notepad++), in just a single file. To look at the code, either right-click the link and choose "Save As", or right-click on the page and choose "View Source".
Here is simple code I wrote in MATLAB to scrape flights from Skyscanner. It uses Java libraries, and you could easily adapt it to any other programming language (in fact, scraping is not a very common use of MATLAB). The code takes the point-and-click approach: it moves the mouse and sends keyboard commands to navigate the site automatically.
Beautiful Soup is a very helpful Python library for scraping data from the web: it parses a page's HTML so that you can navigate its structure and pull out the pieces you need.
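To give a feel for what this looks like, here is a minimal sketch of Beautiful Soup pulling names and links out of a list. It is only an illustration: the HTML string, the `datasets` id, the `example.org` URLs, and the `extract_datasets` helper are all made up for this example, and a real scraper would first download the page (and check that the site permits scraping). It assumes the `beautifulsoup4` package is installed.

```python
from bs4 import BeautifulSoup

# A tiny, made-up page standing in for HTML you would normally download.
# The id and the example.org links are hypothetical.
HTML = """
<html><body>
  <h1>Big stimulus sets</h1>
  <ul id="datasets">
    <li><a href="https://example.org/sun">Scene Understanding Database (SUN)</a> - 16,000+ scene images</li>
    <li><a href="https://example.org/things">THINGS Database</a> - 26,000+ object images</li>
  </ul>
</body></html>
"""


def extract_datasets(html):
    """Return one dict per list item: link text, link target, and full item text."""
    # "html.parser" is Python's built-in parser, so no extra parser is needed.
    soup = BeautifulSoup(html, "html.parser")
    dataset_list = soup.find(id="datasets")
    rows = []
    for item in dataset_list.find_all("li"):
        link = item.find("a")
        rows.append({
            "name": link.get_text(strip=True),
            "url": link["href"],
            "text": item.get_text(" ", strip=True),
        })
    return rows


if __name__ == "__main__":
    for row in extract_datasets(HTML):
        print(row["name"], "->", row["url"])
```

The same pattern (find a container, loop over its children, read text and attributes) carries over to most pages; only the tags and ids you search for change.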
Here is an online tutorial by Jan Goyvaerts on how to make and understand regular expressions. Here is a debugger that can help you generate and test your regular expressions.
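As a small taste of what a regular expression can do, here is a sketch that reads lines shaped like the dataset list at the top of this chapter ("name - size description") and pulls out the name and the first number. The pattern and the `parse_size` helper are my own illustration, not taken from the tutorial, and they only handle this one line format.

```python
import re

# Pattern, piece by piece:
#   ^(?P<name>.+?)              the dataset name (as short as possible)
#    -                          the " - " separator
#   ~?                          an optional "approximately" sign
#   (?P<num>\d[\d,]*(?:\.\d+)?) a number such as 600, 16,000 or 1.4
#   \+?                         an optional "+" ("or more")
#   (?P<million> million)?      an optional " million" multiplier
SIZE_PATTERN = re.compile(
    r"^(?P<name>.+?) - ~?(?P<num>\d[\d,]*(?:\.\d+)?)\+?(?P<million> million)?"
)


def parse_size(line):
    """Return (name, number) for a 'name - size ...' line, or None if it does not fit."""
    match = SIZE_PATTERN.match(line)
    if match is None:
        return None
    number = float(match.group("num").replace(",", ""))
    if match.group("million"):
        number *= 1_000_000
    return match.group("name"), int(round(number))


if __name__ == "__main__":
    for line in [
        "ImageNet - 1.4+ million object images",
        "Chicago Face Database - ~600 posed model face images",
        "TIMIT Acoustic-Phonetic Continuous Speech Corpus - speech data from 630 people",
    ]:
        print(parse_size(line))
```

Note that the pattern only grabs the first number after the separator and does not interpret units, so a line whose description starts with words (like the TIMIT line) simply does not match. A debugger like the one linked above is a good way to see why a given line matches or fails.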
Here is the textbook Statistical Rethinking by Prof. Richard McElreath on how to think using a more Bayesian framework.
There is also a podcast that teaches Bayesian statistics (use at your own risk - I have not tried this yet!)