Skip to content
This repository was archived by the owner on May 5, 2023. It is now read-only.

peterldowns/clickbait-classifier

Repository files navigation

Clickbait Classifier

This is a very simple attempt at classifying article titles into one of two groups: "clickbait" (a la Buzzfeed and Clickhole) or "news" (a la The New York Times). I was curious if this could be done accurately; I can't think of a good definition for "clickbait" but I know it when I see it.

Setup

poetry

If you have poetry installed, you shouldn't have to do a thing. You can install all necessary dependencies and run the demos with poetry run:

# train the classifier and show the top features poetry run python -m clickbait_classifier.classifier # enter an interactive classifier loop poetry run python -m clickbait_classifier.interactive

pip

If you don't use poetry, you can create a virtualenv, install the dependencies, and then run the code with pip:

python -m venv venv source venv/bin/activate pip install -r requirements.txt python -m clickbait_classifier.classifier python -m clickbait_classifier.interactive

nix

If you have nix, you can use nix-shell or nix develop or direnv or lorri to get all the necessary dependencies, including Poetry.

If you use flakes, you can run the demos without installing anything:

# train the classifier and show the top features nix run github:peterldowns/clickbait-classifier#classifier # enter an interactive classifier loop nix run github:peterldowns/clickbait-classifier#interactive

Usage

The code is pretty messy, but the general idea is that there is some article data in the data/ directory, and classifier.py uses this for training. You can download more data from Buzzfeed and Clickhole using the tools in scripts/.

python ./scripts/scrape_buzzfeed.py > ./clickbait_classifier/data/buzzfeed2.json python ./scripts/scrape_clickhole.py > ./clickbait_classifier/data/clickhole2.json 

If you feel like testing a few article titles, you can get a simple testing loop like so:

python ./clickbait_classifier/interactive.py

This will load the classifier, train it, and then present you with a simple loop where you can paste in article titles and see the results. You can quit using c-C. For example:

clickbait-classifier/ $ ./interactive.py Loading classifier (may take time to train.) Classification report:  precision recall f1-score support  clickbait 0.91 0.62 0.74 172  news 0.90 0.98 0.94 621 avg / total 0.91 0.91 0.90 793  -9.0500 10 things -5.3044 new  -9.0500 11 things -5.7492 bush  -9.0500 13 times -5.8460 overview  -9.0500 15 times -5.9519 iraq  -9.0500 19 puppies -5.9645 war  -9.0500 2014 -5.9828 president  -9.0500 2015 -5.9852 clinton  -9.0500 21 -6.1021 special  -9.0500 23 life -6.1206 nation  -9.0500 24 -6.1464 report  -9.0500 25 -6.1778 campaign  -9.0500 27 -6.2223 china  -9.0500 33 -6.2880 york  -9.0500 35 -6.2880 new york  -9.0500 90s -6.2994 plan  -9.0500 90s kid -6.3191 special report  -9.0500 90s kids -6.3523 says  -9.0500 90s kids rejoice -6.4277 big  -9.0500 90s sitcom -6.4423 challenged  -9.0500 absolute -6.4465 house Done. Article title: 43 Reasons 2014 Was The Best Year Ever To Be A Nerd (95.13% clickbait, 4.87% news) -> clickbait Article title: Protesters And Police Clash In Missouri For A Second Night (19.32% clickbait, 80.68% news) -> news Article title: 29 Christmas Vines That Will Make You Laugh Every Time (88.25% clickbait, 11.75% news) -> clickbait Article title: New Subprime Boom Ties Risky Loans to Car Titles (10.98% clickbait, 89.02% news) -> news Article title: ^C

About

Simple ML experiment to classify article titles as clickbait or news.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors