
I have a data set consisting of census data (age, sex, employment type, race, education level etc.). My task is to write an algorithm that predicts whether a data point (30, male, white etc.) will have a gross annual income of above $50k.

So far I implemented a KNN algorithm that runs for 30 hours but achieves ~90% accuracy on the test data. I was hoping to achieve higher accuracy using an SVM, Naive Bayes, or anything else that might work here.

I'm looking for an algorithm that will be relatively simple to implement (about as hard as KNN) in Python and is likely to achieve good accuracy. What is the best choice in this case? If KNN is the best choice, which algorithm would be easiest to implement for comparison purposes?

  • If the prediction is continuous, why not try regression? Commented May 1, 2019 at 4:27
  • I only need to predict whether it's above or below 50k, rather than giving an estimate, so it can be treated as a classification problem. Also, I have the income for the test data as "<=50k" or ">50k". Commented May 1, 2019 at 4:38
  • You could always try a simple NN with Keras. Here's a quick starter post. Commented May 1, 2019 at 5:08
  • Try Random Forest and ensemble methods. Commented May 1, 2019 at 5:09

1 Answer


It is hard to tell a priori which algorithm will perform better. For traditional classification tasks such as yours, random forests, gradient boosted machines, and SVMs usually give the best results.

I'm not sure what you mean by an algorithm that is "relatively simple to implement", but if you use scikit-learn, a lot of algorithms are already implemented, and fitting one takes a line or two of code, so you can try them all!
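For example, here's a rough sketch of comparing several scikit-learn classifiers. With your census data you would load the CSV and one-hot encode the categorical columns first; a synthetic dataset stands in here so the snippet is self-contained:

```python
# Sketch: comparing several scikit-learn classifiers on a binary task.
# A synthetic dataset stands in for the census data (which would need
# its categorical columns encoded, e.g. with OneHotEncoder).
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=12, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
    "SVM": SVC(),
    "naive Bayes": GaussianNB(),
}
for name, model in models.items():
    model.fit(X_train, y_train)        # one line to train...
    acc = model.score(X_test, y_test)  # ...and one to evaluate accuracy
    print(f"{name}: {acc:.3f}")
```

All of these should train in minutes rather than hours on a census-sized dataset, which also makes it cheap to compare them against your KNN baseline.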
