PyOD is a Python-based toolkit for identifying anomalies in data with unsupervised and supervised approaches. The toolkit consists of two major functionalities:
- Individual Algorithms
  - Local Outlier Factor (wrapper of the sklearn implementation)
  - Isolation Forest (wrapper of the sklearn implementation)
  - One-Class Support Vector Machines (wrapper of the sklearn implementation)
- KNN Outlier Detection (implemented)
- Average KNN Outlier Detection (implemented)
- Median KNN Outlier Detection (implemented)
- Global-Local Outlier Score From Hierarchies (implemented)
- More to add
- Ensemble Framework
- Feature bagging
- More to add
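The kNN-based detectors listed above all derive an outlier score from the distances to a point's nearest neighbors: the distance to the k-th neighbor ("largest"), or the mean or median of the k nearest distances. Below is a minimal numpy-only sketch of that idea; the function name and details are illustrative, not PyOD's actual implementation:

```python
import numpy as np

def knn_outlier_scores(X, k=5, method="largest"):
    """Outlier score per point: distance to its k-th nearest neighbor
    ("largest"), or the mean/median of its k nearest-neighbor distances.
    Illustrative sketch only, not PyOD's implementation."""
    X = np.asarray(X, dtype=float)
    # pairwise Euclidean distances (n x n)
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    np.fill_diagonal(d, np.inf)          # ignore each point's self-distance
    knn = np.sort(d, axis=1)[:, :k]      # k smallest distances per point
    if method == "largest":
        return knn[:, -1]
    if method == "mean":
        return knn.mean(axis=1)
    return np.median(knn, axis=1)        # "median"

# a tight cluster plus one obvious outlier far away
rng = np.random.RandomState(42)
X = np.vstack([rng.randn(20, 2), [[8.0, 8.0]]])
scores = knn_outlier_scores(X, k=3)
print(scores.argmax())  # → 20 (the far-away point gets the largest score)
```

Points in dense regions have small neighbor distances and hence low scores; isolated points score high, which is exactly the intuition behind the kNN family of detectors.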
Before using the toolkit, please be advised that it is intended for quick exploration only. Its output should be treated with caution, and fine-tuning may be needed to produce meaningful results. I would recommend using it as a first-step data exploration tool, then building your own model or reusing this one to get more accurate results.
"example.py" is an example to demo the basic API of PyOD. It first generate some sample data to run. normal data is generated by a 2-d gaussian distribution, and outliers are generated by a 2-d uniform distribution.
```python
# percentage of outliers
contamination = 0.1
n_train = 1000
n_test = 500

# generate sample data
X_train, y_train, c_train, X_test, y_test, c_test = generate_data(
    n=n_train, contamination=contamination, n_test=n_test)
```

Then it initializes the classifier, fits the model, and makes predictions.
```python
# train a HBOS detector
clf = Hbos(contamination=0.1)
clf.fit(X_train)

# get the outlier scores of the training data
y_train_pred = clf.y_pred
y_train_score = clf.decision_scores

# make the prediction on the test data
y_test_pred = clf.predict(X_test)
y_test_score = clf.decision_function(X_test)
```

The evaluation of the results is generated by:
```python
print('Precision@n on train data is', get_precn(y_train, y_train_score))
print('ROC on train data is', roc_auc_score(y_train, y_train_score))
print('Precision@n on test data is', get_precn(y_test, y_test_score))
print('ROC on test data is', roc_auc_score(y_test, y_test_score))
```

Here is a sample output:
```
Precision@n on train data is 0.78
ROC on train data is 0.9360
Precision@n on test data is 0.8780
ROC on test data is 0.9872
```
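Precision@n ranks all points by their outlier score and measures what fraction of the n highest-scoring points are true outliers, where n is the number of true outliers in the labels. A minimal sketch of one common definition follows; PyOD's `get_precn` may differ in details, and the function below is illustrative only:

```python
import numpy as np

def precision_at_n(y_true, scores):
    """Among the n highest-scoring points, the fraction that are true
    outliers, with n = number of true outliers in y_true.
    Illustrative sketch; PyOD's get_precn may differ in details."""
    y_true = np.asarray(y_true)
    n = int(y_true.sum())                    # number of true outliers
    top_n = np.argsort(scores)[::-1][:n]     # indices of the n largest scores
    return y_true[top_n].mean()

# toy example: 2 true outliers; the top-2 scores catch one of them
y_true = np.array([0, 0, 0, 1, 1])
scores = np.array([0.1, 0.3, 0.9, 0.8, 0.7])
print(precision_at_n(y_true, scores))  # → 0.5
```

Unlike ROC AUC, which evaluates the full ranking, precision@n only looks at the top of the ranking, which is often what matters when a fixed budget of candidate anomalies will be inspected.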
