GitHub - visual-layer/fastdup at 53fa4ed6396672886fdb78727aa0d107ebc69dde


Manage, Clean & Curate Visual Data - Fast and at Scale.

An unsupervised and free tool for image and video dataset analysis.

🚀 Introducing VL Profiler! 🚀 We're excited to announce our new cloud product, VL Profiler. It's designed to help you gain deeper insights and enhance your productivity while using fastdup. With VL Profiler, you can visualize your data, track changes over time, and much more. 👉 Check out VL Profiler here 👈

Note: VL Profiler is a separate commercial product developed by the same team behind fastdup. Our goal with VL Profiler is to provide additional value to our users while continuing to support and maintain fastdup as a free, open-source project. We'd love for you to give VL Profiler a try and share your feedback with us! Sign up now; it's free.

What's included in fastdup

fastdup handles both labeled and unlabeled image and video datasets, helping you discover potential quality issues and providing additional functionality.

Why fastdup?

Quality: Find and remove anomalies and outliers from your dataset, including duplicates and similar images and videos at a large scale.
Cost: Reduce data operation costs by intelligently sampling high-quality or novel datasets before labeling and assessing labeled data quality.
Scale: fastdup's C++ graph engine is highly efficient and can handle up to 400M images on a single CPU machine.

Setting up

Prerequisites

Supported Python versions:

Supported operating systems:

Installation

Option 1 - Install fastdup via PyPI:

# upgrade pip to its latest version
pip install -U pip
# install fastdup
pip install fastdup
# Alternatively, use explicit python version (XX)
python3.XX -m pip install fastdup

Option 2 - Install fastdup via an Ubuntu 20.04 Docker image on DockerHub:

docker pull karpadoni/fastdup-ubuntu-20.04

Detailed installation instructions and common errors here.
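
After installing, it can be useful to confirm that the package is visible to the interpreter you plan to use. The following is a minimal sketch using only the Python standard library. The helper name installed_fastdup_version is hypothetical, and the only fastdup-specific assumption is the distribution name "fastdup" taken from the pip command above.

```python
from importlib import metadata
from typing import Optional


def installed_fastdup_version() -> Optional[str]:
    """Return the installed fastdup version string, or None if it is not installed."""
    try:
        # "fastdup" is the PyPI distribution name used in the pip command above.
        return metadata.version("fastdup")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_fastdup_version()
    if version is None:
        print("fastdup is not installed in this environment")
    else:
        print(f"fastdup {version} is installed")
```

Running the script with the same interpreter used for the install (for example python3.XX) checks that specific environment rather than whichever Python is first on the PATH.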

Getting Started

Run fastdup with only 3 lines of code.

Visualize the result.

In short, you'll need 3 lines of code to run fastdup:

import fastdup
fd = fastdup.create(input_dir="IMAGE_FOLDER/")
fd.run()

And 5 lines of code to visualize issues:

fd.vis.duplicates_gallery()    # create a visual gallery of duplicates
fd.vis.outliers_gallery()      # create a visual gallery of anomalies
fd.vis.component_gallery()     # create a visualization of connected components
fd.vis.stats_gallery()         # create a visualization of image statistics (e.g. blur)
fd.vis.similarity_gallery()    # create a gallery of similar images
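
The two snippets above can be combined into one small script. The following is a sketch, not an official entry point: the function name run_and_visualize and the up-front directory check are illustrative additions, while every fastdup call is taken verbatim from the snippets above. The fastdup import is deferred so that a bad path is reported before the library is loaded.

```python
from pathlib import Path


def run_and_visualize(input_dir: str) -> None:
    """Run fastdup on a folder of images, then build the five galleries."""
    # Illustrative guard (not part of fastdup): fail early on a bad path.
    if not Path(input_dir).is_dir():
        raise NotADirectoryError(f"input_dir is not a directory: {input_dir}")

    # Deferred import: the path check above works even if fastdup is missing.
    import fastdup

    # The three lines that run the analysis.
    fd = fastdup.create(input_dir=input_dir)
    fd.run()

    # The five lines that visualize issues.
    fd.vis.duplicates_gallery()   # duplicates
    fd.vis.outliers_gallery()     # anomalies
    fd.vis.component_gallery()    # connected components
    fd.vis.stats_gallery()        # image statistics (e.g. blur)
    fd.vis.similarity_gallery()   # similar images


if __name__ == "__main__":
    run_and_visualize("IMAGE_FOLDER/")
```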

View the API docs here.

Learn from Examples

Quick Dataset Analysis: In this example, learn how to quickly analyze a dataset for potential issues. Identify duplicates, outliers, and dark/bright/blurry images, and cluster similar images with only a few lines of code. If you're new, start here.

DINOv2 Embeddings: In this example, learn how to use DINOv2 models to visualize image embeddings of your dataset. Runs on CPU!

Cleaning Image Dataset: In this tutorial, learn how to clean a dataset of broken images, duplicates, and outliers, and identify dark/bright/blurry images.

Analyzing Labeled Image Classification Dataset: In this tutorial, learn how to analyze a labeled image classification dataset for potential issues. We use the Imagenette dataset, a 10-class, 13k-image subset of ImageNet, as a working example.

Analyzing Labeled Object Detection Dataset: In this tutorial, learn how to load and analyze an object detection dataset with labeled bounding boxes and classes. We use the mini-coco dataset as a working example. Learn how to discover duplicates, outliers, and possibly mislabeled bounding boxes.

Analyzing Hugging Face Datasets: In this tutorial, learn how to load and analyze datasets from Hugging Face Datasets.

Advanced Features

The following are advanced features of fastdup that are still in beta testing. Sign up for free to be a beta tester and get early access. Drop us an email at info@visual-layer.com.

Face Detection Video Analysis: In this tutorial, learn how to use fastdup with a face detection model to detect and crop faces from videos. We then analyze the cropped faces for issues such as duplicates, near-duplicates, outliers, and bright/dark/blurry faces.

YOLOv5 Object Detection Video Analysis: In this tutorial, learn how to use fastdup with a pre-trained YOLOv5 object detection model to detect and crop objects from videos. We then analyze the cropped objects for issues such as duplicates, near-duplicates, outliers, and bright/dark/blurry objects.

Satellite Image Analysis: In this tutorial, learn how to use fastdup to load 16-bit grayscale satellite images, work with rotated bounding boxes, understand your dataset, find issues with the data, and check the quality of annotations.

Surveillance Camera Analysis: In this tutorial, learn how to use fastdup to analyze surveillance camera videos, caption the activity inside the videos, and detect indoor/outdoor scenes.

Image Search: In this tutorial, learn how to use fastdup to search through large image datasets for duplicates/similar images using a query image. Runs on CPU!

Feature vectors: In this tutorial, learn how to read fastdup-generated feature vectors in Python and use them for downstream processing, or run fastdup on your own precomputed feature vectors.

Getting Help

Get help from the fastdup team or community members via the following channels:

Slack.
GitHub issues.
Discussion forum.

Community Contributions

The following are community-contributed blog posts about fastdup -

What our users say

License

fastdup is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Public License.

See LICENSE.

For any queries, reach us at info@visual-layer.com.

Disclaimer

Usage Tracking

We have added experimental crash report collection using sentry.io. It does not collect user data other than anonymized IP address data, and it logs only the fastdup library's own actions. We do NOT collect folder names, user names, image names, or image content; we collect only aggregate performance statistics such as the total number of images, average runtime per image, total free memory, total free disk space, and number of cores. Collecting fastdup crashes will help us improve stability.

The code for the data collection is found here. On macOS, we use Google Crashpad.

It is always possible to opt out of the experimental crash report collection via either of the following two options:

Define an environment variable called SENTRY_OPT_OUT, or
Call run() with turi_param='run_sentry=0'.
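
The first option can also be applied from inside a Python script. The sketch below rests on two assumptions: the text above only says the variable must be defined, so the value "1" is an arbitrary choice, and setting the variable before fastdup is imported is a precaution rather than a documented requirement. The helper name opt_out_of_crash_reports is hypothetical.

```python
import os


def opt_out_of_crash_reports() -> None:
    """Define SENTRY_OPT_OUT for the current process and its children."""
    # Assumption: only the presence of the variable matters, so "1" is arbitrary.
    # setdefault keeps any value the user has already exported in the shell.
    os.environ.setdefault("SENTRY_OPT_OUT", "1")


# Call this before importing fastdup (assumed to be the safest ordering).
opt_out_of_crash_reports()
```

Exporting the variable in the shell before launching Python has the same effect and does not require changing any code.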

About Visual-Layer

fastdup was founded by the authors of XGBoost, Apache TVM, and Turi Create: Danny Bickson, Carlos Guestrin, and Amir Alush.

Learn more about Visual Layer here.
