An open-source framework for detecting, redacting, masking, and anonymizing sensitive data (PII) across text, images, and structured data. Supports NLP, pattern matching, and customizable pipelines.
Updated Dec 4, 2025 - Python
ARX is a comprehensive open-source data anonymization tool designed for scalability and usability. It supports a range of anonymization techniques, methods for analyzing data quality and re-identification risk, and well-known privacy models such as k-anonymity, l-diversity, t-closeness, and differential privacy.
Mediapipe-based library to redact faces from videos and images
A curated list of resources related to privacy engineering
Example scripts that showcase how to use Private AI Text to de-identify, redact, hash, tokenize, mask, and synthesize PII in text.
Baseline Recipe for VoicePrivacy Challenge 2022: anonymization systems and evaluation software
De-identify people's names and gender-specific pronouns
DICOM gateway for publishing images in Kheops and for de-identification
A pipeline to identify (and remove) certain sequences from raw genomic data. Default taxon to identify (and remove) is Homo sapiens. Removal is optional.
A Python client for interacting with the Private AI API
Masking identifiable information in health-related documents.
CliniDeID automatically de-identifies clinical text notes according to the HIPAA Safe Harbor method. It accurately finds identifiers and tags or replaces them with realistic surrogates for better anonymity.
Application of our De-identification Framework with open source technologies, enabling enterprises to take ownership of the de-identification process and deploy it in trusted environments.
PII Anonymizer service based on Python with FastAPI
A pre-commit hook to check for PII in your code.
Named entity recognition framework
Pseudonymization library
Source code for the paper "Generating Synthetic Training Data for Supervised De-Identification of Electronic Health Records" in Future Internet (2021).
A named-entity-recognition (NER) based anonymizer for archival document metadata.
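Several of the projects listed above mention pattern matching for masking PII and privacy models such as k-anonymity. The following is a minimal Python sketch of both ideas, not the implementation of any listed project. The names `PATTERNS`, `redact`, and `is_k_anonymous` are hypothetical, and the two regular expressions are deliberately simplistic assumptions chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately simple patterns (assumption for illustration).
# Real de-identification tools combine many more patterns with NLP models.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def redact(text):
    """Replace every match of each pattern with a bracketed label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def is_k_anonymous(rows, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    is shared by at least k rows."""
    groups = Counter(
        tuple(row[q] for q in quasi_identifiers) for row in rows
    )
    return all(count >= k for count in groups.values())


if __name__ == "__main__":
    print(redact("Contact jane@example.com or 555-123-4567."))

    rows = [
        {"zip": "021**", "age": "30-39", "dx": "flu"},
        {"zip": "021**", "age": "30-39", "dx": "asthma"},
        {"zip": "946**", "age": "40-49", "dx": "flu"},
    ]
    # The third row is alone in its (zip, age) group, so k=2 fails.
    print(is_k_anonymous(rows, ["zip", "age"], k=2))
```

Pattern-based masking of this kind misses anything the expressions do not anticipate, and k-anonymity alone does not protect against attribute disclosure, which is why tools such as ARX also offer l-diversity and t-closeness.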