tiendc/go-profanity-out

Profanity detection library

This project is inspired by github.com/TwiN/go-away and github.com/finnbear/moderation, and it uses language data from those libraries (with modifications).

This project is still in development, and more tests are needed to verify its accuracy. That said, it can already produce good results, so you may use it in your work.

Highlights

  • Fully supports Unicode
  • Uses a radix tree to improve performance
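
As a rough illustration of why a prefix tree helps, here is a minimal Go sketch of a rune-keyed trie (not the library's code): each dictionary word is inserted once, and the input is scanned by walking the tree from each position instead of testing every word separately. A real radix tree additionally compresses chains of single-child nodes into one edge; that step is left out here. Keying nodes by `rune` rather than byte keeps multi-byte characters such as `Ü` whole. `Trie`, `Match`, `ScanAll` and `Censor` are hypothetical names chosen for this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// node is one level of a rune-keyed trie. A radix tree would additionally
// merge chains of single-child nodes into one edge; that is omitted here.
type node struct {
	children map[rune]*node
	terminal bool // a dictionary word ends at this node
}

func newNode() *node { return &node{children: make(map[rune]*node)} }

// Trie is a hypothetical dictionary index, not the library's type.
type Trie struct{ root *node }

// NewTrie builds a trie from lowercased dictionary words.
func NewTrie(words ...string) *Trie {
	t := &Trie{root: newNode()}
	for _, w := range words {
		n := t.root
		for _, r := range strings.ToLower(w) { // iterates runes, not bytes
			child, ok := n.children[r]
			if !ok {
				child = newNode()
				n.children[r] = child
			}
			n = child
		}
		n.terminal = true
	}
	return t
}

// Match is a half-open range [Start, End) of rune indexes in the input.
type Match struct{ Start, End int }

// ScanAll walks the trie from every input position and records the
// longest dictionary word starting there.
func (t *Trie) ScanAll(text string) []Match {
	runes := []rune(strings.ToLower(text))
	var matches []Match
	for i := range runes {
		n, end := t.root, -1
		for j := i; j < len(runes); j++ {
			next, ok := n.children[runes[j]]
			if !ok {
				break
			}
			n = next
			if n.terminal {
				end = j + 1
			}
		}
		if end > 0 {
			matches = append(matches, Match{i, end})
		}
	}
	return matches
}

// Censor masks every rune covered by a match. strings.ToLower maps rune by
// rune, so indexes from ScanAll line up with the original text's runes.
func Censor(text string, matches []Match, mask rune) string {
	runes := []rune(text)
	for _, m := range matches {
		for k := m.Start; k < m.End; k++ {
			runes[k] = mask
		}
	}
	return string(runes)
}

func main() {
	text := "Damn, that crap is Ünicode-safe"
	t := NewTrie("damn", "crap")
	fmt.Println(t.ScanAll(text))
	fmt.Println(Censor(text, t.ScanAll(text), '*'))
}
```

Running it prints the rune ranges `[{0 4} {11 15}]` and the masked text `****, that **** is Ünicode-safe`. Each start position costs at most one walk down the tree, bounded by the length of the longest dictionary word rather than by the number of words.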

Installation

go get github.com/tiendc/go-profanity-out

How to use

    import (
        profanityout "github.com/tiendc/go-profanity-out"
        profanityDataEN "github.com/tiendc/go-profanity-out/data/en"
    )

    detector := profanityout.NewProfanityDetector().
        WithProfaneWords(profanityDataEN.DefaultProfanities).          // required
        WithFalsePositiveWords(profanityDataEN.DefaultFalsePositives). // required
        WithSuspectWords(profanityDataEN.DefaultSuspects).             // required
        WithLeetSpeakCharacters(profanityDataEN.LeetSpeakCharacters).  // required
        WithSpecialCharacters(profanityDataEN.SpecialCharacters).      // required
        WithWildcardCharacters(profanityDataEN.WildcardCharacters).    // required
        WithSanitizeLeetSpeak(true).                                   // default: true
        WithSanitizeSpecialCharacters(true).                           // default: true
        WithSanitizeSpaces(true).                                      // default: true
        WithSanitizeRepeatedCharacters(true).                          // default: true
        WithSanitizeWildcardCharacters(true).                          // default: true
        WithSanitizeAccents(true).                                     // default: true
        WithProcessInputAsHTML(false).                                 // default: false
        WithConfidenceCalculator(calculator).                          // default: built-in
        WithCensorCharacter('*')                                       // default: *

    // Scan for at most one profanity (the result may also contain suspect words and/or false positives found)
    matches := detector.ScanProfanity("fuck this $h!!t") // profane: true

    // Scan for all profanities
    allMatches := detector.ScanAllProfanities("fuck this $h!!t") // profane: true

    // Censor the profanities
    res, censorMatches := detector.Censor("fuck this $h!!t") // res == "**** this *****"

    // Effect of each option (ScanProfanity below is shorthand for calling it
    // on a detector configured with the option set as noted)

    // WithSanitizeLeetSpeak: true
    ScanProfanity("$h!t") // profane: true
    // WithSanitizeLeetSpeak: false
    ScanProfanity("$h!t") // profane: false

    // WithSanitizeSpecialCharacters: true
    ScanProfanity("sh_it") // profane: true
    // WithSanitizeSpecialCharacters: false
    ScanProfanity("sh_it") // profane: false

    // WithSanitizeSpaces: true
    ScanProfanity("f u c k") // profane: true
    // WithSanitizeSpaces: false
    ScanProfanity("f u c k") // profane: false

    // WithSanitizeRepeatedCharacters: true
    ScanProfanity("fuuuuck") // profane: true
    // WithSanitizeRepeatedCharacters: false
    ScanProfanity("fuuuuck") // profane: false

    // WithSanitizeWildcardCharacters: true
    // NOTE: wildcard characters can appear in the input, in the profanity dictionary, or in both
    ScanProfanity("f**k") // profane: true
    WithProfaneWords([]string{"f*ck"}).ScanProfanity("fxck") // profane: true
    WithProfaneWords([]string{"*fuck*"}).ScanProfanity("xfuckx") // profane: true
    // WithSanitizeWildcardCharacters: false
    ScanProfanity("f**k") // profane: false
    ScanProfanity("fxck") // profane: false
    ScanProfanity("xfuckx") // profane: false

    // WithSanitizeAccents: true
    ScanProfanity("fúck") // profane: true
    // WithSanitizeAccents: false
    ScanProfanity("fúck") // profane: false

    // WithProcessInputAsHTML: true
    ScanProfanity("<ock") // profane: true
    // WithProcessInputAsHTML: false
    ScanProfanity("<ock") // profane: false
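
The sanitization options above can be pictured as a normalization pass that runs before dictionary lookup. The sketch below is a minimal illustration under assumptions, not the library's implementation: `Normalize` is a hypothetical helper, and its leet-speak and accent tables are tiny hard-coded samples standing in for the data the README loads from `data/en`.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Hypothetical sample tables; the library loads its own from data/en.
var leet = map[rune]rune{
	'@': 'a', '4': 'a', '3': 'e', '1': 'i', '!': 'i',
	'0': 'o', '$': 's', '5': 's', '7': 't',
}

var accents = map[rune]rune{
	'á': 'a', 'à': 'a', 'â': 'a', 'ä': 'a',
	'é': 'e', 'è': 'e', 'ê': 'e', 'ë': 'e',
	'í': 'i', 'ì': 'i', 'î': 'i', 'ï': 'i',
	'ó': 'o', 'ò': 'o', 'ô': 'o', 'ö': 'o',
	'ú': 'u', 'ù': 'u', 'û': 'u', 'ü': 'u',
}

// Normalize is a hypothetical helper that applies the sanitization steps
// in one pass. Leet-speak runs before special characters are dropped,
// because '$', '@' and '!' are themselves leet substitutes.
func Normalize(s string) string {
	var b strings.Builder
	prev := rune(-1)
	for _, r := range strings.ToLower(s) {
		if a, ok := accents[r]; ok { // accents: 'â' -> 'a'
			r = a
		}
		if l, ok := leet[r]; ok { // leet-speak: '$' -> 's'
			r = l
		}
		if !unicode.IsLetter(r) && !unicode.IsDigit(r) { // spaces, special characters
			continue
		}
		if r == prev { // repeated characters: "aaa" -> "a"
			continue
		}
		b.WriteRune(r)
		prev = r
	}
	return b.String()
}

func main() {
	for _, in := range []string{"D@@mn", "d a m n", "dâmn", "d_a-m.n", "dààààmn"} {
		fmt.Printf("%-10q -> %q\n", in, Normalize(in))
	}
}
```

Every input in `main` normalizes to `"damn"`. Collapsing repeated characters also shortens legitimate double letters (`"coffee"` becomes `"cofe"`), so in this scheme the dictionary words would have to be normalized the same way. A fuller accent step would decompose characters (Unicode NFD) and drop the combining marks instead of using a fixed table. The sketch also drops the mapping from normalized positions back to the original text, which a censoring step would need to keep.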

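The wildcard examples in the snippet above are consistent with a rule in which `*`, on either the input side or the dictionary side, stands for exactly one character. Here is a minimal sketch of that rule, assuming `*` is the only wildcard and both strings are already normalized. `WildcardEqual` and `ScanWildcard` are hypothetical, and the library's real rule may differ (for example, a leading or trailing `*` might match any prefix or suffix).

```go
package main

import "fmt"

// isWildcard reports whether r counts as a wildcard. Hard-coding '*' is an
// assumption; the README configures wildcards via WithWildcardCharacters.
func isWildcard(r rune) bool { return r == '*' }

// WildcardEqual compares a dictionary entry with an input window of the same
// rune length; a wildcard on either side matches any single rune.
func WildcardEqual(entry, window string) bool {
	e, w := []rune(entry), []rune(window)
	if len(e) != len(w) {
		return false
	}
	for i := range e {
		if e[i] != w[i] && !isWildcard(e[i]) && !isWildcard(w[i]) {
			return false
		}
	}
	return true
}

// ScanWildcard slides each dictionary entry across the input and returns
// the entries that match at least one window.
func ScanWildcard(dict []string, input string) []string {
	in := []rune(input)
	var found []string
	for _, entry := range dict {
		n := len([]rune(entry))
		for i := 0; i+n <= len(in); i++ {
			if WildcardEqual(entry, string(in[i:i+n])) {
				found = append(found, entry)
				break
			}
		}
	}
	return found
}

func main() {
	fmt.Println(WildcardEqual("d*mn", "damn"))     // true: wildcard in the dictionary entry
	fmt.Println(WildcardEqual("damn", "d**n"))     // true: wildcards in the input
	fmt.Println(WildcardEqual("*damn*", "xdamnx")) // true: one rune on each side
	fmt.Println(ScanWildcard([]string{"damn", "crap"}, "oh d*mn it")) // [damn]
}
```

A window made only of wildcards, such as `****`, would match every dictionary entry of the same length, so a practical matcher needs some rule to reject such windows.
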
Benchmarks

Benchmark code

    tiendc/go-profanity-out
    tiendc/go-profanity-out-10    9024  129919 ns/op   44038 B/op  306 allocs/op

    TwiN/go-away
    TwiN/go-away-10               2745  415685 ns/op  444899 B/op  498 allocs/op

    finnbear/moderation
    finnbear/moderation-10       15432   77601 ns/op    2496 B/op   22 allocs/op
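
The rows above use Go's benchmark output format: the number of iterations, then nanoseconds, bytes allocated and allocations per operation; the `-10` suffix is the GOMAXPROCS value the benchmark ran with. The benchmark code itself is not reproduced here, so as a hedged sketch, this shows how figures in the same format can be collected with the standard `testing.Benchmark` function for a hypothetical `naiveScan` baseline, which is not one of the compared libraries.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// naiveScan is a hypothetical baseline detector (not one of the compared
// libraries): it tests every dictionary word with strings.Contains.
func naiveScan(dict []string, text string) []string {
	lower := strings.ToLower(text)
	var found []string
	for _, w := range dict {
		if strings.Contains(lower, w) {
			found = append(found, w)
		}
	}
	return found
}

// sink keeps the result alive so the call is not optimized away.
var sink []string

func main() {
	dict := []string{"damn", "crap", "heck"}
	text := strings.Repeat("What the heck is this crap? ", 20)

	// testing.Benchmark grows b.N until the run lasts about one second
	// and works outside of "go test".
	r := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			sink = naiveScan(dict, text)
		}
	})
	// Prints iterations and ns/op, then B/op and allocs/op.
	fmt.Println(r.String(), r.MemString())
}
```

Inside a `_test.go` file, the same loop would normally live in a `func BenchmarkXxx(b *testing.B)` and run with `go test -bench . -benchmem`.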

Help wanted

  • Pull requests for new features and bug fixes are welcome.
  • Contributions of more input data for English and other languages would be much appreciated.

License