@JaniniRami

Near Duplicate Files Remover

Brief about script
When run, the script crawls a given directory and collects every file in it. It then generates a hash for each file
and saves the hashes in a pandas DataFrame that serves as a hash table. Finally, the script looks for similar hashes in the table and deletes every file
with a similar hash, keeping only the original one.
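The crawl, hash, and delete steps above can be sketched with the standard library alone. This is not the code merged in this PR: it swaps the pandas DataFrame for a plain dict keyed by hash, uses SHA-256 (an assumption, since the PR does not say which hash it uses), and treats the first file found in a sorted directory walk as the "original". Because SHA-256 digests only match for byte-identical files, this sketch catches exact duplicates rather than near duplicates. The function names `file_sha256`, `find_duplicates`, and `remove_duplicates` are hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def file_sha256(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Reading in fixed-size chunks keeps memory use flat for large files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Map each content hash to the files sharing it, in walk order.

    Only hashes shared by two or more files are returned.
    """
    table = defaultdict(list)  # hash -> [paths]; stands in for the DataFrame
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sort in place so os.walk visits subdirs in order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            table[file_sha256(path)].append(path)
    return {h: paths for h, paths in table.items() if len(paths) > 1}


def remove_duplicates(root, dry_run=True):
    """Delete every copy except the first path seen for each hash.

    Returns the paths that were removed (or, in dry-run mode, would be).
    """
    removed = []
    for paths in find_duplicates(root).values():
        # Assumption: paths[0], the first file seen, is the "original".
        for path in paths[1:]:
            if not dry_run:
                os.remove(path)
            removed.append(path)
    return removed
```

Calling `remove_duplicates("some/dir")` only reports what would be deleted; pass `dry_run=False` to actually remove the files. Defaulting to a dry run is a deliberate choice in this sketch, since deletion cannot be undone.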

Issue no. - #770

Self Check (Tick After Making Pull Request)

  • This issue was assigned to me.
  • One Change in one Pull Request
  • My file is in proper folder (Name of folder should be in lowercase with no space in between) (E.g. meet_schedular)
  • I am following clean code and documentation practices, and my code is linted with flake8.
  • I have added README.md and requirements.txt (Include version numbers too e.g. pandas==0.0.1) with my script
  • I have used REPO README TEMPLATE (Necessary)
  • Only required dependencies are included in requirements.txt (don't include the Python version)

If the issue was not assigned to you, please don't make a PR. It will be marked as invalid.

@JaniniRami JaniniRami changed the title from "Janinirami" to "Added a near duplicate files remover" Oct 19, 2021
@pawangeek pawangeek linked an issue Oct 19, 2021 that may be closed by this pull request
@pawangeek pawangeek merged commit 973396e into python-geeks:main Oct 19, 2021
@pawangeek pawangeek added the hacktoberfest-accepted Supporting completion of hacktober fest label Oct 19, 2021
@JaniniRami JaniniRami deleted the janinirami branch October 19, 2021 15:58
