
Commit d331314

Merge pull request #121 from fieryash/watermark_removal
added a script to remove watermark from documents
2 parents 95aeaa5 + 74bc2d6 commit d331314


3 files changed, +48 -0 lines changed


watermark_removal/README.md

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
# Script to remove watermarks from images and PDFs
## How to use:
1. Call the `watermark_removal` function with two arguments: the path to the folder containing the images whose watermarks are to be removed, and the output path (where you want to store the cleaned images). The script removes the watermarks from every .jpg image in the input folder and saves the cleaned images in the output folder.
<br> Example:
<br>```input_folder = "C:/User/Desktop"```
<br>```output_folder = "C:/User/Desktop/Cleaned"```
<br>```watermark_removal(input_folder, output_folder)```

2. If you have a PDF with watermarks to be removed, first call the `pdf_to_jpg` function. It takes two arguments: the input folder (where your PDF is) and the output folder (where an image of each page will be stored).
<br> Example:
<br>```input_folder = "C:/User/Desktop"```
<br>```output_folder = "C:/User/Desktop"```
<br>```pdf_to_jpg(input_folder, output_folder)```

3. Then call `watermark_removal` on the folder of page images to remove the watermarks. The result is a set of cleaned JPEG pages; the script does not reassemble them into a PDF.
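When the input folder holds several PDFs, every page image from step 2 lands in the same output folder, so filenames have to stay unique across PDFs. A counter that restarts at 0 for each PDF reuses names such as `image0000.jpg`, and later pages silently overwrite earlier ones. The sketch below compares that scheme with one alternative, prefixing each page with its PDF's file stem. The helper names `restart_per_pdf` and `prefix_with_stem` and the page counts are hypothetical; nothing here calls pdf2image.

```python
from pathlib import Path


def restart_per_pdf(page_counts):
    """Hypothetical: names produced when the page counter restarts at 0 for each PDF."""
    names = []
    for count in page_counts.values():
        names.extend("image%04i.jpg" % i for i in range(count))
    return names


def prefix_with_stem(page_counts):
    """Hypothetical alternative: '<pdf stem>_page<NNNN>.jpg', unique per PDF and page."""
    return [
        "%s_page%04i.jpg" % (Path(pdf).stem, i)
        for pdf, count in page_counts.items()
        for i in range(count)
    ]


if __name__ == "__main__":
    counts = {"report.pdf": 2, "invoice.pdf": 2}  # hypothetical PDFs and page counts
    old = restart_per_pdf(counts)
    print(old)            # ['image0000.jpg', 'image0001.jpg', 'image0000.jpg', 'image0001.jpg']
    print(len(set(old)))  # 2 distinct names for 4 pages, so two pages get overwritten
    print(prefix_with_stem(counts))
```

A single counter shared across all PDFs would also avoid collisions; either way, `watermark_removal` only needs the page files to end in `.jpg`.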

watermark_removal/requirements.txt

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
# pdf2image also requires the Poppler utilities to be installed on the system
opencv-python
pdf2image
numpy
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
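import glob
import os

import cv2
import numpy as np
from pdf2image import convert_from_path


def pdf_to_jpg(path_to_folder, output_path):
    """Render every page of every PDF in path_to_folder to a JPEG in output_path."""
    os.makedirs(output_path, exist_ok=True)
    # One running counter across all PDFs, so pages from different
    # files do not overwrite each other.
    i = 0
    for pdf in sorted(glob.glob(os.path.join(path_to_folder, "*.pdf"))):
        pages = convert_from_path(pdf, 500)  # render at 500 DPI
        for page in pages:
            page.save(os.path.join(output_path, "image%04i.jpg" % i), "JPEG")
            i += 1


def watermark_removal(path_to_folder, output_path):
    """Whiten light-gray watermarks in every .jpg in path_to_folder."""
    os.makedirs(output_path, exist_ok=True)
    # Linear contrast stretch: new = alpha * old + beta, clipped to [0, 255].
    # Gray levels >= 207.5 become white (erasing light watermarks) and
    # levels <= 80 become black (keeping the text dark).
    alpha = 2.0
    beta = -160

    i = 0
    for img1 in sorted(glob.glob(os.path.join(path_to_folder, "*.jpg"))):
        originalimage = cv2.imread(img1)
        if originalimage is None:  # skip unreadable or corrupt files
            continue
        imgGrayscale = cv2.cvtColor(originalimage, cv2.COLOR_BGR2GRAY)
        imgcleaned = alpha * imgGrayscale + beta
        imgcleaned = np.clip(imgcleaned, 0, 255).astype(np.uint8)
        cv2.imwrite(os.path.join(output_path, "image%03i.jpg" % i), imgcleaned)
        i += 1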
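The cleaning step in `watermark_removal` is a linear contrast stretch, new = alpha * old + beta, clipped to 0..255. With `alpha = 2.0` and `beta = -160`, any gray level at or above (255 + 160) / 2 = 207.5 saturates to white and any level at or below 160 / 2 = 80 drops to black, so the approach only fully erases watermark pixels whose gray level is about 208 or higher. The NumPy sketch below reproduces that arithmetic on a few made-up pixel values; the sample values and the helper names `contrast_stretch` and `cutoffs` are hypothetical.

```python
import numpy as np

# Same constants as watermark_removal(): new_pixel = ALPHA * pixel + BETA
ALPHA = 2.0
BETA = -160


def contrast_stretch(gray, alpha=ALPHA, beta=BETA):
    """Apply alpha * x + beta in float, then clip back into the 8-bit range."""
    stretched = alpha * gray.astype(np.float64) + beta
    return np.clip(stretched, 0, 255).astype(np.uint8)


def cutoffs(alpha=ALPHA, beta=BETA):
    """Input levels <= first value map to black; levels >= second value map to white."""
    black = (0 - beta) / alpha
    white = (255 - beta) / alpha
    return black, white


if __name__ == "__main__":
    # Hypothetical grayscale samples: dark text, mid-gray, light watermark, white paper
    row = np.array([[30, 120, 220, 255]], dtype=np.uint8)
    print(contrast_stretch(row).tolist())  # [[0, 80, 255, 255]]
    print(cutoffs())                       # (80.0, 207.5)
```

Darker watermarks (below about 208) survive as gray; raising `alpha` or lowering `beta` moves the white cutoff down, at the cost of also thinning faint text.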
