Report On The Investigation Into
Russian Interference In The
2016 Presidential Election

Volume I of II

Special Counsel Robert S. Mueller, III

2019

This document was generated with the Tesseract optical character recognition (OCR) engine. Because of the poor quality of the original PDF and the numerous redacted sections, the text contains many errors and should not be regarded as the definitive text. It is intended for those who wish to process the text at scale. The difficulties associated with the OCR process also signal the failure of the Justice Department to accommodate citizens who use screen readers to access this document.

from wand.image import Image
from PIL import Image as PI
import pyocr
import pyocr.builders
import io
import os
from PyPDF2 import PdfFileWriter, PdfFileReader

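# Split the full report into one PDF per page.
# Note: PdfFileReader, numPages, and getPage are the PyPDF2 names from before version 3.0.
os.makedirs("pages", exist_ok=True)  # the output directory must exist before writing
inputpdf = PdfFileReader(open("report.pdf", "rb"))
for i in range(inputpdf.numPages):
    output = PdfFileWriter()
    output.addPage(inputpdf.getPage(i))
    with open(f"pages/report-page{i:03}.pdf", "wb") as outputStream:
        output.write(outputStream)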

# Build a sorted list of the per-page PDF file names
file_list = []
for path, subdirs, files in os.walk("pages"):  # change depending on system
    for file in files:
        file_list.append(file)
file_list = sorted(file_list)

# Use the first OCR tool pyocr finds and that tool's first available language
tool = pyocr.get_available_tools()[0]
lang = tool.get_available_languages()[0]

image = []
text = []

# Render each page PDF to a JPEG blob at a resolution of 300
for file in file_list:
    pdf = Image(filename="pages/"+file, resolution=300)
    jpg = pdf.convert('jpeg')
    img_page = Image(image=jpg)
    image.append(img_page.make_blob('jpeg'))

# Run OCR on each page image and collect the recognized text
for img in image:
    txt = tool.image_to_string(
        PI.open(io.BytesIO(img)),
        lang=lang,
        builder=pyocr.builders.TextBuilder())
    text.append(txt)

# Write every page to a single text file
with open("report.txt", "w") as file:
    for page in text:
        file.write(page)
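
Since the text is intended for processing at scale, here is a minimal sketch of one possible first step: counting word frequencies in report.txt. It is not part of the original notebook. The helper names tokenize and word_frequencies, the token pattern, and the min_length cutoff are all assumptions made for illustration; only the file name report.txt comes from the script above. Because the OCR output contains many errors, the counts should be read as approximate.

```python
import re
from collections import Counter
from pathlib import Path

# Hypothetical token pattern: runs of letters with an optional apostrophe suffix.
# Digits and stray OCR symbols are treated as separators.
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def tokenize(text):
    """Lowercase the text and return its letter-only tokens."""
    return TOKEN_RE.findall(text.lower())


def word_frequencies(text, min_length=3):
    """Count tokens that are at least min_length characters long."""
    return Counter(tok for tok in tokenize(text) if len(tok) >= min_length)


if __name__ == "__main__":
    report = Path("report.txt")  # file written by the OCR script above
    if report.exists():
        raw = report.read_text(encoding="utf-8", errors="replace")
        # Print the 20 most frequent words with their counts
        for word, n in word_frequencies(raw).most_common(20):
            print(f"{n:6d}  {word}")
```

Dropping very short tokens is a crude way to filter some OCR fragments; a real analysis would likely need a stop-word list and some spelling correction as well.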

