I also had some scanned color and grayscale pdfs that I wanted to convert to bw (black and white). I tried using gs with the code listed here, and the image quality is good, with the pdf text still there. However, that gs code only converts to grayscale (as asked in the question) and still produces a large file. convert yields very poor results when used directly.
I wanted bw pdfs with good image quality and small file size. I would have tried terdon's solution, but I could not get pdftk on CentOS 7 using yum (at the time of writing).
My solution has three steps: gs extracts grayscale bmp files from the pdf, convert thresholds those bmps to bw and saves them as tiff files, and img2pdf compresses the tiff images and merges them all into one pdf.
I tried going directly to tiff from the pdf, but the quality is not the same, so I save each page to bmp. For a one-page pdf file, convert does a great job from bmp to pdf. Example:
    gs -sDEVICE=bmpgray -dNOPAUSE -dBATCH -r300x300 \
       -sOutputFile=./pdf_image.bmp ./input.pdf
    convert ./pdf_image.bmp -threshold 40% -compress zip ./bw_out.pdf
For multiple pages, gs can merge multiple pdf files into one, but img2pdf yields a smaller file size than gs. The tiff files must be uncompressed as input to img2pdf. Keep in mind that for large numbers of pages, the intermediate bmp and tiff files take up a lot of disk space. pdftk or joinpdf would be better if they can merge compressed pdf files from convert.
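To get a feel for how much disk space the intermediate bmp files need, here is a rough back-of-the-envelope sketch. It assumes bmpgray writes 8 bits (1 byte) per pixel and ignores the bmp header, palette and row padding; the function name est_gray_bmp_bytes and the US Letter page size are my own choices for illustration, not part of the method above.

```shell
# Rough size in bytes of one 8-bit grayscale page image (1 byte per pixel),
# ignoring the BMP header, palette and row padding.
# Arguments: page width and height in PostScript points (1/72 inch), and dpi.
est_gray_bmp_bytes() {
    local width_px height_px
    width_px=$(( $1 * $3 / 72 ))
    height_px=$(( $2 * $3 / 72 ))
    echo $(( width_px * height_px ))
}

# US Letter is 612 x 792 points; at 300 dpi that is 2550 x 3300 pixels.
per_page=$(est_gray_bmp_bytes 612 792 300)
pages=12
echo "about $(( per_page * pages / 1000000 )) MB of bmp files for $pages pages"
```

For a 12-page Letter scan at 300 dpi this comes to roughly 100 MB of bmp files, before counting the tiff files.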
I imagine there is a more elegant solution. However, my method produces results with very good image quality and much smaller file size. To get text back in the bw pdf, run OCR again.
My shell script uses gs, convert, and img2pdf. Change the parameters (number of pages, scan dpi, threshold %, etc.) listed at the beginning as needed, and run chmod +x ./pdf2bw.sh to make it executable. Here is the full script (pdf2bw.sh):
    #!/bin/bash

    num_pages=12
    dpi_res=300
    input_pdf_name=color_or_grayscale.pdf
    bw_threshold=40%
    output_pdf_name=out_bw.pdf
    #-------------------------------------------------------------------------
    gs -sDEVICE=bmpgray -dNOPAUSE -dBATCH -q -r$dpi_res \
       -sOutputFile=./%d.bmp ./$input_pdf_name
    #-------------------------------------------------------------------------
    for file_num in `seq 1 $num_pages`
    do
      convert ./$file_num.bmp -threshold $bw_threshold \
              ./$file_num.tif
    done
    #-------------------------------------------------------------------------
    input_files=""

    for file_num in `seq 1 $num_pages`
    do
      input_files+="./$file_num.tif "
    done

    img2pdf -o ./$output_pdf_name --dpi $dpi_res $input_files
    #-------------------------------------------------------------------------
    # clean up bmp and tif files used in conversion

    for file_num in `seq 1 $num_pages`
    do
      rm ./$file_num.bmp
      rm ./$file_num.tif
    done
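The script needs num_pages set by hand. One possible way around that, as a minimal sketch: after the gs step, count the page images it wrote. The helper name list_pages is hypothetical, and it assumes the 1.bmp, 2.bmp, ... naming from the script, with no gaps in the numbering.

```shell
# Print the page images gs wrote into a directory, in page order.
# Assumes the naming used in the script: 1.bmp, 2.bmp, ... with no gaps.
list_pages() {
    local dir ext n
    dir=$1
    ext=$2
    n=1
    while [ -e "$dir/$n.$ext" ]; do
        printf '%s\n' "$dir/$n.$ext"
        n=$(( n + 1 ))
    done
}

# Example: count the pages instead of setting num_pages by hand.
num_pages=$(list_pages . bmp | wc -l)
echo "found $num_pages page image(s)"
```

Counting upward from 1 keeps the pages in numeric order, whereas a plain glob such as ./*.bmp would sort 10.bmp before 2.bmp.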