Malloc error when trying to create larger PDFs from a template PDF with acrobat form fields #4802
| I have a template document with form fields on it. I take these form fields, fill them in with data, collect the filled copies in order in a master document, and then return that document. The result gets fed into an API request and is returned to JS for printing. The current code I have looks like this:

```python
import pymupdf
import base64

pdf_bytes = base64.b64decode(pdf)
master_doc = pymupdf.open()  # New Document
for update in updates:
    doc = pymupdf.open(stream=pdf_bytes, filetype="pdf")
    for page in doc:
        widgets = page.widgets() or []
        for w in widgets:
            # print(w.field_name)
            if w.field_name in update:
                w.field_value = update[w.field_name]
                w.field_flags = w.field_flags | 1  # Set Read Only BitMask to true
                w.update()  # MUST ALWAYS BE CALLED WHEN UPDATING WIDGETS
    # Append updated PDF to master
    master_doc.insert_pdf(doc)
updated_pdf_bytes = master_doc.write()
pdf_base64 = base64.b64encode(updated_pdf_bytes).decode("utf-8")
return pdf_base64
```

where `updates` are the key-value pairs of updates to be applied, and `pdf` is the template PDF saved in an Azure Storage Blob. I have tried various forms of baking, and also copying the coordinates of the widgets and remaking them as textboxes. I am getting this error normally:
I am not sure if this is an issue with the Docker container where this runs, or if a file size limitation in pymupdf is causing my issue. Any help or ideas are appreciated. |
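As a side note on the `w.field_flags | 1` line in the question: PDF field flags are a bit field, and the lowest bit marks a field as read-only. The following is a minimal, library-independent sketch of setting, testing and clearing such a bit. The constant and helper names are illustrative assumptions, not PyMuPDF API.

```python
# Illustrative constants (hypothetical names): the three lowest bits of a
# PDF form field's flags value.
READ_ONLY = 1 << 0   # value 1, the bit set by `field_flags | 1`
REQUIRED = 1 << 1    # value 2
NO_EXPORT = 1 << 2   # value 4


def set_flag(flags: int, flag: int) -> int:
    """Return flags with the given bit switched on (idempotent)."""
    return flags | flag


def clear_flag(flags: int, flag: int) -> int:
    """Return flags with the given bit switched off."""
    return flags & ~flag


def has_flag(flags: int, flag: int) -> bool:
    """Report whether the given bit is set."""
    return bool(flags & flag)


flags = REQUIRED                      # a field that is only "required"
flags = set_flag(flags, READ_ONLY)    # now required and read-only
print(flags, has_flag(flags, READ_ONLY))
```

Using OR rather than plain assignment keeps any other flags the template already set on the field.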
Replies: 3 comments 4 replies
| Hard to say anything without more details. |
| The MuPDF message refers to a function called "graft_mapped_object" or similar. Your comment is not entirely clear, but it seems that your target PDF need not be a Form PDF (i.e. one with fillable fields). If so, you could `bake()` the source PDF before inserting it into the target, excluding … When you are done, by all means specify garbage collection and compression options in the … |
| Ok, thanks for the clarifications.
Then, before calling `.insert_pdf()`, you should `bake()` the source. Then code the PDF insertion as `target.insert_pdf(source, annots=False, widgets=False, links=False, final=True)`. This will at a minimum speed up the method, and may also solve a few other issues.
You may also consider saving / recycling the target to help keep memory requirements under control, e.g. after every 10th `insert_pdf()`. Rough idea:

```python
for i, update in enumerate(updates):
    doc = pymupdf.open(template_path)
    for page in doc:
        widgets = page.widgets() or []
        for w in widgets:
            print(w.field_name)
            if w.field_name in update:
                w.field_value = update[w.field_name]
                w.field_flags = w.field_flags | 1  # Set Read Only BitMask to true
                w.update()  # MUST ALWAYS BE CALLED WHEN UPDATING WIDGETS
    doc.bake()
    # Append updated PDF to master
    master_doc.insert_pdf(doc, links=False, annots=False, widgets=False, final=True)
    doc.close()
    if i and i % 10 == 0:
        # recycle master, freeing some resources underway
        data = master_doc.write(garbage=3, deflate=True)
        master_doc.close()
        master_doc = pymupdf.open("pdf", data)
updated_pdf_bytes = master_doc.write(garbage=3, deflate=True)
pdf_base64 = base64.b64encode(updated_pdf_bytes).decode("utf-8")
```
|
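A small note on the recycle condition in the rough idea: because `enumerate()` starts at 0, the check `if i and i % 10 == 0` first fires at `i == 10`, i.e. after the 11th `insert_pdf()` rather than the 10th. That is harmless for memory purposes, but if an exact "every 10th insert" schedule is wanted, a check on `i + 1` does it. The sketch below only compares the two conditions; the helper names are made up for illustration.

```python
def recycle_points_as_written(count: int, every: int = 10) -> list[int]:
    """Zero-based indices where `i and i % every == 0` is true."""
    return [i for i in range(count) if i and i % every == 0]


def recycle_points_every_nth(count: int, every: int = 10) -> list[int]:
    """Zero-based indices reached right after every `every`-th insert."""
    return [i for i in range(count) if (i + 1) % every == 0]


print(recycle_points_as_written(35))  # [10, 20, 30]
print(recycle_points_every_nth(35))   # [9, 19, 29]
```

Either schedule bounds how much the target document accumulates between rewrites; the interval itself is a tuning knob rather than a fixed rule.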