Malloc error when trying to create larger PDFs from a template PDF with acrobat form fields #4802

ODAFF-Zane · 2025-11-24T13:02:09Z

ODAFF-Zane
Nov 24, 2025

I have a template document with form fields on it. I want to fill those fields with data, append each filled copy in order to a master document, and then return that document. The result is then sent through an API request and returned to JS for printing. My current code looks like this:

    import pymupdf
    import base64

    pdf_bytes = base64.b64decode(pdf)
    master_doc = pymupdf.open()  # New Document

    for update in updates:
        doc = pymupdf.open(stream=pdf_bytes, filetype="pdf")
        for page in doc:
            widgets = page.widgets() or []
            for w in widgets:
                # print(w.field_name)
                if w.field_name in update:
                    w.field_value = update[w.field_name]
                    w.field_flags = w.field_flags | 1  # Set Read Only BitMask to true
                    w.update()  # MUST ALWAYS BE CALLED WHEN UPDATING WIDGETS
        # Append updated PDF to master
        master_doc.insert_pdf(doc)

    updated_pdf_bytes = master_doc.write()
    pdf_base64 = base64.b64encode(updated_pdf_bytes).decode("utf-8")
    return pdf_base64

where updates holds the key-value pairs of updates to be applied, and pdf is the template PDF saved in an Azure Storage Blob.

I have tried various forms of baking, as well as copying the coordinates of the widgets and recreating them as text boxes. I normally get this error:

    Traceback (most recent call last):
      File "/usr/python-runtime/back/projects/python-runtime/runner/bakery_safe_env/lib/python3.11/site-packages/pymupdf/mupdf.py", line 54886, in pdf_graft_mapped_object
    pymupdf.mupdf.FzErrorSystem: code=2: malloc (578833 bytes) failed

I am not sure if this is an issue with my Docker container where this runs, or if there is a limitation on file size through pymupdf that is causing my issue.

Any help or ideas are appreciated.

Answered by JorjMcKie

Nov 24, 2025


JorjMcKie · 2025-11-24T13:05:43Z

JorjMcKie
Nov 24, 2025
Maintainer

Hard to say something without more details.
Can you reproduce the problem outside docker?

2 replies

ODAFF-Zane Nov 24, 2025
Author

A quick write of this

    import pymupdf
    import base64

    # Template update object
    base_update = {
        '# of products': '1',
        'Contact': '2',
        'Email': '3',
        'Phone': '4',
        'reference#': '5',
        'City, State/Country, Zip': '6',
        'Address Line 2': '7',
        'Address Line 1': '8',
        'Company Name': '9',
        'Certificate Date': '10',
        'Admin Email': '11',
    }

    # Create 100 identical update objects
    updates = [base_update.copy() for _ in range(100)]

    # pdf_bytes = base64.b64decode(pdf)
    template_path = r"file_location"
    master_doc = pymupdf.open()  # New Document

    for update in updates:
        doc = pymupdf.open(template_path)
        for page in doc:
            widgets = page.widgets() or []
            for w in widgets:
                print(w.field_name)
                if w.field_name in update:
                    w.field_value = update[w.field_name]
                    w.field_flags = w.field_flags | 1  # Set Read Only BitMask to true
                    w.update()  # MUST ALWAYS BE CALLED WHEN UPDATING WIDGETS
        # Append updated PDF to master
        master_doc.insert_pdf(doc)

    updated_pdf_bytes = master_doc.write()
    pdf_base64 = base64.b64encode(updated_pdf_bytes).decode("utf-8")
    # print(pdf_base64)

works and prints out the pdf_base64

ODAFF-Zane Nov 24, 2025
Author

The final length of the bytes for the final output for the document is 54876778 bytes.
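For a sense of scale: the code base64-encodes the whole PDF before returning it, and padded base64 output is about a third larger than its input, so the encoded string sits in memory next to the raw bytes. A small sketch of that arithmetic, using only the standard library (the helper name base64_length is made up for illustration):

```python
import base64


def base64_length(n_bytes: int) -> int:
    """Length of standard, padded base64 output for n_bytes of input."""
    # Every 3 input bytes (rounded up) become 4 output characters.
    return 4 * ((n_bytes + 2) // 3)


pdf_size = 54876778  # size of the final document reported above
print(base64_length(pdf_size))  # 73169040

# Sanity check of the formula against the real encoder on a small input
assert base64_length(10) == len(base64.b64encode(b"x" * 10))
```

This only estimates the size of the encoded payload; it says nothing about how much memory MuPDF itself needs while grafting pages.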

JorjMcKie · 2025-11-24T13:49:50Z

JorjMcKie
Nov 24, 2025
Maintainer

The MuPDF message refers to a function called "graft_mapped_object" or similar.
This is internally called inside insert_pdf() and prevents multiple copies of the same source PDF objects in the target.
This is of significance primarily for multi-source-page insertions. You seem to append source pages one by one, each in a separate .insert_pdf invocation. It is probably beneficial if you specify final=True to let the grafting algorithm know that there is nothing more to come.

Your comment is not entirely clear, but it seems that your target PDF need not be a Form PDF (i.e. one with fillable fields). If so, you could bake() the source PDF before inserting it into the target, and exclude annots and widgets in method .insert_pdf().
This is probably advisable anyway, because otherwise a lot of logic becomes active that deals with making (or keeping) the target PDF a Form PDF. And this logic will come under stress because you seem to add the same source page over and over again ... As you probably know, form fields must have unique field names. I guess you can imagine what your approach is causing here (disclaimer: I may be entirely misunderstanding what is happening).

When you are done, by all means specify garbage collection and compression options in .write().

1 reply

ODAFF-Zane Nov 24, 2025
Author

To be clear with what I am trying to achieve,

I have a template PDF that has form fields. I dynamically insert values into those form fields, and once they have been inserted I no longer need the form fields; all I care about is the text.

I then need to add another identical copy of the template PDF and fill it with different data. So basically I have one big document made of copies of the same template PDF, and all I care about in the output is the text that goes into each copy, not the forms themselves.

I am currently using the JavaScript library PDF lib, and it does well, but it has limitations when it comes to emailing and printing the document because of what you said: non-unique form field names. PyMuPDF has done well with printing in all of my testing, and right now I am only running into this malloc issue. I am going to add more resources to the server and see if that fixes it; I just wasn't sure whether there was an advisable solution to this.

JorjMcKie · 2025-11-24T15:22:44Z

JorjMcKie
Nov 24, 2025
Maintainer

Ok, thanks for the clarifications.
Then - before doing .insert_pdf() - you should bake() the source. Then code the PDF insertion as

target.insert_pdf(source, annots=False, widgets=False, links=False, final=True)

This will at a minimum speed up the method - maybe also solve a few other issues.
You may also consider saving / recycling the target to help keep memory requirements under control, e.g. after every 10th insert_pdf(). Rough idea:

    for i, update in enumerate(updates):
        doc = pymupdf.open(template_path)
        for page in doc:
            widgets = page.widgets() or []
            for w in widgets:
                print(w.field_name)
                if w.field_name in update:
                    w.field_value = update[w.field_name]
                    w.field_flags = w.field_flags | 1  # Set Read Only BitMask to true
                    w.update()  # MUST ALWAYS BE CALLED WHEN UPDATING WIDGETS
        doc.bake()  # flatten the filled fields into page content
        # Append updated PDF to master
        master_doc.insert_pdf(doc, links=False, annots=False, widgets=False, final=True)
        doc.close()
        if i and i % 10 == 0:
            # recycle master freeing some resources underway
            data = master_doc.write(garbage=3, deflate=True)
            master_doc.close()
            master_doc = pymupdf.open("pdf", data)

    updated_pdf_bytes = master_doc.write(garbage=3, deflate=True)
    pdf_base64 = base64.b64encode(updated_pdf_bytes).decode("utf-8")

1 reply

ODAFF-Zane Nov 24, 2025
Author

Thank you, I appreciate it and will give this a try in the coming weeks. I will mark this as answered for now and reopen this discussion if I continue to have trouble after upgrading server resources and trying this solution. Thank you very much!
