Skip to content

Conversation

@vstinner
Copy link
Member

@vstinner vstinner commented Jun 22, 2023

Rewrite PyImport_AddModule() to simply call Py_DECREF(), rather than creating a weak reference, to get a borrowed reference to the module.


📚 Documentation preview 📚: https://cpython-previews--105998.org.readthedocs.build/

@vstinner
Copy link
Member Author

@serhiy-storchaka: In gh-86160, you added this code using PyWeakref_New() + PyWeakref_GET_OBJECT() get a borrowed reference, whereas the module is known to be stored in the sys.modules dictionary. What's the rationale for using a weak reference, rather than just using Py_DECREF()?

Rewrite PyImport_AddModule() to simply call Py_DECREF(), rather than creating a weak reference, to get a borrowed reference to the module.
@vstinner
Copy link
Member Author

I reverted the doc change.

@vstinner
Copy link
Member Author

Oh, ./python -m test -j1 -R 3:3 test_import leaks references ("test_import leaked [11, 13, 10] references, sum=34")... but it also leaks in the main branch. Ah! It's unrelated to this PR. Also, ./python -m test -R 3:3 test_import doesn't leak. It's inteesting :-)

@serhiy-storchaka
Copy link
Member

sys.modules can be not a dict, and it is not guaranteed that it keeps reference. It can be a custom mapping with __getitem__ which returns a new object or removes returned object or __setitem__ which does nothing. There is so much unknown. With weakref we can be sure that we get either a reference to live object or NULL. Please do not change this code until you find better solution.

@vstinner
Copy link
Member Author

Also, ./python -m test -j1 -R 3:3 test_import does crash, but again, it's unrelated to this PR. It does also crash in the main branch. It's a recent regression: #91095 (comment)

@vstinner vstinner marked this pull request as draft June 22, 2023 22:38
@vstinner
Copy link
Member Author

PyImport_AddModule() history, oldest to newest:

  • In 1990, commit 85a5fbb (first commit which added C code to CPython, so oldest C code tracked by the Git repository) added new_module() function which uses mtab = sysget("modules"); to get modules. sysget() is implemented as dictlookup(sysdict, name) which looks into static object *sysdict;. This version respected sys.modules :-)

  • In 1990, commit 3f5da24 added static object *modules; variable to Python/import.c, variable used by add_module() (old name of PyImport_AddModule()). sys.modules is no longer used by add_modules().

  • In 1997, commit 25ce566 added PyImport_GetModuleDict() implemented as PyThreadState_Get()->interp. PyImport_AddModules() uses PyObject *modules = PyImport_GetModuleDict(); and then works on this dictionary with PyDict_GetItemString() and PyDict_SetItemString().

  • In 2017, commit 3f9eee6 modified import_add_module() to "Support other mappings in PyInterpreterState.modules": issue [subinterpreters] Eliminate PyInterpreterState.modules. #72597.

  • In 2021, commit 4db8988 changes PyImport_AddModule() to use PyWeakref_New() to create a borrowed reference.

  • In 2023, commit b2fc549 changes Python/import.c to use a new MODULES(interp) macro which gets interp->imports.modules.

@vstinner
Copy link
Member Author

In 2017, commit 3f9eee6 modified import_add_module() to "Support other mappings in PyInterpreterState.modules": issue GH-72597.

I'm not convinced that GH-72597 was fully implemented. At least, PyImport_AddModule() doesn't support overriding sys.modules at runtime since 1990... so since the creation of Python :-)

I added a test to my PR to check this special case: custom sys.modules which doesn't hold a strong reference to the newly created module.

According to this test, PyImport_AddModule() always uses tstate->interp->imports.modules directly, it doesn't take in account that sys.modules was overriden. Moreover, setting sys.modules at runtime doesn't update this internal tstate->interp->imports.modules member: the sys module has no custom __setattr__() function to update it.

Calling PyImport_AddModule() writes into the "original" modules dict, even if sys.modules was overriden.

@vstinner
Copy link
Member Author

In the main branch, there is a single line which sets PyInterpreter.imports.modules: MODULES(interp) = PyDict_New(); of _PyImport_InitModules(). I don't see any C API, even internal C API, to changes this PyInterpreterState member. And the public C API PyImport_GetModuleDict() just returns this member.

It means that all C code using directly PyInterpreter.imports.modules can only get a dict, not a custom type.

Obviously, C code and Python code getting sys.modules attribute can get any type, not only dict, if sys.modules is overriden. For example, the _pickle extension (implemented in C) gets the sys.modules attribute and then... uses the PyDict C API, oops. It is likely to crash if sys.modules type is not dict or a subclass of dict.

@vstinner
Copy link
Member Author

Well, my goal here was to get rid of the PyWeakref_GET_OBJECT() since I would like to deprecate it. So I wrote a different change: PR #106001 uses _PyWeakref_GET_REF().

But maybe this whole code was always useless and so can be removed, since it seems impossible to get a type other than dict in this code path.

Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is Py_REFCNT any better than PyWeakref_GetObject? I preferred to use later because the former is a macro and more tightly bound to internal representation.

@serhiy-storchaka
Copy link
Member

I have great doubts that it was worth to add support of non-dict as sys.modules. It complicated a code a lot, it is not supported everywhere in the C code, and causes issues with borrowed references, like in this code.

@vstinner
Copy link
Member Author

I have great doubts that it was worth to add support of non-dict as sys.modules. It complicated a code a lot, it is not supported everywhere in the C code, and causes issues with borrowed references, like in this code.

The problem is that modules have no __setattr__() method which could be used to prevent setting sys.modules to a non-sense value like sys.modules = 123 or sys.modules = ["my", "list"]. Maybe some developers made the wrong assumption that internal C functions really access sys.modules attributes, whereas the implementation access directly a PyInterpreterState member and ignores sys.modules.

@vstinner
Copy link
Member Author

I created issue #106016 "Support defining setattr() and delattr() in a module". It would allow to either disallow setting sys.modules to a type different than dict, or at least to update PyInterpreterState.imports.modules reference.


// gh-86160: PyImport_AddModuleObject() returns a borrowed reference
PyObject *ref = PyWeakref_NewRef(mod, NULL);
// Convert to a borrowed reference
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Convert to a borrowed reference
// Convert to a borrowed reference.
@vstinner
Copy link
Member Author

I fixed the issue differently: commit ee52158

I don't think that PyImport_AddModuleObject() requires this complicated PyWeakref dance, since the code always gets a strong reference from a Python dict: it's not possible to override PyInterpreterState.imports.modules, and this function access directly PyInterpreterState.imports.modules. But I don't feel the need to change the code. I propose to continue the discussion in the issue #106016.

@vstinner vstinner closed this Jun 26, 2023
@vstinner vstinner deleted the add_module_obj branch June 26, 2023 12:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

4 participants