Skip to content

WIP: copy proxy_host and savedata_data#909

Draft
sudden6 wants to merge 2 commits intoTokTok:masterfrom
sudden6:internal_malloc
Draft

WIP: copy proxy_host and savedata_data#909
sudden6 wants to merge 2 commits intoTokTok:masterfrom
sudden6:internal_malloc

Conversation

@sudden6
Copy link

@sudden6 sudden6 commented Jun 14, 2018

fixes #908


This change is Reviewable

@sudden6 sudden6 changed the title copy proxy_host and savedata_data [WIP] copy proxy_host and savedata_data Jun 14, 2018
@sudden6
Copy link
Author

sudden6 commented Jun 14, 2018

Is removing const allowed without breaking the API?

@nurupo
Copy link
Member

nurupo commented Jun 15, 2018

Why are you removing const though?


Reviewed 2 of 3 files at r1.
Review status: 0 of 1 LGTMs obtained


toxcore/tox.h, line 625 at r1 (raw file):

 * outlive the options object. */ const uint8_t *savedata_data;

Should be kept as const.


toxcore/tox.h, line 692 at r1 (raw file):

void tox_options_set_savedata_type(struct Tox_Options *options, TOX_SAVEDATA_TYPE type); const uint8_t *tox_options_get_savedata_data(const struct Tox_Options *options);

This const should not be removed. The function returns a pointer to the internal data, we want to return it as const, we don't want the user to modify the internal data of the Tox_Options struct. The user can make a copy if they need to modify it.


toxcore/tox.h, line 694 at r1 (raw file):

const uint8_t *tox_options_get_savedata_data(const struct Tox_Options *options); void tox_options_set_savedata_data(struct Tox_Options *options, const uint8_t *data, size_t length);

This const should not be removed. It just says that the function won't modify data array.


toxcore/tox.api.h, line 565 at r1 (raw file):

 * outlive the options object. */ const uint8_t[length] data;

This should remain const.


toxcore/tox_api.c, line 66 at r1 (raw file):

{ const size_t hostname_len = strlen(host); options->proxy_host = (char *) calloc(hostname_len + 1, sizeof(char));

calloc can fail. Perhaps we should add an error enum to the function arguments to indicate success/failure of the function call to the user.


toxcore/tox_api.c, line 79 at r1 (raw file):

void tox_options_set_savedata_data(struct Tox_Options *options, const uint8_t *data, size_t length) { options->savedata_data = (uint8_t *) malloc(length);

malloc can fail. Perhaps we should add an error enum to the function arguments to indicate success/failure of the function call to the user.


Comments from Reviewable

@sudden6
Copy link
Author

sudden6 commented Jun 15, 2018

Review status: 0 of 1 LGTMs obtained


toxcore/tox.h, line 625 at r1 (raw file):

Previously, nurupo wrote…

Should be kept as const.

If I leave this const, I get the same error as here: https://travis-ci.org/TokTok/c-toxcore/builds/392474765#L1419 since free() apparently can't work on const pointers?


toxcore/tox.h, line 692 at r1 (raw file):

Previously, nurupo wrote…

This const should not be removed. The function returns a pointer to the internal data, we want to return it as const, we don't want the user to modify the internal data of the Tox_Options struct. The user can make a copy if they need to modify it.

This was not intentional, but I have no idea how to tell apidsl to make getters return a const pointer, even if the struct member is not const.


toxcore/tox.h, line 694 at r1 (raw file):

Previously, nurupo wrote…

This const should not be removed. It just says that the function won't modify data array.

same


toxcore/tox.api.h, line 565 at r1 (raw file):

Previously, nurupo wrote…

This should remain const.

See the above issue with free


toxcore/tox_api.c, line 66 at r1 (raw file):

Previously, nurupo wrote…

calloc can fail. Perhaps we should add an error enum to the function arguments to indicate success/failure of the function call to the user.

agreed, but I think this breaks the API right? or at minimum the clients need to check additional things then


toxcore/tox_api.c, line 79 at r1 (raw file):

Previously, nurupo wrote…

malloc can fail. Perhaps we should add an error enum to the function arguments to indicate success/failure of the function call to the user.

same


Comments from Reviewable

@nurupo
Copy link
Member

nurupo commented Jun 15, 2018

Review status: 0 of 1 LGTMs obtained


toxcore/tox.h, line 625 at r1 (raw file):

Previously, sudden6 wrote…

If I leave this const, I get the same error as here: https://travis-ci.org/TokTok/c-toxcore/builds/392474765#L1419 since free() apparently can't work on const pointers?

Oh, I see, this is the internal struct tox.c uses, so this savedata_data is not being copied as it is the case with the setter function but is used directly? You should not free it, the user might have passed a stack-allocated savedata_data, one which was not allocated with malloc/calloc, freeing that would be very bad.


Comments from Reviewable

@nurupo
Copy link
Member

nurupo commented Jun 15, 2018

I think we should postponed this change until Tox_Options becomes an opaque pointer. See my comment in #908.

@sudden6
Copy link
Author

sudden6 commented Jun 16, 2018

I think now this is not possible without breaking the API and should be postponed until the next API break.


Review status: 0 of 1 LGTMs obtained


toxcore/tox.h, line 625 at r1 (raw file):

but is used directly?

No, my intention was that savedata_data is allocated by toxcore and the data copied from the user supplied pointer.


Comments from Reviewable

@iphydf iphydf added this to the v0.2.x milestone Jun 24, 2018
@iphydf iphydf changed the title [WIP] copy proxy_host and savedata_data WIP: copy proxy_host and savedata_data Jul 21, 2018
@iphydf
Copy link
Member

iphydf commented Dec 23, 2021

This can now be completed because we got rid of apidsl.

@sudden6 sudden6 force-pushed the internal_malloc branch 2 times, most recently from 0237d8e to 531740e Compare December 25, 2021 22:24
@sudden6
Copy link
Author

sudden6 commented Dec 25, 2021

@iphydf updated, but I'm not sure how we should do the error handling in setters for malloc. I forgot to think about this in the last PR. I'm also not sure how the semantics of this function should be in case the user wants to unset the proxy.

@JFreegman
Copy link
Member

I'm not sure how we should do the error handling in setters for malloc.

Why not just make it behave like the other API functions and set an error code?

Copy link
Member

@JFreegman JFreegman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r2.
Reviewable status: 0 of 1 approvals obtained (waiting on @nurupo and @sudden6)


toxcore/tox_api.c, line 80 at r2 (raw file):

void tox_options_set_proxy_host(struct Tox_Options *options, const char *host) { const size_t hostname_len = strlen(host);

Instead of calling strlen() maybe the caller could supply the length. toxcore doesn't generally deal with C strings.


toxcore/tox_api.c, line 102 at r2 (raw file):

{ options->savedata_data = (uint8_t *) malloc(length); if(!options->savedata_data) {

if (options->save_data == nullptr)


toxcore/tox_api.c, line 107 at r2 (raw file):

 } memcpy(options->savedata_data, data, length);

Should we check that length is < sizeof(options->savedata_data) or just ask the caller to be sane?


toxcore/tox_api.c, line 145 at r2 (raw file):

 free(options->proxy_host); } free(options);

Although there's nothing wrong with calling free on a null pointer, it seems weird to explicitly leave this outside of the if clause.

@sudden6 sudden6 force-pushed the internal_malloc branch 4 times, most recently from 5d8d3af to ef5f181 Compare December 27, 2021 13:36
Copy link
Author

@sudden6 sudden6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 approvals obtained (waiting on @JFreegman and @nurupo)


toxcore/tox.h, line 692 at r1 (raw file):

Previously, sudden6 wrote…

This was not intentional, but I have no idea how to tell apidsl to make getters return a const pointer, even if the struct member is not const.

Done.


toxcore/tox.h, line 694 at r1 (raw file):

Previously, sudden6 wrote…

same

Done.


toxcore/tox_api.c, line 79 at r1 (raw file):

Previously, sudden6 wrote…

same

Added a boolean result parameter.


toxcore/tox_api.c, line 80 at r2 (raw file):

Previously, JFreegman wrote…

Instead of calling strlen() maybe the caller could supply the length. toxcore doesn't generally deal with C strings.

Done. Kept a check for NUL termination though, AFAICT the string is used in places where it must be NUL terminated.


toxcore/tox_api.c, line 102 at r2 (raw file):

Previously, JFreegman wrote…

if (options->save_data == nullptr)

Done.


toxcore/tox_api.c, line 107 at r2 (raw file):

Previously, JFreegman wrote…

Should we check that length is < sizeof(options->savedata_data) or just ask the caller to be sane?

We just allocated it with the correct size though?


toxcore/tox_api.c, line 145 at r2 (raw file):

Previously, JFreegman wrote…

Although there's nothing wrong with calling free on a null pointer, it seems weird to explicitly leave this outside of the of this if clause.

IDK, I prefer to make as few things conditional as possible. Can change it though if that's a style requirement.

@sudden6 sudden6 force-pushed the internal_malloc branch 2 times, most recently from 2891a35 to 481541a Compare December 27, 2021 13:57
@JFreegman
Copy link
Member


toxcore/tox_api.c, line 107 at r2 (raw file):

Previously, sudden6 wrote…

We just allocated it with the correct size though?

Ah yeah I misread that one

@JFreegman
Copy link
Member


toxcore/tox_api.c, line 145 at r2 (raw file):

Previously, sudden6 wrote…

IDK, I prefer to make as few things conditional as possible. Can change it though if that's a style requirement.

Just a style nitpick, either way is correct.

@sudden6 sudden6 force-pushed the internal_malloc branch 3 times, most recently from 65698ab to 5ee74fb Compare December 27, 2021 14:56
Copy link
Member

@nurupo nurupo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 9 of 10 files at r5.
Reviewable status: 1 change requests, 0 of 1 approvals obtained (waiting on @JFreegman, @nurupo, and @sudden6)


toxcore/tox_api.c, line 66 at r1 (raw file):

Previously, sudden6 wrote…

agreed, but I think this breaks the API right? or at minimum the clients need to check additional things then

You can add things to an enum without breaking ABI. Adding values at the very end of the enum is safe.


toxcore/tox_api.c, line 79 at r1 (raw file):

Previously, sudden6 wrote…

Added a boolean result parameter.

That's rather unusual. I'm going though the API and it looks like all functions that can return a failure (bool / null pointer, uin32_t) also have and error enum.


toxcore/tox_api.c, line 80 at r2 (raw file):
@JFreegman toxcore does deal with string hosts! How is this host argument in here any different from tox_bootstrap()'s host argument, which the documentation says/implies that must be null-terminated:

  • @param host The hostname or IP address (IPv4 or IPv6) of the node. Must be
  • at most TOX_MAX_HOSTNAME_LENGTH chars, including the NUL byte.
    ?
    Or tox_add_tcp_relay()' host argument, which is the same.
    Neither function accept a host_length argument.
Copy link
Author

@sudden6 sudden6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 change requests, 0 of 1 approvals obtained (waiting on @JFreegman, @nurupo, and @sudden6)


toxcore/tox_api.c, line 80 at r2 (raw file):

Previously, nurupo wrote…

@JFreegman toxcore does deal with string hosts! How is this host argument in here any different from tox_bootstrap()'s host argument, which the documentation says/implies that must be null-terminated:

  • @param host The hostname or IP address (IPv4 or IPv6) of the node. Must be
  • at most TOX_MAX_HOSTNAME_LENGTH chars, including the NUL byte.
    ?
    Or tox_add_tcp_relay()' host argument, which is the same.
    Neither function accept a host_length argument.

@nurupo you're right, and with some further thinking the current solution is worse than before, because now the string must be NUL terminated and length supplied. IMO for a sane API it should be either or and consistent with the rest of the API. So the decision is what style do we want for c-toxcore? Generally I'd see pointer+length as more sane than NUL terminated, but compatibility says otherwise.


toxcore/tox_api.c, line 107 at r2 (raw file):

Previously, JFreegman wrote…

Ah yeah I misread that one

Done.


toxcore/tox_api.c, line 145 at r2 (raw file):

Previously, JFreegman wrote…

Just a style nitpick, either way is correct.

Done.

@sudden6
Copy link
Author

sudden6 commented Dec 29, 2021

I've found a problem around tox_options_default(...), it seems this can not be implemented with the current semantics, because it's allowed in docs to pass an uninitialized struct. So we can not free the malloced savedata_data and proxy_host, because the pointers are uninitialized. But if we can't free them, it might leak memory, when resetting a Tox_Options struct, as is done by multiple tests.

IMO the tox_options_default(...) function should be removed and tox_options_new(...) should always provide a default initialized struct, which must be freed by tox_options_free(...). Client impact should be minimal, since tox_options_new(...) should be used anywa.

@iphydf @JFreegman anything I missed that speaks against this?

sudden6 and others added 2 commits December 30, 2021 00:55
tox_options_new(...) already returns a default initialized options struct and this function makes it impossible to malloc buffers in Tox_Options, so remove it.
@nurupo
Copy link
Member

nurupo commented Dec 30, 2021

because it's allowed in docs to pass an uninitialized struct.

Huh? This makes no sense. Are you sure you are not misunderstanding things? Can you put/link to the exact quote from the doc in here?

@nurupo
Copy link
Member

nurupo commented Dec 30, 2021

Oh, I see. It's allowed to pass an uninitialized Tox_Options only into tox_options_default(). I thought you were talking about tox_new(), which would make no sense.

Taking a second look at this PR (it was years since the last time I looked into it), why are you allocating a string in tox_options_set_proxy_host() and setting it onto Tox_Options? You are not allowed to do that. It should set user-provided host string on Tox_Options, such that tox_options_get_proxy_host() would return the same user-provided string back. The correct time to copy the user-provided host string into the internal state is inside the tox_new() call. tox_options_*() functions shouldn't be allocating or freeing any memory, they are just dumb setters/getters for Tox_Options struct. All the input validation logic (see TOX_ERR_NEW error struct), allocation/deallocating and everything else happens in tox_new(), into which that Tox_Options is passed.

@sudden6
Copy link
Author

sudden6 commented Dec 30, 2021

@nurupo have you also read the linked discussions in #908? AFAICT there we agreed on the exact opposite of what you're saying now.

The problem I see with not deep-copying the data into Tox_Options is that the user has to track the lifetime of multiple objects (savedata_data, proxy_host) instead of just the Tox_Options struct, which easily leads to mistakes. Since the goal is to make Tox_Options opaque anyway, *_new(...) and _free(...) are needed anyway so I don't see any reason not to do things like in this PR.

@sudden6 sudden6 added the api break Change breaks API or ABI label Dec 30, 2021
@nurupo
Copy link
Member

nurupo commented Dec 30, 2021

Aha! I have totally forgotten about #908.

Like I have said in that issue, we could change the semantics of tox_options_set_* functions so that they do a deep copy instead, but only after we make Tox_Options struct opaque, as this is incompatible with the user being able to set the struct fields directly. So this PR can't be merged until that's done (presumably in a separate PR).

Also, when Tox_Options does become opaque, even though we could do such a change, it warrants a discussion if we really should do such a change, as it's rather breaking. Firstly, it's API-breaking, even for users that have used only the tox_options_* functions and avoiding setting the struct fields directly, as the function signatures would need to change. Because the tox_options_set_* functions taking buffers now allocate inside, they would require an extra argument for the error enum to indicate the allocation failure. Secondly, it's behavior-breaking -- users might have relied on tox_options_get'ing their pointers back and freeing them, and such memory ownership change might catch them off-guard when updating toxcore.

I'm not opposed to the change, just want to see if we think that pros outweight the cons. Breaking the API and possibly unintentionally introducing memory corruption bugs into clients by changing the memory ownership model under them shouldn't be done without a good reason. I have yet to see a good reason for making this change though. (Surprisingly, there wasn't any reason provided before your last comment, unless it was discussed on the IRC at the time).

The problem I see with not deep-copying the data into Tox_Options is that the user has to track the lifetime of multiple objects (savedata_data, proxy_host) instead of just the Tox_Options struct, which easily leads to mistakes.

Can you give an example of such a mistake? I'm having hard time understanding as savedata_data and proxy_host should have the same lifetime as Tox_Options itself -- user has to track the lifetimes until passing Tox_Options into tox_new(), at which point toxcore would make a copy of the data and the user is free to do whatever they want with savedata_data, proxy_host and Tox_Options itself: free them, or keep them around to be used for creating another tox instance or whatever. User also doesn't have to keep track of all the savedata_data, proxy_host, etc. variables individually, they can keep track just of the Tox_Options struct and get all their pointers back by using the tox_options_get_*() family of functions. So I'd argue that it's not that hard to track their lifetimes (and as a bonus you don't even have to keep track of their pointers), all you need to keep track of is Tox_Options's lifetime, as they are the same. (Also, in practice, the lifetime of Tox_Options is so short, that you often end up creating it, setting things on it and deallocating it all within a single function, so it feels like this change in lifetimes would break things for very little gain.)

@sudden6
Copy link
Author

sudden6 commented Dec 30, 2021

but only after we make Tox_Options struct opaque, as this is incompatible with the user being able to set the struct fields directly. So this PR can't be merged until that's done (presumably in a separate PR).

True, I'll look into what prevents making Tox_Options opaque.

user has to track the lifetimes until passing Tox_Options into tox_new(), at which point toxcore would make a copy of the data and the user is free to do whatever they want

TBH this is the first time I'm hearing that this is the intended behavior, because this not documented in tox.h. So far I assumed the Tox_Options object must outlive the Tox options object in all cases.

Also, when Tox_Options does become opaque, even though we could do such a change, it warrants a discussion if we really should do such a change, as it's rather breaking.

I think doing this change with the transition to an opaque pointer is the perfect time, since the API changes anyway I'd rather break it once and make it future proof.

Can you give an example of such a mistake?

qTox was not honoring the (undocumented) lifetime requirement that the proxy_host string must outlive the Tox_Options object, by storing that string on the stack, which led to Tox silently ignoring the proxy settings. Unarchived the full story: qTox/qTox#4934 IMHO this is quite an easy mistake to make.

The main argument I see for making copies of data passed via setters is simplicity and reliability. The user doesn't need to care about the lifetime of the parameters they set after calling a setter. Additionally all synchronization points are well defined by the functions tox_options_new(...) , tox_options_free and tox_new(...)

As for making Tox_Options opaque, it's already discouraged in the header to manipulate it directly, so I think we should make it opaque in the next API breaking release and then also apply this PR.

@nurupo
Copy link
Member

nurupo commented Dec 30, 2021

TBH this is the first time I'm hearing that this is the intended behavior, because this not documented in tox.h. So far I assumed the Tox_Options object must outlive the Tox options object in all cases.

Sounds like the documentation should be improved then.

But yeah, you can now stop making Tox_Options outlive a Tox* instance, it just needs to outlive tox_new(). You own Tox_Options and any data you set on it. Passing it to tox_new() doesn't re-assign the ownership to toxcore, you still own it all.

The main argument I see for making copies of data passed via setters is simplicity and reliability.

I wouldn't call making getters return values different from what was passed into setters intuitive or simple... That's some spooky action at a distance stuff -- rather unexpected.

char host[] = "127.0.0.1"; tox_options_set_proxy_host(&opt, host); assert(tox_options_get_proxy_host(&opt) == host); // huh? false?! 
@sudden6
Copy link
Author

sudden6 commented Dec 30, 2021

Sounds like the documentation should be improved then.

As always :D

As for strange things that could happen with current code:

char host[] = "127.0.0.1"; tox_options_set_proxy_host(opt1, host); host[8] = '2'; tox_options_set_proxy_host(opt2, host); strcmp(tox_options_get_proxy_host(opt1), tox_options_get_proxy_host(opt2)); // huh? equal?! 

Another problem is passing a char[] on the stack, returning from the function and then calling tox_new(...) as seen in qTox. IMHO these errors are more common and easier to make than comparing the pointers as in your example.

The problem is that in C there is no way to describe the concept that data is deep copied or just the reference stored in the function signature (see ownership transfer in Rust or modern C++). So we need to document the way we handle it and ideally be consistent throughout our own API. To be consistent with other setters which handle only integral types I'd implement this for strings in a similar "fire and forget" way.

@nurupo
Copy link
Member

nurupo commented Dec 30, 2021

qTox was not honoring the (undocumented) lifetime requirement that the proxy_host string must outlive the Tox_Options object, by storing that string on the stack, which led to Tox silently ignoring the proxy settings. Unarchived the full story: qTox/qTox#4934 IMHO this is quite an easy mistake to make.

I looked into #4934 and it's a clear case of CWE-562: Return of Stack Variable Address. It's returning a stack-allocated buffer from a function and somehow expecting it to work. It doesn't even use any of tox_options_set_*() functions to set it, it's just a plain assigned to a struct field. This isn't an example of the tox_options_set_*() functions not deep-copying buffers "easily leading to mistakes", it's an example of someone needing to be arrested for C++ crimes of developing while not understanding such basic concepts as stack-allocated things not outliving the function :D If a developer does something like that, that's totally on them and not toxcore, so that's not an example I was looking for.

@sudden6
Copy link
Author

sudden6 commented Dec 31, 2021

True, but whether you use a setter or assign the value directly to the struct doesn't matter for this kind of problem. I'd even argue that before making Tox_Options opaque, deep copying would have been more effort, but now that using the getters and setters is mandatory I see no drawbacks and consider it the better way. Why do you think it's worse?

@nurupo
Copy link
Member

nurupo commented Jan 1, 2022

I have already explained this, but here we go again, in a bit more structured way this time:

  1. Setters doing a deep copy breaks the common expectation that getters return the same object that was passed into the setters.set(foo); assert(get() == foo);. This is not intuitive and is bound to cause users write buggy code.
  2. Getters return the same object that was passed into the setter. Returning a different object is a big behavioral break, which is a kind of API break. Users might have relied on getting their objects back e.g. to free them. They might update to the new toxcore with the deep copying change, fix any API issues that prevent the code from compiling, but not realize that the behavior of setters/getters has changed and now they would have bugs in their programs.
  3. Some of the error handling will have to be moved out of tox_new() into the setters. The buffer setters would need to have an extra error enum argument added to indicate the allocation failure. Adding an extra argument to setters would break API. It might also make users restructure their code, since before they had to handle errors only around tox_new() alone, but now they need to handle the setters erroring too.

So in short, there are too many drawbacks to this change:

  • This change doesn't benefit toxcore in any way. Moving deep copying from tox_new() to setters is of no benefit to toxcore.
  • This change doesn't benefit existing toxcore users. It breaks things for the existing users of setters and getters -- (3) and might introduce bugs into users' programs -- (2).
  • This change might confuse new users as setters deep copying and getters returning a different object is rather unintuitive -- (1).

In comparison, leaving things as they are -- toxcore deep copying inside tox_new() -- has zero drawbacks.

@sudden6
Copy link
Author

sudden6 commented Jan 1, 2022

@nurupo Thanks for listing everything in a structured manner.

@iphydf @JFreegman I interpreted your comments on this PR in favor of this change, but before I put any more work into it: do you still support changing the API to deep copying Tox_Options parameter values?

@iphydf iphydf added P3 Low priority and removed P3 Low priority labels Feb 5, 2022
@iphydf iphydf marked this pull request as draft February 19, 2022 13:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api break Change breaks API or ABI

4 participants