
Let's say I am traversing a string of length n, and I want the string to end at a specific character that fulfils some condition. I know that a C-style string can be terminated at position i by simply assigning '\0' to element i of the character array.

Is there any way to achieve the same result with a std::string (C++-style string)? I can think of substr, erase, etc., but all of them have linear complexity, which I cannot afford.

TL;DR, is there any "end" character for an std::string? Can I make the end iterator point to the current character somehow?

  • You can get an iterator to point anywhere you want and use it as the end. Just do std::next(str.begin(), n) and use that as the end. Commented Mar 1, 2017 at 20:20
  • What do you want to do with the resulting substring? And what should happen to the remainder of the string? Commented Mar 1, 2017 at 21:07
  • @ChristianHackl I didn't need the remainder of the string, just needed the part before the end character. Commented Mar 2, 2017 at 4:50
  • @vu1p3n0x Another intuitive answer, thanks, but it would take linear time, just like all other suggestions :) And I'm not even sure we can simply assign iterators to string::end which might be a const iterator. Commented Mar 2, 2017 at 4:53
  • @KushalAgrawal: What I meant is if you have considered memory consumption. A C string does not really become smaller if you set a character to null. The remainder still occupies memory until free is called on the first character's address. This may or may not be important for your current problem. Commented Mar 2, 2017 at 16:31

5 Answers


You can use resize:

```cpp
std::string s = /* ... */;
if (auto n = s.find(c); n != s.npos) {
    s.resize(n);
}
```
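For reference, here is the same idea wrapped in a small self-contained function. The name `truncate_at_first` is made up for illustration. Like the snippet above, it needs C++17 for the `if` initializer. Whether the shrinking `resize` is O(1) depends on the library implementation, as the comments discuss.

```cpp
#include <string>

// Hypothetical helper: truncates s in place just before the first
// occurrence of c. If c does not occur, s is left unchanged.
// Requires C++17 (if-with-initializer).
void truncate_at_first(std::string& s, char c) {
    if (auto n = s.find(c); n != std::string::npos) {
        s.resize(n);  // shrinking resize keeps only the first n characters
    }
}
```

For example, calling it on `"hello world"` with `' '` leaves `"hello"`. If the character is absent, the string is unchanged.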

12 Comments

Mind you, that's documented as linear complexity too; some implementations could choose to reallocate unconditionally (assuming you wish to reduce memory usage), others might only do so if the string shrinks below the reallocation threshold.
std::vector::resize is guaranteed not to reallocate when resizing down (in order to preserve the iterators). Could it be that the same holds for the string? In any case, a shrinking resize should be O(1) in practice.
@KushalAgrawal: Please show me in the standard where the complexity of resize is defined to be linear. I'm being serious; I can't find any statement of its complexity. Cppreference is a great site, but like any secondary source, it is not authoritative.
@zett42: Implication means nothing. The "as if" rule easily allows implementations to not copy anything and just move some pointers around and write a new NUL character. Unless the standard has a direct statement about the complexity requirement, implementations can do anything they want, so long as they get the same results. Without such a statement, we must assume the complexity is left to the implementation (and is therefore likely to be reasonable).
@NicolBolas basic_string's specification leaves a lot to be desired because it dates from the COW-era, where even a shrinking resize may well require a full allocation+copy. A full paper is probably needed to completely clean it up.

The logical answer here is basic_string::resize. What the standard says about this function is:

Effects: Alters the length of the string designated by *this as follows:

  • If n <= size(), the function replaces the string designated by *this with a string of length n whose elements are a copy of the initial elements of the original string designated by *this.
  • If n > size(), the function replaces the string designated by *this with a string of length n whose first size() elements are a copy of the original string designated by *this, and whose remaining elements are all initialized to c.

Now, that looks very much like linear time. However, the standard does not require that things actually happen this way; it only requires that the result be "as if" they did. Therefore, an implementation is completely free to implement the shrinking version of resize by shifting one pointer and writing a NUL character. Nothing in the standard would forbid such an implementation.

So the real question is... are standard library implementations written by complete morons? It's certainly possible that they are. But it's probably wise not to assume so.

Personally, I'd just use resize on the assumption that the library implementers know what they're doing. After all, if they can't write an optimization as simple as that, then who knows what other things they're doing wrong? If you can't trust your standard library implementation not to do stupid things, then you shouldn't be using it in performance-critical code.
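To make the "as if" argument concrete, here is a deliberately simplified, hypothetical string type. `ToyString` is invented for this sketch and is not how libstdc++, libc++ or MSVC actually lay out `std::string`. It shows that a shrink-only resize needs no copying: update the stored length and write a new NUL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical, heavily simplified string type (NOT a real standard
// library implementation). It only illustrates that shrinking can be
// done in constant time without reallocating or copying.
class ToyString {
public:
    explicit ToyString(const char* s)
        : size_(std::strlen(s)), buf_(new char[size_ + 1]) {
        std::memcpy(buf_.get(), s, size_ + 1);  // copy including the '\0'
    }

    std::size_t size() const { return size_; }
    const char* c_str() const { return buf_.get(); }

    // Shrink-only resize: O(1). The buffer is kept, the length is updated
    // and a new terminating '\0' is written at position n.
    void shrink_to_length(std::size_t n) {
        assert(n <= size_);
        size_ = n;
        buf_[n] = '\0';
    }

private:
    std::size_t size_;              // declared first: used to size buf_
    std::unique_ptr<char[]> buf_;   // size_ + 1 bytes, always NUL-terminated
};
```

Shrinking `ToyString("hello world")` to length 5 gives `size() == 5`, and `c_str()` then reads `"hello"`. A real `std::string` also has to handle growth, small-string optimization and allocators, so treat this only as an illustration of why the standard's wording does not force linear time.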

3 Comments

"are standard library implementations written by complete morons?" For the longest time, <regex> on gcc's default standard library was evidence of "yes, some of it was written by complete morons". ;)
@T.C.: BTW, is that "timsong-cpp" GitHub page officially where issues are stored now, or is it just a 3rd party listing?
@NicolBolas That's my site :) (because I hate linking to a massive page for a single issue)

is there any "end" character for an std::string?

No. It is possible to define a std::string that is not null terminated. You won't be able to do a few things for such strings, such as treat the return value of std::string::data() as a null terminated C string [1], but a std::string can be constructed that way.

Can I make the end iterator point to the current character somehow?

To get a std::string::iterator to point to a certain character, you'll have to traverse the string.

E.g.

```cpp
std::string str = "This is a string";
auto iter = str.begin();
auto end = iter;
while ( end != str.end() && *end != 'r' )
   ++end;
```

After that, the range defined by iter and end contains the string "This is a st".
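The same search can also be written with `std::find`, as a commenter suggests below. This is a sketch that assumes you only need to read the prefix. The helper names `find_stop` and `count_spaces_before` are hypothetical. Finding the stop position is still a linear scan, but once you have it, the range `[begin, end)` can be passed to algorithms without modifying or copying the string.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper: iterator to the first occurrence of c in s,
// or s.end() if c does not occur.
std::string::const_iterator find_stop(const std::string& s, char c) {
    return std::find(s.begin(), s.end(), c);
}

// Hypothetical consumer: counts spaces in [s.begin(), stop position)
// without changing s.
std::ptrdiff_t count_spaces_before(const std::string& s, char stop) {
    auto end = find_stop(s, stop);
    return std::count(s.begin(), end, ' ');
}
```

With `"This is a string"` and `'r'`, the range `[begin, end)` spells `"This is a st"`, which contains three spaces.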

If a separate scan is not acceptable, you'll have to adapt your code to check the value of each character as you process it:

```cpp
std::string str = "This is a string";
auto iter = str.begin();
// Break when 'r' is encountered or end of string is reached.
while ( iter != str.end() && *iter != 'r' )
{
   // Use *iter ...
   ++iter;
}
```
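Here is a runnable version of the single-pass loop. It assumes, for illustration only, that the "work" is upper-casing each character in place. `upper_until` is a hypothetical name.

```cpp
#include <cctype>
#include <string>

// Hypothetical example of single-pass processing: upper-cases characters
// until `stop` is seen or the string ends. `stop` and everything after it
// are left untouched; the string is scanned only once.
void upper_until(std::string& s, char stop) {
    for (auto it = s.begin(); it != s.end() && *it != stop; ++it) {
        *it = static_cast<char>(std::toupper(static_cast<unsigned char>(*it)));
    }
}
```

Applied to `"This is a string"` with `'r'`, this yields `"THIS IS A STring"`. Only the characters before the first `'r'` are processed, and no separate search pass is needed.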

[1] Thanks are due to @Cubbi for pointing out an error in what I stated. std::string::data() can return a char const* that is not null terminated if using a version of C++ earlier than C++11. If using C++11 or later, std::string::data() is required to return a null terminated char const*.

8 Comments

Thanks for the quick and very detailed answer :)
You could simplify the code to just auto end = std::find(str.begin(), str.end(), 'r'); so you get either an iterator pointing to the first 'r' or the original end iterator (using std::find instead of string.find means you don't even need to check return values). Still has to do the same work, so if you are processing as you go the explicit loop avoids iterating twice (once to find stop point, once to do work), but if you need to use the computed end many times, this simplifies matters, and the code is easier to read either way.
How do you imagine c_str not being usable as a C string?
@Cubbi: It would only be usable for C functions which take a pointer and a size argument. It would not be usable for something like strlen, which expects to encounter '\0' at a valid address.
The returned array is absolutely required to be null-terminated. Pay attention to "before C++11" and "since C++11" tags on both pages. c_str is always null-terminated; data is null-terminated as of C++11.

std::string does not have an "end character" like C-style strings. You can have many null characters inside a single std::string. If you want the string to end after a certain character, then you need to erase the rest of the characters in the string after that last character.

In your case that would give you something like

```cpp
string_variable.erase(pos_of_last_character + 1);
```
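Here is a self-contained sketch of the `erase` approach. The function name `keep_through` is hypothetical. The `npos` check handles the case where the character does not occur. The same complexity caveats discussed under the `resize` answer apply here.

```cpp
#include <string>

// Hypothetical helper: keeps everything up to and including the first
// occurrence of `last` and erases the rest. erase(pos) removes the
// characters from pos to the end of the string.
void keep_through(std::string& s, char last) {
    auto pos_of_last_character = s.find(last);
    if (pos_of_last_character != std::string::npos) {
        s.erase(pos_of_last_character + 1);  // pos == size() erases nothing
    }
}
```

For example, `"key=value"` with `'='` becomes `"key="`, and a string without the character is left as it is.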


TL;DR, is there any "end" character for an std::string? Can I make the end iterator point to the current character somehow?

Not really. A std::string keeps track of its length separately (that is what std::string::size() reports), independently of any sentinel character such as '\0'.

A '\0' does matter, though, when a std::string is constructed from a const char* alone: that constructor copies characters only up to the first '\0'.
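Here is a small sketch that illustrates both points. `NulDemo` and `write_nul` are made-up names. Writing `'\0'` into the middle of a `std::string` leaves `size()` unchanged, while C functions such as `std::strlen` that read `c_str()` stop at the first `'\0'`.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical result type: what the string reports after a '\0' is written.
struct NulDemo {
    std::size_t size_after_write;    // s.size() after s[i] = '\0'
    std::size_t strlen_after_write;  // std::strlen(s.c_str()) after the write
};

// Hypothetical helper: writes '\0' at index i (requires i < s.size())
// of a copy of s and reports both lengths.
NulDemo write_nul(std::string s, std::size_t i) {
    s[i] = '\0';  // does NOT shorten the std::string
    return {s.size(), std::strlen(s.c_str())};
}
```

For instance, `write_nul("hello", 2)` reports a size of 5 but a `strlen` of 2. Likewise, `std::string("ab\0cd")` has size 2 because the `const char*` constructor stops at the first `'\0'`. By contrast, `std::string("ab\0cd", 5)` keeps all five characters.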
