fixed trivial mistake
kyrill

I will only address the issue of performance.

  • You might want to use strchr to find occurrences of '\\' and just memcpy everything in between. Both functions are very likely vectorized with SSE or AVX in your C library.

  • If you can (I suspect you cannot), don't allocate memory for each string separately. If you do have to, at least don't reallocate; it's probably not worth the overhead.

  • To kill two birds with one stone, you can allocate an array in which you save the positions of '\\' in the string. Then you allocate exactly as much memory as needed and do the memcpy and the parsing of escape sequences. EDIT: To deal with escape sequences of variable length, you can parse and store the escape sequences as you scan the string for '\\'s. Store them in another array, along with their positions, then memcpy the plain text and copy the parsed characters individually.

  • Preferably put the most common branch first, e.g. if (*pos != '\\'), although the branch predictor will probably alleviate the negative effects of doing it the way you do it now. You can take a look at the GCC builtin __builtin_expect and the likely / unlikely macros commonly built on top of it.

  • In function tto_escape_hex, save *pos into a variable instead of using it directly. The way you do it now, you dereference a pointer on every access, which would be slow if your compiler didn't optimize it away. An extra local variable costs nothing, and if you have optimizations turned on (or maybe even if you don't), the compiler will probably keep the value in a register anyway.

  • If you are serious about this, you can take inspiration from some reference-grade compilers such as GCC (although that one might be a bit too heavyweight).
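
The strchr / memcpy suggestion could look roughly like the sketch below. The names unescape_simple and decode_simple_escape are made up for illustration, only a handful of single-character escapes are handled, and the caller is assumed to supply a destination buffer of at least strlen(src) + 1 bytes (the decoded string is never longer than the input), so nothing is reallocated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: maps the character after a backslash to its value.
   Only a few single-character escapes are handled in this sketch. */
static char decode_simple_escape(char c)
{
    switch (c) {
    case 'n':  return '\n';
    case 't':  return '\t';
    case '\\': return '\\';
    case '"':  return '"';
    default:   return c;   /* unknown escape: keep the character itself */
    }
}

/* Copies src into dst while decoding escapes. dst must hold at least
   strlen(src) + 1 bytes. Returns the length of the decoded string. */
size_t unescape_simple(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);
    char *out = dst;
    const char *bs;

    while ((bs = strchr(src, '\\')) != NULL) {
        /* Bulk-copy the plain text in front of the backslash. */
        size_t run = (size_t)(bs - src);
        memcpy(out, src, run);
        out += run;

        if (bs[1] == '\0') {   /* lone trailing backslash: keep it and stop */
            *out++ = '\\';
            src = bs + 1;
            break;
        }
        *out++ = decode_simple_escape(bs[1]);
        src = bs + 2;
    }

    /* Copy whatever plain text is left after the last escape. */
    size_t tail = strlen(src);
    memcpy(out, src, tail);
    out += tail;
    *out = '\0';
    return (size_t)(out - dst);
}
```

Calling unescape_simple(buf, "a\\nb") with a large enough buf leaves the three bytes 'a', newline, 'b' in it. Whether this beats a byte-by-byte loop depends on how long the runs between backslashes are, so measure it on your real input.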
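
The two-pass idea (record the escapes first, then allocate exactly once) might be sketched as follows. Everything here is an assumption made for illustration: the struct escape layout, the function names, and the rule that \x takes at most two hex digits. The bookkeeping array grows by doubling; only the output string is allocated at its exact final size.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One parsed escape: where it starts in the input, how many input
   bytes it spans, and the single byte it decodes to. */
struct escape {
    size_t pos;
    size_t len;
    char value;
};

static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Parses the escape that starts at s (s[0] is a backslash). */
static struct escape parse_escape(const char *base, const char *s)
{
    struct escape e = { (size_t)(s - base), 2, s[1] };

    if (s[1] == '\0') {            /* lone trailing backslash */
        e.len = 1;
        e.value = '\\';
    } else if (s[1] == 'n') {
        e.value = '\n';
    } else if (s[1] == 't') {
        e.value = '\t';
    } else if (s[1] == 'x') {      /* assumed: one or two hex digits */
        unsigned v = 0;
        size_t i = 2;
        int d;
        while (i < 4 && (d = hex_value(s[i])) >= 0) {
            v = v * 16 + (unsigned)d;
            i++;
        }
        if (i > 2) {
            e.len = i;
            e.value = (char)v;
        }                          /* "\x" without digits stays a plain 'x' */
    }
    return e;
}

/* Returns a freshly malloc'ed decoded copy of src, or NULL on failure. */
char *unescape_two_pass(const char *src)
{
    assert(src != NULL);
    size_t src_len = strlen(src), out_len = src_len;
    size_t cap = 0, count = 0;
    struct escape *esc = NULL;
    const char *p = src;

    /* Pass 1: find and parse every escape, tracking the output length. */
    while ((p = strchr(p, '\\')) != NULL) {
        if (count == cap) {
            size_t new_cap = cap ? cap * 2 : 16;
            struct escape *tmp = realloc(esc, new_cap * sizeof *tmp);
            if (tmp == NULL) { free(esc); return NULL; }
            esc = tmp;
            cap = new_cap;
        }
        esc[count] = parse_escape(src, p);
        out_len -= esc[count].len - 1;   /* len input bytes become one */
        p += esc[count].len;
        count++;
    }

    /* Pass 2: one exact allocation, then memcpy plain runs and drop in
       the already-decoded bytes. */
    char *dst = malloc(out_len + 1);
    if (dst != NULL) {
        char *out = dst;
        size_t in = 0;
        for (size_t i = 0; i < count; i++) {
            memcpy(out, src + in, esc[i].pos - in);
            out += esc[i].pos - in;
            *out++ = esc[i].value;
            in = esc[i].pos + esc[i].len;
        }
        memcpy(out, src + in, src_len - in);
        out[src_len - in] = '\0';
    }
    free(esc);
    return dst;
}
```

The caller owns the returned string and has to free it. If most of your strings contain only a few escapes, a small fixed-size array on the stack for the common case would avoid the bookkeeping allocation as well.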
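
For the branch-ordering hint, the likely / unlikely macros are usually defined on top of __builtin_expect as shown below; GCC and Clang provide the builtin, and the fallback simply evaluates the condition. The function count_backslashes is a made-up example, not part of your code.

```c
#include <assert.h>
#include <stddef.h>

/* Branch hints: a real builtin on GCC and Clang, a no-op elsewhere. */
#if defined(__GNUC__) || defined(__clang__)
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
#else
#define likely(x)   (x)
#define unlikely(x) (x)
#endif

/* Hypothetical scanning loop: the common case (an ordinary character)
   is tested first and marked as the expected outcome. */
size_t count_backslashes(const char *pos)
{
    assert(pos != NULL);
    size_t n = 0;

    for (; *pos != '\0'; pos++) {
        if (likely(*pos != '\\'))
            continue;          /* hot path: nothing to do */
        n++;                   /* cold path: found an escape introducer */
    }
    return n;
}
```

The hint only influences how the compiler lays out the code; it won't rescue a branch that really is unpredictable, so check with a profiler before sprinkling it everywhere.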
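
I can't reproduce your tto_escape_hex here, so the following is only a stand-in with an invented name and signature (parse_hex_byte, at most two hex digits). It shows the point about reading *pos once per iteration into a local and doing all comparisons on that local.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a hex-escape parser. Reads up to two hex
   digits starting at *pos_ptr, advances *pos_ptr past them and returns
   the value, or returns -1 (leaving *pos_ptr alone) if there is no digit. */
int parse_hex_byte(const char **pos_ptr)
{
    assert(pos_ptr != NULL && *pos_ptr != NULL);
    const char *pos = *pos_ptr;
    int value = 0;
    int digits = 0;

    while (digits < 2) {
        const char c = *pos;   /* single load; every test below uses c */
        int d;

        if (c >= '0' && c <= '9')
            d = c - '0';
        else if (c >= 'a' && c <= 'f')
            d = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F')
            d = c - 'A' + 10;
        else
            break;             /* not a hex digit (this includes '\0') */

        value = value * 16 + d;
        pos++;
        digits++;
    }

    if (digits == 0)
        return -1;
    *pos_ptr = pos;
    return value;
}
```

With optimizations on, the compiler will most likely generate the same code for both styles, so treat this mainly as a readability win that also happens to help unoptimized builds.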

added 389 characters in body
added 275 characters in body
added 281 characters in body