Regex to convert words to TitleCase

Question

I use this regex to convert words to TitleCase, confirming each substitution:

:s/\%V\<\([A-Za-z0-9àäâæèéëêìòöôœùüûçÀÄÂÆßÈÉËÊÌÖÔŒÙÜÛ]\)\([A-Za-z0-9àäâæèéëêìòöôœùüûçÀÄÂÆßÈÉËÊÌÖÔŒÙÜÛ]*\)\>/\u\1\L\2/gc

However, this also matches words that are already in TitleCase.

Does anyone know how to change the above regex so that it skips words that are already in TitleCase?

The pattern in the first capture group (the first letter?) includes A-Z and lots of accented capitals. If you drop them, your search will match only words starting with lowercase, I think.
– romainl, Commented Dec 21, 2012 at 13:46

romainl · Accepted Answer · 2012-12-21 13:54:41Z

2

:s/\%V\<\([a-z0-9àäâæèéëêìòöôœùüûç]\)\([A-Za-z0-9àäâæèéëêìòöôœùüûçÀÄÂÆßÈÉËÊÌÖÔŒÙÜÛ]*\)\>/\u\1\L\2/gc

seems to do the trick, here.

Because you have explicitly included uppercase characters in the range you use in the first-letter capture group, your pattern is going to match both foo and Foo. Removing the uppercase characters from that range seems to resolve your immediate problem.
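The effect of narrowing the first character class can be checked outside Vim. What follows is a minimal Python sketch, not the Vim command itself: it assumes ASCII-only words, uses Python's `re` syntax with `\b` standing in for `\<` and `\>`, and the helper name `title_lowercase_initial` is hypothetical.

```python
import re

# First group: a lowercase letter or digit only, so words that already
# start with an uppercase letter are never matched.
PATTERN = re.compile(r"\b([a-z0-9])([A-Za-z0-9]*)\b")

def title_lowercase_initial(text):
    # Mirrors \u\1\L\2: uppercase the first character, lowercase the rest.
    return PATTERN.sub(lambda m: m.group(1).upper() + m.group(2).lower(), text)

print(title_lowercase_initial("foo Foo fOO FOO"))
```

Words such as Foo and FOO are left untouched because their first character is no longer in the first group's range.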

4 Comments

Reman Over a year ago

Hi romainl, your solution doesn't match FOo and FoO or FOO

romainl Over a year ago

@Remonn, a piece of text in "title case" is one where almost every word starts with an uppercase character. As you asked, your substitution now skips words that already start with an uppercase character. Trying to deal with every corner case would make your pattern a lot more complicated. Add to that the fact that some words, such as conjunctions, are often not capitalized in title case, and you are entering "function territory".

romainl Over a year ago

@Remonn, your initial pattern splits the word in two parts: the first char and all the rest. I can't imagine a straightforward way to exclude Foo and not FoO.

Reman Over a year ago

Yes, that's why I asked this question. Every time, I have to confirm (yes or no) the substitution of words that are already in title case. There must be a way to find all words other than title-case ones, no?

alexis · Accepted Answer · 2012-12-22 02:59:01Z

1

To match only non-titlecase words, you want to match those that start either (a) with a lowercase letter or (b) with two uppercase letters. The following will do it (add accented letters and digits to taste):

\b([A-Z])([A-Z][A-Za-z]*)|\b([a-z])([a-zA-Z]+)

But some words match at groups \1 and \2, others at \3 and \4. I don't use vim so I can't say if it'll let you substitute with this kind of pattern. (E.g., \u\1\3\L\2\4; only two of the four will ever be non-empty)
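One way to see how the two alternatives and their four groups behave is to run the pattern in an engine that accepts this syntax. The sketch below uses Python's `re` module rather than Vim; the function names `fix_case` and `to_title_case` are hypothetical, and a replacement function is used in place of a `\u...\L...` replacement string.

```python
import re

# The pattern from the answer, unchanged (ASCII letters only).
PATTERN = re.compile(r"\b([A-Z])([A-Z][A-Za-z]*)|\b([a-z])([a-zA-Z]+)")

def fix_case(match):
    # Only one alternative matches at a time, so the groups of the
    # other alternative are None; pick whichever pair is set.
    first = match.group(1) or match.group(3)
    rest = match.group(2) or match.group(4)
    return first.upper() + rest.lower()

def to_title_case(text):
    return PATTERN.sub(fix_case, text)

print(to_title_case("foo Foo FOO FoO"))
```

With this pattern, Foo is skipped as intended, but FoO is skipped as well, because the first alternative requires the second letter to be uppercase.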

4 Comments

Reman Over a year ago

Yes, I adapted it a bit to Vim and it works: \C\(\<[A-Z]\)\([a-z]*[A-Z]\+[a-z]*\>\)\|\C\(\<[a-z]\)\([a-zA-Z]\+\>\) However, I don't know how to write the substitution when the regex contains an OR. Can someone help me?

Reman Over a year ago

Oops, it doesn't match HEllO.

Reman Over a year ago

Found it. This one works:

s/\C\%V\(\<[A-ZÀÄÂÆßÈÉËÊÌÖÔŒÙÜÛ]\)\([a-zA-Z0-9àäâæèéëêìòöôœùüûçÀÄÂÆßÈÉËÊÌÖÔŒÙÜÛ]*[A-Z0-9ÀÄÂÆßÈÉËÊÌÖÔŒÙÜÛ]\+[a-zA-Z0-9àäâæèéëêìòöôœùüûçÀÄÂÆßÈÉËÊÌÖÔŒÙÜÛ]*\>\)\|\C\%V\(\<[a-zàäâæèéëêìòöôœùüûç]\)\([a-zA-Z0-9àäâæèéëêìòöôœùüûçÀÄÂÆßÈÉËÊÌÖÔŒÙÜÛ]\+\>\)/\u\1\L\2\u\3\L\4/gc

alexis Over a year ago

Oh I see, you're also matching words with internal caps (CamelCase). Missed that. (But did you try the pattern I suggested, \u\1\3\L\2\4?)
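The idea behind the final pattern in the comments (match a word that starts with an uppercase letter only if another uppercase letter or digit follows somewhere later, or any word that starts with a lowercase letter) can be sketched in Python as follows. This is an illustration under assumptions, not the Vim command: it is ASCII-only, drops the `\%V` visual-area restriction, uses `\b` in place of `\<` and `\>`, and the name `to_title_case` is hypothetical.

```python
import re

PATTERN = re.compile(
    # Uppercase first letter, with at least one more uppercase letter
    # or digit somewhere in the rest of the word.
    r"\b([A-Z])([a-zA-Z0-9]*[A-Z0-9]+[a-zA-Z0-9]*)\b"
    # Or: lowercase first letter followed by one or more characters.
    r"|\b([a-z])([a-zA-Z0-9]+)\b"
)

def to_title_case(text):
    def fix(m):
        # Groups from the alternative that did not match are None.
        first = m.group(1) or m.group(3)
        rest = m.group(2) or m.group(4)
        return first.upper() + rest.lower()
    return PATTERN.sub(fix, text)

print(to_title_case("foo Foo FOo FoO FOO HEllO"))
```

Here Foo is the only word left unmatched; every other spelling is rewritten with an uppercase first letter and a lowercase remainder.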
