
I use this regex to convert words to TitleCase, confirming each substitution:

:s/\%V\<\([A-Za-z0-9àäâæèéëêìòöôœùüûçÀÄÂÆßÈÉËÊÌÖÔŒÙÜÛ]\)\([A-Za-z0-9àäâæèéëêìòöôœùüûçÀÄÂÆßÈÉËÊÌÖÔŒÙÜÛ]*\)\>/\u\1\L\2/gc 

However, this also matches words that are already in TitleCase.

Does anyone know how to change the regex above so that it skips words that are already in TitleCase?

    The pattern in the first capture group (the first letter?) includes A-Z and lots of accented capitals. If you drop them your search will match only words starting with lowercase, I think. Commented Dec 21, 2012 at 13:46

2 Answers

:s/\%V\<\([a-z0-9àäâæèéëêìòöôœùüûç]\)\([A-Za-z0-9àäâæèéëêìòöôœùüûçÀÄÂÆßÈÉËÊÌÖÔŒÙÜÛ]*\)\>/\u\1\L\2/gc 

seems to do the trick, here.

Because you explicitly included uppercase characters in the range used for the first-letter capture group, your pattern matches both foo and Foo. Removing the uppercase characters from that range seems to resolve your immediate problem.
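To make the effect of that change easier to see outside Vim, here is a minimal Python sketch. It is only an approximation: the character classes are cut down to ASCII, the names `LOWER_START` and `title_lower_start` are made up for illustration, and the `\u\1\L\2` replacement is imitated with `upper()` and `lower()`.

```python
import re

# ASCII-only stand-in for the pattern above: the first group accepts only
# lowercase letters or digits, the second group accepts either case.
LOWER_START = re.compile(r"\b([a-z0-9])([A-Za-z0-9]*)\b")

def title_lower_start(text):
    # Rough equivalent of Vim's \u\1\L\2 replacement.
    return LOWER_START.sub(
        lambda m: m.group(1).upper() + m.group(2).lower(), text
    )

print(title_lower_start("foo Foo fOO FOo"))  # Foo Foo Foo FOo
```

Words that start with a lowercase letter are rewritten, while anything that already starts with an uppercase letter, including FOo, is left alone.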


4 Comments

Hi romainl, your solution doesn't match FOo, FoO, or FOO.
@Remonn, a piece of text in "title case" is a piece of text where almost every word starts with an uppercase character. As you asked, your substitution now skips words that already start with an uppercase character. Trying to deal with every corner case would make your pattern a lot more complicated. Add to that the fact that some words, like conjunctions, are often not capitalized in title case sentences, and you are entering "function territory".
@Remonn, your initial pattern splits the word in two parts: the first char and all the rest. I can't imagine a straightforward way to exclude Foo and not FoO.
Yes, that's why I asked this question. Every time, I have to confirm (yes or no) the substitution of words that are already in title case. There must be a way to find all words other than title-case ones, no?

To match only non-titlecase words, you want to match those that start either (a) with a lowercase letter or (b) with two uppercase letters. The following will do it (add accented letters and digits to taste):

\b([A-Z])([A-Z][A-Za-z]*)|\b([a-z])([a-zA-Z]+) 

But some words match at groups \1 and \2, others at \3 and \4. I don't use vim so I can't say if it'll let you substitute with this kind of pattern. (E.g., \u\1\3\L\2\4; only two of the four will ever be non-empty)
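Since this pattern is written in a PCRE-like syntax, it can be tried directly in Python. The sketch below is an illustration rather than a Vim solution: `NON_TITLE`, `fix_word`, and `to_title` are hypothetical names, and a replacement function stands in for `\u\1\3\L\2\4` by picking whichever pair of groups actually matched.

```python
import re

# The alternation from above: two leading capitals, or a leading lowercase.
NON_TITLE = re.compile(r"\b([A-Z])([A-Z][A-Za-z]*)|\b([a-z])([a-zA-Z]+)")

def fix_word(m):
    # Only one pair of groups is set per match; the other pair is None.
    first = m.group(1) or m.group(3)
    rest = m.group(2) or m.group(4)
    return first.upper() + rest.lower()

def to_title(text):
    return NON_TITLE.sub(fix_word, text)

print(to_title("foo Foo FOO fOO HEllO FoO"))  # Foo Foo Foo Foo Hello FoO
```

Foo is skipped as intended. FoO is skipped too, because the first branch requires the second letter to be uppercase.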

4 Comments

Yes, I adapted it a bit to Vim and it works: \C\(\<[A-Z]\)\([a-z]*[A-Z]\+[a-z]*\>\)\|\C\(\<[a-z]\)\([a-zA-Z]\+\>\) However, I don't know how to substitute with an OR in the regex. Can someone help me?
Oops, it doesn't match HEllO.
Found it. This one works: s/\C\%V\(\<[A-ZÀÄÂÆßÈÉËÊÌÖÔŒÙÜÛ]\)\([a-zA-Z0-9àäâæèéëêìòöôœùüûçÀÄÂÆßÈÉËÊÌÖÔŒÙÜÛ]*[A-Z0-9ÀÄÂÆßÈÉËÊÌÖÔŒÙÜÛ]\+[a-zA-Z0-9àäâæèéëêìòöôœùüûçÀÄÂÆßÈÉËÊÌÖÔŒÙÜÛ]*\>\)\|\C\%V\(\<[a-zàäâæèéëêìòöôœùüûç]\)\([a-zA-Z0-9àäâæèéëêìòöôœùüûçÀÄÂÆßÈÉËÊÌÖÔŒÙÜÛ]\+\>\)/\u\1\L\2\u\3\L\4/gc
Oh I see, you're also matching words with internal caps (CamelCase). Missed that. (But did you try the pattern I suggested, \u\1\3\L\2\4?)
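
For readers who want to check the logic of that final pattern outside Vim, here is a rough Python sketch of the same idea. It assumes ASCII-only character classes, leaves out the `\%V` visual-selection restriction, and uses made-up names (`NON_TITLE`, `to_title`). A word matches if it starts with an uppercase letter and contains a later uppercase letter or digit, or if it starts with a lowercase letter and has at least one more character.

```python
import re

NON_TITLE = re.compile(
    # Uppercase first letter, then at least one later uppercase or digit.
    r"\b([A-Z])([A-Za-z0-9]*[A-Z0-9][A-Za-z0-9]*)\b"
    # Or: lowercase first letter followed by one or more word characters.
    r"|\b([a-z])([A-Za-z0-9]+)\b"
)

def to_title(text):
    def fix(m):
        # Same idea as \u\1\3\L\2\4: use whichever pair of groups matched.
        first = m.group(1) or m.group(3)
        rest = m.group(2) or m.group(4)
        return first.upper() + rest.lower()
    return NON_TITLE.sub(fix, text)

print(to_title("foo Foo FoO HEllO FOO"))  # Foo Foo Foo Hello Foo
```

Because digits are part of the "later uppercase" class, a word such as Ab1 still matches even though the replacement leaves it unchanged, so with the `c` flag Vim would presumably still prompt for it.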
