I have a large number of names in various forms.

Input:

Depsai P.R.N.
Dênis De Castro
John D.J.
Andrew E.
D.J. JOHN
JOHN Mical D.J.

I need output like this:

D. P.R.N.
D. C.
J. D.J.
A. E.
D.J. J.
J. M. D.J.

If the name is like Dênis De Castro, I need the output D. C. If a name contains one of these words (De|Di|Le|La|Van|Der) in between, that word should not be captured.

    use strict;
    use warnings;

    my $gn = qq(<name>Depsai P.R.N.</name> <name>D&#x00EA;nis De Castro</name> <name>Andrew E.</name> <name>John D.J.</name> <name>D.J. John</name> <name>John Mical D.J.</name>);
    my @int = $gn =~ m{<name>(.*?)</name>}ig;
    my $ini = ();
    foreach my $initial (@int) {
        $ini .= "$1\. " while ($initial =~ s/(?:^|[ \.\,\;]+)([A-Z])\w*(\b|$)//s);
        $ini =~ s/ $//mi;
        print join("\n", $ini);
        exit;
    }

Could you suggest a regex pattern? Thanks in advance.
Removing the lowercase letters will give you the desired output. Commented Nov 4, 2014 at 4:43

2 Answers


You can try the one-liner below:

InputFile:

<name>Depsai P.R.N.</name>
<name>D&#x00EA;nis De Castro</name>
<name>John D.J.</name>
<name>Andrew E.</name>
<name>D.J. JOHN</name>
<name>JOHN Mical D.J.</name>
<name>Roc&#x00ED;o</name>

On Windows cmd prompt:

perl -lne "if($_ =~ /<name(>.*?<)\/name>/) {$result = $1; $result =~ s/(\s)(De|Di|Le|La|Van|Der)(\s)/$1$3/g; $result =~ s/((?:>|\s)[A-Z])[^\.]/$1\./g; $result =~ s/.*?(\s*[A-Z]\.\s*).*?/$1/g;$result =~ s/([a-z]|[A-Z][A-Z]).*?<//g;$result =~ s/<//g;print $result;}" InputFile 

On Unix:

perl -lne 'if($_ =~ /<name(>.*?<)\/name>/) {$result = $1; $result =~ s/(\s)(De|Di|Le|La|Van|Der)(\s)/$1$3/g; $result =~ s/((?:>|\s)[A-Z])[^\.]/$1\./g; $result =~ s/.*?(\s*[A-Z]\.\s*).*?/$1/g;$result =~ s/([a-z]|[A-Z][A-Z]).*?<//g;$result =~ s/<//g;print $result;}' InputFile 

Output:

D. P.R.N.
D. C.
J. D.J.
A. E.
D.J. J.
J. M. D.J.
R.
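For comparison, here is a minimal Python sketch of the same idea. It is not a translation of the one-liner, just one way the rules in the question could be read. It assumes that the entities should be decoded before the first letter is taken, that a particle is dropped only when it sits between two other words, and that any word already containing a period is kept unchanged. The names PARTICLES, initials and initials_from_markup are hypothetical.

```python
import html
import re

# Particles listed in the question; skipped only when they sit between other words.
PARTICLES = {"De", "Di", "Le", "La", "Van", "Der"}


def initials(name):
    """Reduce one name to initials, e.g. 'Depsai P.R.N.' -> 'D. P.R.N.'."""
    # Decode entities such as &#x00EA; so the first character is a real letter.
    words = html.unescape(name).split()
    out = []
    for i, word in enumerate(words):
        # Assumption: "in between" means neither the first nor the last word.
        if 0 < i < len(words) - 1 and word in PARTICLES:
            continue
        if "." in word:
            # Already written as initials (D.J., P.R.N.), keep as-is.
            out.append(word)
        else:
            out.append(word[0].upper() + ".")
    return " ".join(out)


def initials_from_markup(markup):
    """Apply initials() to the content of every <name>...</name> element."""
    names = re.findall(r"<name>(.*?)</name>", markup, flags=re.IGNORECASE)
    return [initials(n) for n in names]


if __name__ == "__main__":
    sample = "<name>D&#x00EA;nis De Castro</name> <name>Roc&#x00ED;o</name>"
    print("\n".join(initials_from_markup(sample)))
```

Splitting on whitespace instead of chaining substitutions keeps each rule (particle, existing initials, plain word) in its own branch, which makes it easier to adjust once the open cases raised in the comments are settled.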

2 Comments

You said not to capture the words (De|Di|Le|La|Van|Der) when they appear in between. Can you tell me the expected output for D&#x00EA;nis Van Castro, and for D&#x00EA;nis John La Castro?
This does not work for the following case, @Praveen: <name>Roc&#x00ED;o</name> should become <name>R.</name>
(?<=[a-zA-Z])[a-zA-Z]+ 

You can try this. Replace each match with a period (.). See the demo:

http://regex101.com/r/bB8jY7/12

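    import re

    # The + matters: it replaces the whole run of letters after the first
    # one with a single period, matching the pattern shown above.
    p = re.compile(r'(?<=[a-zA-Z])[a-zA-Z]+')
    test_str = "Depsai P.R.N. \nJohn D.J. \nAndrew E."
    subst = "."
    result = re.sub(p, subst, test_str)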
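If lookbehind is not an option, the same effect can be had by capturing the leading capital and putting it back in the replacement. This is a minimal sketch, assuming ASCII-only names and that the replacement is the captured letter followed by a period. The function name abbreviate is hypothetical.

```python
import re

# Capture the first capital letter; the rest of the word is matched and discarded.
WORD = re.compile(r"([A-Z])[a-zA-Z]+")


def abbreviate(text):
    """Replace each capitalised ASCII word with its first letter plus a period."""
    return WORD.sub(r"\1.", text)


if __name__ == "__main__":
    print(abbreviate("Depsai P.R.N. \nJohn D.J. \nAndrew E."))
```

Existing initials such as P.R.N. are left alone because each capital there is followed by a period, not a letter. Two limits of this sketch: it does not skip particles such as De or Van, and a word containing a non-ASCII letter (for example Dênis) is not matched by [a-zA-Z] and so is left unchanged.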

6 Comments

I did not downvote. I have now edited the question. I need a space only where the initials already have one; otherwise none is needed. I am working in Perl, and the lookbehind regex does not work for me: it fails with an error saying lookbehind is not supported. If a space is present in the name, it should be kept.
@depsai Try now. See the demo.
Your regex works on regex101.com and in RegexBuddy, but the lookbehind does not work in my Perl program. Please give a regex that does not use ?<=.
@depsai Try [a-z]+ and replace each match with a period (.).
Thanks, that works. I used ([A-Z])([a-zA-Z]+), replaced with $1.