2

I am trying to write a complicated (to me) regex for a multi-line snippet from an e-mail. I have tried hard, with no luck. I want to remove everything from "On " through " wrote:".

It would be nice if the regex could also check that the text contains the word "AcmeCompany", so that it doesn't match every "On " ... " wrote:" span.

So far, I have this: /On(.*)AcmeCompany(.*)/im, but it does not work...

say hello, world! On Tue, Jun 7, 2011 at 6:18 AM, AcmeCompany < [email protected]> wrote: 

Thank you for the responses, but it seems like there's another problem.

EDIT: I found out that this works: /On[\s\S]+?AcmeCompany[\s\S]+?wrote:/m, but it fails when the e-mail body contains the word "On":

say hello, world! On a plane! On Tue, Jun 7, 2011 at 6:18 AM, AcmeCompany < [email protected]> wrote: 

EDIT2: Every mail client is different: Gmail tends to put the header on 2 lines, while the iPhone Mail app puts it on 1 line, so it doesn't always follow a strict format.

One thing is certain: the header always begins with "On " and ends with " wrote:". It also contains a hash and "AcmeCompany", which I can use to verify it.

5
  • You don't need to capture the .*; try On.*AcmeCompany.* Commented Jun 7, 2011 at 15:23
  • . won't catch a newline, which is why you are having trouble with a regex that spans more than one line. Commented Jun 7, 2011 at 15:31
  • @Michael Pryor: I tried using the sm flags instead of im and got pretty close, but when I have "On blah blah On Tues Jun 7", it breaks. Commented Jun 7, 2011 at 15:36
  • Hi, please try using /On[\s\S]+?AcmeCompany[\s\S]+?wrote:/ Commented Jun 7, 2011 at 15:38
  • @sudimail I revised the question: your solution works, but it runs into another problem... Commented Jun 7, 2011 at 15:41
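To illustrate the comment about `.` and newlines, here is a minimal TypeScript sketch. It assumes a Gmail-style header that wraps right after the `<`; the address `support@example.com` is a made-up placeholder. `.` stops at the line break, while `[\s\S]` ("whitespace or non-whitespace") matches every character, including `\n`.

```typescript
// Hypothetical two-line header, wrapped the way Gmail tends to do it
const gmailHeader =
  "On Tue, Jun 7, 2011 at 6:18 AM, AcmeCompany <\nsupport@example.com> wrote:";

// `.` matches any character except a line terminator, so it cannot reach "wrote:"
const dotPattern = /On.*AcmeCompany.*wrote:/;

// [\s\S] covers every character, newlines included
const anyCharPattern = /On[\s\S]*?AcmeCompany[\s\S]*?wrote:/;

console.log(dotPattern.test(gmailHeader));     // false
console.log(anyCharPattern.test(gmailHeader)); // true
```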

5 Answers

1

I am adding another reply for the new requirement; I hope you don't mind.

Can you try something like this?

/On\s(Mon|Tue|Wed|Thu|Fri|Sat|Sun)[\s\S]+?AcmeCompany[\s\S]+?wrote:/

Trying again... how about this?

/On.+?AcmeCompany[\s\S]+?wrote:/ 

2 Comments

I modified it here, but again, it breaks if the content contains a day name! Still thinking!
It turns out not every mail client includes Mon/Tue/Wed..., but every mail client includes "On" and "wrote:", so this would only work for Gmail and Hotmail. :(
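A quick TypeScript check of the weekday idea, with `Sun` included in the alternation. It skips the stray "On a plane!" but, as the comment says, cannot match a header that has no day name. Both sample strings and the placeholder address are made up; the `On Jun 7, 2011, at 6:18 AM` date layout stands in for a hypothetical client that omits the weekday.

```typescript
// Weekday-anchored pattern; Sun is listed so Sunday e-mails are covered too
const weekdayPattern =
  /On\s(Mon|Tue|Wed|Thu|Fri|Sat|Sun)[\s\S]+?AcmeCompany[\s\S]+?wrote:/;

// Body with a stray "On", header with a day name
const withDay =
  "say hello, world! On a plane! On Tue, Jun 7, 2011 at 6:18 AM, AcmeCompany <support@example.com> wrote:";

// Hypothetical client that writes the date without a day name
const withoutDay =
  "say hello, world! On Jun 7, 2011, at 6:18 AM, AcmeCompany <support@example.com> wrote:";

const dayMatch = weekdayPattern.exec(withDay);
// Show where the match starts: at "On Tue", not at "On a plane!"
console.log(dayMatch === null ? null : withDay.substring(dayMatch.index, dayMatch.index + 6)); // "On Tue"

// No weekday follows "On", so nothing matches
console.log(weekdayPattern.test(withoutDay)); // false
```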
1

Hope this helps:

/On[\s\S]+?AcmeCompany[\s\S]+?wrote:/ 

The regular expression above first matches On, then lazily repeats [\s\S] (any whitespace or non-whitespace character, which together cover every character, including newlines) until it finds AcmeCompany. It then again lazily matches any characters, newlines included, until it finds wrote:
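The lazy `+?` matters when the quoted thread further down contains an older "... wrote:" line. This small TypeScript sketch uses a made-up thread with placeholder addresses: the lazy pattern stops at the first `wrote:`, while the same pattern with a greedy `[\s\S]+` runs on to the last one.

```typescript
// Made-up reply whose quoted text contains an older header
const thread = [
  "Thanks, that fixed it.",
  "On Tue, Jun 7, 2011 at 6:18 AM, AcmeCompany <support@example.com> wrote:",
  "> Did you restart the server?",
  "> On Mon, Jun 6, 2011 at 9:00 PM, Bob <bob@example.com> wrote:",
  "> > It is still down.",
].join("\n");

// The answer's pattern: lazy +? before "wrote:"
const lazy = /On[\s\S]+?AcmeCompany[\s\S]+?wrote:/;
// Same pattern with a greedy + before "wrote:", for comparison
const greedy = /On[\s\S]+?AcmeCompany[\s\S]+wrote:/;

// Returns the last line of a (possibly multi-line) match
const lastLine = (s: string): string => s.slice(s.lastIndexOf("\n") + 1);

const lazyMatch = lazy.exec(thread);
const greedyMatch = greedy.exec(thread);

console.log(lazyMatch ? lastLine(lazyMatch[0]) : null);
// "On Tue, Jun 7, 2011 at 6:18 AM, AcmeCompany <support@example.com> wrote:"
console.log(greedyMatch ? lastLine(greedyMatch[0]) : null);
// "> On Mon, Jun 6, 2011 at 9:00 PM, Bob <bob@example.com> wrote:"
```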

1 Comment

Hi, this actually works, but it seems to capture everything from the first "On" onward...
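This happens because laziness only controls where a match ends. The engine still starts at the leftmost "On" from which the rest of the pattern can succeed, which is the one in "On a plane!". One possible workaround, sketched in TypeScript below, is a tempered token `(?:(?!On )[\s\S])` that refuses to step over another "On ". The match can then only start at the last "On " before AcmeCompany. This variant is not taken from any of the answers. The helper name `stripQuoteHeader` and the placeholder address are hypothetical, and the sketch assumes the header itself has no "On " between its opening "On" and "AcmeCompany".

```typescript
// Tempered token: any character, as long as it does not begin another "On "
const quoteHeader = /On (?:(?!On )[\s\S])*?AcmeCompany[\s\S]*?wrote:/;

// Hypothetical helper: remove the first matching header, then trim whitespace
function stripQuoteHeader(body: string): string {
  return body.replace(quoteHeader, "").trim();
}

// One-line (iPhone-style) sample from the question, with a stray "On" in the body
const oneLine =
  "say hello, world! On a plane! On Tue, Jun 7, 2011 at 6:18 AM, AcmeCompany <support@example.com> wrote: ";

// Two-line (Gmail-style) variant of the same message
const twoLine =
  "say hello, world! On a plane!\nOn Tue, Jun 7, 2011 at 6:18 AM, AcmeCompany <\nsupport@example.com> wrote:\n";

console.log(stripQuoteHeader(oneLine)); // "say hello, world! On a plane!"
console.log(stripQuoteHeader(twoLine)); // "say hello, world! On a plane!"
```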
1

This will work:

On.*AcmeCompany.* 

Maybe off-topic, but if you want to learn regex, you should try Expresso.

[Image: example of Expresso at work]


1

To get the string before "On Tue, Jun...":

$str = explode('On', $yourstring);
$oldstr = array_pop($str); // Remove the last value of the $str array
echo trim(implode('On', $str)); // Trim the string to remove any unnecessary line breaks

To find if the hidden message contains AcmeCompany:

if (strstr($oldstr, 'AcmeCompany')) {
    echo "I found AcmeCompany!";
} else {
    echo "I didn't find AcmeCompany!";
}

Hope my answer is useful, even though I didn't use regex.
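For comparison, here is a rough TypeScript equivalent of the `explode`/`array_pop` idea. Rejoining all pieces but the last with `'On'` gives the text up to the last "On", so `lastIndexOf("On")` finds the same split point. This is only a sketch. The helper name `splitAtLastOn` and the placeholder address are hypothetical, and, as the comments below point out, it keys on the last "On" anywhere in the text.

```typescript
// Hypothetical helper mirroring explode('On', ...) + array_pop(...)
function splitAtLastOn(text: string): { kept: string; removed: string } {
  const idx = text.lastIndexOf("On");
  if (idx === -1) {
    // No "On" at all: keep everything (the PHP version would pop the whole text instead)
    return { kept: text.trim(), removed: "" };
  }
  return {
    kept: text.slice(0, idx).trim(), // like trim(implode('On', $str))
    removed: text.slice(idx + 2),    // like $oldstr: the text after the last "On"
  };
}

const parts = splitAtLastOn(
  "say hello, world! On a plane! On Tue, Jun 7, 2011 at 6:18 AM, AcmeCompany <support@example.com> wrote:"
);
console.log(parts.kept); // "say hello, world! On a plane!"
console.log(parts.removed.indexOf("AcmeCompany") !== -1 ? "I found AcmeCompany!" : "I didn't find AcmeCompany!");
// "I found AcmeCompany!"
```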

5 Comments

Hi, interesting concept. If we happen to have "On" anywhere in the body, this solution won't work... As a last resort, I might crop out the last 2 lines.
Yes, but this shows that your script would break if someone's comment was: say hello, world! On Tue, Jun 7, 2011 at 6:18 AM, Hacker <hacker@hacker> wrote:
That's also true... perhaps it's better to crop out the last 2 lines. However, this is highly unlikely to happen =/
I have to give you credit (+1), but it still doesn't fit the requirement: if the sender writes a bunch of "On"s, it won't work.
Well, during my tests, it worked, because the array_pop function removes the last value (the comment you want to delete) and keeps the others.
0

Try this: /On.*AcmeCompany <$[^:]+:/im. The m flag is important, as it lets $ match at the end of each line (just before a line break), not only at the end of the string.
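To see what the `m` flag does in this pattern, here is a small TypeScript check. It assumes the header wraps right after the `<`, as in Gmail's two-line format; the address is a placeholder. With `m`, `$` can match just before the line break. Without it, `$` only matches at the very end of the input. A one-line header (iPhone-style) has no line end after the `<`, so this particular pattern does not match it.

```typescript
// Two-line header that breaks right after "<"
const wrapped =
  "On Tue, Jun 7, 2011 at 6:18 AM, AcmeCompany <\nsupport@example.com> wrote:";

const withM = /On.*AcmeCompany <$[^:]+:/im;   // $ = end of any line
const withoutM = /On.*AcmeCompany <$[^:]+:/i; // $ = end of input only

console.log(withM.test(wrapped));    // true
console.log(withoutM.test(wrapped)); // false

// The same header on a single line
const singleLine =
  "On Tue, Jun 7, 2011 at 6:18 AM, AcmeCompany <support@example.com> wrote:";
console.log(withM.test(singleLine)); // false
```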

