I want to find valid email addresses in a text file, and this is my code:

email = re.findall(r'[a-zA-Z\.-]+@[\w\.-]+',line) 

But my code does not match email addresses that have digits before the @ sign, and it cannot reject addresses that lack a valid ending. Could anyone help me with these two problems? Thank you!

An example of my problem would be:

my code can find this email: [email protected]

but it cannot find this one: [email protected]

And it cannot filter this email out either: xyz@gmail

3 Answers

5

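From the Python re docs, \w matches any word character: letters, digits, and the underscore. With the ASCII flag (or in bytes patterns) this is equivalent to the set [a-zA-Z0-9_]; for Unicode str patterns in Python 3 it also matches word characters from other scripts. So [\w\.-] will match digits as well as letters.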

email = re.findall(r'[\w\.-]+@[\w\.-]+(\.[\w]+)+',line) 

This post discusses matching email addresses much more extensively, and there are a few more pitfalls in matching email addresses that this pattern does not catch. For example, an email address cannot be made up entirely of punctuation (...@....). Additionally, there is often a maximum length on addresses, depending on the email server. Also, many email servers accept non-English characters. So depending on your needs, you may need a more comprehensive pattern.
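As a rough sketch of how the pattern above behaves on the three cases from the question, the snippet below runs it over a made-up line. The helper name `find_emails` and the sample addresses are assumptions for illustration only. It uses `re.finditer` with `group(0)` so the whole match is returned even though the pattern contains a capturing group.

```python
import re

# Pattern from the answer above, unchanged.
EMAIL_RE = re.compile(r'[\w\.-]+@[\w\.-]+(\.[\w]+)+')


def find_emails(text):
    """Return every substring of text that matches EMAIL_RE (hypothetical helper)."""
    # group(0) is the full match, regardless of any capturing groups.
    return [m.group(0) for m in EMAIL_RE.finditer(text)]


# Hypothetical sample line, not taken from the question's file.
line = "Contact abc.xyz@gmail.com, abc123@gmail.com or xyz@gmail for details"
found = find_emails(line)
print(found)
```

The address with digits is found, and `xyz@gmail` is skipped because the pattern requires at least one dot-separated label after the domain name.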

2 Comments

The username part of an email address can also contain a plus sign.
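Using r'[\w\.-]+@[\w\.-]+(?:\.[\w]+)+' would be more robust: with a capturing group, re.findall returns only the text of the group (for example '.com'), not the whole address.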
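A small sketch of the two points raised in these comments, using made-up addresses. The third pattern, which adds `+` to the username character class, is an assumption about how one might allow plus signs; it is not taken from the answer.

```python
import re

# Hypothetical addresses for illustration.
text = "abc123@gmail.com first+tag@example.com"

# Capturing group: findall returns only what the group matched last.
capturing = re.findall(r'[\w\.-]+@[\w\.-]+(\.[\w]+)+', text)

# Non-capturing group (?:...): findall returns the whole match.
non_capturing = re.findall(r'[\w\.-]+@[\w\.-]+(?:\.[\w]+)+', text)

# Assumed variant: '+' added to the username class; '-' kept last so it is literal.
with_plus = re.findall(r'[\w.+-]+@[\w.-]+(?:\.\w+)+', text)

print(capturing)
print(non_capturing)
print(with_plus)
```

Without `+` in the class, the match for the second address starts after the plus sign, so only `tag@example.com` is returned.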
2

Try the validate_email package.

pip install validate_email 

Then

from validate_email import validate_email
is_valid = validate_email('[email protected]')

3 Comments

Thanks. But can I do this with a regular expression alone? I would prefer to use only regular expressions.
I took validate_email for a test drive; sadly, it thinks that bad@ss is a valid email.
Also, it takes longer than usual.
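For a regex-only check of a single string, one possible sketch is to apply the non-capturing pattern from the earlier comment with `re.fullmatch`, so the whole string has to match. The function name `looks_like_email` is hypothetical, and this is only a syntactic check: it does not confirm that the address exists.

```python
import re

# Non-capturing pattern quoted in an earlier comment in this thread.
ADDRESS_RE = re.compile(r'[\w\.-]+@[\w\.-]+(?:\.[\w]+)+')


def looks_like_email(candidate):
    """Return True if the entire string matches ADDRESS_RE (hypothetical helper)."""
    return ADDRESS_RE.fullmatch(candidate) is not None


print(looks_like_email("bad@ss"))
print(looks_like_email("abc123@gmail.com"))
```

Because the pattern requires a dot in the domain part, `bad@ss` is rejected here.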
1
^[\w-\.]+@([\w-]+\.)+[\w-]{2,4}$ 

Not mine, but I have used it in apps before.

Source

3 Comments

Can you explain what the '-' after \w does?
It matches the '-' character literally so users can enter something like [email protected]. This site is a great resource for learning about regex and how each piece works.
Doesn't work for me: error: bad character range \w-\. at position 2
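The error in the last comment comes from `\w-\.` inside the character class: Python's `re` module reads the hyphen as a range operator, and `\w` cannot be a range endpoint. A possible fix, sketched below, is to move the hyphen to the end of the class so it is taken literally. The variable names and sample addresses are assumptions for illustration.

```python
import re

# Pattern as posted in the answer above.
original = r'^[\w-\.]+@([\w-]+\.)+[\w-]{2,4}$'
try:
    re.compile(original)
    original_compiles = True
except re.error:
    # Python 3 rejects the range "\w-\." inside the character class.
    original_compiles = False

# Same pattern with the hyphen moved to the end of the first class.
fixed = re.compile(r'^[\w.-]+@([\w-]+\.)+[\w-]{2,4}$')

print(original_compiles)
print(bool(fixed.match("abc-def@example.com")))  # hypothetical address
print(bool(fixed.match("xyz@gmail")))
```

Note that `{2,4}` limits the final label to four characters, so addresses under longer top-level domains would not match this pattern.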
