strip out all strange characters of string

Question

I'm trying to create a slug, so I would like to strip out every strange character. The only things the slug should contain are lowercase letters and underscores. Is there a way to check for strange characters and filter them out of the string? Everything that is not a letter or an underscore should be deleted.

This is what I have:

if(!preg_match_all('/[a-z]/')): $output = preg_replace("/ ... euhm ... /", "", $slug2); else: $output = $slug2; endif;

I should go from this: Create a 3D Ribbon Wrap-Around Effect (Plus a Free PSD!)

to this: create_a_3d_ribbon_wrap_around_effect_plus_a_free_psd

Sometimes it is easier to "remove all characters but this, this, and this"; I think that fits your case.
– BeemerGuy, Commented Nov 17, 2010 at 23:02
So you also want to translate spaces into underscores? And numbers are also ok? Your example and your description do not match up.
– cdhowie, Commented Nov 17, 2010 at 23:02
There are 1908 lowercase letters, of which your [a-z] comprises merely a hairsbreadth more than 1⅓%.
– tchrist, Commented Nov 17, 2010 at 23:48
This is a duplicate of possibly many other questions. Here are some: stackoverflow.com/questions/4051889/…, stackoverflow.com/questions/1432463/…, stackoverflow.com/questions/25259/…, stackoverflow.com/questions/3984983/…, etc.
– Gumbo, Commented Nov 18, 2010 at 6:57
@cdhowie of course it does not match, that's why I'm posting the question.
– Christophe, Commented Nov 18, 2010 at 8:27

cdhowie · Accepted Answer · 2010-11-17 23:04:38Z

3

$slug = strtolower($slug);
$slug = str_replace(" ", "_", $slug);
$slug = preg_replace("/[^a-z0-9_]/", "", $slug);
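As a rough check, the three calls can be wrapped in a helper and run against the title from the question. The function name `simple_slug` is made up for illustration and is not part of the answer. Note that the hyphen in "Wrap-Around" is deleted rather than converted, so the result differs slightly from the slug the question asks for:

```php
<?php
// Hypothetical wrapper around the three calls above; the name is illustrative only.
function simple_slug(string $slug): string
{
    $slug = strtolower($slug);                       // lowercase ASCII letters
    $slug = str_replace(" ", "_", $slug);            // spaces become underscores
    return preg_replace("/[^a-z0-9_]/", "", $slug);  // delete everything else
}

echo simple_slug("Create a 3D Ribbon Wrap-Around Effect (Plus a Free PSD!)"), "\n";
// prints: create_a_3d_ribbon_wraparound_effect_plus_a_free_psd
```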


4 Comments

tchrist Over a year ago

I’m afraid you’ve forgotten one thousand eight hundred and eighty-two lowercase letters besides those quaint 1960sish a-z. `unichars -a '\p{Lower}' '[^a-z]' | wc -l` == 1882

cdhowie Over a year ago

Based on the OP's example, this looks like it will be used for slugs in a URL. And Unicode characters look ugly in URLs, so I doubt the OP wants to keep them.

AndreKR Over a year ago

Agreed. But it's an interesting fact that there are more than Latin, Cyrillic, and Greek. ;)

mickmackusa Over a year ago

[^a-z0-9_] could be sensibly reduced to \W.
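For readers who do want to keep non-ASCII lowercase letters, as tchrist's comments suggest, one possible variation is sketched below. It assumes the input is valid UTF-8 and already lowercased, and it relies on PCRE's `u` modifier with the `\p{Ll}` (lowercase letter) and `\p{Nd}` (decimal digit) properties. The helper name `unicode_slug` is hypothetical.

```php
<?php
// Hypothetical helper: same idea as the answer above, but keeping any
// Unicode lowercase letter or decimal digit instead of only a-z and 0-9.
// Assumes $slug is valid UTF-8 and has already been lowercased.
function unicode_slug(string $slug): string
{
    $slug = str_replace(" ", "_", $slug);
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    // preg_replace() returns null on invalid UTF-8, hence the fallback.
    return preg_replace('/[^\p{Ll}\p{Nd}_]/u', '', $slug) ?? '';
}

// "\u{e8}" etc. are PHP 7+ escapes for the accented letters in "crème brûlée".
echo unicode_slug("cr\u{e8}me br\u{fb}l\u{e9}e!"), "\n";
```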

John Kugelman · Accepted Answer · 2010-11-17 23:39:09Z

No need for the initial match. You can do an unconditional search-and-replace. If there's nothing to replace, no big deal. Here it is as one big chain of function calls:

$slug = trim(preg_replace('/[\W_]+/', '_', strtolower($slug)), '_');

Or split out into separate lines:

$slug = strtolower($slug);
$slug = preg_replace('/[\W_]+/', '_', $slug);
$slug = trim($slug, '_');

Explanation:

1. Convert uppercase to lowercase with strtolower.
2. Search for \W and _. A "word" character is a letter, digit, or underscore. A "non-word" character is the opposite of that, i.e. whitespace, punctuation, and control characters. \W matches "non-word" characters.
3. Replace those bad characters with underscores. If there's more than one in a row they'll all get replaced by a single underscore.
4. Trim underscores from the beginning and end of the string.
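The steps above can be sketched as a small function and run against the title from the question. The name `make_slug` is only an illustrative choice.

```php
<?php
// Hypothetical helper name; the body follows the steps described above.
function make_slug(string $title): string
{
    $slug = strtolower($title);                    // uppercase to lowercase
    $slug = preg_replace('/[\W_]+/', '_', $slug);  // each run of bad chars becomes one "_"
    return trim($slug, '_');                       // drop leading/trailing underscores
}

echo make_slug("Create a 3D Ribbon Wrap-Around Effect (Plus a Free PSD!)"), "\n";
// prints: create_a_3d_ribbon_wrap_around_effect_plus_a_free_psd
```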

The code's on the complicated side because there are several tricky cases it needs to handle:

Bad characters on the ends need to be deleted, not converted to underscores. For example, the !) in your example.
We want foo_-_bar to turn into foo_bar, not foo___bar. Underscores should be collapsed, basically.
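To see why the `+` quantifier and the `trim` call matter, the sketch below compares a deliberately naive variant with the approach from this answer. Both function names are made up for the comparison.

```php
<?php
// Naive variant, for comparison only: no "+" quantifier and no trim().
function naive_slug(string $s): string
{
    return preg_replace('/\W/', '_', strtolower($s));
}

// The approach from this answer: collapse runs, then trim the ends.
function collapsed_slug(string $s): string
{
    return trim(preg_replace('/[\W_]+/', '_', strtolower($s)), '_');
}

echo naive_slug('foo_-_bar'), "\n";       // foo___bar
echo collapsed_slug('foo_-_bar'), "\n";   // foo_bar
echo naive_slug('Free PSD!)'), "\n";      // free_psd__
echo collapsed_slug('Free PSD!)'), "\n";  // free_psd
```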

AndreKR · Accepted Answer · 2010-11-17 23:02:47Z

0

$slug = preg_replace("/[^a-z_]/", "", $slug);


1 Comment

tchrist Over a year ago

You forgot almost two thousand lowercase letters.
