1

I'm trying to create a slug, so I would like to strip out every strange character. The only things the slug should contain are lowercase letters and underscores. Is there a way to check for strange characters and filter them out of the string? Everything that is not a letter or an underscore should be deleted.

This is what I have:

if(!preg_match_all('/[a-z]/')):
    $output = preg_replace("/ ... euhm ... /", "", $slug2);
else:
    $output = $slug2;
endif;

I should go from this: Create a 3D Ribbon Wrap-Around Effect (Plus a Free PSD!)

to this: create_a_3d_ribbon_wrap_around_effect_plus_a_free_psd

7
  • Sometimes it is easier to "remove all characters but this, this, and this"; I think that fits your case. Commented Nov 17, 2010 at 23:02
  • 2
    So you also want to translate spaces into underscores? And numbers are also ok? Your example and your description do not match up. Commented Nov 17, 2010 at 23:02
  • There are 1908 lowercase letters, of which your [a-z] comprises merely a hairsbreadth more than 1⅓%. Commented Nov 17, 2010 at 23:48
  • 1
    This is a duplicate of possibly many other questions. Here are some: stackoverflow.com/questions/4051889/…, stackoverflow.com/questions/1432463/…, stackoverflow.com/questions/25259/…, stackoverflow.com/questions/3984983/…, etc. Commented Nov 18, 2010 at 6:57
  • @cdhowie of course it does not match, that's why I'm posting the question. Commented Nov 18, 2010 at 8:27

3 Answers

3
$slug = strtolower($slug);
$slug = str_replace(" ", "_", $slug);
$slug = preg_replace("/[^a-z0-9_]/", "", $slug);
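To see what these three statements do with the title from the question, here is a minimal sketch. The function name `slug_delete_invalid` is made up for illustration and is not part of the answer. Note that the hyphen is deleted rather than turned into an underscore, so the result reads `wraparound` where the question asks for `wrap_around`.

```php
<?php
// Hypothetical wrapper around the three statements above.
function slug_delete_invalid(string $title): string
{
    $slug = strtolower($title);              // lowercase ASCII letters
    $slug = str_replace(' ', '_', $slug);    // spaces become underscores
    // Everything that is not a-z, 0-9 or underscore is deleted outright.
    return preg_replace('/[^a-z0-9_]/', '', $slug);
}

echo slug_delete_invalid('Create a 3D Ribbon Wrap-Around Effect (Plus a Free PSD!)'), "\n";
// create_a_3d_ribbon_wraparound_effect_plus_a_free_psd
```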

4 Comments

I’m afraid you’ve forgotten one thousand eight hundred and eighty-two lowercase letters besides those quaint 1960sish a-z. ˋunichars -a '\p{Lower}' '[^a-z]' | wc -lˋ == 1882
Based on the OP's example, this looks like it will be used for slugs in a URL. And Unicode characters look ugly in URLs, so I doubt the OP wants to keep them.
Agreed. But interesting fact that there are more than latin, cyrillic and greek. ;)
[^a-z0-9_] could be sensibly reduced to \W.
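A quick sketch of when the two patterns agree, assuming plain ASCII input and no `/u` modifier or custom locale: `\W` also keeps uppercase letters, so the two only give the same result once the string has been lowercased. The variable names are illustrative.

```php
<?php
$raw = 'Wrap-Around';

// On mixed-case input the two patterns disagree:
$explicit_raw  = preg_replace('/[^a-z0-9_]/', '', $raw); // uppercase is removed
$shorthand_raw = preg_replace('/\W/', '', $raw);         // uppercase is kept

// After strtolower() they produce the same string:
$lower           = strtolower($raw);
$explicit_lower  = preg_replace('/[^a-z0-9_]/', '', $lower);
$shorthand_lower = preg_replace('/\W/', '', $lower);

echo $explicit_raw, ' vs ', $shorthand_raw, "\n";     // rapround vs WrapAround
echo $explicit_lower, ' vs ', $shorthand_lower, "\n"; // wraparound vs wraparound
```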
1

No need for the initial match. You can do an unconditional search-and-replace. If there's nothing to replace, no big deal. Here it is as one big chain of function calls:

$slug = trim(preg_replace('/[\W_]+/', '_', strtolower($slug)), '_'); 

Or split out into separate lines:

$slug = strtolower($slug);
$slug = preg_replace('/[\W_]+/', '_', $slug);
$slug = trim($slug, '_');

Explanation:

  1. Convert uppercase to lowercase with strtolower.
  2. Search for \W and _. A "word" character is a letter, digit, or underscore. A "non-word" character is the opposite of that, i.e. whitespace, punctuation, and control characters. \W matches "non-word" characters.
  3. Replace those bad characters with underscores. If there's more than one in a row they'll all get replaced by a single underscore.
  4. Trim underscores from the beginning and end of the string.
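The four steps can be traced on the title from the question. This is only a sketch; the intermediate variable names (`$lower`, `$collapsed`) are introduced here for illustration.

```php
<?php
$title = 'Create a 3D Ribbon Wrap-Around Effect (Plus a Free PSD!)';

// Step 1: lowercase.
$lower = strtolower($title);
// "create a 3d ribbon wrap-around effect (plus a free psd!)"

// Steps 2 and 3: each run of non-word characters and/or underscores
// becomes a single underscore.
$collapsed = preg_replace('/[\W_]+/', '_', $lower);
// "create_a_3d_ribbon_wrap_around_effect_plus_a_free_psd_"

// Step 4: drop the underscore left behind by the trailing "!)".
$slug = trim($collapsed, '_');

echo $slug, "\n";
// create_a_3d_ribbon_wrap_around_effect_plus_a_free_psd
```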

The code's on the complicated side because there are several tricky cases it needs to handle:

  • Bad characters on the ends need to be deleted, not converted to underscores. For example, the !) in your example.
  • We want foo_-_bar to turn into foo_bar, not foo___bar. Underscores should be collapsed, basically.
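Both tricky cases can be checked directly. The sketch below wraps the one-liner in a function named `slugify`, a name chosen here for illustration. It also shows one case the answer does not mention: input made up entirely of bad characters comes back as an empty string, which the caller may want to handle.

```php
<?php
// Hypothetical helper wrapping the one-line version of the answer.
function slugify(string $title): string
{
    return trim(preg_replace('/[\W_]+/', '_', strtolower($title)), '_');
}

echo slugify('foo_-_bar'), "\n";           // foo_bar (run "_-_" collapses to one "_")
echo slugify('(Plus a Free PSD!)'), "\n";  // plus_a_free_psd (ends are trimmed)
var_dump(slugify('!!!'));                  // string(0) "" (nothing usable left)
```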

0
$slug = preg_replace("/[^a-z_]/", "", $slug);

1 Comment

You forgot almost two thousand lowercase letters.
