replaced http://mathematica.stackexchange.com/ with https://mathematica.stackexchange.com/
Source Link

I would like to ask how I can remove non-word characters from a string, but only in certain cases.

I have read this article, so I know how to get the words out of a string. My text is, however, a bit more complicated.

For example:

trialtext = ",,temp sp.a tiral - dump NV-A rambo.6833. 16,rgcht"; 

From this text, I would like to get as output:

{"temp","sp.a","tiral","dump","NV-A","rambo","6833","16","rgcht"} 

In other words, I want to split on spaces, commas, hyphens and dots, EXCEPT when a hyphen or a dot has a letter character both before and after it (this exception applies only to hyphens and dots, not to commas or other signs!)

This has been my most successful attempt so far:

StringSplit[trialtext, Except[WordCharacter, WordCharacter .. ~~ "." ~~ WordCharacter]]
{"temp sp.a tiral dump NV-A rambo.6833 16,rgcht"}

although I do not understand why, when I ask for ".", it also takes "," and "-".

Therefore also the related question: can someone please explain to me why this

StringSplit[trialtext, Except[WordCharacter, ","]] 

gives this output:

 {"temp sp.a tiral dump NV-A rambo.6833 16", "rgcht"} 

while this:

StringSplit[trialtext, Except[WordCharacter, "."]] 

produces this output:

{"temp", "sp", "a", "tiral", "dump", "NV", "A", "rambo", "6833", "16", "rgcht"} 

Thanks a bunch!
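
For illustration only, the splitting rule described above can be sketched by extracting the wanted tokens rather than splitting on the unwanted characters. This is a minimal sketch, not a confirmed solution: it assumes a token is either a run of ASCII letters, optionally joined by single dots or hyphens that sit between letters, or a run of digits. The name `tokens` is hypothetical, and a mixed token such as "abc123" would be cut in two under this assumption.

```mathematica
trialtext = ",,temp sp.a tiral - dump NV-A rambo.6833. 16,rgcht";

(* Assumed token shape: letters, optionally followed by groups of
   "." or "-" plus more letters, OR a run of digits. A dot or hyphen
   survives only when it has a letter on both sides. *)
tokens = StringCases[
  trialtext,
  RegularExpression["[A-Za-z]+(?:[.-][A-Za-z]+)*|\\d+"]
]

(* {"temp", "sp.a", "tiral", "dump", "NV-A", "rambo", "6833", "16", "rgcht"} *)
```

Matching the tokens directly avoids having to describe every separator, which is where the `Except` attempts above become hard to reason about.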

added 20 characters in body; edited title
Source Link
Lena
  • 121
  • 1
  • 3

Removing non-word characters in certain parts of a string

I would like to ask how I can remove non-word characters from a string, but only in certain cases.

I have read this article, so I know how to get the words out of a string. My text is, however, a bit more complicated.

For example:

trialtext = ",,temp sp.a tiral - dump NV-A rambo.6833. 16,rgcht"; 

From this text, I would like to get as output:

{"temp","sp.a","tiral","dump","NV-A","rambo","6833","16","rgcht"}

In other words, I want to split on spaces, commas, hyphens and dots, EXCEPT when a hyphen or a dot has a letter character both before and after it (this exception applies only to hyphens and dots, not to commas or other signs!)

This has been my most successful attempt so far:

StringSplit[trialtext, Except[WordCharacter, WordCharacter .. ~~ "." ~~ WordCharacter]]
{"temp sp.a tiral dump NV-A rambo.6833 16,rgcht"}

although I do not understand why, when I ask for ".", it also takes "," and "-".

Therefore also the related question: can someone please explain to me why this

StringSplit[trialtext, Except[WordCharacter, ","]] 

gives this output:

 {"temp sp.a tiral dump NV-A rambo.6833 16", "rgcht"} 

while this:

StringSplit[trialtext, Except[WordCharacter, "."]] 

produces this output:

{"temp", "sp", "a", "tiral", "dump", "NV", "A", "rambo", "6833", "16", "rgcht"} 

Thanks a bunch!

Source Link
Lena
  • 121
  • 1
  • 3

Removing non-word characters in certain parts of a string.

I would like to ask how I can remove non-word characters from a string, but only in certain cases.

I have read this article, so I know how to get the words out of a string. My text is, however, a bit more complicated.

For example:

trialtext = ",,temp sp.a tiral - dump NV-A rambo.6833. 16,rgcht"; 

From this text, I would like to get as output:

temp sp.a tiral dump NV-A rambo 6833 16 rgcht 

In other words, I want to split on spaces, commas, hyphens and dots, EXCEPT when a hyphen or a dot has a letter character both before and after it (this exception applies only to hyphens and dots, not to commas or other signs!)

This has been my most successful attempt so far:

StringSplit[trialtext, Except[WordCharacter, WordCharacter .. ~~ "." ~~ WordCharacter]]
{"temp sp.a tiral dump NV-A rambo.6833 16,rgcht"}

although I do not understand why, when I ask for ".", it also takes "," and "-".

Therefore also the related question: can someone please explain to me why this

StringSplit[trialtext, Except[WordCharacter, ","]] 

gives this output:

 {"temp sp.a tiral dump NV-A rambo.6833 16", "rgcht"} 

while this:

StringSplit[trialtext, Except[WordCharacter, "."]] 

produces this output:

{"temp", "sp", "a", "tiral", "dump", "NV", "A", "rambo", "6833", "16", "rgcht"} 

Thanks a bunch!