Splitting over a delimiter

 
Ranch Hand
Posts: 229
Hi,

I'm having a problem splitting the strings in a string array on a delimiter.

So my code is this:



is there just so I only get one sample from moby for testing purposes.


This is my output:



I'm not sure if the delimiter is the problem, but I think it is. I tried using \\ and \\\\ but neither solved it. PartOfSpeechArray is what I'm trying to split.
It's dying at the line: String [] mobyLine = pos.split("\\");


Thanks all.
 
Marshal
Posts: 81613
593
That delimiter would suggest you are splitting on backslashes. Are there any backslashes in your input?
 
author
Posts: 23965
142

Well, it would help if you told us what delimiter you are trying to split on. The backslash is the escape character both in Java string literals and in regex patterns, so a string literal containing two backslashes yields a single backslash, which is not a valid regex.

You say four backslashes don't work, but that is a valid regex, and it represents a single backslash as the delimiter. So, back to the original question: what is the delimiter that you are trying to split on?
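
To make the two levels of escaping concrete, here is a minimal sketch. It assumes an entry shaped like `3-D\AN`, the sample quoted elsewhere in this thread; the class and method names are made up for illustration and are not from the original poster's code.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class; names are illustrative only.
class BackslashSplitDemo {

    // The literal "\\\\" is the two-character regex \\ , which matches one backslash.
    static String[] splitEscaped(String entry) {
        return entry.split("\\\\");
    }

    // Pattern.quote builds a regex that matches its argument literally,
    // so only the Java string-literal escaping has to be written by hand.
    static String[] splitQuoted(String entry) {
        return entry.split(Pattern.quote("\\"));
    }

    // Reports whether a string compiles as a regex.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String entry = "3-D\\AN"; // the characters 3-D\AN
        System.out.println(Arrays.toString(splitEscaped(entry))); // [3-D, AN]
        System.out.println(Arrays.toString(splitQuoted(entry)));  // [3-D, AN]
        System.out.println(isValidRegex("\\"));   // false: a lone backslash escapes nothing
        System.out.println(isValidRegex("\\\\")); // true
    }
}
```

`Pattern.quote` is often the easier choice when the delimiter is a fixed string, since it avoids counting backslashes.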

Henry
 
Ted Gress
Ranch Hand
Posts: 229
Right there in the output. I'm trying to split  3-D\AN
 
Ted Gress
Ranch Hand
Posts: 229
Tried using \\\\

Output then is:


The code I'm referencing for that output is:



As you can see, partOfSpech and word both come back blank from the splitting operation
 
Henry Wong
author
Posts: 23965
142

Ted Gress wrote:
As you can see, partOfSpech and word both come back blank from the splitting operation



Both of those variables hold zero-length strings (i.e. blank, as set at line 6 and line 7) because you set them to zero-length strings. The splitting operation has nothing to do with it.

Henry
 
Ted Gress
Ranch Hand
Posts: 229
Oops. My mistake. :-)
 
Carey Brown
Bartender
Posts: 11188
89
Your "symbol" will consist of one or more characters denoting various parts of speech in probability order. Your word "3-D" is most likely to be an adjective ('A') but may possibly be a noun ('N').

 
Ted Gress
Ranch Hand
Posts: 229
Ok. Got another one.

Code:



Output:



mobyLine is only showing up as a single element, "a", despite the fact that the next entry after "Abyla" in the moby dictionary is abysmally\v
It should be:


[SensoryCore] pos: abysmally\v
[SensoryCore] mobyLine: [abysmally, v]
[SesnroyCore] partOfSpeechSymbol: v
[SensoryCore] new word:  abysmally
[SensoryCore] unit : abysmally
[SensoryCore] encoding size : 0

 
Carey Brown
Bartender
Posts: 11188
89
How are you populating "unit" when you call the method?

Here's a code posting tip: You can change what line number your code listing starts with. The default is '1'.
vs
 
Ted Gress
Ranch Hand
Posts: 229
I'm populating unit using this function that reads from a text file and gets a word-part of speech pair.

So, for example, unit could be:

Abyla\N

This is the code that reads from the text file:

 
Ted Gress
Ranch Hand
Posts: 229
There is a single "unit" for each line. One and only one unit.
 
Ted Gress
Ranch Hand
Posts: 229
For example, this is a sample from the Moby text file I'm reading:

 
Carey Brown
Bartender
Posts: 11188
89
Congratulations! You've discovered a Scanner bug!

I wrote a minimalist program to read mobypos.txt using Scanner and it stopped at the same character for me as it did for you. I thought a non-printable character might have snuck into the txt file, so I retyped a few lines at the point of failure, but that didn't fix it. I changed the code to use a BufferedReader and that worked without an issue.

EDIT: Even Files.readAllLines() failed.
 
Ted Gress
Ranch Hand
Posts: 229
OK. New problem. It isn't recognizing the nouns and verbs. In fact, it's only recognizing "Alice" and "Blue". (Nice catch on the Scanner bug, by the way.)





So what it should do, if it finds the tkn variable (the token word whose part of speech is being checked), is output, for nouns anyway,



Or, in layman's terms, if the noun is 'Car': [Sensory Core] Encoding noun: Car noun

What it is doing is outputting two nouns and then quitting.



 
Carey Brown
Bartender
Posts: 11188
89
Well, let's call it a "feature". The file does have some characters outside of the default character set. If you set the character set as noted in my comments below all three approaches work.
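
Since the code for this post did not survive, the following is only a sketch of what passing an explicit character set to each of the three approaches can look like. The thread never says which character set was used; `ISO_8859_1` here is an assumption (it decodes every byte value, so it cannot fail on unexpected bytes), and the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class; names are illustrative only.
class MobyReaderDemo {

    // ASSUMPTION: the real file's encoding is not stated in the thread.
    static final Charset ASSUMED_CHARSET = StandardCharsets.ISO_8859_1;

    // Scanner with an explicit charset name.
    static List<String> readWithScanner(Path file, Charset cs) throws IOException {
        List<String> lines = new ArrayList<>();
        try (Scanner in = new Scanner(file, cs.name())) {
            while (in.hasNextLine()) {
                lines.add(in.nextLine());
            }
            // Scanner does not throw on read errors; it stops and remembers
            // the last IOException, so surface it instead of returning early.
            IOException problem = in.ioException();
            if (problem != null) {
                throw problem;
            }
        }
        return lines;
    }

    // BufferedReader with an explicit charset.
    static List<String> readWithBufferedReader(Path file, Charset cs) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(file, cs)) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Files.readAllLines with an explicit charset (the one-argument overload uses UTF-8).
    static List<String> readAllLines(Path file, Charset cs) throws IOException {
        return Files.readAllLines(file, cs);
    }
}
```

Checking `Scanner.ioException()` after the loop is a cheap way to tell a genuine end of file from a decoding failure that cut the read short.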
 
Ted Gress
Ranch Hand
Posts: 229
Yes?>[SensoryCore] Reading Line: Alice followed the white rabbit. She followed the white rabbit down the tunnel. Blue green purple red.
[SensoryCore] Sentence: Alice followed the white rabbit
[SensoryCore] Sentence:  She followed the white rabbit down the tunnel
[SensoryCore] Sentence:  Blue green purple red
[SensoryCore] Append Sentence: Alice followed the white rabbit.
[SensoryCore] Append Sentence:  She followed the white rabbit down the tunnel.
[SensoryCore] Append Sentence:  Blue green purple red.
[SensoryCore] Processing tokens (sentences): Alice followed the white rabbit
[SensoryCore] Array Contents[Alice, followed, the, white, rabbit]
[Sensory Core] Encoding noun: Alice noun
[SensoryCore] Processing tokens (sentences):  She followed the white rabbit down the tunnel
[SensoryCore] Array Contents[, She, followed, the, white, rabbit, down, the, tunnel]
[SensoryCore] Processing tokens (sentences):  Blue green purple red
[SensoryCore] Array Contents[, Blue, green, purple, red]
[Sensory Core] Encoding noun: Blue noun
[SensoryCore] Encoding Complete
[SesoryCore] Encoding Size: 2
[SensoryCore] Encoded String: Alice noun
[SensoryCore] Encoded String: Blue noun
[Sensory Core] Outputting following message:  Alice noun Blue noun
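
A side observation on the log above: the second and third arrays start with an empty element (`[, She, ...]`). That is what `String.split` produces when the sentence begins with the delimiter. The sketch below assumes the sentences are split on a single space, which the thread does not show; the class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical demo class; names are illustrative only.
class SentenceTokenizerDemo {

    // Trims the sentence first so a leading space cannot yield an empty token,
    // then splits on any run of whitespace.
    static String[] tokenize(String sentence) {
        String trimmed = sentence.trim();
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        return trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        String sentence = " She followed the white rabbit down the tunnel";
        // Splitting the untrimmed sentence keeps an empty first element.
        System.out.println(Arrays.toString(sentence.split(" ")));
        // prints [, She, followed, the, white, rabbit, down, the, tunnel]
        System.out.println(Arrays.toString(tokenize(sentence)));
        // prints [She, followed, the, white, rabbit, down, the, tunnel]
    }
}
```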
 
Ted Gress
Ranch Hand
Posts: 229
Any ideas?
 
Ted Gress
Ranch Hand
Posts: 229

Should pick up on all the nouns, right?
 
Carey Brown
Bartender
Posts: 11188
89
I think you want this instead. The symbol may contain multiple letters, but the first letter is the most likely POS.
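
The snippet that accompanied this post is missing, so what follows is only a guess at the idea: test the first character of the symbol rather than comparing the whole symbol to a single letter. The class and method names are hypothetical.

```java
// Hypothetical demo class; names are illustrative only.
class PrimaryPosDemo {

    // True when `pos` is the first (most likely) part of speech in the symbol.
    static boolean isPrimarily(String symbol, char pos) {
        return !symbol.isEmpty() && symbol.charAt(0) == pos;
    }

    public static void main(String[] args) {
        System.out.println(isPrimarily("AN", 'A')); // true: adjective is listed first
        System.out.println(isPrimarily("AN", 'N')); // false: noun is only second
        System.out.println("AN".equals("N"));       // false: why a whole-symbol comparison misses it
    }
}
```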
 
Ted Gress
Ranch Hand
Posts: 229
I changed



(uncommented new token)

and now I'm getting very long lists of items from the reading list.

[SensoryCore] new token: Alice
[SensoryCore] new token: Alice
[SensoryCore] new token: Alice
[SensoryCore] new token: Alice
[SensoryCore] new token: Alice
[SensoryCore] new token: Alice
[SensoryCore] new token: Alice
 
Ted Gress
Ranch Hand
Posts: 229
Carey, thanks for the help so far. You are going in the credits of the paper for this. :-)
 
Ted Gress
Ranch Hand
Posts: 229
So quick note. If I take out that comment I get the following output:

Yes?>[SensoryCore] Reading Line: Alice followed the white rabbit. She followed the white rabbit down the tunnel. Blue green purple red.
[SensoryCore] Sentence: Alice followed the white rabbit
[SensoryCore] Sentence:  She followed the white rabbit down the tunnel
[SensoryCore] Sentence:  Blue green purple red
[SensoryCore] Append Sentence: Alice followed the white rabbit.
[SensoryCore] Append Sentence:  She followed the white rabbit down the tunnel.
[SensoryCore] Append Sentence:  Blue green purple red.
[SensoryCore] Processing tokens (sentences): Alice followed the white rabbit
[SensoryCore] Array Contents[Alice, followed, the, white, rabbit]
[Sensory Core] Encoding noun: Alice noun
[Sensory Core] Encoding noun: rabbit noun
[SensoryCore] Processing tokens (sentences):  She followed the white rabbit down the tunnel
[SensoryCore] Array Contents[, She, followed, the, white, rabbit, down, the, tunnel]
[Sensory Core] Encoding noun: rabbit noun
[Sensory Core] Encoding noun: tunnel noun
[SensoryCore] Processing tokens (sentences):  Blue green purple red
[SensoryCore] Array Contents[, Blue, green, purple, red]
[Sensory Core] Encoding noun: Blue noun
[Sensory Core] Encoding noun: green noun
[Sensory Core] Encoding noun: purple noun
[Sensory Core] Encoding noun: red noun
[SensoryCore] Encoding Complete
[SesoryCore] Encoding Size: 8
[SensoryCore] Encoded String: Alice noun
[SensoryCore] Encoded String: rabbit noun
[SensoryCore] Encoded String: rabbit noun
[SensoryCore] Encoded String: tunnel noun
[SensoryCore] Encoded String: Blue noun
[SensoryCore] Encoded String: green noun
[SensoryCore] Encoded String: purple noun
[SensoryCore] Encoded String: red noun
[Sensory Core] Outputting following message:  Alice noun rabbit noun rabbit noun tunnel noun Blue noun green noun purple noun red noun


As you can see, it's duplicating.
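
In the log above, `rabbit noun` is encoded twice because "rabbit" occurs in two different sentences. If that repetition is unwanted, one possible approach is to drop repeats while keeping first-seen order. This sketch assumes the encoded strings are held in a `List<String>`, which the thread does not confirm; the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical demo class; names are illustrative only.
class EncodingDedupDemo {

    // LinkedHashSet discards repeats but preserves insertion order.
    static List<String> withoutDuplicates(List<String> encoded) {
        return new ArrayList<>(new LinkedHashSet<>(encoded));
    }

    public static void main(String[] args) {
        List<String> encoded = Arrays.asList(
                "Alice noun", "rabbit noun", "rabbit noun", "tunnel noun");
        System.out.println(withoutDuplicates(encoded));
        // prints [Alice noun, rabbit noun, tunnel noun]
    }
}
```

Whether repeats should be kept depends on what the encoding is for; if word frequency matters, the duplicates are information rather than noise.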
 
Carey Brown
Bartender
Posts: 11188
89
The colors, for example, have POS "NAV" (noun, adjective, verb). It could be any one of those. If you want to see if there's any possibility that a word is a noun you could use
Ditto for any other POS characters. This is where the parsing gets ugly because if you have three possible POS, then which one is it?
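
The inline snippet from this post was lost. As a sketch of the "any possibility" check it describes, the symbol can be searched for the letter at any position. The class and method names are hypothetical.

```java
// Hypothetical demo class; names are illustrative only.
class PossiblePosDemo {

    // True when `pos` appears anywhere in the symbol, whatever its rank.
    static boolean couldBe(String symbol, char pos) {
        return symbol.indexOf(pos) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(couldBe("NAV", 'N')); // true
        System.out.println(couldBe("NAV", 'V')); // true: least likely, but still possible
        System.out.println(couldBe("v", 'N'));   // false
    }
}
```

Note that the check is case-sensitive: the thread shows both `N` in `Abyla\N` and a lowercase `v` in `abysmally\v`, so `'V'` and `'v'` are treated as different codes here.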
 
Ted Gress
Ranch Hand
Posts: 229
Problem solved. Do you know where I can find a list of what the part-of-speech abbreviations in Moby mean?
 
Carey Brown
Bartender
Posts: 11188
89

Carey Brown wrote:Your "symbol" will consist of one or more characters denoting various parts of speech in probability order. Your word "3-D" is most likely to be an adjective ('A') but may possibly be a noun ('N').

 
Carey Brown
Bartender
Posts: 11188
89
"He went golfing on the green"
"your bank balance is in the red"
 
Ted Gress
Ranch Hand
Posts: 229
Where did you get that list?
 
Carey Brown
Bartender
Posts: 11188
89

Ted Gress wrote:Where did you get that list?

A README.txt file that came along in my download.
 