$\begingroup$

I am importing a text file with Swedish letters (å, ä, ö).

A. If I use ReadList["sv_JSPFirefox.txt"], it imports the file nicely, but then I cannot use a command like:

dict2 = Select[dict1, Not@StringContainsQ[#, Alternatives @@ rejectlist] &] 

The error code is:

StringContainsQ::strse: String or list of strings expected at position 1 in StringContainsQ[Aapua,.|0|1|2|3|4|5|6|7|8|9|{A,B,C,D,E,F,G,H, I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z}|Å|Ä|Ö]. >> 

Two screen dumps show the problem better: [image: Importing Swedish characters OK] [image: Importing Swedish characters error]

B. If I use ReadList["sv_JSPFirefox.txt", Word], it imports the file, but those characters are garbled. At least I can then use the command above and proceed, but commands (search, etc.) that involve these characters do not work properly.

I have tried to add "UTF8", "UTF-8","Unicode", "Lines" and other options but nothing seems to help. Any pointer in the right direction would be greatly appreciated. Thank you for your time!

Edit (2016-03-06): @C.E. I made a sample file and a very small file called "shrimpsandwich.txt". The small one is enough because it holds a single Swedish word that contains all three letters, "räksmörgås". You might have heard of smörgåsbord.

[image: Trials and errors]

Sample files: A small sample file is displayed in the screen dump above.

$\endgroup$
  • $\begingroup$ I can generate several error messages when I try to import a file with åäö in it using ReadList, but I can't reproduce your problem. Can you provide a file - does not have to be your entire file, it can be a smaller one - as well as a definition for rejectlist so that we can reproduce the problem? Also have you tried using Import? (e.g. StringSplit@Import[file]) $\endgroup$ Commented Mar 5, 2016 at 23:00
  • $\begingroup$ @Xavier, yes it is True. Thanks. $\endgroup$ Commented Mar 6, 2016 at 14:13
  • $\begingroup$ I tried two different character encodings yesterday but I could not reproduce your error message. Please post all of your code as copyable text and I will try using that. btw DictionaryLookup[{"Swedish", Repeated[___ ~~ "å" | "ä" | "ö" ~~ ___, {3, Infinity}]}] clearly shows that räksmörgås is not the only word in Swedish with å, ä and ö in it. $\endgroup$ Commented Mar 6, 2016 at 15:19
  • $\begingroup$ Thanks, @Xavier. dict2 becomes all OK but then again when I try to remove words containing characters in the "rejectlist" (capital letters, numbers etc.) it all reverses to gibberish for those letters. I do not understand why it is so, though. Is there another way? I suppose so, but I would like to understand why (but we can come back to that later). dict3 = Select[dict2, Not@StringContainsQ[#, Alternatives @@ idagThrowAway] &] $\endgroup$ Commented Mar 6, 2016 at 19:44
  • $\begingroup$ @C.E and @Xavier using the suggestion dict1 = ToString /@ ReadList["sv_JSPFirefox.txt"] it all works! Awesome! Now, the question arises in my mind: What was the problem, and what is different in this solution? If you have some time, please do explain. $\endgroup$ Commented Mar 6, 2016 at 20:58

1 Answer

$\begingroup$

As the discussion in the comments shows (it may also give some other insights), the suggestion by @Xavier solves the problem. Thank you all for taking the time to test and answer.

Solution:

dict1 = ToString /@ ReadList["sv_JSPFirefox.txt"] 
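
As far as I can tell, this works because ReadList with no type argument reads Wolfram Language expressions, so a bare word such as Aapua comes back as a Symbol rather than a string, and StringContainsQ then complains about position 1. Below is a minimal sketch of that difference. The file name, the two sample words, and the contents of rejectlist are made up for illustration and are not my real data. The last line is a variant of the Import suggestion from the comments; the "UTF8" encoding is an assumption about how the file was saved.

```mathematica
(* Hypothetical sample file with two plain ASCII words, for illustration only *)
file = FileNameJoin[{$TemporaryDirectory, "sample_words.txt"}];
Export[file, "Aapua\nabborre\n", "Text"];

(* No type given: ReadList reads expressions, so each word should become a Symbol *)
asExpr = ReadList[file];
Head /@ asExpr

(* Asking for String (one string per line) should give real strings instead *)
asStrings = ReadList[file, String];
Head /@ asStrings

(* ToString converts the symbols after the fact, as in the solution above *)
fixed = ToString /@ asExpr;

(* Assumed reject list: a dot, the digits, and capital letters A to Z *)
rejectlist = Join[{"."}, CharacterRange["0", "9"], CharacterRange["A", "Z"]];

(* The filter from the question now receives strings at position 1 *)
Select[fixed, Not@StringContainsQ[#, Alternatives @@ rejectlist] &]

(* Alternative from the comments: Import as text, here with an explicit encoding *)
StringSplit[Import[file, "Text", CharacterEncoding -> "UTF8"]]
```

If the words come in as strings from the start (ReadList with String or Word, or Import), the ToString step should not be needed. I have not tested which of these routes handles å, ä, ö best on every system.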
$\endgroup$
