646 questions
0 votes
1 answer
41 views
Tokenization of Compound Words not Working in Quanteda in Japanese
Creating bigrams from unigrams doesn't seem to work in Japanese in {quanteda}. I can hack the text with gsub(), but I hope there's a better way. I can't post a complete reprex because SO won't allow ...
1 vote
1 answer
62 views
How to subset the SOTU dfm to Presidents Bush and Carter with sotu and quanteda to generate a word cloud?
I'm working with quanteda's SOTU corpus and need to subset it to look at President Bush's and Carter's speeches. I've been learning how to preprocess the corpus when in dfm format, but I'm not certain ...
1 vote
1 answer
73 views
R quanteda kwic not matching negative lookbehind pattern
I'm trying to use a regex pattern with kwic that doesn't match a word preceded by "in", "of", or "and" (using a negative lookbehind). It works in regex101 but not in kwic (which uses stringi's ICU regex ...
0 votes
1 answer
33 views
predict.svm ignoring new dfm object as x
I am trying to run svm (from the e1071 package) on a document-feature matrix produced by the package quanteda. I start by training the svm on training data: svm_fit <- svm(x=dfm_train, y=as.factor(...
1 vote
1 answer
73 views
Avoiding overlap in frequency and document frequency count in Quanteda
Below is a dummy corpus of 4 documents. The dictionary was developed to identify the frequency of words or phrases in the corpus, as well as the number of documents a word or phrase occurs in. The ...
1 vote
2 answers
65 views
Unexpected behaviour with dfm_lookup - ordering of entries affects feature frequency counts
I am using quanteda 4.1.0 and getting some unexpected behaviour when using a dictionary to adjust for synonyms and plurals. The ordering of the entries in the dictionary is affecting the frequency ...
1 vote
1 answer
71 views
Quanteda overlap frequency reporting problem
I’m hoping to identify an apparent miscalculation due to overlapping terms. In the dummy data set and code below, which uses the same code as the actual data, the analysis works as expected. “...
1 vote
1 answer
100 views
R Error in validObject while running quanteda.textstats
I have around 2000 text files. While running textstat_summary I ran into the following issue and am unsure what to do next. I was able to identify that the problem came from this specific file (maybe ...
1 vote
0 answers
162 views
Parallel computing is disabled in the quanteda CRAN version. How can I enable parallel computing on an M3 Mac (Mac OS X)?
I am having problems enabling quanteda's parallel computing in R on an M3 MacBook. On GitHub, the README says Windows or macOS users do not have to install TBB or any other packages to enable ...
0 votes
1 answer
104 views
How to save n-gram output
A hopefully simple question. How can I save the ngram output from the following code?
library("quanteda")
## Package version: 2.1.2
data(data_corpus_inaugural)
toks <- ...
0 votes
1 answer
68 views
Reading output of tokens_compound into a dictionary
I want to use tokens_compound to examine the frequency of phrases in the documents of a corpus. I used the corpus data_corpus_inaugural for illustrative purposes and selected some ngrams to search for....
-2 votes
1 answer
258 views
How to utilize Rauh's German Political Sentiment Dictionary
I need to use this sentiment dictionary for my sentiment analysis in RStudio. Unfortunately, I am having problems doing so. The dictionary comes in a zip archive and specifically (as I assume) ...
0 votes
1 answer
82 views
R + quanteda + automatic detection of topics: error when running model
I have a set of around 20,000 short job descriptions in English. For now, my goal is to detect their optimal number of topics. I use an R script that worked decently on a ...
0 votes
0 answers
149 views
LDA Error in x$terms %||% attr(x, "terms")
I can't understand why this is giving me an error. Earlier, the code was working with no errors. Packages are: quanteda, quanteda.textmodels, quanteda.textstats, quanteda.textplots, newsmap, ...
0 votes
1 answer
71 views
textstat_simil not found [closed]
I am using a script that contains the function textstat_simil, and when I run it, it throws the error: could not find function "textstat_simil". similarity_matrix <- textstat_simil(x = ...