Question 1

Creating bigrams from unigrams doesn't seem to work in Japanese in {quanteda}. I can hack the text with gsub(), but I hope there's a better way. I can't post a complete reprex because SO won't allow ...

Question 2

I'm working with quanteda's SOTU corpus and need to subset it to look at President Bush's and Carter's speeches. I've been learning how to preprocess the corpus when in dfm format, but I'm not certain ...

Question 3

I'm trying to use a regex pattern with kwic that doesn't match a word preceded by "in", "of" or "and" (using a negative lookbehind). It works in regex101 but not in kwic (which uses stringi's ICU regex ...

Question 4

I am trying to run svm (from the e1071 package) on a document-feature matrix produced by the package quanteda. I start by training the svm on training data: svm_fit <- svm(x=dfm_train, y=as.factor(...

Question 5

Below is a dummy corpus of 4 documents. The dictionary was developed to identify the frequency of words or phrases in the corpus, as well as the number of documents a word or phrase occurs in. The ...

Question 6

I am using quanteda 4.1.0 and getting some unexpected behaviour when using a dictionary to adjust for synonyms and plurals. The ordering of the entries in the dictionary is affecting the frequency ...

Question 7

I’m hoping to identify an apparent miscalculation due to overlapping terms. In the dummy data set and code below, which uses the same code on the actual data, the analysis works as expected. “...

Question 8

I have around 2000 text files. While running textstat_summary I ran into the following issue and am unsure what to do next. I could somehow identify that the problem came from this specific file (maybe ...

Question 9

I am having a problem enabling quanteda's parallel computing in R on an M3 MacBook. On GitHub, the README says Windows or macOS users do not have to install TBB or any other packages to enable ...

Question 10

A hopefully simple question. How can I save the ngram output from the following code? library("quanteda") ## Package version: 2.1.2 data(data_corpus_inaugural) toks <- ...

Question 11

I want to use tokens_compound to examine the frequency of phrases in the documents of a corpus. I used the corpus data_corpus_inaugural for illustrative purposes and selected some ngrams to search for....

Question 12

I need to utilize the named sentiment dictionary for my sentiment analysis in RStudio. Unfortunately, I am having problems with that. The dictionary comes within a zip archive and specifically (as I assume) ...

Question 13

I have a set of many (around 20 thousand) short job descriptions in English. My purpose for now is to be able to detect their optimal number of topics. I use an R script which worked decently on a ...

Question 14

Hi everyone. I can't understand why this is giving me an error. Later on, the code was working with no errors. Packages are: quanteda, quanteda.textmodels, quanteda.textstats, quanteda.textplots, newsmap, ...

Question 15

I am using a script that contains the function textstat_simil, and when I run it, it throws the error "could not find function "textstat_simil"": similarity_matrix <- textstat_simil(x = ...

Tokenization of Compound Words not Working in Quanteda in Japanese

How to subset SOTU dfm to Presidents Bush and Carter in sotu and quanteda to generate a wordcloud chart?

R quanteda kwic not matching negative look behind pattern

predict.svm ignoring new dfm object as x

Avoiding overlap in frequency and document frequency count in Quanteda

Unexpected behaviour with dfm_lookup - ordering of entries affects feature frequency counts

Quanteda overlap frequency reporting problem

R Error in validObject while running quanteda.textstats

Parallel computing is disabled in Quanteda CRAN version. How can I enable Parallel computing in M3 Mac OS X

How to save n-gram output

Reading output of tokens_compound into a dictionary

How to utilize Rauh's German Political Sentiment Dictionary

R + quanteda + automatic detection of topics: error when running model

LDA Error in x$terms %||% attr(x, "terms")

textstat_simil not found [closed]
