0 votes
1 answer
41 views

Creating bigrams from unigrams doesn't seem to work in Japanese in {quanteda}. I can hack the text with gsub(), but I hope there's a better way. I can't post a complete reprex because SO won't allow ...
Mark R
  • 1,053
1 vote
1 answer
62 views

I'm working with quanteda's SOTU corpus and need to subset it to look at President Bush's and Carter's speeches. I've been learning how to preprocess the corpus when in dfm format, but I'm not certain ...
Ben
  • 107
1 vote
1 answer
73 views

I'm trying to use a regex pattern with kwic that doesn't match a word preceded by in, of or and (using a negative lookbehind). It works in regex101 but not in kwic (which uses stringi's ICU regex ...
pluke
  • 4,496
0 votes
1 answer
33 views

I am trying to run svm (from the e1071 package) on a document-feature matrix produced by the quanteda package. I start by training the svm on training data: svm_fit <- svm(x=dfm_train, y=as.factor(...
Clara
  • 3
1 vote
1 answer
73 views

Below is a dummy corpus of 4 documents. The dictionary was developed to identify the frequency of words or phrases in the corpus, as well as the number of documents a word or phrase occurs in. The ...
bgreen
  • 87
1 vote
2 answers
65 views

I am using quanteda 4.1.0 and getting some unexpected behaviour when using a dictionary to adjust for synonyms and plurals. The ordering of the entries in the dictionary is affecting the frequency ...
Rob Ackland
1 vote
1 answer
71 views

I’m hoping to identify an apparent miscalculation due to overlapping terms. In the dummy data set and code below, which uses the same code on the actual data, the analysis works as expected. “...
bgreen
  • 87
1 vote
1 answer
100 views

I have around 2000 text files. While running textstat_summary I ran into the following issue and am unsure what to do next. I managed to identify that the problem came from this specific file (maybe ...
tctrg
  • 13
1 vote
0 answers
162 views

I am having a problem enabling quanteda's parallel computing in R on an M3 MacBook. On GitHub, the README says Windows or macOS users do not have to install TBB or any other packages to enable ...
Sadettin Demirel
0 votes
1 answer
104 views

A hopefully simple question. How can I save the ngram output from the following code? library("quanteda") ## Package version: 2.1.2 data(data_corpus_inaugural) toks <- ...
bgreen
  • 87
0 votes
1 answer
68 views

I want to use tokens_compound to examine the frequency of phrases in the documents of a corpus. I used the corpus data_corpus_inaugural for illustrative purposes and selected some ngrams to search for....
bgreen
  • 87
-2 votes
1 answer
258 views

I need to utilize the named sentiment dictionary for my sentiment analysis in RStudio. Unfortunately, I am having problems with that. The dictionary comes within a zip archive and specifically (as I assume) ...
user23820003
0 votes
1 answer
82 views

I have a set of many (around 20 thousand) short job descriptions in English. My purpose for now is to be able to detect their optimal number of topics. I use an R script which worked decently on a ...
larry77
  • 1,543
0 votes
0 answers
149 views

Hi everyone. I can't understand why this is giving me an error. Later on, the code was working with no errors. Packages are: quanteda, quanteda.textmodels, quanteda.textstats, quanteda.textplots, newsmap, ...
Diego Gimenez
0 votes
1 answer
71 views

I am using a script that contains the function textstat_simil, and when I run it, it throws the error 'could not find function "textstat_simil"': similarity_matrix <- textstat_simil(x = ...
Nuria
  • 77
