Questions tagged [bag-of-words]
A way of representing language data that keeps only the constituent words with their individual frequencies; i.e., grammar, word order, and the like are dropped to simplify the data.
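As a minimal sketch of the representation (pure Python, with a deliberately naive tokenizer), a bag of words is just the multiset of tokens in a text:

```python
from collections import Counter

def bag_of_words(text):
    # Deliberately simple tokenizer: lowercase + whitespace split.
    # Real pipelines also strip punctuation, handle stop words, etc.
    return Counter(text.lower().split())

bow = bag_of_words("The cat sat on the mat")
# Grammar and word order are discarded; only per-word counts remain,
# e.g. bow["the"] == 2.
```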
38 questions
0 votes
0 answers
75 views
Conditional independence assumption for Naive Bayes with Multinomial distribution
I was going through the Naive Bayes classifier (from the Cornell Machine Learning course, link here) and found quite confusing the use of the Naive Bayes classifier for bag-of-words with the Multinomial ...
2 votes
2 answers
133 views
End Tokens Are Required to Make N-gram Models Proper
The standard bigram model (for example, defined here) defines a probability distribution over a corpus $V$ based on the following principles: the marginal probability of a word $w$ is defined as its ...
0 votes
1 answer
58 views
Continuous Bag of Words derivation
The continuous bag of words model has the following log probability for observing a sequence of words: $$\log P(\textbf{w})=\sum_{c=1}^{C}\log P(w_c \mid w_{c-m},\dots,w_{c-1}, w_{c+1},\dots,w_{c+m})$$ I don't ...
1 vote
0 answers
55 views
What is a word embedding approach that would work for these pre-labeled documents?
My situation: I should start off with my end goal: I want to get a distance metric between each document and all of the other documents. To get there, I first need to encode these topic labels so that ...
0 votes
0 answers
92 views
Continuous Bag of Words NY Time Corpus
I am working to implement the continuous bag of words approach on the New York Times corpus dataset. However, I am getting word embeddings that do not seem very useful based on a few examples of ...
0 votes
0 answers
81 views
Why is using a small vocabulary for topic modelling bad?
I am trying to classify texts into topics. For example, let's say one of the topics is cooperation. So, in the vocab param of the sklearn API, some of the prevalent words (or "tokens") are ...
0 votes
0 answers
111 views
Fast feature selection on a huge dataset in R on a term document matrix
I have a 500K rows x 10K features dataset. It consists of: a term-document matrix with words + bigrams and TF-IDF weighting; 6 one-hot-encoded multi-labels. That is many more features than I want to ...
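For reference, the TF-IDF weighting mentioned above can be sketched in pure Python (a simplified log-IDF variant on a made-up toy corpus; a real 500K x 10K term-document matrix would use sparse formats):

```python
import math
from collections import Counter

docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs".split(),
]

def tf_idf(docs):
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    # One dict of term -> tf * log(n / df) weights per document.
    return [
        {t: tf[t] * math.log(n / df[t]) for t in tf}
        for tf in (Counter(doc) for doc in docs)
    ]

weights = tf_idf(docs)
# "cat" appears in only one of three documents, so it gets a high
# weight there; "sat" appears in two, so its weight is lower.
```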
1 vote
1 answer
263 views
BOW features classifying better than complex models like BERT
I am doing a document classification task, and I find that using simple BOW features with a random forest provides better results than complex models like BERT or ELECTRA, even after doing some ...
0 votes
1 answer
66 views
Transforming topics into text data
I was reading some articles on topic classification, in which some algorithm uses snippets of text as input and tries to classify them in topics, and I thought of implementing this technique in my ...
2 votes
3 answers
497 views
How can "word hashing" cause a collision in DSSM?
They say in their paper that "word hashing" can cause a collision, but I don't understand how. For example, if the word good is transformed to ...
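For context, DSSM-style word hashing maps a word to its boundary-padded letter trigrams; a collision happens when two distinct words produce the same trigram representation. A minimal sketch (pure Python, using trigram *sets* for simplicity and a contrived collision pair):

```python
def letter_trigrams(word):
    # Pad with boundary markers, then take all character trigrams.
    padded = "#" + word + "#"
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

# "good" -> {"#go", "goo", "ood", "od#"}
# Contrived collision: "aaa" and "aaaa" yield the same trigram set.
```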
1 vote
0 answers
230 views
Language Identification Better Results with Unigrams
I have a school project which consists of identifying each language of a tweet from a dataset of tweets. The dataset contains tweets in Spanish, Portuguese, English, Basque, Galician and Catalan. The ...
1 vote
2 answers
565 views
Why CBOW model is called "continuous"?
The question is pretty clear from the title itself: why is the Continuous Bag of Words (CBOW) model called continuous? I also don't know what exactly the "distributed representation" of word vectors ...
2 votes
1 answer
1k views
Could someone please give a concrete example to illustrate the Dirichlet distribution prior for bag-of-words?
I am aware of the notion of the Dirichlet distribution, a multivariate generalization of the beta distribution. To get the parameters of the Dirichlet distribution prior for bag-of-words, this CMU ...
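As one concrete illustration (a stdlib-only sketch; the toy vocabulary and alpha values are made up), a Dirichlet draw can be generated by normalizing independent Gamma variates, yielding a probability vector over the vocabulary that a bag-of-words/multinomial model can then use:

```python
import random

def sample_dirichlet(alphas, rng):
    # A Dirichlet(alpha_1, ..., alpha_k) sample is k independent
    # Gamma(alpha_i, 1) draws, normalized to sum to 1.
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]

# Symmetric prior over a toy 4-word vocabulary.
vocab = ["cat", "dog", "mat", "sat"]
theta = sample_dirichlet([1.0] * len(vocab), random.Random(0))
# theta is a valid probability distribution over vocab.
```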
3 votes
2 answers
300 views
Could someone please give a concrete example to illustrate what Multiplicity means in the context of the Bag-of-words model?
This CMU Machine Learning Course uses the Bag-of-words model without much explanation. The wiki uses the term multiplicity to explain the model. The bag-of-words model is a simplifying ...
1 vote
2 answers
778 views
Text classification with small dataset for a specialized domain
I have a multiclass text classification problem where I have very few documents per class. The classes are imbalanced, but I want to be able to predict the class when I have at least 200 - 300 ...