A TensorFlow implementation of word2vec applied to the Stanford Encyclopedia of Philosophy; the implementation supports both CBOW and skip-gram.
For more background, please have a look at these papers:
- Distributed Representations of Words and Phrases and their Compositionality
- word2vec Parameter Learning Explained
- Explained: Deriving Mikolov et al.’s Negative-Sampling Word-Embedding Method
After training, the model returns some interesting results; here are a few of them:
Evaluating hume - empiricist + rationalist:
descartes malebranche spinoza hobbes herder Similar words to death:
untimely ravages grief torment Similar words to god:
divine De Providentia christ Hesiod Similar words to love:
friendship affection christ reverence Similar words to life:
career live lifetime community society Similar words to brain:
neurological senile nerve nervous Evaluating hume - empiricist + rationalist:
descartes malebranche spinoza hobbes herder Evaluating ethics - rational:
hiroshima Evaluating ethic - reason:
inegalitarian anti-naturalist austere Evaluating moral - rational:
commonsense Evaluating life - death + love:
self-positing friendship care harmony Evaluating death + choice:
regret agony misfortune impending Evaluating god + human:
divine inviolable yahweh god-like man Evaluating god + religion:
amida torah scripture buddha sokushinbutsu Evaluating politic + moral:
rights-oriented normative ethics integrity - an object to crawl data from the philosophy encyclopedia; PlatoData
- an object to build the vocabulary based on the crawled data; VocabBuilder
- the model that computes the continuous distributed representations of words; Philo2Vec
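As a rough sketch of how these three pieces chain together: the `PlatoData` calls shown below are assumptions (the training examples further down only use a bare `get_data()` helper), while the `VocabBuilder` and `Philo2Vec` calls mirror the documented examples.

```python
# Hypothetical wiring of the three components; PlatoData's interface is an assumption,
# the VocabBuilder / Philo2Vec calls follow the documented examples below.
data = PlatoData()                           # crawls the philosophy encyclopedia (assumed API)
x_train = data.get_data()                    # tokenized articles from the crawl (assumed API)

vb = VocabBuilder(x_train, min_frequency=5)  # vocabulary of words seen at least 5 times
pv = Philo2Vec(vb, model=Philo2Vec.CBOW)     # embedding model built on top of the vocabulary
pv.fit(epochs=30)                            # train the word vectors
```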
The dependencies used for this module can be easily installed with pip:
> pip install -r requirements.txt

The following params can be used to configure the vocabulary and the model (a combined example is sketched right after this list):
- min_frequency: the minimum frequency of the words to be used in the model.
- size: the size of the data; the model then uses the top `size` most frequent words.
- optimizer: an instance of a TensorFlow `Optimizer`, such as `GradientDescentOptimizer`, `AdagradOptimizer`, or `MomentumOptimizer`.
- model: the model to use to create the vectorized representation; possible values: `CBOW`, `SKIP_GRAM`.
- loss_fct: the loss function used to calculate the error; possible values: `SOFTMAX`, `NCE`.
- embedding_size: the dimensionality of the word embeddings.
- neg_sample_size: the number of negative samples for each positive sample.
- num_skips: the number of skips for a `SKIP_GRAM` model.
- context_window: the window size; this window is used to create the context for calculating the vector representations [ window target window ].
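For illustration, a parameter dict combining most of the options above might look like the sketch below. The hyperparameter values and the choice of optimizer are assumptions (any TensorFlow 1.x `tf.train` optimizer instance should fit the description above), not recommended defaults:

```python
import tensorflow as tf

# Illustrative values only. `min_frequency` is passed to VocabBuilder, as in the
# training examples below; the remaining options go to the Philo2Vec constructor.
x_train = get_data()
vb = VocabBuilder(x_train, min_frequency=5)

params = {
    'optimizer': tf.train.AdagradOptimizer(1.0),  # any tf.train Optimizer instance
    'model': Philo2Vec.SKIP_GRAM,
    'loss_fct': Philo2Vec.NCE,
    'embedding_size': 128,       # dimensionality of the word vectors
    'neg_sample_size': 5,        # negative samples per positive sample
    'num_skips': 2,              # context words sampled per target (skip-gram only)
    'context_window': 3,         # words on each side of the target
}
pv = Philo2Vec(vb, **params)
```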
Training a CBOW model:

```python
params = {
    'model': Philo2Vec.CBOW,
    'loss_fct': Philo2Vec.NCE,
    'context_window': 5,
}
x_train = get_data()
validation_words = ['kant', 'descartes', 'human', 'natural']
x_validation = [StemmingLookup.stem(w) for w in validation_words]
vb = VocabBuilder(x_train, min_frequency=5)
pv = Philo2Vec(vb, **params)
pv.fit(epochs=30, validation_data=x_validation)
```

Training a SKIP_GRAM model:

```python
params = {
    'model': Philo2Vec.SKIP_GRAM,
    'loss_fct': Philo2Vec.SOFTMAX,
    'context_window': 2,
    'num_skips': 4,
    'neg_sample_size': 2,
}
x_train = get_data()
validation_words = ['kant', 'descartes', 'human', 'natural']
x_validation = [StemmingLookup.stem(w) for w in validation_words]
vb = VocabBuilder(x_train, min_frequency=5)
pv = Philo2Vec(vb, **params)
pv.fit(epochs=30, validation_data=x_validation)
```

Since the words are stemmed as part of the preprocessing, some operations are sometimes necessary to map between stemmed and original forms:
```python
StemmingLookup.stem('religious')  # returns "religi"
StemmingLookup.original_form('religi')  # returns "religion"
```

To get the words similar to a list of words:

```python
pv.get_similar_words(['rationalist', 'empirist'])
```

To evaluate an operation on word vectors:

```python
pv.evaluate_operation('moral - rational')
```

To plot a set of words:

```python
pv.plot(['hume', 'empiricist', 'descart', 'rationalist'])
```
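For intuition about what evaluate_operation-style queries do, word2vec analogies are conventionally resolved by simple vector arithmetic followed by a cosine-similarity lookup. The snippet below is a generic numpy illustration of that idea, not the code used in this repository:

```python
import numpy as np

def closest_words(embeddings, vocab, query_vec, top_k=5):
    """Return the top_k vocabulary words whose embedding has the highest
    cosine similarity with query_vec."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    query = query_vec / np.linalg.norm(query_vec)
    scores = normed @ query                  # cosine similarity against every word
    best = np.argsort(-scores)[:top_k]       # indices of the best-scoring words
    return [vocab[i] for i in best]

# An operation such as "hume - empiricist + rationalist" becomes arithmetic on the
# learned vectors; a well-trained model should rank words like 'descartes' highly:
#   result = vec('hume') - vec('empiricist') + vec('rationalist')
#   closest_words(embeddings, vocab, result)
```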







