In 'Automatic construction of lexicons, taxonomies, ontologies, and other knowledge structures', the authors mention:

There are two slightly different classes of measure: lexical cohesion (sometimes called ‘unithood’ or ‘phraseness’), which quantifies the expectation of co-occurrence of words in a phrase (e.g., back-of-the-book index is significantly more cohesive than term name); and semantic informativeness (sometimes called ‘termhood’), which highlights phrases that are representative of a given document or domain.

However, the review does not describe how to calculate or derive these measures. Can someone please explain how to compute these two measures for a given set of text documents?

1 Answer


Lexical cohesion is usually measured via collocation extraction, i.e., finding frequently co-occurring n-grams. One example is "San Francisco", which occurs together far more often than you would expect from "San" and "Francisco" appearing independently. One simple method for collocation extraction is to rank-order the occurrence counts of all n-grams and pick a threshold for inclusion; a sketch of this idea is shown below.
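
Here is a minimal Python sketch of count-based collocation extraction. The answer only calls for ranking raw n-gram counts; the pointwise mutual information (PMI) score used here is one common refinement (not prescribed by the review) that captures the "more often than its parts" intuition. The file name `corpus.txt`, the `min_count` filter, and the threshold value are placeholders.

```python
import math
import re
from collections import Counter

def ranked_bigrams(text, min_count=3):
    """Rank candidate bigrams by pointwise mutual information (PMI).

    Raw bigram frequency (as suggested in the answer) also works;
    PMI additionally normalises for how frequent the individual
    words are, so "san francisco" outranks "of the".
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)

    scored = []
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare bigrams have unreliable statistics
        pmi = math.log((count / n) / ((unigrams[w1] / n) * (unigrams[w2] / n)))
        scored.append(((w1, w2), pmi))
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Keep bigrams above a chosen score threshold as collocations.
with open("corpus.txt") as f:
    ranked = ranked_bigrams(f.read())
collocations = [bigram for bigram, score in ranked if score > 5.0]
```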

Semantic informativeness is closer to tf–idf applied to n-grams: instead of using raw frequency counts alone, the frequency of an n-gram is weighted by how unique it is to a particular document or domain.
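
As one possible concrete realisation, here is a short sketch using scikit-learn's `TfidfVectorizer` (assumed installed) with `ngram_range=(1, 3)` so that unigrams through trigrams are scored; the toy documents are placeholders. The highest-weighted n-grams per document are the candidate "informative" phrases.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "back of the book index construction for technical manuals",
    "the index of refraction of glass depends on wavelength",
    "building a taxonomy from a domain specific corpus",
]

vectorizer = TfidfVectorizer(ngram_range=(1, 3), stop_words="english")
tfidf = vectorizer.fit_transform(docs)            # docs x n-grams sparse matrix
terms = np.array(vectorizer.get_feature_names_out())

# For each document, print the n-grams with the highest tf-idf weight,
# i.e. the phrases most representative of that document.
for i, doc in enumerate(docs):
    row = tfidf[i].toarray().ravel()
    top = row.argsort()[::-1][:3]
    print(doc[:40], "->", list(terms[top]))
```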
