In 'Automatic construction of lexicons, taxonomies, ontologies, and other knowledge structures', the authors mention:

There are two slightly different classes of measure: lexical cohesion (sometimes called ‘unithood’ or ‘phraseness’), which quantifies the expectation of co-occurrence of words in a phrase (e.g., back-of-the-book index is significantly more cohesive than term name); and semantic informativeness (sometimes called ‘termhood’), which highlights phrases that are representative of a given document or domain.

However, the review does not describe how to calculate or derive these measures. Can someone please explain how to compute these two measures for a given set of text documents?

1 Answer


Lexical cohesion is usually measured via collocation extraction, i.e., finding frequently co-occurring n-grams. One example is "San Francisco", which occurs together far more often than you would expect from "San" and "Francisco" appearing independently. One simple method for collocation extraction is to rank-order the occurrence counts of all n-grams and pick a threshold for inclusion; a sketch of this idea is shown below.
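
Here is a minimal Python sketch of count-based collocation extraction. The answer only calls for ranking raw n-gram counts; the pointwise mutual information (PMI) score used here is one common refinement (not prescribed by the review) that captures the "more often than its parts" intuition. The file name `corpus.txt`, the `min_count` filter, and the threshold value are placeholders.

```python
import math
import re
from collections import Counter

def ranked_bigrams(text, min_count=3):
    """Rank candidate bigrams by pointwise mutual information (PMI).

    Raw bigram frequency (as suggested in the answer) also works;
    PMI additionally normalises for how frequent the individual
    words are, so "san francisco" outranks "of the".
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)

    scored = []
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare bigrams have unreliable statistics
        pmi = math.log((count / n) / ((unigrams[w1] / n) * (unigrams[w2] / n)))
        scored.append(((w1, w2), pmi))
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Keep bigrams above a chosen score threshold as collocations.
with open("corpus.txt") as f:
    ranked = ranked_bigrams(f.read())
collocations = [bigram for bigram, score in ranked if score > 5.0]
```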

Semantic informativeness is closer to tf–idf applied to n-grams: instead of using raw frequency counts alone, the frequency of an n-gram is weighted by how unique it is to a particular document or domain.
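
As one possible concrete realisation, here is a short sketch using scikit-learn's `TfidfVectorizer` (assumed installed) with `ngram_range=(1, 3)` so that unigrams through trigrams are scored; the toy documents are placeholders. The highest-weighted n-grams per document are the candidate "informative" phrases.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "back of the book index construction for technical manuals",
    "the index of refraction of glass depends on wavelength",
    "building a taxonomy from a domain specific corpus",
]

vectorizer = TfidfVectorizer(ngram_range=(1, 3), stop_words="english")
tfidf = vectorizer.fit_transform(docs)            # docs x n-grams sparse matrix
terms = np.array(vectorizer.get_feature_names_out())

# For each document, print the n-grams with the highest tf-idf weight,
# i.e. the phrases most representative of that document.
for i, doc in enumerate(docs):
    row = tfidf[i].toarray().ravel()
    top = row.argsort()[::-1][:3]
    print(doc[:40], "->", list(terms[top]))
```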
