Querying words with hyphens
I installed QueryData - thank you for sharing!
I am checking WordNet coverage of free text, i.e. how many of the
words are in WN. A problem I came across is the following:
my text has the plural of a hyphenated word, go-karts,
and the POS tagger I am using knows it is a noun.
Since I first of all want to normalize words, I query the plural
form and get an empty string in response, although the online WN
brings me to an entry for go-kart.
I suspect the problem is that _forms uses [ _] as token separators,
so go-karts is treated as one token. The rules of detachment then
fail to apply, since Perl's \w+ does not match hyphens, so
go-karts does not match the plural pattern.
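
To illustrate what I think is happening, here is a minimal sketch.
Note that detach_plural and its pattern are my own simplified
stand-ins for a plural detachment rule, not the actual QueryData
code; the only point is that \w does not cover the hyphen, while a
character class that includes it does.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a plural detachment rule.
# This is NOT the code from WordNet::QueryData; it only mimics
# a rule of the form "strip a trailing s from a single token".
sub detach_plural {
    my ($word, $char_class) = @_;
    return $word =~ /^($char_class+)s$/ ? $1 : undef;
}

my $word = 'go-karts';

# \w matches letters, digits and underscore, but not '-'
my $plain = detach_plural($word, qr/\w/);

# Same rule, with the hyphen allowed inside the token
my $hyphen = detach_plural($word, qr/[\w-]/);

print defined $plain  ? "plain:  $plain\n"  : "plain:  no match\n";
print defined $hyphen ? "hyphen: $hyphen\n" : "hyphen: no match\n";
# prints:
# plain:  no match
# hyphen: go-kart
```

Widening the character class is an alternative to adding the hyphen
as a token separator; I do not know which of the two fits the module
better.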
I am aware of the risk of fixing one thing locally, without
appreciating the consequences. Do you think adding the hyphen as a
token separator is a good workaround, or is it going to have major
undesirable consequences? Maybe you have a better idea for a fix?
Beata Beigman Klebanov
Computer Science and Engineering
Hebrew University of Jerusalem, Israel