Talk:Word sense disambiguation

From Scholarpedia

    This is an informative summary of the field of WSD at this time. I have some minor suggestions which the authors might consider:

    Is a link on 'computational linguistics' possible, rather than on 'linguistics'? The 'computational linguistics' entry presumably links to linguistics in turn.

    On the first line, consider putting '(meaning)' after sense

    I would use WSD throughout (rather than word sense disambiguation) after you first introduce the acronym.

    In the penultimate paragraph before the History, where you state 'However, support vector machines (supervised learning) have been the most successful algorithms to date.', I would say 'However, supervised learning methods, particularly support vector machines, have been the most successful algorithms to date.'

    In the last paragraph before 'Applications', I would qualify the last sentence to the effect that supervised systems carry a heavy prerequisite: they require substantial manual annotation, which is laborious and costly.

    In your paragraph on 'the utility of WSD' you state that WSD as a separate module has not been shown to make a decisive difference. You might want to qualify this and explain that, when integrated properly, there is some evidence that WSD helps, e.g. Carpuat and Wu (2007) at EMNLP.

    In the machine translation paragraph, after 'or WSD is folded into a statistical translation model', you could add 'where words are translated within phrases which thereby provide context'.

    I would rewrite the sentence 'For example, in pine cone, the right definitions both include the words evergreen and tree, in one dictionary.' Perhaps as 'For example, when disambiguating the words in the phrase pine cone, the definitions of the appropriate senses of pine and cone both include the words evergreen and tree (at least in one dictionary).'
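
    The overlap idea behind the pine cone example can be illustrated with a minimal sketch. It assumes a toy dictionary whose glosses and sense labels are invented here for illustration (they are not quoted from any real dictionary), and it simply picks the pair of senses whose definitions share the most content words. It is a simplified gloss-overlap heuristic, not the article's or any library's implementation.

```python
# Toy gloss-overlap sketch. The glosses, sense labels and stopword list
# below are hypothetical and exist only to illustrate the idea.
STOPWORDS = {"a", "an", "of", "the", "or", "for", "with", "in", "to", "that"}

GLOSSES = {
    "pine": {
        "pine_tree": "a kind of evergreen tree with needle-shaped leaves",
        "pine_yearn": "to waste away through sorrow or longing",
    },
    "cone": {
        "cone_shape": "a solid body that narrows to a point",
        "cone_fruit": "the fruit of certain evergreen tree species",
        "cone_food": "a crisp wafer for holding ice cream",
    },
}


def content_words(gloss):
    """Return the set of lower-cased gloss words that are not stopwords."""
    return {w for w in gloss.lower().split() if w not in STOPWORDS}


def best_sense_pair(word_a, word_b, glosses=GLOSSES):
    """Return (sense_a, sense_b, shared_words) for the pair of senses
    whose glosses have the largest content-word overlap."""
    best = None
    for sense_a, gloss_a in glosses[word_a].items():
        for sense_b, gloss_b in glosses[word_b].items():
            shared = content_words(gloss_a) & content_words(gloss_b)
            if best is None or len(shared) > len(best[2]):
                best = (sense_a, sense_b, shared)
    return best


if __name__ == "__main__":
    print(best_sense_pair("pine", "cone"))
```

    With these toy glosses, the selected pair is the tree sense of pine and the fruit sense of cone, because only their definitions share evergreen and tree.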

    Semi-supervised methods: I would perhaps call these hybrid methods, or make it clear in the description that the methods combine supervised methods with either knowledge, or extra unannotated data.

    In the paragraph on Unsupervised methods the last sentence is perhaps too strong and should be qualified. For example, you might say 'It is hoped that unsupervised learning will overcome the knowledge acquisition bottleneck because it is not dependent on manual effort.'

    Evaluation: first line: I'd change 'target or correct' to 'correct' (I use target for the token words)

    'The latter is deemed a more realistic form of evaluation, but the corpus is more expensive to produce' I'd add 'because human annotators have to read the definitions for each word in the sequence every time they need to make a tagging judgement, rather than once for a block of instances for the same target word.'

    Last line: 'and SemEval once' might be better as 'and SemEval, its successor, once'.

    Why is it Hard: I would change the title to 'Why is WSD Hard'; otherwise we aren't sure if it refers to WSD or evaluation.

    In 'Word meaning does not divide up into discrete senses': 'can agree at' -> 'can agree on distinctions at'

    'However, it is not at all clear if these same meaning distinctions are applicable in computational applications.' I might add a reference to Kilgarriff (2006) who points out that the decisions of lexicographers are driven by other considerations.

    'One does begin to wonder if complete natural language understanding is necessary.' I would rephrase this perhaps to the effect 'It is still an outstanding question what form and extent of natural language understanding is necessary as well as what is feasible.'

    Suggestions for reading: As well as the references to Kilgarriff (2006) and Carpuat and Wu (2007), I would welcome references to Schutze (1998) when discussing unsupervised clustering and Yarowsky (1995) when introducing hybrid methods, as these were seminal papers in this field.

    Kilgarriff, A. (2006) Word Senses. In Word Sense Disambiguation: Algorithms and Applications, ed. Agirre and Edmonds. Chapter 2, pp. 29-46. Springer.

    Carpuat, M. and Wu, D. (2007) Improving Statistical Machine Translation using Word Sense Disambiguation. 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL 2007). Prague, June 2007.

    Yarowsky, D. (1995) Unsupervised Word Sense Disambiguation Rivaling Supervised Methods. In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics. Cambridge, MA, pp. 189-196.

    Schutze, H. (1998) Automatic word sense discrimination. Computational Linguistics, Volume 24, Issue 1, Special issue on word sense disambiguation, pp. 97-123.

    'External Links' I would put the SemEval site after the SENSEVAL site since it is the successor.
