============================================================================
REVIEWER #1
============================================================================
---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Appropriateness: 5
Clarity: 4
Originality / Significance: 3
Soundness / Correctness: 4
Meaningful Comparison: 4
Overall Recommendation: 4
---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------
- The paper presents interesting findings that, much farther down the line, might have a significant impact on message formulation in AAC devices. Most notably, it addresses unordered concept/word/icon selection and tests 4 basic algorithms for predicting cooccurring message components. Two algorithms are based on standard n-grams, and two are based on what the authors call sem-grams (which is too vaguely defined as "a multi-set of words that can appear together in a sentence" - examples would be useful!). The key to sem-grams appears to be related to within-sentence cooccurrence frequency of content-word stems, independent of ordering or distance (thus relating it to semantic coherence within sentences). Though the current prediction level is extremely low for all algorithms, they make the promising observation that the n-gram algorithms provide more accurate prediction for shorter messages, but sem-gram algorithms are better at narrowing to the exact message(s) more rapidly, and appear to be better as messages get longer. As the authors propose, a hybrid approach is worth testing in later research.
- In their abstract the authors note that syntactic ordering "may not be appropriate for individuals with limited or emerging linguistic skills." But in the paper they talk of "Those with limited or emerging literacy skills".
These are very different populations, and I dare say their approach may offer little to those with "limited or emerging linguistic skills", though it can offer much more to "those with limited or emerging literacy skills" (who otherwise have good foundational linguistic skills).
- Though they seem to be interested in here-and-now, in-person, face-to-face AAC use for conversational purposes, the corpus they have used for testing is the "Blog Authorship Corpus". Despite its informal characteristics, this written corpus does not represent anything like conversational language use. As such, their claim that their algorithms were compared on a "conversational corpus" is extremely misleading. The corpus has none of the turn characteristics of conversation. In one dimension (message length) they may have made things more difficult for themselves by choosing this corpus, given that the number of words per turn in face-to-face conversation is often quite low (in some conversations averaging between 3 and 7 words per turn, and rarely rising to 17 words per turn).
-- There are plenty of conversational corpora available, and it would be interesting to see what their tests reveal about these corpora.
-- At this point, all they can do is change their claim: instead of saying they compared algorithm results on a conversational corpus, they need to say they compared them on a very large corpus of informal written (blogger) English.
- AAC devices (used in face-to-face interaction) produce utterances, NOT sentences. The issue of well-formed utterances in context is distinct from that of well-formed sentences. This is another reason why real spoken conversation needs to be explored. Many well-formed utterances in context are regularly understood to be syntactic fragments.
This raises an issue as to what they think the definition of their sem-grams means for conversational speech (and whether it needs to take into account information from the interlocutor's turn, since one person's utterance is often parasitic on that of the prior turn of their interlocutor).
- Though the authors set themselves up as addressing an issue of relevance to "icon-based augmentative and alternative communication", I believe this is a red herring. The predictions are relevant to all AAC systems, and may indeed be more useful for literate word-based users who could strategically select the key content words (regardless of order) and then select the semantically appropriate predicted utterance. [As an example of what potential inputs a user might make in a system, I wonder whether the researchers have looked at the telegraphic speech of people with non-fluent aphasia, or have used such utterances to see what expansions would emerge.]
- If they are, in fact, specifically interested in icon-based AAC systems, then I'd like them to run us through how they imagine an AAC user, using their imagined system, works from an interface which is often as minimal as 6 icons presented to the user (at any one time) to the maximal presentation of about 60 icons (with roughly 30 often being visually optimal, depending on the size of the device). Do they imagine screens changing options with each selection? What is presented to the user on each selection? Do they presume that the user will signal when their message is finished and they want options, or ...?
-- I see no reason why they should be thinking at this level at the moment, unless they want to further simplify their problem space.
- If the authors are interested in the corpus features of real conversations (and the relationships between corpus linguistics and conversation analysis and discourse analysis), then there were two issues of the International Journal of Corpus Linguistics dedicated to these matters in 2011.
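- [To make the sem-gram notion concrete for other readers: as I understand the paper's description, a sem-gram is an unordered multiset of content-word stems cooccurring within one sentence, counted independent of ordering and distance. A minimal sketch of such counting follows; the toy stop-word list and crude suffix stemmer are my own illustrative assumptions, not the authors' implementation.]

```python
from collections import Counter
from itertools import combinations

# Illustrative stop-word subset; the paper's actual list is not specified.
STOP_WORDS = {"the", "a", "an", "is", "to", "of", "and", "in"}

def stem(word):
    # Crude placeholder stemmer, standing in for whatever stemming the authors use.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def sem_grams(sentence, n=2):
    """Count unordered n-element sem-grams (here, pairs) of content-word
    stems cooccurring in a single sentence, regardless of order or distance."""
    stems = [stem(w) for w in sentence.lower().split() if w not in STOP_WORDS]
    # Sorting each pair makes (a, b) and (b, a) count as the same sem-gram.
    return Counter(tuple(sorted(pair)) for pair in combinations(stems, n))
```

[Under this reading, "the dog chased the cat" and "the cat chased the dog" yield identical pair counts, which is exactly the order-independence the paper trades syntax away for.]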
- Illustrative examples would have helped this reader.
- The authors' computational and NLP knowledge is of a high order, but they seem less clear about the nuts and bolts of AAC, and show very little actual linguistic knowledge. This comment is more about what they need to do to advance their project in the future than about the suitability of their paper for this conference. I think it is perfectly suitable.
============================================================================
REVIEWER #2
============================================================================
---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Appropriateness: 5
Clarity: 5
Originality / Significance: 4
Soundness / Correctness: 4
Meaningful Comparison: 4
Overall Recommendation: 5
---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------
This is a nice paper on the idea of fully semantic word prediction without regard to the order of the preceding input, which is useful in some situations of cognitive or linguistic impairment. The idea is well developed, with an experiment on a synthetic corpus derived from blogs. The results are promising for further research. The paper is clear and well written.
============================================================================
REVIEWER #3
============================================================================
---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Appropriateness: 3
Clarity: 3
Originality / Significance: 3
Soundness / Correctness: 2
Meaningful Comparison: 3
Overall Recommendation: 3
---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------
This is an interesting paper that presents a method of word prediction independent of syntactic order. This may be an important challenge, yet even though this is preliminary work, I have some questions concerning the basic assumption. It is claimed that many AAC devices are used with syntactic order. However, the order in which a sentence is built is not 'wired' into devices, and usually reflects educational/literacy paradigms. I would like to see more evidence on the necessity of prediction for unordered symbol/word insertion. Since an unordered message is not desirable in a communicative sense, I'd rather see a tool that can help the user in structuring correct messages. Are words in non-syntactic order completely randomized, or are phrases (a noun and its qualifiers) kept in sequence? In addition, is the process aimed at symbolic/iconic input or at typed input? Does this change the choice of methodology? Why omit stop words from prediction? Making a prediction of 100 words seems unreasonable. Why 100? There should be a real example of the input/word prediction process.
Evaluation: It seems that evaluation is done on only one word per sentence. How will results change if sentences are tested gradually, that is, predicting the first symbol? The second? The fifth?
I am not sure that this is the same as testing sentences of length 1/2/etc. with one omitted word.
============================================================================
REVIEWER #4
============================================================================
---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Appropriateness: 5
Clarity: 3
Originality / Significance: 3
Soundness / Correctness: 2
Meaningful Comparison: 2
Overall Recommendation: 3
---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------
The paper proposes "sem-grams", which are essentially a form of bag of words. The authors make a good case for exploring these, and it is potentially an interesting avenue of research. The paper is generally well written and easy to follow.
The problem with the paper is that it uses a very simplistic methodology. The paper does not address the huge practical challenges in achieving adequate performance. For instance, how do we efficiently smooth these models so we can get sufficient coverage? The lack of smoothing also explains the poor coverage, and the results are not compared against the state of the art (which would probably be the statistical language models found at aactext.org/imagine). In addition, I'd like to see perplexity numbers that demonstrate how well (or poorly) the models predict AAC-like text (aactext.org/imagine would be a good source for test texts).
In conclusion, the paper clearly has weaknesses, but perhaps it will open up interesting discussions. My main reservation is that the work is of a very preliminary nature at this stage.
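[For concreteness, the perplexity evaluation requested above is entirely standard. A sketch of per-word perplexity over test sentences follows; the `model_prob(word, context)` interface is hypothetical, standing in for any smoothed n-gram or sem-gram model the authors might evaluate.]

```python
import math

def perplexity(model_prob, sentences):
    """Per-word perplexity: exp of the average negative log-probability
    the model assigns to each word of the test text. `model_prob` is a
    hypothetical callable returning P(word | context); lower perplexity
    means the model predicts the text better."""
    log_prob_sum, n_words = 0.0, 0
    for sentence in sentences:
        context = []
        for word in sentence:
            p = model_prob(word, tuple(context))
            log_prob_sum += math.log(p)
            n_words += 1
            context.append(word)
    return math.exp(-log_prob_sum / n_words)
```

[As a sanity check, a uniform model over a 4-word vocabulary should score a perplexity of exactly 4 on any text.]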