Pre-Presentation Notes
Slides and presentation materials are available online at: karlwiegand.com/defense

Disambiguation of Imprecise User Input Through Intelligent Assistive Communication
Karl Wiegand
Northeastern University, Boston, MA, USA
December 2014

Thesis Statement
"Intelligent interfaces can mitigate the need for linguistically and motorically precise user input to enhance the ease and efficiency of assistive communication."

Theoretical Contributions
"...mitigate the need for linguistically and motorically precise user input..."
- An unordered language model that bridges syntax and semantics. [Wiegand and Patel, 2012A]
- An empirical comparison of contextual language predictors. [Wiegand and Patel, 2015B (R1)]
- A motor movement study with current and potential AAC users. [Wiegand and Patel, 2015A]

Applied Contributions
"...to enhance the ease and efficiency of assistive communication."
- A semantic approach to icon-based, switch AAC. [Wiegand and Patel, 2014B]
- A continuous motion overlay module for icon-based AAC. [Wiegand and Patel, 2012B]
- Mobile, letter-based AAC that supports conversational speeds. [Wiegand and Patel, 2014A]

Outline
- Assistive Communication
- Theoretical Contributions
- Applied Contributions
- Summary and Conclusion

Part 1: Assistive Communication

On Communication
- SMCR (Source, Message, Channel, Receiver) and derivatives [Shannon and Weaver, 1949]
- Affected by distortion to any component
- What if there is distortion from the Source?

Who Uses AAC?
- People of all ages; ~2 million in the US [NIH, 2000]
- Developmental disorders: autism, cerebral palsy, ...
  - 53% of people with CP use AAC [Jinks and Sinteff, 1994]
- Neurological and neuromotor disorders: ALS, MD, MSA, stroke, paralysis, ...
  - 75% of people with ALS use AAC [Ball, 2004]
(Notes: AAC stands for Augmentative and Alternative Communication and is primarily used by people for whom...)
Functional Definitions
- Target users are primarily non-speaking and may have upper-limb motor impairments
- Target users may also have developing literacy or language impairments

Types of AAC
- Physical Boards
- Electronic Systems
- Letter-Based
- Icon-Based

On Speed of Communication
- Speech is often 150-200 words per minute [Beasley and Maki, 1976]
- vs. typical AAC at < 20 words per minute [Higginbotham et al, 2007]

Modern AAC Application
(Notes: SpeakForYourself, an icon-based AAC application for iOS and Android)

The Problem

What is the Goal?
- Make AAC more intelligent
- "Intelligent" meaning: user-specific, adaptive, and context-sensitive

How?
By addressing some common assumptions:
- Prescribed Order
- Intended Set
- Discrete Entry

Assumption 1: Prescribed Order
Users will select items in a specific order, such as the syntactically "correct" one.
- Users do not always select items in the expected order [Van Balkom and Donker-Gimbrere, 1996]
- Using AAC devices is slow [Beukelman et al, 1989; Todman, 2000; Higginbotham et al, 2007]
- Assumptions of diminished capacity

Assumption 2: Intended Set
Users will select exactly the items that are desired, no fewer and no more.
- Motor and cognitive impairments may result in missing or additional selections [Ball, 2004]
- Letter-based text entry systems detect accidental and missing selections

Assumption 3: Discrete Entry
Users will make discrete movements or selections, either physically or with a cursor.
- Some letter-based systems have started to remove this assumption [Goldberg, 1997; Kristensson and Zhai, 2004; Kushler and Marsden, 2008; Rashid and Smith, 2008]
- Many input signals are naturally continuous

The Goal

Part 2: Theoretical Contributions

Theoretical Contributions
- Prescribed Order: Semantic Frames, Semantic Grams
- Intended Set: Semantic Grams, Contextual Prediction
- Discrete Entry: Personalized Interaction

Addressing Prescribed Order
- Statistical MT [Soricut and Marcu, 2006]
- Semantic frames, Construction Grammar (CxG), and predicate-argument structure (PAS) [Fillmore, 1976]
  - Give(Agent, Object, Beneficiary)
- WordNet, FrameNet, "Read the Web" (NELL), Groningen Meaning Bank
- Computationally intensive to obtain statistics

Motivating Questions
- Can we create a simple and fast language model for use with semantic frames?
- Current completion and prediction strategies rely on syntactic order and word distance:
  - N-grams, s-grams, skip-grams, CVSMs, etc.
  - Compansion [McCoy et al, 1998]
  - Memory-based LMs [Van Den Bosch and Berck, 2009]
- Can utterances be predicted/completed without assuming order and distance?

Motivating Examples
- Prior input: play, video games, i, brother
  Output: "My brother and I play video games."
- Prior input: play, chess, i, dad
  Output: "I play chess with my dad."
- Input: i, brother, ...
  Output: ?

Possible Approach
Sentences are among the smallest units of language that are:
- Semantically coherent
- Semantically cohesive
- Syntactically demarcated
How can they be leveraged for prediction?

Semantic Grams
A multiset of words that appear together in the same sentence.
Example: "I like to play chess with my brother."
- brother, chess (1)
- brother, i (1)
- brother, like (1)
- brother, play (1)
- chess, i (1)
- chess, like (1)
- chess, play (1)
- i, like (1)
- i, play (1)
- like, play (1)

More on Sem-Grams
- Sentence Boundary Detection (SBD) is fast and relatively accurate (> 98.5%)
- Sentences provide dynamic context windows
- Sentence-level co-occurrence, with uniform weight applied to all relationships within a sentence

Sem-Grams Study
- Blog Authorship Corpus
  - 140 million words from 19,320 bloggers
  - Age range of 13-48; balanced genders
  - Split by authors: 80% training, 20% testing
- Two n-gram and two sem-gram algorithms
  - Naive Bayes: N1 and S1
  - N2 (weighted adjacency) and S2 (full independence)

Method
For every test sentence:
1. Process (split, stop, stem, and check)
2. Shuffle the stems
3. Remove one stem (the target)
4. Query each algorithm for the missing stem (ranked list)
Evaluation: a random sample of 2000 sentences
Score: position of the target in the ranked list (lower is better)

Results: Example 1
Original: "This semester Im taking six classes."
Target stem: class
Input stems: take, semest, six
N1 candidate list: next, month, class, hour, last, second, week, year, first, five, flag, ...
S1 candidate list: class, month, year, last, time, one, go, day, get, school, will, first, ...

Results: Example 2
Original: "Hey, they're in first, by a game and a half over the Yankees."
Target stem: game
Input stems: yanke, hey, first, half
N1 candidate list: game, stadium, like, hour, time, year, day, guy, hey, fan, say, one, two, ...
S1 candidate list: game, got, like, red, time, play, team, sox, hour, go, fan, one, get, day, ...
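The sem-gram construction from the "Semantic Grams" slide and the missing-word query from the "Method" slide can be sketched in Python. This is a minimal sketch under stated assumptions, not the code used in the study: the small stop list, the absence of a stemmer, the add-alpha smoothing, and the function names (`tokens`, `build_sem_grams`, `rank_candidates`, `target_score`) are all hypothetical. The slides describe N1 and S1 only as Naive Bayes, so the scoring below is one plausible naive-Bayes-style formulation rather than the study's exact one.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical stop list for illustration; the study's stop list and stemmer are not given.
STOP_WORDS = {"a", "and", "the", "to", "with", "my"}


def tokens(sentence):
    """Lowercase, strip edge punctuation, and drop stop words (no stemming in this sketch)."""
    words = (w.strip('.,!?;:"').lower() for w in sentence.split())
    return [w for w in words if w and w not in STOP_WORDS]


def build_sem_grams(sentences):
    """Count unordered word pairs that co-occur in the same sentence.

    Every pair gets the same weight regardless of order or distance; a word
    repeated within one sentence is counted once (a simplifying assumption).
    """
    pair_counts, word_counts = Counter(), Counter()
    for sentence in sentences:
        words = sorted(set(tokens(sentence)))
        word_counts.update(words)
        # Sorted input makes each (a, b) key canonical, with a < b alphabetically.
        pair_counts.update(combinations(words, 2))
    return pair_counts, word_counts


def rank_candidates(inputs, pair_counts, word_counts, alpha=1.0):
    """Rank candidates for a missing word with a naive-Bayes-style score:
    log P(c) + sum over inputs of log P(w | c), using add-alpha smoothing."""
    total = sum(word_counts.values())
    vocab = len(word_counts)
    scores = {}
    for cand, cand_count in word_counts.items():
        if cand in inputs:
            continue
        score = math.log(cand_count / total)
        for w in inputs:
            joint = pair_counts[tuple(sorted((cand, w)))]
            score += math.log((joint + alpha) / (cand_count + alpha * vocab))
        scores[cand] = score
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda c: (-scores[c], c))


def target_score(ranked, target):
    """1-based position of the target (lower is better); None means failure to predict."""
    return ranked.index(target) + 1 if target in ranked else None


if __name__ == "__main__":
    pairs, _ = build_sem_grams(["I like to play chess with my brother."])
    for (a, b), n in sorted(pairs.items()):
        print(f"{a}, {b} ({n})")  # the pairs listed on the "Semantic Grams" slide
    train = ["My brother and I play video games.", "I play chess with my dad."]
    p, c = build_sem_grams(train)
    ranked = rank_candidates(["i", "brother"], p, c)
    print(ranked, target_score(ranked, "play"))
```

On the example sentence this reproduces the ten pairs listed on the slide. Trained on the two "Motivating Examples" sentences and queried with (i, brother), it ranks play first using only sentence-level co-occurrence; word order and distance never enter the score, which is the property the sem-gram slides emphasize.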
(Notes for the "Results: Example 2" slide: To further demonstrate the difference between these two approaches (N1 and S1), I've highlighted some of the words here...)

Results: Performance of Sem-Grams

Summary of Sem-Grams
- Simple, "fast" (SBD), and distance-agnostic
- More accurate than similar n-gram-based algorithms
- Alternative to more complex methods
- Natural fit for use with semantic frames

Theoretical Contributions
- Prescribed Order: Semantic Frames, Semantic Grams
- Intended Set: Semantic Grams, Contextual Prediction
- Discrete Entry: Personalized Interaction

Improving Unordered Prediction
- Dropping the assumption of order results in information loss
- How can we compensate?
- Devices often ask for user demographics
- Mobile AAC devices have sensors: date, time, location

Motivating Questions
- Almost all statistical LMs require background probabilities (priors)
- Most systems use Google's N-Gram Corpus, the Wall Street Journal, or the New York Times
- How much closer to a real user's priors can we get by leveraging context?

Contextual Prediction
Target user: 23-year-old female in Seattle
Candidate distributions: Global, 23-year-olds, 23-year-old females, Seattle

Contextual Prediction Study
- Blog Authorship Corpus and Yelp Academic Dataset
- Contexts: age, gender, day of the week, day of the month, month, city, and state
- Map unigrams to contexts for all authors; minimal stop-word removal and no stemming

Attribute | Blog Authorship | Yelp
Authors | 19,320 | 130,850
Features | 525,253 | 134,199

Method
- Split by authors: 90% training, 10% testing
- For every test author's unique context:
  - Obtain the true distribution (target)
  - Compare it to the distribution from each predictor combination, estimated from the 9 non-target folds
- Metrics: Kullback-Leibler (KL) divergence, cosine similarity, and Precision@20

Method Example
Target distribution:
- Age: 23
- Gender: Female
- DOW: Monday
- DOM: 25-31
- Month: July
- City: Seattle
- State: Washington
Predictor combinations:
- Age
- Gender
- DOW
- Age + Gender
- Month + City
- Age + Gender + City
- ... (48 in total)

Results: Predictors by Metric
KL Divergence | Rank | CosSim & Prec@20
DOW+DOM+Month+City | 1 | Gender+DOM+Month
Age+Gender+DOW+DOM+Month | 2 | Gender+Month
Age+DOW+DOM+Month | 3 | Age+Month
DOW+DOM+Month+State | 4 | Gender+DOW+Month
DOW+Month+City | 5 | Age+Gender
Age+Gender+DOW+Month | 6 | Age
DOM+Month+City | 7 | Age+DOM
... | ... | ...
(No Context) ranks 47th by KL divergence, and 31st and 27th by CosSim and Prec@20, respectively.

Summary of Context
- Contextual distributions can be more accurate than global statistics
- Location is better by KL divergence; demographics are better by CosSim and Prec@20
- Some combinations are consistently better:
  - Gender + DOM + Month
  - Age + Gender + DOW + Month
  - Age + Gender + DOM
  - Age + Month

Theoretical Contributions
- Prescribed Order: Semantic Frames, Semantic Grams
- Intended Set: Semantic Grams, Contextual Prediction
- Discrete Entry: Personalized Interaction

Addressing Discrete Entry
- Physical path or signal characteristics
  - Rotated unistroke recognition [Goldberg, 1997]
  - Letter-based paths [Kristensson and Zhai, 2004; Kushler and Marsden, 2008]
  - Relative positioning [Rashid and Smith, 2008]
- Well-received by non-disabled users

Motivating Questions
- Modern AAC is now deployed on touchscreens
- Increasing research on accessibility:
  - Fitts and Steering Laws [Fitts, 1954; Accot and Zhai, 1996]
  - Swabbing/sliding is easier [Wacharamanotham et al, 2011]
  - Buttons need to be bigger [Chen et al, 2013]
- What about functional compensation?
- Can we learn realistic, layout-agnostic interaction patterns for an individual user?

Motor Optimization GUI (MoGUI)

MoGUI Example

MoGUI Study
- Residents at the Boston Home
  - Current and potential AAC users
  - 10 females and 5 males
  - Ages 35-71 (mean of 56)
  - 8 right-handed; 7 left-handed (3 due to MS)
- 2 counterbalanced sessions: taps vs. slides
- 4x4 grid = 16 locations
- Pseudo-random shuffling (à la Latin squares)

Method
- 10.1" Android tablet in a comfortable, landscape position; fully reachable
- Choice of finger or stylus
- 10 levels of 3 rounds each
- Level n shows n balloons per round (1, 2, 3, ..., 10), so 3 × (1 + 2 + ... + 10) = 165 balloons in total
- Track all hits, misses, and timing

Results: Variability of Tap Misses
(Notes: multiple taps, fingers dragging, hand resting, thumb usage)

Results: Locations by Handedness
[Figure: mean speed-to-target in pixels/second; left- and right-handed panels]

Results: Directions by Handedness
[Figure: mean speed-to-target in pixels/second; left- and right-handed panels]

Summary of Personalization
- Sliding was not significantly faster than tapping for arbitrary targets; no motor learning
- 16% accidental slides; 43% accidental taps
- High variance in individual motor patterns; weak correlations by handedness
- Gamified calibration
- Static improvements through personas:
  - Handedness → margins, button locations
  - Tap/slide preferences → input sensitivity

Part 3: Applied Contributions

Applied Contributions
- RSVP-iconCHAT: Free Order, Discrete Icons
- SymbolPath: Free Order, Continuous Icons
- DigitCHAT: Mobile, Mixed-Input Letters

A Collaborative Effort
- Locked-In Syndrome (LIS)
  - Spinal injuries, ALS, tumors, strokes, ...
  - 1% of ischemic strokes [Smith and Delargy, 2005]
- Icon-based, switch AAC for people with LIS
  - Dr. Deniz Erdogmus and Dr. Rupal Patel
  - Minimal switch/signal requirements (1+)
  - Goal of a brain-computer interface (BCI)
  - Verb-first message construction [Patel et al, 2004]

Rapid Serial Visual Presentation
- Used in psychology, speed-reading, lie detection, and letter-based BCI [Orhan et al, 2012]

RSVP-iconCHAT

Observations
- Prediction/ordering controls the speed of message construction
- Natural fit for prediction via semantic grams
- Required screen space is now tied to message complexity

RSVP-iconCHAT Study
- 24 non-disabled participants (ND)
  - 14 females and 10 males
  - Ages 19-43 (mean of 24)
- 4 participants with speech and motor impairments (SMI)
  - 2 females and 2 males
  - Ages 33-56 (mean of 41)
- Space bar as switch mechanism
- Up to 106 words in alphabetic order

Method
For every participant:
- Introduction and 3 training cards
- Shuffle 30 picture cards
- Use the system to describe each card
- RSVP starting at 700 ms; adjustable at any time

Results: Construction Time

Overview of Results
- Average construction time for the last 5 utterances: 70 s (ND) vs. 107 s (SMI)
- No nonsensical utterances
- Average of 5 selections (verb + 4)
- RSVP speeds with a positive motor response: 700 ms (ND) vs. 1200 ms (SMI)

Summary of RSVP-iconCHAT
- Immediately applicable to mobile systems
- Message complexity can be scaled (personalized)
- Expandable to multi-modal or analog input:
  - Push the switch harder to go faster
  - Directional switches
  - "Oops" functionality
- Involuntary responses (BCI) could leverage predictive reordering via sem-grams

Applied Contributions
- RSVP-iconCHAT: Free Order, Discrete Icons
- SymbolPath: Free Order, Continuous Icons
- DigitCHAT: Mobile, Mixed-Input Letters

SymbolPath Motivation

SymbolPath
Example: "I need more coffee"

Summary of SymbolPath
- Designed for people with upper-limb motor impairments or developing literacy
- Semantic grams reweighted by path contour
- 75+ active users on Android
- Regular email feedback: "It's fun!"
- Drawing and syntactic completion/generation encourages fuller utterances

Applied Contributions
- RSVP-iconCHAT: Free Order, Discrete Icons
- SymbolPath: Free Order, Continuous Icons
- DigitCHAT: Mobile, Mixed-Input Letters

DigitCHAT Motivation

DigitCHAT
- Word-by-word, real-time construction
- Mixed-mode input and active learning

Summary of DigitCHAT
- Scalable and fast (> 45 WPM) [Silfverberg et al, 2000]
- Compare to < 20 WPM for most AAC systems
- 15+ active users on Android
- Winner of the ACM ASSETS 2014 Text Entry Challenge

Projected DigitCHAT
- Head-tracking prototype by Dan Lazewatsky and Bill Smart (Oregon State University)

Part 4: Summary and Conclusion

Thesis (Redux)
"Intelligent interfaces can mitigate the need for linguistically and motorically precise user input to enhance the ease and efficiency of assistive communication."

Theoretical Contributions
"...mitigate the need for linguistically and motorically precise user input..."
- An unordered language model that bridges syntax and semantics. [Wiegand and Patel, 2012A]
- An empirical comparison of contextual language predictors. [Wiegand and Patel, 2015B (R1)]
- A motor movement study with current and potential AAC users. [Wiegand and Patel, 2015A]

Applied Contributions
"...to enhance the ease and efficiency of assistive communication."
- A semantic approach to icon-based, switch AAC. [Wiegand and Patel, 2014B]
- A continuous motion overlay module for icon-based AAC. [Wiegand and Patel, 2012B]
- Mobile, letter-based AAC that supports conversational speeds. [Wiegand and Patel, 2014A]

Revisiting the Goal

Special thanks to the Continuous Path Foundation and the National Science Foundation (Grants #HCC-0914808 and #SBE-0354378).
Thank you for listening!
karlwiegand.com/defense

Sem-Grams: Method Details
- Test sentences truncated to 20 words
- All algorithms seeded with the top 10 type-specific grams for each input word
- Maximum of 190 candidate words to rank (up to 19 input words × 10 grams each)
- Absence of the target word from the list was considered a "failure to predict"

Sem-Grams: Overview of Results
 | N1 | N2 | S1 | S2
# of sentences | 2000 | 2000 | 2000 | 2000
# predicted | 647 | 649 | 435 | 435
Average score | 16.26 | 19.70 | 9.04 | 12.67

Sem-Grams: Performance

Context: Method Details
Number of distinct values per predictor ("-" = not available in that dataset):
Predictor | Blog Authorship | Yelp
Age | 26 | -
Gender | 2 | -
Day of the Week (DOW) | 7 | 7
Day of the Month (DOM) | 31 (4) | 31 (4)
Month | 12 | 12
City | - | 119
State | - | 16
Average of 18 unique contexts per author in Blog Authorship and 4 in the Yelp dataset

MoGUI: Observations
- Varied tablet and hand/arm positions
  - Tablet held, flat or tilted on lap, on a desk, tilted on a table, or held in a wheelchair mount
- Use of fingers, thumb, stylus, and knuckles
- Ghost tapping, spastic tapping, stylus friction, and finger humidity
- Repeated margin activation and triggering of Google Now functionality

Brain-Computer Interfaces (BCI)
- http://www.emotiv.com/
- http://www.neurosky.com/
(Notes: Emotiv EPOC on the left and NeuroSky's MindWave on the right)

The P300 Wave

Complexity vs. Real Estate

RSVP-iconCHAT: Construction Time

RSVP-iconCHAT: Feedback
- All users get restless with alphabetic ordering
- Even alphabetic ordering can be surprising
- All users with SMI asked about other switches and multi-modal methods
- All users favorably mentioned the automatic syntax generation/modification
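The contextual prediction study compares each test author's true unigram distribution with distributions pooled from training authors who share a given predictor combination, scoring them by KL divergence, cosine similarity, and Precision@20. The sketch below is a minimal, hypothetical illustration of that comparison: the data layout (a list of (context, counts) pairs), the epsilon smoothing for KL divergence, and the helper names are assumptions, since the slides do not specify them.

```python
import math
from collections import Counter


def normalize(counts):
    """Convert raw unigram counts into a probability distribution."""
    total = sum(counts.values())
    return {w: n / total for w, n in counts.items()} if total else {}


def pooled_distribution(authors, target_context, predictors):
    """Pool unigram counts from training authors that match the target context on
    every predictor in the combination; an empty combination pools everyone ("No Context")."""
    pooled = Counter()
    for context, counts in authors:
        if all(context.get(key) == target_context.get(key) for key in predictors):
            pooled.update(counts)
    return normalize(pooled)


def kl_divergence(p, q, epsilon=1e-9):
    """D_KL(P || Q) in nats, with P the true distribution and Q the prediction.
    Words missing from Q get probability epsilon (hypothetical smoothing; Q is not
    renormalized, so the result is an approximation)."""
    return sum(pw * math.log(pw / q.get(w, epsilon)) for w, pw in p.items() if pw > 0)


def cosine_similarity(p, q):
    """Cosine of the angle between the two distributions viewed as sparse vectors."""
    dot = sum(pw * q.get(w, 0.0) for w, pw in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def top_k(dist, k):
    """The k most probable words (ties broken alphabetically)."""
    ranked = sorted(dist.items(), key=lambda item: (-item[1], item[0]))
    return {w for w, _ in ranked[:k]}


def precision_at_k(p, q, k=20):
    """Share of the predicted top-k words (from Q) that are also in the true top-k (from P)."""
    return len(top_k(p, k) & top_k(q, k)) / k
```

In this sketch, each of the 48 predictor combinations for a test author's context would yield one pooled Q, scored against that author's true P; a lower KL divergence and a higher cosine similarity or Precision@20 indicate a closer match. Passing an empty combination reproduces the "No Context" baseline.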