Identifying innuendo no joke for comp sci researchers
- 29 April, 2011 14:26
"Well, that was hard!" "That's what she said." Turning seemingly innocent comments into sexual innuendo by adding the words "That's what she said" is a cultural phenomenon, appearing everywhere from TV sitcoms to internet discussions to movies. With its own page on Wikipedia and whole sites dedicated to the joke, you don't have to look far on the internet before running into it. This has led some to wonder whether it is possible to determine automatically when it is appropriate to add those magic four words to a sentence.
As it turns out, identifying humour through software is hard. For decades now, artificial intelligence (AI) researchers have been trying to solve the problem of natural language processing (NLP). This field of computer science and linguistics is concerned with building systems that can understand normal language as spoken by humans. It is considered a hard task because the meaning of a sentence often varies with the context in which it is presented, and context is difficult to capture in software. Add humour and puns, where words can have multiple meanings, and the task gets substantially harder.
Two researchers at the University of Washington, however, were willing to give it their best shot. In a recently released paper entitled "That's What She Said: Double Entendre Identification", Kiddon and Brun describe what they've found and introduce their new approach to the problem: "Double Entendre via Noun Transfer" or DEviaNT for short.
Their approach scores words using sample sentences drawn from two sources: an erotic corpus and the Brown corpus, a standard corpus in this field. The authors used the Stanford Tagger to identify which words in the corpus sentences were nouns, adjectives, verbs and so on. From these two sources, Kiddon and Brun built three functions that score words based on their frequency and their position relative to other words. The "noun sexiness"
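The article does not spell out the scoring functions. As a rough illustration only, the Python sketch below assumes a word's score is the log ratio of its relative frequency in an erotic corpus versus a standard corpus such as Brown. The function names, the `smoothing` parameter and the log-ratio formula are hypothetical, not the authors' definitions, and the part-of-speech filtering a tagger would provide is left out.

```python
import math
from collections import Counter


def relative_freqs(tokens):
    """Map each word to its share of all tokens in a corpus."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def sexiness_scores(erotic_tokens, standard_tokens, smoothing=1e-6):
    """Hypothetical word score: log ratio of a word's relative frequency in
    the erotic corpus versus the standard (e.g. Brown) corpus.

    Positive means the word is over-represented in the erotic corpus;
    `smoothing` avoids dividing by zero for words the standard corpus lacks.
    """
    erotic = relative_freqs(erotic_tokens)
    standard = relative_freqs(standard_tokens)
    return {
        w: math.log((erotic[w] + smoothing) / (standard.get(w, 0.0) + smoothing))
        for w in erotic
    }
```

A positive score means a word turns up proportionally more often in the erotic corpus. In practice the inputs would be the nouns (or adjectives, or verbs) picked out by the tagger rather than raw tokens.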
These three functions were used to score sentences for noun euphemisms (ie, whether a test sentence includes a noun likely to be used in an erotic sentence). Sentences were also scored on the presence of adjective and verb combinations more likely to appear in erotic literature. Finally, the authors used basic structural information, such as the number of punctuation and non-punctuation tokens in each sentence.
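A sentence-level feature vector along these lines might be assembled as follows. This is a simplified, hypothetical sketch: it assumes each sentence has already been tagged with Penn Treebank-style tags (NN, JJ, VB and so on), and that word-score lexicons such as `noun_scores` already exist. It takes the best-scoring adjective and verb separately rather than modelling adjective and verb combinations directly.

```python
import string


def sentence_features(tagged, noun_scores, adj_scores, verb_scores):
    """Turn one POS-tagged sentence into a small feature dictionary.

    tagged: list of (word, tag) pairs with Penn Treebank-style tags.
    *_scores: hypothetical word -> score lexicons (e.g. from a corpus comparison).
    """

    def best(scores, prefix):
        # Highest score among words whose tag starts with `prefix`
        # (NN for nouns, JJ for adjectives, VB for verbs); 0.0 if none is scored.
        vals = [scores[w.lower()] for w, t in tagged
                if t.startswith(prefix) and w.lower() in scores]
        return max(vals, default=0.0)

    # A token counts as punctuation if every character in it is punctuation.
    punct = sum(1 for w, _ in tagged if all(ch in string.punctuation for ch in w))
    return {
        "noun_score": best(noun_scores, "NN"),
        "adj_score": best(adj_scores, "JJ"),
        "verb_score": best(verb_scores, "VB"),
        "num_punct": punct,
        "num_words": len(tagged) - punct,
    }
```

For example, "That was hard!" tagged as `[("That", "DT"), ("was", "VBD"), ("hard", "JJ"), ("!", ".")]` yields one punctuation token, three non-punctuation tokens, and an adjective score taken from whatever the lexicon assigns to "hard".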
Kiddon and Brun sourced a number of sentences from sites based on user-submitted content, such as twssstories.com, fmylife.com and textsfromlastnight.com, and scored them using their system. These scores were used to train a classifier with WEKA, an open source machine learning package. On their test set, the system performed well at identifying sentences suitable for "That's what she said"-style jokes while keeping false positives to a minimum; the authors note that making the joke when a sentence is not appropriate is much worse than not making it when it is.
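The authors' preference for a missed joke over a misplaced one amounts to favouring precision over recall. One common way to encode that, shown here as a hypothetical Python sketch rather than the paper's WEKA setup, is to choose the classifier's decision threshold as the lowest score whose precision still meets a target. `pick_threshold`, `precision_recall` and the 0.95 default are illustrative assumptions.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when predicting 'make the joke' for score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def pick_threshold(scores, labels, min_precision=0.95):
    """Lowest threshold (i.e. most jokes made) whose precision still meets
    min_precision; returns None if no threshold qualifies."""
    for t in sorted(set(scores)):
        p, _ = precision_recall(scores, labels, t)
        if p >= min_precision:
            return t
    return None
```

Because recall can only fall as the threshold rises, the lowest qualifying threshold makes as many jokes as possible without dropping below the precision target.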
For those interested in the topic, the authors will present the paper at the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies in Portland, Oregon, this June.
Thanks to the folks at reddit for spotting the paper.
Comments
gnome
And so these two people, rather generously described as researchers, have been wasting their time and money (and ours, no doubt) on this hyperbolic nonsense.
That's what she said.
John
Vivian, you must be great fun. ;) To be fair, any attempt to make inroads into getting AI closer to understanding natural language is reasonable in my view, even if the progress so far has been distinctly unimpressive.
no one
These people must hate gay people, because there is no mention of "that's what he said".
cafehunk
Innuendo? - that's what (s)he said!
Matt
Wow, that seems really hard.
I can't believe it's finished already.
I wonder how long it took to get it up and running.
Probably took a lot of long nights to grind it out.