Google is promoting natural language understanding with the open-sourcing of SyntaxNet, a neural network framework, and Parsey McParseface, an advanced parser for English text.
Implemented in Google's open source TensorFlow machine intelligence library and released this month, SyntaxNet provides the code needed to train natural language understanding (NLU) models on your data along with the Parsey McParseface parser for analyzing English text.
"Parsey McParseface is built on powerful machine learning algorithms that learn to analyze the linguistic structure of language and that can explain the functional role of each word in a given sentence," said Slav Petrov, Google senior staff research scientist. The project grew out of Google's work on how computers can read and understand human language and then process it intelligently.
Accessible on GitHub, SyntaxNet serves as a framework for a syntactic parser, a key first component in many NLU systems, Petrov said. "Given a sentence as input, [the parser] tags each word with a part-of-speech tag that describes the word's syntactic function, and it determines the syntactic relationships between words in the sentence, represented in the dependency parse tree. These syntactic relationships are directly related to the underlying meaning of the sentence in question."
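To make Petrov's description concrete, here is a minimal Python sketch of what a parser's output can look like as data. It is not SyntaxNet's API or actual Parsey McParseface output: the Token class, the render helper, the example sentence, and the tag and relation labels (Penn Treebank-style tags, Stanford-style relations) are all assumptions chosen for illustration, and the annotation was written by hand.

```python
from dataclasses import dataclass


@dataclass
class Token:
    index: int     # 1-based position of the word in the sentence
    word: str
    pos: str       # part-of-speech tag (assumed Penn Treebank-style)
    head: int      # index of the word this one depends on; 0 means the root
    relation: str  # label of the dependency (assumed Stanford-style)


# Hand-annotated for illustration only; not produced by any parser.
sentence = [
    Token(1, "Alice", "NNP", 2, "nsubj"),
    Token(2, "saw", "VBD", 0, "ROOT"),
    Token(3, "Bob", "NNP", 2, "dobj"),
]


def render(tokens, head_index=0, depth=0):
    """Return the dependency tree as indented lines, starting from the root."""
    lines = []
    for token in tokens:
        if token.head == head_index:
            lines.append("  " * depth + f"{token.word} [{token.pos}] <-{token.relation}")
            lines.extend(render(tokens, token.index, depth + 1))
    return lines


tree_lines = render(sentence)
print("\n".join(tree_lines))
```

Each word carries one tag and one head, so the whole parse is a list of (word, tag, head, relation) records. Here "saw" is the root, with "Alice" as its subject and "Bob" as its object.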
Parsey McParseface can analyze the grammatical structure of complex sentences. On a standard benchmark consisting of English newswire sentences, it recovers individual dependencies between words with better than 94 percent accuracy.
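A figure like this is typically a per-word score: the share of words whose predicted head matches a human annotation. The sketch below assumes that reading. The article does not spell out the exact metric or benchmark protocol, and the function name and the head lists are made up for illustration.

```python
def attachment_accuracy(gold_heads, predicted_heads):
    """Fraction of words whose predicted head equals the gold-standard head."""
    if len(gold_heads) != len(predicted_heads):
        raise ValueError("gold and predicted sequences must have the same length")
    if not gold_heads:
        raise ValueError("cannot score an empty sentence")
    correct = sum(g == p for g, p in zip(gold_heads, predicted_heads))
    return correct / len(gold_heads)


# Made-up five-word example: the parser gets the head of word 4 wrong.
gold = [2, 0, 2, 5, 3]
predicted = [2, 0, 2, 3, 3]
accuracy = attachment_accuracy(gold, predicted)
print(f"{accuracy:.0%}")
```

Under this definition, 94 percent accuracy means that roughly one word in 17 is still attached to the wrong head.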
Parsing is difficult for computers because human languages are ambiguous. A moderate-length sentence of 20 to 30 words can have tens of thousands of possible syntactic structures. "A natural language parser must somehow search through all of these alternatives and find the most plausible structure given the context," said Petrov. SyntaxNet uses neural networks to tackle the ambiguity problem.
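The search problem Petrov describes can be illustrated with a toy brute-force parser. This is not how SyntaxNet works: the sketch simply enumerates every possible head assignment for a three-word sentence, keeps the ones that form a tree, and picks the highest-scoring one. The hand-written score table is a hypothetical stand-in for a trained neural network, and all of the names are assumptions.

```python
from itertools import product


def is_tree(heads):
    """heads[i] is the head of word i+1; 0 is an artificial root node."""
    for word in range(1, len(heads) + 1):
        seen = set()
        node = word
        while node != 0:          # walk upward until the root is reached
            if node in seen:
                return False      # found a cycle, so this is not a tree
            seen.add(node)
            node = heads[node - 1]
    return True


def candidate_trees(n):
    """All head assignments over n words that form a valid dependency tree."""
    return [h for h in product(range(n + 1), repeat=n) if is_tree(h)]


# Hypothetical arc scores for "Alice saw Bob": (head, dependent) -> score.
# A real parser would get these from a learned model, not a lookup table.
ARC_SCORES = {(0, 2): 3.0, (2, 1): 2.0, (2, 3): 2.0}


def tree_score(heads):
    return sum(ARC_SCORES.get((head, dep), 0.0)
               for dep, head in enumerate(heads, start=1))


candidates = candidate_trees(3)
best = max(candidates, key=tree_score)
print(len(candidates), "candidate trees; best heads:", best)
```

Even with no grammar to rule anything out, three words already allow 16 trees and four words allow 125. Exhaustive enumeration becomes impractical long before a sentence reaches 20 words, which is why practical parsers rely on a learned model to guide the search.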