Thursday, November 5, 2009

PARSING.....PARSING.....PARSING.......

Parsing is the most important concept when we talk about computer understanding of human language. Almost all NLP applications, such as IE, IR, MT and speech, benefit from parsing. Linguistics as a discipline has tried to define parsing as the process by which people interpret language (or language structure!?). What is knowledge of language, where does it reside, and how is it used and applied: these are the three fundamental questions of linguistic theory. In computational terms, parsing involves defining an algorithm that maps any given sentence to its associated syntactic tree structure (is it mandatory to always have a tree??).
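To make "an algorithm that maps a sentence to its tree" concrete, here is a minimal sketch of a CKY chart parser. The grammar, the word list and the names `BINARY_RULES`, `LEXICAL_RULES`, `cky_parse` and `to_brackets` are all illustrative assumptions, not a real grammar of English or any particular library's API. It keeps only one tree per category and span, so it ignores ambiguity.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form (illustrative assumption only).
BINARY_RULES = {
    ("NP", "VP"): "S",
    ("Det", "N"): "NP",
    ("V", "NP"): "VP",
}
# Toy lexicon mapping each word to a single category (also an assumption).
LEXICAL_RULES = {
    "the": "Det",
    "dog": "N",
    "cat": "N",
    "chased": "V",
}

def cky_parse(words):
    """Return one tree for `words` rooted in S, or None if there is none."""
    n = len(words)
    # chart[(i, j)] maps a category to one tree covering words[i:j].
    chart = defaultdict(dict)
    for i, word in enumerate(words):
        cat = LEXICAL_RULES.get(word)
        if cat is not None:
            chart[(i, i + 1)][cat] = (cat, word)
    # Build longer spans from pairs of adjacent shorter spans.
    for span in range(2, n + 1):
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for left_cat, left_tree in chart[(i, k)].items():
                    for right_cat, right_tree in chart[(k, j)].items():
                        parent = BINARY_RULES.get((left_cat, right_cat))
                        if parent is not None and parent not in chart[(i, j)]:
                            chart[(i, j)][parent] = (parent, left_tree, right_tree)
    return chart[(0, n)].get("S")

def to_brackets(tree):
    """Render a tree as a labelled bracketing."""
    if len(tree) == 2 and isinstance(tree[1], str):
        return f"[{tree[0]} {tree[1]}]"
    return "[" + tree[0] + " " + " ".join(to_brackets(c) for c in tree[1:]) + "]"

tree = cky_parse("the dog chased the cat".split())
print(to_brackets(tree))
# [S [NP [Det the] [N dog]] [VP [V chased] [NP [Det the] [N cat]]]]
```

The same result is held as a nested tuple and printed as brackets, which hints that the tree is just one convenient encoding of the structure rather than a requirement.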

We can conceive of parsing in NLP as a process of transforming natural language into an internal system representation, in any form (tree, brackets, graph, anything), but ultimately the output is the syntactic structure of a given sentence. This structure can be given to a semantic analyzer for further interpretation of the meaning of the sentence, because structure alone is not enough: "Colorless green ideas sleep furiously" is syntactically well formed and yet makes no sense. What is important is the selectional properties of each word in the utterance and their collocational relations with other words. So we can say that syntactic parsing is the first step in the computer processing of natural language. Another contributing part of this process is the lexicon, which encodes the syntactic properties and semantic features of each word in the language. And if this lexicon can be presented in such a manner that it defines the conceptual relations among words in a formal way, we call it an ontology. We can also have multiple ontologies that cater to the subcategorization frames and selectional restrictions of any given word form (mostly of nouns, verbs and adjectives). Parsing is then left with minimal work to do if we can list all the required information (GNP, i.e. gender, number and person, features, properties etc.) in the lexicon itself.
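As a rough sketch of how a lexicon could carry that information, the snippet below stores a subcategorization frame and selectional restrictions per word and uses them to flag "Colorless green ideas sleep". Every entry, feature name and value here (`LEXICON`, `sem`, `subcat`, `subj_requires`, `modifies_requires`, `violations`) is a hypothetical choice made for illustration, not an established lexicon format.

```python
# Hypothetical toy lexicon; all feature names and values are assumptions.
LEXICON = {
    "ideas":     {"cat": "N", "sem": {"abstract"}},
    "dogs":      {"cat": "N", "sem": {"animate", "concrete"}},
    # "sleep" takes only a subject, and that subject must be animate.
    "sleep":     {"cat": "V", "subcat": ["NP_subj"], "subj_requires": {"animate"}},
    # Colour adjectives are assumed to need a concrete noun.
    "green":     {"cat": "Adj", "modifies_requires": {"concrete"}},
    "colorless": {"cat": "Adj", "modifies_requires": {"concrete"}},
    "hungry":    {"cat": "Adj", "modifies_requires": {"animate"}},
}

def violations(adjectives, noun, verb):
    """List selectional-restriction violations for '<adjectives> noun verb'."""
    problems = []
    noun_sem = LEXICON[noun]["sem"]
    # Each adjective must find its required features on the noun.
    for adj in adjectives:
        missing = LEXICON[adj]["modifies_requires"] - noun_sem
        if missing:
            problems.append(f"{adj} + {noun}: noun lacks {sorted(missing)}")
    # The verb must find its required features on its subject.
    missing = LEXICON[verb]["subj_requires"] - noun_sem
    if missing:
        problems.append(f"{noun} + {verb}: subject lacks {sorted(missing)}")
    return problems

for problem in violations(["colorless", "green"], "ideas", "sleep"):
    print(problem)
print(violations(["hungry"], "dogs", "sleep"))  # no violations: []
```

This only checks noun features against what adjectives and verbs demand. It does not model the clash between "colorless" and "green" themselves, which would need relations between concepts of the kind an ontology provides.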

What I am trying to say is that the problem with the existing computational grammar formalisms (TAG, LFG, HPSG) is that they focus more on syntactic parsing. I am not even sure whether something called semantic parsing is acceptable or not. They are more into surface representation than into understanding the underlying lexical information. But if we explore generative linguistic theory, I feel we can utilize the principles of Noam Chomsky's Minimalist Program for computational parsing. Not because I am a student of Generative Grammar or very fond of it, but because I feel Generative Grammar is the only linguistic theory that can be incorporated into NLP, at least for MT and information extraction. For speech recognition, as experts say, Generative Phonology is not so useful. The advantage of the Minimalist Program is that it assumes the lexicon to be inflected with all the features, and it is only at the LF and PF levels that it is decided which feature is to be retained and which is to be discarded. Need to study more on this... but I have a gut feeling that it's feasible. Let's see.....
