Saturday, January 30, 2010

Basic Understanding.....

The Minimalist Program (MP) is a program of minimal theory, a search for simplicity, which tries to arrive at the minimal set of language-specific rules, including principles and parameters. The two most salient features of this system are its derivational character and the role that Economy conditions play in regulating possible derived structures. Structures that do not pass the Economy conditions are simply not generated. It has no D-structure or S-structure, only the LF and PF interface levels. The LF interaction takes place after lexical items are chosen from the lexicon and the computational system starts building representations. Each lexical item has three kinds of features, namely semantic, phonological, and syntactic. The two major grammatical operations in MP are Merge and Move, used as devices of feature checking. The features in question are the morphological features of Case, Tense and Agreement (the phi-features of person, gender and number) that have to be checked. Checking is satisfied when a category needing a feature is in construction with some other element in the sentence that can supply that feature (this is called ‘licensing’). For example, verbs are assumed to be inflected for tense and agreement features in the lexicon and are inserted into the derivation already inflected rather than in bare form. The features carried by the verb are checked against corresponding features encoded in the inflectional categories. In the case of the verb, AGRS, AGRO and T are the categories against which these features are checked, and checking can take place at any stage in the derivation. In MP, Merge operations are only licensed if they allow feature checking to occur.
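
To make the idea of "Merge is licensed only if it allows feature checking" concrete for myself, here is a minimal Python sketch. It is only a toy under my own assumptions: the `Item` class, the `merge` function and the feature names ("nom", "phi") are hypothetical illustrations, not a standard formalization of MP.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """A lexical item or phrase: features it can supply and features it still needs."""
    label: str
    has: set = field(default_factory=set)    # features this item can supply
    needs: set = field(default_factory=set)  # unchecked features it must get checked


def merge(head: Item, other: Item):
    """Merge two items only if at least one feature gets checked.

    Returns the new phrase, or None when Merge is not licensed.
    """
    checked = (head.needs & other.has) | (other.needs & head.has)
    if not checked:
        return None  # nothing is checked, so the structure is not generated
    return Item(
        label=f"[{head.label} {other.label}]",
        has=head.has | other.has,
        needs=(head.needs | other.needs) - checked,
    )


# Hypothetical toy lexicon: T supplies nominative Case and needs phi-features.
t_head = Item("T", has={"nom"}, needs={"phi"})
subject = Item("DP", has={"phi"}, needs={"nom"})
adverb = Item("Adv")  # carries no relevant features in this toy

tp = merge(t_head, subject)
print(tp.label, tp.needs)     # licensed: both needs are checked
print(merge(t_head, adverb))  # not licensed: prints None
```

In this toy, a derivation would count as converging when the final phrase has an empty `needs` set, which is a rough stand-in for "all features checked".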

In my proposed research, I intend to investigate the internal structure of Determiner Phrases (DPs) in Assamese within the framework of Minimalism, and I want to apply this analysis in an NLP application. Initially my idea was to use the results of the linguistic analysis in building a syntactic parser for Assamese with specific reference to DP. The immediate by-product of the DP analysis could be a shallow parser (e.g., a DP chunker for Assamese), which is of immense use in various NLP applications. But now, trying to look at it from a different perspective, what I want to address is the underlying structure of DPs in Classifier languages (Assamese, of course, still the reference point) as compared to Class languages like Hindi, and how it can help in the processing and generation of source and target languages from the point of view of Machine Translation. It sounds quite ambitious, and even tougher when we talk about "drawing an analogy with the inherent Human faculties of Perception, Learning & Reasoning". At this point of time, honestly, I have no idea how to do it! After much brooding over this topic, I feel: let's do the linguistic analysis first and forget about NLP. So, what does Minimalism say? There are two levels of syntactic representation: LF and PF. LF interfaces with the Conceptual-Intentional (CI) system and PF interfaces with the Articulatory-Perceptual (AP) system. (If I am not wrong, these should correspond to "the faculty of Understanding" and "the faculty of Sensibility/Perception" in terms of "Perception, Learning & Reasoning".) If that's the case, then what does an Assamese speaker conceive of when s/he sees an object, or any kind of reference to an object, in Assamese discourse? What are the features of the noun that are conceived, and how is that conception translated into the language?

While addressing these questions with specific reference to DPs in Assamese, what would be of considerable interest is to identify the DP-internal agreements, if any. Considering the linguistic aspects of the language, the important issue to focus on is the syntactic and semantic status of classifiers in Assamese within the feature composition of nouns. Assamese has an extensive system of classifiers, which are defined as morphemes (Enclitic Definitives as in Kakoti, 1941) that categorize the referent of a noun in terms of its animacy, shape, size and other inherent properties. Classifiers in Assamese operate in two ways: (a) specific classifiers, which agree with the noun in terms of semantic features, and (b) generic classifiers, which can occur with any noun regardless of its semantic features. Other important questions: does the language have plural morphemes? Are the nouns count or mass? What is the behavior of number morphology? What are the other modificational elements in the language (adjectives, quantifiers, demonstratives, etc.)? All of this amounts to the first step of the "faculty of understanding": extraction of data, or of specific features of the data.
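
The distinction between specific and generic classifiers can be sketched as a subset check over semantic features. This is only a rough Python illustration under my own assumptions: the classifier names (`CL_HUMAN`, `CL_FLAT`, etc.), the feature labels and the English nouns are placeholders I made up for the sketch, not actual Assamese forms or an established feature inventory.

```python
# Hypothetical classifier inventory: names and features are placeholders,
# not attested Assamese forms.
CLASSIFIERS = {
    "CL_HUMAN": {"animate", "human"},
    "CL_FLAT": {"inanimate", "flat"},
    "CL_LONG": {"inanimate", "long"},
    "CL_GENERIC": set(),  # generic: imposes no semantic requirement
}

# Hypothetical nouns with their inherent semantic features.
NOUNS = {
    "teacher": {"animate", "human"},
    "book": {"inanimate", "flat"},
    "stick": {"inanimate", "long"},
}


def agrees(classifier: str, noun: str) -> bool:
    """A classifier agrees if the noun has every feature the classifier requires."""
    return CLASSIFIERS[classifier] <= NOUNS[noun]


def compatible_classifiers(noun: str) -> list:
    """All classifiers (specific or generic) that can occur with the noun."""
    return sorted(c for c in CLASSIFIERS if agrees(c, noun))


print(compatible_classifiers("book"))  # ['CL_FLAT', 'CL_GENERIC']
print(agrees("CL_HUMAN", "stick"))     # False
```

Because the generic classifier requires the empty feature set, it passes the subset check with every noun, which mirrors option (b) above; whether real classifier agreement is this simple is exactly what the analysis has to find out.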

OK.... too tired now.... will listen to the mp3 recording of the Classifier class by Prof Veneeta Dayal and try to understand the crosslinguistic typology....

Friday, January 29, 2010

In Retrospect.....


Feels nostalgic to sit through a Research Methodology class after years!! It has not been that long, just four years (monsoon 2005) since my first introduction to the world of linguistic research, and the class brought back memories of Prof Pramod Pandey teaching us Bacon, Chomsky, the meaning of words like 'scientific research' and 'method', and all the 'isms'. But at that point of time we took it as just another piece of coursework. More than understanding the core of it, what we were interested in was good grades, how to grab an easy text for the book review, or how to avoid topics related to empirical issues so that the faculty or other students could harass us minimally in the seminar!! Today I am finding the real meaning of those terms; even while writing my synopsis I did not pay much attention to them. Research was like putting together the review of the RPs and texts, picking one theory and trying to fit your data into it. In fact, it happened without any attention to method (in the real sense of the term, because I did have a section on methodology in my synopsis, mostly the use of native speaker intuition, thanks to Generative Syntax)!!

After working day and night for the last two weeks, I realize how research and methods are connected, logically, and how important conceptual clarity is and, more importantly, your visualization. Now I know why I could not write even a single line back at home. In two weeks I have discovered so many undefined aspects of my language, yet I am still not in a position to say that focused work has started: every day the hypothesis is changing, and I am surfing through more RPs (now that the Chinese stuff is over, let's see what Bangla has to offer!).


Today's class was very enlightening because Prof Dayal was trying to clarify the basic differences between scientific research and humanities research, the former being more applied in nature. Humanities research is, in a sense, also scientific, because in terms of understanding it is trying to give a different perspective, a different view. This struck me most because I was at a loss as to how to connect my theoretical linguistic analysis with the NLP perspective. I am not adopting a standard computational grammar formalism... so how do I prove the usefulness, applicability or implementable side of my research? The acquisition of human language is credited to the notion of UG, and this innate capacity of human beings also explains how children learn language(s) without any instruction. This is because what we have is a mental lexicon, equipped with all the principles and structures, and all we need is exposure to stimulus. The principles of language are already there; only the parameters have to be triggered on or off. That's why a child exposed to a multilingual environment can learn them all without any effort or mistake. Can this assumption be true of computers too??? Can an exhaustive design of the lexicon, a finite set of principles, and some rules for setting parameters on or off with respect to a principle and the given language bring about an analogy???

Again questions!!!?? seems the list is endless.........

Thursday, January 28, 2010

Revision of Principles and Parameters (P&P) Model

An adequate theory of language should address two major issues:

  • Despite superficial differences, languages are identical at a deep and abstract level. In the P&P model this property of human languages is accounted for by postulating the existence of a set of abstract principles common to all languages, called Universal Grammar. Obviously these hypothetical principles of UG are open to further testing against data from languages, which may either conform to these principles or reveal their language-specific character.

  • The second issue is the converse of the first: though languages are identical at deep structure, they exhibit significant differences at the surface level. The patterns revealed by cross-linguistic differences seem to suggest that variation is restricted by a predetermined set of constraints. The task of linguists is to find out these constraints and explain how they relate to the system of principles of UG.

In the P&P framework, language variation is accounted for in terms of parameters. A parameter is understood to be a restricted set of options or values associated with a given principle or category. Choice of one option yields a given pattern and choice of a different option yields a different pattern. For example, the Head Directionality parameter: languages differ according to the order of the head in relation to its complement.
VO languages (e.g., English) select the head-first value of the parameter
OV languages (e.g., German) select the head-last value of the parameter
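
The effect of the Head Directionality parameter can be sketched in a few lines of Python. This is a toy under my own assumptions: a phrase is represented as a hypothetical `(head, complement)` pair, and a single boolean `head_first` stands in for the parameter value across all categories, which is a simplification.

```python
def linearize(node, head_first: bool) -> list:
    """Flatten a (head, complement) tree into a word order.

    A node is either a string (a terminal) or a (head, complement) pair.
    The same structure yields different orders depending on the parameter.
    """
    if isinstance(node, str):
        return [node]
    head, complement = node
    head_words = linearize(head, head_first)
    comp_words = linearize(complement, head_first)
    return head_words + comp_words if head_first else comp_words + head_words


# One abstract structure: a verb whose complement is an adpositional phrase.
vp = ("V", ("P", "DP"))

print(linearize(vp, head_first=True))   # ['V', 'P', 'DP']
print(linearize(vp, head_first=False))  # ['DP', 'P', 'V']
```

The point of the sketch is only that the principle (heads combine with complements) stays fixed while a single switch changes the surface pattern.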

Wednesday, January 27, 2010

My understanding of Minimalist Program (MP)

What is Grammar? A set of rules that explain a language. What does grammar aim for? What are the criteria to be satisfied in devising a theory of grammar? Theoretical Linguistics shows four major concerns:
  • Universality: the theory must enable us to devise a descriptively adequate grammar for every natural language.
  • Explanatory Adequacy: the theory must be able to explain how speakers arrive at a descriptively adequate knowledge of their language.
  • Restrictiveness: the theory should be constrained so that it can be used only to describe natural languages.
  • Learnability: the theory must provide grammars which are learnable by young children within a relatively short period of time.

So the moral of the story is that linguistic theory should provide grammars which make use of the minimum theoretical apparatus; in other words, it should be as simple as possible. In fact, MP is motivated by the wish to minimize the complex structures and principles of 1980s syntax (the Principles & Parameters approach) and the acquisition burden placed on the child, and thereby to maximize the learnability of natural language grammars.

Each of the two levels of representation in MP, Logical Form (LF) and Phonological Form (PF), must satisfy three basic conditions of adequacy:
  • it must be universal in the sense that any actual or potential human language or meaning of an expression is representable within it.
  • it must be an interface, in that its elements have an interpretation in terms of sensory-motor systems for PF, and for LF in terms of other systems of the mind/brain involved in thought.
  • it must be uniform, in that its interpretation is uniform for all languages, so as to capture all and only the properties of the system of language as such.

Have to find an answer tonight.....

  • How do languages without determiners mark definite or indefinite reference?
  • Definiteness, is it an intrinsic lexical property of a word (like English Determiners or Assamese Classifiers) or is it determined by contextual pragmatics??
  • Are Classifiers equivalent to Determiners both syntactically and semantically?
late night findings on 29th Jan 2010:
after going through a number of RPs and the literature on Chinese Classifiers and Nominal class, the list of questions has now doubled!! quite surprised about this weird behavior of the quantifying expressions in Assamese, asking myself: is it possible in my language? can I say this? well, have made some hypotheses, let's check them with Prof Veneeta Dayal tomorrow... hope they are correct...

Einstein's Puzzle

http://www.stanford.edu/~laurik/fsmbook/examples/Einstein%27sPuzzle.html

Few thoughts in defining the core issue of NLP: Parsing....

With reference (& continuation) to my post dated 5th November, 2009
In Natural Languages, a sentence expresses a proposition, idea or thought, and says something about some real or imaginary world. Thus, extracting the meaning from a sentence is undoubtedly non-trivial. In fact, sentences are not just linear sequences of words. That is why it requires an analysis of each sentence to determine its syntactic structure (which is itself based on a grammar, an abstract formal system of rules and principles) -- a procedure widely known as parsing. However, parsing is not a goal in itself, but rather, an intermediary step for the purpose of further processing, such as assigning an appropriate meaning to a natural language sentence.

Although parsers have proved to be uncontroversially useful in the domain of processing Programming Languages, the issue of parsing in the domain of Natural Language Processing (NLP) has been a cause of tension between the computational and linguistic perspectives over a long period of time. In the past, the controversy about parsing was due to the divergence of objectives between natural language application developers, who were oriented to developing practical parsers, and psycholinguists, who were concerned with the psychological process of language comprehension. In recent times, however, developers of natural language applications have questioned the usefulness of parsing in practical NLP systems. This is primarily because there are no grammars available that have complete coverage of freely occurring natural language texts, and there are no parsers that are robust enough to deal with that inadequacy. This limitation is further compounded by the fact that the inherent ambiguity of Natural Languages forces parsers to operate at speeds far from real-time requirements.

It is quite evident that there is a close relationship between a parser and the linguistic representation the parser manipulates. However, in recent times there has been increasing debate on such issues as what should the representation be, how linguistically detailed should the representation be and how one can go about constructing such a representation.

A parser based on a linguistically motivated wide-coverage grammar has many advantages. Depending on how directly a grammar framework encodes linguistic facts, a linguistically motivated grammar developed in that framework could produce output that is quite detailed and directly amenable to further processing. Furthermore, if a grammar for one language is created in detail and the structures of the grammar are organized systematically, then it is conceivable that abstracting away from language specific features could automatically generate grammars for closely related languages.

There are numerous linguistic theories that are embedded in mathematically restricted formal systems. Work in syntactic description along the lines of GB Theory and Minimalism, proposed by Chomsky (1981, 1995) and others, has always been the most thoroughly detailed and worked-out aspect of linguistic inquiry. A great deal has been borrowed from generative linguistics by NLP researchers. Lexical Functional Grammar (LFG) (Kaplan and Bresnan, 1982), Head-driven Phrase Structure Grammar (HPSG) (Pollard and Sag, 1994) and Tree Adjoining Grammar (TAG) (Joshi et al., 1975; Joshi, 1985, 1987; among others) are probably the most efficient ones so far. Each of these formal systems has its own limitations. However, considering the facts mentioned in the preceding paragraph, I intend to adopt Minimalism for developing the computational grammar for Assamese (an Eastern Indo-Aryan language) in the context of Asian language processing. I wish to address at least two theoretical questions:
  • How can thorough grammatical information be incorporated into a computational parsing model?
  • Can a parser attempt to infer from raw text annotated only with Parts of Speech (POS) tags just as a child attempts to do while learning his/her first language?
HOW??????
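
One very crude way to start thinking about the second question is to let a program "set" a head-direction parameter from POS-tagged text alone, by counting which order of head and complement it observes. The Python sketch below is only that: the function name `infer_head_direction`, the choice of adposition/noun pairs as the cue, the use of the tags `ADP` and `NOUN`, and the toy corpora with placeholder words are all my own assumptions. Counting adjacent tag pairs is a heuristic, not a claim about how children actually set parameters.

```python
from collections import Counter


def infer_head_direction(tagged_sentences, head_tag="ADP", comp_tag="NOUN"):
    """Guess head-first vs head-last from adjacent (head, complement) tag pairs.

    tagged_sentences: list of sentences, each a list of (word, tag) pairs.
    Returns "head_first", "head_last", or None if the evidence is balanced.
    """
    counts = Counter()
    for sentence in tagged_sentences:
        tags = [tag for _, tag in sentence]
        for left, right in zip(tags, tags[1:]):
            if left == head_tag and right == comp_tag:
                counts["head_first"] += 1   # head precedes its complement
            elif left == comp_tag and right == head_tag:
                counts["head_last"] += 1    # head follows its complement
    if counts["head_first"] == counts["head_last"]:
        return None  # no evidence either way
    return "head_first" if counts["head_first"] > counts["head_last"] else "head_last"


# Hypothetical toy corpora; the words are placeholders, only the tags matter.
prepositional = [[("w1", "VERB"), ("w2", "ADP"), ("w3", "NOUN")]]
postpositional = [[("w1", "NOUN"), ("w2", "ADP"), ("w3", "VERB")]]

print(infer_head_direction(prepositional))   # head_first
print(infer_head_direction(postpositional))  # head_last
```

A real attempt would need structure rather than adjacency (a determiner or adjective between the adposition and the noun already breaks this heuristic), which is where the linguistic analysis has to come in.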
The Loss of a Language Family: Death of Boa Sr.