Main Page

ParGramWiki

Welcome to ParGramWiki, a wiki for documenting the ParGram project and its process. The wiki is permanently under construction. If you would like to have a user account, please contact Jessica Zipf.

Useful Links

  • ParGram Homepage
  • ParGram Workspace : Here, you can find meeting notes/slides/other material from past ParGram meetings and a search interface. The site was implemented by Anja Leiderer. If you would like access to this site (it is currently password protected), please contact Jessica Zipf.
  • ParGram Starter Grammars: Some of the knowledge accumulated in the ParGram effort over the years has been included in the XLE Documentation in the form of a Starter Grammar, a Walk Through, and files containing the features and templates that the grammars have developed and use in common. Grammar writers who are just beginning work on a new language will find this repository of information helpful. In particular, common features and conventions developed within ParGram are explained as part of the Starter Grammar.
  • Starter Grammar, Walk Through and some useful tips: http://ling.uni-konstanz.de/pages/xle/doc/
  • Common Features & Common Templates: For files, see bottom of the page.

ParGram Topics

Members of ParGram have been meeting regularly since 1995, including in so-called _Feature Committee_ meetings in which analyses across grammars are compared and discussed. As far as possible, common analyses and naming conventions are agreed upon. Some of the accumulated body of crosslinguistic grammar engineering knowledge was documented in the Grammar Writer's Cookbook (http://www.stanford.edu/group/cslipublications/cslipublications/site/1575861704.shtml). The material in this Wiki is an effort to share with the wider community the further knowledge that we have established together over the years.

Here are some discussions and information about ParGram topics (in alphabetical order).

Links to ParGram Groups

Here are some links to the Wikis or sites of individual grammar groups. It might be useful to check out languages that are similar to the one you wish to work on (or are working on). If you would like to obtain a particular ParGram grammar, you should contact the groups directly. For example, the Polish grammar is available under the GNU General Public License (version 3).

XLE

XLE consists of cutting-edge algorithms for parsing and generating Lexical Functional Grammars (LFGs) along with a rich graphical user interface for writing and debugging such grammars. It is the basis for the ParGram project, which is developing industrial-strength grammars for English, French, German, Norwegian, Japanese, and Urdu. XLE is written in C and uses Tcl/Tk for the user interface. It currently runs on Solaris Unix, Linux, and Mac OS X.

More information on XLE including its availability can be found on the XLE homepage at: http://ling.uni-konstanz.de/pages/xle/

XLE-Web Interface

The XLE-Web Interface allows access to several ParGram grammars. One very good way of gaining an understanding of how different phenomena are treated within ParGram is to go to this website and parse example sentences online.

The XLE-Web interface along with several ParGram grammars is here:

ParGramBank

ParGramBank is a collection of parallel treebanks currently involving ten languages from six language families. All treebanks included in ParGramBank are constructed using output from individual ParGram grammars. The grammars produce output that is maximally parallelized across languages and language families. This output forms the basis of a parallel treebank covering a diverse set of phenomena.

The treebank is publicly available via the INESS treebanking environment, which also allows for the alignment of language pairs. ParGramBank is a unique, multilayered parallel treebank that represents more and different types of languages than are available in other treebanks, that represents deep linguistic knowledge and that allows for the alignment of sentences at several levels: dependency structures, constituency structures and POS information.

ParGramBank can be accessed and downloaded for free via the INESS treebanking infrastructure:


Tense Aspect Mood

The attached document reflects an on-going discussion on what the representation of tense, aspect, mood, etc. should be in the ParGram grammars.

The current common feature space used by grammars is described as part of the overall feature space in the Starter Grammar (http://www2.parc.com/isl/groups/nltt/xle/doc/PargramStarterGrammar/starternotes.html) in the XLE Documentation (http://www2.parc.com/isl/groups/nltt/xle/doc/xle_toc.html).

Negation

There has been a discussion over the years about how to treat negation. The English and German grammars register negation as an ADJUNCT (ADJUNCT-TYPE neg) in the f-structure. However, this analysis does not seem to make sense for affixal negation on the verb. In such cases, the presence of negation is instead registered simply via a NEG + feature. The ParGram grammars are currently split between these two options in how they analyze negation.

At the ParGram meeting in Oxford in 2006, a decision was taken that all grammars should experiment with a possibly complex NEG feature (Negation_committee_report.pdf). The problem here is that English has examples like "I didn't not go." and it is not clear how to treat these with just a NEG feature.
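The difference between the two options, and the double negation problem, can be made concrete with a small sketch. It models f-structures as plain Python dictionaries, which is only an illustration and not how XLE represents them; the helper names and the simplified PRED values are assumptions made for this example.

```python
# Toy model: f-structures as Python dicts. This is an illustration only,
# not XLE's internal representation. Helper names are hypothetical.

def add_negation_adjunct(fs):
    """ADJUNCT analysis: each negation adds a member to the ADJUNCT set."""
    out = dict(fs)
    out["ADJUNCT"] = list(fs.get("ADJUNCT", [])) + [
        {"PRED": "not", "ADJUNCT-TYPE": "neg"}
    ]
    return out

def add_negation_feature(fs):
    """NEG analysis: negation is registered as a simple NEG + feature."""
    out = dict(fs)
    out["NEG"] = "+"
    return out

base = {"PRED": "go<SUBJ>", "SUBJ": {"PRED": "pro"}}

# "I didn't not go." under the ADJUNCT analysis: two negations are visible.
double_adjunct = add_negation_adjunct(add_negation_adjunct(base))

# Under the simple NEG + analysis the second negation adds nothing new,
# so single and double negation receive the same f-structure.
single_feature = add_negation_feature(base)
double_feature = add_negation_feature(single_feature)
```

In this toy model `double_feature` is identical to `single_feature`, which is the reason a more complex NEG feature was considered.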

There may also be an issue with respect to negative polarity items (NPIs) that one might want to think about, but perhaps this is best left to semantics.

At the ParGram meeting after the LFG conference in Debrecen in 2013, there was a roundtable discussion on negation in LFG. The overview presentation (negation_prezi_2013_ParGram.pdf) contains extracts from the joint ACL 2013 paper on ParGramBank (http://www.aclweb.org/anthology/P/P13/P13-1054.pdf) and some concomitant email correspondence on individual grammars, as well as screenshots of XLE-analyses of various negation constructions. The major points are as follows:

  • The ADJUNCT/NEG+ choice does not always correlate with the expected language type in the XLE grammars: Polish has a negative adjunct but the XLE grammar uses the NEG+ feature; Indonesian uniformly employs the ADJUNCT analysis even though it has several distinct negative marker types.
  • Several languages have competing negation strategies (Wolof, Indonesian, French). Thus some level of consistency is an issue crosslinguistically as well as within some of the grammars.
  • Problems for the NEG+ analysis: "I cannot not go" (see also above), scope-interactions
  • Problems for the ADJUNCT analysis: relation between "John didn't see anybody" and "John saw nobody"; or between "John didn't have any time" and "John had no time" ("no" is a quantifier in the English grammar with the feature POL negative)
  • General issue: separate clearly f-structure issues and semantic issues in the analysis

The new discussion at the ParGram meeting 2015 in Warsaw, which was substantiated by a talk by Tibor Laczko on Hungarian negation, led to the insight that one should perhaps adopt the differentiated treatment put forward by the Hungarian grammar. The slides by Tibor are attached (laczko_negation_ParGram_Warsaw_140204.pdf).

Testsuites

This page lists some interesting testsuite resources. Thanks go to Emily Bender and Dan Flickinger for naming some of the resources.

TSNLP testsuites

Testsuites put together by the TSNLP project. Very linguistically principled, but not a very large range of languages:

http://www.dfki.de/lt/project.php?id=Project_380&l=en

(This is apparently no longer available for download. It might be worth contacting the people behind the project for a copy of the testsuites created.)

The TSNLP testsuite is available from the ELRA catalogue:

http://islrn.org/resources/717-350-913-018-8/

The Konstanz site has gotten a hold of it via ELRA. Contact Sebastian Sulger for instructions on how to license the TSNLP package.

MRS testsuite (DELPH-IN)

There is also the MRS testsuite, created by DELPH-IN. This started as a resource for English and has been translated to a few languages. Its focus is on illustrating core semantic phenomena:

http://moin.delph-in.net/MatrixMrsTestSuite

This testsuite is also part of the [incr tsdb()] software package (http://www.delph-in.net/itsdb/) for several languages, but there is a more comprehensive collection online, accessible via the link above.

I have compiled a package of MatrixMRS testsuites for multiple languages here: http://ling.uni-konstanz.de/pages/home/sulger/files/MatrixMRSTestSuite.tar.gz The testsuites have varying formats since the source page presented them in differing formats.

Another semantically-oriented testsuite (to augment the MRS testsuite above)

Recent work on documenting the semantic analyses in the English Resource Grammar has led to another semantically-oriented testsuite, to augment the MRS testsuite. This one is monolingual, though.

http://moin.delph-in.net/ErgSemantics

http://svn.emmtee.net/trunk/uio/wesearch/esd.txt

FraCaS test suite

The FraCaS test suite focuses on linguistic phenomena related to logical inference and is described in a 1996 technical report by Cooper et al. Here is the reference and a link to the paper:

Robin Cooper, Dick Crouch, Jan Van Eijck, Chris Fox, Josef Van Genabith, Jan Jaspars, Hans Kamp, Manfred Pinkal, David Milward, Massimo Poesio, and Steve Pulman. 1996. Using the Framework. Technical report, FraCaS: A Framework for Computational Semantics. FraCaS deliverable D16.

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.45.7694&rep=rep1&type=pdf

An XML version of this data is available for download from Bill MacCartney at this website:

http://www-nlp.stanford.edu/~wcmac/downloads/

The FraCaS data was also used for a bilingual English-Swedish test suite by Peter Ljunglöf and Magdalena Siverbo, described in this paper:

http://gup.ub.gu.se/records/fulltext/168967/168967.pdf

HP NL Testsuite

Hewlett-Packard Natural Language Testsuite, originally by Dan Flickinger, Marilyn Friedman, Mark Gawron, John Nerbonne, Carl Pollard, Geoffrey Pullum, Ivan Sag, and Tom Wasow.

http://www.ual.es/personal/nperdu/hpsuite.htm

I put a copy of that testsuite up here, in case the above link stops working:

http://ling.uni-konstanz.de/pages/home/sulger/files/hp-nl-testsuite.txt

Testsuites inspired by Linguist Fieldwork

Wayan Arka says the following:

Field linguists typically have a kind of opportunistic data collection techniques in the field but they often use Lingua questionnaires, or make use of the available elicitation materials e.g. created by the MPI (http://fieldmanuals.mpi.nl/), or create their own elicitation materials.

The attached questionnaires created for my NSF-funded Voice project seem to look like test suites that we use in ParGram. As you will see, they have English and Indonesian for each item. We can adapt the questionnaires, if you like.

I put up the testsuites sent by Wayan below.

Auxiliaries

In classic LFG (Bresnan 1982), auxiliaries were treated as subcategorizing for an open complement (VCOMP in those days, XCOMP in today's terms). See Kaplan and Bresnan (1982:205-206, 228) for concrete examples. This basic approach is laid out in some detail in Falk (1984).

In some of the earliest discussions within ParGram, it was realized that the XCOMP analysis for auxiliaries has some useful consequences in English: it deals quite easily with the affix dependencies among auxiliary stacks (cf. Chomsky's notion of Affix Hopping), and coordinations like "John will and can disappear." are no problem. However, there are also major disadvantages from a crosslinguistic perspective. In languages like German or French (English, French and German were the original ParGram languages), some tense information is encoded via auxiliaries whereas other information is encoded via inflections on the verb. Butt, Niño and Segond (1996/2004) use the following example sentences in the future perfect to demonstrate the problem:

The driver will have turned the lever.
Der Fahrer wird den Hebel gedreht haben. (German)
Le conducteur aura tourné le levier. (French)

In the English version, the future part is expressed by "will" and the perfect part by "have" in combination with the past participle "turned". In the German version, the picture is similar, except that the word order is different. In French, however, the future and the perfect are combined into one auxiliary: "aura".

Under an XCOMP analysis, this means that the English and the German will have two layers of XCOMP embeddings, while the French will have only one. In terms of parallelism across languages (and back then we had machine translation as our main application goal in mind), this is not an acceptable way of proceeding. Butt et al. instead proposed a "flat" structure in which TNS/ASP information across languages is registered within a TNS/ASP feature, but in which auxiliaries (and other functional elements) are not analyzed as embedding event arguments (which is what XCOMPs are).
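The parallelism argument can be sketched with toy f-structures, modelled here as nested Python dictionaries. This is a minimal illustration under stated assumptions: the feature names inside TNS-ASP (TENSE, PERF) and the simplified PRED values are chosen for the example and are not taken from any particular ParGram grammar.

```python
# Toy f-structures as nested dicts (illustration only).
# TENSE/PERF and the bare PRED values are assumptions for this sketch.

def xcomp_depth(fs):
    """Count the number of XCOMP layers embedded in an f-structure."""
    depth = 0
    while "XCOMP" in fs:
        fs = fs["XCOMP"]
        depth += 1
    return depth

def same_shape(a, b):
    """True if two f-structures have the same attributes at every level
    (atomic values such as PRED names may differ)."""
    if isinstance(a, dict) != isinstance(b, dict):
        return False
    if not isinstance(a, dict):
        return True
    return a.keys() == b.keys() and all(same_shape(a[k], b[k]) for k in a)

# XCOMP analysis: every auxiliary embeds the next predicate.
english_xcomp = {"PRED": "will",
                 "XCOMP": {"PRED": "have", "XCOMP": {"PRED": "turn"}}}
french_xcomp = {"PRED": "avoir", "XCOMP": {"PRED": "tourner"}}

# Flat analysis: one main PRED, tense/aspect collected in TNS-ASP.
english_flat = {"PRED": "turn", "TNS-ASP": {"TENSE": "fut", "PERF": "+"}}
french_flat = {"PRED": "tourner", "TNS-ASP": {"TENSE": "fut", "PERF": "+"}}
```

With these toy structures, `xcomp_depth` returns 2 for English and 1 for French, whereas `same_shape(english_flat, french_flat)` holds, which is the kind of parallelism the flat analysis aims for.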

The particular way this was implemented in 1996 was via m(orphological)-structure (see m-structure). The problem was what to do about morphological wellformedness information like the affix dependencies found in English. These clearly need to be taken care of. However, they are not interesting for any further analysis as they tend to be quite language specific. That is, f-structure is not an appropriate level to represent language-particular morphological information that is only used to ensure morphosyntactic wellformedness. The ParGram grammars experimented with m-structure for a short while (see pages 60-67 in the Grammar Writer's Cookbook), but then abandoned it as too unwieldy (debugging becomes very difficult whenever several different projections from c-structure must be kept track of). Instead, the ParGram grammars now store information needed for language-particular morphosyntactic wellformedness checking in a CHECK feature at f-structure. All the information bundled there can be removed/ignored for further analysis steps such as semantic analysis, machine translation, etc.

However, within theoretical LFG, m-structure took on a life of its own and current work continues to explore uses or ramifications of adding in this further projection. See the m-structure page.

References

Bresnan, Joan. 1982. _The Mental Representation of Grammatical Relations_. The MIT Press.

Butt, Miriam, Tracy Holloway King, Maria-Eugenia Niño and Frédérique Segond. 1999. _A Grammar Writer's Cookbook._ CSLI Publications.

Butt, Miriam, Maria-Eugenia Niño and Frédérique Segond. 2004. Multilingual Processing of Auxiliaries in LFG. In L. Sadler and A. Spencer (eds.), _Projecting Morphology_. Stanford, CA: CSLI Publications, 11-22. Reprinted Version of a 1996 Coling proceedings paper. (http://ling.uni-konstanz.de/pages/home/butt/main/papers/konvens96.pdf)

Kaplan, Ron and Joan Bresnan. 1982. Grammatical Representation. In J. Bresnan (ed.), _The Mental Representation of Grammatical Relations_. The MIT Press.

Falk, Yehuda. 1984. The English Auxiliary System: A Lexical-Functional Analysis. _Language_ 60(3):483-509.

M-structure

M-structure was invented as a result of early discussions within ParGram on the best crosslinguistically valid treatment of auxiliaries (see that page for more discussion). Within ParGram the use of m-structure was abandoned fairly quickly as too unwieldy (debugging becomes very difficult whenever several different projections from c-structure must be kept track of), although the analysis was still current at the time the Grammar Writer's Cookbook was written (see pages 60-67). Instead, the ParGram grammars now store information needed for language-particular morphosyntactic wellformedness checking in a CHECK feature at f-structure.

Within theoretical LFG, m-structure took on a life of its own and current work continues to explore uses or ramifications of adding in this further projection, starting with Frank and Zaenen (1998). A workshop was organized on this topic at the LFG00 conference at Berkeley (http://www.stanford.edu/group/cslipublications/cslipublications/LFG/5/lfg00morphwrk.html) and a collection of papers on morphological issues can be found in Sadler and Spencer (2004).

References

Butt, Miriam, Tracy Holloway King, Maria-Eugenia Niño and Frédérique Segond. 1999. _A Grammar Writer's Cookbook._ CSLI Publications.

Frank, Anette and Annie Zaenen. 1998. Tense in LFG: Syntax and Morphology. Reprinted in L. Sadler and A. Spencer (eds.), _Projecting Morphology_. CSLI Publications.

Sadler, Louisa and Andrew Spencer (eds.). 2004. _Projecting Morphology_. CSLI Publications.

CHECK feature

The CHECK feature is used in ParGram to collect up language particular information that is necessary for morphosyntactic well-formedness checking. For ease of grammar engineering, the feature is encoded at f-structure. Typical CHECK feature candidates are items like the strong/weak adjectival inflection in German or information on which form the auxiliaries have to be in cases of English auxiliary stacking (e.g., _will have been_ vs. _was having been_). Generally, information coded up at f-structure should be drawn from a universally relevant inventory (e.g., subject, tense/aspect, case) as f-structure should (as much as possible) encode a level of analysis that abstracts away from language particular encoding methods. The information collected in the CHECK feature should be irrelevant for machine translation or for semantic analysis, for example.
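Because everything under CHECK is meant to be irrelevant downstream, a post-processing step can simply discard it. The following is a minimal sketch of that idea, assuming f-structures have been exported into nested Python dictionaries and lists; the function name, the export format and the `_INFL` sub-feature are assumptions for illustration, not part of XLE.

```python
# Minimal sketch: drop all CHECK information before further processing.
# Assumes f-structures exported as nested dicts/lists (hypothetical format).

def strip_check(fs):
    """Return a copy of an f-structure with every CHECK attribute removed."""
    if isinstance(fs, dict):
        return {attr: strip_check(val)
                for attr, val in fs.items() if attr != "CHECK"}
    if isinstance(fs, list):
        return [strip_check(member) for member in fs]
    return fs

# Toy German noun phrase; the _INFL sub-feature is an illustrative assumption.
german_np = {
    "PRED": "Haus",
    "CASE": "nom",
    "CHECK": {"_INFL": "strong"},
    "ADJUNCT": [{"PRED": "alt", "CHECK": {"_INFL": "strong"}}],
}

cleaned = strip_check(german_np)
```

The original structure is left untouched; only the copy handed on to, say, a semantic component lacks the CHECK material.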

Questions

Information-seeking wh-questions are dealt with via functional-uncertainty paths. The wh-item is considered to be focused and this is represented explicitly at f-structure. The wh-item is placed into a (long-distance) dependency with the grammatical function (e.g., OBJ, OBL or ADJUNCT) it represents. F-structure examples are provided below from the English grammar for:

(1) Who does David like?
(2) Who does David think Mary likes?

Furthermore, the f-structure makes a distinction between the features FOCUS-INT and PRON-INT. FOCUS-INT registers the actual item that is taken to be in focus, whereas PRON-INT registers the actual wh-item. In examples (1) and (2) these are always identical. However, a difference emerges in examples involving pied-piping. Some examples are given in (3)-(5).

(3) Whose book did you read?
(4) Which book did you read?
(5) How old is the president?

Non Latin Scripts

XLE supports all types of scripts. The relevant parts of the XLE documentation are the sections on Emacs and non-ASCII character sets (http://ling.uni-konstanz.de/pages/xle/doc/xle.html#SEC3) and Character Encodings (http://ling.uni-konstanz.de/pages/xle/doc/xle.html#SEC23).

Here is an example of how the Georgian grammar is set up.

In the Configuration file of the grammar, the following line should be added:

CHARACTERENCODING utf-8.

This tells XLE to expect utf-8.

If you are using emacs to write your grammars, you can add the following line to the very top of the main grammar file (e.g., georgian.lfg):

   ";;; -*- Encoding: utf-8 -*-"

This tells emacs that the file is in utf-8.

However, emacs tends to be tricky about utf-8, so you might also want to create a .emacs file and put the following information in it:

 (set-language-environment "UTF-8")
 (setq process-coding-system-alist '(("xle" utf-8 . utf-8)
    ("shell" utf-8 . utf-8)
    ("slime" utf-8 . utf-8)))
 (setq default-process-coding-system '(utf-8 . utf-8))

If you already have a .emacs file, then simply add this information.

Another option is to not use emacs and write your test suite and grammar in a more utf-8 friendly editor. In this case, you can access your test sentences via the "parse-testfile" command. To get more information on this command, while in XLE type:

 % help parse-testfile

The version we want takes the name of a testsuite as a parameter and the number of the sentence you want to parse. For example:

 % parse-testfile your-testsuite.lfg 3


Discourse Functions

Several of the grammars use discourse functions such as TOPIC, FOCUS or GIV-TOP as part of the f-structure analysis.

This is generally done on purely syntactic grounds. The German grammar, for example, assigns TOPIC to the clause-initial constituent (this is not quite right, and efforts have been ongoing to determine better heuristics).

The Polish grammar has recently adopted the idea of a UDF (unbounded dependency function) for representing elements that are clearly playing some discourse functional role, but whose role cannot be determined on the basis of syntax alone. Determining the role must be left to some further component that has more contextual (or other useful) information.

For theoretical background on UDFs see:

  • Asudeh, Ash. 2012. The Logic of Pronominal Resumption. Oxford University Press.

The LFG14 conference saw a special workshop on this topic. The papers presented are not part of the LFG proceedings but are scheduled to appear in a volume of collected papers.

For an argument why discourse functions should in principle not be represented as part of f-structure, see:


Modality

Modals are generally treated as full verbs in the ParGram grammars (in contrast with Auxiliaries). Modals can be either control or raising verbs. Which is which needs to be determined by language internal tests.

There are two major strategies within ParGram with respect to modals to date. They are exemplified by the English and the Norwegian grammars.

In the English grammar, all modals are treated as raising verbs and no distinction is made between epistemic and deontic/root modals. This has the effect of abstracting away from a finer grained syntactic and semantic analysis and allows for the reduction of ambiguity in the grammar.

The Norwegian grammar, in contrast, divides modals up into two major categories. The central modals in Norwegian are the verbs ‘ville’, ‘kunne’, ‘måtte’ and ‘skulle’ (present tense: ‘vil’, ‘kan’, ‘må’, ‘skal’), etymologically related to the English ‘will’, ‘can’, ‘may’ and ‘shall’, respectively. ‘måtte’, however, has the meaning ‘must’ in Norwegian (unlike in Danish). The Norwegian grammar distinguishes between two kinds of readings of these verbs, labelled ‘epistemic’ and ‘root’. This is reflected in the predicate names (‘epist-ville’ vs. ‘root-ville’ etc.), and in the value of the feature MODAL-TYPE, which can be either ‘epistemic’ or ‘root’. Some examples:

Han vil lese boken
‘He will read the book’ (epistemic (in this case temporal))/
‘He wants to read the book’ (root)
Han kan svømme
‘He may be swimming’ (epistemic)/
‘He can (is able to) swim’ (root)


Han må reise mye
‘He must be travelling a lot’ (epistemic)/
‘He is obliged to travel a lot’ (root)


While there is a scale of modal meanings involved here, the grammar only makes a binary distinction, which means that the term ‘root’ has an extended sense. The basic criterion of choice is the status of the subject as an argument, or not, of the modal verb (i.e. ‘equi’ vs. ‘raising’ readings). Thus the term ‘root’ will also comprise deontic meanings.

The reason for making this distinction in the grammar is that the two kinds of reading can be disambiguated by the syntax. Unlike the related English modals, the Norwegian modals are fully-fledged verbs with infinitive and participle forms. Hence they can enter into sequences of auxiliaries, and the order in which the auxiliaries occur has semantic consequences. The detailed rules are complex and vary to some extent between the modals, but a fairly robust generalization is that epistemic readings are strongly preferred if the modal occurs before the perfect auxiliary ‘ha’ (=‘have’), while root readings (in our wide sense) are strongly preferred if the modal occurs after the perfect auxiliary. Examples:

Han vil ha lest boken
‘He will have read the book’ (epistemic/temporal)


Han har villet lese boken
‘He has wanted to read the book’ (root)


Han kan ha forsvunnet
‘He may have disappeared’ (epistemic)


Han har kunnet forsvinne
‘He has been able to disappear’ (root)


Han må ha reist mye
‘He must have travelled a lot’ (epistemic)


Han har måttet reise mye
‘He has been obliged to travel a lot’ (root)


The feature MODAL-TYPE with its two values ‘epistemic’ and ‘root’ is referred to by other auxiliaries – e.g. the auxiliary ‘ha’ = ‘have’ only takes modal complements with the value ‘root’.
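The ordering generalization can be sketched as a simple word-order heuristic. This is only an illustration of the preference described above and not how the Norwegian grammar implements it (the grammar does this via constraints on MODAL-TYPE). The function name and word lists are assumptions, and the lists contain only the forms mentioned on this page.

```python
# Heuristic sketch of the generalization: modal before the perfect auxiliary
# suggests an epistemic reading, modal after it suggests a root reading.
# Word lists are limited to forms mentioned on this page (an assumption).

MODAL_FORMS = {
    "ville", "vil", "villet",
    "kunne", "kan", "kunnet",
    "måtte", "må", "måttet",
    "skulle", "skal",
}
PERFECT_AUX_FORMS = {"ha", "har"}

def preferred_modal_type(tokens):
    """Return 'epistemic', 'root', or None if word order does not decide."""
    words = [t.lower() for t in tokens]
    modal_at = next((i for i, w in enumerate(words) if w in MODAL_FORMS), None)
    perf_at = next((i for i, w in enumerate(words) if w in PERFECT_AUX_FORMS), None)
    if modal_at is None or perf_at is None:
        return None  # no modal, or no perfect auxiliary: both readings remain
    return "epistemic" if modal_at < perf_at else "root"

reading_1 = preferred_modal_type("Han vil ha lest boken".split())
reading_2 = preferred_modal_type("Han har villet lese boken".split())
reading_3 = preferred_modal_type("Han vil lese boken".split())
```

For a sentence without a perfect auxiliary, such as "Han vil lese boken", the heuristic returns None, matching the observation that both readings are available there.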

Complex Predicates

Complex predicates are instances in which two or more predicational elements combine to form a single monoclausal predication. The definition proposed in Butt (1995) is as follows.

> Complex predicates are formed when two or more predicational elements enter into a relationship of co-predication. Each predicational element adds arguments to a monoclausal predication. Unlike what happens with control/raising, there are no embedded arguments and no embedded predicates at the level of syntax.

Classic examples in the LFG literature come from Romance and Urdu/Hindi (Alsina 1993, 1996; Butt 1995; Mohanan 1994). The basic idea is illustrated here with respect to a type of N-V and a type of V-V complex predicate from Urdu/Hindi. In the N-V example below, the complex predicate consists of the predicate `do', which licenses two arguments: an agent and a predicate that represents the thing done, namely `memory'. The `memory' in turn licenses the NP `story'. This NP functions as the direct object of the clause yet is not licensed by the finite verb `do'. The standard analysis is that `memory' and `do' combine at the level of a(rgument)-structure and that their arguments are combined into one monoclausal predication and linked to GFs (grammatical functions) as being in a single argument domain.

nadya=ne   kahani  yad   k-i
Nadya.F.Sg=Erg  story.F.Sg.Nom memory    do-Perf.F.Sg
`Nadya remembered a/the story.' (lit.: `Nadya did memory of the story.')

The same analysis holds for the V-V complex predicate below, where `give' licenses three arguments: the agent (`Nadya'), the beneficiary/goal (`Yassin') and the predicate (event/proposition) that is being allowed to be done (`cut'). This predicate in turn licenses the theme `plant'. Again, the arguments separately contributed by `give' and `cut' are combined at a-structure and linked to GFs as a monoclausal predicate so that you have one composed predicate `give-cut', a SUBJ `Nadya', an OBJ `plant' and an OBJ_theta `Yassin'.

nadya=ne yAssin=ko pAoda kaT-ne di-ya th-a 
Nadya.F.Sg=Erg Yassin.M.Sg=Dat plant.M.Sg.Nom cut-Inf.Obl give-Perf.M.Sg be.Past-M.Sg
‘Nadya had let Yassin cut the plant.’

Determining whether a given structure is monoclausal or not must be done via tests for monoclausality. These tend to be language specific, but can include tests involving control, agreement, anaphora resolution or the distribution of negative polarity items (see Butt 1995, 2010).

For example, the sentence below is almost identical to the sentence above, but Butt (1995) shows the sentence below is structurally biclausal, involving an XCOMP `to cut the plant' whose SUBJ is controlled by the matrix OBJ_theta `Yassin'.

nadya=ne yAssin=ko [pAoda kaT-ne]=ko kah-a th-a
Nadya.F.Sg=Erg Yassin.M.Sg=Dat plant.M.Sg.Nom cut-Inf.Obl=Acc say-Perf.M.Sg be.Past-M.Sg
‘Nadya had told Yassin to cut the plant.’ 

A-structure and LFG's _Linking_ or _Mapping Theory_ are not implemented within XLE. In the absence of a-structure and Linking Theory, the question faced within ParGram was how one could treat complex predicates in as linguistically satisfying a manner as possible. The solution ultimately arrived at involves the use of the _Restriction Operator_ (Kaplan and Wedekind 1993). The notation for the Restriction Operator is "\" and it allows for the definition of an f-structure that corresponds to another f-structure but without some of the information contained in the original f-structure. For example, if one wants to restrict out the CASE feature from an f-structure, then one can say ^\CASE. The corresponding restricted f-structure will be identical to the original one, except that it lacks the CASE feature.
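The effect of restricting out a single attribute can be sketched in a few lines, again modelling f-structures as Python dictionaries. This is an illustration of the idea only and not XLE's implementation; the function name `restrict` and the sample attributes are assumptions.

```python
# Toy model of restriction: build a second f-structure that is identical to
# the original except that one attribute is missing. Illustration only.

def restrict(fs, attr):
    """Return a new f-structure like fs, but without the attribute attr."""
    return {a: v for a, v in fs.items() if a != attr}

original = {"PRED": "story", "CASE": "nom", "NUM": "sg", "GEND": "fem"}
restricted = restrict(original, "CASE")
```

Note that `original` still exists alongside `restricted`: restriction defines an additional f-structure rather than deleting information from the existing one.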

As originally conceived, the Restriction Operator applied at the lexical level. However, as complex predicate formation of the type described above applies in the syntax, the Restriction Operator was extended to apply in the syntax. Details on how to use the Restriction Operator are provided in the XLE documentation (http://ling.uni-konstanz.de/pages/xle/doc/notations.html#N4.1.11). Butt et al. (2003) provide a very detailed explanation and sample rules for an application of the Restriction Operator to complex predicates in Urdu. Butt and King (2006) show how to extend this basic treatment to morphological causatives and discuss interactions between passives and syntactically formed complex predicates. Readers are referred to those two papers plus the XLE documentation if they want to begin using the Restriction Operator to model complex predicates.

The intuition behind the application of the Restriction Operator to model complex predication is as follows. Although XLE does not include a model of a-structure and linking, the system can refer directly to the arguments on the argument list of a PRED via variables. The arguments of a PRED are ordered and numbered from left to right. In a subcategorization frame like `verb<(^SUBJ)(^OBJ)>', the SUBJ is ARG1 and the OBJ is ARG2. These argument positions can be referred to as ARG1, ARG2, etc. and can thus also be manipulated via f-structure annotations. Now, if one has a light verb like `do' which specifies that it has a SUBJ argument, but where the nature of the second argument (%ARG2) is unclear because it needs to be determined dynamically as part of the syntactic analysis, then the lexical entry would look as follows.

do LV * (^ PRED)='do<(^ SUBJ) %ARG2>'
"%ARG2 will be filled in by a predicate".


The "filling in" of the second argument can be done via f-structure annotations on a c-structure rule as follows. The effect is that the Noun (N) "loses" its own PRED, but that this PRED is substituted in for the ARG2 of the light verb. The noun thus becomes a part of the verb, and all arguments and other information it might have specified are now part of the verb's f-structure. For example, if the N is specified to have an OBJ, then that OBJ will now be the OBJ of the joint predication.

 VPcp --> N: !\PRED = ^\PRED
             (! PRED) = (^ PRED ARG2);
          LV "light verb".
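The combined effect of the two annotations on N can be simulated with a toy model. This sketch uses Python dictionaries and plain string substitution on simplified PRED values, whereas XLE works with unification over structured semantic forms; the function name and the sample f-structures are assumptions made for illustration.

```python
# Toy simulation of the rule's effect (not XLE's unification machinery).
# PRED values are simplified strings; '%ARG2' marks the open argument slot.

def compose_complex_predicate(noun_fs, light_verb_fs):
    # !\PRED = ^\PRED: the mother shares everything with N except PRED.
    mother = {a: v for a, v in noun_fs.items() if a != "PRED"}
    # The light verb is a co-head, so its other features also end up
    # in the mother f-structure.
    for attr, value in light_verb_fs.items():
        if attr == "PRED":
            continue
        if attr in mother and mother[attr] != value:
            raise ValueError(f"feature clash on {attr}")
        mother[attr] = value
    # (! PRED) = (^ PRED ARG2): N's PRED fills the light verb's second slot.
    mother["PRED"] = light_verb_fs["PRED"].replace("%ARG2", noun_fs["PRED"])
    return mother

noun = {"PRED": "memory<OBJ>", "OBJ": {"PRED": "story"}}
light_verb = {"PRED": "do<SUBJ, %ARG2>", "TNS-ASP": {"ASPECT": "perf"}}

joint = compose_complex_predicate(noun, light_verb)
```

In the result, the OBJ contributed by the noun sits at the top level of the joint predication, next to the composed PRED.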


The above is a very quick characterization. In practice, working with a grammar that contains the Restriction Operator is difficult in comparison to the other grammatical devices that XLE offers. Debugging and grammar maintenance become much more complicated because the Restriction Operator does not "lose" information. Rather, it creates more f-structures, and any constraining equations or other checks for wellformedness can now in principle apply to both the "original" version of the f-structure and the restricted out version. To solve the problems this created, the Urdu grammar, for example, added a feature RESTRICTED + to the grammar to "mark" the f-structures that had been restricted out. Wellformedness constraints are then engineered to only apply to f-structures that are not restricted out.

Generation with Restriction also poses major problems. However, these have recently been addressed in Kaplan and Wedekind (2012).

Finally, Butt and King (2006) note that there are problems in the interaction between passives and complex predicates with the use of the Restriction Operator. The problem was not fully understood in that paper, but it was later solved by Özlem Cetinoglu with respect to the Turkish grammar. The problem is essentially one of ordering and is due to the fact that passives were treated via Lexical Rules (http://ling.uni-konstanz.de/pages/xle/doc/notations.html#N5) as per the classic idea in LFG. In the Turkish and Urdu grammars the Passive Lexical Rule was invoked as part of the lexical entry of a stem. For example, something like `break' would specify via the application of the lexical rule that it could be either active and have a SUBJ and an OBJ, or be passive and have a SUBJ and a NULL or ADJUNCT. This information was passed on through the morphology, which takes care of inflection and derivation, and then entered the syntax as part of the fully inflected verb. However, when causative morphology is added to the mix, the causative morpheme serves to augment the basic a-structure of a stem (usually by adding a causer), and this augmentation needs to happen before any application of passivization. Under the original setup, the causative information would also apply to the subcategorization frame that said `break<SUBJ>' and thus produce unexpected solutions.

The current solution has been to move passivization out of the lexicon listing verb stems and have it apply within the sublexical rules that govern word formation. The ordering problem is thus solved.
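The ordering issue can be made concrete with a deliberately simplified Python sketch. Argument structures are modeled as lists of role names ordered from highest to lowest; the functions causativize and passivize and the role labels are hypothetical stand-ins for illustration and do not reproduce the actual lexical or sublexical rules of the Turkish or Urdu grammars.

```python
from typing import List


def causativize(args: List[str]) -> List[str]:
    """Augment the argument list by adding a causer as the highest argument."""
    return ["causer"] + args


def passivize(args: List[str]) -> List[str]:
    """Suppress the highest (first) argument of the list."""
    return args[1:]


stem = ["agent", "theme"]  # toy a-structure for a stem like 'break'

# Passive applied at the stem, causative added later by the morphology:
passive_first = causativize(passivize(stem))

# Causative augments the stem first, passive applies afterwards:
causative_first = passivize(causativize(stem))
```

In this toy model, passive_first is ['causer', 'theme'] and causative_first is ['agent', 'theme']. The two operations do not commute, which is why the point at which passivization applies matters.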

Link Files from ParGram meetings

References

  • Ahmed, Tafseer and Miriam Butt. 2011. Discovering Semantic Classes for Urdu N-V Complex Predicates. In _Proceedings of the International Conference on Computational Semantics (IWCS 2011)_, Oxford.
  • Alsina, Alex. 1993. _Predicate Composition: A Theory of Syntactic Function Alternations_. PhD thesis, Stanford University.
  • Alsina, Alex. 1996. _The Role of Argument Structure in Grammar_. CSLI Publications.
  • Butt, Miriam. 1995. _The Structure of Complex Predicates in Urdu_. CSLI Publications.
  • Butt, Miriam. 1998. Constraining argument merger through aspect. In: Hinrichs E, Kathol A, Nakazawa T (eds) _Complex Predicates in Nonderivational Syntax_, Academic Press, pp 73–113.
  • Butt, Miriam and Tracy King. 2006. Restriction for Morphological Valency Alternations: The Urdu Causative. In M. Butt, M. Dalrymple and T.H. King (eds.) _Intelligent Linguistic Architectures: Variations on Themes by Ronald M. Kaplan_. CSLI Publications, 235-258. http://ling.uni-konstanz.de/pages/home/butt/main/papers/ron-fest.pdf
  • Butt, Miriam. 2013. Control vs. Complex Predication. _Natural Language and Linguistic Theory_ 32(1):155–190. DOI 10.1007/s11049-013-9217-5
  • Butt, Miriam. 2010. The Light Verb Jungle: Still Hacking Away, In M. Amberber, M. Harvey and B. Baker (eds.) _Complex Predicates in Cross-Linguistic Perspective_, 48–78. Cambridge University Press.
  • Butt, Miriam, Tracy Holloway King, and John T. Maxwell III. 2003. Complex Predicates via Restriction. _Proceedings of LFG03._ CSLI Publications.
  • Kaplan, Ron, and Jürgen Wedekind. 1993. Restriction and Correspondence-based Translation. _Proceedings of the 6th European Conference of the Association of Computational Linguistics_, pp. 193–202.
  • Kaplan, Ron, and Jürgen Wedekind. 2012. LFG Generation by Grammar Specialization. _Computational Linguistics_ 38(4):1-49. DOI: 10.1162/COLI_a_00113
  • Mohanan, Tara. 1994. _Argument Structure in Hindi_. CSLI Publications.


Nouns

This section discusses the main ingredients of the ParGram grammars concerning nouns; here, the focus is on the different syntactic and semantic features (and their possible values) attributed to different noun types as part of the ParGram common feature declaration.

NTYPE

NTYPE registers the syntactic and semantic type of the noun (Butt et al., 1999b). All nouns in the ParGram grammars are encoded with an NTYPE feature. NTYPE is a complex feature that has as its value two other features, NSYN and NSEM, as shown in (1), taken from the ParGram common feature declaration file (common.features.lfg). The notation -> <<[...] specifies that the feature’s value is itself a feature.

(1) NTYPE: -> << [ NSEM NSYN ]. 

The division between NSYN and NSEM is intuitive in that NSYN describes the nominal properties that pertain to the noun’s syntax, and NSEM describes the noun’s semantic properties. These two features are in turn described in the following subsections.

NSYN

The NSYN feature specifies the syntactic type of the noun. The declaration of NSYN is given in (2). The distinction made by NSYN is basic in that it only distinguishes between common nouns, proper nouns, and pronouns. This broad distinction may be relevant for constraining some analyses in the grammars, but it is important to note that different types of common nouns may exhibit patterns quite different from one another (e.g., mass nouns vs. count nouns). This is also true of pronouns (e.g., reflexive pronouns vs. personal pronouns vs. possessive pronouns) as well as proper nouns (e.g., locations vs. person names vs. organizations). Examples of English nouns that the English grammar annotates with NSYN common are given in (3a); different kinds of pronouns, which show up with NSYN pronoun, are in (3b); and nouns with NSYN proper are shown in (3c).

(2) NSYN: -> $ {common pronoun proper}.
(3) 
a. NTYPE NSYN common: all common nouns: table, coffee, love, destruction, mother
b. NTYPE NSYN pronoun: all pronouns: she, him, herself
c. NTYPE NSYN proper: all proper nouns: San Francisco, CSLI Publications, John


NSEM

The NSYN feature offers only a broad distinction in terms of syntactic type. ParGram grammar writers can further specify nouns using the NSEM feature. NSEM subsumes semantic features of nouns; these are usually features that are useful in constraining syntactic constructions, but they may also simply pass information on to applications. (4) shows the feature declaration of NSEM. Again, as shown by the notation, the possible values for NSEM are themselves features: COMMON, NUMBER-TYPE, PROPER, and TIME.

(4) NSEM: -> << [ COMMON NUMBER-TYPE PROPER TIME ].

NSEM COMMON

The NSEM COMMON feature distinguishes different kinds of common (non-proper) nouns. The feature is defined with the range of atomic values given in (5). Some examples of the different types of common nouns are given in (6).

(5) COMMON: -> $ { count gerund mass measure partitive }. 
(6) 
a. NTYPE NSEM COMMON count: all count nouns: table, war, mother
b. NTYPE NSEM COMMON gerund: gerund nouns derived from verbs: (her) doing (the exercise)
c. NTYPE NSEM COMMON mass: mass nouns: coffee, water
d. NTYPE NSEM COMMON measure: measure nouns: (two) kilos, (ten) miles
e. NTYPE NSEM COMMON partitive: partitive nouns: all (of the animals), half (of the people)

NSEM PROPER

The NSEM PROPER feature distinguishes different kinds of proper nouns. These are subdivided because these details tend to be important for applications.

(7) PROPER: -> << [ PROPER-TYPE LOCATION-TYPE NAME-TYPE ].

NSEM PROPER PROPER-TYPE

The specific subtype of proper noun; the range of values is shown in (8).

(8) PROPER-TYPE: -> $ { addr_form location name organization title }.

NSEM PROPER LOCATION-TYPE

The subtype of location; currently, only city and country are assumed here in the common features file, but more may be added in the grammar as needed.

(9) LOCATION-TYPE: -> $ { city country }.

NSEM PROPER NAME-TYPE

Subtype of name; this currently has values for first name and last (family) name, but more may be added in the grammar as needed.

(10) NAME-TYPE: -> $ {first_name last_name }.

NSEM TIME

Subtype of time expression; some of these are proper nouns and some common. This division still needs work since many time expressions are not covered here; in addition, some phrases only get the TIME feature in time expressions (e.g. numbers in digital representations of time) while others get them whenever they occur (e.g. months of the year).

(11) TIME: -> $ { date day hour minute month season second week year}.
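To see how the declarations in (1)-(11) fit together, the following Python sketch encodes them as a table and checks a noun's NTYPE value against it. This is an illustration only and not part of XLE or the common features file: the names DECLARATIONS and validate are hypothetical, tuples stand for the -> << [ ... ] notation (the value is itself a set of features), and frozensets stand for the -> $ { ... } notation (atomic values). NUMBER-TYPE is not declared in this section, so the sketch leaves it unconstrained. The sample analysis of San Francisco as a location of type city is an assumed annotation for the sake of the example.

```python
# Tuples: complex features (value is a feature structure over these features).
# Frozensets: atomic features (value is one of these atoms).
DECLARATIONS = {
    "NTYPE": ("NSEM", "NSYN"),
    "NSYN": frozenset({"common", "pronoun", "proper"}),
    "NSEM": ("COMMON", "NUMBER-TYPE", "PROPER", "TIME"),
    "COMMON": frozenset({"count", "gerund", "mass", "measure", "partitive"}),
    "PROPER": ("PROPER-TYPE", "LOCATION-TYPE", "NAME-TYPE"),
    "PROPER-TYPE": frozenset(
        {"addr_form", "location", "name", "organization", "title"}
    ),
    "LOCATION-TYPE": frozenset({"city", "country"}),
    "NAME-TYPE": frozenset({"first_name", "last_name"}),
    "TIME": frozenset(
        {"date", "day", "hour", "minute", "month", "season", "second", "week", "year"}
    ),
}


def validate(feature, value, path=""):
    """Return a list of error messages for value as the value of feature."""
    here = f"{path} {feature}".strip()
    decl = DECLARATIONS.get(feature)
    if decl is None:
        return []  # not declared in this sketch (e.g. NUMBER-TYPE): unconstrained
    if isinstance(decl, tuple):  # complex feature
        if not isinstance(value, dict):
            return [f"{here}: expected a feature structure"]
        errors = []
        for sub, subvalue in value.items():
            if sub not in decl:
                errors.append(f"{here}: undeclared feature {sub}")
            else:
                errors.extend(validate(sub, subvalue, here))
        return errors
    # atomic feature
    if not isinstance(value, str) or value not in decl:
        return [f"{here}: illegal value {value!r}"]
    return []


# Assumed annotation for 'San Francisco' (illustrative only).
san_francisco = {
    "NSYN": "proper",
    "NSEM": {"PROPER": {"PROPER-TYPE": "location", "LOCATION-TYPE": "city"}},
}
```

Calling validate("NTYPE", san_francisco) returns an empty list, while validate("NTYPE", {"NSYN": "verb"}) reports that verb is not a declared value of NSYN.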


Predicatives

Predicatives have been the subject of many discussions at ParGram meetings over the years. This is partly because there is considerable cross-linguistic variation in these structures, and partly because there is disagreement within LFG theory about how the structures themselves are to be analyzed. Below are some examples of predicatives in English:

  • Sam is the teacher.
  • Sam is old.
  • Sam is in the garden.
  • Sam seems happy.

The discussion in LFG theory is quite complex and will not be repeated here in detail. Overview articles are Dalrymple et al. (2004), Attia (2008) as well as Laczkó (2012). In ParGram, there are three different analyses for such structures:

  • The Single-Tier analysis: the predicate is a sentential head
  • The Double-Tier Open Complement analysis: the predicate is an XCOMP
  • The Double-Tier Closed Complement analysis: the predicate is a PREDLINK

Under the single-tier analysis, it is assumed that the predicative category itself selects for a SUBJ. This is a fitting analysis for languages where the copula can go missing under certain circumstances. The example below is from Japanese.

hon  wa  akai.
book TOP red 
'The book is red.'

In the double-tier open complement analysis, the predicative is an XCOMP (an open complement function), and the copula verb subcategorizes for a SUBJ and an XCOMP. There is functional control between the main clause SUBJ and the XCOMP's SUBJ. This analysis is often applied in cases where there is agreement between the SUBJ and the predicate: agreement comes for free via functional control. It is implemented in, e.g., the English and French grammars. The example below is from French.

Il       est    petit.
he.M.S   be.3   small.M.S 
'He is small.'
Elle       est    petite.
she.F.S    be.3   small.F.S
'She is small.'

In the double-tier closed complement analysis, the predicative is a PREDLINK (a closed complement function), and the copula verb subcategorizes for a SUBJ and a PREDLINK. Agreement can be handled by defining inside-out constraints. The Urdu and Hungarian grammars, for example, use this analysis. The example below is from Urdu.

DabbE     kAr      mEN   hEN
box.M.Pl  car.F.Sg in    be.3.Pl
'The boxes are in the car.'

In short, the analysis you choose for your grammar/language will depend on several syntactic facts: the ability to drop the copula, the agreement facts between the subject and the predicative, and further syntactic observations. For example, the XCOMP analysis predicts that the predicate cannot provide its own SUBJ (since that GF comes in from the matrix clause), so that, e.g., English 'The good thing is that he did not throw the snowball.' cannot be accommodated under an XCOMP analysis of the copula. Make sure you check all of these criteria and choose one analysis, preferably a single one throughout your grammar.
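The structural difference between the three analyses can be sketched with Python dictionaries standing in for f-structures, using 'Sam is old.' as the example. This is a simplified illustration: the PRED values are assumed notations and may differ from what any particular ParGram grammar uses, and the helper has_functional_control is hypothetical. Functional control is modeled as object identity, i.e. the matrix SUBJ and the XCOMP's SUBJ are one and the same structure.

```python
sam = {"PRED": "Sam"}

# Single-tier: the predicative itself is the sentential head and selects a SUBJ.
single_tier = {"PRED": "old<SUBJ>", "SUBJ": sam}

# Double-tier open complement: the copula takes SUBJ and XCOMP, and the
# XCOMP's SUBJ is the very same structure as the matrix SUBJ.
double_tier_open = {
    "PRED": "be<XCOMP>SUBJ",
    "SUBJ": sam,
    "XCOMP": {"PRED": "old<SUBJ>", "SUBJ": sam},
}

# Double-tier closed complement: the copula takes SUBJ and PREDLINK, and the
# PREDLINK has no SUBJ slot filled from the matrix clause.
double_tier_closed = {
    "PRED": "be<SUBJ,PREDLINK>",
    "SUBJ": sam,
    "PREDLINK": {"PRED": "old"},
}


def has_functional_control(fs):
    """True if fs has an XCOMP whose SUBJ is identical to the SUBJ of fs."""
    xcomp = fs.get("XCOMP")
    return (
        xcomp is not None
        and "SUBJ" in fs
        and xcomp.get("SUBJ") is fs["SUBJ"]
    )
```

Because the XCOMP's SUBJ is identical to the matrix SUBJ, any agreement feature stated on one is automatically present on the other. The same identity is what leaves no room for a clausal predicate that brings its own SUBJ, whereas the PREDLINK structure imposes no such requirement.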

References

  • Attia, Mohammed. 2008. A Unified Analysis of Copula Constructions in LFG. In Miriam Butt and Tracy Holloway King (eds.), Proceedings of the LFG08 Conference, pages 89–108, CSLI Publications.
  • Dalrymple, Mary, Dyvik, Helge and King, Tracy Holloway. 2004. Copular Complements: Closed or Open? In Miriam Butt and Tracy Holloway King (eds.), Proceedings of the LFG04 Conference, pages 188–198, CSLI Publications.
  • Laczkó, Tibor. 2012. On the (Un)Bearable Lightness of Being an LFG Style Copula in Hungarian. In Miriam Butt and Tracy Holloway King (eds.), Proceedings of the LFG12 Conference, pages 341–361.