Main Page

From Parallel Grammar Wiki
Revision as of 10:49, 12 January 2023 by Jessica.2.zipf

ParGramWiki

Welcome to ParGramWiki, a wiki for documenting the ParGram project and its process. The wiki is permanently under construction. If you would like to have a user account, please contact Jessica Zipf.

Useful Links

  • ParGram Homepage
  • ParGram Workspace: Here you can find meeting notes, slides and other material from past ParGram meetings, along with a search interface. The site was implemented by Anja Leiderer. If you would like access to this site (it is currently password-protected), please contact Jessica Zipf.
  • ParGram Starter Grammars: Some of the knowledge accumulated in the ParGram effort over the years has been included in the XLE documentation in the form of a Starter Grammar, a Walk Through, and files containing the features and templates that the grammars have developed and use in common. Grammar writers who are just beginning work on a new language will find this repository of information helpful. In particular, the common features and conventions developed within ParGram are explained as part of the Starter Grammar.
  • Starter Grammar, Walk Through and some useful tips: http://ling.uni-konstanz.de/pages/xle/doc/
  • Common Features & Common Templates: For files, see bottom of the page.

ParGram Topics

Members of ParGram have been meeting regularly since 1995 and have come together in so-called _Feature Committee_ meetings, in which analyses across grammars are compared and discussed. Wherever possible, common analyses and naming conventions are agreed upon. Part of the cross-linguistic grammar engineering knowledge accumulated in this way was documented in the Grammar Writer's Cookbook (http://www.stanford.edu/group/cslipublications/cslipublications/site/1575861704.shtml). The material in this wiki is an effort to share with the wider community further knowledge that we have established together over the years.

Here are some discussions and information about ParGram topics (in alphabetical order).

Links to ParGram Groups

Here are some links to the Wikis or sites of individual grammar groups. It might be useful to check out languages that are similar to the one you wish to work on (or are working on). If you would like to obtain a particular ParGram grammar, you should contact the groups directly. For example, the Polish grammar is available under the GNU General Public License (version 3).

XLE

XLE consists of cutting-edge algorithms for parsing and generating Lexical Functional Grammars (LFGs) along with a rich graphical user interface for writing and debugging such grammars. It is the basis for the ParGram project, which is developing industrial-strength grammars for English, French, German, Norwegian, Japanese, and Urdu. XLE is written in C and uses Tcl/Tk for the user interface. It currently runs on Solaris Unix, Linux, and Mac OS X.

More information on XLE including its availability can be found on the XLE homepage at: http://ling.uni-konstanz.de/pages/xle/

XLE-Web Interface

The XLE-Web Interface allows access to several ParGram grammars. One very good way of gaining an understanding of how different phenomena are treated within ParGram is to go to this website and parse example sentences online.

The XLE-Web interface along with several ParGram grammars is here:

ParGramBank

ParGramBank is a collection of parallel treebanks currently involving ten languages from six language families. All treebanks included in ParGramBank are constructed using output from individual ParGram grammars. The grammars produce output that is maximally parallelized across languages and language families. This output forms the basis of a parallel treebank covering a diverse set of phenomena.

The treebank is publicly available via the INESS treebanking environment, which also allows for the alignment of language pairs. ParGramBank is a unique, multilayered parallel treebank: it covers more languages, and more different types of languages, than other treebanks; it represents deep linguistic knowledge; and it allows sentences to be aligned at several levels (dependency structures, constituency structures and POS information).

ParGramBank can be accessed and downloaded for free via the INESS treebanking infrastructure:


Testsuites

This page lists some interesting testsuite resources. Thanks go to Emily Bender and Dan Flickinger for naming some of the resources.

TSNLP testsuites

Testsuites put together by the TSNLP project. They are very linguistically principled but cover only a small range of languages:

http://www.dfki.de/lt/project.php?id=Project_380&l=en

(This is apparently no longer available for download. It might be worth contacting the people behind the project for a copy of the testsuites created.)

The TSNLP testsuite is available from the ELRA catalogue:

http://islrn.org/resources/717-350-913-018-8/

The Konstanz site has obtained a copy via ELRA. Contact Sebastian Sulger for instructions on how to license the TSNLP package.

MRS testsuite (DELPH-IN)

There is also the MRS testsuite, created by DELPH-IN. It started as a resource for English and has since been translated into a few other languages. Its focus is on illustrating core semantic phenomena:

http://moin.delph-in.net/MatrixMrsTestSuite

This testsuite is also included, for several languages, in the [incr tsdb()] software package (http://www.delph-in.net/itsdb/), but a more comprehensive collection is available online via the link above.

I have compiled a package of MatrixMRS testsuites for multiple languages here:

http://ling.uni-konstanz.de/pages/home/sulger/files/MatrixMRSTestSuite.tar.gz

The testsuites have varying formats, since the source page presented them in differing formats.

Another semantically-oriented testsuite (to augment the MRS testsuite above)

Recent work on documenting the semantic analyses in the English Resource Grammar has led to another semantically-oriented testsuite, to augment the MRS testsuite. This one is monolingual, though.

http://moin.delph-in.net/ErgSemantics

http://svn.emmtee.net/trunk/uio/wesearch/esd.txt

FraCaS test suite

The FraCaS test suite (1996) focuses on linguistic phenomena related to logical inference. It is described in a technical report by Cooper et al. (1996); here is the reference and a link to the paper:

Robin Cooper, Dick Crouch, Jan Van Eijck, Chris Fox, Josef Van Genabith, Jan Jaspars, Hans Kamp, Manfred Pinkal, David Milward, Massimo Poesio, and Steve Pulman. 1996. Using the Framework. Technical report, FraCaS: A Framework for Computational Semantics. FraCaS deliverable D16.

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.45.7694&rep=rep1&type=pdf

An XML version of this data is available for download from Bill MacCartney at this website:

http://www-nlp.stanford.edu/~wcmac/downloads/

The FraCaS data was also used for a bilingual English-Swedish test suite by Peter Ljunglöf and Magdalena Siverbo, described in this paper:

http://gup.ub.gu.se/records/fulltext/168967/168967.pdf

HP NL Testsuite

Hewlett-Packard Natural Language Testsuite, originally by Dan Flickinger, Marilyn Friedman, Mark Gawron, John Nerbonne, Carl Pollard, Geoffrey Pullum, Ivan Sag, and Tom Wasow.

http://www.ual.es/personal/nperdu/hpsuite.htm

I put a copy of that testsuite up here, in case the above link stops working:

http://ling.uni-konstanz.de/pages/home/sulger/files/hp-nl-testsuite.txt

Testsuites inspired by Linguist Fieldwork

Wayan Arka says the following:

Field linguists typically have a kind of opportunistic data collection techniques in the field but they often use Lingua questionnaires, or make use of the available elicitation materials e.g. created by the MPI (http://fieldmanuals.mpi.nl/), or create their own elicitation materials.

The attached questionnaires created for my NSF-funded Voice project seem to look like test suites that we use in ParGram. As you will see, they have English and Indonesian for each item. We can adapt the questionnaires, if you like.

I put up the testsuites sent by Wayan below.

Questions

Information-seeking wh-questions are dealt with via functional-uncertainty paths. The wh-item is considered to be focused and this is represented explicitly at f-structure. The wh-item is placed into a (long-distance) dependency with the grammatical function (e.g., OBJ, OBL or ADJUNCT) it represents. F-structure examples are provided below from the English grammar for:

(1) Who does David like?
(2) Who does David think Mary likes?
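The resolution of a functional-uncertainty path can be sketched in a few lines of Python. This is a toy model under stated assumptions, not XLE's algorithm: f-structures are plain dicts, the hypothetical ARGS attribute stands in for the argument list of a PRED such as 'like<SUBJ,OBJ>', and only the governed functions OBJ and OBL are considered as the end of the path (ADJUNCT is left out because it is not subcategorized). The function enumerates paths in the regular language COMP* GF that end at a governed but unfilled function, and then shares the wh-item between FOCUS-INT and that function.

```python
from typing import Dict, List, Tuple

# Toy f-structure: attribute -> value (a nested dict or an atomic string).
FStructure = Dict[str, object]
Path = Tuple[str, ...]

# Hypothetical: functions that may end the path in this sketch.
# ADJUNCT is omitted because no PRED subcategorizes for it.
TARGET_GFS = ("OBJ", "OBL")


def uncertainty_paths(f: FStructure, prefix: Path = ()) -> List[Path]:
    """Enumerate paths matching COMP* GF that end at a governed but unfilled GF."""
    found: List[Path] = []
    args = f.get("ARGS", ())  # hypothetical stand-in for the PRED's arguments
    for gf in TARGET_GFS:
        if gf in args and gf not in f:
            found.append(prefix + (gf,))
    comp = f.get("COMP")
    if isinstance(comp, dict):
        # Take one more COMP step (the Kleene star in COMP*).
        found.extend(uncertainty_paths(comp, prefix + ("COMP",)))
    return found


def link_focus(f: FStructure, wh: FStructure) -> FStructure:
    """Install wh as FOCUS-INT and share it with the unique resolved GF."""
    paths = uncertainty_paths(f)
    if len(paths) != 1:
        raise ValueError(f"expected exactly one resolution, got {paths}")
    *comps, gf = paths[0]
    target = f
    for step in comps:
        target = target[step]
    target[gf] = wh          # the same object: structure sharing
    f["FOCUS-INT"] = wh
    return f


# (1) Who does David like?
s1 = {"PRED": "like", "ARGS": ("SUBJ", "OBJ"), "SUBJ": {"PRED": "David"}}
# (2) Who does David think Mary likes?
s2 = {
    "PRED": "think", "ARGS": ("SUBJ", "COMP"), "SUBJ": {"PRED": "David"},
    "COMP": {"PRED": "like", "ARGS": ("SUBJ", "OBJ"), "SUBJ": {"PRED": "Mary"}},
}

print(uncertainty_paths(s1))  # [('OBJ',)]
print(uncertainty_paths(s2))  # [('COMP', 'OBJ')]

who = {"PRED": "who"}
link_focus(s2, who)
print(s2["COMP"]["OBJ"] is s2["FOCUS-INT"])  # True
```

For (1) the only solution is OBJ; for (2) it is COMP OBJ, the OBJ of the embedded clause. A real grammar additionally relies on completeness and coherence to rule out unwanted solutions.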

Furthermore, the f-structure distinguishes between the features FOCUS-INT and PRON-INT. FOCUS-INT registers the item that is taken to be in focus, whereas PRON-INT registers the wh-item itself. In examples (1) and (2), the two are identical. A difference emerges, however, in examples involving pied-piping, such as (3) to (5):

(3) Whose book did you read?
(4) Which book did you read?
(5) How old is the president?
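The split between FOCUS-INT and PRON-INT under pied-piping can be illustrated with another toy Python sketch. It assumes, hypothetically, that the wh-word carries PRON-TYPE int and that adjuncts are stored as lists (standing in for LFG's set-valued ADJUNCT). The attribute names used for the embedding (SPEC, POSS, DET, ADJUNCT) are illustrative and not necessarily those of the English grammar. The function returns the whole focused phrase as FOCUS-INT and searches inside it for the wh-word to use as PRON-INT.

```python
from typing import Any, Optional


def find_pron_int(phrase: Any) -> Optional[dict]:
    """Depth-first search for the interrogative pronoun inside a focused phrase."""
    if isinstance(phrase, dict):
        if phrase.get("PRON-TYPE") == "int":  # assumed marking of wh-words
            return phrase
        children = list(phrase.values())
    elif isinstance(phrase, list):
        children = phrase  # e.g. the members of an ADJUNCT set
    else:
        return None  # atomic value such as "sg"
    for child in children:
        hit = find_pron_int(child)
        if hit is not None:
            return hit
    return None


def int_features(focus_int: dict) -> dict:
    """Return FOCUS-INT and PRON-INT for a (possibly pied-piped) wh-phrase."""
    return {"FOCUS-INT": focus_int, "PRON-INT": find_pron_int(focus_int)}


# (1) Who does David like?  FOCUS-INT and PRON-INT are the same f-structure.
who = {"PRED": "who", "PRON-TYPE": "int"}
# (3) Whose book did you read?  The possessor is pied-piped with "book".
whose_book = {"PRED": "book", "SPEC": {"POSS": {"PRED": "whose", "PRON-TYPE": "int"}}}
# (4) Which book did you read?
which_book = {"PRED": "book", "SPEC": {"DET": {"PRED": "which", "PRON-TYPE": "int"}}}
# (5) How old is the president?  "how" modifies "old".
how_old = {"PRED": "old", "ADJUNCT": [{"PRED": "how", "PRON-TYPE": "int"}]}

for phrase in (who, whose_book, which_book, how_old):
    feats = int_features(phrase)
    print(feats["FOCUS-INT"]["PRED"], "/", feats["PRON-INT"]["PRED"])
```

Only for (1) do the two features point to the same f-structure; in the pied-piping cases PRON-INT is embedded inside FOCUS-INT.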

Non Latin Scripts

XLE supports all types of scripts. The relevant parts of the XLE documentation are the sections on Emacs and non-ASCII character sets (http://ling.uni-konstanz.de/pages/xle/doc/xle.html#SEC3) and Character Encodings (http://ling.uni-konstanz.de/pages/xle/doc/xle.html#SEC23).

Here is an example of how the Georgian grammar is set up.

In the configuration file of the grammar, add the following line:

 CHARACTERENCODING utf-8.

This tells XLE to expect UTF-8.

If you use Emacs to write your grammars, you can add the following line at the very top of the main grammar file (e.g., georgian.lfg); the surrounding double quotes make the line a comment as far as XLE is concerned:

   ";;; -*- Encoding: utf-8 -*-"

This tells Emacs that the file is in UTF-8.

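However, Emacs can be tricky about UTF-8, so you might also want to create a .emacs file and put the following information in it:

 ;; Use UTF-8 as the default language environment
 (set-language-environment "UTF-8")
 ;; Decode and encode the I/O of subprocesses whose program name
 ;; matches "xle", "shell" or "slime" as UTF-8
 (setq process-coding-system-alist '(("xle" utf-8 . utf-8)
    ("shell" utf-8 . utf-8)
    ("slime" utf-8 . utf-8)))
 ;; Default coding systems for the I/O of all other subprocesses
 (setq default-process-coding-system '(utf-8 . utf-8))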

If you already have a .emacs file, then simply add this information.

Another option is to avoid Emacs and write your testsuite and grammar in a more UTF-8-friendly editor. In this case, you can access your test sentences via the "parse-testfile" command. For more information on this command, type the following while in XLE:

 % help parse-testfile

The variant needed here takes two arguments: the name of the testsuite file and the number of the sentence to be parsed. For example:

 % parse-testfile your-testsuite.lfg 3
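Because XLE expects UTF-8 once CHARACTERENCODING utf-8 is set, it can be worth checking that a testsuite saved from another editor really is UTF-8 before parsing it. The following Python sketch is not part of XLE; it is a stand-alone helper (hypothetical name first_non_utf8_line) that reports the first line of a file that fails to decode as UTF-8. Decoding line by line is safe here because no byte of a multi-byte UTF-8 sequence can be a newline.

```python
import sys
from typing import Optional


def first_non_utf8_line(path: str) -> Optional[int]:
    """Return the 1-based number of the first line that is not valid UTF-8,
    or None if the whole file decodes cleanly."""
    with open(path, "rb") as handle:  # read raw bytes, no implicit decoding
        for number, raw in enumerate(handle, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError:
                return number
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_utf8.py your-testsuite.lfg
    target = sys.argv[1]
    bad = first_non_utf8_line(target)
    if bad is None:
        print(f"{target}: valid UTF-8")
    else:
        print(f"{target}: line {bad} is not valid UTF-8")
```

If a line is reported, re-save the file as UTF-8 in your editor before loading it with parse-testfile.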

Predicatives

Predicatives have been the subject of many discussions at ParGram meetings over the years. This is partly because there is considerable cross-linguistic variation in these structures, and partly because there is disagreement within LFG theory about how the structures themselves should be analyzed. Below are some examples of predicatives in English:

  • Sam is the teacher.
  • Sam is old.
  • Sam is in the garden.
  • Sam seems happy.

The discussion in LFG theory is quite complex and will not be repeated here in detail. Overview articles are Dalrymple et al. (2004), Attia (2008) as well as Laczkó (2012). In ParGram, there are three different analyses for such structures:

  • The Single-Tier analysis: the predicate is a sentential head
  • The Double-Tier Open Complement analysis: the predicate is an XCOMP
  • The Double-Tier Closed Complement analysis: the predicate is a PREDLINK
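The three analyses can be compared side by side as toy f-structures for "Sam is old." The Python dicts below are simplified illustrations of the descriptions in this section (PRED values are plain strings, and features are kept to a minimum), not output from any ParGram grammar. Functional control in the open-complement version is modelled as one shared object rather than two equal copies.

```python
# Toy f-structures for "Sam is old." under the three ParGram analyses.
# PRED strings are simplified; real grammars use semantic forms.
sam = {"PRED": "Sam", "PERS": "3", "NUM": "sg"}

# Single-tier: the predicative itself selects the SUBJ; it is the clausal head.
single_tier = {"PRED": "old<SUBJ>", "SUBJ": sam}

# Double-tier open complement: the copula takes a SUBJ and an XCOMP,
# and the XCOMP's SUBJ is the matrix SUBJ (functional control).
open_comp = {"PRED": "be<SUBJ,XCOMP>", "SUBJ": sam,
             "XCOMP": {"PRED": "old<SUBJ>", "SUBJ": sam}}

# Double-tier closed complement: the PREDLINK has no SUBJ of its own.
closed_comp = {"PRED": "be<SUBJ,PREDLINK>", "SUBJ": sam,
               "PREDLINK": {"PRED": "old"}}

# Functional control means token identity, not merely equal values.
print(open_comp["XCOMP"]["SUBJ"] is open_comp["SUBJ"])  # True
print("SUBJ" in closed_comp["PREDLINK"])                # False
```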

Under the single-tier analysis, it is assumed that the predicative category itself selects for a SUBJ. This is a fitting analysis for languages where the copula can go missing under certain circumstances. The example below is from Japanese.

hon  wa  akai.
book TOP red 
'The book is red.'

In the double-tier open complement analysis, the predicative is an XCOMP (an open complement function), and the copula verb subcategorizes for a SUBJ and an XCOMP. The main clause SUBJ and the XCOMP's SUBJ are linked by functional control. This analysis is often applied in cases where the SUBJ and the predicate agree, since agreement then comes for free via functional control. It is implemented in, e.g., the English and French grammars. The example below is from French.

Il       est    petit.
he.M.S   be.3   small.M.S 
'He is small.'
Elle       est    petite.
she.F.S    be.3   small.F.S
'She is small.'
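Why agreement comes for free under functional control can be shown with a small unification sketch in Python. It is a deliberate simplification: feature structures are flat dicts, PREDs are ignored, and the feature names GEND and NUM are assumptions based on the glosses above. Because the XCOMP's SUBJ is identified with the matrix SUBJ, whatever the adjective requires of its own SUBJ must unify with the features of il or elle, so the unattested *Il est petite fails.

```python
from typing import Dict, Optional

Features = Dict[str, str]


def unify(a: Features, b: Features) -> Optional[Features]:
    """Unify two flat feature structures; return None on a value clash."""
    result = dict(a)
    for attr, value in b.items():
        if attr in result and result[attr] != value:
            return None  # e.g. GEND m vs GEND f
        result[attr] = value
    return result


# Hypothetical lexical information, following the glosses above.
SUBJECTS = {"il": {"GEND": "m", "NUM": "sg"}, "elle": {"GEND": "f", "NUM": "sg"}}
# What each adjective requires of *its own* SUBJ inside the XCOMP.
ADJ_SUBJ = {"petit": {"GEND": "m", "NUM": "sg"}, "petite": {"GEND": "f", "NUM": "sg"}}


def copula_clause(subj: str, adj: str) -> Optional[Features]:
    """Matrix SUBJ = XCOMP SUBJ, so both descriptions must unify."""
    return unify(SUBJECTS[subj], ADJ_SUBJ[adj])


for subj, adj in [("il", "petit"), ("elle", "petite"), ("il", "petite")]:
    status = "ok" if copula_clause(subj, adj) is not None else "agreement clash"
    print(f"{subj} est {adj}: {status}")
```

No separate agreement rule is needed: the clash follows from the single shared SUBJ.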

In the double-tier closed complement analysis, the predicative is a PREDLINK (a closed complement function), and the copula verb subcategorizes for a SUBJ and a PREDLINK. Agreement can be handled by defining inside-out constraints. The Urdu and Hungarian grammars, for example, use this analysis. The example below is from Urdu.

DabbE     kAr      mEN   hEN
box.M.Pl  car.F.Sg in    be.3.Pl
'The boxes are in the car.'
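Under the PREDLINK analysis the predicative has no SUBJ of its own, so an agreement constraint has to look outward from the PREDLINK to the clause that contains it, as in the LFG inside-out designator ((PREDLINK ↑) SUBJ NUM). The Python sketch below imitates this by giving each toy f-structure a hypothetical parent pointer; the helper name inside_out and the attribute names are illustrative only. For illustration the path is resolved from the PREDLINK of the Urdu example; a PP such as 'in the car' does not itself agree, but an agreeing predicative would state its constraint from the same position.

```python
from typing import Optional


class FStruct(dict):
    """Toy f-structure that remembers which f-structure contains it, and where."""

    def __init__(self, features: Optional[dict] = None):
        super().__init__()
        self.parent: Optional["FStruct"] = None
        self.attr_in_parent: Optional[str] = None
        for attr, value in (features or {}).items():
            self[attr] = value  # goes through __setitem__ below

    def __setitem__(self, attr, value):
        if isinstance(value, FStruct):
            value.parent, value.attr_in_parent = self, attr
        super().__setitem__(attr, value)


def inside_out(f: FStruct, attr: str) -> Optional[FStruct]:
    """(attr f): the f-structure in which f is the value of attr, if any."""
    if f.parent is not None and f.attr_in_parent == attr:
        return f.parent
    return None


# DabbE kAr mEN hEN 'The boxes are in the car.'
boxes = FStruct({"PRED": "box", "GEND": "m", "NUM": "pl"})
car = FStruct({"PRED": "car", "GEND": "f", "NUM": "sg"})
in_car = FStruct({"PRED": "in<OBJ>", "OBJ": car})
clause = FStruct({"PRED": "be<SUBJ,PREDLINK>", "SUBJ": boxes, "PREDLINK": in_car})

# Resolve ((PREDLINK ↑) SUBJ NUM) starting from the PREDLINK f-structure.
mother = inside_out(in_car, "PREDLINK")
print(mother["SUBJ"]["NUM"])  # pl
```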

In short, the analysis you choose for your grammar/language will depend on several syntactic facts: whether the copula can be dropped, the agreement facts between the subject and the predicative, and further syntactic observations. For example, the XCOMP analysis predicts that the predicate cannot provide its own SUBJ (since that GF comes in from the matrix clause), so that, e.g., English 'The good thing is that he did not throw the snowball.' cannot be accommodated under an XCOMP analysis of the copula. Check all of these criteria before choosing an analysis, and preferably use a single analysis throughout your grammar.

References

  • Attia, Mohammed. 2008. A Unified Analysis of Copula Constructions in LFG. In Miriam Butt and Tracy Holloway King (eds.), Proceedings of the LFG08 Conference, pages 89–108. CSLI Publications.
  • Dalrymple, Mary, Helge Dyvik and Tracy Holloway King. 2004. Copular Complements: Closed or Open? In Miriam Butt and Tracy Holloway King (eds.), Proceedings of the LFG04 Conference, pages 188–198. CSLI Publications.
  • Laczkó, Tibor. 2012. On the (Un)Bearable Lightness of Being an LFG Style Copula in Hungarian. In Miriam Butt and Tracy Holloway King (eds.), Proceedings of the LFG12 Conference, pages 341–361. CSLI Publications.