SpringMeeting2011Abstracts

Total Parser Combinators Using Mixed Induction and Coinduction

Nils Anders Danielsson

Abstract:

Parser combinators provide an elegant method for implementing parsers. However, most parser combinator libraries fail to guarantee that parsing will terminate. I will talk about a library which provides such a guarantee. The library's interface is similar to that of many other parser combinator libraries, with three main differences: one is that the interface clearly specifies which parts of the constructed parsers may be infinite, and which parts have to be finite, using dependent types and a combination of induction and coinduction; another is that the interface allows many forms of left recursion; and the last is that the parser type is unusually informative.

The GF Grammar Development Tools

Thomas Hallgren

Abstract:

Translation systems in MOLTO are based on multilingual grammars written in GF (Grammatical Framework). The traditional environment available to GF grammar developers consist of the GF command shell, a generic text editor and the GF documentation. This is a simple and effective environment for the experienced grammar developer. To better support less experienced grammar developers, one of the goals of the MOLTO project is to create an IDE (Integrated Development Environment) for GF grammar development, and an appropriate GF compiler API to support the IDE. In this talk we will first give a summary of the main functionalities of the traditional GF tools: grammar compilation, error detection, testing, and visualization. Then we will show a prototype of a web-based grammar development environment and briefly discuss the wide range of options available when building a GF IDE, and how this might affect what you require from the GF compiler API. The web-based environment will enable the creation of web-based translation systems without installation of any software. It gives the quickest conceivable access to GF to novice and occasional users. But the API is general enough even to serve power users, who run the tools on their own favourite desktop environment but want to profit from IDE facilities such as project management and library browsing.

GF into the Wild

Krasimir Angelov

Abstract:

So far GF was used only for parsing small controlled languages but the improvements in the parsing performance in the last few years made it possible to dream about parsing open domain unrestricted text. In the resource libraries, we already have wide coverage grammars for many languages but having a grammar is only part of the problem. Even if we improve more and more our grammars it will be always possible to find syntactic constructions which are not covered by the grammar. Another problem is that when we add more syntactic constructions in the grammar, this usually makes it more ambiguous. We need a parser that is robust and is able to do statistical ranking when there are ambiguities in the grammar. I did some preliminary expreriments in robust parsing but the general conclusion is that we need a good treebank which we can use for statistical training. Since we don't want to build our own treebanks an attractive alternative is to try to convert some existing one. In this talk I will present the current state of the GF port of Penn Treebank. All parse trees in the treebank were converted to abstract syntax trees for the English Resource Grammar. When there are unknown syntactic constructions then we just leave placeholders in the abstract tree. Currently we have matched 69% of the constructions with the grammar. More is possible but takes time.

Competitive Group Testing with Minimum Adaptivity

Azam Sheikh Muhammad

Abstract:

Suppose that given is a collection of $n$ elements where $d$ of them are \emph{defective}. We can query an arbitrarily chosen subset of elements which returns Yes if the subset contains at least one defective and No if the subset is free of defectives. The problem of group testing is to identify the defectives with a minimum number of such queries. By the information-theoretic lower bound at least $\log_2 \binom {n}{d} \approx d\log_2 (\frac{n}{d}) \approx d\log_2 n$ queries are needed. Using adaptive group testing, i.e., asking one query at a time, the lower bound can be easily achieved. However, strategies are preferred that work in a fixed small number of stages, where queries in a stage are asked in parallel. A group testing strategy is called \emph{competitive} if it works for completely unknown $d$ and requires only $O(d\log_2 n)$ queries. Usually competitive group testing is based on sequential queries. We have shown that actually competitive group testing with expected $O(d\log_2 n)$ queries is possible in only $2$ or $3$ stages. Then we have focused on minimizing the hidden constant factor in the query number and proposed a systematic approach for this purpose.