Linguistic typology studies what is universal in the world's approximately 6,000 languages, as well as their potential for diversity. A major challenge is data comparability. Canonical Typology is a new method that solves this problem by treating linguistic features as multidimensional. We will enhance a dataset on grammatical agreement in 15 languages, adding new information on dimensions of variation, and analyze the dataset to determine how much of the possible space for variation is actually used.
A long-term aim of linguistic typology is a definitive list of features constituting the universal inventory from which human languages draw, akin to the chemist's table of elements, or the physicist's account of particles (Corbett 2012). There are two fundamental problems in achieving this goal: (1) the difficulty of comparing data across a wide range of languages; (2) the difficulty of determining the limits on variation. Data comparability, which has come increasingly to the fore in recent years, is an issue even for an apparently basic notion such as 'past tense' (Dahl & Velupillai 2011). This first problem can be resolved by using the Canonical Typology method (CT) (Brown & Chumakina 2013). The limits on variation, by contrast, have received little attention and constitute an entire research programme in their own right. The aim of this project is to tackle this second problem in the area of grammatical agreement.
More specifically, we aim to examine the limits on variation with respect to a subset of CT criteria, applied to the data points in the Surrey Database of Agreement. In other words, we ask why languages exhibit the agreement patterns that they do, staying within certain empirical boundaries rather than exhausting the logical space of variation.
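
As a rough illustration of what measuring "how much of the logical space is used" could involve, the sketch below assumes that each data point is scored as canonical (1) or non-canonical (0) on each criterion. The criterion names, the binary coding, and the toy data points are all hypothetical; none of them are taken from the Surrey Database of Agreement or from the CT criteria themselves.

```python
from itertools import product

# Hypothetical criteria; assumed binary (1 = canonical, 0 = non-canonical).
CRITERIA = ("criterion_a", "criterion_b", "criterion_c")


def logical_space(n_criteria):
    """All logically possible combinations of binary criterion values."""
    return set(product((0, 1), repeat=n_criteria))


def coverage(data_points, n_criteria):
    """Return (attested, unattested, proportion of the logical space attested)."""
    space = logical_space(n_criteria)
    attested = set(data_points) & space   # duplicates collapse to one combination
    unattested = space - attested
    return attested, unattested, len(attested) / len(space)


# Toy data points, invented for illustration only.
toy_points = [(1, 1, 1), (1, 1, 0), (1, 0, 0), (1, 1, 1)]

attested, unattested, proportion = coverage(toy_points, len(CRITERIA))
total = len(attested) + len(unattested)
print(f"{len(attested)} of {total} combinations attested ({proportion:.1%})")
```

Under these assumptions, the unattested combinations are the part of the logical space that the data do not occupy, which is where questions about the limits on variation would begin.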