STA 290 Seminar: Stefan Gries (UC Santa Barbara)

Joint Statistics – Linguistics Seminar


This is the second talk of the joint colloquium. The first talk was hosted by Linguistics on Wednesday, April 9th, at 4:10pm in 53A Olson.

Thursday, April 10th, 2014 at 4:10pm, MSB 1147 (Colloquium Room)
Join us for refreshments at 3:30pm in MSB 1147 (Colloquium Room)

Speaker: Stefan Gries, UC Santa Barbara

Title: "Quantitative methods in corpus linguistics: examples and requests for feedback"

Abstract: Corpus linguistics is a linguistic methodology based on the exploration/analysis of (ideally) large amounts of data retrieved from textual databases. Even though this method has been around for several decades and even though it is by definition a quantitative approach, it is only in the last 10 years or so that increasingly advanced quantitative methods have become widespread. As a result, the field is still very much in flux as practitioners try to develop suitable methods to deal with large amounts of often very noisy and interrelated data. In this talk, I will present several problems that corpus linguists have (begun) to deal with in recent work, with an eye to (i) giving an overview of current considerations in corpus linguistics and (ii) getting feedback on some suggestions that have been made. The areas/questions I will discuss involve (depending on time):

  • how to quantify the association/repulsion between words in corpora: the roles of directionality and diversity of association;
  • how to quantify the distribution of words or syntactic patterns in corpora;
  • how to determine the internal homogeneity of corpora;
  • how to identify stages in temporally organized corpora;
  • how to explore differences between speaker populations with regard to their linguistic behavior.