Registration open for Corpus Linguistics Summer School (CCR, University of Birmingham)

Dear all,

Registration is now open for the CCR Summer School 2016:

http://www.birmingham.ac.uk/research/activity/corpus/events/corpus-linguistics-summer-school.aspx

Centre for Corpus Research, University of Birmingham 

Corpus Linguistics Summer School 2016

 

20-24 June 2016

 

The summer school is open to undergraduate, postgraduate, and doctoral students, as well as researchers who want to improve their skills in applying corpus methods to their own research. The programme combines presentations on cutting-edge research with practical hands-on sessions. Participants will also have the opportunity to present their own work and receive feedback from our expert team. Given the specialised nature of the programme, a basic understanding of corpus linguistics is required.

 

Topics of sessions include:

 

·      Standard corpus tools and specialised software

·      Collocations, patterns and networks

·      Statistics (with R) in corpus linguistics

·      Corpus Stylistic Analysis methods

·      Python essentials for corpus linguists

·      Corpus linguistics and mixed methods approaches

 

A detailed programme will be published closer to the time.

Teachers of the summer school:

 

Gareth Carrol, Lecturer in Psycholinguistics, University of Birmingham
Johan de Joode, Research Fellow in Corpus Linguistics, University of Birmingham
Stefan Evert, Professor of Corpus Linguistics, FAU Erlangen-Nürnberg, Germany
Stefan Th. Gries, Professor of Linguistics, University of California, US
Susan Hunston, Professor of English Language, University of Birmingham
Andrew Kehoe, Associate Professor, School of English, Birmingham City University 
Michaela Mahlberg, Professor of Corpus Linguistics, University of Birmingham
Lorenzo Mastropierro, Teaching Fellow, University of Birmingham 
Pablo Ruano, Teaching Assistant, University of Extremadura, Spain
Simon Preston, Assistant Professor, School of Mathematical Sciences, University of Nottingham
Paul Thompson, Senior Lecturer in Corpus Linguistics, University of Birmingham
Viola Wiegand, Research Assistant, University of Birmingham
 

 

Research Centres A – E (history; materials from corpus-linguistics.com, 24.04.2004)

 
Corpus Linguistics: Research / Organizations: A – E
 




ACL – Association for Computational Linguistics

Coordination: ACL – Association for Computational Linguistics; President (2003): Mark Johnson
Design/Purpose: “The Association for Computational Linguistics
is THE international scientific and professional society for people
working on problems involving natural language and computation.”
Projects:
  • Journal Computational Linguistics
Newsletter: no
Notes: Archives of the ACL are now available

CETH – Center for Electronic Texts in the Humanities

Coordination: Brian Hancock, Rutgers University
Design/Purpose: Production of eTexts in the Humanities
Projects:
  • Spectator Project
  • Caesar with Concordance
  • Electronic Journal of Boundary Elements
  • and more
Newsletter: no
Notes:

CHILDES – Child Language Data Exchange System

Coordination: CHILDES, Carnegie Mellon University, U.S.
Design/Purpose: “The CHILDES system provides tools for studying
conversational interactions. These tools include a database of transcripts,
programs for computer analysis of transcripts, methods for linguistic
coding, and systems for linking transcripts to digitized audio and
video.”
Projects:
  • Talkbank
Newsletter: no, but see their overview of the CHILDES system (PDF format)
Notes: “The data, programs, and manuals are freely accessible
over the net and can also be obtained on CD-ROM”

Collins Cobuild

Coordination: “Cobuild is a department of HarperCollins Publishers
[…] based at the University of Birmingham, UK”
Design/Purpose: “[S]ince 1980, we have carried out research into
corpus-based lexicography. Throughout the 1980s, following the computational
corpus-based approach to language analysis developed by Professor
John Sinclair, Cobuild built up a large corpus of modern English,
software tools to manipulate and analyse the corpus data, and a
team of specialist corpus linguists and lexicographers.”
Projects:
  • Bank of English
  • Wordbanks Online
Newsletter: no
Notes:

Computational Linguistics Group @ Oxford

Coordination: Professor Stephen Pulman
Design/Purpose: “The CLG comprises academics and graduate students
(all of the latter are working toward a doctorate (D.Phil)) and
is led by Professor Stephen Pulman.

Our website contains information about the people in the group,
the work in progress here, and CLG news – the latter is part of
our forum/discussion-board which is open to all.”
Projects:
  • Grammar Learning Using Inductive Logic Programming
  • Intelligent Assessment of Examination Scripts
  • and many more
Newsletter: no; the forum serves as a news exchange
Notes:

Centre for Corpus Linguistics, Birmingham

Coordination: Staff of the Centre for Corpus Linguistics at the University of Birmingham
Design/Purpose: “The Centre for Corpus Linguistics (CCL), Department
of English at the University of Birmingham was established in December
2000 by the newly appointed professor Wolfgang Teubert. The CCL
carries forth the strong corpus linguistic tradition of Birmingham,
dating back to the 70’s, with the compilation of COBUILD, the first
corpus-based general language dictionary. John Sinclair, the project
leader, was one of the pioneers of corpus linguistics.”
Projects:
  • Chinese-English Translation Base
  • Global English Monitor Corpus
  • TELRI
  • TRACTOR
Newsletter: no
Notes:

Dictionary Research Centre (Birmingham)

Coordination: Department of English, University of Birmingham
Design/Purpose:

Projects:

  • Johnson’s A Dictionary of the English Language
  • corpus-based lexicography
  • bilingual and multilingual lexicography
  • the lexicographical description of collocation
  • the language of definitions
  • perceptions of dictionaries
  • metaphor and dictionaries
Newsletter: no
Notes:

Dictionary Research Centre (Macquarie University)

Coordination: Department of Linguistics, Macquarie University, Sydney,
Australia; director is Associate Professor Pam Peters
Design/Purpose: “[The DRC] promotes systematic research in lexicography,
lexicology, Australian English and English usage in general.”
Projects:
  • Corpus Linguistics
  • Lexicography
  • Australian English
  • and more
Newsletter: no
Notes:

ELRA – European Language Resources Association

Coordination: ELRA, Luxembourg
Design/Purpose: “The overall goal of ELRA is to provide
a centralized organization for the validation, management, and distribution
of speech, text, and terminology resources and tools, and to promote
their use within the European telematics R&TD community”
Projects:
  • European Language resources Distribution Agency (ELDA)
Newsletter: yes
Notes: website available in English and in French

ELSNET – European Language and Speech Network

Coordination: ELSNET office at Utrecht University, Netherlands
Design/Purpose: Europe-based forum dedicated to human language technologies
Projects:
  • HLT Central – Gateway to Speech & Language Technology Opportunities on the Web
Newsletter: visit their news section; subscription to ElsNews is free
Notes:


 

<FILE ARCHIVED ON  AND RETRIEVED FROM THE INTERNET ARCHIVE ALL OTHER CONTENT MAY ALSO BE PROTECTED BY COPYRIGHT (17 U.S.C.
SECTION 108(a)(3)). contact rubtcova.com

 

 

English Corpora A – E (materials from corpus-linguistics.com, 24.04.2004)

 

Corpora: English Corpora: A – E

 

 


 

ACE – Australian Corpus of English

Org: Macquarie University, Sydney, Australia
Time: 1986
Size: 1 million words (500 samples of 2,000 words each)
Contents: written and spoken, multigeneric (15 different genres)
Access: available on the ICAME CD-ROM
Notes: modelled on BROWN and LOB for linguistic research

Bank of English – Cobuild

Org: Cobuild and the University of Birmingham, UK (John Sinclair)
Time: majority of the material originates after 1990
Size: 415 million words in Oct 2000 (still growing)
Contents: written and spoken; multigeneric (

Access: CobuildDirect allows restricted access to the corpus
through a Java or Telnet interface; restricted concordance and collocation
queries are possible
Notes:

BNC – British National Corpus

Org: an industrial/academic consortium led by Oxford University Press
Time: completed in 1994; first release in 1995; second release in 2001
Size: over 100 million words (4,125 texts)
Contents: multigeneric; 90% written and 10% spoken materials
Access: licensed; a guest account is available via the SARA client at the
BNC Online Service, or a simple search can be run on the BNC website
Notes: SGML markup according to the TEI guidelines; POS tagging carried out with CLAWS

BROWN University Corpus

Org: Brown University, Rhode Island, U.S.
Time: 1960s
Size: ca. 1 million words
Contents: American written English; 500 text samples of approximately
2,000 words distributed over 15 text categories
Access: available on the ICAME CD-ROM
Notes:

CEECS – Corpus of Early English Correspondence Sampler

Org: University of Helsinki, Finland
Time: 1418-1680
Size: approx. 450,000 words
Contents: early English correspondence; a list of the included texts is available online
Access: available on the ICAME CD-ROM
Notes: represents the non-copyrighted materials included
in the Corpus of Early English Correspondence

CHRISTINE Corpus

Org: Geoffrey Sampson, University of Sussex, UK
Time: first distributed in August 2000

Size:
Contents: spoken English, and particularly spontaneous,
informal spoken English
Access: freely available for download
Notes: see also SUSANNE


CIC – Cambridge International Corpus

Org: Cambridge University Press
Time: ongoing
Size: 300 million words and expanding
Contents: multigeneric; written and spoken British and American
materials, learners’ English
Access: “Currently, it can only be used by authors and
writers working for Cambridge University Press and by members of
staff at UCLES.”
Notes: “Authors, editors and lexicographers use the CIC
[…] when they are working on books for Cambridge University Press.”

CLC – Cambridge Learner Corpus

Org: Cambridge University Press and UCLES
Time: ongoing
Size: 10 million words and expanding
Contents: anonymised exam scripts written by students taking
UCLES English exams around the world
Access: “Currently, it can only be used by authors and
writers working for Cambridge University Press and by members of
staff at UCLES.”
Notes: it forms part of the Cambridge International Corpus

COLT – Bergen Corpus of London Teenage Language

Org: University of Bergen, Norway
Time: material collected in 1993
Size: 500,000 words; the pilot version consists of 151 texts
Contents: transcripts of spoken ‘London Teenage Language’
Access: the pilot version can be searched freely; registered users
can search the entire corpus online; COLT is available on the ICAME CD-ROM
Notes: COLT is part of the BNC; it is tagged for word classes
CPSA – Corpus of Spoken Professional American English

Org: contact: Michael Barlow
Time: 1994-1998
Size: 2 main sub-corpora, 1 million words each
Contents: short interchanges by 400 speakers; professional activities broadly
tied to academics and politics
Access: registered users only ($79 for an individual user of the tagged version)
Notes: available both tagged and untagged; the tagging was performed by Tony McEnery
and Paul Baker using the CLAWS programme at UCREL, Lancaster University

ECI Corpus

Org: ELSNET
Time: materials collected between 1984 and 1993
Size: four different corpora ranging from 4 to 34 million words
Contents: German, French and Dutch newspaper texts; parallel
texts in English, Spanish and French
Access: available on CD-ROM for € 50, for research purposes only
Notes:
