 |
So far we have annotated 1,999 abstracts, with 16,819 sentences
and 460k words. The whole corpus contains 45,982 markables,
among which 32,464 are anaphoric and 13,518 are discourse-new.
Four types of co-reference relations are annotated, namely identity
(IDENT), pronominal (PRON), appositive (APPOS), and relative (RELAT).
This is a joint project of the Institute for Infocomm Research (I2R) team,
Singapore, and Tsujii Laboratory, Tokyo University. Tsujii Lab provides
funding support and biology validation of the linguistic annotation done by
the I2R team. Dr. Tateisi Yuka from Tsujii Lab coordinated the biology
validation with 5 biology Master's and PhD students from Tokyo University.
The following personnel are involved in the linguistic annotation.
|