Corpus de Português Escrito em Periódicos (CoPEP)

CoPEP - Corpus de Português Escrito em Periódicos (Corpus of Written Portuguese from Journals) (Tanara Zingano Kuhn and José Pedro Ferreira, 2018) was compiled specifically for a lexicographic project to design an online corpus-driven dictionary of Portuguese for university students (the PhD research of Tanara Zingano Kuhn). Data for this dictionary should be representative of how language is used in academic writing across different areas of knowledge in Brazil and Portugal. CoPEP was built to meet this requirement.

CoPEP contains around 10,000 texts extracted from journals published in the Brazilian and Portuguese national collections of SciELO (Scientific Electronic Library Online). The texts are distributed among six Great Areas, which in turn are grouped into three Schools of Knowledge, and total over 48 million tokens. It is a synchronic corpus: the vast majority of its texts were published between 2000 and 2016. The subcorpora for the two language varieties are almost exactly the same size and have a very similar number of tokens in each Great Area and School of Knowledge, so the corpus is evenly balanced.

Metadata such as year of publication and Great Area have been carefully recorded to allow for advanced corpus search options. Interoperability with SciELO is available through the journals' ISSNs, which were also retained as metadata.

For more information, check the detailed description or the publications.

How to cite this corpus:
Tanara Zingano Kuhn & José Pedro Ferreira (2018). CoPEP - Corpus de Português Escrito em Periódicos (v.1.5). Coimbra: CELGA-ILTEC.