1. Text selection

1.1 Search for texts containing elements that differ slightly or substantially from the standard.
1.2 Selection of unpublished texts and published texts (preferably the first edition).
1.3 Selection of texts belonging to the genres less represented in the initial selection.

2. Description of the project

Transcription of texts: establishing the criteria

2.1 Transcribed (and published) texts
2.2 Unpublished texts

2.2.1 Medieval texts: use of the editing rules of “Els Nostres Clàssics”
Word separation.
Apostrophes and interpuncts as necessary.
Hyphens or apostrophes to separate pronouns from the verb.
Regularization of u/v and i/j.

2.2.2 17th-century texts: we respect the original spelling

2.2.3 Transcription marks

We use:

< >. When we remove elements that appear in the text.
[ ]. When we add elements; within the brackets we also indicate whether the material is a writer’s note, an interlinear addition or a crossed-out element.
[…]. When we do not transcribe a passage because it is incomprehensible or illegible.
(…). When we omit a fragment that we have not selected.
(¿). Unclear reading.

Italics. We always expand abbreviations. If the editor does not mark them, we do not mark them either.
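
The bracket-style marks listed above lend themselves to mechanical checking. The following is only a minimal sketch: the names `MARK_PATTERNS` and `find_marks` are hypothetical, and it assumes that the ellipsis inside `[…]` and `(…)` is typed as the single character `…`.

```python
import re

# Hypothetical helper, not part of the project's own tooling.
# Each pattern corresponds to one transcription mark from the list above.
MARK_PATTERNS = {
    "removed": re.compile(r"<([^<>]*)>"),                # < > removed elements
    "added": re.compile(r"\[(?!…\])([^\[\]]*)\]"),       # [ ] added elements, notes
    "illegible": re.compile(r"\[…\]"),                   # […] not transcribed
    "omitted": re.compile(r"\(…\)"),                     # (…) omitted fragment
    "unclear": re.compile(r"\(¿\)"),                     # (¿) unclear reading
}


def find_marks(line):
    """Return (kind, matched_text) pairs in the order they occur in the line."""
    hits = []
    for kind, pattern in MARK_PATTERNS.items():
        for match in pattern.finditer(line):
            hits.append((match.start(), kind, match.group(0)))
    return [(kind, text) for _, kind, text in sorted(hits)]
```

Calling `find_marks` on a transcribed line gives a quick inventory of editorial interventions, which can help when checking that the marks have been applied consistently across texts.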

2.2.4 The treatment of the notes.

If they are ours, we number them without any mark. If they are the editor’s, we mark them with a * in addition to the corresponding number.

3. Computerisation of the corpus

3.1 Digitization of the texts

3.2 Application of the text format (Word)

3.3 Creation of the technical database (Access)

  • Variety, code, year, century, genre
  • Title of the text fragment, name of the author
  • The complete bibliographic references of the text
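
As an illustration only, the fields listed above could be modelled as a single record type. The project stores them in Access; the Python names below (`TextRecord`, `make_record`, `century_of`) are hypothetical, and deriving the century from the year is an assumption made here for convenience.

```python
from dataclasses import dataclass
from typing import Optional


def century_of(year: int) -> int:
    """Century containing a given year (1560 falls in the 16th century)."""
    return (year - 1) // 100 + 1


@dataclass(frozen=True)
class TextRecord:
    """One row of the technical database, using the fields listed above."""
    variety: str
    code: str
    year: Optional[int]       # None when the year is not known
    century: Optional[int]
    genre: str
    title: str
    author: Optional[str]     # None for anonymous texts
    reference: str            # complete bibliographic reference


def make_record(variety, code, year, genre, title, author, reference):
    """Build a record, filling in the century from the year when available."""
    century = century_of(year) if year is not None else None
    return TextRecord(variety, code, year, century, genre, title, author, reference)
```

For example, `make_record("eivissenc", "Be-1560-7.2", 1560, "7.2", "(title)", None, "(reference)")` yields a record whose `century` field is 16; the title and reference strings here are placeholders, not real data.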

4. Text identification code

Every text has been coded according to its dialect, its year and its textual genre.

4.1 Dialect and subdialect coding

  • Balear: Bm (mallorquí), Bme (menorquí), Be (eivissenc)
  • Central: C (central), Ct (tarragoní), Cs (septentrional de transició)
  • Rossellonès: R
  • Alguerès: A
  • Valencià: V (valencià), Vs (septentrional), Va (apitxat), Vm (meridional)
  • Nord-occidental: N (nord-occidental), Np (pallarès), Nt (tortosí), Nr (ribagorçà)
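
The coding above amounts to a lookup table from code to dialect and subdialect. A minimal sketch follows; the name `DIALECT_CODES` is hypothetical, and the entries simply restate the list.

```python
# Hypothetical lookup table: code -> (dialect, subdialect).
# Subdialect is None where the list above gives a single code for the dialect.
DIALECT_CODES = {
    "Bm": ("Balear", "mallorquí"),
    "Bme": ("Balear", "menorquí"),
    "Be": ("Balear", "eivissenc"),
    "C": ("Central", "central"),
    "Ct": ("Central", "tarragoní"),
    "Cs": ("Central", "septentrional de transició"),
    "R": ("Rossellonès", None),
    "A": ("Alguerès", None),
    "V": ("Valencià", "valencià"),
    "Vs": ("Valencià", "septentrional"),
    "Va": ("Valencià", "apitxat"),
    "Vm": ("Valencià", "meridional"),
    "N": ("Nord-occidental", "nord-occidental"),
    "Np": ("Nord-occidental", "pallarès"),
    "Nt": ("Nord-occidental", "tortosí"),
    "Nr": ("Nord-occidental", "ribagorçà"),
}


def codes_for_dialect(dialect):
    """Return every code belonging to a dialect, in the order listed above."""
    return [code for code, (name, _) in DIALECT_CODES.items() if name == dialect]
```

With this table, `DIALECT_CODES["Be"]` gives `("Balear", "eivissenc")`, and `codes_for_dialect("Balear")` gives `["Bm", "Bme", "Be"]`.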

4.2 Genre coding

Text types are categorized (by register and theme) following the criteria of the IEC’s “Corpus Textual Informatitzat de la Llengua Catalana” (directed by Joaquim Rafel), adapted to the features of the “corpus Scripta”.

  1. Narrative
    1.1 Storytelling
  2. Poetry
    2.1 Popular poetry
  3. Theatre
    3.1 Colloquia
    3.2 Literary investigation
  4. Correspondence
  5. Philosophy
  6. Religion
    6.1 Sermons
    6.2 Goigs (a kind of religious poetry)
    6.3 Hagiography
    6.4 Catechesis
    6.5 Liturgy
    6.6 Edicts, ecclesiastical ordinations
    6.7 Prophecies
  7. Social science
    7.1 Testaments
    7.2 Inventory
    7.3 Judicial legal texts
    7.4 Contracts
    7.5 Pregó (street cries), edicts, acts, ordinations, determinations, instances
    7.6 Books of account and receipts
    7.7 Education, urbanity
    7.8 Others
  8. Press
  9. Natural sciences
    9.1 Meteorology
    9.2 Astronomy, astrology
    9.3 Zoology
    9.4 Botany
    9.5 Geology
  10. Applied sciences
    10.1 Medicine
    10.2 Agriculture
    10.3 Navy
    10.4 Veterinary
    10.5 Gastronomy
    10.6 Arts and crafts (building)
    10.7 Economy
    10.8 Militia
  11. Fine Arts. Entertainments. Games. Sports
  12. Linguistics
    12.1 Lexicography
    12.2 Grammar
    12.3 Others (apologies)
    12.4 Onomastics
  13. History and geography
    13.1 History
    13.2 Geography
    13.3 Diaries
  14. Fundamental science

4.3 Text code identifier

Based on the year in which the text was written (when it is known) and on the established dialect and text-type codes, each text is identified by a code of the following form:


Example: Be-1560-7.2 (= eivissenc, year 1560, inventories)
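
Judging from the example alone, a code consists of a dialect code, a year and a genre number joined by hyphens. The sketch below splits a code on that assumption; the names `TextCode`, `CODE_PATTERN` and `parse_text_code` are hypothetical, and codes for texts whose year is unknown are not handled because the source does not show how they are written.

```python
import re
from typing import NamedTuple


class TextCode(NamedTuple):
    dialect: str   # e.g. "Be"
    year: int      # e.g. 1560
    genre: str     # e.g. "7.2", kept as text so "7.2" and "7.20" stay distinct


# Assumed shape: capital letter plus up to two lowercase letters,
# a three- or four-digit year, and a genre number with an optional subgenre.
CODE_PATTERN = re.compile(r"([A-Z][a-z]{0,2})-(\d{3,4})-(\d{1,2}(?:\.\d{1,2})?)")


def parse_text_code(code):
    """Split a text code into its dialect, year and genre parts."""
    match = CODE_PATTERN.fullmatch(code)
    if match is None:
        raise ValueError(f"not a recognizable text code: {code!r}")
    dialect, year, genre = match.groups()
    return TextCode(dialect, int(year), genre)
```

Applied to the example, `parse_text_code("Be-1560-7.2")` returns `TextCode(dialect="Be", year=1560, genre="7.2")`, while a malformed string raises `ValueError`.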

5. Formalization and selection of the linguistic data

5.1 Coding of the most salient phonetic, morphological, syntactic and lexical phenomena.

5.2 Application of the coded linguistic data to each of the texts from the Balearic dialect.
5.3 Development of standard codes for the linguistic analysis of each of the selected texts.

6. Plan for writing the commentaries

6.1 Heading

  • Dialect
  • Text code
  • Author: title of the text

6.2 Linguistic commentary


  • Documentary and/or bibliographic reference for the text (unpublished text, correction of a published text, edition, etc.)
  • Reference to the author, when the text is not anonymous
  • Contextualization of the document


  • Spelling
  • Phonetics
  • Morphosyntax
  • Lexis


  • Evaluation of the text’s linguistic data

6.3 Text

Reproduction of both the transcribed and the digitized versions of the text

7. Glossary

7.1 Marking of the forms discussed from the Balearic dialect

7.2 Index glossary: variant and text code
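
An index glossary of this kind pairs each variant with the codes of the texts in which it occurs. The following is a rough sketch under that reading; the function name `build_index` and the input format (a list of variant and text-code pairs) are assumptions, not the project's actual procedure.

```python
from collections import defaultdict


def build_index(occurrences):
    """Map each variant to the sorted list of text codes where it occurs.

    `occurrences` is an iterable of (variant, text_code) pairs; repeated
    pairs are collapsed so each code appears once per variant.
    """
    index = defaultdict(set)
    for variant, text_code in occurrences:
        index[variant].add(text_code)
    # Sort variants and codes so the glossary reads alphabetically.
    return {variant: sorted(codes) for variant, codes in sorted(index.items())}
```

Given pairs such as `("form_a", "Be-1560-7.2")`, where `form_a` stands in for a marked form, the result lists under each variant every text code that attests it.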