Methodology - Scripta, dialectologia catalana

Contents

1. Text selection
2. Description of the project
3. Computerisation of the corpus
4. Code for text identification
5. Formalization and selection of the linguistic data
6. Plan for writing the commentaries
7. Glossary

1. Text selection

1.1 Search for texts containing elements that differ, slightly or substantially, from the standard.
1.2 Selection of unpublished and published texts (preferably the first edition).
1.3 Selection of texts belonging to genres underrepresented in the initial selection.

2. Description of the project

Transcription of texts: establishing the criteria

2.1 Texts already transcribed (and published)
2.2 Unpublished texts

2.2.1 Medieval texts: we follow the editing rules of “Els Nostres Clàssics”
Word separation.
Apostrophes and interpuncts as necessary.
Hyphens or apostrophes to separate the pronouns from the verb.
Regularization of u/v and i/j.

2.2.2 Texts from the 17th century: we respect the original spelling

2.2.3 Transcription marks

We use:

< >. When we remove elements that appear in the text.
[ ]. When we add elements. Inside the brackets we also indicate whether the material is a note by the writer, an interlinear addition or a crossed-out element.
[…]. When we do not transcribe a passage because it is incomprehensible or illegible.
(…). When we omit a fragment that we have not selected.
(¿). Unclear reading.

Italics. We always expand abbreviations. If the editor does not mark the expansion, we do not mark it either.
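
The marks listed above are regular enough to be located automatically. The following is a minimal sketch, not part of the project's own tooling, that assumes the transcription is available as plain text; the label names (`removed`, `added`, `illegible`, `omitted`, `unclear`) and the sample string are hypothetical.

```python
import re

# One alternative per transcription mark. The specific marks "[…]", "(…)"
# and "(¿)" come first so that "[…]" is not read as a generic "[ ]" addition.
# Label names are illustrative assumptions, not part of the project.
MARK_RE = re.compile(
    r"(?P<illegible>\[…\])"
    r"|(?P<omitted>\(…\))"
    r"|(?P<unclear>\(¿\))"
    r"|(?P<removed><[^<>]*>)"
    r"|(?P<added>\[[^\[\]]*\])"
)


def find_marks(text):
    """Return (label, matched text) pairs in order of appearance."""
    return [(m.lastgroup, m.group(0)) for m in MARK_RE.finditer(text)]


# Hypothetical sample line, not taken from the corpus.
sample = "abc <de> fgh [ij] […] (…) (¿)"
marks = find_marks(sample)
```

Italic expansions of abbreviations are a matter of text formatting and cannot be recovered from plain text, so this sketch does not cover them.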

2.2.4 Treatment of the notes

If the notes are ours, we number them without any mark. If they are the editor's, we mark them with a * in addition to the corresponding number.

3. Computerisation of the corpus

3.1 Digitization of the texts

3.2 Application of the text format (Word)

3.3 Construction of the technical database (Access)

Variety, code, year, century, genre
Title of the text fragment, name of the author
The complete bibliographic references of the text
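
The fields listed above map naturally onto a single table. The project uses Access; the sketch below uses SQLite only as a stand-in, and the table name, column names and placeholder values are assumptions made for illustration. The century is derived from the year rather than stored by hand.

```python
import sqlite3


def century_of(year):
    """Return the century a year falls in, e.g. 1560 -> 16."""
    return (year - 1) // 100 + 1


def create_catalogue(conn):
    # Column names are hypothetical; they mirror the fields listed above.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS texts (
            id        INTEGER PRIMARY KEY,
            variety   TEXT NOT NULL,
            code      TEXT NOT NULL,
            year      INTEGER,
            century   INTEGER,
            genre     TEXT NOT NULL,
            title     TEXT,
            author    TEXT,
            reference TEXT
        )
        """
    )


def add_text(conn, variety, code, year, genre, title, author, reference):
    # Leave the century empty when the year is not known.
    century = century_of(year) if year is not None else None
    conn.execute(
        "INSERT INTO texts (variety, code, year, century, genre, title, author, reference) "
        "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (variety, code, year, century, genre, title, author, reference),
    )


conn = sqlite3.connect(":memory:")
create_catalogue(conn)
# Placeholder record: only the code follows the document's own example.
add_text(conn, "Be", "Be-1560-7.2", 1560, "7.2", "(title)", "(author)", "(reference)")
row = conn.execute("SELECT code, century FROM texts").fetchone()
```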

4. Code for the text identification

Every text has been coded according to its dialect, its year and its textual genre.

4.1 Dialect and subdialect coding

Balear: Bm (mallorquí), Bme (menorquí), Be (eivissenc)
Central: C (central), Ct (tarragoní), Cs (septentrional de transició)
Rossellonès: R
Alguerès: A
Valencià: V (valencià), Vs (septentrional), Va (apitxat), Vm (meridional)
Nord-occidental: N (nord-occidental), Np (pallarès), Nt (tortosí), Nr (ribagorçà)
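
The dialect codes above can be held in a simple lookup table. This is a minimal sketch; the names `DIALECT_CODES` and `describe_dialect` are hypothetical, and Rossellonès and Alguerès carry no subdialect because the list gives none.

```python
# Code -> (dialect, subdialect), transcribed from the list above.
DIALECT_CODES = {
    "Bm": ("Balear", "mallorquí"),
    "Bme": ("Balear", "menorquí"),
    "Be": ("Balear", "eivissenc"),
    "C": ("Central", "central"),
    "Ct": ("Central", "tarragoní"),
    "Cs": ("Central", "septentrional de transició"),
    "R": ("Rossellonès", None),
    "A": ("Alguerès", None),
    "V": ("Valencià", "valencià"),
    "Vs": ("Valencià", "septentrional"),
    "Va": ("Valencià", "apitxat"),
    "Vm": ("Valencià", "meridional"),
    "N": ("Nord-occidental", "nord-occidental"),
    "Np": ("Nord-occidental", "pallarès"),
    "Nt": ("Nord-occidental", "tortosí"),
    "Nr": ("Nord-occidental", "ribagorçà"),
}


def describe_dialect(code):
    """Return a readable label for a dialect code; raises KeyError if unknown."""
    dialect, subdialect = DIALECT_CODES[code]
    return dialect if subdialect is None else f"{dialect} ({subdialect})"
```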

4.2 Genre coding

Text-type categorization (according to register and theme) based on the criteria of the IEC’s “Corpus Textual Informatitzat de la Llengua Catalana” (directed by Joaquim Rafel), adapted to the features of the “corpus Scripta”.

1 Narrative
1.1 Storytelling
2 Poetry
2.1 Popular poetry
3 Theatre
3.1 Colloquia
3.2 Literary investigation
4 Correspondence
5 Philosophy
6 Religion
6.1 Sermons
6.2 Goigs (a kind of religious poetry)
6.3 Hagiography
6.4 Catechesis
6.5 Liturgy
6.6 Edicts, ecclesiastical ordinances
6.7 Prophecies
7 Social science
7.1 Testaments
7.2 Inventories
7.3 Judicial and legal texts
7.4 Contracts
7.5 Pregons (public proclamations), edicts, acts, ordinances, resolutions, petitions
7.6 Account books and receipts
7.7 Education, etiquette
7.8 Others
8 Press
9 Natural sciences
9.1 Meteorology
9.2 Astronomy, astrology
9.3 Zoology
9.4 Botany
9.5 Geology
10 Applied sciences
10.1 Medicine
10.2 Agriculture
10.3 Navy
10.4 Veterinary medicine
10.5 Gastronomy
10.6 Arts and crafts (building)
10.7 Economy
10.8 Militia
11 Fine arts. Entertainment. Games. Sports
12 Linguistics
12.1 Lexicography
12.2 Grammar
12.3 Others (apologias)
12.4 Onomastics
13 History and geography
13.1 History
13.2 Geography
13.3 Diaries
14 Fundamental science

4.3 Text code identifier

The code that identifies each text combines the year in which the text was written (if it is known) with the codes established for the dialects and the text types, as follows:

Dialect-year-genre

Example: Be-1560-7.2 (= eivissenc, year 1560, inventories)
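
The dialect-year-genre pattern can be checked and split mechanically. The sketch below is an illustration under stated assumptions: it expects a four-digit year and does not handle texts whose year is unknown, since the notation for that case is not given here. The names `TextCode` and `parse_text_code` are hypothetical.

```python
import re
from typing import NamedTuple


class TextCode(NamedTuple):
    dialect: str  # e.g. "Be"
    year: int     # e.g. 1560
    genre: str    # e.g. "7.2", kept as text so "7.2" and "7.20" stay distinct


# Assumed shape: an upper-case letter plus up to two lower-case letters,
# a four-digit year, and a genre number with an optional subgenre.
CODE_RE = re.compile(r"^([A-Z][a-z]{0,2})-(\d{4})-(\d{1,2}(?:\.\d+)?)$")


def parse_text_code(code):
    """Split a code such as 'Be-1560-7.2' into its three parts."""
    match = CODE_RE.match(code)
    if match is None:
        raise ValueError(f"not a dialect-year-genre code: {code!r}")
    dialect, year, genre = match.groups()
    return TextCode(dialect, int(year), genre)


parsed = parse_text_code("Be-1560-7.2")
top_level_genre = parsed.genre.split(".")[0]  # "7", i.e. Social science
```

Keeping the genre as a string rather than a number avoids floating-point comparisons and preserves the distinction between a top-level genre such as "7" and a subgenre such as "7.2".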

5. Formalization and selection of the linguistic data

5.1 Coding of the most salient phonetic, morphological, syntactic and lexical phenomena.

5.2 Application of the coded linguistic data to each of the texts from the Balearic dialect.
5.3 Development of the standard codes for the linguistic analysis of each of the selected texts.

6. Plan for writing the commentaries

6.1 Heading

Dialect
Text code
Author: title of the text

6.2 Linguistic commentary

Documentary and/or bibliographic reference for the text (unpublished text, correction of a published text, edition, etc.)
Reference to the author, when the text is not anonymous
Contextualization of the document

Spelling
Phonetics
Morphosyntax
Lexis

Evaluation of the text’s linguistic data

6.3 Text

Reproduction of both the transcription and the digitized version of the text

7. Glossary

7.1 Marking of the forms discussed from the Balearic dialect

7.2 Index glossary: variant and text code
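
An index glossary of this kind is, in effect, an inverted index from each variant to the codes of the texts in which it occurs. The following sketch shows one possible way to build it. The function name, the variant names and the text codes are placeholders, not corpus data, and sorting is by plain string order rather than Catalan alphabetical order.

```python
from collections import defaultdict


def build_index(occurrences):
    """Map each variant to the sorted list of text codes where it occurs.

    `occurrences` is an iterable of (variant, text_code) pairs.
    """
    index = defaultdict(set)
    for variant, text_code in occurrences:
        index[variant].add(text_code)  # a set ignores repeated occurrences
    return {variant: sorted(codes) for variant, codes in sorted(index.items())}


# Placeholder pairs for illustration only.
occurrences = [
    ("variant_b", "CODE-2"),
    ("variant_a", "CODE-1"),
    ("variant_a", "CODE-2"),
    ("variant_a", "CODE-1"),
]
glossary = build_index(occurrences)
```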