Creative Commons License
Zargan Lexical Database for Turkish by Zargan Ltd. is licensed under a
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Citation Information:
Bilgin, O. (2016). Frequency Effects in the Processing of Morphologically Complex Turkish Words (Unpublished master’s thesis). Boğaziçi University, Istanbul, Turkey. Retrieved from http://st2.zargan.com/public/resources/turkish/frequency_effects_in_turkish.pdf

Contact: info@zargan.com




Package contents

File Description
word_forms_stems_and_frequencies_full.zip 1,337,898 word-forms and their stems, morphological analyses and frequencies (Source: BOUN Corpus)
word_families.zip 8,766 stems grouped into 3,209 derivational families
noun_properties.zip 25,723 noun stems and 9 properties that describe them (frequencies of bare form, inflected forms, derived forms and -(s)I forms; number of characters; number of syllables; mean bigram frequency; number of orthographic neighbors; noun type: simplex, complex and transparent, or complex and opaque)
suffix_sequence_properties.zip 24,394 suffix sequences and 18 properties that describe them (total frequency, number of parents, frequency of parents, number of children, frequency of children, number of siblings, frequency of siblings, number of suffixes, mean suffix unigram frequency, number of suffix bigrams, mean suffix bigram frequency, number of suffix trigrams, mean suffix trigram frequency, number of inflectional suffixes, number of derivational suffixes, number of -(s)I compound markers, blocking position, sequence length)
suffix_tree.zip All suffix sequences attested in the BOUN Corpus represented as a single tree (GML: Graph Modelling Language) (Source: BOUN Corpus)
stems_and_frequencies_200d.zip 30,862 stems and their 200-dimensional frequencies (ordered list of dimensions in top_200_suffix_sequences_attached_to_nouns.zip)
top_200_suffix_sequences_attached_to_nouns.zip Most frequent 200 suffix sequences that attach to noun roots (Source: BOUN Corpus)
letter_unigram_frequencies_anywhere.zip Frequencies of 31 letters anywhere in the words (Source: BOUN Corpus)
letter_unigram_frequencies_word-final.zip Frequencies of 31 letters at the end of words (Source: BOUN Corpus)
letter_unigram_frequencies_word-initial.zip Frequencies of 28 letters at the beginning of words (Source: BOUN Corpus)
letter_bigram_frequencies_anywhere.zip Frequencies of 790 two-letter sequences anywhere in the words (Source: BOUN Corpus)
letter_bigram_frequencies_word-final.zip Frequencies of 560 two-letter sequences at the end of words (Source: BOUN Corpus)
letter_bigram_frequencies_word-initial.zip Frequencies of 514 two-letter sequences at the beginning of words (Source: BOUN Corpus)
letter_trigram_frequencies_anywhere.zip Frequencies of 9,241 three-letter sequences anywhere in the words (Source: BOUN Corpus)
letter_trigram_frequencies_word-final.zip Frequencies of 4,276 three-letter sequences at the end of words (Source: BOUN Corpus)
letter_trigram_frequencies_word-initial.zip Frequencies of 4,041 three-letter sequences at the beginning of words (Source: BOUN Corpus)
suffix_unigram_frequencies_anywhere.zip Frequencies of 72 suffixes anywhere in the suffix sequence (Source: BOUN Corpus)
suffix_unigram_frequencies_template-final.zip Frequencies of 55 suffixes at the end of suffix sequences (Source: BOUN Corpus)
suffix_unigram_frequencies_template-initial.zip Frequencies of 69 suffixes at the beginning of suffix sequences (Source: BOUN Corpus)
suffix_bigram_frequencies_anywhere.zip Frequencies of 942 two-suffix sequences anywhere in the suffix sequence (Source: BOUN Corpus)
suffix_bigram_frequencies_template-final.zip Frequencies of 801 two-suffix sequences at the end of suffix sequences (Source: BOUN Corpus)
suffix_bigram_frequencies_template-initial.zip Frequencies of 921 two-suffix sequences at the beginning of suffix sequences (Source: BOUN Corpus)
suffix_trigram_frequencies_anywhere.zip Frequencies of 4,565 three-suffix sequences anywhere in the suffix sequence (Source: BOUN Corpus)
suffix_trigram_frequencies_template-final.zip Frequencies of 3,948 three-suffix sequences at the end of suffix sequences (Source: BOUN Corpus)
suffix_trigram_frequencies_template-initial.zip Frequencies of 4,414 three-suffix sequences at the beginning of suffix sequences (Source: BOUN Corpus)
non-words_CV.zip All possible non-words of the form CV and their substitution, addition, deletion and transposition neighbors (Wordlist: KelimetriK)
non-words_VC.zip All possible non-words of the form VC and their substitution, addition, deletion and transposition neighbors (Wordlist: KelimetriK)
non-words_CVC.zip All possible non-words of the form CVC and their substitution, addition, deletion and transposition neighbors (Wordlist: KelimetriK)
non-words_VCV.zip All possible non-words of the form VCV and their substitution, addition, deletion and transposition neighbors (Wordlist: KelimetriK)
non-words_VCC.zip All possible non-words of the form VCC and their substitution, addition, deletion and transposition neighbors (Wordlist: KelimetriK)
non-words_CVCV.zip All possible non-words of the form CVCV and their substitution, addition, deletion and transposition neighbors (Wordlist: KelimetriK)
non-words_VCVC.zip All possible non-words of the form VCVC and their substitution, addition, deletion and transposition neighbors (Wordlist: KelimetriK)
non-words_CVCVC.zip All possible non-words of the form CVCVC and their substitution, addition, deletion and transposition neighbors (Wordlist: KelimetriK)
non-words_CVCCV.zip All possible non-words of the form CVCCV and their substitution, addition, deletion and transposition neighbors (Wordlist: KelimetriK)
zargan_lexical_database_for_turkish.zip All datasets in a single ZIP file
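
Example: non-word candidates and their neighbors

The non-words_*.zip files pair each non-word with its substitution, addition, deletion and transposition neighbors. The sketch below illustrates what those four neighbor types usually mean; it is not the code used to build the datasets. Assumptions: the standard 29-letter Turkish alphabet (the letter frequency files list up to 31 letters, so the datasets may use a larger inventory), a toy lexicon in place of the KelimetriK wordlist, and the hypothetical function names non_words and neighbors.

```python
from itertools import product

# Assumption: standard Turkish alphabet, 8 vowels and 21 consonants.
VOWELS = "aeıioöuü"
CONSONANTS = "bcçdfgğhjklmnprsştvyz"
ALPHABET = VOWELS + CONSONANTS


def non_words(pattern, lexicon):
    """List every string matching a C/V pattern (e.g. "CVC") that is not in the lexicon."""
    pools = [CONSONANTS if symbol == "C" else VOWELS for symbol in pattern]
    candidates = ("".join(letters) for letters in product(*pools))
    return [c for c in candidates if c not in lexicon]


def neighbors(word, lexicon, alphabet=ALPHABET):
    """Return lexicon entries exactly one edit away from word, grouped by edit type."""
    found = {"substitution": set(), "addition": set(),
             "deletion": set(), "transposition": set()}
    for i in range(len(word) + 1):
        for ch in alphabet:
            # addition: insert one letter at position i
            found["addition"].add(word[:i] + ch + word[i:])
    for i in range(len(word)):
        # deletion: drop the letter at position i
        found["deletion"].add(word[:i] + word[i + 1:])
        for ch in alphabet:
            if ch != word[i]:
                # substitution: replace the letter at position i
                found["substitution"].add(word[:i] + ch + word[i + 1:])
    for i in range(len(word) - 1):
        if word[i] != word[i + 1]:
            # transposition: swap two adjacent letters
            found["transposition"].add(word[:i] + word[i + 1] + word[i] + word[i + 2:])
    # keep only the candidates that are real words in the lexicon
    return {kind: sorted(strings & lexicon) for kind, strings in found.items()}


# Toy lexicon for illustration only; the datasets use the KelimetriK wordlist.
toy_lexicon = {"at", "tat", "a", "da", "ad"}
result = neighbors("ta", toy_lexicon)
```

With the toy lexicon, "ta" has one neighbor of each type ("da", "tat", "a" and "at"), while "ad" is two edits away and is not counted. non_words("CV", toy_lexicon) returns the 167 CV strings other than "da".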