Create dataset malindomorph__morphological_dictionary_and_analyser_for_malay_indonesian #360

Open
@albertvillanova

Description

  • uid: malindomorph__morphological_dictionary_and_analyser_for_malay_indonesian
  • type: processed
  • description:
    • name: MALINDOMorph: Morphological dictionary and analyser for Malay/Indonesian
    • description: Malay/Indonesian lacked an open wide-coverage dictionary that can be used for both NLP tasks and non-NLP purposes. The MALINDO Morph morphological dictionary is the first such dictionary. It provides morphological information (root, prefix, suffix, circumfix, reduplication) for roughly 232K surface forms. The entry forms are those found in the authoritative dictionaries in Malaysia (Kamus Dewan) and Indonesia (Kamus Besar Bahasa Indonesia) (core dictionary) as well as frequent words in the Leipzig Corpora Collection (Goldhahn et al., 2012) (expanded dictionary). The morphological analyses were checked by hand for all surface forms, except for (i) basic and di-forms in the expanded dictionary whose existence is predicted from the corresponding meN-active forms in the core dictionary and (ii) the case variants of the items in the core dictionary. This paper also discusses the morphological analyser that we developed to create our morphological dictionary. Our morphological analyser is more linguistically rigorous than previous morphological analysers and stemmers/lemmatizers such as MorphInd (Larasati et al., 2011) because it takes into account circumfixes, which have previously been neglected, largely due to a misunderstanding among NLP researchers that circumfixes are no more than combinations of a prefix and a suffix.
    • homepage: http://lrec-conf.org/workshops/lrec2018/W29/pdf/8_W29.pdf
    • validated: True
  • languages:
    • language_names:
      • Indonesian
    • language_comments:
    • language_locations:
      • Asia
      • Indonesia
    • validated: False
  • custodian:
    • name: Hiroki Nomoto
    • in_catalogue:
    • type: A university or research institution
    • location: Japan
    • contact_name: Hiroki Nomoto
    • contact_email: [email protected]
    • contact_submitter: False
    • additional: http://lrec-conf.org/workshops/lrec2018/W29/pdf/8_W29.pdf
    • validated: False
  • availability:
    • procurement:
      • for_download: No - but the current owners/custodians have contact information for data queries
      • download_url:
      • download_email:
    • licensing:
      • has_licenses: Yes
      • license_text:
      • license_properties:
      • license_list:
    • pii:
      • has_pii: Yes
      • generic_pii_likely:
      • generic_pii_list:
      • numeric_pii_likely:
      • numeric_pii_list:
      • sensitive_pii_likely:
      • sensitive_pii_list:
      • no_pii_justification_class:
      • no_pii_justification_text:
    • validated: False
  • processed_from_primary:
    • from_primary: Taken from primary source
    • primary_availability: Yes - their documentation/homepage/description is available
    • primary_license: Unclear / I don't know
    • primary_types:
    • validated: False
    • from_primary_entries:
  • media:
    • category:
      • text
    • text_format:
      • .PDF
    • audiovisual_format:
    • image_format:
    • database_format:
      • other
      • pdf
    • text_is_transcribed: No
    • instance_type:
    • instance_count:
    • instance_size:
    • validated: False
  • fname: malindomorph__morphological_dictionary_and_analyser_for_malay_indonesian.json
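
Most sections of this entry still carry `validated: False`. As a rough way to see which ones remain to be reviewed, the sketch below loads a record and lists the top-level sections whose flag is not yet true. The JSON excerpt is hypothetical: its nesting is assumed from the bullet list above, not taken from the actual `.json` file, and `unvalidated_sections` is an illustrative helper, not part of any catalogue tooling.

```python
import json

# Hypothetical excerpt of the catalogue entry. The nesting is an assumption
# based on the bullet list in this issue, not the real file contents.
record_json = """
{
  "uid": "malindomorph__morphological_dictionary_and_analyser_for_malay_indonesian",
  "type": "processed",
  "description": {"validated": true},
  "languages": {"validated": false},
  "custodian": {"validated": false},
  "availability": {"validated": false},
  "processed_from_primary": {"validated": false},
  "media": {"validated": false}
}
"""


def unvalidated_sections(record):
    """Return names of top-level sections whose 'validated' flag is not True."""
    return [
        name
        for name, section in record.items()
        # Scalar fields such as "uid" and "type" carry no flag and are skipped.
        if isinstance(section, dict) and section.get("validated") is not True
    ]


record = json.loads(record_json)
pending = unvalidated_sections(record)
print(pending)
```

With the flags as listed in this issue, every section except `description` would be reported as pending.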
