Japanese Corpora and Digital Libraries

Corpus of Spontaneous Japanese

A large collection of spoken Japanese data and accompanying information for linguistic research, jointly developed by NINJAL, NICT, and the Tokyo Institute of Technology.

Oxford NINJAL Corpus of Old Japanese (ONCOJ)

A long-term collaborative research project between the University of Oxford and the National Institute for Japanese Language and Linguistics (NINJAL) to develop a lemmatized, parsed, and comprehensively annotated digital corpus of all Japanese texts from the Old Japanese period.

Aozora Bunko: a large collection of Japanese texts (www.aozora.gr.jp)

jpWaC: Japanese corpus on sketchengine.eu

The Japanese web corpus (jpWaC) is made up of texts collected from the .jp Internet domain. It was prepared by Tomaž Erjavec using a list of URLs provided by Serge Sharoff of the University of Leeds. The standards used to prepare the corpus are described in A Corpus Factory for Many Languages (Kilgarriff et al., LREC 2010).

Japanese web corpora by Sketch Engine

Japanese treebanks on the Universal Dependencies website

Japanese-English Parallel Corpus (日英パラレルコーパス)

A small collection of Okinawan proverbs