long-term collaborative research project between the University of Oxford and the National Institute for Japanese Language and Linguistics, which is developing a lemmatized, parsed and comprehensively annotated digital corpus of all texts in Japanese from the Old Japanese period.
from the .jp domain
The Japanese web corpus (jpWaC) is a corpus of Japanese texts collected from the Internet. It was prepared by Tomaž Erjavec using a list of URLs provided by Serge Sharoff at the University of Leeds. The corpus preparation method is described in the paper "A Corpus Factory for Many Languages" (Kilgarriff et al., LREC 2010).