Multilingual, free, interesting
Select two of the following three: (i) translated into many languages, (ii) freely available to the public, and (iii) interesting to do lexical semantic work on. That’s basically the problem that’s facing some of us at a certain lexicographic effort. In a bid to increase collaboration among some of our colleagues, as well as explore the cross-linguistic (“cross-lingual”?) similarities and differences among semantic frames already posited for English, we’ve decided that it would be a good idea to find a text available in English, German, Spanish, and Japanese, all copyright-free, and hopefully interesting to work with, and then give it the full-text annotation treatment. However, this has been harder than expected.
Up to now, FrameNet has been doing a large amount of work on texts related to WMD (non-)proliferation, as well as several WSJ articles. These are two very peculiar genres, and as such require a lot of lexicographic work in a few semantic fields (like weapons, treaty/agreement-making, conflict resolution, prediction, and so on). So we thought it might be nice to do work on some fiction, which should have a wider range of frame-evoking words (e.g., verbs, adjectives, relational nouns) of the type that would be interesting to look at crosslinguistically (motion and communication words being among the trot-out examples). Unfortunately, there’s the issue of copyright, which rears its ugly head, and there doesn’t seem to be any archive or list of multilingual public-domain texts out there.
So, we started with some thoughts on authors likely to be widely translated - Doyle, Orwell, and Christie came to mind. Unfortunately, there’s not much freely available Agatha Christie in other languages, and 1984 is still in copyright - but only in the US. So Holmes seemed like our best bet, but the text we most wanted to do, Hound of the Baskervilles, is copyrighted in Japan. Other works seem unavailable in German and Spanish. Essentially, it’s been frustrating finding something that fits all of our exacting criteria. What if we were to relax the “interesting” criterion? Well, there are always news articles, international treaties, UN and Europarl documents, and so on. Surely at least the treaties are publicly available (well, at least for the signatories, or so one would hope). And they might even be fun! (Note: this is the fun that means ‘superbly lame’.)
And in case anyone out there has some good ideas, here are a few places to start: Project Gutenberg (and the Euro and Australian versions), the Etext center at Virginia, and Aozora Bunko (青空文庫).
Anne of Green Gables by L. M. Montgomery
Thanks for the suggestion. I wasn’t aware that Montgomery had such a following in Japan. Unfortunately, it looks like the oldest translation is from 1952 by MURAOKA Hanako, who passed away in 1968. So we’d have to wait until 2018 for the copyright to expire (barring extension…).