nlp - Russian Document Corpus for Search Engine
I am working on a cross-language information retrieval system that takes queries in English and searches for documents in Russian. To evaluate this system, it would be nice to have a collection of Russian documents to search over. Can anyone point me to a collection of documents I can search, or to websites from which I can easily gather a bunch of Russian documents (other than Wikipedia)?
The documents can be about anything, though it would be nice if they covered specific areas of human knowledge (CS, architecture, engineering, arts, literary analysis, whatever...).
Not sure that this is what you want, but they are in DBC 4 format and contain approximately 57.3 GB of data.