Text files from Gutenberg database
Description
Text files of varying size and structure, selected at random from the Gutenberg database. This artefact contains five datasets of random text files (e-books in .txt format). The datasets range in total size from 184MB to 1.7GB. The following datasets can be found in this package:
1. 184MB
2. 357MB
3. 670MB
4. 1GB
5. 1.7GB
We used these datasets to run extensive experiments on the performance of a Symmetric Searchable Encryption scheme. However, they can be used to measure the performance of any algorithm that parses documents, extracts keywords, creates dictionaries, etc.
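As a rough illustration of the kind of workload these datasets can benchmark, the sketch below parses every .txt file in a directory, extracts keywords, builds a keyword-to-files dictionary, and times the run. It is not the procedure used in the original experiments. The function names, the tokenisation rule (lowercase runs of ASCII letters), and the directory layout are all assumptions made for this example.

```python
import re
import time
from collections import defaultdict
from pathlib import Path

# Assumed tokenisation rule: a keyword is a run of lowercase ASCII letters.
WORD_RE = re.compile(r"[a-z]+")


def extract_keywords(text):
    """Return the set of distinct lowercase alphabetic tokens in text."""
    return set(WORD_RE.findall(text.lower()))


def build_index(paths):
    """Map each keyword to the sorted list of file names that contain it."""
    index = defaultdict(set)
    for path in paths:
        # Ignore undecodable bytes so one odd file does not abort the run.
        text = Path(path).read_text(encoding="utf-8", errors="ignore")
        for word in extract_keywords(text):
            index[word].add(Path(path).name)
    return {word: sorted(names) for word, names in index.items()}


def time_indexing(directory):
    """Index every .txt file under directory; return (index, seconds)."""
    paths = sorted(Path(directory).rglob("*.txt"))
    start = time.perf_counter()
    index = build_index(paths)
    return index, time.perf_counter() - start
```

Calling `time_indexing` on each of the five dataset directories in turn would show how the running time grows with the total input size. The timing covers reading and indexing only, not the directory scan.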
Publication year
2019
Type of data
Project
Other information
Research subject areas
Computer and Information Sciences
Language
English
Open access
Open