When we find quantitative information, such as a number, a table, or a whole dataset, we want to be able to judge its quality so that we can select the right information.
Quality needs to be defined more precisely. There are several aspects:
- Creation - If data is created by a measuring device, the quality of the data depends upon the accuracy of the measuring device. If data is produced by measuring human behaviour, the quality depends upon the way the experiment is conducted (methodological issues).
- Research data management - Research data management covers metadata, file formats and access.
    - Metadata: data describing the data. Examples are a log, descriptions of variables, questionnaires, and readme files describing folder structures.
    - File formats: the file format you choose affects how you can use your data. It can determine which software you need to open your files, and it influences how easily the files can still be used in, say, ten years. A data repository such as DANS has preferred file formats. Choosing one of these preferred formats guarantees that the repository will support converting the files in the future if necessary.
    - Access: ownership and privacy can be issues that limit access to the data.
- Usefulness - Assessing the quality of data in terms of its usefulness for research purposes (scholarly merit) is difficult, because it largely depends on your research questions. Whether a dataset has been peer reviewed may indicate its academic standing. Very practical indicators of a dataset's popularity are its download statistics and data citations.