Quality of Bibliometric Databases: Accuracy in Classification of Document Types
Purpose: Scholarly publications are usually classified into document types (DTs), which are predefined categories describing their nature (e.g., research articles, conference proceedings, reviews, short notes, letters, book chapters). This research presents a new semi-automated methodology to assess the accuracy of DT classification in bibliometric databases such as Scopus and Web of Science (WoS). The methodology can handle a relatively large number of documents (on the order of tens to hundreds of thousands) and adapts to the different DT classes covered by the databases in use, without requiring an a priori correspondence between their DTs to be defined.
Methodological approach: The first phase of the proposed methodology is automated: it exploits discrepancies between the DT classifications of two competing databases (e.g., Scopus and WoS) to identify a subset of potentially misclassified documents, i.e., documents with possible DT-classification errors. The second phase involves the manual analysis of this subset, resulting in the identification of DT-classification errors and their attribution to the database(s) responsible. The novel methodology is illustrated through a realistic application example.
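The automated first phase can be illustrated with a minimal Python sketch. It assumes that documents have already been matched across the two databases (e.g., by DOI) and that each document carries one DT per database; the function names, the input format and the "mutual dominance" rule below are hypothetical choices for illustration, not the procedure defined by the methodology. The correspondence between DTs is inferred from the data rather than fixed a priori: a pair of DTs is treated as consistent only if each is the other's most frequent counterpart, and every document outside such pairs is flagged for manual review.

```python
from collections import Counter


def dominant_counterparts(pair_counts):
    """Map each DT (first element of the key) to the DT it is most often paired with.

    Ties are resolved in favour of the pair encountered first.
    """
    best = {}
    for (dt, other), n in pair_counts.items():
        if dt not in best or n > pair_counts[(dt, best[dt])]:
            best[dt] = other
    return best


def flag_discrepancies(records):
    """Return ids of documents whose DT pair is not a mutually dominant correspondence.

    records: list of (doc_id, dt_db_a, dt_db_b) tuples for documents indexed by
    both databases (hypothetical input format; matching is assumed done upstream).
    """
    pairs_ab = Counter((a, b) for _, a, b in records)
    pairs_ba = Counter({(b, a): n for (a, b), n in pairs_ab.items()})
    a_to_b = dominant_counterparts(pairs_ab)  # e.g., Scopus DT -> most frequent WoS DT
    b_to_a = dominant_counterparts(pairs_ba)  # e.g., WoS DT -> most frequent Scopus DT
    return [
        doc_id
        for doc_id, a, b in records
        if not (a_to_b[a] == b and b_to_a[b] == a)
    ]


# Toy data (invented for illustration): (doc_id, Scopus DT, WoS DT)
records = [
    ("d1", "Article", "Article"),
    ("d2", "Article", "Article"),
    ("d3", "Article", "Article"),
    ("d4", "Article", "Review"),
    ("d5", "Review", "Review"),
    ("d6", "Review", "Review"),
    ("d7", "Conference Paper", "Proceedings Paper"),
    ("d8", "Conference Paper", "Proceedings Paper"),
    ("d9", "Letter", "Article"),
]
print(flag_discrepancies(records))  # ['d4', 'd9']
```

With this toy input, the "Article"/"Review" and "Letter"/"Article" documents are flagged, whereas the "Conference Paper"/"Proceedings Paper" pair is accepted as a consistent correspondence even though the two labels differ. A strict rule like this favours recall over workload: it sends more documents to the manual phase, some of which will turn out to be correctly classified.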
Findings: The methodology is shown to be effective in identifying DT-classification errors, suggesting a path to improving the quality and reliability of bibliometric databases. In the application example provided, Scopus and WoS have overall error rates of around 1.7% and 1.2%, respectively. A similar analysis based on a larger sample of documents is still in progress.
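Once the flagged documents have been reviewed manually, per-database error rates like those reported above can be computed as the share of documents whose DT was judged wrong in each database. The minimal sketch below assumes that the denominator is the whole set of analysed documents (flagged or not) and that unflagged documents count as correct; these assumptions, the function name and the input format are illustrative and not taken from the methodology itself. Note that a discrepancy-based screening cannot reveal documents that both databases misclassify in the same way.

```python
from collections import Counter


def error_rates(verdicts, n_total, databases=("Scopus", "WoS")):
    """Share of analysed documents misclassified by each database.

    verdicts: dict mapping each manually reviewed doc_id to the set of databases
    judged to have assigned a wrong DT (empty set if both were correct).
    n_total: number of documents in the whole analysed sample; documents that
    were never flagged are assumed to be correctly classified.
    """
    counts = Counter(db for wrong in verdicts.values() for db in wrong)
    return {db: counts[db] / n_total for db in databases}


# Toy verdicts (invented for illustration) on a sample of 100 documents
verdicts = {"d4": {"WoS"}, "d9": {"WoS"}, "d10": {"Scopus", "WoS"}, "d11": set()}
print(error_rates(verdicts, n_total=100))  # {'Scopus': 0.01, 'WoS': 0.03}
```

Dividing by the whole sample rather than by the flagged subset matters: the latter would measure how often flagged documents turn out to be errors, not how often each database misclassifies documents overall.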
Practical/social implications: By improving database accuracy, the academic community can benefit from more reliable bibliometric indicators, which can influence, at least to some extent, research funding, decision-making, and academic reputation.
Keywords: Bibliometric databases, Document type, Semi-automated analysis, Database accuracy