Background: The large and growing number of published studies, and their increasing rate of publication, make the task of identifying relevant studies in an unbiased way for inclusion in systematic reviews both complex and time-consuming. Text mining has been offered as a potential solution: by automating some of the screening process, reviewer time can be saved. The evidence base around the use of text mining for screening has not yet been pulled together systematically; this systematic review fills that research gap. Focusing mainly on non-technical issues, the review aims to increase awareness of the potential of these technologies and promote further collaborative research between the computer science and systematic review communities.
Methods: Five
research questions led our review: what is the state of the evidence base; how
has workload reduction been evaluated; what are the purposes of semi-automation
and how effective are they; how have key contextual problems of applying text
mining to the systematic review field been addressed; and what challenges to
implementation have emerged?
We answered these questions using
standard systematic review methods: systematic and exhaustive searching,
quality-assured data extraction and a narrative synthesis of the
findings.
Results: The evidence
base is active and diverse; there is almost no replication between studies or
collaboration between research teams and, whilst it is difficult to establish
any overall conclusions about best approaches, it is clear that efficiencies
and reductions in workload are potentially achievable.
On the whole, most studies suggested that a
saving in workload of between 30% and 70% might be possible, though sometimes
the saving in workload is accompanied by the loss of 5% of relevant studies
(i.e. 95% recall).
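The two quantities traded off above can be illustrated with a minimal sketch. The function name, its parameters and the example counts are hypothetical and are not taken from the reviewed studies; recall is assumed to mean the share of relevant studies that are still found, and workload saving the share of items that no longer need manual screening.

```python
def screening_metrics(total_items, relevant_total, screened_manually, relevant_found):
    """Return (recall, workload_saving) for one semi-automated screening run.

    recall          = relevant studies found / all relevant studies
    workload_saving = 1 - (items screened by hand / all items)
    """
    recall = relevant_found / relevant_total
    workload_saving = 1 - screened_manually / total_items
    return recall, workload_saving


# Hypothetical review: 10,000 citations, of which 200 are relevant.
# Reviewers screen only 5,000 by hand and find 190 of the relevant ones.
recall, saving = screening_metrics(
    total_items=10_000,
    relevant_total=200,
    screened_manually=5_000,
    relevant_found=190,
)
print(f"recall={recall:.2f}, workload saving={saving:.0%}")
```

In this invented example, halving the manual screening effort costs 10 of the 200 relevant studies, which corresponds to the 95% recall figure mentioned above.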
Conclusions:
Using text
mining to prioritise the order in which items are screened should be considered
safe and ready for use in 'live' reviews. Text mining may also be used
cautiously as a 'second screener'. The use of text mining to eliminate
studies automatically should be considered promising, but not yet fully proven.
In highly technical/clinical areas, it may be used with a high degree of
confidence; but more developmental and evaluative work is needed in other disciplines.
Finn Y, Cantillon P, Flaherty G. Exploration of a possible relationship
between examiner stringency and personality factors in clinical assessments:
a pilot study. BMC Medical Education 2014, 14:1052.
doi:10.1186/s12909-014-0280-3
ABSTRACT
Background:
The
reliability of clinical examinations is known to vary considerably.
Inter-examiner variability is a key source of this variability. Some examiners
consistently give lower scores to some candidates compared to other examiners
and vice versa – the ‘hawk-dove’ effect. Stable examiner characteristics, such
as personality factors, may influence examiner stringency. We investigated
whether examiner stringency is related to personality factors.
Methods: We recruited
12 examiners to view and score a video-recorded five-station OSCE of six Year 1
undergraduate medical students at our institution. In addition, examiners
completed a validated personality questionnaire. Examiners’ markings were
tested for statistically significant differences using non-parametric one-way
analysis of variance. The relationship between examiners’ markings and examiner
personality factors was investigated using the Spearman correlation coefficient.
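As an illustration of the correlation step described above, the following is a minimal pure-Python sketch of Spearman's rho, computed as the Pearson correlation of average ranks. It is not the authors' analysis code: the function names and the example scores are invented, and no p-value is calculated.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Invented data: a stringency score and a personality score for five examiners.
stringency = [1, 2, 3, 4, 5]
trait_score = [2, 1, 4, 3, 5]
print(round(spearman_rho(stringency, trait_score), 2))
```

Working on ranks rather than raw scores makes the coefficient insensitive to the scale of the questionnaire, which suits a small sample of ordinal-like marks.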
Results: At each
station there was a statistically significant difference between examiners’
markings, confirming the presence of inter-examiner variability. Correlation
analysis showed no association between stringency and any of the five major
personality factors. When we omitted an outlier examiner, we found a
statistically significant negative correlation between examiner stringency and
openness to experience, with a correlation coefficient (rho) of -0.66
(p = 0.03). Conversely, there was a moderate positive correlation between
examiner stringency and neuroticism, with a correlation coefficient (rho) of
0.73 (p = 0.01).
Conclusions:
In this
study we did not find any relationship between examiner stringency and examiner
personality factors. However, following the elimination of an outlier examiner
from the analysis, we found a significant relationship between examiner
stringency and two of the big five personality factors (neuroticism and
openness to experience). The significance of this outlier is not known. As this
was a small pilot study, we recommend further studies in this field to
investigate whether there is a relationship between examiner stringency in clinical
assessments and personality factors.