Efficient elicitation approaches to estimate collective crowd answers

Cited 0 time in webofscience Cited 15 time in scopus
  • Hit : 432
  • Download : 0
When crowdsourcing the creation of machine learning datasets, statistical distributions that capture diverse answers can represent ambiguous data better than a single best answer. Unfortunately, collecting distributions is expensive because a large number of responses need to be collected to form a stable distribution. Despite this, the efficient collection of answer distributions—that is, ways to use less human effort to collect estimates of the eventual distribution that would be formed by a large group of responses—is an under-studied topic. In this paper, we demonstrate that this type of estimation is possible and characterize different elicitation approaches to guide the development of future systems. We investigate eight elicitation approaches along two dimensions: annotation granularity and estimation perspective. Annotation granularity is varied by annotating i) a single “best” label, ii) all relevant labels, iii) a ranking of all relevant labels, or iv) real-valued weights for all relevant labels. Estimation perspective is varied by prompting workers to either respond with their own answer or an estimate of the answer(s) that they expect other workers would provide. Our study collected ordinal annotations on the emotional valence of facial images from 1, 960 crowd workers and found that, surprisingly, the most fine-grained elicitation methods were not the most accurate, despite workers spending more time to provide answers. Instead, the most efficient approach was to ask workers to choose all relevant classes that others would have selected. This resulted in a 21.4% reduction in the human time required to reach the same performance as the baseline (i.e., selecting a single answer with their own perspective). By analyzing cases in which finer-grained annotations degraded performance, we contribute to a better understanding of the trade-offs between answer elicitation approaches. Our work makes it more tractable to use answer distributions in large-scale tasks such as ML training, and aims to spark future work on techniques that can efficiently estimate answer distributions.
Publisher
Association for Computing Machinery
Issue Date
2019-11
Language
English
Article Type
Article
Citation

Proceedings of the ACM on Human-Computer Interaction, v.3, no.CSCW

ISSN
2573-0142
DOI
10.1145/3359164
URI
http://hdl.handle.net/10203/268860
Appears in Collection
CS-Journal Papers(저널논문)
Files in This Item
There are no files associated with this item.

qr_code

  • mendeley

    citeulike


rss_1.0 rss_2.0 atom_1.0