UW Interactive Data Lab

Local Decision Pitfalls in Interactive Machine Learning: An Investigation into Feature Selection in Sentiment Analysis

Tongshuang (Sherry) Wu, Daniel S. Weld, Jeffrey Heer. ACM Trans. on Computer-Human Interaction, 2019
Figure: Example performance trajectories for four participants in our user study, each showing a different impact on development-set and test-set performance. Even for participants who reach an improved model (panel (a), with higher performance on both sets), model performance oscillates as they revise the feature space.
Abstract
Tools for Interactive Machine Learning (IML) enable end users to update models in a "rapid, focused, and incremental"—yet local—manner. In this work, we study the question of local decision making in an IML context around feature selection for a sentiment classification task. Specifically, we characterize the utility of interactive feature selection through a combination of human-subjects experiments and computational simulations. We find that, in expectation, interactive modification fails to improve model performance and may hamper generalization due to overfitting. We examine how these trends are affected by the dataset, learning algorithm, and the training set size. Across these factors we observe consistent generalization issues. Our results suggest that rapid iterations with IML systems can be dangerous if they encourage local actions divorced from global context, degrading overall model performance. We conclude by discussing the implications of our feature selection results to the broader area of IML systems and research.
BibTeX
@article{2019-iml-pitfalls,
  title = {Local Decision Pitfalls in Interactive Machine Learning: An Investigation into Feature Selection in Sentiment Analysis},
  author = {Wu, Tongshuang AND Weld, Daniel S. AND Heer, Jeffrey},
  journal = {ACM Trans. on Computer-Human Interaction},
  year = {2019},
  volume = {26},
  number = {4},
  url = {https://uwdata.github.io/papers/iml-pitfalls},
  doi = {10.1145/3319616}
}