Interpretability of large-scale neural models of text
Abstract: We introduce "POLAR", a framework that adds interpretability to pre-trained word embeddings via the adoption of semantic differentials. Semantic differentials are a psychometric construct for measuring the semantics of a word by analyzing its position on a scale between two polar opposites (e.g., cold–hot, soft–hard). The core idea of our approach is to transform existing, pre-trained word embeddings via semantic differentials into a new "polar" space with interpretable dimensions defined by such polar opposites. Our framework also allows for selecting the most discriminative dimensions from a set of polar dimensions provided by an oracle, i.e., an external source. We demonstrate the effectiveness of our framework by applying it to various downstream tasks, and we discuss its applications across different settings.
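The core transformation described above can be sketched in a few lines: each interpretable dimension is an axis defined by the difference between the embeddings of two polar opposites, and a word's polar representation is its projection onto those axes. The toy vectors and the simple projection below are illustrative assumptions, not the exact POLAR formulation; a real use would load pre-trained embeddings such as GloVe or word2vec.

```python
import numpy as np

# Toy pre-trained embeddings (assumed for illustration; real use would
# load pre-trained vectors such as GloVe or word2vec).
emb = {
    "hot":  np.array([ 1.0, 0.2,  0.0]),
    "cold": np.array([-1.0, 0.1,  0.0]),
    "hard": np.array([ 0.1, 0.0,  1.0]),
    "soft": np.array([ 0.0, 0.1, -1.0]),
    "ice":  np.array([-0.8, 0.3,  0.6]),
}

# Polar opposite pairs defining the interpretable dimensions.
pairs = [("cold", "hot"), ("soft", "hard")]

# Each polar axis is the (normalized) difference vector of a pair.
axes = np.stack([emb[b] - emb[a] for a, b in pairs])
axes /= np.linalg.norm(axes, axis=1, keepdims=True)

def to_polar(word):
    """Project a word's embedding onto the polar axes: one coordinate
    per semantic differential (negative -> first pole, positive -> second)."""
    return axes @ emb[word]

print(to_polar("ice"))  # coordinates on the cold-hot and soft-hard scales
```

On this toy data, "ice" lands on the negative side of the cold–hot axis, i.e., toward "cold", which is exactly the kind of human-readable reading the polar space is meant to provide.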