August Lohse

Ph.D. Student in Social Data Science at the University of Copenhagen


Dialectograms: Machine Learning Differences between Discursive Communities

Word embeddings provide an unsupervised way to understand differences in word usage between discursive communities. A number of recent papers have focused on identifying words that are used differently by two or more communities. But word embeddings are complex, high-dimensional spaces, and a focus on identifying differences captures only a fraction of their richness. Here, we take a step towards leveraging the richness of the full embedding space by using word embeddings to map out how words are used differently. Specifically, we describe the construction of dialectograms, an unsupervised way to visually explore the characteristic ways in which each community uses a focal word. Based on these dialectograms, we provide a new measure of the degree to which words are used differently that overcomes the tendency of existing measures to pick out low-frequency or polysemous words. We apply our methods to explore the discourses of two US political subreddits and show how our methods identify stark affective polarisation of politicians and political entities, differences in the assessment of proper political action, as well as disagreement about whether certain issues require political intervention at all.

This project is available as a preprint at:

Fixing Fieldnotes: Developing and Testing a Digital Tool for the Collection, Processing, and Analysis of Ethnographic Data

Ethnographic fieldnotes can contain richer and more thorough descriptions of social phenomena than other data sources. Their open-ended and flexible character makes them especially useful in explorative research. However, fieldnotes are typically highly unstructured and personalized by individual researchers, which makes them harder to use as a method for data collection in collaborative and mixed methods research. More precisely, the unstructured nature of ethnographic fieldnotes presents three distinct challenges: 1) Organizability: it can be difficult to search and sort fieldnotes and thus to get an overview of them, 2) Integrability: it is difficult to meaningfully integrate fieldnotes with other, more quantitative data types such as surveys or geospatial data, and 3) Computational Processability: it is hard to process and analyze fieldnotes with computational methods such as topic models and network analysis. To solve these three challenges, we present a new digital tool for the systematic collection, processing, and analysis of ethnographic fieldnotes. The tool is developed and tested as part of an interdisciplinary mixed methods pilot study on attention dynamics at a political festival in Denmark. Through case examples from this study, we show how adopting this new digital tool allowed our team to overcome the three aforementioned challenges of fieldnotes while retaining the flexible and explorative character that is a key strength of ethnographic fieldwork.

The paper is published in Social Science Computer Review and can be found here: Link to paper

Ideological scaling of the Danish Parliament using word embeddings

Using transcripts from the Danish Parliament, I perform unsupervised ideological scaling of the Danish parties. For the project I embed around 450,000 political speeches alongside tokens for political parties and parliamentary sessions, using the static embedding algorithm doc2vec. I then perform dimensionality reduction using PCA, reducing the embedding to two dimensions for ideological scaling. The resulting two-dimensional space can be interpreted as capturing the two most important dimensions in Danish politics and is based on what the parties are actually saying in Parliament. The parties are thus placed alongside the words in the vector space, and their positions can be interpreted using the positions of the words. The work was presented in 2021 at SODAS at the University of Copenhagen as part of the Data Discussion event series.
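
As a rough illustration of the dimensionality-reduction step only (not the project's actual code), the sketch below projects a set of embedding vectors onto their first two principal components with NumPy. The labels, the vector dimensionality of 50, and the random toy vectors are all assumptions standing in for learned doc2vec party and word vectors; the function name `pca_project` is hypothetical.

```python
import numpy as np


def pca_project(vectors: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project row vectors onto their first principal components."""
    # PCA requires mean-centered data.
    centered = vectors - vectors.mean(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Toy stand-ins (assumption): random vectors in place of learned embeddings
# for party tokens and words that share one vector space.
rng = np.random.default_rng(0)
labels = ["party_A", "party_B", "party_C", "word_tax", "word_welfare"]
embeddings = rng.normal(size=(len(labels), 50))

coords = pca_project(embeddings)
for label, (x, y) in zip(labels, coords):
    print(f"{label:>12}: ({x:+.2f}, {y:+.2f})")
```

Because parties and words are projected with the same axes, a party's coordinates can be read against those of the words around it, which is the interpretive step described above.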

Developing Tools for Computational Ethnography

As part of my Ph.D., I work on integrating ethnographic fieldwork with computational and qual/quant methods. As a part of this work, I am involved in developing an app for ethnographers to store, access, and manage fieldnotes in a more structured way. This allows for after-the-fact NLP analysis of the notes, and also lets ethnographers sort and access fieldnotes based on time, place, keyword, etc.

Read more about the app and desktop version and sign up here:

This work was generously supported by the Carlsberg Foundation’s research infrastructure grant. Read more here: Link to Carlsberg Foundation

A paper that aims to introduce the benefits of working with the app is also currently being prepared.

Responsiveness of politicians to social media feedback

This is a main project of my Ph.D., in which I examine to what extent social media feedback, such as “likes”, can be said to drive the political agenda. The project is built on 8 years of Facebook data labeled using deep learning.

I am currently writing the results into a paper.