7th ANNUAL COMPTEXT Conference 2025
Workshop Program on April 24, 2025

Automated Content Analysis of Visual Data
Workshop Leader: Tobias Heidenreich
With the surge of visual information from sources like social media, automated visual content analysis is becoming an essential tool for social science research to navigate large quantities of data. This beginner-level workshop introduces participants to basic techniques for handling, processing, and analyzing visual material, exploring unsupervised and supervised approaches like transfer learning and vision transformer models. The session includes hands-on coding exercises; a basic knowledge of Python is required to follow along. We will also briefly discuss theoretical frameworks to showcase how these methods can be meaningfully embedded in social science research, enabling new ways to study and interpret visual data.
Tobias Heidenreich is a postdoctoral research fellow in the Global Governance unit at the WZB Berlin Social Science Center. His research explores political communication among diverse stakeholders in international and comparative contexts. Employing computational methods, he analyzes large-scale datasets comprising textual and visual content.
Bayesian Text Analysis
Workshop Leader: Petro Tolochko
This workshop introduces participants to the fundamental concepts of Bayesian inference and demonstrates how these principles apply to textual data. It guides attendees through building models that capture linguistic nuances, interpreting posterior distributions, and accounting for uncertainty in analysis. The workshop covers various techniques, including what is traditionally considered “Bayesian” (like LDA), as well as Bayesian extensions of more classical methods (e.g., sentiment analysis). No prior knowledge of Bayesian methods is required. Basic knowledge of programming (Python or R) is welcome. The workshop will be held in Python.
Petro Tolochko is a postdoctoral researcher at the Communication Department at the University of Vienna. His interests revolve around statistical modelling, text analysis, and computational social science.
Follow the User?! Data Donation Studies for Collecting Digital Trace Data
Workshop Leaders: Valerie Hase, Frieder Rodewald
Data donation studies are a new method for collecting digital traces: users download their data from digital platforms and donate it to science via Data Donation Tools. Researchers then use computational social science (CSS) methods to filter, preprocess, and analyze the data in a privacy-by-design approach. The workshop introduces data donation studies (key steps; technical, legal, and ethical considerations). It targets scientists with little or no experience with data donation studies. Please note that this is a less technical workshop, as we will focus on the general design of studies rather than on programming code. Experience with programming (R/Python) is beneficial but optional.
Dr. Valerie Hase is a Postdoctoral Scholar at the Department of Media and Communication at LMU Munich (previously University of Zurich, LSE). Her research focuses on computational social science (e.g., automated content analysis, digital traces) and digital journalism. She co-leads a DFG project on “Integrating data donation in survey infrastructure” and is involved in policy efforts to improve platform data access.
Frieder Rodewald is a PhD student in Social Data Science at the University of Mannheim. He studies what people do online, especially how they deal with their privacy and how to measure and explain their behavior. He is currently working on several data donation studies involving Instagram, YouTube, LinkedIn, and TikTok.
Brave New Data Access World: What the Digital Services Act (DSA) Means for Researcher Access to Digital Platforms
Workshop Leader: Jakob Ohme
The European Union’s Digital Services Act (DSA) introduces novel opportunities for researchers by establishing a legal framework for access to data from digital platforms, including leading social media companies. Article 40 of the DSA explicitly outlines how researchers can apply for data access, offering an unprecedented gateway to studying the digital ecosystem.
However, while this legal right to access is a game-changer, it is not without challenges. Navigating the application process, understanding the modalities of access, and managing ethical and technical considerations require careful preparation.
This interactive workshop will:
Provide an overview of the DSA’s data access provisions and the types of data available.
Equip participants with strategies for crafting successful data access requests.
Offer practical training in preparing applications, including technical and ethical prerequisites.
Facilitate hands-on exercises where participants draft their own data access requests, which could serve as the foundation for future research projects.
Participants should have a clear research interest in digital platforms, basic knowledge of quantitative or qualitative research methods, experience or interest in ethics and data protection, and ideally a preliminary research idea or question related to platform data.
Dr. Jakob Ohme leads the “Digital News Dynamics” group at the Weizenbaum Institute, studying the impact of digital journalism in comparison with that of influencers and AI. His research focuses on news consumption, political engagement, and the use of digital trace data to advance political communication and journalism research. He is a Co-Principal Investigator in the #DSA40 Collaboratory, focusing on collaborative access to platform data under the EU’s Digital Services Act.
Computational Approaches to Narratives
Workshop Leader: Maria Antoniak
Narratives can be used to persuade, to educate, to spread misinformation, to support personal growth, and to entertain. Studying their spread across large datasets requires computational methods. In this workshop, we will cover a variety of computational approaches for the study and measurement of narratives. This will include topics such as story detection, comparison of character framings, and extraction of narrative shapes, and we will experiment together with a variety of datasets drawn from literary, political, and social media sources. This workshop is aimed at researchers interested in using text analytics for the study of narratives in large datasets. Beginners are very welcome, but some knowledge of Python and text analytics will be helpful.
Maria Antoniak is a Postdoc at the Pioneer Centre for AI at the University of Copenhagen and an incoming Assistant Professor of Computer Science at the University of Colorado Boulder. She completed her PhD in Information Science at Cornell University and has an MS in Computational Linguistics from the University of Washington. She has also spent time in industry at the Allen Institute for AI, Microsoft Research FATE, Twitter Cortex, and Facebook Core Data Science. Her work focuses on natural language processing and cultural analytics.
Expand your Toolkit! How to Harness More Diverse Task Types for Concept Measurement
Workshop Leader: Hauke Licht
Supervised text classification is a popular method for assigning categories to texts. However, in political and communication science, most applications focus on classifying individual documents into single, exclusive categories (e.g., policy topics, sentiment, or positions). This focus obscures a wide range of alternative annotation formats and task types, such as multi-label classification, pairwise classification, or entity extraction and classification. This workshop introduces you to these diverse task types and provides an overview of how to implement and evaluate them in Python using Transformer fine-tuning, few-shot learning, and LLM prompting. By expanding your text analysis toolkit, you’ll gain new skills to measure complex social science concepts in more creative and effective ways.
Hauke Licht is Assistant Professor of Computational Political Science at the University of Innsbruck, Austria. He develops and applies computational text analysis methods to study how politicians, parties, and governments communicate with the public.
The Power of Less: Efficient LLM Training with Adapters
Workshop Leaders: Christopher Klamm, Julia Romberg
Large Language Models have become an essential tool in social science research for analyzing vast amounts of data. However, their training requires significant computing resources, which can present monetary and environmental challenges. In this workshop, we will discuss the potential of using smaller models and parameter-efficient training of LLMs for social science research. We will demonstrate how to use the Adapter framework for parameter-efficient models using Python. A basic understanding of Python and data analysis, along with familiarity with statistical concepts, will be beneficial. All tools and software required for the hands-on sessions will be provided.
Christopher Klamm is an interdisciplinary researcher at the University of Cologne. His research focuses on Natural Language Processing and Computational Political Science. Christopher is also passionate about open science and open source.
Julia Romberg is a postdoctoral researcher in Computational Social Science at GESIS. Her research interests include the development of machine learning models that accommodate human label variation, improved understanding of the subjectivity inherent in natural language perception and production, computational argumentation, and the support of public participation and deliberation processes in the political domain.
During the event, photographs, video recordings, and audio recordings may be taken to document key moments, including interactions during workshops, roundtable discussions, presentations, coffee breaks, lunch breaks, and other informal conversations or meetings, such as networking sessions or spontaneous discussions between participants. These recordings may be used by the organisers for promotional purposes, including publication on official websites, social media channels, and in printed materials such as reports, posters, and brochures.
By registering for this event, participants explicitly consent to being recorded, and such agreement is a requirement for participation. Please note that images and recordings shared on public platforms may be reshared or republished by third parties, limiting the organisers’ ability to fully exercise participants’ Right to Erasure as set out in Article 17 of the General Data Protection Regulation (Regulation (EU) 2016/679, GDPR).
