Penn’s Brachio Lab is evaluating the use of large language models in assessing cyberbullying. Credit: Chenyao Liu

Doctoral students in the Brachio Lab at Penn Engineering developed a mechanism that examines AI models for signs of cyberbullying capabilities.

Penn’s Brachio Lab, which has been evaluating how large language models perform in practical applications such as legal reasoning and economic forecasting, is now researching the extent to which LLMs might be capable of cyberbullying. The lab’s goal is to improve machine learning for societal benefit, which makes this area of AI safety research a high priority.

Generative AI has been connected to the creation of harassing posts, comments, and messages on a variety of platforms and networks. Automated troll bots increase the dissemination and impact of digital harassment and increasingly leverage AI techniques to create more convincing content. The possibility of LLMs autonomously exhibiting cyberbullying behavior raises significant concerns.

Eric Wong, faculty lead for the Brachio Lab and an assistant professor in the Department of Computer and Information Science in the School of Engineering and Applied Science, noted the project’s relevance given the widespread personal use of LLMs. Because reported cases of LLMs encouraging self-harm have been attributed to faulty mechanisms, this research aims to evaluate the safety and effectiveness of LLMs, both to remedy those faults and to help prevent them from occurring in the first place.

The “evaluator agents” developed by students in the Brachio Lab are LLMs themselves, designed to assess other LLMs for cyberbullying by interacting with, studying, and potentially mending bugs in the models. The agents are trained on data from areas of interest such as market forecasting and legal reasoning, according to Davis Brown, a first-year doctoral student in CIS and the Brachio Lab and developer of the framework.

Doctoral student Helen Jin, project lead for the Brachio Lab’s cyberbullying capability case study, tests these evaluator agents with profiles that reflect the diverse populations that use LLMs. Drawing on census attributes such as occupation and socioeconomic status, Jin prompts LLMs with questions and studies how their responses vary with those attributes.

The evaluator agents produce profiles of an LLM’s cyberbullying capabilities, refining those profiles as their exposure to the model increases. Brachio Lab researchers found that some LLMs have deficits in their reasoning abilities that can lead to cyberbullying behavior.

The lab is currently evaluating LLMs for “cultural politeness and emotional capabilities” to discern whether different cultural contexts might color a user’s perception of LLMs as engaging in cyberbullying.