CyLab researchers to present at ACM CHI 2023

Apr 20, 2023


CyLab Security and Privacy Institute researchers are set to present seven papers at the upcoming ACM CHI 2023 (Conference on Human Factors in Computing Systems).

The conference will take place in Hamburg, Germany, from April 23rd through the 28th, bringing together researchers and practitioners from diverse cultures, backgrounds, and positionalities who aim to improve the world with interactive digital technologies.

Carnegie Mellon authors from a variety of schools and disciplines contributed to more than 40 papers accepted to CHI 2023, including one Best Paper and six Honorable Mention Awards. Below, we’ve compiled a list of papers authored by CyLab Security and Privacy Institute members that are being presented at this year’s event.
 

A US-UK Usability Evaluation of Consent Management Platform Cookie Consent Interface Design on Desktop and Mobile

Authors: Elijah Robert Bouma-Sims, Yanzi Lin, Alexandra Nisenoff, Eleanor Birrell, Hana Habib, Megan Li, Adia Sakura-Lemessy, Ellie Young, Lorrie Faith Cranor

Websites implement cookie consent interfaces to obtain users’ permission to use non-essential cookies, as required by privacy regulations. The study’s authors extend prior research evaluating the impact of interface design on cookie consent through an online behavioral experiment (n = 1,359) in which they prompted mobile and desktop users from the UK and US to make cookie consent decisions using one of 14 interfaces implemented with the OneTrust consent management platform (CMP). Researchers found significant effects on user behavior and sentiment for multiple explanatory variables, including more negative sentiment toward the consent process among UK participants and lower comprehension of interface information among mobile users. The design factor with the largest effect on user behavior was the initial set of options displayed in the cookie banner. In addition to providing further evidence of the inadequacy of current cookie consent processes, the results have implications for website operators and CMPs.
 

Less is Not More: Improving Findability and Actionability of Privacy Controls for Online Behavioral Advertising
 

Authors: Jane Im, Weikun Lyu, Hana Habib, Nikola Banovic, Ruiyi Wang, Nick Cook, Lorrie Faith Cranor, Florian Schaub

Tech companies that rely on ads for business argue that users have control over their data via ad privacy settings. However, these ad settings are often hidden. This work aims to inform the design of findable ad controls and study their impact on users’ behavior and sentiment. The study’s authors iteratively designed ad control interfaces that varied in (1) the setting’s entry point (within ads or at the top of the feed) and (2) its level of actionability, with high actionability directly surfacing links to specific advertisement settings and low actionability pointing to general settings pages (reminiscent of companies’ current approach to ad controls). Researchers built a Chrome extension that augments Facebook with the experimental ad control interfaces and conducted a between-subjects online experiment with 110 participants. Results showed that entry points within ads or at the feed’s top, as well as high-actionability interfaces, increased the findability and discoverability of Facebook’s ad settings and participants’ perceived usability of them. High actionability also reduced users’ effort in finding ad settings. Participants perceived high and low actionability as equally usable, which shows it is possible to design more actionable ad controls without overwhelming users. The authors conclude by emphasizing the importance of regulation that provides specific, research-informed requirements to companies on how to design usable ad controls.
 

Understanding Frontline Workers’ and Unhoused Individuals’ Perspectives on AI Used in Homeless Services

Authors: Tzu-Sheng Kuo, Jisoo Geum, Jason I. Hong, Kenneth Holstein, Hong Shen, Nev Jones, Haiyi Zhu

🏆 ACM CHI Best Paper Award

Recent years have seen growing adoption of AI-based decision-support systems (ADS) in homeless services, yet little is known about stakeholder desires and concerns surrounding their use. In this work, the study’s authors aim to understand impacted stakeholders’ perspectives on a deployed ADS that prioritizes scarce housing resources. Researchers employed AI lifecycle comicboarding, an adapted version of the comicboarding method, to elicit stakeholder feedback and design ideas across various components of an AI system’s design. They elicited feedback from county workers who operate the ADS daily, service providers whose work is directly impacted by the ADS, and unhoused individuals in the region. The study’s participants shared concerns and design suggestions around the AI system’s overall objective, specific model design choices, dataset selection, and use in deployment. The authors’ findings demonstrate that stakeholders, even without AI knowledge, can provide specific and critical feedback on an AI system’s design and deployment if empowered to do so.
 

Participation and Division of Labor in User-Driven Algorithm Audits: How Do Everyday Users Work Together to Surface Algorithmic Harms?

Authors: Rena Li, Chelsea Fan, Nora Wai, Hong Shen, Jason I. Hong, Sara Kingsley, Proteeti Sinha, Jaimie Lee, Motahhare Eslami

Recent years have witnessed an interesting phenomenon in which users come together to interrogate potentially harmful algorithmic behaviors they encounter in their everyday lives. Researchers have started to develop theoretical and empirical understandings of these user-driven audits, with the hope of harnessing the power of users to detect harmful machine behaviors. However, little is known about users’ participation and division of labor in these audits, an understanding that is essential to supporting such collective efforts in the future. By collecting and analyzing 17,984 tweets from four recent cases of user-driven audits, the study’s authors shed light on patterns of users’ participation and engagement, especially with regard to the top contributors in each case. Researchers also identified the various roles user-generated content played in these audits, including hypothesizing, data collection, amplification, contextualization, and escalation. The authors discuss implications for designing tools to support user-driven audits and the users who labor to raise awareness of algorithmic bias.
 

Zeno: An Interactive Framework for Behavioral Evaluation of Machine Learning

Authors: Angel Alexander Cabrera, Donald R. Bertucci, Ameet Talwalkar, Adam Perer, Erica Fu, Kenneth Holstein, Jason I. Hong

Machine learning models with high accuracy on test data can still produce systematic failures, such as harmful biases and safety issues, when deployed in the real world. To detect and mitigate such failures, practitioners run behavioral evaluation of their models, checking model outputs for specific types of inputs. Behavioral evaluation is important but challenging, requiring that practitioners discover real-world patterns and validate systematic failures. The study’s authors conducted 18 semi-structured interviews with ML practitioners to better understand the challenges of behavioral evaluation and found that it is a collaborative, use-case-first process that is not adequately supported by existing task- and domain-specific tools. Using these findings, researchers designed Zeno, a general-purpose framework for visualizing and testing AI systems across diverse use cases. In four case studies with participants using Zeno on real-world models, the team found that practitioners were able to reproduce previous manual analyses and discover new systematic failures.
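Behavioral evaluation of the kind Zeno supports can be pictured as slicing a test set into meaningful subgroups and checking the model’s outputs on each slice. The sketch below is a minimal, hypothetical illustration of that idea in plain Python; it does not use Zeno’s actual API, and the toy model, data, and slice definitions are assumptions made for the example.

```python
# Minimal sketch of behavioral evaluation by slicing a test set
# (hypothetical illustration; not Zeno's actual API).

def accuracy(examples, predict):
    """Fraction of examples the model labels correctly."""
    if not examples:
        return float("nan")
    correct = sum(1 for ex in examples if predict(ex["text"]) == ex["label"])
    return correct / len(examples)

def evaluate_slices(examples, predict, slices):
    """Report size and accuracy for each named subgroup ("slice")."""
    results = {}
    for name, condition in slices.items():
        subset = [ex for ex in examples if condition(ex)]
        results[name] = (len(subset), accuracy(subset, predict))
    return results

if __name__ == "__main__":
    # Toy sentiment data and a stand-in model, for illustration only.
    test_set = [
        {"text": "great movie", "label": "pos"},
        {"text": "not great at all", "label": "neg"},
        {"text": "terrible", "label": "neg"},
        {"text": "not bad", "label": "pos"},
    ]
    predict = lambda text: "neg" if "not" in text or "terrible" in text else "pos"

    # Slices target specific input types (e.g., negation), where
    # systematic failures often hide behind decent overall accuracy.
    slices = {
        "all": lambda ex: True,
        "contains negation": lambda ex: "not" in ex["text"],
    }
    for name, (n, acc) in evaluate_slices(test_set, predict, slices).items():
        print(f"{name}: n={n}, accuracy={acc:.2f}")
```

Running the sketch reports 0.75 accuracy overall but only 0.50 on the negation slice, the kind of systematic failure that slice-level checks surface and that aggregate test accuracy hides.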
 

Aspirations and Practice of ML Model Documentation: Moving the Needle with Nudging and Traceability

Authors: Avinash Bhat, Grace Hu, Nadia Nahar, Christian Kästner, Austin Coursey, Sixian Li, Shurui Zhou, Jin L.C. Guo

The documentation practice for machine-learned (ML) models often falls short of established practices for traditional software, which impedes model accountability and inadvertently abets inappropriate use or misuse of models. Recently, model cards, a proposal for model documentation, have attracted notable attention, but their impact on actual practice is unclear. In this work, the study’s authors systematically study model documentation in the field and investigate how to encourage more responsible and accountable documentation practices. Their analysis of publicly available model cards reveals a substantial gap between the proposal and practice. The researchers then designed a tool named DocML aiming to (1) nudge data scientists to comply with the model cards proposal during model development, especially the sections related to ethics, and (2) assess and manage documentation quality. A lab study reveals the benefits of the tool for long-term documentation quality and accountability.
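For readers unfamiliar with model cards, they are short, structured documents with a fixed set of sections covering a model’s details, intended use, metrics, data, and ethical considerations. The sketch below lists those sections in plain Python alongside a naive completeness check; the section names follow the original model cards proposal, while the checking logic and the draft card are illustrative assumptions rather than DocML’s implementation.

```python
# Illustrative sketch of a model card's sections and a naive completeness
# check (assumptions for the example; not DocML's implementation).

MODEL_CARD_SECTIONS = [
    "Model Details",
    "Intended Use",
    "Factors",
    "Metrics",
    "Evaluation Data",
    "Training Data",
    "Quantitative Analyses",
    "Ethical Considerations",
    "Caveats and Recommendations",
]

def missing_sections(model_card: dict) -> list:
    """Return the sections that are absent or left empty in a draft card."""
    return [s for s in MODEL_CARD_SECTIONS if not model_card.get(s, "").strip()]

if __name__ == "__main__":
    draft = {
        "Model Details": "Logistic regression trained on 2022 support tickets.",
        "Intended Use": "Routing incoming tickets to the right team.",
        "Metrics": "Macro F1 on a held-out split.",
        # Ethics-related sections are often the ones left blank, which is
        # the kind of gap a nudging tool would flag during development.
    }
    print("Sections still to document:", missing_sections(draft))
```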
 

ONYX: Assisting Users in Teaching Natural Language Interfaces Through Multi-Modal Interactive Task Learning

Authors: Marcel Ruoff, Alexander Maedche, Brad A. Myers

Users are increasingly empowered to personalize natural language interfaces (NLIs) by teaching them how to handle new natural language (NL) inputs. However, the researchers’ formative study found that when teaching new NL inputs, users require assistance in clarifying the ambiguities that arise and want insight into which parts of the input the NLI understands. In this paper, the authors introduce ONYX, an intelligent agent that interactively learns new NL inputs by combining NL programming and programming-by-demonstration, also known as multi-modal interactive task learning. To address these challenges, ONYX suggests how it could handle new NL inputs based on previously learned concepts or user-defined procedures, and poses follow-up questions to clarify ambiguities in user demonstrations, using visual and textual aids to clarify the connections. The evaluation shows that users provided with ONYX’s new features achieved significantly higher accuracy in teaching new NL inputs (median: 93.3%) compared to those without (median: 73.3%).