CyLab faculty, students present research at the 33rd USENIX Security Symposium
Michael Cunningham
Aug 7, 2024
Carnegie Mellon faculty and students will present research on a wide range of topics at the 33rd USENIX Security Symposium. Taking place in Philadelphia on August 14-16, the event will bring together experts from around the world to highlight the latest advances in the security and privacy of computer systems and networks.
Here, we’ve compiled a list of the eight papers co-authored by CyLab Security and Privacy Institute members that will be presented at the event.
A Taxonomy of C Decompiler Fidelity Issues
Authors: Luke Dramko, Jeremy Lacomis, Edward J. Schwartz, Bogdan Vasilescu, and Claire Le Goues, Carnegie Mellon University
Abstract: Decompilation is an important part of analyzing threats in computer security. Unfortunately, decompiled code contains less information than the corresponding original source code, which makes understanding it more difficult for the reverse engineers who manually perform threat analysis. Thus, the fidelity of decompiled code to the original source code matters, as it can influence reverse engineers' productivity. There is some existing work on predicting some of the missing information using statistical methods, but it focuses largely on variable names and variable types. In this work, we more holistically evaluate decompiler output from C-language executables and use our findings to inform directions for future decompiler development. More specifically, we use open-coding techniques to identify defects in decompiled code beyond missing names and types. To ensure that our study is robust, we compare and evaluate four different decompilers. Using thematic analysis, we build a taxonomy of decompiler defects. Using this taxonomy to reason about classes of issues, we suggest specific approaches that can be used to mitigate fidelity issues in decompiled code.
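To make the fidelity gap concrete, the sketch below (our illustration, not an example from the paper) contrasts a small C function with the kind of output decompilers commonly produce for it, where descriptive names, precise types, and the array abstraction are lost; the paper's taxonomy catalogs defects that go beyond these well-known issues.

```c
#include <stddef.h>

/* Original source: descriptive names, array indexing, size_t bounds. */
int sum_scores(const int *scores, size_t count) {
    int total = 0;
    for (size_t i = 0; i < count; i++)
        total += scores[i];
    return total;
}

/* Typical decompiler output for the same function: placeholder names,
 * approximate integer types, and raw pointer arithmetic in place of
 * the original array abstraction. */
int sub_401000(long a1, unsigned long a2) {
    int v3 = 0;
    for (unsigned long i = 0; i < a2; i++)
        v3 += *(int *)(a1 + 4 * i);
    return v3;
}
```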
The Impact of Exposed Passwords on Honeyword Efficacy
Authors: Zonghao Huang, Duke University; Lujo Bauer, Carnegie Mellon University; Michael K. Reiter, Duke University
Abstract: Honeywords are decoy passwords that can be added to a credential database; if a login attempt uses a honeyword, this indicates that the site's credential database has been leaked. In this paper we explore the basic requirements for honeywords to be effective, in a threat model where the attacker knows passwords for the same users at other sites. First, we show that for user-chosen (vs. algorithmically generated, i.e., by a password manager) passwords, existing honeyword-generation algorithms do not simultaneously achieve false-positive and false-negative rates near their ideals of ≈ 0 and ≈ 1/(1+n), respectively, in this threat model, where n is the number of honeywords per account. Second, we show that for users leveraging algorithmically generated passwords, state-of-the-art methods for honeyword generation will produce honeywords that are not sufficiently deceptive, yielding many false negatives. Instead, we find that only a honeyword-generation algorithm that uses the same password generator as the user can provide deceptive honeywords in this case. However, when the defender's ability to infer the generator from the (one) account password is less accurate than the attacker's ability to infer the generator from potentially many, this deception can again wane. Taken together, our results provide a cautionary note for the state of honeyword research and pose new challenges to the field.
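For readers skimming the notation: the "ideal" rates in the abstract follow from a simple counting argument, sketched below under the assumption that an attacker who steals the database can do no better than guess uniformly among the stored entries.

```latex
% One real password hidden among n honeywords gives n+1 stored "sweetwords".
% A database thief who can only guess uniformly logs in undetected
% (a false negative) only by picking the real one:
\[
  P[\text{false negative}] = \frac{1}{n+1},
  \qquad
  P[\text{breach detected}] = 1 - \frac{1}{n+1} = \frac{n}{n+1}.
\]
% The paper's point: once the attacker also knows the same user's passwords
% at other sites, practical honeyword generators fall well short of this ideal.
```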
Bending microarchitectural weird machines towards practicality
Authors: Ping-Lun Wang, Riccardo Paccagnella, Riad S. Wahby, and Fraser Brown, Carnegie Mellon University
Abstract: A large body of work has demonstrated attacks that rely on the difference between CPUs' nominal instruction set architectures and their actual (microarchitectural) implementations. Most of these attacks, like Spectre, bypass the CPU's data-protection boundaries. A recent line of work considers a different primitive, called a microarchitectural weird machine (µWM), that can execute computations almost entirely using microarchitectural side effects. While µWMs would seem to be an extremely powerful tool, e.g., for obfuscating malware, thus far they have seen very limited application. This is because prior µWMs must be hand-crafted by experts, and even then have trouble reliably executing complex computations.
In this work, we show that µWMs are a practical, near-term threat. First, we design a new µWM architecture, Flexo, that improves performance by 1–2 orders of magnitude and reduces circuit size by 75–87%, dramatically improving the applicability of µWMs to complex computation. Second, we build the first compiler from a high-level language to µWMs, letting experts craft automatic optimizations and non-experts construct state-of-the-art obfuscated computations. Finally, we demonstrate the practicality of our approach by extending the popular UPX packer to encrypt its payload and use a µWM for decryption, frustrating malware analysis.
GoFetch: Breaking Constant-Time Cryptographic Implementations Using Data Memory-Dependent Prefetchers
Authors: Boru Chen, University of Illinois Urbana-Champaign; Yingchen Wang, University of Texas at Austin; Pradyumna Shome, Georgia Institute of Technology; Christopher Fletcher, University of California, Berkeley; David Kohlbrenner, University of Washington; Riccardo Paccagnella, Carnegie Mellon University; Daniel Genkin, Georgia Institute of Technology
Abstract: Microarchitectural side-channel attacks have shaken the foundations of modern processor design. The cornerstone defense against these attacks has been to ensure that security-critical programs do not use secret-dependent data as addresses. Put simply: do not pass secrets as addresses to, e.g., data memory instructions. Yet, the discovery of data memory-dependent prefetchers (DMPs)—which turn program data into addresses directly from within the memory system—calls into question whether this approach will continue to remain secure.
This paper shows that the security threat from DMPs is significantly worse than previously thought and demonstrates the first end-to-end attacks on security-critical software using the Apple m-series DMP. Undergirding our attacks is a new understanding of how DMPs behave which shows, among other things, that the Apple DMP will activate on behalf of any victim program and attempt to "leak" any cached data that resembles a pointer. From this understanding, we design a new type of chosen-input attack that uses the DMP to perform end-to-end key extraction on popular constant-time implementations of classical (OpenSSL Diffie-Hellman Key Exchange, Go RSA decryption) and post-quantum cryptography (CRYSTALS-Kyber and CRYSTALS-Dilithium).
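The coding rule the abstract refers to is easiest to see side by side; the sketch below is our illustration of that rule, not code from the paper. The first lookup lets the secret choose a load address, while the second touches every entry so the address stream is independent of the secret. The paper's observation is that a DMP can itself form and dereference addresses from cached data that merely resembles a pointer, which is how even carefully written constant-time code can be attacked.

```c
#include <stdint.h>

/* Secret-dependent lookup: the load address is a function of `secret`,
 * so cache state (and a data memory-dependent prefetcher) can reveal it. */
uint8_t lookup_leaky(const uint8_t table[256], uint8_t secret) {
    return table[secret];
}

/* Constant-time selection: every entry is read in the same order
 * regardless of the secret; a mask keeps only the wanted value. */
uint8_t lookup_constant_time(const uint8_t table[256], uint8_t secret) {
    uint8_t result = 0;
    for (int i = 0; i < 256; i++) {
        uint8_t mask = (uint8_t)-(uint8_t)(i == secret); /* 0xFF iff i == secret */
        result |= table[i] & mask;
    }
    return result;
}
```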
Understanding How to Inform Blind and Low-Vision Users about Data Privacy through Privacy Question Answering Assistants
Authors: Yuanyuan Feng, University of Vermont; Abhilasha Ravichander, Allen Institute for Artificial Intelligence; Yaxing Yao, Virginia Tech; Shikun Zhang and Rex Chen, Carnegie Mellon University; Shomir Wilson, Pennsylvania State University; Norman Sadeh, Carnegie Mellon University
Abstract: Understanding and managing data privacy in the digital world can be challenging for sighted users, let alone blind and low-vision (BLV) users. There is limited research on how BLV users, who have special accessibility needs, navigate data privacy, and how potential privacy tools could assist them. We conducted an in-depth qualitative study with 21 US BLV participants to understand their data privacy risk perception and mitigation, as well as their information behaviors related to data privacy. We also explored BLV users' attitudes towards potential privacy question answering (Q&A) assistants that enable them to better navigate data privacy information. We found that BLV users face heightened security and privacy risks, but their risk mitigation is often insufficient. They do not necessarily seek data privacy information but clearly recognize the benefits of a potential privacy Q&A assistant. They also expect privacy Q&A assistants to possess cross-platform compatibility, support multi-modality, and demonstrate robust functionality. Our study sheds light on BLV users' expectations when it comes to usability, accessibility, trust and equity issues regarding digital data privacy.
GFWeb: Measuring the Great Firewall's Web Censorship at Scale
Authors: Nguyen Phong Hoang, University of British Columbia and University of Chicago; Jakub Dalek and Masashi Crete-Nishihata, Citizen Lab - University of Toronto; Nicolas Christin, Carnegie Mellon University; Vinod Yegneswaran, SRI International; Michalis Polychronakis, Stony Brook University; Nick Feamster, University of Chicago
Abstract: Censorship systems such as the Great Firewall (GFW) have been continuously refined to enhance their filtering capabilities. However, most prior studies, in particular those of the GFW, have been limited in scope and conducted over short time periods, leading to gaps in our understanding of the GFW's evolving Web censorship mechanisms over time. We introduce GFWeb, a novel system designed to discover domain blocklists used by the GFW for censoring Web access. GFWeb exploits the GFW's bidirectional and loss-tolerant blocking behavior to enable testing hundreds of millions of domains on a monthly basis, thereby facilitating large-scale longitudinal measurement of HTTP and HTTPS blocking mechanisms.
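As a rough illustration of the kind of probe such measurement systems build on (our sketch, not GFWeb's pipeline, with placeholder hostnames): send an HTTP request whose Host header names the domain under test toward a server behind the firewall, and treat an injected connection reset as a sign the domain is on the HTTP blocklist.

```c
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Returns 1 if the connection is reset after the request is sent
 * (consistent with on-path HTTP filtering), 0 if a response arrives,
 * and -1 on unrelated errors. */
static int probe_http_blocking(const char *server, const char *domain) {
    struct addrinfo hints = {0}, *res;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(server, "80", &hints, &res) != 0)
        return -1;

    int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (fd < 0) { freeaddrinfo(res); return -1; }
    if (connect(fd, res->ai_addr, res->ai_addrlen) < 0) {
        close(fd); freeaddrinfo(res); return -1;
    }
    freeaddrinfo(res);

    char req[512];
    snprintf(req, sizeof(req),
             "GET / HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n", domain);
    if (send(fd, req, strlen(req), 0) < 0) { close(fd); return -1; }

    char buf[1024];
    ssize_t n = recv(fd, buf, sizeof(buf), 0);
    int reset = (n < 0 && errno == ECONNRESET);
    close(fd);
    return reset ? 1 : (n >= 0 ? 0 : -1);
}

int main(void) {
    /* "server.example" stands in for any reachable host behind the GFW. */
    printf("blocked? %d\n", probe_http_blocking("server.example", "test-domain.example"));
    return 0;
}
```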
Over the course of 20 months, GFWeb has tested a total of 1.02 billion domains, and detected 943K and 55K pay-level domains censored by the GFW's HTTP and HTTPS filters, respectively. To the best of our knowledge, our study represents the most extensive set of domains censored by the GFW ever discovered to date, many of which have never been detected by prior systems. Analyzing the longitudinal dataset collected by GFWeb, we observe that the GFW has been upgraded to mitigate several issues previously identified by the research community, including overblocking and failure in reassembling fragmented packets. More importantly, we discover that the GFW's bidirectional blocking is not symmetric as previously thought, i.e., it can only be triggered by certain domains when probed from inside the country. We discuss the implications of our work on existing censorship measurement and circumvention efforts. We hope insights gained from our study can help inform future research, especially in monitoring censorship and developing new evasion tools.
Does Online Anonymous Market Vendor Reputation Matter?
Authors: Alejandro Cuevas and Nicolas Christin, Carnegie Mellon University
Abstract: Reputation is crucial for trust in underground markets such as online anonymous marketplaces (OAMs), where there is little recourse against unscrupulous vendors. These markets rely on eBay-like feedback scores and forum reviews as reputation signals to ensure market safety, driving away dishonest vendors and flagging low-quality or dangerous products. Despite their importance, there has been scant work exploring the correlation (or lack thereof) between reputation signals and vendor success. To fill this gap, we study vendor success from two angles: (i) longevity and (ii) future financial success, by studying eight OAMs from 2011 to 2023. We complement market data with social network features extracted from an OAM forum, and by qualitatively coding reputation signals from over 15,000 posts and comments across two subreddits. Using survival analysis techniques and simple Random Forest models, we show that feedback scores (including those imported from other markets) can explain vendors' longevity, but fail to predict vendor disappearance in the short term. Further, feedback scores are not the main predictors of future financial success. Rather, vendors who quickly generate revenue when they start on a market typically end up acquiring the most wealth overall. We show that our models generalize across different markets and time periods spanning over a decade. Our findings provide empirical insights into early identification of potential high-scale vendors, effectiveness of "reputation poisoning" strategies, and how reputation systems could contribute to harm reduction in OAMs. We find in particular that, despite their coarseness, existing reputation signals are useful to identify potentially dishonest sellers, and highlight some possible improvements.
“I Don't Know If We're Doing Good. I Don't Know If We're Doing Bad”: Investigating How Practitioners Scope, Motivate, and Conduct Privacy Work When Developing AI Products
Authors: Hao-Ping (Hank) Lee, Carnegie Mellon University; Lan Gao, Georgia Institute of Technology; Stephanie Yang, Georgia Institute of Technology; Jodi Forlizzi, Carnegie Mellon University; Sauvik Das, Carnegie Mellon University
Abstract: How do practitioners who develop consumer AI products scope, motivate, and conduct privacy work? Respecting privacy is a key principle for developing ethical, human-centered AI systems, but we cannot hope to better support practitioners without answers to that question. We interviewed 35 industry AI practitioners to bridge that gap. We found that practitioners viewed privacy as actions taken against pre-defined intrusions that can be exacerbated by the capabilities and requirements of AI, but few were aware of AI-specific privacy intrusions documented in prior literature. We found that their privacy work was rigidly defined and situated, guided by compliance with privacy regulations and policies, and generally demotivated beyond meeting minimum requirements. Finally, we found that the methods, tools, and resources they used in their privacy work generally did not help address the unique privacy risks introduced or exacerbated by their use of AI in their products. Collectively, these findings reveal the need and opportunity to create tools, resources, and support structures to improve practitioners' awareness of AI-specific privacy risks, motivations to do AI privacy work, and ability to address privacy harms introduced or exacerbated by their use of AI in consumer products.