CyLab student seminar: Chi Zhang
April 12, 2022
12:00 p.m. ET
Contact Brigette Bernagozzi at bbernago@andrew.cmu.edu for location details
Please note that CyLab seminars are closed to the public and open to CyLab partners and Carnegie Mellon University faculty, students and staff.
Speaker: Chi Zhang, ECE Ph.D. Student
Title: Degradation Attacks on Certifiably Robust Neural Networks
Deep neural networks have become increasingly effective at many difficult machine-learning tasks, including visual and speech recognition problems. Unfortunately, many neural networks are vulnerable to adversarial examples. An adversarial example for a neural classifier is the result of applying small modifications to a correctly classified valid input such that the modified input is classified incorrectly. Certifiably robust classifiers offer the most rigorous solution to this problem and provably protect models against adversarial attacks. These classifiers are constructed by composing a standard classifier with a certified run-time defense. The defenses aim to detect adversarial examples during model evaluation by checking if the model is epsilon-locally robust at the evaluated input. If the check fails, the input is flagged as (potentially) adversarial, and the model rejects the input. This ensures that the model is only ever used for prediction at inputs where the adversary cannot cause a misclassification.
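To make the composition concrete, here is a minimal Python sketch of a certifiably robust classifier built from a standard classifier and a certified run-time defense. The names `model` and `certify` are hypothetical stand-ins for a concrete network and a concrete certification procedure, not the speaker's implementation.

```python
REJECT = -1  # sentinel label for inputs the defense flags and rejects

def certifiably_robust_classify(model, certify, x, epsilon):
    """Compose a standard classifier with a certified run-time defense.

    Assumes `model(x)` returns a predicted label, and `certify(model, x,
    epsilon)` returns True only if the model provably assigns the same
    label to every input within distance epsilon of x (i.e., the model
    is epsilon-locally robust at x). Both callables are hypothetical.
    """
    if certify(model, x, epsilon):
        return model(x)  # prediction is provably stable under epsilon-bounded changes
    return REJECT        # check failed: input is flagged as (potentially) adversarial
```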
We show through examples and experiments that even complete defenses are inherently over-cautious. Specifically, these defenses flag inputs for which the local robustness check fails but that are not adversarial, i.e., inputs that are classified consistently with all valid inputs within a distance of epsilon. As a result, while a norm-bounded adversary cannot change the classification of an input, it can use norm-bounded changes to degrade the utility of certifiably robust networks by forcing them to reject otherwise correctly classifiable inputs. The speaker will empirically demonstrate the efficacy of such attacks against state-of-the-art certifiable defenses. He will also discuss possible defenses against degradation attacks and evaluate one based on training models with twice the radius enforced by the certified run-time defense.
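The degradation attack itself can be sketched in the same terms: rather than flipping the label, the adversary only needs a norm-bounded perturbation that makes the certification check fail, forcing a rejection. The random-search loop below is a hedged illustration of that objective under the assumptions above, not the attack evaluated in the talk.

```python
import numpy as np

def degradation_attack(model, certify, x, epsilon, trials=1000, rng=None):
    """Search for a perturbation delta, ||delta||_inf <= epsilon, such that
    the certification check fails at x + delta. The defended classifier then
    rejects x + delta even if every point within epsilon of it is still
    classified consistently (i.e., the input is not actually adversarial).
    `model` and `certify` are the hypothetical callables from the sketch above.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    for _ in range(trials):
        delta = rng.uniform(-epsilon, epsilon, size=x.shape)
        x_adv = np.clip(x + delta, 0.0, 1.0)  # keep the input in a valid range
        if not certify(model, x_adv, epsilon):
            return x_adv  # the defense flags this input and rejects it
    return None  # no degrading perturbation found within the trial budget
```

The double-radius defense mentioned above counters this by a triangle-inequality argument: if the model is trained to be 2-epsilon-locally robust at every valid input x, then it remains epsilon-locally robust at every x + delta with ||delta|| <= epsilon, so the run-time check still passes despite the perturbation.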
Note: This is a practice talk for the ECE Ph.D. qualifying examination.