Augmenting the Wisdom of Crowds for Anti-Phishing

Researchers: Jason Hong, Bryan Pendleton

Cross Cutting Thrusts: Usable Privacy and Security

Abstract

Scope: PhishTank is an open community where volunteers submit possible phishing URLs and other volunteers vote on them. If four other people vote that a site is a phish, the URL is added to a blacklist. While PhishTank is effective, we observe that there is still room for improvement in coverage, speed, and accuracy. As of September 2010, PhishTank had received over 4 million votes. At four votes per verified URL, that implies a theoretical maximum of about 1 million phishing sites identified, yet only about 600,000 phish have actually been verified. The median time to verify a URL is about 4.5 hours after submission, which still leaves a window open for attacks. We are examining how a variety of information retrieval and machine learning techniques can facilitate this process. For example, by clustering similar phish together, people can vote on a group of phish rather than on one at a time. By including our phish detection algorithms, we might be able to reduce the number of votes required. By weighting votes based on the voter’s reputation, we can further reduce the number of votes required without impacting accuracy. We are pursuing a number of avenues along these lines.
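As an illustration of the reputation-weighting idea, here is a minimal sketch in Python. It is not PhishTank's or the project's actual algorithm. The function name, the reputation values, the signed scoring rule, and the threshold of 4.0 are all assumptions chosen for illustration; the threshold is picked only so that four voters with weight 1.0 reproduce the four-vote rule described above.

```python
from typing import Dict, List, Tuple


def weighted_verdict(
    votes: List[Tuple[str, bool]],
    reputation: Dict[str, float],
    threshold: float = 4.0,       # assumption: mirrors the four-vote rule at weight 1.0
    default_weight: float = 1.0,  # assumption: unknown voters count as one plain vote
) -> str:
    """Classify a URL from (voter_id, is_phish) votes weighted by voter reputation.

    Hypothetical scoring rule: each "phish" vote adds the voter's weight and
    each "not phish" vote subtracts it.
    """
    score = 0.0
    for voter_id, is_phish in votes:
        weight = reputation.get(voter_id, default_weight)
        score += weight if is_phish else -weight

    if score >= threshold:
        return "phish"
    if score <= -threshold:
        return "not_phish"
    return "undecided"


# Hypothetical reputations: two trusted voters outweigh four anonymous ones.
reputation = {"alice": 2.5, "bob": 2.0}

two_trusted = [("alice", True), ("bob", True)]
three_anonymous = [("u1", True), ("u2", True), ("u3", True)]

print(weighted_verdict(two_trusted, reputation))
print(weighted_verdict(three_anonymous, reputation))
```

Under these assumed weights, two high-reputation voters are enough to reach the threshold, while three unknown voters are not. How reputation should be estimated from a voter's history is left open here.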

Outcomes: Because phish are generated from toolkits, multiple near-identical copies of a site are likely to appear. We have therefore implemented a way of clustering similar phish together, both to improve people’s ability to identify phish and to improve voting efficiency. We are currently evaluating the effectiveness of these techniques.
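The summary does not say how similarity between phish is measured. As one possible sketch, the Python below groups pages whose visible text shares most of its word shingles, using Jaccard similarity and a greedy single pass. The shingle size, the 0.8 threshold, the function names, and the sample pages are all assumptions for illustration, not the project's implementation.

```python
from typing import Dict, List, Set, Tuple


def shingles(text: str, k: int = 3) -> Set[str]:
    """Return the set of k-word shingles in the text (lowercased)."""
    tokens = text.lower().split()
    if len(tokens) < k:
        return {" ".join(tokens)} if tokens else set()
    return {" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_pages(
    pages: Dict[str, str],
    threshold: float = 0.8,  # assumption: similarity needed to join a cluster
    k: int = 3,              # assumption: shingle size in words
) -> List[List[str]]:
    """Greedily assign each page to the first cluster whose representative
    (the first page placed in that cluster) is similar enough."""
    clusters: List[Tuple[Set[str], List[str]]] = []
    for url, text in pages.items():
        page_shingles = shingles(text, k)
        for representative, members in clusters:
            if jaccard(representative, page_shingles) >= threshold:
                members.append(url)
                break
        else:
            # No existing cluster is close enough, so start a new one.
            clusters.append((page_shingles, [url]))
    return [members for _, members in clusters]


# Hypothetical pages: two copies from the same toolkit and one unrelated page.
pages = {
    "http://example.test/a": "please verify your account by entering your password now",
    "http://example.test/b": "please verify your account by entering your password now",
    "http://example.test/c": "weekly gardening tips for growing tomatoes at home",
}

for group in cluster_pages(pages):
    print(group)
```

With clusters like these, a volunteer could cast one vote for a whole group of toolkit-generated copies instead of voting on each URL separately. A greedy pass is order-dependent and compares every page against every cluster, so a larger deployment would likely need an indexed or hashed similarity search instead.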