1. Start with Paul Brians' classic website "Common Errors in English."
2. Identify some potential "semantic minimal pairs" (a term I thought I had just invented, but apparently did not) that seem interesting, like "good-by/good-bye" or "gray/grey", or some more obscure and/or clearly "errory" ones like "easedrop" vs. "eavesdrop".
(Note: this is pretty subjective and you have to already know a lot about English to really choose interesting ones.)
3. Find a corpus, courtesy of Brian Lee's list of "Freely Accessible Online Corpora of English."
4. Learn how to search it, use it, etc.
5. Search for the stuff you found in #2.
6. Somehow do some kind of cool analysis of what you find.
(#6 needs some work.)
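
As a starting point for #5 and #6, here is a minimal sketch of what the counting might look like once you have some corpus text in hand. Everything in it is an assumption for illustration: the function names `count_variants` and `share_of_first` are made up, the sample sentence is invented rather than taken from a real corpus, and matching only exact whole words (so "eavesdropping" would not count as "eavesdrop") is just the simplest choice, not necessarily the right one.

```python
import re
from collections import Counter

# Variant pairs to compare; these two come from the examples in #2.
PAIRS = [("gray", "grey"), ("easedrop", "eavesdrop")]

def count_variants(text, pairs):
    """Count whole-word, case-insensitive occurrences of each variant."""
    # Tokenize into lowercase words, keeping internal hyphens ("good-bye").
    words = Counter(re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower()))
    return {pair: tuple(words[w] for w in pair) for pair in pairs}

def share_of_first(counts):
    """Fraction of hits that use the first variant, or None if no hits."""
    first, second = counts
    total = first + second
    return first / total if total else None

# Invented sample text standing in for a real corpus (an assumption).
sample = "The grey cat and the gray dog like to eavesdrop. Grey skies again."

for pair, counts in count_variants(sample, PAIRS).items():
    print(pair, counts, share_of_first(counts))
```

With a real corpus you would read the text from a file (or from whatever search interface the corpus offers) instead of a hard-coded string. The share of the first variant is only one possible number to look at; comparing those shares across genres or decades might be where the "cool analysis" starts.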