Hi! I'm Kai-Cheng Yang (杨凯程), pronounced KY-cheng YAHNG. I also go by Kevin.
I'm a third-year Ph.D. student in Informatics at the School of Informatics, Computing, and Engineering at Indiana University Bloomington. I mainly work with Filippo Menczer, Yong-Yeol Ahn, and Brea L. Perry. Check out the Projects and Publications sections to see what I have been working on.
Before joining the Ph.D. program at IU, I received my bachelor's and master's degrees in theoretical physics from Lanzhou University in China.
Botometer is a machine learning tool that extracts over 1,000 different features from a Twitter account and evaluates its likelihood of being a social bot. Botometer currently handles over 250,000 requests every day and serves as the foundation for many research studies.
Contribution: maintenance, training data annotation, and model retraining
BotometerLite is a light version of Botometer. Using only a small subset of Botometer's features, BotometerLite is 200x faster. With a novel evaluation system and model selection method, BotometerLite achieves performance comparable to Botometer's. Its simplified design also makes it possible to interpret BotometerLite's results.
BEV is a tool that visualizes the activity of likely bots on Twitter around the 2018 US midterm elections. It lets users explore how active bots are on a daily basis in their efforts to influence online discourse about the elections. It also shows which topics likely bots are targeting.
BotSlayer is an application that helps track and detect potential manipulation of information spreading on Twitter. Equipped with BotometerLite and newly developed algorithms, BotSlayer can detect coordinated amplification by bots in real time. With some simple configuration, anyone can have a customized instance running in the cloud. BotSlayer is currently in alpha testing.
So far, studies of social bots have largely been conducted from a computational perspective. How social media users perceive social bots, along with a series of related questions, remains unclear. In this human-subjects research project, we use an experimental design to understand social media users' perceptions of social bots. We also characterize the effect of human biases on the efficacy of the social bot detection task.
Hoaxy is a tool that visualizes the spread of fake news and related fact-checking articles on Twitter. By incorporating Botometer, Hoaxy can also visualize the bot-like activity involved in the spread of these articles.
Contribution: maintenance and development of the API that Hoaxy uses to fetch Botometer scores
Impersonators are a type of bad actor that attempts to deceptively influence political communication by exploiting features designed to let social media users manage their public personas. Deleting tweets and editing profiles are common and legitimate practices for social media users, but we show that impersonators perform these actions in a systematic, coordinated, and deceptive manner. This study exposes a conflict between a user’s right to remove their content and the need to hold abusers and platforms accountable for healthy online communication.
Traditional methods for identifying drug-seeking behavior focus on each patient's medical history individually. Typical criteria involve the number of different prescribers, the number of different pharmacies visited, and the total drug dose in a given time period. Our analysis shows that such methods have become less useful because patients are intentionally altering their behavior to avoid being spotted. This project aims to use social network analysis to identify drug-seeking behavior, an approach that has proven to be very effective and harder to trick.
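For illustration, the traditional per-patient criteria mentioned above can be sketched as a simple threshold rule. This is a minimal sketch, not the method used in the project: the record layout, the function name `flag_by_thresholds`, the 90-day window, and all threshold values are hypothetical placeholders chosen for the example.

```python
from collections import defaultdict
from datetime import date, timedelta


def flag_by_thresholds(records, window_end, window_days=90,
                       max_prescribers=4, max_pharmacies=4,
                       max_total_dose=1000.0):
    """Return the set of patients exceeding any per-patient threshold.

    Each record is (patient, prescriber, pharmacy, dose, fill_date).
    All thresholds are hypothetical placeholders, not clinical guidance.
    """
    window_start = window_end - timedelta(days=window_days)
    prescribers = defaultdict(set)
    pharmacies = defaultdict(set)
    total_dose = defaultdict(float)

    # Aggregate each patient's history inside the time window.
    for patient, prescriber, pharmacy, dose, fill_date in records:
        if window_start < fill_date <= window_end:
            prescribers[patient].add(prescriber)
            pharmacies[patient].add(pharmacy)
            total_dose[patient] += dose

    # A patient is flagged if any single criterion is exceeded.
    return {
        patient
        for patient in total_dose
        if len(prescribers[patient]) > max_prescribers
        or len(pharmacies[patient]) > max_pharmacies
        or total_dose[patient] > max_total_dose
    }


# Toy data: patient "A" sees five prescribers, patient "B" sees one.
records = [("A", f"dr{i}", "ph1", 50.0, date(2019, 3, 1)) for i in range(5)]
records.append(("B", "dr1", "ph1", 50.0, date(2019, 3, 1)))
flagged = flag_by_thresholds(records, window_end=date(2019, 3, 31))
```

Because each patient is scored in isolation, a patient who stays just under every threshold is never flagged, which is the weakness that motivates looking at the network of patients and prescribers instead.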
Doctor shoppers are people who visit multiple physicians to obtain multiple prescriptions for controlled substances. Opioid doctor shoppers have been found to be more likely to overdose, contributing to the increasingly severe opioid crisis in the US. This project applies computational methods to over 9 years of longitudinal medical records from a large group of patients to characterize the geography-related behaviors of doctor shoppers.
Word2vec is applied to a large collection of medical records to learn a distributed representation of diagnoses. The embedding effectively reduces the number of dimensions needed to encode all the diagnoses and therefore serves as a preprocessing step for other machine learning tasks. In addition, the embedding itself can reveal interesting relationships between diagnoses.
Multipartite viruses have genomes divided into multiple segments that are packaged in separate virus particles. This peculiar genetic organization makes multipartite viruses some of the strangest viruses and has recently drawn great attention from academia. Yet a solid understanding of why such a seemingly disadvantageous strategy has emerged is still lacking. This project extends the SIR model with a multipartite mechanism and studies the spread of multipartite viruses on networks. The analysis reveals the sudden-outbreak nature of multipartite viruses and finds that, counterintuitively, multipartite viruses favor static networks over dynamical networks. The results may explain the prevalence of multipartite viruses in plants.
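As a point of reference, the standard SIR model that the project builds on can be simulated on a static network in a few lines. This is a minimal sketch of the baseline model only: it does not include the multipartite mechanism, and the function name `simulate_sir`, the discrete-time update, the ring network, and the parameter values are assumptions made for the example.

```python
import random


def count_states(state):
    """Return the (S, I, R) counts for the current node states."""
    values = list(state.values())
    return values.count("S"), values.count("I"), values.count("R")


def simulate_sir(adjacency, seeds, beta, gamma, rng, max_steps=10_000):
    """Discrete-time SIR on a static network.

    adjacency: dict mapping each node to a list of its neighbors.
    beta: per-contact infection probability per step.
    gamma: per-step recovery probability.
    Returns the list of (S, I, R) counts, one entry per step.
    """
    state = {node: "S" for node in adjacency}
    for node in seeds:
        state[node] = "I"
    history = [count_states(state)]

    for _ in range(max_steps):
        infected = [node for node, s in state.items() if s == "I"]
        if not infected:
            break
        # Infected nodes try to infect each susceptible neighbor.
        newly_infected = set()
        for node in infected:
            for neighbor in adjacency[node]:
                if state[neighbor] == "S" and rng.random() < beta:
                    newly_infected.add(neighbor)
        # Nodes that were infected at the start of the step may recover.
        for node in infected:
            if rng.random() < gamma:
                state[node] = "R"
        for node in newly_infected:
            state[node] = "I"
        history.append(count_states(state))
    return history


# Toy static network: a ring of 10 nodes with one initial infection.
n = 10
ring = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
history = simulate_sir(ring, seeds=[0], beta=0.5, gamma=0.3,
                       rng=random.Random(42))
```

Each entry of `history` sums to the number of nodes, and the run stops once no infected nodes remain. A multipartite extension would need additional per-node state for the individual genome segments, which this sketch leaves out.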