What do I do?

Broadly speaking, my primary research interests involve biomedical and social applications of natural language processing.

Public Health, Social Media, and Bias

Social media platforms, such as Twitter, are used for many health-related applications, including, but not limited to, the early detection of disease outbreaks, monitoring adverse drug reactions, behavioral risk surveillance (such as monitoring smoking and drug use), and mining individuals' mental and physical health. While many systems have achieved high overall accuracy, overall accuracy alone is not enough to decide whether to deploy a system into production or to use its predictions for public health-related policy development. If a system is evaluated mostly on the racial majority, it may achieve 90% accuracy while its performance on underrepresented groups is much lower. Unfortunately, evaluating fairness is non-trivial. Many fairness metrics require the demographics to be known or, at least, inferred. It is hard to estimate fairness for groups that do not appear in the training dataset. Moreover, many datasets are not large enough to contain every minority group. Hence, much of my recent work has focused on social media data, public health, and associated issues of bias and fairness.
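To make the gap between overall and per-group accuracy concrete, here is a minimal sketch in Python. The labels, predictions, and group names ("A" for a majority group, "B" for a minority group) are made up for illustration, and the helper `accuracy_by_group` is a hypothetical name, not code from any of the publications listed on this page. It also assumes the group membership of every example is known, which, as noted above, is often not the case in practice.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return (overall accuracy, {group: accuracy}) for parallel lists."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    overall = sum(correct.values()) / sum(total.values())
    per_group = {g: correct[g] / total[g] for g in total}
    return overall, per_group


# Toy data (assumed): 10 examples from group "A", 2 from group "B".
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0]
y_pred = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1]
groups = ["A"] * 10 + ["B"] * 2

overall, per_group = accuracy_by_group(y_true, y_pred, groups)
# Largest difference in accuracy between any two groups.
gap = max(per_group.values()) - min(per_group.values())

print(f"overall accuracy: {overall:.2f}")
for name, acc in sorted(per_group.items()):
    print(f"group {name} accuracy: {acc:.2f}")
print(f"accuracy gap: {gap:.2f}")
```

In this toy setup the majority group reaches 90% accuracy while every minority example is misclassified, and the overall score alone would hide that. With only two minority examples, the per-group estimate is also very noisy, which mirrors the dataset-size problem described above.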

Relevant Publications

  1. Lwowski, Brandon, and Anthony Rios. “The risk of racial bias while tracking influenza-related content on social media using machine learning.” Journal of the American Medical Informatics Association 28.4 (2021): 839-849.
  2. Pritom, Mir Mehedi A., Rosana Montanez Rodriguez, Asad Ali Khan, Sebastian A. Nugroho, Esra Alrashydah, Beatrice N. Ruiz, and Anthony Rios. “Case Study on Detecting COVID-19 Health-Related Misinformation in Social Media.” arXiv e-prints (2021): arXiv-2106.
  3. Rios, Anthony. “FuzzE: Fuzzy fairness evaluation of offensive language classifiers on African-American English.” Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 34. No. 01. 2020.
  4. Rios, Anthony, and Brandon Lwowski. “An Empirical Study of the Downstream Reliability of Pre-Trained Word Embeddings.” Proceedings of the 28th International Conference on Computational Linguistics. 2020.
  5. Rios, Anthony, Reenam Joshi, and Hejin Shin. “Quantifying 60 years of gender bias in biomedical research with word embeddings.” Proceedings of the 19th SIGBioMed Workshop on Biomedical Language Processing. 2020.


CRII: SCH: A Computational Framework for Fair Public Health-Related Decisions.
National Science Foundation. CISE: IIS. 04/01/2020-03/31/2022. $174,797
PI: Anthony Rios

Security and Public Safety Applications of NLP

Relevant Publications

  1. Bhatt, Paras, and Anthony Rios. “Detecting Bot-Generated Text by Characterizing Linguistic Accommodation in Human-Bot Interactions.” ACL/IJCNLP (Findings) 2021: 3235-3247.
  2. Nasrin, Nayeema, Kim-Kwang Raymond Choo, Myung Ko, and Anthony Rios. “How Many Users Are Enough? Exploring Semi-Supervision and Stylometric Features to Uncover a Russian Troll Farm.” In Proceedings of the Second Workshop on Natural Language Processing for Internet Freedom: Censorship, Disinformation, and Propaganda, pp. 20-30. 2019.


Machine Learning-centric Cyber Threat Intelligence and Hunting for IoT Systems
National Security Agency. Cybersecurity Research Innovation Grant. 08/01/2021-07/31/2023. $464,153
PI: Anthony Rios Co-PI(s): Glenn Dietrich and Raymond Choo

Biomedical NLP Applications

Language shapes the world we live in. Information is shared among individuals through verbal and written communication, including biomedical information. Doctors write notes describing patient symptoms, medical history, and diagnoses. The notes are annotated by hospitals for billing purposes, e.g., medical coding. Scientists write research articles creating new information, including, but not limited to, new drug-drug, drug-gene, and gene-gene interactions. On social media, such as Facebook and Twitter, users describe life events, giving insight into the users' mental and physical health and indicating possible adverse drug events, depression, or PTSD. I develop methods that can extract biomedical information from many textual data sources.

Relevant Publications

  1. Rios, Anthony, and Ramakanth Kavuluru. “Neural transfer learning for assigning diagnosis codes to EMRs.” Artificial intelligence in medicine 96 (2019): 116-122.
  2. Rios, Anthony, Eric B. Durbin, Isaac Hands, Susanne M. Arnold, Darshil Shah, Stephen M. Schwartz, Bernardo HL Goulart, and Ramakanth Kavuluru. “Cross-registry neural domain adaptation to extract mutational test results from pathology reports.” Journal of biomedical informatics 97 (2019): 103267.
  3. Rios, Anthony, and Ramakanth Kavuluru. “EMR coding with semi-parametric multi-head matching networks.” Proceedings of the conference. Association for Computational Linguistics. North American Chapter. Meeting. Vol. 2018.
  4. Rios, Anthony, and Ramakanth Kavuluru. “Few-shot and zero-shot multi-label learning for structured label spaces.” Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing. Vol. 2018.
  5. Rios, Anthony, and Ramakanth Kavuluru. “Convolutional neural networks for biomedical text classification: application in indexing biomedical articles.” Proceedings of the 6th ACM Conference on Bioinformatics, Computational Biology and Health Informatics. 2015.