Publications

On the Benefits of Fine-Grained Loss Truncation: A Case Study on Factuality in Summarization

Published in EACL, 2024

Text summarization and simplification are widely used applications of AI, but such models are often prone to hallucination, which can result from training on unaligned data. One prominent approach to this issue is Loss Truncation (LT) (Kang and Hashimoto, 2020), which modifies the standard log loss to adaptively remove noisy examples during training. However, we find that LT alone still yields a considerable number of hallucinated entities on various datasets. To understand and refine LT, we study how the underlying losses behave on factual versus non-factual examples. We demonstrate that LT’s performance is limited when its underlying assumption, that noisy targets have higher NLL loss, does not hold, and find that word-level NLL among entities provides a better signal for distinguishing factuality. We then leverage this to propose a fine-grained NLL loss and fine-grained data cleaning strategies, and observe improvements in hallucination reduction across some datasets.
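A minimal sketch of the two ideas above, assuming per-token NLLs and an entity mask are already computed (the tensor shapes, the `keep_frac` parameter, and the toy data are illustrative, not the paper’s exact setup): vanilla LT ranks examples by mean NLL over all tokens and drops the highest-loss ones, while the fine-grained variant ranks by NLL over entity tokens only.

```python
# Illustrative sketch of Loss Truncation (LT) and a fine-grained, entity-level
# variant of its ranking signal. Not the authors' code; shapes are assumptions.
import torch

def truncated_loss(token_nll, keep_frac=0.7, entity_mask=None):
    """token_nll: (batch, seq) per-token NLL; entity_mask: (batch, seq) bool.

    Rank examples by mean NLL (vanilla LT) or by mean NLL over entity tokens
    (fine-grained signal), then zero out the highest-loss examples this step.
    """
    if entity_mask is None:
        score = token_nll.mean(dim=1)                      # vanilla LT signal
    else:
        m = entity_mask.float()
        score = (token_nll * m).sum(dim=1) / m.sum(dim=1).clamp(min=1)
    k = max(1, int(keep_frac * token_nll.size(0)))         # examples to keep
    keep = torch.zeros_like(score)
    keep[score.topk(k, largest=False).indices] = 1.0       # keep lowest-loss
    return (token_nll.mean(dim=1) * keep).sum() / keep.sum()

# Toy usage: batch of 4 sequences of length 6; pretend tokens 2-3 are entities.
nll = torch.rand(4, 6)
ents = torch.zeros(4, 6, dtype=torch.bool)
ents[:, 2:4] = True
print(truncated_loss(nll, keep_frac=0.75, entity_mask=ents))
```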

Authors: Lorenzo Flores, Arman Cohan

Download Paper

Medical Text Simplification: Optimizing for Readability with Unlikelihood Training and Reranked Beam Search Decoding

Published in Findings of EMNLP, 2023

Text simplification has emerged as an increasingly useful application of AI for bridging the communication gap in specialized fields such as medicine, where the lexicon is often dominated by technical jargon and complex constructs. Despite notable progress, methods for medical simplification sometimes generate text of lower quality and diversity. In this work, we explore ways to further improve the readability of simplified text in the medical domain. We propose (1) a new unlikelihood loss that encourages the generation of simpler terms and (2) a reranked beam search decoding method that optimizes for simplicity; together these achieve better performance on readability metrics across three datasets. These findings offer promising avenues for improving text simplification in the medical field.
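As a rough illustration of both components, here is a hedged PyTorch sketch; the complex-token set, the readability scorer, and the `alpha` weight are illustrative assumptions rather than the paper’s exact formulation.

```python
# Sketch of (1) an unlikelihood-style penalty on "complex" vocabulary items and
# (2) readability-based reranking of beam candidates. Names are illustrative.
import torch

def unlikelihood_loss(logits, complex_token_ids):
    """logits: (batch, seq, vocab). Penalize probability mass on complex tokens
    via L_UL = -sum log(1 - p(token)) over the complex set at each position."""
    probs = logits.softmax(dim=-1)[..., complex_token_ids]  # (batch, seq, |C|)
    return -torch.log1p(-probs.clamp(max=1 - 1e-6)).sum(dim=-1).mean()

def rerank_beams(candidates, readability, alpha=0.5):
    """Rescore finished beams: model log-prob plus a weighted simplicity bonus."""
    return max(candidates, key=lambda c: c["logprob"] + alpha * readability(c["text"]))

# Toy usage: random logits, and a trivial "shorter words are simpler" scorer.
logits = torch.randn(2, 5, 100)
print(unlikelihood_loss(logits, torch.tensor([7, 42, 99])))
beams = [{"text": "utilize medication", "logprob": -3.0},
         {"text": "use medicine", "logprob": -3.5}]
print(rerank_beams(beams, lambda t: -sum(map(len, t.split())) / len(t.split())))
```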

Authors: Lorenzo Flores, Heyuan Huang, Kejian Shi, Sophie Chheang, Arman Cohan

Download Paper

Caregivers Attitude Detection From Clinical Notes

Published in the American Medical Informatics Association (AMIA) Annual Symposium, 2023

We introduce a dataset for identifying caregivers’ sentiment in clinical notes, and demonstrate that RoBERTa achieves the best performance on the resulting sentiment classification task.
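A minimal sketch of such a RoBERTa baseline, assuming the Hugging Face `transformers` API, three sentiment labels, and invented placeholder notes (the label scheme is illustrative, not the dataset’s):

```python
# Fine-tuning RoBERTa for note-level sentiment classification (one toy step).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=3)
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

notes = ["Caregiver is attentive and engaged.", "Caregiver missed the appointment."]
labels = torch.tensor([2, 0])  # illustrative: 0=negative, 1=neutral, 2=positive

batch = tokenizer(notes, padding=True, truncation=True, return_tensors="pt")
out = model(**batch, labels=labels)  # cross-entropy over the 3 classes
out.loss.backward()
optimizer.step()
print(out.logits.argmax(dim=-1))     # predicted sentiment per note
```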

Authors: Gaetano Manzo, Leo Anthony Celi, Yasmeen Shabazz, Rory Mulcahey, Lorenzo Flores, Dina Demner-Fushman

Download Paper

Revisiting the Effectiveness of Automatic N-Gram Rule Generation for Spelling Normalization in Filipino

Published in the Workshop on Simple and Efficient Natural Language Processing at EMNLP, 2022

We explore a spelling/slang correction task in Filipino on a curated dataset, and demonstrate that an n-gram model can outperform augmented deep learning methods and Google Translate’s spelling correction feature. The n-gram model has the benefit of (1) requiring little training time and compute power (it “trains” in a second on a CPU!) and (2) being inherently interpretable, which lets users troubleshoot the model.
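The sketch below conveys the flavor of automatic rule generation under strong simplifying assumptions: rules come only from equal-length (noisy, standard) word pairs with a naive one-to-one character alignment, and the toy pairs are invented, so this is not the paper’s exact procedure.

```python
# Extract character n-gram rewrite rules from (noisy, standard) pairs, then
# apply them greedily. Toy data and the n=2 setting are illustrative.
from collections import Counter

def extract_rules(pairs, n=2):
    counts = Counter()
    for noisy, std in pairs:
        if len(noisy) != len(std):
            continue  # our naive 1:1 alignment only handles equal-length pairs
        for i in range(len(noisy) - n + 1):
            src, tgt = noisy[i:i + n], std[i:i + n]
            if src != tgt:
                counts[(src, tgt)] += 1
    return [rule for rule, _ in counts.most_common()]

def normalize(word, rules):
    for src, tgt in rules:
        word = word.replace(src, tgt)
    return word

# Toy Filipino-style pairs where "u" stands in for "o" in informal spelling.
pairs = [("aku", "ako"), ("itu", "ito"), ("ditu", "dito")]
rules = extract_rules(pairs)
print(rules)                       # [("tu", "to"), ("ku", "ko")]
print(normalize("gustu", rules))   # -> "gusto"
```

Because the learned rules are just a ranked list of substring rewrites, a user can inspect, reorder, or delete any rule that misfires, which is the interpretability benefit noted above.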

Authors: Lorenzo Flores, Dragomir Radev

Download Paper

R2D2: Robust Data-to-Text with Replacement Detection

Published in EMNLP, 2022

We tackle the problem of generating textual summaries from tabular data, and improve the faithfulness of the generated summaries using a replacement-detection discriminator objective and an unlikelihood loss.
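A hedged sketch of the replacement-detection idea (not the R2D2 implementation): negatives are built by swapping a table value mentioned in the summary for a different one, and a discriminator term is added to the usual generation NLL. The corruption scheme and the `lam` weight are illustrative assumptions.

```python
# Build corrupted negatives from the table, then combine generation NLL with a
# replacement-detection loss. Toy inputs; not the paper's exact objective.
import random
import torch
import torch.nn.functional as F

def corrupt(summary_tokens, table_values):
    """Replace one table value mentioned in the summary with a different one."""
    mentioned = [t for t in summary_tokens if t in table_values]
    if not mentioned:
        return summary_tokens, False
    target = random.choice(mentioned)
    swap = random.choice([v for v in table_values if v != target])
    return [swap if t == target else t for t in summary_tokens], True

def combined_loss(gen_nll, disc_logit, is_corrupted, lam=1.0):
    """Generation NLL plus BCE for detecting replaced (unfaithful) summaries."""
    disc = F.binary_cross_entropy_with_logits(disc_logit, is_corrupted)
    return gen_nll + lam * disc

tokens, flipped = corrupt("Spain won 3 games".split(), {"Spain", "3", "5"})
print(tokens, flipped)
print(combined_loss(torch.tensor(2.3), torch.tensor(0.4), torch.tensor(float(flipped))))
```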

Authors: Linyong Nan, Lorenzo Flores, Yilun Zhao, Yixin Liu, Luke Benson, Weijin Zou, Dragomir Radev

Download Paper

An Adversarial Benchmark for Fake News Detection Models

Published in the Workshop on Adversarial Machine Learning and Beyond at AAAI, 2022

We demonstrate that fake news classification models are brittle: they can achieve strong performance on standard fake news benchmarks while failing on adversarial examples.
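The sketch below shows the kind of probe such a benchmark formalizes; the keyword “classifier” and the two perturbations are stand-ins for a real model and the paper’s actual adversarial examples.

```python
# Check whether small, meaning-preserving edits flip a classifier's verdict.
def dummy_classifier(text):
    """Stand-in 'fake news' detector that keys on a surface cue."""
    return "fake" if "shocking" in text.lower() else "real"

PERTURBATIONS = [
    lambda t: t.replace("shocking", "sh0cking"),   # character-level swap
    lambda t: t.replace("shocking", "startling"),  # synonym substitution
]

def is_brittle(text, classifier):
    original = classifier(text)
    # A robust model's label should survive meaning-preserving perturbations.
    return any(classifier(p(text)) != original for p in PERTURBATIONS)

headline = "shocking cure discovered by local doctor"
print(dummy_classifier(headline))              # -> "fake"
print(is_brittle(headline, dummy_classifier))  # -> True: tiny edits flip it
```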

Authors: Lorenzo Flores, Yiding Hao

Download Paper