Title: Renewable Huber estimation method for streaming datasets
Authors: Jiang, R.
Liang, L.
Yu, K.
Keywords: high-dimensional estimation;Huber loss;online updating;streaming data
Issue Date: 23-Feb-2024
Publisher: Institute of Mathematical Statistics on behalf of the Bernoulli Society for Mathematical Statistics and Probability
Citation: Jiang, R., Liang, L. and Yu, K. (2024) 'Renewable Huber estimation method for streaming datasets', Electronic Journal of Statistics, 0 (accepted, in press), pp. 674 - 705. doi: 10.1214/24-EJS2223.
Abstract: Streaming data refers to a data collection scheme where observations arrive sequentially and perpetually over time, making it challenging to fit the data into computer memory for statistical analysis. The ordinary least squares estimate for linear regression is sensitive to heavy-tailed errors and outliers, which are commonly encountered in applications. In this case, the Huber loss function is a useful criterion for robust regression. In this paper, we propose robust regression estimation and variable selection for streaming datasets. Unlike in renewable estimation for generalized linear regression with streaming datasets, however, the Huber loss function is only first-order differentiable, which poses challenges to renewable estimation in both computation and theoretical development. To address this challenge, we introduce a new smoothed version of the Huber first derivative, which admits a fast and scalable algorithm to perform optimization for streaming datasets and achieves the best fit to the Huber function among different versions. Theoretically, the proposed statistics are shown to have the same asymptotic properties as the standard version computed on an entire data stream with the data batches pooled into one dataset, without additional conditions. The proposed methods are illustrated using current data and the summary statistics of historical data. Both simulations and real data analysis are conducted to illustrate the finite sample performance of the proposed methods.
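Illustrative note: the differentiability issue mentioned in the abstract can be seen from the textbook definition of the Huber loss. The sketch below is not taken from the paper and does not reproduce its smoothed derivative. The function names and the threshold value tau = 1.345 (a common conventional default) are assumptions made for illustration only.

```python
def huber_loss(r, tau=1.345):
    """Textbook Huber loss: quadratic for |r| <= tau, linear beyond.

    tau = 1.345 is an assumed conventional default, not a value from the paper.
    """
    a = abs(r)
    if a <= tau:
        return 0.5 * r * r
    return tau * a - 0.5 * tau * tau


def huber_psi(r, tau=1.345):
    """First derivative of the Huber loss: the residual clipped to [-tau, tau]."""
    return max(-tau, min(tau, r))


def huber_psi_prime(r, tau=1.345):
    """Second derivative where it exists: 1 inside (-tau, tau), 0 outside.

    It jumps at |r| = tau, so the loss is only once continuously differentiable.
    """
    return 1.0 if abs(r) < tau else 0.0
```

The first derivative is continuous but has kinks at plus and minus tau, so Newton-type updates that rely on a smooth second derivative do not apply directly. This is the kind of obstacle a smoothed derivative is meant to remove.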
Description: MSC2020 subject classifications: Primary 60G08; secondary 62G20.
Other Identifiers: ORCiD: Keming Yu
Appears in Collections: Dept of Mathematics Research Papers

Files in This Item:
File: FullText.pdf
Size: 427.97 kB
Format: Adobe PDF

This item is licensed under a Creative Commons License.