A weighted quantile regression approach for complex high-dimensional heterogeneous data

Xiong, W; Pan, H; Yu, K; Tian, M

Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/30756

Title:	A weighted quantile regression approach for complex high-dimensional heterogeneous data
Authors:	Xiong, W; Pan, H; Yu, K; Tian, M
Keywords:	mode; optimal quantile level; weighted quantile regression; partially linear additive model; variable selection
Issue Date:	4-Dec-2023
Publisher:	Science China Press
Citation:	Xiong, W. et al. (2024) 'A weighted quantile regression approach for complex high-dimensional heterogeneous data [in Chinese]', Scientia Sinica Mathematica, 54 (2), pp. 181-210. doi: 10.1360/SSM-2022-0080.
Abstract:	With the development of digital intelligent technology, problems such as information overload, rapidly expanding computing power, and data heterogeneity and complexity have become common, posing great challenges to the theory and methods of data modeling. To address this, this paper starts from the perspective of the mode and proposes the concept of the optimal quantile level together with mode-based weighted quantile regression (MWQR), so as to make maximal use of the sample information. Compared with existing methods, MWQR has the following advantages: (1) it is suitable for complex, high-dimensional heterogeneous data and remains robust even when the error distribution is heavy-tailed or skewed; (2) it removes the subjectivity of choosing quantile levels in quantile regression; (3) by assigning different weights to different quantile levels, it greatly improves estimation efficiency and reduces computation time; (4) it can effectively explore the entire conditional distribution of the response variable. Given these advantages, we apply MWQR to partially linear additive models and propose two algorithms for robust coefficient estimation and variable selection; the consistency and asymptotic distribution of the estimators are also established. Numerical simulations and empirical studies of the "implicit guarantee" of urban investment bonds and of plasma β-carotene concentration further show that the proposed method can uncover the intrinsic structure of the data, significantly improves computational efficiency, and has broad applicability.
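The abstract does not state how MWQR defines its weights or the optimal quantile level, so the following is only a rough illustration under stated assumptions. One standard fact motivates linking quantiles to the mode: in a location-shift model y = x'β + ε with error CDF F, the τ-th conditional quantile is x'β + F^{-1}(τ) and the conditional mode is x'β + mode(ε), so the two coincide at τ = F(mode(ε)). The sketch fits a weighted composite quantile regression with one covariate, estimates that level from the residuals with a Gaussian kernel density, and, as a hypothetical weighting rule, puts Gaussian weights on the quantile levels centered there. The helper names (`mode_weighted_fit`, `quantile_level_at_mode`), the normal-reference bandwidth, and the `bandwidth_scale` weighting are assumptions for illustration, not the authors' algorithm.

```python
import math
import random


def check_loss(u, tau):
    """Quantile check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def empirical_quantile(values, tau):
    """Lower empirical tau-quantile; it minimizes sum(check_loss(v - q, tau)) over q."""
    s = sorted(values)
    k = max(0, math.ceil(tau * len(s)) - 1)
    return s[k]


def profiled_loss(beta, x, y, taus, weights):
    """Weighted composite check loss, with one intercept per quantile level profiled out."""
    resid = [yi - beta * xi for xi, yi in zip(x, y)]
    total = 0.0
    for tau, w in zip(taus, weights):
        b_k = empirical_quantile(resid, tau)  # optimal intercept for level tau
        total += w * sum(check_loss(r - b_k, tau) for r in resid)
    return total


def fit_slope(x, y, taus, weights, lo=-10.0, hi=10.0, iters=100):
    """Ternary search for the shared slope; valid because the profiled loss is convex in beta."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if profiled_loss(m1, x, y, taus, weights) <= profiled_loss(m2, x, y, taus, weights):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2.0


def quantile_level_at_mode(resid, grid_size=200):
    """Estimate F(mode) of the residuals: Gaussian KDE mode on a grid, then the ECDF there."""
    n = len(resid)
    mean = sum(resid) / n
    sd = math.sqrt(sum((r - mean) ** 2 for r in resid) / (n - 1))
    h = 1.06 * sd * n ** (-0.2)  # normal-reference bandwidth (an assumption)
    lo, hi = min(resid), max(resid)
    best_g, best_d = lo, -1.0
    for j in range(grid_size + 1):
        g = lo + (hi - lo) * j / grid_size
        d = sum(math.exp(-0.5 * ((g - r) / h) ** 2) for r in resid)
        if d > best_d:
            best_g, best_d = g, d
    return sum(1 for r in resid if r <= best_g) / n


def mode_weighted_fit(x, y, taus, bandwidth_scale=0.1):
    """Hypothetical two-step scheme: pilot fit, locate tau_star, refit with weights peaked at it."""
    pilot = fit_slope(x, y, taus, [1.0] * len(taus))
    resid = [yi - pilot * xi for xi, yi in zip(x, y)]
    tau_star = quantile_level_at_mode(resid)
    weights = [math.exp(-0.5 * ((t - tau_star) / bandwidth_scale) ** 2) for t in taus]
    return fit_slope(x, y, taus, weights), tau_star, weights


if __name__ == "__main__":
    random.seed(0)
    x = [random.uniform(0.0, 5.0) for _ in range(300)]
    y = [2.0 + 1.5 * xi + random.expovariate(1.0) for xi in x]  # right-skewed errors
    beta, tau_star, weights = mode_weighted_fit(x, y, [0.1, 0.3, 0.5, 0.7, 0.9])
    print(f"slope={beta:.3f} tau_star={tau_star:.3f}")
```

A smaller `bandwidth_scale` concentrates the fit on the quantile levels nearest the estimated `tau_star`, while a very large value makes all weights close to 1 and recovers equally weighted composite quantile regression. With right-skewed errors such as the exponential noise above, the estimated level should typically fall below 0.5, because the mode of such an error lies below its median.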
Description:	MSC (2020) subject classification: 62G05, 62P10, 62P20.
URI:	https://bura.brunel.ac.uk/handle/2438/30756
DOI:	https://doi.org/10.1360/SSM-2022-0080
ISSN:	1674-7216
Other Identifiers:	ORCiD: Keming Yu https://orcid.org/0000-0001-6341-8402
Appears in Collections:	Dept of Mathematics Research Papers

Files in This Item:

File	Description	Size	Format
FullText.pdf	Copyright © Science China Press. All rights reserved. Free Content at https://doi.org/10.1360/SSM-2022-0080	659.86 kB	Adobe PDF
