[Keywords]
[Abstract]
Bike-sharing systems are increasingly popular and have accumulated massive volumes of trip trajectory data. In such systems, users borrow and return bikes at random, and this behavior is affected by dynamic factors such as weather and time of day. As a result, bike scheduling becomes unbalanced, which degrades the user experience and causes large economic losses for operators. This paper proposes a novel shared-bike demand prediction algorithm based on station clustering. Station activeness is calculated by constructing a bike transfer network. Taking both the geographical location of stations and their bike transfer patterns into account, stations that are close to each other and have similar usage patterns are aggregated into one cluster, following the idea of data field clustering, and a method for determining the optimal number of cluster centers is given. The influence of time and weather on station bike demand is analyzed, and the Pearson correlation coefficient is used to select the most strongly correlated weather feature from real weather data. This feature is combined with the historical bike demand within each cluster and converted into a three-dimensional vector. A multi-feature long short-term memory (LSTM) deep neural network is then trained on the feature information in this vector, and the bike demand in each cluster is predicted at 30-minute intervals. Compared with traditional machine learning algorithms and current state-of-the-art methods, the experimental results show that the proposed demand prediction model achieves significantly better prediction performance.
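The weather-feature selection step described in the abstract can be illustrated with a minimal sketch. It assumes that "most relevant" means the largest absolute Pearson correlation with cluster demand. The function name `select_weather_feature`, the feature names, and all numbers are hypothetical and are not taken from the paper.

```python
import numpy as np


def select_weather_feature(weather, demand):
    """Return (name, r) for the weather feature whose Pearson correlation
    with demand has the largest absolute value (an assumed criterion)."""
    demand = np.asarray(demand, dtype=float)
    scores = {}
    for name, values in weather.items():
        values = np.asarray(values, dtype=float)
        # np.corrcoef returns a 2x2 correlation matrix; [0, 1] is r(values, demand)
        scores[name] = float(np.corrcoef(values, demand)[0, 1])
    best = max(scores, key=lambda k: abs(scores[k]))
    return best, scores[best]


# Synthetic data for one cluster, made up for illustration only
demand = [12, 15, 9, 20, 22, 7]
weather = {
    "temperature": [10, 13, 8, 18, 21, 6],
    "humidity": [60, 61, 59, 62, 60, 61],
    "wind_speed": [3, 5, 2, 4, 3, 5],
}

best_name, best_r = select_weather_feature(weather, demand)
print(best_name, round(best_r, 3))
```

In this toy data the temperature series tracks demand most closely, so it is the feature that would be passed on to the prediction model together with the historical demand.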
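[CLC number]
[Funding]
National Natural Science Foundation of China (61772091, 61802035, 61962006, 62072311, U1802271, U2001212); Sichuan Science and Technology Program (2021JDJQ0021, 2020YFG0153, 20YYJC2785, 2019YFS0067, 2020YJ0481, 2020YFS0466, 2020YJ0430, 2020YDR0164); CCF-Huawei Database Innovation Research Plan (CCF-HuaweiDBIR2020004A); Guangxi Natural Science Foundation (2018GXNSFDA138005)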