A spatial outlier mining algorithm based on distributed computing
张卫平; 刘纪平; 仇阿根; 张用川; 赵阳阳
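Abstract:
Existing spatial outlier mining algorithms cannot meet the demands of large-scale spatial data mining, so this paper proposes a spatial outlier mining algorithm for distributed environments. First, to suit the distributed computation and storage of a cluster, the data set is partitioned with a space-filling curve, which speeds up the search for each target point's approximate spatial nearest neighbors. Second, the spatial outlier coefficient is defined using information entropy theory. Because different attributes in multidimensional data affect the outlier coefficient to different degrees, the algorithm computes the weight of each attribute automatically from the characteristics of the data itself, and it uses inverse distance weighting to define how spatial factors affect the outlier coefficient. Finally, experimental results show that on large-scale spatial data sets the algorithm mines outliers far more efficiently than traditional algorithms, with an outlier mining accuracy above 90%.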
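The three ideas in the abstract can be sketched on a single machine as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name the space-filling curve, so a Z-order (Morton) curve is assumed; the attribute weights use the standard entropy weight method as a stand-in for the paper's entropy-based definition; and the score (the weighted absolute difference between a point's attributes and the inverse-distance-weighted mean of its neighbors) is a hypothetical form of the spatial outlier coefficient. All function names and parameters (`morton_key`, `partition_by_curve`, `entropy_weights`, `outlier_scores`, `k`, `n_parts`, `cells`) are invented for this example.

```python
import math

def morton_key(ix, iy, bits=16):
    """Interleave the bits of two non-negative grid indices (Z-order curve)."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (2 * b)
        key |= ((iy >> b) & 1) << (2 * b + 1)
    return key

def partition_by_curve(points, n_parts, cells=1024):
    """Sort point indices along the curve and cut them into contiguous chunks."""
    x0 = min(p[0] for p in points)
    y0 = min(p[1] for p in points)
    span = max(max(p[0] for p in points) - x0,
               max(p[1] for p in points) - y0) or 1.0

    def key(i):
        ix = min(cells - 1, int((points[i][0] - x0) / span * cells))
        iy = min(cells - 1, int((points[i][1] - y0) / span * cells))
        return morton_key(ix, iy)

    order = sorted(range(len(points)), key=key)
    size = math.ceil(len(order) / n_parts)
    return [order[i:i + size] for i in range(0, len(order), size)]

def entropy_weights(rows):
    """Entropy weight method (assumed): less uniform attributes weigh more.

    Expects at least two rows.
    """
    n, m = len(rows), len(rows[0])
    diversity = []
    for j in range(m):
        col = [r[j] for r in rows]
        low = min(col)
        shifted = [v - low for v in col]      # make values non-negative
        total = sum(shifted)
        if total == 0:                        # constant attribute: no information
            diversity.append(0.0)
            continue
        ent = -sum((v / total) * math.log(v / total) for v in shifted if v > 0)
        diversity.append(1.0 - ent / math.log(n))
    s = sum(diversity)
    return [d / s for d in diversity] if s > 0 else [1.0 / m] * m

def outlier_scores(points, attrs, k=4, n_parts=2):
    """Hypothetical score: weighted gap to the IDW mean of approximate neighbors."""
    weights = entropy_weights(attrs)
    scores = [0.0] * len(points)
    for part in partition_by_curve(points, n_parts):
        for pos, i in enumerate(part):
            # Candidates are the points adjacent to i along the curve.
            window = part[max(0, pos - k):pos] + part[pos + 1:pos + 1 + k]
            near = sorted(window, key=lambda q: math.dist(points[i], points[q]))[:k]
            inv = [1.0 / (math.dist(points[i], points[q]) + 1e-12) for q in near]
            total = sum(inv)
            if total == 0:                    # single-point partition
                continue
            for j, w in enumerate(weights):
                mean_j = sum(c * attrs[q][j] for c, q in zip(inv, near)) / total
                scores[i] += w * abs(attrs[i][j] - mean_j)
    return scores

# Usage: a 6 x 6 grid with uniform attributes and one deviating point.
points = [(float(x), float(y)) for x in range(6) for y in range(6)]
attrs = [[10.0, 5.0] for _ in points]
attrs[14] = [100.0, 5.0]
scores = outlier_scores(points, attrs)
top = scores.index(max(scores))
```

In a cluster setting each chunk returned by `partition_by_curve` would presumably be handled by a separate worker. Because neighbors are taken only from within a chunk, the neighbor sets are approximate, which is the trade-off the abstract describes.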
Keywords: spatial outlier; distributed computing; nearest neighbor; spatial outlier coefficient
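Foundation: Special Scientific Research Fund for Public Welfare in Surveying, Mapping and Geoinformation (201512032, 201512027); Basic Scientific Research Fund of the Chinese Academy of Surveying and Mapping (7771414)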
Author: 张卫平; 刘纪平; 仇阿根; 张用川; 赵阳阳
DOI: 10.16251/j.cnki.1009-2307.2017.08.016