Comparative study of unsupervised anomaly detection methods on imbalanced time series data

Authors

  • Riza Aulia Hanifa, Ahmad Dahlan University, Indonesia
  • Aris Thobirin, Ahmad Dahlan University, Indonesia
  • Sugiyarto Surono, Ahmad Dahlan University, Indonesia

DOI:

https://doi.org/10.21107/kursor.v13i2.431

Keywords:

Anomaly detection, Autoencoder, Imbalanced data, Isolation Forest, K-means

Abstract

Anomaly detection in time series data is essential, especially when dealing with imbalanced datasets such as air quality records. This study addresses the challenge of identifying point anomalies, i.e., rare and extreme pollution levels, within a highly imbalanced dataset. Failing to detect such anomalies may lead to delayed environmental interventions and poor public health responses. To address this, we present a comparative analysis of three unsupervised learning methods: K-means clustering, Isolation Forest (IForest), and the Autoencoder (AE), including its LSTM variant. These algorithms are applied to monthly air quality data collected in 2023 from 2,110 cities across Asia. The models are evaluated using Area Under the Curve (AUC), Precision, Recall, and F1-score to assess their effectiveness in detecting anomalies. Results indicate that the Autoencoder and its LSTM variant outperform the other methods with an AUC of 98.23%, followed by K-means (97.78%) and IForest (96.01%). The Autoencoder’s reconstruction capability makes it highly effective at capturing complex temporal patterns, while K-means and IForest also perform strongly, offering efficient and interpretable solutions for structured data. This research highlights the potential of unsupervised anomaly detection techniques for environmental monitoring and provides practical insights into handling imbalanced time series data.
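
For illustration, the sketch below outlines the kind of comparative pipeline the abstract describes, implemented with scikit-learn on synthetic, heavily imbalanced data. The synthetic dataset, the small bottlenecked MLP that stands in for the paper's Autoencoder and its LSTM variant, the hyperparameters, and the contamination-based threshold are assumptions made for this example, not the authors' actual settings.

# Minimal sketch of the comparative evaluation described in the abstract.
# Synthetic, heavily imbalanced data stands in for the 2023 air-quality records.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import IsolationForest
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import roc_auc_score, precision_recall_fscore_support

rng = np.random.default_rng(42)

# ~2% anomalies: typical readings plus a few extreme pollution spikes.
n_normal, n_anomaly = 2000, 40
X = np.vstack([rng.normal(50.0, 10.0, size=(n_normal, 4)),
               rng.normal(150.0, 25.0, size=(n_anomaly, 4))])
y = np.concatenate([np.zeros(n_normal), np.ones(n_anomaly)])  # 1 = anomaly
X = StandardScaler().fit_transform(X)

scores = {}

# K-means: distance to the nearest centroid as the anomaly score.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
scores["K-means"] = km.transform(X).min(axis=1)

# Isolation Forest: negate score_samples so higher means more anomalous.
iforest = IsolationForest(n_estimators=100, random_state=0).fit(X)
scores["IForest"] = -iforest.score_samples(X)

# Autoencoder stand-in: a bottlenecked MLP trained to reconstruct its input;
# the per-sample reconstruction error is the anomaly score.
ae = MLPRegressor(hidden_layer_sizes=(8, 2, 8), max_iter=2000, random_state=0)
ae.fit(X, X)
scores["Autoencoder"] = np.mean((X - ae.predict(X)) ** 2, axis=1)

# Threshold each score at the known contamination rate and report the metrics
# used in the study: AUC, Precision, Recall, and F1-score.
contamination = n_anomaly / (n_normal + n_anomaly)
for name, s in scores.items():
    y_pred = (s >= np.quantile(s, 1 - contamination)).astype(int)
    auc = roc_auc_score(y, s)
    prec, rec, f1, _ = precision_recall_fscore_support(
        y, y_pred, average="binary", zero_division=0)
    print(f"{name:12s} AUC={auc:.3f} P={prec:.3f} R={rec:.3f} F1={f1:.3f}")

Replacing the synthetic matrix with scaled monthly pollutant readings and swapping the MLP for a sequence-aware LSTM autoencoder would follow the same score-then-threshold pattern; the figures reported in the abstract come from the authors' models, not from this sketch.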

Published

2025-12-08

Issue

Section

Articles
