Missing values simulation in rainfall time series for evaluating imputation methods

Authors

Cunha Júnior, R. O. da; Firmino, P. R. A.

DOI:

https://doi.org/10.55761/abclima.v30i18.15243

Keywords:

Missing data imputation, Multiple linear regression, Artificial neural networks, Hydrology

Abstract

Missing data in rainfall time series are one of the main problems in hydrological studies, and gap-filling techniques are an important tool for reconstructing rainfall data sets. This paper compares gap-filling methods for monthly rainfall time series. As a case study, time series from 1974 to 2004 recorded at meteorological stations in the Cariri region, Ceará, Brazil, were considered. Missing values were imputed with the arithmetic average (AA), inverse distance weighting (IDW), regional weighting (RW), multiple linear regression (MLR), and artificial neural networks (ANN). Artificial missing values were simulated according to missing data mechanisms at missing-value rates of 10% and 40%. The performance of the imputation methods was evaluated with the root mean squared error (RMSE) and the mean absolute error (MAE). The seasonality of rainfall patterns was also considered. The ANN method achieved the lowest average RMSE and MAE, followed by the MLR, RW, AA, and IDW methods, although the averages obtained by all methods were similar. All evaluated methods estimated the missing values in the studied time series with good accuracy.
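
The evaluation procedure summarized in the abstract (remove known values, impute them, compare against the originals) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a missing-completely-at-random mechanism, uses synthetic monthly rainfall instead of the Cariri records, and covers only the two simplest methods (AA and IDW). The function names, the three neighbouring stations, and their distances are hypothetical.

```python
import math
import random


def simulate_mcar(series, rate, rng):
    """Return (masked copy, missing indices) with a `rate` share of values set to None."""
    n_missing = round(rate * len(series))
    missing = sorted(rng.sample(range(len(series)), n_missing))
    masked = list(series)
    for i in missing:
        masked[i] = None
    return masked, missing


def arithmetic_average(neighbors, t):
    """AA: plain mean of the neighbouring stations' values at time t."""
    values = [station[t] for station in neighbors]
    return sum(values) / len(values)


def inverse_distance_weighting(neighbors, distances, t, power=2):
    """IDW: neighbouring values weighted by 1 / distance**power."""
    weights = [1.0 / d ** power for d in distances]
    weighted = sum(w * station[t] for w, station in zip(weights, neighbors))
    return weighted / sum(weights)


def rmse(observed, estimated):
    """Root mean squared error between observed and estimated values."""
    squared = [(o - e) ** 2 for o, e in zip(observed, estimated)]
    return math.sqrt(sum(squared) / len(squared))


def mae(observed, estimated):
    """Mean absolute error between observed and estimated values."""
    return sum(abs(o - e) for o, e in zip(observed, estimated)) / len(observed)


# Synthetic monthly rainfall in mm (illustrative only, not the Cariri data).
rng = random.Random(42)
target = [max(0.0, 80 + 60 * math.sin(2 * math.pi * m / 12) + rng.gauss(0, 10))
          for m in range(120)]
neighbors = [[max(0.0, v + rng.gauss(0, 15)) for v in target] for _ in range(3)]
distances = [12.0, 25.0, 40.0]  # km to the target station, hypothetical

results = {}
for rate in (0.10, 0.40):  # missing-value rates named in the abstract
    masked, missing = simulate_mcar(target, rate, rng)
    observed = [target[t] for t in missing]  # true values that were hidden
    aa = [arithmetic_average(neighbors, t) for t in missing]
    idw = [inverse_distance_weighting(neighbors, distances, t) for t in missing]
    results[rate] = {
        "AA": (rmse(observed, aa), mae(observed, aa)),
        "IDW": (rmse(observed, idw), mae(observed, idw)),
    }
    for method, (r, m) in results[rate].items():
        print(f"rate={rate:.0%} {method}: RMSE={r:.2f} mm, MAE={m:.2f} mm")
```

Because the hidden values are known, the errors can be computed exactly. Repeating the loop over several random masks would give the kind of averaged RMSE and MAE the abstract refers to.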

Published

10/06/2022

How to Cite

Cunha Júnior, R. O. da, & Firmino, P. R. A. (2022). Missing values simulation in rainfall time series for evaluating imputation methods. Brazilian Journal of Climatology, 30(18), 691–714. https://doi.org/10.55761/abclima.v30i18.15243

Section

Articles