Global Ensemble Streamflow and Flood Modeling with Application of Large Data Analytics, Deep learning and GIS
Advisor: Wigand, Peter E.
ABSTRACT
Flooding is one of the most dangerous natural disasters and recurs worldwide, frequently causing major urban, financial, anthropogenic, and environmental impacts in the affected area. Developing flood susceptibility maps that identify flood zones in a catchment is therefore necessary for improved flood management and decision making. Streamflow and flood forecasting can provide important information for many applications, including optimization of water resource allocation, water quality assessment, cost analysis, sustainable design of hydrological infrastructure, and improvement of agricultural and irrigation practices. Conventional, physically based hydrological models need large amounts of historical data and many parameters. Recent data-driven models require less data and have received growing attention among researchers because of their high predictive performance, which makes them more appropriate for hydrological forecasting at the basin scale and in data-scarce regions. In this context, the main objective of this study was to evaluate the performance of various data-driven modeling approaches in flood and streamflow forecasting. A significant need in daily streamflow prediction today is to recognize possible indicators and improve their applicability to effective water management strategies. To that end, the authors proposed an ensemble data mining algorithm coupled with various machine learning methods to perform data cleaning, dimensionality reduction, and feature subset selection. For the data mining task, three data cleaning approaches were used: Principal Component Analysis (PCA), TensorFlow (TF), and TensorFlow k-means clustering (TF-k-means clustering).
For feature selection, four machine learning approaches were investigated: K-Nearest Neighbor (KNN), Bootstrap aggregating, Random Forest (RF), and Support Vector Machine (SVM). Of the twelve combinations of data mining and machine learning methods, the best ensemble model was TF-k-means clustering coupled with RF, which outperformed the other methods with 96.52% classification accuracy. Thereafter, a modified Nonlinear Echo State Network Multivariate Polynomial (NESN-MP), named in the current study the Robust Nonlinear Echo State Network (RNESN), was used for daily streamflow forecasting. Compared with NESN-MP, the RNESN decreases the size of the reservoir (the hidden layer, whose weights are randomly initialized), reduces the computational burden, and increases the interactions between the internal states. The model is thus simple and user-friendly, with better learning ability and more accurate forecasting performance. The method was tested with data provided by the United States Geological Survey (USGS), the Natural Resource Conservation Service (NRCS), the National Weather Service Climate Prediction Center (NOAA), and the Daymet data set from NASA through the Earth Science Data and Information System (ESDIS). Each data set includes daily records of locally observed hydrological parameters and large-scale weather and climate variability parameters. The efficiency of the proposed method was evaluated in three regions: Berkshire County (MA), Tuolumne County (CA), and Wasco County (OR). These basins were selected because they represent a wide range of climatic conditions across the US. The simulation results were compared with those of NESN-MP and the Adaptive Neuro-Fuzzy Inference System (ANFIS). The results validate the superiority of the proposed modeling approach over NESN-MP and ANFIS: the proposed RNESN approach outperforms the other methods with an RMSE = 0.98.
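To make the reservoir idea concrete, the following is a minimal sketch of a plain echo state network for one-step-ahead forecasting, scored with RMSE. It is not the RNESN or NESN-MP described above: the reservoir size, spectral radius, ridge penalty, washout length, and the synthetic seasonal series standing in for daily streamflow are all assumptions chosen for illustration, and the helper names are hypothetical.

```python
import numpy as np

def make_reservoir(n_inputs, n_reservoir, spectral_radius=0.9, seed=0):
    """Randomly initialize input and recurrent weights (these are never trained)."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_inputs + 1))  # +1 for bias
    w = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_reservoir))
    # Rescale so the largest eigenvalue magnitude equals spectral_radius.
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    return w_in, w

def build_features(w_in, w, inputs):
    """Drive the reservoir with the inputs and return [1, input, state] per step."""
    states = np.zeros((len(inputs), w.shape[0]))
    x = np.zeros(w.shape[0])
    for t, u in enumerate(inputs):
        x = np.tanh(w_in @ np.concatenate(([1.0], u)) + w @ x)
        states[t] = x
    return np.hstack([np.ones((len(inputs), 1)), inputs, states])

def fit_readout(features, targets, ridge=1e-4):
    """Train only the linear readout, using ridge regression."""
    a = features.T @ features + ridge * np.eye(features.shape[1])
    return np.linalg.solve(a, features.T @ targets)

def rmse(predicted, observed):
    return float(np.sqrt(np.mean((predicted - observed) ** 2)))

# Synthetic, standardized "daily flow": an annual cycle plus noise (assumption).
rng = np.random.default_rng(42)
days = np.arange(801)
flow = np.sin(2 * np.pi * days / 365.0) + 0.05 * rng.standard_normal(days.size)
flow = (flow - flow.mean()) / flow.std()

inputs, targets = flow[:-1, None], flow[1:]  # predict tomorrow from today
w_in, w = make_reservoir(n_inputs=1, n_reservoir=50)
features = build_features(w_in, w, inputs)

washout, split = 50, 600  # drop the initial transient; hold out the last 200 days
w_out = fit_readout(features[washout:split], targets[washout:split])
test_rmse = rmse(features[split:] @ w_out, targets[split:])
print(f"held-out RMSE on standardized flow: {test_rmse:.3f}")
```

Only the readout weights are fitted, which is what keeps the training cost of reservoir models low. The RMSE printed here is on a standardized synthetic series and is not comparable to the value reported in the abstract.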
For flood forecasting, an Evidential Belief Function (EBF) model, both as an individual model and in combination with Logistic Regression (LR) methods, was proposed to prepare the flood susceptibility map. In this study, we also proposed a new ensemble of models that uses Bootstrap aggregating as a meta-classifier with K-Nearest Neighbor (KNN) functions (coarse, cosine, cubic, and weighted) as base classifiers to perform spatial prediction of floods. We first selected 10 conditioning factors for the spatial prediction of floods and then assessed their predictive capability using the relief-F attribute evaluation (RFAE) method. Model validation was performed using two statistical error indices and the area under the curve (AUC). The results showed that the Bootstrap aggregating-cubic KNN ensemble model outperformed the other ensemble models. The Bootstrap aggregating-cubic KNN model can therefore be used as a promising technique for the sustainable management of flood-prone areas. Furthermore, the AUC results indicated that the success rates of EBF, EBF from LR, EBF-LR (enter), and EBF-LR (stepwise) were 94.61%, 67.94%, 86.45%, and 56.31%, respectively, and the prediction rates were 94.55%, 66.41%, 83.19%, and 52.98%. The EBF model thus had the highest accuracy in predicting the flood susceptibility map, in which 14% of the total area fell in the high and very high susceptibility classes and 62% fell in the low and very low susceptibility classes.
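The bagged KNN ensemble and its AUC validation can be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: "cubic KNN" is taken here to mean KNN with a Minkowski distance of exponent 3, the two conditioning factors and all data points are synthetic, and the number of neighbors and bootstrap rounds are arbitrary.

```python
import random

def minkowski(a, b, p=3):
    """Minkowski distance; p=3 is the assumed meaning of 'cubic' KNN."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def knn_score(train_x, train_y, query, k=5, p=3):
    """Fraction of the k nearest training points labeled as flood (1)."""
    order = sorted(range(len(train_x)),
                   key=lambda i: minkowski(train_x[i], query, p))
    return sum(train_y[i] for i in order[:k]) / k

def bagged_knn_scores(train_x, train_y, queries, n_estimators=15, k=5, seed=0):
    """Average KNN scores over bootstrap resamples of the training set."""
    rng = random.Random(seed)
    n = len(train_x)
    totals = [0.0] * len(queries)
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        boot_x = [train_x[i] for i in idx]
        boot_y = [train_y[i] for i in idx]
        for j, q in enumerate(queries):
            totals[j] += knn_score(boot_x, boot_y, q, k)
    return [t / n_estimators for t in totals]

def auc(scores, labels):
    """AUC as the probability that a flood point outranks a non-flood point."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def make_points(n, rng):
    """Synthetic points with two made-up, rescaled conditioning factors."""
    xs, ys = [], []
    for i in range(n):
        label = i % 2  # alternate flood / non-flood
        center = 0.3 if label == 1 else 0.7
        xs.append((rng.gauss(center, 0.12), rng.gauss(center, 0.12)))
        ys.append(label)
    return xs, ys

data_rng = random.Random(1)
train_x, train_y = make_points(200, data_rng)
test_x, test_y = make_points(100, data_rng)

scores = bagged_knn_scores(train_x, train_y, test_x)
test_auc = auc(scores, test_y)
print(f"prediction-rate AUC on synthetic test points: {test_auc:.3f}")
```

Scoring the held-out points corresponds to the prediction rate discussed above, while scoring the training points would give the success rate. The averaged score per location is the kind of quantity that would be binned into susceptibility classes on a map.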