A quantitative approach to estimating community change parameters in diverse systems: from meiofauna to insects
Author: Asgharian Rostami, Masoud
I examined diverse biotic communities, including meiofauna (copepods and benthic foraminifera) and pollen grains, using quantitative approaches including analytical and statistical models, machine learning (ML), and deep learning (DL). I applied these approaches to three areas of research: 1) quantifying the response of Harpacticoida communities to environmental pressures after the Deepwater Horizon oil spill, 2) estimating the response of benthic foraminifera communities to physical and chemical pressure in the Adriatic Sea, and 3) classifying images of pollen grains. Here, I examined how these modeling tools can be used for predictive and explanatory modeling. The popularity of ML and DL has increased sharply in recent years. Despite their popularity, the inner workings of ML and DL algorithms are often unclear, and their relationship to statistical modeling methods remains debated. The major difference between machine learning and statistics is the goal: prediction for ML and DL, and exploration or hypothesis testing for traditional statistics. Recent growth in data availability, mechanistic understanding, and computing power has increased the need for a new quantitative ecology approach. However, flexible methodological frameworks are needed to utilize these developments toward improved ecological prediction. Our results show that ML approaches such as random forest and gradient forest are the most suitable methods for discovering non-linear relationships between environmental pressures and the distribution of copepods and foraminifera. However, for estimating the magnitude of the effects of environmental variables, and their direct and indirect effects, the statistical modeling approach is more suitable. In short, ML models are designed to make predictions with the highest possible accuracy, whereas statistical models are designed to infer the relationships between variables.
I used ML and structural equation modeling (SEM) to quantify the response of Harpacticoida communities to environmental pressures after the Deepwater Horizon (DWH) oil spill. Harpacticoida and environmental variables were sampled at 95 stations during September-October 2010. I tested the hypothesis that benthic effects can be more finely resolved when taxa are identified at a lower taxonomic level, as only major taxonomic community responses for meiofauna had been reported previously. For this purpose, I examined how assemblages of Harpacticoida (Copepoda), characterized at the family level (31 families), are affected by 29 environmental variables. The most important environmental variables affecting the community structure of Harpacticoida include distance from the wellhead and concentrations of Mn, Ni, Ba, and total petroleum hydrocarbons (TPH). Ameiridae was the dominant family across all stations and increased in dominance at the impacted stations. Tisbidae also appeared tolerant, increasing drastically in abundance (~4x higher than at stations far from the wellhead) and in percent contribution at impacted stations. Ectinosomatidae, Cletodidae, Miraciidae, and Zosimeidae were sensitive indicators because they had reduced abundance and percent contribution to the harpacticoid community near the DWH wellhead and in the trajectory of the sub-surface plume. The harpacticoid community response suggests an expanded area of benthic impacts associated with the Deepwater Horizon blowout and oil spill. Our data also supported causal hypotheses of direct and indirect effects of distance from the wellhead and depth on heavy metals and on the distribution of petroleum compounds (TPH and PAH), all of which were associated with a decline in Hill diversity and evenness after the DWH oil spill. In addition, I used ML and SEM to estimate the responses of benthic foraminifera communities to environmental pressure in the Adriatic Sea.
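As an illustration of the Hill diversity and evenness measures mentioned above for the harpacticoid community, the following is a minimal sketch and not the analysis code used in this work. It uses the standard definition of Hill numbers of order q; the function names and the family-level counts are hypothetical, and the choice of evenness as the ratio of the order-1 Hill number to richness is an assumption, since the abstract does not state which evenness index was used.

```python
import math

def hill_number(counts, q):
    """Hill number (effective number of taxa) of order q for abundance counts."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    if abs(q - 1.0) < 1e-12:
        # Order 1 is defined as a limit: the exponential of Shannon entropy.
        return math.exp(-sum(p * math.log(p) for p in props))
    return sum(p ** q for p in props) ** (1.0 / (1.0 - q))

def hill_evenness(counts, q=1):
    """Assumed evenness index: Hill number of order q divided by richness (order 0)."""
    return hill_number(counts, q) / hill_number(counts, 0)

# Hypothetical family-level counts at two stations (not data from this study).
even_station = [10, 10, 10, 10]      # four equally abundant families
dominated_station = [97, 1, 1, 1]    # one family dominates

print(hill_number(even_station, 1), hill_evenness(even_station))
print(hill_number(dominated_station, 1), hill_evenness(dominated_station))
```

In this toy example the evenly distributed station has an order-1 Hill number equal to its richness, while the dominated station has a much lower effective number of families and lower evenness, which is the kind of pattern a decline in Hill diversity and evenness describes.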
Over the last two decades, benthic foraminiferal ecology has been intensively investigated to improve the potential application of these marine organisms as proxies of climate and environmental changes. It is still challenging to define the most important factors affecting foraminiferal communities and derived faunistic parameters. Here, I combined gradient forest and SEM to test hypotheses about determinants of benthic foraminiferal assemblages. These approaches helped determine the relative effect sizes of different environmental variables responsible for shaping living foraminiferal distributions. Four major faunal turnovers (at 13–28 m, 29–58 m, 59–215 m, and >215 m) were identified along a large bathymetric gradient (13–703 m water depth), reflecting the classical bathymetric distribution of benthic communities. Sand content and organic matter (TOC and N) were identified as the most important factors influencing the foraminiferal distribution, either along the entire depth gradient or at selected bathymetric ranges. The SEM supported causal hypotheses in which different factors contributed directly and indirectly to assemblages at each bathymetric range, with direct effects of depth and indirect effects through the environmental parameters identified by gradient forest (i.e., sand, PLI, TOC, and N) on infauna and diversity. These results are relevant to understanding the basic ecology and conservation of foraminiferal communities, and are also a good proof of concept for combining machine learning with traditional statistical hypothesis testing in the study of foraminiferal ecology. Deep learning is a rapidly evolving branch of machine learning, yet it has received little attention in ecology and environmental science; this method involves training deep neural networks consisting of many layers and neurons.
Here, using pollen datasets, I examined several pre-trained convolutional neural network (CNN) architectures, using transfer learning to overcome the small amount of training data. CNNs are preferred over other machine learning algorithms because of the excellent classification accuracy reported in the literature. I applied a novel CNN trained from scratch and different architectures of well-known pre-trained deep CNN models (AlexNet; VGG-16; MobileNet-V2; ResNet [18, 34, 50, 101]; ResNeSt [50, 101]; SE-ResNeXt; Vision Transformer (ViT)) to define the most promising modeling approach for classification of pollen grains in the Great Basin. Each pre-trained deep CNN model, chosen for its deeper architecture, was applied to pollen grain images to distinguish 40 pollen species. The ResNeSt-110 model yielded the best results, with 97.24% accuracy, 97.89% precision, 96.86% F1 score, and 97.13% recall. This study demonstrates that, by using prior information from pre-trained models through transfer learning, we can achieve better and much faster image classification than with a traditional CNN built from scratch. I demonstrated that DL is an effective modeling approach for automating pollen grain identification, a task usually done by palynologists and prone to high error. I recommend choosing modeling methods according to the research question; these methods will help elucidate potential mechanistic relationships in complicated interactions. As ecological datasets are becoming larger and more complicated, understanding which quantitative methods to use can be critical for ecologists, and I hope this study can help researchers in this field find the best model for their objective.
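For readers less familiar with the classification metrics reported above, the following is a minimal sketch of how accuracy, precision, recall, and F1 score can be computed for a multi-class problem. It assumes macro averaging (the unweighted mean of per-class scores); the abstract does not state which averaging scheme was used, and the function name and the toy labels are hypothetical rather than taken from the pollen dataset.

```python
from collections import Counter

def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1 (assumed averaging)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # class p was predicted wrongly
            fn[t] += 1          # class t was missed
    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }

# Hypothetical labels for three pollen taxa (not data from this study).
y_true = ["a", "a", "b", "b", "c", "c"]
y_pred = ["a", "b", "b", "b", "c", "a"]
print(macro_metrics(y_true, y_pred))
```

Under macro averaging, the F1 score is the mean of per-class F1 values rather than the harmonic mean of the averaged precision and recall, so it can fall below both of them, as it does in this toy example.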