A vast treasure trove of 540,000 images of 185 Austrian butterfly and moth species was collected over several years by thousands of volunteers. Using this resource, Innsbruck researcher Friederike Barkmann trained an AI model for biodiversity studies on European supercomputers. The data are now available to scientists worldwide for further research.
As part of the Schmetterlinge Österreichs project, more than 25,000 volunteers captured hundreds of thousands of photos of Austrian butterflies and moths between 2016 and 2023 and made them available to the project. Using these images, Friederike Barkmann, ecologist and data scientist at the University of Innsbruck, trained an AI model to automatically identify individual species – saving both time and costs. From the outset, it was clear that the AI model should also be shared with other researchers to help improve biodiversity research elsewhere.
Training an AI model on a dataset of 540,000 images requires enormous computing power – and this is precisely what supercomputing (also called high-performance computing, or HPC) provides. Friederike Barkmann first trained her model on the Innsbruck HPC system LEO5. When training on this machine became too time-consuming, HPC expert Andreas Lindner from EuroCC Austria helped her parallelise the computational tasks, that is, distribute the workload across multiple processors, which significantly accelerates the process. After this step, Barkmann switched to LEONARDO, one of Europe’s largest supercomputers. The move reduced training times by 90 percent. Final fine-tuning was carried out on the LUMI supercomputer. In the end, the AI model correctly identified 97 percent of butterfly species.
Butterflies and moths are important indicators of biodiversity. Knowing the habitat and density of species provides valuable insights into climate change and global biodiversity.
The open dataset provides researchers worldwide with a foundation for training and testing their own AI models for species identification. This strengthens a wide range of research areas, including climate change studies.
The dataset is available on figshare; the scripts for model training are available on GitHub.
The related data paper is published in Scientific Data (Springer Nature).
Project: Schmetterlinge Österreichs