AutoML Prediction of Plant Length Responses to Nanoparticles

How NovaMechanics combined rigorous data curation, atomistic descriptor enrichment, and automated machine learning to predict the effect of nanoparticle exposure on plant growth, with the resulting model deployed as CeresAI-nano on the Enalos Cloud Platform.

Environmental Science: Nano • 2026

The Challenge

Predicting Nano-Enabled Agriculture Impacts Computationally

Nanoparticles (NPs) are increasingly used in agriculture as fertilisers, biostimulants, and pesticides to support sustainable food production. However, the interplay between NP properties, soil systems, and plant species is highly complex, and conventional assessment of NP–plant interactions requires long, resource-intensive experiments.

Existing datasets suffer from heterogeneous formats, missing metadata, non-systematic data coding, and a lack of direct links to the original publications. Together, these problems make it difficult to build generalisable ML models for nano-agriculture applications.

299: NP–plant interaction observations curated
85%: Accuracy of the optimised XGBoost model
83%: Balanced accuracy on external validation

Our Approach

From fragmented literature data to a validated, cloud-deployed prediction model

Curate and quality-control literature data

Performed extensive data curation on the publicly available NP–plant interactions dataset: cross-checking entries against the original publications, supplementing missing metadata such as NP core composition and crystal phase, and standardising attribute encoding.
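
To make the idea of standardised attribute encoding concrete, the following is a minimal sketch of how free-text entries might be mapped to one controlled spelling while gaps in the metadata are flagged for follow-up. The field names (core, crystal_phase), the synonym table, and the function names are illustrative assumptions, not the curation pipeline used in the study.

```python
from typing import Optional

# Hypothetical synonym table; a real curation effort would use a vetted vocabulary.
CORE_SYNONYMS = {
    "zno": "ZnO",
    "zinc oxide": "ZnO",
    "tio2": "TiO2",
    "titanium dioxide": "TiO2",
    "titania": "TiO2",
}

# Hypothetical list of metadata fields that every record should carry.
REQUIRED_FIELDS = ("core", "crystal_phase")


def standardise_core(raw: Optional[str]) -> Optional[str]:
    """Map a free-text core composition to a single controlled spelling."""
    if raw is None:
        return None
    key = " ".join(raw.strip().lower().split())  # trim and collapse whitespace
    if not key:
        return None
    # Unknown values are kept (trimmed) so that nothing is silently dropped.
    return CORE_SYNONYMS.get(key, raw.strip())


def curate(record: dict) -> dict:
    """Return a cleaned copy of the record plus a list of missing fields."""
    cleaned = dict(record)
    cleaned["core"] = standardise_core(record.get("core"))
    phase = record.get("crystal_phase")
    cleaned["crystal_phase"] = phase.strip().lower() if phase and phase.strip() else None
    cleaned["missing"] = [f for f in REQUIRED_FIELDS if cleaned.get(f) is None]
    return cleaned


example = curate({"core": " Zinc  Oxide ", "crystal_phase": ""})
print(example)
```

Records with a non-empty missing list are the ones that would need to be checked against the original publication.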

Enrich with atomistic descriptors

Calculated computationally derived atomistic descriptors based on the elemental composition and crystal phase of each nanoparticle, adding structural features that experimental characterisation alone cannot provide.
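
The descriptors themselves are not listed here, so the sketch below is only a much simpler, composition-only stand-in: it parses a simple formula and computes a mole-fraction-weighted mean Pauling electronegativity. It does not use crystal phase and is not the atomistic calculation used in the study; the function names and the small element table are assumptions made for illustration.

```python
import re

# Pauling electronegativities for a few elements (standard reference values).
ELECTRONEGATIVITY = {"O": 3.44, "Zn": 1.65, "Ti": 1.54, "Cu": 1.90, "Fe": 1.83}


def parse_formula(formula: str) -> dict:
    """Count atoms in a simple formula such as 'TiO2' (no brackets or hydrates)."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def mean_electronegativity(formula: str) -> float:
    """Mole-fraction-weighted mean electronegativity of the core composition."""
    counts = parse_formula(formula)
    total_atoms = sum(counts.values())
    weighted = sum(ELECTRONEGATIVITY[el] * n for el, n in counts.items())
    return weighted / total_atoms


for core in ("ZnO", "TiO2", "Fe3O4"):
    print(core, round(mean_electronegativity(core), 3))
```

A feature of this kind can be computed for every record from the curated core composition alone, which is why complete and consistently encoded composition metadata matters for the enrichment step.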

Address class imbalance with synthetic data

Applied synthetic data generation to balance the dataset classes, then combined rigorous data filtering and variable selection within an automated ML framework that evaluated seven different algorithms.
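
The specific synthetic data technique is not named here. As an illustration of the general idea, the sketch below uses a simplified SMOTE-style interpolation: new minority-class points are placed on the line segment between two randomly chosen members of the same class (standard SMOTE interpolates towards nearest neighbours instead). The function name and the toy data are assumptions.

```python
import random
from collections import Counter


def oversample_minority(X, y, seed=0):
    """Balance classes by interpolating between random same-class pairs.

    X is a list of numeric feature vectors and y the matching class labels.
    Original samples are kept; synthetic ones are appended at the end.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())  # size of the largest class
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        members = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            a = rng.choice(members)
            b = rng.choice(members)
            t = rng.random()  # position along the segment from a to b
            X_out.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
            y_out.append(label)
    return X_out, y_out


# Toy data: three samples of class "A", two of class "B".
X = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [10.0, 10.0], [12.0, 14.0]]
y = ["A", "A", "A", "B", "B"]
X_bal, y_bal = oversample_minority(X, y)
print(Counter(y_bal))
```

In practice, synthetic samples are normally generated from the training split only, so that the external validation set contains measured observations alone.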

Optimise and validate with AutoML

The AutoML workflow selected XGBoost as the best-performing model, achieving 85% accuracy and 83% balanced accuracy on external validation. The model was validated following OECD guidelines with a defined applicability domain.
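
The two reported metrics answer different questions, which matters when classes are imbalanced. Accuracy is the fraction of all predictions that are correct, while balanced accuracy is the mean of the per-class recalls, so a majority class cannot dominate it. The sketch below shows the arithmetic on made-up labels; "A" and "B" are placeholders, not the response classes or data from the study.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


# Toy example: 8 samples of class "A", 2 of class "B"; one "B" is misclassified.
y_true = ["A"] * 8 + ["B"] * 2
y_pred = ["A"] * 8 + ["A", "B"]
print(accuracy(y_true, y_pred))           # 0.9
print(balanced_accuracy(y_true, y_pred))  # 0.75
```

In the toy example a single error in the small class costs 10 points of accuracy but 25 points of balanced accuracy. A model whose two metrics are close, as with the 85% and 83% reported above, is therefore not relying on the majority class alone.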

Deploy as CeresAI-nano and FAIRify data

Deployed the validated model as the CeresAI-nano web application on the Enalos Cloud Platform. Published the curated dataset through nanoPharos and documented the model in the QSAR Model Reporting Format (QMRF) for regulatory transparency.

Results at a Glance

Model Accuracy (85%): XGBoost model achieves high accuracy on external validation
Balanced Accuracy (83%): Strong performance across all plant response classes
Virtual Screening (No Lab): Predictions require no experimental input data
Documented Model (QMRF): Standardised QSAR model reporting format for regulatory use
Open Data (FAIR): Curated dataset available through nanoPharos database
Sustainable Design (SSbD): Supports safe and sustainable development of nano-agrochemicals

Related Publication

Peer-Reviewed Paper

Rigorous data curation, enrichment and meta-analysis enable autoML prediction of plant length responses to nanoparticles powered by the Enalos Cloud platform

Varsou D.-D., Theodori A., Papadiamantis A.G., Tsoumanis A., Zouraris D., Antoniou M., Koutroumpa N.-M., Melagraki G., Lynch I., Afantitis A. — Environmental Science: Nano, 2026, 13(1):621–640 — DOI: 10.1039/d5en00897b