Paper Number
ML10
Session
AI and ML in Rheology
Title
A general machine learning framework for accurate detection of cluster formation: A case study of polymer crystallization from molecular dynamics simulations with supervised and unsupervised approaches
Presentation Date and Time
October 22, 2025 (Wednesday) 2:30
Track / Room
Track 6 / Sweeney Ballroom C
Authors
- Tourani, Elyar (University of Tennessee, Chemical and Biomolecular Engineering)
- Edwards, Brian J. (University of Tennessee, Chemical and Biomolecular Engineering)
- Khomami, Bamin (University of Tennessee, Chemical and Biomolecular Engineering)
Author and Affiliation Lines
Elyar Tourani, Brian J. Edwards and Bamin Khomami
Chemical and Biomolecular Engineering, University of Tennessee, Knoxville, TN 37996
Speaker / Presenter
Tourani, Elyar
Keywords
theoretical methods; computational methods; artificial intelligence; machine learning; polymer melts
Text of Abstract
In this study, we present an integrated machine learning workflow for detecting and quantifying structural transformations, such as crystallization, in polymeric systems from molecular dynamics simulation data. For each atom, we construct a high-dimensional feature vector that combines thermodynamic-like variables (local entropy and enthalpy), geometric descriptors, and invariant bond orientational order parameters. We then apply dimensionality reduction techniques (PCA, UMAP, and VAE) to project this feature space onto low-dimensional embeddings that expose latent structural fingerprints.
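The projection step can be illustrated with the simplest of the three techniques. The sketch below, in Python with NumPy, standardizes a per-atom feature matrix and projects it onto its leading principal components via a singular value decomposition. It assumes the descriptors have already been computed and stacked into an array of shape (n_atoms, n_features); the function name `pca_embed` and the synthetic two-population data are hypothetical stand-ins, and UMAP or a VAE would replace this linear projection where nonlinear structure matters.

```python
import numpy as np


def pca_embed(features, n_components=2):
    """Project per-atom features onto their leading principal components.

    features: array of shape (n_atoms, n_features), one row per atom
    (hypothetical layout). Returns (embedding, explained_variance_ratio).
    """
    # Standardize each descriptor so quantities with different units
    # (entropy, enthalpy, order parameters) contribute on an equal footing.
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns
    z = (features - mean) / std

    # Thin SVD of the standardized matrix; rows of vt are the principal axes.
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    embedding = z @ vt[:n_components].T

    # Fraction of total variance captured by each retained component.
    var = s**2
    ratio = var[:n_components] / var.sum()
    return embedding, ratio


# Synthetic demo data (not MD output): two populations of "atoms" with five
# descriptors each, standing in for crystalline and amorphous environments.
rng = np.random.default_rng(0)
crystal = rng.normal(loc=[1.0, -1.0, 0.5, 0.8, 0.0], scale=0.2, size=(500, 5))
amorph = rng.normal(loc=[-1.0, 1.0, 0.1, 0.1, 0.0], scale=0.2, size=(500, 5))
X = np.vstack([crystal, amorph])

emb, ratio = pca_embed(X, n_components=2)
print(emb.shape, ratio.round(3))
```

Standardizing first matters because the descriptors carry different units and scales; without it, PCA would mostly track whichever descriptor has the largest raw variance.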
Unsupervised clustering methods (KMeans, GMM, and HDBSCAN) are then used to label atoms as crystalline or amorphous, with UMAP followed by HDBSCAN yielding the highest cluster quality metrics. To validate and interpret these clusters, we train supervised binary classifiers (logistic regression, random forest, and gradient boosting) on the same feature set and show that the clustering-derived label is markedly more predictable than a label defined by thresholding any individual physical descriptor. Feature importance analyses reveal that q6, local entropy, and p2 contribute most strongly to classification performance.
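The cluster-then-classify validation can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' pipeline: it clusters standardized synthetic descriptors with a two-component Gaussian mixture (GMM, one of the listed methods; the reported best combination of UMAP and HDBSCAN is omitted to keep the dependencies to scikit-learn), then trains a random forest to predict the cluster label and ranks impurity-based feature importances. The descriptor names and data are hypothetical, and the comparison against single-descriptor thresholds is not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import cross_val_score

# Hypothetical descriptor names; the synthetic values below are not MD data.
names = ["q6", "local_entropy", "p2", "enthalpy", "noise"]
rng = np.random.default_rng(1)
n = 400
is_crystal = rng.random(n) < 0.5
X = rng.normal(size=(n, len(names)))
# Make the first three descriptors informative about the hidden phase.
X[is_crystal, 0] += 2.5  # higher q6
X[is_crystal, 1] -= 2.5  # lower local entropy
X[is_crystal, 2] += 2.0  # higher p2

# Step 1: unsupervised labels. Here the GMM runs on standardized descriptors;
# in the workflow described above it would run on a low-dimensional embedding.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
gmm = GaussianMixture(n_components=2, random_state=0).fit(Z)
labels = gmm.predict(Z)

# Step 2: supervised check of how predictable the cluster label is.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
acc = cross_val_score(clf, X, labels, cv=5).mean()

# Step 3: impurity-based feature importances from a fit on all atoms.
clf.fit(X, labels)
ranking = sorted(zip(names, clf.feature_importances_), key=lambda t: -t[1])
print(f"5-fold CV accuracy: {acc:.3f}")
for name, imp in ranking:
    print(f"{name:>14s} {imp:.3f}")
```

Impurity-based importances can favor high-cardinality or correlated features; permutation importance is a common cross-check when descriptors such as q6 and p2 are correlated.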
We use these insights to define a composite crystallinity parameter, C, by fitting a logistic model to the top three descriptors. The resulting distribution of C is robustly bimodal at all simulation time steps, providing a sensitive, generalizable indicator of crystalline versus amorphous environments. Crucially, C can be calibrated on just one or a few representative snapshots and then computed efficiently throughout the trajectory to track the evolution of crystallinity. We further show that the time-dependent growth rate extracted from C closely matches values reported in the literature for the kinetics of polymer crystallization.
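The calibrate-once, apply-everywhere idea behind C can be sketched in NumPy. The abstract specifies only "a logistic model", so the sketch assumes the standard form C = 1 / (1 + exp(-(b + w·x))) with x = (q6, local entropy, p2), fits w and b by plain gradient descent on the mean log-loss using cluster-derived labels from a single calibration snapshot, and then evaluates C on every frame of a mock trajectory. Thresholding C at 0.5 to obtain a crystalline fraction, and using the finite-difference slope of that fraction as a growth-rate proxy, are illustrative choices. The functions `fit_logistic`, `crystallinity`, and `snapshot` and all data are hypothetical.

```python
import numpy as np


def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Fit C = sigmoid(b + z @ w) to binary labels y by gradient descent on log-loss."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    Z = (X - mu) / sd  # standardize for stable optimization
    w = np.zeros(Z.shape[1])
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))
        grad = p - y  # derivative of the log-loss with respect to the logit
        w -= lr * Z.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b, mu, sd


def crystallinity(X, w, b, mu, sd):
    """Evaluate the composite parameter C in [0, 1] for each atom."""
    return 1.0 / (1.0 + np.exp(-(((X - mu) / sd) @ w + b)))


rng = np.random.default_rng(2)


def snapshot(frac_crystal, n=2000):
    """Synthetic frame with columns (q6, local entropy, p2); not real MD output."""
    is_c = rng.random(n) < frac_crystal
    X = rng.normal(size=(n, 3))
    X[is_c] += np.array([2.5, -2.5, 2.0])
    return X, is_c.astype(float)


# Calibrate on a single representative snapshot with known (cluster-derived) labels.
X_cal, y_cal = snapshot(0.5)
w, b, mu, sd = fit_logistic(X_cal, y_cal)

# Apply the calibrated C to every frame of a mock trajectory.
times = np.arange(0.0, 100.0, 10.0)  # arbitrary time units
true_frac = 1.0 - np.exp(-((times / 50.0) ** 2))  # sigmoidal mock kinetics
frac = np.array([(crystallinity(snapshot(f)[0], w, b, mu, sd) > 0.5).mean()
                 for f in true_frac])

# Simple growth-rate proxy: finite-difference slope of the crystalline fraction.
rate = np.gradient(frac, times)
print(np.round(frac, 2))
```

Only the calibration frame needs labels; every later frame costs one standardized dot product per atom, which is what makes tracking C along a long trajectory inexpensive.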