The main goals of the project were to enhance the performance and reduce the toxicity of a novel synthetic antimicrobial peptide (AMP), HHC-36; to employ machine learning (ML) methods to discover new, more potent AMPs; and to determine the hemolytic activity of these AMPs.
To that end, we explored the extent to which publicly available data on AMPs can be used with state-of-the-art ML models and training algorithms to yield predictors that can screen any peptide sequence for antimicrobial activity. Within this project we collected datasets on some pathogens of interest to the pork industry, trained the best available models for this purpose, optimized the design (hyperparameters) of these models, and explored the limits of training on the currently available data.
We determined the asymptotic limits of the training scores for the graph convolutional models we employed on the available data. In largely uncharted territory, these are among the very first ML results on quantitatively predicting the antimicrobial activity of AMPs. Moreover, our results show a clear correlation between dataset size and final training score.
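As a rough illustration of how an asymptotic score limit and a size-score correlation can be estimated, the following minimal sketch fits a saturating curve of the form score = a - b / n to final scores obtained at different dataset sizes n. The functional form, the function names (`pearson`, `fit_asymptote`), and all numbers are assumptions made for this example only; they are not the project's method or its results, and the score is assumed to be a metric where higher is better.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def fit_asymptote(sizes, scores):
    """Least-squares fit of score = a - b / n.

    Returns (a, b), where a is the value the fitted curve approaches
    as the dataset size n grows without bound.
    """
    xs = [1.0 / n for n in sizes]          # the model is linear in 1/n
    count = len(xs)
    mx = sum(xs) / count
    my = sum(scores) / count
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, scores))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, -slope


# Synthetic placeholder data generated from the assumed form itself.
# These are NOT measurements from this project.
sizes = [200, 400, 800, 1600, 3200]
scores = [0.9 - 40.0 / n for n in sizes]

# Correlation between log dataset size and final score.
r = pearson([math.log(n) for n in sizes], scores)

# Estimated asymptotic score (a) and decay coefficient (b).
a, b = fit_asymptote(sizes, scores)

print(f"correlation with log(size): {r:.3f}")
print(f"estimated asymptotic score: {a:.3f}")
```

With real training runs, `sizes` and `scores` would be replaced by the measured dataset sizes and final scores, and other saturating forms (for example an inverse power law) may fit better than the simple 1/n form assumed here.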
These results set the stage for the next round of studies, globally and within Canada, in which targeted AMP library screening can be designed to yield data usable by ML models.