Publication Details
SoluProt: prediction of soluble protein expression in Escherichia coli
Hon Jiří
Marušiak Martin, Ing.
Martínek Tomáš, doc. Ing., Ph.D. (DCSY)
Kunka Antonín, Mgr., Ph.D.
Zendulka Jaroslav, doc. Ing., CSc. (UIFS)
Bednář David
Damborský Jiří, prof. Mgr., Dr. (UMEL)
protein solubility, machine-learning
Motivation: Poor protein solubility hinders the production of many therapeutic
and industrially useful proteins. Experimental efforts to increase solubility are
plagued by low success rates and often reduce biological activity. Computational
prediction of protein expressibility and solubility in Escherichia coli using
only sequence information could reduce the cost of experimental studies by
enabling prioritisation of highly soluble proteins.
Results: A new tool for sequence-based prediction of soluble protein expression
in Escherichia coli, SoluProt, was created using the gradient boosting machine
technique with the TargetTrack database as a training set. When evaluated against
a balanced independent test set derived from the NESG database, SoluProt's
accuracy of 58.4% and AUC of 0.60 exceeded those of a suite of alternative
solubility prediction tools. There is also evidence that it could significantly
increase the success rate of experimental protein studies. SoluProt is freely
available as a standalone program and a user-friendly webserver at
https://loschmidt.chemi.muni.cz/soluprot/.
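
The two metrics quoted above, accuracy and AUC, can be illustrated with a minimal sketch. It is not SoluProt's evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made for illustration only. A label of 1 stands for a soluble protein and 0 for an insoluble one.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of proteins whose thresholded score matches the true label.

    The 0.5 threshold is an assumption for this sketch.
    """
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def roc_auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen soluble protein scores higher than a randomly chosen insoluble one
    (ties count as half a win)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical balanced test set: four soluble and four insoluble proteins.
# These numbers are made up and are not taken from the paper.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.3, 0.8, 0.6, 0.2, 0.1]

print(f"accuracy = {accuracy(labels, scores):.3f}")
print(f"AUC      = {roc_auc(labels, scores):.3f}")
```

On a balanced test set a predictor that guesses at random is expected to score about 50% accuracy and an AUC of about 0.5, which is the baseline against which the reported figures should be read.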
@article{BUT168540,
author="Jiří {Hon} and Martin {Marušiak} and Tomáš {Martínek} and Antonín {Kunka} and Jaroslav {Zendulka} and David {Bednář} and Jiří {Damborský}",
title="SoluProt: prediction of soluble protein expression in Escherichia coli",
journal="BIOINFORMATICS",
year="2021",
volume="37",
number="1",
pages="23--28",
doi="10.1093/bioinformatics/btaa1102",
issn="1367-4803",
url="https://www.fit.vut.cz/research/publication/12368/"
}