Early terminated transcripts and missing proteins reflect artifacts in bacterial proteomes
MMseqs2 clustering was used to examine the uniformity and heterogeneity of proteomes from 20 bacterial species. Using clustering parameters that required 50% sequence overlap, clusters with proteins from 50% of proteomes typically contain proteins from 95% of the proteomes and capture more than 80% of the proteins in an organism. Protein clusters are highly uniform in length; across the 20 bacteria, the median cluster has more than 99% of the proteins at the mode length. While protein lengths...
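The length-uniformity statistic described above (the share of a cluster's proteins that sit at the cluster's mode length, summarized by the median across clusters) can be illustrated with a minimal Python sketch. It assumes cluster membership has already been parsed into a mapping from cluster ID to a list of protein lengths. The function names and the toy data are hypothetical and are not taken from the article or from MMseqs2.

```python
from collections import Counter
from statistics import median


def mode_length_fraction(lengths):
    """Return the fraction of proteins in one cluster at the most common length."""
    counts = Counter(lengths)
    # most_common(1) returns [(length, count)] for the most frequent length.
    _, mode_count = counts.most_common(1)[0]
    return mode_count / len(lengths)


def median_mode_fraction(clusters):
    """Return the median, across clusters, of the per-cluster mode-length fraction."""
    return median(mode_length_fraction(lengths) for lengths in clusters.values())


# Hypothetical toy data: cluster ID -> protein lengths in amino acids.
clusters = {
    "cluster_a": [310, 310, 310, 310],
    "cluster_b": [152, 152, 152, 97],  # one shorter member, e.g. a truncated protein
    "cluster_c": [421, 421, 421, 421, 421],
}

for name, lengths in clusters.items():
    print(name, mode_length_fraction(lengths))
print("median:", median_mode_fraction(clusters))
```

In this toy input, `cluster_b` is the only cluster with a member off the mode length, so it is the only one that scores below 1.0. Real data would need a tie-breaking rule for clusters with more than one most common length; here `Counter.most_common` simply returns the first one encountered.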