Annotation: from legacy to revised
| Legacy (H37Rv / Mycobrowser) | hypothetical protein |
| MTBC0 PGAP re-annotation | glycosyltransferase family 4 protein |
| Revised (this work) | Glycosyltransferase family 4 (GT-B fold) protein (Pfam Glycos_transf_1 PF00534 with GT4 domains). Transfers a sugar moiety to an acceptor; the specific donor/acceptor is not established. |
Curated reference (UniProt)
| UniProt | P96407 · TrEMBL · unreviewed · Evidence at protein level |
| UniProt name | Possible conserved protein |
Functional vocabulary (eggNOG-mapper, orthology transfer)
| COG category | M: Cell wall / membrane / envelope biogenesis |
| eggNOG description | glycosyl transferase |
| Orthologous group | COG0438 |
Orthology-based transfer (eggNOG 5.0.2, diamond). EC/KO/GO/CAZy are
computed annotations, not manual curation; cross-check against the primary literature
before treating a specific reaction as established.
Conservation & selection (intra-MTBC, 145 209 strains)
| pN/pS | 0.0 · strong purifying |
| Polymorphic sites (≥ 0.1% of strains) | 3 synonymous, 0 missense, 0 nonsense, 0 frameshift |
pN/pS is computed from segregating SNPs (singletons removed) and normalised by the number of possible sites.
A low pN/pS indicates purifying selection (a strong signal that a "hypothetical" is a real, constrained gene).
A high pN/pS is ambiguous: either relaxed constraint or positive selection (drug resistance, antigenic
variation) inflates it; e.g. rpoB/katG/pncA score high here because of resistance, not loss of function. A
clonal disruption (one allele over a clade) suggests lineage pseudogenisation; a
convergent one (many independent alleles) is typical of resistance loss-of-function.
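The normalisation described above can be sketched as follows. This is a minimal illustration, not the pipeline used for this page: the site-counting scheme (a simple Nei-Gojobori-style count in which changes to or from a stop codon count as nonsynonymous) and the function names `possible_sites` and `pn_ps` are assumptions, and the toy coding sequence is not Rv0225.

```python
from itertools import product

# Standard codon table in TCAG order (bacterial table 11 uses the same assignments).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def possible_sites(cds):
    """Return (nonsynonymous_sites, synonymous_sites) for an in-frame CDS.

    Each codon position contributes the fraction of its three possible
    single-base changes that leave the encoded amino acid unchanged.
    """
    cds = cds.upper()
    syn = 0.0
    n_codons = 0
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if codon not in CODON_TABLE:
            continue  # skip codons containing ambiguous bases
        n_codons += 1
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                    syn += 1.0 / 3.0
    return 3.0 * n_codons - syn, syn


def pn_ps(n_nonsyn, n_syn, nonsyn_sites, syn_sites):
    """Ratio of per-site nonsynonymous to per-site synonymous polymorphism.

    Returns None when the ratio is undefined (no synonymous SNPs or no sites).
    """
    if n_syn == 0 or syn_sites == 0 or nonsyn_sites == 0:
        return None
    return (n_nonsyn / nonsyn_sites) / (n_syn / syn_sites)


# Toy example: 0 missense and 3 synonymous segregating SNPs, as in the table above.
n_sites, s_sites = possible_sites("ATGGCCGGG")
print(pn_ps(0, 3, n_sites, s_sites))  # 0.0
```

With zero missense SNPs the ratio is 0.0 whatever the site counts, which is why the value reported above does not depend on the exact counting scheme.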
Domains (Pfam, hmmscan --cut_ga)
| Pfam | Accession | i-Evalue | Residues | Description |
| Glyco_transf_4 | PF13439.13 | 6.1e-17 | 21–186 | Glycosyltransferase Family 4 |
| Glyco_trans_4_4 | PF13579.13 | 5.7e-14 | 21–183 | Glycosyl transferase 4-like domain |
| GT4-conflict | PF20706.4 | 1.1e-10 | 175–317 | Family 4 Glycosyltransferase in conflict systems |
| Glycos_transf_1 | PF00534.27 | 5.7e-31 | 200–353 | Glycosyl transferases group 1 |
| Glyco_trans_1_4 | PF13692.13 | 1.6e-23 | 200–340 | Glycosyl transferases group 1 |
| Glyco_trans_1_2 | PF13524.13 | 1.2e-07 | 221–367 | Glycosyl transferase-like |
Functional interaction network (STRING v12, guilt-by-association)
Closest characterised functional partner: Rv0224c (methyltransferase), high confidence from genomic context alone (score 898 excluding text-mining).
This association is the citable seed of a function hypothesis for this hypothetical protein.
| Partner | Product | Score | No text-mining | Channels (≥400) |
| Rv0224c | methyltransferase | 898 | 898 ctx | neighborhood:615 cooccurence:745 |
| Rv1562c treZ | malto-oligosyltrehalose trehalohydrolase | 801 | 782 | coexpression:411 database:572 |
| Rv1326c glgB | 1,4-alpha-glucan branching protein | 815 | 781 | coexpression:409 database:572 |
| Rv0223c | aldehyde dehydrogenase | 730 | 720 ctx | neighborhood:705 |
| Rv0226c | transmembrane protein | 719 | 705 ctx | cooccurence:692 |
| Rv2529 hyp | hypothetical protein | 673 | 661 | database:516 |
| Rv0236c aftD | alpha-(1->3)-arabinofuranosyltransferase | 667 | 649 ctx | cooccurence:627 |
| Rv0227c | membrane protein | 644 | 644 ctx | cooccurence:634 |
| Rv2673 aftC | alpha-(1->3)-arabinofuranosyltransferase | 609 | 607 ctx | cooccurence:601 |
| Rv3784 | dTDP-glucose 4,6-dehydratase | 591 | 570 | coexpression:414 |
| Rv3802c | membrane protein | 583 | 560 ctx | cooccurence:556 |
| Rv0228 | acyltransferase | 531 | 507 ctx | cooccurence:476 |
| Rv3464 rmlB | dTDP-glucose 4,6-dehydratase | 529 | 505 | coexpression:418 |
| Rv1328 glgP | glycogen phosphorylase | 534 | 492 | coexpression:404 |
| Rv3809c glf | UDP-galactopyranose mutase | 515 | 490 | coexpression:473 |
STRING combines evidence channels (neighborhood, fusion, cooccurrence, coexpression,
experimental, database, text-mining) into a 0–1000 score. The ctx
badge marks edges carried by the genomic-context channels (conserved neighborhood, fusion,
phylogenetic co-occurrence), which are independent of orthology and structure and the strongest signal for an
unknown gene. The no text-mining column recomputes the score from data alone, so a link that does not
depend on the literature is visible. Association is a function hypothesis, not proof: corroborate with
the operon context and the primary literature before assigning a function.
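How per-channel scores combine into a single score can be sketched as below. This is a simplified illustration, not STRING's exact code: the prior of 0.041 and the remove-prior, multiply, restore-prior scheme follow the combination procedure STRING documents, but are treated here as assumptions, and the homology correction STRING applies to the co-occurrence and text-mining channels is omitted. The function name `combine_channels` is hypothetical.

```python
from math import prod

PRIOR = 0.041  # assumed prior probability that two random proteins are linked


def combine_channels(channel_scores, prior=PRIOR):
    """Combine per-channel scores (0-1000) into one 0-1000 score.

    Each channel is converted to a probability, the prior is removed,
    the channels are combined as independent evidence (1 - product of
    complements), and the prior is added back once.
    """
    without_prior = []
    for score in channel_scores:
        p = max(score / 1000.0, prior)
        without_prior.append((p - prior) / (1.0 - prior))
    combined = 1.0 - prod(1.0 - p for p in without_prior)
    return 1000.0 * (combined * (1.0 - prior) + prior)


# Rv0224c: neighborhood 615 and cooccurence 745, the two channels listed in the table.
print(round(combine_channels([615, 745])))  # 898
```

For Rv0224c the two listed channels alone reproduce the tabulated 898. For partners such as Rv1562c the same calculation on the listed channels gives a lower value than the table, which is expected because channels scoring below 400 are not shown but still contribute.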
Evidence
- MTBC0 PGAP product: 'glycosyltransferase family 4 protein'
- Pfam: Glycos_transf_1 PF00534 (E=5.7e-31), Glyco_transf_4 PF13439, GT4 domains
ESM Atlas signal (exploratory)
Ancestral protein hash 8de903672994e04ce6368c5cafc19adc ·
10 ESM-space neighbours (max similarity 0.942).
SAE features are orienting indices, not validated domains.
| # | Index | Activation | Interpretation |
| 1 | 15113 | 1.36 | Glycosyltransferase aromatic stacking clamp |
| 2 | 14002 | 1.29 | Carbohydrate recognition surface elements |
| 3 | 12287 | 1.23 | Glycosyltransferase donor-binding acidic loop |
| 4 | 10225 | 1.21 | Glycosyltransferase acidic donor motif |
| 5 | 13813 | 1.18 | Membrane-proximal GT acceptor hairpin |
| 6 | 5820 | 1.14 | Conserved hydrophobic beta-strand scaffold |
| 7 | 11299 | 1.14 | Glycosyltransferase donor binding core |
| 8 | 6700 | 1.11 | Expand nucleotide sugar donors |
Sources
- Ancestral sequence & coordinates: Harrison LB et al. (2024), An imputed ancestral reference genome for the MTBC, doi:10.1101/2023.09.07.556366
- Product annotation: NCBI PGAP on MTBC0; legacy from H37Rv NC_000962.3 (RefSeq NP_214739.1)
- Domains: Pfam-A via hmmscan --cut_ga — Glyco_transf_4 (PF13439.13), Glyco_trans_4_4 (PF13579.13), GT4-conflict (PF20706.4), Glycos_transf_1 (PF00534.27), Glyco_trans_1_4 (PF13692.13), Glyco_trans_1_2 (PF13524.13)
- Sequence-level signal: ESM Atlas (EvolutionaryScale × BioHub) — exploratory
- Controlled vocabulary: eggNOG-mapper 2.1.12 (Cantalapiedra et al. 2021, doi:10.1093/molbev/msab293), eggNOG 5.0 DB (Huerta-Cepas et al. 2019); OG COG0438
- Curated reference: UniProt P96407 (TrEMBL, unreviewed; Evidence at protein level)
- Intra-MTBC selection: pN/pS and disruption from SPDI variants of 145 209 MTBC strains (this work, local collection vs H37Rv NC_000962.3)
- Interaction network: STRING v12.0 (Szklarczyk et al. 2023, doi:10.1093/nar/gkac1000), taxon 83332, CC-BY 4.0; 36 functional partner(s); context anchor Rv0224c
- Primary literature: none located yet; annotation rests on the domain/homology sources above.
Ancestral MTBC0 protein sequence
>mtbc0_000239|Rv0225|
MSALRSVLLLCWRDIGHPQGGGSEAYLQRIGAQLAASGIAVTLRTARYPGAPRHELVDGVRISRAGGRYSVYLWALLAMAAARCGLGPLRRVRPDVVVDTQNGWPFVARLLYGRRSLVLVHHCHREQWPVAGRMMGRLGWYVESMLSPRLHRRNQYVTVSLPSARDLIALGVDSERIAVVRNGLDEAPSPTLSGPRAPTPRVVVLSRLVPHKQIEDALAAVAELQPRIPGLHLDIVGGGWWRQRLVDHVHRLDIADAVTFHGHVDDVTKHHVLQSSWVHLLPSRKEGWGLAVIEAAQHGVPTIGYRSSGGLADSIVDGVTGILVDDRAELVAWLEQLLSDSVLRDQLGAKAQARSGEFSWRQSAEALRSVLEAVQASRFVSGVV