Rv3784 Resolved · high auto-curated

H37Rv Rv3784 · MTBC0 - · 326 aa · 4230256–4231236 (+) · RefSeq YP_178015.1
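
As a quick consistency check on the header line, the coordinate span should encode the stated protein length plus one stop codon. A minimal sketch, assuming the coordinates are 1-based and inclusive; the function name is illustrative and not part of the annotation pipeline:

```python
def protein_length_from_coords(start: int, end: int) -> int:
    """Amino-acid length implied by 1-based, inclusive CDS coordinates."""
    span = end - start + 1              # nucleotides, both ends included
    if span % 3 != 0:
        raise ValueError("CDS span is not a multiple of 3")
    return span // 3 - 1                # drop the terminal stop codon


print(protein_length_from_coords(4230256, 4231236))  # 326
```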

Annotation: from legacy to revised

Legacy (H37Rv / Mycobrowser): dTDP-glucose 4,6-dehydratase
MTBC0 PGAP re-annotation: -
Revised (this work): dTDP-glucose 4,6-dehydratase. Pfam: RmlD_sub_bind (PF04321.24), Epimerase (PF01370.28), Polysacc_synt_2 (PF02719.22), GDP_Man_Dehyd (PF16363.12), 3Beta_HSD (PF01073.26), NAD_binding_4 (PF07993.19), NAD_binding_10 (PF13460.13).

Auto-curated: this verdict and function were generated by rules from PGAP + Pfam + Foldseek and have not been hand-reviewed.

Annotated on the H37Rv protein: this gene has no 1:1 ancestral MTBC0 anchor (PE/PPE, paralogue, IS element, or otherwise unanchored CDS).

Curated reference (UniProt)

UniProt P72050 TrEMBL · unreviewed · Evidence at protein level
UniProt name: Possible dTDP-glucose 4,6-dehydratase

Functional vocabulary (eggNOG-mapper, orthology transfer)

COG category: M (Cell wall / membrane / envelope biogenesis)
eggNOG description: Polysaccharide biosynthesis protein
Orthologous group: COG0451
EC number: EC 4.1.1.35, EC 4.2.1.46
KEGG orthology: K01710, K08678, K21211
KEGG pathways: map00520, map00521, map00523, map00525, map01055, map01059, map01100, map01130
KEGG modules: M00361, M00793

Orthology-based transfer (eggNOG 5.0.2, diamond). EC/KO/GO/CAZy are computed annotations, not manual curation; cross-check against the primary literature before treating a specific reaction as established.

Conservation & selection (intra-MTBC, 145 209 strains) · pseudogene candidate

pN/pS 0.739 · relaxed/neutral
Polymorphic sites (≥ 0.1% of strains) 3 synonymous, 7 missense, 0 nonsense, 2 frameshift
Disruption 2 distinct premature-stop/frameshift site(s); most common in 2.24% of strains (3253) · clonal

pN/pS is computed from segregating SNPs (singletons removed) and normalised by the number of possible sites of each class. A low pN/pS indicates purifying selection (a strong signal that a "hypothetical" is a real, constrained gene). A high pN/pS is ambiguous: both relaxed constraint and positive selection (drug resistance, antigenic variation) inflate it; for example, rpoB, katG and pncA score high here because of resistance, not loss of function. A clonal disruption (one allele across a clade) suggests lineage pseudogenisation; a convergent one (many independent alleles) is typical of resistance loss-of-function.
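
The normalisation described above can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the function names are invented here, and the SNP and site counts passed to `pn_ps` are hypothetical, because the record reports only the final ratio (0.739) and not the counts behind it. The disruption frequency, by contrast, uses the figures given in the record.

```python
def pn_ps(nonsyn_snps: int, syn_snps: int,
          nonsyn_sites: float, syn_sites: float) -> float:
    """Ratio of per-site nonsynonymous to per-site synonymous polymorphism."""
    if syn_snps == 0:
        raise ValueError("pN/pS is undefined without synonymous SNPs")
    pn = nonsyn_snps / nonsyn_sites     # nonsynonymous SNPs per possible site
    ps = syn_snps / syn_sites           # synonymous SNPs per possible site
    return pn / ps


def carrier_percent(carriers: int, total_strains: int) -> float:
    """Share of strains carrying an allele, in percent."""
    return 100.0 * carriers / total_strains


# Hypothetical counts for a 978-nt coding region (illustration only).
ratio = pn_ps(nonsyn_snps=9, syn_snps=4, nonsyn_sites=740.0, syn_sites=238.0)
print(f"pN/pS = {ratio:.3f}")

# Figures from the record: most common disruption, 3253 of 145 209 strains.
print(round(carrier_percent(3253, 145209), 2))  # 2.24
```

A ratio near or above 1 is what the record labels relaxed/neutral; whether that reflects decay or adaptation still has to be judged from the disruption pattern.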

Domains (Pfam, hmmscan --cut_ga)

Pfam             Accession    i-Evalue  Residues  Description
RmlD_sub_bind    PF04321.24   2.4e-16   1–181     RmlD substrate binding domain
Epimerase        PF01370.28   7.3e-60   3–235     NAD dependent epimerase/dehydratase family
Polysacc_synt_2  PF02719.22   5.5e-11   3–119     Polysaccharide biosynthesis protein
GDP_Man_Dehyd    PF16363.12   6.9e-58   4–300     GDP-mannose 4,6 dehydratase
3Beta_HSD        PF01073.26   2.2e-24   4–229     3-beta hydroxysteroid dehydrogenase/isomerase family
NAD_binding_4    PF07993.19   1.2e-16   5–185     Male sterility protein
NAD_binding_10   PF13460.13   2.9e-12   7–175     NAD(P)H-binding
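
All seven hits fall on the same N-terminal stretch of the protein, so they are competing descriptions of one region and not seven separate domains. One common way to summarise such a table is to keep the best-scoring hit and discard anything that overlaps it. The sketch below is a simple greedy version of that idea, using the values from the table; it is an illustration and is not claimed to be the rule used by the auto-curation.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    name: str
    evalue: float
    start: int   # first residue, 1-based inclusive
    end: int     # last residue, inclusive


HITS = [
    Hit("RmlD_sub_bind", 2.4e-16, 1, 181),
    Hit("Epimerase", 7.3e-60, 3, 235),
    Hit("Polysacc_synt_2", 5.5e-11, 3, 119),
    Hit("GDP_Man_Dehyd", 6.9e-58, 4, 300),
    Hit("3Beta_HSD", 2.2e-24, 4, 229),
    Hit("NAD_binding_4", 1.2e-16, 5, 185),
    Hit("NAD_binding_10", 2.9e-12, 7, 175),
]


def overlaps(a: Hit, b: Hit) -> bool:
    """True if the two residue ranges share at least one position."""
    return a.start <= b.end and b.start <= a.end


def best_nonoverlapping(hits: list[Hit]) -> list[Hit]:
    """Greedy selection: lowest E-value first, skip hits overlapping a kept one."""
    kept: list[Hit] = []
    for hit in sorted(hits, key=lambda h: h.evalue):
        if not any(overlaps(hit, k) for k in kept):
            kept.append(hit)
    return kept


print([h.name for h in best_nonoverlapping(HITS)])  # ['Epimerase']
```

With these values only Epimerase (3–235) survives, because every other hit overlaps it.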

Functional interaction network (STRING v12, guilt-by-association)

Closest characterised functional partner: rmlA (glucose-1-phosphate thymidylyltransferase), high confidence from genomic context alone (score 985 excluding text-mining).

Partner       Product                                                 Score  No text-mining  Channels (≥400)
Rv0334 rmlA   glucose-1-phosphate thymidylyltransferase               991    985             ctx cooccurence:435 coexpression:733 database:900 textmining:415
Rv3465 rmlC   dTDP-4-dehydrorhamnose 3,5-epimerase                    987    980             coexpression:731 database:900
Rv3266c rmlD  dTDP-4-dehydrorhamnose reductase                        975    966             coexpression:714 database:800
Rv3464 rmlB   dTDP-glucose 4,6-dehydratase                            934    922             database:900
Rv0536 galE3  UDP-glucose 4-epimerase GalE                            858    852             database:800
Rv0322 udgA   UDP-glucose 6-dehydrogenase UdgA                        838    788             coexpression:704
Rv1510 hyp    hypothetical protein                                    789    760             coexpression:731
Rv3809c glf   UDP-galactopyranose mutase                              740    707             coexpression:690
Rv3630        integral membrane protein                               707    680             coexpression:669
Rv1752 hyp    hypothetical protein                                    686    664             coexpression:645
Rv1503c       Rv1503c, (MTCY277.25c), len: 182 aa. Conserved hypothetical protein, similar to C-terminal region of P27833|RFFA_ECOLI lipopolysaccharide bi  651    629             coexpression:616
Rv1504c       Rv1504c, (MTCY277.26c), len: 199 aa. Conserved hypothetical protein, similar to N-terminal region of P27833|RFFA_ECOLI lipopolysaccharide bi  626    603             coexpression:529
Rv3264c manB  D-alpha-D-mannose-1-phosphate guanylyltransferase ManB  640    594             coexpression:469
Rv3402c hyp   hypothetical protein                                    605    581             coexpression:478
Rv3785 hyp    hypothetical protein                                    942    575             ctx neighborhood:548 textmining:870

STRING combines evidence channels (neighborhood, fusion, cooccurrence, coexpression, experimental, database, text-mining) into a 0–1000 score. The ctx badge marks edges carried by the genomic-context channels (conserved neighborhood, fusion, phylogenetic co-occurrence), which are independent of orthology and structure and the strongest signal for an unknown gene. The no text-mining column recomputes the score from data alone, so a link that does not depend on the literature is visible. Association is a function hypothesis, not proof: corroborate with the operon context and the primary literature before assigning a function.
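
The way channel scores fold into one combined score, and how a "no text-mining" value can be recomputed, can be sketched as below. This follows STRING's published combination scheme (remove the 0.041 prior from each channel, combine as independent probabilities, add the prior back once). Two assumptions limit it: the table lists only channels scoring ≥400, so hidden weaker channels are missing from the input, and STRING's homology correction for the co-occurrence and text-mining channels is not modelled. The function name is illustrative.

```python
PRIOR = 0.041  # prior probability STRING assigns to a random pair being linked


def combine_channels(scores: dict[str, int], exclude: tuple[str, ...] = ()) -> int:
    """Combine per-channel scores (0-1000) into a single 0-1000 score."""
    product = 1.0
    for channel, score in scores.items():
        if channel in exclude:
            continue
        s = score / 1000.0
        s_no_prior = max(s - PRIOR, 0.0) / (1.0 - PRIOR)   # strip the prior
        product *= 1.0 - s_no_prior                        # independent evidence
    combined = 1.0 - product
    combined += PRIOR * (1.0 - combined)                   # add the prior back once
    return round(combined * 1000)


# Channels listed for Rv3785 in the table (only those >= 400 are shown).
rv3785 = {"neighborhood": 548, "textmining": 870}
print(combine_channels(rv3785))                            # all listed channels
print(combine_channels(rv3785, exclude=("textmining",)))   # data-only score
```

From the two listed channels alone this gives 939 and 548, against the table's 942 and 575; the gap presumably comes from channels below the 400 display threshold, which is why the table's own columns remain the reference values.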

Evidence

  • Annotation from H37Rv (no MTBC0 1:1 anchor; H37Rv protein used): dTDP-glucose 4,6-dehydratase
  • Pfam (hmmscan --cut_ga): RmlD_sub_bind PF04321.24 (E=2e-16), Epimerase PF01370.28 (E=7e-60), Polysacc_synt_2 PF02719.22 (E=5e-11), GDP_Man_Dehyd PF16363.12 (E=7e-58), 3Beta_HSD PF01073.26 (E=2e-24), NAD_binding_4 PF07993.19 (E=1e-16), NAD_binding_10 PF13460.13 (E=3e-12)
  • (auto-curated by rules from PGAP + Pfam + Foldseek; not hand-reviewed)

Sources

  • Ancestral sequence & coordinates: Harrison LB et al. (2024), An imputed ancestral reference genome for the MTBC, doi:10.1101/2023.09.07.556366
  • Product annotation: NCBI PGAP on MTBC0; legacy from H37Rv NC_000962.3 (RefSeq YP_178015.1)
  • Domains: Pfam-A via hmmscan --cut_ga — RmlD_sub_bind (PF04321.24), Epimerase (PF01370.28), Polysacc_synt_2 (PF02719.22), GDP_Man_Dehyd (PF16363.12), 3Beta_HSD (PF01073.26), NAD_binding_4 (PF07993.19), NAD_binding_10 (PF13460.13)
  • Sequence-level signal: ESM Atlas (EvolutionaryScale × BioHub) — exploratory
  • Controlled vocabulary: eggNOG-mapper 2.1.12 (Cantalapiedra et al. 2021, doi:10.1093/molbev/msab293), eggNOG 5.0 DB (Huerta-Cepas et al. 2019) — OG COG0451
  • Curated reference: UniProt P72050 (TrEMBL, unreviewed; Evidence at protein level)
  • Intra-MTBC selection: pN/pS and disruption from SPDI variants of 145 209 MTBC strains (this work, local collection vs H37Rv NC_000962.3)
  • Interaction network: STRING v12.0 (Szklarczyk et al. 2023, doi:10.1093/nar/gkac1000), taxon 83332, CC-BY 4.0 — 51 functional partner(s); context anchor rmlA
  • Primary literature: none located yet; annotation rests on the domain/homology sources above.

Protein sequence (H37Rv; no ancestral MTBC0 anchor)

>H37Rv|Rv3784|
MEILVTGGAGFQGSHLTESLLANGHWVTVLDKSSRNAVRNMQGFRSHDRAAFISGSVTDGQTIDRAVRDHHVVFHLAAHVNVDQSLGDPESFLETNVMGTYRVLEAVRRYRNRLIYVSTCEVYGDGHNLKEGERLDEHAELKPNSPYGASKAAADRLCYSYFRSYGLDVTIVRPFNIFGVRQKAGRFGALIPRLVRQGINGEGLTIFGAGSATRDYLYVSDIVGAYNLVLRTPTLRGQAINFASGKDTRVRDIVEYVADKFGARIEHRDARPGEVQRFPADISLAKSIGFQPQVEIWDGIDRYINWAKDQPQYPYEQDGFSGSSVL