thrA · Resolved · high · auto-curated

H37Rv Rv1294 · MTBC0 mtbc0_001386 · 441 aa · 1458414–1459739 (+) · RefSeq NP_215810.1

Annotation: from legacy to revised

Legacy (H37Rv / Mycobrowser): homoserine dehydrogenase
MTBC0 PGAP re-annotation: homoserine dehydrogenase
Revised (this work): Homoserine dehydrogenase. Pfam: NAD_binding_3 (PF03447.23), Homoserine_dh (PF00742.25), ACT (PF01842.32).

Auto-curated: this verdict and function were generated by rules from PGAP + Pfam + Foldseek and have not been hand-reviewed.

Curated reference (UniProt)

UniProt P9WPX1 SwissProt · reviewed · Evidence at protein level
UniProt name: Homoserine dehydrogenase
EC (curated): EC 1.1.1.3
Curated function: Catalyzes the conversion of L-aspartate-beta-semialdehyde (L-Asa) to L-homoserine (L-Hse), the third step in the biosynthesis of threonine and methionine from aspartate.

Functional vocabulary (eggNOG-mapper, orthology transfer)

COG category: E (Amino acid transport and metabolism)
Preferred name: hom
eggNOG description: homoserine dehydrogenase
Orthologous group: COG0460
EC number: EC 1.1.1.3, EC 2.7.2.4
KEGG orthology: K00003, K12524
KEGG pathways: map00260, map00261, map00270, map00300, map01100, map01110, map01120, map01130, map01230
KEGG modules: M00016, M00017, M00018, M00526, M00527
Gene Ontology (10): GO:0005575, GO:0005618, GO:0005623, GO:0005886, GO:0008150, GO:0016020, GO:0030312, GO:0040007, GO:0044464, GO:0071944

Orthology-based transfer (eggNOG 5.0.2, diamond). EC/KO/GO/CAZy are computed annotations, not manual curation; cross-check against the primary literature before treating a specific reaction as established.

Conservation & selection (intra-MTBC, 145 209 strains)

pN/pS 0.073 · strong purifying
Polymorphic sites (≥ 0.1% of strains) 5 synonymous, 1 missense, 0 nonsense, 0 frameshift

pN/pS is computed from segregating SNPs (singletons removed) and normalised by the number of possible sites. A low pN/pS indicates purifying selection (a strong signal that a "hypothetical" is a real, constrained gene). A high pN/pS is ambiguous, because either relaxed constraint or positive selection (drug resistance, antigenic variation) can inflate it; e.g. rpoB/katG/pncA score high here because of resistance, not loss of function. A clonal disruption (one allele across a clade) suggests lineage pseudogenisation; a convergent one (many independent alleles) is typical of resistance-conferring loss of function.
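The normalisation described above can be sketched as follows. This is a minimal illustration, not the pipeline used for this page: the function name `pn_ps` is hypothetical, and the site counts in the usage lines are placeholders, since the page does not report the number of possible synonymous and non-synonymous sites for thrA. The polymorphic-site counts listed above use a ≥ 0.1% frequency threshold, so they need not be the same SNP set that produced the reported 0.073.

```python
from typing import Optional


def pn_ps(n_nonsyn: int, n_syn: int,
          nonsyn_sites: float, syn_sites: float) -> Optional[float]:
    """Ratio of per-site non-synonymous to per-site synonymous polymorphism.

    n_nonsyn / n_syn        : observed segregating SNPs of each class
    nonsyn_sites / syn_sites: number of possible sites of each class
    Returns None when no synonymous SNP is observed (ratio undefined).
    """
    if nonsyn_sites <= 0 or syn_sites <= 0:
        raise ValueError("site counts must be positive")
    if n_syn == 0:
        return None
    pn = n_nonsyn / nonsyn_sites   # non-synonymous SNPs per possible site
    ps = n_syn / syn_sites         # synonymous SNPs per possible site
    return pn / ps


# Placeholder site counts (NOT from this page), chosen only so that they
# sum to 3 x 441 codon positions; SNP counts are the ones listed above.
example = pn_ps(n_nonsyn=1, n_syn=5, nonsyn_sites=1000.0, syn_sites=323.0)
```

Returning `None` rather than infinity when no synonymous SNP is observed keeps an undefined ratio from being read as a very high pN/pS.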

Domains (Pfam, hmmscan --cut_ga)

Pfam | Accession | i-Evalue | Residues | Description
NAD_binding_3 | PF03447.23 | 2.3e-23 | 14–131 | Homoserine dehydrogenase, NAD binding domain
Homoserine_dh | PF00742.25 | 9.0e-64 | 139–322 | Homoserine dehydrogenase
ACT | PF01842.32 | 6.9e-08 | 355–424 | ACT domain

Functional interaction network (STRING v12, guilt-by-association)

Closest characterised functional partner: thrB (homoserine kinase), high confidence from genomic context alone (score 997 excluding text-mining).

Partner | Product | Score | No text-mining | Channels (≥400)
Rv1296 thrB | homoserine kinase | 999 | 997 | ctx neighborhood:731 coexpression:875 database:900 textmining:843
Rv1295 thrC | threonine synthase | 999 | 996 | ctx neighborhood:881 coexpression:958 textmining:794
Rv3708c asd | aspartate-semialdehyde dehydrogenase | 984 | 964 | coexpression:427 database:900 textmining:593
Rv3341 metA | homoserine O-acetyltransferase | 951 | 944 | database:900
Rv2753c dapA | 4-hydroxy-tetrahydrodipicolinate synthase | 962 | 939 | database:900 textmining:409
Rv1293 lysA | diaminopimelate decarboxylase | 949 | 928 | ctx neighborhood:881
Rv2124c metH | methionine synthase | 958 | 913 | coexpression:428 database:800 textmining:542
Rv3709c ask | aspartokinase | 972 | 910 | ctx cooccurence:588 coexpression:420 experimental:415 textmining:707
Rv0391 metZ | O-succinylhomoserine sulfhydrylase | 915 | 899 | coexpression:404 database:800
Rv1292 argS | arginine--tRNA ligase | 896 | 891 | ctx neighborhood:881
Rv2210c ilvE | branched-chain amino acid aminotransferase | 926 | 884 | coexpression:412 database:800
Rv1079 metB | cystathionine gamma-synthase | 935 | 882 | coexpression:402 database:800 textmining:478
Rv3340 metC | O-acetylhomoserine sulfhydrylase | 888 | 872 | database:800
Rv1133c metE | 5-methyltetrahydropteroyltriglutamate--homocysteine methyltransferase | 883 | 868 | database:800
Rv0069c sdaA | L-serine dehydratase | 846 | 839 | database:800

STRING combines evidence channels (neighborhood, fusion, cooccurrence, coexpression, experimental, database, text-mining) into a 0–1000 score. The ctx badge marks edges carried by the genomic-context channels (conserved neighborhood, fusion, phylogenetic co-occurrence), which are independent of orthology and structure and the strongest signal for an unknown gene. The no text-mining column recomputes the score from data alone, so a link that does not depend on the literature is visible. Association is a function hypothesis, not proof: corroborate with the operon context and the primary literature before assigning a function.
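How channel scores might be merged, and how a no-text-mining score can be recomputed, is sketched below. It assumes the combination scheme STRING documents for its combined score: remove a fixed prior from each channel, multiply the probabilities of "no evidence", then add the prior back once. The prior value of 0.041, the use of rounding, and the function name `combine_channels` are assumptions of this sketch, not details stated on this page. Only channels scoring ≥400 are listed in the table, so a recomputed value can differ slightly from the displayed one.

```python
from typing import Iterable

PRIOR = 0.041  # assumed STRING prior probability of an interaction


def combine_channels(scores: Iterable[int], prior: float = PRIOR) -> int:
    """Combine per-channel STRING scores (0-1000) into one 0-1000 score."""
    p_no_evidence = 1.0
    for score in scores:
        p = max(score / 1000.0, prior)           # clamp scores below the prior
        p_no_prior = (p - prior) / (1 - prior)   # strip the prior from this channel
        p_no_evidence *= 1.0 - p_no_prior        # chance that no channel supports the link
    combined = (1.0 - p_no_evidence) * (1.0 - prior) + prior  # add the prior back once
    return round(combined * 1000)


# thrB row from the table: listed channels with and without text-mining.
with_textmining = combine_channels([731, 875, 900, 843])
without_textmining = combine_channels([731, 875, 900])
```

Dropping the text-mining value before combining is what makes the second number independent of the literature.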

Evidence

  • Legacy H37Rv annotation: homoserine dehydrogenase
  • MTBC0 PGAP product: homoserine dehydrogenase
  • Pfam (hmmscan --cut_ga): NAD_binding_3 PF03447.23 (E=2e-23), Homoserine_dh PF00742.25 (E=9e-64), ACT PF01842.32 (E=7e-08)
  • (auto-curated by rules from PGAP + Pfam + Foldseek; not hand-reviewed)

Sources

  • Ancestral sequence & coordinates: Harrison LB et al. (2024), An imputed ancestral reference genome for the MTBC, doi:10.1101/2023.09.07.556366
  • Product annotation: NCBI PGAP on MTBC0; legacy from H37Rv NC_000962.3 (RefSeq NP_215810.1)
  • Domains: Pfam-A via hmmscan --cut_ga — NAD_binding_3 (PF03447.23), Homoserine_dh (PF00742.25), ACT (PF01842.32)
  • Sequence-level signal: ESM Atlas (EvolutionaryScale × BioHub) — exploratory
  • Controlled vocabulary: eggNOG-mapper 2.1.12 (Cantalapiedra et al. 2021, doi:10.1093/molbev/msab293), eggNOG 5.0 DB (Huerta-Cepas et al. 2019) — OG COG0460
  • Curated reference: UniProt P9WPX1 (SwissProt, reviewed; Evidence at protein level)
  • Intra-MTBC selection: pN/pS and disruption from SPDI variants of 145 209 MTBC strains (this work, local collection vs H37Rv NC_000962.3)
  • Interaction network: STRING v12.0 (Szklarczyk et al. 2023, doi:10.1093/nar/gkac1000), taxon 83332, CC-BY 4.0 — 82 functional partner(s); context anchor thrB
  • Primary literature: none located yet; annotation rests on the domain/homology sources above.

Ancestral MTBC0 protein sequence

>mtbc0_001386|Rv1294|thrA
MPGDEKPVGVAVLGLGNVGSEVVRIIENSAEDLAARVGAPLVLRGIGVRRVTTDRGVPIELLTDDIEELVAREDVDIVVEVMGPVEPSRKAILGALERGKSVVTANKALLATSTGELAQAAESAHVDLYFEAAVAGAIPVIRPLTQSLAGDTVLRVAGIVNGTTNYILSAMDSTGADYASALADASALGYAEADPTADVEGYDAAAKAAILASIAFHTRVTADDVYREGITKVTPADFGSAHALGCTIKLLSICERITTDEGSQRVSARVYPALVPLSHPLAAVNGAFNAVVVEAEAAGRLMFYGQGAGGAPTASAVTGDLVMAARNRVLGSRGPRESKYAQLPVAPMGFIETRYYVSMNVADKPGVLSAVAAEFAKREVSIAEVRQEGVVDEGGRRVGARIVVVTHLATDAALSETVDALDDLDVVQGVSSVIRLEGTGL
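The FASTA header and the coordinates at the top of this entry can be cross-checked with a short sketch. It assumes the header fields are pipe-separated in the order MTBC0 identifier, H37Rv locus, gene name, and that the coordinates are 1-based, inclusive, and include the stop codon. Both function names are hypothetical.

```python
def parse_header(header: str) -> dict:
    """Split a header such as '>mtbc0_001386|Rv1294|thrA' into named fields."""
    fields = header.lstrip(">").strip().split("|")
    if len(fields) != 3:
        raise ValueError(f"expected 3 pipe-separated fields, got {len(fields)}")
    mtbc0_id, locus, gene = fields
    return {"mtbc0_id": mtbc0_id, "locus": locus, "gene": gene}


def protein_length_from_cds(start: int, end: int) -> int:
    """Protein length implied by inclusive CDS coordinates (stop codon included)."""
    n_nt = end - start + 1
    if n_nt % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    return n_nt // 3 - 1   # subtract the stop codon


record = parse_header(">mtbc0_001386|Rv1294|thrA")
length = protein_length_from_cds(1458414, 1459739)
```

Under those assumptions the coordinates span 1326 nt, or 442 codons including the stop, which agrees with the 441 aa stated for this entry.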