dnaE1 Family assigned · medium auto-curated

H37Rv Rv1547 · MTBC0 mtbc0_001654 · 1184 aa · 1757503–1761057 (+) · RefSeq NP_216063.1

Annotation: from legacy to revised

Legacy (H37Rv / Mycobrowser): DNA polymerase III subunit alpha
MTBC0 PGAP re-annotation: DNA polymerase III subunit alpha
Revised (this work): DNA polymerase III subunit alpha. Pfam: PHP (PF02811.27), DNA_pol3_alpha (PF07733.19), DNA_pol3_finger (PF17657.7), HHH_6 (PF14579.13), tRNA_anti-codon (PF01336.32).

Auto-curated: this verdict and function were generated by rules from PGAP + Pfam + Foldseek and have not been hand-reviewed.

Curated reference (UniProt)

UniProt P9WNT7 SwissProt · reviewed · Evidence at protein level
UniProt name: DNA polymerase III subunit alpha
EC (curated): 2.7.7.7
Curated function: DNA polymerase III is a complex, multichain enzyme responsible for most of the replicative synthesis in bacteria. Pol III also exhibits 3' to 5' exonuclease activity. The alpha chain is the DNA polymerase (By similarity).

Functional vocabulary (eggNOG-mapper, orthology transfer)

COG category: L (Replication, recombination and repair)
Preferred name: dnaE
eggNOG description: DNA polymerase
Orthologous group: COG0587
EC number: 2.7.7.7
KEGG orthology: K02337
KEGG pathways: map00230, map00240, map01100, map03030, map03430, map03440
KEGG modules: M00260
Gene Ontology (10): GO:0005575, GO:0005618, GO:0005623, GO:0005886, GO:0008150, GO:0016020, GO:0030312, GO:0040007, GO:0044464, GO:0071944

Orthology-based transfer (eggNOG 5.0.2, diamond). EC/KO/GO/CAZy are computed annotations, not manual curation; cross-check against the primary literature before treating a specific reaction as established.

Conservation & selection (intra-MTBC, 145 209 strains)

pN/pS: 0.162 · strong purifying selection
Polymorphic sites (≥ 0.1% of strains): 14 synonymous, 7 missense, 0 nonsense, 0 frameshift

pN/pS is computed from segregating SNPs (singletons removed), with the non-synonymous and synonymous counts each normalised by the number of possible sites of that class. A low pN/pS indicates purifying selection, a strong signal that a "hypothetical" protein is a real, constrained gene. A high pN/pS is ambiguous: both relaxed constraint and positive selection (drug resistance, antigenic variation) inflate it. For example, rpoB, katG and pncA score high here because of selection for resistance, not because constraint on them is relaxed. A clonal disruption (one disrupting allele shared across a clade) suggests lineage-specific pseudogenisation; a convergent one (many independent disrupting alleles) is typical of resistance-driven loss of function.
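A minimal Python sketch of the normalisation, assuming a Nei-Gojobori-style count of possible sites over the coding sequence. The function names are hypothetical, and the page does not state the pipeline's exact site-counting convention or how singletons are filtered, so this is not guaranteed to reproduce the 0.162 above.

```python
from itertools import product

# Standard genetic code in TCAG order. NCBI translation table 11 (bacterial)
# assigns the same amino acids and differs only in which codons may start.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_changes(codon: str) -> int:
    """Count the single-base changes (out of 9) that keep the amino acid.
    Changes to a stop codon count as non-synonymous (one convention of several)."""
    aa = CODE[codon]
    count = 0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                count += CODE[mutant] == aa
    return count


def site_counts(cds: str) -> tuple[float, float]:
    """Return (non-synonymous sites, synonymous sites) over all sense codons."""
    syn_changes = 0
    n_codons = 0
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3].upper()
        if CODE[codon] == "*":
            continue  # skip stop codons
        n_codons += 1
        syn_changes += synonymous_changes(codon)
    s_sites = syn_changes / 3
    return 3 * n_codons - s_sites, s_sites


def pn_ps(n_diffs: int, s_diffs: int, n_sites: float, s_sites: float) -> float:
    """pN/pS = (n_diffs / n_sites) / (s_diffs / s_sites)."""
    if s_diffs == 0 or s_sites == 0:
        raise ValueError("no synonymous polymorphism or sites; pN/pS undefined")
    return (n_diffs / n_sites) / (s_diffs / s_sites)
```

Counting possible sites per codon, rather than simply dividing raw SNP counts, matters because roughly three quarters of random point changes in a typical gene are non-synonymous; without the correction a neutral gene would appear to have an excess of missense variants.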

Domains (Pfam, hmmscan --cut_ga)

Pfam | Accession | i-Evalue | Residues | Description
PHP | PF02811.27 | 4.1e-48 | 12–198 | PHP domain
DNA_pol3_alpha | PF07733.19 | 1.6e-92 | 320–590 | Bacterial DNA polymerase III alpha NTPase domain
DNA_pol3_finger | PF17657.7 | 2.1e-62 | 593–762 | Bacterial DNA polymerase III alpha subunit finger domain
HHH_6 | PF14579.13 | 1.1e-25 | 835–922 | Helix-hairpin-helix motif
tRNA_anti-codon | PF01336.32 | 5.1e-08 | 1024–1100 | OB-fold nucleic acid binding domain
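A table like this can be read off hmmscan's per-domain tabular output. The sketch below assumes the run also wrote `--domtblout` in the HMMER 3 column layout (22 whitespace-separated fields, then a free-text description) and that the Residues column reports alignment rather than envelope coordinates, which the page does not state. `DomainHit`, `parse_domtblout` and `architecture` are hypothetical names. Because `--cut_ga` has already applied Pfam's gathering thresholds, no extra E-value cut-off is added.

```python
from dataclasses import dataclass


@dataclass
class DomainHit:
    pfam_name: str
    accession: str
    i_evalue: float
    start: int  # first aligned residue on the query protein (1-based)
    end: int    # last aligned residue (inclusive)
    description: str


def parse_domtblout(lines):
    """Parse hmmscan --domtblout rows into hits sorted N- to C-terminal."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        # maxsplit keeps the multi-word target description as one field
        fields = line.split(maxsplit=22)
        hits.append(DomainHit(
            pfam_name=fields[0],
            accession=fields[1],
            i_evalue=float(fields[12]),  # independent E-value
            start=int(fields[17]),       # "ali coord from"
            end=int(fields[18]),         # "ali coord to"
            description=fields[22].strip() if len(fields) > 22 else "",
        ))
    return sorted(hits, key=lambda h: h.start)


def architecture(hits):
    """Render the domain order as a compact string."""
    return " - ".join(f"{h.pfam_name}({h.start}-{h.end})" for h in hits)
```

Sorting by start coordinate gives the N- to C-terminal architecture (PHP, NTPase, finger, HHH, OB fold) regardless of the order in which hmmscan reports the hits.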

Functional interaction network (STRING v12, guilt-by-association)

Closest characterised partner supported by genomic context: dnaG (DNA primase), medium confidence (score 694 excluding text-mining), carried by phylogenetic co-occurrence (610) rather than by conserved neighborhood.

Partner | Product | Score | No text-mining | Channels (≥400)
Rv0002 dnaN | DNA polymerase III subunit beta | 993 | 988 | experimental:858 database:900 textmining:490
Rv2413c hyp | hypothetical protein | 988 | 985 | experimental:829 database:900
Rv3721c dnaZX | DNA polymerase III subunit gamma/tau | 983 | 982 | experimental:773 database:900
Rv3644c | DNA polymerase | 983 | 982 | experimental:773 database:900
Rv2191 hyp | hypothetical protein | 976 | 953 | experimental:510 database:900 textmining:515
Rv3711c dnaQ | DNA polymerase III subunit epsilon | 956 | 952 | experimental:510 database:900
Rv0054 ssb | single-strand DNA-binding protein | 843 | 829 | experimental:773
Rv1546 hyp | hypothetical protein | 796 | 796 | ctx neighborhood:794
Rv2478c hyp | hypothetical protein | 799 | 781 | experimental:773
Rv2116 lppK | lipoprotein LppK | 792 | 773 | experimental:773
Rv2343c dnaG | DNA primase | 864 | 694 | ctx cooccurence:610 textmining:576
Rv1544 | ketoacyl reductase | 681 | 681 | ctx neighborhood:678
Rv2841c nusA | transcription termination/antitermination protein NusA | 533 | 534 | ctx cooccurence:504
Rv3907c pcnA | poly(A) polymerase PcnA | 514 | 512 | ctx cooccurence:496
Rv1543 | oxidoreductase | 506 | 507 | ctx neighborhood:502

STRING combines evidence channels (neighborhood, fusion, co-occurrence, coexpression, experimental, database, text-mining) into a 0–1000 score. The ctx badge marks edges carried by the genomic-context channels (conserved neighborhood, gene fusion, phylogenetic co-occurrence); these are independent of orthology and structure and are the strongest signal for an unknown gene. The no text-mining column recomputes the score from the data channels alone, so links that do not depend on the literature stand out. Association is a function hypothesis, not proof: corroborate it with the operon context and the primary literature before assigning a function.
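How channel scores become one combined score can be sketched as follows. This assumes the scheme described for STRING's published combination script: each channel has a random-interaction prior (taken here as 0.041) removed, the channels are combined as independent evidence, and the prior is added back once. STRING's own scripts may apply further per-channel corrections not reproduced here, and only channels ≥400 are listed in the table, so this will not reproduce every row exactly. The function names are hypothetical; "cooccurence" follows STRING's own channel label.

```python
PRIOR = 0.041  # assumed random-interaction prior, on a 0-1 scale

# Genomic-context channels that earn the "ctx" badge (STRING's spelling)
CONTEXT_CHANNELS = ("neighborhood", "fusion", "cooccurence")


def combine_string_scores(channel_scores):
    """Combine per-channel scores (0-1000) into one 0-1000 score."""
    remaining = 1.0  # probability that no channel reflects a real association
    for score in channel_scores:
        s = score / 1000
        s_no_prior = max((s - PRIOR) / (1 - PRIOR), 0.0)  # strip the prior
        remaining *= 1 - s_no_prior
    combined = 1 - remaining
    combined += PRIOR * (1 - combined)  # add the prior back once
    return round(combined * 1000)


def has_context_support(channels, threshold=400):
    """True if any genomic-context channel reaches the display threshold."""
    return any(channels.get(c, 0) >= threshold for c in CONTEXT_CHANNELS)
```

Recomputing without text-mining is then a matter of leaving that channel out of the list passed to `combine_string_scores`; a single channel passes through unchanged, and each added channel can only raise the combined score.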

Evidence

  • Legacy H37Rv annotation: DNA polymerase III subunit alpha
  • MTBC0 PGAP product: DNA polymerase III subunit alpha
  • Pfam (hmmscan --cut_ga): PHP PF02811.27 (E=4e-48), DNA_pol3_alpha PF07733.19 (E=2e-92), DNA_pol3_finger PF17657.7 (E=2e-62), HHH_6 PF14579.13 (E=1e-25), tRNA_anti-codon PF01336.32 (E=5e-08)
  • (auto-curated by rules from PGAP + Pfam + Foldseek; not hand-reviewed)

Sources

  • Ancestral sequence & coordinates: Harrison LB et al. (2024), An imputed ancestral reference genome for the MTBC, doi:10.1101/2023.09.07.556366
  • Product annotation: NCBI PGAP on MTBC0; legacy from H37Rv NC_000962.3 (RefSeq NP_216063.1)
  • Domains: Pfam-A via hmmscan --cut_ga — PHP (PF02811.27), DNA_pol3_alpha (PF07733.19), DNA_pol3_finger (PF17657.7), HHH_6 (PF14579.13), tRNA_anti-codon (PF01336.32)
  • Sequence-level signal: ESM Atlas (EvolutionaryScale × BioHub) — exploratory
  • Controlled vocabulary: eggNOG-mapper 2.1.12 (Cantalapiedra et al. 2021, doi:10.1093/molbev/msab293), eggNOG 5.0 DB (Huerta-Cepas et al. 2019) — OG COG0587
  • Curated reference: UniProt P9WNT7 (SwissProt, reviewed; Evidence at protein level)
  • Intra-MTBC selection: pN/pS and disruption from SPDI variants of 145 209 MTBC strains (this work, local collection vs H37Rv NC_000962.3)
  • Interaction network: STRING v12.0 (Szklarczyk et al. 2023, doi:10.1093/nar/gkac1000), taxon 83332, CC-BY 4.0 — 65 functional partner(s); context anchor dnaG
  • Primary literature: none located yet; annotation rests on the domain/homology sources above.

Ancestral MTBC0 protein sequence

>mtbc0_001654|Rv1547|dnaE1
MSGSSAGSSFVHLHNHTEYSMLDGAAKITPMLAEVERLGMPAVGMTDHGNMFGASEFYNSATKAGIKPIIGVEAYIAPGSRFDTRRILWGDPSQKADDVSGSGSYTHLTMMAENATGLRNLFKLSSHASFEGQLSKWSRMDAELIAEHAEGIIITTGCPSGEVQTRLRLGQDREALEAAAKWREIVGPDNYFLELMDHGLTIERRVRDGLLEIGRALNIPPLATNDCHYVTRDAAHNHEALLCVQTGKTLSDPNRFKFDGDGYYLKSAAEMRQIWDDEVPGACDSTLLIAERVQSYADVWTPRDRMPVFPVPDGHDQASWLRHEVDAGLRRRFPAGPPDGYRERAAYEIDVICSKGFPSYFLIVADLISYARSAGIRVGPGRGSAAGSLVAYALGITDIDPIPHGLLFERFLNPERTSMPDIDIDFDDRRRGEMVRYAADKWGHDRVAQVITFGTIKTKAALKDSARIHYGQPGFAIADRITKALPPAIMAKDIPLSGITDPSHERYKEAAEVRGLIETDPDVRTIYQTARGLEGLIRNAGVHACAVIMSSEPLTEAIPLWKRPQDGAIITGWDYPACEAIGLLKMDFLGLRNLTIIGDAIDNVRANRGIDLDLESVPLDDKATYELLGRGDTLGVFQLDGGPMRDLLRRMQPTGFEDVVAVIALYRPGPMGMNAHNDYADRKNNRQAIKPIHPELEEPLREILAETYGLIVYQEQIMRIAQKVASYSLARADILRKAMGKKKREVLEKEFEGFSDGMQANGFSPAAIKALWDTILPFADYAFNKSHAAGYGMVSYWTAYLKANYPAEYMAGLLTSVGDDKDKAAVYLADCRKLGITVLPPDVNESGLNFASVGQDIRYGLGAVRNVGANVVGSLLQTRNDKGKFTDFSDYLNKIDISACNKKVTESLIKAGAFDSLGHARKGLFLVHSDAVDSVLGTKKAEALGQFDLFGSNDDGTGTADPVFTIKVPDDEWEDKHKLALEREMLGLYVSGHPLNGVAHLLAAQVDTAIPAILDGDVPNDAQVRVGGILASVNRRVNKNGMPWASAQLEDLTGGIEVMFFPHTYSSYGADIVDDAVVLVNAKVAVRDDRIALIANDLTVPDFSNAEVERPLAVSLPTRQCTFDKVSALKQVLARHPGTSQVHLRLISGDRITTLALDQSLRVTPSPALMGDLKELLGPGCLGS
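A quick consistency check between this record and the header line, as a minimal sketch. It assumes the genomic span 1757503–1761057 (+) is 1-based, inclusive and includes the stop codon, which is what the stated length of 1184 aa implies. `read_fasta`, `protein_len_from_cds` and `domain_segment` are hypothetical helpers.

```python
def read_fasta(text):
    """Return {header: sequence} for a FASTA string (header without '>')."""
    records, header, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def protein_len_from_cds(start, end, has_stop=True):
    """Residues encoded by a 1-based inclusive CDS span, minus the stop codon."""
    span = end - start + 1
    if span % 3:
        raise ValueError(f"span of {span} nt is not a whole number of codons")
    return span // 3 - (1 if has_stop else 0)


def domain_segment(seq, first, last):
    """Slice a 1-based inclusive residue range out of a protein sequence."""
    return seq[first - 1:last]
```

For this gene the span is 3555 nt, i.e. 1185 codons, so 1184 residues once the stop is removed; `len(seq)` for the record above should match. Calling `domain_segment(seq, 12, 198)` would then return the region the Pfam table assigns to PHP, provided those residue numbers refer to this ancestral sequence rather than to the H37Rv protein.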