Rv0004 Resolved · high

H37Rv Rv0004 · MTBC0 mtbc0_000004 · 187 aa · 4434–4997 (+) · RefSeq NP_214518.1

Annotation: from legacy to revised

Legacy (H37Rv / Mycobrowser): hypothetical protein
MTBC0 PGAP re-annotation: DNA replication protein DciA
Revised (this work): DciA, DNA replication protein. Directly binds DNA and the replicative helicase DnaB and regulates the DnaB-DnaA interaction; functional analogue of the DnaC/DnaI helicase loaders that mycobacteria lack. Essential for viability; its depletion blocks cell-cycle progression.

Curated reference (UniProt)

UniProt P9WFL1 SwissProt · reviewed · Inferred from homology
UniProt name: UPF0232 protein Rv0004

UniProt still lists this protein as UPF0232 protein Rv0004; the revised annotation above is ahead of the current UniProt record.

Functional vocabulary (eggNOG-mapper, orthology transfer)

COG category: S (Function unknown)
eggNOG description: Belongs to the UPF0232 family
Orthologous group: COG5512

Orthology-based transfer (eggNOG 5.0.2, diamond). EC/KO/GO/CAZy are computed annotations, not manual curation; cross-check against the primary literature before treating a specific reaction as established.

Conservation & selection (intra-MTBC, 145 209 strains)

pN/pS 2.598 · diversifying/relaxed
Polymorphic sites (≥ 0.1% of strains) 1 synonymous, 7 missense, 0 nonsense, 0 frameshift

pN/pS is computed from segregating SNPs (singletons removed) and normalised by the number of possible sites of each class. A low pN/pS indicates purifying selection (a strong signal that a "hypothetical" is a real, constrained gene). A high pN/pS is ambiguous: either relaxed constraint or positive selection (drug resistance, antigenic variation) can inflate it; for example, rpoB, katG and pncA score high here because of resistance selection, not relaxed constraint. A clonal disruption (one allele over a clade) suggests lineage pseudogenisation; a convergent one (many independent alleles) is typical of resistance loss-of-function.
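
As an illustration of the normalisation described above, the following is a minimal sketch of a pN/pS calculation. It assumes a Nei-Gojobori-style count of possible synonymous and non-synonymous sites under the standard genetic code; the site-counting scheme actually used for this record is not stated, so the function names (`possible_sites`, `pn_ps`) and the toy input are hypothetical and the sketch is not expected to reproduce the 2.598 reported here.

```python
from itertools import product

# Standard genetic code in TCAG order (the amino-acid assignments are the
# same in the bacterial table 11 used for M. tuberculosis).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def possible_sites(cds):
    """Return (nonsynonymous_sites, synonymous_sites) for a coding sequence.

    Assumption: each codon position contributes the fraction of its three
    possible single-base changes that keep the amino acid (synonymous);
    changes to a stop codon are counted as non-synonymous. Stop codons and
    codons with ambiguous bases are skipped.
    """
    cds = cds.upper()
    syn = 0.0
    n_codons = 0
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        aa = CODON_TABLE.get(codon)
        if aa is None or aa == "*":
            continue
        n_codons += 1
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == aa:
                    syn += 1.0 / 3.0
    return 3 * n_codons - syn, syn


def pn_ps(n_nonsyn_snps, n_syn_snps, nonsyn_sites, syn_sites):
    """pN/pS = (non-synonymous SNPs per N site) / (synonymous SNPs per S site)."""
    if n_syn_snps == 0 or syn_sites == 0 or nonsyn_sites == 0:
        raise ValueError("pN/pS is undefined without synonymous SNPs or sites")
    return (n_nonsyn_snps / nonsyn_sites) / (n_syn_snps / syn_sites)


# Toy example on a two-codon sequence (Met-Gly), not on Rv0004 itself.
n_sites, s_sites = possible_sites("ATGGGG")
ratio = pn_ps(7, 1, n_sites, s_sites)
print(n_sites, s_sites, ratio)
```

Because there are roughly three times as many non-synonymous as synonymous sites in a typical gene, the raw ratio of missense to synonymous SNPs overstates pN/pS; dividing each count by its own number of possible sites is what makes values comparable across genes.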

Domains (Pfam, hmmscan --cut_ga)

Pfam | Accession | i-Evalue | Residues | Description
DciA | PF05258.18 | 4.5e-24 | 75–162 | Dna[CI] antecedent, DciA

Functional interaction network (STRING v12, guilt-by-association)

Closest characterised functional partner: recF (DNA replication/repair protein RecF), high confidence from genomic context alone (score 903 excluding text-mining). This association is the citable seed of a function hypothesis for this formerly hypothetical protein.

Partner | Product | Score | No text-mining | Channels (≥400)
Rv0003 recF | DNA replication/repair protein RecF | 931 | 903 | ctx neighborhood:881
Rv0002 dnaN | DNA polymerase III subunit beta | 910 | 874 | ctx neighborhood:855
Rv0005 gyrB | DNA gyrase subunit B | 797 | 785 | ctx neighborhood:727
Rv0007 | membrane protein | 704 | 704 | ctx neighborhood:635
Rv2256c hyp | hypothetical protein | 686 | 686 | ctx cooccurence:685
Rv2050 rbpA | RNA polymerase-binding protein RbpA | 653 | 653 | ctx cooccurence:652
Rv2708c hyp | hypothetical protein | 630 | 630 | ctx cooccurence:630
Rv2413c hyp | hypothetical protein | 811 | 618 | ctx cooccurence:605 textmining:528
Rv1830 | HTH-type transcriptional regulator | 618 | 618 | ctx cooccurence:616
Rv0807 hyp | hypothetical protein | 607 | 608 | ctx cooccurence:604
Rv1002c pmt | dolichyl-phosphate-mannose--protein mannosyltransferase | 592 | 592 | ctx cooccurence:592
Rv2146c | transmembrane protein | 586 | 587 | ctx cooccurence:584
Rv2699c hyp | hypothetical protein | 583 | 584 | ctx cooccurence:582
Rv0006 gyrA | DNA gyrase subunit A | 553 | 534 | ctx neighborhood:528
Rv1321 nucS | endonuclease NucS | 650 | 525 | ctx cooccurence:521

STRING combines evidence channels (neighborhood, fusion, cooccurrence, coexpression, experimental, database, text-mining) into a 0–1000 score. The ctx badge marks edges carried by the genomic-context channels (conserved neighborhood, fusion, phylogenetic co-occurrence), which are independent of orthology and structure and the strongest signal for an unknown gene. The no text-mining column recomputes the score from data alone, so a link that does not depend on the literature is visible. Association is a function hypothesis, not proof: corroborate with the operon context and the primary literature before assigning a function.
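
To make the channel combination concrete, here is a minimal sketch of how per-channel scores can be merged into one combined score and how a "no text-mining" score follows from dropping a channel. It assumes the scheme described in STRING's public documentation as I understand it: a prior of 0.041 is removed from each channel, the channels are combined as independent evidence, and the prior is added back once. The homology correction that STRING applies to some channels is omitted, and the function names are hypothetical, so treat the numbers as approximate.

```python
PRIOR = 0.041  # assumed prior probability of a link, per STRING's documentation


def remove_prior(score, prior=PRIOR):
    """Strip the prior from a single channel score given on a 0-1 scale."""
    score = max(score, prior)
    return (score - prior) / (1.0 - prior)


def combine(channel_scores, prior=PRIOR):
    """Combine channel scores (0-1 scale) as independent evidence."""
    not_linked = 1.0
    for score in channel_scores:
        not_linked *= 1.0 - remove_prior(score, prior)
    return (1.0 - not_linked) * (1.0 - prior) + prior


def combine_without(channels, excluded, prior=PRIOR):
    """Recompute the combined score after dropping one named channel."""
    kept = [s for name, s in channels.items() if name != excluded]
    return combine(kept, prior)


# The two channels listed for Rv2413c, rescaled from 0-1000 to 0-1.
# Channels scoring below 400 are not shown in the table, so they are missing here.
rv2413c = {"cooccurence": 0.605, "textmining": 0.528}
print(round(combine(rv2413c.values()) * 1000))
print(round(combine_without(rv2413c, "textmining") * 1000))
```

With only the two listed channels, the sketch gives a combined score of about 806 for Rv2413c, slightly below the 811 in the table, and 605 without text-mining against the listed 618. The gap is consistent with additional low-scoring channels that the table does not show, but that reading is an inference rather than something the record states.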

Evidence

  • MTBC0 PGAP product: 'DNA replication protein DciA' (was 'hypothetical protein' in the legacy H37Rv/Mycobrowser annotation)
  • ESM SAE top features describe N-terminal basic RNA/DNA-binding patches and charged low-complexity segments, consistent with a nucleic-acid-binding protein
  • Primary literature: Rv0004 is essential and binds DnaB, modulating DnaB-DnaA complex formation

ESM Atlas signal (exploratory)

Ancestral protein hash adfddfac6808f01e0815b66dcb7890c3 · 10 ESM-space neighbours (max similarity 0.921). SAE features are orienting indices, not validated domains.

# | Index | Activation | Interpretation
1 | 8624 | 0.85 | Presequence-to-core transition helix
2 | 4008 | 0.75 | N-terminal basic RNA-binding patches
3 | 14994 | 0.71 | Charged low-complexity disordered tails
4 | 1968 | 0.66 | Basic low-complexity N-termini
5 | 14146 | 0.62 | Edge beta-hairpin interaction loops
6 | 15621 | 0.59 | Short basic Zn-binding helices
7 | 507 | 0.55 | Short cytosolic amphipathic helices
8 | 5321 | 0.55 | N-terminal leader/capping segments

Sources

  • Ancestral sequence & coordinates: Harrison LB et al. (2024), An imputed ancestral reference genome for the MTBC, doi:10.1101/2023.09.07.556366
  • Product annotation: NCBI PGAP on MTBC0; legacy from H37Rv NC_000962.3 (RefSeq NP_214518.1)
  • Domains: Pfam-A via hmmscan --cut_ga — DciA (PF05258.18)
  • Sequence-level signal: ESM Atlas (EvolutionaryScale × BioHub) — exploratory
  • Controlled vocabulary: eggNOG-mapper 2.1.12 (Cantalapiedra et al. 2021, doi:10.1093/molbev/msab293), eggNOG 5.0 DB (Huerta-Cepas et al. 2019) — OG COG5512
  • Curated reference: UniProt P9WFL1 (SwissProt, reviewed; Inferred from homology)
  • Intra-MTBC selection: pN/pS and disruption from SPDI variants of 145 209 MTBC strains (this work, local collection vs H37Rv NC_000962.3)
  • Interaction network: STRING v12.0 (Szklarczyk et al. 2023, doi:10.1093/nar/gkac1000), taxon 83332, CC-BY 4.0; 36 functional partners; context anchor recF
  • Primary literature: Mann KM, Huang DL, Hooppaw AJ, et al. (2017). Rv0004 is a new essential member of the mycobacterial DNA replication machinery. PLoS Genetics. doi:10.1371/journal.pgen.1007115 PMID:29176877
  • Primary literature: Blaine HC, Burke JT, Ravi J, Stallings CL (2022). DciA helicase operators exhibit diversity across bacterial phyla. Journal of Bacteriology. doi:10.1128/jb.00163-22

Ancestral MTBC0 protein sequence

>mtbc0_000004|Rv0004|
MTGSVDRPDQNRGERSMKSPGLDLVRRTLDEARAAARARGQDAGRGRVASVASGRVAGRRRSWSGPGPDIRDPQPLGKAARELAKKRGWSVRVAEGMVLGQWSAVVGHQIAEHARPTALNDGVLSVIAESTAWATQLRIMQAQLLAKIAAAVGNDVVRSLKITGPAAPSWRKGPRHIAGRGPRDTYG
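
The record's header fields (187 aa, coordinates 4434–4997 on the + strand) can be cross-checked against the sequence above. The sketch below assumes inclusive coordinates that cover the stop codon, and reads the Pfam row as residues 75–162; the helper names are hypothetical.

```python
FASTA = """>mtbc0_000004|Rv0004|
MTGSVDRPDQNRGERSMKSPGLDLVRRTLDEARAAARARGQDAGRGRVASVASGRVAGRRRSWSGPGPDIRDPQPLGKAARELAKKRGWSVRVAEGMVLGQWSAVVGHQIAEHARPTALNDGVLSVIAESTAWATQLRIMQAQLLAKIAAAVGNDVVRSLKITGPAAPSWRKGPRHIAGRGPRDTYG
"""


def read_single_fasta(text):
    """Return (header, sequence) for a FASTA string holding one record."""
    lines = [line.strip() for line in text.strip().splitlines()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("not a FASTA record")
    return lines[0][1:], "".join(lines[1:])


def expected_protein_length(start, end):
    """Protein length implied by inclusive CDS coordinates, stop codon included."""
    n_nt = end - start + 1
    if n_nt % 3 != 0:
        raise ValueError("CDS length is not a multiple of three")
    return n_nt // 3 - 1  # subtract the stop codon


header, seq = read_single_fasta(FASTA)
from_coords = expected_protein_length(4434, 4997)
domain_start, domain_end = 75, 162  # assumed reading of the Pfam residues field

print(header, len(seq), from_coords)
print("domain within protein:", 1 <= domain_start <= domain_end <= len(seq))
```

If the two lengths disagree for another record, the usual suspects are coordinates that exclude the stop codon or a revised start site between the legacy and ancestral annotations.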