Rv2694c Family assigned · low auto-curated

H37Rv Rv2694c · MTBC0 mtbc0_002868 · 122 aa · 3033814–3034182 (-) · RefSeq NP_217210.1

Annotation: from legacy to revised

Legacy (H37Rv / Mycobrowser): hypothetical protein
MTBC0 PGAP re-annotation: OB-fold nucleic acid binding domain-containing protein
Revised (this work): OB-fold nucleic acid binding domain-containing protein

Auto-curated: this verdict and function were generated by rules from PGAP + Pfam + Foldseek and have not been hand-reviewed.

Curated reference (UniProt)

UniProt O07196 TrEMBL · unreviewed · Evidence at protein level
UniProt name: Conserved protein

UniProt still lists this protein as "Conserved protein"; the revised annotation above is ahead of the current UniProt record.

Functional vocabulary (eggNOG-mapper, orthology transfer)

COG category: K (Transcription); L (Replication, recombination and repair)
eggNOG description: nucleic acid binding, OB-fold, tRNA helicase-type
Orthologous group: COG1200

Orthology-based transfer (eggNOG 5.0.2, diamond). EC/KO/GO/CAZy are computed annotations, not manual curation; cross-check against the primary literature before treating a specific reaction as established.

Conservation & selection (intra-MTBC, 145 209 strains)

pN/pS n/a
Polymorphic sites (≥ 0.1% of strains) 0 synonymous, 7 missense, 0 nonsense, 0 frameshift

pN/pS is computed from segregating SNPs (singletons removed), normalised by the number of possible sites. A low pN/pS indicates purifying selection (a strong signal that a "hypothetical" is a real, constrained gene). A high pN/pS is ambiguous, because either relaxed constraint or positive selection (drug resistance, antigenic variation) can inflate it; e.g. rpoB/katG/pncA score high here for resistance, not loss of function. A clonal disruption (one allele across a clade) suggests lineage pseudogenisation; a convergent one (many independent alleles) is typical of resistance loss-of-function.
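The ratio described above can be sketched as follows. This is a minimal illustration, not the pipeline used for this record: the function name `pn_ps`, the choice to count only missense SNPs in the numerator, and the site counts in the usage line are all assumptions made for the example. It also shows one plausible reason a record reports "n/a": with no synonymous SNPs, pS is zero and the ratio is undefined.

```python
from typing import Optional


def pn_ps(n_missense: int, n_synonymous: int,
          nonsyn_sites: float, syn_sites: float) -> Optional[float]:
    """Return pN/pS, or None when the ratio is undefined.

    n_missense / n_synonymous: counts of segregating SNPs (singletons
    already removed). nonsyn_sites / syn_sites: numbers of possible
    nonsynonymous and synonymous sites in the coding sequence.
    """
    if nonsyn_sites <= 0 or syn_sites <= 0:
        raise ValueError("site counts must be positive")
    p_n = n_missense / nonsyn_sites   # nonsynonymous SNPs per possible site
    p_s = n_synonymous / syn_sites    # synonymous SNPs per possible site
    if p_s == 0:
        return None                   # no synonymous SNPs: report as n/a
    return p_n / p_s


# Hypothetical site counts, chosen only for illustration.
print(pn_ps(7, 0, nonsyn_sites=280.0, syn_sites=86.0))  # None
```

Returning `None` rather than infinity keeps an undefined ratio from being read as evidence of strong positive selection.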

Domains (Pfam, hmmscan --cut_ga)

No Pfam-A domain above the gathering threshold (or not yet scanned).

Functional interaction network (STRING v12, guilt-by-association)

Closest characterised functional partner: Rv2693c (integral membrane protein), high confidence from genomic context alone (score 844 excluding text-mining). This association is the citable seed of a function hypothesis for this hypothetical protein.

Partner | Product | Score | No text-mining | Channels (≥400)
Rv2693c | integral membrane protein | 844 | 844 | ctx neighborhood:839
Rv2690c | integral membrane protein | 747 | 747 | ctx cooccurence:740
Rv2744c 35kd_ag hyp | hypothetical protein | 747 | 737 | coexpression:732
Rv0563 htpX | protease HtpX | 730 | 730 | coexpression:730
Rv2692 ceoC | TRK system potassium uptake protein CeoC | 718 | 718 | ctx cooccurence:717
Rv2691 ceoB | TRK system potassium uptake protein CeoB | 692 | 692 | ctx cooccurence:691
Rv2695 hyp | hypothetical protein | 618 | 618 | ctx neighborhood:616
Rv1929c hyp | hypothetical protein | 471 | 446 | ctx cooccurence:440
Rv2185c TB16.3 hyp | hypothetical protein | 435 | 436 | ctx cooccurence:430
Rv3013 hyp | hypothetical protein | 441 | 420 | -
Rv1407 fmu | 16S rRNA m5C967 methyltransferase | 440 | 419 | -
Rv2710 sigB | RNA polymerase sigma factor SigB | 447 | 418 | -
Rv2343c dnaG | DNA primase | 482 | 416 | coexpression:416
Rv0854 hyp | hypothetical protein | 402 | 403 | ctx cooccurence:401
Rv3646c topA | DNA topoisomerase I | 568 | 295 | textmining:413

STRING combines evidence channels (neighborhood, fusion, cooccurrence, coexpression, experimental, database, text-mining) into a 0–1000 score. The ctx badge marks edges carried by the genomic-context channels (conserved neighborhood, fusion, phylogenetic co-occurrence), which are independent of orthology and structure and the strongest signal for an unknown gene. The no text-mining column recomputes the score from data alone, so a link that does not depend on the literature is visible. Association is a function hypothesis, not proof: corroborate with the operon context and the primary literature before assigning a function.
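How channel scores merge into a single value can be sketched as below. The sketch assumes the prior-corrected combination STRING documents for its scores: a prior for random association (taken here as 0.041, an assumption) is removed from each channel, the channels are combined as independent evidence, and the prior is added back once. The function names are hypothetical, and the extra homology correction STRING applies to some channels is left out. Because removing the prior from an already combined score recovers the same product form, the "no text-mining" score can be treated as one input and the text-mining channel as the other.

```python
from typing import Iterable

PRIOR = 0.041  # assumed prior probability of a random association


def remove_prior(score: float, prior: float = PRIOR) -> float:
    """Strip the prior from a 0-1 channel score."""
    score = max(score, prior)
    return (score - prior) / (1.0 - prior)


def combine(scores: Iterable[int], prior: float = PRIOR) -> int:
    """Combine 0-1000 channel scores into one 0-1000 score."""
    not_linked = 1.0
    for s in scores:
        # Probability that no channel supports the link, prior removed.
        not_linked *= 1.0 - remove_prior(s / 1000.0, prior)
    combined = (1.0 - not_linked) * (1.0 - prior) + prior
    return round(combined * 1000)


# Rv3646c (topA): data-only score 295 plus text-mining channel 413.
print(combine([295, 413]))  # 568
```

Under these assumptions the result matches the combined score of 568 listed for Rv3646c, a case where most of the score comes from text-mining.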

Evidence

  • Legacy H37Rv annotation: hypothetical protein
  • MTBC0 PGAP product: OB-fold nucleic acid binding domain-containing protein
  • (auto-curated by rules from PGAP + Pfam + Foldseek; not hand-reviewed)

Sources

  • Ancestral sequence & coordinates: Harrison LB et al. (2024), An imputed ancestral reference genome for the MTBC, doi:10.1101/2023.09.07.556366
  • Product annotation: NCBI PGAP on MTBC0; legacy from H37Rv NC_000962.3 (RefSeq NP_217210.1)
  • Domains: Pfam-A via hmmscan --cut_ga — none above threshold
  • Sequence-level signal: ESM Atlas (EvolutionaryScale × BioHub) — exploratory
  • Controlled vocabulary: eggNOG-mapper 2.1.12 (Cantalapiedra et al. 2021, doi:10.1093/molbev/msab293), eggNOG 5.0 DB (Huerta-Cepas et al. 2019) — OG COG1200
  • Curated reference: UniProt O07196 (TrEMBL, unreviewed; Evidence at protein level)
  • Intra-MTBC selection: pN/pS and disruption from SPDI variants of 145 209 MTBC strains (this work, local collection vs H37Rv NC_000962.3)
  • Interaction network: STRING v12.0 (Szklarczyk et al. 2023, doi:10.1093/nar/gkac1000), taxon 83332, CC-BY 4.0; 21 functional partners; context anchor Rv2693c
  • Primary literature: none located yet; annotation rests on the domain/homology sources above.

Ancestral MTBC0 protein sequence

>mtbc0_002868|Rv2694c|
MGAQGYLRRLTRRLTEDLEQRDVEELSDEVLNAGAQRAIDCQRGQEVTVVGTLRSVETNGKGCSGGVRAELFDGSDTVTLVWLGQRRIPGIDTGRTLRVRGRLGKLENGTKAIYNPHYEIQR
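
The stated protein length can be cross-checked against the coordinates in the record header with a short script. This is a sketch under two assumptions: the coordinates are 1-based and inclusive, and the span includes the stop codon. The helper `read_single_fasta` is a hypothetical name introduced for the example.

```python
from typing import Tuple


def read_single_fasta(text: str) -> Tuple[str, str]:
    """Return (header, sequence) from a one-record FASTA string."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA header line starting with '>'")
    return lines[0][1:], "".join(lines[1:])


RECORD = """>mtbc0_002868|Rv2694c|
MGAQGYLRRLTRRLTEDLEQRDVEELSDEVLNAGAQRAIDCQRGQEVTVVGTLRSVETNGKGCSGGVRAELFDGSDTVTLVWLGQRRIPGIDTGRTLRVRGRLGKLENGTKAIYNPHYEIQR
"""
START, END = 3033814, 3034182  # coordinates from the record header (minus strand)

header, protein = read_single_fasta(RECORD)
gene_nt = END - START + 1               # inclusive span in nucleotides
codons, remainder = divmod(gene_nt, 3)  # whole codons, leftover bases

# Expect: 369 nt, 123 codons (122 residues plus a stop codon), 0 leftover.
print(gene_nt, codons, remainder, len(protein))
```

If the two assumptions hold, the number of codons should be one more than the protein length, which is the case for the 122 aa stated in the header.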