Rv1118c Resolved · high

H37Rv Rv1118c · MTBC0 - · 286 aa · 1241971–1242831 (-) · RefSeq NP_215634.1
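
The coordinates and the protein length on the line above can be cross-checked with a little arithmetic. The sketch below assumes 1-based inclusive coordinates and a CDS span that includes the stop codon; neither convention is stated on this page, and the function names are illustrative only.

```python
def cds_length_nt(start: int, end: int) -> int:
    """Feature length in nucleotides for 1-based inclusive coordinates (assumed)."""
    return end - start + 1


def protein_length_aa(start: int, end: int, includes_stop: bool = True) -> int:
    """Protein length implied by a CDS span; the stop codon is assumed to be inside the span."""
    nt = cds_length_nt(start, end)
    if nt % 3 != 0:
        raise ValueError(f"CDS length {nt} is not a multiple of 3")
    codons = nt // 3
    return codons - 1 if includes_stop else codons


print(cds_length_nt(1241971, 1242831))      # 861
print(protein_length_aa(1241971, 1242831))  # 286
```

Under those assumptions the span gives 861 nt, i.e. 287 codons, which matches the 286 aa reported once the stop codon is removed.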

Annotation: from legacy to revised

Legacy (H37Rv / Mycobrowser): hypothetical protein
MTBC0 PGAP re-annotation
Revised (this work): Circularly permuted NlpC/P60 (YaeF/YiiX-family) cysteine amidase with a structurally competent Cys234-His97 catalytic dyad (Glu115 is a candidate third member). RefSeq leaves this locus as 'hypothetical protein'; here it is re-annotated by structure-guided active-site mapping. The catalytic histidine partner of the nucleophile Cys234 is His97 (Sγ-Nδ1 = 3.6 Å on the ESMFold model, 2.9 Å on AlphaFold3). This pairing is invisible to sequence proximity because the fold is circularly permuted: the three sequence-proximal histidines lie 7-23 Å away. HHpred matches the permuted Peptidase_C92 / YaeF-YiiX family (≥99.7%) and lipid-acting permuted members (LRAT, H-REV107), and the 86%-hydrophobic pocket points to an N-acyl-amino-acid / lipoprotein amide substrate rather than a peptidoglycan muropeptide. The protein is distinct from the five canonical peptidoglycan-hydrolase NlpC/P60 enzymes of H37Rv (Rv0024, RipA, RipB, RipD, Rv2190c). The catalytic codons are essentially invariant across ~250,724 MTBC genomes. This is a structural prediction, not the result of a biochemical assay.

Annotated on the H37Rv protein: this gene has no 1:1 ancestral MTBC0 anchor (PE/PPE, paralogue, IS element, or otherwise unanchored CDS).

Curated reference (UniProt)

UniProt O06570 TrEMBL · unreviewed · Evidence at protein level
UniProt name: Conserved protein

UniProt still lists this protein as Conserved protein; the revised annotation above is ahead of the current UniProt record.

Functional vocabulary (eggNOG-mapper, orthology transfer)

Orthologous group: 28P5T

Orthology-based transfer (eggNOG 5.0.2, diamond). EC/KO/GO/CAZy are computed annotations, not manual curation; cross-check against the primary literature before treating a specific reaction as established.

Conservation & selection (intra-MTBC, 145 209 strains) pseudogene candidate

pN/pS 0.518 · relaxed/neutral
Polymorphic sites (≥ 0.1% of strains) 4 synonymous, 5 missense, 1 nonsense, 1 frameshift
Disruption 2 distinct premature-stop/frameshift sites; the most common is found in 1.05% of strains (1530) · clonal

pN/pS is computed from segregating SNPs (singletons removed) and normalised by the number of possible sites. A low pN/pS indicates purifying selection (a strong signal that a "hypothetical" is a real, constrained gene). A high pN/pS is ambiguous, because both relaxed constraint and positive selection (drug resistance, antigenic variation) inflate it; for example, rpoB, katG and pncA score high here because of resistance, not loss of function. A clonal disruption (one allele over a clade) suggests lineage pseudogenisation; a convergent one (many independent alleles) is typical of resistance loss-of-function.
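
The normalisation described above can be sketched as follows. This is a minimal illustration, not the pipeline used for this page: the function names are hypothetical, the SNP and site counts in the first call are placeholders, and the way possible sites are counted is left to the caller. Only the carrier count (1530) and the collection size (145 209) come from this page.

```python
from typing import Optional


def pn_ps(n_nonsyn: int, n_syn: int, nonsyn_sites: float, syn_sites: float) -> Optional[float]:
    """Ratio of per-site non-synonymous to per-site synonymous polymorphism.

    Returns None when there are no synonymous SNPs, because the ratio is undefined.
    """
    if n_syn == 0:
        return None
    pn = n_nonsyn / nonsyn_sites  # non-synonymous SNPs per possible non-synonymous site
    ps = n_syn / syn_sites        # synonymous SNPs per possible synonymous site
    return pn / ps


def carrier_percent(carriers: int, total_strains: int) -> float:
    """Percentage of strains carrying a given disruptive allele."""
    return 100.0 * carriers / total_strains


# Placeholder counts, chosen only to show the arithmetic.
print(pn_ps(n_nonsyn=6, n_syn=4, nonsyn_sites=600, syn_sites=200))

# Carrier count and collection size as reported on this page.
print(round(carrier_percent(1530, 145209), 2))  # 1.05
```

Normalising by possible sites matters because a coding sequence offers roughly three times as many non-synonymous as synonymous changes, so raw SNP counts alone would overstate pN.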

Domains (Pfam, hmmscan --cut_ga)

No Pfam-A domain above the gathering threshold (or not yet scanned).

Structural neighbours (Foldseek on the ESMFold model, exploratory)

ESMFold model confidence: mean pLDDT 84.6 (confident). A confident model makes the fold comparison meaningful.

Best matches against the PDB, ranked by Foldseek homology probability. A high probability / TM-score suggests a shared fold; unless flagged sig (E < 0.01) these are fold hypotheses, not assignments.

Target | Prob | TM | E-value | Description
3kw0-assembly1_B | 1.00 | 0.68 | 4.0e-06 sig | Crystal structure of Cysteine peptidase (NP_982244.1) from BACILLUS CEREUS ATCC 10987 at 2.50 A resolution
3kw0-assembly1_C | 1.00 | 0.67 | 4.0e-06 sig | Crystal structure of Cysteine peptidase (NP_982244.1) from BACILLUS CEREUS ATCC 10987 at 2.50 A resolution
3kw0-assembly1_A | 1.00 | 0.67 | 1.9e-05 sig | Crystal structure of Cysteine peptidase (NP_982244.1) from BACILLUS CEREUS ATCC 10987 at 2.50 A resolution
3kw0-assembly1_D | 1.00 | 0.64 | 2.6e-05 sig | Crystal structure of Cysteine peptidase (NP_982244.1) from BACILLUS CEREUS ATCC 10987 at 2.50 A resolution
2if6-assembly2_B | 1.00 | 0.58 | 9.4e-05 sig | Crystal structure of metalloprotein yiiX from Escherichia coli O157:H7, DUF1105
6z1p-assembly1_At | 0.02 | 0.15 | 1.6e+00 | Structure of the mitochondrial ribosome from Tetrahymena thermophila
5a5k-assembly4_D | 0.01 | 0.23 | 9.0e+00 | AtGSTF2 from Arabidopsis thaliana in complex with camalexin
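
The sig rule stated above the table (E < 0.01) can be applied to the listed hits as in the sketch below. The values are copied from the table; the `Hit` type and function name are illustrative, not part of Foldseek.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    target: str
    prob: float    # Foldseek homology probability
    tm: float      # TM-score
    evalue: float


SIG_EVALUE = 0.01  # threshold stated above the table

HITS = [
    Hit("3kw0-assembly1_B", 1.00, 0.68, 4.0e-06),
    Hit("3kw0-assembly1_C", 1.00, 0.67, 4.0e-06),
    Hit("3kw0-assembly1_A", 1.00, 0.67, 1.9e-05),
    Hit("3kw0-assembly1_D", 1.00, 0.64, 2.6e-05),
    Hit("2if6-assembly2_B", 1.00, 0.58, 9.4e-05),
    Hit("6z1p-assembly1_At", 0.02, 0.15, 1.6e+00),
    Hit("5a5k-assembly4_D", 0.01, 0.23, 9.0e+00),
]


def is_significant(hit: Hit) -> bool:
    """True when the hit passes the page's E-value cut-off."""
    return hit.evalue < SIG_EVALUE


significant = [h.target for h in HITS if is_significant(h)]
fold_hypotheses = [h.target for h in HITS if not is_significant(h)]

print("sig:", significant)
print("hypothesis only:", fold_hypotheses)
```

With this cut-off, the four chains of 3kw0 and the yiiX structure 2if6 are flagged, while the ribosome and GST hits remain fold hypotheses at best.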

Functional interaction network (STRING v12, guilt-by-association)

Closest characterised functional partner: lppW (lipoprotein LppW), medium confidence from genomic context alone (score 521 excluding text-mining). This association is the citable seed of a function hypothesis for this hypothetical protein.

Partner | Product | Score | No text-mining | Channels (≥400)
Rv1119c hyp | hypothetical protein | 744 | 744 | ctx neighborhood:710
Rv1120c hyp | hypothetical protein | 743 | 743 | ctx neighborhood:710
Rv2905 lppW | lipoprotein LppW | 520 | 521 | ctx cooccurence:518
Rv2027c dosT | two component sensor histidine kinase DosT | 533 | 485 |
Rv3132c devS | two component sensor histidine kinase DevS | 532 | 484 |
Rv0845 narS | sensor histidine kinase NarS | 490 | 467 |
Rv0941c hyp | hypothetical protein | 460 | 460 | ctx cooccurence:457
Rv1121 zwf1 | glucose-6-phosphate 1-dehydrogenase | 442 | 443 | ctx neighborhood:441
Rv1122 gnd2 | 6-phosphogluconate dehydrogenase (decarboxylating) | 411 | 412 | ctx neighborhood:408

STRING combines evidence channels (neighborhood, fusion, cooccurrence, coexpression, experimental, database, text-mining) into a 0–1000 score. The ctx badge marks edges carried by the genomic-context channels (conserved neighborhood, fusion, phylogenetic co-occurrence), which are independent of orthology and structure and the strongest signal for an unknown gene. The no text-mining column recomputes the score from data alone, so a link that does not depend on the literature is visible. Association is a function hypothesis, not proof: corroborate with the operon context and the primary literature before assigning a function.
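
How channel scores might combine, and how a score with text-mining left out might be recomputed, can be sketched as below. This follows the commonly described STRING scheme (remove a prior from each channel, combine the channels as independent evidence, add the prior back). The prior value of 0.041, the function names, and the omission of STRING's homology correction are assumptions here, so the sketch is not expected to reproduce the table values exactly. The confidence tiers use STRING's usual cut-offs (150, 400, 700, 900).

```python
PRIOR = 0.041  # assumed prior probability of association


def remove_prior(score: float) -> float:
    """Strip the prior from a 0-1 channel score."""
    score = max(score, PRIOR)
    return (score - PRIOR) / (1.0 - PRIOR)


def combine(channels: dict, exclude: tuple = ()) -> int:
    """Combine 0-1000 channel scores into one 0-1000 score, skipping excluded channels."""
    p_no_link = 1.0
    for name, score in channels.items():
        if name in exclude:
            continue
        p_no_link *= 1.0 - remove_prior(score / 1000.0)
    combined = (1.0 - p_no_link) * (1.0 - PRIOR) + PRIOR
    return round(combined * 1000)


def confidence_tier(score: int) -> str:
    """Map a 0-1000 score to the usual STRING confidence label."""
    if score >= 900:
        return "highest"
    if score >= 700:
        return "high"
    if score >= 400:
        return "medium"
    if score >= 150:
        return "low"
    return "below cut-off"


# Hypothetical edge: one context channel plus a text-mining channel.
edge = {"neighborhood": 710, "textmining": 300}
print(combine(edge), combine(edge, exclude=("textmining",)))
print(confidence_tier(521))  # medium
```

Dropping the text-mining channel in this way leaves a score that rests on data alone, which is the idea behind the no text-mining column.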

Evidence

  • RefSeq: hypothetical protein
  • Active-site mapping: catalytic dyad Cys234-His97 (Sγ-Nδ1 3.6 Å ESMFold / 2.9 Å AlphaFold3); circularly permuted topology
  • HHpred: permuted Peptidase_C92 / YaeF-YiiX family ≥99.7%; lipid-acting permuted members (LRAT, H-REV107)
  • Hydrophobic pocket (86%) → N-acyl-amino-acid / lipoprotein amide substrate, not peptidoglycan
  • Distinct from the 5 canonical peptidoglycan-hydrolase NlpC/P60 enzymes of H37Rv
  • Catalytic codons essentially invariant across ~250,724 MTBC genomes (purifying selection)
  • Curated against the companion dark-enzymes re-annotation (Guyeux 2026)

Sources

  • Ancestral sequence & coordinates: Harrison LB et al. (2024), An imputed ancestral reference genome for the MTBC, doi:10.1101/2023.09.07.556366
  • Product annotation: NCBI PGAP on MTBC0; legacy from H37Rv NC_000962.3 (RefSeq NP_215634.1)
  • Domains: Pfam-A via hmmscan --cut_ga — none above threshold
  • Sequence-level signal: ESM Atlas (EvolutionaryScale × BioHub) — exploratory
  • Controlled vocabulary: eggNOG-mapper 2.1.12 (Cantalapiedra et al. 2021, doi:10.1093/molbev/msab293), eggNOG 5.0 DB (Huerta-Cepas et al. 2019) — OG 28P5T
  • Curated reference: UniProt O06570 (TrEMBL, unreviewed; Evidence at protein level)
  • Intra-MTBC selection: pN/pS and disruption from SPDI variants of 145 209 MTBC strains (this work, local collection vs H37Rv NC_000962.3)
  • Model confidence: ESMFold per-residue pLDDT (mean 84.6, confident)
  • Interaction network: STRING v12.0 (Szklarczyk et al. 2023, doi:10.1093/nar/gkac1000), taxon 83332, CC-BY 4.0 — 9 functional partner(s); context anchor lppW
  • Primary literature: Guyeux C (2026). Structure-guided functional hypotheses for uncharacterised enzymes of Mycobacterium tuberculosis. In preparation. doi:10.5281/zenodo.20571950

Protein sequence (H37Rv; no 1:1 ancestral MTBC0 anchor)

>H37Rv|Rv1118c|
MQSGPHLVGRVGTSFPLIARHQGATRDDAGDTGQPDPLPHVAHPDRLYPPMVHGVDPSTLALDRALNETRTGDLWLFRGRSRPDRAIQTLTNAPVNHVGMTVAIDDLPPLIWHAELGDKLLDVWTGTNHRGVQLNDARQVVQQWAGRYRQRCWLRQLTPHANRDQEDKLLRVIARMNGTPFPTTARLTGRWLRGRLPTLNDWLRGIPVLDRKVREQTQRRKQQQRTMGLATAYCAETVAITYEEMGLLVTDKDAHWFDPGKFWSGDSLPLAPGYRLGHEIAVDVGG
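
The residue numbering used in the revised annotation can be checked directly against the sequence above. The sketch below confirms that positions 234, 97 and 115 hold Cys, His and Glu, and ranks histidines by sequence separation from Cys234. Which histidines the annotation counts as sequence-proximal is not stated, so that ranking is only one possible reading; the helper names are illustrative.

```python
# Sequence copied from the FASTA record above (H37Rv Rv1118c).
SEQ = (
    "MQSGPHLVGRVGTSFPLIARHQGATRDDAGDTGQPDPLPHVAHPDRLYPPMVHGVDPSTL"
    "ALDRALNETRTGDLWLFRGRSRPDRAIQTLTNAPVNHVGMTVAIDDLPPLIWHAELGDKL"
    "LDVWTGTNHRGVQLNDARQVVQQWAGRYRQRCWLRQLTPHANRDQEDKLLRVIARMNGTP"
    "FPTTARLTGRWLRGRLPTLNDWLRGIPVLDRKVREQTQRRKQQQRTMGLATAYCAETVAI"
    "TYEEMGLLVTDKDAHWFDPGKFWSGDSLPLAPGYRLGHEIAVDVGG"
)


def residue(pos: int) -> str:
    """One-letter residue at a 1-based position."""
    return SEQ[pos - 1]


def nearest_in_sequence(aa: str, anchor: int, k: int) -> list:
    """1-based positions of residue type `aa`, closest to `anchor` first."""
    positions = [i + 1 for i, r in enumerate(SEQ) if r == aa]
    return sorted(positions, key=lambda p: abs(p - anchor))[:k]


print("length:", len(SEQ))
for label, pos, expected in [("Cys", 234, "C"), ("His", 97, "H"), ("Glu", 115, "E")]:
    print(f"{label}{pos}: found {residue(pos)}, expected {expected}")

# Histidines closest to Cys234 along the chain; His97 is 137 residues away.
print(nearest_in_sequence("H", 234, 3))
```

His97 sits far from Cys234 along the chain, which is consistent with the point that a circularly permuted fold hides the catalytic pairing from a sequence-proximity search.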