Rv2738c Family assigned · low
H37Rv Rv2738c · MTBC0 mtbc0_002913 · 68 aa · 3074221–3074427 (-) · RefSeq NP_217254.1
Annotation: from legacy to revised
| Source | Product name |
|---|---|
| Legacy (H37Rv / Mycobrowser) | hypothetical protein |
| MTBC0 PGAP re-annotation | DUF3046 domain-containing protein |
| Revised (this work) | Psp-like envelope-stress system protein (paralogue of the Rv2742c Psp module). RefSeq still lists it as 'hypothetical protein'. |
Curated reference (UniProt)
| Field | Value |
|---|---|
| UniProt | I6YA47 (TrEMBL · unreviewed · Evidence at protein level) |
| UniProt name | DUF3046 domain-containing protein |
Functional vocabulary (eggNOG-mapper, orthology transfer)
| Field | Value |
|---|---|
| COG category | S (Function unknown) |
| eggNOG description | Protein of unknown function (DUF3046) |
| Orthologous group | 2EFU9 |
Orthology-based transfer (eggNOG 5.0.2, diamond). EC/KO/GO/CAZy are computed annotations, not manual curation; cross-check against the primary literature before treating a specific reaction as established.
Conservation & selection (intra-MTBC, 145 209 strains)
| Metric | Value |
|---|---|
| pN/pS | 0.0 · strong purifying |
| Polymorphic sites (≥ 0.1% of strains) | 1 synonymous, 0 missense, 0 nonsense, 0 frameshift |
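pN/pS is computed from segregating SNPs (singletons removed), with the nonsynonymous and synonymous counts each normalised by the number of possible nonsynonymous and synonymous sites. A low pN/pS indicates purifying selection, a strong signal that a "hypothetical" protein is a real, constrained gene. For Rv2738c the value rests on a single synonymous polymorphism and no nonsynonymous ones, so it is 0 by construction and carries limited statistical weight on its own. A high pN/pS is ambiguous: both relaxed constraint and positive selection (drug resistance, antigenic variation) inflate it; for example, rpoB, katG and pncA score high here because of resistance, not loss of function. A clonal disruption (one disrupting allele shared across a clade) suggests lineage-specific pseudogenisation; a convergent one (many independent disrupting alleles) is typical of resistance through loss of function.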
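The page does not state how "possible sites" are counted. The sketch below assumes the common Nei–Gojobori scheme: every position of every sense codon is split into synonymous and nonsynonymous fractions according to the three possible point mutations, and pN/pS is the ratio of per-site polymorphism rates. The function names `possible_sites` and `pn_ps` are hypothetical, and the CDS used in the usage note is a toy sequence, because the nucleotide sequence of Rv2738c is not reproduced on this page.

```python
from itertools import product

# Standard genetic code in TCAG order. NCBI table 11 (bacterial) assigns the
# same amino acids and differs only in which codons may serve as starts.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def possible_sites(cds: str) -> tuple[float, float]:
    """Return (S, N): synonymous and nonsynonymous site counts (Nei-Gojobori).

    Each alternative base at each codon position contributes 1/3 of a site,
    to S if the mutant codon encodes the same amino acid, otherwise to N
    (mutations to a stop codon count as nonsynonymous).
    """
    cds = cds.upper()
    syn = nonsyn = 0.0
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        aa = CODE[codon]
        if aa == "*":  # the terminal stop codon is not a sense codon
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == aa:
                    syn += 1 / 3
                else:
                    nonsyn += 1 / 3
    return syn, nonsyn


def pn_ps(n_nonsyn_snps: int, n_syn_snps: int, cds: str) -> float:
    """pN/pS = (nonsynonymous SNPs / N sites) / (synonymous SNPs / S sites)."""
    s_sites, n_sites = possible_sites(cds)
    if n_syn_snps == 0 or s_sites == 0:
        return float("nan")  # undefined without synonymous polymorphism
    return (n_nonsyn_snps / n_sites) / (n_syn_snps / s_sites)
```

With 0 nonsynonymous and 1 synonymous polymorphism, as in the table above, `pn_ps(0, 1, cds)` is 0.0 for any CDS with at least one synonymous site, which is why a single synonymous SNP is enough to produce the "0.0" entry.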
Domains (Pfam, hmmscan --cut_ga)
| Pfam | Accession | i-Evalue | Residues | Description |
|---|---|---|---|---|
| DUF3046 | PF11248.14 | 1.7e-28 | 5–65 | Protein of unknown function (DUF3046) |
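A row like the one above can be produced from HMMER's per-domain table (`hmmscan --cut_ga --domtblout`). The sketch below assumes the standard domtblout layout: 22 whitespace-separated fields followed by a free-text description. It also assumes that the "Residues" column reports envelope coordinates; the page does not say whether these are alignment or envelope bounds. `parse_domtblout` and `format_row` are hypothetical helper names.

```python
from typing import NamedTuple


class DomainHit(NamedTuple):
    name: str
    accession: str
    i_evalue: float
    ali_from: int
    ali_to: int
    env_from: int
    env_to: int
    description: str


def parse_domtblout(lines):
    """Parse hmmscan --domtblout lines into DomainHit records (comments skipped)."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split(maxsplit=22)  # 22 fixed fields + description
        hits.append(DomainHit(
            name=fields[0],
            accession=fields[1],
            i_evalue=float(fields[12]),  # independent E-value of this domain
            ali_from=int(fields[17]),
            ali_to=int(fields[18]),
            env_from=int(fields[19]),
            env_to=int(fields[20]),
            description=fields[22].strip() if len(fields) > 22 else "",
        ))
    return hits


def format_row(hit: DomainHit) -> str:
    """Render one hit as a row of the Pfam table (envelope coordinates assumed)."""
    return (f"| {hit.name} | {hit.accession} | {hit.i_evalue:.1e} | "
            f"{hit.env_from}–{hit.env_to} | {hit.description} |")
```

Because `--cut_ga` already applies Pfam's curated gathering thresholds inside hmmscan, the parser does no further filtering.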
Structural neighbours (Foldseek on the ESMFold model, exploratory)
ESMFold model confidence: mean pLDDT 79.8 (confident). A confident model makes the fold comparison meaningful.
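Best matches against the PDB, ranked by Foldseek homology probability. A high probability or TM-score suggests a shared fold; unless flagged sig (E < 0.01), these are fold hypotheses, not assignments. None of the PDB hits below is significant (Prob ≤ 0.06, E ≥ 3.3), so the recurring HIV-1 integrase matches are best read as background.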
| Target | Prob | TM | E-value | Description |
|---|---|---|---|---|
| 6u8q-assembly1_L | 0.06 | 0.40 | 4.2e+00 | CryoEM structure of HIV-1 cleaved synaptic complex (CSC) intasome |
| 3ao4-assembly1_A | 0.06 | 0.38 | 4.5e+00 | Fragment-based approach to the design of ligands targeting a novel site on HIV-1 integrase |
| 6put-assembly1_B-2 | 0.06 | 0.35 | 3.7e+00 | Structure of HIV cleaved synaptic complex (CSC) intasome bound with calcium |
| 1exq-assembly1_A | 0.05 | 0.39 | 5.5e+00 | CRYSTAL STRUCTURE OF THE HIV-1 INTEGRASE CATALYTIC CORE DOMAIN |
| 9c9m-assembly1_B | 0.05 | 0.33 | 3.3e+00 | HIV-1 intasome core bound with DTG |
| 6puy-assembly1_B | 0.05 | 0.35 | 4.5e+00 | Structure of HIV cleaved synaptic complex (CSC) intasome bound with magnesium and INSTI XZ426 (compound 4d) |
| 1k6y-assembly1_D | 0.04 | 0.42 | 7.6e+00 | Crystal Structure of a Two-Domain Fragment of HIV-1 Integrase |
| 6puz-assembly1_B-2 | 0.04 | 0.36 | 5.5e+00 | Structure of HIV cleaved synaptic complex (CSC) intasome bound with magnesium and INSTI XZ446 (compound 4f) |
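The ranking-and-flagging rule described above can be sketched as follows. The only thresholds taken from this page are the ordering by homology probability and the "sig" cut-off at E < 0.01. `FoldHit` and `rank_hits` are hypothetical names, and the sketch does not claim to reproduce Foldseek's own output handling. Python's `sorted` is stable, so hits with equal displayed probabilities keep their input order; the page's ranking may use unrounded probabilities.

```python
from dataclasses import dataclass

SIG_EVALUE = 0.01  # "sig" threshold used on this page


@dataclass(frozen=True)
class FoldHit:
    target: str
    prob: float    # Foldseek homology probability
    tm: float      # TM-score
    evalue: float

    @property
    def significant(self) -> bool:
        return self.evalue < SIG_EVALUE


def rank_hits(hits):
    """Sort by homology probability (descending) and label each hit."""
    ranked = sorted(hits, key=lambda h: h.prob, reverse=True)
    return [(h, "sig" if h.significant else "fold hypothesis") for h in ranked]


if __name__ == "__main__":
    # Three rows copied from the table above.
    table = [
        FoldHit("1k6y-assembly1_D", 0.04, 0.42, 7.6),
        FoldHit("6u8q-assembly1_L", 0.06, 0.40, 4.2),
        FoldHit("1exq-assembly1_A", 0.05, 0.39, 5.5),
    ]
    for hit, label in rank_hits(table):
        print(hit.target, hit.prob, label)
```

All three PDB rows come out as "fold hypothesis". By contrast, the AFDB-SwissProt match to Rv2742c listed under Evidence (E 5e-3) would pass the sig threshold.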
Functional interaction network (STRING v12, guilt-by-association)
Closest characterised functional partner: Rv2739c (transferase), high confidence from genomic context alone (score 885 excluding text-mining). This association is the citable seed of a function hypothesis for this hypothetical protein.
| Partner | Product | Score | No text-mining | Channels (≥400) |
|---|---|---|---|---|
| Rv2739c | transferase | 884 | 885 ctx | neighborhood:879 |
| Rv2740 ephG | epoxide hydrolase | 788 | 788 ctx | neighborhood:787 |
| Rv2736c recX | regulatory protein RecX | 546 | 546 ctx | neighborhood:544 |
| Rv2737c recA | recombinase A | 541 | 541 ctx | neighborhood:541 |
| Rv3416 whiB3 | redox-responsive transcriptional regulator WhiB3 | 465 | 465 ctx | cooccurence:465 |
| Rv0880 | HTH-type transcriptional regulator | 449 | 450 ctx | cooccurence:448 |
| Rv0011c crgA | cell division protein CrgA | 436 | 437 ctx | cooccurence:436 |
| Rv2412 rpsT | 30S ribosomal protein S20 | 871 | 55 | textmining:870 |
| Rv3908 mutT4 | mutator protein MutT | 805 | 54 | textmining:803 |
| Rv1321 nucS | endonuclease NucS | 870 | 47 | textmining:870 |
| Rv2645 hyp | hypothetical protein | 810 | 46 | textmining:810 |
| Rv3221c TB7.3 | acetyl-CoA carboxylase biotin carboxyl carrier protein subunit | 806 | 46 | textmining:805 |
| Rv3018c PPE46 | PPE family protein PPE46 | 521 | 45 | textmining:519 |
| Rv0387c | Rv0387c, (MTV036.22c), len: 244 aa. Conserved hypothetical protein, showing some similarity to MTCI237.20c, and M17282\|HUMEL20_1 Human elast | 520 | 44 | textmining:519 |
| Rv3022c PPE48 | Rv3021c, (MTV012.35c), len: 358 aa. PPE47, Member of Mycobacterium tuberculosis PPE family. Should be continuation of upstream ORF MTV012.36 | 511 | 42 | textmining:511 |
STRING combines evidence channels (neighborhood, fusion, co-occurrence, coexpression, experimental, database, text-mining) into a 0–1000 score. The ctx badge marks edges carried by the genomic-context channels (conserved neighborhood, gene fusion, phylogenetic co-occurrence). These do not depend on the homology-based annotation or the structural comparisons above, and they are the strongest signal for an unknown gene. The no text-mining column recomputes the score from the data channels alone, which shows whether a link survives without the literature: the rpsT, mutT4 and nucS edges, for example, fall to 55 or below. Association is a function hypothesis, not proof: corroborate it with the operon context and the primary literature before assigning a function.
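The channel combination can be illustrated with the prior-corrected noisy-OR formula that STRING documents. The sketch assumes a prior of 0.041. Each channel score (rescaled to 0–1) has the prior removed, the channels are combined as 1 − ∏(1 − sᵢ), and the prior is added back once. It omits further corrections that the production pipeline applies (for example, to transferred evidence), so it will not reproduce the table exactly. `combine_scores` and its `exclude` parameter are hypothetical; excluding `"textmining"` mimics the "No text-mining" column.

```python
PRIOR = 0.041  # assumed prior probability of a random association


def combine_scores(channels: dict[str, float], exclude=()) -> float:
    """Combine per-channel scores in [0, 1] into one STRING-style score."""
    remaining = 1.0  # probability that no (non-excluded) channel is right
    for name, score in channels.items():
        if name in exclude:
            continue
        # Remove the prior from this channel before combining.
        s = max(0.0, (score - PRIOR) / (1 - PRIOR))
        remaining *= 1 - s
    combined = 1 - remaining
    # Add the prior back exactly once.
    return combined + PRIOR * (1 - combined)


if __name__ == "__main__":
    # A text-mining-dominated edge, shaped like the rpsT row (minor channels omitted).
    edge = {"textmining": 0.870}
    print(round(combine_scores(edge), 3))                           # full score
    print(round(combine_scores(edge, exclude=("textmining",)), 3))  # data only
```

Dropping the only strong channel collapses the score toward the prior. This is why edges supported mainly by literature co-mention show small values in the "No text-mining" column, while ctx edges keep almost their full score.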
Evidence
- Foldseek vs AFDB-SwissProt: envelope-preserving system protein Rv2742c, TM 0.92, E 5e-3
- Structural homology vs AlphaFold-Swiss-Prot (Foldseek; 542k curated SwissProt structures), project 'Still unknown gene function' phase13, 2026-06-10. Fold/family-level, not a demonstrated function.
Sources
- Ancestral sequence & coordinates: Harrison LB et al. (2024), An imputed ancestral reference genome for the MTBC, doi:10.1101/2023.09.07.556366
- Product annotation: NCBI PGAP on MTBC0; legacy from H37Rv NC_000962.3 (RefSeq NP_217254.1)
- Domains: Pfam-A via hmmscan --cut_ga — DUF3046 (PF11248.14)
- Sequence-level signal: ESM Atlas (EvolutionaryScale × BioHub) — exploratory
- Controlled vocabulary: eggNOG-mapper 2.1.12 (Cantalapiedra et al. 2021, doi:10.1093/molbev/msab293), eggNOG 5.0 DB (Huerta-Cepas et al. 2019); OG 2EFU9
- Curated reference: UniProt I6YA47 (TrEMBL, unreviewed; Evidence at protein level)
- Intra-MTBC selection: pN/pS and disruption from SPDI variants of 145 209 MTBC strains (this work, local collection vs H37Rv NC_000962.3)
- Model confidence: ESMFold per-residue pLDDT (mean 79.8, confident)
- Interaction network: STRING v12.0 (Szklarczyk et al. 2023, doi:10.1093/nar/gkac1000), taxon 83332, CC-BY 4.0; 16 functional partner(s); context anchor Rv2739c
- Primary literature: none located yet; annotation rests on the domain/homology sources above.
Ancestral MTBC0 protein sequence
>mtbc0_002913|Rv2738c|
MLAGVRLTEFHERVALHFGAAYGSSVLLDHVLTGFDGRSAAQAIEDGVEPRDVWRALCADFDVPHDRW
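As a consistency check, the record above can be compared with the header coordinates. A 3074221–3074427 span is 207 nt, which is 69 codons, or 68 residues once the stop codon is excluded. The Pfam envelope (residues 5–65, 1-based inclusive) can then be cut out of the sequence. This is a minimal sketch: `read_fasta`, `expected_protein_length` and `domain_slice` are hypothetical helpers, and it assumes the CDS contains no introns or frameshifts, which holds for a 68 aa bacterial ORF.

```python
FASTA = """>mtbc0_002913|Rv2738c|
MLAGVRLTEFHERVALHFGAAYGSSVLLDHVLTGFDGRSAAQAIEDGVEPRDVWRALCADFDVPHDRW"""


def read_fasta(text: str) -> dict[str, str]:
    """Map each header (without '>') to its concatenated sequence."""
    records, header = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


def expected_protein_length(start: int, end: int) -> int:
    """Residues encoded by a CDS spanning start..end (1-based, inclusive), minus the stop."""
    span = end - start + 1
    if span % 3:
        raise ValueError(f"span of {span} nt is not a whole number of codons")
    return span // 3 - 1


def domain_slice(seq: str, first: int, last: int) -> str:
    """Residues first..last, 1-based inclusive, as in the Pfam table."""
    return seq[first - 1:last]


seq = read_fasta(FASTA)["mtbc0_002913|Rv2738c|"]
assert len(seq) == expected_protein_length(3074221, 3074427) == 68
duf3046 = domain_slice(seq, 5, 65)
print(len(duf3046), duf3046[:6], duf3046[-5:])
```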