🧬SeqMorph

Scalable mutation & analysis for DNA/RNA/Protein

Store-first sequence editing (StringStore, ChunkedStore), deterministic structural mutations, and a clean FastAPI backend—optimized for correctness, scalability, and reproducibility.

Tip: start the server locally, then hit /docs for interactive endpoints.

Pipeline

Sequence → SequenceStructureBaseStore (StringStore / ChunkedStore)
         → MutationEngine (invert/dup/translocate)
         → Analysis (GC, k-mers, ORFs)
         → FASTA + events.json + manifest.json + log
O(log k) edits · Deterministic (seed) · Headless-safe
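The last pipeline stage (FASTA + events.json + manifest.json) could be sketched roughly as follows. This is an illustration only, not SeqMorph's actual writer: the function names, the FASTA file name, the 60-column wrap, and every manifest field are assumptions. Only the file names events.json and manifest.json come from the pipeline above.

```python
import hashlib
import json
from pathlib import Path


def format_fasta(header: str, sequence: str, width: int = 60) -> str:
    """Return one FASTA record with the sequence wrapped at `width` columns."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return f">{header}\n" + "\n".join(lines) + "\n"


def write_run_outputs(out_dir: Path, accession_id: str, mutated: str,
                      events: list, seed: int) -> dict:
    """Write FASTA, events.json and manifest.json; return the manifest dict."""
    out_dir.mkdir(parents=True, exist_ok=True)

    # Hypothetical file name; the real naming scheme is not documented here.
    fasta_path = out_dir / f"{accession_id}.mutated.fasta"
    fasta_path.write_text(format_fasta(f"{accession_id} mutated", mutated))

    (out_dir / "events.json").write_text(json.dumps(events, indent=2))

    # Assumed manifest layout: enough to reproduce and verify a run.
    manifest = {
        "accession_id": accession_id,
        "seed": seed,
        "length": len(mutated),
        "sha256": hashlib.sha256(mutated.encode("ascii")).hexdigest(),
        "files": [fasta_path.name, "events.json"],
    }
    (out_dir / "manifest.json").write_text(json.dumps(manifest, indent=2))
    return manifest
```

Recording the seed and a checksum in the manifest is one way to make a run auditable: rerunning with the same seed should reproduce the same checksum.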

Install

Requires Python 3.11+. Install dependencies from requirements.txt.

pip install -r requirements.txt
# macOS/Linux
export PYTHONPATH=./src
python src/SeqMorph_Main.py
# then open http://127.0.0.1:8000/docs
# Windows PowerShell
$env:PYTHONPATH = "$PWD\src"
python .\src\SeqMorph_Main.py
# then open http://127.0.0.1:8000/docs

Quick run & minimal smoke test

Add a short DNA sequence, then run structural mutations and view the report.

Add sequence
curl -X POST http://127.0.0.1:8000/sequence/add \
  -H "Content-Type: application/json" \
  -d '{"sequence":"ATGACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT"}'
Mutate & analyze
curl -X POST http://127.0.0.1:8000/mutate-and-analyze \
  -H "Content-Type: application/json" \
  -d '{
    "accession_id": "seq_1",
    "struct_rate": 3.0,
    "mean_seg_len": 200,
    "start": 1,
    "seed": 123,
    "save_outputs": true
  }' | python -m json.tool
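The same two calls can be made from Python using only the standard library. This is a minimal sketch that assumes the server is running at the address above and that both endpoints return JSON; the helper names (build_post, post_json, run_smoke) are made up for this example.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # local server address from the install steps


def build_post(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request without sending it."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path: str, payload: dict) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_post(path, payload), timeout=30) as resp:
        return json.load(resp)


def run_smoke() -> dict:
    """Mirror the two curl calls: add a sequence, then mutate and analyze."""
    post_json("/sequence/add",
              {"sequence": "ATGACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT"})
    return post_json("/mutate-and-analyze", {
        "accession_id": "seq_1",
        "struct_rate": 3.0,
        "mean_seg_len": 200,
        "start": 1,
        "seed": 123,
        "save_outputs": True,
    })
```

With the server up, `print(json.dumps(run_smoke(), indent=2))` prints the report in the same form as the `python -m json.tool` pipe.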

Endpoints

| Method | Path | Purpose | Notes |
|--------|------|---------|-------|
| GET | /health | Liveness probe | Returns {"status":"ok"} |
| POST | /sequence/add | Add raw sequence | Auto-detects type (DNA/RNA/Protein) |
| POST | /sequence/fetch | Fetch by accession | NCBI/UniProt via SequenceFetcher |
| POST | /mutate-and-analyze | Run structural mutations | Returns full event list + analysis report |

Analysis features

Each run produces a concise, headless-safe report comparing original vs. mutated sequences. Designed to scale to large inputs.

GC content & composition
Per-sequence GC%, base counts, deltas.
K-mer frequencies
Counts for k=1..6 (configurable); top Δ between original and mutated.
Codon usage & translation
DNA/RNA codon usage and protein translation summaries.
ORF scan
Start/stop detection; count + longest ORF.
Mutation summary
Event counts (invert/dup/translocate), length change.
Entropy & complexity
Optional Shannon-entropy windows for structure/complexity shifts.
Statistical tests (opt-in)
Chi-square on selected k-mers and simple t-tests for GC% can be enabled for small/medium inputs. For very large k-mer spaces, the report defaults to “top differences” for memory safety.
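To make a few of the metrics above concrete, here is a small standard-library sketch of GC fraction, k-mer count deltas, and windowed Shannon entropy. It is not SeqMorph's analysis code; the function names, the tie-breaking order, and the window parameters are assumptions chosen for illustration.

```python
import math
from collections import Counter


def gc_fraction(seq: str) -> float:
    """Fraction of G/C symbols (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers."""
    s = seq.upper()
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def top_kmer_deltas(original: str, mutated: str, k: int, n: int = 5) -> list:
    """k-mers with the largest absolute count change (mutated minus original)."""
    before, after = kmer_counts(original, k), kmer_counts(mutated, k)
    deltas = {km: after[km] - before[km] for km in before.keys() | after.keys()}
    # Largest |delta| first; ties broken alphabetically so output is stable.
    ranked = sorted(deltas.items(), key=lambda kv: (-abs(kv[1]), kv[0]))
    return [(km, d) for km, d in ranked[:n] if d != 0]


def shannon_entropy(seq: str) -> float:
    """Shannon entropy in bits per symbol."""
    if not seq:
        return 0.0
    total = len(seq)
    return -sum((c / total) * math.log2(c / total) for c in Counter(seq).values())


def entropy_windows(seq: str, window: int, step: int) -> list:
    """Entropy of each full window, sliding by `step` symbols."""
    return [shannon_entropy(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]
```

Keeping only the top deltas, rather than a full table for every k-mer, is the same memory-saving idea the report applies to large k-mer spaces.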

Example report (truncated)

{
  "length": {"original": 40000, "mutated": 41234, "delta": 1234},
  "gc": {"original": 0.49, "mutated": 0.50, "delta": 0.01},
  "kmer": {
    "k": 4,
    "top_deltas": [{"kmer":"CGCG","delta": 42}, {"kmer":"ATGC","delta": -31}]
  },
  "codon_usage": {"AAA": 120, "AAC": 98, "...": "..."},
  "orf_scan": {"count": 12, "longest": {"start": 1234, "end": 5678, "length": 1345}},
  "events": {"invert": 3, "duplicate": 2, "translocate": 1}
}
Fields vary by sequence type and options; χ² is off by default for large k-mer sets.
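An orf_scan summary of the shape shown in the report could be produced by a scan like the one below. This is a simplified sketch under stated assumptions: forward strand only, ATG as the sole start codon, standard stop codons, and 0-based end-exclusive coordinates that include the stop codon. SeqMorph's own coordinate convention and strand handling may differ.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 1) -> list:
    """Find forward-strand ORFs (ATG ... stop) in each of the three frames."""
    s = seq.upper().replace("U", "T")  # scan RNA with the DNA alphabet
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(s) - 2, 3):
            codon = s[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                # Number of codons before the stop, including the start codon.
                if (i - start) // 3 >= min_codons:
                    orfs.append({"start": start, "end": i + 3,
                                 "length": i + 3 - start})
                start = None
    return orfs


def orf_summary(seq: str) -> dict:
    """Return the ORF count and the longest ORF (None when there is none)."""
    orfs = find_orfs(seq)
    longest = max(orfs, key=lambda o: o["length"], default=None)
    return {"count": len(orfs), "longest": longest}
```

For example, `orf_summary("ATGAAATAG")` finds a single ORF covering the whole nine-base sequence.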

Design highlights

  • Store-first sequence model: callers work with a registry; backends implement BaseStore (get/set/insert/delete + invert/dup/translocate).
  • Deterministic mutations: one RNG per engine with an explicit seed ensures reproducibility.
  • Fast analysis: GC%, k-mers, entropy, ORFs/translation (RNA supported) without requiring a window system.
  • Audit-friendly outputs: optional manifest + per-event logs + FASTA & events JSON.
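The first two points can be illustrated with a toy store and engine. This is a hedged sketch, not the project's implementation: the class bodies, method signatures, and event fields are assumptions, the store is a plain string (a ChunkedStore would be needed for fast edits on large inputs), and invert here is a plain reversal, whereas a biological inversion would normally also complement the bases.

```python
import random


class StringStore:
    """Simplest backend: hold the whole sequence in one Python string."""

    def __init__(self, seq: str) -> None:
        self._seq = seq

    def __len__(self) -> int:
        return len(self._seq)

    def get(self, start=0, end=None) -> str:
        return self._seq[start:end]

    def invert(self, start: int, end: int) -> None:
        """Reverse the segment [start, end)."""
        self._seq = self._seq[:start] + self._seq[start:end][::-1] + self._seq[end:]

    def duplicate(self, start: int, end: int) -> None:
        """Insert a copy of [start, end) directly after it."""
        self._seq = self._seq[:end] + self._seq[start:end] + self._seq[end:]

    def translocate(self, start: int, end: int, dest: int) -> None:
        """Cut [start, end) and reinsert it at index `dest` of the remainder."""
        segment = self._seq[start:end]
        rest = self._seq[:start] + self._seq[end:]
        self._seq = rest[:dest] + segment + rest[dest:]


class MutationEngine:
    """One private RNG per engine, so equal seeds yield equal event lists."""

    def __init__(self, seed: int) -> None:
        self._rng = random.Random(seed)

    def run(self, store: StringStore, n_events: int, seg_len: int) -> list:
        events = []
        for _ in range(n_events):
            length = min(seg_len, len(store))
            start = self._rng.randrange(len(store) - length + 1)
            end = start + length
            kind = self._rng.choice(["invert", "duplicate", "translocate"])
            event = {"type": kind, "start": start, "end": end}
            if kind == "invert":
                store.invert(start, end)
            elif kind == "duplicate":
                store.duplicate(start, end)
            else:
                # Destination is an index into the sequence with the segment removed.
                dest = self._rng.randrange(len(store) - length + 1)
                store.translocate(start, end, dest)
                event["dest"] = dest
            events.append(event)
        return events
```

Using a dedicated `random.Random(seed)` instance instead of the module-level functions keeps the engine independent of any other code that draws random numbers, which is what makes reruns with the same seed comparable.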

DIY smoke test

Quick sanity check without extra files.

# macOS/Linux
SEQ=$(python - <<'PY'
import random; random.seed(1)
print(''.join(random.choice('ACGT') for _ in range(20000)))
PY
)
curl -s -X POST http://127.0.0.1:8000/sequence/add -H "Content-Type: application/json" -d "{\"sequence\":\"$SEQ\"}"
curl -s -X POST http://127.0.0.1:8000/mutate-and-analyze -H "Content-Type: application/json" -d '{"accession_id":"seq_1","struct_rate":3.0,"mean_seg_len":200,"start":1,"seed":123,"save_outputs":true}' | python -m json.tool
# Windows PowerShell (no heredocs; use python -c and Invoke-RestMethod)
$seq = python -c "import random; random.seed(1); print(''.join(random.choice('ACGT') for _ in range(20000)))"
$body = @{ sequence = $seq } | ConvertTo-Json
Invoke-RestMethod -Method Post -Uri http://127.0.0.1:8000/sequence/add -ContentType "application/json" -Body $body
$body = @{ accession_id = "seq_1"; struct_rate = 3.0; mean_seg_len = 200; start = 1; seed = 123; save_outputs = $true } | ConvertTo-Json
Invoke-RestMethod -Method Post -Uri http://127.0.0.1:8000/mutate-and-analyze -ContentType "application/json" -Body $body | ConvertTo-Json -Depth 10