Latest visible artifacts

TensionLM / TS-Reasoner v10 and TS Trace Distilled v11.

These are the newest model-line artifacts visible on Hugging Face and verified during this site sync. The page states its claim boundary explicitly: exact eval metrics appear only where the public model cards or metadata expose them.

TensionLM-117M-TS-Reasoner-v10

HF-visible external artifact

A CPU TS reasoner package that keeps the frozen TensionLM-117M-Reasoning-v2 substrate and uses explicit TS graph/program operators. It supports graph/transitivity, arithmetic traces, code traces, boolean logic, set operations, and string transforms.
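"Explicit operators" means that answers come from deterministic code rather than from sampling the frozen model. The minimal Python sketch below illustrates that idea as a category-to-function registry. The category keys, function names and signatures are all hypothetical and are not taken from `ts_reasoner_v10.py`.

```python
from collections import deque

# Hypothetical operator registry: category key -> deterministic function.
# Keys, names, and signatures are illustrative, not from ts_reasoner_v10.py.

def transitive(edges, start, goal):
    """Graph/transitivity: is `goal` reachable from `start` along directed edges?"""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, []).append(b)
    seen, queue = {start}, deque([start])
    while queue:  # breadth-first search over the relation graph
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

def set_op(op, left, right):
    """Set operations: union, intersection, or difference of two sets."""
    ops = {"union": set.union, "intersection": set.intersection,
           "difference": set.difference}
    return ops[op](left, right)

def string_op(op, text):
    """String transforms: reverse or upper-case the input."""
    ops = {"reverse": lambda s: s[::-1], "upper": str.upper}
    return ops[op](text)

OPERATORS = {
    "graph_transitivity": transitive,
    "set_operations": set_op,
    "string_transforms": string_op,
}

# "A > B" and "B > C" entail "A > C"; the answer comes from search, not sampling.
print(OPERATORS["graph_transitivity"]([("A", "B"), ("B", "C")], "A", "C"))
```

A registry like this keeps every answer traceable to a named function, which is what makes the control loop inspectable.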

Receipt: The HF card reports bounded system receipts, including TAC v2/v3/v4 at 120/120 each, public v10 examples at 30/30, and generated-family receipts for standard, paraphrase, unknown, mixed, and all-family mixed prompts.

Limit: The model card says these are system scores over generated formal families. They are not raw LLM scores and not measures of open-ended natural language understanding.


TensionLM-TS-Trace-Distilled-v11

HF-visible external artifact

A compact CPU trace-distilled student trained from v10 trace rows. The artifact includes a student checkpoint, tokenizer, trace-distillation data splits, generation/eval scripts, and held-out imitation eval files.

Receipt: The HF card and API report 1,920 dataset rows (1,632 train / 144 val / 144 test), about 1.1M student parameters, 580 training steps, and a validation perplexity of around 2.45.
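A few lines of arithmetic can check the reported split and perplexity figures. The sketch assumes the standard definition of perplexity: the exponential of the mean per-token negative log-likelihood, in nats. The card's exact token unit is not restated here.

```python
import math

# Figures as reported on the v11 card/API.
TOTAL_ROWS = 1920
SPLITS = {"train": 1632, "val": 144, "test": 144}
VAL_PERPLEXITY = 2.45  # reported as "around 2.45"

def split_fractions(splits, total):
    """Confirm the splits partition the dataset and return each split's share."""
    if sum(splits.values()) != total:
        raise ValueError("splits do not add up to the total row count")
    return {name: count / total for name, count in splits.items()}

def nll_from_perplexity(ppl):
    """Assumes perplexity = exp(mean per-token NLL in nats), so NLL = ln(ppl)."""
    return math.log(ppl)

fractions = split_fractions(SPLITS, TOTAL_ROWS)
nats = nll_from_perplexity(VAL_PERPLEXITY)
bits = nats / math.log(2)
print(fractions)                       # {'train': 0.85, 'val': 0.075, 'test': 0.075}
print(round(nats, 3), round(bits, 3))  # 0.896 1.293
```

On that reading, a validation perplexity near 2.45 corresponds to roughly 0.896 nats (about 1.29 bits) per token. Put another way, the student is about as uncertain as a uniform choice among 2.45 options at each step. Low perplexity on trace text can coexist with zero exact hits, because perplexity averages over every token while exact match needs the whole output to be right.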

Limit: The v11 card reports 0/48 raw exact answer hits and 0/48 raw exact rule hits. It is a neural bridge dataset and checkpoint, not the working reasoner.
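A raw exact-hit receipt such as 0/48 counts outputs that equal the reference string. The minimal sketch below assumes that trimming whitespace is the only normalization. The v11 eval script may normalize differently, and the example strings are hypothetical.

```python
def exact_hits(predictions, references):
    """Count predictions equal to their reference after trimming whitespace.

    Hypothetical scorer: the v11 eval script may apply different normalization.
    Returns a receipt string in the "hits/total" form used on the card.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must align one-to-one")
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return f"{hits}/{len(references)}"

# Illustrative strings only; answers and rules would be scored separately.
print(exact_hits(["true", " 7", "A>C"], ["true", "8", "A>C"]))  # 2/3
```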


How to inspect or run

Use the commands from each model card after cloning or downloading the artifact. v10 exposes a `ts_reasoner_v10.py solve ... --json` path. v11 exposes `inference.py` and `eval_trace_distilled_v11.py`, but its card also states that exact raw generation is not solved yet.

# v10 model-card path
python ts_reasoner_v10.py solve \
  --prompt "Logic board: A=true; B=false. Evaluate A XOR B:" \
  --category boolean_logic --json

# v11 model-card path
python eval_trace_distilled_v11.py \
  --checkpoint student/latest.pt \
  --test_jsonl data/test.jsonl
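In the boolean_logic example above, A=true and B=false, so A XOR B is true. The sketch below is a hypothetical toy evaluator for exactly that prompt shape. It is not the v10 implementation and does not reproduce its `--json` output. It only shows the kind of deterministic computation that an explicit boolean operator performs.

```python
import re

# Hypothetical toy evaluator for prompts shaped like the v10 example:
#   "Logic board: A=true; B=false. Evaluate A XOR B:"
# Not the ts_reasoner_v10.py implementation.

BINARY_OPS = {
    "AND": lambda a, b: a and b,
    "OR": lambda a, b: a or b,
    "XOR": lambda a, b: a != b,
}

def solve_boolean(prompt: str) -> bool:
    # Parse single-letter bindings such as "A=true" or "B=false".
    env = {name: value == "true"
           for name, value in re.findall(r"\b([A-Z])=(true|false)\b", prompt)}
    # Parse one binary expression such as "A XOR B".
    match = re.search(r"Evaluate ([A-Z]) (AND|OR|XOR) ([A-Z])", prompt)
    if match is None:
        raise ValueError("unsupported prompt shape")
    left, op, right = match.groups()
    return BINARY_OPS[op](env[left], env[right])

print(solve_boolean("Logic board: A=true; B=false. Evaluate A XOR B:"))  # True
```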

What this model line is trying to test

The line tests whether TS-Reasoner traces can become a training signal for tension-aware models while keeping the verifier/control loop inspectable. v10 is the bounded system/control artifact; v11 is the first compact student trained on v10-style traces.

How it connects to TS-Reasoner traces

TS-Reasoner defines the trace contract: candidate chains, local tension, global tension, selected action, rejected alternatives, repairs, settled answer, and failure reason. TensionLM artifacts can propose or imitate trace text, but TS-Reasoner remains the verifier path for claims.
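One way to picture the trace contract is as a single record per solved prompt. The minimal sketch below uses Python dataclasses. The field names mirror the list above, but several things are assumptions rather than the published TS-Reasoner schema: the field types, the `check` invariants (for example, that a trace ends in exactly one of a settled answer or a failure reason), and the example values.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional

@dataclass
class TSTrace:
    """Hypothetical record for the trace contract; field types are assumptions."""
    candidate_chains: List[List[str]]
    local_tension: List[float]          # assumed: one score per candidate chain
    global_tension: float
    selected_action: str
    rejected_alternatives: List[str] = field(default_factory=list)
    repairs: List[str] = field(default_factory=list)
    settled_answer: Optional[str] = None
    failure_reason: Optional[str] = None

    def check(self) -> List[str]:
        """Return contract violations; an empty list means the record passes."""
        problems = []
        if len(self.local_tension) != len(self.candidate_chains):
            problems.append("need one local tension score per candidate chain")
        # Assumed invariant: a trace either settles or fails, never both or neither.
        if (self.settled_answer is None) == (self.failure_reason is None):
            problems.append("set exactly one of settled_answer / failure_reason")
        if self.selected_action in self.rejected_alternatives:
            problems.append("selected action is also listed as rejected")
        return problems

# Illustrative values only.
trace = TSTrace(
    candidate_chains=[["A>B", "B>C", "A>C"], ["C>A"]],
    local_tension=[0.1, 0.9],
    global_tension=0.1,
    selected_action="chain_0",
    rejected_alternatives=["chain_1"],
    settled_answer="A>C",
)
row = json.dumps(asdict(trace))   # one JSON line per trace
print(trace.check())              # []
```

Serializing each record as one JSON line, as in `row`, is one plausible way to turn verifier traces into distillation rows. The actual v11 data format is documented on its card.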

Receipts and limitations

The visible receipts are narrow. v10 receipts are generated-family system receipts. v11 receipts show dataset/training/eval shape and a failed raw exact-generation result. Neither artifact proves broad reasoning, general chat ability, or transformer superiority.

Next technical step

The next useful step is to keep v10 as the bounded verifier baseline, rerun v11-style distillation with clearer held-out imitation metrics, and publish trace-level receipts that show when a learned proposer improves, fails, or abstains under the verifier.
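One way to produce the improves/fails/abstains receipts described above is to pass every proposal through the verifier and tally the outcome per prompt. In this hypothetical sketch, `verified`, `rejected`, and `abstained` serve as rough stand-ins for those three outcomes. The toy arithmetic proposer and verifier are illustrative placeholders, not v10 or v11 code.

```python
from collections import Counter
from typing import Callable, Iterable, Optional

Proposer = Callable[[str], Optional[str]]  # returns None to abstain
Verifier = Callable[[str, str], bool]      # trusted check of (prompt, answer)

def score_proposer(prompts: Iterable[str], propose: Proposer, verify: Verifier) -> Counter:
    """Tally per-prompt outcomes for a learned proposer gated by a verifier."""
    tally = Counter()
    for prompt in prompts:
        answer = propose(prompt)
        if answer is None:
            tally["abstained"] += 1   # proposer declined; verifier path runs alone
        elif verify(prompt, answer):
            tally["verified"] += 1    # proposal accepted by the verifier
        else:
            tally["rejected"] += 1    # proposal failed verification
    return tally

# Toy stand-ins for illustration only (not v10 or v11 code).
def toy_verify(prompt: str, answer: str) -> bool:
    left, right = prompt.split("+")
    return int(left) + int(right) == int(answer)

def toy_propose(prompt: str) -> Optional[str]:
    left, right = prompt.split("+")
    if len(left) > 2:
        return None                    # abstain on longer operands
    bug = 1 if right == "9" else 0     # deliberate error to exercise rejection
    return str(int(left) + int(right) + bug)

print(score_proposer(["2+3", "4+9", "123+1"], toy_propose, toy_verify))
```

In this design, the verifier is the only path that can accept an answer. A weak proposer can therefore waste effort, but it cannot change a claimed result.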