AI Security Lab
A guided lab to peer-review AI training datasets.
You’ll pick a dataset release (TUUID), fetch 2–5 samples, compare “what the AI saw”, and export a peer review report you can share or archive.
What am I looking at?
Each sample record includes normal metadata + a field named decrypted_training_data.
That training data is a structured set of features (numbers/flags/fields) used by the model.
This lab helps you verify: (1) schema consistency, (2) sane numeric ranges, (3) label plausibility, (4) duplicates/outliers, and (5) provenance fields.
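Checks (1) and (2) above can be automated once samples are fetched. The sketch below assumes each sample is a dict whose `decrypted_training_data` field holds a flat feature mapping; the `entropy` feature name and the exact record shape are assumptions, not the lab's actual schema.

```python
def check_samples(samples):
    """Return a list of human-readable findings for a batch of samples.

    Checks: (1) schema consistency across samples, (2) sane numeric ranges.
    """
    findings = []

    # (1) Schema consistency: every sample should expose the same feature keys.
    key_sets = [set(s["decrypted_training_data"]) for s in samples]
    all_keys = set().union(*key_sets)
    for i, keys in enumerate(key_sets):
        missing = all_keys - keys
        if missing:
            findings.append(f"sample {i}: missing keys {sorted(missing)}")

    # (2) Sane numeric ranges: byte entropy is typically 0-8.
    for i, s in enumerate(samples):
        entropy = s["decrypted_training_data"].get("entropy")
        if entropy is not None and not (0 <= entropy <= 8):
            findings.append(f"sample {i}: entropy {entropy} outside 0-8")

    return findings
```

An empty findings list means the batch passed these two checks; anything returned is a candidate line for the report's Notes column.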
What is a TUUID?
A TUUID is a dataset release ID. Each release contains a list of sample hashes.
You pick a release, then you pick a few sample hashes from it to review.
Your goal is to sanity-check what’s inside: consistent schema, plausible values, and label/provenance quality.
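The release-to-samples flow above can be sketched as a small helper. The release structure here (a dict with a `tuuid` and its `sample_hashes`) is an assumption for illustration, not the lab's actual data model.

```python
import random

def pick_hashes(release, n=2, seed=None):
    """Pick n distinct sample hashes from a release for review.

    A seed makes the pick reproducible, which is handy when a report
    needs to say exactly which samples were reviewed.
    """
    hashes = release["sample_hashes"]  # assumed field name
    rng = random.Random(seed)
    return rng.sample(hashes, k=min(n, len(hashes)))
```

With `seed` fixed, re-running the lab selects the same samples, so a reviewer can reproduce an earlier report.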
How to choose samples
Easy mode: click Random 2 and move on.
Better: after you fetch and can see values, pick one “normal-looking” and one “odd-looking” sample.
Finding: if multiple hashes map to the same CID later, that’s worth noting in the report.
ipfs/find → ipfs/search.
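The duplicate finding mentioned above is mechanical to check once you have a hash-to-CID mapping. This is a sketch, assuming you have already resolved each sample hash to its CID:

```python
from collections import defaultdict

def find_duplicate_cids(hash_to_cid):
    """Map each CID that appears more than once to the hashes sharing it.

    Two different sample hashes resolving to the same CID means the
    underlying content is identical -- worth a line in the report.
    """
    by_cid = defaultdict(list)
    for sample_hash, cid in hash_to_cid.items():
        by_cid[cid].append(sample_hash)
    return {cid: hashes for cid, hashes in by_cid.items() if len(hashes) > 1}
```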
What should I watch for?
After fetch, you’ll review:
- Missing fields / inconsistent schema
- Outliers (entropy, counts, sizes)
- Verdict vs capabilities mismatch
- Duplicates (same CID across hashes)
| SHA-256 | CID | Verdict | Model | Capabilities | Notes |
|---|---|---|---|---|---|
Select and compare two samples.
| Feature | Left | Right | Δ | Why it matters |
|---|---|---|---|---|
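The Δ column of the comparison table can be filled programmatically. A minimal sketch, assuming the left and right samples' `decrypted_training_data` are flat dicts; any feature names are whatever the release actually contains:

```python
def feature_deltas(left, right):
    """Return {feature: right - left} for keys numeric in both samples.

    Non-numeric features (labels, strings) are skipped; boolean flags
    count as 0/1, so a flipped flag shows up as a delta of +/-1.
    """
    deltas = {}
    for key in set(left) & set(right):
        l, r = left[key], right[key]
        if isinstance(l, (int, float)) and isinstance(r, (int, float)):
            deltas[key] = r - l
    return deltas
```

Large deltas point at the “odd-looking” features to discuss in the Why it matters column.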
- Feature keys consistent across samples? Missing keys can bias a model.
- Numeric ranges sane? (entropy typically 0–8-ish, counts not absurdly high)
- Verdict plausible vs capabilities?
- Duplicates/near-duplicates (same CID / repeated hash)?
- Provenance present (license, upload timestamp, model version, tx where relevant)?