Dataset

About dataset releases

Browse new open‑source dataset releases for training and fine‑tuning local AI. This archive covers curated collections, benchmark data, and community‑built datasets.

Latest datasets

June 28, 2026

Glint-Research Archives AI Thought Paths In Fable-5-traces

By vramkickedin

Fable-5-traces is a newly released dataset that captures the behavior of the Fable 5 coding agent. This collection converts original agent logs into a format that allows users to inspect […]

April 30, 2026

Tstars-VTON Surfaces To Elevate Realistic Virtual Outfit Testing

By vramkickedin

Tstars-VTON is an open evaluation dataset designed to test virtual try-on models under realistic shopping conditions. It contains 1,780 image pairs covering layered clothing, footwear, and accessories across dozens of […]

April 24, 2026

BCE-Prettybird-Nano-Math-v0.1 Sharpens Logic Skills

By vramkickedin

BCE-Prettybird-Nano-Math-v0.1 is a structured collection of 500 math problems and answers built to train language models on numerical reasoning. The dataset links specific prompts with input values and expected results, […]

April 7, 2026

World Model Bench Tests if AI Can Think, Not Just See

By vramkickedin

World Model Bench is a new benchmark that tests whether AI world models can think about a scene rather than just generate smooth video. It measures cognitive intelligence through […]

March 26, 2026

Michael Hafftka Catalog Raisonné Chronicles 50 Years of Art

By vramkickedin

The Michael Hafftka Catalog Raisonné is a new open dataset containing approximately 3,800 artworks by a single artist spanning five decades. The collection covers work from the 1970s through 2025 […]

March 17, 2026

MoonshotAI WorldVQA Tests AI Memory

By vramkickedin

WorldVQA is a new benchmark designed to test how well AI models can identify and name visual objects from memory. Created by MoonshotAI, it measures factual visual knowledge rather than […]

March 15, 2026

Nyuuzyou Preserves Google Code Archive

By vramkickedin

The Google Code Archive is a massive dataset that preserves source code from the defunct Google Code hosting service. It contains over 65 million files gathered from nearly 500,000 repositories, […]