Dataset

About dataset releases

Browse new open‑source dataset releases for training and fine‑tuning local AI models. This archive covers curated collections, benchmark data, and community‑built datasets.

Latest datasets

April 30, 2026
Tstars-VTON Surfaces To Elevate Realistic Virtual Outfit Testing

Tstars-VTON is an open evaluation dataset designed to test virtual try-on models under realistic shopping conditions. It contains 1,780 image pairs covering layered clothing, footwear, and accessories across dozens of […]

April 24, 2026
BCE-Prettybird-Nano-Math-v0.1 Sharpens Logic Skills

BCE-Prettybird-Nano-Math-v0.1 is a structured collection of 500 math problems and answers built to train language models on numerical reasoning. The dataset links specific prompts with input values and expected results, […]

April 7, 2026
World Model Bench Tests if AI Can Think Not Just See

World Model Bench is a new benchmark that tests whether AI world models can actually think about a scene rather than just generate smooth video. It measures cognitive intelligence through […]

March 26, 2026
Michael Hafftka Catalog Raisonné Chronicles 50 Years of Art

The Michael Hafftka Catalog Raisonné is a new open dataset containing approximately 3,800 artworks by a single artist spanning five decades. The collection covers work from the 1970s through 2025 […]

March 17, 2026
MoonshotAI WorldVQA Tests AI Memory

WorldVQA is a new benchmark designed to test how well AI models can identify and name visual objects from memory. Created by MoonshotAI, it measures factual visual knowledge rather than […]

March 15, 2026
Nyuuzyou Preserves Google Code Archive

The Google Code Archive is a dataset that preserves source code from the defunct Google Code hosting service. It contains over 65 million files gathered from nearly 500,000 repositories, […]
