Traffic-IT Dataset

Enhancing Multimodal Large Language Models for Traffic Scene Understanding

Introduction

Traffic-IT is a dataset designed to improve how multimodal large language models (MLLMs) understand complex traffic scenes. Spanning diverse traffic scenarios, it aims to support research and development in intelligent transportation systems, autonomous driving, and smart city applications.



Dataset Statistics


230,000 question-answer pairs across 30,000 images

Covers 15 weather and location categories, including sunny, rainy, snowy, and foggy conditions

220,950 scene-specific annotations covering scenarios such as main roads, city streets, and rural areas

30,000 high-quality images spanning different times of day, environments, and locations

More than 1,800 hours of expert validation to ensure data accuracy and relevance

Global scope: images from Beijing, Chengdu, Melbourne, and other locations
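The statistics above can be made concrete with a small sketch of how annotations might be tallied. The `QARecord` fields (`image_id`, `weather`, `scene`, `question`, `answer`) and the `summarize` helper are hypothetical; this page does not describe Traffic-IT's actual file format. The last line checks the published ratio: 230,000 pairs over 30,000 images averages roughly 7.7 questions per image.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class QARecord:
    """Hypothetical shape of one Traffic-IT annotation; field names are illustrative."""
    image_id: str
    weather: str   # e.g. "sunny", "rainy", "snowy", "foggy"
    scene: str     # e.g. "main road", "city street", "rural area"
    question: str
    answer: str


def summarize(records: List[QARecord]) -> Dict[str, object]:
    """Count QA pairs, distinct images, average pairs per image, and pairs per weather label."""
    images = {r.image_id for r in records}
    return {
        "qa_pairs": len(records),
        "images": len(images),
        "qa_per_image": len(records) / len(images) if images else 0.0,
        "by_weather": Counter(r.weather for r in records),
    }


# Sanity check against the published totals: 230,000 pairs / 30,000 images.
print(round(230_000 / 30_000, 2))  # → 7.67
```

Grouping by `scene` instead of `weather` works the same way if scene-level coverage is the question of interest.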





Data Collection and Annotation



Traffic-IT was developed using a three-step process:


Image Collection: Sourced from dashcam footage, open-source datasets, and manual photography across diverse conditions and locations.

Question Design and AI Answering: Experts developed thirty traffic-specific questions covering real-world concerns such as weather impacts and traffic maneuvers, and GPT-4 generated the initial answers.

Expert Validation: Each answer was reviewed and refined by traffic domain practitioners to ensure relevance and accuracy.
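Steps 2 and 3 of the process above can be sketched as a small pipeline, assuming image collection has already produced a set of image IDs. Everything here is illustrative: `draft_answers`, `expert_review`, and the `answer_fn` callback (which stands in for the GPT-4 call) are hypothetical names, not part of any released Traffic-IT tooling. Because 230,000 pairs over 30,000 images means each image receives only a subset of the thirty questions on average, the sketch takes an explicit per-image question plan rather than pairing every image with every question; how questions were actually assigned is not stated here.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Draft:
    """One question-answer pair for an image (hypothetical structure)."""
    image_id: str
    question: str
    answer: str
    reviewed: bool = False


def draft_answers(
    plan: Dict[str, List[str]],
    answer_fn: Callable[[str, str], str],
) -> List[Draft]:
    """Step 2: draft an answer for each (image, question) pair in the plan.

    `answer_fn(image_id, question)` is a hypothetical stand-in for the GPT-4 call.
    """
    return [
        Draft(image_id, question, answer_fn(image_id, question))
        for image_id, questions in plan.items()
        for question in questions
    ]


def expert_review(
    drafts: List[Draft],
    corrections: Dict[Tuple[str, str], Optional[str]],
) -> List[Draft]:
    """Step 3: apply reviewer decisions keyed by (image_id, question).

    A string replaces the drafted answer, None rejects the pair, and pairs
    with no entry are accepted as drafted.
    """
    reviewed = []
    for draft in drafts:
        key = (draft.image_id, draft.question)
        if key in corrections:
            fix = corrections[key]
            if fix is None:
                continue  # reviewer rejected this pair
            draft = replace(draft, answer=fix)
        reviewed.append(replace(draft, reviewed=True))
    return reviewed


# Usage with a stub in place of the model call:
plan = {"img_001": ["Is the road wet?", "Can the ego vehicle turn left?"]}
drafts = draft_answers(plan, lambda image_id, q: f"(draft) {q}")
final = expert_review(drafts, {("img_001", "Is the road wet?"): "Yes, after recent rain."})
print([d.answer for d in final])
```

Keeping the model call behind a plain callable makes it easy to substitute a stub for testing or a different model later, while the review step records every surviving pair as checked.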




Sample Images and Annotations


Below are examples from the Traffic-IT dataset, illustrating typical traffic scenes with corresponding Q&A pairs that reflect real-world traffic situations.




