Most training datasets used to develop self-driving car systems focus on everyday road users such as ordinary cars, pedestrians, and bicycles. This common approach, however, often leaves out important but less frequently seen vehicles such as ambulances and police cars. A newly released computer-generated dataset, named EMS3D-KITTI, aims to close this gap by offering a well-balanced collection of scenes that include emergency medical vehicles. The dataset was created by researchers led by Dr. Chandra Jaiswal at North Carolina Agricultural and Technical State University, and their work is published in the journal Data in Brief.

To build the dataset, Dr. Jaiswal’s team used a virtual driving platform called Car Learning to Act (CARLA), a realistic simulation environment for training and testing self-driving systems. This tool allowed them to simulate realistic traffic situations involving ambulances and police cars alongside other road users. They equipped several digital test vehicles with cameras and laser sensors, known as Light Detection and Ranging or LiDAR, which measure distance using light to create detailed 3D maps of the surroundings. These vehicles recorded scenes across different town layouts, which included a variety of conditions, such as changing weather and unpredictable vehicle movements, to mirror real-life driving as closely as possible. All the captured data was then organized in the KITTI format, a widely accepted structure designed by the Karlsruhe Institute of Technology and Toyota Technological Institute and used throughout autonomous vehicle research to store and process visual and spatial data.

Using this carefully planned method, the team recorded many different types of objects on the road. Emergency medical vehicles made up about a quarter of the total, which is a much higher share than in most existing datasets. “This dataset addresses a significant gap in most publicly available computer vision datasets by overcoming the challenge of limited data for rare objects,” Dr. Jaiswal explained.

The virtual ambulances and police cars were placed randomly in different parts of the simulated towns. This setup allowed the camera-equipped test vehicles, often referred to as ego vehicles, meaning the main vehicles from which data is captured, to encounter them from many angles and in many situations. The team also kept the dataset varied by saving only selected frames, which helped ensure it covered a wide range of driving scenarios. “To achieve a balanced presence of emergency medical vehicles in the dataset, we implemented a strategy within Car Learning to Act that increased the frequency of emergency medical vehicles in each scenario,” Dr. Jaiswal said.
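The idea of up-weighting rare classes at spawn time can be sketched in plain Python. The following is a minimal illustration, not the authors' actual CARLA code: the vehicle-type names and the `emv_boost` parameter are assumptions made for this example, and a real simulator would draw from its own blueprint library.

```python
import random

# Hypothetical actor pool; real CARLA blueprint IDs are version-specific.
VEHICLE_TYPES = ["car", "truck", "motorcycle", "bicycle",
                 "ambulance", "police_car"]
EMV_TYPES = {"ambulance", "police_car"}

def choose_spawn_types(n, emv_boost=3.0, seed=None):
    """Pick n actor types for a scenario, up-weighting emergency
    vehicles so they appear far more often than a uniform draw."""
    rng = random.Random(seed)
    weights = [emv_boost if t in EMV_TYPES else 1.0 for t in VEHICLE_TYPES]
    return rng.choices(VEHICLE_TYPES, weights=weights, k=n)

# With emv_boost=3 the two EMV classes get weight 3 each against
# weight 1 for the four common classes, so EMVs dominate the draw.
types = choose_spawn_types(1000, emv_boost=3.0, seed=42)
emv_share = sum(t in EMV_TYPES for t in types) / len(types)
```

Tuning `emv_boost` down (or adding more common classes) would move the emergency-vehicle share toward the roughly one-quarter proportion reported for EMS3D-KITTI.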

The format used to organize this dataset makes it easy for researchers to work with. Each recorded frame includes a color image; a laser-based 3D scan known as a point cloud, which records the exact position of surfaces in three-dimensional space; a file describing the camera settings, called a calibration file; and a list of detected objects with their size, location, and direction. These details help train computer systems to accurately recognize and track different types of vehicles and people on the road. Key attributes are also included for each object: truncation (how much of it extends beyond the image edge), occlusion (how much of it is blocked by other objects), and orientation angles (the direction it is facing).
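Because the dataset follows the standard KITTI layout, each object occupies one line of a plain-text label file with a fixed field order. The sketch below parses such a line; the field order is the standard KITTI one, while the "Ambulance" class name in the example line and its numeric values are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class KittiObject:
    """One object from a KITTI-format label file (one object per line)."""
    type: str          # class name, e.g. "Car", "Pedestrian"
    truncated: float   # 0.0-1.0, fraction of the object leaving the image
    occluded: int      # 0 = fully visible, 1 = partly, 2 = largely occluded
    alpha: float       # observation angle of the object, in radians
    bbox: tuple        # 2D box in the image: (left, top, right, bottom)
    dimensions: tuple  # 3D size in metres: (height, width, length)
    location: tuple    # 3D position in camera coordinates: (x, y, z)
    rotation_y: float  # heading angle around the camera's Y axis, radians

def parse_label_line(line: str) -> KittiObject:
    f = line.split()
    return KittiObject(
        type=f[0],
        truncated=float(f[1]),
        occluded=int(float(f[2])),
        alpha=float(f[3]),
        bbox=tuple(map(float, f[4:8])),
        dimensions=tuple(map(float, f[8:11])),
        location=tuple(map(float, f[11:14])),
        rotation_y=float(f[14]),
    )

# Example label line in standard KITTI field order (values invented):
obj = parse_label_line(
    "Ambulance 0.00 1 -1.57 300.0 150.0 420.0 260.0 "
    "2.1 2.0 5.5 1.2 1.6 12.4 -1.60"
)
```

Each parsed record then carries exactly the attributes the article describes: class, truncation, occlusion level, 2D and 3D geometry, and orientation.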

To test the quality of their dataset, the researchers ran their simulations in a number of different virtual towns. These towns represented a mix of environments, from quiet rural areas to busy city streets. This variety helps ensure that the data reflects many types of real-world roads. The end result is a rich training tool that helps improve how well self-driving systems perform across different settings.

One interesting part of the dataset is how it labels the direction from which each emergency vehicle is seen, whether from the front, side, or back. This gives computer models more experience recognizing vehicles from multiple viewpoints, making the systems better at spotting them in different traffic conditions. Emergency vehicles also appeared consistently across the recorded scenes, giving the models more opportunities to learn from them.

Even though the dataset is based on simulations, the creators aimed to make it as realistic as possible. They also highlight that using virtual data has some limits, especially when compared to real-world images. To address this, they recommend further testing to confirm that models trained with this dataset work well in actual traffic. Still, the dataset is a step forward in helping automated driving systems better identify and respond to emergency vehicles, which is essential for safe and effective road navigation.

In conclusion, the EMS3D-KITTI dataset adds something important to the tools currently available for training self-driving cars. By focusing on emergency vehicle recognition, it supports the development of smarter, more responsive systems. As work continues to advance automated driving, resources like this dataset will become even more valuable.

Journal Reference

Jaiswal C., Acquaah S., Nenebi C., AlHmoud I., Islam A.K.M., Gokaraju B., “EMS3D-KITTI: Synthetic 3D dataset in KITTI format with a fair distribution of Emergency Medical Services vehicles for autodrive AI model training.” Data in Brief, 2025. DOI: https://doi.org/10.1016/j.dib.2024.111221

About the Author

Dr. Chandra Jaiswal holds a bachelor’s degree in computer science and engineering, an MBA, and a PhD in AI and Data Science from North Carolina Agricultural and Technical State University, Greensboro, USA. With over 18 years of experience in supply chain management, he is a seasoned Distribution System Analyst who excels in integrating advanced technologies such as AI, Computer Vision, and Robotics to optimize supply chain operations. His contributions to robotics have also added significant value to Autonomous, Augmented Reality (AR), and Virtual Reality (VR) systems, showcasing his ability to bridge cutting-edge innovations with practical applications. Chandra’s leadership and expertise have modernized supply chain processes, enhanced operational efficiency, and positioned him as a forward-thinking innovator in supply chain and autonomous systems.