Capturing Input Data
gsplat supports a variety of input formats using the data processing mechanisms provided by Nerfstudio. This tutorial uses a set of individually captured images as input.
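For reference, a folder of still images can be turned into a Nerfstudio-compatible dataset with the ns-process-data command; the paths below are placeholders to adapt to your own capture:

```shell
# Run COLMAP-based Structure-from-Motion on a folder of still images
# and write a Nerfstudio-style dataset (camera poses + processed images).
ns-process-data images \
  --data ./captures/plush-toy \
  --output-dir ./datasets/plush-toy
```

The resulting output directory can then be used as input for training.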
Data capture practices may vary depending on the scene type. Outdoor environments, indoor spaces, human subjects, and small objects each benefit from different techniques. However, the following general guidelines apply in most scenarios and help improve the quality of 3D Gaussian Splatting results.
General guidelines
Use consistent, soft lighting
Illuminate the scene with even, diffuse light to reduce shadows and reflections. Avoid harsh directional light, which can create noise and degrade reconstruction quality.
Capture high-quality still images
Still photos typically produce better results than video frames due to reduced motion blur and more stable exposure. If using video, record at a high frame rate (at least 60 FPS) under steady lighting.
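If video is the only option, individual frames can be extracted with ffmpeg before SfM processing; the sampling rate and file names below are assumptions to adjust for your footage:

```shell
# Sample 2 frames per second from the clip and save them as
# high-quality JPEGs suitable for feature matching.
mkdir -p frames
ffmpeg -i capture.mp4 -vf fps=2 -qscale:v 2 frames/frame_%04d.jpg
```

Nerfstudio's ns-process-data also offers a video mode that handles frame extraction automatically.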
Maintain sufficient image overlap
Each image should overlap with adjacent views by about 20 to 30 percent. Overlap is essential for feature matching in Structure-from-Motion (SfM). Video often provides this overlap automatically.
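As a rough planning aid (not part of the gsplat pipeline), the overlap guideline can be converted into a shot count for one circular sweep around an object; the function name and example numbers are illustrative:

```python
import math

def shots_for_circular_sweep(horizontal_fov_deg: float,
                             overlap_fraction: float) -> int:
    """Estimate how many evenly spaced photos are needed for one
    360-degree sweep so that adjacent views overlap by the given
    fraction of the horizontal field of view."""
    # Each new shot advances by the non-overlapping part of the FOV.
    step_deg = horizontal_fov_deg * (1.0 - overlap_fraction)
    return math.ceil(360.0 / step_deg)

# Example: a ~54 degree FOV lens with 30% overlap between neighbors.
print(shots_for_circular_sweep(54.0, 0.30))  # 10 shots per ring
```

A full capture typically uses several such rings at different heights, so the total image count is a multiple of this per-ring estimate.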
Ensure parallax by moving the camera
Change the camera’s physical position between shots. Avoid rotating in place without translation, as this limits geometric information and may cause pose estimation to fail.
Keep camera settings fixed
Use manual focus and fixed focal length to keep camera intrinsics consistent. Disable zoom, autofocus, and auto exposure.
Prevent motion blur and distortion
Use a tripod or stabilizer. Avoid low-light conditions that require long exposures. Check that images are sharp and properly focused.
Include texture-rich surfaces
Scenes should have enough texture, edges, or detail to enable reliable feature detection. Flat, uniform, or glossy surfaces may reduce SfM effectiveness.
Keep the scene static
Do not allow movement during capture. Objects, people, or lighting changes can interfere with pose estimation.
Choose an appropriate image resolution
Higher resolutions improve detail but increase memory usage and training time. Choose a resolution that balances quality and performance.
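When downscaling images to manage memory and training time, the aspect ratio should be preserved. The helper below is an illustrative sketch of that calculation only; data-processing tools such as Nerfstudio's can generate downscaled copies for you:

```python
def downscaled_size(width: int, height: int, max_side: int) -> tuple[int, int]:
    """Return (width, height) scaled so the longer side equals
    max_side, preserving the aspect ratio. Illustrative helper only."""
    if max(width, height) <= max_side:
        return width, height  # already small enough, leave unchanged
    scale = max_side / max(width, height)
    return round(width * scale), round(height * scale)

# Example: shrinking a 3000 x 2000 capture for faster training.
print(downscaled_size(3000, 2000, 1600))  # (1600, 1067)
```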
Capture enough views
For small objects or scenes, 100 to 250 well-aligned images are a good starting point. Larger or full 360-degree models require more coverage. Make sure areas like the underside of objects are not missed.
Use multi-camera rigs when helpful
A multi-camera setup can speed up capture by recording multiple views at once. This is useful for dynamic scenes or time-limited sessions.
About the sample data
This tutorial uses 84 input images at 3000 × 2000 resolution, captured with a Canon M200 camera mounted on a tripod. The plush toy was placed on a chair covered with two stapled 50 × 70 cm sheets of white paper, forming a curved background. An extra sheet underneath allowed for easy rotation of the object during capture, while the camera remained fixed for each donut-shaped sweep.
Photos were taken indoors using only ambient light, so shadows and background elements remain visible. This was intentional: it simulates real-world conditions and shows that good results are still possible without ideal lighting or a studio setup. Lighting was uneven and no per-image post-processing was performed, which also demonstrates how tools like SuperSplat can help correct brightness and color after training. The goal is to encourage hands-on learning, even with imperfect data.
Note: You can achieve similar results using a modern smartphone, as long as the lighting is adequate and the camera is held steadily.
Note: The plush toy is personally owned and is used here strictly for technical demonstration.

Selected input images used for 3D reconstruction.

Capture setup showing the plush toy, tripod-mounted camera, and curved paper background on a chair (left); live preview of the object on the camera screen during capture (right).
Observation: Gaussian Splatting and fine details
The plush toy was chosen to demonstrate one of the key strengths of 3D Gaussian Splatting: its ability to capture soft, detailed surfaces. Traditional mesh-based pipelines often struggle with materials like fur, resulting in unrealistic or incomplete geometry. In contrast, representing the scene as a set of 3D Gaussians helps preserve fine textures and subtle variations more effectively.
© 2025 SmartDataScan.
This section is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.