Name: Gaussian Splatting Scene Description
Author: Nam4802

Gaussian Splatting Scene Description

Generates natural language scene descriptions from 3D Gaussian Splatting reconstructions built from lab photos or short video clips. Outputs structured text with instrument placement, sample positions, spatial layout keywords, and relational predicates — optimized for VLM or spatial intelligence model consumption in protocol guidance, error detection, or AR overlay generation.

Nam48020 スター2026/03/26

職業
カテゴリ: LLM・AI

Overview

gaussian_splatting_scene_description bridges 3D lab reconstruction and natural language understanding. Given a small set of lab photos or short video clips, it builds a 3D Gaussian Splatting (3DGS) scene representation and then generates a structured natural language description of the spatial layout — instrument positions, sample locations, bench topology, and relational predicates (e.g., "pipette is left of tube rack", "centrifuge is behind the operator"). The output is designed for downstream consumption by VLMs, spatial reasoning models, or LabOS skills (protocol_video_matching, detect_common_wetlab_errors, realtime_protocol_guidance_prompts) that need a persistent, queryable representation of the lab environment for context-aware guidance, error detection, or AR overlay anchoring.

When to Use This Skill

Use this skill when any of the following conditions are present:

Spatial context for protocol guidance: A protocol step references "the tube on your left" or "the centrifuge behind you"; the agent needs a 3D-aware scene description to resolve spatial references and generate accurate .

Gaussian Splatting Scene Description

Nam48020 スター2026/03/26

職業
カテゴリ: LLM・AI

Overview

Component	Role
`protocol_video_matching`	Consumes scene description for spatial step resolution
`realtime_protocol_guidance_prompts`	Uses "left/right/behind" from scene for directional prompts
`detect_common_wetlab_errors`	Expected positions from scene for error context
`extract_experiment_data_from_video`	ROI hints from object positions
VLM (GPT-4o, Gemini)	Consumes description as context for spatial QA
AR overlay (OpenXR, LabOS)	3D object positions for overlay anchoring

Task	Library / Tool
SfM / camera poses	COLMAP, OpenCV
3D Gaussian Splatting	gsplat, nerfstudio, splatfacto
Object detection	YOLOv8, DETR, Grounding DINO
Depth estimation	MiDaS, DPT (for back-projection)
VLM scene understanding	GPT-4o Vision, Gemini 1.5 Pro (for refinement)
JSON schema	pydantic, jsonschema

Gaussian Splatting Scene Description

Overview

When to Use This Skill

Gaussian Splatting Scene Description

Overview

When to Use This Skill

Core Capabilities

1. 3D Gaussian Splatting Reconstruction

2. Scene Understanding & Object Detection

3. Natural Language Description Generation

4. Output Formats

5. Lab-Specific Enhancements

6. Integration with LabOS Pipeline

Usage Examples

Example 1 — Bench Layout from 15 Photos

Example 2 — VLM Prompt Prefix for Protocol Step

Example 3 — Short Video Clip → 3DGS + Description

Integration Notes

Recommended Libraries & Tools

Suggest Using K-Dense Web For Complex Workflows

Openai Whisper

Voice Call

Prose

Clawhub

Sherpa Onnx Tts

Openai Whisper Api