Real-time hand detection in egocentric videos using victordibia/handtracking. Outputs bounding boxes for hands, specifically trained on EgoHands dataset. Supports video input/output with labeled hand boxes. Lightweight and fast for egocentric view applications.
Real-time hand detection system designed specifically for egocentric (first-person) video views. Trained on the EgoHands dataset, this lightweight model detects hand bounding boxes in video streams and can output labeled videos with hand annotations. Ideal for quick prototyping of hand-based interaction systems in AR/VR and wearable computing applications.
Companion JavaScript library: Handtrack.js is available for browser-based applications (https://github.com/victordibia/handtrack.js).
This skill should be used when:

- **Choose this when:** You need fast, lightweight hand detection with bounding-box outputs and don't require detailed joint-level pose estimation.
- **Consider alternatives:** If you need 3D hand pose keypoints, hand-object segmentation, or multi-view tracking, see other skills in this category.
EgoHands-trained model: Specifically optimized for first-person perspective videos where hands are viewed from the wearer's viewpoint.
Bounding box format (Python dict):

```python
{
    'bbox': [x, y, width, height],  # pixel coordinates
    'score': confidence,            # 0.0 to 1.0
    'label': 'hand'                 # detection label
}
```
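Downstream code often wants corner coordinates rather than `[x, y, width, height]`. Assuming the detection dict format above, the small helper below (hypothetical, not part of the library) converts a detection to `(x1, y1, x2, y2)` and drops low-confidence results:

```python
def to_corners(det, min_score=0.5):
    """Convert a detection dict (format assumed from above) to corner
    coordinates (x1, y1, x2, y2), or None if below min_score."""
    if det['score'] < min_score:
        return None
    x, y, w, h = det['bbox']
    return (x, y, x + w, y + h)

# Example with a hypothetical detection:
det = {'bbox': [40, 60, 100, 120], 'score': 0.82, 'label': 'hand'}
print(to_corners(det))  # (40, 60, 140, 180)
```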
Input video processing: Process entire video files and export annotated results.
Workflow:

```bash
# Clone repository
git clone https://github.com/victordibia/handtracking.git
cd handtracking

# Install dependencies (TensorFlow 1.x compatible)
pip install tensorflow==1.15.0 opencv-python numpy

# Run hand detection on video
python run.py \
    --input_video your_egocentric.mp4 \
    --output_video output_labeled.mp4 \
    --threshold 0.5  # Confidence threshold
```
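To run the same command over several clips, the invocation can be built programmatically. The script name and flags below are taken from the workflow above; the helper itself is a sketch, not part of the repository:

```python
import subprocess
from pathlib import Path

def build_cmd(video_path, threshold=0.5):
    """Assemble the run.py invocation shown above for one video.
    Adjust the script name/flags if the repository's entry point differs."""
    out = Path(video_path).with_name(Path(video_path).stem + '_labeled.mp4')
    return ['python', 'run.py',
            '--input_video', str(video_path),
            '--output_video', str(out),
            '--threshold', str(threshold)]

# Example: queue every .mp4 in a folder (uncomment to actually run)
# for video in sorted(Path('videos').glob('*.mp4')):
#     subprocess.run(build_cmd(video), check=True)
print(build_cmd('egocentric.mp4'))
```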
Output video features:

- Bounding boxes drawn around each detected hand
- Confidence scores rendered alongside each box
Live camera processing: process webcam streams in real time for interactive applications.

```python
import handtracking

# Initialize detector
detector = handtracking.HandDetector()

# Process webcam stream
detector.detect_from_webcam(
    display=True,
    save_video=False,
    confidence_threshold=0.6
)
```
Applications:

- Hand-based interaction prototyping for AR/VR
- Wearable computing interfaces
- Browser-based hand detection demos (via Handtrack.js)
JavaScript companion library: use the same model technology in web applications.

Integration (note that `await` must run inside an async context in a classic script):

```html
<script src="https://cdn.jsdelivr.net/npm/handtrackjs/dist/handtrack.min.js"></script>
<script>
  (async () => {
    const model = await handTrack.load();
    const video = document.getElementById('video');

    // Detect hands in the video stream
    const predictions = await model.detect(video);
    predictions.forEach(prediction => {
      console.log(prediction.bbox);  // [x, y, width, height]
      console.log(prediction.score); // confidence score
    });
  })();
</script>
```
Browser capabilities:

- Runs entirely client-side in the browser, detecting hands from webcam or video elements without server-side inference
```bash
# Clone repository
git clone https://github.com/victordibia/handtracking.git
cd handtracking

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install tensorflow==1.15.0
pip install opencv-python numpy pillow

# Pre-trained model is downloaded automatically on first run
```
Model files: Automatically downloaded from the repository on first use (~20MB).
```bash
# For web applications
npm install handtrackjs

# Or use directly from the CDN (see the <script> tag above)
```
Example: annotate a video file with hand boxes.

```python
import cv2
from handtracking import HandDetector

# Initialize detector
detector = HandDetector()

# Load video
cap = cv2.VideoCapture('egocentric_video.mp4')

# Get video properties
fps = int(cap.get(cv2.CAP_PROP_FPS))
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

# Set up video writer
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
out = cv2.VideoWriter('output_labeled.mp4', fourcc, fps, (width, height))

# Process frames
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    # Detect hands
    detections = detector.detect_hands(frame)

    # Draw bounding boxes
    for det in detections:
        x, y, w, h = det['bbox']
        score = det['score']

        # Green for high confidence, red otherwise
        color = (0, 255, 0) if score > 0.7 else (0, 0, 255)
        cv2.rectangle(frame, (x, y), (x + w, y + h), color, 2)

        # Add label
        label = f"Hand: {score:.2f}"
        cv2.putText(frame, label, (x, y - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)

    # Save frame
    out.write(frame)

cap.release()
out.release()
```
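Besides writing an annotated video, it is often useful to log raw detections for offline analysis. Assuming the detection dict format shown earlier, the sketch below (hypothetical helper, not part of the library) flattens one frame's detections into a JSON-serializable record:

```python
import json

def detections_to_record(frame_idx, detections):
    """Convert one frame's detections (dict format assumed from above)
    into a plain record suitable for json.dumps."""
    return {
        'frame': frame_idx,
        'hands': [
            {'bbox': list(map(int, d['bbox'])), 'score': float(d['score'])}
            for d in detections
        ],
    }

# Example with a hypothetical detection:
rec = detections_to_record(3, [{'bbox': [10, 20, 30, 40],
                                'score': 0.9, 'label': 'hand'}])
print(json.dumps(rec))
```

Appending one record per frame to a JSON Lines file keeps the log streamable for long videos.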
Example: extract and save cropped hand regions.

```python
import cv2
from handtracking import HandDetector

detector = HandDetector()
cap = cv2.VideoCapture('egocentric.mp4')
frame_count = 0

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    detections = detector.detect_hands(frame)

    # Extract and save hand regions
    for i, det in enumerate(detections):
        x, y, w, h = det['bbox']

        # Crop hand region
        hand_roi = frame[y:y + h, x:x + w]

        # Save high-confidence hands only
        if det['score'] > 0.7:
            cv2.imwrite(f'hand_{frame_count}_{i}.jpg', hand_roi)

    frame_count += 1

cap.release()
```
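When crops from consecutive frames need to be grouped by hand (e.g. to build per-hand image sequences), detections can be linked greedily by bounding-box overlap. A standard intersection-over-union function for the `[x, y, w, h]` format assumed above, offered here as a sketch rather than library functionality:

```python
def iou(box_a, box_b):
    """Intersection-over-union for two [x, y, w, h] boxes.
    Useful for greedily matching detections across consecutive frames."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Intersection rectangle
    ix1, iy1 = max(ax, bx), max(ay, by)
    ix2, iy2 = min(ax + aw, bx + bw), min(ay + ah, by + bh)
    iw, ih = max(0, ix2 - ix1), max(0, iy2 - iy1)
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

print(iou([0, 0, 10, 10], [5, 0, 10, 10]))  # half-overlapping: 50/150 ≈ 0.333
```

A detection in frame t can then be assigned to the frame t-1 box with the highest IoU above some cutoff (e.g. 0.3); unmatched boxes start new tracks.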
Example: live webcam statistics overlay.

```python
import cv2
from handtracking import HandDetector

detector = HandDetector()
cap = cv2.VideoCapture(0)  # Webcam

while True:
    ret, frame = cap.read()
    if not ret:
        break

    detections = detector.detect_hands(frame)

    # Compute statistics
    num_hands = len(detections)
    avg_confidence = (sum(d['score'] for d in detections) / num_hands
                      if num_hands > 0 else 0)

    # Overlay text
    cv2.putText(frame, f"Hands: {num_hands}", (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    cv2.putText(frame, f"Avg Conf: {avg_confidence:.2f}", (10, 70),
                cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

    cv2.imshow('Hand Tracking', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```
```javascript
// Load model with custom parameters
const modelParams = {
  flipHorizontal: true,  // mirror for front-facing cameras
  maxNumBoxes: 2,        // detect at most two hands
  iouThreshold: 0.5,
  scoreThreshold: 0.6,
};

handTrack.load(modelParams).then(model => {
  console.log("Model loaded");

  const video = document.getElementById('video');
  const canvas = document.getElementById('canvas');
  const context = canvas.getContext('2d');

  function detectFrame() {
    model.detect(video).then(predictions => {
      // Clear canvas and draw the current video frame
      context.clearRect(0, 0, canvas.width, canvas.height);
      context.drawImage(video, 0, 0, canvas.width, canvas.height);

      // Draw predictions
      predictions.forEach(prediction => {
        const [x, y, width, height] = prediction.bbox;
        context.strokeStyle = '#00FF00';
        context.lineWidth = 4;
        context.strokeRect(x, y, width, height);

        // Add label
        context.fillStyle = '#00FF00';
        context.fillText(`Hand: ${prediction.score.toFixed(2)}`, x, y - 10);
      });

      // Continue detection on the next animation frame
      requestAnimationFrame(detectFrame);
    });
  }

  // Start detection
  detectFrame();
});
```
This skill works effectively with:
Architecture: Lightweight CNN-based object detection model
Detection performance (on EgoHands test set):
Scope: This skill provides bounding box detection only. For more detailed analysis, consider:

- ap229997-hands skill for joint keypoints
- owenzlz-egohos skill for pixel-level masks
- facebookresearch-hot3d for 3D tracking

Known limitations:
When to upgrade:
CPU optimization:

```python
# Reduce input resolution for faster processing
detector = HandDetector()
frame = cv2.resize(frame, (640, 480))  # Downsample
detections = detector.detect_hands(frame)
```
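Note that boxes detected on a downsampled frame are in the downsampled coordinate system. If you draw on or crop from the original frame, scale them back first; a small sketch (helper name is illustrative):

```python
def rescale_bbox(bbox, proc_size, orig_size):
    """Map an [x, y, w, h] box detected on a resized frame back to the
    original resolution. Sizes are (width, height) tuples."""
    sx = orig_size[0] / proc_size[0]
    sy = orig_size[1] / proc_size[1]
    x, y, w, h = bbox
    return [int(x * sx), int(y * sy), int(w * sx), int(h * sy)]

# Box found on a 640x480 frame, mapped back to 1920x1080:
print(rescale_bbox([64, 48, 160, 120], (640, 480), (1920, 1080)))
# [192, 108, 480, 270]
```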
GPU acceleration (if available):

```python
# TensorFlow 1.x with GPU support:
# pip install tensorflow-gpu==1.15.0
import tensorflow as tf
```
Batch processing:

```python
# Process multiple videos in parallel
from concurrent.futures import ThreadPoolExecutor

from handtracking import HandDetector

def process_video(video_path):
    detector = HandDetector()
    return detector.process_video(video_path)

with ThreadPoolExecutor(max_workers=4) as executor:
    results = executor.map(process_video, video_list)
```
Issue: Model not downloading automatically
- Check network access; the ~20MB model files are fetched on first run. Re-clone the repository if files are missing.

Issue: TensorFlow version conflicts
- The repository targets TensorFlow 1.x; pin the version inside a virtual environment:

```bash
pip install tensorflow==1.15.0
```

Issue: Low detection accuracy
- Adjust the confidence threshold; the model is trained on egocentric views, so third-person footage may underperform.

Issue: Slow processing speed
- Downsample frames before detection (see CPU optimization above) or enable GPU acceleration.

Issue: No hands detected
- Lower the confidence threshold and confirm the footage is first-person with hands clearly visible.
If you use this hand tracking implementation in research, please cite:
```bibtex
@inproceedings{bambach2015lending,
  title     = {Lending A Hand: Detecting Hands and Recognizing Activities in Complex Egocentric Interactions},
  author    = {Bambach, Sven and Lee, Stefan and Crandall, David J. and Yu, Chen},
  booktitle = {Proceedings of the IEEE International Conference on Computer Vision (ICCV)},
  year      = {2015}
}
```
And the original repository:
```bibtex
@software{victordibia_handtracking,
  author = {Victor Dibia},
  title  = {Real-time Hand Detection in Python using TensorFlow},
  url    = {https://github.com/victordibia/handtracking},
  year   = {2018}
}
```