Computer Vision

Computer vision enables machines to interpret and understand visual information from images and videos. It is one of the most successful applications of deep learning.

Image Processing Basics

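Images are arrays of pixels: a grayscale image is a 2D array, and a colour image is a 3D array with one channel each for red, green, and blue. Basic operations include filtering (blur, sharpen, edge detection), thresholding, morphological operations, histogram analysis, and geometric transformations. OpenCV is the most widely used open-source library for these operations; note that it loads colour images in BGR rather than RGB channel order by default.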
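
As a rough illustration of two of these operations, the sketch below applies thresholding and a 3x3 mean (box) blur to a tiny grayscale image stored as nested lists. The function names `threshold` and `box_blur`, the sample image, and the cutoff of 128 are assumptions made for this example; in practice a library such as OpenCV would do this on real arrays.

```python
def threshold(image, cutoff):
    """Binarise a grayscale image: pixels >= cutoff become 255, others 0."""
    return [[255 if px >= cutoff else 0 for px in row] for row in image]


def box_blur(image):
    """3x3 mean filter; border pixels average over the neighbours that exist."""
    height, width = len(image), len(image[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            neighbours = [
                image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            # Integer mean keeps the result in the usual 0-255 pixel range.
            row.append(sum(neighbours) // len(neighbours))
        out.append(row)
    return out


# Hypothetical 3x3 grayscale image: dark on the left, bright on the right.
gray = [
    [10, 10, 200],
    [10, 200, 200],
    [10, 10, 200],
]
binary = threshold(gray, 128)
blurred = box_blur(gray)
print(binary)
print(blurred)
```

Thresholding keeps only a bright/dark decision per pixel, while the blur replaces each pixel with a local average, which softens edges and suppresses noise.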

Image Classification

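Image classification assigns a label to a whole image. CNNs learn hierarchical features: edges → textures → parts → objects. Pre-trained models (ResNet, EfficientNet) enable transfer learning: a network already trained on a large dataset is fine-tuned on a specific task, which needs far less labelled data than training from scratch. ImageNet and its associated challenge (ILSVRC) established the standard benchmark for the task.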
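
The final step of a typical classifier, turning raw network outputs into a label, can be sketched without any deep learning framework. The example below assumes the network has already produced one score (logit) per class; the class names, the logit values, and the helper names `softmax` and `classify` are hypothetical and only serve the illustration.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, class_names):
    """Return the most probable class name and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]


# Hypothetical class list and logits, standing in for a CNN's output layer.
CLASS_NAMES = ["cat", "dog", "car"]
label, confidence = classify([2.0, 0.5, -1.0], CLASS_NAMES)
print(label, round(confidence, 3))
```

In transfer learning, it is usually this last scoring layer that is replaced and retrained for the new set of classes, while earlier feature layers are reused.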

Object Detection

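Object detection localises and classifies multiple objects in one image. Common approaches are YOLO and SSD (Single Shot MultiBox Detector), which are single-stage and fast enough for real-time use, and Faster R-CNN, which is two-stage and typically more accurate but slower. Each detection consists of a bounding box, a class label, and a confidence score.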
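
To make the output format concrete, the sketch below models a detection as a box, a label, and a score, drops low-confidence detections, and measures how much two boxes overlap using intersection over union (IoU), a common measure for comparing boxes. The `Detection` class, the `(x_min, y_min, x_max, y_max)` box layout, the 0.5 score cutoff, and the sample values are all assumptions for illustration, not the output format of any particular detector.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    box: tuple    # assumed layout: (x_min, y_min, x_max, y_max) in pixels
    label: str
    score: float  # confidence in [0, 1]


def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_by_score(detections, min_score=0.5):
    """Keep only detections whose confidence reaches min_score."""
    return [d for d in detections if d.score >= min_score]


# Hypothetical detector output.
detections = [
    Detection((0, 0, 10, 10), "person", 0.92),
    Detection((5, 5, 15, 15), "person", 0.40),
    Detection((20, 20, 30, 30), "dog", 0.75),
]
kept = filter_by_score(detections)
overlap = iou(detections[0].box, detections[1].box)
print([d.label for d in kept], round(overlap, 3))
```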

Face Recognition

Face detection locates faces in an image (Haar cascades, MTCNN). Face recognition then identifies whose face it is (FaceNet, ArcFace). Applications include phone unlock, security, and attendance tracking. Ethical concerns include demographic bias, surveillance, privacy, and consent.
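
Models such as FaceNet and ArcFace map a face image to an embedding vector, so that recognition reduces to comparing vectors. The sketch below assumes embeddings have already been computed and matches a query against a small gallery using cosine similarity. The three-dimensional vectors, the names in `gallery`, the 0.8 threshold, and the `identify` helper are hypothetical; real embeddings have hundreds of dimensions and thresholds are tuned per model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(query, gallery, threshold=0.8):
    """Return (name, score) of the best match, or (None, score) if too weak."""
    best_name, best_score = None, -1.0
    for name, embedding in gallery.items():
        score = cosine_similarity(query, embedding)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score  # treat as an unknown person
    return best_name, best_score


# Hypothetical enrolled embeddings (toy 3-D vectors).
gallery = {
    "alice": [0.9, 0.1, 0.0],
    "bob": [0.0, 0.2, 0.95],
}
match, score = identify([0.88, 0.15, 0.02], gallery)
stranger, _ = identify([0.5, 0.5, 0.5], gallery)
print(match, round(score, 3), stranger)
```

The threshold controls the trade-off between falsely accepting a stranger and falsely rejecting an enrolled person, which is one place where the bias concerns above become measurable.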

Image Segmentation

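Semantic segmentation assigns a class to every pixel (road, building, sky) but does not separate objects of the same class. Instance segmentation additionally distinguishes individual objects, so two adjacent cars receive different labels. U-Net is widely used for medical imaging. Mask R-CNN extends Faster R-CNN with a mask-prediction branch, combining detection and segmentation.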
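
The difference between the two outputs can be shown on a toy mask. The sketch below starts from a semantic mask of class IDs and splits one class into separate connected regions with a flood fill, giving each region its own instance ID. This is only an illustration of what the two kinds of output look like: the mask, the class ID `CAR`, and the helper `label_instances` are assumptions, and models such as Mask R-CNN do not derive instances this way.

```python
from collections import deque


def label_instances(mask, target_class):
    """Give each 4-connected region of target_class its own ID (1, 2, ...)."""
    height, width = len(mask), len(mask[0])
    instance_ids = [[0] * width for _ in range(height)]  # 0 = not an instance
    next_id = 0
    for y in range(height):
        for x in range(width):
            if mask[y][x] != target_class or instance_ids[y][x]:
                continue
            next_id += 1
            instance_ids[y][x] = next_id
            queue = deque([(y, x)])
            while queue:  # breadth-first flood fill over the region
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] == target_class
                            and not instance_ids[ny][nx]):
                        instance_ids[ny][nx] = next_id
                        queue.append((ny, nx))
    return instance_ids, next_id


CAR = 1  # hypothetical class ID; 0 stands for background
semantic_mask = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
]
instances, count = label_instances(semantic_mask, CAR)
print(count)
for row in instances:
    print(row)
```

In the semantic mask both cars share the value 1; in the instance map the left car is labelled 1 and the right car 2.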

Applications

Applications include autonomous driving, medical imaging (tumour detection), agriculture (crop monitoring), retail (visual search), manufacturing (defect detection), and content moderation.

Summary

Computer vision transforms visual data into understanding. Image classification, object detection, face recognition, and segmentation are key tasks enabled by deep learning.
