Computer Vision
Computer vision enables machines to interpret and understand visual information from images and videos. It is one of the most successful applications of deep learning.
Image Processing Basics
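Images are pixel arrays: a grayscale image is a 2D array (height × width), and a colour image is a 3D array with a third axis for the colour channels (typically RGB, although OpenCV loads images in BGR order by default). Basic operations: filtering (blur, sharpen, edge detection), thresholding, morphological operations (erosion, dilation), histograms, and geometric transformations (resizing, rotation, warping). OpenCV is the most widely used library for these operations.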
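The filtering operations above all reduce to sliding a small kernel over the pixel grid and summing weighted neighbours. Below is a minimal pure-Python sketch, assuming a grayscale image stored as a nested list, no padding ("valid" mode, so the output shrinks), and illustrative kernels (a 3x3 box blur and a Sobel kernel). In practice OpenCV functions such as `cv2.filter2D` and `cv2.threshold` do the same job far faster.

```python
# Minimal sketch: grayscale image as a nested list of pixel values.
# Assumptions: "valid" mode (no padding) and illustrative kernel choices.


def convolve2d(image, kernel):
    """Slide a kernel over a 2D image without padding.

    Like cv2.filter2D, this computes cross-correlation (the kernel is not
    flipped); for the symmetric or antisymmetric kernels below it makes
    no practical difference.
    """
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out = []
    for y in range(ih - kh + 1):
        row = []
        for x in range(iw - kw + 1):
            total = 0.0
            for ky in range(kh):
                for kx in range(kw):
                    total += image[y + ky][x + kx] * kernel[ky][kx]
            row.append(total)
        out.append(row)
    return out


def threshold(image, t):
    """Binary threshold: 255 where the pixel value exceeds t, else 0."""
    return [[255 if p > t else 0 for p in row] for row in image]


# 3x3 box blur (averages the neighbourhood) and a Sobel kernel that
# responds to horizontal intensity changes, i.e. vertical edges.
BOX_BLUR = [[1 / 9] * 3 for _ in range(3)]
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

if __name__ == "__main__":
    # 5x5 image: dark left columns, bright right columns -> one vertical edge.
    img = [[0, 0, 255, 255, 255] for _ in range(5)]
    edges = convolve2d(img, SOBEL_X)
    print(edges[0])                  # [1020.0, 1020.0, 0.0]
    print(threshold(edges, 100)[0])  # [255, 255, 0]
```

The strong responses mark the columns where dark meets bright; thresholding the response turns it into a binary edge map.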
Image Classification
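Image classification assigns a label to a whole image. CNNs learn hierarchical features: edges → textures → parts → objects, with early layers detecting simple patterns and deeper layers combining them. Pre-trained models (ResNet, EfficientNet) enable transfer learning: a network trained on a large dataset is fine-tuned on a specific task, which needs far less labelled data than training from scratch. ImageNet, through its annual ILSVRC challenge, became the standard benchmark for the task.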
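A classifier's final layer outputs one raw score (logit) per class; a softmax turns these scores into probabilities, and the predicted label is the most probable class. The sketch below shows only this output step, not the CNN that produces the scores, and the logits and labels are hypothetical. The `top_k` helper is included because ImageNet results are commonly reported as top-1 and top-5 accuracy.

```python
import math


def softmax(logits):
    """Convert raw class scores (logits) into probabilities that sum to 1."""
    m = max(logits)  # subtract the max so math.exp cannot overflow
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def top_k(logits, labels, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical logits from a classifier's final layer for 4 classes.
    labels = ["cat", "dog", "car", "tree"]
    logits = [2.0, 1.0, 0.1, -1.0]
    for label, p in top_k(logits, labels, k=2):
        print(f"{label}: {p:.3f}")
```

Subtracting the maximum logit before exponentiating leaves the probabilities unchanged but keeps very large scores from overflowing.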
Object Detection
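Object detection localises and classifies multiple objects in an image. Approaches: YOLO (single-stage, real-time), SSD (Single Shot MultiBox Detector, also single-stage), and Faster R-CNN (two-stage: a region proposal network followed by per-region classification; typically more accurate but slower). Outputs: bounding boxes with class labels and confidence scores.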
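Detectors like these typically produce many overlapping candidate boxes for the same object, which are pruned with non-maximum suppression (NMS) based on intersection-over-union (IoU). Below is a minimal per-class greedy NMS sketch, assuming boxes given as `(x1, y1, x2, y2)` corner coordinates and an illustrative IoU threshold of 0.5; production code usually calls a vectorised version such as `torchvision.ops.nms`.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.5):
    """Greedy non-maximum suppression, applied separately per class label.

    detections: list of (box, label, score) tuples.
    Repeatedly keeps the highest-scoring remaining box and drops every other
    box with the same label that overlaps it by more than iou_threshold.
    """
    remaining = sorted(detections, key=lambda d: d[2], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [
            d for d in remaining
            if d[1] != best[1] or iou(d[0], best[0]) <= iou_threshold
        ]
    return kept


if __name__ == "__main__":
    dets = [
        ((0, 0, 10, 10), "car", 0.9),
        ((1, 1, 11, 11), "car", 0.8),      # near-duplicate of the first car
        ((20, 20, 30, 30), "person", 0.7),
    ]
    for box, label, score in nms(dets):
        print(label, score, box)
```

The two car boxes overlap with an IoU of about 0.68, so the lower-scoring one is suppressed; the person box is a different class and survives.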
Face Recognition
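Face detection locates faces in an image (Haar cascades, MTCNN). Face recognition identifies individuals by mapping each detected face to an embedding vector and comparing it with stored embeddings (FaceNet, ArcFace). Applications: phone unlock, security, attendance tracking. Ethical concerns include demographic bias, surveillance, privacy, and consent.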
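Recognition models such as FaceNet and ArcFace are trained so that embeddings of the same person lie close together, which reduces identification to a nearest-neighbour search with an acceptance threshold. The sketch below uses cosine similarity; the 4-dimensional embeddings, the names and the 0.6 threshold are all hypothetical (real models produce much longer vectors, and the threshold is tuned on validation data for a specific model).

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0  # treat an all-zero embedding as matching nothing
    return dot / (nu * nv)


def identify(probe, gallery, threshold=0.6):
    """Find the closest enrolled identity for a probe embedding.

    gallery: dict mapping a (hypothetical) person name to a stored embedding.
    Returns (name, similarity), with name None when even the best match
    falls below threshold, i.e. the face is treated as unknown.
    """
    best_name, best_sim = None, -1.0
    for name, emb in gallery.items():
        sim = cosine_similarity(probe, emb)
        if sim > best_sim:
            best_name, best_sim = name, sim
    if best_sim < threshold:
        return None, best_sim
    return best_name, best_sim


if __name__ == "__main__":
    # Toy embeddings; a real system would store model outputs per person.
    gallery = {"alice": [0.9, 0.1, 0.0, 0.4], "bob": [0.1, 0.8, 0.5, 0.0]}
    print(identify([0.85, 0.15, 0.05, 0.35], gallery))
    print(identify([0.0, 0.0, 1.0, 0.0], gallery))
```

The threshold is what separates verification from simply picking the nearest face: without it, every stranger would be matched to someone in the gallery.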
Image Segmentation
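Semantic segmentation classifies every pixel (road, building, sky). Instance segmentation additionally distinguishes individual objects of the same class, for example each car separately. U-Net, an encoder-decoder network with skip connections, is widely used in medical imaging. Mask R-CNN combines detection and segmentation by extending Faster R-CNN with a branch that predicts a mask for each detected object.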
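The difference between a semantic mask and instances can be made concrete with connected-component labelling, which splits a binary "car" mask into separate regions. This is a classical technique, not how Mask R-CNN or other learned instance-segmentation models work, and it only separates objects that do not touch; the sketch assumes a nested-list mask and 4-connectivity (OpenCV offers `cv2.connectedComponents` for the same task).

```python
from collections import deque


def connected_components(mask):
    """Label 4-connected regions of nonzero pixels in a binary mask.

    Returns (labels, count): labels is a grid with 0 for background and
    1..count identifying each separate region (a crude "instance" id).
    """
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and labels[y][x] == 0:
                # Unlabelled foreground pixel: start a new region and
                # flood-fill it with breadth-first search.
                count += 1
                labels[y][x] = count
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and labels[ny][nx] == 0:
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


if __name__ == "__main__":
    # Semantic mask for the class "car": 1 = car pixel, 0 = anything else.
    car_mask = [
        [1, 1, 0, 0, 0],
        [1, 1, 0, 0, 1],
        [0, 0, 0, 1, 1],
    ]
    labels, n = connected_components(car_mask)
    print(n)  # 2 separate cars
    for row in labels:
        print(row)
```

The semantic mask only says "these pixels are car"; the labelled output assigns each pixel to car 1 or car 2, which is the extra information instance segmentation provides.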
Applications
Autonomous driving, medical imaging (tumour detection), agriculture (crop monitoring), retail (visual search), manufacturing (defect detection), and content moderation.
Summary
Computer vision transforms visual data into understanding. Image classification, object detection, face recognition, and segmentation are key tasks enabled by deep learning.