Introduction:
Image segmentation is the process of subdividing an image into its constituent parts or objects. It is a key task in computer vision and image processing. It can be formulated as a pixel classification problem, with three different approaches:
a) semantic
b) instance
c) panoptic
The colors in a segmented image are only a visualization of the per-pixel labels.
Examples and Applications:
a) image editing — https://arxiv.org/pdf/2111.03186 (using GAN)
b) dataset augmentation
c) Self-driving vehicles: Panoptic segmentation is crucial for safety and efficiency in autonomous vehicles, because it lets the AI system generate segmentation masks that differentiate between other vehicles, pedestrians, and road signs simultaneously and in real time. This allows the system to assess its surroundings accurately and make prompt decisions. To accomplish this, appropriate hardware such as LiDAR, cameras, and other sensors is employed.
d) Medical imaging: Visualizing cell nuclei requires precision, especially when diagnosing diseases like cancer. During screening it is often difficult to detect cells accurately, because they overlap and vary widely in appearance. Semantic segmentation models were commonly used, but they left gaps and were inaccurate where cells overlap. Panoptic segmentation, specifically with deep learning, has been reported to outperform these earlier approaches.
e) Smart cities: Computer vision and AI play a vital role in building smart cities. With state-of-the-art systems, cities can monitor, manage, and optimize everything from utilities and waste management to security, healthcare, education, and roads. Panoptic segmentation offers an accurate and efficient model for smart cities to rely on. Think of the importance panoptic segmentation has for autonomous vehicles and expand it across an entire city.
f) Augmented reality (AR) and virtual reality (VR) — By precisely segmenting and comprehending the objects and scenes, panoptic segmentation assigns accurate boundaries and classifications, thereby enhancing the quality of AR and VR simulations.
g) Surveillance and security: Panoptic segmentation is used in video surveillance systems to identify and track objects of interest within crowded scenes, improving security and threat detection.
Types of segmentation
a) Semantic Segmentation: produces a contextual description of the “stuff” in the image. Classes are separated, but the objects within a class are not, so we don’t have access to a single object.
b) Instance Segmentation: produces a better description of objects, listing them as individual instances of “things”, but gives less information about the environment and background “stuff”.
c) Panoptic Segmentation: a combination of semantic and instance segmentation. We have access not only to the environment context but also to the individual objects, so we see both the “stuff” and the “things”.
Image Segmentation using Deep Learning:
Deconstruction: feature extraction (backbone, encoder)
Reconstruction: upsampler (decoder)
The output now labels each region by color: in the example image, the baby and the background.
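To make the deconstruction and reconstruction steps concrete, here is a minimal sketch of how spatial resolution shrinks in the encoder and is restored in the decoder. It is only a stand-in: the functions `downsample` and `upsample` are hypothetical names, and fixed average pooling and nearest-neighbour repetition replace the learned convolutional layers a real network would use.

```python
import numpy as np

def downsample(x):
    """Encoder stand-in: 2x2 average pooling halves the resolution."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(x):
    """Decoder stand-in: nearest-neighbour upsampling doubles the resolution."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

image = np.arange(16, dtype=float).reshape(4, 4)
features = downsample(image)   # (2, 2): coarse, lower-resolution features
restored = upsample(features)  # (4, 4): back to the input resolution
```

The point is only the shape bookkeeping: the decoder has to return to the input resolution so that every pixel can receive a label.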
Semantic Segmentation Output
a) One-hot encoding, just like classification
b) A score is assigned to each pixel for every class
c) The class with the highest score sets the pixel
Given a grayscale (H, W, 1) or RGB (H, W, 3) image, generate a segmentation mask that has the same height and width as the image and consists of categorical values from 1 to N (N is the number of classes).
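The three steps above can be sketched with NumPy. The score values below are made up for illustration; in practice they would come from the model's final layer, and the (H, W, N) layout is an assumption of this sketch.

```python
import numpy as np

# Hypothetical per-pixel class scores for a 2x3 image and N = 3 classes,
# shaped (H, W, N).
scores = np.array([
    [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1], [0.2, 0.2, 0.6]],
    [[0.3, 0.3, 0.4], [0.05, 0.9, 0.05], [0.6, 0.3, 0.1]],
])

# The class with the highest score sets the pixel. argmax returns 0-based
# indices, so 1 is added to get labels in the range 1..N.
mask = scores.argmax(axis=-1) + 1

# One-hot view of the same mask, shaped (H, W, N).
one_hot = np.eye(scores.shape[-1], dtype=np.uint8)[mask - 1]
```

Here `mask` has shape (H, W), one categorical label per pixel, while `one_hot` carries the same information as N binary channels.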
Instance Segmentation — proto masks
a) An additional model output computes per-instance mask predictions
b) The model learns to separate objects in each mask, which are then fused with the semantic mask
c) Handles overlapping instances.
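One common way to realize proto masks, used by YOLACT-style models, is to predict a small set of prototype masks for the whole image plus one coefficient vector per detected instance, and to combine them linearly. The sketch below assumes that design; `assemble_instance_masks`, the array shapes, and the 0.5 threshold are illustrative choices, not the API of any particular library.

```python
import numpy as np

def assemble_instance_masks(protos, coeffs, threshold=0.5):
    """Combine prototype masks into one binary mask per instance.

    protos: (K, H, W) prototype masks shared by the whole image.
    coeffs: (M, K) one coefficient vector per detected instance.
    Returns a boolean array of shape (M, H, W).
    """
    k, h, w = protos.shape
    # Linear combination of the prototypes for every instance, then sigmoid.
    logits = coeffs @ protos.reshape(k, h * w)   # (M, H*W)
    probs = 1.0 / (1.0 + np.exp(-logits))
    return probs.reshape(-1, h, w) > threshold

# Toy example: two prototypes on a 2x2 grid.
protos = np.array([
    [[1.0, -1.0], [1.0, -1.0]],   # responds to the left column
    [[1.0, 1.0], [-1.0, -1.0]],   # responds to the top row
])
coeffs = np.array([
    [4.0, 0.0],   # instance 0 follows prototype 0
    [0.0, 4.0],   # instance 1 follows prototype 1
])
masks = assemble_instance_masks(protos, coeffs)
```

Because every instance gets its own mask, the top-left pixel can belong to both instances here, which is how overlapping objects are represented. Real pipelines typically also crop each mask to its predicted bounding box, which this sketch omits.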
Panoptic Segmentation
Fusing semantic and instance segmentation to detect things and stuff.
ref: https://arxiv.org/pdf/1812.03904
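A simple heuristic for the fusion step can be sketched as follows. This is not the method of the referenced paper: it is a minimal illustration in which instance masks are pasted over the semantic map in order of confidence, and the `class_id * offset + instance_id` encoding is an assumption made for this example.

```python
import numpy as np

def fuse_panoptic(semantic, instance_masks, instance_classes, offset=1000):
    """Merge a semantic map and instance masks into one panoptic map.

    semantic: (H, W) integer class ids for every pixel (stuff and things).
    instance_masks: (M, H, W) boolean masks, ordered by descending confidence.
    instance_classes: length-M class ids of the instances.
    Encoding (assumed for this sketch): class_id * offset + instance_id,
    where instance_id 0 means "no individual instance".
    """
    panoptic = semantic.astype(np.int64) * offset
    taken = np.zeros(semantic.shape, dtype=bool)
    for idx, (mask, cls) in enumerate(zip(instance_masks, instance_classes), start=1):
        free = mask & ~taken   # earlier, more confident instances win overlaps
        panoptic[free] = cls * offset + idx
        taken |= free
    return panoptic

# Toy scene: class 1 = road (stuff), class 2 = car (thing).
semantic = np.array([[1, 1, 2],
                     [1, 2, 2]])
instance_masks = np.array([
    [[False, False, True], [False, False, False]],   # car 1
    [[False, False, True], [False, True, True]],     # car 2, overlaps car 1
])
panoptic = fuse_panoptic(semantic, instance_masks, instance_classes=[2, 2])
```

Every pixel ends up with exactly one label: road pixels keep their stuff class, and each car pixel carries both its class and the id of the instance it belongs to.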
Dataset:
Naively labeling each pixel is very time-consuming. A bounding box or a polygon gives only an approximate label and still requires a high annotation effort.
Label masks: object polygons, very high annotation effort
The “Segment Anything Model” (SAM) has been a game changer for annotation effort.
Accuracy metrics:
a) Based on a concept similar to IoU (intersection over union)
b) Panoptic Quality (“PQ”) is a newer metric and is applied, at least in part, to all segmentation challenges
c) PQ is reported separately for the things and stuff categories
COCO metrics: “Panoptic Evaluation”
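As a rough illustration of how PQ builds on IoU, the sketch below computes PQ for a single class as the sum of the IoUs of matched segments divided by TP + 0.5 * FP + 0.5 * FN, where a predicted and a ground-truth segment match when their IoU exceeds 0.5. The function names and the toy masks are hypothetical; official evaluation tools also handle void regions and crowd annotations, which are left out here.

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two boolean masks."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0

def panoptic_quality(pred_segments, gt_segments):
    """PQ for one class from lists of boolean masks."""
    matched_ious = []
    matched_gt = set()
    for pred in pred_segments:
        for j, gt in enumerate(gt_segments):
            if j in matched_gt:
                continue
            score = iou(pred, gt)
            if score > 0.5:   # IoU above 0.5 counts as a match
                matched_ious.append(score)
                matched_gt.add(j)
                break
    tp = len(matched_ious)
    fp = len(pred_segments) - tp   # predictions without a match
    fn = len(gt_segments) - tp     # ground-truth segments without a match
    denom = tp + 0.5 * fp + 0.5 * fn
    return sum(matched_ious) / denom if denom else 0.0

# Toy example on a flattened 4-pixel image.
gt = [np.array([1, 1, 0, 0], dtype=bool), np.array([0, 0, 1, 1], dtype=bool)]
pred = [np.array([1, 1, 1, 0], dtype=bool), np.array([0, 0, 0, 1], dtype=bool)]
pq = panoptic_quality(pred, gt)
```

In this example the first prediction matches the first ground-truth segment with IoU 2/3, while the second prediction reaches only IoU 0.5 and therefore counts as both a false positive and a false negative, giving PQ = (2/3) / 2 = 1/3.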
Computational requirements
a) Same backbone as a detection model
b) Segmentation head incurs ~20% overhead
Segmentation masks are much more accurate than bounding boxes.
c) Post-processing demands
d) Instance and panoptic segmentation incur additional overhead
Summary
a) Segmentation is a technique that separates all the objects in an image along with their contours.
b) It improves on detection models for objects with more complex shapes.
c) It can be considered an image classification task at the pixel level.
Resources
datasets
a) https://cocodataset.org
b) https://www.cityscapes-dataset.com
c) https://ai.facebook.com/datasets/segment-anything-its-image-segmentation-architecture
models:
a) https://segment-anything.com (SAM model from FB)
b) https://learnopencv.com/yolo5-instance-segmentation/
Some diagrams were taken from