Semantic Segmentation: Pixel-Level Classification

Introduction:

Image segmentation is the process of subdividing an image into its constituent parts or objects. It is a key task in computer vision and image processing. It can be formulated as a pixel classification problem, with three different approaches:

a) semantic

b) instance

c) panoptic

The segmented regions are visualized as colors.

Examples and Applications:

a) Image editing: https://arxiv.org/pdf/2111.03186 (using GANs)

b) Dataset augmentation

c) Self-driving vehicles: Panoptic segmentation is crucial for safety and efficiency in autonomous vehicles, because it lets the AI system generate segmentation masks that distinguish other vehicles, pedestrians, and road signs simultaneously and in real time. This allows the system to assess its surroundings accurately and make prompt decisions. To accomplish this, the vehicle relies on appropriate hardware such as cameras and LiDAR sensors.

d) Medical imaging: Segmenting cell nuclei requires precision, especially when diagnosing diseases such as cancer. During screening it is often difficult to detect cells accurately, because they overlap and vary widely in appearance. Semantic segmentation models were commonly used, but because all nuclei share a single class label, they cannot reliably separate overlapping cells. Panoptic segmentation based on deep learning has been shown to outperform these earlier approaches.

e) Smart cities: Computer vision and AI play a vital role in building smart cities. With state-of-the-art systems, cities can monitor, manage, and optimize many areas, from utilities and waste management to security, healthcare, education, and roads. Panoptic segmentation provides the accurate and efficient scene understanding these systems rely on. Think of how important panoptic segmentation is for a single autonomous vehicle, and extend that across an entire city.

f) Augmented reality (AR) and virtual reality (VR): By precisely segmenting objects and scenes, panoptic segmentation provides accurate boundaries and class labels, which improves the quality of AR and VR simulations.

g) Surveillance and security: Panoptic segmentation is used in video surveillance systems to identify and track objects of interest in crowded scenes, improving security and threat detection.

Types of segmentation

a) Semantic segmentation: produces a contextual description of the “stuff” in the image. Pixels are separated by class, but objects within the same class are not; for example, all cars share one label, so we cannot access a single object.

b) Instance segmentation: produces a more detailed description that lists objects as individual instances of “things”, but gives less information about the environment and background “stuff”.

c) Panoptic segmentation: a combination of semantic and instance segmentation. We have access not only to the environmental context but also to the individual objects, so we see both the “stuff” and the “things”.
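To make the difference between the three outputs concrete, here is a minimal sketch on hypothetical toy label maps. It assumes one common panoptic encoding, used for example by the Cityscapes instance annotations, in which a "thing" pixel stores class_id * 1000 + instance_id and a "stuff" pixel stores only its class id. The function names and class ids are illustrative, not taken from any library.

```python
# Assumed encoding (Cityscapes-style): thing pixel = class_id * 1000 + instance_id,
# stuff pixel = class_id. Class ids below are hypothetical.
THING_CLASSES = {2}      # 2 = car
LABEL_DIVISOR = 1000

def to_panoptic(semantic, instance):
    """Merge a semantic (class) map and an instance-id map into one panoptic map."""
    panoptic = []
    for sem_row, inst_row in zip(semantic, instance):
        row = []
        for cls, inst in zip(sem_row, inst_row):
            # Things keep their instance identity; stuff collapses to its class.
            row.append(cls * LABEL_DIVISOR + inst if cls in THING_CLASSES else cls)
        panoptic.append(row)
    return panoptic

def split_panoptic(panoptic):
    """Recover the semantic map and the instance map from a panoptic map."""
    semantic = [[v // LABEL_DIVISOR if v >= LABEL_DIVISOR else v for v in row]
                for row in panoptic]
    instance = [[v % LABEL_DIVISOR if v >= LABEL_DIVISOR else 0 for v in row]
                for row in panoptic]
    return semantic, instance

# Toy 3 x 4 maps. Classes: 0 = road, 1 = sky (stuff), 2 = car (thing).
semantic = [[1, 1, 1, 1],
            [2, 2, 0, 2],
            [0, 0, 0, 0]]
instance = [[0, 0, 0, 0],
            [1, 1, 0, 2],   # two separate cars: instance 1 and instance 2
            [0, 0, 0, 0]]
panoptic = to_panoptic(semantic, instance)
print(panoptic)  # [[1, 1, 1, 1], [2001, 2001, 0, 2002], [0, 0, 0, 0]]
```

The semantic map alone labels both cars as class 2; the panoptic map keeps them apart (2001 and 2002) while still labeling the road and sky.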

Image Segmentation using Deep Learning:

Deconstruction: feature extraction (backbone, encoder)

Reconstruction: upsampling (decoder)

The output assigns each pixel a label, shown as a color; in the example image, the baby and the background receive different colors.
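The encoder/decoder idea can be illustrated at the level of shapes. The sketch below is a toy stand-in, not a real network: 2x2 max pooling plays the role of the encoder's downsampling, and nearest-neighbour upsampling plays the role of the decoder. Real models such as U-Net use learned convolutions, learned or interpolated upsampling, and skip connections; the function names here are hypothetical.

```python
def downsample(x):
    """Toy encoder step: 2x2 max pooling halves height and width."""
    h, w = len(x), len(x[0])
    return [[max(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
             for j in range(0, w - 1, 2)]
            for i in range(0, h - 1, 2)]

def upsample(x):
    """Toy decoder step: nearest-neighbour upsampling doubles height and width."""
    out = []
    for row in x:
        wide = [v for v in row for _ in range(2)]  # repeat each value horizontally
        out.append(wide)
        out.append(list(wide))                     # repeat the row vertically
    return out

image = [[r * 8 + c for c in range(8)] for r in range(8)]  # 8 x 8 toy "image"
features = downsample(downsample(image))   # encoder: 8x8 -> 4x4 -> 2x2
restored = upsample(upsample(features))    # decoder: 2x2 -> 4x4 -> 8x8
print(len(features), len(features[0]), len(restored), len(restored[0]))  # 2 2 8 8
```

The point is only that the decoder brings the coarse feature map back to the input resolution, so that every input pixel ends up with a prediction.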

Semantic Segmentation Output

a) One-hot encoding of the labels, just like in classification

b) A score for each class is computed for every pixel

c) The class with the highest score sets the pixel's label

Given a grayscale (H, W, 1) or RGB (H, W, 3) image, generate a segmentation mask with the same height and width as the image, (H, W), whose values are categorical labels from 1 to N, where N is the number of classes.
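The output steps above can be sketched in plain Python. This assumes the model has already produced N class scores per pixel (the scores below are made up) and uses labels 1 to N as in the text, whereas many frameworks use 0 to N-1. The function names are hypothetical.

```python
def scores_to_mask(scores):
    """scores[i][j] holds N class scores for pixel (i, j).
    Returns an H x W mask with labels 1..N (the highest score wins)."""
    mask = []
    for row in scores:
        mask_row = []
        for pixel_scores in row:
            best = max(range(len(pixel_scores)), key=lambda k: pixel_scores[k])
            mask_row.append(best + 1)  # shift 0-based index to labels 1..N
        mask.append(mask_row)
    return mask

def one_hot(mask, num_classes):
    """Turn an H x W mask with labels 1..N into H x W x N one-hot targets."""
    return [[[1 if label == k + 1 else 0 for k in range(num_classes)]
             for label in row]
            for row in mask]

# Made-up scores for a 2 x 2 image and N = 3 classes.
scores = [[[0.1, 0.7, 0.2], [0.8, 0.1, 0.1]],
          [[0.2, 0.2, 0.6], [0.3, 0.4, 0.3]]]
mask = scores_to_mask(scores)
print(mask)  # [[2, 1], [3, 2]]
```

During training, the one-hot targets are compared with the per-pixel scores (for example with a cross-entropy loss); at inference, only the argmax step is needed.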

Instance Segmentation: proto masks

a) An additional model output computes per-instance mask predictions

b) The model learns to separate objects in each mask; these masks are then fused with the semantic mask

c) Handles overlapping instances
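One common way to implement proto masks, assuming a YOLACT-style design such as the one used by YOLOv5 segmentation models, is to predict K prototype maps for the whole image plus K coefficients per detected object. Each instance mask is then the sigmoid of the coefficient-weighted sum of the prototypes, cropped to the object's box and thresholded. The sketch below is illustrative; the prototype values and function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def assemble_instance_mask(prototypes, coefficients, box, threshold=0.5):
    """prototypes: K maps of size H x W, shared by all instances.
    coefficients: K numbers predicted for one instance.
    box: (x0, y0, x1, y1) in pixels, x1 and y1 exclusive, assumed inside the image.
    Returns an H x W binary mask; pixels outside the box stay 0 (the crop)."""
    h, w = len(prototypes[0]), len(prototypes[0][0])
    x0, y0, x1, y1 = box
    mask = [[0] * w for _ in range(h)]
    for i in range(y0, y1):
        for j in range(x0, x1):
            logit = sum(c * p[i][j] for c, p in zip(coefficients, prototypes))
            mask[i][j] = 1 if sigmoid(logit) > threshold else 0
    return mask

# Two hypothetical 3 x 4 prototypes: one fires on the left half, one on the top row.
protos = [
    [[2, 2, -2, -2] for _ in range(3)],
    [[2, 2, 2, 2], [-2, -2, -2, -2], [-2, -2, -2, -2]],
]
full_box = (0, 0, 4, 3)
left = assemble_instance_mask(protos, [1.0, 0.0], full_box)
top = assemble_instance_mask(protos, [0.0, 1.0], full_box)
cropped = assemble_instance_mask(protos, [1.0, 0.0], (0, 0, 2, 2))
```

Because every instance gets its own binary mask, two instances can both claim a pixel (here `left` and `top` overlap in the top-left corner), which is how this approach represents overlapping objects.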

Panoptic Segmentation

Fusing semantic and instance segmentation to detect things and stuff.

Attention-Guided Unified Network Structure.

ref: https://arxiv.org/pdf/1812.03904
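The attention-guided network learns how to combine its semantic and instance branches. As a simpler point of comparison, the sketch below shows a commonly used heuristic fusion (not the method of the referenced paper): instance masks are pasted in order of confidence, an instance that is mostly hidden behind higher-scoring ones is dropped, and the remaining pixels take their "stuff" label from the semantic map. The input format, function name, and 0.5 overlap threshold are assumptions for illustration.

```python
def fuse_panoptic(semantic, instances, thing_classes, overlap_thresh=0.5):
    """semantic: H x W class map (stuff labels come from here).
    instances: list of (score, class_id, mask) with binary H x W masks.
    Returns an H x W map of segment ids (0 = void) and {segment_id: class_id}."""
    h, w = len(semantic), len(semantic[0])
    seg = [[0] * w for _ in range(h)]
    info = {}
    next_id = 1
    # Things first, highest confidence first, so confident objects win overlaps.
    for score, cls, mask in sorted(instances, key=lambda t: t[0], reverse=True):
        area = sum(map(sum, mask))
        free = [(i, j) for i in range(h) for j in range(w)
                if mask[i][j] and seg[i][j] == 0]
        if area == 0 or len(free) / area < overlap_thresh:
            continue  # mostly covered by better instances: drop it
        for i, j in free:
            seg[i][j] = next_id
        info[next_id] = cls
        next_id += 1
    # Remaining pixels: one segment per stuff class. Unclaimed thing pixels stay void.
    stuff_ids = {}
    for i in range(h):
        for j in range(w):
            cls = semantic[i][j]
            if seg[i][j] == 0 and cls not in thing_classes:
                if cls not in stuff_ids:
                    stuff_ids[cls] = next_id
                    info[next_id] = cls
                    next_id += 1
                seg[i][j] = stuff_ids[cls]
    return seg, info

semantic = [[0, 0, 0, 0],
            [0, 2, 2, 0]]              # 0 = road (stuff), 2 = car (thing)
car = [[0, 0, 0, 0],
       [0, 1, 1, 0]]
detections = [(0.9, 2, car), (0.4, 2, car)]  # the second is a duplicate detection
seg, info = fuse_panoptic(semantic, detections, thing_classes={2})
print(seg)   # [[2, 2, 2, 2], [2, 1, 1, 2]]
print(info)  # {1: 2, 2: 0}
```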

Dataset:

Labeling each pixel by hand is very time-consuming. Drawing a bounding box or a polygon gives an approximate labeling, but still requires high annotation effort.

Label masks and object polygons both demand very high annotation effort.

The “Segment Anything Model” (SAM) has been a game changer for reducing annotation effort.

https://segment-anything.com/
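Polygon annotations still have to be converted into per-pixel masks before training. A minimal sketch of that rasterization, assuming (x, y) polygon vertices in pixel units and testing each pixel centre with the even-odd (ray casting) rule, could look like this; annotation libraries ship optimized versions, and the function names here are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd rule: cast a ray to the right of (x, y) and count edge crossings."""
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def polygon_to_mask(polygon, height, width):
    """Binary H x W mask: a pixel is 1 if its centre (j + 0.5, i + 0.5) is inside."""
    return [[1 if point_in_polygon(j + 0.5, i + 0.5, polygon) else 0
             for j in range(width)]
            for i in range(height)]

square = [(1, 1), (4, 1), (4, 3), (1, 3)]  # hypothetical polygon, (x, y) vertices
mask = polygon_to_mask(square, height=4, width=5)
for row in mask:
    print(row)
```

The result is only as precise as the polygon itself, which is why polygon labels remain an approximation of the true object boundary.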

Accuracy metrics:

a) Based on the Intersection over Union (IoU) concept

b) Panoptic Quality (PQ) is a newer metric that is now applied, at least in part, across segmentation challenges

c) PQ is also reported separately for the things and stuff categories

COCO metrics: “Panoptic Evaluation”
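For one class, PQ is the sum of the IoUs of matched segments divided by |TP| + 0.5|FP| + 0.5|FN|, where a predicted and a ground-truth segment count as a match (true positive) when their IoU exceeds 0.5. It factors into segmentation quality (the average IoU of the matches) times recognition quality (an F1-style score). The minimal sketch below represents segments as sets of pixel coordinates (an assumption) and ignores details such as void pixels and crowd regions that the official COCO panoptic evaluation handles.

```python
def iou(a, b):
    """IoU of two segments given as sets of (row, col) pixels."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def panoptic_quality(pred_segments, gt_segments):
    """PQ for a single class. With non-overlapping segments, the IoU > 0.5
    rule guarantees each ground-truth segment matches at most one prediction."""
    matched_pred, matched_gt, iou_sum = set(), set(), 0.0
    for gi, g in enumerate(gt_segments):
        for pi, p in enumerate(pred_segments):
            if pi in matched_pred:
                continue
            v = iou(p, g)
            if v > 0.5:
                matched_pred.add(pi)
                matched_gt.add(gi)
                iou_sum += v
                break
    tp = len(matched_gt)
    fp = len(pred_segments) - len(matched_pred)
    fn = len(gt_segments) - tp
    denom = tp + 0.5 * fp + 0.5 * fn
    return iou_sum / denom if denom else 0.0

car_a = {(0, 0), (0, 1), (1, 0), (1, 1)}
car_b = {(3, 3)}
pred_a = {(0, 0), (0, 1), (1, 0)}
pred_spurious = {(5, 5)}
pq = panoptic_quality([pred_a, pred_spurious], [car_a, car_b])
print(pq)  # 0.375
```

Here one car is matched with IoU 0.75, one prediction is spurious and one car is missed, so segmentation quality is 0.75, recognition quality is 0.5, and PQ = 0.375.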

Computational requirements

a) The backbone is the same as in the detection model.

b) The segmentation head adds roughly 20% overhead.

Segmentation outlines objects much more precisely than bounding boxes.

c) Post-processing adds further demands.

d) Instance and panoptic segmentation incur additional overhead.

Summary

a) Segmentation is a technique that separates all the objects in an image along with their contours.

b) It improves on detection models for objects with more complex shapes.

c) It can be considered an image classification task at the pixel level.

Resources

Datasets:

a) https://cocodataset.org

b) https://www.cityscapes-dataset.com

c) https://ai.facebook.com/datasets/segment-anything-its-image-segmentation-architecture

Models:

a) https://towardsdatascience.com/u-net-explained-understanding-its-image-segmentation-architecture-56e4842e313a

b) https://segment-anything.com (SAM model from FB)

c) https://learnopencv.com/yolo5-instance-segmentation/

Some diagrams were taken from

ref: https://medium.com/analytics-vidhya/pytorch-implementation-of-semantic-segmentation-for-single-class-from-scratch-81f96643c98c

ref: https://arxiv.org/pdf/1812.03904
