fusion_bench.metrics

NYUv2 Tasks

fusion_bench.metrics.nyuv2

NYUv2 Dataset Metrics Module.

This module provides metric classes and loss functions for evaluating multi-task learning models on the NYUv2 dataset. NYUv2 is a popular indoor scene understanding dataset that includes multiple tasks: semantic segmentation, depth estimation, and surface normal prediction.

Available Metrics
  • SegmentationMetric: Computes mIoU and pixel accuracy for semantic segmentation.
  • DepthMetric: Computes absolute and relative errors for depth estimation.
  • NormalMetric: Computes angular errors for surface normal prediction.
  • NoiseMetric: Placeholder metric for noise evaluation.
Usage
from fusion_bench.metrics.nyuv2 import SegmentationMetric, DepthMetric

# Initialize metrics
seg_metric = SegmentationMetric(num_classes=13)
depth_metric = DepthMetric()

# Update with predictions and targets
seg_metric.update(seg_preds, seg_targets)
depth_metric.update(depth_preds, depth_targets)

# Compute final metrics
miou, pix_acc = seg_metric.compute()
abs_err, rel_err = depth_metric.compute()

metric_classes module-attribute

metric_classes = {
    "segmentation": SegmentationMetric,
    "depth": DepthMetric,
    "normal": NormalMetric,
    "noise": NoiseMetric,
}

Mapping from task name to its corresponding metric class.
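
Example

A minimal sketch of building one metric per task from this registry (all constructors work with their defaults; SegmentationMetric defaults to num_classes=13 for NYUv2):

from fusion_bench.metrics.nyuv2 import metric_classes

# Instantiate one metric object per task name.
metrics = {task: metric_cls() for task, metric_cls in metric_classes.items()}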

SegmentationMetric

Bases: Metric

Metric for evaluating semantic segmentation on NYUv2 dataset.

This metric computes mean Intersection over Union (mIoU) and pixel accuracy for multi-class segmentation tasks.

Attributes:

  • metric_names

    List of metric names ["mIoU", "pixAcc"].

  • num_classes

    Number of segmentation classes (default: 13 for NYUv2).

  • record

    Confusion matrix of shape (num_classes, num_classes) tracking predictions vs ground truth.

Source code in fusion_bench/metrics/nyuv2/segmentation.py
class SegmentationMetric(Metric):
    """
    Metric for evaluating semantic segmentation on NYUv2 dataset.

    This metric computes mean Intersection over Union (mIoU) and pixel accuracy
    for multi-class segmentation tasks.

    Attributes:
        metric_names: List of metric names ["mIoU", "pixAcc"].
        num_classes: Number of segmentation classes (default: 13 for NYUv2).
        record: Confusion matrix of shape (num_classes, num_classes) tracking
                predictions vs ground truth.
    """

    metric_names = ["mIoU", "pixAcc"]

    def __init__(self, num_classes=13):
        """
        Initialize the SegmentationMetric.

        Args:
            num_classes: Number of segmentation classes. Default is 13 for the NYUv2 dataset.
        """
        super().__init__()

        self.num_classes = num_classes
        self.add_state(
            "record",
            default=torch.zeros(
                (self.num_classes, self.num_classes), dtype=torch.int64
            ),
            dist_reduce_fx="sum",
        )

    def reset(self):
        """Reset the confusion matrix to zeros."""
        self.record.zero_()

    def update(self, preds: Tensor, target: Tensor):
        """
        Update the confusion matrix with predictions and targets from a batch.

        Args:
            preds: Predicted segmentation logits of shape (batch_size, num_classes, height, width).
                   Will be converted to class predictions via softmax and argmax.
            target: Ground truth segmentation labels of shape (batch_size, height, width).
                   Pixels with negative values or values >= num_classes are ignored.
        """
        preds = preds.softmax(1).argmax(1).flatten()
        target = target.long().flatten()

        k = (target >= 0) & (target < self.num_classes)
        inds = self.num_classes * target[k].to(torch.int64) + preds[k]
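        # Accumulate the confusion matrix: rows index ground truth, columns index predictions.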
        self.record += torch.bincount(inds, minlength=self.num_classes**2).reshape(
            self.num_classes, self.num_classes
        )

    def compute(self):
        """
        Compute mIoU and pixel accuracy from the confusion matrix.

        Returns:
            List[Tensor]: A list containing [mIoU, pixel_accuracy]:
                - mIoU: Mean Intersection over Union across all classes.
                - pixel_accuracy: Overall pixel classification accuracy.
        """
        h = cast(Tensor, self.record).float()
        iu = torch.diag(h) / (h.sum(1) + h.sum(0) - torch.diag(h))
        acc = torch.diag(h).sum() / h.sum()
        return [torch.mean(iu), acc]
__init__(num_classes=13)

Initialize the SegmentationMetric.

Parameters:

  • num_classes

    Number of segmentation classes. Default is 13 for the NYUv2 dataset.

Source code in fusion_bench/metrics/nyuv2/segmentation.py
def __init__(self, num_classes=13):
    """
    Initialize the SegmentationMetric.

    Args:
        num_classes: Number of segmentation classes. Default is 13 for the NYUv2 dataset.
    """
    super().__init__()

    self.num_classes = num_classes
    self.add_state(
        "record",
        default=torch.zeros(
            (self.num_classes, self.num_classes), dtype=torch.int64
        ),
        dist_reduce_fx="sum",
    )
compute()

Compute mIoU and pixel accuracy from the confusion matrix.

Returns:

  • List[Tensor]: A list containing [mIoU, pixel_accuracy]:
      - mIoU: Mean Intersection over Union across all classes.
      - pixel_accuracy: Overall pixel classification accuracy.

Source code in fusion_bench/metrics/nyuv2/segmentation.py
def compute(self):
    """
    Compute mIoU and pixel accuracy from the confusion matrix.

    Returns:
        List[Tensor]: A list containing [mIoU, pixel_accuracy]:
            - mIoU: Mean Intersection over Union across all classes.
            - pixel_accuracy: Overall pixel classification accuracy.
    """
    h = cast(Tensor, self.record).float()
    iu = torch.diag(h) / (h.sum(1) + h.sum(0) - torch.diag(h))
    acc = torch.diag(h).sum() / h.sum()
    return [torch.mean(iu), acc]
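
Equivalently, in terms of the accumulated confusion matrix \(h\) (rows index ground truth, columns index predictions):

\(\mathrm{IoU}_c = \frac{h_{cc}}{\sum_j h_{cj} + \sum_i h_{ic} - h_{cc}}, \qquad \mathrm{pixAcc} = \frac{\sum_c h_{cc}}{\sum_{i,j} h_{ij}}\)

and mIoU is the unweighted mean of \(\mathrm{IoU}_c\) over all classes.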
reset()

Reset the confusion matrix to zeros.

Source code in fusion_bench/metrics/nyuv2/segmentation.py
def reset(self):
    """Reset the confusion matrix to zeros."""
    self.record.zero_()
update(preds, target)

Update the confusion matrix with predictions and targets from a batch.

Parameters:

  • preds (Tensor) –

    Predicted segmentation logits of shape (batch_size, num_classes, height, width). Will be converted to class predictions via softmax and argmax.

  • target (Tensor) –

    Ground truth segmentation labels of shape (batch_size, height, width). Pixels with negative values or values >= num_classes are ignored.

Source code in fusion_bench/metrics/nyuv2/segmentation.py
def update(self, preds: Tensor, target: Tensor):
    """
    Update the confusion matrix with predictions and targets from a batch.

    Args:
        preds: Predicted segmentation logits of shape (batch_size, num_classes, height, width).
               Will be converted to class predictions via softmax and argmax.
        target: Ground truth segmentation labels of shape (batch_size, height, width).
               Pixels with negative values or values >= num_classes are ignored.
    """
    preds = preds.softmax(1).argmax(1).flatten()
    target = target.long().flatten()

    k = (target >= 0) & (target < self.num_classes)
    inds = self.num_classes * target[k].to(torch.int64) + preds[k]
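    # Accumulate the confusion matrix: rows index ground truth, columns index predictions.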
    self.record += torch.bincount(inds, minlength=self.num_classes**2).reshape(
        self.num_classes, self.num_classes
    )
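
Example

A minimal end-to-end sketch (tensor shapes and random inputs are illustrative):

import torch

from fusion_bench.metrics.nyuv2 import SegmentationMetric

seg_metric = SegmentationMetric(num_classes=13)

# Fake batch: logits of shape (batch_size, num_classes, H, W) and
# integer labels of shape (batch_size, H, W); pixels labeled -1 are ignored.
logits = torch.randn(2, 13, 32, 32)
labels = torch.randint(-1, 13, (2, 32, 32))

seg_metric.update(logits, labels)
miou, pix_acc = seg_metric.compute()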

DepthMetric

Bases: Metric

Metric for evaluating depth estimation performance on NYUv2 dataset.

This metric computes absolute error and relative error for depth predictions, properly handling the binary mask to exclude invalid depth regions.

Attributes:

  • metric_names

    List of metric names ["abs_err", "rel_err"].

  • abs_record

    List storing absolute error values for each batch.

  • rel_record

    List storing relative error values for each batch.

  • batch_size

    List storing the number of valid pixels in each batch, used as weights when averaging.

Source code in fusion_bench/metrics/nyuv2/depth.py
class DepthMetric(Metric):
    """
    Metric for evaluating depth estimation performance on NYUv2 dataset.

    This metric computes absolute error and relative error for depth predictions,
    properly handling the binary mask to exclude invalid depth regions.

    Attributes:
        metric_names: List of metric names ["abs_err", "rel_err"].
        abs_record: List storing absolute error values for each batch.
        rel_record: List storing relative error values for each batch.
        batch_size: List storing the number of valid pixels in each batch, used as weights when averaging.
    """

    metric_names = ["abs_err", "rel_err"]

    def __init__(self):
        """Initialize the DepthMetric with state variables for tracking errors."""
        super().__init__()

        self.add_state("abs_record", default=[], dist_reduce_fx="cat")
        self.add_state("rel_record", default=[], dist_reduce_fx="cat")
        self.add_state("batch_size", default=[], dist_reduce_fx="cat")

    def reset(self):
        """Reset all metric states to empty lists."""
        self.abs_record = []
        self.rel_record = []
        self.batch_size = []

    def update(self, preds: Tensor, target: Tensor):
        """
        Update metric states with predictions and targets from a batch.

        Args:
            preds: Predicted depth values of shape (batch_size, 1, height, width).
            target: Ground truth depth values of shape (batch_size, 1, height, width).
                   Pixels with sum of 0 are considered invalid and masked out.
        """
        binary_mask = (torch.sum(target, dim=1) != 0).unsqueeze(1)
        preds = preds.masked_select(binary_mask)
        target = target.masked_select(binary_mask)
        abs_err = torch.abs(preds - target)
        rel_err = torch.abs(preds - target) / target
        abs_err = torch.sum(abs_err) / torch.nonzero(binary_mask, as_tuple=False).size(
            0
        )
        rel_err = torch.sum(rel_err) / torch.nonzero(binary_mask, as_tuple=False).size(
            0
        )
        self.abs_record.append(abs_err)
        self.rel_record.append(rel_err)
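        # preds was flattened by masked_select, so preds.size(0) is the number of valid pixels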
        self.batch_size.append(torch.asarray(preds.size(0), device=preds.device))

    def compute(self):
        """
        Compute the final metric values across all batches.

        Returns:
            List[Tensor]: A list containing [absolute_error, relative_error],
                         where each value is averaged over batches, weighted by
                         the number of valid pixels in each batch.
        """
        records = torch.stack(
            [torch.stack(self.abs_record), torch.stack(self.rel_record)]
        )
        batch_size = torch.stack(self.batch_size)
        return [(records[i] * batch_size).sum() / batch_size.sum() for i in range(2)]
__init__()

Initialize the DepthMetric with state variables for tracking errors.

Source code in fusion_bench/metrics/nyuv2/depth.py
def __init__(self):
    """Initialize the DepthMetric with state variables for tracking errors."""
    super().__init__()

    self.add_state("abs_record", default=[], dist_reduce_fx="cat")
    self.add_state("rel_record", default=[], dist_reduce_fx="cat")
    self.add_state("batch_size", default=[], dist_reduce_fx="cat")
compute()

Compute the final metric values across all batches.

Returns:

  • List[Tensor]: A list containing [absolute_error, relative_error], where each value is averaged over batches, weighted by the number of valid pixels in each batch.

Source code in fusion_bench/metrics/nyuv2/depth.py
def compute(self):
    """
    Compute the final metric values across all batches.

    Returns:
        List[Tensor]: A list containing [absolute_error, relative_error],
                     where each value is averaged over batches, weighted by
                     the number of valid pixels in each batch.
    """
    records = torch.stack(
        [torch.stack(self.abs_record), torch.stack(self.rel_record)]
    )
    batch_size = torch.stack(self.batch_size)
    return [(records[i] * batch_size).sum() / batch_size.sum() for i in range(2)]
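
In other words, if \(e_b\) is the mean error over the \(n_b\) valid pixels of batch \(b\), compute returns \(\frac{\sum_b e_b \, n_b}{\sum_b n_b}\) for each error type, i.e. the mean over all valid pixels seen so far.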
reset()

Reset all metric states to empty lists.

Source code in fusion_bench/metrics/nyuv2/depth.py
def reset(self):
    """Reset all metric states to empty lists."""
    self.abs_record = []
    self.rel_record = []
    self.batch_size = []
update(preds, target)

Update metric states with predictions and targets from a batch.

Parameters:

  • preds (Tensor) –

    Predicted depth values of shape (batch_size, 1, height, width).

  • target (Tensor) –

    Ground truth depth values of shape (batch_size, 1, height, width). Pixels with sum of 0 are considered invalid and masked out.

Source code in fusion_bench/metrics/nyuv2/depth.py
def update(self, preds: Tensor, target: Tensor):
    """
    Update metric states with predictions and targets from a batch.

    Args:
        preds: Predicted depth values of shape (batch_size, 1, height, width).
        target: Ground truth depth values of shape (batch_size, 1, height, width).
               Pixels with sum of 0 are considered invalid and masked out.
    """
    binary_mask = (torch.sum(target, dim=1) != 0).unsqueeze(1)
    preds = preds.masked_select(binary_mask)
    target = target.masked_select(binary_mask)
    abs_err = torch.abs(preds - target)
    rel_err = torch.abs(preds - target) / target
    abs_err = torch.sum(abs_err) / torch.nonzero(binary_mask, as_tuple=False).size(
        0
    )
    rel_err = torch.sum(rel_err) / torch.nonzero(binary_mask, as_tuple=False).size(
        0
    )
    self.abs_record.append(abs_err)
    self.rel_record.append(rel_err)
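    # preds was flattened by masked_select, so preds.size(0) is the number of valid pixels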
    self.batch_size.append(torch.asarray(preds.size(0), device=preds.device))
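
Example

A minimal sketch (random values are illustrative; the zeroed region demonstrates the invalid-depth mask):

import torch

from fusion_bench.metrics.nyuv2 import DepthMetric

depth_metric = DepthMetric()

# Fake batch: predicted and ground-truth depth maps of shape (batch_size, 1, H, W).
preds = torch.rand(2, 1, 32, 32) + 0.5
target = torch.rand(2, 1, 32, 32) + 0.5
target[:, :, :8, :] = 0.0  # zero-depth pixels are excluded from both errors

depth_metric.update(preds, target)
abs_err, rel_err = depth_metric.compute()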

NormalMetric

Bases: Metric

Metric for evaluating surface normal prediction on NYUv2 dataset.

This metric computes angular error statistics between predicted and ground truth surface normals, including the mean, the median, and the fraction of predictions within given angular thresholds (11.25°, 22.5°, 30°).

Attributes:

  • metric_names

    List of metric names ["mean", "median", "<11.25", "<22.5", "<30"].

  • record

    List storing angular errors (in degrees) for all pixels across batches.

Source code in fusion_bench/metrics/nyuv2/normal.py
class NormalMetric(Metric):
    """
    Metric for evaluating surface normal prediction on NYUv2 dataset.

    This metric computes angular error statistics between predicted and ground truth
    surface normals, including the mean, the median, and the fraction of predictions
    within given angular thresholds (11.25°, 22.5°, 30°).

    Attributes:
        metric_names: List of metric names ["mean", "median", "<11.25", "<22.5", "<30"].
        record: List storing angular errors (in degrees) for all pixels across batches.
    """

    metric_names = ["mean", "median", "<11.25", "<22.5", "<30"]

    def __init__(self):
        """Initialize the NormalMetric with state for recording angular errors."""
        super().__init__()

        self.add_state("record", default=[], dist_reduce_fx="cat")

    def update(self, preds, target):
        """
        Update metric state with predictions and targets from a batch.

        Args:
            preds: Predicted surface normals of shape (batch_size, 3, height, width).
                   Will be L2-normalized before computing errors.
            target: Ground truth surface normals of shape (batch_size, 3, height, width).
                   Already normalized on NYUv2 dataset. Pixels with sum of 0 are invalid.
        """
        # gt has been normalized on the NYUv2 dataset
        preds = preds / torch.norm(preds, p=2, dim=1, keepdim=True)
        binary_mask = torch.sum(target, dim=1) != 0
        error = (
            torch.acos(
                torch.clamp(
                    torch.sum(preds * target, 1).masked_select(binary_mask), -1, 1
                )
            )
            .detach()
            .cpu()
            .numpy()
        )
        error = np.degrees(error)
        self.record.append(torch.from_numpy(error))

    def compute(self):
        """
        Compute final metric values from all recorded angular errors.

        Returns:
            List[Tensor]: A list containing five metrics:
                - mean: Mean angular error in degrees.
                - median: Median angular error in degrees.
                - <11.25: Fraction of pixels with angular error below 11.25°.
                - <22.5: Fraction of pixels with angular error below 22.5°.
                - <30: Fraction of pixels with angular error below 30°.

        Note:
            Returns zeros if no data has been recorded.
        """
        if len(self.record) == 0:
            return [torch.tensor(0.0) for _ in range(5)]

        records = torch.concatenate(self.record)
        return [
            torch.mean(records),
            torch.median(records),
            torch.mean((records < 11.25) * 1.0),
            torch.mean((records < 22.5) * 1.0),
            torch.mean((records < 30) * 1.0),
        ]
__init__()

Initialize the NormalMetric with state for recording angular errors.

Source code in fusion_bench/metrics/nyuv2/normal.py
def __init__(self):
    """Initialize the NormalMetric with state for recording angular errors."""
    super().__init__()

    self.add_state("record", default=[], dist_reduce_fx="cat")
compute()

Compute final metric values from all recorded angular errors.

Returns:

  • List[Tensor]: A list containing five metrics:
      - mean: Mean angular error in degrees.
      - median: Median angular error in degrees.
      - <11.25: Fraction of pixels with angular error below 11.25°.
      - <22.5: Fraction of pixels with angular error below 22.5°.
      - <30: Fraction of pixels with angular error below 30°.

Note

Returns zeros if no data has been recorded.

Source code in fusion_bench/metrics/nyuv2/normal.py
def compute(self):
    """
    Compute final metric values from all recorded angular errors.

    Returns:
        List[Tensor]: A list containing five metrics:
            - mean: Mean angular error in degrees.
            - median: Median angular error in degrees.
            - <11.25: Fraction of pixels with angular error below 11.25°.
            - <22.5: Fraction of pixels with angular error below 22.5°.
            - <30: Fraction of pixels with angular error below 30°.

    Note:
        Returns zeros if no data has been recorded.
    """
    if len(self.record) == 0:
        return [torch.tensor(0.0) for _ in range(5)]

    records = torch.concatenate(self.record)
    return [
        torch.mean(records),
        torch.median(records),
        torch.mean((records < 11.25) * 1.0),
        torch.mean((records < 22.5) * 1.0),
        torch.mean((records < 30) * 1.0),
    ]
update(preds, target)

Update metric state with predictions and targets from a batch.

Parameters:

  • preds

    Predicted surface normals of shape (batch_size, 3, height, width). Will be L2-normalized before computing errors.

  • target

    Ground truth surface normals of shape (batch_size, 3, height, width). Already normalized on NYUv2 dataset. Pixels with sum of 0 are invalid.

Source code in fusion_bench/metrics/nyuv2/normal.py
def update(self, preds, target):
    """
    Update metric state with predictions and targets from a batch.

    Args:
        preds: Predicted surface normals of shape (batch_size, 3, height, width).
               Will be L2-normalized before computing errors.
        target: Ground truth surface normals of shape (batch_size, 3, height, width).
               Already normalized on NYUv2 dataset. Pixels with sum of 0 are invalid.
    """
    # gt has been normalized on the NYUv2 dataset
    preds = preds / torch.norm(preds, p=2, dim=1, keepdim=True)
    binary_mask = torch.sum(target, dim=1) != 0
    error = (
        torch.acos(
            torch.clamp(
                torch.sum(preds * target, 1).masked_select(binary_mask), -1, 1
            )
        )
        .detach()
        .cpu()
        .numpy()
    )
    error = np.degrees(error)
    self.record.append(torch.from_numpy(error))
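
Example

A minimal sketch (random tensors are illustrative; ground truth is assumed unit-norm, as on NYUv2):

import torch

from fusion_bench.metrics.nyuv2 import NormalMetric

normal_metric = NormalMetric()

# Fake batch: surface normals of shape (batch_size, 3, H, W).
preds = torch.randn(2, 3, 32, 32)  # update() L2-normalizes the predictions
target = torch.nn.functional.normalize(torch.randn(2, 3, 32, 32), dim=1)

normal_metric.update(preds, target)
mean_err, median_err, within_1125, within_225, within_30 = normal_metric.compute()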

NoiseMetric

Bases: Metric

A placeholder metric for noise evaluation on NYUv2 dataset.

This metric currently serves as a placeholder and always returns a value of 1. It can be extended in the future to include actual noise-related metrics.

Note

This is a dummy implementation that doesn't perform actual noise measurements.

Source code in fusion_bench/metrics/nyuv2/noise.py
class NoiseMetric(Metric):
    """
    A placeholder metric for noise evaluation on NYUv2 dataset.

    This metric currently serves as a placeholder and always returns a value of 1.
    It can be extended in the future to include actual noise-related metrics.

    Note:
        This is a dummy implementation that doesn't perform actual noise measurements.
    """

    def __init__(self):
        """Initialize the NoiseMetric."""
        super().__init__()

    def update(self, preds: Tensor, target: Tensor):
        """
        Update metric state (currently a no-op).

        Args:
            preds: Predicted values (unused).
            target: Ground truth values (unused).
        """
        pass

    def compute(self):
        """
        Compute the metric value.

        Returns:
            List[int]: A list containing [1] as a placeholder value.
        """
        return [1]
__init__()

Initialize the NoiseMetric.

Source code in fusion_bench/metrics/nyuv2/noise.py
def __init__(self):
    """Initialize the NoiseMetric."""
    super().__init__()
compute()

Compute the metric value.

Returns:

  • List[int]: A list containing [1] as a placeholder value.

Source code in fusion_bench/metrics/nyuv2/noise.py
def compute(self):
    """
    Compute the metric value.

    Returns:
        List[int]: A list containing [1] as a placeholder value.
    """
    return [1]
update(preds, target)

Update metric state (currently a no-op).

Parameters:

  • preds (Tensor) –

    Predicted values (unused).

  • target (Tensor) –

    Ground truth values (unused).

Source code in fusion_bench/metrics/nyuv2/noise.py
def update(self, preds: Tensor, target: Tensor):
    """
    Update metric state (currently a no-op).

    Args:
        preds: Predicted values (unused).
        target: Ground truth values (unused).
    """
    pass

Continual Learning Metrics

fusion_bench.metrics.continual_learning

compute_backward_transfer(acc_Ti, acc_ii)

Compute the backward transfer (BWT) of a model on a set of tasks.

Equation

\(BWT = \frac{1}{n} \sum_{i=1}^{n} \left( acc_{T,i} - acc_{i,i} \right)\)

where \(acc_{j,i}\) denotes the accuracy on task \(i\) after training up to task \(j\), \(T\) is the final task, and \(n\) is the number of tasks.

Parameters:

  • acc_Ti (Dict[str, float]) –

    Accuracy of each task after training on the final task, keyed by task name.

  • acc_ii (Dict[str, float]) –

    Accuracy of each task immediately after it was learned, keyed by task name.

Returns:

  • float –

    The backward transfer of the model.

Source code in fusion_bench/metrics/continual_learning/backward_transfer.py
def compute_backward_transfer(
    acc_Ti: Dict[str, float], acc_ii: Dict[str, float]
) -> float:
    R"""
    Compute the backward transfer (BWT) of a model on a set of tasks.

    Equation:
        $BWT = \frac{1}{n} \sum_{i=1}^{n} (acc_{T,i} - acc_{i,i})$

    Args:
        acc_Ti: Accuracy of each task after training on the final task, keyed by task name.
        acc_ii: Accuracy of each task immediately after it was learned, keyed by task name.

    Returns:
        float: The backward transfer of the model.
    """
    assert set(acc_ii.keys()) == set(acc_Ti.keys())
    bwt = 0
    for task_name in acc_ii:
        bwt += acc_Ti[task_name] - acc_ii[task_name]
    return bwt / len(acc_ii)
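
As a worked example with hypothetical accuracy values (task names and numbers are illustrative):

from fusion_bench.metrics.continual_learning.backward_transfer import (
    compute_backward_transfer,
)

# Accuracy of each task after training on the final task:
acc_Ti = {"segmentation": 0.52, "depth": 0.61, "normal": 0.44}
# Accuracy of each task right after it was learned:
acc_ii = {"segmentation": 0.55, "depth": 0.60, "normal": 0.50}

bwt = compute_backward_transfer(acc_Ti, acc_ii)
# ((0.52 - 0.55) + (0.61 - 0.60) + (0.44 - 0.50)) / 3 ≈ -0.027
# Negative BWT indicates forgetting of earlier tasks.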