segment_densities


Background

Aligning the RNA signal with the nuclei is an important step in the analysis of high-resolution spatial transcriptomics. With precise locations for both the RNA transcripts and the cell nuclei, we can segment the data into single-cell profiles and thereby carry out spatial transcriptomics analysis at single-cell resolution. The first step is to align the two images (stain and RNA) properly, and spateo-release already implements this step.

Cell segmentation - Spateo documentation (spateo-release.readthedocs.io)

Stain segmentation - Spateo documentation (spateo-release.readthedocs.io)

After that, the next step is to segment cells based on the two aligned images. The first difficulty is separating the regions of relatively low and relatively high density on the slide. As the documentation notes, a single global density measure is not precise enough, because UMIs are not distributed homogeneously in space.
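
To make the point about inhomogeneous UMI density concrete, here is a minimal sketch on synthetic data. All mean counts and thresholds are hypothetical values chosen for illustration, and the assumption that background near dense tissue carries more UMIs than background near sparse tissue is mine, not something taken from the Spateo documentation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000  # pixels sampled per category

# Hypothetical mean UMI counts per pixel (illustrative values only).
sparse_cells = rng.poisson(3.0, n)   # cells in a low-density region
sparse_bg = rng.poisson(0.5, n)      # background near them
dense_cells = rng.poisson(20.0, n)   # cells in a high-density region
dense_bg = rng.poisson(5.0, n)       # background near them (assumed more diffused RNA)


def foreground_fraction(counts, threshold):
    """Fraction of pixels called foreground by a simple count threshold."""
    return float(np.mean(counts > threshold))


# A global threshold tuned for the dense region loses the sparse cells ...
high = 10
lost_sparse_cells = 1.0 - foreground_fraction(sparse_cells, high)
# ... while one tuned for the sparse region keeps most of the dense background.
low = 1
kept_dense_bg = foreground_fraction(dense_bg, low)

print(f"threshold={high}: {lost_sparse_cells:.0%} of sparse-region cell pixels lost")
print(f"threshold={low}: {kept_dense_bg:.0%} of dense-region background pixels kept")
```

Neither threshold works for both regions at once, which is the motivation for first splitting the slide into density bins and treating each bin separately.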

Function Review

Code

from collections import Counter
from typing import Literal, Optional, Tuple, Union

import cv2
from anndata import AnnData
from scipy.sparse import issparse

# SKM, lm (logger_manager), bin_matrix and _segment_densities are spateo internals.
def segment_densities(
    adata: AnnData,  # input AnnData object
    layer: str,  # layer that contains the UMI counts
    binsize: int,  # bin size used to merge pixels
    k: int,  # kernel size for the Gaussian blur (in bins)
    dk: int,  # kernel size for the final dilation
    distance_threshold: Optional[float] = None,  # distance threshold for the clustering
    background: Optional[Union[Tuple[int, int], Literal[False]]] = None,  # None: the most common label on the border is background; (x, y): the label at bins[x, y] is background; False: no background detection
    out_layer: Optional[str] = None,  # output layer name
):
    X = SKM.select_layer_data(adata, layer, make_dense=binsize == 1) 
    if binsize > 1:
        lm.main_debug(f"Binning matrix with binsize={binsize}.")
        X = bin_matrix(X, binsize)
        if issparse(X):
            lm.main_debug("Converting to dense matrix.")
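            X = X.toarray()  # densify (same as X.A in older SciPy); the blur and clustering in _segment_densities presumably expect a NumPy array
    lm.main_info("Finding density bins.")
    bins = _segment_densities(X, k, dk, distance_threshold)  # key step: computes the density bins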
    if background is not False:
        lm.main_info("Setting background pixels.")
        if background is not None:
            x, y = background
            background_label = bins[x, y]
        else:
            counts = Counter(bins[0]) + Counter(bins[-1]) + Counter(bins[:, 0]) + Counter(bins[:, -1])
            background_label = counts.most_common(1)[0][0]
        bins[bins == background_label] = 0
        bins[bins > background_label] -= 1
    if binsize > 1:
        # Expand back
        bins = cv2.resize(bins, adata.shape[::-1], interpolation=cv2.INTER_NEAREST)
    out_layer = out_layer or SKM.gen_new_layer_key(layer, SKM.BINS_SUFFIX)
    SKM.set_layer_data(adata, out_layer, bins)
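
The default background handling in the function above (no `background` argument given) can be tried in isolation. The following is a small sketch on a hand-made label matrix; the toy array and the variable names `border` and `relabeled` are made up for illustration, while the counting and relabeling follow the logic of the code above.

```python
from collections import Counter

import numpy as np

# Toy density-bin labels: label 2 dominates the border, labels 1 and 3 are interior.
bins = np.array(
    [
        [2, 2, 2, 2, 2],
        [2, 1, 1, 3, 2],
        [2, 1, 3, 3, 2],
        [2, 2, 2, 2, 2],
    ]
)

# Count labels along the four edges (corners are counted twice, as in the
# original code) and take the most frequent one as background.
border = np.concatenate([bins[0], bins[-1], bins[:, 0], bins[:, -1]])
counts = Counter(border.tolist())
background_label = counts.most_common(1)[0][0]

relabeled = bins.copy()
relabeled[bins == background_label] = 0  # background becomes 0
relabeled[bins > background_label] -= 1  # shift higher labels down to close the gap

print(background_label)
print(relabeled)
```

The second assignment is what keeps the remaining labels contiguous: once the old background label is freed, every label above it moves down by one.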

Function documentation

The tissue is segmented into UMI density bins according to the following procedure.

  1. The UMI matrix is binned according to binsize (recommended >= 20).
  2. The binned UMI matrix (from the previous step) is Gaussian blurred with kernel size k. Note that k is in terms of bins, not pixels.
  3. The elements of the blurred, binned UMI matrix are hierarchically clustered with Ward linkage, distance threshold distance_threshold, and spatial constraints (immediate neighbors). This yields pixel density bins (a.k.a. labels) with the same shape as the binned matrix.
  4. Each density bin is dilated with kernel size dk, starting from the bin with the smallest mean UMI (a.k.a. least dense) and going to the bin with the largest mean UMI (a.k.a. most dense). This is done in an effort to mitigate RNA diffusion and “choppy” borders in subsequent steps.
  5. If background is not provided, the density bin that is most common on the perimeter of the matrix is selected to be background, and its label is changed to 0. A pixel can be manually selected to be background by providing an (x, y) tuple instead. This feature can be turned off by providing False.
  6. The density bin matrix is resized to be the same size as the original UMI matrix.
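
Steps 1 to 4 and 6 of the procedure above can be sketched end to end on synthetic data. This is an illustrative re-implementation under my own assumptions, not Spateo's code: it uses scipy.ndimage and scikit-learn as stand-ins, the function name `sketch_density_bins` is hypothetical, the blur is parameterized by a Gaussian `sigma` rather than a kernel size k, and letting denser bins overwrite sparser ones during dilation is my reading of step 4. Step 5 (background) is left out.

```python
import numpy as np
from scipy import ndimage
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.image import grid_to_graph


def sketch_density_bins(umi, binsize, sigma, dk, distance_threshold):
    """Illustrative version of steps 1-4 and 6; not spateo's implementation."""
    h, w = umi.shape
    hb, wb = h // binsize, w // binsize
    # Step 1: sum UMIs inside non-overlapping binsize x binsize squares
    # (rows/columns that do not fill a whole bin are cropped in this sketch).
    cropped = umi[: hb * binsize, : wb * binsize]
    binned = cropped.reshape(hb, binsize, wb, binsize).sum(axis=(1, 3))
    # Step 2: Gaussian blur, measured in bins rather than pixels.
    blurred = ndimage.gaussian_filter(binned.astype(float), sigma=sigma)
    # Step 3: Ward clustering in which only neighboring bins may merge.
    model = AgglomerativeClustering(
        n_clusters=None,
        distance_threshold=distance_threshold,
        linkage="ward",
        connectivity=grid_to_graph(hb, wb),
    )
    labels = model.fit_predict(blurred.reshape(-1, 1)).reshape(hb, wb) + 1
    # Step 4: dilate each bin from least to most dense, so that denser bins
    # overwrite sparser ones where the dilated masks overlap (assumption).
    order = sorted(np.unique(labels), key=lambda lab: blurred[labels == lab].mean())
    dilated = np.zeros_like(labels)
    for lab in order:
        mask = ndimage.binary_dilation(labels == lab, structure=np.ones((dk, dk), dtype=bool))
        dilated[mask] = lab
    # Step 6: expand every bin back to binsize x binsize pixels.
    return np.kron(dilated, np.ones((binsize, binsize), dtype=dilated.dtype))


# Synthetic slide: sparse left half, dense right half.
rng = np.random.default_rng(0)
umi = rng.poisson(1.0, size=(60, 60))
umi[:, 30:] = rng.poisson(10.0, size=(60, 30))

density_bins = sketch_density_bins(umi, binsize=10, sigma=1, dk=3, distance_threshold=1000)
print(np.unique(density_bins))
```

The value of `distance_threshold` depends strongly on the scale of the binned counts, so the 1000 used here only makes sense for this toy matrix.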

Author: Wulilichao
Reprint policy: Unless otherwise stated, all articles in this blog are licensed under CC BY 4.0. If you reproduce this article, please credit the source: Wulilichao.