Background
Aligning the RNA signal with the nuclei it belongs to is an important step in the analysis of high-resolution spatial transcriptomics. With precise locations for both the RNA transcripts and the cell nuclei, we can segment the slide into single cells and thereby carry out spatial transcriptomics analysis at single-cell resolution. The first step is to align the two images (stain and RNA) appropriately, and this step is already handled in spateo-release.
Cell segmentation - Spateo documentation (spateo-release.readthedocs.io)
Stain segmentation - Spateo documentation (spateo-release.readthedocs.io)
After that, the next step is to segment cells based on the aligned stain and RNA images. The first difficulty in segmentation is separating the relatively low- and high-density regions of the slide. As the documentation notes, a single global density estimate is not precise enough, because UMIs are not distributed homogeneously in space.
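To make the problem with a global density concrete, here is a minimal toy sketch. The counts and the mean-based threshold are made up for illustration and are not how Spateo computes anything; the point is only that one threshold for the whole slide can erase a low-density tissue region that a region-specific threshold recovers.

```python
# Toy 1-D "slide" of UMI counts per pixel (made-up numbers).
# Pixels 0-3: empty background, 4-9: low-density tissue, 10-15: high-density tissue.
counts = [0, 0, 1, 0, 3, 4, 3, 5, 4, 3, 30, 28, 35, 31, 29, 33]


def mean_threshold_mask(values):
    """Mark a pixel as tissue when its count exceeds the mean of `values`."""
    mean = sum(values) / len(values)
    return [v > mean for v in values]


# One threshold for the whole slide: the dense region dominates the mean.
global_mask = mean_threshold_mask(counts)
low_density_kept = sum(global_mask[4:10])    # low-density tissue pixels kept
high_density_kept = sum(global_mask[10:16])  # high-density tissue pixels kept

# Thresholding the low-density part on its own separates tissue from background.
local_mask = mean_threshold_mask(counts[:10])
local_tissue_kept = sum(local_mask[4:10])
local_background_kept = sum(local_mask[:4])
```

With the global threshold every low-density tissue pixel is lost, which is why the slide is first split into density regions that can each be handled separately.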
Function Review
Code
# lm for logger_manager
# SKM, lm, bin_matrix and _segment_densities are spateo internals (imports not shown here)
from collections import Counter
from typing import Literal, Optional, Tuple, Union

import cv2
from anndata import AnnData
from scipy.sparse import issparse


def segment_densities(
    adata: AnnData,  # input anndata
    layer: str,  # layer that contains the UMI counts to segment
    binsize: int,  # bin size used to merge pixels
    k: int,  # kernel size for Gaussian blur
    dk: int,  # kernel size for final dilation
    distance_threshold: Optional[float] = None,  # cluster threshold
    # By default, the label most common on the outermost pixels is treated as background.
    # Pass an (x, y) tuple to pick the background pixel manually, or False to turn off background detection.
    background: Optional[Union[Tuple[int, int], Literal[False]]] = None,
    out_layer: Optional[str] = None,  # the output layer name
):
    X = SKM.select_layer_data(adata, layer, make_dense=binsize == 1)
    if binsize > 1:
        lm.main_debug(f"Binning matrix with binsize={binsize}.")
        X = bin_matrix(X, binsize)
        if issparse(X):
            lm.main_debug("Converting to dense matrix.")
            # X.A is shorthand for X.toarray(); the later blur and clustering steps presumably need a dense array
            X = X.A

    lm.main_info("Finding density bins.")
    bins = _segment_densities(X, k, dk, distance_threshold)  # key step for density segments

    if background is not False:
        lm.main_info("Setting background pixels.")
        if background is not None:
            # Use the label of the manually chosen pixel
            x, y = background
            background_label = bins[x, y]
        else:
            # Use the most common label along the four edges of the matrix
            counts = Counter(bins[0]) + Counter(bins[-1]) + Counter(bins[:, 0]) + Counter(bins[:, -1])
            background_label = counts.most_common(1)[0][0]
        # Background becomes 0; larger labels shift down so labels stay consecutive
        bins[bins == background_label] = 0
        bins[bins > background_label] -= 1

    if binsize > 1:
        # Expand back to the original pixel resolution
        bins = cv2.resize(bins, adata.shape[::-1], interpolation=cv2.INTER_NEAREST)

    out_layer = out_layer or SKM.gen_new_layer_key(layer, SKM.BINS_SUFFIX)
    SKM.set_layer_data(adata, out_layer, bins)
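The background handling in the code above is easy to try in isolation. The following is a minimal sketch that mirrors that logic on plain nested lists, so it has no dependencies. The helper name relabel_background and the toy label matrix are hypothetical and not part of Spateo, which works on NumPy arrays in place.

```python
from collections import Counter


def relabel_background(bins, background=None):
    """Return a copy of `bins` with the background label set to 0.

    Labels larger than the background label are shifted down by one so the
    remaining labels stay consecutive, as in the function above.
    """
    if background is not None:
        # Manually chosen background pixel
        x, y = background
        background_label = bins[x][y]
    else:
        # Most common label on the perimeter (corners are counted twice, as above)
        perimeter = Counter(bins[0]) + Counter(bins[-1])
        perimeter += Counter(row[0] for row in bins) + Counter(row[-1] for row in bins)
        background_label = perimeter.most_common(1)[0][0]

    relabeled = []
    for row in bins:
        new_row = []
        for label in row:
            if label == background_label:
                new_row.append(0)
            elif label > background_label:
                new_row.append(label - 1)
            else:
                new_row.append(label)
        relabeled.append(new_row)
    return relabeled


# Toy density-bin matrix: label 2 surrounds the tissue (labels 1 and 3).
toy_bins = [
    [2, 2, 2, 2, 2],
    [2, 1, 1, 3, 2],
    [2, 1, 3, 3, 2],
    [2, 2, 2, 2, 2],
]

auto_result = relabel_background(toy_bins)
manual_result = relabel_background(toy_bins, background=(1, 1))
```

In the automatic case label 2 is detected as background and label 3 becomes 2. In the manual case the pixel at (1, 1) makes label 1 the background, so the old labels 2 and 3 become 1 and 2.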
Function document
The tissue is segmented into UMI density bins according to the following procedure.
- The UMI matrix is binned according to binsize (recommended >= 20).
- The binned UMI matrix (from the previous step) is Gaussian blurred with kernel size k. Note that k is in terms of bins, not pixels.
- The elements of the blurred, binned UMI matrix are hierarchically clustered with Ward linkage, distance threshold distance_threshold, and spatial constraints (immediate neighbors). This yields pixel density bins (a.k.a. labels) with the same shape as the binned matrix.
- Each density bin is dilated with kernel size dk, starting from the bin with the smallest mean UMI (a.k.a. least dense) and going to the bin with the largest mean UMI (a.k.a. most dense). This is done in an effort to mitigate RNA diffusion and "choppy" borders in subsequent steps.
- If background is not provided, the density bin that is most common in the perimeter of the matrix is selected to be background, and thus its label is changed to take a value of 0. A pixel can be manually selected to be background by providing an (x, y) tuple instead. This feature can be turned off by providing False.
- The density bin matrix is resized to be the same size as the original UMI matrix.
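The first and last steps of the procedure (binning, then resizing the labels back to pixel resolution) can be sketched without any dependencies. This is only an illustration on nested lists: the helper names bin_counts and expand_labels are hypothetical, and Spateo's own bin_matrix and cv2.resize may treat edges and rounding differently.

```python
def bin_counts(matrix, binsize):
    """Sum counts in non-overlapping binsize x binsize blocks.

    Blocks on the right and bottom edges may be smaller when the matrix
    size is not a multiple of binsize.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    binned = []
    for r in range(0, n_rows, binsize):
        binned_row = []
        for c in range(0, n_cols, binsize):
            block_sum = sum(
                matrix[i][j]
                for i in range(r, min(r + binsize, n_rows))
                for j in range(c, min(c + binsize, n_cols))
            )
            binned_row.append(block_sum)
        binned.append(binned_row)
    return binned


def expand_labels(labels, shape):
    """Nearest-neighbour upsampling of a label matrix to shape (rows, cols)."""
    n_rows, n_cols = shape
    src_rows, src_cols = len(labels), len(labels[0])
    return [
        [labels[i * src_rows // n_rows][j * src_cols // n_cols] for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Toy 4 x 4 UMI matrix binned with binsize=2.
umi = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [5, 6, 0, 1],
    [7, 8, 1, 0],
]
binned = bin_counts(umi, 2)

# Pretend the clustering step labeled the dense bottom-left bin as 2.
bin_labels = [[1, 1], [2, 1]]
pixel_labels = expand_labels(bin_labels, (4, 4))
```

Nearest-neighbour expansion is used because the values are labels rather than intensities: interpolating between label 1 and label 2 would produce values that correspond to no density bin.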