Graphs for Graph Neural Networks
Graphs are a powerful and general representation of data with a wide range of applications. Many data structures can be represented as graphs; in fact, most real-world objects are connected to one another and can therefore be restructured as graphs. Recently, researchers have used this fact to bring the graph data structure into Machine Learning and Deep Learning as a new branch: Graph Neural Networks (GNNs). GNNs have found applications in many fields, such as physics, chemistry, molecular biology, and recommendation systems.
Throughout this article we give a brief introduction to graph representation and show how you can reconstruct a graph from 2D images in the medical field for use in graph models. By the end of this article you will have learned the following:
- Basics of graph representation
- Graph reconstruction from 2D images using the histocartography library
- How to extract features for each node (code example)

While many use graph data structures to represent their data, in this article we represent pathology images as graphs: building a graph from an ordinary 2D image is, we found, an intuitive way to really understand how a problem can be shifted from a 2D image data structure to a graph data structure. We will introduce GNNs in our next article; here we focus on the representation of the graph.
Graph Definitions
A graph G is composed of two main components: a set of vertices (nodes) V and the links between them (edges) E (see figure 2). Each edge is a pair (u, v) of connected nodes u, v ∈ V. A graph is undirected if (u, v) ∈ E ⟹ (v, u) ∈ E.
The most common way to represent the edges E is the adjacency matrix A: a binary square matrix of size |V| × |V|, where A[u, v] = 1 if nodes u and v are connected, and 0 otherwise (see figure 2).

Code Example
In the example below we introduce an adjacency matrix. As an exercise, run this code cell in Google Colab and compare the results with the connections illustrated in the adjacency matrix.
import numpy as np
## Imports for plotting
import matplotlib.pyplot as plt
%matplotlib inline
## PyTorch
import torch
import torch.nn as nn
node_feats = torch.arange(8, dtype=torch.float32).view(1, 4, 2)
adj_matrix = torch.Tensor([[[1, 1, 0, 0],
                            [1, 1, 1, 1],
                            [0, 1, 1, 1],
                            [0, 1, 1, 1]]])
print("Node features:\n", node_feats)
print("\nAdjacency matrix:\n", adj_matrix)
#Node features:
# tensor([[[0., 1.],
#          [2., 3.],
#          [4., 5.],
#          [6., 7.]]])
#Adjacency matrix:
# tensor([[[1., 1., 0., 0.],
#          [1., 1., 1., 1.],
#          [0., 1., 1., 1.],
#          [0., 1., 1., 1.]]])
Graph Representation From 2D Images
In the following section we introduce a 2D histopathology image of breast cancer tissue, captured under a microscope with Hematoxylin and Eosin (H&E) staining, from the publicly available BRACS dataset [3] (Fig. 3).
We will reconstruct a graph in two ways:
- In the first example, we treat each cell in the image as a node.
- In the second example, we treat each area (tissue region) as a node.
Cell Node
Throughout our code we will use the histocartography library, an open-source library for creating graph nodes from 2D histopathology images.

First, we create an object of the NucleiExtractor class, which segments the cells in the image using a model trained on the PanNuke dataset. Then we create a feature_extractor object to extract a 512-dimensional feature vector for each node using a ResNet: it takes a 72×72 window around each cell centroid and passes this small patch through the ResNet to produce 512 features per node. Finally, we create a KNNGraphBuilder object to reconstruct the graph from the segmented cells and their features.
from histocartography.preprocessing import NucleiExtractor, DeepFeatureExtractor, KNNGraphBuilder
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt

nuclei_detector = NucleiExtractor()  # nuclei segmentation (trained on PanNuke)
feature_extractor = DeepFeatureExtractor(architecture='resnet34', patch_size=72)  # 72x72 patch per nucleus
knn_graph_builder = KNNGraphBuilder(k=5, thresh=50, add_loc_feats=True)  # kNN graph over centroids

image = np.array(Image.open('283_dcis_4.png'))
plt.imshow(image)
nuclei_map, _ = nuclei_detector.process(image)
plt.imshow(nuclei_map)

np.unique(nuclei_map)
#array([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,
#        ...
#       325, 326, 327, 328, 329, 330, 331], dtype=uint16)
#(output truncated; the labels run from 0 to 331)
len(np.unique(nuclei_map))
#332
From the previous code blocks we see that the resulting segmentation map contains 331 unique cells, each assigned a unique label; the label 0 is the background.
features = feature_extractor.process(image, nuclei_map)
If you print the dimensions of features, you will find they are 331×512: each node has a 512-dimensional feature embedding.
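You can verify this by printing the shape directly (the same check is used in the tissue example later in this article):
print(features.shape)
# torch.Size([331, 512])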
cell_graph = knn_graph_builder.process(nuclei_map, features)
print(cell_graph)
#DGLGraph(num_nodes=331, num_edges=1500,
#         ndata_schemes={'centroid': Scheme(shape=(2,), dtype=torch.float32), 'feat': Scheme(shape=(514,), dtype=torch.float32)}
#         edata_schemes={})
The resulting graph has 331 nodes (cells) with 1500 edges between them. Note that each node's 'feat' entry is 514-dimensional rather than 512: because we set add_loc_feats=True, the 2D normalized centroid is appended to the 512-dimensional ResNet embedding.
Note: the thresh argument of the KNNGraphBuilder class sets the maximum distance between two cells for them to be considered connected; candidate kNN edges longer than this threshold are dropped.
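As a quick optional experiment, you can rebuild the graph with different k and thresh values and watch the edge count change. This sketch reuses the nuclei_map and features computed above, and assumes thresh=None disables the distance cutoff (check your installed library version):
for k, thresh in [(3, 50), (5, 25), (5, 50), (5, None)]:
    builder = KNNGraphBuilder(k=k, thresh=thresh, add_loc_feats=True)
    g = builder.process(nuclei_map, features)
    print(f"k={k}, thresh={thresh}: {g.number_of_edges()} edges")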
In our next article we will discuss how to use this resulting graph for Graph Convolutional Neural Network modeling.
Tissue Node
In the second example we give some hints on how to create a node from a large area (tissue region); this approach can be generalized to other problems and other kinds of images, not only pathology images.
import os
from glob import glob
import argparse
from PIL import Image
import numpy as np
from tqdm import tqdm
import torch
from dgl.data.utils import save_graphs
import h5py
import cv2
import matplotlib.pyplot as plt
from histocartography.preprocessing import (
    VahadaneStainNormalizer,         # stain normalizer
    NucleiExtractor,                 # nuclei detector
    DeepFeatureExtractor,            # feature extractor
    KNNGraphBuilder,                 # kNN graph builder
    ColorMergedSuperpixelExtractor,  # tissue detector
    RAGGraphBuilder,                 # region adjacency graph builder
    AssignmentMatrixBuilder          # assignment matrix
)
image = cv2.imread('target.png')[:, :, ::-1]  # OpenCV loads BGR; convert to RGB
plt.imshow(image)

Afterwards, a stain normalization step is applied to the image:
# define the stain normalization target image (in practice this would be a
# reference image; here the image is normalized toward itself as a demo)
STAIN_NORM_TARGET_IMAGE = 'target.png'
normalizer = VahadaneStainNormalizer(target_path=STAIN_NORM_TARGET_IMAGE)
image = normalizer.process(image)
plt.imshow(image)
In the next code block we prepare the three objects that will create the graph from the image, with each node representing an area. The ColorMergedSuperpixelExtractor object segments the image using a superpixel algorithm; its superpixel_size argument controls the size of the tissue region that will be treated as a node. The tissue_feature_extractor object extracts small patches of 44×44 pixels from each area; these patches are resized to 224×224 and fed into a ResNet to obtain a feature vector for each area (node).
tissue_detector = ColorMergedSuperpixelExtractor(
    superpixel_size=500,
    compactness=20,
    blur_kernel_size=1,
    threshold=0.05,
    downsampling_factor=4
)
# b. define feature extractor: extract patches of 44x44 pixels all over
# the tissue regions. Each patch is resized to 224 to match the ResNet input size.
tissue_feature_extractor = DeepFeatureExtractor(
    architecture='resnet34',
    patch_size=44,
    resize_size=224
)
# c. define RAG builder. Append normalized centroid to the node features.
rag_graph_builder = RAGGraphBuilder(add_loc_feats=True)
superpixels, _ = tissue_detector.process(image)
features = tissue_feature_extractor.process(image, superpixels)
graph = rag_graph_builder.process(superpixels, features)
print(len(np.unique(superpixels)))
#100
plt.imshow(superpixels)

print(features.shape)
#torch.Size([100, 512])
print(graph)
#DGLGraph(num_nodes=100, num_edges=328,
#         ndata_schemes={'centroid': Scheme(shape=(2,), dtype=torch.float32), 'feat': Scheme(shape=(514,), dtype=torch.float32)}
#         edata_schemes={})
Finally, as with the cell nodes, the number of nodes equals the number of areas determined by the superpixel algorithm: 100.
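To sanity-check the result visually, histocartography also ships a visualization module. The snippet below is an optional sketch; the exact arguments of OverlayGraphVisualization.process may vary between library versions:
from histocartography.visualization import OverlayGraphVisualization

# Draw the graph (nodes at region centroids, edges between adjacent
# regions) on top of the normalized image; process() returns a PIL image
visualizer = OverlayGraphVisualization()
canvas = visualizer.process(image, graph)
plt.imshow(canvas)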
Some Hints
- The superpixel image that tissue_feature_extractor.process takes as input could be any other segmentation label map, e.g. labels from your own data, segmentation output from models, or pseudo-labels. This makes it possible to generalize the idea to any kind of data you have.
- You can use a superpixel algorithm to extract areas in any kind of 2D image, not only histopathology images (see the sketch after this list).
- The point of building graph nodes with features is to use the graph as input to a Graph Convolutional Neural Network, which we will cover in detail in our next article.
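To illustrate the last two hints, here is a small sketch (not part of the original pipeline) that swaps in scikit-image's SLIC superpixels as the label map for an arbitrary RGB image; the file name is a placeholder:
import numpy as np
from PIL import Image
from skimage.segmentation import slic

# Any RGB image will do; 'any_rgb_image.png' is a placeholder file name
image = np.array(Image.open('any_rgb_image.png').convert('RGB'))
# SLIC returns an HxW integer label map, one label per superpixel region
superpixels = slic(image, n_segments=100, compactness=10, start_label=1)
# The label map can stand in for ColorMergedSuperpixelExtractor's output
features = tissue_feature_extractor.process(image, superpixels)
graph = rag_graph_builder.process(superpixels, features)
print(graph)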