AI-based automated flow cytometry analysis for differential diagnosis of leukemia and lymphoma


Multi-channel flow cytometry (MFC) is a cornerstone in the differential diagnosis of leukemia and lymphoma. MFC data analysis requires trained experts to manually gate cell populations of interest, which is time-consuming and subjective. We developed an artificial intelligence (AI) model that automatically classifies MFC data into lymphoma diagnosis labels [1]. We transformed the MFC data into self-organizing maps (SOMs), which were then analyzed with a convolutional neural network (CNN). The model achieved expert-level accuracy (weighted F1 score of 0.94) in classifying MFC samples into eight classes: chronic lymphocytic leukemia and its predecessor monoclonal B-cell lymphocytosis (CLL/MBL), marginal zone lymphoma (MZL), mantle cell lymphoma (MCL), prolymphocytic leukemia (PL), follicular lymphoma (FL), hairy cell leukemia (HCL), lymphoplasmacytic lymphoma (LPL), and healthy controls. This level of accuracy was made possible by deep learning on a large data set from a single lab, comprising more than 20,000 samples measured with the same protocol.
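
As an illustration of the SOM step, the following is a minimal pure-Python sketch of how a set of per-cell marker vectors can be condensed into a fixed-size grid of node weights (rows x cols x channels), which is the kind of image-like input a CNN can consume. It is not the code used for the published model: the function name `train_som`, the grid size, the decay schedules, and the synthetic two-population data are all assumptions chosen for brevity.

```python
import math
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_som(events, rows, cols, epochs=5, lr0=0.5, seed=0):
    """Fit a rows x cols self-organizing map to `events`.

    `events` is a list of equal-length marker vectors (one per cell).
    Returns a nested list of shape rows x cols x channels.
    All hyperparameters here are illustrative assumptions.
    """
    rng = random.Random(seed)
    n_channels = len(events[0])
    sigma0 = max(rows, cols) / 2.0
    # Initialise every node with a copy of a randomly chosen event.
    weights = [[list(rng.choice(events)) for _ in range(cols)] for _ in range(rows)]
    total_steps = epochs * len(events)
    step = 0
    for _ in range(epochs):
        order = list(range(len(events)))
        rng.shuffle(order)
        for idx in order:
            x = events[idx]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood radius shrinks
            # Best matching unit: the node whose weights are closest to the event.
            best_r, best_c = min(
                ((r, c) for r in range(rows) for c in range(cols)),
                key=lambda rc: squared_distance(weights[rc[0]][rc[1]], x),
            )
            # Pull the best matching unit and its grid neighbours towards the event.
            for r in range(rows):
                for c in range(cols):
                    grid_d2 = (r - best_r) ** 2 + (c - best_c) ** 2
                    h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                    node = weights[r][c]
                    for k in range(n_channels):
                        node[k] += lr * h * (x[k] - node[k])
            step += 1
    return weights


def synthetic_events(n, centre, rng):
    """Hypothetical stand-in for scaled marker intensities, clamped to [0, 1]."""
    return [[min(1.0, max(0.0, rng.gauss(m, 0.05))) for m in centre] for _ in range(n)]


data_rng = random.Random(42)
events = (synthetic_events(100, [0.2, 0.8, 0.5], data_rng)
          + synthetic_events(100, [0.7, 0.3, 0.9], data_rng))

som = train_som(events, rows=6, cols=6)
print(len(som), len(som[0]), len(som[0][0]))  # prints: 6 6 3
```

Each sample, whatever its number of recorded cells, is reduced to a grid of the same shape, which is what allows samples to be stacked into batches for a convolutional network.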

However, MFC is not standardized. Acquisition protocols vary between laboratories, and the MFC panel itself changes over time in the number of tubes per sample, the markers measured, the marker-fluorochrome conjugates, and the cytometer used. As a result, our model cannot be applied directly to the wide range of MFC panels in use or to smaller data sets.

We extended our AI model (the base model) to four additional MFC panels (the target data sets), each with a much smaller data set, using transfer learning. We developed a workflow, merge_TL, that combines transfer learning with FCS file merging to handle differences across MFC panels. For each of the four target data sets, we trained a model by transferring the features learned by the base model. With merge_TL, we increased the target models' overall performance and, most notably, the models learned faster from very small training sets.



CNN architecture


Layer (type)                    Output Shape          Param #
input_1 (InputLayer)            (None, 36, 36, 18)    0
conv2d_1 (Conv2D)               (None, 33, 33, 32)    9248
conv2d_2 (Conv2D)               (None, 31, 31, 48)    13872
conv2d_3 (Conv2D)               (None, 15, 15, 64)    12352
global_max_pooling2d_1 (Glob    (None, 64)            0
dense_1 (Dense)                 (None, 64)            4160
dense_2 (Dense)                 (None, 32)            2080
dense_3 (Dense)                 (None, 9)             297

Total params: 42,009
Trainable params: 42,009
Non-trainable params: 0
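
The output shapes and parameter counts in this summary can be checked by hand. The sketch below does so in plain Python. The kernel sizes and strides are not stated in the summary; the values used here (4 x 4, 3 x 3, and 2 x 2 with stride 2, all without padding) are inferred as the simplest square-kernel configuration consistent with the listed shapes and counts, and should be read as an assumption rather than the documented configuration.

```python
def conv2d(shape, filters, kernel, stride=1):
    """Output shape and parameter count of an unpadded ('valid') 2D convolution."""
    height, width, channels = shape
    out_h = (height - kernel) // stride + 1
    out_w = (width - kernel) // stride + 1
    params = kernel * kernel * channels * filters + filters  # weights + biases
    return (out_h, out_w, filters), params


def dense(units_in, units_out):
    """Output width and parameter count of a fully connected layer."""
    return units_out, units_in * units_out + units_out  # weights + biases


shape = (36, 36, 18)  # input shape from the summary
layers = []
total = 0

# (name, filters, kernel, stride): kernel and stride are inferred, not documented.
for name, filters, kernel, stride in [
    ("conv2d_1", 32, 4, 1),
    ("conv2d_2", 48, 3, 1),
    ("conv2d_3", 64, 2, 2),
]:
    shape, params = conv2d(shape, filters, kernel, stride)
    layers.append((name, shape, params))
    total += params

# Global max pooling keeps one value per filter and has no parameters.
units = shape[2]
layers.append(("global_max_pooling2d_1", (units,), 0))

for name, units_out in [("dense_1", 64), ("dense_2", 32), ("dense_3", 9)]:
    units, params = dense(units, units_out)
    layers.append((name, (units,), params))
    total += params

for name, out_shape, params in layers:
    print(f"{name:<26}{str(out_shape):<16}{params:>8}")
print("Total params:", total)
```

Under these assumptions the computed shapes, per-layer counts, and total match the summary line by line.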


The data and code used for merging FCS files and model generation with transfer learning are available here:


  1. Zhao, M., Mallesh, N., Höllein, A., Schabath, R., Haferlach, C., Haferlach, T., Elsner, F., Lüling, H., Krawitz, P., and Kern, W. (2020). Hematologist-Level Classification of Mature B-Cell Neoplasm Using Deep Learning on Multiparameter Flow Cytometry Data. Cytom. Part A 97.
  2. Knowledge transfer preprint: https://doi.org/10.1101/2021.03.03.21252824