APPROACHES TO FEATURE EXTRACTION FOR CHROMOSOMAL DATA PROCESSING
08.03.2025 17:16
[1. Information Systems and Technologies]
Authors: Oleksii Pysarchuk, Doctor of Technical Sciences, National Technical University of Ukraine «Igor Sikorsky Kyiv Polytechnic Institute», Kyiv, Ukraine; Yurii Mironov, PhD student, National Technical University of Ukraine «Igor Sikorsky Kyiv Polytechnic Institute», Kyiv, Ukraine
Automation of karyotyping (chromosome image processing) is a broad and relevant application domain that poses a variety of complex scientific and applied problems, some of which have not been fully solved to date. Related research [1-3] indicates that neural-network-based approaches to chromosome identification and classification show great promise but produce inconsistent results for chromosomes with altered structure. This is because structural abnormalities are so diverse that they cannot be adequately represented in training image datasets.
Therefore, developing a technological basis for analyzing chromosome images that would allow abnormal chromosomes to be identified is a relevant task. The first step towards such a technological basis is a feature extraction algorithm that reflects the logical structure of a chromosome. Images of individual chromosomes have been chosen as the input format (fig. 1).
Fig. 1. A pair of human chromosomes 10
In cytogenetic laboratories, the manual (non-automated) karyotyping process consists of comparing each individual chromosome with ideograms, schematic images of “ideal” chromosomes (fig. 2). Each chromosome type (1, 2, …, 22, X, Y) has its own ideogram. By matching the image of a patient’s chromosome to a particular ideogram, the chromosome can be assigned to the corresponding type.
Fig. 2. Human chromosome 10 ideogram
Thus, identifying a single chromosome amounts to comparing the structure of the chromosome with the structure of an ideogram. Both structures can be expressed as a sequence of bands, where each band has two properties: “length” and “color”. Each chromosome type has a distinct combination of such bands. Therefore, to extract logical features from a chromosome image, this band structure must be recovered from the image.
Similar approaches have been proposed in the past [4]; they are based on extracting the medial axis of a chromosome and obtaining a color profile along this medial axis. However, this approach has two flaws: 1) known skeletonization algorithms do not guarantee that the image skeleton is a single line with two ends (fig. 3); 2) because of the low contrast of the images, it is debatable which algorithm should be used to identify “black” and “white” bands on a chromosome.
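The band structure described above can be illustrated with a small data structure. The following is a minimal sketch, not the authors' implementation. It assumes that a binarized color profile (a list of 0 = black and 255 = white values, as described later in this paper) has already been obtained, and it collapses consecutive equal values into bands by run-length encoding. The names `Band`, `profile_to_bands` and `relative_lengths` are hypothetical.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Sequence


@dataclass(frozen=True)
class Band:
    """One band of a chromosome or ideogram (hypothetical representation)."""
    color: str   # "black" or "white"
    length: int  # number of consecutive profile samples with this color


def profile_to_bands(profile: Sequence[int]) -> List[Band]:
    """Collapse a binarized profile (0 = black, 255 = white) into bands.

    Consecutive samples with the same value form one band (run-length encoding).
    """
    bands = []
    for value, run in groupby(profile):
        if value not in (0, 255):
            raise ValueError(f"expected a binarized profile, got value {value}")
        bands.append(Band("black" if value == 0 else "white", len(list(run))))
    return bands


def relative_lengths(bands: Sequence[Band]) -> List[float]:
    """Express band lengths as fractions of the total length.

    Assumption: normalizing makes chromosomes of different absolute size
    comparable with an ideogram drawn at an arbitrary scale.
    """
    total = sum(b.length for b in bands)
    return [b.length / total for b in bands] if total else []


profile = [255, 255, 0, 0, 0, 255, 0, 0]
bands = profile_to_bands(profile)
print(bands)
print(relative_lengths(bands))
```

For this toy profile the bands are white (2), black (3), white (1) and black (2), with relative lengths 0.25, 0.375, 0.125 and 0.25. Normalizing lengths is only one possible design choice; how band sequences are actually compared with ideograms is not specified here.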
Fig. 3. Improper skeletonization result
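Flaw 1 can be detected mechanically. A skeleton is a single line with two ends only if exactly two of its pixels have one 8-connected neighbor (endpoints) and none has more than two (branch points). The sketch below is a heuristic check under the assumption that the skeleton is a one-pixel-wide binary NumPy array. The function names are hypothetical, and this is not the geometric transformation used by the authors. Note that "staircase" corners left by some thinning algorithms can distort the neighbor counts.

```python
from typing import Tuple

import numpy as np


def neighbor_counts(skeleton: np.ndarray) -> np.ndarray:
    """Count the 8-connected foreground neighbors of every skeleton pixel."""
    sk = (skeleton > 0).astype(np.int32)
    padded = np.pad(sk, 1)  # zero border so edge pixels have 8 neighbors
    h, w = sk.shape
    counts = np.zeros_like(sk)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            counts += padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
    return counts * sk  # zero out counts on background pixels


def skeleton_topology(skeleton: np.ndarray) -> Tuple[int, int]:
    """Return (number of endpoints, number of branch points)."""
    sk = skeleton > 0
    counts = neighbor_counts(skeleton)
    endpoints = int(np.count_nonzero(sk & (counts == 1)))
    branches = int(np.count_nonzero(sk & (counts > 2)))
    return endpoints, branches


def is_simple_open_curve(skeleton: np.ndarray) -> bool:
    """Heuristic: True if the skeleton looks like one line with two ends."""
    return skeleton_topology(skeleton) == (2, 0)


# A straight line passes the check; a Y-shaped skeleton (as in fig. 3) fails.
line = np.zeros((5, 7), dtype=np.uint8)
line[2, 1:6] = 1
fork = np.zeros((5, 7), dtype=np.uint8)
fork[2, 1:4] = 1
fork[1, 4] = fork[0, 5] = 1
fork[3, 4] = fork[4, 5] = 1
print(is_simple_open_curve(line), skeleton_topology(fork))
```

In this example the straight line has two endpoints and no branch points, while the forked skeleton has three endpoints and one branch point, so it would not yield a single, unambiguous medial axis.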
Flaw 1 can be bypassed by applying a geometric transformation to the chromosome image (fig. 4), after which a skeletonization algorithm is applied.
Fig. 4. Extraction of medial axis using geometric transformation
Flaw 2 has been addressed using adaptive thresholding binarization, which converts the grayscale chromosome image into an image consisting only of white and black pixels. The color profile of the binarized image can then be extracted (fig. 5). It can be expressed as a list of numbers with possible values 0 (black) and 255 (white).
Fig. 5. Chromosome color profile extraction
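The paper does not state which adaptive thresholding algorithm or parameters were used. The sketch below therefore assumes local-mean thresholding, one common variant: a pixel becomes white (255) if it is brighter than the mean of its `block_size` × `block_size` neighborhood minus a constant `c`, and black (0) otherwise. It also assumes the medial axis is available as an ordered list of (row, column) pixel coordinates. The function names and parameter values are illustrative. In practice, OpenCV's `cv2.adaptiveThreshold` with `ADAPTIVE_THRESH_MEAN_C` performs a similar operation.

```python
from typing import List, Sequence, Tuple

import numpy as np


def adaptive_threshold_mean(gray: np.ndarray, block_size: int = 15,
                            c: float = 5.0) -> np.ndarray:
    """Binarize a grayscale image against its local mean (0 or 255 output)."""
    if block_size < 3 or block_size % 2 == 0:
        raise ValueError("block_size must be an odd integer >= 3")
    img = gray.astype(np.float64)
    r = block_size // 2
    padded = np.pad(img, r, mode="edge")  # replicate border pixels
    # Integral image with a leading zero row/column: I[i, j] = sum(padded[:i, :j]).
    integral = np.pad(padded.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))
    h, w = img.shape
    k = block_size
    # Box sum of the k x k window centred on every original pixel.
    box = (integral[k:k + h, k:k + w] - integral[0:h, k:k + w]
           - integral[k:k + h, 0:w] + integral[0:h, 0:w])
    local_mean = box / (k * k)
    return np.where(img > local_mean - c, 255, 0).astype(np.uint8)


def color_profile(binary: np.ndarray,
                  axis_points: Sequence[Tuple[int, int]]) -> List[int]:
    """Read binarized values (0 or 255) along an ordered medial axis."""
    return [int(binary[row, col]) for row, col in axis_points]


# Synthetic example: a bright image with one dark square acting as a "band".
image = np.full((20, 20), 200, dtype=np.uint8)
image[8:12, 8:12] = 50
binary = adaptive_threshold_mean(image, block_size=7, c=5)
axis = [(9, col) for col in range(20)]  # assumed horizontal medial axis
print(color_profile(binary, axis))
```

Because the threshold follows the local mean rather than a single global value, gradual illumination changes along a low-contrast chromosome affect the result less. The choice of `block_size` and `c` would still need tuning on real images.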
To measure the efficiency of the proposed approach, a dataset of 36 chromosome images has been used. The dataset consists of 3 chromosome types, with 12 images of each. To evaluate the potential of the proposed approach, its performance has been compared to that of VGG16. This neural network has been chosen as the reference feature extractor because of its versatility and common usage. The Rand index has been used to measure the efficiency of clustering based on the extracted features by comparing the expected clusters with the computed ones [5]. The results of this comparison are shown in table 1.
Table 1. Proposed algorithm efficiency measurement
As can be seen, the proposed algorithm offers a +15% increase in clustering accuracy (Rand index) compared to a commonly used feature extraction tool. This can be explained by the fact that the proposed method extracts logical features from the chromosome image, whereas VGG16 extracts hierarchical features. Therefore, the complex geometry of chromosomal objects has a smaller effect on precision than it does with common methods.
Thus, the proposed approach to chromosome feature extraction shows some promise. Further research should focus on introducing new methods of chromosome color extraction and a more effective thresholding binarization algorithm.
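The Rand index used in this comparison [5] considers every pair of images. It counts the pairs on which the expected and computed clusterings agree, meaning both place the two images together or both place them apart, and divides by the total number of pairs, C(n, 2). Scikit-learn provides this metric as `sklearn.metrics.rand_score`. The sketch below is a from-scratch version written only to make the definition concrete. The function name `rand_index` is hypothetical, and the sketch does not reproduce scikit-learn's handling of degenerate inputs.

```python
from itertools import combinations
from typing import Hashable, Sequence


def rand_index(labels_true: Sequence[Hashable],
               labels_pred: Sequence[Hashable]) -> float:
    """Fraction of sample pairs on which two clusterings agree."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    if n < 2:
        raise ValueError("at least two samples are required")
    agreements = 0
    for i, j in combinations(range(n), 2):
        same_true = labels_true[i] == labels_true[j]  # same expected type?
        same_pred = labels_pred[i] == labels_pred[j]  # same computed cluster?
        agreements += same_true == same_pred
    return agreements / (n * (n - 1) // 2)


# Expected chromosome types vs. hypothetical cluster ids for six images.
expected = [10, 10, 10, 1, 1, 1]
computed = [0, 0, 1, 1, 2, 2]
print(round(rand_index(expected, computed), 3))  # 0.667
```

Cluster ids only need to be consistent, not equal to the type labels: `rand_index([0, 0, 1, 1], [1, 1, 0, 0])` is 1.0. For the 36-image dataset described above, the loop examines C(36, 2) = 630 pairs, so this quadratic approach is inexpensive.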
References
1. Wu Y., Yue Y., Tan X., Wang W., Lu T. End-To-End Chromosome Karyotyping with Data Augmentation Using GAN. 25th IEEE International Conference on Image Processing (ICIP), Athens, Greece, 2018;
2. Vajen B., Hänselmann S., Lutterloh F., Käfer S., Espenkötter J., Beening A., Bogin J., Schlegelberger B., Göhring G. Classification of fluorescent R-Band metaphase chromosomes using a convolutional neural network is precise and fast in generating karyograms of hematologic neoplastic cells. Cancer Genetics. 2022. Vol. 260–261. P. 23–29;
3. Moradi M., Setarehdan S.K. New features for automatic classification of human chromosomes: A feasibility study. Pattern Recognition Letters. 2006. Vol. 27. P. 19–28;
4. Pysarchuk O., Mironov Y. Chromosome Feature Extraction and Ideogram-Powered Chromosome Categorization. Advances in Computer Science for Engineering and Education. ICCSEEA 2022. Lecture Notes on Data Engineering and Communications Technologies. 2022. Vol. 134. Springer, Cham;
5. Scikit-learn (n.d.). Rand score. Retrieved Jan 4, 2024, from https://scikit-learn.org/stable/modules/generated/sklearn.metrics.rand_score.html.