Colorectal cancer (CRC) incidence can be reduced through the early detection and removal of precancerous polyps. Artificial intelligence (AI), especially deep learning, enhances polyp detection during colonoscopy, but it is often limited by small medical imaging datasets. This study investigates whether synthetic and pseudosynthetic (data-augmented) images derived from original datasets can improve AI accuracy in polyp detection. Pseudosynthetic data, derived through augmentation techniques such as flipping, rotation, and contrast adjustment, simulate multiple endoscopic examinations of the same patients without subjecting them to repeated invasive procedures, while preserving traceability to the original clinical data. A modified U-Net was trained on various combinations of real, synthetic (CycleGAN and diffusion-based), and pseudosynthetic datasets across ten experimental setups and externally validated on the CVC-Colon-DB dataset (612 images). The combination of real and pseudosynthetic data yielded the highest model performance (a Dice coefficient of 0.7638, a precision of 0.8979, a recall of 0.7535, and an F1 score of 0.7797). In addition, the model performed better with diffusion-based synthetic data than with CycleGAN-generated data, demonstrating superior generalization capability with the former (a precision of 0.7488, a recall of 0.6695, and an F1 score of 0.8987). The results show that pseudosynthetic data alone can significantly improve the generalization capability of the model compared with real data alone. These findings confirm that augmented and synthetic datasets are valuable tools for enhancing model performance and addressing ethical concerns in AI-assisted diagnostics.
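To make the pseudosynthetic idea and the reported metrics concrete, the following is a minimal sketch, not the authors' pipeline. It assumes grayscale images stored as nested lists with intensities in [0, 1] and binary masks. The function names, the 90-degree rotation angle, the contrast factor of 1.5, and the provenance tag format are all hypothetical choices made for illustration. Each augmented sample carries a tag pointing back to its source image, which mirrors the traceability property described above. The scoring function computes pixel-level precision, recall, and Dice for a single mask pair; on a single binary mask, Dice is numerically identical to the pixel-level F1 score.

```python
from typing import Dict, List, Tuple

Image = List[List[float]]  # grayscale intensities in [0, 1] (assumed format)
Mask = List[List[int]]     # binary polyp mask, 1 = polyp pixel


def flip_horizontal(grid):
    """Mirror each row left to right."""
    return [list(reversed(row)) for row in grid]


def rotate_90(grid):
    """Rotate a grid 90 degrees clockwise (angle is an illustrative choice)."""
    return [list(col) for col in zip(*grid[::-1])]


def adjust_contrast(image: Image, factor: float) -> Image:
    """Scale intensities around mid-gray (0.5) and clip to [0, 1]."""
    return [
        [min(1.0, max(0.0, 0.5 + factor * (p - 0.5))) for p in row]
        for row in image
    ]


def pseudosynthetic_variants(
    image: Image, mask: Mask, source_id: str
) -> List[Tuple[Image, Mask, str]]:
    """Build augmented (image, mask, provenance) triples from one real sample.

    Geometric transforms are applied to image and mask alike; the contrast
    change touches only the image. The provenance string (hypothetical
    format) keeps every variant traceable to its original.
    """
    return [
        (flip_horizontal(image), flip_horizontal(mask), f"{source_id}:flip"),
        (rotate_90(image), rotate_90(mask), f"{source_id}:rot90"),
        (adjust_contrast(image, 1.5), [row[:] for row in mask],
         f"{source_id}:contrast1.5"),
    ]


def segmentation_scores(pred: Mask, truth: Mask) -> Dict[str, float]:
    """Pixel-level precision, recall, and Dice for one pair of binary masks."""
    tp = fp = fn = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = 2 * tp + fp + fn
    dice = 2 * tp / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "dice": dice}
```

In practice, scores like these would be computed per image and then aggregated over the validation set; the aggregation used in the study is not stated in the abstract, so this sketch stops at the single-image level.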
Colon polyps, Synthetic data, Polyp detection, Polyp segmentation, Colorectal cancer.
Andrei-Constantin IOANOVICI, Marius-Ștefan MĂRUȘTERI, Andrei Marian FEIER, Vasile Florin POPESCU, Irina IOANOVICI, Daniela-Ecaterina DOBRU, "Using Synthetic and Pseudosynthetic Data to Enhance Polyp Detection in Future AI-assisted Endoscopy Frameworks", Studies in Informatics and Control, ISSN 1220-1766, vol. 34(2), pp. 77-88, 2025. https://doi.org/10.24846/v34i2y202507