Received: April 03, 2024
Accepted: July 16, 2024
Available: August 22, 2024
Breast cancer is one of the leading causes of death among women worldwide, so its early detection has become a priority to save lives. For the diagnosis of this type of cancer, there are techniques such as dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI), which uses a contrast agent to enhance abnormalities in breast tissue, improving the detection and characterization of possible tumors. As a limitation, DCE-MRI studies are usually expensive, there is little equipment available to perform them, and in some cases the contrast medium can generate adverse effects due to an allergic reaction. Considering all of the above, the aim of this work was to use deep learning models for the generation of postcontrast synthetic images in DCE-MRI studies. The proposed methodology consisted of the development of a cost function, called CeR-Loss, that takes advantage of the contrast agent uptake behavior. As a result, two new deep learning architectures were trained, which we have named G-RiedGAN and D-RiedGAN, for the generation of postcontrast images in DCE-MRI studies from precontrast images. Finally, it is concluded that the peak signal-to-noise ratio, structural similarity index measure, and mean absolute error metrics show that the proposed architectures improve the postcontrast image synthesis process, preserving greater similarity between the synthetic images and the real images, compared to the state-of-the-art base models.
Keywords: Breast cancer, diagnostic imaging, magnetic resonance imaging, postcontrast image generation, deep learning.
Breast cancer is a chronic non-communicable disease caused by DNA alterations that affect the normal division and growth of tissue cells. Due to its high incidence rates, particularly among the female population, it is one of the major public health concerns worldwide.
In general terms, breast cancer can be classified into five main types.
The treatment and prognosis of breast cancer depend on its type and specific characteristics. Nevertheless, early detection and timely treatment are crucial for preventing complications, improving patient prognosis, and reducing mortality rates.
When mammography or ultrasound results are inconclusive, more specialized tests are employed, which involve the intravenous administration of a contrast agent.
For its part, DCE-MRI uses magnetic fields and radiofrequency pulses to capture the uptake of the contrast agent over time, as the agent accumulates at an accelerated rate in tissues with potential lesions.
Despite their advantages, these methods have limitations, including elevated costs, long acquisition times, and limited availability of equipment. Additionally, the contrast agent may cause allergic or adverse reactions in patients.
Image synthesis models can be broadly classified into two categories: autoencoders and Generative Adversarial Networks (GANs). Autoencoders, on the one hand, are architectures that comprise an encoder, which reduces the dimensionality of input data to learn an abstract (or latent) representation of its distribution; and a decoder, which reconstructs information from the latent space into a higher-dimensional space.
GANs, on the other hand, consist of a generator and a discriminator. The generator is a convolutional network that attempts to learn the latent distribution of the real data to generate synthetic information from a random noise sample. For its part, the discriminator is a complementary convolutional network that acts as an expert in distinguishing between real and synthetic information. The training of both networks is adversarial, that is, the generator strives to enhance its generation process to deceive the discriminator, while the discriminator seeks to refine its expertise to avoid being deceived by the generator. This adversarial learning process gives GANs their name.
Image synthesis methods can be employed to generate post-contrast images from pre-contrast images in DCE-MRI and CEDM studies. This application, a form of domain transfer (image-to-image translation), entails transforming a pre-contrast image (x) into a post-contrast image (y).
In a subsequent study, the authors proposed a U-Net architecture called RiedNet.
Regarding the use of DCE-MRI studies for breast cancer detection, the authors of
Another contribution to this field is the study presented in
Furthermore, in
In other imaging modalities, the study presented in
Aiming at reducing the radiation doses used in breast cancer diagnostic tests, the authors of
In a context other than breast cancer diagnosis, the authors of
Despite numerous attempts to develop generative models for synthesizing diagnostic images in breast cancer detection, there are still significant limitations. This is largely due to the high variability in breast tissue density, which affects the performance of generative models when using contrast agents, as the visibility of these agents diminishes with increasing pixel intensity.
To address these challenges, this study proposes a novel architecture called D-RiedGAN. This architecture builds upon the Pix2Pix framework by incorporating residual inception blocks but focuses on contrast-enhanced regions in DCE-MRI studies.
The proposed methodology begins with the implementation of a three-model baseline to synthesize fat-saturated T1-weighted images showing early response to contrast medium in DCE-MRI studies. Building on this baseline, two mixed architectures and two new architectures (G-RiedGAN and D-RiedGAN) are developed. These generative models are trained to generate synthetic post-contrast images, ŷ = G(x), from non-contrast images, x. The goal is for the generative model, G(x), to learn the early response to the contrast medium, thereby making the synthetic images resemble the real post-contrast images (y).
Traditional models for image synthesis have achieved significant advancements in natural image processing but face multiple limitations, especially when applied to specialized images such as medical ones. To overcome these challenges, this study proposes integrating a cost function that captures information from contrast-enhanced regions during training. This function aims to guide the synthesis process to accurately generate contrast enhancement in post-contrast images.
In terms of pixel intensity, the enhancement after contrast agent administration is identified by the highest intensities in the post-contrast image. Particularly, a global thresholding strategy, as shown in (1), is used to detect these high-intensity pixels. Here, y(i, j) is the pixel at position (i, j) in the post-contrast image and T is the threshold value.
F(i, j) = 1 if y(i, j) ≥ T; F(i, j) = 0 otherwise   (1)

Due to the sensitivity of T to intensity variations in images from different DCE-MRI studies, this parameter is set for each image using the 90th percentile of its histogram. In other words, the 10 % of the image pixels with the highest intensities are retained as contrast-enhanced regions. To refine these regions, closing and opening operators are employed using a 7×7-pixel circular structuring element, which smooths contours and removes small gaps between adjacent regions. This process is applied to both synthetic and real post-contrast images, generating the real (F_y) and synthetic (F_G(x)) contrast-enhancement masks.
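The thresholding and morphological refinement described above can be sketched in a few lines; this is a minimal illustration assuming NumPy/SciPy arrays, not the authors' implementation (the function name `contrast_mask` and its parameters are hypothetical):

```python
import numpy as np
from scipy import ndimage

def contrast_mask(img, percentile=90, radius=3):
    """Binary mask of contrast-enhanced regions: keep the top 10 % of
    intensities (90th-percentile threshold, Eq. (1)), then refine the
    regions with a morphological closing followed by an opening, both
    using a circular structuring element of (2*radius + 1) pixels per
    side (7x7 for radius = 3)."""
    t = np.percentile(img, percentile)           # per-image threshold T
    mask = img >= t                              # global thresholding
    yy, xx = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    disk = (xx ** 2 + yy ** 2) <= radius ** 2    # circular element
    mask = ndimage.binary_closing(mask, structure=disk)
    mask = ndimage.binary_opening(mask, structure=disk)
    return mask
```

Applied to both the real and the synthetic post-contrast image, this would yield the masks F_y and F_G(x) used by CeR-Loss.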
Once the contrast-enhanced regions are identified, a cost function is used to minimize the discrepancies between these regions in real and synthetic images. Since contrast-enhanced regions are binary, optimizing them involves employing a cost function based on set similarity, such as the Jaccard index
∆_Jc(F_y, F_G(x)) = 1 − |F_y ∩ F_G(x)| / |F_y ∪ F_G(x)|   (2)

Because the Jaccard index is neither convex nor differentiable, its optimization via gradient descent (in the context of neural networks) can result in suboptimal solutions or convergence problems.
∆̄(m) = Σ_{i=1}^{p} m_{π_i} g_i(m)   (3)

Here, g_i(m) = ∆({π_1, ..., π_i}) − ∆({π_1, ..., π_{i−1}}), with π being a permutation of the components of m in descending order. The function ∆̄ is the convex closure of ∆, is piecewise linear, and interpolates the values of ∆ in ℝ^p. Equation (4) is employed to compute the Lovász function of the Jaccard index in (2) (∆̄_Jc).
L_Lovász = ∆̄_Jc(f(y, y*))   (4)

Here, f(y, y*) estimates the error vector from the real or generated contrast masks after applying the SoftMax function. To avoid variations caused by batch size and the number of classes, the Lovász function is optimized by combining it with the Binary Cross-Entropy (BCE) described in (5), as suggested by the authors of
L_BCE = −(1/N) Σ_{i=1}^{N} [y_i log(ŷ_i) + (1 − y_i) log(1 − ŷ_i)]   (5)

Finally, to optimize the proposed models, a cost function combining BCE with the Lovász surrogate extension is used specifically for the contrast-enhanced regions. This combined function, termed CeR-Loss, is presented in (6).
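As an illustration of how the pieces of CeR-Loss fit together, the sketch below implements the Lovász gradient and hinge surrogate of the Jaccard loss (following Berman et al.'s formulation) together with BCE, and combines them; the function names and the additive combination in `cer_loss` are assumptions for illustration, not the authors' exact code:

```python
import numpy as np

def lovasz_grad(gt_sorted):
    """g_i(m) from Eq. (3): increments of the Jaccard loss evaluated on
    growing prefixes of the ground truth, sorted by descending error."""
    gts = gt_sorted.sum()
    intersection = gts - np.cumsum(gt_sorted)
    union = gts + np.cumsum(1.0 - gt_sorted)
    jaccard = 1.0 - intersection / union
    jaccard[1:] = jaccard[1:] - jaccard[:-1]
    return jaccard

def lovasz_hinge(logits, labels):
    """Lovász surrogate of the Jaccard loss for a binary mask (Eq. (4)),
    on flattened arrays: hinge errors sorted in descending order, then a
    dot product with the Lovász gradient."""
    signs = 2.0 * labels - 1.0
    errors = 1.0 - logits * signs          # hinge-style error vector
    order = np.argsort(-errors)            # permutation pi (descending)
    grad = lovasz_grad(labels[order])
    return np.dot(np.maximum(errors[order], 0.0), grad)

def bce(probs, targets, eps=1e-7):
    """Binary cross-entropy, Eq. (5)."""
    p = np.clip(probs, eps, 1.0 - eps)
    return -np.mean(targets * np.log(p) + (1.0 - targets) * np.log(1.0 - p))

def cer_loss(logits, probs, mask):
    """CeR-Loss: BCE plus the Lovász surrogate (combination assumed additive)."""
    return bce(probs, mask) + lovasz_hinge(logits, mask)
```

With a perfectly separated prediction the hinge errors are all negative, so the Lovász term vanishes and only the BCE term remains.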
L_CeR = L_BCE + L_Lovász   (6)

Figure 1a illustrates the general architecture of the first proposed model, G-RiedGAN. This architecture integrates, after the generator, a filter for detecting contrast-enhanced regions, which provides feedback to the generator to guide it in accurately replicating contrast enhancement. In this case, the PatchGAN discriminator, which identifies whether pre-contrast and post-contrast image pairs are real or synthetic, remains unchanged. The loss function of the G-RiedGAN generator, shown in (7), considers the overall loss from the pixel-level difference between the real and generated images, as well as the loss from the contrast-enhanced regions (CeR-Loss).
L_G = L_cGAN(G, D) + λ_l1 · L_L1(G) + λ_Lovász · L_CeR   (7)

The proposed D-RiedGAN architecture, for its part, depicted in Figure 1b, includes the difference between the contrast-enhanced regions in both the generator and the discriminator. This allows the generator to focus more on these regions by considering them in the adversarial counterpart, thus improving the quality of the synthesized images.
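With the weighting reported later in the experiments (λ_l1 = 100 and λ_Lovász = 150, per Table 1), the generator objective can be assembled as a weighted sum of its terms. This scalar sketch assumes an additive combination and uses hypothetical precomputed loss values for one training step:

```python
# Hypothetical scalar loss terms for a single training step (illustrative only)
adv_loss, l1_loss, cer = 0.7, 0.05, 0.2

lambda_l1, lambda_lovasz = 100.0, 150.0   # weights used for G-/D-RiedGAN
generator_loss = adv_loss + lambda_l1 * l1_loss + lambda_lovasz * cer
```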
For adversarial learning, the D-RiedGAN discriminator is modified to receive a triplet of images: the input image, the synthetic or real image, and the contrast-enhanced regions of the real or synthetic image. The loss functions of the D-RiedGAN generator and discriminator, defined in (8) and (9), respectively, include the loss from the contrast-enhanced regions (CeR-Loss).
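The triplet input to the D-RiedGAN discriminator can be formed by channel-wise concatenation of the three images; the paper specifies the triplet but not the stacking mechanism, so the concatenation and the array shapes below are assumptions:

```python
import numpy as np

# Hypothetical channel-first images for one sample: (channels, H, W)
x = np.random.rand(1, 240, 240)              # pre-contrast input image
y = np.random.rand(1, 240, 240)              # real (or synthetic) post-contrast image
f = (np.random.rand(1, 240, 240) > 0.9)      # contrast-enhanced-region mask

# Stack the triplet along the channel axis before feeding the discriminator
triplet = np.concatenate([x, y, f.astype(np.float32)], axis=0)
```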
(8)
(9)

The Pix2Pix
The Pix2Pix architecture
RiedNet
Unlike the previous two architectures, Ea-GAN
To improve the synthesis process, these three foundational architectures were combined. The first combination, called RiedGAN, integrates a PatchGAN discriminator into the RiedNet architecture to enhance the synthesis process using an adversarial learning scheme. The key distinction between this network and the original Pix2Pix is that it employs the U-Net generator from the RiedNet architecture rather than the traditional U-Net generator.
Building on the idea of using edge maps from the Ea-GAN architecture, edge maps were incorporated into the RiedGAN framework, resulting in two new models: gEa-RiedGAN, which integrates edge maps into the RiedGAN generator, and dEa-RiedGAN, which incorporates edge map information into both the generator and the discriminator.
To assess the quality of the synthetic images, three widely used quantitative metrics were employed: Mean Absolute Error (MAE), Peak Signal-to-Noise Ratio (PSNR), and Structural Similarity Index Measure (SSIM).
The Mean Absolute Error (MAE) quantifies the pixel-to-pixel difference between the intensities of two images. For a real image, y, and a generated image, G(x), both of size m × n pixels, MAE is computed as detailed in (10). A low MAE value signifies minimal error between the synthesized image and the reference image, with values close to 0 being ideal, i.e., indicating high accuracy. Conversely, a high MAE value reflects larger error and lower accuracy in image synthesis.
MAE = (1/(m·n)) Σ_{i=1}^{m} Σ_{j=1}^{n} |y(i, j) − G(x)(i, j)|   (10)

The Peak Signal-to-Noise Ratio (PSNR) represents the ratio between the maximum possible energy of a signal and the noise affecting the signal's representation, measured in decibels (dB).
PSNR = 10 · log_10(MAX² / MSE)   (11)

where MAX is the maximum possible pixel intensity and MSE is the mean squared error between y and G(x). The Structural Similarity Index Measure (SSIM) considers the strong interdependencies between pixels, especially those in close proximity. These dependencies include information about luminance, contrast, and structure of the objects in the image and can be estimated jointly as shown in Equation (12).
SSIM(y, G(x)) = [(2 μ_y μ_G(x) + c_1)(2 σ_yG(x) + c_2)] / [(μ_y² + μ_G(x)² + c_1)(σ_y² + σ_G(x)² + c_2)]   (12)

Finally, difference maps are calculated by comparing individual pixels between a generated image and a real image to evaluate their discrepancies. This process is described by (13): for each position, the intensity of the pixel in one image is subtracted from that of the corresponding pixel in the other. The resulting map is used to quantify and visualize the differences between the generated and real images.
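MAE, PSNR, and the difference map can be computed with a few lines of NumPy, as sketched below. This is a generic illustration, not the authors' evaluation code; for SSIM (Eq. (12)), which involves windowed statistics, one would typically rely on a library implementation such as scikit-image's `structural_similarity`:

```python
import numpy as np

def mae(y, g):
    """Mean Absolute Error, Eq. (10): average pixel-wise absolute difference."""
    return np.mean(np.abs(y - g))

def psnr(y, g, data_range=2.0):
    """Peak Signal-to-Noise Ratio in dB, Eq. (11).
    data_range = 2.0 for images normalized to [-1, 1]."""
    mse = np.mean((y - g) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(data_range ** 2 / mse)

def difference_map(y, g):
    """Pixel-wise difference map, Eq. (13), shown here as an absolute difference."""
    return np.abs(y - g)
```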
D(i, j) = |y(i, j) − G(x)(i, j)|   (13)

The results presented in this study were obtained using the experimental setup detailed in Table 1. This table outlines the hyperparameters employed for each model, which were adjusted according to the available computational resources. The experiments were conducted on a workstation equipped with an Intel Xeon Silver 4108 CPU and an NVIDIA Quadro P2000 GPU with 4 GB of RAM. Python version 3.8 was used as the programming language, along with PyTorch version 2.0.
| Model | Batch size | Number of epochs | Optimizer | Learning rate | λ_L1 | λ_edge / λ_Lovász | Activation output |
|---|---|---|---|---|---|---|---|
| RiedNet | 4 | 100 | Adam | 0.0002 | N/A | - | Linear |
| Pix2Pix | 1 | 100 | Adam | 0.0002 | 100 | - | TanH |
| gEa-GAN | 1 | 100 | Adam | 0.0002 | 300 | 300 | Sigmoid |
| dEa-GAN | 1 | 100 | Adam | 0.0002 | 300 | 300 | Sigmoid |
| RiedGAN | 4 | 100 | Adam | 0.0002 | 100 | - | TanH |
| gEa-RiedGAN | 1 | 100 | Adam | 0.0002 | 100 | 300 | TanH |
| dEa-RiedGAN | 1 | 100 | Adam | 0.0002 | 100 | 300 | TanH |
| G-RiedGAN | 1 | 100 | Adam | 0.0002 | 100 | 150 | TanH |
| D-RiedGAN | 1 | 100 | Adam | 0.0002 | 100 | 150 | TanH |
A proprietary, retrospective, and anonymized database of DCE-MRI studies from 197 patients was used to train the models. Each study includes T1- and T2-weighted structural images, diffusion-weighted images (DWIs), and six DCE images. This study focused on the T1 fat-saturated sequence acquired before contrast agent administration (x) and the corresponding image acquired in the early stage after contrast agent administration (y).
Given the retrospective nature of the database, studies were selected based on their use of different types of 1.5 T MRI scanners and gadolinium-based contrast agents, with doses ranging between 0.014 and 0.016 ml/mol. All studies contained at least one abnormality (either benign or malignant) annotated by expert radiologists using the BI-RADS system. This selection process ensured a balanced representation of both benign and malignant cases.
To focus on synthesizing contrast regions, only images with annotated contrast regions were included to ensure accurate depiction of contrast uptake. As a result, a total of 937 normalized images, scaled to the range from -1 to 1, were processed. Of these, 718 images were allocated for training and 219 for validation. The original images, with resolutions ranging between 480x480 and 512x512 pixels, were resized to 240x240 pixels for consistency purposes.
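The normalization step above can be sketched as a linear rescaling to [-1, 1]. The text specifies only the target range, so the min-max scheme below is an assumption:

```python
import numpy as np

def to_unit_range(img):
    """Linearly rescale image intensities to [-1, 1] (assumed min-max scheme)."""
    lo, hi = float(img.min()), float(img.max())
    return 2.0 * (img - lo) / (hi - lo) - 1.0
```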
Figure 2 presents a comparison of the PSNR, SSIM, and MAE metrics for the models assessed on the validation image set. As observed, the G-RiedGAN and D-RiedGAN models proposed in this study outperformed the other models. This suggests that incorporating contrast-enhanced regions into the image synthesis process using the CeR-Loss function significantly enhances the quality of the synthetic images, as reflected by the values of these quantitative metrics.
Although G-RiedGAN and D-RiedGAN showed a slightly higher MAE compared to RiedGAN, the increase in MAE for D-RiedGAN was minimal and is outweighed by substantial improvements in PSNR and SSIM. This indicates that while RiedGAN demonstrated slightly better accuracy in individual pixel errors, it tends to produce more blurred images of internal structures, making it less suitable for medical image synthesis.
In comparison to Pix2Pix
Figure 3, for its part, displays real post-contrast images generated from their non-contrast counterparts. Overall, the models effectively reproduced larger anatomical structures, though some discrepancies were observed in the intensities of rib cage structures. Despite this, G-RiedGAN and D-RiedGAN showed superior synthesis of contrast-enhanced regions compared to reference models such as RiedNet.
Based on the results, G-RiedGAN and D-RiedGAN combine the strengths of existing models to achieve more precise and higher-quality image synthesis, particularly in contrast-enhanced regions. They also performed well when compared to models reported in the literature, notably in minimizing noise and blur. This is further illustrated in Figure 4, which shows the difference maps of the synthetic and real images. As can be seen, the difference maps for G-RiedGAN and D-RiedGAN reveal fewer discrepancies between the synthetic and real images.
The proposed cost function, CeR-Loss, is a critical component of the D-RiedGAN architecture, significantly enhancing its performance when compared to other models. To evaluate the impact of this function on model training, a series of experiments were conducted using the D-RiedGAN architecture (described in the previous section) but with variations in parameters λ_l1 and λ_Lovász. In these experiments, λ_Lovász was consistently set higher than λ_l1 across all configurations.
Figure 5 illustrates the results for three different parameter settings: λ_l1 = 20 and λ_Lovász = 30; λ_l1 = 40 and λ_Lovász = 60; and λ_l1 = 100 and λ_Lovász = 150. As observed, both the MAE and PSNR metrics improved as the values of λ_l1 and λ_Lovász were increased, with the best performance achieved at λ_l1 = 100 and λ_Lovász = 150. These results underscore the positive effect of the CeR-Loss cost function on the model's overall performance.
This paper introduced CeR-Loss, a novel cost function designed to leverage contrast agent uptake for generating synthetic post-contrast images from pre-contrast images in DCE-MRI studies. This function is incorporated into two new deep learning architectures, G-RiedGAN and D-RiedGAN, which focus on contrast-enhanced regions to improve the synthesis of post-contrast images. The primary goal of these architectures is to minimize dependence on contrast agents and reduce the costs associated with DCE-MRI studies for breast cancer screening.
The proposed G-RiedGAN and D-RiedGAN models combine features from the RiedNet and Pix2Pix architectures within the Ea-GAN framework. Notably, D-RiedGAN includes a filter for detecting contrast-enhanced regions, which are essential for accurately synthesizing DCE-MRI images in breast cancer detection and diagnosis. These identified contrast regions guide the network learning process through the Lovász and BCE loss functions, which are integrated into the loss function of the generator and the discriminator (CeR-Loss).
Two approaches were used for comparative evaluation. The first approach compared the performance of the proposed models (with CeR-Loss) against models documented in the literature and a set of mixed models. The results, based on MAE, PSNR, and SSIM metrics, indicate that the proposed models more effectively synthesize contrast-enhanced regions, with reduced noise and blur. The second approach assessed the impact of the CeR-Loss function on the learning process, revealing that increasing the weight of CeR-Loss positively influences the synthesis of contrast regions, as reflected by the value of the same metrics.
Although validation was limited to quantitative metrics based on pixel intensities of synthetic images, future studies should include qualitative assessments by expert radiologists to validate the diagnostic quality of these images. Additionally, future research should investigate the performance of the baseline and proposed models across heterogeneous image databases, considering variations in study quality (0.5 T, 1.5 T, 3 T, and 7 T), dosage, and contrast agents. Furthermore, incorporating synthetic post-contrast images could improve the training of breast cancer detection and classification models that use conventional MRI images, as these synthetic images might provide valuable supplementary information to enhance model performance.
This study was partially funded by the Instituto Tecnológico Metropolitano (ITM) in Medellín, Colombia, through research project P20213, as well as by the Institución Universitaria Pascual Bravo and Ayudas Diagnósticas SURA S.A.S. under agreement CE-007-2020. Additional funding was provided by SAPIENCIA, the education agency of Medellín.
The authors declare no conflicts of interest.
Sara Cañaveral: Designed and conducted the experiments, analyzed and interpreted the data, drafted the manuscript, and contributed to its final revision.
Rubén Fonnegra: Designed the experiments, analyzed and interpreted the data, and participated in both partial and final revisions of the manuscript.
Carlos Mera-Banguero: Designed the experiments, analyzed and interpreted the data, and participated in both partial and final revisions of the manuscript. Initially engaged as a professor at the Instituto Tecnológico Metropolitano (ITM), he completed his contributions while affiliated with the Universidad de Antioquia in Medellín, Colombia.