Automated segmentation of cardiac adipose tissue in Dixon magnetic resonance images

Objective: Increasing evidence suggests a strong link between excess cardiac adipose tissue (CAT) and the risk of a cardiovascular event. Multi-echo Dixon magnetic resonance imaging (MRI), providing fat-only and water-only images, is a useful tool for quantification but requires the segmentation of CAT from a large number of images. The intent of this study was to evaluate an automated technique for CAT segmentation from Dixon MRI by comparing the contours identified by the automated algorithm to those manually traced by an observer. Methods: An automated segmentation algorithm, based on optimal thresholds and custom morphological processing, was applied to the registered fat-only and water-only images to identify CAT in the volume scans. CAT contours in 446 images, from 10 MRI scans, were selected for validation analysis. Cross-sectional area (CSA) and volume were computed and compared using Bland-Altman analysis. In addition, Hausdorff distance and Dice Similarity Coefficient (DSC) were used for assessment. Results: Linear regression analysis yielded correlation of R = 0.381 for CSA and R = 0.879 for volume. When compared to the observer, the computer algorithm under-estimated CSA by 27.5 ± 40.0% and volume by 26.4 ± 10.4%. The average bidirectional Hausdorff distance was 26.2 ± 16.0 mm while the average unidirectional Hausdorff distances were 24.5 ± 15.7 mm and 12.4 ± 11.7 mm. The average DSC was 0.561 ± 0.100. The time required for manual tracing was 15.84 ± 3.73 min and the time required for the computer algorithm was 2.81 ± 0.12 min. Conclusions: This study provided a technique, faster and less tedious than manual tracing (p < 0.00001), for quantification of CAT in Dixon MRI data, demonstrating feasibility of this approach for cardiac risk stratification.


Introduction
Obesity has become a major health crisis in recent years. Some estimates suggest the global number of obese people has surpassed 700 million. This pandemic has a substantial impact on world health and may contribute to a decrease in life expectancy in this century [1] . Excess adipose tissue acts as a storage mechanism for excess energy, but the function of adipose tissue is far more complex. It is an active autocrine, paracrine and endocrine organ and can influence a number of chronic conditions [2] .
There is increasing evidence of a strong link between obesity and cardiovascular disease [3,4] . This connection has led to the classification of obesity as a major, modifiable cardiovascular risk factor [5] . One traditional measurement of obesity is Body Mass Index (BMI), but this summary statistic does not adequately reflect the complex mechanisms associated with excess adiposity around various organ systems. Studies have demonstrated that those with similar BMI may have substantially different metabolic profiles or cardiovascular risk [6] . One such depot, that can have significant influence over the likelihood of cardiovascular disease, is the epicardial adipose tissue (EAT). It develops between the myocardium and the visceral pericardium. In addition, adipose tissue can develop between the two layers of the pericardiumthe visceral and parietaland is called the paracardial adipose tissue (PAT). We subsequently refer to the combined depot as the cardiac adipose tissue (CAT). The coronary vasculature is adjacent to the myocardium and makes direct contact with the EATthere is no fascia between them. These metabolically dynamic and active fat depots secrete cytokines that can influence the development of coronary artery disease (CAD) inside the arterial walls. The close contact of the coronary arteries to the CAT can encourage the onset of CAD, or the continued progression of CAD if already present [7] .
Studies have demonstrated the strong relationship between CAT volume and the presence of CAD [7,8] and the severity of stenosis. There are also studies to suggest that the relationship is more complex and influences aspects of CAD beyond mere presence and quantity. Smaller, heterogeneous plaques (i.e. "mixed" plaques), consisting of different tissue types, and particularly a necrotic core, are more metabolically active. While they do not necessarily block enough of the lumen to be symptomatic, they are more dangerous and more likely to cause an acute event. Alexopoulos et al. demonstrated that increased CAT volume, measured by cardiac computed tomography (CT), is associated with development of these dangerous plaques [8] . Another study, also using cardiac CT, suggested increased CAT volume correlated to increased likelihood of the presence of disease specifically in the left anterior descending (LAD) coronary artery [9] . Based on this strong connection between CAT volume and the presence and type of CAD, non-invasive, in-vivo quantification of CAT volume has the potential to characterize the risk of acute coronary syndromes or heart attack.
Cardiac CT has proven useful in quantifying CAT [8][9][10][11][12][13][14] , but is associated with the delivery of a large dose of radiation to patients. With increasing utilization of CT, the annual per-capita collective dose has increased by 600% since 2006 [15] . Measures can be taken to mitigate and reduce the dose, but radiation risk remains, particularly for longitudinal studies requiring multiple scans [16] . Magnetic resonance imaging (MRI) has emerged as an effective imaging modality for identifying whole body fat and ectopic fat depots [16] . Specifically, "Dixon" technique sequences, by creating a registered set of in-phase (IP) and out-of-phase (OOP) image volumes, can highlight adipose tissue by producing fat-only and water-only images [17] . The three-dimensional (3D) data sets created by MRI are rich with information, but analyzing the images to identify and quantify the CAT is cumbersome and time-consuming and has become a substantial bottleneck in studies quantifying adipose tissue, particularly where multiple scans are acquired from different time points in longitudinal studies [16] . Most of the work to-date on automated image analysis has been based on cardiac CT imaging [12,14,18] . Little work exists in quantifying CAT from cardiac MRI. Cristobal-Huerta et al. have done some development using a two-dimensional (2D) technique based on pixel values and some a priori anatomical information, but have only applied it on a few image slices in each scan [19] . Homsi et al. investigated the use of cardiac-specific Dixon MRI scans for quantifying EAT and PAT volumes, developing custom software combining user-defined regions-of-interest in each 2D image slice and thresholding to perform segmentation, but no in vivo reference standard was used [20] .
In this study, a custom algorithm for fully automatic segmentation of CAT volume from thoracic and abdominal Dixon MRI scans was developed [21] for use as a monitoring tool in pre-to post-intervention studies of obesity and longitudinal changes of ectopic fat depots. The objective of the current study was to assess whether CAT could be reliably segmented from Dixon MRI scans intended for studies of obesity and validate the automatic segmentation algorithm by comparing to a manually-derived in vivo reference standard.

Study Population
The segmentation algorithm validation was performed using MRI scans of 13 female subjects between the ages of 18 and 35 years old and with BMI values from 30 to 39.99 kg/m 2 (Obese Class I and II). All subjects were established as healthy, completed a medical history screening, and were required to maintain adequate compliance with the study requirements. Subjects were excluded if they had one or more of the following six characteristics: (1) at risk for cardiovascular events, (2) pregnant, (3) were on medications affecting endocrine or cardiovascular function, (4) have high blood pressure, (5) are already engaging in strength-training more than twice per week or any type of moderate-high intensity exercise program, and (6) are smokers. All subjects provided informed consent and the study was approved by the local Internal Review Board.

MRI Data Acquisition
For the MRI data acquisition, each subject lay in a supine position with arms extended above their head and a triangular support pillow beneath their knees. Body coils were used to acquire data from both the thoracic and abdominal cavities using a 3.0 T Magnetom Skyra system (Siemens Healthcare, Erlangen, Germany) at the Texas Tech Neuroimaging Institute. A multi-echo T1-weighted Volumetric Interpolated Breath-hold Examination (VIBE) 2-point Dixon imaging sequence [22,23] with a 20-second breath hold (TR/TE = 3.97/1.23 ms, flip angle = 9°) was used. Volumes with 120 slices (2.5 cm thick with 20% gap) and 320 x 260 pixels in each slice were acquired over a field-of-view (FOV) of 45 cm, resulting in an effective voxel size of 1.40625 x 1.40625 x 3.0 mm. This sequence is a multi-echo acquisition that produces IP and OOP registered image volumes, used by the scanner software to reconstruct the fat-only and water-only images.

Automatic Segmentation
The automatic segmentation software used in this study was initially developed and implemented in MATLAB (MathWorks, Natick, MA) by Hill et al. [21] and further modified in this study. As shown in the flow chart in Figure 1, the cardiac fat segmentation was performed on a slice-by-slice basis in three main steps: (1) preprocessing of the image data, (2) framing of the heart and the cardiac fat by identifying the lungs and aorta, and (3) segmentation of the heart and the cardiac fat. The preprocessing consisted of a correction for inhomogeneity, histogram-based contrast enhancement, and computation of the void signal (used subsequently to aid in identification of the lungs). The inhomogeneity field is estimated using a grayscale morphological closing operation with a large circular structure element. After the inhomogeneity field was removed from the data, a histogram is calculated and then smoothed via median filtering. The smoothed histogram was used to enhance the contrast of both the water-only and fat-only images. The peak intensity was identified and assumed to be foreground. Values above this intensity were clipped and the remaining range was remapped to 256 gray levels. Example interim images from the preprocessing are shown in Figures 2b and 2d. The final step in pre-processing was the use of the median-filtered histogram to determine a threshold for foreground and background signal intensities using Otsu's method [24] . The threshold was used to create a binary image from both the enhanced water-only and fat-only images. Then, a mask image was created from each using the corresponding water-only or fat-only threshold. The two masks were added together to create the body mask and binary morphology was used to fill any internal holes. An example set of interim images for this step of the process is shown in Figure 3.  Figure 3.

The preprocessed water-only image (a), preprocessed fat-only image (b), the binary image resulting from thresholding the water-only image (c), ), the binary image resulting from thresholding the fat-only image (d), and the final body mask (e), are shown in
The framing process, indicated in the middle block of the flow chart in Figure 1, began with a calculation of the void signal, as shown in Equation (1).
The void image was corrected for inhomogeneity, as described previously. An optimal threshold (via Otsu's method [24] ) was computed from the corrected image and used to create a binary mask for the lungs. An example corrected void image and binary lung mask are shown in Figures 4a and 4b, respectively. The aorta was identified using a repeated series of binary dilation and fill operations beginning with a seed point. An example lung mask and aorta are shown in Figure 4c. The lung mask and aorta were used to frame in an area for segmentation of the heart. A seed point, defined as the centroid of the region between the lungs and aorta was used to locate the heart. However, near the apex of the heart, the absence of the lungs and the presence of the diaphragm and the liver confound this approach. In this region, since the lung mask could not be used to frame the heart, the water-only image was used to compute a Bayesian threshold between the higher gray-scale distributions of the diaphragm and liver and the lower distribution of the gray-scale intensity of the heart. The convex hull of the resulting segmented heart was dilated to create a mask for the identification of the cardiac fat. Pixels in the fat-only image mask in the dilated convex hull region were labeled as CAT. Finally, contours outlining the CAT in each 2D image slice were extracted and used for comparison with the manually traced contours. In addition, binary images, with pixels labeled as CAT were also used as output for comparison.

Validation Sample and Statistical Analysis
Among the 13 subjects, 10 scans were randomly chosen for validation analysis, resulting in a total of 446 images available for comparison (44.6 ± 2.6 images per case). A custom cross-platform software package for manual tracing was developed for the study, as shown in Figure 5. The tool provided complete control for the user, including mechanisms to blend the water-only image and the fat-only image together, independently selecting the colormap and transparency level for each to facilitate visual interpretation based on the fat-only image, the water-only image, or a blend of the two. In addition, the tool also allowed users to make manual corrections to their traced fat depots as necessary. The specific images for analysis were selected by the operator as those that contained CAT. Cross-sectional area (CSA) of the CAT was computed for each of the images selected. In addition, volume comparisons were made based on a calculation of CAT volume using the slice thickness and trapezoidal integration [25] .

Figure 5. Screenshot of custom software developed for manual tracing
The fat-only image is displayed with a "green" colormap in this example, and the water-only image is shown with a "bone" colormap. The two images are blended together with the semi-transparent water-only image overlaid on the fat-only image for combined visualization. The operator could view either image separately or as a blend while tracing the CAT and could also make manual corrections to their contours as they deemed necessary.
CSA was used as the primary metric, but it is a derived measurement and does not adequately determine if contours agree on a point-by-point basis. To assess the agreement among the contours in a more stringent manner, Hausdorff distance [26] and the Dice Similarity Coefficient (DSC) [27] were also used. The Hausdorff distance is the largest Euclidean distance between points on the two contours being compared; therefore, contours that align well have small Hausdorff distances and those with large mismatch have large Hausdorff distance. It is a bidirectional measure, in that a distance is computed for each contour and the overall Hausdorff distance is the maximum of the two. The DSC is a measure of overlapping regions and calculated as the ratio of the number of pixels in the intersection to the number of pixels in the sum of the two regions. For all slices with CSA measurements, direct comparisons were made to assess algorithm validity using Bland-Altman analysis [28] .
To assess the time required for analysis, the custom tracing software included a feature that automatically tracked the time required for the operator to perform the manual tracing. Times were also recorded for the segmentation algorithm and compared to manual tracing times using the paired Student's t test. p-values less than 0.05 were considered significant. Regression analysis, correlation coefficient, and comparisons of difference vs. mean [28] were also used to determine the relationship between the algorithm and the manual reference standard.

Results
Example manually-traced contours, compared to automatically identified CAT, are shown in Figure 6. When comparing on a slice-by-slice basis, the computer algorithm underestimated the CAT CSA by 27.5 ± 40.0%. A plot of the difference vs. mean for all 446 observations is shown in Figure 7, demonstrating the positive correlation (R 2 = 0.381), but with underestimation by the computer algorithm. The average bidirectional Hausdorff distance was 26.2 ± 16.0 mm, while the unidirectional Hausdorff distances were 24.5 ± 15.7 mm and 12.4 ± 11.7 mm. The average DSC for the slice-by-slice comparison was 0.561 ± 0.100.

Bland-Altman charts for cross-sectional area (CSA) for 446 observations and volume comparisons for 10 observations. Direct comparison with linear regression are shown on the left. Unity line (solid) and regression line (dashed) are shown. Difference versus mean are shown on the right. The mean difference (heavy dashed line) and mean difference ± 2 standard deviations (light dashed line) are indicated.
In addition, volume calculations were also used to evaluate the computer algorithm in 10 different MRI scans. The volume computed from the computer-derived contours also underestimated the volume as calculated from the observer-traced contours (26.4 ± 10.4%), but had stronger correlation than the CSA alone (R 2 = 0.879). For further understanding of the agreement between computer and observer volumes, plots of difference vs. mean are shown in Figure 7.
The average time for the observer to trace the images from each case was 15.84 ± 3.73 min. The time required for the computer algorithm to run was 2.81 ± 0.12 min, which was significantly faster than the manual tracing analysis (p < 0.00001).

Discussion
Adipose tissue around the internal organs, is associated with many non-communicable diseases, because the adipose tissue functions as more than mere storage and can actively secrete cytokines that have paracrine and systemic effects. Differentiation of VAT and subcutaneous adipose tissue (SAT) has become a focus in studies and MRI has emerged as the standard for non-invasive assessment. While the advantages of MRI for monitoring ectopic fat depots are importantit does not require ionizing radiation, and is non-invasiveon the other hand, it generates a large amount of data requiring analysis for quantification of volumes. Many studies have pursued the development of automated or semi-automated techniques to segment VAT and SAT from MRI images to facilitate its use in assessment of regional adiposity [29][30][31][32][33][34] .
Due to the increasing evidence of the connection between CAT and cardiovascular risk [6-8, 12, 35, 36] , further categorizing fat depots to specifically identify CAT has become more relevant for use as an endpoint in broader longitudinal studies of obesity. Anthropometric measures of adiposity, such as Body Mass Index (BMI), do not correlate well with visceral adiposity or presence of CAT [37] . When patients start a weight-loss program with exercise or nutritional intervention, changes in BMI may not be immediate or apparent. However, reduction in CAT and the corresponding reduction in cardiovascular risk may be an important result of the intervention that can influence the patients' motivation to continue the program, even without an obvious loss of body mass.
The most accessible imaging modality for measurement of CAT is transthoracic echocardiography (TTE), but the typical approach originally proposed by Iacobellis et al. is to make a single user-defined linear thickness measurement on the free wall of the right ventricle [38] . The right ventricle was chosen because it is known to contain the thickest layer of CAT and is visible in the traditional parasternal short and long axis views [39,40] . CAT thickness may have some utility as a measurement tool, but the distribution of CAT across the surface of the heart is not constant. Therefore, a single, linear measurement does not provide information on the distribution, overall volume, or change in CAT volume over time. Many have worked to address the limitation using cardiac CT [10][11][12][13][14] , but MRI does not require radiation and pulse sequences can be designed to target adipose tissue specifically [19,20,41,42] .
Fluchter et al. developed a black blood sequence with a water-suppression pre-pulse to assess CAT [42] . They compared volumes to a pathological study and assessed inter-observer agreement, but used manual tracing for segmentation, a time-intensive process. Homsi et al. investigated a semi-automated approach using a cardiac-specific Dixon sequence and software to attempt to differentiate EAT from PAT, demonstrating strong correlation (R 2 = 1.00) and a maximum difference of 6% when comparing fat volume to a known volume in a phantom. They also assessed inter-observer agreement from clinical images by having two operators use the system to measure EAT and PAT separately. Correlation was reported as R=0.998 for EAT and R=0.999 for PAT, but no clinical reference standard was used to assess validity of the system for in vivo images. In addition, the operator was required to manually input two regions of interest for each image slice.
The current study demonstrated proof-of-concept of an automated method of segmentation of CAT from Dixon MRI images. There was positive correlation between the CAT CSA calculated from computer-generated contours and manually-traced contours (R 2 = 0.381), but the automated segmentation algorithm under-estimated CAT CSA by 27.5% when compared to CSA from manually-traced contours. Segmentation algorithms can be categorized as one of two different basic types -a more "bottom-up" approach based on pixel or voxel intensity values used for thresholds or edge detection algorithms creating a series of edge-fragments requiring linking to form larger objects -and a more "top-down" approachwhere a shape-based model is used to apply high-level a priori information on the structure of interest. While a modest amount of shape information was applied in this study, the primary approach was reliant on voxel intensities for the determination optimal threshold values to build tissue groups. This type of analysis led to segmented CAT that contains more and smaller, fragmented objects, whereas, when a user identifies CAT via manual tracing, it results in fewer, but more smooth and larger objects. The algorithm presented in this evaluation is automated and does not involve any user-based assessment or correction of results. Though the correlation was moderate and the automated algorithm underestimated CSA on a slice-by-slice basis compared to a manually-derived reference standard, these results suggest that a segmentation algorithm that utilizes both the lower-level voxel intensity, but also leverages this higher-level anatomical shape information may be well suited for this application. In addition, efficient interaction methods applied after segmentation could improve the accuracy of the technique while maintaining the large advantage over manual tracing all of the image slices, a laborious process.
Cristobal-Huerta presented preliminary work on a segmentation algorithm using a GVF-based active contour ("snakes") algorithm that does leverage the high-level shape information [19] . They compared the results to contours traced by an observer, but only present a single, volumetric comparison of adipose tissue volume. Homsi et al. [20] and Fluchter et al. [42] also present volumetric comparisons. Certainly, CAT volume is a target end-point, but volumetric comparisons alone can mask errors in segmentation performance by averaging out slice-to-slice differences. The current study, therefore, also included slice-by-slice comparisons to use as a basis for further understanding of the algorithms' performance. In addition, CSA is a derived measurement that can obscure specific performance characteristics of the algorithm within single slices, so the more stringent measures of Hausdorff distance [26] and DSC [27] were used for direct comparisons between contours derived by the automated algorithm and those from the manual reference standard. Figure 8 shows examples of contours from images with large Hausdorff distances and small DSC where there is substantial disagreement. It also shows contours from images visualizing smaller Hausdorff distance and improved DSC where the agreement is better, providing context for the summative data presented in this study. Table 1 presents the specific metrics for the image slices in Figure 8, including CSA for the computer algorithm and the observer, the Hausdorff distance (with both unidirectional measurements), and the DSC. Many slices with large Hausdorff distances or small DSC, implying significant disagreement between contours, have small "islands" of CAT as determined by the automated algorithm, demonstrating the implications of the more "bottom up" approach to segmentation (Figure 8a). In these cases, there was enough fat signal in the voxel for the algorithm to identify it as CAT, but very small regions like this were not identified via manual interpretation. In most slices, the automated algorithm successfully identified the location of individual fat deposits as identified by the observer, but did consistently underestimate the size.
While other studies have discussed differentiating EAT from PAT, the algorithm evaluated here targeted the combined EAT/PAT depot and called it CAT. The ability of MRI to adequately resolve the EAT from the PAT remains unclear.
In-plane resolution for other MRI studies published are between 1.4 and 1.8 mm [20,42] and the current study had an in-plane resolution of 1.40625 x 1.40625 mm. EAT thickness can expand to approximately 20 mm in morbidly obese patients, but averages are only in the 6-8 mm range [43] . Therefore, typical EAT thickness values will be 3-4 pixels in width and ranging to approximately 14-15 pixels only in extreme cases. While the current algorithm underestimates the amount of fat, the qualitative and quantitative results of this study suggest automated identification of the combined EAT/PAT depot is feasible, and that alone may be sufficient for risk stratification [20,41] . Future work will include the evolution of the algorithm to improve the identification of the CAT and differentiating EAT from PAT, both by utilizing proof-of-concept techniques assessed here, and by leveraging higher-level, anatomical shape information.   One limitation of the current study was that the MRI protocol used was a breath-hold VIBE sequence designed for adiposity assessment across a large region of the body and not focused specifically on cardiac fat. In addition to CAT, the image data collected was intended to be used to used to quantify visceral adipose tissue (VAT) and subcutaneous adipose tissue (SCAT). To keep the protocol for data acquisition efficient, no additional ECG-gated sequence was used for cardiac imaging. As a result, motion artifacts may have obfuscated the pericardium or other details associated with visual identification and automated detection of the CAT. Further studies will investigate application of the current algorithm, and evolved versions of it, on cardiac-specific, ECG-gated MRI. Another limitation is that the clinical reference standard was based on only a single observer, not providing any context regarding inter-observer variability. Future studies will include additional expert observers to provide an assessment of the automated algorithm results within the context of inter-observer variability, an improved approach for validation of medical segmentation algorithms [44] . In addition, there were only ten image scans randomly selected from the data set to use for manual tracing comparison. There were 446 slices available for individual comparison, but a wider variety of anatomy will elicit other performance issues with segmentation algorithms and will be used in future work. In conclusion, this study utilized a rigorous framework to evaluate an automated algorithm for identification of CAT from Dixon MRI images, demonstrating that CAT volume can be isolated from these image sets and used for cardiovascular risk stratification.