Automating a critical sector like healthcare with the power of machine learning can have an enormous positive impact on the world.
In this article, let us study how medical imaging datasets are structured, how we can pre-process them and get them ready for machine learning models, as well as explore some models and techniques that are known to work well with such data.
Basic Terms & Dataset Overview
Medical imagery such as that captured by a CT or MRI scanner can provide the full 3D structure of a body part by combining the 2D images, or 'slices', that the scanner outputs. Medical datasets usually provide us these slices directly, stored in special image formats.
Two more basic terms one must know-
Study ID: A unique identifier for the procedure performed on each patient or subject. Each study ID might have multiple series and series IDs associated with it.
Series ID: A subset of images within a study ID, so that images can be organised further based on some common property [e.g. a common plane/view used (explained below)].
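To see these two IDs in practice, here is a minimal sketch (the folder name is hypothetical) that uses pydicom to group a flat folder of '.dcm' files by their study and series identifiers-

import os
from collections import defaultdict

import pydicom

def group_by_study_and_series(dicom_folder):
    # Map (StudyInstanceUID, SeriesInstanceUID) -> list of file paths
    groups = defaultdict(list)
    for name in os.listdir(dicom_folder):
        if not name.endswith('.dcm'):
            continue
        path = os.path.join(dicom_folder, name)
        # stop_before_pixels skips the heavy pixel data; we only need the metadata
        dcm = pydicom.dcmread(path, stop_before_pixels=True)
        groups[(dcm.StudyInstanceUID, dcm.SeriesInstanceUID)].append(path)
    return groups

# Usage ('dicom_folder' is a hypothetical folder of .dcm files)
for (study_uid, series_uid), files in group_by_study_and_series('dicom_folder').items():
    print(f"Study {study_uid} / Series {series_uid}: {len(files)} slices")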
Planes/Views Used
These 2D slices can be acquired along three primary anatomical planes:
- Axial (top to bottom)
- Sagittal (left to right)
- Coronal (front to back)
and hence, they also need to be stacked along their respective axis to reconstruct the 3D image.
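As a rough illustration of this stacking, consider a toy NumPy volume; slicing along each axis yields the three planes (the axis ordering below is an assumption for illustration only, as real orientations come from the scan's metadata)-

import numpy as np

# A toy volume; real volumes come from stacking the 2D slices of a series
# Assumed ordering: axis 0 = left-right, axis 1 = front-back, axis 2 = top-bottom
volume = np.random.rand(256, 256, 180)

sagittal_slice = volume[128, :, :]  # fix the left-right axis
coronal_slice = volume[:, 128, :]   # fix the front-back axis
axial_slice = volume[:, :, 90]      # fix the top-bottom axis

print(sagittal_slice.shape, coronal_slice.shape, axial_slice.shape)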
Contrast Mechanisms
Medical scanners can adjust how different parts of the body appear in images by changing their brightness intensity or contrast to make certain areas stand out. These mechanisms, which decide which parts appear darker and which lighter, are standardised and given names (e.g. in MR scans, we have T1-weighted images, where fat appears bright and fluid dark, and T2-weighted images, where fluid appears bright).
File Formats
There are various special image formats that need to be dealt with while working with such data, such as-
DICOM
DICOM stands for Digital Imaging and Communications in Medicine. It is a crucial standard in medical imaging and uses '.dcm' as its extension. The format also carries a bunch of metadata describing the patient, the scanner used, the plane of the image, the scanning mechanism used, etc.
A Python library called 'pydicom' is used to perform operations on these files. Let us try to use it to read a DICOM file and access its metadata-
import pydicom

# Read DICOM file
dcm = pydicom.dcmread("sample_dicom.dcm")

# Access metadata
print(f"Patient Name: {dcm.PatientName}")
print(f"Modality: {dcm.Modality}")
We can also convert the DICOM to a PNG/JPG to view it easily-
import pydicom
import numpy as np
from PIL import Image

def dicom_to_image(dicom_file, output_file):
    # Read the DICOM file
    dicom = pydicom.dcmread(dicom_file)

    # Get pixel array
    pixel_array = dicom.pixel_array

    # Rescale pixel values to 0-255
    pixel_array = pixel_array - np.min(pixel_array)
    pixel_array = pixel_array / np.max(pixel_array)
    pixel_array = (pixel_array * 255).astype(np.uint8)

    # Create PIL Image
    image = Image.fromarray(pixel_array)

    # Save as PNG or JPG
    image.save(output_file)
    print(f"Image saved as {output_file}")

# Usage
dicom_file = "sample_dicom.dcm"
output_file = "output_image.png"  # or "output_image.jpg" for JPG
dicom_to_image(dicom_file, output_file)
NIfTI
The NIfTI (Neuroimaging Informatics Technology Initiative) file format is a standard way of storing brain imaging data. It was created to make it easier for researchers and doctors to share and analyse brain scans, though it is now commonly used for other scans as well. The format uses the extension '.nii', or '.nii.gz' when compressed.
It is also common practice to convert 2D DICOM slices into a single 3D NIfTI file, which can then be viewed using specialised software for this domain.
The Python library 'NiBabel' is used to deal with this format. Let's try it out-
import nibabel as nib

# Load NIfTI file
img = nib.load('sample_nifti.nii')

# Get image data as numpy array
data = img.get_fdata()

# Access header information
print(f"Image shape: {img.shape}")
print(f"Data type: {img.get_data_dtype()}")

# Get affine transformation matrix
affine = img.affine
print(f"Affine matrix:\n{affine}")
Let's also convert the DICOM files of a series into a NIfTI file, so we can visualise it later using one of the specialised software packages-
import os
import pydicom
import nibabel as nib
import numpy as np

def dicom_series_to_nifti(dicom_folder, output_file):
    # Read all DICOM files in the folder
    dicom_files = [pydicom.dcmread(os.path.join(dicom_folder, f))
                   for f in os.listdir(dicom_folder) if f.endswith('.dcm')]

    # Sort files by Instance Number
    dicom_files.sort(key=lambda x: int(x.InstanceNumber))

    # Extract pixel data and create 3D volume
    pixel_arrays = [dcm.pixel_array for dcm in dicom_files]
    volume = np.stack(pixel_arrays, axis=-1)

    # Build an affine matrix from the first slice's geometry
    first_slice = dicom_files[0]
    pixel_spacing = first_slice.PixelSpacing
    slice_thickness = first_slice.SliceThickness
    image_position = first_slice.ImagePositionPatient

    affine = np.eye(4)
    affine[0, 0] = pixel_spacing[0]
    affine[1, 1] = pixel_spacing[1]
    affine[2, 2] = slice_thickness
    affine[:3, 3] = image_position

    # Create NIfTI image and save
    nifti_image = nib.Nifti1Image(volume, affine)
    nib.save(nifti_image, output_file)

# Usage
dicom_folder = 'dicom_series_folder'
output_file = 'output.nii.gz'
dicom_series_to_nifti(dicom_folder, output_file)
These are the two most popular formats, but many others, such as MetaImage Header Archive (.mha), are also in use.
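For such formats, a library like SimpleITK (which we will also use later in this article) can read them in a single call; a minimal sketch with a hypothetical file name-

import SimpleITK as sitk

# Read a MetaImage file (the same call also handles .nii, .nrrd, etc.)
image = sitk.ReadImage("sample_image.mha")

# Inspect the basic geometry and convert to a NumPy array if needed
print(image.GetSize(), image.GetSpacing())
array = sitk.GetArrayFromImage(image)  # note: axes come out as (z, y, x)
print(array.shape)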
Visualisation Software
Various specialised software packages exist to deal with medical imagery, whether for simply viewing it, segmenting it, or analysing it.
Some popular ones include-
- ITK-SNAP: A platform for segmenting structures in 3D medical images
- 3D Slicer: An open-source platform for medical image informatics, image processing, and three-dimensional visualization
- OsiriX: A DICOM viewer for medical imaging, popular among radiologists
Popular Tools, Models and Techniques
Study-Level Cropping
One of the most important things when training a model on medical data is to understand the exact problem/disease/condition we are dealing with and read up about it. Often, due to the nature of the problem, only a certain region-of-interest within the scans is of use to us.
The crops of these regions-of-interest from the base scans are called study-level crops. A model works much better when trained on these correct crops instead of the full scans.
Usually, automating this study-level cropping over an entire dataset requires training a segmentation model that can segment out the parts-of-interest from the scans; from its masks we can then map out the bounding box we need.
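As a minimal sketch of that mapping step, assuming we already have a binary segmentation mask as a NumPy array, the bounding box can be derived from the non-zero voxels and padded with a margin before cropping-

import numpy as np

def mask_to_crop(scan, mask, margin=10):
    # Find the extent of the non-zero (segmented) voxels
    coords = np.argwhere(mask > 0)
    mins = coords.min(axis=0)
    maxs = coords.max(axis=0) + 1

    # Pad the box by a small margin, clipped to the scan bounds
    mins = np.maximum(mins - margin, 0)
    maxs = np.minimum(maxs + margin, np.array(scan.shape))

    # Crop the same box out of the original scan
    slices = tuple(slice(lo, hi) for lo, hi in zip(mins, maxs))
    return scan[slices]

# Usage with toy data
scan = np.random.rand(64, 64, 64)
mask = np.zeros_like(scan)
mask[20:40, 25:45, 10:30] = 1
crop = mask_to_crop(scan, mask)
print(crop.shape)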
TotalSegmentator
TotalSegmentator is a tool, also available as a Python library, based on nnU-Net and trained on a large amount of medical data. It allows users to segment all major body parts in CT scans, and now MRI scans, automatically with a single call.
It has various modes that can be used: its default tasks can segment all supported structures, and there are specialised tasks for different categories of parts as well. Let's try to use it to segment a few axial DICOM MRI slices of the lumbar spine-
import os
import numpy as np
import nibabel as nib
import SimpleITK as sitk
import matplotlib.pyplot as plt
from totalsegmentator.python_api import totalsegmentator

def convert_dicom_to_nifti(dicom_directory, nifti_output_path):
    """
    Convert DICOM series to NIfTI format.

    Args:
        dicom_directory (str): Path to the directory containing DICOM files.
        nifti_output_path (str): Path where the NIfTI file will be saved.
    """
    reader = sitk.ImageSeriesReader()
    series_ids = reader.GetGDCMSeriesIDs(dicom_directory)
    if not series_ids:
        raise ValueError("No DICOM series found in the specified directory.")
    dicom_files = reader.GetGDCMSeriesFileNames(dicom_directory, series_ids[0])
    reader.SetFileNames(dicom_files)
    image = reader.Execute()
    sitk.WriteImage(image, nifti_output_path)
    print(f"Converted DICOM to NIfTI: {nifti_output_path}")

def segment_and_visualize(nifti_file, segmentation_output_dir, visualization_output_dir):
    """
    Perform segmentation on a NIfTI file and visualize the results.

    Args:
        nifti_file (str): Path to the input NIfTI file.
        segmentation_output_dir (str): Directory to store segmentation results.
        visualization_output_dir (str): Directory to store visualization images.
    """
    # Perform segmentation
    totalsegmentator(nifti_file, segmentation_output_dir, task="total_mr", verbose=True)

    # Create visualization directory
    os.makedirs(visualization_output_dir, exist_ok=True)

    # Visualize original slices
    visualize_slices(nifti_file,
                     output_path=os.path.join(visualization_output_dir, "original_slices.png"))

    # Visualize vertebrae segmentation
    vertebrae_segmentation = os.path.join(segmentation_output_dir, "vertebrae.nii.gz")
    visualize_slices(nifti_file,
                     segmentation_path=vertebrae_segmentation,
                     output_path=os.path.join(visualization_output_dir, "vertebrae_segmentation.png"))

def visualize_slices(image_path, segmentation_path=None, output_path=None, num_slices=9):
    """
    Visualize multiple slices of a 3D image with optional segmentation overlay.

    Args:
        image_path (str): Path to the NIfTI image file.
        segmentation_path (str, optional): Path to the segmentation NIfTI file.
        output_path (str, optional): Path to save the visualization.
        num_slices (int): Number of slices to visualize.
    """
    image_data = nib.load(image_path).get_fdata()
    if segmentation_path:
        segmentation_data = nib.load(segmentation_path).get_fdata()

    total_slices = image_data.shape[2]
    slice_indices = np.linspace(0, total_slices - 1, num_slices, dtype=int)

    fig, axes = plt.subplots(3, 3, figsize=(15, 15))
    for i, ax in enumerate(axes.flat):
        if i < num_slices:
            slice_num = slice_indices[i]
            ax.imshow(image_data[:, :, slice_num].T, cmap='gray')
            if segmentation_path:
                ax.imshow(segmentation_data[:, :, slice_num].T, alpha=0.5, cmap='jet')
            ax.set_title(f'Slice {slice_num}')
        ax.axis('off')

    plt.tight_layout()
    if output_path:
        plt.savefig(output_path)
        plt.close(fig)
        print(f"Saved visualization to {output_path}")
    else:
        plt.show()

# Main execution
if __name__ == "__main__":
    # Define paths
    dicom_directory = "sample_dicom_slices_folder"
    nifti_file = "mri_scan.nii.gz"
    segmentation_output_dir = "segmentation_results"
    visualization_output_dir = "visualizations"

    # Convert DICOM to NIfTI
    convert_dicom_to_nifti(dicom_directory, nifti_file)

    # Perform segmentation and visualization
    segment_and_visualize(nifti_file, segmentation_output_dir, visualization_output_dir)
The saved visualisations show the original slices as well as the segmented slices with the vertebrae mask overlaid.
MaxViT-UNet
This is an architecture that uses MaxViT (Multi-Axis Vision Transformer) as its backbone, a model that combines the strengths of a ViT and a convolutional network. On top of this, the UNet part provides precise localisation, making the combination well suited to medical segmentation.
The research paper titled 'MaxViT-UNet: Multi-Axis Attention for Medical Image Segmentation' demonstrates its prowess at the task.
2.5D CNN
This architecture bridges the gap between 2D and 3D, but there's actually nothing '2.5D' about the model itself.
What's 2.5D is the data we feed into it: a common practice is to stack adjacent slices, or to select a fixed number 'n' of slices from across a series and stack them together, giving the data depth without it actually being 3D.
If we are using study-level crops to create a 2.5D image and the crops still capture some noise, we can even stack a segmentation mask of the actual part-of-interest as an extra channel; this can make the model even better.
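Here is a minimal sketch of building such a 2.5D input, assuming a NumPy volume of shape (H, W, num_slices) and an optional matching mask (both toy data here)-

import numpy as np

def build_25d_input(volume, center_idx, n=5, mask=None):
    # Pick n slices centred on the slice of interest, clipped to the volume bounds
    half = n // 2
    idxs = np.clip(np.arange(center_idx - half, center_idx + half + 1),
                   0, volume.shape[2] - 1)
    channels = [volume[:, :, i] for i in idxs]

    # Optionally append the part-of-interest mask as one more channel
    if mask is not None:
        channels.append(mask[:, :, center_idx])

    # Stack along a channel axis: shape (n [+1], H, W), ready for a 2D CNN
    return np.stack(channels, axis=0)

# Usage with toy data
volume = np.random.rand(128, 128, 30)
mask = (np.random.rand(128, 128, 30) > 0.9).astype(np.float32)
x = build_25d_input(volume, center_idx=15, n=5, mask=mask)
print(x.shape)  # (6, 128, 128)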
Multi-View CNN
If our data provides us with different views, then we can use all of them through a Multi-View CNN (MVCNN), which allows for multiple input streams. Each stream processes a different view separately, capturing all of their information efficiently.
Once the data has gone through the different streams, the information is combined, or fused. For the fusion step we have many options to consider, each with different strengths and weaknesses (e.g. if the different views follow some sequence that we'd like to capture, we can use an LSTM or GRU layer; otherwise, a simple concat also works).
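Here is a minimal PyTorch sketch of the idea; the stream architecture, number of views, and concat fusion are all illustrative assumptions rather than a standard recipe-

import torch
import torch.nn as nn

class MultiViewCNN(nn.Module):
    def __init__(self, num_views=3, num_classes=2):
        super().__init__()
        # One small convolutional stream per view (weight sharing is also common)
        self.streams = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
            for _ in range(num_views)
        ])
        # Simple concat fusion followed by a classifier head
        self.head = nn.Linear(32 * num_views, num_classes)

    def forward(self, views):
        # views: list of tensors, one per view, each of shape (batch, 1, H, W)
        features = [stream(v) for stream, v in zip(self.streams, views)]
        return self.head(torch.cat(features, dim=1))

# Usage with toy axial/sagittal/coronal inputs
model = MultiViewCNN()
views = [torch.randn(4, 1, 64, 64) for _ in range(3)]
print(model(views).shape)  # torch.Size([4, 2])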