Contents - Clustering

3.1 Data extraction for clustering

3.1.1 How data is extracted
3.1.2 Use all points of an image
3.1.3 Use the image's points centroid as data

3.2 Apply clustering algorithms on extracted data

3.2.1 KMeans
3.2.2 KMedoids
3.2.3 Agglomerative Clustering

3.3 Execute custom functions : user script environment
3.4 Clustering results

3.4.1 Cluster assignment

3.4.1.1 Save results as set
3.4.1.2 Export as CSV file

3.4.2 Graphic Visualisation
3.4.3 Internal Validation Indexes

3.4.3.1 Mean Silhouette
3.4.3.2 Calinski-Harabasz Index
3.4.3.3 Davies-Bouldin index

Clustering on NIfTI data

brain-Mapper lets the user run clustering algorithms on the relevant data in NIfTI files in order to identify groups of voxels.
To do so, brain-Mapper extracts the data from the NIfTI files and lets you select the method to apply. Clustering results can be exported as a CSV file or saved in the application, from where they can be exported as new NIfTI files.

This section explains the main clustering functionalities of the software.

3.1 Data extraction for clustering

NIfTI is an image format, but some teams find it useful to apply clustering algorithms to the list of voxels, usually represented as a list of [X_coordinate, Y_coordinate, Z_coordinate, Intensity] entries, each of which represents one voxel.
The software extracts the data from your selected image collections before applying clustering algorithms to it.

3.1.1 How data is extracted

From the main view page, where all collections are accessible, select some image collections and then click the 'clustering' button at the bottom right.

A pop-up dialog window will appear. It shows the number of image collections selected as well as the total number of NIfTI images to be processed.

In this window, the user can choose between two ways of extracting the image information before the clustering view is loaded: the software either creates a list of interesting voxels by extracting the coordinates of all voxels with an intensity greater than 0, or computes each image's centroid (each image is then represented by a single point).

3.1.2 Use all points of an image

When 'Use all region points for each file' is chosen, the data used for clustering is the list of all voxels whose intensity is greater than 0.
The clustering view thus contains a data table with one data entry per extracted voxel.
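
The all-points extraction can be illustrated with a minimal sketch. It assumes the image is already loaded as a 3D NumPy array and uses voxel indices as coordinates; the function name extract_voxels is hypothetical, and brain-Mapper's own extraction code may differ (for example in how coordinates are expressed).

```python
import numpy as np

def extract_voxels(volume):
    """Return an (n, 4) array of [x, y, z, intensity] rows, one per voxel with intensity > 0."""
    volume = np.asarray(volume)
    mask = volume > 0
    coords = np.argwhere(mask)        # (n, 3) voxel indices, in row-major order
    intensities = volume[mask]        # intensities in the same row-major order
    return np.column_stack([coords, intensities])

# Tiny synthetic image with two non-zero voxels (illustrative data only).
volume = np.zeros((3, 3, 3))
volume[0, 1, 2] = 5.0
volume[2, 2, 0] = 1.5

voxels = extract_voxels(volume)
print(voxels)
```

Each row of the result corresponds to one data entry of the kind described above.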

3.1.3 Use the image's points centroid as data

When 'Use centroids as file representation' is chosen, the data used for clustering is a list with a single voxel per file, which represents the mean voxel, or center, of all the voxels in the image. This extraction may take longer than the all-points extraction because a centroid has to be computed for each file.
The clustering view thus contains a data table with a single data entry per selected file: if a total of 4 files in 2 different image collections were selected, the data table will display 4 data entries.
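
As a rough illustration of the centroid extraction, the sketch below averages the coordinates and intensities of all voxels with intensity greater than 0. This is an assumption: the text above does not say whether brain-Mapper weights the mean by intensity, and the function name image_centroid is hypothetical.

```python
import numpy as np

def image_centroid(volume):
    """Return [mean_x, mean_y, mean_z, mean_intensity] over voxels with intensity > 0."""
    volume = np.asarray(volume)
    mask = volume > 0
    if not mask.any():
        raise ValueError("image has no voxel with intensity > 0")
    coords = np.argwhere(mask)                       # (n, 3) voxel indices
    return np.append(coords.mean(axis=0), volume[mask].mean())

# Synthetic image with two non-zero voxels (illustrative data only).
volume = np.zeros((4, 4, 4))
volume[0, 0, 0] = 2.0
volume[2, 2, 2] = 4.0

centroid = image_centroid(volume)
print(centroid)  # one data entry representing the whole image
```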

3.2 Apply clustering algorithms on extracted data

Once the data has been extracted from the selected images, you can choose which clustering algorithm to apply by clicking on the yellow bar at the top left of the clustering view.

The current version of brainMapper offers 3 clustering methods: KMeans, KMedoids and Agglomerative Clustering.
KMeans and Agglomerative Clustering come from scikit-learn, the Python machine learning library (see the scikit-learn documentation for more details).
An implementation of KMedoids was written by the development team.

Once an algorithm is selected, information about it and its parameters will appear in the left section of the clustering view. You can select and enter the parameters for the clustering algorithm there.

3.2.1 KMeans

The KMeans algorithm is a classic clustering algorithm.

brainMapper uses the implementation from the scikit-learn library.
For more details on the algorithm and its parameters, see the scikit-learn KMeans documentation.
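
For readers who want to reproduce this step outside the application, here is a minimal sketch of running scikit-learn's KMeans on [x, y, z, intensity] entries. The data and parameter values are illustrative assumptions, not the defaults used by brainMapper.

```python
import numpy as np
from sklearn.cluster import KMeans

# Six illustrative [x, y, z, intensity] entries forming two well-separated groups.
data = np.array([
    [0, 0, 0, 1.0], [1, 0, 0, 1.0], [0, 1, 0, 1.0],
    [10, 10, 10, 1.0], [11, 10, 10, 1.0], [10, 11, 10, 1.0],
])

# n_clusters, n_init and random_state are example values chosen for this sketch.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(data)

print(labels)                  # cluster index assigned to each entry
print(model.cluster_centers_)  # one center per cluster, in the same 4 columns
```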

3.2.2 KMedoids

The KMedoids algorithm is an alternative to KMeans in which the center of each cluster is a medoid: an actual data entry of the cluster, namely the one with the smallest total distance to the other entries of that cluster.
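
To make the idea concrete, the following is a minimal sketch of a simple alternating k-medoids scheme with Euclidean distances. It is not the development team's implementation; the function k_medoids, its parameters and the data are assumptions made for illustration.

```python
import numpy as np

def k_medoids(data, k, init=None, max_iter=100, seed=0):
    """Alternate between assigning entries to the nearest medoid and re-picking medoids."""
    data = np.asarray(data, dtype=float)
    # Pairwise Euclidean distances between all entries.
    dist = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=2)
    if init is None:
        rng = np.random.default_rng(seed)
        medoids = rng.choice(len(data), size=k, replace=False)
    else:
        medoids = np.array(init, dtype=int)
    for _ in range(max_iter):
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:
                # The new medoid is the member with the smallest total distance to the others.
                within = dist[np.ix_(members, members)].sum(axis=1)
                new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(dist[:, medoids], axis=1)
    return medoids, labels

data = np.array([
    [0, 0, 0, 1.0], [1, 0, 0, 1.0], [0, 1, 0, 1.0],
    [10, 10, 10, 1.0], [11, 10, 10, 1.0], [10, 11, 10, 1.0],
])
# Starting medoids are fixed here only to keep the example reproducible.
medoids, labels = k_medoids(data, k=2, init=[0, 1])
print(medoids, labels)
```

Unlike a KMeans center, each returned medoid is the index of a row of the input data, so it always corresponds to a real voxel (or image centroid).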

3.2.3 Agglomerative Clustering

Agglomerative clustering, especially with Ward linkage, is sometimes used in neuroimaging. It is a hierarchical clustering algorithm.

brainMapper uses the implementation of this type of algorithm from the scikit-learn library.
For more details on its parameters, see the scikit-learn AgglomerativeClustering documentation.
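
A minimal sketch of Ward-linkage agglomerative clustering with scikit-learn is shown below. The data and the choice of two clusters are illustrative assumptions and do not reflect brainMapper's default parameters.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Six illustrative [x, y, z, intensity] entries forming two well-separated groups.
data = np.array([
    [0, 0, 0, 1.0], [1, 0, 0, 1.0], [0, 1, 0, 1.0],
    [10, 10, 10, 1.0], [11, 10, 10, 1.0], [10, 11, 10, 1.0],
])

# Ward linkage merges, at each step, the pair of clusters whose merge
# increases the within-cluster variance the least.
model = AgglomerativeClustering(n_clusters=2, linkage="ward")
labels = model.fit_predict(data)
print(labels)
```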

3.3 Execute custom functions : user script environment

In the clustering method chooser, you may see the option 'Custom user script'.
It is meant to let the user write the algorithm to apply to the extracted data.

INFORMATION

This functionality, although present in the user interface, does not work in the current version.

As interesting as this option was at the beginning of the project, the controls needed to make it work could not yet be implemented in the limited time available.
This functionality is intended to be completed in a later version.

3.4 Clustering results

Once the parameters of the selected clustering method have been set correctly, you can launch the algorithm by clicking on the 'Run' button.

Cluster assignments will appear in the data table, along with graphic visualisations of the results.

3.4.1 Cluster assignment

The data table will be modified to display which data entry belongs to which cluster.

These assignment results can be saved in two ways: either as a new set in the application or as an exported CSV file.

3.4.1.1 Save as set

Clicking the 'Save as set' button recreates one NIfTI file per cluster obtained, each containing all the points of that cluster. A set containing these results is added to the 'Clustering' tab of the main page.

The results can be exported as NIfTI files from there.
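
The idea of rebuilding one image per cluster can be sketched as follows. This is an assumption about the general approach, not brain-Mapper's code: the function volumes_per_cluster is hypothetical, coordinates are taken to be voxel indices, and writing the arrays to disk (which needs a NIfTI library and the source image's affine) is left out.

```python
import numpy as np

def volumes_per_cluster(voxels, labels, shape):
    """Rebuild one 3D array per cluster from [x, y, z, intensity] rows."""
    voxels = np.asarray(voxels, dtype=float)
    labels = np.asarray(labels)
    volumes = {}
    for cluster in np.unique(labels):
        rows = voxels[labels == cluster]
        volume = np.zeros(shape)
        x, y, z = rows[:, :3].astype(int).T
        volume[x, y, z] = rows[:, 3]     # put each intensity back at its coordinates
        volumes[int(cluster)] = volume
    return volumes

# Illustrative data: three voxels, the first two assigned to cluster 0.
voxels = [[0, 0, 0, 1.0], [1, 1, 1, 2.0], [2, 2, 2, 3.0]]
labels = [0, 0, 1]
volumes = volumes_per_cluster(voxels, labels, shape=(3, 3, 3))
print(sorted(volumes))  # cluster identifiers, one array each
```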

3.4.1.2 Export as CSV file

Clicking the 'Export' button saves a CSV file containing the data table to disk.
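
For reference, a table of this kind could be written with Python's standard csv module as sketched below. The column names and their order are assumptions; the exact layout of the file produced by the application is not described here.

```python
import csv
import io

def write_assignments(rows, labels, handle):
    """Write [x, y, z, intensity] rows and their cluster labels as CSV."""
    writer = csv.writer(handle)
    writer.writerow(["x", "y", "z", "intensity", "cluster"])  # assumed header
    for row, label in zip(rows, labels):
        writer.writerow([*row, label])

# An in-memory buffer stands in for a file; with a real file, open it with newline="".
buffer = io.StringIO()
write_assignments([(0, 1, 2, 5.0), (2, 2, 0, 1.5)], [0, 1], buffer)
csv_text = buffer.getvalue()
print(csv_text)
```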

3.4.2 Graphic visualisation

Two graphic visualisations can give additional information on clustering results.

The first graph (in blue) is a histogram showing the proportion of data entries assigned to each cluster.
The second graph (in green) plots the Silhouette value of each data entry after cluster assignment.
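
The quantities behind these two graphs can be computed as in the sketch below, which uses NumPy and scikit-learn's silhouette_samples on illustrative data. Whether the application computes them in exactly this way is an assumption, and the plotting itself is omitted.

```python
import numpy as np
from sklearn.metrics import silhouette_samples

# Illustrative [x, y, z, intensity] entries and their cluster labels.
data = np.array([
    [0, 0, 0, 1.0], [1, 0, 0, 1.0], [0, 1, 0, 1.0],
    [10, 10, 10, 1.0], [11, 10, 10, 1.0], [10, 11, 10, 1.0],
])
labels = np.array([0, 0, 0, 1, 1, 1])

# Proportion of data entries per cluster (input of the first graph).
proportions = np.bincount(labels) / labels.size

# One Silhouette value per data entry, between -1 and 1 (input of the second graph).
sample_silhouettes = silhouette_samples(data, labels)

print(proportions)
print(sample_silhouettes)
```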

3.4.3 Internal Validation indexes

Internal validation indexes can be useful for deciding which clustering run to retain.

In this version of brainMapper, validation indexes are calculated automatically after a clustering algorithm is applied to the data.
The internal validation indexes of the current version are:

The mean Silhouette index over all samples
Calinski-Harabasz score
Davies-Bouldin index

To visualize their values, click on the 'Result Details' button on the top bar.

A pop-up window containing the parameters, cluster centers and index values will appear.

Its contents can be saved in a .txt file by clicking on 'Save as text file'.
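
The three indexes can be reproduced outside the application with scikit-learn's metrics module, as in this minimal sketch. The data and labels are illustrative, and the assumption that brainMapper calls exactly these functions is not confirmed by the text above.

```python
import numpy as np
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)

# Illustrative [x, y, z, intensity] entries and their cluster labels.
data = np.array([
    [0, 0, 0, 1.0], [1, 0, 0, 1.0], [0, 1, 0, 1.0],
    [10, 10, 10, 1.0], [11, 10, 10, 1.0], [10, 11, 10, 1.0],
])
labels = np.array([0, 0, 0, 1, 1, 1])

mean_silhouette = silhouette_score(data, labels)           # in [-1, 1], higher is better
calinski_harabasz = calinski_harabasz_score(data, labels)  # higher is better
davies_bouldin = davies_bouldin_score(data, labels)        # >= 0, lower is better

print(mean_silhouette, calinski_harabasz, davies_bouldin)
```

Because the three indexes point in different directions (higher is better for the first two, lower for the third), they are best compared between runs on the same data rather than read as absolute scores.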

3.4.3.1 Mean Silhouette

The mean Silhouette index is calculated with the corresponding function from the scikit-learn library. For more details, see the scikit-learn documentation.

3.4.3.2 Calinski-Harabasz score

The Calinski-Harabasz score is computed with the corresponding function from the scikit-learn library. For more details, see the scikit-learn documentation.

3.4.3.3 Davies-Bouldin index

The Davies-Bouldin index evaluates a clustering using only quantities and features inherent to the dataset. Lower values indicate clusters that are more compact and better separated.

NOTE

A good (low) value of the Davies-Bouldin index does not imply the best information retrieval.