
Overview
A Sankey diagram is a type of flow diagram used to visualize the magnitude of flow between different nodes or processes in a system. It is particularly useful for illustrating energy, material, cost, or other types of flows. The width of the arrows in a Sankey diagram is proportional to the quantity of the flow being represented, making it easy to identify the most significant transfers within a system.
Here is an example of a Sankey diagram visualizing income flow.

Steps to Create a Sankey Diagram in R with networkD3
To create a Sankey diagram in R, install and load the required packages as shown below.
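A minimal sketch of this step, assuming networkD3 is the only package strictly required for the diagram (the later steps can be done in base R). The CRAN mirror URL is one common choice, not a requirement.

```r
# Install networkD3 only if it is not already available, then load it.
if (!requireNamespace("networkD3", quietly = TRUE)) {
  install.packages("networkD3", repos = "https://cloud.r-project.org")
}
library(networkD3)
```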

Next, set your working directory and load your dataset.
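The sketch below shows one way this step might look. The file name sankey_data.csv, the column names source, target, and value, and the figures themselves are illustrative assumptions, not the dataset used in this post. To keep the example runnable anywhere, it first writes a small CSV to a temporary folder and then treats that folder as the working directory.

```r
# Hypothetical sample file: written to a temporary folder so the example runs as-is.
csv_path <- file.path(tempdir(), "sankey_data.csv")
writeLines(c(
  "source,target,value",
  "Salary,Income,70",
  "Freelance,Income,30",
  "Income,Rent,40",
  "Income,Savings,35",
  "Income,Food,25"
), csv_path)

# Set the working directory (replace tempdir() with your own project folder).
old_wd <- setwd(tempdir())

# Load the dataset into a data frame named a.
a <- read.csv("sankey_data.csv", stringsAsFactors = FALSE)

# Restore the previous working directory.
setwd(old_wd)

head(a)
```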

Next, preview the dataset used in this example (the data frame a, which has source and target columns).

Next, create the nodes data frame using the code below.
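A possible version of this step in base R. The small data frame a is made-up sample data (an assumption for illustration) so that the example runs on its own.

```r
# Illustrative sample data (assumed, not the dataset from this post).
a <- data.frame(
  source = c("Salary", "Freelance", "Income", "Income", "Income"),
  target = c("Income", "Income", "Rent", "Savings", "Food"),
  value  = c(70, 30, 40, 35, 25),
  stringsAsFactors = FALSE
)

# Collect every distinct label that appears as a source or a target.
nodes <- data.frame(
  name = unique(c(as.character(a$source), as.character(a$target))),
  stringsAsFactors = FALSE
)

print(nodes)
```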

The above code produces a data frame with a single column, name, that contains all unique values from the source and target columns of the original data frame.
See the resulting nodes data frame below.

Next, compute the IDsource and IDtarget columns using the code below.
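A sketch of how these two columns could be computed with match(). The sample data frame a is an illustrative assumption; only the column names IDsource and IDtarget come from this post.

```r
# Illustrative sample data (assumed, not the dataset from this post).
a <- data.frame(
  source = c("Salary", "Freelance", "Income", "Income", "Income"),
  target = c("Income", "Income", "Rent", "Savings", "Food"),
  value  = c(70, 30, 40, 35, 25),
  stringsAsFactors = FALSE
)
nodes <- data.frame(
  name = unique(c(a$source, a$target)),
  stringsAsFactors = FALSE
)

# match() returns the 1-based position of each label in nodes$name;
# subtracting 1 converts it to the 0-based index.
a$IDsource <- match(a$source, nodes$name) - 1
a$IDtarget <- match(a$target, nodes$name) - 1

print(a)
```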

Note: For each value in a$source, the above code finds the position (index) of that value in nodes$name. The same applies to a$target for the IDtarget column.
After finding the positions, the code subtracts 1 from each one. R indexes from 1, but networkD3 expects the node indices in the links data to start at 0, so the subtraction converts the positions to zero-based indexing.
Next, add a group column to be used in applying color to links.
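How links are assigned to groups depends on your data. The sketch below uses a hypothetical rule (links flowing into "Income" are group "A", links flowing out of it are group "B") on made-up sample data, purely to illustrate the mechanics.

```r
# Illustrative sample data (assumed, not the dataset from this post).
a <- data.frame(
  source = c("Salary", "Freelance", "Income", "Income", "Income"),
  target = c("Income", "Income", "Rent", "Savings", "Food"),
  value  = c(70, 30, 40, 35, 25),
  stringsAsFactors = FALSE
)

# Hypothetical grouping rule: inflows are "A", outflows are "B".
a$group <- as.factor(ifelse(a$target == "Income", "A", "B"))

print(a)
```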

When you preview the data, notice that three columns have been added to the original data frame: IDsource, IDtarget, and group, as shown below.

Next, add a group column to the nodes data frame, which will be used to apply color to the nodes, using the code below.
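One simple option, shown here as an assumption rather than the exact approach in this post, is to place every node in a single group, "C", so that all nodes share one color. The node names are made-up sample data.

```r
# Illustrative nodes data frame (assumed sample labels).
nodes <- data.frame(
  name = c("Salary", "Freelance", "Income", "Rent", "Savings", "Food"),
  stringsAsFactors = FALSE
)

# Assumed scheme: all nodes belong to one group, "C".
nodes$group <- factor(rep("C", nrow(nodes)))

print(nodes)
```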

The nodes data frame now includes the group column, as shown below.

Next, create a color mapping for the categories "A", "B", and "C", associating each category with a specific color, using the code below.
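networkD3 takes the color mapping as a string of JavaScript that builds a D3 ordinal scale. In the sketch below, the variable name my_color and the three colors are arbitrary choices for illustration; only the categories "A", "B", and "C" come from this post.

```r
# A D3 ordinal scale written as a JavaScript string:
# domain lists the categories, range lists the matching colors in the same order.
my_color <- 'd3.scaleOrdinal().domain(["A", "B", "C"]).range(["#69b3a2", "steelblue", "grey"])'

cat(my_color, "\n")
```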

Using the sankeyNetwork function, create the Sankey diagram by specifying the details as follows.
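Putting the steps together, a call might look like the sketch below. The sample data, grouping rule, colors, and the fontSize and nodeWidth values are illustrative assumptions; the argument names are those of networkD3's sankeyNetwork().

```r
library(networkD3)

# Illustrative sample data (assumed, not the dataset from this post).
a <- data.frame(
  source = c("Salary", "Freelance", "Income", "Income", "Income"),
  target = c("Income", "Income", "Rent", "Savings", "Food"),
  value  = c(70, 30, 40, 35, 25),
  stringsAsFactors = FALSE
)

# Nodes, zero-based link indices, and groups, as in the earlier steps.
nodes <- data.frame(name = unique(c(a$source, a$target)), stringsAsFactors = FALSE)
a$IDsource <- match(a$source, nodes$name) - 1
a$IDtarget <- match(a$target, nodes$name) - 1
a$group <- as.factor(ifelse(a$target == "Income", "A", "B"))  # hypothetical rule
nodes$group <- factor(rep("C", nrow(nodes)))                  # assumed single node group

# Color mapping for the three categories (colors are arbitrary).
my_color <- 'd3.scaleOrdinal().domain(["A", "B", "C"]).range(["#69b3a2", "steelblue", "grey"])'

# Build the diagram: column names are passed as strings.
p <- sankeyNetwork(
  Links = a, Nodes = nodes,
  Source = "IDsource", Target = "IDtarget", Value = "value",
  NodeID = "name",
  LinkGroup = "group", NodeGroup = "group",
  colourScale = my_color,
  fontSize = 12, nodeWidth = 30
)

p  # prints the interactive widget in the viewer or browser
```

The result is an htmlwidget, so it can also be written to a standalone HTML file with htmlwidgets::saveWidget() if you want to share it.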

Executing the above code generates the diagram below.

Other Sankey diagrams created with the same procedure include the following.
Income Sankey Diagram

Balance Sheet Sankey Diagram

Marketing Sankey Diagram

Recruitment Process Sankey Diagram

Conclusion
Sankey diagrams are a powerful and versatile data visualization tool designed to illustrate the flow of values, resources, or information between different stages or nodes within a system. These diagrams are particularly effective in highlighting complex processes and distributions by visualizing transitions and changes over time or across different states. Sankey diagrams are widely applicable across various fields, including customer journey mapping, supply chain analysis, energy flow visualization, and financial resource allocation. As data visualization tools continue to evolve, Sankey diagrams are likely to remain a valuable asset for understanding complex systems and processes, especially when combined with interactive and dynamic visualization capabilities.
If you like the work we do and would like to work with us, drop us an email on our contacts page and we’ll reach out!
Thank you for reading!