Experience the power of Luzmo. Talk to our product experts for a guided demo or get your hands dirty with a free 10-day trial.
In statistical analysis and exploratory data analysis, a box plot is the superstar of data visualization. While it may not be as popular as a pie chart or something else in Excel’s top 10 hits, it’s one of the most useful chart types for industries such as data science, machine learning, and healthcare.
Today, we take a look at box plot diagrams: what they are, what (not) to do with them, and when it’s best to use them for data visualization.
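A box plot diagram, also known as a box and whisker plot diagram, is a common way to visualize the distribution of a dataset based on five points: the minimum, the first quartile (Q1), the median, the third quartile (Q3), and the maximum.
A box plot diagram has fairly standard elements: a box that spans the interquartile range (IQR) from Q1 to Q3, a line inside the box that marks the median, whiskers that extend from the box toward the smallest and largest values that are not outliers, and individual markers for any outliers.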
Box plots are commonly used in exploratory data analysis, for situations such as comparing distributions, identifying skewness and spread, highlighting outliers, and summarizing large data sets.
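To make the five points concrete, here is a minimal sketch that computes them with Python's standard library. The function name `five_number_summary` and the sample scores are made up for illustration, and the quartiles use the "inclusive" convention; other conventions (and other charting tools) can give slightly different Q1 and Q3 values.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    data = sorted(values)
    # Assumption: "inclusive" quartiles. Other quartile conventions
    # may produce slightly different Q1 and Q3 for the same data.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]

# Hypothetical test scores, used only to illustrate the calculation.
scores = [52, 61, 64, 68, 70, 73, 75, 79, 84, 91, 98]
summary = five_number_summary(scores)
print(summary)
```

The box is drawn from Q1 to Q3, the line inside it sits at the median, and the whiskers reach toward the minimum and maximum.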
Box plot diagrams are very handy, but they only work well in certain situations. Here is when you can get the most out of a box and whisker plot.
One of the primary uses for a box plot diagram is to compare the distribution of continuous data (e.g. sales revenue, test scores, response times) across different categories (e.g. regions, departments, or experiments).
In a box plot, every group or category is represented by its own box. This lets you compare the central tendency (median), the spread (IQR), and the presence of outliers in each group. The reader can easily spot which group has higher variability or different central values.
For example, you want to compare the sales performance in different regions (North, South, West, East). With a box plot, you can show the sales distribution for each of the regions. As a result, you can quickly spot the regions with the highest and lowest sales variability and outliers.
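The regional comparison can be sketched in a few lines: compute the median and IQR for each group, then rank the groups by spread. The sales figures and the helper name `spread_by_group` are hypothetical, and the quartile convention is again an assumption.

```python
from statistics import quantiles

# Hypothetical monthly sales figures per region (in thousands).
sales_by_region = {
    "North": [100, 110, 120, 130, 140],
    "South": [60, 90, 120, 170, 220],
    "West": [95, 100, 105, 110, 115],
    "East": [80, 100, 130, 150, 160],
}

def spread_by_group(groups):
    """Return the median and IQR of each group, i.e. what each box shows."""
    result = {}
    for name, values in groups.items():
        q1, median, q3 = quantiles(values, n=4, method="inclusive")
        result[name] = {"median": median, "iqr": q3 - q1}
    return result

stats = spread_by_group(sales_by_region)
most_variable = max(stats, key=lambda region: stats[region]["iqr"])
least_variable = min(stats, key=lambda region: stats[region]["iqr"])
print(most_variable, least_variable)
```

In a chart, the region with the largest IQR has the tallest box, which is why differences in variability are visible at a glance.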
With large data sets, you should be able to spot outliers easily. These are data points that vary significantly from the rest of the data set, which can imply data errors, rare events, or influential observations.
In a box plot, outliers fall outside the whiskers. By the most common convention, a data point counts as an outlier when it lies more than 1.5 times the IQR below the first quartile or above the third quartile.
Let’s say you run a clinical trial and you want to explore the recovery time for patients. If some patients’ recovery times are much longer or shorter than those of most other patients, they will show up as individual points beyond the whiskers. The researchers can then immediately investigate the reasons behind this.
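The 1.5 × IQR rule is easy to apply by hand. The sketch below flags values beyond the two "fences"; the function name `find_outliers`, the multiplier parameter `k`, and the recovery times are illustrative assumptions, not data from a real trial.

```python
from statistics import quantiles

def find_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

# Hypothetical recovery times in days.
recovery_days = [12, 14, 14, 15, 16, 16, 17, 18, 19, 20, 41]
outliers = find_outliers(recovery_days)
print(outliers)
```

Here Q1 is 14.5 and Q3 is 18.5, so the fences sit at 8.5 and 24.5 days, and only the 41-day recovery is flagged.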
A box plot helps you understand if the data distribution is symmetric or skewed, which is important when applying statistical techniques, e.g., parametric or non-parametric tests.
In a box plot, if the median line sits closer to the top or the bottom of the box, or if one whisker is noticeably longer than the other, the data is skewed. When the median is centered and the whiskers are about the same length, the data is roughly symmetric.
For example, you’re analyzing the income distribution in a company. If the distribution is skewed (for example, a few employees earn far more than the rest), the whisker will be longer on the high-income side of the box plot.
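One way to put a number on what the box shows is the quartile (Bowley) skewness coefficient, which measures how far the median sits from the center of the box. This is a swapped-in summary statistic for illustration, not something a box plot computes for you; the salary figures are hypothetical.

```python
from statistics import quantiles

def quartile_skewness(values):
    """Bowley skewness: (Q3 + Q1 - 2 * median) / (Q3 - Q1).

    Positive means the median sits nearer Q1 (longer upper side),
    negative means it sits nearer Q3, and 0 means a centered median.
    """
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero, so quartile skewness is undefined")
    return (q3 + q1 - 2 * median) / iqr

# Hypothetical salaries in thousands, with a few high earners.
salaries = [38, 40, 41, 43, 45, 48, 52, 60, 75, 95, 140]
skew = quartile_skewness(salaries)
print(round(skew, 2))
```

A clearly positive value like this one matches a box whose median line sits low in the box, with a long whisker on the high-income side.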
When a data set is so large that showing every individual data point would overwhelm the reader, a box plot diagram helps you quickly summarize the most important information.
Box plots condense the key statistical properties of large data sets into a reader-friendly format. They highlight the range, quartiles, and outliers without burying the reader in detail.
For example, imagine summarizing test scores from a nationwide exam taken by thousands of students. Instead of showing the score for every student, the box plot diagram summarizes the distribution of scores, so an educator reading it can quickly assess the median, range, and any outliers.
If you want to track performance metrics (e.g., productivity, sales, customer satisfaction scores, and others), a box plot diagram comes in handy.
This visualization type lets you compare the median, spread, and variability over time. You can use it to identify trends, periods of consistency, or fluctuations that are unusual.
For example, you could track daily traffic for a website over a year and create one box plot for each month. Side by side, the 12 box plots show whether traffic increased or stabilized over time and highlight outliers or months when the traffic was particularly good or bad.
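Before drawing one box per month, the raw measurements have to be bucketed by month. The sketch below shows that grouping step with made-up daily visit counts; `group_by_month` is a hypothetical helper, and a real data set would have many more values per month.

```python
from collections import defaultdict
from datetime import date
from statistics import median

def group_by_month(records):
    """Bucket (date, value) pairs into lists keyed by (year, month)."""
    buckets = defaultdict(list)
    for day, value in records:
        buckets[(day.year, day.month)].append(value)
    # Sort by key so the months appear in chronological order.
    return dict(sorted(buckets.items()))

# Hypothetical daily visit counts.
daily_visits = [
    (date(2024, 1, 5), 1200),
    (date(2024, 1, 18), 1350),
    (date(2024, 1, 27), 1280),
    (date(2024, 2, 3), 1500),
    (date(2024, 2, 14), 2900),
    (date(2024, 2, 25), 1550),
]
monthly = group_by_month(daily_visits)
monthly_medians = {month: median(values) for month, values in monthly.items()}
print(monthly_medians)
```

Each list in `monthly` would feed one box, so a shift in the medians from month to month points to a trend, while a single extreme day (such as the 2,900-visit spike) stands out as a candidate outlier.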
So you want to visualize your data values, but you’re unsure if a box plot diagram is the right choice.
Here are some things you should and should not do.
Use box plot diagrams for comparing distributions: this visualization type is excellent for comparing skewness and spread of data across different groups.
Include clear axis labels and a legend: proper labels help readers understand the different ranges, medians, quartiles, and outliers.
Use box plot diagrams when summarizing large sets of data: this type of visualization is excellent for condensing large data sets into a concise five-number summary that shows the range, interquartile range (IQR), median, and outliers. When you need to show lots of data without overwhelming your audience, grab this chart type.
Clearly explain outliers: indicate outliers and explain what those data points mean (e.g., errors, anomalies, or rare events).
Use color or annotation to explain key comparisons: when comparing groups in a box plot diagram, colors or annotations help emphasize the differences between them and make it easier for your target audience to understand the visualization.
Don’t use box plots for small datasets: if the data set is too small, a box plot does more harm than good and confuses the reader. Consider using dot plots or a scatter plot instead.
Don’t use box plots to show exact data points: box plots summarize a large number of data points, which means that an individual data point will be hidden. For showing the distribution of individual data points, use a scatter plot or a strip plot instead.
Don’t use them if your data is categorical: this chart type is ideal for continuous, numerical data. For categorical data, use a bar chart or a pie chart instead.
Don’t forget to check for data skewness: if your data is highly skewed, a box and whisker diagram can hide details of the distribution’s shape, such as a long tail or multiple peaks. A violin plot is often the better choice in this case.
Don’t overload the box plot with too many categories: just like with other visualization types, having too many categories results in the diagram being too difficult to read and interpret.
Box plot diagrams are some of the many visualization types supported in Luzmo, an app that allows you to add a dashboard to your software platform. You can choose from many types of visualizations: histogram, bar chart, tree map, donut chart, and many, many others. But even more importantly, you can embed those visualizations right into your app.
Want to learn more? Book a free demo with our team to find out how Luzmo can help you and your app’s end-users unlock the true power of data visualization.