

Data visualization: basic charts

Posted by mitry at 2020-02-29

Data visualization, which helps users understand data, has always been a popular field.

Charts are a common means of "data visualization", and the basic charts (bar charts, line charts, pie charts and so on) are the most commonly used.

Users are very familiar with these charts, but ask what their characteristics are and which occasions (data sets) suit them best, and few people can answer.

This article is a set of notes on the first chapter of Data Visualization with JavaScript. It summarizes the characteristics and suitable uses of six basic charts, and answers the questions above well.

0、 Preamble

Before getting to the point, let's correct a misunderstanding.

Some people think basic charts are too simple and too primitive, neither high-end nor impressive, so they pursue more complex charts. However, the simpler a chart is, the easier it is to understand. Isn't understanding the data quickly and easily the most important purpose and the highest pursuit of "data visualization"?

So, please don't look down on these basic charts. Because users are most familiar with them, they should be given priority as long as they are applicable.

1、 Bar chart

The bar chart is the most common chart and the easiest to read.

It is suitable for two-dimensional data sets (each data point includes two values, x and y) in which only one dimension needs to be compared. Annual sales volume is two-dimensional data: "year" and "sales volume" are its two dimensions, but only the "sales volume" dimension needs to be compared.

A bar chart uses the height of each bar to reflect differences in the data. The naked eye is very sensitive to differences in height, so the differences are easy to recognize. The limitation of the bar chart is that it only works for small and medium-sized data sets.

Generally speaking, the x-axis of a bar chart is a time dimension, and users are accustomed to assuming there is a time trend. If the x-axis is not a time dimension, it is recommended to give each bar a different color, so that users stop looking for a time trend.

The figure above shows the number of wins of each team in a certain year in the British football league. The x-axis represents the different teams, and the y-axis represents the number of wins.
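The idea that bar size encodes the value can be sketched in a few lines of Python. This is only an illustration: the function name text_bar_chart, the team names and the win counts are all invented here, and the bars are drawn horizontally for simplicity, so length stands in for height.

```python
def text_bar_chart(data, width=40, mark="#"):
    """Return one text line per (label, value) pair.

    Bar length is proportional to the value, and the largest
    value fills the full width.
    """
    if not data:
        return []
    largest = max(value for _, value in data)
    label_width = max(len(label) for label, _ in data)
    lines = []
    for label, value in data:
        length = round(value / largest * width) if largest > 0 else 0
        lines.append(f"{label.ljust(label_width)} | {mark * length} {value}")
    return lines


# Hypothetical win counts, invented for illustration only.
wins = [("Team A", 28), ("Team B", 21), ("Team C", 14), ("Team D", 7)]
for line in text_bar_chart(wins, width=28):
    print(line)
```

Because only one dimension (the number of wins) is compared, a single number per label is all the function needs.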

2、 Line chart

A line chart is suitable for large two-dimensional data sets, especially those where the trend is more important than any single data point.

It is also suitable for comparing multiple two-dimensional data sets.

The figure above is a line chart of two two-dimensional data sets (carbon dioxide concentration in the atmosphere, average surface temperature).
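When two series use different units, their trends are hard to compare on one axis. One common approach, which is an assumption here and not something the article describes, is to rescale each series to the range 0 to 1 before drawing. A minimal sketch, with an invented function name and invented values:

```python
def rescale(series):
    """Min-max rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(series), max(series)
    if hi == lo:
        # A constant series has no trend; map every point to 0.
        return [0.0 for _ in series]
    return [(v - lo) / (hi - lo) for v in series]


# Invented values in two different units, for illustration only.
series_a = [300, 320, 360, 400]
series_b = [14.0, 14.2, 14.5, 15.0]

for a, b in zip(rescale(series_a), rescale(series_b)):
    print(f"{a:.2f}  {b:.2f}")
```

After rescaling, both series start at 0 and end at 1, so only the shape of each trend remains. Drawing two y-axes is another option, which this sketch does not cover.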

3、 Pie chart

The pie chart should generally be avoided, because the naked eye is not sensitive to differences in area.

In the figure above, the order of the five colored slices by area is hard to see in the pie chart on the left. It is much easier to see once the data is drawn as a bar chart.

In general, a bar chart should be used instead of a pie chart. The one exception is showing the proportion that one part makes up of the whole, such as the proportion of the poor in the total population.
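For the part-of-the-whole case, each slice's angle is simply its share of the total times 360 degrees. A small sketch, where the function name slice_angles and the sample numbers are invented for illustration:

```python
def slice_angles(values):
    """Return the angle in degrees of each pie slice.

    Each angle is the value's share of the total times 360.
    """
    total = sum(values)
    if total <= 0:
        raise ValueError("the values must sum to a positive number")
    return [360.0 * v / total for v in values]


# Hypothetical split: one part versus the rest of the whole.
part, rest = 1, 3
print(slice_angles([part, rest]))  # [90.0, 270.0]
```

With only two slices, the reader compares one slice against the full circle, which is why this case remains readable.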

4、 Scatter chart

The scatter chart is suitable for three-dimensional data sets in which only two dimensions need to be compared.

The figure above shows the medical expenditure and life expectancy of each country. The three dimensions are country, medical expenditure and life expectancy. Only the latter two dimensions need to be compared.

To identify the third dimension, you can label each point with text or give it a different color.

5、 Bubble chart

The bubble chart is a variation of the scatter chart that reflects the third dimension through the area of each point.

The picture above shows the path of Hurricane Katrina. The three dimensions are longitude, latitude and intensity. The larger the area of a point, the greater the intensity. Because users are not good at judging area, the bubble chart is only suitable when the third dimension does not need to be read accurately.
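Since it is the area of a bubble that carries the value, the radius has to grow with the square root of the value, not with the value itself. A minimal sketch of that scaling, in which the function name bubble_radius and the default max_radius are assumptions made for illustration:

```python
import math


def bubble_radius(value, max_value, max_radius=20.0):
    """Return a radius such that the bubble's area is proportional to value.

    Area is pi * r**2, so r must scale with sqrt(value).
    """
    if max_value <= 0 or value < 0:
        raise ValueError("value must be >= 0 and max_value must be > 0")
    return max_radius * math.sqrt(value / max_value)


print(bubble_radius(100, 100))  # 20.0
print(bubble_radius(25, 100))   # 10.0
```

A value four times smaller gets half the radius, and therefore a quarter of the area. Scaling the radius linearly would exaggerate the differences.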

If you add different colors (or text labels) to the bubbles, they can represent four-dimensional data. For example, the following figure uses color to show the wind force level at each point.

6、 Radar chart

The radar chart is suitable for multidimensional data (more than four dimensions), and each dimension must be sortable (nationality, for example, cannot be sorted). However, it has a limitation: it can show at most 6 data points, otherwise they cannot be told apart, so its uses are limited.

Here are the data for the five starting players of the Miami Heat. In addition to the name, each data point has five dimensions: points, rebounds, assists, steals and blocks.

Drawn as a radar chart, the data looks like the following.

The larger the area a data point covers, the more important it is. Obviously LeBron James (red area) is the Heat's most important player.

Note that users are not very familiar with radar charts, so they are hard to interpret. When using one, try to add explanatory notes to reduce the burden of interpretation.
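The geometry behind a radar chart can be sketched as follows: each dimension gets its own axis at an equal angle around the center, and each value is plotted along its axis as a fraction of that axis's maximum. The function names, the two players and all the numbers below are invented for illustration and are not the figures from the article.

```python
import math


def radar_vertices(values, maxima):
    """Return (x, y) vertices, one per dimension, on evenly spaced axes."""
    if len(values) != len(maxima) or len(values) < 3:
        raise ValueError("need at least 3 dimensions and matching maxima")
    n = len(values)
    points = []
    for k, (value, maximum) in enumerate(zip(values, maxima)):
        angle = 2 * math.pi * k / n   # direction of axis k
        r = value / maximum           # distance from center, 0 to 1
        points.append((r * math.cos(angle), r * math.sin(angle)))
    return points


def polygon_area(points):
    """Area of a simple polygon, using the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


# Hypothetical per-game numbers: points, rebounds, assists, steals, blocks.
maxima = [30, 10, 10, 3, 3]
player_a = [27, 8, 6, 2, 1]
player_b = [10, 4, 3, 1, 1]

for name, stats in [("Player A", player_a), ("Player B", player_b)]:
    print(name, round(polygon_area(radar_vertices(stats, maxima)), 3))
```

One caveat worth knowing: the enclosed area depends on the order in which the axes are arranged, so areas are only comparable between data points drawn on the same axis layout.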

7、 Summary
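The rules of thumb from the six sections can be collected into a small lookup table. The sketch below only restates what the sections say. The names CHARTS and candidate_charts are invented for illustration, and "more than four dimensions" for the radar chart is read here as five or more.

```python
# Each entry restates a rule of thumb from the sections above.
# "dimensions" is a (low, high) range; high=None means no upper bound.
CHARTS = {
    "bar": {"dimensions": (2, 2),
            "note": "compare one dimension; small and medium-sized data sets"},
    "line": {"dimensions": (2, 2),
             "note": "large data sets where the trend matters most"},
    "pie": {"dimensions": (2, 2),
            "note": "only to show one part's share of the whole"},
    "scatter": {"dimensions": (3, 3),
                "note": "compare two of the three dimensions"},
    "bubble": {"dimensions": (3, 4),
               "note": "third dimension need not be read accurately"},
    # Assumption: "more than four dimensions" is taken as 5 or more.
    "radar": {"dimensions": (5, None),
              "note": "sortable dimensions; at most 6 data points"},
}


def candidate_charts(dimensions):
    """Return the chart names whose dimension range includes `dimensions`."""
    names = []
    for name, info in CHARTS.items():
        low, high = info["dimensions"]
        if dimensions >= low and (high is None or dimensions <= high):
            names.append(name)
    return names


for d in (2, 3, 4, 5):
    for name in candidate_charts(d):
        print(d, name, "-", CHARTS[name]["note"])
```

The number of dimensions only narrows the choice. The notes still decide between, say, a bar chart and a line chart for the same two-dimensional data.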