Time, quantity, quality, nominal, ordinal… data types used in Industry
Figure 1 Planetary Movements, Depicted as Cyclic Lines on a Spatial-Temporal Grid, by an Unknown Astronomer in a Transcription of Commentary of Macrobius on Cicero's In Somnium Scipionis, 10th or 11th Century A.D. Reprinted in [95].
Brief start about Data
Effective conviction and persuasion are a result of speaking with clear and accurate data. Data is the basis of reasoning and a source of factual information.
Without data, decisions are blind and dangerous. If we cannot decide, we cannot move. This would have disappointed Leonardo da Vinci, who once said, “Life is movement”.
Figure 2 Trendline showing interest over time for the word “data”
Source: trends.google
Figure 3 Map showing interest by region in 2022 for the word “data”
Source: trends.google.
As we can see in Figures 2 and 3, interest in data is worldwide and remains high, and we might conjecture from the trend line that it will stay so for a while.
It also means that there is a massive number of articles, citations, and definitions on the web. Thus, we need to define our notions clearly.
In this article, we are going to explore the history of data, set up the notions, list data types, and clarify the subjectivity of data types.
Rollback in History
According to the Merriam-Webster dictionary, the first known use of “datum” was in 1646, meaning “something given or admitted especially as a basis for reasoning or inference”.
The Online Etymology Dictionary informs us that from 1897 its meaning evolved into numerical facts collected for future reference.
Just a side question: have you heard of the Ishango bones?
Huylebrouck mentions in “Africa and Mathematics” [1] that the oldest mathematical finding is the Ishango rod, dating to 20,000 years before present.
Figure 3. The Ishango bone on exhibition at the Royal Belgian Institute of Natural Sciences
Understanding Data
First things first, let us establish common ground for our conversation. It is important to keep in mind that this topic is very popular, so we might hear or read different words that have almost the same definition.
One piece of advice: when talking with your client or teammate, check that you share a common language, so that you have mutual understanding and a solid base when developing your project.
Just to illustrate, let us look at what data means.
Notions
(1) Data: a collection of raw values that, on their own, people can only observe.
Example:
“Hello”, 2, 3.5, null, “3.2 pH”
(2) Meaningful data: a collection of values that belongs to a context and “talks” to the audience when presented. Unlike plain data, meaningful data prompts questions, curiosity, and pattern-finding thoughts.
Example:
[context] [collection of values] → [Worktime] [10 a.m., 11 a.m., …]
[context] [collection of values] → [Machine state] [“working”, “stopped”, “paused”, …]
[context] [collection of values] → [Start time] [“12:00”, “15:00”, “19:00”, …]
[context] [collection of values] → [End time] [“15:00”, “17:00”, “14:00”, …]
Extra question: what would be an appropriate chart to visualize the given example?
(3) Data source table: a table composed of rows and columns that holds both data and context; meaningful data is obtained by combining the two. Our data source table may be fed by different connections or “sources”, such as JSON files or databases, which receive data from the actors we define in (7).
Example:
Figure 4. A table containing a collection of data associated with context.
Figure 4 shows four collections of data, 1, 2, 3, and 4, associated respectively with the “TagName”, “Building”, “Product”, and “UnitPrice” contexts.
When both context and data are correctly provided, we can intuitively assume that this table represents information gathered from a production factory.
(4) Row: a horizontal line composed of a collection of values, each associated with the context given by its column. See Figure 4.
(5) Column: a vertical line composed of a collection of values associated with a single context, usually indicated in the header. See Figure 4.
(6) Data field: a cell in the table containing one data value. See Figure 4.
(7) Actor: a data provider. In industry, our actors are usually sensors and software.
(8) Element: an entity and one of the fundamental building blocks of data visualization. We also refer to it as a “datum”. It is represented as a row in our table in Figure 4. The values given in elements are transformed and later used for visual encoding.
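To make these notions concrete, here is a minimal Python sketch of a data source table. The column names follow the contexts named for Figure 4; the rows, their values, and the helper names `column` and `data_field` are hypothetical and only serve the illustration.

```python
# A data source table as a list of rows; each row is one element (datum).
# Keys are the contexts named for Figure 4; all values are invented.
table = [
    {"TagName": "Tag-1", "Building": "Building 1", "Product": "Product A", "UnitPrice": 12.5},
    {"TagName": "Tag-2", "Building": "Building 1", "Product": "Product B", "UnitPrice": 8.0},
    {"TagName": "Tag-3", "Building": "Building 2", "Product": "Product C", "UnitPrice": 21.0},
]


def column(rows, context):
    """Return the collection of values associated with one context."""
    return [row[context] for row in rows]


def data_field(rows, row_index, context):
    """Return the single value stored in one cell of the table."""
    return rows[row_index][context]


element = table[0]                         # a row, i.e. one element
products = column(table, "Product")        # a column
price = data_field(table, 1, "UnitPrice")  # a data field

print(element)
print(products)
print(price)
```

A list of dictionaries is only one possible representation; it is chosen here because each dictionary keeps the value and its context side by side, which is what turns data into meaningful data.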
Now that we have a common language, let’s discuss data types.
Data types
According to “Data Types, Graphical Marks, and Visual Encoding Channels” by Jeffrey [2], data values can represent different forms of measurement.
Our problem is to identify these forms, in other words, the data types that help us define our “comparison types”.
In this crucial section, we shall clarify data types, explain that data can belong to multiple data types, and provide a few examples of the data types we use in industrial use cases.
Why types, why categories?
The oldest reference explaining the need to categorize data dates back to 1946. In “On the Theory of Scales of Measurement” [3], S.S. Stevens states that the British Association for the Advancement of Science debated the problem of measurement [...] and reported upon the possibility of “quantitative estimates of sensory events”, meaning simply: is it possible to measure human sensation?
In industry, we categorize data into different types to show relationships, comparisons, deviations, proportions, and distributions when building widgets.
Identifying a data type
Nominal data
Nominal data is used to categorize data and compare the equality of values.
Figure 5. Example of categorical data
Product A, Product B, and Product C are values representing categorical data. We can only compare if Product X is equal to Product Y or different.
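As a small illustration of what nominal data allows, the Python sketch below only tests values for equality and counts how often each category appears. The list of products and the helper name `same_category` are hypothetical.

```python
from collections import Counter

# Hypothetical production records; only the product category matters here.
products = ["Product A", "Product B", "Product A", "Product C"]


def same_category(x, y):
    """The only meaningful comparison for nominal values: equal or different."""
    return x == y


# Counting per category relies on equality alone, never on order or distance.
counts = Counter(products)

print(same_category(products[0], products[2]))
print(same_category(products[0], products[1]))
print(counts)
```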
Ordinal data
Ordinal data contains values that we can compare according to a specific order.
Figure 6 Example of ordinal data using number-based values
Figure 7 Example of ordinal data using text-based values
In these examples, we can compare values by asking “Is the year 2000 greater than the year 1999?” or “Is the pressure high, and thus greater than a threshold value?”
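The ordering comparisons described above can be sketched in Python as follows. The pressure levels, their order, and the helper names `rank` and `is_greater` are assumptions made for the example; a real project would define its own levels.

```python
# Hypothetical ordered levels, listed from lowest to highest.
PRESSURE_LEVELS = ["low", "medium", "high"]


def rank(level):
    """Position of a text-based level in the agreed order."""
    return PRESSURE_LEVELS.index(level)


def is_greater(a, b):
    """True when level a comes after level b in the agreed order."""
    return rank(a) > rank(b)


readings = ["high", "low", "medium", "low"]
ordered = sorted(readings, key=rank)  # sort by rank, not alphabetically

print(is_greater("high", "medium"))
print(ordered)
print(2000 > 1999)  # number-based ordinal values compare directly
```

The explicit rank matters for text-based values: sorting the strings directly would order them alphabetically and place "high" before "low".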
Quantitative data
Quantitative data contains values on which we can perform arithmetic, such as finding distances (differences) or proportions.
Figure 8 Example of quantitative data
We can see that Year is the same example used to illustrate ordinal data previously. We will clarify this point in the upcoming section “The problem of measurement”.
Meanwhile, in this example, we can ask questions such as “How many years have passed between 1997 and 2000?” or “What is the proportion between 1999 and 1998?”
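A short Python sketch of these two kinds of questions follows. The difference is computed on the years themselves; for the proportion, the sketch assumes a measured quantity attached to each year (a hypothetical energy consumption in kWh, with invented figures), since that is where a ratio is easiest to interpret.

```python
# Years treated as quantitative values: the distance between them is meaningful.
years_passed = 2000 - 1997

# Hypothetical yearly energy consumption in kWh; the figures are invented.
consumption = {1998: 400.0, 1999: 500.0}

# Proportion and difference between the values measured in 1999 and 1998.
proportion = consumption[1999] / consumption[1998]
difference = consumption[1999] - consumption[1998]

print(years_passed)
print(proportion)
print(difference)
```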
Temporal data
Temporal data is used to represent intervals or specific points in time during which our data fields are "valid". In other words, we use temporal data to represent occurring events, actions, or
It is one of our main data types in industry, as time is crucial for factory operations.
Figure 9 Example of temporal data
Date-times can be standardized or not. One might use the ISO date-time format or just date strings such as "Saturday, January 22, 2022".
Yet it is recommended to use formatted date-times. Below is a list of formatted date-time examples:
Year: 2021, 2022, ...
Quarter: Quarter 1, Quarter 2, Quarter 3, Quarter 4
Quarter Year: Quarter 1 2022, Quarter 2 2021, Quarter 3 2020, Quarter 4 2022
Month: April, May, June, ...
Month Year: April 2022, May 2022, June 2021, ...
Week: 28, 29 (week of the year)
Day: 14, 15, ..., 31 (day of the month)
Day of the week: Tuesday, Wednesday
Day of the year: 194, 195, ...
Hour: 9, 16
Minute: 30, 59
Second: 20, 45, 59
Date: 07/14/2020, 07/13/2022, ... (MM/DD/YYYY)
Datetime: 07/14/2020 12:00:00 AM, 07/13/2022 02:30:00 PM, ... (MM/DD/YYYY HH:MM:SS AM/PM)
Time: 12:00:00
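Most of the formats listed above can be derived from a single date-time value. The sketch below uses Python's standard `datetime` module; the chosen moment and the labels in the `formats` dictionary are assumptions made for the illustration.

```python
from datetime import datetime

# One hypothetical moment, rendered in several of the formats listed above.
moment = datetime(2022, 7, 13, 14, 30, 0)

formats = {
    "Year": moment.strftime("%Y"),
    "Quarter": f"Quarter {(moment.month - 1) // 3 + 1}",
    "Month Year": moment.strftime("%B %Y"),
    "Week": moment.isocalendar()[1],                # ISO week of the year
    "Day of the week": moment.strftime("%A"),
    "Day of the year": moment.timetuple().tm_yday,
    "Date": moment.strftime("%m/%d/%Y"),            # MM/DD/YYYY
    "Datetime": moment.strftime("%m/%d/%Y %I:%M:%S %p"),
    "Time": moment.strftime("%H:%M:%S"),
    "ISO 8601": moment.isoformat(),
}

for label, value in formats.items():
    print(f"{label}: {value}")
```

Month names, day names, and the AM/PM marker produced by `strftime` depend on the active locale, which is one reason a standardized format such as ISO 8601 is usually the safer choice for storage and exchange.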
The problem of measurement
According to S.S. Stevens, paraphrasing N.R. Campbell in “On the Theory of Scales of Measurement” [3], “measurement, in the broadest sense, is defined as the assignment of numerals to objects or events according to rules.”
So if we stick to this general definition, we need to be cautious when identifying our data types as they are open to interpretation and rules defined by individuals.
As we saw earlier in “Identifying a data type”, Year can be an ordinal data type as well as a quantitative data type.
To determine its type, we would need to look at the rest of our data sources and the information we would like to pass to our audience.
Application
Let's illustrate our types with examples.
Nominal data example:
Figure 10 Example of nominal data
Figure 10 is a Sankey chart taken from our Sankey article. Our nominal data type in this example is the energy sources, such as "Natural gas", "Power supply", "Heat", and so on. One way to illustrate nominal data is to visually encode it using shapes, such as the rectangles in this particular example.
Ordinal data example:
Figure 11 Example of ordinal data
Figure 11 shows a stacked bar chart of the energy consumption of different products across years. Our Year context is an ordinal data type, visually encoded by position to give a sense of rank or order. Indeed, we can see this order if we follow the years positioned along our X axis from right to left.
Quantitative data example:
Show years in a line chart to compare the hour differences between years.
Figure 12 Example of quantitative data
The example in Figure 12 demonstrates a way to display quantitative data types.
Here we want to find our maximum energy consumption over three years. We use color value (a gradient) so that a "darker" color, which draws more attention, represents a more important value than a lighter one. We also use size to convey a sense of "greater" or "lesser" quantity.
Temporal data example:
Figure 13 Example of temporal data
This example is taken from one of our chart articles, dedicated to Gantt charts.
Our temporal data is a date-time formatted as Day HH:MM.
We visualize it using a Gantt chart, which is powerful for presenting events, timelines, and schedules.
We can see in Figure 13 that size and position are used to illustrate time. The length of each rectangle shows the length of an interval, and its position shows when it occurred on our time axis.
Conclusion
We can conclude that knowing our data type helps us identify which type of visual encoding and attribute we might choose when designing our dashboards and reports. Yet it is important to keep in mind that data types aren’t mutually exclusive and that data may belong to different types based on the given context and desired outcome.
Reference
[1] Huylebrouck, D. (2019). Africa and Mathematics: From Colonial Findings Back to the Ishango Rods.
[2] observablehq.com. (2019). Data Types, Graphical Marks, and Visual Encoding Channels. [online] Available at: https://observablehq.com/@uwdata/data-types-graphical-marks-and-visual-encoding-channels
[3] Stevens, S.S. (1946). On the Theory of Scales of Measurement. Science, [online] 103(2684), pp.677–680. Available at: http://psychology.okstate.edu/faculty/jgrice/psyc3214/Stevens_FourScales_1946.pdf [Accessed 14 Nov. 2019].