Data Types for Stats (and MATLAB categorical arrays)
In statistics, there are two broad types of data: quantitative and qualitative.
- Quantitative. Also known as numeric or "number" data, these values count or measure something. In MATLAB they are stored in numeric classes such as double or uint8.
- Qualitative. Also known as categorical or "word" data, these values describe something with a label, like Male or Female, or Good or Bad. They take the form of Booleans (true or false) or text, and are stored in MATLAB classes such as logical, string, or categorical arrays (see below).
Each type can be broken down further. Numeric data can be continuous (like a measurement) or discrete (like a count or a rating). Categorical data can be nominal, with no order (like sex), or ordinal, with an order (like a skill ranking: Beginner, Intermediate, or Advanced).
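As a quick illustration of which MATLAB class fits each kind of data, here is a minimal sketch. All variable names and values are hypothetical, chosen only to show one example per data type:

```matlab
% Hypothetical example values, one per data type described above
height      = 172.5;        % continuous numeric  -> double
numChildren = uint8(3);     % discrete numeric (a count) -> uint8
isSmoker    = true;         % Boolean qualitative -> logical
eyeColor    = "Brown";      % nominal qualitative -> string

% Ordinal qualitative -> categorical with ordered categories
level = categorical({'Beginner'}, {'Beginner','Intermediate','Advanced'}, 'Ordinal', true);

% class reports the MATLAB class of each variable
class(height)        % 'double'
class(numChildren)   % 'uint8'
class(isSmoker)      % 'logical'
class(eyeColor)      % 'string'
class(level)         % 'categorical'
```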
Different MATLAB variable classes are used to store these different types of data:
graph TD
A[DATA] --> B[Numbers];
A[DATA] --> C[Categories];
B --> D[Discrete];
B --> E[Continuous];
C --> F[Nominal];
C --> G[Ordinal];
E --> H(Decimals)
E --> I(pi, height, weight)
H --> R(double, single)
D --> L(Whole Numbers)
D --> k("Counts, Ratings,
Images")
L --> Q(uint8, uint16)
F --> M(Male, Female)
F --> N(No Order)
G --> O(Ordered)
G --> P("Beginner,
Intermediate,
Advanced")
M --> S("logical,
string, cell,
categorical")
O --> T(categorical)
Categorical Arrays
To handle qualitative data, MATLAB provides the categorical data type. Categorical arrays behave much like string arrays, but they come with built-in functions for statistical work.
Consider the following string array.
Create String array
We can easily convert this string array into a categorical array using the function categorical.
…Here we just overwrite the string array with its categorical version
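A minimal sketch of this conversion, assuming a hypothetical five-element string array named sex (the original example values are not shown):

```matlab
% Hypothetical string array (values are illustrative only)
sex = ["Male" "Female" "Female" "Male" "Female"];

% Overwrite the string array with its categorical version
sex = categorical(sex);

class(sex)   % 'categorical'
```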
And that's it: sex is now a categorical array. You can do many of the same things with a categorical array that you can do with a string array.
Create logical array from a categorical array
…Here we create a logical array from the relational operation "sex is equal to Male".
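A sketch of that comparison, again assuming hypothetical values for sex. Comparing a categorical array with a character vector such as 'Male' returns a logical array of the same size:

```matlab
% Hypothetical categorical array
sex = categorical(["Male" "Female" "Female" "Male" "Female"]);

% Relational operation: true wherever sex equals Male
isMale = sex == 'Male';   % logical [1 0 0 1 0]

% The logical array can then be used to count or index
numMale = nnz(isMale);    % 2
```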
Categorical arrays also come with many built-in functions designed to make data analysis easier. The function categories returns the categories (or group names) in a categorical array.
Get Categories
…There are two categories: Female and Male (listed alphabetically, because categorical inferred them from the data).
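A short sketch with hypothetical data. Note that categories returns a column cell array of character vectors, and that when no category list is given, categorical uses the sorted unique values of the input:

```matlab
% Hypothetical categorical array
sex = categorical(["Male" "Female" "Female" "Male" "Female"]);

% Column cell array of category names
cats = categories(sex)   % {'Female'; 'Male'}
```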
You can also use the function summary to report the count of each category in the array.
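A minimal sketch, assuming the same hypothetical data. summary prints the counts to the Command Window; countcats is shown alongside it because it returns the same counts as a numeric array (in the order given by categories) that you can use in further calculations:

```matlab
% Hypothetical categorical array
sex = categorical(["Male" "Female" "Female" "Male" "Female"]);

% Print the count of each category
summary(sex)

% Same counts as numbers: Female = 3, Male = 2
counts = countcats(sex);
```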
Creating Ordinal Data
If you have ordinal data, you still use the categorical function, but with a couple of additional inputs.
Create Ordinal Categorical Array
…Here, the second input to categorical is the list of category names in the order that you want. The 'Ordinal' name-value argument, set to true, makes the categories ordered.
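A minimal sketch of this call, assuming a hypothetical string array of skill levels named level (the original values are not shown). isordinal confirms that the result is ordinal:

```matlab
% Hypothetical skill levels (values are illustrative only)
level = ["Advanced" "Beginner" "Intermediate" "Beginner" "Advanced"];

% Second input: category names in the desired order
% 'Ordinal', true: treat those categories as ordered
level = categorical(level, ["Beginner" "Intermediate" "Advanced"], 'Ordinal', true);

isordinal(level)     % logical 1 (true)
categories(level)    % {'Beginner'; 'Intermediate'; 'Advanced'}
```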
The result is an ordinal categorical array that looks very similar to an ordinary categorical array.
One visible difference is that when you call a function like summary
…
…the results are reported in the order of the categories you specified (not in alphabetical order).
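A sketch of the ordering in action, using the same hypothetical levels. Because the array is ordinal, it also supports relational comparisons such as >= and functions such as max, which use the category order rather than alphabetical order:

```matlab
% Hypothetical ordinal array
level = categorical(["Advanced" "Beginner" "Intermediate" "Beginner" "Advanced"], ...
    ["Beginner" "Intermediate" "Advanced"], 'Ordinal', true);

% Counts are listed Beginner, Intermediate, Advanced
summary(level)
counts = countcats(level);                      % 2, 1, 2

% Comparisons follow the category order
atLeastIntermediate = level >= 'Intermediate';  % logical [1 0 1 0 1]
highest = max(level);                           % Advanced
```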
Transforming Numeric Data into Qualitative Data
Sometimes the raw data comes in as numeric, when what you actually want is categorical.
Consider the following numeric array:
These numbers actually represent three different categories:
- Terrible
- Meh
- Awesome
To replace the numbers with the category labels, pass the following inputs to categorical:
…Notice that the second input lists the rating categories as numbers, while the third input lists the same categories as labels. The 'Ordinal' name-value argument, set to true, turns ordering on.
And you get…
ratings =
1×10 categorical array
Columns 1 through 5
Awesome Awesome Meh Terrible Awesome
Columns 6 through 10
Meh Terrible Terrible Terrible Meh
…a categorical array with all the correct categories included.
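A minimal sketch of the same conversion. The numeric values and the mapping 1 = Terrible, 2 = Meh, 3 = Awesome are assumptions for illustration, not the original data:

```matlab
% Hypothetical numeric ratings (assumes 1 = Terrible, 2 = Meh, 3 = Awesome)
ratings = [3 1 2 2 3 1];

% Second input: numeric codes; third input: matching labels
ratings = categorical(ratings, [1 2 3], {'Terrible', 'Meh', 'Awesome'}, 'Ordinal', true);

% Counts reported in the order Terrible, Meh, Awesome
summary(ratings)
```

Because the labels follow the same order as the numeric codes, the ordinal ordering also matches the numbers, so Terrible < Meh < Awesome.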
And these categories show up in the summary: