A data type classifies the kind of value a variable holds. It also determines which logical, mathematical or relational operations can be performed on that value, and it acts as an attribute that tells the computer how to interpret the value.
Data types are fundamental in most, if not all, programming languages. Assigning a data type to a value helps ensure that the value is handled correctly and without errors.
One of the best-known and most intuitive data types is the integer: a whole number, with or without a minus sign, e.g. -12, 12 or 122.
In a practical sense, a solid understanding of data types is highly useful when developing a tracking plan in a real-life data scenario, e.g. tracking customer data. Each property will need an assigned data type.
Common Data Types
Here is a brief overview of the main fundamental data types.
These vary between programming languages. For example, Java has both primitive and non-primitive data types, whereas Python subdivides its numeric and sequence types.
Whilst data types differ between programming languages, the underlying concepts are largely the same. Common equivalents include arrays in JavaScript vs lists in Python, and HashMaps in Java vs dictionaries in Python vs objects in JavaScript.
Integers (int)
Integers are a familiar form of numerical data, but crucially, they do not have fractional components. They can, however, be positive, negative or zero.
Examples: 911, 0, -192, 4981, etc.
Floating-Point Numbers (float)
Another numeric data type, but this time for numbers that do have a fractional component, so they are written with a decimal point. Monetary values are a common example, although many systems store money as a fixed-point decimal or as whole cents to avoid floating-point rounding errors.
Examples: 1.11, 0.9, -0.25, 298.09
Numbers often use both int and float data types.
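A short Python sketch of how int and float behave in practice, including the rounding issue mentioned for monetary values. The variable names are illustrative only; `Decimal` is from Python's standard library and is one common way (not the only one) to keep exact decimal amounts.

```python
from decimal import Decimal

# Integers: whole numbers, positive, negative or zero
order_count = 911
balance_change = -192
print(type(order_count).__name__)  # int

# Floats: numbers with a fractional part
discount_rate = 0.25
print(type(discount_rate).__name__)  # float

# Mixing int and float in arithmetic produces a float
total = order_count * discount_rate
print(total, type(total).__name__)  # 227.75 float

# Binary floats cannot represent some decimal fractions exactly
print(0.1 + 0.2 == 0.3)  # False

# Decimal keeps exact decimal arithmetic, which suits money
price = Decimal("0.10") + Decimal("0.20")
print(price == Decimal("0.30"))  # True
```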
Character (char)
A character is a single letter, digit, symbol, blank space or punctuation mark. This sentence is built from characters, and characters in sequence form a string.
Examples: *, %, @, £, 8, B, \
String (str or text)
Strings are sequences of characters, i.e. text. Strings can include any combination of digits, symbols, spaces, punctuation marks and so on.
Some values can be stored either as a string, e.g. the phone number ‘+39 090110 011’, or as an integer, e.g. 39090110011.
Example: Strings are sequences of characters.
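In Python there is no separate char type: a single character is simply a string of length 1. The sketch below shows that, plus the phone-number example stored both ways. Note that the integer form drops the ‘+’ and the spacing (and would drop any leading zeros, since `int("0123") == 123`), which is why phone numbers are usually kept as strings.

```python
# Python has no separate char type: a "character" is a str of length 1
symbol = "@"
print(type(symbol).__name__, len(symbol))  # str 1

# A string is a sequence of characters, so it can be indexed and measured
sentence = "Strings are sequences of characters."
print(sentence[0], len(sentence))  # S 36

# The same phone number stored as a string and as an integer
phone_text = "+39 090110 011"
phone_int = int(phone_text.replace("+", "").replace(" ", ""))
print(phone_int)  # 39090110011
```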
Boolean (bool)
The Boolean data type is a logical data type that represents values ‘true’ or ‘false’ only. These may be indicated as 0 (false) and 1 (true).
Example: Does the customer order pizzas? > true/1, does the customer have pineapple on their pizza? > false/0.
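The pizza answers as Python Booleans. In Python, `bool` is a subclass of `int`, which is why true/false map so directly onto 1/0; the list of answers below is made up for illustration.

```python
# Answers from the pizza example, stored as Booleans
orders_pizza = True
wants_pineapple = False

# In Python, bool is a subclass of int: True behaves as 1 and False as 0
print(isinstance(orders_pizza, int))  # True
print(int(orders_pizza), int(wants_pineapple))  # 1 0

# So Booleans can be summed, e.g. to count "true" answers
answers = [True, False, True, True]
print(sum(answers))  # 3
```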
Enumerated type (enum)
The enumerated data type defines a set of predefined, unique values known as elements or enumerators, and a variable of that type holds exactly one of them. So, if ‘impressionist’ and ‘cubist’ are the enumerators, an enumerated-type variable may be either ‘impressionist’ or ‘cubist’, but not both. The Boolean type behaves like an enumerated type with two values (true or false, but not both).
Example: The days of the week, or compass directions.
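A minimal sketch of the compass-direction example using Python's standard `enum` module. The class name `Direction` and the one-letter values are illustrative choices.

```python
from enum import Enum

class Direction(Enum):
    """Compass directions: a variable of this type holds exactly one member."""
    NORTH = "N"
    EAST = "E"
    SOUTH = "S"
    WEST = "W"

heading = Direction.NORTH
print(heading.name, heading.value)  # NORTH N

# Members can also be looked up by value
print(Direction("S"))  # Direction.SOUTH

# Values outside the predefined set are rejected
try:
    Direction("X")
except ValueError as err:
    print("rejected:", err)
```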
Array
An array is an ordered list of elements, usually all of the same type. Because it holds multiple values rather than one, an array is often treated as a data structure rather than a simple data type.
An array for our previous example of artistic genres might contain the elements ‘cubist’, ‘impressionist’ and ‘renaissance’. Each element is identified by its index (its position in the array), which in most languages starts at 0, so the indices here are 0, 1 and 2. This array contains 3 elements.
If a form asked for your favourite artistic genre and you picked two, or even all three, your choices would be saved together in an array.
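The genre example as a Python list, which plays the role of an array in everyday Python code (the standard `array` module and libraries such as NumPy offer stricter, single-type arrays). The multi-select answer is a made-up illustration.

```python
# An ordered list of genres; each element is reached by its index
genres = ["cubist", "impressionist", "renaissance"]
print(genres[0])    # cubist
print(len(genres))  # 3

# enumerate() pairs each index with its element
for index, genre in enumerate(genres):
    print(index, genre)

# A multi-select answer where the user picked two genres
favourite_genres = ["cubist", "renaissance"]
print("impressionist" in favourite_genres)  # False
```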
Date
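Dates are commonly stored using the ISO 8601 format, YYYY-MM-DD.
Time
The format for time is typically HH:MM:SS. When a time value represents a duration rather than a time of day, some systems allow it to exceed 24 hours, e.g. 90:00:00 for 90 hours.
Dates and times can also be stored as a single combined value, e.g. 2024-03-15T14:30:00 in ISO 8601.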
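A sketch of these formats with Python's standard `datetime` module. Python's `time` type only covers a single day, so a 90-hour span is represented as a `timedelta` (a duration). The specific dates are arbitrary examples.

```python
from datetime import date, datetime, timedelta

# Parse and format an ISO 8601 date (YYYY-MM-DD)
d = date.fromisoformat("2024-03-15")
print(d.year, d.month, d.day)  # 2024 3 15
print(d.isoformat())           # 2024-03-15

# A duration longer than a day: 90 hours
duration = timedelta(hours=90)
print(duration)                         # 3 days, 18:00:00
print(duration.total_seconds() / 3600)  # 90.0

# Date and time combined in one value
moment = datetime.fromisoformat("2024-03-15T14:30:00")
print(moment.date(), moment.time())  # 2024-03-15 14:30:00
```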
Timestamp
Timestamps take many formats. One of the most important is Unix time, which counts from the ‘Unix epoch’: 00:00:00 UTC on 1 January 1970. A Unix timestamp measures the seconds elapsed since that moment. Because the epoch is fixed in UTC, a given timestamp is unaffected by time zones and identifies the same instant wherever you are in the world.
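A small Python sketch of Unix timestamps using only the standard library. It shows that the epoch is timestamp 0 and that converting a timestamp into another time zone changes how the instant is displayed, not the timestamp itself. The UTC+9 offset is just an example zone.

```python
from datetime import datetime, timedelta, timezone

# The Unix epoch is timestamp 0
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
print(epoch.timestamp())  # 0.0

# One day after the epoch is 86,400 seconds
print(datetime(1970, 1, 2, tzinfo=timezone.utc).timestamp())  # 86400.0

# Convert a timestamp to a UTC date-time
ts = 1_000_000_000
utc_time = datetime.fromtimestamp(ts, tz=timezone.utc)
print(utc_time.isoformat())  # 2001-09-09T01:46:40+00:00

# The same instant shown in UTC+9: the local display differs, the timestamp does not
tokyo = utc_time.astimezone(timezone(timedelta(hours=9)))
print(tokyo.isoformat())  # 2001-09-09T10:46:40+09:00
print(tokyo.timestamp() == ts)  # True
```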
We use many of these different data types in our daily lives, often without realising. For example, this post is one large string of characters, with some integers used along the way. Online forms might ask us to provide Boolean data, e.g. true or false answers. We also regularly type in dates, e.g. our date of birth.
Dictionary Data Type (Python)
In Python, the dictionary data type works like a map: it stores values under keys in a hash table, which makes looking up a value by its key fast. Unlike the single-value data types above, each entry in a dictionary is a key:value pair. Since Python 3.7, dictionaries also preserve the order in which keys were inserted.
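A minimal dictionary example; the customer fields are made up for illustration.

```python
# Key:value pairs; keys must be hashable (e.g. strings, numbers)
customer = {"name": "Ada", "age": 36, "is_subscribed": True}

# Look-up by key is fast on average (hash table)
print(customer["name"])  # Ada

# .get() returns a default instead of raising KeyError for a missing key
print(customer.get("email", "unknown"))  # unknown

# Adding a key; since Python 3.7, insertion order is preserved
customer["email"] = "ada@example.com"
print(list(customer))  # ['name', 'age', 'is_subscribed', 'email']
```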
Advanced Data Types
The following are two advanced data types:
- Schema models: models that describe the structure of data, such as a collection of database objects, and the data type of each field.
- Object-relational mappers (ORMs): code that automates the transfer of data between relational database tables and objects, helping you map code objects to your database.
Schema Models
Schema models can be used in conjunction with Pydantic, a Python library for data parsing and validation.
Pydantic models let you check your data against a model that you defined previously. This catches type errors during parsing and validation and gives immediate feedback. You can also combine multiple models into one, e.g. by nesting one model inside another.
You can also inherit from other models (e.g. User vs Customer): the Customer model has all the same fields as the User model, plus some extra ones. This saves time while keeping schema validation consistent.
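A minimal sketch of the User/Customer inheritance idea with Pydantic, assuming Pydantic is installed (`pip install pydantic`). The field names `id`, `email` and `loyalty_points` are hypothetical. In Pydantic's default mode, a numeric string such as "42" is coerced to the declared `int`, while a non-numeric string raises `ValidationError`.

```python
from pydantic import BaseModel, ValidationError

class User(BaseModel):
    # Hypothetical fields for illustration
    id: int
    email: str

class Customer(User):
    # Inherits id and email from User and adds one extra field
    loyalty_points: int = 0

# The string "42" is coerced to the declared int type
customer = Customer(id="42", email="ada@example.com", loyalty_points=10)
print(customer.id, type(customer.id).__name__)  # 42 int

# Data that cannot be converted to the declared type is rejected
try:
    Customer(id="not-a-number", email="ada@example.com")
except ValidationError as err:
    print("validation failed:", len(err.errors()), "error(s)")
```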
Object Relational Mappers (ORM)
ORMs help convert data between incompatible type systems, typically between an object-oriented programming language and a relational database. They are essentially software or mapping layers that map code objects to a database.
For example, rather than writing raw SQL, you can define Python models with SQLAlchemy and let SQLAlchemy translate your model operations into SQL code.
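To show the core idea without any third-party dependency, here is a deliberately tiny, hand-written mapper built on the standard `sqlite3` and `dataclasses` modules. It is not SQLAlchemy and not how SQLAlchemy works internally; `create_table_sql`, `insert` and `select_all` are hypothetical helpers that only illustrate translating a Python model into SQL and rows back into objects.

```python
import sqlite3
from dataclasses import dataclass, fields, astuple

# Hypothetical model class: one instance corresponds to one table row
@dataclass
class Customer:
    id: int
    name: str

# Map Python types to SQLite column types
SQL_TYPES = {int: "INTEGER", str: "TEXT", float: "REAL"}

def create_table_sql(model) -> str:
    """Generate CREATE TABLE SQL from a dataclass's fields."""
    cols = ", ".join(f"{f.name} {SQL_TYPES[f.type]}" for f in fields(model))
    return f"CREATE TABLE {model.__name__.lower()} ({cols})"

def insert(conn, obj) -> None:
    """Translate an object into a parameterised INSERT statement."""
    table = type(obj).__name__.lower()
    placeholders = ", ".join("?" for _ in fields(obj))
    conn.execute(f"INSERT INTO {table} VALUES ({placeholders})", astuple(obj))

def select_all(conn, model) -> list:
    """Translate table rows back into model objects."""
    rows = conn.execute(f"SELECT * FROM {model.__name__.lower()}")
    return [model(*row) for row in rows]

conn = sqlite3.connect(":memory:")
print(create_table_sql(Customer))  # CREATE TABLE customer (id INTEGER, name TEXT)
conn.execute(create_table_sql(Customer))
insert(conn, Customer(1, "Ada"))
print(select_all(conn, Customer))  # [Customer(id=1, name='Ada')]
```

Table names here come from trusted class names; a real ORM also handles relationships, sessions, migrations and many more types.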
Moreover, ORMs and related libraries assist with serialisation and deserialisation, helping define what the underlying data should look like at both ends. An example is deserialising a JSON document into a Python object, which decodes JSON-formatted data into Python-native data types in the process.
In Python, the standard json module decodes a JSON object into a dictionary (described above) and a JSON array into a list.
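A short example with Python's standard `json` module showing how each JSON value maps to a Python-native type. The payload is made up for illustration.

```python
import json

payload = '{"id": 42, "name": "Ada", "is_subscribed": true, "tags": ["vip", "beta"], "email": null}'

# json.loads decodes JSON into Python-native types
data = json.loads(payload)
print(type(data).__name__)                   # dict
print(type(data["is_subscribed"]).__name__)  # bool
print(type(data["tags"]).__name__)           # list
print(data["email"])                         # None

# json.dumps serialises Python objects back into a JSON string
print(json.dumps({"id": 42, "active": False}))  # {"id": 42, "active": false}
```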
The Importance of Data Types
Data types are fundamental in most programming languages, though languages differ in how they handle them. Languages are often described as ‘strongly typed’ or ‘weakly typed’.
Strongly typed languages do not allow values to be used in ways that conflict with their data types; converting between types must be done explicitly rather than implicitly, which reduces the chance of errors. Python is a strongly typed language.
In contrast, weakly typed languages allow implicit conversions between unrelated data types. JavaScript is a weakly typed language: "1" + 1 evaluates to the string "11".
Strong versus weak typing is a separate question from static versus dynamic typing. In a statically typed language such as Java, each variable's type is fixed and checked before the program runs; in a dynamically typed language such as Python, types belong to values and are checked at run time. Because ‘strong’ and ‘weak’ have no single agreed definition, many consider the distinction somewhat moot.
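In Python, strong typing shows up as a `TypeError` when a string and an integer are combined without an explicit conversion. The sketch also shows dynamic typing: the same name can later refer to a value of a different type. Variable names are illustrative.

```python
# Python refuses to combine a str and an int implicitly
try:
    result = "1" + 1
except TypeError as err:
    print("TypeError:", err)

# The conversion has to be written out explicitly
print("1" + str(1))  # 11
print(int("1") + 1)  # 2

# Python is also dynamically typed: a name can be rebound to a value of another type
value = 1
value = "one"
print(type(value).__name__)  # str
```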
Moreover, depending on the application and system you're building, choosing appropriate data types and structures, and refining them as the project evolves, helps you model the problem you're trying to solve more accurately.
Despite differences between programming languages, having a solid fundamental understanding of data types helps you to know what is possible in any given programming language.
Data Instrumentation
Instrumenting data means tracking data and then routing it to any number of tools. When you do this, it’s vital to create a tracking plan complete with events and properties. For commercial or business data tracking, customer data is the main type of data you’ll need to track and instrument, and event data and entity data are its two main types.
When deciding which events to track, it’s vital to define the data type of each property. This helps limit errors and assists in cleaning up dirty data: for example, dates may arrive in many different formats and should be unified into one single format and specified as a date data type.
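A minimal sketch of a typed tracking plan in Python. The event name `order_completed`, its properties, the accepted date formats and the helpers `normalise_date` and `validate` are all hypothetical; real tracking tools define plans in their own formats. Note that ambiguous inputs such as 03/04/2024 need an agreed convention (day-first is assumed here).

```python
from datetime import date, datetime

# Hypothetical tracking plan: each event property has one expected data type
TRACKING_PLAN = {
    "order_completed": {
        "order_id": str,
        "total": float,
        "item_count": int,
        "is_first_order": bool,
        "order_date": date,
    }
}

# Input formats assumed to appear in dirty data (day-first for slashes)
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%B %d, %Y")

def normalise_date(raw: str) -> date:
    """Parse a date written in any known format into a single date type."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {raw!r}")

def validate(event: str, properties: dict) -> list:
    """Return a list of type errors for an event checked against the tracking plan."""
    errors = []
    for name, expected in TRACKING_PLAN[event].items():
        value = properties.get(name)
        # bool is a subclass of int in Python, so reject it explicitly for int fields
        if expected is int and isinstance(value, bool):
            errors.append(f"{name}: expected int, got bool")
        elif not isinstance(value, expected):
            errors.append(f"{name}: expected {expected.__name__}, got {type(value).__name__}")
    return errors

event = {
    "order_id": "A-1001",
    "total": 24.5,
    "item_count": "3",  # wrong type: a string instead of an int
    "is_first_order": True,
    "order_date": normalise_date("15/03/2024"),
}
print(validate("order_completed", event))  # ['item_count: expected int, got str']
```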
Being conscious of data types is a good habit in all manner of data tasks and processes. For example, if you are conducting customer surveys, you’ll be working with many of the different aforementioned data types.
- Answers to open-ended questions are strings. Strings are hard to analyse automatically without parsing the responses in some way and aggregating them against preset rules (e.g. mentioning specific keywords, like ‘bad’).
- Predefined choices (e.g. dropdowns or checkboxes) are enumerated data (enum); because checkboxes allow several selections, their answers are also arrays.
- A single yes/no checkbox or a two-option radio button can also be stored as a Boolean.
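One way to model a survey response with the types listed above, as a sketch only. The class names, fields and the keyword set `NEGATIVE_KEYWORDS` are hypothetical; the keyword check is a deliberately simple version of the preset-rule idea for free-text answers.

```python
from dataclasses import dataclass, field
from enum import Enum

class Satisfaction(Enum):
    # Dropdown / radio options: exactly one predefined value
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

# Hypothetical keyword rule for open-ended answers
NEGATIVE_KEYWORDS = {"bad", "slow", "broken"}

@dataclass
class SurveyResponse:
    comment: str                                        # open-ended answer (string)
    satisfaction: Satisfaction                          # single choice (enum)
    features_used: list = field(default_factory=list)   # checkboxes (array)
    would_recommend: bool = False                       # yes/no (Boolean)

    def mentions_negative(self) -> bool:
        """Flag the free-text comment if it contains any negative keyword."""
        words = {w.strip(".,!?").lower() for w in self.comment.split()}
        return bool(words & NEGATIVE_KEYWORDS)

response = SurveyResponse(
    comment="Delivery was slow, but the pizza was great!",
    satisfaction=Satisfaction("medium"),
    features_used=["tracking", "reorder"],
    would_recommend=True,
)
print(response.satisfaction.name)    # MEDIUM
print(response.mentions_negative())  # True
```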
Summary: What are Data Types and Why Are They Important
Data types are fundamental, although in today’s low-code and no-code world their importance outside of coding and programming has diminished somewhat.
Nevertheless, creating a robust data plan, or tracking plan, for customer data does require a rudimentary knowledge of how data works. Data types are ubiquitous across all programming languages, though the ways in which programming languages handle them do differ.