What is Customer Data?

James Phoenix

Customer data is composed primarily of data relating to customers, their behaviour and their relationship with businesses both on and offline. Naturally, as customer data is human data, it is intrinsically complex and sensitive.

Lately, the customer data arms race has been somewhat stunted by a new wave of privacy laws and the embrace of pro-privacy and pro-transparency data policy.

The topic of data privacy has intensified as modern big data initiatives seek to collect extremely complex and at times highly personal data on their customers.

There has been a considerable response to this, marked by the meteoric rise of Signal, a privacy-focused instant messaging app, increases in the usage of ad blockers and also the rise in the usage of Tor, a privacy-centric “onion router”.

Signal released a marketing campaign (below; later blocked by Facebook) that exposed what types of data big business collects and uses on a daily basis. This is indicative of how granular entity data has become in marketing and advertising – more on what this means shortly.

The EU’s GDPR legislation and California’s CCPA has changed the statutory environment when it comes to customer data. Namely, there must be a greater focus on explicit permission and transparency – people must be told what data is being collected, why, how it’ll be used, and how they are entitled to access their data or request deletion.

With such complex legislation came a swathe of loopholes that enabled those with the most data clout – such as FAGMA – to possibly elude their statutory responsibilities when it comes to customer data.

This is the backdrop under which we must discuss customer data – it is no longer a free-for-all commodity that can be used at the free will of businesses.

With that said, data remains the centre of modern business strategy and is the lifeblood of successful and thriving corporations.

Customer data is essential to the omnichannel retail paradigm of late – the machinery that pumps customer data through business both great and small is not stopping anytime soon.

Here, we’ll discuss the ins and outs of customer data.

Customer Data: The Basics

Customer data is complex as it mirrors the diversity of humans and their behaviours.

In broad terms, there are two main types of customer data:

User data; or entity data, which describes the main user traits and context
Event/Behavioural/Interaction data; perhaps most accurately described as event data, which describes how a user interactions with a business and its various channels, whether that be a product, service, or something else

Besides these two main categories, customer data may also be third-party customer data. This concerns customer interaction outside of the business or brand’s channels, e.g. their interactions with external media and communication channels or possibly even offline data.

User/Entity Data

User or entity data, in this context, revolves around what is known as personally identifiable information (PII).

Entity data is a broad term, but in this case, the entity is the user. Customer entity data includes names, phone numbers, email addresses and domestic addresses, age, nationality, gender and preferences as indicated via a business’s products or services.

User data can provide a complete identity of somebody, or there may be enough pieces of data that can be linked together. Data such as job details, profession, location, gender, etc, can be used to resolve the identity of an individual. This is called identity resolution and can be either probabilistic or deterministic.

Traditionally non-personally identifiable information (NPII), such as IP addresses, cookies and device data/session IDs/browser fingerprints is probably no longer non-identifiable these days, with digital fingerprinting, in particular, becoming so advanced. This data is used in probabilistic identity resolution too, especially where identifying someone is a matter of security or compliance.

Entity data is displayed quite naturally in tables. The columns of the table represent each property, such as the name, age, etc. These are assigned values unique for each user.

Accounts as Entities

Entities can also refer to accounts as well as users. For example, data that describes an account or group of accounts, e.g. by subscription type, will have to be defined as such.

In some B2B businesses, users can be grouped by organisation, for example. Each organisation that is a customer of a B2B business is its own entity, as well as the users inside of that entity.

Whilst the same subscription service may contain a long table of users, it’s often sensible to group these up into their own sensical groups.

How to Gather Entity Data

In this context, the chief entities under discussion are customers, which may either be logged in some POS device or relational database (in the case of brick-and-mortar stores at least), CRMs or CDPs.

Entity data is collected both directly and indirectly. Indirect data sharing is probably the chief issue of pro-privacy groups, as permission is naturally a grey area when one is not overtly aware of the data they are sharing.

For example, direct user data is gathered when someone enters their data in a subscription form or survey. In this scenario, it’s pretty easy for a business to supply a confirmation form that describes the permission granted by submitting this form.

Contrastingly, indirect data is gathered when a user interacts with a product, thus creating a richer customer profile that they may not know they are creating.

Event Data

Event data, as the name suggests, refers to interactions performed by users. These usually happen during interaction with a product or service.

Some typical examples are app usage – recording how a user interacts with an app. Taps, swipes and clicks are all types of events and can be recorded from apps, websites or other digital products.

Event data is frequently used to measure and analyse various engagements and interactions. One example would be analysing app usage information to optimise and tune UI and UX design – that’s just one example. Here are some more examples of how event data is used:

Website, Software App Interactions: Everything from web visits and hits to page stickiness, user flow, traffic sources, bounce rates, swipes, clicks, scrolling, etc, can be tracked as event data.

Social Media: Everything from views to post shares, replies, etc, can be recorded and tracked as event data.

Advertising: CTRs, CPCs, conversion rates, etc, can all be recorded and analysed as event data.

Customer Service Information: Queries, complaints, customer service tickets, etc, can all be recorded and tracked as event data.

Event data is often called behavioural data. Event data helps animate customer behaviour and is used to explain actions beyond static entities. Event data enables behaviour-based functionality, e.g. automated messages or emails that trigger upon receipt of certain behaviour.

Event data contains 3 key elements:

The action itself, or the event that takes place
The timestamp for the event
Event properties; the state of any other properties associated with the event

A simple example would comprise 3 actions in sequence – Add to Cart, Buy and Payment Completed.

When the event Add to Cart is triggered and recorded, the event properties would also record data such as the customer’s ID.

Each event is issued a time-stamp, which enables businesses to track, say, how long it takes for a customer to progress through the checkout. This data could be used to trigger a ‘cart reminder’ message, or ‘you need to complete your payment’ reminder.

Want to improve your data skills?

See the best data engineering & data science books

Shop Now

How to Gather Event Data

Event data is temporal and time-sensitive and thus is trickier to gather and manage than more static entity data. Gathering event data depends on what events a business wishes to track. Most modern data stacks contain features for event-streaming, such as:

Customer Data Platforms, like Segment and Snowflake
Open-source data platforms like Snowplow
Custom-built options designed for specific business needs

Event streaming tech like Apache Kafka and Google Cloud’s Pub/Sub are also options. These are enterprise-level tools for managing big data streams.

Otherwise, there are a few other ways to collect event data:

Google Analytics, Mixpanel and Other Platforms: Google Analytics is the old timer of modern business abd web analytics and is perhaps the most popular tracking platform around. In GA, it’s possible to setup all manner if events to track actions ranging from the standard link clicks to downloads, video watches, form submission, etc. It’s also possible to measure behaviours such as scroll depth. UTM parameters logged in GA or other platforms enable campaign analysis, allowing businesses to measure events from upstream in the funnel.
Webhooks: Webhooks are useful for tracking asyncronous events using HTTPS to send JSON payloads. Stripe provides a webhook that sends JSON customer data when a customer pays for an item or invoice. This allows businesses to update their persona accounts and records when someone uses Stripe to pay.
IoT sensors: IoT sensors are a rich resource of chronological events and determine the state of both users and devices. For example, fitness wearables can send event data about work outs, thus providing insight into workout type and sport (e.g. running or cycling) as well as their fitness level. This data can be used for the purposes of personalised marketing.
APIs: One example is dataforseo.com, an API stack that provides SERPs and other SEO data.

Data Tracking: What to Track vs How to Track

What to track vs how to track is a classic data science vs data engineering conundrum. One team or individual might have a bright idea “why don’t we collect and track event data that helps us tackle cart abandonment?” Agreeing on this might be easy – the business loses a lot of revenue by cart abandonment so strategies to tackle this are welcomed.

But then poses the question “how do we collect and track event data that helps us tackle cart abandonment?”

Leanpub Book

Read The Meta-Engineer

A practical book on building autonomous AI systems with Claude Code, context engineering, verification loops, and production harnesses.

Continuously updated

Claude Code + agentic systems

View Book

In this situation, the various data practitioners are required to put their heads together. Data engineers are chiefly charged with the responsibility of enabling the relevant data to be collected and stored ready for analysis and implementation. But, collaborations will need to work out precisely what data to collect first. Events should be clearly defined before engineering takes place.

Collecting and Implementing Customer Data 101

Here is a practical guide to collecting, analysing and using customer data.

Consult on the following five points before beginning to draft up a customer data strategy:

What purpose is served by the collection of event data? What are the pain points customer data can rectify?
What are the roles of entities and how do they intersect with events?
How does entity and event data relate to overarching customer data or business trends?
Precisely what events need to be tracked?
How can all of this be engineered securely and in accordance with data and privacy laws?

7 Methods to Collect Customer Data

Here are 7 ways to engineer customer data strategies:

1: Website Analytics

Website analytics are age-old and very simple to get to grips with. Google Analytics and even Search Console are the first tools to get to use for customer data collection purposes. Google Analytics contains many analytical tools designed for analysing on-site and customer data. Check out this post on behavioural data for more information on what can and can’t be done using Google Analytics.

Aside from Google Analytics, Piwik, Matomo and Mixpanel provide website analytics tools and services that are somewhat similar to those offered by Google Analytics.

These tools combine with visual tools such as Optimizely, VWO and Hotjar to enable businesses to visualise sales funnels and analyse website heatmaps and session behaviour.

Most of this data is indirect event data.

2: Social Media

Social media analysis can assess likes, shares, impressions, comments and more. Each social media platform has native insights/analytics suites and these provide an excellent starting point. Can you identify customers from this data? By using online reputation management and sentiment analysis tools you can indeed identify who your customers are and home in on what they’re saying about your brand/products.

It’s also possible to use a customer data platform (CDP) to segment and analyse groups of customers for the sake of social media advertising. These audiences can then be piped to social media ad suites for optimised ad targeting.

3: Tracking Pixels

Tracking Pixels are HTML or JavaScript codes inserted onto websites or even emails. These enable businesses to track the IP addresses, browser fingers, etc, of users interacting with their services. Tracking pixels enable omnichannel marketing and remarketing, helping connect the dots between multiple business touchpoints. This allows marketers to visualise the sales funnel and target ads or promos at the optimum time for increasing conversion rates.

4: Contact Information and Forms

Contact information and forms are a go-to method of collecting entity data from customers. Subscription boxes are a basic example – they allow businesses to collect email addresses and names for newsletter and promotional purposes. When a customer places an order or subscribes to a free or paid product or service, their data can be made instantly available in a database or customer data platform.

5: Surveys and Feedback

Customer service surveys and feedback are an excellent opportunity to collect qualitative data. If done correctly, customer feedback forms and customer service can collect attitudinal data and sentiments. This data can be used for marketing purposes, and can help businesses understand their customers’ thoughts and frustrations.

6: Customer Service Data

Closely related to the above, customer data can also be extracted from customer service software. Customer service data ranges from telephone customer service recordings to chatbot transcripts, emails and even social media conversations. This is useful in a similar way to customer surveys and feedback.

7: Transactional Information

Similarly to when a customer fills in a subscription form, transactional information is collected when a customer makes a transaction to or via your business. Transactional data may be on or offline, online in the case of an eCommerce store and offline in the case of a brick-and-mortar store using a traditional point of sale (POS) device.

Cleaning and Validating Customer Data

Cleaning and validating data is a huge topic in itself. Today, the process of validating customer data and creating a ‘single source of truth’ for all customers across all touchpoints has become simpler, in part thanks to customer data platforms. Customer data platforms (CDPs) are a powerful intermediary for customer data that allows businesses to induct data from multiple sources or channels before cleaning and validating in the CDP and piping it to other applications for analysis and implementation.

CDPs also have powerful segmentation tools and some analytical tools of their own. The two biggest players in CDPs are mParticle and Segment.

Analysing Quantitative Customer Data: 6 Broad Techniques

Here are some basic quantitative customer data analysis techniques that use data mining to extract insights and analyses from customer data.

1: Classification

Classification is the simple classification, or categorisation, of customer data. One example would be classifying customers by their average monthly spend, or the categories they’re most interested in. Some customers might be classified as outdoorsy types – they can be forwarded promos and content based on their interests.

Others might be interested in interior decor, etc, etc. Of course, there is likely much crossover between categories and major eCommerce sites like Amazon will probably classify customers in many thousands of ways. Classification is also important when building recommendation engines.

2: Association Rules

Association rules look a bit like; if this, then that, then this, etc. Mapping these sorts of rules is also very important when building recommendation engines. This type of data analysis uncovers connections between different products in seemingly disparate categories. For example, a torch or flashlight may be associated with a camping stove and energy bar that people buy together when they go camping.

3: Outlier Detection

Outlier detection seeks to identify anomalies or unexpected patterns that form in the data. These are used to signal impending changes in trends, such as a sharp increase or reduction in sales of a certain product. Outliers can be used for root cause analysis, which enables businesses to locate where something went right or wrong.

4: Clustering

Clustering groups data into homogeneous categories. Many entities or events can be used to cluster data. It’s more of a way to explore groups of data for further analysis.

5: Regression

Regression can be used to study the relationships between different customer data points. It’s a great technique for analysing how the presence/absence of certain data affects the characteristics of other data in a set.

6: Prediction

Predictive analytics is a mammoth topic. Deep historical customer data can be used to help predict future trends in customer behaviour. This informs both sales, marketing and business intelligence, e.g. enabling businesses to stock up on certain high-demand items prior to a surge in demand.

Analysing Qualitative Customer Data: 3 Broad Techniques

Qualitative data, e.g. written text collected via customer service, feedback and surveys or social media tends to be qualitative. Heatmaps are also qualitative.

Qualitative customer data is likely less numerous but more detailed than quantitative customer data.

1: Content Analysis

Content can be analysed for repeating themes, phrases, keywords, etc. Teams can manually search by phrase or keyword, collating their findings and analysing them for links and meaning. Content can reveal attitudes or perceptions of businesses and products, or even of competitors or wider consumer trends.

2: Narrative or Thematic Analysis

Narrative or thematic analysis seeks to analyse the repeating themes that crop up in qualitative data. This is an excellent competitor analysis tool, as businesses can analyse the problems customers have with their competitor’s products (or with industry as a whole), so they can develop new products that address these pain points.

3: Sentiment Analysis

Sentiment analysis is clearly matched with the above, but uses natural language processing AI to go deeper into the emotions displayed in qualitative data. Sentiment analysis tools uncover themes and narratives with minimal manual analysis.

Summary: What is Customer Data?

Customer data is a complex topic with considerable depth. Why? Because the events and entities under study are humans, or of human origin, which makes them complex by name and nature.

Customer data is probably not the free-for-all that it once was, but privacy legislation hardly restricts business access to data, it just makes consent and transparency more important.

As customer data has aligned itself towards the centre of modern business data strategy and business intelligence, an array of new platforms have erupted onto the market. Namely, customer data platforms are laser-targeted at managing and performing many of these processes and ideas described in this article.

FAQ

What is Customer Data?

Customer data is primarily composed of entity and event data that relates to the properties, characteristics, features, behaviours and actions of customers. Customers share data both directly (e.g. through surveys and forms) and indirectly (e.g. through clicks, scrolls and browsing behaviours). This data can be collected and analysed for sales, marketing, business intelligence, and much more. Customer data has been safeguarded somewhat by GDPR and other privacy laws, but this hasn’t really changed the capabilities of businesses when it comes to harnessing customer data.

How do you Analyse Customer Data?

Customer data should be cleaned and segmented prior to analysis (unless the intention is to analyse the entire set). It can then be analysed with a variety of traditional analysis techniques ranging from classification and clustering to regression, outlier analysis, association analysis and predictive analysis.

Why is Customer Data Important?

Whilst we might feel in-tune with our customers or audiences, this is pretty much impossible at scale, and it’s wrong to assume anyway. Customer data provides an empirically robust, measurable and analysable view of the customer that can scale up and down. Customer data can be very broad, e.g. classifying broad groups of people depending on shared characteristics, or very granular, e.g. measuring precisely when a customer interacts with a sales channel and what they tend to do prior to purchasing a product.

What is Customer Data?

Table of Contents

Customer Data: The Basics

User/Entity Data

Accounts as Entities

How to Gather Entity Data

Event Data

How to Gather Event Data

Data Tracking: What to Track vs How to Track

Read The Meta-Engineer

Collecting and Implementing Customer Data 101

7 Methods to Collect Customer Data

1: Website Analytics

2: Social Media

3: Tracking Pixels

4: Contact Information and Forms

5: Surveys and Feedback

6: Customer Service Data

7: Transactional Information

Cleaning and Validating Customer Data

Analysing Quantitative Customer Data: 6 Broad Techniques

1: Classification

2: Association Rules

3: Outlier Detection

4: Clustering

5: Regression

6: Prediction

Analysing Qualitative Customer Data: 3 Broad Techniques

1: Content Analysis

2: Narrative or Thematic Analysis

3: Sentiment Analysis

Summary: What is Customer Data?

FAQ

Become a better AI engineer

More Insights

Recursive Self-Improvement Loop for Agent Tooling

Ask the LLM Where Your Plan Contradicts Itself