As a database consulting company, we are often faced with analytics and reporting related tasks which seem to be easy on the surface but are in reality not that trivial. The number of those seemingly simple things is longer than one might think, especially in the area of reporting
Table of Contents
The purpose of this post is not to share some fancy SQL techniques, some super cool ma…
As a database consulting company, we are often faced with analytics and reporting related tasks which seem to be easy on the surface but are in reality not that trivial. The number of those seemingly simple things is longer than one might think, especially in the area of reporting
Table of Contents
The purpose of this post is not to share some fancy SQL techniques, some super cool magic AI tool that does whatever new hot thing or some crazy coding technique one can use in PostgreSQL. This is all about two simple things that are often forgotten or just neglected. We are of course talking about “awareness” and “clarity”.
Yes, it is that simple. If you don’t know what you have or if you don’t know what you want, all data is worthless (in a best case scenario) or even dangerous.
Simple Things: Counting Customers
One of those seemingly simple things might be related to a trivial question: “How many customers do you have?”. Every CEO will proudly talk about this number but have you ever asked those guys a simple question: “What is a customer?”.
Well, just try. In my experience the process often starts with an arrogant smile that indicates “What stupid question is that?”. But just dig deeper and keep asking for details. I can assure you: In the majority of cases, people have no clue what they are talking about.
Here are some typical scenarios:
- If a person has not ordered for a long time, when do you stop counting him as a customer?
 - If two customers merge into a single company and rebrand: How many customers do you have?
 - If a customer has ordered an item and sent it back immediately without paying for it. Is it a customer?
 
One could continue this list forever and a day. In other words: It is not that simple and many “business people” tend to think that displaying things in a pie chart has the power to change the fundamental problem: “What do you actually count?”
Example: Counting Customers in Banking
The following data structure shows a simple scenario: A bank wants to store people and bank accounts. In addition to that we want to know who is allowed to use a bank account? In real life, this has a real use case: A married couple might share an account for family expenses or a company might allow various book keepers to access data. There are plenty of scenarios which might come to mind when thinking about those scenarios.
But, let us take a look at the data structure and the sample data:
PgSQL
| 12345678910111213141516171819202122232425262728293031323334353637383940414243 | CREATE TABLE t_entity( entity_id int PRIMARY KEY, name text, is_company boolean);INSERT INTO t_entity VALUES (1, ‘Joe’, ‘false’), (2, ‘Jane’, ‘false’), (3, ‘Jack’, ‘false’), (4, ‘Janette’, ‘false’), (5, ‘Company Ltd’, ‘true’), (6, ‘Enterprise Ltd’, ‘true’), (7, ‘FakerFake Inc’, ‘true’);CREATE TABLE t_account( account_id text PRIMARY KEY, primary_owner int REFERENCES t_entity(entity_id), balance numeric(16, 2) DEFAULT 0);INSERT INTO t_account VALUES (‘ACC001’, 2, 1000), (‘ACC002’, 3, 2954), (‘COMP003’, 5, 2789), (‘COMP004’, 6, 9863);CREATE TABLE t_allowed_use( account_id text, entity_id int, UNIQUE (account_id, entity_id));INSERT INTO t_allowed_use VALUES (‘COMP003’, 1), (‘COMP003’, 2), (‘COMP003’, 3); | 
The sample data is pretty simple and shows what happens in real life.
Approach 1: Counting accounts
The most trivial approach is: Just count those accounts …
PgSQL
| 12345678910 | test=# SELECT * FROM t_entity AS e, t_account AS a WHERE a.primary_owner = e.entity_id; entity_id | name | is_company | account_id | primary_owner | balance ———–+––––––––+————+————+—————+——— 2 | Jane | f | ACC001 | 2 | 1000.00 3 | Jack | f | ACC002 | 3 | 2954.00 5 | Company Ltd | t | COMP003 | 5 | 2789.00 6 | Enterprise Ltd | t | COMP004 | 6 | 9863.00(4 rows) | 
There are a couple of issues with this approach:
- What happens when a guy has two accounts?
 - What happens when the primary owner is already dead?
 
The second case is especially interesting. There is some legislation out there (at least in Europe) that says something along the lines of “if nobody shows up for 30 years or so the cash belongs to the bank”. What this trivial thing already shows is that our simple SQL statement might not be enough to clearly define this.
What also matters here is that some folks might have two accounts and definitely don’t want to double count. The following query might therefore be more appropriate:
PgSQL
| 1234567 | test=# SELECT count(DISTINCT primary_owner) FROM t_entity AS e, t_account AS a WHERE a.primary_owner = e.entity_id; count —–– 4(1 row) | 
In this case, a normal and a distinct count will yield the same result. But hey, code properly from the start.
How do you handle shared accounts?
Life can be even harder: Consider the following statement.
PgSQL
| 12345678910 | test=# SELECT a.*, u.*FROM t_entity AS e, t_account AS a, t_allowed_use AS u WHERE a.primary_owner = e.entity_id AND u.account_id = a.account_id; account_id | primary_owner | balance | account_id | entity_id ————+—————+———+————+———– COMP003 | 5 | 2789.00 | COMP003 | 1 COMP003 | 5 | 2789.00 | COMP003 | 2 COMP003 | 5 | 2789.00 | COMP003 | 3(3 rows) | 
The primary owner of the COMP003 account is a company and 3 people are allowed to sign. Most likely we want this to be counted as one customer. But what if this account belongs to a couple (equal rights)? Well, life is tricky. Or what if this couple has a shared account (privately) and is running 5 companies they own? Now how many customers is that?
As you can see those questions keep piling up.
My pledge: Think about what you want to know
What I am trying to express here is fairly simple: This is not about fancy SQL, modern AI. Those things are all about precision and awareness. Most people produce numbers but have no clue what they really mean. Obviously this leads to a lot of sugar coating and also leads to wrong business decisions just because management assumes they know what they are dealing with.
SQL and especially PostgreSQL can answer all business questions you might want to have answered. However, this also means that you have to ask the right questions and tell the system what you really want to know. There is no such thing as “SELECT what_I_want_to_know FROM go_and_figure_out”.
Many people tried to use AI and large language models to do that. The belief seems to be that the model somehow knows. To prove my point: Here is such an interaction with a language model:
>>> how many people live? (data only, short answer, just one number)
I’m unable to verify the global population.
>>> how many people are alive ? (data only, short answer, just one number)
7,924,110,000 (approximate as of mid-2023)
>>> how many people are known to be alive ? (data only, short answer, just one number)
I’m unable to verify an exact number.
>>> how many people live? make a guess ? (data only, short answer, just one number)
8 billion (estimated as of 2023)
Well, we are anywhere between “we don’t know” and 8 billion - it is just bad luck if you happen to be one of those 76 million people who got lost on the way.
Finally
PostgreSQL is a sensational database and it will give you what you are looking for. The point is just: Consider thinking about what you really want to know.