This is a typical case for Self-Service BI with Excel data.
A few days ago, a client asked me the following question:
I have an Excel sheet with numbers and text in one column. I want to import this sheet into Power BI and perform analysis on the numbers in that column.
How can I separate the numbers from the text in that column?
Remember that I need the text in that column as well.
I had never been in this situation, so I first started using the technique I knew.
I created a Dummy Excel with the same problem, which looks like this:
Figure 1 – Sample Data in Excel (Figure by the Author)
To create a PoC, I first loaded this data into an SQL Server database to see how to solve it …
This is a typical case for Self-Service BI with Excel data.
A few days ago, a client asked me the following question:
I have an Excel sheet with numbers and text in one column. I want to import this sheet into Power BI and perform analysis on the numbers in that column.
How can I separate the numbers from the text in that column?
Remember that I need the text in that column as well.
I had never been in this situation, so I first started using the technique I knew.
I created a Dummy Excel with the same problem, which looks like this:
Figure 1 – Sample Data in Excel (Figure by the Author)
To create a PoC, I first loaded this data into an SQL Server database to see how to solve it there.
Solving the problem by using SQL
T-SQL has two functions that are helpful in such scenarios:
-
This one tries to convert a value to a target data type. If it fails, it returns NULL.
-
Checks if a value is a numeric value. If yes, it returns 1. Otherwise, 0.
Based on this knowledge, I wrote a query to separate the values into two columns. One with the numbers and one with the text:
SELECT [Values]
,TRY_CONVERT(decimal(18, 5), [Values]) AS [Number]
,IIF(ISNUMERIC([Values]) = 0, [Values], NULL) AS [Text]
FROM [dbo].[MixedValues];
The result is the following table:
Figure 2 – T-SQL Query to separate the numbers from the text (Figure by the Author)
If you look carefully, you see that row 17 is recognized as a text.
This is because the number contains a blank.
I will come back to this later on.
Switching to Power Query – Trying IsNaN()
Now, I loaded the Excel into Power Query.
I defined the column as Text and started working on this challenge.
The first attempt uses the Number.IsNaN() function.
This function returns true if the value is NaN. “NaN” is a placeholder for not applicable, for example, because of a division by 0.
I tried this to determine whether a text is equivalent to NaN.
This is the M-Code for the calculated column:
if Number.IsNaN([Value]) = true
then [Value]
else null
The result surprised me:
Figure 3 – Result with Number.IsNaN(). Why doesn’t this function recognize numbers? (Figure by the Author)
Strangely, the result is that it cannot convert a number to a number.
I suppose this happens because the column’s data type is text.
Then, I tried converting the column to a number and applying the IsNaN() function to the result:
if Number.IsNaN(Number.From([Value])) = false
then Number.From([Value])
else null
Now, the numbers are converted to numbers, but the text values result in an error:
Figure 4 – Applying IsNaN() to the converted values returns the numbers, but an error for the text (Figure by the Author)
Now the logic works for numbers.
But the conversion fails for the rows containing text. This results in rows with errors.
Trying Value.Is() in Power Query
Let’s try another function: Value.Is()
This function checks whether a value is compatible with a data type.
This should be equivalent to the ISNUMERIC() function shown above:
if Value.Is([Value], Number.Type) = true
then Number.From([Value])
else null
Unfortunately, this function failed to return the expected result as well:
Figure 5 – Result when trying the Value.Is() function (Figure by the Author)
When I tried the same approach as above, by converting the value to a number first, I got the same result as before:
Figure 6 – Errors when trying to convert the values to a number first (Figure by the Author)
Therefore, I suspect that the function Value.Is() expects a number data type, but this makes no sense to me.
At this point, I didn’t have time for deeper research, as I was running short on time.
It was time to switch the approach.
Switching concept
Now I explored how to catch errors in Power Query.
My idea was: What if I could catch the conversion error and use this information?
I found this page with useful information: Errors – PowerQuery M | Microsoft Learn
From this, I deduced this expression:
try Number.From([Value]))
After adding a calculated column with this expression, I got this result:
Figure 7 – Result with try (Figure by the Author)
I was optimistic, as I didn’t get an error.
Next, was to expand the Records:
Figure 8 – Expand the Value from the Record output of the try call (Figure by the Author)
I didn’t need the Error columns—only the Value column.
This is the result after the expansion:
Figure 9 – Result after the expansion of the records (Figure by the Author)
Notice that I renamed the columns directly in the ExpandRecordColumn() function. Otherwise, I would have gotten a column named [Value.1].
This result was the first where I didn’t get any errors.
Now, I added a calculated column to check if the new column is empty. If yes, then the original Value column contained a text:
if [Numeric Value] = null then [Value] else null
Here, the result:
Figure 10 – Result with the successful separation of numerical and text values from one column (Figure by the Author)
After setting the correct data types and removing the original Value column, I got this table:
Figure 11 – Result after cleanup (Figure by the Author)
Handle the number with blanks
But we still have row 17, which contained a number with a blank.
How did I handle this?
The most straightforward approach was to remove any Blank from the column Value:
Figure 12 – Add a Replace Value call to remove blanks from the data (Figure by the Author)
But I had to add this step before starting the steps for separating the two value types:
Figure 13 – Add the Replace Value step at the correct place (Figure by the Author)
After adding this step, row 17 is recognized as a number and stored correctly.
Here is the data after loading it into Power BI:
Figure 14 – Clean data after loading into Power BI (Figure by the Author)
But this only worked if the text values were single words. It didn’t work when sentences or multiple words were stored there.
Conclusion
This was a fascinating excursion into how Power Query, or the M-language, works with data types.
I’m still unsure about the causes of the errors.
But I learned how to handle errors, or how to use the try call and handle the output.
This was very helpful.
Anyway, as you see with the original value in row 17, data quality is paramount.
I have another client where users from different countries are working on the same Excel file with their own number formats.
This is a nightmare because Excel is highly tolerant of data types. It accepts everything, even when the column is formatted as a number.
In that situation, I must force users to use Excel’s formatting options to ensure that numbers are consistently recognized as such.
Without this, I have no chance to import this data into Power BI without a lot of effort to clean up the numbers.
And stay assured that users always find a way to mess up with numbers in Excel.
References
The Data is created with random numbers and words.
Here is the reference for the M-Language: Power Query M formula language reference – PowerQuery M | Microsoft Learn