At some point in their career, many statisticians learn that calculating a quantile can be surprisingly complicated. In probability theory, quantiles have a certain elegance: they’re simply defined in terms of the inverse of a cumulative distribution function. But when it comes time to actually estimate or compute a quantile from real-world data, it turns out that there are multiple reasonable ways to define a quantile for a dataset. Software packages like SAS and R each offer at least a half-dozen ways to compute quantiles, with subtle differences. That’s why I often see data analysts surprised when their preferred software package spits out a different estimate for the median than someone else’s software.
Things get still more complicated once you introduce weights, like when you’re analyzing data from a survey sample. This post explains why quantile estimation in general is tricky, and it highlights some troubling surprises in open-source software packages that compute weighted quantiles. It goes on to provide some suggestions for how to define weighted quantiles and fix the issues we see in software packages, and includes a small simulation study to see whether the proposed idea has any promise.
The main takeaways from this post are these:
- Weighted quantiles are complex, and the statistical methods literature has not settled on how they should be computed.
- In lieu of clear guidance from the methods literature, many software package authors have come up with their own implementations. Most of these implementations have surprising behavior that you might find troubling.
- More conversation and research are needed so that we can agree on which methods for weighted quantiles to implement in software packages.
## Why Computing Quantiles Can Be Complicated
The big reason why there are different ways to compute quantiles is pretty easy to understand. Suppose we have the following example data, with \(n=4\), and we want to compute the median.
x <- c(2, 2, 3, 3)
Intuitively, the median should be the midpoint of our sorted data values. But because we have an even number of data points, it’s not clear whether we should define the median as the second observation \(x=2\), the third observation \(x=3\), or something in between, like 2.5. Different software options apply different interpolation rules. Here are two different results from using two different options in R.
quantile(x, probs = 0.5, type = 7)
50%
2.5
quantile(x, probs = 0.5, type = 4)
50%
2
The numbers for these types come from the highly influential 1996 paper by Hyndman and Fan, “Sample Quantiles in Statistical Packages.” 1 The paper is almost certainly referenced in the documentation for your favorite statistical software package for quantiles. Of the Hyndman and Fan types, the first three are discontinuous methods: that is, they would impute the median in our data as exactly either 2 or 3. All the other types are continuous methods, so they might impute the median as something like 2.5 or 2.17.
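For example, with the same four data points, type 1 (the inverse of the empirical CDF, and hence a discontinuous method) returns exactly one of the observed values rather than interpolating:

```r
# Type 1 inverts the empirical CDF, so it always returns an actual data value
quantile(x, probs = 0.5, type = 1)
#> 50%
#>   2
```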
This blog post does a good job explaining and visualizing the different quantile types.
## Surprising Results when Using Weights
When we have weights, things get more complicated. In the example data below, we would intuitively guess that the weighted median of the four data points should be somewhere in the range \(\left[2,3\right]\).
data <- data.frame(
x = c(2 , 2 , 3 , 3 ),
w = c(0.25, 0.15, 0.35, 0.25)
)
print(data)
x w
1 2 0.25
2 2 0.15
3 3 0.35
4 3 0.25
As in the unweighted case, we would want to take some weighted average of the values 2 and 3, so our estimated median is the weighted average \(\hat{q}_p = \gamma \times 2 + (1-\gamma) \times 3\), for some value of \(\gamma\) between 0 and 1. However, because we have weights, the value of \(\gamma\) is likely not just going to be \(\gamma=0.5\), but something that reflects the different weights for each data point.
A few packages in R implement quantile functions that can weight the data points when computing a quantile. Here are three examples that compute a weighted quantile analogous to the base R unweighted quantile we’d get from quantile(x, type = 4), taken from the packages ‘survey’, ‘ggdist’, and ‘collapse’. Note that they all give the same result here, which is an interpolation between 2 and 3.
survey:::qrule_hf4(x = data$x, w = data$w, p = 0.5)
[1] 2.285714
collapse::fquantile(x = data$x, w = data$w, probs = 0.5, type = 4)
50%
2.285714
ggdist::weighted_quantile(x = data$x, weights = data$w, probs = 0.5, type = 4)
50%
2.285714
Now let’s see something really weird. Let’s re-sort the data so that, within each value of x, the records are listed in ascending order of their weights. We haven’t actually changed our data; we’ve just changed how it’s arranged.
data2 <- data |> sort_by(~ list(x, w))
print(data2)
x w
2 2 0.15
1 2 0.25
4 3 0.25
3 3 0.35
Let’s calculate those weighted quantiles again.
survey:::qrule_hf4(x = data2$x, w = data2$w, p = 0.5)
[1] 2.4
collapse::fquantile(x = data2$x, w = data2$w, probs = 0.5, type = 4)
50%
2.4
ggdist::weighted_quantile(x = data2$x, weights = data2$w, probs = 0.5, type = 4)
50%
2.4
Well, that’s troubling. We have the exact same data, but reordering its rows gave us a completely different weighted quantile: the estimate changed from 2.29 to 2.4. With a single dataset, you could get a different weighted quantile estimate on Monday than on Tuesday, just because you re-sorted its rows slightly.
To make things more confusing, these three packages’ results can actually disagree with each other for ostensibly the same quantile method, as in the following example.
survey:::qrule_hf7(x = data$x, w = data$w, p = 0.5)
[1] 2.833333
ggdist::weighted_quantile(x = data$x, weights = data$w, probs = 0.5, type = 7)
50%
2.640845
collapse::fquantile(x = data$x, w = data$w, probs = 0.5, type = 7)
50%
2.785714
What on earth is happening here?
## Why R Functions for Weighted Quantiles Depend on Arbitrary Formatting Choices
In theory, a quantile is defined based on a CDF. So you would think that the weighted sample quantile would be calculated based on a weighted empirical CDF (ECDF). For our example data, the weighted ECDF would look like the following:
\[ \begin{align} \hat{F}(2) &= 0.4 \\ \hat{F}(3) &= 1 \end{align} \]
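As a quick check, here is one way to compute this weighted ECDF directly in base R, using the `data` frame defined earlier (a minimal sketch):

```r
# Weight mass at each distinct value of x, accumulated and normalized by the total weight
w_by_value <- tapply(data$w, data$x, sum)
ecdf_hat <- cumsum(w_by_value) / sum(data$w)
print(ecdf_hat)
#>   2   3
#> 0.4 1.0
```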
You might think that the weighted median would be just some function of this weighted ECDF. Unfortunately you’d be wrong.
These R functions aren’t actually using the weighted ECDF to define the weighted quantile. They’re using order statistics. Order statistics are uniquely defined when every observation in the data has equal weight. But they’re trickier to define when observations in the data have unequal weights. The R functions listed above use an unfortunate approach to define the order statistics.
In our unweighted sample of size \(n=4\), the order statistics are the following.
\[ X_{(1)}=2, X_{(2)}=2, X_{(3)}=3, X_{(4)}=3 \]
For a weighted sample, the R functions are in essence defining the order statistics as tuples \((x, w)\) that pair an observation’s value of \(x\) with its weight \(w\), and the order is entirely defined by the \(x\) values. Here’s one way this could look.
\[ X_{(1)}=(2, 0.25), X_{(2)}=(2, 0.15), X_{(3)}=(3, 0.25), X_{(4)}=(3, 0.35) \]
But that was arbitrary. We could have just as easily defined the order statistics instead as the following.
\[ X_{(1)}=(2, 0.15), X_{(2)}=(2, 0.25), X_{(3)}=(3, 0.35), X_{(4)}=(3, 0.25) \]
By default, the R functions for weighted quantiles all estimate the median from this dataset by interpolating between the order statistics \(X_{(2)}\) and \(X_{(3)}\), using some function of the data values and the weights. The problem is that those order statistics aren’t uniquely defined when the data have weights. Should \(X_{(2)}\) be defined as \(X_{(2)}=(2, 0.25)\) or \(X_{(2)}=(2, 0.15)\)? The documentation for these R functions (2, 3, 4) doesn’t address this question. Unfortunately, the R code just implicitly answers it by relying on the arbitrary ordering of the records in the dataset.
This thoughtful blog post by Matthew Kay, the author of the ggdist R package, explored the question of how to define order statistics for weighted data. Unfortunately, the post only considered the scenario where observations with the same value of the variable \(X\) all happen to have identical weights. That’s actually a rare special case in practice, and so the ‘ggdist’ package’s approach to weighted quantiles breaks down in the general setting where different observations in the data can have the same value of \(X\) but different weights.
## So What Does Other Software Do?
### Python
The Python NumPy package provides very limited support for weighted quantiles. It supports a weighted analogue of Hyndman and Fan’s type 1 quantile method, which has the disadvantage of being a discontinuous method. This matches what SAS PROC UNIVARIATE does for weighted quantiles. On the plus side, this approach isn’t affected by the arbitrary order of the weights in the data, as we can see below.
Click to Show Python Example with NumPy
import numpy as np
import polars as pl
# Load data
x = np.array([2, 2, 3, 3])
w = np.array([0.25, 0.15, 0.30, 0.20])
data = pl.DataFrame([x, w], schema = ["x", "w"])
# Create two versions of the data,
# with opposite sorting of the weights
data = data.sort(["x", "w"], descending = [False, True])
data2 = data.sort(["x", "w"], descending = [False, False])
# Compute a weighted quantile
np.quantile(a = data["x"], weights = data["w"], q = 0.5, method = "inverted_cdf")
array(3)
np.quantile(a = data["x"], weights = data2["w"], q = 0.5, method = "inverted_cdf")
array(3)
The ‘survey’ package in R can match these results by specifying the option qrule = 'hf1'.
survey:::qrule_hf1(x = data$x, w = data$w, p = 0.5)
[1] 3
survey:::qrule_hf1(x = data2$x, w = data2$w, p = 0.5)
[1] 3
### Julia
Julia also supports only one kind of weighted quantile. For the small example dataset we’ve been using, you get nearly the same result regardless of how the weights are sorted, but the results are still not identical. By reading the Julia function documentation carefully, you can conclude that, in general, Julia’s weighted quantile method depends on the sort order of the weights and that it uses a different approach than the R ‘survey’ package or Python’s NumPy. Unfortunately, the documentation for the weighted quantile functionality doesn’t justify why Julia’s method works the way it does and doesn’t include any references to the methods literature, and in any case there seem to be some unresolved bugs in the implementation. 5 This kind of thing is why I don’t use Julia for statistics.
Click to Show Julia Example
using DataFrames
using StatsBase
# Define data values, and weights sorted in alternate orders
x = [2, 2, 3, 3 ]
w1 = ProbabilityWeights([0.25, 0.15, 0.30, 0.20])
w2 = ProbabilityWeights([0.15, 0.25, 0.20, 0.30])
# Estimate the weighted quantile under each sort order of the weights
sort_1_result = quantile(x, w1, 0.5)
sort_2_result = quantile(x, w2, 0.5)
# Compare the results
sort_1_result == sort_2_result
print(sort_1_result)
print(sort_2_result)
print(sort_1_result)
2.624999999999999
print(sort_2_result)
2.625
sort_1_result == sort_2_result
false
### SAS
The SAS SURVEYMEANS procedure only supports one type of weighted quantile, but it’s a different type than NumPy or SAS PROC UNIVARIATE, and it is [described in the SAS documentation](https://documentation.sas.com/doc/en/statug/15.2/statug_surveymeans_details06.htm#statug.surveymeans.quantiledetails). I suspect it inherited its approach from the earlier software package [WesVar](https://methods.sagepub.com/ency/edvol/encyclopedia-of-survey-research-methods/chpt/wesvar). Instead of using tuples based on an observation’s value and its weight, it uses tuples based on an observation’s value and the corresponding value of the weighted ECDF. So, for example, the order statistic \(X_{(2)}\) for our example dataset is the tuple \(X_{(2)} = (2, \hat{F}(2))\). The full set of order statistics is the following:
\[ X_{(1)}=(2, 0.4), X_{(2)}=(2, 0.4), X_{(3)}=(3, 1), X_{(4)}=(3, 1) \]
In this way, the order statistics are actually uniquely defined. For our example dataset, the SAS SURVEYMEANS procedure would estimate the median by interpolating between these order statistics as follows:
\[ \begin{aligned} \hat{q}_{0.5} &= X_{(2)} + \frac{0.5 - \hat{F}(X_{(2)})}{\hat{F}(X_{(3)}) - \hat{F}(X_{(2)})} (X_{(3)} - X_{(2)}) \\ &\approx 2.17 \end{aligned} \]
The general definition used by SAS PROC SURVEYMEANS is the following.
\[ \hat{Q}(p)= \begin{cases} X_{(1)} & \text{ if } p<\hat{F}\left(X_{(1)}\right) \\ X_{(k)}+\frac{p-\hat{F}\left(X_{(k)}\right)}{\hat{F}\left(X_{(k+1)}\right)-\hat{F}\left(X_{(k)}\right)}\left(X_{(k+1)}-X_{(k)}\right) & \text{ if } \hat{F}\left(X_{(k)}\right) \leq p<\hat{F}\left(X_{(k+1)}\right) \\ X_{(n)} & \text{ if } p=1 \end{cases} \]
This is closely related to Hyndman and Fan’s Type 4 quantile, although it gives different results even for unweighted samples.
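To make that rule concrete, here is a rough R sketch of my own of the interpolation formula above (the name `surveymeans_style_quantile` is purely illustrative, and this is not SAS’s code; I compute the weighted ECDF over the distinct values of x to sidestep zero denominators at ties). On the example data it reproduces the value of roughly 2.17 worked out above:

```r
surveymeans_style_quantile <- function(x, w, p) {
  # Weighted ECDF evaluated at the distinct values of x
  w_by_value <- tapply(w, x, sum)
  values <- as.numeric(names(w_by_value))
  Fhat <- cumsum(w_by_value) / sum(w)
  # Boundary cases from the piecewise definition above
  if (p < Fhat[1]) return(values[1])
  if (p >= Fhat[length(Fhat)]) return(values[length(values)])
  # Interpolate between the distinct values whose ECDF values bracket p
  k <- max(which(Fhat <= p))
  values[k] + (p - Fhat[k]) / (Fhat[k + 1] - Fhat[k]) * (values[k + 1] - values[k])
}

surveymeans_style_quantile(x = data$x, w = data$w, p = 0.5)
#> [1] 2.166667
```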
## How Could We “Fix” the Open-Source Implementations of Weighted Quantiles?
To fix the problems with the weighted quantile methods in the various R and Julia packages, the most important thing to do is to stop using methods that depend on the arbitrary ordering of the weights in the data. One tempting quick fix would be to impose an order on the weights: for example, sorting the weights in ascending order and then carrying on with the existing algorithms. That seems to me like a bad idea: whatever order the software developer picks would be purely arbitrary.
I don’t know what the best method to use instead is, but I do have some thoughts on a way forward.
I think one of the suggestions in [Matthew Kay’s blog post](https://htmlpreview.github.io/?https://github.com/mjskay/uncertainty-examples/blob/master/weighted-quantiles.html) gets at a better solution. To understand Matthew’s suggestion, it helps to write out the Hyndman and Fan (HF) continuous quantile types for unweighted data using the following notation.
### Hyndman and Fan Continuous Quantile Types for Unweighted Data
In the unweighted case, we denote the HF quantile of type \(t\) for the \(p\)-quantile of a variable \(X\) as \(\hat{Q}_{X, t}(p)\). For example, we would refer to the type \(4\) median as \(\hat{Q}_{X, 4}(0.5)\).
\[ \begin{aligned} \hat{Q}_{X, t}(p) & =(1-\gamma) X_{(j)}+\gamma X_{(j+1)} & & \\ j & =\lfloor p n+m_t\rfloor & & m_t \in \mathbb{R} \\ \gamma & =p n+m_t-j & & 0 \leq \gamma \leq 1 \end{aligned} \]
The value \(m_t\) is specific to the quantile type \(t\), with the following values (a small verification sketch in R follows the table):
| \(t\) | \(m_t\) |
| ----- | -------- |
| \(4\) | \(0\) |
| \(5\) | \(1/2\) |
| \(6\) | \(p\) |
| \(7\) | \(1-p\) |
| \(8\) | \((p+1)/3\) |
| \(9\) | \((2p + 3)/8\) |
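To make this parameterization concrete, here is a small verification sketch of my own (not code from any of the packages discussed; the function name `hf_quantile_unweighted` is just illustrative) that computes the continuous HF quantile directly from the formula above and checks it against base R:

```r
# An unweighted implementation of the continuous Hyndman and Fan quantile
# using the (j, gamma, m_t) parameterization above. Illustrative sketch only.
hf_quantile_unweighted <- function(x, p, type = 7) {
  m_t <- switch(
    as.character(type),
    '4' = 0, '5' = 1/2, '6' = p, '7' = 1 - p,
    '8' = (p + 1)/3, '9' = (2*p + 3)/8
  )
  x <- sort(x)
  n <- length(x)
  j <- floor(p * n + m_t)
  gamma <- p * n + m_t - j
  # Clamp indices so that j = 0 or j = n doesn't index outside the sample
  (1 - gamma) * x[max(j, 1)] + gamma * x[min(j + 1, n)]
}

hf_quantile_unweighted(c(2, 2, 3, 3), p = 0.5, type = 7)  # 2.5
quantile(c(2, 2, 3, 3), probs = 0.5, type = 7)            # 50%: 2.5
```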
### Matthew Kay’s Initial Extension to Weighted Data
In his post, Matthew suggested extending this parameterization to weighted data. This involves some additional notation. Matthew used the notation \(w_{(i)}\) to denote the weight associated with the case in the data with the \(i\)-th smallest value of the variable \(X\), and the notation \(W_{(i)}\) to denote the cumulative sum of weights, \(W_{(i)} = \sum_{j \leq i} w_{(j)}\). To help us make sense of this suggestion, we’ll also introduce the notation \(W\) to denote the total sum of weights in the sample.
Matthew then used the index \(l\) to denote the largest value of \(i\) such that \(W_{(i)}/W \leq p\), and he also used the index \(u\) to denote the smallest value of \(i\) such that \(W_{(i)}/W \geq p\). This means that we’ll end up with a quantile estimate somewhere between the lower bound \(X_{(l)}\) and the upper bound \(X_{(u)}\).
Matthew then proposed extending the unweighted quantile to a weighted quantile as follows.
\[ \begin{aligned} \hat{Q}_{X, w,t}(p) & =(1-\gamma) X_{(j)}+\gamma X_{(j+1)} \\ j & =\left\lfloor l+\frac{p-W_{(l)}}{w_{(l+1)}}+m_t\right\rfloor \\ \gamma & =l+\frac{p-W_{(l)}}{w_{(l+1)}}+m_t-j \end{aligned} \]
This seems like a surprisingly intuitive extension. The big problem with this extension is what I pointed out earlier in this blog post: when there are multiple observations with the same value of \(X\) but different weights, then the values for \(w_{(l+1)}\) and \(W_{(l)}\) are undetermined. So this proposed extension implicitly depends on the arbitrary order of weights for observations with the same value of \(X\).
### Could We Improve This Extension?
One idea to fix this would be to replace the cumulative weight sums with the actual weighted ECDF. That is, instead of using \(W_{(l)}\), the definition could have used the weighted empirical CDF.
\[ \hat{F}(X_{(l)}) = \frac{1}{W}\sum_{i=1}^{n} w_i I(X_i \leq X_{(l)}) \]
Second, instead of using the individual weights \(w_{(l+1)}\), which suffer from the ordering problem, the definition could have used the difference in values of the weighted empirical CDF.
\[ \hat{F}(X_{(l+1)}) - \hat{F}(X_{(l)}) \]
And finally, instead of using indices \(j\) corresponding to the observations \(1,\dots,n\), we would deduplicate the dataset down to the distinct values of \(X\) with positive weight, so that the indices run over \(j=1,\dots,n^{\star}\), where \(n^{\star}\) is the number of distinct values of \(X\) with nonzero weight. Putting it all together, you’d get the following extension of the Hyndman and Fan types to weighted quantiles.
\[ \begin{aligned} \hat{Q}_{X, w,t}(p) & =(1-\gamma) X_{(j)}+\gamma X_{(j+1)} \\ j & =\left\lfloor l+\frac{p-\hat{F}(X_{(l)})}{\hat{F}(X_{(l+1)}) - \hat{F}(X_{(l)})}+m_t\right\rfloor \\ \gamma & =l+\frac{p-\hat{F}(X_{(l)})}{\hat{F}(X_{(l+1)}) - \hat{F}(X_{(l)})}+m_t-j \end{aligned} \]
Below is R code implementing this approach.
Show R code
#' @param x The variable whose quantile is to be estimated.
#' @param w A vector of weights, the same length as x.
#' @param p A value between 0 and 1, specifying which quantile to estimate
#'   (e.g., \code{p = 0.5} for the median).
#' @param type A number between 4 and 9, indicating which Hyndman and Fan
#'   quantile type to estimate.
#' @return The weighted quantile estimate
proposed_wtd_quantile <- function(x, w, p, type = 7) {

  # Get the constant appropriate to the Hyndman and Fan type
  m_t <- switch(
    as.character(type),
    '4' = rep(0, times = length(p)),
    '5' = rep(0.5, times = length(p)),
    '6' = p,
    '7' = (1 - p),
    '8' = (p + 1)/3,
    '9' = (2*p + 3)/8
  )

  # Omit points with zero weight, as they don't affect the CDF
  x <- x[w != 0]
  w <- w[w != 0]

  n <- length(x)

  if (n == 0) {
    stop("x must have at least one observation with nonzero weight")
  } else if (n == 1) {
    # With a single (weighted) observation, every quantile equals that observation
    return(rep(x, times = length(p)))
  }

  # Compute the empirical CDF over the distinct values of x
  ecdf <- aggregate(
    x = list(w_sum = w),
    by = list(x = x),
    FUN = sum
  )
  ecdf[['Fhat']] <- cumsum(ecdf[['w_sum']]) / sum(ecdf[['w_sum']])
  n_values <- nrow(ecdf)
  ecdf_indices <- seq_len(n_values)

  # Compute quantiles based on the empirical CDF
  q <- vector('numeric', length(p))
  for (p_index in seq_along(p)) {
    is_lte_p <- ecdf[['Fhat']] <= p[p_index]
    if (any(is_lte_p)) {
      l <- tail(ecdf_indices[is_lte_p], 1)
      u <- pmin(l + 1, n_values)
    } else {
      l <- 1L
      u <- 2L
    }
    if (u > l) {
      pre_j <- l + (p[p_index] - ecdf$Fhat[l])/(ecdf$Fhat[u] - ecdf$Fhat[l]) + m_t[p_index]
    } else {
      pre_j <- l
    }
    j <- floor(pre_j)
    gamma <- pre_j - j
    q[p_index] <- (
      gamma*ecdf$x[pmin(j + 1, n_values)] + (1 - gamma)*ecdf$x[pmax(j, 1L)]
    )
  }
  return(q)
}
## A Brief Simulation Study
Let’s see if this approach works out OK in an actual simulation study.
First we’ll define a finite population of size \(N=100,000\), divided into \(1,000\) PSUs (i.e., clusters) of 100 units each. We’ll create one normally distributed variable (`x_normal`) and one skewed variable (`x_skewed`), with a high degree of intracluster correlation. To induce ties in the data, we’ll round these variables to one decimal place.
Show R code to define the population
library(dplyr)
set.seed(2025)
finite_pop <- data.frame(
  PSU = rep(1:1000, each = 100),
  x_normal = rnorm(n = 1000, mean = 10, sd = 1) |>
    sort() |>
    sapply(\(x) rnorm(n = 100, mean = x, sd = 1)) |>
    as.vector() |>
    round(1),
  x_skewed = 1 + rexp(n = 1000, rate = 1/5) |>
    sort() |>
    sapply(\(x) rnorm(n = 100, mean = x, sd = 0.1)) |>
    as.vector() |>
    round(1)
)
Show the R code that produces the plot
library(ggplot2)
finite_pop |>
  tidyr::pivot_longer(cols = c(x_normal, x_skewed), names_to = "variable", values_to = "value") |>
  ggplot(aes(x = value)) +
  facet_wrap(~ variable, scales = "free") +
  geom_histogram(binwidth = 0.5, fill = "dodgerblue4", color = "#e9ecef", alpha = 0.9) +
  labs(title = "Distributions of variables in finite population")

Next we’ll define PSU sampling probabilities. To make things interesting, we’ll make the sampling probabilities approximately proportional to the variables of interest. This means that, despite the clustering, our sample design might be more efficient than simple random sampling, and the weights are informative. This kind of scenario poses a challenge for approaches to weighted quantiles that rely on crude effective sample size measures.
Here’s the R code that determines sampling probabilities and defines a function to draw samples.
Show the code to define the sample design
psu_probs <- finite_pop |>
  group_by(PSU) |>
  summarize(a = mean(x_normal)) |>
  mutate(a = a + rnorm(n = n(), mean = 1, sd = 1)) |>
  mutate(psu_prob = sampling::inclusionprobabilities(a = a, n = 100))

finite_pop <- left_join(
  x = finite_pop,
  y = psu_probs,
  by = 'PSU'
) |>
  mutate(wgt = 1/psu_prob)

draw_sample <- function() {
  sampled_psus <- psu_probs$PSU[
    as.logical(sampling::UPbrewer(pik = psu_probs$psu_prob))
  ]
  finite_pop |> subset(PSU %in% sampled_psus)
}
Next we define a function that computes weighted quantile estimates using a few different R packages, as well as the approach I suggested earlier.
Show R code
get_quantile_functions <- function(type = 7) {
  fn_list <- list(
    'proposed' = function(x, w, p) proposed_wtd_quantile(x = x, w = w, p = p, type = type),
    'survey' = switch(
      as.character(type),
      '1' = survey:::qrule_hf1,
      '2' = survey:::qrule_hf2,
      '3' = survey:::qrule_hf3,
      '4' = survey:::qrule_hf4,
      '5' = survey:::qrule_hf5,
      '6' = survey:::qrule_hf6,
      '7' = survey:::qrule_hf7,
      '8' = survey:::qrule_hf8,
      '9' = survey:::qrule_hf9
    ),
    'ggdist' = function(x, w, p) ggdist::weighted_quantile(x = x, probs = p, weights = w, type = type) |> unname()
  )
  if (type %in% 4:9) {
    fn_list[['collapse']] <- function(x, w, p) collapse::fquantile(x = x, probs = p, w = w, type = type) |> unname()
  }
  if (type == 7) {
    fn_list[['cNORM']] <- function(x, w, p) cNORM::weighted.quantile(x = x, probs = p, weights = w, type = "Type7") |> unname()
  }
  return(fn_list)
}
With all that out of the way, let’s run some simulations and evaluate the root mean squared error of different weighted estimators for the median. For each continuous quantile type (i.e., Hyndman and Fan types 4 through 9), we’ll repeat the process of drawing a sample and producing weighted quantile estimates 1,000 times. Then we’ll compare the weighted estimates against the finite population quantile to compute the root mean squared error (RMSE) of each estimator.
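In symbols, writing \(\hat{q}^{(s)}\) for a given estimator’s weighted median from simulated sample \(s\) and \(q_{\text{pop}}\) for the corresponding finite population quantile (notation introduced here just for clarity), the RMSE we compute is:
\[ \mathrm{RMSE} = \sqrt{\frac{1}{1000} \sum_{s=1}^{1000} \left( \hat{q}^{(s)} - q_{\text{pop}} \right)^2 } \]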
Show the code to run simulations
sim_results <- data.frame(
  variable = character(),
  type = numeric(),
  method = character(),
  rmse = numeric()
)

for (type_number in 4:9) {

  # Draw samples and estimate quantiles, for each variable
  x_normal_sim_output <- replicate(
    n = 1000,
    expr = {
      sample_data <- draw_sample()
      sapply(get_quantile_functions(type = type_number), \(fn) {
        fn(x = sample_data$x_normal, w = sample_data$wgt, p = 0.5)
      })
    }
  )
  x_skewed_sim_output <- replicate(
    n = 1000,
    expr = {
      sample_data <- draw_sample()
      sapply(get_quantile_functions(type = type_number), \(fn) {
        fn(x = sample_data$x_skewed, w = sample_data$wgt, p = 0.5)
      })
    }
  )

  # Compute finite population quantiles, for each variable
  x_normal_pop_quantile <- quantile(x = finite_pop$x_normal, probs = 0.5, type = type_number)
  x_skewed_pop_quantile <- quantile(x = finite_pop$x_skewed, probs = 0.5, type = type_number)

  # Compute RMSE for each method, for each variable
  x_normal_rmse_by_method <- apply(X = x_normal_sim_output, MARGIN = 1, FUN = \(estimates) {
    sqrt(mean((estimates - x_normal_pop_quantile)^2))
  })
  x_skewed_rmse_by_method <- apply(X = x_skewed_sim_output, MARGIN = 1, FUN = \(estimates) {
    sqrt(mean((estimates - x_skewed_pop_quantile)^2))
  })

  # Collect output into a dataframe, append to collected results
  sim_results <- sim_results |>
    rbind(
      data.frame(
        variable = "x_normal",
        type = type_number,
        method = names(x_normal_rmse_by_method),
        rmse = x_normal_rmse_by_method,
        row.names = NULL
      )
    ) |>
    rbind(
      data.frame(
        variable = "x_skewed",
        type = type_number,
        method = names(x_skewed_rmse_by_method),
        rmse = x_skewed_rmse_by_method,
        row.names = NULL
      )
    )
}
Now let’s look at the results.
Show the code to produce the table
sim_results |>
  tidyr::pivot_wider(names_from = "type", values_from = "rmse") |>
  relocate(variable, .before = method) |>
  mutate(
    method = factor(
      method,
      levels = c("proposed", "survey", "ggdist", "collapse", "cNORM"),
      labels = c("proposed", "survey package", "ggdist package", "collapse package", "cNORM package")
    )
  ) |>
  arrange(variable, method) |>
  gt::gt(groupname_col = "variable", row_group_as_column = TRUE) |>
  gt::fmt_number(decimals = 3, columns = -method) |>
  gt::sub_missing(missing_text = "N/A") |>
  gt::tab_spanner(label = "Hyndman and Fan Type", columns = as.character(4:9))
RMSE of weighted median estimates, by Hyndman and Fan quantile type and estimation method:

| variable | method | Type 4 | Type 5 | Type 6 | Type 7 | Type 8 | Type 9 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| x_normal | proposed | 0.106 | 0.099 | 0.101 | 0.096 | 0.101 | 0.098 |
| x_normal | survey package | 0.103 | 0.104 | 0.105 | 0.101 | 0.104 | 0.103 |
| x_normal | ggdist package | 0.103 | 0.104 | 0.105 | 0.101 | 0.104 | 0.103 |
| x_normal | collapse package | 0.103 | 0.104 | 0.105 | 0.101 | 0.104 | 0.103 |
| x_normal | cNORM package | N/A | N/A | N/A | 0.101 | N/A | N/A |
| x_skewed | proposed | 0.490 | 0.469 | 0.461 | 0.455 | 0.476 | 0.456 |
| x_skewed | survey package | 0.494 | 0.469 | 0.462 | 0.456 | 0.479 | 0.457 |
| x_skewed | ggdist package | 0.494 | 0.469 | 0.462 | 0.456 | 0.479 | 0.457 |
| x_skewed | collapse package | 0.494 | 0.469 | 0.462 | 0.456 | 0.479 | 0.457 |
| x_skewed | cNORM package | N/A | N/A | N/A | 0.456 | N/A | N/A |
All the methods turned out essentially the same here. Still, the proposed method performed a tiny bit better than the alternatives in most cases; the only place it did worse was the Hyndman and Fan type 4 quantile for the normally distributed variable. These are encouraging results, although a full simulation study would probe additional scenarios where things could turn out quite differently.
For the sake of documentation, the R package versions involved in this simulation study are shown below.
Click to show R session info
sessionInfo()
R version 4.5.1 (2025-06-13 ucrt) Platform: x86_64-w64-mingw32/x64 Running under: Windows 11 x64 (build 26100)
Matrix products: default LAPACK version 3.12.0
locale: [1] LC_COLLATE=English_United States.utf8 [2] LC_CTYPE=English_United States.utf8 [3] LC_MONETARY=English_United States.utf8 [4] LC_NUMERIC=C [5] LC_TIME=English_United States.utf8
time zone: America/New_York tzcode source: internal
attached base packages: [1] stats graphics grDevices utils datasets methods base
other attached packages: [1] ggplot2_4.0.0 dplyr_1.1.4 reticulate_1.43.0
loaded via a namespace (and not attached): [1] gt_1.1.0 ggdist_3.3.3 sass_0.4.10 [4] generics_0.1.4 tidyr_1.3.1 xml2_1.4.0 [7] lpSolve_5.6.23 lattice_0.22-7 digest_0.6.37 [10] magrittr_2.0.4 evaluate_1.0.5 grid_4.5.1 [13] RColorBrewer_1.1-3 fastmap_1.2.0 sampling_2.11 [16] jsonlite_2.0.0 Matrix_1.7-4 DBI_1.2.3 [19] survival_3.8-3 purrr_1.1.0 scales_1.4.0 [22] cli_3.6.5 mitools_2.4 rlang_1.1.6 [25] cNORM_3.5.0 leaps_3.2 splines_4.5.1 [28] withr_3.0.2 yaml_2.3.10 tools_4.5.1 [31] parallel_4.5.1 JuliaConnectoR_1.1.4 vctrs_0.6.5 [34] R6_2.6.1 png_0.1-8 lifecycle_1.0.4 [37] fs_1.6.6 htmlwidgets_1.6.4 MASS_7.3-65 [40] pkgconfig_2.0.3 pillar_1.11.1 gtable_0.3.6 [43] glue_1.8.0 Rcpp_1.1.0 collapse_2.1.3 [46] xfun_0.53 tibble_3.3.0 tidyselect_1.2.1 [49] rstudioapi_0.17.1 knitr_1.50 farver_2.1.2 [52] htmltools_0.5.8.1 survey_4.5 rmarkdown_2.29 [55] labeling_0.4.3 compiler_4.5.1 S7_0.2.0 [58] distributional_0.5.0
## Where Do We Go From Here?
I hope this blog post stimulates discussion among the authors of the software packages it highlights. The R package authors are all statisticians who have thought about this problem and landed on slightly different solutions, none of which seems to be a clear silver bullet. I plan to share this post with these authors to get their thoughts on whether the surprising behavior I highlighted is concerning and what approaches they might want to take to address it. For developers working on other software packages (like [Polars](https://github.com/pola-rs/polars/issues/10726) or [Survey.jl](https://github.com/xKDR/Survey.jl/issues/130)), I would hold off on implementing weighted continuous quantile estimators for now, until there’s better methodological consensus on how to implement them.
## Footnotes
1. Funny enough, the diversity of quantile options in software packages is in part an unintended legacy of Hyndman and Fan’s paper, which was written in part to convince software developers to just agree on using one single quantile method. Check out [this interesting blog post](https://robjhyndman.com/hyndsight/sample-quantiles-20-years-later/) reflecting on the paper 20 years later. [↩︎](#fnref1)
2. https://cran.r-project.org/web/packages/survey/vignettes/qrule.pdf [↩︎](#fnref2)
3. https://sebkrantz.github.io/collapse/reference/fquantile.html [↩︎](#fnref3)
4. https://htmlpreview.github.io/?https://github.com/mjskay/uncertainty-examples/blob/master/weighted-quantiles.html [↩︎](#fnref4)
5. See https://github.com/JuliaStats/StatsBase.jl/issues/435 and https://github.com/JuliaStats/StatsBase.jl/issues/933 [↩︎](#fnref5)