, I will present a solution to the Subset Sum Problem, which has linear time complexity O(n), if all the βnβ input values are βclose enoughβ to each other. We will see shortly what that condition actually means, as well as why it might often be kept (or almost kept) in practice. For the inputs that are βalmost closeβ to each other, the algorithm still has a certain probability of working fast enough.
This article is organized in the following way:
- In chapter 1, we will recall the Subset sum problem and show probably the most trivial solution to it, based on the brute-force technique.
- Chapter 2 recalls the class of NP-complete problems and how the Subset sum problem is related to it.
- Chapter 3 reviews the commonly used solution based on Dynamic programming technique, β¦
, I will present a solution to the Subset Sum Problem, which has linear time complexity O(n), if all the βnβ input values are βclose enoughβ to each other. We will see shortly what that condition actually means, as well as why it might often be kept (or almost kept) in practice. For the inputs that are βalmost closeβ to each other, the algorithm still has a certain probability of working fast enough.
This article is organized in the following way:
- In chapter 1, we will recall the Subset sum problem and show probably the most trivial solution to it, based on the brute-force technique.
- Chapter 2 recalls the class of NP-complete problems and how the Subset sum problem is related to it.
- Chapter 3 reviews the commonly used solution based on Dynamic programming technique, and highlights why it is a pseudo-polynomial algorithm.
- In chapter 4, I will introduce the new algorithm, called βInterval-based solutionβ.
- Chapter 5 will show that, similarly to the Dynamic programming technique, the Interval-based algorithm can also restore the exact subset of the items that form given target sum, and not just answer if the target sum is achievable or not.
- Finally, in chapter 6, we will see why the Interval-based algorithm actually reaches linear time complexity, if all the βnβ input values are close enough to each other (as well as we will understand there what the βclose enoughβ condition actually means).
1. Recalling the Subset sum problem
The definition of the Subset Sum Problem (SSP) is quite simple:
We are given βnβ input integers X={x1, x2, β¦, x**n}, and a target sum βqβ. It is necessary to figure out if there exists such a subset βYβ of those βnβ integers, numbers of which will sum up exactly to βqβ. If it does, the problem might also ask to report all the numbers of the subset βYβ.
Here is an example of input to the SSP:
The input set βXβ with βn=8β input values is presented on the right.
The target sum βq=22β is shown on the left.
And here is its solution:
Choosing the input values 14, 2, and 6 results in the target sum βqβ: β14+2+6=22β.
Note, there can be cases with more than one solution, for a given target sum βqβ:
For the target sum βq=30β, we have two options: β2+13+15=30β and β6+24=30β.
Also, there can be cases when there is no solution at all, meaning that there is no such subset, integers of which will accumulate exactly to the given target sum βqβ:
No subset of the input items βXβ will sum up to 25.
In this article, we will consider only positive input values xi** β X. However, all the ideas to be presented beneath can be applied (with minor modifications) also to the case when there are negative input values as well.
In the Subset sum problem (SSP), a slight change in the input values can lead to a complete change of the answer. Like, if there are many subsets that form the target sum βqβ, it doesnβt mean that there will be even one subset, that forms the target sum βq+1β. This fact also makes the SSP not an easy one to solve.
Even when there are not many input numbers, it might still be difficult to solve the problem on a piece of paper. Often, we will need to consider all different subsets and check which one of them accumulates towards βqβ. That approach lies in the trivial solution of SSP, brute-force, which just enumerates all possible subsets.
The idea for brute-force implementation is: considering that there exists such a subset YβX, items of which accumulate towards βqβ, letβs address the last input item x**n. There are two scenarios:
- either x**n participates in the result subset βYβ,
- or it doesnβt.
Having that said, if x**n does participate in βYβ (x**nβY), then we should continue searching for such a subset from the reduced set of numbers β*X *\ {x**n} = {x1, x2, β¦, x**n-1}β, which will accumulate now to βqβxnβ:
While summing up βq=22β, we decided to take input item β6β into the result subset βYβ.
So, from the remaining numbers, we should build up the sum β22-6=16β.
Otherwise, if the case is that x**n doesnβt participate in βYβ (x**nβY), then we should continue searching for such a subset from the reduced set of numbers β*X *\ {x**n} = {x1, x2, β¦, xn-1}β, which will itself accumulate to the same target sum βqβ:
While summing up βq=22β, we decided not to take input item β6β into the result subset βYβ.
So, from the remaining numbers, we should build up the same sum β22β.
Surely, we donβt know in advance which case is true; thatβs why the brute-force algorithm just tries one case after another.
When trying to solve the reduced problem (i.e., finding a proper subset from the reduced set of numbers β*X *\ {x**n} = {x1, x2, β¦, x**n-1}β), the brute-force algorithm applies the same logic recursively. So, regardless of which branch we took, next, the value x**n-1 will be considered, and at first, we will look for a solution where x**n-1 does participate in the result subset, after which weβll look for such a solution where it doesnβt.
The decision diagram for the Subset Sum Problem. We start at the rightmost state, and on every step we either take xi into the result subset (the upper branch), or we donβt take it (the lower branch). The numbers near every state show the remaining sum that is necessary to gather.
While recursion deepens, if the remaining target sum becomes zero, it means that we are currently on the proper branch, and the considered numbers already accumulate to the original target sum βqβ.
Following the proper path (red arrows), we will eventually build the target sum βq=22β.
The pseudo-code of the mentioned brute-force algorithm becomes like this:
// Searches for such a subset of βXβ which has a sum equal
// to βqβ, and places it into βYβ. The set βXβ is assumed
// to have βnβ integers.
procedure ssp_brute_force(
X[1..n]: set of Integers,
n: Integer,
q: Integer,
Y: Reference to set of Integers )
if q==0 then
// By picking the proper integers into βYβ, we have
// iteratively reduced βqβ to zero, so the solution
// subset is already in βYβ.
print Y
return // No need to do anything else in this branch
if n==0 then
// βqβ is not zero, while the input set βXβ is
// exhausted. This branch didnβt led to a solution.
return
// Try with βX[n]β in the solution subset
place X[n] into Y
ssp_brute_force( X\{X[n]}, n-1, q-X[n], Y )
// Continue searching with the reduced set
// (βXβ without the last integer X[n]), for
// such a subset, which sums to βq-X[n]β.
remove X[n] from Y
// Try without βX[n]β
ssp_brute_force( X\{X[n]}, n-1, q, Y )
// Continue searching with the reduced set
// (βXβ without the last integer X[n]), for
// such a subset, which still sums to βqβ.
Upon the pseudo-code, we see that solving the problem for βnβ input items requires solving two problems, each with βn-1β input items. Each of these will, in its turn, require solving two reduced problems, this time with βn-2β input items, resulting overall in 4 problems with βn-2β input items each. This way, the number of problems doubles on each arrival of a new input item, which makes the time complexity of the brute-force algorithm exponential β O(2n).
The height of the decision diagram is exponential β 2n, while its width equals βnβ.
In practice, this algorithm can be used only if βnβ is sufficiently small.
However, the next algorithms for the Subset sum problem which will be described in this article, still rely on the logic of βusing vs. not usingβ the current item.
2. Subset sum and NP-complete problems
There is a class of problems in Computer Science, called βNP-completeβ, which consists of such problems that are considered difficult to solve. The described Subset Sum Problem belongs to the NP-complete class. More precisely, in order for a problem to be NP-complete, the following criteria must be met:
- It is a decision problem, meaning that for any input to the problem, the output is either βyesβ or βnoβ.
- Regarding the Subset sum problem, any input consisting of a set of input values X = {x1, x2, β¦, x**n} and a target sum βqβ results in a βyesβ or βnoβ answer: we either can pick a subset which accumulates towards βqβ, or we canβt.
- When the answer is βyesβ, this can be demonstrated through the existence of a short (polynomial length) solution.
- In regard to the SSP, if it is possible to pick and present such a subset YβX, we can easily sum up all its integers, and check if that really equals βqβ or not.
- The correctness of each solution can be verified quickly (namely, in polynomial time), and a brute-force search algorithm can find a solution by trying all possible solutions.
- The brute-force solution for Subset sum problem is what we have just recalled in the previous chapter.
- The problem can be used to simulate every other problem for which we can quickly verify that a solution is correct. Hence, if we could find solutions of some NP-complete problem quickly, we could quickly find the solutions of every other problem to which a given solution can be easily converted.
The last statement shows that there are also other NP-complete problems, as well as that any of them can be converted into another. So, if a polynomial-time algorithm for one NP-complete problem will be found, we could use it to solve any other NP-complete problem, by converting it to the former one. Here is a common conversion diagram between different NP-complete problems:

We see that, for example, the βBoolean formula satisfiability problemβ can be converted into the βSubset sum problemβ, while the latter one can be converted into the βKnapsack problemβ.
3. The Dynamic programming solution of the Subset sum problem
The common solution of the Subset Sum Problem (SSP) uses a Dynamic Programming technique. The principle of any Dynamic Programming algorithm lies in initially solving problems of smaller sizes, and later using those solutions to solve the initial large problem.
Letβs βSβ be a boolean matrix having dimensions βS[0..n][0..q]β, with βS[i][p]β denoting if we can gather the sum βpβ by choosing only between the first βiβ input items.
X = {8, 4, 11, 3, 9}, q = 28.
The matrix βSβ with dimensions β(n+1)*(q+1)β.
In the first chapter, we already expressed the value βS[n][q]β in a recursive way. βS[n][q]β denotes whether we can obtain the sum βqβ by choosing between all the βnβ input integers, which is the answer to the initial problem. We addressed two cases there:
- If the last item βx**nβ participates in the result subset βYβ, then we should be able to obtain the sum βqβx**nβ, by choosing between the first βn-1β input items. In other words, βS[n-1][qβx**n]β should be equal to βtrueβ.
- Otherwise, if the last item βx**nβ does not participate in the result subset βYβ, then we should manage to obtain the target sum βqβ by choosing also between the first βn-1β input items. In other words, βS[n-1][q]β should be equal to βtrueβ then.
There is no third option. If either of these two conditions is satisfied, then the target sum βqβ is reachable. If they both are satisfied, it means that βqβ can be obtained both by picking βx**nβ and by not picking it into the result subset βYβ.
So the formula for the initial problem becomes:
[\begin{equation*} S[n][q] = S[n-1][q-x_n] \ or \ S[n-1][q] \end{equation*}]
And it works not only for βS[n][q]β but also for any βS[i][p]β, as the logic remains the same:
[\begin{equation*} S[i][p] = S[i-1][p-x_i] \ or \ S[i-1][p] \end{equation*}]
Now, when filling the matrix βS[0..n][0..q]β, we see that the value of any cell βS[i][p]β depends only on two other cells, both situated in the row above:
*Value of the cell S[3][14] (the light red one) depends only on the values of the two cells *
S[2][14] and S[2][3] (the light blue ones).
This means we can calculate the matrix in top-down order, which will guarantee that at the moment βS[i][p]β is being calculated, we already know the values of both βS[i-1][pβx**i]β and βS[i-1][p]β.
The matrix will be filled in the top-down direction, calculating cells of every row from left to right.
The yellow cells are not calculated yet.
What is required to start the calculation is the content of the first row βS[0][p], pβ[0..q]β. The cell βS[0][p]β denotes if it is possible to gather sum βpβ, having only the first 0 input items (i.e. having no item at all)β. Obviously, the answer is βfalseβ if βp>0β and is βtrueβ only if βp==0β (we can gather the sum 0, without using any item).
Regardless of the set of input items βXβ, the first row of table βSβ always has βS[0][0] = trueβ, and βS[0][p] = falseβ, when βpβ₯1β. The yellow cells are not calculated yet.
Having that said, we can calculate all cells of the table in the top-down order. For our input set, the result will be:
The last cell βS[5][28] = trueβ, which means that the target sum βq=28β can be obtained
by choosing between all 5 input items.
The pseudo-code of the Dynamic Programming solution becomes:
// Given an n-long set of integers βXβ, returns if it is possible
// to find such a subset of it, which sums up to βqβ.
function ssp_dynamic_programming(
X[1..n]: set of Integers,
n: Integer,
q: Integer ) : Boolean
S[0..n][0..q]: Matrix of Booleans
// Fill first row of βSβ
S[0][0] := true
for p in [1..q]
S[0][p] := false
// Fill content of the matrix
for i in [1..n]
for p in [0..q]
if p < x[i]
S[i][p] := S[i-1][p]
else
S[i][p] := S[i-1][p-x[i]] or S[i-1][p]
// The answer is in the bottom-right cell
return S[n][q]
The bottom-right cell βS[n][q]β will contain the answer to the original problem, stating if we can gather the sum βqβ by choosing between all the βnβ input items.
The presented solution requires filling the matrix βSβ, which has β(n+1)*(q+1)β cells. Calculating each cell is done in O(1) time. Hence, the time complexity of the Dynamic Programming algorithm becomes O(nq). This solution is called pseudo-polynomial, because here the factor βqβ is not proportional to any polynomial of the problemβs size. In fact, βqβ can be proportional even to the exponent of problemβs size.
4. Introducing the Interval-based algorithm
In the general case, the table βSβ filled in the previous chapter can have quite unpredictable content at the end. However, if the input values βX = {x1, x2, β¦, xn}β do satisfy certain conditions, rows of the filled table might contain many adjacent cells with βtrueβ values, as well as many adjacent cells with βfalseβ values.
X = {9, 4, 3, 12, 5}, q = 30.
For the presented input, the last row contains one long range of true-values, starting from S[5][12] till S[5][21].
In the current row, letβs denote those adjacent cells with a value of βtrueβ as βtrue-intervalsβ, and the adjacent cells with a value of βfalseβ as βfalse-intervalsβ.
Some true-intervals are highlighted in green, while some false-intervals in red.
If we know in advance that the calculated table βSβ will have sufficiently long true-intervals and/or sufficiently long false-intervals at the end, it will make sense to compute βSβ not cell-based (as we did in the previous chapter), but interval-based.
Then, instead of representing every row as a boolean array, we will represent it as a sequence of ordered true-intervals. Within the current row, the true-intervals never intersect, so we can keep them sorted by their starting points. Also, for simplicity of the formulas to come later, when denoting a true-interval, we will mean a half-opened range [a..b). So, for example, a true-interval [5..8) will mean that on the current row, the cells 5, 6, and 7 are equal to βtrueβ, while cell 8 is βfalseβ.
The last row contains the following true-intervals: { [0,1), [3,6), [7,10), [12,22), [24,27), [28,31) }. All those are highlighted in green. The set of true-intervals uniquely identifies the content of the last row of the cell-based table.
Now, having the true-intervals (later denoted just as βintervalsβ, as false-intervals do not play a role in the algorithm to be described) of the previous row βi-1β, how should we compute the intervals of the current row βiβ? Letβs recall the formula for filling the table:
[\begin{equation*} S[i][p] = S[i-1][p-x_i] \ or \ S[i-1][p] \end{equation*}]
If altering it for a moment, and assuming that there is only the second part of the OR-expression:
[\begin{equation*} S[i][p] = S[i-1][p] \end{equation*}]
it will result in all cells of the previous row being copied into the current row. In other words, for that it would be just enough to copy the intervals from row βi-1β to row βiβ.
Copying intervals from row βi-1β (light green bricks) to row βiβ (dark green bricks).
On the other hand, if altering the formula in the other way, assuming that there is only the first part of the OR-expression:
[\begin{equation*} S[i][p] = S[i-1][p-x_i] \end{equation*}]
it will still result in copying all the cells from the previous row to the current row, but this time shifted by βx**iβ positions rightwards. So that is what will be necessary to do also with the intervals:
Copy-and-shifting rightwards by βxiβ all the intervals from row βi-1β (light green bricks) to row βiβ (dark green bricks).
Finally, referring to the original formula:
[\begin{equation*} S[i][p] = S[i-1][p-x_i] \ or \ S[i-1][p] \end{equation*}]
it requires us to do βORβ on the two obtained sequences of intervals, which is the same as uniting them geometrically.
Geometrically uniting the original and shifted rightwards by βxiβ sequences of true-intervals (first two rows) results in the sequence of true-intervals for the next item βxiβ (the bottom row).
Summarizing what was said above, when filling the table of true-intervals, to compute the content of row βiβ we should:
- shift the content of row βi-1β to the right by βx**iβ positions, and
- unite it with the original content of row βi-1β.
Note, this also means that the span of row βiβ (right endpoint of the right-most interval) will always be by βx**iβ units greater than the span of row βi-1β.
Spans of rows grow by sizes, equal to values βxiβ. Span of row βx1=9β is β9β. Span of row βx2=4β is β9+4=13β. Span of row βx3=3β is β9+4+3=16β, and so on.
So the span of any row βiβ can be calculated as:
[\begin{equation*} span(i) = 1 + x_1 + x_2 + β¦ + x_i = 1 + \sum_{k=1}^i x_k \end{equation*}]
where the free term β1β comes because of all the intervals being half-open (the initial interval at row 0 is [0, 1), so βspan(0) = 1β).
Considering that the previous row βi-1β has βcβ intervals, shifting them all by x**i will obviously require O(c) time. Calculating the union of 2 sequences, each having βcβ intervals, can also be performed in O(c) time if scanning both sequences from left to right simultaneously. This means that the transition from the previous row to the current one requires O(c) time.
Assuming that the maximal number of intervals on any row is βcβ, the time complexity for the Interval-based algorithm becomes O(nc).
The important note that we should make here is that the value βcβ significantly depends on the order of input values βX = {x1, x2, β¦, x**n}β. Here is an example with βn=5β values, which at the end produces lots of intervals in the table.
X = {6, 11, 4, 2, 5}.
The βn=5β input values are considered in arbitrary order. This results in lots of short intervals in the middle of the table.
And here is the problem solved with the same set of input values βX={x1, x2, β¦, x**n}β, but arranged in a different order, more precisely, in ascending order.
X = {2, 4, 5, 6, 11}.
The same input set with βn=5β values, but now arriving in increasing order. We see that numbers of intervals in intermediate rows are reduced significantly.
Such a behavior is quite intuitive: if we want the intervals of the current row to remain as few as possible, their span should grow as slowly as possible. And an obvious way to achieve that is to consider all the input values x**i in ascending order.
5. Restoring the exact subset of values
The Dynamic programming solution of the Subset sum problem, which we reviewed in chapter 3, not only answers if it is possible to obtain given sum βqβ from the input items βX = {x1, x2, β¦, xn}β, but also specifies the exact subset of βXβ (letβs denote it by βYβ, βYβXβ), items of which sum up to βqβ. To retrieve βYβ, we must move over the table in the opposite direction β from bottom to top.
The last cell of the table βS[n][q]β, which contains the answer to the original problem, was calculated by the formula:
[\begin{equation*} S[n][q] = S[n-1][q-x_n] \ or \ S[n-1][q] \end{equation*}]
If βS[n][q]β appears to be true (meaning that the target sum βqβ is reachable), it means that at least one of the 2 operands of the OR-expression is also βtrueβ. So, we can check 2 cases:
- if βS[n-1][q] == trueβ, it means that the target sum βqβ can be obtained also by choosing only between the first βn-1β input items. So the last item βx**nβ does not participate in βYβ, and we can continue building βYβ only from the first βn-1β items.
- if βS[n-1][qβx**n] == trueβ, it means that the first βn-1β input items participated in building the sum βqβx**nβ, after which we added the last item βx**nβ, and resulted with the necessary sum βqβ. So the last item βx**nβ was actually used, and we must build now another target sum βqβx**nβ, by choosing between only the first βn-1β input items.
This judgement answers whether the last item βx**nβ participates in the target sum βqβ. But the same logic also works for figuring out the participation of every other input item βx**iβ, in any other target sum βpβ. Recalling the general formula by which the entire table was filled:
[\begin{equation*} S[i][p] = S[i-1][p-x_i] \ or \ S[i-1][p] \end{equation*}]
a similar judgement can be made in the case if any βS[i][p] == trueβ, which will help us to understand if the item βx**iβ participates in making sum βpβ, or not.
So to reconstruct the complete subset βYβ, we just apply this principle repeatedly. The pseudo-code of retrieving βYβ becomes:
// Given the filled matrix βSβ, returns an exact subset of
// the n-long set βXβ, integers of which sum up to βqβ.
function obtain_subset(
X[1..n]: set of Integers,
n: Integer,
q: Integer,
S[0..n][0..q]: Matrix of Booleans ) : Set of Integers
if S[n][q] == false then
return β
// βqβ canβt be obtained
Y := empty set of Integers
while n > 0
// Here we know that βS[n][q]β is always true
if S[n-1][q] == true then
// Item βx[n]β can be not used in βYβ
else
// Item βx[n]β should be used in βYβ
insert X[n] into Y
q := q-X[n] // Remains to build the sum βq-X[n]β
// Move upwards to the preceding item
n := n-1
return Y
Time complexity of this function is O(n), as on every iteration we move upwards by one row over the table, while its height is βn+1β.
In our example, reconstructing the result subset βYβ is performed by the following path:
X = {9, 4, 3, 12, 5}, q = 30.

Step 1: Here we have βS[n][q] = S[5][30] = trueβ (rightmost green circle), which means that the target sum βq=30β is reachable. We see that βS[4][30] = falseβ (blue circle above), so the sum β30β canβt be obtained if choosing between the first 4 items. So, we need the item βx5=5β for construction.
Step 2: It remains to construct the sum β30-5=25β, by choosing between the first 4 items (2nd green circle). We see that βS[3][25] = falseβ (blue circle), which means that β25β canβt be constructed if choosing only between the first 3 items. So we need to use the item βx4=12β.
Step 3: It remains to construct the sum β25-12=13β, by choosing between the first 3 items (3rd green circle). We see that βS[2][13] = trueβ (4th green circle), which means that β13β can also be gathered by choosing between only the first 2 items. Thatβs why we skip item βx3=3β.
Step 4: Now we should construct the sum β13β, by choosing between the first 2 items. We see that βS[1][13] = falseβ (blue circle), which means that we canβt gather the sum β13β if using only the first item βx1β. So we need to use βx2=4β for that.
Step 5: It remains to construct the sum β13-4=9β, by choosing only the first input item (5th green circle). So we pick it up, as βx1=9β.
Can we act similarly in the Interval-based solution? We see that the only thing needed to reconstruct subset βYβ in the Dynamic programming solution is knowing the value of cell βS[i][p]β, which, in relation to the Interval-based solution, is knowing whether the i-th sequence of intervals covers the coordinate βpβ. That can be checked by running a Binary search over the current i-th sorted sequence of intervals.
The interval-based solution for the same sequence of input values X = {9, 4, 3, 12, 5}. The path to reconstruct subset βYβ remains the same, and the intervals that will help in reconstruction are highlighted in yellow.
Assuming that the i-th sequence contains βc**iβ intervals, binary search will take O(log c**i) time. To retrieve the complete subset βYβ, we run binary search on each of the βnβ sequences of intervals. If there are at most βcβ intervals on each row, retrieval of the result subset βYβ will run in O(n log c) time. Techniques like Fractional cascading can probably be applied here to speed up the subset retrieval process.
6. Time complexity of the Interval-based algorithm
In Chapter 4, we derived the time complexity of the Interval-based algorithm as O(nc), where βnβ is the number of input values and βcβ is the maximal number of intervals on a row. We also noted there that the order of input values βX = {x1, x2, β¦, xn}β matters significantly, and βcβ generally results in a smaller value when items of βXβ arrive in an increasing order. Now, are there such input cases for which the value βcβ will be really small?
Case 1: A geometric progression
First letβs consider the case when values of βXβ form a geometric progression with initial value of β1β and a common ratio of β2β:
[\begin{equation*} X = { 1, 2, 4, 8, 16, β¦, 2^{n-1} } : x_i = 2^{i-1} \end{equation*}]
What shape will have the set of constructed intervals?
As we already know, having intervals of the previous row βi-1β, to calculate intervals of the current row βiβ, there are 2 steps:
- offset all intervals of row βi-1β by βx**iβ to the right,
- unite the offsets with original content of row βi-1β.
Doing that on the mentioned input will result in:
- row 0 always has only one interval [0, 1), stating that only the sum β0β is achievable if using no input item at all.
- as βx1=1β, the shifted interval becomes β[0,1) >> 1 = [1,2)β, and after uniting with the original one we have: β[0,1) κ΄ [1,2) = [0,2)β,
- as βx2=2β, the shifted interval becomes β[0,2) >> 2 = [2,4)β, and after uniting with the original one we have: β[0,2) κ΄ [2,4) = [0,4)β,
- as βx3=4β, the shifted interval becomes β[0,4) >> 4 = [4,8)β, and after uniting with the original one we have: β[0,4) κ΄ [4,8) = [0,8)β,
- and so onβ¦
We see that each row βiβ has just 1 interval: [0,2i).
X = {1, 2, 4, 8, 16, β¦}
If the input values of βXβ form the geometric progression X = {1, 2, 4, 8, 16, β¦}, every row of the table will have just one interval.
Of course, the presented case is very special, and it is not a surprise that any target sum βq β [0, 2n)β can be constructed from those βnβ numbers. If representing βqβ in the binary numeral positional system, then its β1β digits will correspond to the powers of β2β (the input values), which should be added up to get βqβ.
Case 2: Slower than a geometric progression
Values grow rapidly in any geometric progression. In the previous case, as the common ratio was equal to 2, we could also express every value βx**iβ as the sum of all previous values, plus 1:
[\begin{equation*} x_i = 1 + \sum_{k=1}^{i-1} x_k \end{equation*}]
What if the sequence βXβ grows more slowly? In other words, what if after sorting all values of βXβ in an increasing order, each value βx**iβ turns out less than or equal to the sum of all previous values, plus 1?
[\begin{equation*} \hspace{1cm} x_i \leq 1 + \sum_{k=1}^{i-1} x_k \hspace{1cm} (1) \end{equation*}]
To understand how the intervals will be derived in such a case, letβs recall from Chapter 4 that the span of row βiβ equals the sum of all previous and the current input values, plus 1:
[\begin{equation*} span(i) = 1 + \sum_{k=1}^{i} x_k \end{equation*}]
Now, if every value βx**iβ is less than or equal to the sum of all previous values, it means that βxiβ is also less than the span of its previous row:
[\begin{equation*} x_i \leq 1 + \sum_{k=1}^{i-1} x_k = span(i-1) \end{equation*}]
which means that if there is only one interval at row βi-1β, uniting it with its offset rightwards by βx**iβ, will still result in only one interval at row βiβ. Initially, we have only one interval in row βi=0β. So, in the case when input values grow more slowly than the mentioned geometric progression (1), each row of the table will have only one interval.
X = {1, 2, 3, 6, 11, 20},
When the input values grow slower than the geometric progression with a common ratio of β2β, we still have only one interval per row.
Concluding this case, if after sorting all the input values of βXβ increasingly, we have the constraint:
[\begin{equation*} x_i \leq 1 + \sum_{k=1}^{i-1} x_k \end{equation*}]
the table will have only βc=1β interval at every row, and the Interval-based algorithm itself will run in guaranteed O(n) time. Letβs note that from the practical perspective, such a constraint might often be satisfied. If the input data comes in random order, then an additional O(n log n) time will be required to sort it.
Case 3: Almost slower than a geometric progression
Letβs generalize the previous case 2, which had the constraint:
[\begin{equation*} x_i \leq 1 + \sum_{k=1}^{i-1} x_k \end{equation*}]
Now we will allow for it to be violated from time to time.
Once that happens for some βx**iβ, we can expect that the number of intervals βcβ in row βiβ will become greater. But how long can βcβ grow? Letβs observe it on the following example:
n = 7, X = { 1, 2, 5, 7, 10, 28, 30 },
It will be easier for us to write down another array βXPβ, where βxpiβ is equal to the sum of all preceding values:
XP = { 0, 1, 3, 8, 15, 25, 53, 83 }
β¦ we see that the mentioned condition is violated at the input values βx3=5β (as βx3>xp3+1β) and βx6=28β (as βx6>xp6+1β).
Now letβs construct the intervals. This time, as the span of all the inputs is large (βspan(n) = 83+1 = 84β), for the last 2 rows, we will write down the sequences of intervals, instead of presenting them geometrically:

As we can see, if for some βx**iβ the mentioned condition is violated, the number of intervals is doubled on that row. In our example, that took place for input values βx3β and βx6β. It happens because all the shifted intervals appear rightwards from the span of the current intervals:
If the current item xi is greater than the sum of all previous items plus 1, its shifted intervals (top-right sequence) will not have any common point with the current intervals (top-left sequence), which is why after uniting those two sequences, the number of intervals will be doubled (bottom sequence).
However, on the other rows, the number of intervals tends not to increase much, because many of them, after being united with the current rowβs content, just turn into one long interval. In our example, that explicitly happened at the last row βx7=30β.
If the current item xi is less than or equal to the sum of all previous items plus 1, its shifted intervals (top-right sequence) might have many common fragments with the current intervals (top-left sequence), which is why after uniting those two sequences (the bottom sequence), lots of intervals might be merged into one. Here 6 intervals are being merged into one long interval at the bottom-center.
The naming of this article comes from the fact that if all input items βX = {x1, x2, β¦, x**n}β are close enough to each other, many intervals will tend to unite during the top-down construction of the table, so the constant βcβ might remain bounded with certain probability. If that is the case, we will end up with βO(nc) = O(n)β linear time solution for the Subset Sum Problem, assuming that the input values arrive in a sorted way. Otherwise, the initial sorting of values of βXβ becomes the most time-consuming step, requiring additional βO(n log n)β time.
Case 4: Faster than a geometric progression
Letβs consider one more case, when the sorted input values βX = {x1, x2, β¦, x**n}β grow faster than the mentioned geometric progression with a common ratio of 2. That will happen if every value βx**iβ is greater than the sum of all previous values, plus 1:
[\begin{equation*} \hspace{1cm} x_i > 1 + \sum_{k=1}^{i-1} x_k \hspace{1cm} (2) \end{equation*}]
In the previous case 3, we already noted that once it happens for some βx**iβ (i.e., when βx**iβ appears rightwards from the span of all previous values), the number of intervals doubles, because no common fragments remain between the current and the shifted sequences of intervals.
And in the current case, when βXβ grows faster than a geometric progression with a common ratio of 2, it happens for every input value βx**iβ, so the number of intervals doubles on every row, resulting in an exponential growth.
Letβs consider the following example:
n = 6, X = { 2, 5, 9, 20, 39 }.
The array with sums of all preceding values will be:
XP = { 0, 2, 7, 16, 36, 75 }.
So we see that every item βx**iβ is greater than the sum of all previous items, plus 1. Constructing intervals will result in the following table:

So we see that if input values βXβ have the mentioned constraint (2), proceeding with the interval-based algorithm is not the best idea, as it will result in 2n short intervals at the last row, thus requiring O(2n) time to run. If we know in advance that this will be our case, another decision-based algorithm can be applied, which will pick a necessary subset of βXβ in linear time O(n).
Conclusion
In this article, we have observed a novel approach for solving the Subset Sum Problem, called βInterval-based algorithmβ. It is similar to the Dynamic Programming solution, with the difference that here we operate not on individual cells of the table, but on its continuous ranges.
We have observed 4 special distributions of input values, and proved that if the inputs are dense enough, the Interval-based algorithm runs in linear time. We have also shown that when the inputs are βalmost denseβ, the algorithm might still work fast enough. Similar to the Dynamic Programming solution, the Interval-based algorithm allows obtaining the exact subset, items of which sum up to the given target.
The Subset Sum Problem is one of the NP-complete problems; thus, this article highlights another special case of inputs, for which they can be solved fast enough.
I am happy that you have read till the end, and thanks for your interest!
All the used illustrations are prepared by Lilit Danielyan ( https://www.behance.net/lilitdanielyan1 ).
If you enjoyed reading, feel free to connect with me on LinkedIn, and DM to share your thoughts. I would certainly like it! ( https://www.linkedin.com/in/tigran-hayrapetyan-cs/ ).
All used images, unless otherwise noted, are designed by request of the author.