
I have about 10000 vectors of about 100 elements (floats) and 1 target vector, also of 100 elements. My goal is to approach this target vector by using some combination of these 10000 vectors. However, I can use every vector only once, and the elements can be positive or negative. Does anybody have any thoughts on this, and is it even possible to optimize?

A small example would be:

v1 = [1.5, 3.0, -1.5]
v2 = [-0.5, -1.0, 3.0]
v3 = [1.0, 0.0, 0.0]
target = [0.5, 2.0, 1.0]
# the best combination here would be v1 and v2

PS. I'm using Julia, but Python, C(++), or thoughts on some algorithm are also very welcome.

  • Ooh sorry, just values, 64-bit floating points Commented Apr 18, 2021 at 18:46
  • In other words, you have 10K parameters? That's a lot... Commented Apr 18, 2021 at 18:57
  • This seems to be a 0-1 integer linear programming problem, which is NP-hard. Commented Apr 18, 2021 at 19:01
  • Can you have coefficients (other than just 0 or 1) for each vector? Commented Apr 18, 2021 at 19:03
  • You can set this up as an ILP as mentioned above, but the rub is that you have basically 2**10,000 combinations to explore, which, without any additional problem knowledge or constraints, is just wayyy too huge. Commented Apr 18, 2021 at 21:40

1 Answer


Reading the comments, it seems that the interpretation of the problem is: minimize distance(sum(w[i] * v[i]) - target), for each w[i] in {0, 1}.

If we use the standard Euclidean distance, this is not even a MILP (mixed integer linear programming) problem; it is a mixed integer quadratic programming problem. Since you did not define the distance, I will use norm1 as the measure of distance.
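To make the objective concrete, here is a small sketch (an illustration, not part of the answer's solver code) that evaluates the norm1 objective for one fixed 0/1 selection w, using the example vectors from the question:

```python
import numpy as np

# Example vectors and target from the question.
v1 = np.array([1.5, 3.0, -1.5])
v2 = np.array([-0.5, -1.0, 3.0])
v3 = np.array([1.0, 0.0, 0.0])
target = np.array([0.5, 2.0, 1.0])

X = np.stack([v1, v2, v3])      # shape (3, 3): one candidate vector per row
w = np.array([1, 0, 1])         # hypothetical selection: pick v1 and v3

# norm1 of the residual between the selected sum and the target
objective = np.abs(w @ X - target).sum()
print(objective)                # 5.5 for this particular selection
```

The optimization problem is to find the 0/1 vector w that makes this quantity as small as possible.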

As mentioned in the comments, you have basically 2**10,000 possibilities. But fortunately MILP solvers can use bounds to prune the search (e.g. branch and bound), so the complexity in the typical case will be much less.
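For intuition about what the solver is pruning, a naive brute force over all 2**n selections can be sketched as follows (my own illustration; only feasible for very small n, which is exactly why a MILP formulation is needed at 10000 x 100):

```python
import itertools
import numpy as np

def brute_force_best(X, target):
    """Exhaustively try all 2**n 0/1 selections and keep the best one."""
    X = np.asarray(X, dtype=float)
    target = np.asarray(target, dtype=float)
    n = X.shape[0]
    best_w, best_cost = None, float("inf")
    for bits in itertools.product([0, 1], repeat=n):
        w = np.array(bits)
        cost = np.abs(w @ X - target).sum()  # norm1 distance to the target
        if cost < best_cost:
            best_w, best_cost = w, cost
    return best_w, best_cost
```

On the question's 3-vector example this returns w = [1, 1, 0] (i.e. v1 and v2) with a norm1 cost of 1.0, matching the expected answer; branch and bound aims to reach the same optimum without visiting all 2**n leaves.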

I posted a solution for the problem without the constraint w[i] in {0, 1} some time ago here, and that can easily be modified for the problem we are talking about.

import numpy as np
import cvxpy

def place_there_ilp(X, target):
    # some linear algebra arrangements
    target = np.array(target).reshape((1, -1))
    ncols = target.shape[1]
    X = np.array(X).reshape((-1, ncols))
    N_vectors = X.shape[0]
    # variable of the problem
    w = cvxpy.Variable((1, X.shape[0]), integer=True)
    # objective: minimize the norm1 of w @ X - target (@ is the matrix product)
    P = cvxpy.Problem(cvxpy.Minimize(cvxpy.atoms.norm1((w @ X) / N_vectors - target)),
                      [w >= 0, w <= 1])
    # here it is solved
    P.solve()
    return w.value

Trying with the problem instance you provided

v1 = [1.5, 3.0, -1.5]
v2 = [-0.5, -1.0, 3.0]
v3 = [1.0, 0.0, 0.0]
target = [0.5, 2.0, 1.0]
place_there_ilp([v1, v2, v3], target)

gives

array([[1., 1., 0.]]) 

That means 1 * v1 + 1 * v2 + 0 * v3, i.e. v1 and v2 are selected.
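As a quick sanity check (my own addition, not part of the answer), summing the selected vectors and measuring the norm1 distance to the target confirms the selection:

```python
import numpy as np

v1 = np.array([1.5, 3.0, -1.5])
v2 = np.array([-0.5, -1.0, 3.0])
target = np.array([0.5, 2.0, 1.0])

approx = v1 + v2                            # w = [1, 1, 0] keeps v1 and v2
residual = np.abs(approx - target).sum()    # norm1 distance
print(approx)      # [1.  2.  1.5]
print(residual)    # 1.0
```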

You will probably have a hard time running your 10000 x 100 problem, but I would say it is possible.

With the code below I tried a 400 x 100 problem, which runs in 1.38 s; 800 x 100 runs in 9.64 s.

X = np.random.randn(801, 100)
# 800 x 100 problem
place_there_ilp(X[1:, :], X[0, :])

5 Comments

It's a bit presumptuous to assume that everyone reading your answer would know a) Python and b) some unmentioned library you seem to use (cvxpy could be anything). As such, the value of your answer is asymptotically 0 for anyone not knowing the answer before reading it. ;)
@BitTickler OP said he was amenable to a Python-based solution, and the answer above is a good stab at this complicated problem; it also refers to another answer which goes into much greater detail on cvxpy. It adds value.
This is a nice approach. I had it coded similarly as an ILP using a solver and Manhattan Distance as a measure (which I think is equivalent to norm1), but it bogged down pretty hard above 50 x 100 ish dimensionality. I don't think Branch & Bound is contributing much by the nature of the problem. It is amazing how much more performant the cvxpy approach is here vs. solver. Likely due to efficiency of the linear algebra in that model.
Yes, I don't know the details of the solver, but it may include some preconditioning of the problem. The basic idea is that solving a MILP requires determining bounds; at some point a bound becomes so tight that you can conclude you don't have to explore other solutions. You may also rank your candidates so that you try the most promising ones first, and if the estimates are good, branch & bound is more efficient.
@user12750353 I guess you refer to pruning techniques. Only I cannot currently see how the search tree could be pruned, given that any additional vector added might improve the performance value of the solution. Without pruning, simple brute force search is not very efficient. And any kind of greedy approach is also not possible, I think. This is why I pointed out earlier that having the algorithm here in the answer is much more useful compared to a library "where you also do not know the details" of how it works.
