Using odd prime powers, my heuristic tries to solve instances of Exact 3 Cover in polytime, what can be done to optimize it?

Question

By the way, a heuristic is not the same thing as a proven working algorithm that solves all input instances. It could either be experimental or be used to study intricacies in a problem.

The purpose of using odd prime powers is to see if it complicates the search for a counter-example. Although, I believe there likely will be a counter-example I haven't found an input that failed for my heuristic.

Exact 3 Cover: Given a list of distinct whole numbers with a length divisible by 3, and a collection C of sub-lists of S, each containing three distinct elements. Decide if there are len(S)/3 sub-lists that cover every element in S one time.

For example, S = 1,2,3 and C = [1,2,3]. The answer is yes, because len(S)/3 = 1 and there is 1 sub-list where all elements are covered no more than one time.

My heuristic transforms Exact 3 Cover into the Subset Sum Problem. I transform S into the first N odd primes raised to the exponents 5,6,7 and map out the collection of sub-lists with their new values. I then get the sums of the transformed collection of sub-lists. And the sum of the transformed S will be my target sum.

A transformed S would be a list of distinct odd prime powers and would look like the following. The largest exponent is at most 7. Also C, the collection of sub-lists will be transformed as well.

$$newS = [a^5,b^6,c^7,d^5,e^6,f^7,g^5,....]$$ $$C = [a^5,b^6,c^7],[d^5,e^6,f^7]...$$

This code snippet shows that newS follows a "sequential" order of 5,6,7 and then 5,6,7.... And, it also shows how C is transformed.

# Assign exponents 5,6,7 in sequential order primes = get_primes_up_to_N(len(S)) R = [5,6,7] new_S = [] i = 0 x = 0 while len(new_S) != len(S): new_S.append(primes[x]**R[i]) i = i + 1 x = x + 1 if i == 3: i = 0 # Create a dictionary to map elements in S to their indices in new_S index_mapping = {element: new_S[index] for index, element in enumerate(S)} # Mapping new C for SL in C: for j in range(len(SL)): # Use the dictionary to map the elem to its corresponding value in new_S SL[j] = index_mapping[SL[j]]

As shown above using the heuristic is not practical because of the worse case constant is 7 in the transformation process. The sum of newS has a very large value; despite being capped by a constant of 7. Anyway, this is the first part of my code.

import sys import gmpy2 import copy import os import math import scipy.sparse as sp import numpy as np # I'm dealing with very large values in the transformation sys.set_int_max_str_digits(100000000) # Set precision for gmpy2 (optional) gmpy2.get_context().precision = 1000 # Exact 3 Cover, We're using 3-lists treated as sets S = [5,24,33,45,46,47,564,234,12] C = [[5,33,24],[24,5,33],[45,46,47],[24,46,33],[564,12,5],[47,45,5],[5,45,12],[12,45,33],[33,24,12]]

The 2nd part of the code checks that the collection of sub lists follow combinatorial rules without sacrificing correctness. This shaves down the input and get rid of unnecessary sub lists.

# Making sure 3-lists follow the rules of combinations # And thus treated as sets # Remove 3lists with duplicate elements C = [SL for SL in C if len(SL) == len(set(SL))] # remove 3-lists that have elements not in S C = [SL for SL in C if all(element in S for element in SL)] # remove duplicates of 3lists removed_duplicates = [] for J in C: if J not in removed_duplicates: removed_duplicates.append(J) C = removed_duplicates # Remomve multiple permutations of a subset # This does not affect correctness as only one # permutation of size 3 is needed to form # an exact 3 covering. mPerm = [] for i in C: if [i[0],i[1],i[2]] not in mPerm: if [i[0],i[2],i[1]] not in mPerm: if [i[1],i[0],i[2]] not in mPerm: if [i[1],i[2],i[0]] not in mPerm: if [i[2],i[0],i[1]] not in mPerm: if [i[2],i[1],i[0]] not in mPerm: mPerm.append(i) C = mPerm # Making a hard copy for reduction reference C_copy = copy.deepcopy(C)

This part of the code is a function that will be used in the transformation of Exact 3 Cover into Subset Sum. It will find up to len(S) odd primes.

def is_prime(n): if n <= 1: return False elif n <= 3: return True elif n % 2 == 0 or n % 3 == 0: return False i = 5 while i * i <= n: if n % i == 0 or n % (i + 2) == 0: return False i += 6 return True # Get the first N distinct odd primes def get_primes_up_to_N(N): primes = [] num = 3 while len(primes) < N: if is_prime(num): primes.append(num) num += 1 return primes

Here is the transformation of Exact 3 Cover into Subset Sum. I use a dictionary to map elements in S to their indices in newS. This helps me transform C.

# Assign exponents 5,6,7 in sequential order primes = get_primes_up_to_N(len(S)) R = [5,6,7] new_S = [] i = 0 x = 0 while len(new_S) != len(S): new_S.append(primes[x]**R[i]) i = i + 1 x = x + 1 if i == 3: i = 0 # Create a dictionary to map elements in S to their indices in new_S index_mapping = {element: new_S[index] for index, element in enumerate(S)} # Mapping new C for SL in C: for j in range(len(SL)): # Use the dictionary to map the elem to its corresponding value in new_S SL[j] = index_mapping[SL[j]] # Define N for Subset Sum dynamic solution N = sum(new_S) # Here we get the sums of the 3lists get_the_sums = [] for i in C: get_the_sums.append(sum(i))

Because the transformations has large values for the variable N, I quickly ran out of memory and had to create a subset table to be used on a 32GB flash drive.

# Function to write a list to a file on the flash drive def write_list_to_file(filename, data): with open(filename, 'wb') as f: # Open the file in binary mode for row in data: # Convert boolean values to bytes before writing to file row_bytes = bytearray([1 if cell else 0 for cell in row]) f.write(row_bytes) f.write(b'\n') # Add newline separator # Function to read a list from a file on the flash drive def read_list_from_file(filename): with open(filename, 'rb') as f: # Open the file in binary mode return [[byte != 0 for byte in line.strip()] for line in f]

Here is the dynamic solution for subset sum.

def isSubsetSumFlashDrive(arr, n, target, filename): # Initialize a set to store the indices of subset sums that are possible subset_indices = set() subset_indices.add(0) # 0 is always possible # Perform dynamic programming and write intermediate results to the flash drive with open(filename, 'wb') as f: # Open the file in binary write mode for i in range(1, n + 1): new_indices = set() for j in subset_indices: new_indices.add(j + arr[i - 1]) subset_indices.update(new_indices) # Convert boolean values to bytes and write them to the file for j in range(target + 1): f.write(np.uint8(int(j in subset_indices)).tobytes()) # Backtrack to find the solution # backtracking does not explore all possible subsets exhaustively. # Instead, it prunes the search space # This pruning is based on the information stored in the subset table. solution = [] j = target for i in range(n, 0, -1): if j - arr[i - 1] in subset_indices: solution.append(arr[i - 1]) j -= arr[i - 1] return target in subset_indices, solution[::-1] # Return whether solution exists and the solution itself

This part of the code, resets the subset table.

subset_table_file = 'F:\\SSUM\\subset_table.txt ' # Function to reset the subset table file before using the code. def reset_subset_table(filename): with open(filename, 'w') as f: pass # Simply open the file in write mode, which clears its contents reset_subset_table(subset_table_file)

This is the end of the code.

Since, get_the_sums is essentially the sum of all the sub-lists in C, it has the same ordering, and we can use the indices for the get_the_sums to pull the solution from C_copy.

I reverse map the solution back into the original solution. I verify by making sure a flattened list (of len(S)/3 sub-lists) all have distinct elements. If verified, my code outputs "Solution Found" otherwise, it says "no solution found". The output is always correct whether the code finds a solution or not. So technically, my heuristic is correct.

n = len(get_the_sums) # Call isSubsetSumFlashDrive function with flash drive file path solution = isSubsetSumFlashDrive(get_the_sums, n, N, subset_table_file) cover = [] if solution[0] == True: get_i = solution[1] for i in get_i: get_index = get_the_sums.index(i) reverse_map = C_copy[get_index] cover.append(reverse_map) if len(cover) == len(S)//3: # flatten the list and check for duplicates F = [item for sublist in cover for item in sublist] if len(F) == len(set(F)): print('Solution Exists') else: print('no solution found')

There's not much you could do to reduce running time transformation wise, simply because the sum of newS is too large. But perhaps, there are other parts of the code that can be optimized.

Unfortunately its worse than O(n^7) time because I'm making sure input is validated and then I'm transforming C into get_the_sums. And then to make it worse, I'm bottlenecked by I/O when using the subset table on the flash drive.

Questions

Is there a pseudo-polynomial subset sum algorithm with better space complexity?

What optimizations could be used to circumvent the I/O bottleneck and could graphics processing be used to speed things up?

You included no profile measurements, which is especially important for this submission since internet participants are unlikely to have an I/O subsystem whose performance characteristics are identical to what you're using. Please show us an automated test suite which exercises this target code in a way that you care about and wish to optimize. — J_H
– J_H, Commented May 6, 2024 at 1:47

J_H · Accepted Answer · 2024-05-06 19:40:56Z

This submission suffers from a lack of clarity. It definitely contains code, but the motivation for running it and the manner in which it would be called is unclear.

top level vs. function

This code snippet shows that new_S follows ...

You cited one reference, wikipedia, and I thank you for that. So S has the "multi-set" definition we see there. It wouldn't hurt to introduce S before using it in "... a collection C of sub-lists of S ...", or at least offer the citation before that. And I was a little surprised that we're not talking about "sub-sets of S", that is, smaller multi-sets. The terminology continues in "Decide ... sub-lists that cover...", and in subsequent paragraphs. Maybe we're switching from math-speak to programmer-speak?

I transform S into the first N odd primes raised ...

... a list of distinct odd prime powers ...

Sorry, you lost me there. It's unclear how that representation could capture a multi-set.

In any event, I recommend you invent a new name for this new concept, as new_S or \$S'\$ seem to not be of the same type as \$S\$.

# Assign exponents 5,6,7 in sequential order

Ok, good. But we seem to have a bunch of top-level module globals that we're reading and writing, without much clarity on what the pre- and post- conditions are. I understand that this snippet has accomplished something important. I'm unclear on what exactly that is, for example, is the final value of x significant to our computations, or would it typically be a local variable that goes out of scope once we have computed new_S? When you choose not to write down a function signature with its docstring, you're leaving a lot of expressive power on the table unused.

# Mapping new C

Similarly, we had an opportunity here to explain that we are computing an SL value. And I still don't know if the final j value is significant or something that should go out-of-scope and disappear.

The fact that you haven't shown us the type or initialization of C isn't helping with the exposition.

problem motivation

We eventually see S and C assigned, though it's unclear why those integers are of interest, or whether they were perhaps drawn from a PRNG.

Often there is some business use case that depends on obtaining an efficient theoretical result. Feel free to share such details with the reader, to motivate the problem being studied and to reveal why people care about this or that aspect.

defined term

# Making sure 3-lists follow the rules of combinations

This appears to be a term of art in this field of study. But the cited reference makes no mention of it. From the code it seems very clear that we have entirely abandoned the notion of multi-sets. (I was surprised not to see a Counter to represent \$S\$ in any of the preceding code.)

appropriate datastructure

C = [SL for SL in C if all(element in S for element in SL)]

That in operator looks expensive. Are we ready to move from list to set representation at this point? If we're worried about long lists, then testing membership in a set would be cheaper.

mPerm = [] for i in C: if [i[0],i[1],i[2]] not in mPerm: if [i[0],i[2],i[1]] not in mPerm: if [i[1],i[0],i[2]] not in mPerm: if [i[1],i[2],i[0]] not in mPerm: if [i[2],i[0],i[1]] not in mPerm: if [i[2],i[1],i[0]] not in mPerm: mPerm.append(i)

Ok, now we're getting crazy. Thank you for explaining I should pronounce that "multiple permutations", I really do appreciate it.

First define half a dozen 3-tuples of small integer indexes, and iterate over that rather than having half a dozen ifs.

I can't imagine that list is appropriate here, given all those in tests. Surely we wanted mPerm = set(), right?

Although, given the absence of profiling measurements, it's unclear from reading the submission whether we're spending much time here or not.

standard English terms

# Making a hard copy for reduction reference

Most English speakers would interpret that as making a paper printout we could refer to later. If "hard" is some term of art, please offer a definition or citation. Maybe "deep" was intended here?

re-inventing the wheel

The prime predicate and generator are very nice and conventional. (Well, except for capitalizing the scalar N against both math and python norms.)

But they're not doing anything new or interesting, they're not part of this submission's contribution, and we see no motivation for re-implementing them. Consider calling out to an existing implementation which is well documented and known to produce the desired results. Then you and your colleagues would have just a little less code to maintain.

repeated code

We see again the # ... 5,6,7 and Mapping new C snippets. It is perfectly fine to discuss some code, move on to other topics, and return to that same piece of code. But rather than pasting it in a few times, consider using GitHub URLs which refer to a line of code or a range of lines, in a particular frozen SHA1 commit.

list comprehension

# Here we get the sums of the 3lists get_the_sums = [] for i in C: get_the_sums.append(sum(i))

Use verbs for function names, and nouns when naming datastructures.

Consider rephrasing in a way that is simple enough we don't need a comment, one of these:

sums = [sum(i) for i in C]

sums = list(map(sum, C))

pointer storage

large values for the variable N, I quickly ran out of memory ...

Think about this list: [1, 2, 3]. It stores half a dozen 64-bit quantities: three ints and three object pointers. Assume our integer values fit within 64 bits. Putting lots of them in an array or ndarray will cut memory consumption roughly in half, due to ditching all those pointers. And since it's more cache friendly, we can scan through it more rapidly.

 row_bytes = bytearray([1 if cell else 0 for cell in row])

Consider using map(bool, row) so that goes a little faster. Or give the task to numpy.

Consider using a bitarray to save time and space.

docstrings

Sometimes you present your code at module top-level, and sometimes nicely packaged up as a function, which is a big improvement. For one thing it reduces coupling. For another it helps the Gentle Reader to reason about the code when the gazintas and gozoutas are clearly called out and the promises are identified.

Consider adding optional type annotations to each signature, for several reasons.

As an aid to the Gentle Reader.
So mypy linting can help you to notice mistakes or inconsistencies.
To set you up for a @beartype decorator that performs checking at runtime.
To set you up for numba JITting.

Once you have written a def signature, the next thing to do which would significantly improve the maintainability of this codebase is to add a """docstring""".

It's not hard. Often a single English sentence suffices. Just tell us what the function's single responsibility is. We want to know about the pre- and post- condition promises that matter for this function. Having read the signature and docstring, a newly hired maintenance engineer should be able to write a unit test for the function, and should be able to identify whether a returned result is correct or incorrect.

test suite

This codebase would benefit from the addition of unit tests.

Profiling operates on some example workload. Often the automated integration and unit tests will be a convenient way to accomplish that.

This codebase does not appear to achieve its design goals.

I would not be willing to delegate or accept maintenance tasks on it.

The T · Accepted Answer · 2024-05-06 22:12:28Z

Using sub-lists isn't as efficient as using subsets. So this optimized part of the code will use subsets to achieve the goal of ensuring input is valid, and that it follows the combinatorial rules I desire.

I was able to achieve better than O(N^2) time with this optimization at least for input validation.

I create a dictionary for S so that all(element in S_dict for element in SL)is O(1) rather than O(N). Before executing all(element in S_dict for element in SL), I make sure len(SL) == 3 so that it always remains constant time.

To disregard multiple permutations of a subset, I sort SL in ascending order as virtually {3,2,1}, {1,3,2} is all the same {1,2,3} when in ascending order. This is much better than checking all 6 possible permutations. I use a dictionary to keep track of multiple permutations. This also accounts for duplicate subsets so {1,2,3} won't be in there more than once.

Since this is all done with one loop of C, its better than O(N^2) time.

S = [5,24,33,564,234,12] S_dict = {element: True for element in S} C = {5,33,24},{24,5,33},{45,46,47},{24,46,33} valid_subsets_dict = {} valid_subsets = [] for SL in C: # Make sure that subsets have 3 elements if len(SL) == 3: # Make sure only subsets with elements in S are considered if all(element in S_dict for element in SL): # Remove multiple permutations of subsets and only # allows one occurence of a subset # Does not affect the correctness of deciding if a solution exists # because you only need one permutation to form a solution. if tuple(sorted(SL)) not in valid_subsets_dict: valid_subsets_dict[tuple(sorted(SL))] = True # Since sets aren't subscribtable I have to convert them to a list. # I have to touch the subsets to perform the reduction. valid_subsets.append(list(SL))

There's no need to keep looping through C multiple times that adds to the complexity of the heuristic.

I've noticed I didn't need to loop through C to get the sums, instead I can do it while transforming C.

# Create a dictionary to map elements in S to their indices in new_S index_mapping = {element: new_S[index] for index, element in enumerate(S)} # Define N for Subset Sum dynamic solution N = sum(new_S) # sums from the collection of transformed subsets sums = [] # Mapping new C for SL in C: for j in range(len(SL)): # Use the dictionary to map the elem to its corresponding value in new_S SL[j] = index_mapping[SL[j]] sums.append(sum(SL))

I like it, looks like an improvement! (Though I do recommend you embrace the whole def function aspect of programming, perhaps with type annotations + docstring.) — J_H
– J_H, Commented May 6, 2024 at 23:04

Stack Exchange Network

Using odd prime powers, my heuristic tries to solve instances of Exact 3 Cover in polytime, what can be done to optimize it?

2 Answers 2

top level vs. function

problem motivation

defined term

appropriate datastructure

standard English terms

re-inventing the wheel

repeated code

list comprehension

pointer storage

docstrings

test suite

You must log in to answer this question.

Hot Network Questions

Using odd prime powers, my heuristic tries to solve instances of Exact 3 Cover in polytime, what can be done to optimize it?

2 Answers 2

top level vs. function

problem motivation

defined term

appropriate datastructure

standard English terms

re-inventing the wheel

repeated code

list comprehension

pointer storage

docstrings

test suite

You must log in to answer this question.

Related

Hot Network Questions