By the way, a heuristic is not the same thing as a proven algorithm that correctly solves all input instances. It can be experimental, or it can be used to study the intricacies of a problem.
The purpose of using odd prime powers is to see whether it complicates the search for a counter-example. Although I believe a counter-example likely exists, I have not yet found an input on which my heuristic fails.
Exact 3 Cover: Given a list S of distinct whole numbers whose length is divisible by 3, and a collection C of sub-lists of S, each containing three distinct elements, decide whether there are len(S)/3 sub-lists in C that together cover every element of S exactly once.
For example, S = [1,2,3] and C = [[1,2,3]]. The answer is yes, because len(S)/3 = 1 and that single sub-list covers every element exactly once.
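To make the definition concrete, here is a minimal brute-force sketch that decides small instances directly (purely illustrative; it is not part of my heuristic, and the function name is my own):

```python
from itertools import combinations

def exact_3_cover_bruteforce(S, C):
    # Try every choice of len(S)//3 sub-lists from C
    k = len(S) // 3
    for choice in combinations(C, k):
        flat = [x for triple in choice for x in triple]
        # An exact cover uses every element of S exactly once
        if len(flat) == len(set(flat)) and set(flat) == set(S):
            return True
    return False

print(exact_3_cover_bruteforce([1, 2, 3], [[1, 2, 3]]))  # True
```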
My heuristic transforms Exact 3 Cover into the Subset Sum problem. I transform S into the first len(S) odd primes raised, in cyclic order, to the exponents 5, 6, and 7, and I map the collection of sub-lists to these new values. I then take the sums of the transformed sub-lists, and the sum of the transformed S becomes my target sum.
A transformed S is a list of distinct odd prime powers and looks like the following; the largest exponent is at most 7. The collection C of sub-lists is transformed in the same way.
$$newS = [a^5, b^6, c^7, d^5, e^6, f^7, g^5, \dots]$$
$$C = [[a^5, b^6, c^7], [d^5, e^6, f^7], \dots]$$
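For instance, a toy three-element instance (hypothetical values, just to show the mapping) would use the first three odd primes 3, 5, 7 with exponents 5, 6, 7:

```python
# Hypothetical toy instance, only to illustrate the mapping
toy_S = [5, 24, 33]                # original elements
toy_new_S = [3**5, 5**6, 7**7]     # [243, 15625, 823543]
toy_C = [[5, 24, 33]]              # one sub-list
toy_new_C = [[3**5, 5**6, 7**7]]   # its mapped values; its sum equals sum(toy_new_S)
```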
The following code snippet shows that new_S assigns the exponents in the repeating order 5, 6, 7, 5, 6, 7, ..., and it also shows how C is transformed.
```python
# Assign exponents 5, 6, 7 in sequential order
primes = get_primes_up_to_N(len(S))
R = [5, 6, 7]
new_S = []
i = 0
x = 0
while len(new_S) != len(S):
    new_S.append(primes[x]**R[i])
    i = i + 1
    x = x + 1
    if i == 3:
        i = 0

# Create a dictionary to map elements in S to their corresponding values in new_S
index_mapping = {element: new_S[index] for index, element in enumerate(S)}

# Mapping new C
for SL in C:
    for j in range(len(SL)):
        # Use the dictionary to map the element to its corresponding value in new_S
        SL[j] = index_mapping[SL[j]]
```

As shown above, the heuristic is not practical because the worst-case exponent in the transformation is 7: even though the exponents are capped by this constant, the sum of new_S becomes very large. Anyway, this is the first part of my code.
```python
import sys
import gmpy2
import copy
import os
import math
import scipy.sparse as sp
import numpy as np

# I'm dealing with very large values in the transformation
sys.set_int_max_str_digits(100000000)

# Set precision for gmpy2 (optional)
gmpy2.get_context().precision = 1000

# Exact 3 Cover; we're using 3-lists treated as sets
S = [5, 24, 33, 45, 46, 47, 564, 234, 12]

C = [[5, 33, 24], [24, 5, 33], [45, 46, 47], [24, 46, 33], [564, 12, 5],
     [47, 45, 5], [5, 45, 12], [12, 45, 33], [33, 24, 12]]
```

The second part of the code checks that the collection of sub-lists follows the rules of combinations without sacrificing correctness. This shrinks the input and gets rid of unnecessary sub-lists.
```python
# Making sure 3-lists follow the rules of combinations
# and are thus treated as sets

# Remove 3-lists with duplicate elements
C = [SL for SL in C if len(SL) == len(set(SL))]

# Remove 3-lists that have elements not in S
C = [SL for SL in C if all(element in S for element in SL)]

# Remove duplicates of 3-lists
removed_duplicates = []
for J in C:
    if J not in removed_duplicates:
        removed_duplicates.append(J)
C = removed_duplicates

# Remove multiple permutations of a subset.
# This does not affect correctness, as only one
# permutation of size 3 is needed to form
# an exact 3 covering.
mPerm = []
for i in C:
    if [i[0], i[1], i[2]] not in mPerm:
        if [i[0], i[2], i[1]] not in mPerm:
            if [i[1], i[0], i[2]] not in mPerm:
                if [i[1], i[2], i[0]] not in mPerm:
                    if [i[2], i[0], i[1]] not in mPerm:
                        if [i[2], i[1], i[0]] not in mPerm:
                            mPerm.append(i)
C = mPerm

# Making a hard copy for reduction reference
C_copy = copy.deepcopy(C)
```

The next part of the code is a helper that will be used in the transformation of Exact 3 Cover into Subset Sum. It finds the first len(S) odd primes.
```python
def is_prime(n):
    if n <= 1:
        return False
    elif n <= 3:
        return True
    elif n % 2 == 0 or n % 3 == 0:
        return False
    i = 5
    while i * i <= n:
        if n % i == 0 or n % (i + 2) == 0:
            return False
        i += 6
    return True

# Get the first N distinct odd primes
def get_primes_up_to_N(N):
    primes = []
    num = 3
    while len(primes) < N:
        if is_prime(num):
            primes.append(num)
        num += 1
    return primes
```

Here is the transformation of Exact 3 Cover into Subset Sum. I use a dictionary to map elements of S to their corresponding values in new_S, which lets me transform C.
```python
# Assign exponents 5, 6, 7 in sequential order
primes = get_primes_up_to_N(len(S))
R = [5, 6, 7]
new_S = []
i = 0
x = 0
while len(new_S) != len(S):
    new_S.append(primes[x]**R[i])
    i = i + 1
    x = x + 1
    if i == 3:
        i = 0

# Create a dictionary to map elements in S to their corresponding values in new_S
index_mapping = {element: new_S[index] for index, element in enumerate(S)}

# Mapping new C
for SL in C:
    for j in range(len(SL)):
        # Use the dictionary to map the element to its corresponding value in new_S
        SL[j] = index_mapping[SL[j]]

# Define N for the Subset Sum dynamic solution
N = sum(new_S)

# Here we get the sums of the 3-lists
get_the_sums = []
for i in C:
    get_the_sums.append(sum(i))
```

Because the transformation produces a very large value for the target N, I quickly ran out of memory and had to store the subset table on a 32GB flash drive.
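To give a sense of scale, here is a quick back-of-the-envelope check (my own illustration, assuming len(S) = 9 as in the sample instance above):

```python
# Rough scale check for a 9-element S: first nine odd primes, exponents cycling 5, 6, 7
primes9 = [3, 5, 7, 11, 13, 17, 19, 23, 29]
exponents = [5, 6, 7] * 3
N_sample = sum(p**e for p, e in zip(primes9, exponents))
print(N_sample)  # 17816554241, i.e. roughly 1.8e10 possible sums up to the target
```

The helpers below are for writing and reading boolean tables on the flash drive.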
```python
# Function to write a list to a file on the flash drive
def write_list_to_file(filename, data):
    with open(filename, 'wb') as f:  # Open the file in binary mode
        for row in data:
            # Convert boolean values to bytes before writing to file
            row_bytes = bytearray([1 if cell else 0 for cell in row])
            f.write(row_bytes)
            f.write(b'\n')  # Add newline separator

# Function to read a list from a file on the flash drive
def read_list_from_file(filename):
    with open(filename, 'rb') as f:  # Open the file in binary mode
        return [[byte != 0 for byte in line.strip()] for line in f]
```

Here is the dynamic-programming solution for Subset Sum.
```python
def isSubsetSumFlashDrive(arr, n, target, filename):
    # Initialize a set to store the subset sums that are possible
    subset_indices = set()
    subset_indices.add(0)  # 0 is always possible

    # Perform dynamic programming and write intermediate results to the flash drive
    with open(filename, 'wb') as f:  # Open the file in binary write mode
        for i in range(1, n + 1):
            new_indices = set()
            for j in subset_indices:
                new_indices.add(j + arr[i - 1])
            subset_indices.update(new_indices)

            # Convert boolean values to bytes and write them to the file
            for j in range(target + 1):
                f.write(np.uint8(int(j in subset_indices)).tobytes())

    # Backtrack to find the solution.
    # Backtracking does not explore all possible subsets exhaustively;
    # instead, it prunes the search space.
    # This pruning is based on the information stored in the subset table.
    solution = []
    j = target
    for i in range(n, 0, -1):
        if j - arr[i - 1] in subset_indices:
            solution.append(arr[i - 1])
            j -= arr[i - 1]

    # Return whether a solution exists and the solution itself
    return target in subset_indices, solution[::-1]
```

This part of the code resets the subset table.
```python
subset_table_file = 'F:\\SSUM\\subset_table.txt'

# Function to reset the subset table file before using the code
def reset_subset_table(filename):
    with open(filename, 'w') as f:
        pass  # Simply open the file in write mode, which clears its contents

reset_subset_table(subset_table_file)
```

This is the end of the code.
Since get_the_sums holds the sums of the sub-lists of C in the same order, we can use indices into get_the_sums to pull the corresponding sub-lists out of C_copy.
I reverse-map the Subset Sum solution back into a candidate cover for the original instance. I verify it by checking that the flattened list of len(S)/3 sub-lists contains only distinct elements. If the check passes, my code outputs "Solution Exists"; otherwise it says "no solution found". A reported solution is always a genuine exact cover, so in that sense my heuristic never outputs a false positive.
```python
n = len(get_the_sums)

# Call isSubsetSumFlashDrive with the flash-drive file path
solution = isSubsetSumFlashDrive(get_the_sums, n, N, subset_table_file)

cover = []
if solution[0] == True:
    get_i = solution[1]
    for i in get_i:
        get_index = get_the_sums.index(i)
        reverse_map = C_copy[get_index]
        cover.append(reverse_map)

# Verify: exactly len(S)//3 sub-lists whose flattened elements are all distinct
F = [item for sublist in cover for item in sublist]
if len(cover) == len(S) // 3 and len(F) == len(set(F)):
    print('Solution Exists')
else:
    print('no solution found')
```

There is not much you can do to reduce the running time of the transformation itself, simply because the sum of new_S is so large. But perhaps there are other parts of the code that can be optimized.
Unfortunately, it's worse than O(n^7) time because I first validate the input and then transform C into get_the_sums. To make matters worse, I'm bottlenecked by I/O when using the subset table on the flash drive.
Questions
Is there a pseudo-polynomial subset sum algorithm with better space complexity?
What optimizations could be used to circumvent the I/O bottleneck, and could graphics processing (GPUs) be used to speed things up?