An impractical reduction: factoring->3SAT->SUBSETSUM

in steemstem •  6 years ago  (edited)

The Subset Sum problem is NP-complete, but what does a reduction from another problem actually look like? I set out to create a concrete example.

Let's start with factoring. What are the factors of 91? (Or, in decision problem form, does 91 have any factors less than some number M?)

This web tool turns factoring into a 3SAT (boolean satisfiability) problem, in conjunctive normal form, an "AND of OR": https://www.cs.indiana.edu/cgi-pub/sabry/cnf.html

Using 91 as the input produces a 3SAT instance which has 116 variables and 434 clauses. In DIMACS format, that looks like:

p cnf 116 434
1 7 -11 0
1 -7 -11 0
-1 7 -11 0
-1 -7 11 0
2 7 -12 0
2 -7 -12 0
-2 7 -12 0
-2 -7 12 0
3 7 -13 0
...

Each line is a single clause (combined with OR.) The three variables in each clause are represented by their index (1 though 116) and is given as a negative number if the variable should be negated. The zero marks end of line.

This SAT instance is satisfiable if and only if 91 can be factored (and the factors can be read out from some of the variables, explicitly.)

Now we need a reduction from SAT. The slide deck here: https://www.cs.mcgill.ca/~lyepre/pdf/assignment2-solutions/subsetSumNPCompleteness.pdf shows a simple reduction based on the following idea:

For each variable, have a number that represents "true" and a number that represent "false". For variable i, the i'th digit is a 1:

t_1 = 10000000.....
f_1 = 10000000.....
t_2 = 01000000.....
f_2 = 01000000.....

After all those digits are defined, we'll use later digits to represent the clauses. Digit j in this section will be set to 1 in t_i if variable i appears in clause j, or in f_i if variable i appears negated in clause j. For example, if the first clause is "1 -2 3" then we get

t_1 = 1000....1000...
f_1 = 1000....0000...
t_2 = 0100....0000...
f_2 = 0100....1000...
t_3 = 0010....1000...
f_3 = 0010....0000...

If we pick some of these numbers and add them together, that corresponds to an assignment of truth values to variables. We want to avoid picking both "true" and "false", and to avoid not choosing, so we want the sum of the first set of digits to be "1111....". We want every clause to be satisfied, so the sum of the second set of digits should be at least "...1111...." But all three terms in the clause might be true, so it could be as big as "...3333..."

So, to have an exact match, we add some auxiliary numbers per clause that will let us always make the sum of the clause digits equal to 3.

t_1 = 1000....1000...
f_1 = 1000....0000...
t_2 = 0100....0000...
f_2 = 0100....1000...
t_3 = 0010....1000...
t_3 = 0010....0000...
x_1 = 0000....1000...
y_1 = 0000....1000...  (=x_1)
x_2 = 0000....0100...
y_2 = 0000....0100...  (=x_2)
---   ---------------
sum = 1111....3333...

Now to solve SUBSETSUM (that is, pick a set of numbers that match the desired sum), we must use exactly one of t_i or f_i to make the "variable digits" sum to 1. We have to find at least one t_i or f_i to pick that makes each of the "clause digits" sum to at least 1, and then we can use x_j or y_j or both to get the digit sum up to exactly 3.

Each column contains at most 5 one's (because it's 3SAT, plus the two auxiliary numbers) so we can use base 6 instead of base 10 to be more compact and still avoid worrying about carries. Or we can use base 4, because it's not possible for any of the clause digits to be 3 and generate a carry.

Doing this with the 3SAT instance results in a SUBSETSUM target value of W = 4527661763497952830924504761197555928678312823105815165828398899092711621214467209188022084775295485151946522124553722119049654441916904709910750284851441684561318928061797552824158151674088805607120404932246132968538085398242167820568025790444515682326688804196711134215815381800392441542981645015241294491116954007657234558877695

and 1110 numbers in the set (=2 * 116 + 2 * 434 ) ranging from 3395746322623464623193378570898166946508734617329361374371299174319533346901383237484575835008494634841497130232591291199064221510716245345444098899398637111465429632436577732225266419533815042671537502647398260571221934004096164886962195528039504102441913149042262548604812767654233516604492490397751325389277239158172929191051264 to 1.

So, if 91 can be factored, then some subset of those 1110 numbers add up to the target value W above. Each step in the reduction is reversible (though this need not be the case in general) so that we could work our way back from which numbers were chosen, to which variables were set to true, to the actual factors found.

However, this should be an ample demonstration that just because a reduction is possible does not mean it's feasible. An earlier attempt to reduce through a different problem (3-D Matching) resulted in tens of thousands of numbers in the set, exceeding what I could easily manage in memory. A polynomial reduction can add significant overhead. This example also demonstrates that the typical dynamic-programming approach is basically completely useless for instances that are reductions from other problems.

This happy little image (from https://algorithms.tutorialhorizon.com/dynamic-programming-subset-sum-problem/)?

Try visualizing it with 200x as many rows and 10^300 times as many columns. (That's a googol, cubed.)

I was originally looking for hard instances of the "Change Making Problem" and the only reduction I learned was from SUBSETSUM: https://cstheory.stackexchange.com/questions/1000/simple-reduction-to-unbounded-knapsack/1002#1002

Performing this additional reduction would double the number of numbers, and increase their length substantially, so I don't know if I'll end up carrying that out, or think about whether we can go more directly from 3SAT.

Code

Full file, including the 3-D matching reduction and DIMACS parser: https://gist.github.com/mgritter/68eca9281cd5403ceca0e69cc016678f

# SAT -> SubsetSum reduction given here
# https://www.cs.mcgill.ca/~lyepre/pdf/assignment2-solutions/subsetSumNPCompleteness.pdf
class SubsetSum(object):
    def __init__( self ):
        self.w = []
        self.targetW = None
                  
    def zeros( self ):
        return [0] * (self.m + self.n)
                  
    def initFromSAT( self, numClauses, numVariables ):
        # Maximum column sum is 5 (three variables in a clause, plus the
        # two auxiliary columns.)
        # But, 5 doesn't produce a valid result base 4, it's 1 + a carry,
        # we only care if it's 3 + a carry
        self.d = 4
        self.n = numVariables
        self.m = numClauses

        self.tDigits = []
        self.fDigits = []
        for i in xrange( 1, numVariables + 1 ):
            # i'th digit =1
            ti = self.zeros()
            ti[i-1] = 1
            self.tDigits.append( ti )
                  
            fi = self.zeros()
            fi[i-1] = 1
            self.fDigits.append( fi )

        self.xDigits = []
        self.yDigits = []
        for j in xrange( 1, numClauses + 1 ):
            # n + j'th digit  = 1
            xj = self.zeros()
            xj[ self.n + j - 1 ] = 1
            # these won't be modified so they can be shared
            self.xDigits.append( xj )
            self.yDigits.append( xj )
        self.j = 1
                
    def addClause( self, c ):
        # Handling clause j, marking the appropriate variables
        j = self.j
        for v in c:
            # Variable v is at offset v-1
            # it's negated if we want it false
            if v < 0:
                self.fDigits[-v - 1][self.n + j-1] = 1
            else:
                self.tDigits[v-1][self.n + j-1] = 1
                  
        self.j += 1

    def digitsToNumber( self, digits ):
        base = self.d
        ret = 0
        for (p, x) in enumerate( reversed( digits ) ):
            ret += x * ( base ** p )
        return ret        
                  
    def finishSAT( self ):
        for i in xrange( self.n ):
            self.w.append( self.digitsToNumber( self.tDigits[i] ) )
            self.w.append( self.digitsToNumber( self.fDigits[i] ) )

        for j in xrange( self.m ):
            self.w.append( self.digitsToNumber( self.xDigits[j] ) )
            self.w.append( self.digitsToNumber( self.yDigits[j] ) )

        targetDigits = [1] * self.n + [3] * self.m
        self.targetW = self.digitsToNumber( targetDigits )

    def dump( self ):
        numDigits = int( math.log( max( self.w ), 10 ) ) + 1
        fmtString = " {:>" + str( numDigits ) + "}"
        
        print "W =", self.targetW
        print "|w| =", len( self.w )
        print "w = {"
        for x in self.w:
            print fmtString.format( x )
        print "}"

def SAT_to_SUBSETSUM( sat ):
    ssum = SubsetSum()
    ssum.initFromSAT( len( sat.clauses ), sat.numVariables )
    for c in sat.clauses:
        ssum.addClause( c )
    ssum.finishSAT()
    return ssum

SUBSET-SUM instance

Full output: https://gist.github.com/mgritter/4d6ee650c892eb8218d37dd1b8748acb which also has the input 3SAT instance.

(Embedding it here exceeds the 65KB limit on Steem posts.)

w = {
 3395746322623464623193378570898166946508734617329361374371299174319533346901383237484575835008494634841497130232591291199064221510716245345444098899398637111465429632436577732225266419533815042671537502647398260571221934004096164886962195528039504102441913149042262548604812767654233516604492490397751325389277239158172929191051264
 3395746322623464623193378570898166946508734617329361374371299174319532770324090785281549759189802928497572851019344936201903715143615452046079837824056312176158565172200658842275687935009460486440448553413795948280888504259348968872627782384300065970646938259077688046419414236432088686135627938955449377790141971732258593873854464
  848936580655866155798344642724541736627183654332340343592824793579883185373806540667841078921477656473807990845000636690555618433863549172784106155429572317511334094046024288642830357631384977780200741523975605848304550128495612701718288724583415147789706242318936565794959304396108263693566231788170829827807208979787586109177856
  848936580655866155798344642724541736627183654332340343592824793579883183121551492026431009808035356984607819522764534918650276476915393501296402045425322609017452335921219701543079093299110822679631911219831864614315817729054215435655554336235862687797936311960570194645088316138181539317809679063158156852565986632454920581677056
  212234145163966538949586160681135434156795913583085085898206198394970795752234684898590126587582089136642317938838240603631801819973196303682680479613436199199872640052123452345319148616729819765073431500456567047800853112862720137081535721564769274890739545968941678970107451044620015472886129548355935873872048682168032851656704
  212234145163966538949586160681135434156795913583085085898206198394970795743436813614834618505584631434788849894915388819099625194948055188479516382653583347243240528764093511647307995775332929127409430687424382752686749156059729919017136305335147889945941139576968019756357555593644944176295830490871615989199012060440051165691904
   53058536290991634737396540170283858539198978395771271474551549598742698935749230012661710775371189607101887819378584922194448598883653307047916008465834435730472356390094763162007340497042646621371883090438940683035591891515801672816911244362163251709455330927050626973618025934547399717257954168454453698482334986445829613551616
   53058536290991634737396540170283858539198978395771271474551549598742698935714863327959540821925887066381044244645661252980312432730160936565331963698550548431466644464589871666556846027761994261858827997813895620739523452346190082163442033456972655798158437671417955391032586596748260784638733984053245855425648191104001628962816
   13264634072747908684349135042570964634799744598942817868637887399685674733928286248431108081063405484836248709135898455436213295169606209585622714476583569052438854103204523181340867435017962669717540530379476751600774115959996595424687435395736920154156941737300610521669902611061407579643974596889597427254513533383148824428544
   13264634072747908684349135042570964634799744598942817868637887399685674733928152003568990229682759771786559608304618577300553179773083539317810722947145447115355458929002519941066651026362829878863117609314082753987498122029112906617139234883317920651214178730700610958899733907151100049305276374436810464021032659449231121055744
    3316158518186977171087283760642741158699936149735704467159471849921418683482036322831471084278431871533518788315655919908971302045252776991112984207638122279944692074172411245081165630103167911595105588706611156285718659047545478806254557276195037479963907494999180745386726664251199503928280754583626681004217254972499978878976
    3316158518186977171087283760642741158699936149735704467159471849921418683482035798437478436421476224216918440265634725953258168584074086366621887336067126154891050891523835013375784554799999797476136930835504561520969851135874872276164977372533280627892586686467279496910511665005640487889899537694279323767169969338977328562176
     829039629546744292771820940160685289674984037433926116789867962480879484334927653494847240225487929487778814558790414964054466902243240505963240125773546327901275466328514985139321376440778906624964382136175060637641410063090746159017944184994258610949560786137670066223629098752352141757195262733215292000625347506751376719872
     829039629546744292771820940160685289674984037433926116789867962480485874236613619025814108787186828821159406838878849932803109166809967687426275196061335284454220393959740608396472684693905087256529661842312038010032839504853342509909299554802782809816358690140383656620675524386076139218175213789864354241037615057327060156416
     207259907386686073192955235040171322418746009358481529197466990620088667717627237074596267570896623293225810844221849396007939254159590744658616778198004686908373043730640024424298217551024597197214618666781202449472130207665621005747010729625469941669730106687140996575377047513847605896729166543384424122773870084658439389184
...
Authors get paid when people like you upvote their post.
If you enjoyed what you read here, create your account today and start earning FREE STEEM!
Sort Order:  




This post has been voted on by the SteemSTEM curation team and voting trail in collaboration with @curie.

If you appreciate the work we are doing then consider voting both projects for witness by selecting stem.witness and curie!

For additional information please join us on the SteemSTEM discord and to get to know the rest of the community!