## Subsets by Susets

This post explains the concept of locked sets, and the terms associated with them.  It suggests how to find them, and includes  a scratchpad version of a common computer algorithm for detecting them.

A locked set is defined with reference to a containing unit, a box or line. It is a set  n cells within a unit known to contain the true candidates of n numbers.  When a locked set is recognized, candidates of the n numbers can be removed other cells of the unit, or candidates of the 9 – n other numbers can be removed from these cells.   Clues are locked sets.  The full set of candidates of a unit are a locked set automatically, but when there is a smaller locked set, the remaining cells are also locked, hence the term “subset”.  The more specific terms  “pair”, “triple”, “quad”, and the like are substituted for “subset” or “set”.

The terms “naked” and “hidden”, or equivalent terms coined by Sudoku writers, describe the forms in which the locked sets can appear. “Naked” means without extra candidates within its cells. In our Fiendish example cells r136c2 contain only 3, 7 and 8. The hopeless candidates 378 outside of these three cells are removed.

“Hidden” means, with extra candidates within its cells.  In Master Class 40, the triple 4, 7, and 9 is obscured by hopeless candidates sharing their three cells.

In each case, one subset of the unit is naked, the other, hidden.  On the right, we have the hidden pair  14.  On the left, we have the naked 23568.

For a time I was skeptical that this naked/hidden relationship between raw subsets is universal.  Of course, after removal of hopeless candidates, two naked subsets remain.  Two naked subsets can happen.  But if you have a hidden subset, can you have two?  That is the right question.

The answer is no.  We can construct a visual pathway to the proof by adding a 1-candidate to r3c2 on the right, spoiling the naked triple and dividing c2 into two hidden subsets?  No.  The numbers  3,7 and 8 are no longer known to be in 3 cells, but 4 cells. The true 1 can be in any of 3 cells.  You can try the same experiment with the Master Class 40 SW subsets.  Let me know if it comes out any differently.

OK, now that we’re onto subsets, how do we recognize them?  Naked is usually easier, because we look at cells with fewest candidates to see if n of these candidates are confined to n cells. The naked five-cell example above is definitely not easy.  But I didn’t find it that way.  What I found, and put in my trace as a cause, was the hidden subset of a few candidates.

Computers don’t mind big sets, but the brain does mind.  The human engineered way to recognize locked sets is to follow the scarce numbers and look for confinement into few cells.  You may come out naked or hidden, and it doesn’t matter which.  In other words, its n numbers in n cells regardless of where other numbers are in the n cells.

We look for subsets among the free (unlocked) cells of a unit.  If there are m free cells we have to consider no more than m/2 numbers and cells. This is as large as the smaller of the subsets can be, and finding it will reveal the complementary subset. For odd numbers m round down, not up.

If this gets complex, and you’d like a scratchpad scorecard to be sure you are right, I have one that works for subsets and belongs in your tackle box when you go fishing.  It is actually an efficient  algorithm used by computers to compensate for their total lack of vision and neurons to back it up. The word algorithm certifies that it will not fail to identify subsets that are there, or reject unqualified wannabes.

To create a scratchpad version we define an entity I call a suset, short for Sudoku set.  This general term is adopted because susets are useful  in fishing and ALS finding as well.

A suset is a pair of related sets.  It is written as a pair of digit strings separated by a slash.  In the subset app, the left string represents a set of Sudoku numbers, and the right is a set of cells that contain all candidates of these numbers.  For set representation by strings of digits, the cells are numbered left to right for rows, top to bottom for columns, and by keypad convention for boxes.

Suppose you decided to get out your scratchpad to tackle the SW 40 box above.  You start by writing susets for the single numbers and the cells they are in:

2/123789, 3/14, 4/689, 5/123, 6/28, 7/69, 8/4679, 9/89.

But wait, there are 8 free cells, so the smaller of the two subsets can be no larger than 4, so cross off 2 as a possible member.

You probably would see that three numbers “fit” into cells 6, 8 and 9. You have the neurons and they are awesome.  But if you haven’t had your coffee, you might need to go on.  In that case, the game is to fit more numbers into less cells as we combine susets.  If the numbers catch up to the cells, we have a subset.  Ready to play? No, you already see it, you’ll have to watch me plod along.

OK. I see that and 3 is scarce but does not fit into other cell sets. But wait, 7 joins 4 with no increase in cells. Now I’m looking at

3/14, 47/689, 5/123, 6/28,  8/4679, 9/89.

I’m thinking about 69/289 when I see that 9 can fit in the 47 tent, and I get the subset suset 479/689 ! I win!!  Stop smirking.

On less obvious cases, we should order the singles and no cost combinations by increasing cell size, and try every combination that keeps the cell size within the limit.

Here are a couple of subset detection challenges from the restored but frozen sudocue.net guide:

Try to identify the subsets first by the searching technique recommended  above.  Then on the first one, try the suset subset app, whether or not you succeeded. I’ll checkpoint the app next post.

The second example is tough enough for the full scratchpad algorithm.

There are 9 free cells, so the smaller of the two complementary locked subsets has no more than four members.  As before,  we write out our starting suset list, omitting numbers in more than four cells:

2/569,  5/459, 7/56, 8/27, 9/456

Next, we combine 2 and 7 and 7 and 9, both at no increase in cells, to reach a list ordered by increasing cell size

7/56, 8/27, 27/569, 5/459, 79/456

Note that the single 7 suset is still on the list. It may combine well with other susets. But there is no need to have single 2 or 9, because with either number, you get an extra number, 7.  The more numbers, and the less cells, the better.

Now we try all combinations, discarding any over the limit of 4 cells. To cover all combinations, we try each suset in list order against the remainder of the list, adding new susets to the list each time. We can skip list susets which already include the numbers of the suset we are trying against the list.

In this case the process generates

7/56, 8/27, 27/569, 5/459, 79/456,  78/2567, 57/4569, 257/4569, 279/4569, 579/4569, 2579/4569

Bingo!  The four numbers 2579 are confined to cells 4569. We now remove 346r4c4, 346r4c5, and 346r4c6.

Wow!  That started with a human suspicion, which was confirmed by a human engineered version of a computer algorithm.  It ended with human surprise and delight, quite beyond the computing machine.

Next time we’ll  go over the basic solving of Sue de Coq’s post example, to make sure she did not miss any locked set removals that would have  wrecked it for the demonstration she had in mind. This will be good practice for the actual solving situation, where you don’t know that the units contain locked sets.