# Pattern Analysis

In Sudoku, a pattern is a set of candidates of the same value that includes a single candidate for every row, column and box not already occupied by a clue of that value. No candidate of a pattern can see another. Patterns are called “templates” by some writers.
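As a minimal sketch of this definition, the Python below tests whether a set of cells is a pattern. The names `box_of` and `is_pattern` are hypothetical, cells are (row, column) pairs numbered 1 to 9, and the clues of the value are assumed to be listed along with its candidates, so that a complete pattern always has nine cells.

```python
def box_of(row, col):
    """Box number 1..9, counted left to right, top to bottom."""
    return 3 * ((row - 1) // 3) + (col - 1) // 3 + 1


def is_pattern(cells):
    """True if the nine cells cover every row, column and box exactly once.

    Assumption: `cells` holds the clues of the value as well as the
    chosen candidates, as (row, col) pairs.
    """
    cells = list(cells)
    rows = {r for r, _ in cells}
    cols = {c for _, c in cells}
    boxes = {box_of(r, c) for r, c in cells}
    return len(cells) == 9 and len(rows) == len(cols) == len(boxes) == 9


# A made-up placement in which no cell sees another.
good = [(1, 1), (2, 4), (3, 7), (4, 2), (5, 5), (6, 8), (7, 3), (8, 6), (9, 9)]
# The main diagonal puts three cells in each of boxes 1, 5 and 9.
diagonal = [(i, i) for i in range(1, 10)]
```

Here `is_pattern(good)` holds and `is_pattern(diagonal)` does not, because the diagonal cells see each other inside their boxes.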

Normally, as advanced solving begins, each value has more than a few patterns on the grid. Among them, there is one pattern for each value, the solution pattern, that combines with the solution patterns of the other values without conflict.

In a short chapter on patterns in Andrew Stuart’s The Logic of Sudoku, Andrew introduces a visual technique for marking patterns in his Figure 32.3. He calls these connected line segments “overlaid lines”. Here we name them for what they are in graphical software, freeforms.

Logic tells us the value 7 has exactly two patterns. The candidates outside of these patterns, which I call orphans, can be removed. The three candidates in both patterns are confirmed as clues.
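Once the patterns of a value are known, orphans and confirmed clues fall out of two set operations. The sketch below assumes the patterns are already enumerated; `orphans_and_confirmed` is a hypothetical name, and the cells are a toy two-row fragment, not the panel of Figure 32.3.

```python
def orphans_and_confirmed(candidates, patterns):
    """Split a value's candidates by how its enumerated patterns use them.

    Returns (orphans, confirmed): candidates in no pattern, and
    candidates in every pattern.
    """
    if not patterns:
        raise ValueError("expected at least one pattern")
    covered = set().union(*patterns)                 # used by some pattern
    shared = set.intersection(*map(set, patterns))   # used by every pattern
    return set(candidates) - covered, shared


# Toy data for illustration only.
candidates = {(1, 1), (1, 4), (2, 4), (2, 7), (2, 9)}
patterns = [{(1, 1), (2, 4)}, {(1, 1), (2, 7)}]
orphans, confirmed = orphans_and_confirmed(candidates, patterns)
```

In this fragment, r1c4 and r2c9 are orphans and r1c1 is confirmed.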

Two Types of Pattern Analysis

Patterns gained the most attention from a proposed computer algorithm, the Pattern Overlay Method, or POM, for solving any Sudoku with completed candidates. The idea is to overlay a pattern of one value with patterns of other values. If a pattern conflicts with every pattern of another value, it cannot be the solution pattern.  If a single pattern of one value overlays a single pattern of another value, these two patterns are solution patterns, and that is likely enough to collapse the puzzle. There are ways to proceed when several patterns of each value overlay without conflict, but the requirement to find all patterns of all values is in itself enough to disqualify POM as a human solving method.
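The overlay test itself is simple to state in code: two patterns of different values conflict when they claim the same cell. The sketch below is only an illustration of that principle for two values, not the published POM algorithm. The name `prune` and the toy cells are assumptions, and each pattern is reduced to the candidate cells that matter.

```python
def prune(pats_a, pats_b):
    """Drop every pattern that conflicts with all patterns of the other value.

    Patterns are frozensets of (row, col) cells. Two patterns of
    different values conflict when they share a cell. The pruning
    repeats until neither list shrinks.
    """
    changed = True
    while changed:
        keep_a = [p for p in pats_a if any(p.isdisjoint(q) for q in pats_b)]
        keep_b = [q for q in pats_b if any(q.isdisjoint(p) for p in keep_a)]
        changed = (len(keep_a), len(keep_b)) != (len(pats_a), len(pats_b))
        pats_a, pats_b = keep_a, keep_b
    return pats_a, pats_b


# Toy fragments of patterns, for illustration only.
a1 = frozenset({(1, 1), (2, 4)})
a2 = frozenset({(1, 2), (2, 5)})
b1 = frozenset({(1, 1), (2, 5)})
b2 = frozenset({(1, 2), (2, 6)})
left_a, left_b = prune([a1, a2], [b1, b2])
```

Pattern a2 conflicts with both b1 and b2, so it goes. Then b1 conflicts with the only surviving a-pattern, leaving a1 and b2 as a single pattern of each value.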

However, in many puzzles, the number of patterns of some values is manageably small, especially after advanced methods have made many eliminations. It then becomes practical to use a limited form of pattern overlay as a solving method. These methods rely on the POM principle of eliminating patterns based on conflicts with patterns of other values. Let’s call them Limited Pattern Overlay, or LPO methods.

Even though Andrew’s chapter on pattern analysis describes LPO, and is entitled Pattern Overlay Method, his constructed Figure 32.3 illustrates a quite different pattern analysis. It is not based on POM conflict between values for solution cells. Rather, the conflict is between patterns of a single value. It’s all on the X-panel, similar to X-chains and fish. To distinguish this form of analysis from POM and LPO, let’s call it X-Pattern Analysis, or XPA.

Enumerating Patterns with Freeforms

Both types of pattern analysis depend on the ability to discover all patterns of a value X, given the X-panel of its remaining candidates. Freeforms enable humans to do this, based on these easily verified facts:

A freeform drawn from one side to the opposite side, crossing one candidate in each line having a candidate and one candidate from each box having a candidate, defines a pattern.

Every pattern of n candidates defines a set of n vertical freeforms crossing the puzzle top to bottom, and another set of n horizontal freeforms crossing the puzzle left to right.
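For readers who want to check a hand enumeration, these facts translate into a row-by-row depth-first search, which is the maze strategy in code. This is a minimal sketch under my own assumptions: `enumerate_patterns` is a hypothetical name, the panel is a dictionary from row to candidate columns, a row holding a clue lists just that column, and the panel is made up rather than taken from Figure 32.3.

```python
def box_of(row, col):
    """Box number 1..9, counted left to right, top to bottom."""
    return 3 * ((row - 1) // 3) + (col - 1) // 3 + 1


def enumerate_patterns(panel):
    """List every pattern of a panel, going North to South, West before East.

    panel: dict mapping row -> iterable of candidate columns.
    Each pattern is a tuple of (row, col) cells, one per row.
    """
    rows = sorted(panel)
    found = []

    def extend(i, used_cols, used_boxes, path):
        if i == len(rows):                  # reached the far side
            found.append(tuple(path))
            return
        r = rows[i]
        for c in sorted(panel[r]):          # West before East
            b = box_of(r, c)
            if c in used_cols or b in used_boxes:
                continue                    # this cell sees an earlier one
            extend(i + 1, used_cols | {c}, used_boxes | {b}, path + [(r, c)])

    extend(0, frozenset(), frozenset(), [])
    return found


# Made-up panel: two patterns, and r3c9 in neither.
panel = {1: [1, 4], 2: [1, 4], 3: [7, 9], 4: [2], 5: [5],
         6: [8], 7: [3], 8: [6], 9: [9]}
patterns = enumerate_patterns(panel)
```

On this panel the search returns two patterns, one through r1c1 and one through r1c4. Every route through r3c9 dies at r9, whose only candidate is in c9.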

Andrew challenges the Logic reader to verify Figure 32.3 above, by searching for additional patterns. He suggests, “starting at the top row and working downwards, tracing a valid route that puts a number in each box and column”. In effect, that is enumerating all 7-patterns on a 7-panel with freeforms.

Let’s do that systematically, as a way of exploring how enumeration is done.

We take a fresh 7-panel and start patterns from r1c1, to see if we can duplicate Andrew’s pair and find any more. As Andrew suggests, we go North to South. Also, we’ll go West before East.

Our first freeform by these rules cannot provide a candidate for r3. The freeform has already selected a candidate in the box of each r3 candidate. The second also fails, this time on r7, because candidates for c2, c4 and c8 are already selected.

Now we take the last alternative, a well known maze exploration strategy, and get through to the blue pattern above.

Finding all patterns requires a maze route enumeration strategy, so we go back to the last untried alternative, at r2. But now, instead of the blind maze strategy, we see that moving the freeform in r2 from c8 to c9 will require us to keep c3 free until we get to r9. To do that, we must switch r5 to c2, and to do that, r7 to c4, and then, r7 to c8. That’s the green pattern above.

This line interchange technique does not completely work unless all alternatives are between two candidates, but it is helpful in limiting the search.

Having found exactly two patterns containing r1c1, we continue with Andrew’s challenge to find more freeforms starting from r1. You might want to indulge your systematic side by copying some 7-panels and trying it yourself, before checking below. Start with r1c9, so the freeforms immediately below won’t be in your mind’s eye.

From c3, one freeform gets as far as r7. The dashed one is doomed at r2 when it takes away the only remaining ending column in r9.

From c4, three freeforms get to r3, r7 and r9.

From c9, there are two ways to get to r5c3 and two ways to fail from there. The failure occurs for a different reason: r8 is the last row in which two columns, c1 and c5, can get a candidate. Also, both paths take the remaining candidate for r9.

Having covered all starting points in r1, we have verified that Figure 32.3 has only the two 7 patterns that Andrew’s freeforms identify. But this experience suggests that r1 might not be the best of the four choices available for a starting line.

So what is the best choice for a freeform starting line?  Remembering that every pattern must be found, we should be guided toward the side with starting cells easiest to disprove. That would be the one with the fewest cells, and most constrained freeforms in the first two or three lines, rows or columns. North to South has four possible starting cells and two to three second cells in the second line.
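One rough way to put numbers on this comparison is to count, for each side, the partial freeforms that get through the first three lines. The sketch below is my own assumption about how to count, not a rule from Logic. The names `sees` and `routes` are hypothetical, and the panel is made up. It ignores whatever blocks a route further on, so it is only a guide.

```python
from itertools import product


def box_of(row, col):
    """Box number 1..9, counted left to right, top to bottom."""
    return 3 * ((row - 1) // 3) + (col - 1) // 3 + 1


def sees(a, b):
    """True if two cells share a row, a column or a box."""
    return a[0] == b[0] or a[1] == b[1] or box_of(*a) == box_of(*b)


def routes(cells, side, depth=3):
    """Count partial freeforms through the first `depth` lines from a side.

    side: "N" or "S" to start from r1 or r9, "W" or "E" for c1 or c9.
    """
    axis = 0 if side in "NS" else 1          # rows for N/S, columns for W/E
    order = range(1, depth + 1) if side in "NW" else range(9, 9 - depth, -1)
    lines = [[x for x in cells if x[axis] == k] for k in order]
    return sum(
        1
        for combo in product(*lines)
        if all(not sees(a, b) for i, a in enumerate(combo) for b in combo[:i])
    )


# Made-up panel, for illustration only.
cells = {(1, 1), (1, 4), (2, 1), (2, 4), (3, 7), (3, 9),
         (4, 2), (5, 5), (6, 8), (7, 3), (8, 6), (9, 9)}
counts = {side: routes(cells, side) for side in "NSWE"}
```

For this panel the counts are 4 from the North, 1 from the South, 2 from the West and 1 from the East, so South or East would be the side to start from.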

Compare West to East, with its two starting cells with each having three second paths. It’s not surprising it has 8 failures, including three more from r8c1, along with the two patterns. Do note that the patterns are the same, although the freeforms are quite different from the North to South analysis.

Actually, the best freeform starting line is c9. S->N from r9 offers six routes out of the first three lines, while E->W from c9 offers only three.

This prediction is confirmed below, where it pays to decide on the best order of starting cells. r1c9 is first because it leaves only one exit, r8c1. r9c9 is next because it leaves only one choice for the first three columns. r1c4 is not an alternative because r8c5 has already forced the exit to be r1. Last choice r2c9 finds the green pattern, and has only one alternative.

The innate ability we have to solve a simple maze converts Andrew Stuart’s freeform representation of patterns into a sharp tool for human solving.

Next we illustrate its use in some XPA and LPO examples from Sysudoku reviews.

X-Pattern Analysis

A good first step in pattern analysis is to return to the X-panels and look along the sides for restricted numbers of starting cells.

In a review of “The World’s Hardest Sudoku”, the 5-panel of puzzle 36 reveals that candidate 5r1c9 is an orphan. A sashimi swordfish does the same job, if you spot it.

Next, in the same post, the remaining candidate is removed by a 9-chain, but is also orphaned by the N->S freeforms.

In a second XPA example, GM 95 in Xaq Pitkow’s Hard to Extreme Sudoku, the 6-panel is influenced by a coloring cluster already applied. All of the panel coloring can be derived from r9c9 itself being blue.

Here c9 is again the natural choice for starting cells and East to West freeforms. Starting with blue r9c9 generates a remarkable four patterns without tying down the blue pattern in four columns.

However, the green freeform starting in r5c9 has but one exit, and adding r1c1 leaves c3 without a candidate. Blue is confirmed and GM 95 collapses.

Limited Pattern Overlay (LPO)

LPO uses cell conflicts in overlaid patterns of a set of values to make candidate removals. By limiting overlays to values with the fewest patterns, the POM process can sometimes be kept to a human scale of operations. In Andrew Stuart’s examples in The Logic of Sudoku, patterns are limited on the grid by prior removals, but it is also possible to do a conflict analysis when puzzles are unbalanced across values, having a monster fog in some panels, but far fewer candidates in several others with possible conflicts. Such a case is illustrated here, with an example from the French magazine Su-doku Maestro, Niveau 8-9 of July-September 2009, # 22.

The line marked grid carries the marking of a swordfish found on the fourth line, but left on for the marking of later rows.

Lacking progress anywhere else, easy freeform starting lines make 22 attractive for LPO, which seeks a few values with a small number of patterns.

Here is the result of a freeform analysis of the X-panel.

Panels 1, 5, 7, and 9 are taken for an initial overlay in search of pattern and candidate eliminations. Panels 1 and 9 have 6 patterns each, panel 5 has 4, and panel 7 has 2. One enumerated set of patterns leaves an orphan: in panel 4, 4r3c3 fits in no pattern. When that removal is indecisive, the next step in LPO is to overlay the accepted patterns.

While freeforms are useful in enumerating patterns, overlay is easier with another form of pattern representation, in which a letter is assigned to each pattern, and appears in each of the cells of the pattern. Andrew uses this form in Logic pattern analysis examples. In Sysudoku, it’s called lettering. Here are the four selected pattern sets in the lettering representation:

A decisive result of pattern overlay is to have a pattern of one value conflict with all patterns of another value. The best chance for this to happen is to match the two smallest sets of patterns, in this case, the two 7-patterns with the four 5-patterns. Pattern 7b conflicts with 5a and 5c in r5c1, and with 5b and 5d in r7c5. That is all of the 5-patterns, therefore 7b cannot be true, and 7a is confirmed.
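The same overlay can be checked mechanically from the lettering. In the sketch below, a lettering is a dictionary from cell to the letters of the patterns passing through it. Only the two conflict cells named above are entered, the rest of each pattern is left out, and `refuted_patterns` is a hypothetical name.

```python
def refuted_patterns(x_cells, y_cells, x_letters, y_letters):
    """Letters of X-patterns that conflict with every Y-pattern.

    x_cells, y_cells: lettering dicts, (row, col) -> string of letters.
    An X-pattern and a Y-pattern conflict when both use the same cell.
    """
    conflicts = {x: set() for x in x_letters}
    for cell, xs in x_cells.items():
        for x in xs:
            conflicts[x].update(y_cells.get(cell, ""))
    return {x for x in x_letters if conflicts[x] >= set(y_letters)}


# Partial letterings: only r5c1 and r7c5, as described in the text.
sevens = {(5, 1): "b", (7, 5): "b"}
fives = {(5, 1): "ac", (7, 5): "bd"}
refuted = refuted_patterns(sevens, fives, "ab", "abcd")
```

With these entries `refuted` contains only "b". Pattern 7a has no recorded conflict, so it stands.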

Confirming a pattern is a big step forward. Once patterns of a value have been enumerated, every removal of a candidate of that value has a direct and easily determined effect on its patterns.
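That effect is easy to spell out: a removal deletes every pattern passing through the removed candidate, and any candidate that was used only by the deleted patterns becomes an orphan. A minimal sketch, with the hypothetical name `after_removal` and toy two-row pattern fragments:

```python
def after_removal(patterns, removed):
    """Patterns surviving the removal of one candidate, plus new orphans.

    patterns: list of sets of (row, col) cells for one value.
    removed: the (row, col) candidate just eliminated.
    """
    kept = [p for p in patterns if removed not in p]
    before = set().union(*patterns) if patterns else set()
    after = set().union(*kept) if kept else set()
    # Cells that were in some pattern before, and are in none now.
    new_orphans = before - after - {removed}
    return kept, new_orphans


# Toy fragments, for illustration only.
pattern_a = {(1, 1), (2, 4), (3, 7)}
pattern_b = {(1, 4), (2, 1), (3, 7)}
kept, new_orphans = after_removal([pattern_a, pattern_b], (1, 1))
```

Removing r1c1 deletes the first pattern, which orphans r2c4. r3c7 survives because the second pattern still uses it.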

To illustrate, let’s do the regular follow up marking of the 7a clues, marking the effect of each removal on the freeform panels, then look at the panels again.

The new 7a clues NW7, W7 and SE7 bring two naked triples, a finned swordfish and a 2-chain ANL.

NE8 and NW2 erase the finned swordfish, and a long, but easy XY-chain ANL starts the collapse.

The candidate removals do produce freeform removals and orphans, but in this case, the expanding bv field is an easier route.

This is often the case. LPO conflict resolution is exacting work, more difficult than regular methods, and is best used when they are tried and fail.

Here is a before-and-after trace.

This was the easiest of overlays, but it illustrates how the brain-based pattern analysis tools, freeforms and lettering, can bypass the blind backtracking search of computer algorithmic pattern overlay.

This page will shortly continue with examples of partial X-pattern analysis and X-pattern grouping.