Chapter 1 – Opening Disaster

(Collects serial chapters #1 Opening Disaster, #2 Crash Diet, and #3 Clarity and Collaboration.)

13 February 2002

Dear Diary,

Today was a disaster — I really messed it up. I wanted so much to impress the journeymen here, but all I did was make a mess.

It was my first day on the job as an apprentice to Mr. C. I was lucky to get this apprenticeship. Mr. C is a well recognized master of software development. The competition for this position was fierce. Mr. C’s apprentices often become journeymen in high demand. It means something to have worked with Mr. C.

I thought I was going to meet him today, but instead a journeyman named Jerry took me aside. He told me that Mr. C always puts his apprentices through an orientation during their first few days. He said this orientation was to introduce apprentices to the practices that Mr. C insists we use, and to the level of quality he expects from our code.

This excited me greatly. It was an opportunity to show them how good a programmer I am. So I told Jerry I couldn’t wait to start. He responded by asking me to write a simple program for him. He wanted me to use the Sieve of Eratosthenes to calculate prime numbers. He told me to have the program, including all unit tests, ready for review just after lunch.

This was great! I had almost four hours to whip together a simple program like the Sieve. I was determined to do a really impressive job. Commit 1 shows what I wrote. I made sure it was well commented, and neatly formatted.

Commit 1, GeneratePrimes.java
/**
 * This class Generates prime numbers up to a user specified
 * maximum. The algorithm used is the Sieve of Eratosthenes.
 * <p>
 * Eratosthenes of Cyrene, b. c. 276 BC, Cyrene, Libya --
 * d. c. 194, Alexandria. The first man to calculate the
 * circumference of the Earth. Also known for working on
 * calendars with leap years and ran the library at Alexandria.
 * <p>
 * The algorithm is quite simple. Given an array of integers
 * starting at 2. Cross out all multiples of 2. Find the next
 * uncrossed integer, and cross out all of its multiples.
 * Repeat untilyou have passed the square root of the maximum
 * value.
 * 
 * @author Alphonse
 * @version 13 Feb 2002 atp
 */
import java.util.*;

public class GeneratePrimes {
  /**
   * @param maxValue is the generation limit.
   */

  public static int[] generatePrimes(int maxValue) {
    if (maxValue >= 2) // the only valid case
    {
      // declarations
      int s = maxValue + 1; // size of array
      boolean[] f = new boolean[s];
      int i;

      // initialize array to true.
      for (i = 0; i < s; i++)
        f[i] = true;

      // get rid of known non-primes
      f[0] = f[1] = false;

      // sieve
      int j;
      for (i = 2; i < Math.sqrt(s) + 1; i++) {
        if (f[i]) // if i is uncrossed, cross its multiples.
        {
          for (j = 2 * i; j < s; j += i)
            f[j] = false; // multiple is not prime
        }
      }

      // how many primes are there?
      int count = 0;
      for (i = 0; i < s; i++)
      {
        if (f[i])
          count++; // bump count.
      }

      int[] primes = new int[count];

      // move the primes into the result
      for (i = 0, j = 0; i < s; i++)
      {
        if (f[i]) // if prime
          primes[j++] = i;
      }

      return primes; // return the primes
    }
    else // maxValue < 2
      return new int[0]; // return null array if bad input.
  }
}

Then I wrote a unit test for GeneratePrimes. It is shown in Commit 2. It uses the uses the JUnit framework as Jerry had instructed. It takes a statistical approach; checking to see if the generator can generate primes up to 0, 2, 3, and 100. In the first case there should be no primes. In the second there should be one prime, and it should be 2. In the third there should be two primes and they should be 2 and 3. In the last case there should be 25 primes the last of which is 97. If all these tests pass, then I assumed that the generator was working. I doubt this is foolproof, but I couldn’t think of a reasonable scenario where these tests would pass and yet the function would fail.

Commit 2, TestGeneratePrimes.java
import junit.framework.*;
import java.util.*;

public class TestGeneratePrimes extends TestCase
{
  public static void main(String args[])
  {
    junit.swingui.TestRunner.main(
      new String[] {"TestGeneratePrimes"});
  }

  public TestGeneratePrimes(String name)
  {
    super(name);
  }

  public void testPrimes() {
    int[] nullArray = GeneratePrimes.generatePrimes(0);
    assertEquals(nullArray.length, 0);

    int[] minArray = GeneratePrimes.generatePrimes(2);
    assertEquals(minArray.length, 1);
    assertEquals(minArray[0], 2);

    int[] threeArray = GeneratePrimes.generatePrimes(3);
    assertEquals(threeArray.length, 2);
    assertEquals(threeArray[0], 2);
    assertEquals(threeArray[1], 3);

    int[] centArray = GeneratePrimes.generatePrimes(100);
    assertEquals(centArray.length, 25);
    assertEquals(centArray[24], 97);
  }
}

I got all this to work in about an hour. Jerry didn’t want to see me until after lunch, so I spent the rest of my time reading the Design Patterns book that Jerry gave me.

After lunch I stopped by Jerry’s office, and told him I was done with the program. He looked up at me with a funny grin on his face and said: “Good, let’s go check it out.”

He took me out into the lab and sat me down in front of a workstation. He sat next to me. He asked me to bring up my program on this machine. So I navigated to my laptop on the network and brought up the source files.

Jerry looked at the two source files for about five minutes. Then he shook his head and said: “You can’t show something like this to Mr. C! If I let him see this, he’d probably fire both of us. He’s not a very patient man.” I was startled, but managed keep my cool enough to ask: “What’s wrong with it?”

Jerry sighed. “Let’s walk through this together.” he said. “I’ll show you, point by point, how Mr. C wants things done.”

“It seems pretty clear,” he continued, “that the main function wants to be three separate functions. The first initializes all the variables and sets up the sieve. The second actually executes the sieve, the third loads the sieved results into an integer array.”

I could see what he meant. There were three concepts buried in that function. Still, I didn’t see what he wanted me to do about it.

He looked at me for awhile, clearly expecting me to do something. But finally he heaved a sigh, shook his head, and continued. “To expose these three concepts more clearly, I want you to extract them into three separate methods. Also get rid of all the unnecessary comments and pick a better name for the class. When you are done with that, make sure all the tests still run.”

You can see what I did in Commit 3. (I’ve highlighted the lines with changes, just like Martin Fowler does in his Refactoring book.) I changed the name of the class to a noun, got rid of all the comments about Eratosthenes, and made three methods out of the three concepts in the generatePrimes function.

Extracting the three functions forced me to promote some of the variables of the function to static fields of the class. Jerry said that this made it much clearer which variables are local and which have wider influence.

Commit 3, PrimeGenerator.java
/**
 * This class Generates prime numbers up to a user specified
 * maximum. The algorithm used is the Sieve of Eratosthenes.
 * Given an array of integers starting at 2:
 * Find the first uncrossed integer, and cross out all its
 * multiples. Repeat until the first uncrossed integer exceeds
 * the square root of the maximum value.
 */

import java.util.*;

public class PrimeGenerator
{
  private static int s;
  private static boolean[] f;
  private static int[] primes;

  public static int[] generatePrimes(int maxValue)
  {
    if (maxValue < 2)
      return new int[0];
    else {
      initializeSieve(maxValue);
      sieve();
      loadPrimes();
      return primes; // return the primes
    }
  }

  private static void loadPrimes()
  {
    int i;
    int j;

    // how many primes are there?
    int count = 0;
    for (i = 0; i < s; i++)
    {
      if (f[i])
        count++; // bump count.
    }

    primes = new int[count];

    // move the primes into the result
    for (i = 0; j = 0; i < s; i++)
    {
      if (f[i]) // if prime
        primes[j++] = i;
    }
  }

  private static void sieve()
  {
    int i;
    int j;
    for (i = 2; i < Math.sqrt(s) + 1; i++)
    {
      if (f[i]) // if i is uncrossed, cross out its multiples.
      {
        for (j = 2 * i; j < s; j += i)
          f[j] = false; // multiple is not prime
      }
    }
  }

  private static void initializeSieve(int maxValue)
  {
    // declarations
    s = maxValue + 1; // size of array
    f = new boolean[s];
    int i;

    // initialize array to true.
    for (i = 0; i < s; i++)
      f[i] = true;

    // get rid of known non-primes
    f[0] = f[1] = false;
  }
}

Jerry told me that this was a little messy, so he took the keyboard and showed me how to clean it up. Commit 4 shows what he did. First he got rid of the s variable in initializeSieve and replacing it with f.length. Then he changed the names of the three functions to something he said was a bit more expressive. Finally he rearranged the innards of initializeArrayOfIntegers (née initializeSieve) to be a little nicer to read. The tests all still ran.

Commit 4, PrimeGenerator.java (partial)
public class PrimeGenerator
{
  private static boolean[] f;
  private static int[] result;

  // ...

  public static int[] generatePrimes(int maxValue)
  {
    if (maxValue < 2)
      return new int[0];
    else
    {
      initializeArrayOfIntegers(maxValue);
      crossOutMultiples();
      putUncrossedIntegersIntoResult();
      return result;
    }
  }

  private static void initializeArrayOfIntegers(int maxValue)
  {
    f = new boolean[maxValue + 1];
    f[0] = f[1] = false; //neither primes nor multiples.
    for (int i = 2; i < f.length; i++)
      f[i] = true;
  }

  // ...
}

I had to admit, this was a bit cleaner. I’d always thought it was a waste of time to give functions long descriptive names, but his changes really did make the code more readable.

Next Jerry pointed at crossOutMultiples. He said he thought the if(f[i] == true) statements could be made more readable. I thought about it for a minute. The intent of those statements was to check to see if i was uncrossed; so I changed the name of f to unCrossed.

Jerry said that this was better, but still wasn’t pleased with it because it lead to double negatives like unCrossed[i] = false. So he changed the name of the array to isCrossed and changed the sense of all the booleans. Then he ran all the tests.

Jerry got rid of the initialization that set isCrossed[0] and isCrossed[1] to true. He said it was good enough to just made sure that no part of the function used the isCrossed array for indexes less than 2. The tests all still ran.

Jerry extracted the inner loop of the crossOutMultiples function and called it crossOutMultiplesOf. He said that statements like if (isCrossed[i] == false) were confusing so he created a function called notCrossed and changed the if statement to if (notCrossed(i)). Then he ran the tests.

Then Jerry asked me what that square root was all about. I spent a bit of time writing a comment that tried to explain why you only have to iterate up to the square root of the array size. I tried to emulate Jerry by extracting the calculation into a function where I could put the explanatory comment. In writing the comment I realized that the square root is the maximum prime factor of any of the integers in the array. So I chose that name for the variables and functions that dealt with it. Finally, I made sure that the tests all still ran. The result of all these changes are shown in Commit 5.

Commit 5, PrimeGenerator.java (partial)
public class PrimeGenerator
{
  private static boolean[] isCrossed;
  private static int[] result;

  public static int[] generatePrimes(int maxValue)
  {
    if (maxValue < 2)
      return new int[0];
    else
    {
      initializeArrayOfIntegers(maxValue);
      crossOutMultiples();
      putUncrossedIntegersIntoResult();
      return result;
    }
  }

  private static void initializeArrayOfIntegers(int maxValue)
  {
    isCrossed = new boolean[maxValue + 1];
    for (int i = 2; i < isCrossed.length; i++)
      isCrossed[i] = false;
  }

  private static void crossOutMultiples()
  {
    int maxPrimeFactor = calcMaxPrimeFactor();
    for (int i = 2; i <= maxPrimeFactor; i++)
      if (notCrossed(i))
        crossOutMultiplesOf(i);
  }

  private static int calcMaxPrimeFactor()
  {
    // We cross out all multiples of p, where p is prime.
    // Thus, all crossed out multiples have p and q for
    // factors. If p > sqrt of the size of the array, then
    // q will never be greater than 1. Thus p is the
    // largest prime factor in the array, and is also
    // the iteration limit.
    double maxPrimeFactor = Math.sqrt(isCrossed.length) + 1;
    return (int) maxPrimeFactor;
  }

  private static void crossOutMultiplesOf(int i)
  {
    for (int multiple = 2*i;
         multiple < isCrossed.length;
         multiple += i)
      isCrossed[multiple] = true;
  }

  private static boolean notCrossed(int i)
  {
    return isCrossed[i] == false;
  }
}

I was starting to get the hang of this so I took a look at the putUncrossedIntegersIntoResult method. I saw that this method had two parts. The first counts the number of uncrossed integers in the array, and creates the result array of that size. The second moves the uncrossed integers into the result array. So, as you can see in Commit 6, I extracted the first part into its own function and did some miscellaneous cleanup. The tests all still ran. Jerry was just barely nodding his head. Did he actually like what I did?

Commit 6, PrimeGenerator.java (partial)
private static void putUncrossedIntegersIntoResult()
{
  result = new int[numberOfUncrossedIntegers()];
  for (int j = 0, i = 2; i < isCrossed.length; i++)
    if (notCrossed(i))
      result[j++] = i;
}

private static int numberOfUncrossedIntegers()
{
  int count = 0;
  for (int i = 2; i < isCrossed.length; i++)
    if (notCrossed(i))
      count++;

  return count;
}

Next Jerry made pass over the whole program, reading it from beginning to end, rather like he was reading a geometric proof. He told me that this was a real important step. “So far”, he said, “We’ve been refactoring fragments. Now we want to see if the whole program hangs together as a readable whole.”

I asked: “Jerry, do you do this with your own code too?”

Jerry scowled: “We work as a team around here, so there is no code I call my own. Do you consider this code yours now?”

“Not anymore.” I said, meekly. “You’ve had a big influence on it.”

“We both have.” he said, “And that’s the way that Mr. C likes it. He doesn’t want any single person owning code. But to answer your question: Yes. We all practice this kind of refactoring and code clean up around here. It’s Mr. C’s way.”

During the read-through, Jerry realized that he didn’t like the name initializeArrayOfIntegers.

“What’s being initialized is not, in fact, an array of integers;”, he said, “it’s an array of booleans. But initializeArrayOfBooleans is not an improvement. What we are really doing in this method is uncrossing all the relevant integers so that we can then cross out the multiples.”

“Of course!” I said. So I grabbed the keyboard and changed the name to uncrossIntegersUpTo. I also realized that I didn’t like the name isCrossed for the array of booleans. So I changed it to crossedOut. The tests all still run. I was starting to enjoy this; but Jerry showed no sign of approval.

Then Jerry turned to me and asked me what I was smoking when I wrote all that maxPrimeFactor stuff. (See Commit 6.) At first I was taken aback. But as I looked the code and comments over I realized he had a point. Yikes, I felt stupid! The square root of the size of the array is not necessarily prime. That method did not calculate the maximum prime factor. The explanatory comment was just wrong. So I sheepishly rewrote the comment to better explain the rationale behind the square root, and renamed all the variables appropriately. The tests all still ran.

Commit 7, PrimeGenerator.java (partial)
private static int calcMaxPrimeFactor() {
  // We cross out all multiples of p, where p is prime.
  // Thus, all crossed out multiples have p and q for
  // factors. If p > sqrt of the size of the array, then
  // q will never be greater than 1. Thus p is the
  // largest prime factor in the array, and is also
  // the iteration limit.
  double maxPrimeFactor = Math.sqrt(isCrossed.length) + 1;
  return (int) maxPrimeFactor;
}

“What the devil is that +1 doing in there?” Jerry barked at me.

I gulped, looked at the code, and finally said: “I was afraid that a fractional square root would convert to an integer that was one too small to serve as the iteration limit.”

“So you’re going to litter the code with extra increments just because you are paranoid?” he asked. “That’s silly, get rid of the increment and run the tests.”

I did, and the tests all ran. I thought about this for a minute, because it made me nervous. But I decided that maybe the true iteration limit was the largest prime less than or equal to the square root of the size of the array.

“That last change makes me pretty nervous.” I said to Jerry. “I understand the rationale behind the square root, but I’ve got a nagging feeling that there may be some corner cases that aren’t being covered.”

“OK,” he grumbled. “So write another test that checks that.”

“I suppose I could check that there are no multiples in any of the prime lists between 2 and 500.”

“OK, if it’ll make you feel better, try that.” he said. He was clearly becomming impatient.

So I wrote the testExhaustive function shown in Commit 9. The new test passed, and my fears were allayed.

Then Jerry relented a bit. “It’s always good do know why something works;” said Jerry, “and it’s even better when you show you are right with a test.”

Then Jerry scrolled one more time through all the code and tests (shown in Commit 8). He sat back, thinking for a minute, and said: “OK, I think we’re done. The code looks reasonably clean. I’ll show it to Mr. C.”

Then he looked me dead in the eye and said: “Remember this. From now on when you write a module, get help with it and keep it clean. If you hand in anything below those standards you won’t last long here.”

And with that, he strode off.

Commit 8, PrimeGenerator.java
/**
 * This class Generates prime numbers up to a user specified
 * maximum. The algorithm used is the Sieve of Eratosthenes.
 * Given an array of integers starting at 2:
 * Find the first uncrossed integer, and cross out all its
 * multiples. Repeat until there are no more multiples
 * in the array.
 */

public class PrimeGenerator
{
  private static boolean[] crossedOut;
  private static int[] result;

  public static int[] generatePrimes(int maxValue)
  {
    if (maxValue < 2)
      return new int[0];
    else
    {
      uncrossIntegersUpTo(maxValue);
      crossOutMultiples();
      putUncrossedIntegersIntoResult();
      return result;
    }
  }

  private static void uncrossIntegersUpTo(int maxValue)
  {
    crossedOut = new boolean[maxValue + 1];
    for (int i = 2; i < crossedOut.length; i++)
      crossedOut[i] = false;
  }

  private static void crossOutMultiples()
  {
    int limit = determineIterationLimit();
    for (int i = 2; i <= limit; i++)
      if (notCrossed(i))
        crossOutMultiplesOf(i);
  }

  private static int determineIterationLimit()
  {
    // Every multiple in the array has a prime factor that
    // is less than or equal to the sqrt of the array size,
    // so we don't have to cross out multiples of numbers
    // larger than that root.
    double iterationLimit = Math.sqrt(crossedOut.length);
    return (int) iterationLimit;
  }

  private static void crossOutMultiplesOf(int i)
  {
    for (int multiple = 2*i;
         multiple < crossedOut.length;
         multiple += i)
      crossedOut[multiple] = true;
  }

  private static boolean notCrossed(int i)
  {
    return crossedOut[i] == false;
  }

  private static void putUncrossedIntegersIntoResult()
  {
    result = new int[numberOfUncrossedIntegers()];
    for (int j = 0, i = 2; i < crossedOut.length; i++)
      if (notCrossed(i))
        result[j++] = i;
  }

  private static int numberOfUncrossedIntegers()
  {
    int count = 0;
    for (int i = 2; i < crossedOut.length; i++)
      if (notCrossed(i))
        count++;

    return count;
  }
}
Commit 8, TestGeneratePrimes.java
import junit.framework.*;

public class TestGeneratePrimes extends TestCase
{
  public static void main(String args[])
  {
    junit.swingui.TestRunner.main(
      new String[] {"TestGeneratePrimes"});
  }

  public TestGeneratePrimes(String name)
  {
    super(name);
  }

  public void testPrimes()
  {
    int[] nullArray = PrimeGenerator.generatePrimes(0);
    assertEquals(nullArray.length, 0);

    int[] minArray = PrimeGenerator.generatePrimes(2);
    assertEquals(minArray.length, 1);
    assertEquals(minArray[0], 2);

    int[] threeArray = PrimeGenerator.generatePrimes(3);
    assertEquals(threeArray.length, 2);
    assertEquals(threeArray[0], 2);
    assertEquals(threeArray[1], 3);

    int[] centArray = PrimeGenerator.generatePrimes(100);
    assertEquals(centArray.length, 25);
    assertEquals(centArray[24], 97);
  }

  public void testExhaustive()
  {
    for (int i = 2; i<500; i++)
      verifyPrimeList(PrimeGenerator.generatePrimes(i));
  }

  private void verifyPrimeList(int[] list)
  {
    for (int i=0; i<list.length; i++)
      verifyPrime(list[i]);
  }

  private void verifyPrime(int n)
  {
    for (int factor=2; factor<n; factor++)
      assert(n%factor != 0);
  }
}

What a disasater! I thought sure that my original solution had been top-notch. In some ways I still feel that way. I had tried to show off my brilliance, but I guess Mr. C values collaboration and clarity more than individual brilliance.

I had to admit that the program reads much better than it did at the start. It also works a bit better. I was pretty pleased with the outcome. Also, in spite of Jerry’s gruff attitude, it had been fun working with him. I had learned a lot.

Still, I was pretty discouraged with my performance. I don’t think the folks here are going to like me very much. I’m not sure they’ll ever think I’m good enough for them. This is going to be a lot harder than I thought.