Monday, September 10, 2012

Finding Arithmetic Sequences

As a final step of solving a problem, I needed to find out arithmetic sequences within a series of numbers. I am not sure if there is a more efficient algorithm than what I wrote. This one will do the job just fine - the complexity is quite close to O(N^2), which seems fair to me.

The problem.
Given a series of integers in ascending order, we have to find all the arithmetic sequences that occur in the entire series. An arithmetic sequence is defined as a sequence of 3 or more integers in ascending order such that difference between any pair of consecutive integers is same.
For example.
In the series, 2 3 4 5 6 7 8 9, there are 5 sequences. The entire series is a sequence with difference as 1. Then there are (2 4 6 8), (3 5 7 9), (2 5 8) and (3 6 9).
A number can be part of multiple sequences, and multiple sequences can have the same difference between the consecutive integers.

This possibility of a number being in multiple sequences may make the problem seem harder. It isn't that hard.

The idea behind my solution is simple.
1. Iterate over the sorted array from the first element to the second last.
2. For each number i, iterate from the next number j to last. This gives us all the pairs [i,j].
3. For each pair, find the difference, d = j - i.
4. Keep a map with this difference d as the key, and for value - a list of all sequences that have this difference in its consecutive numbers. Note that this will be a list of sequences, as there can be more than one sequence for that difference.
5. See if the difference was already encountered, if not create a new list of sequences.
6. If there was an existing list of sequences, find the sequence in that list that has the first number i. If a sequence was found, add j to it. If not, we create a new sequence and add i and j to it. It's very important to note that for a given difference, there can be only one sequence that has the number i in it. For a sequence, we will use a set data structure.
7. At the end of all iterations, we have created an interesting data structure. At the top level is the map, that uses an the difference between consecutive integers as the key, and for value it has a list of sequences that have that difference. And each sequence is a set.
8. When using this data structure, we have to discard the sequences that have only 2 elements in it.
Let's see how this work for the sequence 2 4 5 6 8 9. The following list shows how the data structure will look after each iteration.

i = 2, j = 4. Map { 2:[(2,4)] }
i = 2, j = 5. Map { 2:[(2,4)] 3:[(2,5)] }

i = 2, j = 6. Map { 2:[(2,4)] 3:[(2,5)] 4:[(2,6)] }
...
i = 4, j = 5. Map { 1:[(4,5)] 2:[(2,4)] 3:[(2,5)] 4:[(2,6)] 6:[(2,8)] 7:[(2,9)] }

i = 4, j = 6. Map { 1:[(4,5)] 2:[(2,4,6)] 3:[(2,5)] 4:[(2,6)] 6:[(2,8)] 7:[(2,9)] }
i = 4, j = 8. Map { 1:[(4,5)] 2:[(2,4,6)] 3:[(2,5)] 4:[(2,6) (4,8)] 6:[(2,8)] 7:[(2,9)] }

Note that till these iterations, there is only one true sequence in the data structure above : 2:[(2,4,6)]. Of course more would be found in later iterations.

With this, let's look at the Java code.
/**
 *  Finds arithmetic sequences within a list of sorted numbers
 */
public static void findSequence (int[] nums)
{

    // Iterate over the array
    // Starting from first element, till second last element
    // For every element, iterate over the list again to find all pairs
    // For every pair, find the difference
    // For the difference, find the sequences
    //
    // There can be multiple arithmatic sequences for the same difference.
    // See if the lower number is already part of any sequence for that diff.
    // If yes, then add the higher number to that set.
    // Else, create a new set, add both numbers to it
    //
    // Any set that has 3 or more elements
    // is an arithmetic sequence.


    // Note the data structure.
    // Map the difference to a collection of sequences.
    // Each sequence is a SortedSet

    Map >> arith_sequences
        = new HashMap >> ();

    for (int i = 0 ; i < nums.ltheength - 1 ; i++ ) {
      for ( int j = i+1 ; j < nums.length ; j++ ) {
        // work on each pair
        long diff = nums[j] - nums[i] ;
        if ( ! arith_sequences.containsKey (diff) ) {
          // create the collection when the diff is
          // encountered for the first time
          arith_sequences.put (diff, new ArrayList>()) ;
        } //
       
        Collection> sequences_for_diff
                   = arith_sequences.get (diff) ;

        boolean found_sequence = false ;
        for (SortedSet sequence : sequences_for_diff) {
        if ( sequence.contains (nums[i]) ) {
            found_sequence = true ;
            sequence.add (nums[j]) ;
            break ;
        } //
      } //
      // if no sequence was found that contained i,
      // then create a new one
      if ( ! found_sequence ) {
        SortedSet sequence = new TreeSet () ;
        sequence.add (nums[i]) ;
        sequence.add (nums[j]) ;
        sequences_for_diff.add (sequence) ;
      } //
    } //
  } //



} //

The complexity is obviously at least O(N^2). For every pair of thse N*N pairs, we iterate to find the sequence that has the lower number in it. Intuitively, it may seem that the worst case will happen when the array contains N consecutive integers from 1 to N. Then the map will have keys for each number from 1 to N-1. But the value will have a list that contains only one sequence, and it will be found in just one iteration. For multplying the complexity, we need the list to have many sequences for the same difference. This depends on the data. Honestly, I do not know how to compute that complexity. Of course the cost of manipulating the data structure - adding a number to the sorted set should also be taken into account. Again, it's getting added only at the end - so that should be fast.

In any case, this is simple, and reasonably efficient algorithm for the job.

No comments:

Post a Comment

Related Posts Plugin for WordPress, Blogger...