This chapter presents some more array operations, and then shows how arrays and hashes work together. I also present a useful feature for variable-length argument lists, the gather and scatter operators.
For arrays, the spaceship operator <=>
starts by comparing the first element from each array. If they are
equal, it goes on to the next elements, and so on, until it finds
elements that differ. Subsequent elements are not considered (even if
they are really big).
>> [0, 1, 2] <=> [0, 1, 2]
=> 0
>> [0, 1, 2] <=> [0, 3, 4]
=> -1
>> [0, 1, 2000000] <=> [0, 3, 4]
=> -1
>> [0, 1, 2000000] <=> [0, 1, 4]
=> 1
>> [0, 2] <=> [0, 1, 2]
=> 1
The output is 0
if the arrays are identical. -1
to indicate that the
first array is less than the second one. And 1
otherwise. The spaceship
operator, or an equivalent logic, is useful to customize sort
method.
To render an array partially immutable, use freeze
method.
If you try to assign a new value to one of the elements of frozen array,
you get an error:
>> t = ['a', 'b', 'c', 'd', 'e'].freeze
=> ["a", "b", "c", "d", "e"]
>> t[0] = 'A'
FrozenError (can't modify frozen Array)
But the individual elements of the array themselves are not frozen, mutable elements can still be modified:
>> t[0].upcase!
=> "A"
>> t
=> ["A", "b", "c", "d", "e"]
It is often useful to swap the values of two variables. With
conventional assignments, you have to use a temporary variable. For
example, to swap a
and b
:
temp = a
a = b
b = temp
This solution is cumbersome; array assignment is more elegant:
a, b = b, a
The left side is an array of variables; the right side is an array of expressions. Each value is assigned to its respective variable. All the expressions on the right side are evaluated before any of the assignments. Enclosing the right side expressions within square brackets is optional.
The number of variables on the left and the number of values on the
right need not be the same. Extra values are ignored and extra variables
are assigned nil
:
>> a, b, c = 1, 2
=> [1, 2]
>> [a, b, c]
=> [1, 2, nil]
Another example - to split an email address into a user name and a domain, you could write:
>> addr = 'gem@ruby.org'
=> "gem@ruby.org"
>> uname, domain = addr.split('@')
=> ["gem", "ruby.org"]
The return value from split
is an array with two elements;
the first element is assigned to uname
, the second to domain
.
>> uname
=> "gem"
>> domain
=> "ruby.org"
Strictly speaking, a method can only return one value, but if the value
is an array, the effect is the same as returning multiple values. For
example, if you want to divide two integers and compute the quotient and
remainder, it is inefficient to compute x/y
and then
x%y
. It is better to compute them both at the same time.
The built-in numeric method divmod
returns an array of two values, the
quotient and remainder. You can store the result as an array:
>> t = 7.divmod(3)
=> [2, 1]
Or use array assignment to store the elements separately:
>> quot, rem = 7.divmod(3)
=> [2, 1]
>> quot
=> 2
>> rem
=> 1
Here is an example of a method that returns an array:
def min_max(t)
return t.min, t.max
end
max
and min
are array methods that find the
largest and smallest elements of an array. min_max
computes both and
returns an array of two values.
Methods can take a variable number of arguments. A parameter name that
begins with *
gathers arguments into an array. For example,
printall
takes any number of arguments and prints them:
def printall(*args)
puts args.inspect
end
The gather parameter can have any name you like, but args
is conventional. inspect
method is used here to get string
representation of array, otherwise puts
would recursively
print each element of array on separate line. Here’s how the
method works:
>> printall(1, 2.0, '3')
[1, 2.0, "3"]
>> puts [1, 2.0, "3"]
1
2.0
3
>> puts [1, 2.0, "3"].inspect
[1, 2.0, "3"]
The complement of gather is scatter. If you have a
sequence of values and you want to pass it to a method as multiple
arguments, you can use the splat operator. For example,
values_at
method doesn’t work with an array:
>> primes = [2, 3, 5, 7, 11, 13]
=> [2, 3, 5, 7, 11, 13]
>> idx = [3, 0, -1]
=> [3, 0, -1]
>> primes.values_at(idx)
TypeError (no implicit conversion of Array into Integer)
But if you scatter the array, it works:
>> primes.values_at(*idx)
=> [7, 2, 13]
As an exercise, write a method called sumall
that takes any
number of arguments and returns their sum.
The zip
array method takes one or more arguments and
returns an array of arrays where each array element contains one element
from each of the input arguments. The name of the method refers to a
zipper, which joins and interleaves two rows of teeth.
This example zips a string and an array:
>> s = 'abc'
=> "abc"
>> t = [0, 1, 2]
=> [0, 1, 2]
>> s.chars
=> ["a", "b", "c"]
>> s.chars.zip(t)
=> [["a", 0], ["b", 1], ["c", 2]]
The most common use of zip
is in a loop:
>> for pair in s.chars.zip(t)
>> puts pair.inspect
>> end
["a", 0]
["b", 1]
["c", 2]
>> s.chars.zip(t) { |pair| puts pair.inspect }
["a", 0]
["b", 1]
["c", 2]
If the arguments are not the same length, the result has the length of
the array on which zip
is invoked. nil
value
is used as filler if required.
>> 'Hi'.chars.zip('Hello'.chars)
=> [["H", "H"], ["i", "e"]]
>> 'Anne'.chars.zip('Elk'.chars)
=> [["A", "E"], ["n", "l"], ["n", "k"], ["e", nil]]
You can use array assignment in a loop to traverse an array of arrays:
t = [['a', 0], ['b', 1], ['c', 2]]
for letter, number in t
puts "#{number} #{letter}"
end
t.each { |letter, number| puts "#{number} #{letter}" }
Each time through the loop, we get an array as value. Array assignment
then assigns the elements to letter
and
number
. The output of either looping technique is:
0 a
1 b
2 c
If you combine zip
, loop and array assignment, you get a
useful idiom for traversing two (or more) arrays at the same time. For
example, has_match?
takes two arrays, t1
and
t2
, and returns true
if there is an index
i
such that t1[i] == t2[i]
:
def has_match?(t1, t2)
for x, y in t1.zip(t2)
return true if x == y
end
return false
end
If you need to traverse the elements of an array and their indices, you
can use the method each_with_index
:
'abc'.chars.each_with_index do |index, element|
puts "#{index} #{element}"
end
The result from each_with_index
is an
Enumerator
object, which iterates a sequence of pairs; each
pair contains an index (starting from 0) and an element from the given
sequence. In this example, the output is
0 a
1 b
2 c
Again.
Hashes have a method called to_a
that returns an array of
arrays, where each element is a key-value pair.
>> h = { 'a' => 0, 'b' => 1, 'c' => 2 }
=> {"a"=>0, "b"=>1, "c"=>2}
>> t = h.to_a
=> [["a", 0], ["b", 1], ["c", 2]]
Going in the other direction, you can use an array of arrays to initialize a new hash:
>> t = [['a', 0], ['b', 1], ['c', 2]]
=> [["a", 0], ["b", 1], ["c", 2]]
>> h = t.to_h
=> {"a"=>0, "b"=>1, "c"=>2}
Arrays as keys in hashes can be useful. For example, a telephone
directory might map from last-name, first-name pairs to telephone
numbers. Assuming that we have defined last
,
first
and number
, we could write:
directory[[last, first]] = number
The expression in brackets is an array used as a key. To partially make
it immutable, use [last, first].freeze
. We could use array assignment
to traverse this hash.
for key, value in directory
puts "#{key[1]} #{key[0]} #{value}"
end
This loop traverses the key-value pairs in directory
. It
assigns the elements of each pair to key
and
value
, then prints first-name, last-name and corresponding
telephone number.
There are two ways to represent arrays in a state diagram. The more
detailed version shows the indices and elements. For example, the array
['Cleese', 'John']
would appear as shown below:
But in a larger diagram you might want to leave out the details. For example, a diagram of the telephone directory might appear as shown below:
Here the arrays are shown using Ruby syntax as a graphical shorthand. The telephone number in the diagram is the complaints line for the BBC, so please don’t call it.
Arrays and hashes are examples of data structures; in this chapter we are starting to see compound data structures, like array of arrays, or hashes that contain arrays as keys and arrays as values. Compound data structures are useful, but they are prone to what I call shape errors; that is, errors caused when a data structure has the wrong type, size, or structure. For example, if you are expecting an array with one integer and I give you a plain old integer (not in an array), it won’t work.
To help debug these kinds of errors, I have written a script called
structshape
that provides a method, also called structshape
, that
takes Array/Set/Hash data structure as an argument and returns a string
that summarizes its shape. You can view it from
https://github.com/learnbyexample/ThinkRubyBuild/blob/master/code/structshape.rb
After creating a local copy of this program, you can use it as a stand-alone or load it inside another program using require statement.
Here’s the result for a simple array:
>> require './structshape'
=> true
>> t = [1, 2, 3]
=> [1, 2, 3]
>> structshape(t)
=> "Array of 3 Integer"
A fancier program might write “Array of 3 Integers”, but it was easier not to deal with plurals. Here’s an array of arrays:
>> t2 = [[1,2], [3,4], [5,6]]
=> [[1, 2], [3, 4], [5, 6]]
>> structshape(t2)
=> "Array of 3 Array of 2 Integer"
If the elements of the array are not the same type,
structshape
groups them, in order, by type:
>> t3 = [1, 2, 3, 4.0, '5', '6', [7], [8], 9]
=> [1, 2, 3, 4.0, "5", "6", [7], [8], 9]
>> structshape(t3)
=> "Array of (3 Integer, Float, 2 String, 2 Array of Integer, Integer)"
Here’s an array of arrays with different types:
>> s = 'abc'.chars
=> ["a", "b", "c"]
>> a2 = t.zip(s)
=> [[1, "a"], [2, "b"], [3, "c"]]
>> structshape(a2)
=> "Array of 3 Array of (Integer, String)"
And here’s a hash with 3 items that map integers to strings.
>> h = a2.to_h
=> {1=>"a", 2=>"b", 3=>"c"}
>> structshape(h)
=> "Hash of 3 Integer->String"
If you are having trouble keeping track of your data structures,
structshape
can help.
-
array assignment:
An assignment with a sequence on the right side and an array of variables on the left. The right side is evaluated and then its elements are assigned to the variables on the left. -
gather:
The operation of assembling a variable-length argument array. -
scatter:
The operation of treating a sequence as an array of arguments. -
data structure:
A collection of related values, often organized in arrays, hashes, sets, etc. -
shape error:
An error caused because a value has the wrong shape; that is, the wrong type or size.
Exercise 1
Write a method called most_frequent
that takes a string and prints the
letters in decreasing order of frequency. Find text samples from several
different languages and see how letter frequency varies between
languages. Compare your results with the tables at
https://en.wikipedia.org/wiki/Letter_frequencies.
Exercise 2
More anagrams!
-
Write a program that reads a word list from a file (see Section Reading word lists) and prints all the sets of words that are anagrams.
Here is an example of what the output might look like:
['deltas', 'desalt', 'lasted', 'salted', 'slated', 'staled'] ['retainers', 'ternaries'] ['generating', 'greatening'] ['resmelts', 'smelters', 'termless']
Hint: you might want to build a hash that maps from a collection of letters to a list of words that can be spelled with those letters. The question is, how can you represent the collection of letters in a way that can be used as a key?
-
Modify the previous program so that it prints the longest list of anagrams first, followed by the second longest, and so on.
-
In Scrabble a “bingo” is when you play all seven tiles in your rack, along with a letter on the board, to form an eight-letter word. What collection of 8 letters forms the most possible bingos? Hint: there are seven.
Exercise 3
Two words form a “metathesis pair” if you can transform one into the
other by swapping two letters; for example, “converse” and “conserve”.
Write a program that finds all of the metathesis pairs in the
dictionary. Hint: don’t test all pairs of words, and don’t test all
possible swaps. Credit: This exercise is inspired by an example at
http://puzzlers.org.
Exercise 4
Here’s another Car Talk Puzzler
(https://www.cartalk.com/puzzler/browse):
What is the longest English word, that remains a valid English word, as you remove its letters one at a time?
Now, letters can be removed from either end, or the middle, but you can’t rearrange any of the letters. Every time you drop a letter, you wind up with another English word. If you do that, you’re eventually going to wind up with one letter and that too is going to be an English word—one that’s found in the dictionary. I want to know what’s the longest word and how many letters does it have?
I’m going to give you a little modest example: Sprite. Ok? You start off with sprite, you take a letter off, one from the interior of the word, take the r away, and we’re left with the word spite, then we take the e off the end, we’re left with spit, we take the s off, we’re left with pit, it, and I.
Write a program to find all words that can be reduced in this way, and then find the longest one.
This exercise is a little more challenging than most, so here are some suggestions:
-
You might want to write a method that takes a word and computes an array of all the words that can be formed by removing one letter. These are the “children” of the word.
-
Recursively, a word is reducible if any of its children are reducible. As a base case, you can consider the empty string reducible.
-
The wordlist I provided,
words.txt
, doesn’t contain single letter words. So you might want to add “I”, “a”, and the empty string. -
To improve the performance of your program, you might want to memoize the words that are known to be reducible.