[Snark] Ruby is so slow

Christopher Vollick 0 at psycoti.ca
Wed Nov 21 03:02:41 UTC 2018

On Sat, Oct 27, 2018 at 07:20:41PM -0500, Zachary R. wrote:
> words.each do |word|
>  dictionary = English.new
>  place = if dictionary.is_english?(word)
>    dictionary.data.find_index(dictionary.the_word)
>  end
>  latin = Latin.new
>  pig = Pig.new(latin)
>  output << pig.data[place]
> end
> puts output.join(' ')

Ok, so, right off the bat here we've constructed a tree of objects.
But they're all short lived!
For each word we build a new English to figure out if a word is English.
Then we skip methods and reach directly into its uselessly named `data` to iterate over words anyway.
And we're using `dictionary.the_word` to get the word back out that we put in when we could have just used `word`.

Then, that's in an if. If the word is not english, I think `place` will be `nil`, and it'll fail later on, so all of that doesn't really give us any kind of protection.
So far so good.

Then we build a latin (just for this word) and then a pig, which we give the latin to.
That allows us to then index into `pig.data` with that thing we saw earlier.
This is a mess, but while some parts of this solution might be a little contrived later, this isn't _that_ exaggerated from a typical OOP overload.
Let's go down a layer.

>  def initialize
>    @data = []
>    load_data("english.data")
>  end
> [...]
>  def load_data(data)
>    file = File.new(data, "r")
>    while (line = file.gets)
>      @data << line.downcase.strip
>    end
>    file.close
>  end

This is obviously this solution's biggest gimmick
We have a list of all English words, and we load it all into a big array so we can process it.

There is _some_ shadow of realism here.
We could argue that we _could_ cache this info so it's more easily available to code later.
But in this case we know we rebuild this for every input word because someone was dumb.
We could argue that we _could_ load this into some efficient index that allows for impressive lookup speeds.
But in this case we just dump it into a flat list and don't even sort it.
(The input file looks sorted already, so maybe that's ok, but we don't use that fact later)
So I could argue that this class's ridiculousness is the basics of the "We'll do it like this for now, and we can always clean it up later" problem-solving.
Or maybe it's just crazy.

>  def is_english?(word)
>    is_english = 'not yet'
>    letters = word.split('')
>    letters.each do |letter|
>      data.each do |english|
>        if english.include?(letter)
>          if is_this_the_word?(letters.join(''), english)
>            is_english = 'yes'
>          end
>        end
>      end
>    end
>    is_english == 'yes'
>  end

So here is a particularly straight-forward sounding bit.
Is this word in the list of all words.
And instead they go over each letter of the input, and for every input word they first check if the letter is in that word. If it is, they push them together and say "Is this my word"?
And if so, they set their boolean string to the string "yes".
But they don't short-circuit, and instead just loop around uselessly.

Again, pretty exaggerated, but it has the shape of bad solutions I've seen.
Some useless check that makes things less efficient in reality.
Some variable that should be a boolean but isn't.
Some unnecessary loops.
It's got the right feel.

>  def is_this_the_word?(this, word)
>    @the_word = word if this == word
>    this == word
>  end

I think this is actually one of my favourite parts of this solution.

So here, in a private method called inside the loop of the above method, we have an instance var that's set conditionally.
And all it really does is check for equality, except for the instance state setting side-effect.

This populates the `dictionary.the_word` we saw above, but only when `this == word` (written twice)
It's so useless. We barely need the check, we don't need it to be hidden away in a method like this, but in the end we don't even need that variable for anything, because it doesn't expose anything we didn't already have.
I like this part.
Again, you could almost hear the person saying "Oh, well, that _used_ to hold a cleaned up, normalized, version of the word. But that logic got removed so, yeah, I guess it's not needed anymore, but there's not really a reason to get rid of it"

> == latin.rb ==
> class Latin < English
>  attr_accessor :data
>  attr_accessor :the_word
>  def initialize
>    @data = []
>    load_data("english.data")
>  end
>  def is_english?(word)
>    'no'
>  end
> end

So, this is again fun.
Here we have a subclass of English called Latin, which makes no real sense.
There's no reason for these to be related, I think.
But more importantly, we've basically redefined half of English.
We're not even using the Inheritence to do less work.

And, also, we're obviously loading the full list of English words a second time for each input word.
So that's fun.

> == pig.rb ==
> class Pig
>  attr_accessor :data

Everything is named data, which I like

>    latins.data.each do |latin|

I like here, how we've had to pass a latin into this thing, but it doesn't matter at all.
The only thing we do with this instance of Latin is reach immediately into its guts and pull the data out, which is just an array of strings.
We call no methods that latin implements (except the constructor that loads the data) and there's no reason for it to be a class at all.
Visitor Pattern! I think...

>    latins.data.each do |latin|
>      piglatin = ''
>      @snorts = 0
>      @squeals = 0

Another nice thing here is the lack of care.
Here, in our loop, we set one local variable and two instance variables.
But that's fine, because nothing ever reads these.
In fact, the only reason they're instance variables at all is because other instance methods touch them behind your back.

Not to mention the garbage names.

>      letters = latin.split('')
>      if latin == 'yellow'
>        piglatin = 'ellowyay'

I really like this one.
Because it speaks to me and says "Oh, my tests are failing and I can't figure out how to make a general rule for "y"... so let's just put in a special case so the tests pass.

The code works in general, but yellow is special.

>      elsif latin.include?('qu') && !['a', 'e', 'i', 'o', 'u'].include?(letters[0])
>        piglatin = letters[letters.find_index('u')+1..-1].join('') + letters[0..letters.find_index('u')].join('') + 'ay'

Here's a fun bug.
So, ignoring the repetition of find_index, this is saying "If this word containes a que and doesn't start with a vowel, take everything up to the first u and put it on the end"
It's an ugly way of doing it, but that's fine.

But actually it's also broken but not in a way the tests caught.

"cheque" has a qu and no vowel at the front, and I think it'll get turned into "echqeuay"
And "boutique" will be ravaged even more because the first "u" it finds will not be the one from the "qu", and it'll become "tiquebouay"

>        letters.each_with_index do |l, i|
>          snort if ['a', 'e', 'i', 'o', 'u', 'y'].include?(l)
>          squeal unless ['a', 'e', 'i', 'o', 'u', 'y'].include?(l)
>          if @snorts - @squeals > 0
>            piglatin = latin + 'ay'
>            break
>          elsif ['a', 'e', 'i', 'o', 'u', 'y'].include?(l) && @squeals > 0
>            piglatin = letters[i..-1].join('') + letters[0..i-1].join('') + 'ay'
>            break
>          end
>        end

Ok, so, then we get to our else case.
What this is _doing_, I think, is looking for a prefix of consontants before the first vowel.
Then it puts them the prefix on the end, takes them off the front, and appends "ay".

But the code doesn't make that easy.
There's the two functions that are called in opposite conditions to each other.
So only one or the other will go up, but not both and not neither.

There's the `@snorts - @squeals > 0` which is the same as `@snorts > @squeals` which is actually just asking is this first letter a vowel, I think.
I don't think there's another chance for that to run, but since I'm not sure I think that's a success.

Then we check if it's a vowel, repeating the condition from the snort case, and if `@squeals > 0`, meaning there has ever been a consonant.
So, again, I think this second half is redundant. I don't think it's possible to get to this part of the code, to have a vowel, and not have squealed before.
Which makes it appear much more important than it is.

And then n the end, all we do is jam each transformed word into a new array.

Let's go back to main.

> words.each do |word|
>  dictionary = English.new
>  place = if dictionary.is_english?(word)
>    dictionary.data.find_index(dictionary.the_word)
>  end
>  latin = Latin.new
>  pig = Pig.new(latin)
>  output << pig.data[place]
> end

So this loads every english word into a dictionary.
Then it uses that to find which place in the dictionary this word belongs in, ignoring what might happen if it's actually not an English word.

Then it builds a new Latin, which creates a second copy of the dictionary by loading it again.
Then we give that latin to `Pig` which creates a third copy of the dictionary, but this one is all turned into piglatin.

And at the end of that, we index into that pig's data assuming the index will correspond, so we can get the piglatin version.
And then we throw all of those dictionaries out and build them all over again for the next word.

Like I said before, this one is a little ridiculous, but I can _feel_ the reasoning.
Because, in a strange way, loading every word in and turning each of them into piglatin isn't bad as a concept.
If you had a good index, the list doesn't change that much and it could be efficient somehow.
Especially if some words had really weird special cases.
It probably still wouldn't be a good idea, for this kind of problem, but I could see a person thinking that might be ok.

But then to build, rebuild, and discard these datastructures rather than reuse them, and to implement them with terrible linear searches just defeats the whole point of any of the optimizations you might have even hoped to get with this.
And that's what I like about this solution.

It's a little over-complicated, it's got some secret bits, and I'm able to explain away the bits that are absurd.
Sure it might be bad now, but we'll fix it in the next version.
The story's already in the backlog!

More information about the Snark mailing list