[Snark] C #define True "true"

Christopher Vollick 0 at psycoti.ca
Mon Nov 26 00:08:04 UTC 2018


On Thu, Oct 25, 2018 at 07:53:44PM -0500, Finn Alexander O'Leary wrote:
> #define True "true"
> #define False "false"
> static char indexer_for_the_word_thingies = 1;

Already we're off to a good start.
Defining our own booleans isn't _uncommon_ in C, but making them strings is nice.

The "ies" suffix used throughout is perhaps a bit exaggerated, but not wholly unbelievable, and a variable like "indexer_for_the_thing" is far from unbelievable.
Unfortunately...

> static char ***A;

A "char*" is normal, a "char **" comes up sometimes, but a "char ***" is virtually unheard of!
I'm excited.

> int _strlen(char *s)
> {
> 	return (s ? strlen(s) : 0);
> }

Alright, so we've built our own strlen that wraps the system standard one, but returns 0 if given no string at all. Ok.

> void load_me_the_thingie_withie(int charie, char *thisie) {
> 	unsigned char i = ((2)<<6);
> 	do {
> 		void *S = &(word_thingies[i]);
> 		if (i < charie) {
> 			((struct thingimajigie *)S)->start = &(*A)[i];
> 			((struct thingimajigie *)S)->end = malloc(_strlen((*A)[i])+1);
> 			memset((((struct thingimajigie *)S)->end), 'A', _strlen((*A)[i]));
> 			*((char*)((struct thingimajigie *)S)->end+strlen((*A)[i])) = '\0';
> 			S -= (sizeof(char**)+sizeof(char*));
> 		}
> 	} while (i--);
> }

But then here we use our _strlen on one line, and on the very next line we use the normal strlen.
I _guess_ we know that second one can't be NULL? But... wait... they're both being called on the _same_ (*A)[i]...
So if it is NULL the second one will break anyway. Cool.

And that's not all. There's a lot of bad packed into this method:
We've got the poorly named i defined with (2<<6) rather than 128, but more importantly no reasoning about _why_ that's the right value.
We've got a do-while that looks like it should be a for-loop since i starts at a literal number and the condition will always be true on the first pass.
(Not to mention that using `i--` in the condition is a little spooky.)
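To make the spooky part concrete: the test is `i--`, a post-decrement, so the condition sees the old value of `i`. The body therefore runs for every value from 128 down to 0, which is 129 passes, and `i` quietly wraps around to 255 on the way out. A small sketch comparing the two loop shapes (the counting functions are made up for illustration):

```c
#include <stddef.h>

/* Mirrors the reviewed loop: unsigned char i starts at 128 (2 << 6) and the
 * condition is i--, so the body sees i = 128, 127, ..., 0. */
static size_t count_do_while_passes(void) {
    size_t passes = 0;
    unsigned char i = 128;
    do {
        passes++;
    } while (i--);   /* post-decrement: the test uses the old value */
    return passes;
}

/* The same range written as a for loop over a signed int. */
static size_t count_for_passes(void) {
    size_t passes = 0;
    for (int i = 128; i >= 0; i--) {
        passes++;
    }
    return passes;
}
```

Both should report 129 passes; the for loop just says so up front.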

But then we've got S, with a capital for no reason, defined as a void pointer even though it only ever holds thingimajigies, initialized to the address of word_thingies indexed by i.
Which could be done with `word_thingies + i`, but is instead round-tripped through array indexing and the address-of operator.
That one alone could be a stylistic choice, but there are a lot of things going on.
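For what it's worth, `&word_thingies[i]` and `word_thingies + i` are defined to be the same pointer, and giving that pointer its real type is what would make all the casts go away. A sketch, assuming a struct shaped like the one the code implies (`start` as a `char **`, `end` as a `char *`); the struct body and the helper are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the reviewed struct; only its shape matters. */
struct thingimajigie {
    char **start;
    char *end;
};

static struct thingimajigie word_thingies[256];

/* &array[i] and array + i name the same element. */
static bool same_address(unsigned i) {
    struct thingimajigie *by_index = &word_thingies[i];
    struct thingimajigie *by_arith = word_thingies + i;
    by_arith->end = NULL;   /* with the real type, no cast is needed */
    return by_index == by_arith;
}
```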

S needs to be cast every time it's used, because it's the wrong type for no reason, and we're computing the length of the same string multiple times, both of which add a lot of noise to the lines.

This line:
> 			*((char*)((struct thingimajigie *)S)->end+strlen((*A)[i])) = '\0';
Is a nightmare of casts, but does a simple thing.
Go to the byte just past the last 'A' in that end string I just made (the last byte of the buffer), and set it to the null terminator.
But wow is that less clear than it could be.
Compare to something like
> 			end_buffer[input_length] = '\0';
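Spelled out a bit further, a sketch of the whole "fill a buffer with A's" step, with the length computed once and kept in a named variable, might look like this. `make_placeholder` is a hypothetical name, and the NULL handling mirrors what `_strlen` seems to be going for:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: a buffer with as many 'A's as word has characters,
 * plus the terminator. The length is computed exactly once. */
static char *make_placeholder(const char *word) {
    size_t input_length = word ? strlen(word) : 0;   /* NULL-safe, like _strlen */
    char *end_buffer = malloc(input_length + 1);
    if (end_buffer == NULL) {
        return NULL;
    }
    memset(end_buffer, 'A', input_length);
    end_buffer[input_length] = '\0';
    return end_buffer;
}
```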

Then we adjust S at the end of the if... I'm actually not sure why.
I think it's just to distract us, because it's not used again until it's reset at the top of the loop.
Looks important, doesn't it?

So what does it do?
Well... it points each slot of the word_thingies array at the corresponding entry of the array A.
And then it allocates an end string of the same length for each one, filled with the letter A. And that's it.
It doesn't even use the "thisie" parameter, and it starts at i=128 every time, but only really _does_ anything if `i < charie`, so it could have just started there, but instead it loops down.

And lastly, in this dense piece of garbage, `i` starts at 128 and works down, but we allocated 256 word_thingies.
Which means, firstly, that there's a limit on how many words it can handle that doesn't appear to be checked anywhere, but also that we allocated enough memory for twice that many words and then didn't even use half of it, because we got our clever constant wrong.
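A sketch of the bounds check that seems to be missing, deriving the capacity from the array itself rather than from a shifted constant. The struct, `WORD_CAPACITY`, and `load_words` are all hypothetical stand-ins, and the 256 comes from the allocation described above:

```c
#include <stddef.h>

/* Hypothetical stand-in for the reviewed struct and its 256-entry array. */
struct thingimajigie {
    char **start;
    char *end;
};

static struct thingimajigie word_thingies[256];

/* Number of slots, taken from the array rather than a magic number. */
#define WORD_CAPACITY (sizeof word_thingies / sizeof word_thingies[0])

/* Point one slot at each argument, refusing input that would not fit.
 * Returns the number of slots filled, or -1 if there are too many words.
 * (Allocating the end buffers is left out to keep the focus on the bound.) */
static int load_words(int argc, char **argv) {
    if (argc < 0 || (size_t)argc > WORD_CAPACITY) {
        return -1;
    }
    for (int i = 0; i < argc; i++) {
        word_thingies[i].start = &argv[i];
        word_thingies[i].end = NULL;
    }
    return argc;
}
```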

Ok...

> /* KITTEN <-> CAT <-> CONCATENATE... it's a bad joke don't @ me */
> #define KITTEN(a, b) a # b
> #define BALL_OF_STRING(a,b) a ## b
> #define DEFISCHARACTER(string) size_t BALL_OF_STRING(is, string) (int c) { return (c == KITTEN(, string)[0] || tolower(c) == KITTEN(, string)[0]) ? 1 : 0; }

So we get some defines.
That's fine, I guess.
As you might expect, KITTEN takes two strings and puts them together.
Oh wait, not really.
Actually the symmetry between `KITTEN` and `BALL_OF_STRING` is a lie.
KITTEN takes the first argument and jams it, untouched, in front of the stringized version of the second argument.
It's not a binary operator at all, it just looks like one. It's one term followed by a unary operator on b.
So you could do `KITTEN("one", two)` and get `"one""two"` which is the same as `"onetwo"`
But `KITTEN(one,two)` is a syntax error, as it transforms into `one"two"`.

But that's fine, because every time we use it, we just don't pass a first parameter, so it's equivalent to just writing `#b`.
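If you want to see the asymmetry for yourself, this sketch reproduces the two macros as quoted and expands each of them once. The variable names are made up:

```c
#include <string.h>

/* The two macros from the review, reproduced as quoted. */
#define KITTEN(a, b) a # b
#define BALL_OF_STRING(a,b) a ## b

/* a is pasted in untouched; only b is stringized. */
static const char *kitten_with_literal = KITTEN("one", two);  /* "one" "two" -> "onetwo" */
static const char *kitten_empty_first  = KITTEN(, two);       /* just #two -> "two" */

/* ## glues two tokens into one: this declares a variable named istwo. */
static int BALL_OF_STRING(is, two) = 2;
```

`KITTEN(one, two)` is left out on purpose, since it expands to `one "two"` and won't compile.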

BALL_OF_STRING, on the other hand, _really_ does concatenate tokens.
So anyway, we give DEFISCHARACTER a letter and it gives us a function that tells us whether a character is that letter or not.
It assumes the letter we give it is lower-case, it returns a size_t for some reason (not our custom boolean), and we have the classic `condition ? 1 : 0`, which is of course equivalent to just `condition` on its own, since `||` already gives 0 or 1.
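A sketch of what the generated function boils down to, next to a plainer version. `is_letter` is a hypothetical name. Incidentally, for a lower-case letter the `c == ...` half of the macro's test is redundant as well, since `tolower(c)` already matches in that case:

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Macros reproduced from the review. DEFISCHARACTER(a) defines
 * size_t isa(int c), which is 1 when c is 'a' or 'A'. */
#define KITTEN(a, b) a # b
#define BALL_OF_STRING(a,b) a ## b
#define DEFISCHARACTER(string) size_t BALL_OF_STRING(is, string) (int c) { return (c == KITTEN(, string)[0] || tolower(c) == KITTEN(, string)[0]) ? 1 : 0; }

DEFISCHARACTER(a)

/* Plainer sketch: one comparison, and a real bool instead of size_t. */
static bool is_letter(int c, char letter) {
    return tolower((unsigned char)c) == letter;
}
```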

Oh, and the inside joke names... I wish I could say that's ridiculous... but I've seen it...

> char *isvowelay(char S, char S1) /* REMOVE: USE LIKE: S = isvowelay(string[0], string[1]); strcmp(S, "true");  */

This definition is pretty good. I don't know if the comment is telling me to remove the method, or to remove itself.
I think the method is important, but I don't know yet.

> 	#define ISVOWEL(_) (isa(_) || ise(_) || isi(_) || iso(_) || isu(_))
This line is fun. We make a definition that says it's a vowel if it's any of a,e,i,o, or u, as defined above by our macro!
This definition _looks_ like it's scoped to our `isvowelay` method, but the C preprocessor (CPP) knows nothing about scopes: the macro stays defined from that line to the end of the file, so it's effectively global. Which is good, because we use it later in the definition of a different method! WOO!
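A tiny sketch of the "no scoping" point, with made-up function names: the `#define` lives inside one function body, and the next function still sees it, because the preprocessor runs before the compiler knows anything about braces.

```c
/* A #define inside a function body is not scoped to that function:
 * it stays defined until the end of the file (or an #undef). */
static int first(void) {
    #define MAGIC 42
    return MAGIC;
}

static int second(void) {
    return MAGIC;   /* still visible here */
}
```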

But now we have two definitions.
We have `ISVOWEL`, which tells us whether a character is a vowel, and `isvowelay`, which defers to `ISVOWEL` but then tacks some special cases on, like `y` followed by a consonant, and `x` followed by `r`...

Anyway these return our string booleans!

> struct capture isconsonant(char *string)

Next we have this method called isconsonant which is defined as the opposite of isvowel, right?
Wrong!
It doesn't return a boolean of any kind!
It returns the span of consonants which represents the prefix we're going to strip off in our pig-latinizing!
Using a bunch of mostly hard-coded special cases!
And a variable called `l` that is never used...
We also have another do-while loop with:
> 		int i = 0;
> 		do {
> 			i++, c.j++;
> 		} while (i < strlen(string) && !ISVOWEL(string[i]));
which could, again, be a for loop, and `i` doesn't really need to exist at all since it will always have the same value as `c.j`... but whatever.

Also... c.i is always 0 in all cases.
K.
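The same count can be written as a plain loop. This sketch assumes `ISVOWEL` means a, e, i, o, or u in either case (the `is_vowel` helper and `consonant_prefix_len` are hypothetical), and it keeps the do-while's quirk of always consuming the first character, which is why the result is never 0:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for ISVOWEL: a, e, i, o, u in either case. */
static bool is_vowel(char ch) {
    return ch != '\0' && strchr("aeiouAEIOU", ch) != NULL;
}

/* Same count as the reviewed do-while: s[0] is always consumed, then we
 * stop at the first vowel or the end. strlen is computed only once. */
static size_t consonant_prefix_len(const char *s) {
    size_t len = strlen(s);
    size_t n = 1;
    while (n < len && !is_vowel(s[n])) {
        n++;
    }
    return n;
}
```

Note that even an empty string comes back as 1, which is the same floor that makes `c.i == c.j` impossible further down.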

Now we get to the real logic:

> void dologic(char *S, char *dest)
> {
> 	struct capture a, b, d;
> 	if (strcmp(True, isvowelay(S[0], S[1])) == 0) {
> 		/* append -ay */
> 		a.i = 0;
> 		a.j = _strlen(S);
> 		b.i = 0;
> 		b.j = 3;
> 		write_string(dest, S, a);
> 		write_string(dest+a.j, "ay\0", b);
> 		return;
> 	}

This is our first case. If S starts with a vowel (or one of isvowelay's special cases), we leave the word as-is and put "ay" on the end.
Perfect!

But we're pretty deep in technical debt here.
We're not copying strings directly, we're going through `write_string` which requires we give it our own `capture` group.
But that's not buying us anything here, it's just making things more confusing when, for example, we have to do:
> 		b.i = 0;
> 		b.j = 3;
> 		write_string(dest+a.j, "ay\0", b);

to get the letters "ay" tacked on the back.
We've also got to deal with the fact that our comparison function returns our string booleans, so we can't just write `if (isvowelay(...))`; we have to compute it, and then compare it for string equality with `strcmp(True, ...)`.

But then, just in case that wasn't verbose enough, strcmp reports a match as 0, so we also need `== 0` on the end to turn our string booleans back into... booleans.
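For contrast, a minimal sketch of both styles side by side. `string_bool_is_true` keeps the string booleans and the `strcmp`; `starts_with_vowel` is a hypothetical, simplified check (plain vowels only, none of `isvowelay`'s special cases) that returns a native `bool` and can go straight into an `if`:

```c
#include <stdbool.h>
#include <string.h>

#define True "true"
#define False "false"

/* The reviewed style: compare strings, and remember strcmp says 0 on a match. */
static bool string_bool_is_true(const char *answer) {
    return strcmp(True, answer) == 0;
}

/* Hypothetical native-bool check, simplified to plain vowels. */
static bool starts_with_vowel(const char *word) {
    return word[0] != '\0' && strchr("aeiouAEIOU", word[0]) != NULL;
}
```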

> 	struct capture c = isconsonant(S);

We then follow our early return statement by defining more variables.
Even better, c is between "a" and "d", but is defined after them.
Obviously d was an afterthought.
But d isn't used in the first bit of logic either, so it could have been defined down here with c...
Anyway!
We grab our consonants!

> 	/* printf("c.i: %d; c.j: %d\n", c.i, c.j); */

Nice debugging line left in but commented out.

> 	if (c.i == c.j) {
> 		a.i = 0;
> 		a.j = _strlen(S);
> 		write_string(dest, S, a);
> 	}

Here we check if `.i` and `.j` are equal... but I don't think there's any way that could be true.
Because we use a do-while in isconsonant, I think we always move c.j up to at least 1.
And c.i is always 0.

So this important-looking special check is actually useless, and the only logic that really runs is in the else.

> 		a.i = c.j;
> 		a.j = strlen(S)-c.j;
> 
> 		b.i = 0;
> 		b.j = c.j;
> 
> 		d.i = 0;
> 		d.j = 3;
> 
> 		write_string(dest, S, a);
> 		write_string(dest+a.j, S, b);
> 		write_string(dest+a.j+b.j, "ay\0", d);

Of all the code, this is probably the most straightforward, even if it is a little muddied up.
First we write all the characters after the prefix, then we write the prefix, then "ay".

Probably worth noting that this function doesn't return anything, it just modifies the dest string given to us.
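A sketch of the same rotation done in one call, with hypothetical names. It also sizes the output explicitly: the result is two characters longer than the input (the "ay"), so it needs `strlen(word) + 3` bytes once the terminator is counted:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: move the first prefix_len characters of word to the
 * end and append "ay". Returns a new heap string, or NULL on failure. */
static char *rotate_and_ay(const char *word, size_t prefix_len) {
    size_t len = strlen(word);
    if (prefix_len > len) {
        prefix_len = len;
    }
    size_t size = len + 3;   /* word + "ay" + '\0' */
    char *dest = malloc(size);
    if (dest == NULL) {
        return NULL;
    }
    /* suffix, then the prefix (at most prefix_len chars), then "ay" */
    snprintf(dest, size, "%s%.*say", word + prefix_len, (int)prefix_len, word);
    return dest;
}
```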

> int people_gave_one_argument_with_the_words_rather_than_giving_them_as_separate_arguments()
> {
> 	int i;
> 	for (i = 0; i < strlen(*((*A)+1)); i++)
> 	{
> 		if (((*A)[1])[i] == ' ' && isalpha(((*A)[1])[i+1]))
> 			return 1;
> 	}
> 	return -1;
> }

Nice descriptive method name here.
And a real for loop! Besides mixing `*((*A)+1)` and `((*A)[1])[i]` in the same method, and recomputing the length every time around the loop, it's actually a pretty straightforward one too!

Does my first argument contain any spaces that are followed by letters?
If so return 1, otherwise -1... which is a new different boolean.
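The same check as a sketch that takes the string as a parameter instead of reaching into the global, walks it once instead of calling `strlen` on every pass, and returns a `bool` instead of inventing 1/-1. `has_space_before_letter` is a hypothetical name:

```c
#include <ctype.h>
#include <stdbool.h>

/* True if arg contains a space immediately followed by a letter. */
static bool has_space_before_letter(const char *arg) {
    for (const char *p = arg; *p != '\0'; p++) {
        /* p[1] is at worst the terminator, so this never reads past the end */
        if (p[0] == ' ' && isalpha((unsigned char)p[1])) {
            return true;
        }
    }
    return false;
}
```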

And then finally we get to the end!

> int main(int _, char **__)
> {
> 	A = &__;

Here we've given our args the most sensible names, and assigned A the address of our `char **`.
This is the cause of all the `(*A)` we've had to put around everywhere, but since we never reassign A, it buys us nothing.
Woo!
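A quick sketch (hypothetical helper) showing that, as long as `A` points at `argv` and is never reassigned, `(*A)[i]`, `*((*A)+i)`, and plain `argv[i]` are all the same pointer:

```c
#include <stdbool.h>

/* Takes the address of argv the way the review's A = &__; does,
 * then checks that every spelling lands on the same argument. */
static bool same_argument(char **argv, int i) {
    char ***a = &argv;
    return (*a)[i] == argv[i]         /* (*A)[i] is just argv[i] */
        && *((*a) + i) == argv[i];    /* ...and so is *((*A)+i) */
}
```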

> 	load_me_the_thingie_withie(_, NULL);

We load out of our global A into our words array, passing in NULL because that parameter isn't used anyway, it turns out.

> 	if (_ > 0 && people_gave_one_argument_with_the_words_rather_than_giving_them_as_separate_arguments() > 0) {
> 		/* I never designed for this shit, only noticed when it came to the tests, so instead let's design around it */
> 		char *command = calloc(strlen(__[0]) + strlen(__[1]) + 2, sizeof(char));
> 		strcpy(command, __[0]);
> 		command[strlen(__[0])] = ' ';
> 		strcpy(command+1+strlen(__[0]), __[1]);
> 		// printf("command: %s\n", command);
> 		FILE *f = popen(command, "r");
> 		int i;
> 		do {
> 			fputc((i = fgetc(f)), stdout);
> 		} while (!feof(f) && i != '\n');
> 		fclose(f);
> 		free(command);
> 		return 0;
> 	}

Ok... so...
Here we've discovered that we actually wanted our first argument to contain the sentence, rather than each word being a separate argument.
So rather than handle that in the late hours of the project, we build a command string made of our executable and the string we were given, and hand it to popen to run ourselves and let popen (really, the shell it spawns) handle the splitting.

Then it's as simple as proxying everything it prints to our own terminal with another do-while loop and then returning!
That was close!
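Two more things hiding in that loop: `fputc` runs before `feof` is checked, so if the child's output ends without a newline, the `EOF` value itself gets written out as a stray byte. And a stream from `popen` is meant to be closed with `pclose`, which also waits for the child, not `fclose`. A sketch of the copying part, written against plain `FILE *`s so it can be tried on any stream (the function name is hypothetical):

```c
#include <stdio.h>

/* Copy one line (including its '\n', if any) from in to out.
 * EOF is checked before anything is written. */
static void proxy_first_line(FILE *in, FILE *out) {
    int ch;
    while ((ch = fgetc(in)) != EOF) {
        fputc(ch, out);
        if (ch == '\n') {
            break;
        }
    }
}
```

With popen, it would be used roughly as `FILE *f = popen(command, "r");` then, once `f` is known not to be NULL, `proxy_first_line(f, stdout); pclose(f);`.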

But really, I think this one is more insidious than it looks.
It turns out, in the tests, that this is the real way this code will be called, every time, in practice.

That means that every time we use this program for real, it will actually be running twice.
And we'll have to wait for it to boot up the second time, see that everything's ok, do the thing, write the thing, then we have to come back to the first process, read the thing, write the thing, then terminate.

With this simple action we've decided to just be at least 2x worse because we didn't want to do the simple loop that would have solved the problem for real.
And I think that's delightful.
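For comparison, a sketch of roughly what that loop might look like: split the one argument on spaces and hand each piece to the same per-word code. The helper and its signature are hypothetical, and it uses `strtok` on a private copy so the original argument isn't modified:

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Split sentence on spaces into at most max_words pointers. The pointers
 * refer into a heap copy returned through *storage, which the caller frees.
 * Runs of spaces are skipped. Returns the number of words found. */
static size_t split_words(const char *sentence, char **words, size_t max_words, char **storage) {
    char *copy = malloc(strlen(sentence) + 1);
    *storage = copy;
    if (copy == NULL) {
        return 0;
    }
    strcpy(copy, sentence);
    size_t count = 0;
    for (char *tok = strtok(copy, " "); tok != NULL && count < max_words; tok = strtok(NULL, " ")) {
        words[count++] = tok;
    }
    return count;
}
```

Each entry of `words` could then go through `dologic` exactly the way separate arguments do, with one process instead of two.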

> 	/* testing */
> 	int i;
> 	for (i = 1; i < _; i++) {
> 		dologic(*word_thingies[i].start, word_thingies[i].end);
> 		/* printf("%s %s %s\n", *word_thingies[i].start, isvowelay((*word_thingies[i].start)[0], (*word_thingies[i].start)[1]), word_thingies[i].end); */
> 		printf("%s", word_thingies[i].end);
> 		if (i+1 < _) fputc(' ', stdout);
> 	}
> 	fputc('\n', stdout);

And then down here is our happy case, where we only have one word per argument.
First we declare that something is testing, but I don't know what, because this is the real logic.
Then we can do our logic with the start of each argument and the end we allocated.
Then we can not do the next line of commented out debugging stuff.
And finally we can print out the end result.
And a space if we're not at the end.

As is often the case in this solution, it looks like there's no good reason for start to be `char **` because we just dereference it here, but that's fine, I guess.

Less fine is that we allocate end to be `_strlen((*A)[i])+1` bytes long, exactly enough for a copy of the input word, but then dologic tacks 2 more characters onto it every time.
So I think we write off the end of the buffer for every word... but that's obviously been fine so far, or the tests wouldn't pass, right?
It's probably fine...

So, it has some unprofessional names, some useless names, and way too much pointer indirection.
But beyond all that, it's also jam-packed with tiny but crappy patterns, multiple booleans, careless memory management, hacks around use-cases, tricky macros, etc.

I liked it.

