Category Archives: Musings

Demonstrative Implementation: LZW

There’s something of the “standard literature” in undergraduate algorithms: graph traversals, greedy and dynamic programming, sorts and data structures, and so on. One class of algorithms that seems oddly excluded, at least from my experience, is compression algorithms. So I wanted to explore more of that, in this case by implementing LZW.

I implemented (I think…) a more-or-less standard LZW algorithm here. Continuing my theme of bridging computer science and practice (started (?) with my Bloom filter write-up I wrote it in C, and

Some take-aways:

  • I didn’t really bother with literate coding. I think that was ultimately the right thing to do. Uh oh.
  • My biggest frustration with example implementations online (that I can find, mainly from Wikipedia: there’s an old Dr. Dobbs article, and then a million Rosetta Code implementations all doing the same thing) is that it doesn’t seem to reflect a practical implementation. A linear scan over the dictionary? I know (better than most, I think…) that linear scans are great, but it’s hard to believe that’s the right thing here.
  • The Sedgwick implementation for his undergrad book seems OK.
  • I think I can make an interesting short-course on how I iteratively improved performance.
  • I really dropped the ball implementing clear-codes, that was a mess and a half. But it’s exactly the complications of “really” implementing clear-codes, variable-length coding, and so on that I am frustrated being skimmed over by academic resources! Absent my efforts (such that they are) it seems the students’ options are to ignore them entirely, or dive into gzip’s source base.

I have a PDF with more thoughts here: https://github.com/agorenst/lzw/blob/master/lzw.pdf ; you can see how I’m trying to get the framing.

Demonstrative Implementation: Bloom Filter

The Bloom filter seems to be a “sweet spot” for people doing hobby projects: simple enough to implement, unusual enough to be interesting, and complicated enough to feel satisfying.

I am interested in “postbaccalaureate” projects for self-study, and using those projects to bridge the transition from a recent grad to a professional programmer (still with a computer science bent). So I turned to this project for that purpose.

It went… OK!

  • I got really caught up in the literate coding stuff, and I think the code was, in effect, too simple. It was more the surrounding machinery that was interesting.
  • The driver code ended up being the complicated thing, which does suggest that it’s more running the experiments (which happens a lot in my experience!) that’s interesting, than the implementation-proper.
  • Things like the arbitrary-length-bit-vector, and the constituent pages, also seemed to be a bit more interesting.
  • I did really like the experiments, though. It was fun figuring out how to automatically make the charts.

I have a bigger write-up here: https://github.com/agorenst/bloom-filter/blob/master/bloomfilter.pdf

Find the Duplicate Number (Leetcode)

This is a reasonable warm-up question I think, but the performance-related requirement (constant space) is silly. Reading the editorial, well, I have thoughts to share. This is from the archives.

Using this as a warm-up question

class Solution:
    def findDuplicate(self, nums: List[int]) -> int:
        seen = set([])
        for n in nums:
            if n in seen: return n
            seen.add(n)

This presents an approachable-enough setting to discuss performance, and I think things like “find the duplicates” are related to real-world questions. Moreover, there are ready sequels: OK, if they use this “hash table approach”, what about one that can take more time, but less space?

class Solution:
    def findDuplicate(self, nums: List[int]) -> int:
        prev = None
        for n in sorted(nums):
            if prev is None:
                prev = n
            elif prev == n:
                return n
            prev = n

Considering The Leetcode Discussion

The leetcode “editorial” (at least as of June 2023) presents 7 different approaches for the problem. Here are some of my reactions to the editorial. This is my experimental foray into providing a criticism to, what I think is the typical leetcode perspective on questions. I’m not meaning to pick on this particular question.

  • “[The problem] is a classic problem” — I’m not sure in what domain this question is a classic. Deduplication in general is very useful, but the particular of this question don’t strike me as something classic. Maybe it’s a common interview question for a certain company? I wish leetcode questions placed their evaluations in more context. It gives the air of secret knowledge (“oh, this is a classic question, and I didn’t know that!”) when I think they’re just… making things up.
  • Approach 3, in which they borrow a bit from each element in the array (and use the range-trick) they say is O(1) space. In practical terms, sure, but what if the array is, say, unsigned shorts and the size of 2^16? You have no bits to borrow; that’s just an example where this is sort of “punning” between theoretical concerns (asymptotic analysis of resource usage) versus practical concerns. Both are fine, but mixing them leads to confusion.
  • The other approaches are fun things to learn, but all (including approach 3) really rely on the particular structure of the input. It’s just unsatisfying.

Different Categories of Programming Problems

Probably the most talked-about interview question types are those that exercise “data structures and algorithms” (DSA) knowledge. That’s a big category, including everything from linked lists to graph algorithms to dynamic programming to sometimes even combinatorics or probability. These questions are more-or-less quizzing the interviewees on their comfort with undergraduate theory knowledge. As I’ve noticed, a lot of the problems on the website Leetcode, in particular, are exercises in this vein.

A more “organic” type of interview question that’s recently piqued my interest is one I see more often in codewars. The higher-numbered questions (which are the easier ones) are often exercises in manipulating strings or arrays in some fashion. Some are pretty naturally motivated, some are contrived. However, they seem to me to be very nice exercises in reasoning about indexes, expressions, and values evolve over multiple iterations of a loop.

I have an interest, and am able to pursue that interest, in helping people learn computer science and programming. When I consider, for that purpose, the first category of questions, the perhaps-frustrating answer to “how do I better at these?” is “learn or review undergraduate algorithms”. This is a tall order for those with a nontraditional background. The second type of question is, on the one hand, more elementary, but consequently also more able-to-be-learned by those with a nontraditional background. I think also they’re quite useful, and I’m curious if there’s some way of more formally transforming those sorts of questions into teaching material.

I recall taking a computer science course, and the professor needed to refer to 2-dimensional array. Professors rarely used an explicit programming language, but even so he took the time to “allocate” a n^2 block of memory, and an accompanying array of n pointers that each pointed to the “next” length-n sequence in that block. My classmates and I were astonished by the sheer cleverness of what was, I later found out, a fairly routine construction. There is a huge “surface area” of material to learn for anything, including programming, and I’m always keen on finding more ways to help people explore more of that quickly.

The term I like to use is “vocabulary”. Some of the loops revealed by these codewar problems help teach new “vocabulary”, but I don’t think that’s quite the right analogy. If the core building blocks of a language are its words, and programs are the books and essays with that language, then I think we’re learning sentences and phrases. Independent of which problems are harder (leetcode-style DSA or codewars-style loops), I think the sheer breadth of loop-based questions in codewars can really augment the abilities of a junior programmer by showing them lots and lots of new sentences — sentences they can absorb and adapt on their own later. As a pedagogical tool for nontraditional students, I really like those. DSA-style questions feel more like complications over the same two-dozen really-complicated DSA sentences you learn in school.

Now, codewars and leetcode and friends are motivated by the infamous “coding interview”. So: which style questions make for better interviews? I think much harder versions of codewars-style questions, with sophisticated (and well-motivated) loops, could go a long way as an evaluation tool. You’re not quizzing on them if they remember Djikstra’s algorithm, you’re seeing how they handle a complicated loop with interesting interactions. Of course there isn’t a bright line: my favorite interview question is more-or-less the merge step of mergesort. I think that’s sufficiently “real” enough, but its heritage is in DSA.

My experience with codewars versus leetcode may not be representative, but it was interesting to see the contrast between the questions I got in those two sites as I’ve been writing up this blog.