Tag Archives: featured

Demonstrative Implementation: Bloom Filter

The Bloom filter seems to be a “sweet spot” for people doing hobby projects: simple enough to implement, unusual enough to be interesting, and complicated enough to feel satisfying.

I am interested in “postbaccalaureate” projects for self-study, and using those projects to bridge the transition from a recent grad to a professional programmer (still with a computer science bent). So I turned to this project for that purpose.

It went… OK!

  • I got really caught up in the literate coding stuff, and I think the code was, in effect, too simple. It was more the surrounding machinery that was interesting.
  • The driver code ended up being the complicated thing, which does suggest that it’s more running the experiments (which happens a lot in my experience!) that’s interesting, than the implementation-proper.
  • Things like the arbitrary-length-bit-vector, and the constituent pages, also seemed to be a bit more interesting.
  • I did really like the experiments, though. It was fun figuring out how to automatically make the charts.

I have a bigger write-up here: https://github.com/agorenst/bloom-filter/blob/master/bloomfilter.pdf

Permutation Sequence (Leetcode)

I’ve decided to call this permutation sequence question a “classic problem” because it is I think a very reasonable combinatorics/counting question. Also, the problem is marked as a “Leetcode Hard”, so I figure I should get one of those on this blog before too long.

Discussion and Approach to Solution

As a change of pace, I tried to see what it would be like to make a “literate code” approach. This, uh, didn’t work. I ended up with a block comment above the code, so I’ve extracted that out here:

for [1, 2, 3, 4], we have 3! sequences that begin with 1, 3! that begin with 2, etc… so we end up with 4 * 3! = 4!, that checks out. So in general, if we have a size-n set, there are n! distinct values, and the first (n-1)! begin with 1, then the next (n-1)! begin with 2, etc. This reasoning applies recursively. So we want to find the highest “m” in [1, n] such that (m-1)(n-1)! < k (or equiv, the smallest m such that m(n-1)! >= k. Maybe that’s a better formulation). Note the “punning” between m as an element of the remaining set, and n as the count of elements left in the set. In turn, m is the index of the m’th largest element, not literally the value m.

So that’s a terse comment. To take a step back:

If we consider the n! possible permutations of [1..n], in lexicographical order, the first (n-1)! begin with 1, the next (n-1)! begin with 2, and so on. This leads to n sub-sequences of length (n-1)!, and we can be confident that the concatenation of these sequences gives us the original sequence back, as there are n*(n-1)!=n! elements altogether.

This suggests an analogy to digits (if we consider the sequence [0..999], the first 100 “begin” with 0, the next 100 begin with 1, and so for 10 times, leading to 100*10 = 1000 elements). We can repeat this recursively, for the sequence [0..99] there are again 10 sub-sequences. Unspooling the recursive reasoning, it means we can “index” into that sequence by looking at successively-lower-magnitude digits. This feels like a no-op because it’s sort of what we already do with normal indexing.

With this factorial-esque indexing, there are n biggest-“digit” sequences, then n-1 next-digit-sequences, and so on. The formulation in the code below is not explicitly recursive, but instead currentIndex is basically, well, the index of the implicit sequence of [result + start]. And we select certain elements from start to “increment” our currentIndex.

Code

class Solution(object):
    def getPermutation(self, n, k):
        # for [1, 2, 3, 4], we have 3! sequences that begin with 1, 3! that begin with 2, etc...
        # so we end up with 4 * 3! = 4!, that checks out.
        # So in general, if we have a size-n set, there are n! distinct values,
        # and the first (n-1)! begin with 1, then the next (n-1)! begin with 2, etc.
        # This reasoning applies recursively.
        # So we want to find the highest "m" in [1, n] such that (m-1)(n-1)! < k
        # (or equiv, the smallest m such that m(n-1)! >= k. Maybe that's a better formulation).
        # Note the "punning" between m as an element of the remaining set, and n as the *count*
        # of elements left in the set. In turn, m is the *index* of the m'th largest element, not
        # literally the value m.
        def fact(n):
            res = 1
            for i in range(1, n):
                res *= i
            return res
        result = []
        start = list([i for i in range(1, n+1)])
        currentIndex = 1
        while currentIndex != k:
            # find the smallest m. Careful reasoning about indexing
            # to avoid off-by-one errors (1-indexed the range of permutations,
            # but m itself should be zero-indexed as it goes into `start`)
            incrementN = fact(len(start))
            m = -1
            for m in range(len(start)):
                # My one bug: I left out the "currentIndex + " part, which lead
                # to an interesting behavior: n=5, k=1 and n=5, k=120 both passed without this!
                if currentIndex + (m*incrementN) > k:
                    m -= 1
                    break
            # remove m from start, we've selected it
            assert(m >= 0 and m < len(start))
            result.append(start[m])
            start.pop(m)
            currentIndex += m*incrementN
        
        # At this point we know we should emit the "0th" remaining permutation.
        # Some subtlety in making sure we're appending to result in the same "direction"
        # as done in the while loop
        result += start
        return ''.join(map(str, result))

Discussion as an interview question

This is a hard-core question about recursive reasoning. I’m not sure about its appropriateness. As notes for my own attempt, while I didn’t recall my approach I did actually implement this code a long, long time ago, and in a previous form of this blog talked about the general approach. You can see my older implementation (when I was literally studying enumerations) is much better. So in revisiting this I was able to recall the broad strokes of a way that would work, and then take it from there.

Discussion general

This is a really neat computer science problem! I have a soft spot for enumeration problems, and have enjoyed jumping into Knuth 4A a few times particularly for Gray codes. I have dreams of using those to help provide exhaustive test suites for other combinatoric algorithms, but I just don’t have the time. There is a lot of depth to this topic I’m not familiar with; reconciling my “analogy” with the more in-depth treatment of Factorial Number Systems is something I should do one day.

More recursive problems (which I guess will include enumeration problems?) here.

Longest Increasing Subsequence (Leetcode Dynamic Programming Study Plan)

This is a truly classic problem. The opener in one of the big algorithms textbooks on dynamic programming, I’m glad it’s included in the dynamic programming study plan. I bet there is an even prettier way of writing this code, but here is the solution I have:

class Solution:
    def lengthOfLIS(self, nums: List[int]) -> int:
        if len(nums) == 0: return 0
        T = [1] # the first thing is always a sequence
        for i in range(1, len(nums)):
            best = 1
            for j in range(i):
                if nums[j] < nums[i]:
                    best = max(best, T[j]+1)
            T.append(best)
        return max(T)

Recurrence

T[i] is the length of the longest increasing subsequence that has to include the i’th element in nums. Compare to the recurrence in the previous question we discussed; there are similarities, but the big difference here is that our solution at i can build off of any previous solution that ends with a value smaller than ours, versus our solution having to be just contiguous with the previous solution.

So in more depth:

  1. T[i] is the length of the longest increasing subsequence of nums[:i+1] that include nums[i]. Taking the max of this array will find the solution.
  2. T[i] is computed via that append call, so we append the best value. We consider all values in T[:i] — recall T[j] is the length of the longest subsequence ending with nums[j]. If nums[j] < nums[i], we can “continue” that subsequence with nums[i], and so increase its length by 1.
  3. Through the “magic of recursion”, we are able to kick-start this process by defining T up to T[:1], i.e., T = [1].

The Dasgupta algorithm book explains this in a very graph-theoretic way, which I appreciate but is also kind of a lot.

See here for more dynamic programming problems and solution.

Is Graph Bipartite? (Leetcode)

Another classic question, and another from the archives. This one’s nice enough where I took a stab at it in Python for readability.

class Solution(object):
    def isBipartite(self, graph):
        color = {}
        wl = []
        for x, ns in enumerate(graph):
            if x in color: continue
            wl = [x]
            color[x] = 1
            while wl:
                u = wl.pop()
                for w in graph[u]:
                    if w in color and color[w] == color[u]: return False
                    if w not in color:
                        color[w] = 1 - color[u]
                        wl.append(w)
        return True

Thoughts

  • The 1 - color[u] trick is handy. It’s a quick way to have 2 options, sort of like not‘ing a boolean variable. Something to teach new people.
  • I think there is the possibility that many of the same node is added to wl. I think I need another seen set to make sure we’re not iterating too often. Well, it passed.
  • The outermost loop is always a bit annoying: it’d be nice to promise that the graph is always connected. It’s good to know how to handle that situation, so I suppose that’s OK.
  • The term color here is because a bipartite graph is the same as a 2-colorable graph. Graph-coloring is a core combinatoric question. This solution overall follows from the observation that “if U is on the left, then all neighbors of U must be on the right, and vice-versa”. Instead of left/right, we use 2 colors (which in this case are… integers).
  • This is basically an extension of DFS. There is probably an opportunity for more cleverness.

Finally, a graph algorithm! More to come, I imagine. They’re all collected here.

Reverse a Linked List (Leetcode)

It’s time. It’s obvious I have a lot of linked questions, and here’s the classic. At least when I was an intern way-back-when, this was a cliche “overdone and inappropriate interview question”. It’s a nice exercise on thinking about linked lists, but unless your job requires it (my first job did!) I understand it feels a bit esoteric. Still, I like it.

Iterative Solution

class Solution(object):
    def reverseList(self, head):
        tail = None
        while head:
            n = head
            head = head.next
            n.next = tail
            tail = n
        return tail

Recursive Solution

class Solution(object):
    def reverseList(self, head):
        if head is None:
            return None
        if head.next is None:
            return head
        newTail = head
        head = head.next
        newTail.next = None
        res = self.reverseList(head)
        head.next = newTail
        return res

Discussion

I’m open to being wrong, but it seems that the iterative solution is nicer than the recursive one? Some of the pointer “faith”, in particular that head points to the tail of the reversed sub-list (penultimate line), is a bit much. With pointers that’d be a lot clearer I think, but Python’s references make it a bit fuzzier for me.

If anything, I think the reasoning about how the iterative solution works is very recursive-feeling. Or at least a nice application of loop invariants.

This question covers a lot! More linked list questions are here.

Evaluate Reverse Polish Notation (Leetcode 37)

This was a homework assignment in my intro-to-data-structures-and-algorithms course. Evaluating an expression in this way is a good exercise. While it’s not literally recursion, I’ve tagged it as such because it uses those ideas. (And there doesn’t seem to be a better category.) These sort of questions, I think, are to Leetcode’s merit: a student (assuming they don’t cheat, of course) can really use this to test their understanding.

Having already known the answer, and not yet subjecting any of my students to this, it’s hard for me to anticipate what the tricky parts of this question are. I assume the key insight is that you can push (what Python calls append) the intermediate values back on the stack and it “just works”. Maybe that follows from reading carefully what RPN is?

The motivation for this problem is that it’s an easy calculator to make, and compilers.

class Solution:
    def isOp(self, s):
        return s in ['+', '-', '*', '/']
    def evalRPN(self, tokens: List[str]) -> int:
        stack = []
        for t in tokens:
            if self.isOp(t):
                r = stack.pop()
                l = stack.pop()
                if t == '+': x = l + r
                elif t == '-': x = l - r
                elif t == '*': x = l * r
                elif t == '/': x = int(l / r)
                stack.append(x)
            else:
                stack.append(int(t))
        return stack[0]

Postscript

Is this a good interview question? I don’t know. At a big meeting about interview questions, a work colleague said that they always ask this one as their standard interview question. So that’s one vote! I don’t think there’s trickiness per-se, and I think the parsing and case-handling here is an interesting and good exercise.

I am in parallel trying to build a collection of “classic” problems for students, under this tag.

Merge Two Binary Trees (Leetcode 31)

Another from the archives. Part of a big battery where I went through a lot of tree problems, I guess. I wouldn’t call this problem a classic one (versus, say, Tree Equality), but I think it’s a nice “classroom” exercise on making sure someone feels fluent in tree-manipulation.

class Solution:
    def mergeTrees(self, root1: Optional[TreeNode], root2: Optional[TreeNode]) -> Optional[TreeNode]:
        # destructive operation
        if root1 is None:
            return root2
        if root2 is None:
            return root1
        
        root1.val += root2.val
        root1.left = self.mergeTrees(root1.left, root2.left)
        root1.right = self.mergeTrees(root1.right, root2.right)
        
        return root1

Discussion Points

  1. This follows the classic “skeleton” of solving binary tree problems with recursion. We handle (in the case, the 2) “base cases” of when the tree(s) are null; we solve the smaller cases, and we combine those results to synthesize the new tree.
  2. In this case we treat the operation as wholly destructive. When I help newer programmers, that’s a concept they are often unfamiliar with. Here we are unusually destroying both trees (I’m sure we can solve this without destructive operations; that’s a fun follow-up in many circumstances).
  3. I think that we reassign the left and right children of root1, and in general do some subtle interactions between subtrees, makes this a good next-level exercise. A lot harder than, say, incrementing all the values in a single tree by 1!

All solved tree exercises in this blog series can be found here.

Transpose of Matrix (Leetcode 25)

Another from the archives. This is a “real” matrix question (versus an array-of-arrays). For the people I’ve helped enter into the tech world, the bigger challenge is the vocabulary of matrix math, rather than the actual array-of-array machinery or similar. I like this one as an operation, and I’m glad it takes the time to color and otherwise illustrate what the transpose is.

In a way the straightforward solution may seem “cute”, you “just” need to swap the i/j indices. I think, however, that that’s a real insight, and a real insight you can get in an interview setting. Maybe other people would disagree. I think you can also get there the “brute force” way of just carefully thinking about what to do. The larger challenge is allocating the transpose matrix correctly.

class Solution:
    def transpose(self, matrix: List[List[int]]) -> List[List[int]]:
        if len(matrix) == 0: return []
        T = []
        for j in range(len(matrix[0])):
            T.append([])
            for i in range(len(matrix)):
                T[j].append(matrix[i][j])
        return T

Add Two Numbers (Leetcode 21)

This is from the archives, when I’d take more notes. Let me take those insights and preserve them here.

Despite the name, this question is really about linked lists. I’ll assume you’ve read the question and understood it. Now, a few observations:

  1. We get to iterate through the list in the “nice” way: we can start adding digits immediately.
  2. Built into the question is the problem of “carries” — we’re adding things, but each node can only hold the values 0-9, and so if there’s overflow we have to carry that.
  3. This is a linked-list-merge-style question (see featured post here), so we should be mindful of the “end” of each of our two linked lists and how we should handle them. A moment’s thought also suggests that the carry value can also play a role here!
  4. No negative numbers! The natural approach doesn’t suggest we’d introduce an erroneous leading zero or anything either. So it doesn’t seem there’s anything too tricky here.

Some insights:

  1. We can treat the carry value as a “virtual” node: this tightens up our iterations and puts everything into one loop.
  2. The approach I took was to create a new, third list. Another option (in practice; I assume the autograder would accept this too) would be to make this transformation destructive of one or both lists.
class Solution {
public:
    ListNode* addTwoNumbers(ListNode* l1, ListNode* l2) {
        ListNode* lastDigit = nullptr;
        ListNode* result = nullptr;
        bool carry = false;
        
        while (l1 || l2 || carry) {
            auto l1val = l1 ? l1->val : 0;
            auto l2val = l2 ? l2->val : 0;
            auto nextVal = l1val + l2val;
            
            if (carry) {
                nextVal++;
                carry = false;
            }
            
            if (nextVal > 9) {
                nextVal -= 10;
                carry = true;
                // assert(nextVal < 10 && nextVal >= 0)
            }
            
            if (!result) {
                result = new ListNode(nextVal);
                lastDigit = result;
            } else {
                lastDigit->next = new ListNode(nextVal);
                lastDigit = lastDigit->next;
            }
            
            if (l1) { l1 = l1->next; }
            if (l2) { l2 = l2->next; }
        }
        
        return result;
    }
};

I think this is a nice interview question.