
Segfault with Soham Sankaran
A podcast from Honesty Is Best
Cornell Professor and former Facebook AI Researcher Bharath Hariharan joins me to discuss what got him into Computer Vision, how the transition to deep learning has changed the way CV research is conducted, and the still-massive gap between human perception and what machines can do.

Consider subscribing via email [https://honestyisbest.com/segfault#subscribe_top] to receive every episode and occasional bonus material in your inbox.

Soham Sankaran’s Y Combinator [https://ycombinator.com]-backed startup, Pashi [https://pashi.com], is recruiting a software engineer to do research-adjacent work in programming languages and compilers. If you’re interested, email soham [at] pashi.com for more information.

Go to transcript [https://honestyisbest.com/segfault/2020/Jul/17/computer-vision/#transcript_anchor]

SHOW NOTES

Participants:

Soham Sankaran [https://soh.am] (@sohamsankaran [https://twitter.com/sohamsankaran]) is the founder of Pashi [https://pashi.com], and is on leave from the PhD program in Computer Science at Cornell University [https://www.cs.cornell.edu/].

Professor Bharath Hariharan [http://home.bharathh.info/] is an Assistant Professor in the Department of Computer Science at Cornell University. He works on recognition in Computer Vision.
Material referenced in this podcast:

‘Building Rome in a Day’, a project to construct a 3D model of Rome using photographs found online, from the University of Washington’s Graphics and Imaging Lab (Grail): project website [https://grail.cs.washington.edu/rome/], original paper [https://grail.cs.washington.edu/rome/rome_paper.pdf] by Sameer Agarwal, Noah Snavely, Ian Simon, Steven M. Seitz, and Richard Szeliski in ICCV 2009.

The Scale-Invariant Feature Transform (SIFT) algorithm: wikipedia [https://en.wikipedia.org/wiki/Scale-invariant_feature_transform], original paper [https://www.cs.ubc.ca/~lowe/papers/iccv99.pdf] by David G. Lowe in ICCV 1999.

The Perceptron: wikipedia [https://en.wikipedia.org/wiki/Perceptron], original paper [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.335.3398&rep=rep1&type=pdf] by Cornell’s own Frank Rosenblatt [https://en.wikipedia.org/wiki/Frank_Rosenblatt] in Psychological Review [https://en.wikipedia.org/wiki/Psychological_Review] Vol. 65 (1958). Rosenblatt was a brilliant psychologist with exceptionally broad research interests across the social sciences, neurobiology, astronomy, and engineering. The perceptron, which is a forerunner of much of modern artificial intelligence, initially received great acclaim in academia and the popular press for accomplishing the feat of recognizing triangular shapes through training. In the 60s, however, legendary computer scientists Marvin Minsky [https://en.wikipedia.org/wiki/Marvin_Minsky] (a high-school classmate of Rosenblatt’s) and Seymour Papert [https://en.wikipedia.org/wiki/Seymour_Papert] released a book, Perceptrons [https://en.wikipedia.org/wiki/Perceptrons_(book)], which argued that the perceptron approach to artificial intelligence would fail at more complex tasks, causing it to fall out of fashion for a few decades in favour of Minsky’s preferred approach, Symbolic AI [https://en.wikipedia.org/wiki/Symbolic_artificial_intelligence].
Symbolic AI famously failed to produce tangible results, resulting in the AI winter [https://en.wikipedia.org/wiki/AI_winter] of the 80s and 90s, a fallow period for funding and enthusiasm. Rosenblatt, meanwhile, died in a boating accident in 1971 at the relatively young age of 43, 40 years too early to see himself vindicated in the battle between Minsky’s Symbolic AI and what we now call Machine Learning.

Bharath’s CVPR 2015 paper Hypercolumns for Object Segmentation and Fine-grained Localization [https://arxiv.org/abs/1411.5752] with Pablo Arbeláez, Ross Girshick, and Jitendra Malik, in which information pulled from the middle layers of a convolutional neural network (CNN) trained for object recognition was used to establish fine-grained boundaries for objects in an image.

ImageNet [http://www.image-net.org/], originally created by then Princeton (now Stanford) Professor Fei-Fei Li [https://profiles.stanford.edu/fei-fei-li] and her group in 2009: a vast database of images associated with common nouns (table, badger, ocean, etc.). The high quality & scale of this dataset, combined with the vigorous competition [http://www.image-net.org/challenges/LSVRC/] between groups of researchers to top the ImageNet benchmarks, fuelled massive advances in object recognition over the last decade.

Credits: Created and hosted by Soham Sankaran. Mixed and Mastered by Varun Patil [https://www.youtube.com/channel/UC7_0sQfCXyofKxxfo0yTXLA/about] (email [varunpatil.audio@gmail.com]). Transcribed by Sahaj Sankaran & Soham Sankaran.

TRANSCRIPT

[00:00:00] Bharath Hariharan: Humans can basically very quickly learn new things. They know exactly when they see something new, they can learn from very few training examples, and they can learn with very little computational effort. Whereas current techniques, they can only learn a small number of things with lots of examples and lots of computational effort.
That’s a big gap, which causes all sorts of issues when you apply these techniques to the real world.

[ringing tone]

[00:00:38] Soham Sankaran: Welcome to Episode 2 of Segfault, from Honesty is Best. Segfault is a podcast about Computer Science research. This episode is about computer vision, and it features Professor Bharath Hariharan from Cornell University. I’m your host, Soham Sankaran. I’m the CEO of Pashi, a start-up building software for manufacturing, and I’m on leave from the PhD program in Computer Science at Cornell University, located in perpetually sunny Ithaca, New York.

Computer Science is composed of many different areas of research, such as Operating Systems, Programming Languages, Algorithms, and Cryptography. Each of these areas has its own problems of interest, publications of record, idioms of communication, and styles of thought – not to mention, one level deeper, a multitude of sub-areas just as varied as the areas that contain them. This can get extremely confusing very quickly, and I certainly didn’t know how to navigate this terrain at all until I was in graduate school.

Segfault, in aggregate, is intended to be a map of the field, with each episode featuring discussions about the core motivations, ideas, and methods of one particular area, with a mix of academics ranging from first-year graduate students to long-tenured professors. I hope that listeners who have dipped their toes in computer science or programming, but haven’t necessarily been exposed to research, get a sense of not only the foundations of each area – what work is being done in it now, and what sorts of opportunities for new research exist for people just entering – but also what it is about each specific area that compelled my guests to work in it in the first place, and what the experience of doing research in that area every day actually feels like.
This is the map of CS that I didn’t even know I desperately wanted in high school and undergrad, and I hope folks who are just starting their journey in computer science will find within it ideas that excite them so much, and so viscerally, that they can’t help but work on them.

Just a quick note: the first episode was about the research area of programming languages. If you haven’t already listened to it, you can find it at honestyisbest.com/segfault [https://honestyisbest.com/segfault/2020/Jun/16/programming-languages]. My company, Pashi, is hiring someone with experience in programming languages and compilers, whether from industry or academia. If you fit this description, or know somebody who does, please reach out – send me an email at soham@pashi.com.

[ringing sound]

[00:02:44] Soham: So I’m with Professor Bharath Hariharan, who does computer vision. If you just want to introduce yourself briefly…

[00:02:50] Bharath: I’m Bharath. I do computer vision and machine learning. My interests are in visual recognition. I came here after a few years at FAIR – Facebook AI Research – and before that I was a PhD student at UC Berkeley.

[00:03:03] Soham: Where you worked with Jitendra Malik, who is one of the pioneers of vision.

[00:03:10] Bharath: Yeah. He’s one of the… yes.

[00:03:15] Soham: So, what was it that got you into computer vision in the first place? What was your journey to choosing this as a research field?

[00:03:21] Bharath: A variety of things. I think the first source of my interest was that I was actually originally interested in psychology and how brains work. That’s been a longstanding interest of mine, and when I started studying computer science, that was the thing I kept going back to. Like, why can’t computers do the things humans can? The other part of it was just images and visual media.
Earlier, I had a brief infatuation with computer graphics, which also led to this question of ‘Why can’t machines understand images as well as humans do?’ So that’s roughly the route I took – a fairly short route, but it serves as the motivation.

[00:04:12] Soham: What was the first problem in vision that was very interesting for you? Or the first problem that you worked on?

[00:04:18] Bharath: The first problem I worked on was very different from the first problem that caught my interest. [laughter] The first problem I worked on was 3D reconstruction. I was at IIT-Delhi at the time, and one of my mentors, Professor Subhashish Banerjee [http://www.cse.iitd.ernet.in/~suban/] – we called him Suban – had this project where they were trying to digitize a lot of the historic monuments in Delhi and the surrounding area. If you’re not aware, IIT-Delhi is fairly close to a cluster of mostly forgotten historical monuments in the Hauz Khas area. If you go there, you can see them, but a lot of people don’t travel. So the idea was that we could create 3D models of these monuments, and that would be educational. As an undergraduate, I was mostly implementing techniques that others had suggested, including people at Oxford and MSR Cambridge. That was my first exposure to computer vision.

One of the big things that caused me to consider this as a direction for graduate study was that, through some talk – I forget who gave it – I got to know of work that Noah Snavely [http://www.cs.cornell.edu/~snavely/] did when he was a graduate student. It’s actually one of the papers that recently got the Test of Time Award in computer vision.
This was the paper called ‘Building Rome in a Day’ [https://grail.cs.washington.edu/rome/]. It was a similar idea, but what they did was take Internet photographs of tourist destinations like Rome and build a pipeline that would turn this collection of Internet photographs into a 3D model. I was fascinated by that, and by the technical details in it. In particular, when you take two images of the same scene at different times from different locations, how do you know they refer to the same thing? That was the key piece I ended up focusing on, which led me to recognition: ‘How do we know we’ve seen the same thing before?’

[00:06:55] Soham: I see. So let’s talk about this paper a bit more. These were just arbitrary images they’d taken off the Internet – they had no pre-planning about which images to take, or anything like that?

Bharath: Yeah.

Soham: How many images did they need to reconstruct?

Bharath: I think it varied. They were going for large scale. It’s useful to imagine what the environment was like at that time. It’s very different from what it is now.

Soham: What year was this?

Bharath: I think the paper came out in the late 2000s. But at that time it was not taken as a given – surprisingly – that the Internet is a resource for data. People in computer vision were still looking at small-scale problems, laboratory-scale problems. Or, you know, you take five pictures of this stapler and reconstruct it.

Soham: I see.
[00:07:45] Bharath: So this paper, along with some others, particularly from Microsoft Research and the University of Washington, was among the first to recognize that ‘Look, there is this massive resource called the Internet, which now has people posting photographs of their tourism, and you can just use this to build these 3D models.’ Later on, the same idea got morphed into ‘Okay, let’s do recognition’, and so on – till where we are now, where it’s kind of assumed that we have all this data and we’re asking ‘What if we didn’t have this data?’ [laughter]

[00:08:23] Soham: Was this around the same time that the shift was happening from classical vision techniques to ML techniques, or had that already happened?

[00:08:32] Bharath: People were already using machine learning – people have been using machine learning in computer vision since the late 90s. Computer vision tends to have these mood swings, is what I’d call it. There are certain problems which people get really excited about, and people work on them for a decade. Then other problems take their place. This paper was in the heyday of the time when people were talking about geometry: 3D reconstruction was the core problem people were interested in. There were a few machine learning techniques involved, but a lot of the work – for example Noah’s paper – is a combination of good systems building, systems challenges, plus mathematical optimization techniques. There’s not much training, not much machine learning, in that paper per se. The resurgence of machine learning, the focus on recognition, was something that only started to pick up in the late 2000s.

[00:10:05] Soham: So what was the key technical trick in Noah’s paper? What let him recognize that it was the same object that multiple images were looking at?
[00:10:15] Bharath: Well, if you read Noah’s paper now… Noah’s paper is actually a systems paper. The key thing is to put together components that people had already explored, but put them together in a way that was extremely robust, extremely scalable, and so on. The thing that I as an undergraduate got really excited by was another paper that was used in it, about SIFT. SIFT is the Scale-Invariant Feature Transform [https://en.wikipedia.org/wiki/Scale-invariant_feature_transform]. SIFT is a paper which, if you read it even now… I really like the paper. It has a few key ideas, very well evaluated, very well motivated. It came out, I think, around 2001 or 2002, and we’re still writing papers trying to beat SIFT. SIFT is still a baseline for us. I read SIFT as an undergraduate, and I thought ‘Wow. This is what I want to do.’ That was what started the whole thing.

[00:11:22] Soham: OK. Explain SIFT. [laughter]

[00:11:27] Bharath: The fundamental problem SIFT was trying to tackle is this: you have two views of the same object, but they might be from very different angles, and the object may appear very differently. How do we match them? There are two parts to the SIFT paper. One component is detecting key points – parts of the object that are distinctive enough to use for matching. The second is description: how do you describe these patches so you can match them reliably across two scenes? There are challenges in both, but the key way the paper frames it – a very useful framing, and the way we describe it now in our courses – is that there is a set of invariances you want. There are certain transformations that might relate the two images, and those transformations should not cause your system to fail. One transformation they were looking at was scale: one view might be zoomed in, another might be zoomed out. Another is in-plane rotation – 2D rotations of the image, for example.
The third is 3D rotation, but only to a limited extent. 3D rotations are hard because you don’t know the 3D structure – you just have the image. But if you are careful about it, you can tolerate small amounts of 3D rotation. So what they did was create this feature description and detection pipeline, where they reasoned about how exactly to introduce these invariances. If what I want is scale invariance, what I should be doing is running my system at many different scales and identifying the optimal scale. That way, no matter which scale the object appears at, I’ll find it.

[00:13:30] Soham: So sort of brute-forcing it, in some sense.

[00:13:32] Bharath: In some sense, right. The other idea is, if I want invariance to perspective changes or 3D deformations, then what I need is something more sophisticated: discretization, quantization, binning, histogramming – those ideas. The combination of these two – search across a variety of transformations, and intelligent quantization and histogramming – was something that SIFT introduced. Those ideas kept repeating in various forms in various feature extraction techniques, all the way up to neural networks.

[00:14:10] Soham: So they came up with a principled and broadly applicable set of ways to achieve these invariances, ones that were useful for other applications as well?

[00:14:22] Bharath: Yeah.

[00:14:23] Soham: I see.

[00:14:23] Bharath: And if you look at the SIFT paper… even when Yann LeCun talks about convolutional networks nowadays, he harkens back to the way these people were describing it. How do you get invariance? Well, you get translation invariance by something like convolution – by doing the same operation at every location. You get invariance to small shifts by doing discretization and binning and pooling and so on. Those ideas have survived, and they became the centerpoint of all of computer vision, to the extent that today no-one even thinks about them.
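Ed. note: the ‘run the system at many different scales’ idea Bharath describes can be sketched as an image pyramid. The following is a minimal NumPy sketch under simplifying assumptions – it downsamples by plain 2x2 averaging, whereas Lowe’s actual detector uses Gaussian smoothing and difference-of-Gaussians; the function names are ours, not from any paper or library.

```python
import numpy as np

def downsample(img):
    """Halve resolution by averaging non-overlapping 2x2 blocks."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2  # trim odd edges
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[0::2, 1::2] +
            img[1::2, 0::2] + img[1::2, 1::2]) / 4.0

def image_pyramid(img, min_size=8):
    """Return the image at successively coarser scales.

    Searching for a feature at every level is what makes the overall
    detector scale-invariant: however zoomed-in the object is, some
    level will show it at roughly the feature's preferred size.
    """
    levels = [img]
    while min(levels[-1].shape) // 2 >= min_size:
        levels.append(downsample(levels[-1]))
    return levels

img = np.random.rand(64, 64)
for level in image_pyramid(img):
    print(level.shape)  # (64, 64), (32, 32), (16, 16), (8, 8)
```

A real detector would run the same key-point test at each level and keep the scale where the response is strongest.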
It’s like second nature to most people that you have to do this.

[00:15:00] Soham: Can you describe exactly what you mean by quantization and binning here?

[00:15:03] Bharath: The idea is as follows. One of the biggest challenges that people don’t realize about computer vision is that an image is essentially an array of pixel values. Now, if I just slightly shift my camera, all the pixels might move one pixel over, and now the 2D array is completely different. If I look at corresponding indices, the pixel values are completely different. That throws everything off. What you do to get around this – and this keeps coming up at every level of reasoning – is, in the simplest case, to reduce the size of the image: subsampling. If you do that, then the difference becomes much smaller. If your image moved by one pixel, and you reduce resolution by a factor of two, then you’ve only moved by half a pixel. That leads to smaller changes. So that’s the core idea. What you do is take the image and use a grid to divide it…

[00:16:25] Soham: A coarse grid.

[00:16:27] Bharath: A coarse grid. And then in each cell, you compute some aggregate value.

[00:16:32] Soham: So four pixels becomes one pixel.

[00:16:33] Bharath: Right. That’s basically the idea. SIFT also had this one thing of using edges instead of image intensities – but that was something people had thought of earlier too, that edges are more important than image intensities. So that’s quantization and binning: you divide the image into a coarse grid and aggregate each cell. That gives you a fair amount of invariance.

[00:16:55] Soham: So now you can compare two images that are somewhat similar – of the same object, but from slightly different perspectives.
And if you have it coarse enough, and if you’re lucky, then they’re going to look the same, or substantially the same. I see. And that was first introduced by the SIFT paper?

[00:17:09] Bharath: Right. That was one of the key ideas. There are other related ideas that came at the same time, but SIFT was the first engineered system.

[00:17:25] Soham: So this really caught your attention when you saw it the first time, as an undergrad.

[00:17:28] Bharath: Yes. And then after that, it was a fairly straightforward path to where I am, in some sense.

[00:17:25] Soham: That makes sense. Could you describe the toolkit of modern computer vision? Both the machine learning and non-machine learning components, broadly.

[00:17:46] Bharath: So modern computer vision – there are two kinds of things. A big part of modern computer vision is driven by learning. That includes convolutional networks – what people nowadays call deep learning – and the associated learning techniques. The other part of it is geometry and physics: how images are formed, the mathematics of that, and the related geometric properties. Because of the way images are taken, images have certain geometric and physical properties. All computer vision papers nowadays usually have some combination of these two. If you’re doing just ‘Is this a cat or not?’, you’re mostly using convolutional networks – machine learning. If you’re doing something like ‘Reconstruct this 3D model of a car’, you might use a combination: ‘I’ll use my understanding of geometry to say how some feature-matching thing will lead to a 3D model, but I might also use machine learning to refine the model based on some understanding of what a car looks like in general.’ So those are the two big toolkits: geometry and learning. There also used to be a significant part of this based on signal processing.
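Ed. note: the coarse-grid aggregation (‘binning’) described above can be sketched in a few lines of NumPy. This is a hypothetical illustration, not SIFT’s actual descriptor: it averages raw intensities per cell and shows that a one-pixel shift, which changes every pixel value, barely changes the binned description.

```python
import numpy as np

def coarse_bin(img, cell=4):
    """Divide the image into cell x cell blocks and average each block
    ('four pixels becomes one pixel', here sixteen become one)."""
    h, w = img.shape
    h, w = h - h % cell, w - w % cell  # trim edges that don't fit a cell
    return img[:h, :w].reshape(h // cell, cell, w // cell, cell).mean(axis=(1, 3))

rng = np.random.default_rng(0)
img = rng.random((32, 32))
shifted = np.roll(img, 1, axis=1)  # the whole image moves one pixel over

# Pixel-wise, the shifted image looks completely different...
pixel_diff = np.abs(img - shifted).mean()
# ...but the coarse-binned descriptions are nearly identical.
binned_diff = np.abs(coarse_bin(img) - coarse_bin(shifted)).mean()
print(pixel_diff, binned_diff)  # binned_diff is far smaller
```

SIFT actually bins gradient orientations rather than raw intensities, but the invariance argument is the same.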
A lot of classical computer vision is based on an understanding of Fourier transforms, frequency analysis, things like that. That’s much less present now, though there’s some evidence that those things are still useful.

[00:19:35] Soham: But it’s been largely replaced by the ML component? I see. So let’s talk about ML in computer vision. Can we talk about the perceptron model? Tell me what that is, and see if you can explain it to a relatively lay audience.

[00:19:48] Bharath: The perceptron was actually one of the first machine learning models. The perceptron is a simple algorithm. The idea is, you have some number of scalar inputs, and you have to predict a scalar output, which is either 0 or 1. The way you do this is, you multiply your inputs by some scalar weights and add them up to get a number. So you might say: I’ll take half of input A, two times input B, a third of input C, add them all up, and then compare the result to some threshold value. If it rises above that threshold, I claim the output is 1. If it falls below that threshold, I claim the output is 0. The threshold, and the weights I’m using to mix the different inputs – those are the things one needs to figure out for any given problem. We typically call those the parameters of the model: the perceptron is the model, the parameters are the things we need to figure out. Even before we figure out the parameters, we have made certain assumptions about what this decision rule looks like – there are only certain things I can capture with this decision rule. But assuming it can capture what we need, the next question is how to figure out these parameters. The perceptron training algorithm is fairly straightforward. It involves going through a set of training examples where you have inputs and corresponding outputs.
Essentially, you try your current setting of the parameters and see if you get the classification right. If you don’t, you change your weights in a particular manner so that you do. You keep doing this, and there is a proof which says that if there is a setting of the parameters that would solve the task, you’ll discover it.

[00:22:10] Soham: In some bounded time, or as time goes to infinity?

[00:22:12] Bharath: In some bounded time. So that’s the original perceptron algorithm. After that… there’s some history around this, where people showed that the perceptron is not actually capable of capturing a wide variety of problems, and it was not obvious how to extend it. One of the big things that had to change to go from the perceptron to more recent techniques is that we had to go from an algorithmic standpoint – ‘this is the algorithm I’ll use to train’ – to an optimization standpoint: ‘this is a loss function we’re going to optimize, and this is how the optimization procedure is going to operate.’ That optimization-based view leads to a whole class of techniques, ranging from logistic regression, which is the next 101 machine learning model after the perceptron, to support vector machines and kernel techniques, to neural networks and beyond.

[00:23:26] Soham: I see. And this shift to optimization techniques happened when?

[00:23:33] Bharath: I’m not actually sure. By the 90s, there were optimization-based learning techniques: SVMs were already there, logistic regression was already there, neural networks were already there. I don’t actually know exactly when it happened. The perceptron was invented fairly early on. In the middle, there was the AI winter, when all talk of AI died down, funding died down, and so on. When the field resurfaced in the 90s, people were talking about that.
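Ed. note: the training rule Bharath describes – weighted sum against a threshold, nudge the weights on each mistake – can be sketched as follows. The toy data is hypothetical (logical OR, which is linearly separable); the convergence guarantee he mentions holds only for such separable data.

```python
import numpy as np

def train_perceptron(X, y, epochs=100):
    """Rosenblatt-style perceptron training: predict with the current
    weights, and adjust them whenever an example is misclassified."""
    w = np.zeros(X.shape[1])  # one weight per scalar input
    b = 0.0                   # the threshold, folded in as a bias term
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0
            if pred != yi:
                update = yi - pred        # +1 or -1
                w += update * xi          # nudge weights toward correctness
                b += update
                errors += 1
        if errors == 0:  # every training example classified correctly
            break
    return w, b

# Toy linearly separable problem: output 1 iff either input is 1 (OR)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 1])

w, b = train_perceptron(X, y)
preds = [1 if xi @ w + b > 0 else 0 for xi in X]
print(preds)  # [0, 1, 1, 1] -- matches y
```

The famous limitation Minsky and Papert pointed out is visible here too: swap the labels to XOR (`y = [0, 1, 1, 0]`) and no setting of `w` and `b` can ever classify all four points correctly.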
Some of this also involved… in the 80s and 90s, backpropagation was invented, and gradient descent was being figured out by people in optimization and control. There were a lot of things going on in control theory, and so on. A lot happened in that interim.

[00:24:32] Soham: Got you. So one of the more common machine learning techniques people use is the convolutional neural net [https://en.wikipedia.org/wiki/Convolutional_neural_network]. Can you tell me what a convolution is, and what it means to be using a convolutional neural net in vision?

[00:24:45] Bharath: Before we talk about a convolution, we have to talk about what a linear function is. The kind of thing we talked about when we said ‘Oh, you know, we’ll just multiply some inputs by some weights and add them together’ – that’s an example of a linear function. A convolution is a special kind of linear function. With a convolution, instead of thinking of your input as one long list of values, your input has some structure – usually a two-dimensional or one-dimensional array of inputs.

[00:25:26] Soham: So this fits well with an image, because you have a two-dimensional array of inputs.

[00:25:30] Bharath: Yeah. So there’s a notion of space or time. Convolution actually comes from the signal processing community – that’s where the whole idea comes from. And the idea is this: in a 2D array you have two spatial dimensions, and if you want a linear operation such that at every location in this 2D array it’s doing the same thing – a linear function that is invariant to space or time – that’s basically a convolution. The operation itself looks like this: at every location in the 2D array, you take the neighborhood of that location and pipe it to a linear function. So: neighborhood, pipe it to a linear function, out comes an output.
Then you move one pixel over, again take the neighborhood of that pixel, pipe it to the linear function, and out comes an output. You keep doing this for every location in the 2D array, and now you have a 2D array of outputs. That’s the convolution operation…

[00:26:27] Soham: So it’s sort of like you have a slate, and you’re moving it from pixel to pixel.

[00:26:31] Bharath: Yeah. The other way people often think of convolution is as a sliding window. You have your 2D image, you have a small window through which you’re looking at it, and you’re sliding that window over the image. At every location you slide the window to, you compute a simple linear operation of whatever you see. Convolutional neural networks are basically neural networks whose primitive operation is built up from this convolution – they are just a bunch of convolutions stacked on top of each other. The reason this is useful is that one property of natural images is that they tend to be translation invariant. The spatial location (1,1) is not particularly different from the spatial location (10,10): all regions of the image are statistically the same, essentially. And the reason that happens is, you know, I can take a picture standing up, I can take a picture upside-down, I can take a picture kneeling on the ground. It’s rarely the case that you want the top of the image to be processed differently from the bottom of the image – you want everything to be processed similarly. The advantage of convolution compared to other operations is that a convolution can be expressed with very few parameters. Because you’re using a sliding window, you only need enough parameters to capture the function of the sliding window, and it’s a fairly small window…

[00:28:13] Soham: So a function the size of the window, as opposed to the size of the entire input.

[00:28:16] Bharath: Yes.
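Ed. note: the sliding-window picture above can be sketched directly. This is a minimal NumPy illustration, not a library call: the same small linear function (here a hypothetical 3x3 vertical-edge filter) is applied at every location, with no padding, so the output array shrinks slightly.

```python
import numpy as np

def conv2d(img, kernel):
    """Slide `kernel` over `img`, taking a weighted sum at each location.
    The same few weights are reused everywhere -- that weight sharing is
    why a 3x3 convolution needs only nine parameters."""
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # neighborhood -> linear function -> one output value
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

# A vertical-edge filter: responds where intensity changes left-to-right
edge_filter = np.array([[-1.0, 0.0, 1.0]] * 3)

# Image that is dark on the left half, bright on the right half
img = np.zeros((6, 6))
img[:, 3:] = 1.0

response = conv2d(img, edge_filter)
print(response)  # large values only along the vertical boundary
```

Shift the image one pixel sideways and the response shifts with it – the translation invariance (more precisely, equivariance) Bharath describes.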
So instead of a 300x300 image, you’re only looking at a 3x3 patch at any given time. So you need only nine numbers to describe this operation. [00:28:29] Soham: So what’s a sort of use-case that CNNs are actually used for? What do they actually do? [00:28:34] Bharath: So right now, they do almost everything. The simplest thing they do is recognition. You know, you give an image as input, and out pops a label which says ‘Dog’, ‘Cat’, whatever. The way you do this is, the model basically passes the image through a sequence of convolutions. In the middle it does a bunch of these sub samplings and discretizations, as we talked about earlier. This is the same kind of operation that SIFT does. These networks just do it lots of times to reduce a 300x300 image to a 1x1 probability distribution over class labels. Recognition is the big thing – in goes the image, out comes the class label. But more generally, you can have things where you feed in an image and out comes some featurization of the image, some description of what’s going on in the image. This you can use for basically anything. Any kind of operation you want to do on images, you want to figure out whether two images are close or not, you want to figure out if Image 1 has some artifact or not, you want to match two images, anything, you can use this kind of feature representation. And it turns out that you can train these convolutional neural networks to do one thing, like recognize cats, dogs, and other species, and in the process they produce a good intermediate representation that just ends up being useful for lots of things. [00:30:15] Soham: I see. And this is sort of why people train on ImageNet [http://www.image-net.org/], which is this broad collection of images… Ed. 
note: ImageNet [http://www.image-net.org/], originally created by then Princeton (now Stanford) Professor Fei-Fei Li [https://profiles.stanford.edu/fei-fei-li] and her group in 2009, is a vast database of images associated with common nouns (table, badger, ocean). The easy availability of this dataset and the competition [http://www.image-net.org/challenges/LSVRC/] between groups of researchers to top the ImageNet benchmarks fuelled massive advances in object recognition over the last decade. [00:30:21] Bharath: And then test on anything they want to. [00:30:24] Soham: So what is going on here? If I’m thinking about a recognition task, if I feed in a bunch of images and these convolutions are happening, what are these convolutions actually doing that allows the recognition to happen? [00:30:36] Bharath: It’s a good question, and we don’t really know. The reason is that it’s really hard to visualize what’s going on. There are efforts to do this, but none of it is particularly interpretable in any way. And things that are interpretable tend not to be faithful to the operation of the network. There are some things that are easy to know. You can ask ‘Well, what’s the first layer doing? What’s the first convolution doing?’ It turns out that the first convolution, the first thing these networks do, is detect edges of various kinds. This is also, for example, what SIFT does. This is also what we know the human visual system does. So all of this goes well with what we expect recognition systems to do. Edges are important because edges tend to correspond to places where object boundaries exist, and object boundaries are important for capturing shape. Edges is the first thing it does… [00:31:40] Soham: So within each window, it’s finding edges? [00:31:42] Bharath: Yeah. So if you look at the output of this, you’d have one convolution operation identifying all vertical lines in the image, another convolution identifying all horizontal lines in the image. 
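Ed. note: as a hand-crafted stand-in for what the learned first-layer filters end up doing, the tiny kernel [-1, +1] responds wherever brightness changes between horizontally adjacent pixels, i.e. at vertical edges. A minimal sketch (trained networks discover similar oriented filters on their own; nothing here is taken from an actual trained model):

```python
def vertical_edges(image):
    """Convolve each row with the kernel [-1, +1]: the response is
    large exactly where a vertical edge (a horizontal intensity
    change) occurs, and zero in flat regions."""
    return [[row[j + 1] - row[j] for j in range(len(row) - 1)]
            for row in image]

# A dark-to-bright vertical boundary:
img = [[0, 0, 9, 9],
       [0, 0, 9, 9]]
```

Running vertical_edges(img) produces a strong response only at the boundary column, and zero everywhere else.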
As you go deeper… [00:31:56] Soham: And these are not engineered? This all happens as part of the training process? [00:32:00] Bharath: During training, the model discovers that it has to do this. To recognize that this is a cat, it needs to first detect edges. [00:32:08] Soham: So each parameter, which is some factor that’s applied to the input in each of these cells of the sliding window, starts untrained with some null value or something like that… does it start with… [00:32:20] Bharath: It starts with random noise. [00:32:22] Soham: With random noise. And it becomes trained such that the first layer is detecting vertical edges? And then the second layer is detecting horizontal edges? [00:32:30] Bharath: Within the same layer, you have multiple filters detecting things in parallel. So you have a bunch of things detecting edges of various orientations. The next layer… there is some evidence that what it ends up doing is detecting things like corners and blobs. Things like “Oh, there’s a red corner.” or “There’s a black-ish blob in the top left corner.” [00:32:54] Soham: And we know this because we can output the pixel output that comes out of that layer? [00:32:58] Bharath: For the first layer, you can just visualize exactly what the network is doing. For the second layer onwards, because of the fact that the operation is now nonlinear… in the middle you do some nonlinear operation on each pixel, which makes it hard to actually visualize between the layers. [00:33:17] Soham: What kind of operation is that? [00:33:19] Bharath: So usually we do this thing called a rectified linear unit which, if you know signal processing, is often called half-wave rectification. So half-wave rectification means that if the output is positive you keep it, and if the output is negative you zero it out. So because this operation is now nonlinear, it’s hard to visualize what the second layer is doing.
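Ed. note: the rectified linear unit (ReLU) Bharath mentions is a one-liner, applied independently to every value in a feature map:

```python
def relu(x):
    """Half-wave rectification: keep positive values, zero out negatives."""
    return x if x > 0 else 0.0

# Applied elementwise to a toy feature map:
feature_map = [[-1.5, 2.0], [0.5, -3.0]]
rectified = [[relu(v) for v in row] for row in feature_map]
# rectified == [[0.0, 2.0], [0.5, 0.0]]
```

It is this simple nonlinearity, interleaved between the convolutions, that makes the deeper layers hard to visualize directly.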
But second layer onward, what you can do is ask ‘How should my image change so that this particular pixel in this second layer output gets a really high value?’ And if you do that, you find what patches actually maximize a particular pixel value, you see blobs and corners and things. Beyond that, as you go deeper and deeper, it becomes really hard to understand what’s going on. Sometimes you see some patterns, like if it seems to like faces, or it seems to like particular colors in particular regions, but usually it becomes hard to understand. So it’s hard to figure out what the network is doing. Only thing you know is it learnt to do this to fit some labels in some training set. [00:34:34] Soham: I see. So let’s talk about a recent problem that you’ve worked on, that you published a paper on, and that you effectively solved, you’ve come up with some kind of solution to whatever you’re working on. From the beginning, to what you actually built. What was the problem, and what was the solution? What did you actually do to get there? [00:34:52] Bharath: That’s a good question. So given the context of where we are in this conversation, I’ll probably pick something that’s a bit older, which is from my grad school times. This was a problem we introduced, we started thinking about. The goal was basically this. I said that convolutional networks have basically been doing ‘Image in, label out.’ You can do a bit better, and you can say ‘Give me an image in, and tell me not just an image-level label, tell me where the different objects are.’ So this box, this contains a horse, and this box, this contains a person, and so on. For the purposes of this, it’s not important to know how you do this. Basically the idea is that you come up with a few candidate boxes, and then you pipe each box through a convolutional network. That’s basically the idea. So this problem is often called ‘object detection’. 
What we wanted to do is to say ‘Okay, we’ve figured out that this box contains a horse, but can I actually identify the boundaries of the horse?’ [00:36:05] Soham: Wait, so you’re provided an image with a box already drawn on it? [00:36:09] Bharath: Yes. For the purposes of this, let’s assume we’ve already solved that, object detection. In fact, we built on top of existing object detections. So we have an image, and we have a bunch of boxes, saying this box has a horse, this box has a person, this box has a car. You can imagine, maybe, a cropped image where there is a horse bang in the middle of it, enlarged. The problem is, you have a box, you know there’s a horse in it. What we want to do is, we want to identify the boundaries of the horse. Why is this important? This is important because for some applications, you really want to know these boundaries. For example for robotics, you may want to actually predict ‘How do I grasp this particular object?’ so you need the boundaries. Or in graphics, you want to crop an object from one image and place it into another, so you want to know the boundaries of the object. [00:37:04] Soham: So you want the minimal continuous boundaries of the object? [00:37:07] Bharath: Yeah. So you want to know ‘This is their head.’ and ‘This is their tail.’ and ‘These are the legs.’ and so on. The problem was, the way these convolutional networks operate, they collapse images – large, high-resolution images – to single labels. Boundaries are not single labels. We have to retain the resolution of the image, we need fairly precise localization. So the idea that we came up with… this was in discussion with my advisor. So we first posed this problem, and we had some initial techniques, and they were producing very imprecise boundaries. They were very blobby, it would look basically like a circle.
So we thought ‘How can we get this to be more precise?’ We realized that, as I said before, the convolutional network is going through these stages of processing. Really early on, it has understandings of edges and line segments, which are very precise with respect to spatial location. You know exactly where the edge is. [00:38:20] Soham: In the first layer? [00:38:22] Bharath: Right, in the first layer. And as you go deeper into the layers, the abstraction of the model the network is producing increases. So you’re going from edges to things like object parts to things like objects. At the same time, the spatial resolution of these things decreases. And we want both. We want to know horse-ness, but we also want high spatial resolution. So what we’ll do is, we can actually take a trained network and actually tap into these earlier layers. So we can say, ‘Okay, I have a certain pixel in my image, and I want to figure out whether it lies on my horse boundary or not. I’ll basically go to each layer of my convolutional network and look at my output there.’ Early on I’ll ask ‘Does this pixel lie on this edge or not?’ In the middle I’ll be asking ‘Does this pixel lie in a horse leg or not?’ At the end I’ll be asking ‘Is this a horse or not?’ Putting that information together, I can basically solve this problem. I can basically tap into all the intermediate things the network has computed and combine that to do this kind of localization. And we did that, we really churned this out in an afternoon – the first results came out in an afternoon – and it was amazing. It was really precise localization of these boundaries that we’d not seen before. I sent the images to my advisor from a coffee shop and he was like ‘This looks great. The conference deadline is in six hours, can you write it up?’ [laughter] [00:40:05] Bharath: I tend to be someone who… my answer was ‘Um, no?’ So we went for the next deadline. That was the last thing I did in grad school, basically. 
[00:40:19] Soham: I see. And what you use to extract the boundaries, that was not actually machine learning? You just went in and deterministically extracted elements from each of the layers, to draw it on the fine limits. [00:40:32] Bharath: We just took all, we just took everything. Basically, what we did was, instead of just taking the last whatever the network has done at the very end, we take all the intermediate stuff and stack them on together. My advisor called this hypercolumn representation, but the general idea is also called skip connections. Usually skip connections went the other way, where people tried to use them for just saying whether it’s a cat or not, but our idea was you can actually get the spatial localization very well with this. So this was the idea. This was taking an existing convolutional network and using what it had already learned in an intelligent manner. [00:41:22] Soham: So when you say skip connection, were you connecting the earlier layer to another layer up ahead? [00:41:26] Bharath: You could describe the architecture in a few ways. The way we described it was that we were taking the intermediate features that the network had produced and concatenating them together, appending all of them together, to produce a new feature representation. And that was being fed to a simple machine learning model. [00:41:42] Soham: I see. So you had another model? [00:41:46] Bharath: We tried a few variants. But now if you put all of this together, it now looks like skip connections, where earlier layers are feeding into later layers. It’s like a different interpretation of… [00:41:58] Soham: Isomorphic architecture. I see – and it turned out that if you had all this information, then a simple model could easily produce very accurate bounding boxes with some labelled examples. That’s very neat, that makes sense. 
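Ed. note: the hypercolumn idea can be sketched abstractly: for a chosen pixel, collect the features every layer computed at (the downsampled version of) that location and append them into one vector. The data layout below is hypothetical, chosen only to make the idea concrete:

```python
def hypercolumn(layer_outputs, i, j):
    """Build the hypercolumn for input pixel (i, j): concatenate
    the feature channels each layer produced at that location.

    layer_outputs: list of (features, stride) pairs, where
    features[y][x] is the list of channel values at coarse
    location (y, x), and stride is that layer's downsampling
    factor relative to the input image."""
    column = []
    for features, stride in layer_outputs:
        y, x = i // stride, j // stride  # nearest coarse location
        column.extend(features[y][x])
    return column
```

A simple per-pixel classifier fed this concatenated vector can then answer questions like ‘does this pixel lie on the object boundary?’ using both the precise early-layer evidence and the abstract late-layer evidence.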
I think it neatly encapsulates that we don’t really know how these things work but we can exploit some things about their structure to produce the sorts of results that we want. Is this how you describe most of the vision research that you do now? You architect some kind of network to solve some kind of problem, and then you use features of that to do specific things in interesting ways? [00:42:50] Bharath: Actually, nowadays the kind of work I do takes a very different flavor. In this case, I took a trained network and tried to use my domain knowledge to extract information from it. Nowadays I take a reversed standpoint: how do I, from the start, teach the network domain knowledge? The machine learning way of looking at things is ‘Give me examples, and we’ll figure it out.’ It’s kind of like teaching a kid multiplication by giving nothing except number 1, number 2, number 1 times number 2… Giving a long list of tables. And sure, they’ll figure it out… well, I don’t know if they’ll figure it out, but if you have a million examples, you can figure it out. But that’s not how we teach kids. We teach kids ‘This is what multiplication means, and this is the algorithm to do it.’ And then they practice on a few examples and they’re done. This ends up being much more efficient. The same question applies here. Can we move beyond giving them a mass of datasets – because usually we don’t have a mass of datasets, except in very few cases – can we go beyond giving neural networks a mass of datasets and teach them some domain knowledge? [00:44:02] Soham: So what’s an example of this? Just in short? [00:44:05] Bharath: So one example that we recently did was… we want to train a neural network to recognize a new class with very limited training data. 
And we simply said ‘What if, in addition to a limited number of training examples, for maybe one image I tell the network where the object is in the image?’ So I say, maybe ‘This region of the image, this is where you should look at.’ [00:44:34] Soham: So you initially give it a bunch of training examples that say ‘The object is somewhere in this image’? And now you also put a bounding box around the object? [00:44:42] Bharath: On just one image. It’s a very simple, small additional thing, and it gives a massive improvement in performance. [00:44:47] Soham: And you don’t put a bounding box around every image because it’ll be more expensive to produce a training set? [00:44:51] Bharath: Yes. The idea is, an expert can give one image with one bounding box, and then it can get other examples through some other process. [00:45:01] Soham: So if you wanted to do this for diagnostics, you can say that you have all these cancer examples, and then in one case you have the diagnostician paint out exactly where the tumor is. [00:45:09] Bharath: Exactly. And that leads to significantly larger improvements, significantly larger performance. I don’t remember the exact number. [00:45:17] Soham: Does this work with just standard CNN architectures? [00:45:20] Bharath: There’s a few caveats to this. Right now we’re still exploring the full extent of what this can do. We’ve got one paper out of this and one submission that seems to suggest that this is actually a fairly general idea, but I wouldn’t go so far as to say this is something you should always do. It seems to be the case that it’s useful. [00:45:44] Soham: Interesting. So in this case you just changed something about the data representation for the training set without changing anything about the architecture? [00:45:50] Bharath: Yes. And that’s usually how I work nowadays. 
My mantra for a lot of my graduate students is ‘If at once you don’t succeed, redefine success.’ [laughter] Bharath: So can you change the problem setup to make it work better? [00:46:08] Soham: Okay, so two more questions. Can you describe any piece of work that you’ve seen recently that’s not your own work that you thought was interesting? And why did you think it was interesting? [00:46:18] Bharath: This is a line of work that I have initial results in, initial experiments in, but I’m not an expert in this. And again, this is not one paper but a class of techniques. This has been work that’s happening on 3D reconstruction again, but from a single view. So you get a single view of an object or an image, say a single view of a cup, and you have to reconstruct the 3D model. People have been using machine learning techniques for this, and that has seen, actually, some incredible results. There’s work out of Berkeley, there’s work out of Oxford on things where basically what they’ve done is, they’ve trained the machine learning model so that, from a single image, it predicts a hypothesis for what the 3D shape should look like, and then it tries to render the hypothesis and tries to match it with the corresponding image. That’s how it learns. And it manages to learn… I think there are results now on just producing 3D models… I think one of my friends has talked about this a long time ago. Learning single-image 3D without a single 3D image. You don’t get any 3D information during training, but somehow, using knowledge of geometry, you’re able to train the network to actually do this 3D reconstruction. Original work from CMU, from David Fouhey [https://web.eecs.umich.edu/~fouhey/] and colleagues. Then there’s work from Berkeley, and from FAIR, from Shubham Tulsiani [http://shubhtuls.github.io/] and colleagues, Stanford from Leo Guibas [https://geometry.stanford.edu/member/guibas/]’ group, and Oxford from Andrea Vedaldi [http://www.robots.ox.ac.uk/~vedaldi/]’s group. 
There are a bunch of techniques. And also Noah (Snavely) [http://www.cs.cornell.edu/~snavely/] – I don’t mention Noah that much, because you’d just think I’m biased towards Cornell – but Noah also has a bunch of this… [00:48:39] Soham: Noah Snavely, who you mentioned earlier, right. So what, in your mind, are the biggest open problems in computer vision still? What are the huge problems that someone might be inspired to take on in the field? [00:48:54] Bharath: The biggest problem right now is that… So there’s this impression, especially in the public domain, that computer vision is solved, that we can basically do any kind of recognition task, any kind of perception task. And that’s dangerous. We’re very far from any kind of solution, so it’s more like ‘What have we done?’ rather than ‘What’s left to do?’ The big thing that’s missing is that human perception… Humans can basically very quickly learn new things. They know exactly when they see something new, they can learn from very few training examples, and they can learn with very little computational effort. Whereas current techniques, they can only learn a small number of things with lots of data and lots of computational effort. That’s a big gap which causes all sorts of issues when you apply these techniques to the real world. One of the resulting challenges is what I think of as inequity, in the sense that, right now, the kinds of techniques we’ve produced are only applicable to Google-scale or Facebook-scale problems. [00:50:16] Soham: So you have to be in industry or have an industry partner to run these computations? [00:50:20] Bharath: Right. Even if… You are in academia, and you are solving those problems, but the kinds of challenges that we are focusing on, like ‘How can we learn to recognize these 200 classes that are relevant to Internet applications from a million examples?’
‘How do I use a billion examples to build a good convolutional network?’ That’s very different from the kind of application where a radiologist comes to you and says ‘I want to recognize this specific disease, and here’s 5 images and I can give you 5 more if you give me a year.’ [laughter] Bharath: But I don’t have them here. And, you know, I have this Windows 98 machine lying around to train. It’s a very different kind of problem, and that’s sort of a big blind spot right now. [00:50:56] Soham: I see. So sort of moving closer to the human ability to learn things with few examples and not that much computation, so that we can move to smaller in scope but very important humanitarian and other problems like diagnostics. In a related question, do you think that it’s possible to do good research in computer vision outside of industry, or with no industry partnerships? [00:51:22] Bharath: I think it is definitely possible, but you have to have this kind of a mindset. If you talk to anyone in this area, you’ll get a similar answer, which is: if you try to beat industry at the industry game, that’s stupid. We have to be able to play to our strengths. And our strength in academia is (a.) Exposure to a much wider set of colleagues – people who are doing material science, who are doing agriculture, animal husbandry and all sorts of things – and (b.) The idea that we can explore ideas that don’t immediately lead to impact on benchmarks. That’s sort of the key thing that we have, this longer-term focus. As academicians, we need to focus on that, we need to push on that. [00:52:24] Soham: If you were to make a one-minute pitch to somebody who was considering doing computer science that they should work in vision, what would that pitch be? [00:52:31] Bharath: Woah. That’s a bit hard.
I think the biggest… The most interesting aspect of computer vision is the fact that… To me, the biggest aspect of computer vision is this big, big gap between what I can do – or what a toddler can do – and what a machine can do. It’s so far away from what toddlers can do. I have a one year old niece… she was able to, when we’d come home, she would rush to us, take our suitcases, open the bag, and throw everything out. Getting there – this is both robotics and vision in this case – getting there in terms of perception, figuring out ‘How did she know, having never seen a suitcase, that this was a thing you could open, and this had things inside of it?’ [00:53:27] Soham: Without even any actuation, just coming up with the plan at all. [00:53:31] Bharath: Just coming up with the plan at all. One fine day, she just decides to do this. Getting there, that’s the key question. How do we get there? We’re very far, we’re nowhere near. So I think that’s the motivation. [00:53:42] Soham: Great, thank you so much! [00:53:44] Bharath: Yeah, cool. [ringing sound] [00:53:45] Soham: Thanks for listening to that episode of Segfault. You can find Professor Hariharan at his website, bharathh.info [http://home.bharathh.info/]. You can find me on twitter @sohamsankaran [https://twitter.com/sohamsankaran] and at my website, soh.am [https://soh.am]. If you liked this episode, listen to some more and subscribe at honestyisbest.com/segfault [https://honestyisbest.com/segfault] or via your favourite podcast app. We’d appreciate it if you shared this episode online, and in particular, with anybody that you think could benefit from it. Segfault is a production of Honesty is Best. Find more of our work at honestyisbest.com [https://honestyisbest.com]. Finally, as I mentioned at the top of the episode, Pashi is hiring someone with experience in compilers and programming languages. If you fit this description, or know someone who does, please feel free to email me at soham@pashi.com. [00:54:28]

Adrian Sampson, Alexa VanHattum, and Rachit Nigam of Cornell’s Capra group join me to discuss their differing perspectives on what the research field of Programming Languages (PL) is really about, what successful PL research looks like, and what got them interested in working in the area in the first place. We also talk about some of their recent research work on programming languages for hardware accelerator design. Consider subscribing via email [https://honestyisbest.com/segfault#subscribe_top] to receive every episode and occasional bonus material in your inbox. Soham Sankaran’s Y Combinator [https://ycombinator.com]-backed startup, Pashi [https://pashi.com], is recruiting a software engineer to do research-adjacent work in programming languages and compilers. If you’re interested, email soham [at] pashi.com for more information. Go to transcript [https://honestyisbest.com/segfault/2020/Jun/16/programming-languages/#transcript_anchor] Note: If you’re in a podcast player, this will take you to the Honesty Is Best website to view the full transcript. Some players like Podcast Addict [https://podcastaddict.com/] will load the whole transcript with time links below the Show Notes, so you can just scroll down to read the transcript without needing to click the link. Others like Google Podcasts will not show the whole transcript. SHOW NOTES Participants: Soham Sankaran [https://soh.am] (@sohamsankaran [https://twitter.com/sohamsankaran]) is the founder of Pashi [https://pashi.com], and is on leave from the PhD program in Computer Science at Cornell University [https://www.cs.cornell.edu/]. Adrian Sampson [https://www.cs.cornell.edu/~asampson/] (@samps [https://twitter.com/samps]) is an Assistant Professor in the Department of Computer Science at Cornell University. He works on programming languages and computer architecture. 
Alexa VanHattum [https://www.cs.cornell.edu/~asampson/] (@avanhatt [https://twitter.com/avanhatt]) is a PhD student in CS at Cornell who is advised by Adrian Sampson. She works on systems programming languages, applied formal methods, and usability for programming tools. Rachit Nigam [https://rachitnigam.com/] (@notypes [https://twitter.com/notypes]) is a PhD student in CS at Cornell who is also advised by Adrian Sampson. He is interested in tools and languages for hardware design. Adrian, Alexa, and Rachit are all part of the Capra Group [https://capra.cs.cornell.edu/] (Computer Architecture and Programming Abstractions) at Cornell, which Adrian directs. Material referenced in this podcast: Logic for Hackers/Logic for Systems at Brown University: 2014 edition [http://cs.brown.edu/courses/cs195y/2014/], 2020 edition [http://cs.brown.edu/courses/cs195y/2020/pages/assignments.html]. Structure and Interpretation of Computer Programs [https://mitpress.mit.edu/sites/default/files/sicp/index.html], commonly known as SICP, which is an influential introductory textbook based on MIT’s original Introduction to Computer Science course, taught in the functional programming language [https://en.wikipedia.org/wiki/Functional_programming] Scheme [https://en.wikipedia.org/wiki/Scheme_(programming_language)]. It formed the basis for introductory CS courses across the world, and though most such courses have been phased out, Soham’s intro to CS course at Yale, CPSC 201 [https://zoo.cs.yale.edu/classes/cs201/Fall_2019/old_index.html], continues its legacy to this day. Higher-order functions [https://en.wikipedia.org/wiki/Higher-order_function]: functions that take functions as input and/or return functions as results. Adrian’s 2010 OOPSLA [http://www.sigplan.org/Conferences/OOPSLA/] paper Composable specifications for structured shared-memory communication [https://dl.acm.org/doi/10.1145/1932682.1869473].
Full text of the paper available here [https://cs.wellesley.edu/~bpw/research/files/osha-oopsla2010.pdf]. Rachit’s Dahlia paper from PLDI [https://conf.researchr.org/series/pldi] 2020: Predictable Accelerator Design with Time-Sensitive Affine Types [https://rachitnigam.com/publication/dahlia/]. Very High Speed Integrated Circuit Hardware Description Language, better known as VHDL [https://en.wikipedia.org/wiki/VHDL]: a hardware description language to specify digital circuits. Hans Boehm’s 2005 PLDI [https://conf.researchr.org/series/pldi] paper Threads cannot be implemented as a Library [https://www.hpl.hp.com/techreports/2004/HPL-2004-209.pdf]. Flatt et al.’s Programming Languages as Operating Systems [https://www2.ccs.neu.edu/racket/pubs/icfp99-ffkf.pdf] from ICFP [https://dl.acm.org/conference/icfp] 1999. This paper was alternatively titled Revenge of the Son of the Lisp Machine. Caroline Trippel’s 2018 MICRO-51 [https://www.microarch.org/micro51/] paper CheckMate: Automated Synthesis of Hardware Exploits and Security Litmus Tests [https://mrmgroup.cs.princeton.edu/papers/ctrippel_MICRO51.pdf]. Daniel Jackson [http://people.csail.mit.edu/dnj/]’s Alloy [https://alloytools.org/], an “open source language and analyzer for software modeling”. John R. Ellis’ 1985 Yale PhD thesis – Bulldog: A Compiler for VLIW Architectures [http://www.cs.yale.edu/publications/techreports/tr364.pdf]. Credits: Created and hosted by Soham Sankaran. Mixed and Mastered by Varun Patil [https://www.youtube.com/channel/UC7_0sQfCXyofKxxfo0yTXLA/about] (email [varunpatil.audio@gmail.com]). Transcribed by Pratika Prabhune, Soham Sankaran, and Eric Lu. TRANSCRIPT [00:00:00] Adrian: There’s something about programming languages – it’s like a lens of looking at problems that is uncompromisingly precise. We’re all just huge nitpickers – we really want to actually understand how things work instead of getting just a rough notion of how things might go.
[phone rings, number dials] [00:00:21] Soham: Welcome to episode one of Segfault [https://honestyisbest.com/segfault], from Honesty is Best [https://honestyisbest.com]. Segfault is a podcast about Computer Science research. I’m your host, Soham Sankaran. I’m the CEO of Pashi, a start-up building software for manufacturing, and I’m on leave from the PhD program in Computer Science at Cornell University, located in perpetually sunny Ithaca, New York. Computer Science is composed of many different areas of research, such as Operating Systems, Programming Languages, Algorithms and Cryptography. Each of these areas has its own problems of interest, publications of record, idioms of communication, and styles of thought, not to mention, one level deeper, a multitude of sub-areas, just as varied as the areas they are contained within. This can get extremely confusing very quickly, and I certainly didn’t know how to navigate this terrain at all until I was in graduate school. Segfault, in aggregate, is intended to be a map of the field, with each episode featuring discussions about the core motivations, ideas and methods of one particular area, with a mix of academics ranging from first year graduate students to long tenured professors. I hope that listeners who have dipped their toes in computer science or programming, but haven’t necessarily been exposed to research, get a sense of not only the foundations of each area - what work is being done in it now, and what sorts of opportunities for new research exist for people just entering, but also what it is about each specific area that compelled my guests to work in it in the first place, and what the experience of doing research in that area every day actually feels like. 
This is the map of CS that I didn’t even know I desperately wanted in high school and undergrad, and I hope folks who are just starting their journey in computer science will find within it ideas that will excite them so much, and so viscerally, that they can’t help but work on them. Without further ado, here’s episode one - Programming Languages. [ringing tone] [00:02:05] Soham: Welcome! I am with Alexa VanHattum, Rachit Nigam (via the magic of internet telephony), and professor Adrian Sampson, a subset of the CAPRA group at Cornell. If you want to just briefly introduce yourselves? [00:02:21] Adrian Sampson: I’m professor Adrian Sampson, no one has called me that in a year! [laughs] Adrian: Thank you! I’m a professor of computer science. I do a combination of programming languages and computer architecture. [00:02:34] Alexa: I am Alexa. I’m a second year PhD student here at Cornell. I study computer science. My stated interest is programming languages, but my real interest is a little more broad than that. I’m broadly interested in using formal methods to better understand how we use computer systems. I have a long-standing interest in computing education as well as the usability of programming language tools, and I’m someone who spent a little bit of time, around 2 years, in industry working for Apple, so I’m also particularly interested in the intersection of compilers and programming languages in research and real-world compiler hacking. [00:03:14] Rachit: I’m Rachit. I’m also a second year PhD student in the CAPRA group. I’m broadly interested in using programming languages to understand Computer Systems. I did my undergraduate at UMass Amherst where I worked on Javascript. I have built many compilers which do various sorts of things by exploiting and misusing language abstractions. At Cornell, I worked on this thing called high-level synthesis, which is the idea of taking high-level programs and turning them into hardware accelerators.
That’s what I’ve been working on for about a year and a half now. [00:03:53] Soham: Great, okay. So, programming languages is the field of computer science that I know the least about. Why don’t you explain to a hypothetical idiot viewer a.k.a me, what programming languages actually is, for someone who’s used one before but doesn’t know what the research field actually entails? [00:04:13] Adrian: I think there’s a probably good reason why programming languages seems like an obscure field to you, Soham, because the name doesn’t tell you very much about what it is. It sounds like it should be about inventing new programming languages, the practice of designing a new language. We do that, but it’s actually a tiny fraction of what programming languages people do. So it’s surprisingly hard to describe what we actually do, but it all has to do with how you express.. [pause] Adrian: … definitely isn’t right. I was going to say how you express your intent to a computer, but a lot of times people do programming language research that has absolutely nothing to do with humans typing code and more has to do with the representation of programs. So when I think of it, I would probably have to characterize it as all the stuff that has to do with the representation of programs; what computers are capable of doing and combined with that, this very weird theoretical tradition, this formal theoretical tradition that has to do with building up symbolically notions of computation from things like.. traditionally, it looks like algebra I guess. So there’s this whole area of theoretical programming languages that answers questions like what computers are capable of in a way that is different – from a totally different tradition – from how complexity theorists and algorithms people view it, and that underpins all the stuff we do even when it’s really practical, that view of what computation is. 
[00:05:52] Soham: How would you distinguish that view of what computation is from the complexity theorist view? [00:05:57] Adrian: Okay, this is in a really cartoonish way, which is that there are two fundamental formalisms for how computers work. The theorists use one - Turing machines. If you’re familiar with Turing machines, they’re like a notional machine that no one actually builds. They were invented by Alan Turing as a model of everything a computer could possibly do. It works by having this magical infinite tape that’s its memory, that it reads and writes symbols on. It has a state transition diagram that tells it what it’s doing. There’s a different formalism that also represents everything a computer can possibly do called the Lambda Calculus, which was invented before computers, originally to sort of distill down notions of symbolic algebraic reasoning. I think it’s a really surprising fact that those two models of computation are actually equivalent, but everything that we do in programming languages has descended from the latter, from the Lambda Calculus in a certain way, or has some resemblance to this algebraic manipulation stuff as opposed to this state-diagram tapey stuff that theorists do. [00:07:08] Soham: Got you. And so, sort of aesthetically different, stylistically different, but fundamentally mathematically equivalent? [00:07:13] Adrian: Yeah, mathematically equivalent. I guess the consequence is that programming languages people are usually much more concerned about what programs actually mean, and are much more precise about the meanings of programs and what computers can actually do in the first place, as opposed to, say, asymptotic bounds on running time, about which we have almost nothing to say. [00:07:41] Soham: I see, which is the bread-and-butter of complexity theory. Excellent! Do either of you have anything to add to that? [00:07:49] Alexa: I’d say yes.
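Ed note: the Lambda Calculus Adrian mentions builds all of computation out of nothing but functions. As an illustrative sketch (not from the episode, and written in Python purely because the notation is familiar), here is the classic Church-numeral encoding: numbers and addition defined by function application alone, with no use of Python’s built-in arithmetic inside the encoding itself.

```python
# A sketch of computing with nothing but functions (Church numerals),
# in the spirit of the Lambda Calculus. Python is used only as a
# convenient notation; the encoding does not rely on Python's numbers.

# A Church numeral n is a function that applies f to x, n times.
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))

# Addition: apply f n times, then m more times.
add = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))

# To inspect a Church numeral, convert it to an ordinary int.
def to_int(n):
    return n(lambda k: k + 1)(0)

two = succ(succ(zero))
three = succ(two)
print(to_int(add(two)(three)))  # 5
```

The surprising part, and the sense in which the two formalisms are equivalent, is that conditionals, loops, and data structures can all be encoded the same way.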
Another part of the field of programming languages that broadly interests me is thinking about the fact that there are an increasing number of software professionals in the United States, and ‘programming languages’, the field, is really closely related to ‘software engineering’, the research field. I was broadly interested in: what are the tools that we need to best be able to explain computation in a very practical way? So I think that’s something really interesting to me about trying to say, okay, there are millions of software engineers in the world. How can we unite this long and very theoretical tradition of understanding how computation works with what people actually do day-to-day? How do we find a middle ground between what we theoretically want computation to look like and what computation looks like currently in the hands of software engineers, and unify them into a better system of programming and computation? [00:08:41] Soham: So this is sort of like, usable abstractions for expressing programs in some sense? [00:08:46] Alexa: Yes. In that way, to me, it’s very closely related to the field of systems, of operating systems and networked systems and the like, and thinking about how we design abstractions that are actually usable as well as performant. [00:08:59] Soham: Rachit, do you have thoughts? [00:09:01] Rachit: Yeah. I have a different take on the philosophical musings of PL as a field (I’ll use PL as an acronym for programming languages). So the following statement comes from Matthias Felleisen who is a professor at Northeastern (University in Boston). He has been studying PL for a long time, and he said to me when we were talking about this, “Every application domain comes with an ontology of its own, and ontologies often come with their own languages.” But I think the distilled form of this for PL as a field is that PL people, at least in my eyes, don’t do work in isolation or shouldn’t do work in isolation.
They should go to a field, for example, in our group we do architecture and hardware abstractions, so we go into a field and we figure out what the abstractions really mean. If you have functions, right – people have had functions for a really long time in every language, but the meaning of functions is not well understood and PL people have been trying to formalise functions for a really long time – understanding them allows you to build more powerful abstractions and think about what your programs really mean. I keep saying ‘what programs really mean’ and I really want to stress this point because once you know what programs really mean, you can do all sorts of cool things like verifying the programs and trying to automagically synthesize parts of your program. But to do any of that, you have to understand what your programs mean, and I think that’s what PL people do fundamentally. They go to a field – they can go to networking, they can go to security, or they can go to architecture – and they can pick a language and figure out what the language is actually trying to say and what ideas it tries to capture. [00:10:48] Soham: I see. So in a caricatured way, what you’re doing is going to people and saying ‘Ah! I see what you’re doing but there’s a broader organizational principle to what you could be doing that we can demonstrate to you’ which I’m sure makes you very beloved in the fields you [laughs] want to attempt to do this in. [00:11:09] Rachit: I think, when you can successfully do it, you can really change fields. I think a lot of, like if you look at the history of computing, it’s a history of languages. When you can really express the languages, you can really express the ideas, and you can build bigger and better stuff quickly. I think you can do it, it’s just hard. [00:11:29] Soham: It is very hard. 
[00:11:31] Adrian: I just want to point out that the fact that you got such a long-winded and varied answer to your question, Soham, highlights Rachit’s point that programming languages is, these days at least, mostly defined by its connections to other fields. That is, there are certainly people who do programming languages purely for programming languages’ sake, but I feel like they’re in the minority these days, and there’s a lot more outwardly facing work going on in PL. [00:11:59] Soham: I see. It seems like these different conceptions of PL are united by reference to this sort of historical tradition. A lot of sub-fields have evolved like this, but there may not be a single unifying methodological thread necessarily across all of the work. Would you broadly agree with this? [00:12:18] Adrian: Yeah totally. [00:12:19] Soham: That’s interesting. It’s interesting to conceive of what it would look like in a different universe where the history looked different, and this field, software engineering, and systems had different boundaries with each other. Okay, so from each of you now, I want to hear what got you interested in programming languages in the first place? A research problem, a paper, or an experience, and why you thought that was interesting enough to push you into the field? [00:12:47] Alexa: So for me, I took a very roundabout way to coming to this research area, which is that I started college as a Physics Major. I had never programmed, I didn’t have much interest in programming but I took a single CS class and I really liked it. I then went off and did my first bit of research in Computational Physics. I was doing a summer program where I was trying to run simulations of particle depositions, work that was grounded in materials science and solid state physics, and tried to model it in a programming language. I was given the option of using Fortran or C at the time, and little 20-year-old me decided to hack on C for the summer. This was extremely hard.
I was doing these kinetic Monte Carlo simulations – kinetic Monte Carlo is just a kind of simulation technique – over these giant 2-dimensional C matrices, and found that I just kept getting incoherent results. I would think I was on the right track and then I would realise I had some pointer arithmetic error. So I had this summer of a lot of frustration where I did interesting work and got interesting results but felt fundamentally the tools I was using weren’t very good. This stayed with me when I later took some applied formal logic classes and intro to programming languages at my undergrad institution, realising that there are ways to formally study this as a field, to think about the usability of the programming tools that we use. I came about it from the sense of having first taken this formal logic class at Brown, where I went to undergrad, called Logic for Hackers at the time, it’s now called Logic for Systems, where you use a lot of different tools to model software at a high level and try to find bugs in your reasoning and implementation. So I came at programming languages from frustrations with the usability of existing tools, rather than from a fundamental mathematical background or from a research-first background. I came at it from the side of the applications themselves. [00:14:45] Soham: So to a pragmatic ‘How can I use this to improve the thing that was broken for me?’ [00:14:50] Alexa: Yeah. And then found out that it was actually fun to work on compilers and do the work that’s necessary to realise those theoretical implications. [00:15:00] Soham: Rachit? [00:15:02] Rachit: I’m going to give a very cheesy answer, which you’re not supposed to do. If you ever want to tell someone why you ever got into a field you’re supposed to give all these very scripted… Anyways, never mind! So, I seriously took up programming when I was in 11th grade. I used to program in C++ ‘98. The details of what C++ looks like are not important here. It’s just like a procedural language.
You can write all your pointer manipulating code in there. Then I came to college and I was taking an intro to Java, and I wasn’t particularly enjoying that. Someone recommended I go read this book called ‘Structure and Interpretation of Computer Programs’ which is this really old book from MIT’s introductory (CS) course on this language called Scheme. Scheme is this funky weird language that’s filled with a lot of parenthetical syntax, but the important part with Scheme was that it had something called Higher-order Functions. The idea was that you can write a function that, as an input, takes another function, and it can return functions. The first time I really understood this idea, I remember vividly. I was sitting in some economics class and I was instead coding in Scheme. And the first time I wrote my first higher-order function I was like ‘Wow, this is really cool. I really need to learn more about this.’ Then I reached out to professors and started doing research, and the more I learned about this – and there’s a lot of deep technical work, in-the-weeds work that I did – every time I did something that fundamentally explained a new abstraction to me, I found that to be really cool. That’s what hooked me in the first place, and that’s what keeps me hooked to the field. Every time I can look at a paper, look at an idea, or look at a program and figure out what that program is really doing, I get very excited. That’s why I do PL. [00:17:10] Soham: Got you. So your mind was completely blown by these higher-order functions and this was the entry point… [00:17:15] Rachit: Totally blown! Okay so, if you write C++ programs, you’re forced into believing that all you can pass in it is data, and data looks like numbers, right? 
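Ed note: the higher-order functions Rachit is describing – functions that take other functions as input and return new functions as output – can be sketched in a few lines. This is an illustrative Python example, not code from the episode:

```python
# A minimal illustration of higher-order functions: computation itself
# being passed around and returned, not just data.

def compose(f, g):
    """Return a NEW function that applies g, then f."""
    return lambda x: f(g(x))

def repeat(f, n):
    """Return a function that applies f to its argument n times."""
    def run(x):
        for _ in range(n):
            x = f(x)
        return x
    return run

double = lambda x: 2 * x
inc = lambda x: x + 1

double_then_inc = compose(inc, double)
print(double_then_inc(10))   # 21
print(repeat(double, 3)(1))  # 8
```

The same idea appears as lambdas in C++, Java, JavaScript, and most other modern languages, as the conversation notes below.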
You have numbers and maybe you can put them in some structures, but fundamentally you’re just passing around numbers, and the idea that you can pass these higher-order functions, higher-order ideas that represent some computation and not just data, and the realisation that ‘data and computation are so fundamentally related’ was just mind-blowing. It was such a cool idea. Still exciting till today. [00:17:53] Soham: It is a very neat idea. [00:17:55] Rachit: Yeah. If you look at modern languages, they’ve all sort of in their own time, 20 years later, adopted this idea – every major language from Javascript to Python, even C++, because it’s such a good idea. [00:18:11] Soham: Wait, C++ has higher-order functions? [00:18:13] Rachit: It has Lambdas. [00:18:15] Soham: Oh, it has Lambdas, sorry. Fascinating! [00:18:18] Adrian: C has function pointers. [00:18:19] Soham: C does have function pointers. Alexa: Yeah. [00:18:20] Rachit: C has function pointers. But I’m very glad that I didn’t learn of higher-order functions through function pointers because I’d be very angry… [laughter] Rachit: …as I was when I learned about function pointers. [00:18:30] Soham: I messed around with function pointers in high school, it wasn’t fun! [laughter] [00:18:36] Soham: Would not recommend! Rachit: Whoops! [giggles] [00:18:40] Soham: Okay Adrian, what about you? [00:18:41] Adrian: I started grad school trying to decide between two different sub-fields of CS - computer architecture and theory. I’d done theory, algorithms stuff - algorithms for optical networking - in undergrad, and I thought that would be pretty cool, but it turns out I’m very very bad at theory. I tried doing it and I’m super glad that I don’t do that! I’m glad there are people in the world who are actually good at it I guess… [laughter] Adrian: …who are not me! I was sort of simultaneously discovering that and also trying to get started on architecture research. 
I was talking to my advisor, and – I think it’s fairly standard with a new grad student – he asked me which of several ongoing projects I would like to try getting involved in. I decided the one that sounded super cool was one addressing this fundamental tension in computer architecture, which is the difference in parallel processors between message-passing and shared memory: you can either have machines that have access to one big memory and all communicate through that, or they actively send messages to each other and receive those messages to communicate. They have a fundamental set of trade-offs, but all the real machines that we use today are shared memory machines. The multi-cores all have the one gigantic main memory. So the idea was, maybe we can remove some of the sharp edges that everyone knows about in the programmability of those machines – that is, you accidentally write something that someone else was using in that shared memory – by restricting who can read and write shared variables after the last person who read and wrote them. So you’re sort of restricting the communication through shared memory in a message-passing style, and this seemed really interesting to me. It had nice formal graph-processing properties that seemed cool to me, and as we got deeper and deeper into it I realised that what we were actually doing was designing a set of annotations to Java and a runtime checker that gave you exceptions when you violated the policies that you’d written down, which is definitely programming languages research. Ed note: Adrian is talking about his 2010 OOPSLA [http://www.sigplan.org/Conferences/OOPSLA/] paper Composable specifications for structured shared-memory communication [https://dl.acm.org/doi/10.1145/1932682.1869473]. Full text of the paper available here [https://cs.wellesley.edu/~bpw/research/files/osha-oopsla2010.pdf].
So by accident, I ended up doing programming languages research in order to solve what I thought of, as a first year grad student, as an architecture problem. Looking back on it, I now do actually believe that programming languages is the right way to solve architecture problems. I originally came at it from thinking that building and defining what a computer is from the silicon up seemed really cool. But there are not many interesting problems that are solved just at the level of designing processors, just fiddling around with gates, or trying to make existing ways that we build computers a little bit faster. All the interesting problems have to do with looking up the system stack to make things more programmable or getting collaboration from the compiler, for example. All that stuff has so much more to be done than the traditional areas of computer architecture that are locked in their own abstraction level. No offence to anybody who does that kind of research! Again, I’m glad there are people who do that, but I think it’s way more interesting and there’s a lot more potential to change the world with this sort of multi-level view on how hardware works. [00:22:01] Soham: Is this because a lot of the fundamental hardware work happens in industry? Like, you can’t really do chip design in academia (anymore)? [00:22:12] Adrian: I think that is definitely part of the reason. Modern CPUs, for example, are so wildly complex that you can’t possibly say whether your branch predictor is better or worse than Intel’s branch predictor, because you have no idea how Intel’s branch predictor works. But I think even in a situation where everything were open source and we could inspect them, I still think that traditional architecture work that does not attempt to do some sort of programming languages or compiler stuff is tying itself to a legacy that comes from the 60s in terms of the way that we design the interface to the processor.
It is fundamentally trying to say ‘we’re building this machine that runs instructions one after the other or at least appears to, and we’ll just try to take that thing and make it incrementally better.’ That approach is fundamentally limited. It’s like you are hiding behind an abstraction boundary that can only get you so far. That is how architecture has worked up until about 10 years or so ago, but suddenly things are beginning to shift, and we have the chance to actually change how things look and work in the hardware, as opposed to just changing things underneath that abstraction level. [00:23:23] Soham: That makes sense. Actually continuing out from this, can you talk about a recent research problem that illustrates what you mean here, that you’ve worked on? [00:23:32] Adrian: How recent do you want? [00:23:36] Soham: More recent than your PhD maybe? But not necessarily in the last 25 minutes. [00:23:44] Adrian: Sure, yeah! Can we talk about Dahlia stuff, Rachit – is that okay? Ed note: he’s talking about Rachit’s Dahlia paper from PLDI [https://conf.researchr.org/series/pldi] 2020: Predictable Accelerator Design with Time-Sensitive Affine types [https://rachitnigam.com/publication/dahlia/]. Rachit: Yeah. [00:23:50] Adrian: Cool. Okay. So, one thing we’ve been working on in the lab a lot lately is using a Programming Languages view on generating specialised hardware for particular applications. The motivation being that this traditional approach to designing hardware that I’ve been talking about - which is you use a standard instruction set to express your program and then hope it goes pretty fast - has fundamental limitations, in part because it is trying to do everything with a single piece of hardware. There is a sea change in the way that computers are built going on right now – it has to do with the designing of special focused hardware that can do individual tasks far more efficiently than a general purpose processor can do them. 
So there are great stores of efficiency to be unlocked if we had the ability to generate specialised hardware accelerators for many new domains. Unfortunately, this is ridiculously difficult to do with current programming tools. The languages people use to design hardware at many different levels of abstractions are all… If you are a programmer, you wouldn’t even recognize these languages or these compilers. They are so incredibly difficult to use, and the compilers are so proprietary and buggy, that they are a huge impediment to getting anything done in the area of designing hardware accelerators. So, our notion is that it might be nice if we could design specialised hardware and reconfigurable hardware in a way that more resembled ordinary software development. If we had languages and tools that actually worked and produced the results you thought you would get from the source level, then we could unlock the potential of millions of human designers to build hardware accelerators that the tools currently prevent them from doing. Is that a fair summary, Rachit? [00:25:49] Rachit: Yes. [00:25:52] Soham: That’s really cool. So when you talk about these hardware design languages, you’re talking about FPGA stuff, or VHDL, or something like this? [00:26:00] Adrian: Yeah. FPGAs are Field-Programmable Gate Arrays which are a kind of reconfigurable hardware. They are actually a nice way to get to the properties of hardware specialisation without paying the, like, millions and millions of dollars that it would cost to manufacture actual silicon. Soham: Right. [00:26:17] Adrian: So, a lot of our research currently focuses on using those as a general purpose compute unit; having programming languages that are hopefully higher level than VHDL which you mentioned, to express what you want and how you want to compute it on an FPGA. 
[00:26:35] Soham: So currently you’re competing with things like VHDL that do, roughly speaking, work at the gate level, or slightly higher than gates but not all that much? [00:26:43] Adrian: Yes, that’s right. The other thing that we’re competing with is a very strange notion that some people had a couple of decades ago, to compile C to languages like VHDL that work at the gate level. That is, to take in – according to them – arbitrary C programs and try really really hard to generate a good accelerator. It is perhaps not surprising that these compilers, which are called high-level synthesis or HLS compilers, and which Rachit mentioned a bit ago, don’t work – they don’t do what I just said. They don’t take arbitrary C programs and generate good accelerators. They take a funky, ad hoc, weird subset of C that sometimes works and compile it occasionally to an accelerator that might or might not be good. [00:27:26] Soham: Occasionally as in, it’s not deterministic whether it compiles at all? [00:27:30] Adrian: Well, not deterministic is a little too far. Depending on the… Let’s put it this way: small changes in the input program could cause, and do cause, these compilers to reject your program or to generate very bad hardware. Soham: I see. [00:27:46] Adrian: So, I think it’s totally unsurprising that if you try to repurpose a language which was designed for CPUs several decades ago as something for generating accelerators, it would not work as well as it does for a CPU. [00:28:00] Soham: Is there some kind of heuristic-based translation to hardware design? [00:28:03] Adrian: Yeah, you got it. It’s totally a big pile of heuristics that tries to guess how the program might map onto hardware and generates an architecture that looks like that. [00:28:13] Soham: Seems somewhat wrong-headed. [00:28:14] Adrian: It is, like I don’t blame people, you know?
[laughter] Adrian: But it is an obvious mismatch for where we are today. It might have made sense a few years ago to say ‘yes, let’s take legacy software and compile it to good accelerators’, but I think it’s clear that it doesn’t make sense. [00:28:30] Soham: Ah! I see, so that’s what the notion was. The notion was you wouldn’t have to rewrite it. You’d just take some pre-existing (software)… Yeah… [00:28:33] Adrian: …maybe like sprinkle in a few annotations here and there, and then get a pretty good accelerator. But, I think it’s clear from the modern perspective that it’s not the right way to go, but that the dream of having some sort of higher-level way to write down, or let competent software programmers design hardware, that shouldn’t be dead. We should abandon the idea of compiling legacy software, obviously. But there’s something to be done in the design of languages and compilers to make it more accessible. [00:29:02] Soham: Got you. So, Rachit, are you working on this project? [00:29:06] Rachit: Yeah. I worked on this project as well. Do you want me to talk about it? [00:29:13] Soham: Yeah. Can you talk about how, what you’re actually doing to build this thing? What is it that you’re actually building? [00:29:21] Rachit: Yeah. So before I talk about that, I’m going to add a bit about why I think the problems occur from a more detailed level, and then maybe I can talk about what we’re going to do to solve the problems. [00:29:34] Soham: Go ahead. [00:29:35] Rachit: I think there’s a general trend of what you see with a language that’s hard to program, or how unpredictable behaviour doesn’t really make sense when you compile them down to a target. There’s this notion of something called the semantic gap in programming languages literature, which is the idea that you’re compiling from a language A to some language B; language B could be representing some sort of hardware or some sort of virtual machine. 
When the gap between these two languages is big, it becomes really hard to reason about the behaviour of your programs when you write a program in some source-level language A and then compile it to a language B. [00:30:18] Soham: Can you give me an A and B, some languages people would know where the gap would be large? [00:30:23] Rachit: So, a canonical example of this is you write C++ code and it turns into, say, x86 Assembly that’s then executed by the processor. Assembly and C are, surprisingly, very close – when C was first built, it was like a thin wrapper layer on top of Assembly that gave you some set of nice structured programming primitives, but it was really meant to be close to the lower-level Assembly programming language. So the semantic gap between C and Assembly is not very high. On the other hand, you run into this problem that the primitives in the language are very low-level. C famously has pointers, and pointers famously have a lot of bugs, because programming with them is hard for some definition of hard and some definition of who programs them. So people have rolled out other languages. Another example is Java, which compiles to the JVM or the Java Virtual Machine. The Java Virtual Machine itself has its own instruction set called the Java bytecode. The Java bytecode is again higher-level than Assembly, but lower-level than Java. So, if you think about programs and computers, there’s this giant hierarchy of languages in the middle, and when you’re jumping between levels of this hierarchy - when you’re going from a higher level to a lower level - if your gap is too big, you’re going to run into problems with the programmability, predictability and understandability of your programs. [00:32:01] Soham: So you’d say that Java and x86 Assembly have a big semantic gap? [hesitates] [00:32:06] Rachit: Yeah, I would say that.
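Ed note: the hierarchy of languages Rachit describes – Java compiling to JVM bytecode, which sits above Assembly but below Java – has a convenient analogue you can poke at directly: CPython compiles Python source to bytecode for its own virtual machine. An illustrative sketch (CPython-specific; the exact opcode names vary across versions):

```python
import dis

def add(a, b):
    return a + b

# CPython compiles this function to bytecode for its virtual machine,
# one level down the hierarchy. The semantic gap here is small: `a + b`
# becomes roughly "load a, load b, a binary-add opcode, return"
# (the precise opcode names depend on the CPython version).
dis.dis(add)

# The raw opcode names, without the pretty-printing:
print([ins.opname for ins in dis.get_instructions(add)])
```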
Java has things like virtual dispatch, for which you have to compile a lot of code. You have to build up a lot of runtime support and you have to compile a bunch of code to make that a reality on something like Assembly, which really doesn’t have virtual dispatch. Virtual dispatch is the idea of dynamically selecting the right object to execute some piece of code on. Assembly doesn’t even know what objects are, much less what selecting the right object means, right? So the semantic gap is high. But it’s nowhere near as high as in high-level synthesis – the semantic gap in the problem of synthesizing lower-level hardware from a higher-level programming language is much higher – because you’re trying to take some program that represents a computation and you want to compile it down to a computer, right? You want to take some physical notion of resources and you want to use that as a basis for implementing your computation. This is a totally wild idea. Every time you write a +, it doesn’t mean an instruction execution like it usually does in a programming language. It means ‘build me a physical adder that’s connected to my inputs and then returns a result to my output.’ The expression A = B + C is a physical circuit that you need to think about. The way we’re looking at it, the reason this is really hard to reason about, and the reason there’s so much complexity when you compile computations down to computers, is because you don’t reason about the physical resources. You don’t think of the + as a physical resource, and that’s what causes it to be really complicated. If you had some notion of your physical circuits being physical, real and usable in your code, then maybe you can reason about it. But again, if you go too low-level you end up with something like Verilog, where yes, you do get real physical resources, right? You can build an adder and you can connect wires to it. Then you can execute your circuit.
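Ed note: the idea that A = B + C describes a physical circuit rather than an instruction can be made concrete with a toy sketch. This is purely illustrative Python (not Dahlia, and not how real HLS tools work): evaluating + here doesn’t compute a sum, it allocates an adder in a growing circuit description.

```python
# A toy sketch of "+ means build me a physical adder": instead of
# computing a value, each use of + allocates a named adder and records
# how it is wired up.

class Wire:
    def __init__(self, name, circuit):
        self.name, self.circuit = name, circuit

    def __add__(self, other):
        # Each use of + consumes a physical resource: a new adder.
        out = Wire(f"add{len(self.circuit)}_out", self.circuit)
        self.circuit.append((out.name, "ADDER", self.name, other.name))
        return out

circuit = []
b, c, d = (Wire(n, circuit) for n in "bcd")
a = (b + c) + d   # two +'s => two physical adders, not two instructions

for out, kind, x, y in circuit:
    print(f"{out} <= {kind}({x}, {y})")
```

Running this prints a two-adder netlist-like description rather than the number that an ordinary program would produce.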
But if you go to that level of granularity, you lose all your abstraction benefits. Then you’re just programming at a lower level. So there’s this interplay that’s central to the field’s research, at least the kinds of research we do, where you have to trade off expressiveness against some notion of closing the semantic gap. Does that make sense? [00:34:43] Soham: Sure. [00:34:46] Rachit: Yeah. So, the way we’re trying to solve this problem is using fancy type systems. If you come from a traditional programming language background, if you program in Java or C++, you don’t really think a lot about types. You sort of look at them and you’re like ‘okay I have a type that says I’m an animal, and this is a dog, and I can replace all instances of animals with dogs’. This is like a canonical inheritance-based typing system in something like Java. That’s usually straightforward to think about, but there’s more complexity that you can add in your type system to capture more ideas about the language you’re working on. So for example, Rust is a new systems programming language that has recently been very successful in what is called safe systems programming. It’s been enabling people to write some very complicated systems code that is concurrent and uses memory in very fine-grained ways and yet manages to be safe. Rust accomplishes this with a combination of good language design - just good primitives and good design decisions - in conjunction with an interesting type system that captures some semantic notion of your program. For example, one thing the Rust type system tries to guarantee is that, if you have pointers to memory, you’re never in a state where you have multiple pointers pointing to some memory and all of them are trying to mutate that memory. Rust hypothesizes that’s one of the biggest causes of unsafe memory bugs.
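Ed note: the Rust rule Rachit describes – many readers or a single writer, never both at once – is enforced statically by Rust’s compiler. As a purely illustrative sketch (not Rust, and dynamic rather than compile-time), here is the same discipline written as a runtime check in Python:

```python
# A toy dynamic version of the aliasing rule Rachit describes: a value
# may have many readers, but never a writer while any other access is live.
# Rust checks this at compile time; this sketch just checks it as it runs.

class Cell:
    def __init__(self, value):
        self.value = value
        self.readers = 0
        self.writer = False

    def borrow(self):
        if self.writer:
            raise RuntimeError("already mutably borrowed")
        self.readers += 1

    def borrow_mut(self):
        if self.writer or self.readers:
            raise RuntimeError("aliased: cannot take a unique writer")
        self.writer = True

cell = Cell(42)
cell.borrow()       # first reader: fine
cell.borrow()       # second reader: fine, shared reads are safe
try:
    cell.borrow_mut()   # a writer while readers exist: rejected
except RuntimeError as e:
    print("rejected:", e)
```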
To a large extent, this has been true, because people using Rust have been able to build really big software like Firefox’s rendering engine, using mostly the safe subset of Rust. The higher-level point being, you can use type systems to capture properties about your program. So we’re taking this approach: we’re using a type system to capture physical resources as an idea in hardware programming, but we’re doing it at the level of C or C++, or even higher. [00:37:04] Soham: So the resources here are like, an adder would be a resource, for example? [00:37:09] Rachit: Right, so when you write your program ‘A + B’, + is a physical resource. + would be given a type that says it’s a resource. This is important because when you build circuits - a naive way to build circuits would be to build a new adder every time you want to do A + B, right? You can build a really massive circuit that does all of the computation in one go, but it takes all of the area on your FPGA, or all of the area on your silicon. A more common and more interesting way of building circuits is to have some number of adders available to use. Let’s say you have five adders and you want to do 100,000 adds, using only those five adders. So, you do this thing where you try to use as many adders as possible in a given time step, and when you want to use more adders, you finish your computation at that point, save it into some memory, and move on to the next time step. Now you have five resources to use again. This is something called time multiplexing, where you have a fixed number of resources and you want to use them to do more computation than they would allow in a single time step. You multiplex them over time. You reuse them over time.
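As an illustration, the time-multiplexing idea can be modeled in software (this is my own sketch, not code from the project; the pool size of five comes from Rachit’s example): a fixed pool of adders is reused across time steps until all the additions are done.

```rust
/// Number of physical adders available, per the example above.
const NUM_ADDERS: usize = 5;

/// Schedule `pairs.len()` additions onto NUM_ADDERS adders,
/// returning the results and the number of time steps used.
fn time_multiplex(pairs: &[(u64, u64)]) -> (Vec<u64>, usize) {
    let mut results = Vec::with_capacity(pairs.len());
    let mut steps = 0;
    // Each chunk of up to NUM_ADDERS additions happens "in parallel"
    // in one time step; results are saved to memory, freeing the
    // adders for reuse in the next step.
    for chunk in pairs.chunks(NUM_ADDERS) {
        for &(a, b) in chunk {
            results.push(a + b);
        }
        steps += 1;
    }
    (results, steps)
}
```

With 12 additions and 5 adders, this takes 3 time steps (5 + 5 + 2); with 100,000 additions it would take 20,000 - more computation than the resources allow at once, spread over time.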
That’s how you get to do big computations with a few resources. [00:38:44] Soham: I see. So you want to express, in your type system, that something like this multiplexing is possible while ensuring that you’re not using more resources than you have - or something like that? [00:38:58] Rachit: Exactly. That’s precisely what we want to do. [00:39:01] Soham: I see. That makes a lot of sense. Is the approach to this project that you sit down, think about this from a top-down perspective like ‘I have this problem’, design a type system to make it work, crank out the details of the type system in theory, and then implement it? [00:39:16] Rachit: Yeah. That’s the fun part of research. You can have simple notions of how each project will go, but then they don’t really go like that! Usually, the way it works is that there are periods when we think really hard about the things we want to express, and then there are periods when I’m just cranking out a lot of code. We switch between these. There’s no clean separation between the time when we’re thinking and the time when we’re building, because we think of something, we build something, then we run some experiments, or we write some code, and try to figure out ‘can we actually express the properties we want to express in programs in a reasonable way’. If we realise we can’t, or if we find a bug, or if we find that we have mistakes in our reasoning, we go back to the drawing board and build it again. So there’s a lot of iteration going on; it’s fundamentally a design trade-off. You’re designing something that is expressive and safe. These are two competing forces. You can have something that’s safe by just rejecting every program, and you can have something that’s expressive by accepting all programs, and you don’t want either of those extremes.
So, you’re trying to find a design point that allows you to express as many programs as you can while also keeping the programs you write safe. Safe here is a technical term in the PL world: safety means you’re capturing some property of the program, where you’re saying every time I typecheck, I can prove safety, which means I can prove this property holds true for your program. [00:40:49] Adrian: Just for fun, I think it would be nice if our minds were organized enough that we could do what you said: sit down and figure out all the theory, then take that into an implementation and be done with it. Like Rachit says, in reality it is a wild flip-flop back and forth between those two modes, which I think is pretty fun. At least this way I don’t get bored, I guess, with the thinking and the doing happening all at once. But I think it is really true that in the research projects that at least I have done, I have not really figured out what we are actually making until we have mostly made it. So it would be really hard to figure that all out upfront and then take our dreams and turn them into a reality all at once. [00:41:45] Alexa: I would argue, though, that even in an ideal world, I wouldn’t necessarily want this intense separation between all the theory and the thinking first, and then all the implementation. I think that you know a problem is important if you run into it yourself, to some extent? So doing all the thinking ahead of time can mean you’re not actually solving the right problems. The advice I had gotten from an undergrad advisor was: go off and do industry for a while, or actually embed yourself a little more in communities of programmers and figure out what the things are that people find difficult, and use that to inform what you do. I think that’s kind of true for research too.
Even if you have a really beautiful theoretical idea, I, at least, am actually motivated by seeing - how does it feel when you actually try to implement it and play with it? What are the challenges there? And then using that to help figure out what theory you need next to support it. [00:42:39] Soham: Adrian, can you describe one instance of this kind of back and forth, where you had theory, and then you implemented it, and then it broke, and then you went back? [long pause] [00:42:52] Adrian: Give me a second, that’s interesting. [pause] [00:43:00] Adrian: Yeah. I feel like this happened about 10,000 times in this very project that we’re talking about, where we had a very simple notion of the theory for this language, for the type system we were going to develop, but the main problem was that it was too restrictive. Like Rachit was saying, it is always safe to reject every program, and we were rejecting most programs. So, we would sit down, take that theory, and make a very, very simple checker that implemented the theory we had in mind. And then, to absolutely nobody’s surprise, no useful programs could be written in that language. The point was, we had to go back and say ‘oh, we really want to allow this specific thing; programs must be able to do this’, which is obviously okay, but our type checker couldn’t currently certify it. [00:43:49] Soham: What’s an example of the ‘this’ that you wanted it to be able to do that it couldn’t? [00:43:54] Adrian: Okay, it’s a little hard to get into this without getting very detailed about our type system, but one thing that we needed to be able to allow… Okay, one thing we needed to be able to prevent is that an accelerator that we generate tries to write two things into the same place in the same memory at the same time. Soham: You wanted to prevent this? [00:44:14] Adrian: We wanted to prevent that. Exactly. So, we had something that ruled that out.
However, we discovered that it would actually be kind of cool if programs were able to read the same place in the same memory at the same time, which sounds very similar but is totally different, because it’s absolutely fine for a circuit to get one value out of the memory, split it to copy it, send it to two different places, and use it in two different computations. [00:44:42] Soham: Right. You can imagine doing some pre-computation and then having a bunch of different, other parts of the circuit read it to do some further, distinct computation on it. [00:44:49] Adrian: Exactly, yeah, right. A simple version of our type system would rule out both of them. It would say ‘you can’t touch memories at the same time from two different places.’ [00:44:58] Soham: Right. [00:44:59] Adrian: But it turns out we actually had to discriminate between reads and writes to make sure that simultaneous reads were allowed. [00:45:04] Soham: Got you. And this probably introduced additional complexity into the type system, which you had to accept because you wanted these to be useful programs? [00:45:10] Adrian: Exactly. Our whole thing is totally worthless if we can’t write useful programs. [00:45:14] Soham: Right. That’s really fascinating. There’s a similar back and forth that happens with systems research. Unless you’re Leslie Lamport, a lot of practical systems research doesn’t start with a lot of theory. It starts with these notional heuristics of what we think is possible, and then some specific pieces of theory, like consensus or something like that, that act as constraints on what we can and cannot do. There’s a lot of relaxation of the models we have. Say, ‘oh, we can’t provide linearizability, or strict serializability’ - which is a very clear notion of all events in a database happening sequentially with respect to each other - ‘but we can provide this slightly weaker property or this lower isolation level for this particular thing.’
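The reads-versus-writes distinction Adrian describes can be sketched as a tiny conflict check (illustrative only; the names are mine and the actual type system is far more involved):

```rust
#[derive(Clone, Copy, PartialEq)]
enum Access {
    Read,
    Write,
}

/// Two simultaneous accesses to the same memory location conflict
/// unless both are reads: a value that is read once can be fanned
/// out to many consumers, but two simultaneous writes (or a read
/// racing a write) cannot be realized as a well-defined circuit.
fn conflicts(a: Access, b: Access) -> bool {
    a == Access::Write || b == Access::Write
}
```

A type system that only tracked ‘touches memory’ would reject the read/read case too; tracking the access kind is exactly the extra complexity that lets the simultaneous-read programs through.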
So it seems very similar in that sense, also in the sense of developing the right abstraction, this sort of Goldilocks abstraction, that is powerful enough but not so powerful that it’s easily misused by your median programmer, while also being implementable with good performance. It seems like these are two sides of the same coin. In this vein, for any of you, what do you think are the distinctions between doing PL research like this and doing systems research like this? Why would you do one or the other? [00:46:35] Rachit: Can I jump in? Soham: Of course. [00:46:38] Rachit: Okay. This is a very interesting question about PL as a field. And this is something I think people in our position struggle with a lot, because to me PL is a field that straddles theory and systems, right? I did and I do identify as someone who does systems and PL together. I don’t do PL in isolation. I do PL for systems of some form. But it means different things to different people. I also have friends that do what is, from my point of view, very pure theoretical work, that is just about building really clean, really theoretically strong foundations for PL research. Then there are people I know who work on memory allocators, for example, who build really nitty-gritty systems. I call all of these people PL people, because the unifying thread between all of them is that they’re thinking about languages. They’re thinking about abstractions, and that’s the loose definition of PL I work with, which ends up with me classifying a lot of systems work as PL-adjacent work anyway. I don’t know if people who’ve actually been in the PL community for a long time, or a longer time than me, would agree with this, but this is one useful way of thinking about PL, because I do think it’s this field that sits in the middle.
It thinks about abstractions and tools for building abstractions, and it’s not very useful to think about PL without another field to apply it to. I also think most fields can benefit from ideas from PL. So, there’s this nice interplay where if you really enjoy doing some research and you learn about some ideas from PL, you can probably apply them in your field and build something cool. [00:48:31] Soham: So Alexa, you were talking earlier about this notion of usability, right? And there is this tension, at least as an observer, in programming languages, where a lot of the arguments that people make in the introductions to their papers are about usability: about how this approach gives a cleaner abstraction, produces fewer errors, and so on. But in practice, there’s not a lot of empirical work actually measuring whether this is true with real programmers. So is this changing? What do you think is the future of doing research on actual usability in programming languages? [00:49:03] Alexa: Yeah. I think I’m quite new to the community overall. I’ve been in the PhD program for only about a year and a half. My perspective, from having read a lot about this and talked to people who’ve been in the field for longer, is that the tides are changing and there is a broadening interest in the intersection of programming languages and the field of human-computer interaction. HCI is the independent tradition of understanding how people use technology in a very user-centric way, which sounds kind of repetitive, but it means understanding, if you build a new interface, does it actually work, and are people’s mental models of that interface aligned with what you intend them to be?
Traditionally, this is done with interfaces that don’t exactly look like programming languages, but I would argue, and I think a lot of people are arguing, that every programming language is an interface, and if you want to make a really strong claim about how usable your tool is, it would be good to actually - maybe not in every case - do a user study, or at least have some way to demonstrate this usability more empirically. There are workshops now at the intersection of PL and HCI, people have gotten NSF CAREER awards focusing on this intersection, and I think there is more of a push to say: if your paper is going to claim something really strong about usability, you had better at least have some data to back it up. This can really range. I’ve seen papers that say ‘oh, we think our tool is usable because we had a grad student try to do it the old way and it was slow, and then they used our fancy new tool and it was fast’, and I’d say that’s a really low bar for doing anything usability-oriented, but at least it’s trying to form some idea of how people are ultimately going to use this. And is it necessarily the case that because a particular research team finds something more usable or cleaner, it’s actually true for the intended end programmer in that situation? I think you need actual data to back that up, rather than just saying ‘we think this is the best’. [00:50:58] Soham: Is there a specific trajectory you see for how the field might best progress in this way? Is there a set of guidelines you can see in your mind for what you should do if you’re starting a PL project that has a human interface component? [00:51:13] Alexa: That’s a good question. I’m not sure I have an exact answer there. In the same way we’ve talked about how there isn’t a precise cycle for going between theory and implementation, I also don’t think there’s a silver bullet for going from theory to implementation to usability.
I think, trying to learn from the HCI community - one thing I’ve learned through taking classes and talking to people is that there really is a big difference between two types of user studies you can do. You can do what’s called a summative study, which is maybe what you first think of: you give people a tool, you ask them to engage with it, and then you actually measure - could people do the tasks you wanted? Did people understand what they were doing? Did they rate the experience highly? That assumes you have a finished system and you have people who are able to use that system in isolation and treat it as a black box. What you can learn from the HCI community is that that’s not the only way forward. You can do studies that are more formative, that are more need-finding-based, where you find a body of users - and as we’ve been saying, PL is all about intersections, so you find a group of users who are actually at the other side of the intersection, who are ingrained in the domain you want to study - and you ask them what they need, you ask them to give their perspective on your high-level ideas before the system is actually designed, and you have them play with a prototype you don’t assume is completely done. You don’t need a massive ‘n’ for these studies to show statistical significance. You need a smaller group of people who can actually lend a perspective on the domain that you might not have. I’d say that’s recognizing that user studies don’t need to be massive black-box evaluations, but can actually be ongoing with the project as it develops. [00:52:57] Adrian: Yes, I think that’s exactly right. Like Alexa says, in terms of designing studies for usability, most of the answers can be found in HCI. The HCI community has worked really hard to develop methodologies and best practices for how to run usability studies. These should work just as well for programmers as they do for non-programming HCI questions.
I think a big thing the community will have to address as it embraces this further is finding ways to determine when the right time is to do your study, and what the right questions are to ask when interacting with real people. There are so many ways in which it would be possible to ask bad research questions and turn them into user studies that don’t give you a meaningful result. Some of them are really important questions. For example, I think it would be a really bad idea to try to do many more user studies on the question of whether statically or dynamically typed languages make programmers more productive, which in some sense is a really important question. Programmer productivity, you can argue, is a really important thing, and this is a difficult question to answer. But I think it is too diffuse a question to answer all by itself. There are so many different levels of experience among programmers, so many different things that programmers want to accomplish, and so many differences between languages that are statically and dynamically typed, that these confound the question and make asking it precisely, all by itself, not a good basis for getting a bunch of funding and doing a user study. What we need to do is develop a better science for how to take these big, broad, difficult questions and distill them into things that empirical methods involving humans can actually answer. [00:54:45] Soham: Yes, and this is a hard problem in all the social-science-allied disciplines: finding these nice questions where you don’t have confounders that destroy the chance of getting any useful data. What you’re saying suggests that there’s a good opportunity to do good social science work in programming languages, to work at the intersection of these things. Not to throw any shade at the HCI community, but I don’t know if they themselves are doing sufficiently methodologically principled empirical work.
I wonder, then, if it makes more sense to import things from behavioural economics and other disciplines that have more replicated work with large numbers of people. It’s hard to get large numbers of people to do HCI or PL studies, especially PL, where you need people with some level of experience, as you were saying earlier. It seems like a difficult question, but I certainly hope there’s some progress on it. So, switching gears a little bit: I want to ask each of you to highlight a piece of research that you’ve seen at any point in your career - it can be as old or as new as you want it to be - that you think is interesting and successful at what it wants to do, in PL. [00:55:57] Adrian: Sure. I have an example. I think it’s going to sound like a cliché to both Rachit and Alexa, but there is this paper that I really love that changed the course of an area in systems and programming languages. It’s a paper by Hans Boehm titled ‘Threads Cannot Be Implemented as a Library’ [https://www.hpl.hp.com/techreports/2004/HPL-2004-209.pdf] (Ed note: in PLDI [https://conf.researchr.org/series/pldi] 2005), which is fun not just because it has a really intriguing title that sounds like it can’t possibly be true, but also because it launched an area of investigation that is both beautiful and important. The thesis of the paper is that if you take a normal programming language and you try to make it multi-threaded by adding on what looks like a set of library calls - create a thread, synchronize with this thread, destroy a thread, for example - that can’t possibly work, because the compiler will assume those are normal function calls, even if they are black-box function calls, and that assumption can cause it to do optimizations that will break your program. CPUs, similarly, will see those as normal function calls and do things that are illegal in the context of a multi-threaded program but just fine in the context of a single-threaded program.
Soham: What can they do? [00:57:07] Adrian: They can introduce non-determinism into a program that is perfectly deterministic. They can optimize your program to add writes where the program never did writes, and turn it into something where two different threads are trying to read and write the same variable without synchronization. This can cause a program to do arbitrarily bad things. The lesson of the paper is that it’s not sufficient to just bolt this onto existing languages. It is fundamentally necessary for compilers and the semantics of languages to change in order to support any sort of shared-memory multi-threading. It launched an area of research into memory consistency models that is still with us today, where we’re trying to figure out what multi-threaded programs even mean at the level of source languages and ISAs. [00:57:59] Soham: Fascinating. When was this published? [00:58:01] Adrian: This was in the late 90s or early 2000s. Something around there. [00:58:06] Soham: Was it true that, at the time, people were bolting threading libraries onto existing languages? [00:58:12] Adrian: It’s absolutely true. In this paper, it’s not just a theoretical worry. There were actually compilers out there doing exactly this thing, and you could demonstrate actually wrong results. [00:58:21] Soham: And now people don’t do this as much? Now people build threading into the language model? [00:58:25] Adrian: In part. Hans got onto the C standards bodies, and the new C standards as of 2011 actually give a semantics that deals with threads. [00:58:35] Soham: Wonderful! That seems like a wonderful encapsulation of lots of pieces of this. Rachit, do you have one? [00:58:42] Rachit: Yes. I was trying to think of a way to explain it. There’s this paper from around 2000, and the paper has two titles - the first title is ‘Programming Languages as Operating Systems’ or something like that.
But the title I remember is ‘The Revenge of the Son of the Lisp Machine’, which is a very cool title for a paper. I think people should title their