[00:00:00.000] - Robby Russell Welcome to On Rails, the podcast where we dig into the technical decisions behind building and maintaining production Ruby on Rails apps. I’m your host, Robby Russell. In this episode, I’m joined by Alexander Stathis, a Principal Software Engineer at AngelList. Alex’s team has helped evolve a complex Rails monolith over the last several years, from deep survey integrations to merging large apps into a modular engine structure. AngelList’s back end powers some of the most intricate business logic in investing, accounting, and banking, and they’ve done it all while staying committed to Ruby on Rails. Alex joins us from Chattanooga, Tennessee. All right, check for your belongings. All aboard. Alexander Stathis, welcome to On Rails.
[00:00:51.380] - Alexander Stathis Hey, Robby. Happy to be here.
[00:00:53.600] - Robby Russell I want to ask you a quick question. What keeps you on Rails?
[00:00:57.940] - Alexander Stathis Yeah, it’s a good question. I think I actually had another engineer, we were working on a new product, message me this weekend. He said something along the lines of like, Oh, there’s nothing quite like rails new. I think Rails is just really high quality. A lot of the gems have been around for a really long time and are well-tested. Everything just works. You can’t beat Active Record when it comes to interacting with the database. That would be why I’m on Rails.
[00:01:22.600] - Robby Russell Active Record is very much a special thing. I mean, admittedly, it’s one of my favorite things about Ruby on Rails as well, something I wish I had had prior to my experience with Rails. What other types of frameworks or stacks do you have experience with prior to Rails? Presumably you did work with other programming languages.
[00:01:42.400] - Alexander Stathis Yeah, in the distant past, maybe 10 years or something ago, I’d used Entity Framework, .NET Core, C#. Honestly, I like it. LINQ has a cool, expressive way of building queries. More recently, we’ve used Prisma quite a bit. We have some Node back ends. I don’t like Prisma as much. Try not to say anything bad on the internet. But yeah, working in Active Record just has an ergonomics to it that is pretty hard to beat for sure.
[00:02:10.040] - Robby Russell Definitely. So one of the reasons I wanted to have you on the podcast was I want to talk about how AngelList is approaching using Rails. I believe you have maybe one or two Rails monoliths there. For our listeners, could you paint a picture of the architecture you’re working with today at AngelList, and approximately how large is the engineering team at this point in time?
[00:02:28.360] - Alexander Stathis The engineering team, I think, is maybe 40-ish engineers. Don’t quote me on that, give or take 10, but not super large. Our core product is entirely Rails-backed. More recently, we’ve acquired a company that had a product built in Node, and so we’ve got some code there as well. But the core product is all Rails. So we have a Rails monolith with a few Rails microservices floating around. We power all of our authentication that way. We use Devise, we use GraphQL Ruby, and then we’ve got some Next.js front ends that interact with that.
[00:03:01.460] - Robby Russell Are you doing that where the... So are the Next.js apps separate, like separate repositories on a basic level, or are those all...
[00:03:09.910] - Alexander Stathis Same repo, obviously different application, right? But the Next.js apps sit in a /frontend directory or whatever, and the Ruby app, the Rails app, is in /backend. And we use a couple of different Rails engines that we serve up through a single application.
[00:03:21.640] - Robby Russell Interesting. Was that all intentionally put together from the beginning? Is that a monorepo, or are those separate deployment processes?
[00:03:30.000] - Alexander Stathis Yeah, definitely not. So we had one main monolith, and then we had some microservices. In the last year or two, we’d actually taken one of those microservices and refactored it into a Rails engine and then moved it into the monolith application. I would say the reasoning for that was less technical. It was because over time, the two had become commingled in a way that we wanted to undo. And there was a lot of complexity to deal with networking and transient failures, the usual network stuff. And it was just easier to be able to call the Ruby code directly. We still have them as separate modules. We use Packwerk, which is a static analysis tool that I think Shopify built. And we use Packwerk to manage the dependencies between the various modules inside of our Rails monolith. We also have a few other small products that we’ve built and shipped that are also built as separate Rails engines inside of the same monolith.
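For listeners who haven’t seen Packwerk, a minimal sketch of what a package definition might look like; the package names here are hypothetical, not AngelList’s actual modules:

```yaml
# packs/funds/package.yml: a hypothetical Packwerk package definition
enforce_dependencies: true

# This package may only reference code from the packages listed below;
# running `bin/packwerk check` in CI flags any other cross-package reference.
dependencies:
  - packs/accounting
  - packs/banking
```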
[00:04:23.340] - Robby Russell You mentioned that you acquired a Node app. And so what was that process? Did the team come along with that then to some extent?
[00:04:30.960] - Alexander Stathis Yeah, a few. But most of the people working on the Node app now we’ve hired more recently, and we have some engineers that were pre-existing at AngelList that are working on it as well. When you acquire a company, obviously, the two things are totally distinct. And over time, part of what we’re working on now is killing the duplicative parts. So we’re moving some things one way and some things the other way and trying to really clearly define the boundaries between the products, whereas before there was some overlap, which is what made it such a nice fit for us.
[00:05:00.000] - Robby Russell Were you around when the decision to start using Packwerk came about?
[00:05:03.920] - Alexander Stathis Yeah, I drove a lot of that. The reason it was a microservice to begin with was primarily because it really was supposed to be a separate set of concerns. I’m trying not to get overly specific about the details. I’m trying not to put anybody to sleep, but the separate microservice was really intended to house a lot of our legal and economic structures. AngelList is a platform that helps founders and general partners raise and deploy venture capital. And so we track a lot of legal and economic structures. Our ERD is very complex, and our separate microservice was largely intended to house a lot of that. That’s distinctly different from our marketing site or a lot of the stuff that happens before we formally define a fund or collect capital into it. We were very intentional about separating those two things, and we’re still very intentional about it. And over time, as you build your startup, you ship stuff. And that stuff doesn’t necessarily always conform to the very precise, preconceived notions of what should live where. You make concessions to meet timelines, you do stuff that’s hacky, you make it work. So part of why we consolidated the two things was not necessarily because we wanted to merge them together and turn them into a bigger ball of mud; it was actually because they’d already, over time, turned into a ball of mud.
[00:06:20.730] - Alexander Stathis And it’s much easier to unwind that when you’re working in the same repository versus managing it across an API when it’s become deeply integrated already. That was a large motivating factor, reducing complexity, increasing simplicity. Back to the original question, we kept Packwerk and we implemented Packwerk specifically to hold those conceptual boundaries very neatly.
[00:06:42.540] - Robby Russell When you’re thinking about setting those boundaries, how much of that is strictly just the architecture or the data related to those different domains versus how your team might separate boundaries of teams themselves? Who’s working on what?
[00:06:59.500] - Alexander Stathis Yeah, a really good question. I can’t remember what it’s called, but it’s the law that you ship your org structure or whatever it is. I would say we really don’t have a culture at AngelList of, I’m a this engineer. I mean, it certainly happens that way, and you can’t have totally fungible engineers all the time. It’s really expensive to do that, especially when you’re working across many stacks. I wouldn’t say that the point of it was separating out the teams or being able to have one team operate in a silo from another. It really was mostly designed around separating the concepts and the business domain and keeping things organized in that way. This is probably a naive statement, but we work in a very complex domain. I’m sure others out there are like, Ha ha, right? Mine’s more complex or whatever. But we deal with a lot of legal requirements, a lot of economic requirements. The relationships are very complex. Our data graphs are very highly connected, highly interconnected. I’m a big believer in getting the ERD right. And if you get the ERD right, then the code falls out of it. But if you get it wrong, then you have to fight really hard to make the logic work.
[00:08:09.720] - Alexander Stathis I think the major driver here is really around establishing and maintaining the conceptual, the business domain boundaries, and holding firm on that.
[00:08:17.400] - Robby Russell For the few listeners who might be wondering what ERD means.
[00:08:20.460] - Alexander Stathis Oh, yeah. Entity Relationship Diagram. Just the graph of our models. In Rails speak, you have your models, your Active Record classes, and then you have your associations. The associations are the edges and the models would be the nodes.
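In Rails terms, a tiny hypothetical slice of such a graph (the model names are made up for illustration):

```ruby
# Two hypothetical nodes in the ERD; the has_many/belongs_to pair is the edge.
class Fund < ApplicationRecord
  has_many :investments
end

class Investment < ApplicationRecord
  belongs_to :fund
end
```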
[00:08:34.740] - Robby Russell Thanks for helping clarify that. When it comes to using things like Packwerk, I think it’s just been a trend. Since I’m speaking with a lot of people, and this is something I want to clarify for the audience, I’m talking with a lot of companies that are maybe a little larger in scale. So I want to make sure everybody’s thinking about when there’s a benefit to using Packwerk versus whether everybody should be using it. Everybody that’s been on On Rails so far has mentioned Packwerk, but that might not apply to everyone. Do you think there’s a point where an organization or an application gets to a certain scale or a certain level of complexity where something like that can be really helpful? Do you feel like it would have hindered the project earlier on in the process? Is it something you should tack on later if and when you maybe need it? Where’s that distinction in your mind?
[00:09:15.900] - Alexander Stathis Yeah, it’s a good question. I wouldn’t be too prescriptive here. I guess my general philosophy here is, if you can, you should enforce a rule with a static analysis tool, something you can run in CI. This is why we use formatters. This is why we use linters. This is why at AngelList, we use Sorbet, right? Because it allows us to not have to be as disciplined when we approach these types of problems. I think Packwerk fits into that category very nicely. If you have different product verticals or if you have different, I don’t know, domains like we do, it’s really helpful to just have something that consistently checks and enforces those boundaries. Packwerk has gone through its own evolution. They tried to do this privacy thing, and then they pulled it out. At the end of the day, what it does is it just defines a dependency graph between modules in a Ruby application. For us, we use that to make sure that we aren’t calling across boundaries. We maintain almost a linear dependency tree, and that’s what we use it primarily for. And it helps, it catches things. It’s like type checking also.
[00:10:23.820] - Alexander Stathis Most people are pretty good about not shipping bad stuff, but every once in a while, you make a mistake, and it’s nice to have that CI running to keep you honest. It’s easier to catch it with a tool than it is in review. Let’s put it that way, right?
[00:10:35.760] - Robby Russell You just mentioned Sorbet. How does using something like Sorbet influence your Ruby style as an organization?
[00:10:41.820] - Alexander Stathis Yeah, good question. I think it was Jake Zimmerman, one of the core Sorbet guys, who said something along the lines of, and I’m totally poorly paraphrasing, but the point of Sorbet is not so that you can keep writing Ruby the way you want to write it. Sorbet, the type system, is supposed to influence the way you write the code, and I find that to be very true. For instance, something that I’ve had great difficulty trying to add typing to is ActiveSupport::Concern. The included hook, just like anything that uses class_eval or whatever, is not easy for Sorbet to understand, at least statically. Metaprogramming is such a big part of the Ruby culture, and that type of stuff can be so prominent in Ruby code, but it just doesn’t fly in Sorbet. Or it can, but you have to get pretty mean with it. And so, yeah, we tend to write very different Ruby code, I would say, than your typical Ruby. We don’t use a lot of multiple inheritance. We don’t even really use a lot of instances. I mean, of course, our Active Record models are instances of classes, but we use a lot of class methods on modules, right?
[00:11:44.570] - Alexander Stathis To define service logic to try to make it more functional. One, it works better with the typing system, certainly, and then it also makes it a little more functional for us, which is a little easier to understand.
[00:11:55.140] - Robby Russell For our listeners, we had a previous call to talk through maybe some of the conversation topics for this interview. One of the things you mentioned in that conversation is that your team intentionally avoids a lot of, I’m air quoting, Rails magic. What does that look like in practice?
[00:12:10.240] - Alexander Stathis Yeah, in practice... There’s a pretty funny video of some engineer from five years ago or something, describing his experience debugging a series of Active Record callbacks, where you make some change somewhere, and then it’s like, how did this occur? And you’re jumping through many chains of callbacks. So yeah, we don’t really use callbacks. We’re not ultra dogmatic on it. There are times when callbacks are appropriate, but we really try to avoid anything that isn’t explicit and obvious. So callbacks, I would argue, are a little more implicit behavior. Things happen and then you have these side effects. It’s not like you have a clearly defined method. In order to understand callbacks really well, you have to understand how all the hooks work, what order they get called in. You have after_commit, after_save, after_validation. Another component of this: a lot of our engineers coming in don’t know Rails or Ruby very well. And so I think we’ve partly developed a style that caters to a more imperative or procedural style of coding. So we try to avoid metaprogramming. Monkey patching is amazing, but also a little bit evil. It’s definitely a double-edged sword. We avoid callbacks if possible.
[00:13:24.560] - Alexander Stathis We don’t use a bunch of multiple inheritance. When you look at Rails, for instance, you see these deep basically nested includes, and you have to figure out where this method is coming from. Chasing references around like that. It’s a little easier with Sorbet now, but definitely at the time we adopted the practice, it was pretty intense to try to figure out what was going on unless you knew already how the internals worked. I think that’s what people really struggle with with magic. It’s like you have to have this deep knowledge of how the internals work, and you can’t see it, and that’s why it feels magical. We write in a way that tries to avoid that.
[00:13:58.120] - Robby Russell What does that look like in, let’s say, a typical CRUD-type process? If you’re saving some data, whether that’s coming from one of your Next.js apps, or you have a Rails interface or some view in your Rails application, and you’re saving some data and it needs to trigger some additional things that happen, whether it’s syncing to another system or maybe notifying people. How does your team think about that process then if you’re not going to lean on the callbacks? Are you doing that primarily within code in the controllers? Do you use something like a service object type approach? When you say procedural, talk us through that.
[00:14:30.900] - Alexander Stathis Yeah, good question. We use GraphQL Ruby, so everything comes in as a query or a mutation. Then we use thin models, or, I don’t know, you might even call these anemic models, right? If you were Martin Fowler, I’m not sure. And then we use pretty thin controllers, pretty thin presentation logic, and try to keep our business logic very clearly in the business service layer. So the majority of our business logic lives inside of /services, and we have a lot of modules with class methods defined on them that implement the various business logic methods. An example of that might be, it’s a very common pattern to have a create service. So you might call an object create service, Widgets::CreateService.create, right, and pass in the things that you need, and that would create the model. I wouldn’t say we’re dogmatic, but we’re pretty consistent about using create!, the bang version. We use a lot of bang methods, and not so much instantiate-and-save. I mean, we do this a little bit. And then anything that needed to happen after that, we would structure as sequential calls inside of that logic. In our most complex cases, we might have something where we have an after-commit helper method, where it’s like a callback, but it’s just an actual method in the service.
[00:15:40.680] - Alexander Stathis Maybe it’s private in the class or in the module with the create method, and then we would call that after the transaction wrapping the things that we’re creating. We would generally structure this as just sequential calls. And then if we need to make something asynchronous, or if we want something to fire and forget, or a callback would typically be something where we would fire and forget it, we shoot off a Sidekiq job or shoot off a GoodJob job or whatever, and let it run and not have it impact the end user experience at all. We’ll recover it separately if it fails or whatever, and it’s not mission critical. So that’s about how we do that.
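A hedged sketch of that shape, using hypothetical names (Widgets::CreateService, LedgerEntry, SyncWidgetJob), not anything from AngelList’s codebase:

```ruby
# A hypothetical create service in this style: bang methods, a narrow
# transaction, sequential calls, and a fire-and-forget job enqueued via a
# plain "after commit" helper rather than an Active Record callback.
module Widgets
  module CreateService
    def self.create!(name:, fund:)
      widget = nil

      ActiveRecord::Base.transaction do
        widget = Widget.create!(name: name, fund: fund)
        LedgerEntry.create!(widget: widget, amount_cents: 0)
      end

      after_commit(widget) # explicit call, runs after the transaction block
      widget
    end

    def self.after_commit(widget)
      # Fire and forget: recoverable later if it fails, not mission critical.
      SyncWidgetJob.perform_later(widget.id)
    end
    private_class_method :after_commit
  end
end
```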
[00:16:13.220] - Robby Russell So in, say, a create action in your controller, you’re not directly interfacing with the model then; you’re interfacing with that create service type thing, and there’s a little layer in between?
[00:16:24.340] - Alexander Stathis Yeah, exactly. We don’t end up with a lot of thin wrappers, though, because like I said, our business domain is very complicated. It’s very complex, and the graphs are highly interconnected. And so typically, it’s hard to create something in a void. You’re creating some object, and it usually has follow-on objects that you need to create with it for that state to be consistent. We try to map our models very closely to the business domain. That just makes it a lot easier. It closely ties the business concepts to the code, which is a feature for us because the business concepts are very complicated and hard to understand. And so being able to read them out of the code is helpful for engineers to figure it out. The point is that we do have a lot of logic that is included in those create services, and it’s not just a lot of thin wrappers that call Active Record create and then return. It’s usually like, create a couple of things, maybe wrap it in a transaction, maybe kick off a sync job or something and update an index or whatever it is.
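Since everything comes in through GraphQL Ruby, the thin presentation layer might look roughly like this; a hypothetical mutation assuming graphql-ruby’s class-based API and the service sketched above:

```ruby
# A hypothetical graphql-ruby mutation: the resolver stays thin and
# delegates straight to the business service layer.
module Mutations
  class CreateWidget < Mutations::BaseMutation
    argument :name, String, required: true
    argument :fund_id, ID, required: true

    field :widget, Types::WidgetType, null: false

    def resolve(name:, fund_id:)
      fund = Fund.find(fund_id)
      { widget: Widgets::CreateService.create!(name: name, fund: fund) }
    end
  end
end
```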
[00:17:17.680] - Robby Russell When those things fail, how do you handle it when things aren’t saving properly and you need to roll things back? Does that come back to the controller level? Are you rescuing things or raising exceptions and things of that nature? Or can you rely a little bit on Active Record in those situations?
[00:17:30.220] - Alexander Stathis I would say generally when we encounter errors, like when our create fails, it’s usually due to some validation, right? And we’d probably just surface that back up to the user, either in an explicit error message coming from the validation, right? Or through some translated domain error. It depends on how friendly the error message is. Some error messages are not super friendly, and you don’t really want to show them to your end user. You just want to show them, I don’t know, what I would say is the 500, the red screen of death thing. But we want our errors to be relatively loud, because for us, it’s actually pretty bad when we get these inconsistent states, because our data graph is so complex and there are so many objects that all have to be kept together. When you get into a bad state, it’s actually more expensive to recover than it is to just roll back and be like, Hey, that didn’t work. I’d rather debug why it didn’t work and then just redo it, make it idempotent, right, than try to recover midway, which can be very expensive when you have a lot of pretty complex business logic that’s running and needs to run in the right order.
[00:18:31.030] - Alexander Stathis There’s a lot of assumptions throughout the system that these things exist or that they exist in a certain way. When you allow them to get created halfway and then either swallow that error or don’t transactionalize, don’t make it atomic, then other parts of your system start behaving badly as well. And that’s definitely not good. I’d rather one user see an error than take down the admin view or whatever.
[00:18:51.910] - Robby Russell Sure. So you’re typically wrapping those in a transaction, and that way, if something fails, it’ll just roll back everything, and then you have to figure it out and solve that problem and then go through that process again.
[00:19:02.960] - Alexander Stathis Yeah, definitely when we have these states that need to stay consistent, if we’re creating multiple models and they need to be created all at once together or updated all at once or whatever, definitely wrapping those in transactions. We actually have multiple different databases, so we run into some issues where we need those to actually be atomic, and then that gets very complex, because you start having to do some gnarly things like stacking transactions. You might start a transaction on one database and then immediately open another transaction on the other database and nest them that way. You also get a lot of complexity from nested transactions, which are automatically flattened. So in Active Record, if you open a transaction and then maybe you call some other service method and open another transaction inside of that service method, those get flattened into a single database transaction. And so if it fails deep, right? Then it unrolls the whole thing and you can get into weird, goofy states that way. So I guess what I would say is I personally try to use a lot of very narrow transactions, ones that are scoped very tightly to just the actions that we’re trying to perform and keep in sync.
[00:20:03.000] - Alexander Stathis That, of course, is a totally conceptual thing that does not work in practice, and not everyone’s so disciplined about how transactions are used, and we’re trying to ship stuff. We’re trying to deliver value, right? We’re not trying to be ultra nitpicky about our use of transactions in our application. We have a lot of atomic states that we keep glued together with transactions, and then when those fail, we roll them back. We also just have a lot of logic that runs sequentially, and it’s okay for it to fail in the middle. It is recoverable. It’s just idempotent by nature. We don’t need pessimistic locking or whatever it is to try to keep that in line.
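For listeners who haven’t hit the flattening behavior he’s describing, a small illustration with hypothetical models (Widget, LedgerEntry) and an assumed failing validation; Active Record only opens a real savepoint when you ask for one with requires_new: true:

```ruby
# Default behavior: nested Active Record transactions are flattened into one
# database transaction, so a failure deep inside unrolls everything.
ActiveRecord::Base.transaction do
  Widget.create!(name: "outer")

  ActiveRecord::Base.transaction do          # flattened: no savepoint is opened
    LedgerEntry.create!(amount_cents: -1)    # suppose a validation fails here
  end
end
# => ActiveRecord::RecordInvalid propagates and BOTH creates are rolled back

# With requires_new: true, the inner block gets a savepoint, so a rescued
# failure only rolls back the inner work and the outer commit still happens.
ActiveRecord::Base.transaction do
  Widget.create!(name: "outer")

  begin
    ActiveRecord::Base.transaction(requires_new: true) do
      LedgerEntry.create!(amount_cents: -1)  # raises, rolls back to savepoint
    end
  rescue ActiveRecord::RecordInvalid
    # the outer transaction can still commit the Widget
  end
end
```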
[00:20:37.140] - Robby Russell Out of curiosity, what scenarios do you think having multiple databases came into play for? Is it because there’s sensitive data in some area where you need to keep things more locked down? Or is it more like some companies I talk to, where they have some HIPAA compliance requirements, so they isolated that into a different database that they still interact with, but they keep the private information there. What does that look like in your world?
[00:20:56.280] - Alexander Stathis So, yeah, we have a few different services. And like I said before, we actually consolidated one of them into the same repo, into the monolith. So not just the same repo, but the same application actually, as an engine. And those, as separate services, naturally had different databases. We also have a bit of tech debt where a lot of our older databases are MySQL. I’m not personally a huge fan of MySQL. I have had plenty of fun times debugging 5.7 query plans and trying to deal with that crap. So we also have a Postgres database that’s floating around, and we’re trying to do this long, staged migration. And so we have some models living in one and some in the other. So I would say one of our services has multiple databases because we want to eventually... Well, really, we adopted it because we wanted to use GoodJob. Although in hindsight, if I had known about Solid Queue, I would have used Solid Queue instead. But I guess even... I apologize, I forget her name, but she even said, from episode one or earlier, I think they use separate databases for it at... I can’t remember any of the specifics.
[00:21:58.640] - Alexander Stathis I’m sorry, I’m totally flubbing this, but...
[00:22:00.420] - Robby Russell That was Rosa, Rosa Gutierrez.
[00:22:02.340] - Alexander Stathis Yeah, Rosa. Rosa, author of Solid Queue. I think she said that they use separate databases for it, although I forget; it sounded like they had gone back and forth a little bit. And so we originally adopted Postgres in one of our services so that we could use GoodJob. And it was a means to an end. Sidekiq was not doing what we wanted it to do. We wanted some observability, right? Why are these things failing? Where are they going? We wanted to be able to query and have a little more interactivity with the queue itself. And then we also had these separate services going on that had separate databases. And so now we have just a few databases floating around, and we have to try and keep all of those in sync. And honestly, it’s a real pain. I hate it. But I think I would still keep separate databases for the separate services, like what you mentioned. There’s a conceptual domain thing. You almost want a separate split there. If I could, I would consolidate to a single instance and just use different databases in the same instance rather than actual physically different databases, like in AWS or whatever.
[00:23:03.140] - Alexander Stathis But we have a lot of other problems to solve before I think we unwind that one.
[00:23:08.300] - Robby Russell I know that you mentioned using Sidekiq and GoodJob. And I think in our previous conversation, you mentioned Temporal. Is that right? That you’re using that there? Yeah. So it sounds like you’ve experimented with multiple background job tools. So what have you learned along the way?
[00:23:22.400] - Alexander Stathis I’ve learned that there is no one-size-fits-all solution for this type of work. Sidekiq has a lot of packages and gems that add on to it and provide different functionalities, and you can install them at will and whatever. We’ve had some trouble with Sidekiq Unique Jobs, for instance, not to try and pick on anybody, but we’ve just had trouble with the digests getting stuck, and then you have to go and manually clear the digest, and you’re like, Why didn’t this thing update? It’s like, Oh, because the job didn’t run, and then whatever. It’s hard to recover from that type of stuff because it’s hard to even know that it’s happening. We also had the observability problem: why is this even happening? I think adopting GoodJob was primarily because there wasn’t a lot of transparency. I always say it wrong, and everybody gives me a hard time: Redis. Redis is an incredible database, an incredible tool, but you can’t look at the jobs after. They’re just gone. The queue is gone once it’s processed. Having the ability to introspect on what happened before is a really useful tool when you’re debugging jobs that have more complex stuff going on.
[00:24:31.540] - Alexander Stathis You want to know when they failed or why they’re failing, for all of these reasons. And it was just hard for us to do that in Sidekiq. It was a lot easier if we just had an actual database table backing that. So we pulled GoodJob in. The problem with GoodJob is that it’s not super performant when you’re interacting with a bunch of really small jobs, like fire-and-forget type stuff. There’s an active issue with the GoodJob dashboard. You install the GoodJob gem and you can add this dashboard to your routes.rb, right? And it’s like an admin panel, and it’s incredibly slow for us. It never loads. You have to load it a bunch of times. You have to warm the DB cache up so that it can actually serve the query. And it’s just a pain. And then we run into issues where it will acquire a lock on the table, and then suddenly the whole table is locked up. And now we’ve got to figure out why the lock is still being held and what’s running and how it’s going. I would say that where we’re leaning now is we’re going to try and move, I think, and this is totally speculative because we haven’t actually done any of this yet, and everything is the best laid plans.
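For context, wiring GoodJob up looks roughly like this; a sketch, not their exact configuration:

```ruby
# config/application.rb: point Active Job at the Postgres-backed GoodJob adapter
config.active_job.queue_adapter = :good_job

# config/routes.rb: the dashboard engine he mentions adding to routes.rb
Rails.application.routes.draw do
  mount GoodJob::Engine => "/good_job"
end

# Because jobs live in a database table, you can introspect them afterwards,
# for example from a console:
GoodJob::Job.where(finished_at: nil).count # jobs not yet finished
```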
[00:25:34.370] - Alexander Stathis I would like to see us try to move any of our more complex work that we’re doing asynchronously in a worker, in a job or whatever, into Temporal. And so Temporal, I think it came out of Coinbase. It’s just a really powerful workflow orchestration tool. We use Temporal’s managed solution. I think it’s Temporal.io. The downside there is it’s expensive, and also the ergonomics are not great. It’s not Active Job. I mean, this is the thing, why am I on Rails? Active Job, right? Active Job is awesome. I think it’d be cool to build an Active Job connector for Temporal. I’m going a little tangential here, but I think it’d be really cool to see an Active Job connector come out of that that would work for Temporal. It’s something I’ve noodled on for a while, but just haven’t sat down to actually try to write. But yeah, I’d like to see us move a lot of our more complex workflows, things where we need guaranteed delivery, things where we need observability, or stuff that we’re not comfortable just firing and forgetting, that we need to run for maybe a little longer, right?
[00:26:34.890] - Alexander Stathis Because with Sidekiq, you don’t want your jobs to run forever anyways. I’d like to see us move that into Temporal. I think we have a couple of cases where we might either keep GoodJob or adopt something else. I think we also have Delayed Job, although I’ve been less involved in that in some other services. We’ve really gone through them all. And I think it would be cool to maybe consolidate those with Solid Queue, maybe when Rails 8 comes out, and then use that for our quick fire-and-forget stuff, or use something like Solid Queue or GoodJob, or even Sidekiq, for our stuff that is going to be quick, like queue up this email or whatever. But yeah, coming back to the original question, I think as much as it offends me conceptually, there’s no perfect asynchronous solution, at least that I’ve found. I don’t know. Maybe the listeners will write in and complain that this other thing is perfect. But for us, I think we’re going to end up probably with multiple solutions. And I’ve come to terms with that more recently; I think it’s okay. There’s probably some ergonomics we need to figure out, and then you have to define for people when to use what.
[00:27:36.370] - Alexander Stathis But ultimately, I think it’ll be the better solution.
[00:27:38.860] - Robby Russell Yeah, that was going to be my next question. How does your team then go about deciding what to use where? I hadn’t heard of Temporal before our conversation recently. That’s a platform people can use. I think there are open-source versions that you can use as well. They’ve got a bunch of examples. You can use it with lots of different programming languages. Are you actually triggering anything with your Next.js apps? Or does almost all of your background processing, your async jobs, get triggered by something within your Ruby or Rails processes?
[00:28:06.200] - Alexander Stathis Yeah, this is an active conversation in our Node apps as well. It’s like, what do we do there? We do have a Temporal setup, but again, we’re running into the same issue where you have this more complex processing job versus a short fire-and-forget type job. So Temporal works by defining workflows and activities. Workflows are basically comprised of some logic, although it needs to be totally idempotent, and then activities that they can call to execute actual work. So I think it’s possible, and I don’t know, I haven’t done this, this is totally conjectural, to set up a workflow from one service that cobbles together activities from other services, I think. But I’m not sure. I think that’s true. We’re not doing that. We’re explicit, we’re straightforward, we’re like, no magic. Right now, we’re just using workflows and activities defined locally in the same service and running those. But we do have a lot of complex workflows that we need to run asynchronously in these jobs, like processing jobs, basically. And it’s important that they don’t fail in the middle, or that we know when they do so we can recover or whatever it is.
[00:29:11.360] - Alexander Stathis And Temporal is really useful for that.
[00:29:13.540] - Robby Russell Was that something that the Node app that your organization acquired, were they already using that? Was that how it got introduced to you or is it something you implemented after that process?
[00:29:22.660] - Alexander Stathis I don’t think they were using it prior. No, we implemented it after. I think the idea was we need an async solution, like a processing worker solution, and we’re already using Temporal, so let’s use it. There have been some issues with trying to adopt it. We’re getting out of my expertise here, but we have some patches to make it work with some NPM packages, because, I don’t know, this is just normal technical stuff. You have to do this stuff to make things work.
[00:29:51.100] - Robby Russell Sure. That is always the case. There are always interesting little edge cases or things you have to figure out how to make work between whatever you’re doing in your apps and some third-party service. Does using a third party, I’m air-quoting, outsourcing work like that to something like Temporal... How does that then translate back? If they run some processing, is it then orchestrating and triggering some API endpoints in your system? Like, Okay, once this is finished, tell us that this is done. What does that end up looking like? Or are you just checking Temporal? Without having used it or looked too much into the thing, if you trigger a thing in Temporal, are you then just checking the status, whether it’s finished or not, and then continuing? Or is it sending some callback to let you know that it’s finished doing whatever you’re trying to do? Or am I completely misunderstanding how it works?
[00:30:35.400] - Alexander Stathis It’s just like Sidekiq or GoodJob or probably Solid Queue, although I haven’t used Solid Queue yet, but you still have to run worker instances. So these are deployed on our own infrastructure in AWS; they’re pods in our Kubernetes cluster. And these actually take the workflows, take the activities, and process them. This is actually our code running. The database that actually stores the queue, right, is managed. Managed through Temporal.io. You can self-host, you can do all of that. It’s just a lot of complexity from a DevOps perspective, I guess, to do that. And so for us, it was easier to just have a managed solution that you just connect to and push your jobs there. They have a really nice UI. I think it’s super cool. There’s a burn down or a time chart. It shows you where each activity is running and what stage it’s in and why it failed and this and that. I think you can also query on it. You could certainly have it do callbacks. We don’t do a ton of that. We just need to make sure that the jobs actually execute and that they execute in the right order or whatever it is.
[00:31:37.040] - Robby Russell Do you think that Rails needs more, say, stronger opinion around asynchronous processing?
[00:31:43.360] - Alexander Stathis I don’t think so. I mean, this is one of the beauties of Active Job, right? It’s like the thing just plugs into whatever. I think it is an interesting part of the ecosystem. So in NPM land, you get a thousand packages that do the same thing, and they’re all tiny forks of each other. You’ve got Joe’s whatever, and you’ve got Frank’s whatever, and they do the same thing slightly differently. In Ruby land, I think that’s not quite so true. You generally don’t, right? You have these long-standing packages, these gems that have been around forever, and they just work and they do the thing that they’re supposed to do. The Ruby community is very good at consolidating on these efforts and dumping a lot of energy into them and making them very good. If I can just talk in a totally unfounded way, I think the reason there are so many different solutions here is partly because Active Job is pretty flexible, but I think it’s partly what I said before, which is that, depending on what you’re doing, asynchronous work comes in a lot of forms. It’s asynchronous work, and so it’s similar in that way.
[00:32:42.320] - Alexander Stathis But depending on whether you’re doing long-running processing jobs, you might want a different solution than if you’re doing quick little fire-and-forgets that you don’t care about. If you’re sending a two-factor code email, you want that to land. You don’t want that one to go missing. If it’s, I don’t know, something else, maybe you don’t care. If you’re just updating an Elasticsearch index that you know the cron will run and fix later today or whatever, maybe you don’t care as much. Versus if you’re pulling in a file off of an SFTP server from a bank to process, because this is still how a lot of the financial industry works today, then you’re going to do some heavy compute on that and pull a bunch of records into the database, and you need to know if it fails, you need to know where and why, and you need to be able to recover and all of this. Then you want a different solution than maybe sending a transactional email. I guess if I were to just guess why this is the one area in the Ruby ecosystem where you see a lot of diversity, I think it’s because asynchronous jobs probably have less in common than they appear to.
[00:33:48.360] - Alexander Stathis They tend to be more different than they are the same. It’s just that they’re all asynchronous work; they all, in some sense, kick off the job and run it later. I think opinionatedness here would probably hinder people and would probably not be so useful. I think the work defines the tech and the tooling, right? And I think that’s a pretty basic engineering principle. You want to build the right things with the right tools, right? And I think that’s how async workers work. And I think that’s what we’ve definitely found at AngelList. We keep adopting new ones, hoping each will solve all the problems. But at the end of the day, we’re just going to need two or three or whatever it is, right? Different solutions for different problems.
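Active Job does make it easy to mix backends per job class, which is roughly how a “different tools for different work” setup can look; the job names and queues here are hypothetical:

```ruby
# Quick fire-and-forget work on one backend...
class UpdateSearchIndexJob < ApplicationJob
  self.queue_adapter = :sidekiq
  queue_as :low

  def perform(record_id)
    # refresh the search index; a later cron sweep can repair any misses
  end
end

# ...while heavier, must-not-be-lost work uses a database-backed queue.
class ImportBankFileJob < ApplicationJob
  self.queue_adapter = :good_job
  queue_as :processing

  def perform(file_id)
    # long-running import that needs durable, inspectable state
  end
end
```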
[00:34:26.340] - Robby Russell I think that’s a good point. I always feel like most of the teams and developers I talk to are always trying to ask, which one do you recommend, so I can just start using that one, or what have you? And Rails is now obviously shipping with Solid Queue and things like that as a good, reliable default for this type of work, but knowing that with Active Job you can use a lot of different platforms to accomplish that work. So I’m always a little hesitant to say, This is the one you should be using, this is the best one, or this is the most popular one. We run a survey every couple of years, so we can see how the landscape changes there. But hearing someone actually say, No, we probably need to run a couple of different ones for very different reasons, is something I haven’t heard that much from the people I’ve talked to, because they’re always trying to find that one solution. Is that something you feel like you’ve just come to understand and concede, or is it just, no, this makes the most sense for us, and maybe for other companies that are...
[00:35:21.840] - Robby Russell Do you think that they should be thinking about embracing multiple different paths rather than just be like, We need to migrate all over to X because we’re done with Y?
[00:35:29.640] - Alexander Stathis Yeah, I mean, we’ve certainly tried to just migrate everything to X because we’re done with Y. The thing about migrations is you end up migrating forever. So doing that is not super successful. I mean, it can be, but it takes a lot of concerted effort. Migrations are hard. Everybody knows this. I think what we found is that we have really diverse asynchronous work. Back to the point here, I think we have a lot of different stuff going on in workers, and a lot of it tends to be expensive compute, stuff that’s not super cheap. And so it’s not well suited to something like Sidekiq, which is backed by Redis, Redis, however you say it. So I think, yeah, as someone who really values clean, organized, really, really precise stuff, it’s a little offensive to me that I need multiple of these things, right? And I think that’s probably what you’re detecting also. I think engineers are very similar in that way. We all want the one solution that works. We don’t want to have to cobble together stuff. But I think for us, we just have different work, and it requires different tools.
[00:36:31.980] - Alexander Stathis And asynchronous processing, asynchronous workers, asynchronous jobs, whatever, is a tool, but it comes in a lot of different shapes, right? So you wouldn’t use a sledgehammer in the same way you’d use a framing hammer, right? And I’m sure if you knew a lot more about hammers than I do, there’d be even different framing hammers you might use. I think using Sidekiq to run jobs that may run for hours is like using a framing hammer for the thing you need a sledgehammer for. Whereas Temporal, a great solution that can literally do it all, is a little bit abstract. The ergonomics are not super great. And is it great for us as a managed solution? Cost is a factor we have to think about; you pay per workflow run or whatever it is. And so you’ve got to think about that as well. It’s maybe not the best solution for little fire-and-forget, update-this-cache jobs that you’re not actually that worried about landing. We just have very distinct, different needs in that realm. And we’ve tried the let’s-adopt-this-one and let’s-adopt-that-one, and let’s push all the new jobs there, let’s migrate, let’s do it.
[00:37:37.880] - Alexander Stathis And then we keep running into the limits of that system, of whatever system we’re migrating into. And it’s really tough to do a migration like that and then be like, Oh, man, did we mess up? Should we have left this work in Sidekiq? Because it’s just better suited for it and would make more sense. That’s a painful thing. That’s a pretty awful thing to feel. But yeah, I’m coming around to the fact that I think we just need multiple solutions. We just have multiple different types of asynchronous work. Then you asked earlier, how do you decide? I don’t know.
[00:38:10.580] - Robby Russell How do you onboard people into this world? They’re like, Well, we’ve got three different things. And having even mentioned Delayed Job there, which has been around forever and maybe not so in vogue for a long time, I’m curious, have any of those decisions been based off of... Is anything hindering you from keeping things relatively up to date with, say, Ruby and Rails versions themselves?
[00:38:30.780] - Alexander Stathis Yeah, no, not at all.
[00:38:32.140] - Robby Russell That’s good.
[00:38:32.840] - Alexander Stathis I think we’re on Ruby 3.4. We’re on Rails 7.2. I think we are on 7.2.2 or something. We’re pretty leading edge, I would say. In a related realm, we’ve also adopted Falcon in one of our microservices. But no, I don’t think so. We’re pretty good about really grinding this stuff out. Honestly, this is fun. I like this stuff, personally. I did a lot of this stuff as side projects in parallel with product work, where I would just be like, Okay, I’m going to upgrade this thing from Rails 6 to Rails 7. And it’s a little sketchy. It’s a little painful. Stuff breaks on you that you don’t anticipate, but it’s fun work, and I like it anyways. And it gets you really deep into Rails itself or Ruby itself, because you’re forced to contend with these idiosyncratic things that you never knew about. And I like that. So no, it hasn’t really held us back. We’re pretty cutting edge with all of that stuff.
[00:39:30.740] - Robby Russell This episode of On Rails is brought to you by Application Controller, where all roads lead and no logic escapes. Do you need to share a method across all controllers? Add it here. Want to run a callback before every action? You know where to put it. Do you need a place for auth logic, flash helpers, and a little panic code? It’s waiting for you. Application Controller is the one file that’s holding your app together. And also maybe holding it a little hostage. Side effects may include confusion, long scroll sessions, and blaming something in here for half of your bugs. Application Controller: because if you don’t know where it goes, it probably goes here.
[00:40:08.580] - Robby Russell I’m curious, if you’re willing to answer this question, do other developers and engineers on your team have much experience also participating in those upgrades, or is it primarily you doing a lot of the heavy lifting there?
[00:40:23.260] - Alexander Stathis We have a couple of people, me and another person, that I would say drive the majority of it, or have driven the majority of it, because I like it. It’s easy for me to just do it, and it’s not something that you have to do that often. Rails isn’t releasing new versions every day. I mean, honestly, Ruby ships more versions than I realized. If you’d asked me five years ago, before I started working in Ruby, How often does a new Ruby version come out? I’d probably have told you every 10 years or something, right? But it’s actually every few months or every couple of months or whatever. But it’s not that hard with patch versions, with minors, usually, to upgrade. You usually don’t run into that many issues. And so really, it’s just the majors that are tricky. Rails is a little different. Rails will ship stuff in minors that requires some effort. But typically, I would say the hardest part about upgrading comes down to two really key facets. One is fixing all the tech debt and monkey patches to whatever internals of whatever gems, Rails or otherwise, that you’re reliant on.
[00:41:20.400] - Alexander Stathis And the other is figuring out how to get the dependencies to come along with it. Because usually when you bump a Rails version, you also have to bump a bunch of other packages. And that can come with other stuff for sure.
[00:41:32.480] - Robby Russell Do you have any team guidelines, or at least scenarios you evaluate, for when it makes sense to bring in an external, say, Ruby gem dependency into your applications versus just doing something yourself, so you’re not so dependent on that gem potentially being a bottleneck for keeping things updated, or giving you too much, or not?
[00:41:54.340] - Alexander Stathis Yeah, no, I don’t think so. Historically, AngelList has been a very high-trust engineering culture. And so if you have some problem and a gem solves it, let’s pull that in. It’s nice in Ruby, like I said, because the ecosystem is pretty high quality. In NPM, you get these packages that are eight lines of code, and it’s like, Oh, I’m pulling in this one method through a package, each one function through a package. In Ruby, it’s less like that. I mean, we certainly have adopted our fair share of technical debt from some packages. I think, like MimeMagic, we have our own fork of MimeMagic. We have some forks and stuff of these core dependencies that are not updated very often, and you end up with it. I’m a pretty big proponent of upstreaming stuff when you can. It’s hard to do, right? And then also the gems themselves have to be maintained, wh