Real World Elixir Umbrella Projects

 

This article is a quick collection of my notes and references for working with Elixir umbrella projects day to day. It first covers my usual workflow and the tools I use, and then moves into a more thorough discussion of the concepts and heuristics that I use as guiding principles when deciding what to place in an umbrella app, where to break things down, and how to interact with umbrella apps. I briefly discuss querying data for presentation, although that topic deserves its own post.

Umbrella Projects

Elixir umbrella projects are an excellent option for building and growing applications over time. They are a good middle ground between the monolith and micro-services approaches, with less overhead than micro-services but with the cohesion that micro-services enforce. An umbrella project is a single project that contains multiple isolated apps (what might be called a module in other technology stacks). It can be difficult to understand exactly where the boundaries between apps in an umbrella project should be, and exactly how big or small each app should be.

Creating a new umbrella project can be done with mix like so:

mix new my_project --umbrella

Inside that project, you can then create a new app with a supervised process. Simply move into the apps folder and ask mix to do it for you like so:

cd apps; mix new folder_name --module ModuleName --sup

That will give you a basic supervised process in a folder called folder_name in a module called ModuleName. Obviously change those to suit your needs!

Mix generates a skeleton module with a hello world function that can be discarded or built on. Alongside it, in lib/folder_name/application.ex, is the application module that holds the supervision setup and the list of child processes:

defmodule ModuleName.Application do
  # See https://hexdocs.pm/elixir/Application.html
  # for more information on OTP Applications
  @moduledoc false

  use Application

  def start(_type, _args) do
    # List all child processes to be supervised
    children = [
      # Starts a worker by calling: ModuleName.Worker.start_link(arg)
      # {ModuleName.Worker, arg},
    ]

    # See https://hexdocs.pm/elixir/Supervisor.html
    # for other strategies and supported options
    opts = [strategy: :one_for_one, name: ModuleName.Supervisor]
    Supervisor.start_link(children, opts)
  end
end

Now you can create a module with the GenServer behaviour and add it to the child list to get up and running. When you run the project, the application.ex file of each app in the umbrella project is invoked, and the processes are started and supervised. They exist independently, yet can find and communicate with each other as needed through message passing. OTP, GenServer, and actor model implementations are outside the scope of this discussion and, if you’re here, I assume you have at least a basic understanding of the technology.
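As a minimal sketch of that step: the Worker name comes from the generated comment above, but the counter state and the increment/0 and count/0 functions are assumptions made up purely for illustration.

```elixir
defmodule ModuleName.Worker do
  use GenServer

  # Client API (hypothetical functions, for illustration only)

  def start_link(initial_count) do
    GenServer.start_link(__MODULE__, initial_count, name: __MODULE__)
  end

  def increment, do: GenServer.cast(__MODULE__, :increment)
  def count, do: GenServer.call(__MODULE__, :count)

  # Server callbacks

  @impl true
  def init(initial_count), do: {:ok, initial_count}

  @impl true
  def handle_cast(:increment, count), do: {:noreply, count + 1}

  @impl true
  def handle_call(:count, _from, count), do: {:reply, count, count}
end

# In ModuleName.Application.start/2, the child list would then read:
#
#   children = [
#     {ModuleName.Worker, 0}
#   ]
```

Because the module calls `use GenServer`, it gets a default child_spec/1, which is what lets the supervisor accept the `{ModuleName.Worker, 0}` tuple and call start_link(0) on it.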

Working with Umbrella Apps

I’ll discuss this a little more shortly, but when I’m working in an umbrella app, I typically treat it entirely independently, running the tests inside that folder and opening it as a separate project in my editor (Emacs). For me, that means running a little snippet to produce a .projectile file in the root of each app’s folder:

cd apps; find . -maxdepth 1 -type d -exec touch {}/.projectile \;

From there, I can open an app as a stand-alone project so I only see its contents. Alchemist will run the tests for only that app. I can then switch between the apps and their files quickly and easily using a blend of buffer management and projectile.

Granularity: When to Break Things into Separate Apps

When designing an application, you’ll commonly face the question of how big or how small to make each app in an umbrella project. If you laid out a system on Post-it notes, one for every state change (event) that can occur, you would likely find that the notes group around a few distinct topics. For example, an e-commerce site may have events such as ProductCreated, PriceChanged, and ProductDescriptionChanged, as well as CustomerAccountCreated and CustomerInformationUpdated, and maybe ProductAddedToCart and CartEmptied. Grouped logically, these events relate to a few distinct entities: here, a product, a customer, and a cart. In Domain Driven Design terminology, these entities are generally referred to as aggregate roots.

A little box can be drawn around each of them, and they can stand alone without any coupling between them. In Elixir/OTP, it’s easy to see how we might send commands to a process representing one of these entities and have it respond with the resulting state changes as events (EmptyCart -> CartEmptied). In general, the entities that our events cluster around are the perfect place to draw a line, and each can live in its own umbrella app. The little box we draw around these pieces is what DDD terminology refers to as the bounded context.
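To make the command-to-event idea concrete, here is a rough sketch for the cart example. Everything in it is a hypothetical illustration rather than code from a real project: the module layout, the AddProduct command, the struct fields, and the decide/2 and apply_event/2 function names are all assumptions.

```elixir
defmodule Cart.Commands do
  defmodule AddProduct do
    defstruct [:cart_id, :product_id]
  end

  defmodule EmptyCart do
    defstruct [:cart_id]
  end
end

defmodule Cart.Events do
  defmodule ProductAddedToCart do
    defstruct [:cart_id, :product_id]
  end

  defmodule CartEmptied do
    defstruct [:cart_id]
  end
end

defmodule Cart do
  alias Cart.Commands.{AddProduct, EmptyCart}
  alias Cart.Events.{CartEmptied, ProductAddedToCart}

  defstruct id: nil, product_ids: []

  # Decide which events (if any) a command produces for the current state.
  def decide(%Cart{id: id}, %AddProduct{cart_id: id, product_id: product_id}) do
    [%ProductAddedToCart{cart_id: id, product_id: product_id}]
  end

  # Emptying an already empty cart changes nothing, so no event is produced.
  def decide(%Cart{id: id, product_ids: []}, %EmptyCart{cart_id: id}), do: []
  def decide(%Cart{id: id}, %EmptyCart{cart_id: id}), do: [%CartEmptied{cart_id: id}]

  # Apply an event to the current state to get the next state.
  def apply_event(%Cart{} = cart, %ProductAddedToCart{product_id: product_id}) do
    %{cart | product_ids: [product_id | cart.product_ids]}
  end

  def apply_event(%Cart{} = cart, %CartEmptied{}), do: %{cart | product_ids: []}
end
```

Both functions are pure, so the cart’s rules can be tested without starting a process. A process that owns a cart would only need to call decide/2 when a command arrives and fold the resulting events into its state with apply_event/2.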

If you’re into Domain Driven Design, then you’ll have an intuition around what boundaries to draw around an aggregate root (a bounded context). If you have this level of thinking, then you likely know already exactly how big or small an umbrella app should be. I would suggest that, occasionally, it may make sense to have multiple bounded contexts within the same umbrella application if they are closely related, although it’s perfectly fine to make a rule that a bounded context always has its own umbrella application as well. In general, the actor model is an excellent fit for Domain Driven Design. If you’re struggling with where to break things down, I would strongly recommend investing time in reading Eric Evans’ or Vaughn Vernon’s works on the subject of DDD.

How to Work with Umbrella Apps and Bounded Contexts

Once you have some bounded contexts represented by different umbrella apps, you’ll likely not need to interact with many of them at the same time. Messages may be passed between them, so implementing a protocol between the contexts may require wiring from one to the other, but they will otherwise exist quite independently of one another. If they don’t, you may have demarcated at the wrong place. If you have the right granularity, then you should only have to put one bounded context in your head at a time.

Because these contexts exist so independently of one another, my preference when working on any one of them is to treat it as an entirely separate project in my editor. For me, this means only having one project open at a time using Projectile in Emacs. For you, that might mean opening each app’s folder as a separate project in Sublime Text or VS Code, but you can use the same approach.

 

What to Share Between Projects

Generally, we share as little as possible between projects, with the exception of a couple of foundational projects described below. Utilities should not be shared if at all possible: we don’t have a common util or shared project or anything like that. If you want to be quite extreme about this, you may decide not to share any data between projects either, requiring that the aggregate root deal with any requests for data.

We have chosen to share db between the apps so that different apps can query a table through that module. We are not deploying micro-services in our use case at FunnelCloud, but if we were, we would insist on no shared data.

So we have a db project that’s used to interface with Postgres. We keep basic Ecto schemas there, and then only very general queries. For example, to de-duplicate Kafka messages through restarts and deployments, we store the offset of the last processed message for the consumers within a bounded context. This offsets table is shared, so the schema and queries are in this common Elixir module.

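defmodule Db.Offset do
  use Ecto.Schema
  import Ecto.Query

  # The primary key is a string id rather than an auto-generated integer
  @primary_key {:id, :string, []}
  schema "offsets" do
    field(:value, :integer)
  end

  # Inserts the offset, or overwrites the existing row with the same id
  def upsert(%Db.Offset{} = offset) do
    Db.Repo.insert(offset, on_conflict: :replace_all, conflict_target: :id)
  end

  # Returns the stored offset for the given id, or nil if there is none
  def get(id) do
    qry =
      from(
        o in Db.Offset,
        where: o.id == ^id
      )

    Db.Repo.one(qry)
  end
end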

We also have a protocol project that’s used to describe all commands and events. Because commands and events are shared between projects, it’s simpler to have the contract described and shared between them all, rather than sharing specific projects that might indirectly lead to inappropriate coupling.

We try to limit any other sharing between apps as much as possible. Otherwise, if you need to interact with a bounded context, you do so only through message passing.

What Kind of Messages Do We Send?

Working with the actor model in a purely functional language is a bit of a different paradigm, and, from a high level design perspective, it tends to look more like Object Oriented design than it does Functional Programming. Of course, in the details of the implementation, it is functional.

In Object-Oriented programming, the heuristic of good design is to bring data and behaviour together so that you tell objects what to do. Interacting with objects causes them to interact with other objects, and to change their state.

val car = new Honda.Prelude()
val person = new Person.Programmer()
person.getInCar(car)
person.lookAtSpeedometer(car) // 0 km/h
person.pushGasPedal(car)
Thread.sleep(1000)
person.lookAtSpeedometer(car) // 10 km/h

We never said “car.speed = 10”. We only told the objects what to do by issuing commands (getInCar, pushGasPedal). The objects do the rest of the work by responding to those commands, which can cause effects (state changes). The state changes that occurred could be described with events. The events, if they were emitted somewhere, might look like this:

class PersonGotInCar(person, car) extends Event
class CarAccelerated(car) extends Event

This approach of telling objects what to do by message passing is what Object Oriented programming was supposed to look like. Alan Kay, who coined the term Object Oriented, purportedly said:

“I invented the term object-oriented, and I can tell you that C++ wasn’t what I had in mind”. – Alan Kay, OOPSLA ’97

The underlying principle here is that we should tell objects what to do, not ask them about their state and change it from outside the object. Objects are not just data; they are the marriage of data and behaviour. The heuristic to remember is “Tell, Don’t Ask”.

TELL, DON’T ASK!

Now, functional programming looks different. In functional programming, there are no objects (save for multi-paradigm languages like Scala). Data and behaviour exist separately, such that functions act on data. State changes don’t occur; instead, new instances of data are created by passing data into a function and having data come out of the other side:

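defmodule Car do
  defstruct speed: 0

  # Returns a new struct; the one passed in is left unchanged
  def push_gas(%Car{} = car), do: %{car | speed: car.speed + 10}
end

old_car = %Car{speed: 0}
new_car = Car.push_gas(old_car)
assert new_car.speed == 10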

Purely functional applications are built by composing functions that accept and transform data without side effects or state changes. But real applications have state, and state changes over time. Enter processes/actors in Erlang/Elixir/OTP. Here, we can marry the two paradigms: a process holds onto some data and waits for a message. When a message arrives, the process passes the message and its data to a function, holds onto the function’s output as its new data, and waits for the next message. Messages can also be passed to other processes that are holding onto data of their own and waiting to receive messages as well.

That heuristic of “Tell, Don’t Ask” that we discussed a few moments ago? It turns out that this is the heuristic that we want to use with our processes/actors in Elixir/OTP too. By sending commands to processes, we allow a process to encapsulate state and behaviour, and we can build loosely coupled modules by adhering to this principle. It takes a little bit of getting used to, but it’s a great way to build systems.
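Returning to the car example, a process-based version might look roughly like the following. This is only a sketch under assumptions: the CarServer module, the PushGasPedal command, the CarAccelerated event, and their fields are hypothetical names chosen to mirror the earlier examples.

```elixir
defmodule CarServer do
  use GenServer

  # Hypothetical command and event structs, for illustration only.
  defmodule PushGasPedal do
    defstruct amount: 10
  end

  defmodule CarAccelerated do
    defstruct [:speed]
  end

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, %{speed: 0}, opts)
  end

  # Tell the car what to do. The caller never reads or sets the speed
  # directly; the reply is the event describing what changed.
  def push_gas_pedal(car, amount \\ 10) do
    GenServer.call(car, %PushGasPedal{amount: amount})
  end

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call(%PushGasPedal{amount: amount}, _from, state) do
    new_state = %{state | speed: state.speed + amount}
    {:reply, %CarAccelerated{speed: new_state.speed}, new_state}
  end
end
```

The state lives only inside the process, and the only way to affect it is to send a command. The update itself is still a plain functional transformation of the old state into a new one.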

How to Read and Display Data?

(This is a difficult topic and there are many ways to handle this. I’ll quickly discuss my thinking, but there is no universal truth, only what works for you and your team with your knowledge and experience.)

If we need to be aware of the data inside other processes for any reason, such as presentation, we can either listen to events emitted from those processes (e.g. by publishing them to a queue with something like Kafka) or maintain a read model somewhere that we can read from. For example, we could write the current state somewhere on every state change. Or we could directly query the process if we absolutely must. But separating the read concerns allows us to have a very succinct expression of the domain in the bounded context.
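As one possible illustration of writing the current state somewhere on every state change, the sketch below has a process publish its state to an ETS table that readers query directly. This is an assumption-heavy toy, not a description of any production setup: the CartCounter module, the table name, and the choice of ETS as the store are all hypothetical, and the same idea applies to a database table.

```elixir
defmodule CartCounter do
  use GenServer

  # Hypothetical table name. Because the table is named and created in
  # init/1, this sketch supports only one CartCounter process at a time.
  @table :cart_read_model

  def start_link(cart_id), do: GenServer.start_link(__MODULE__, cart_id)

  # Write side: tell the process what to do.
  def add_item(pid), do: GenServer.call(pid, :add_item)

  # Read side: look up the published state without messaging the process.
  def item_count(cart_id) do
    case :ets.lookup(@table, cart_id) do
      [{^cart_id, count}] -> count
      [] -> 0
    end
  end

  @impl true
  def init(cart_id) do
    # :protected means only this process can write; any process can read.
    :ets.new(@table, [:named_table, :protected, read_concurrency: true])
    {:ok, publish(%{cart_id: cart_id, count: 0})}
  end

  @impl true
  def handle_call(:add_item, _from, state) do
    new_state = publish(%{state | count: state.count + 1})
    {:reply, :ok, new_state}
  end

  # Copy the current state into the read model on every change.
  defp publish(state) do
    :ets.insert(@table, {state.cart_id, state.count})
    state
  end
end
```

The process remains the only writer, so the domain logic stays in one place, while presentation code reads from the table without adding to the process’s mailbox.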

For a little more information on our approaches at FunnelCloud, we have event listeners set up in Rails to update the read model for pieces of the application (your usual event sourcing + CQRS architecture), and then the read model is displayed to the user. In other places, we’ve opted for a simpler approach where Rails treats the data written by Elixir as read-only data for presentation to the user, while that same data is used as a recovery mechanism by the bounded context in Elixir. This approach is simpler than a pure event-sourcing implementation and works well for our use case there, without the overhead of maintaining a journal of events. Both approaches are fine. While we do use event sourcing in some areas, real-world experience has made me a bit cautious in choosing where to use it, as the journal needs to be maintained and migrated over time.

Published by

Jay k-xs

Avid quant, lover of life.
