Event Sourcing is a concept that is becoming more popular by the day. Even ThoughtWorks has included it in its latest Technology Radar. Let’s do a quick overview of ES one more time.
In essence, event sourcing is about persisting data in a way that preserves every single bit of information. It’s about representing objects as a sequence of events that took place over time and led them to their current state.
For instance, if I were to persist information about my pocket money (e.g. 100 EUR), I could simply save the latest state somewhere in a variable or database:
Balance: 100 EUR
Now, whenever there is a change, we would overwrite this value with the new value (discarding the previous one). Then at some point in time we will have something like this:
Balance: 67 EUR
Simple and elegant (and it works perfectly in a large number of scenarios). However, we are performing a lossy logical compression here and discarding some information. Let’s see what would happen if we were to preserve all the changes:
Got from ATM: 100 EUR
Bought metro tickets: -12 EUR
Grabbed a lunch: -8 EUR
Found a coin: 1 EUR
Took taxi: -14 EUR
Obviously, if we have such a sequence of events, we can always “reconstruct” the current balance by adding them up:
Balance: 100 - 12 - 8 + 1 - 14 = 67 EUR
In essence, the final state (Balance) is a left fold over the sequence of events, starting from an empty initial state (the equivalent of Enumerable.Aggregate in .NET, std::accumulate in C++ or Array.prototype.reduce in JavaScript).
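As a minimal sketch of that fold (written in Python here, with functools.reduce playing the role of Aggregate), the pocket-money stream can be replayed like this. The tuple layout and the apply function are illustrative assumptions, not part of any framework:

```python
from functools import reduce

# Signed amounts in EUR, in the order the events happened.
events = [
    ("Got from ATM", 100),
    ("Bought metro tickets", -12),
    ("Grabbed a lunch", -8),
    ("Found a coin", 1),
    ("Took taxi", -14),
]


def apply(balance, event):
    """Fold step: take the state so far and one event, return the next state."""
    _description, amount = event
    return balance + amount


# Left fold over the stream, starting from an empty balance of 0.
balance = reduce(apply, events, 0)
print(f"Balance: {balance} EUR")  # Balance: 67 EUR
```

Replaying only a prefix of the list gives the balance at that earlier point in time, which is exactly the information the single overwritten value throws away.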
Now, you might ask yourself: what’s the point in storing all these intermediate steps when you can just save the final balance? Persistence via event sourcing has some really interesting features. Below are some of them.
By the way, if you are asking yourself about the performance of such an approach to storage, don’t worry. It can easily beat relational databases in both scalability and throughput (ceteris paribus).
It is really easy to save data as a stream of events. All we need is to define POCO (Plain Old CLR Object) classes (one for each event) and then serialize them to one of many formats available: Google ProtoBuf, JSON, Binary, XML etc.
Now, before you say that it takes too much code to define events:
GotMoneyFromAtm! (amount, transaction, time)
BoughtMetroTickets! (count, amount, machine, time)
GrabbedALunch! (amount, cost, time, menu, place)
FoundACoin! (amount, gps, time)
TookTaxi! (amount, rideDuration, taxiCompany, route, time)
That is what the definitions could look like for C# if you used a T4 template in Visual Studio (see code contracts explanation).
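For readers outside .NET, here is a rough Python sketch of the same idea: one small immutable class per event, serialized to JSON. The field types, the "type"/"data" envelope and the sample values are assumptions made for illustration; only the event and field names come from the definitions above.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime


@dataclass(frozen=True)  # frozen: events never change once recorded
class GotMoneyFromAtm:
    amount: int
    transaction: str
    time: datetime


@dataclass(frozen=True)
class FoundACoin:
    amount: int
    gps: str
    time: datetime


# Lookup table used to find the right class when reading an event back.
EVENT_TYPES = {cls.__name__: cls for cls in (GotMoneyFromAtm, FoundACoin)}


def serialize(event) -> str:
    body = asdict(event)
    body["time"] = event.time.isoformat()  # datetime is not JSON-native
    return json.dumps({"type": type(event).__name__, "data": body})


def deserialize(text: str):
    envelope = json.loads(text)
    data = dict(envelope["data"])
    data["time"] = datetime.fromisoformat(data["time"])
    return EVENT_TYPES[envelope["type"]](**data)


event = GotMoneyFromAtm(amount=100, transaction="tx-001", time=datetime(2011, 10, 1, 9, 30))
line = serialize(event)
print(line)
print(deserialize(line) == event)  # True
```

Storing the event type next to the payload is one simple way to let a reader pick the right class; other formats such as Protocol Buffers would handle this differently.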
Given the sequence of events, we can project them to any desired structural representation. This is an extremely important feature. For instance, we could write a projection that would summarize all our expenses and produce the latest balance.
However, we can do much more:
What’s more interesting, we don’t need any really complex queries in order to do that. Writing event projections (at least in C#) is quite boring work. Try doing that if all you have is a single Balance field, or even a list of changes (credit/debit).
However, as long as you have a stream of events, you can project it to any form, even a conventional SQL database. For instance, my favorite approach is to project event streams into JSON documents stored in cloud storage.
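To make the idea of projections concrete, here is a small Python sketch that derives two different read models from the same pocket-money stream. The CATEGORY mapping and the function names are hypothetical and exist only for this example:

```python
from collections import defaultdict

# (event name, signed amount in EUR); the names mirror the example events above.
events = [
    ("GotMoneyFromAtm", 100),
    ("BoughtMetroTickets", -12),
    ("GrabbedALunch", -8),
    ("FoundACoin", 1),
    ("TookTaxi", -14),
]

# Hypothetical grouping of event types into spending categories.
CATEGORY = {
    "BoughtMetroTickets": "transport",
    "TookTaxi": "transport",
    "GrabbedALunch": "food",
}


def project_balance(stream):
    """Read model 1: the current balance."""
    return sum(amount for _name, amount in stream)


def project_spending_by_category(stream):
    """Read model 2: how much was spent per category."""
    totals = defaultdict(int)
    for name, amount in stream:
        if amount < 0:  # only expenses count as spending
            totals[CATEGORY.get(name, "other")] += -amount
    return dict(totals)


print(project_balance(events))               # 67
print(project_spending_by_category(events))  # {'transport': 26, 'food': 8}
```

Neither projection changes the stream, so a third read model can be added later and built from the same history.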
Events are serializable and immutable data structures that are appended to an append-only stream. As such, they share all the capabilities of messages. So we can:
Here are a few lines of code from a production system (we are replicating from remote to cache):
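    // Start from the version that the local cache has already reached.
    var next = _cache.GetCurrentVersion();
    while (true)
    {
        // Pull the next batch of records from the remote store.
        var items = _remote.ReadRecords(next, BatchSize);
        if (items.Length == 0)
            break; // nothing left to replicate
        // Remember the highest version seen so far.
        next = items.Max(m => m.Version);
        _cache.AppendNonAtomic(items);
        logger(string.Format("Loaded {0} records", items.Length));
    }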
Of course, in a more conventional system (that does not employ event sourcing) you can leverage something like SQL Replication or Microsoft Sync Framework.
Truth be told, performance and scalability are by-products of the inherent capabilities offered by the event sourcing approach. In essence, we can get almost-infinite scalability on reads with blazing throughput and no deadlocks. All this is attributed to the following facts:
What is a read model? “Balance” is one example of a read model; “List of Gaga’s favorite restaurants” is another. Essentially, a read model is a view (the precomputed result of an SQL query, in SQL terms).
Since we have more flexibility with projecting events and passing them around, we can easily do more interesting things, reaching up to the speeds of LMAX (which was described by Martin Fowler):
There are a few more interesting aspects of event sourcing:
There are also some financial and political benefits that project stakeholders may be interested in. They all revolve around better flexibility in project delivery and in managing resources and risks.
The ability to keep things simple, defer important decisions and adapt business solutions can be a powerful enabler in large conservative organizations. Smaller companies (such as lean startups) can also gain a competitive advantage and reduce time-to-market with such approaches.
However, here we are already getting into the area of synergy effects with CQRS/DDD methodologies and their practical application to distributed environments (esp. clouds). This is a topic for a different blog post or a talk.
Obviously, Event Sourcing is not a silver bullet; it is just a different way to think about and represent changes and data. If you are a C# or C++ developer, then this feels like going back to assembler. If you are a project manager, it’s like consciously going back from Microsoft Project Server to task lists and custom budgeting software.
This explains why there are quite a few problems with this approach.
Defining these events is a complex art of its own, which requires skills in domain modeling (hint: if you have a lot of events with the following words in their names, then you are doing something wrong: Create, Insert, Update, Delete, Set, Change, Add). Domain-Driven Design (as both a book and a body of knowledge) is an entry point into this skill.
There is little software and hardware to support event sourcing. Luckily, we need much less of that (as compared to SQL/NoSQL), but still. In the next few years we will see interesting solutions in this field.
For the time being, there is an even bigger lack of information and guidance on this body of knowledge (to be fixed within the next year).
Since we have limited information, acceptance and software, naturally there is a limited number of experienced developers with true DDD/ES skills.
All these downsides are quite surprising, since the actual principles behind event sourcing are extremely old; they have been discovered and applied in multiple areas over and over again. Even the replication of SQL (transaction logs) uses similar principles.
There are a few additional concerns that might look like downsides of the approach, but in fact are not that important.
Extra storage costs are usually negligible when compared to the business value that might be created. For instance, the cost of storing 200k events in the cloud is roughly 10 cents per month. Oh, and I’ve counted this 10 times over, just for the sake of having 10 replicas in different data centers for redundancy. If this negligible cost had saved me even a few days of development, it would already have been a bargain. However, event sourcing saved much more than that.
Slower performance is not an issue, since we can optimize IO via snapshotting and persistent read models. And by leveraging the push-based nature of events, we can get caches that are invalidated immediately. In short, there are multiple technical solutions that could be plugged in later, if there is such a need.
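Snapshotting can be sketched in a few lines of Python. This is only an illustration under simple assumptions: events are reduced to signed amounts, the stream is an in-memory list, and the Snapshot class and load_balance function are made-up names rather than part of any library:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    version: int  # number of events already folded into `balance`
    balance: int


def load_balance(events, snapshot=None):
    """Rebuild the balance, replaying only the events recorded after the snapshot."""
    version, balance = (snapshot.version, snapshot.balance) if snapshot else (0, 0)
    for amount in events[version:]:
        balance += amount
    return Snapshot(version=len(events), balance=balance)


events = [100, -12, -8]
snap = load_balance(events)          # full replay of three events
events += [1, -14]                   # two more events arrive later
latest = load_balance(events, snap)  # replays only the last two events
print(latest.balance)                # 67
```

The snapshot is purely an optimization: deleting it and replaying the whole stream must give the same result, which is why it can be added later without touching the events themselves.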
Fragility (losing an event in the past causes the entire stream to be corrupt) is not an issue, since you can decide for yourself which SLA levels to go for (via replication and redundancy). Corruption in any single replica can be reliably detected using git’s approach: each event includes a SHA-1 signature computed over its contents and the signature of the previous event.
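A minimal Python sketch of such a chained signature follows. The record layout, the JSON encoding and the function names are assumptions chosen for the example; only the idea of hashing the contents together with the previous signature comes from the text above:

```python
import hashlib
import json


def sign(payload: dict, previous_signature: str) -> str:
    """SHA-1 over the previous signature followed by the event contents."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha1(previous_signature.encode("ascii") + body).hexdigest()


def append(stream: list, payload: dict) -> None:
    previous = stream[-1]["signature"] if stream else ""
    stream.append({"payload": payload, "signature": sign(payload, previous)})


def first_corrupt_index(stream: list):
    """Return the index of the first record whose signature does not match, else None."""
    previous = ""
    for index, record in enumerate(stream):
        if sign(record["payload"], previous) != record["signature"]:
            return index
        previous = record["signature"]
    return None


stream = []
for name, amount in [("GotMoneyFromAtm", 100), ("BoughtMetroTickets", -12), ("GrabbedALunch", -8)]:
    append(stream, {"type": name, "amount": amount})

print(first_corrupt_index(stream))   # None
stream[1]["payload"]["amount"] = -2  # tamper with an event in the middle
print(first_corrupt_index(stream))   # 1
```

Because every signature depends on the one before it, a missing or altered event breaks verification from that point on, so a damaged replica can be spotted and repaired from a healthy one.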
Versioning is sometimes perceived as a problem, since our systems tend to grow and event contracts (schemas) can gradually evolve to new formats that are incompatible with the old saved events. Yet, if approached deliberately, this can be solved (and the solution is more elegant and simple than SQL migration scripts). I use a combination of 3 elements here:
If I haven’t scared you enough with the downsides of Event Sourcing, here is some further reading.
CQRS Info by Greg Young:
A few of my own articles:
Some more materials:
As nicely said by Mike Nichols on the DDD/CQRS mailing list:
My experience has been that ES and its promotion of business semantics over technological terms has a way of bending my mind toward modeling behaviors rather than essence. It also lets me avoid the ceremony of modeling state that doesn’t contribute to behaviors. I see this as a good thing and as a side effect I probably use language the business person understands more. I just can’t find a reason to use ORM in my domain anymore. ES seems to let me more rapidly model … having a change log/audit trail is about the furthest thing from my mind when I reach for it.