
Recap: DevOpsDays Cape Town 2018

This year I was finally able to attend the 3rd DevOpsDays in my hometown of Cape Town (Woodstock). As the name suggests, DevOpsDays is a conference about all DevO(o)psie things, spread over two days (20-21 Sept 2018) in Cape Town. Since I was a speaker (tricked by Kyle 😜), it was more of a 2.5-day event for me, which is why I am quite happy it rolled into a long weekend.

DevOpsDays Cape Town

First, let's give the event some love:

Then, just as important, a big/huge thanks and shoutout to this year's organisers:

Additionally, there were quite a few volunteers who did an amazing job. I unfortunately completely forgot to introduce myself properly – bad me / apologies – but I do hope they are all credited here: https://www.devopsdays.org/events/2018-cape-town/contact/. Eugene was at least quite prominent there – so huge thanks to him.

Both days are available to watch on YouTube – and as far as I understand, some final editing is still being done so that every talk can be watched separately.


DevOpsDays Cape Town 2018 – Day 1 – Full Video:

DevOpsDays Cape Town 2018 – Day 2 – Full Video:


My talk: Serverless with Google App Engine (or how I delivered 15M page views for only 10 USD)

If you are interested in my talk, I uploaded my slides to SlideShare (does that make me old-school? What do the new kids on the block do nowadays??).

(100% Elastic-Beanstalk-bashing-free troll)


Some words to the event itself

It was A-M-A-Z-I-N-G! Very well organised, great talks, great people. If you were a speaker, the conference already started the night before (19th of Sept) with the speakers' dinner at The Wild Fig.

It was my first time at the DoubleTree Hotel in Woodstock – entering it felt more like walking into a mall, with some shops, a small restaurant and even a corner shop. Some arrows or banners would have been great here. It was also not really clear to me that the actual conference rooms were on the second floor.

There were two foreign speakers: Marek D. from Facebook, who spoke about monitoring – which resonated very much with my mentality (avoid noise + only actionable alerts) – and Joe N. from Portland, who gave a very, very nice keynote.

There were some great toys from awesome sponsors, which I shamelessly bagged for my kids (this is also most probably the only picture I have from the event 😅):

Networking

In between the talks, and at the latest at the “Afterparty” at The Woodstock Lounge, one had the opportunity to connect. The Afterparty was at a great location with free beer & food; it was completely reserved for DevOpsDays, with lots of AWSome chats happening in the inside and outside areas.

Shoutouts to some of the amazing people I (re-)met:


Conclusion

As Adrian put it so nicely in the closing session: DevOpsDays feels like a family reunion. The talks are a great reminder of what boat we are all in together. But the chats in the hallway and at the “Afterparty” are where you really should put your money.

If you are new to all of this but would like to get started & involved, I highly recommend joining the Cape Town DevOps Meetup and our beloved ZATech Slack Chat. See you all next year 👋!


(please contact me if you did a recap as well so we can link each other!)

Deploying DB migrations with confidence

What role does your database play in your CI/CD?
How long does it take for your devs to get a running database?
How long does it take to recover a dev database in case of accidental destruction?
How current are the database snapshots your devs use?
How confident are you with schema updates going to production?

These days, DevOps conferences and talks are filled with containerisation, Docker, k8s, auto-scaling, auto-healing, CI/CD, agile, etc. Disappointingly, however, most of them only touch on stateless environments, and far too seldom do engineers share their knowledge on running a database in a CI/CD environment & workflow.

In this blog article I will share how we solved this at my current place – where we mainly run a Laravel-based web application on a traditional “LAMP” stack.

I'll be honest, we had a rough start. We used to have a shared AWS RDS instance for our QA & Staging environments, which developers would also connect their local workspaces to in order to view the webapp locally with proper, non-seeded data – which sometimes is simply essential to debug and fix certain types of reported bugs.

So, our setup kinda worked, but it was super unreliable. It was enough for a dev to accidentally drop the database, or for a botched staging deploy, to suddenly kill the workflow of the whole team. Restoring took over an hour (a mix of a larger-than-your-usual-WordPress database and budget restrictions on dev instances).

Obviously this was super annoying and had to change. So I went over to my friends on the ZATech Slack channel, but I quickly hit a wall – on the contrary, it seemed like I stepped on some people's toes. I learned my lesson: never mention “on-premise” near a DevOps engineer (it causes hefty allergic reactions)…

Basically the following two statements were made:

  • everything is in the cloud, no on-premise or local databases
  • DB should be part of the CI/CD

It was difficult for me to agree with the first point: being based in South Africa, there are simply no proper cloud providers nearby – the next hop is AWS London. And anyone who has ever connected their local webapp to a remote MySQL instance knows how quickly higher latency (>10ms) can make working locally a PITA.

While I do agree that the DB should be part of the CI/CD, there is still a huge benefit (especially in efficiency and speed) in developing locally – and in not having to rely on seeded data.

Disappointed by the lack of solutions, I decided to go my own way against all odds, and with the support of our CTOs + a wonderful person in our finance department I got some budget allocated for on-premise hardware (specs for the geeks like me: i7-6700 / 64GB RAM / 4x 256GB SSD @ RAID10 / UPS).

Step 1: create a database service

We will use the database service to actually host the databases. I use Jenkins to run a simple nightly downstream job that mysqldumps the production database (ignoring some larger tables that are not needed), anonymises the data (emails + mobile numbers), and pushes the dump to a predictable location (which is accessible internally by devs).
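To make this a bit more tangible, here is a minimal sketch of what such a nightly job could look like – the host names, table names, paths and variables are illustrative placeholders, not our actual setup:

```bash
#!/usr/bin/env bash
# nightly-dump.sh – hypothetical sketch of the nightly Jenkins downstream job
set -euo pipefail

# 1) dump production, skipping larger tables we do not need (table names are placeholders)
mysqldump -h prod-db.example.internal -u backup -p"$BACKUP_PASS" \
  --single-transaction --quick \
  --ignore-table=app.audit_log --ignore-table=app.sessions \
  app | gzip > /tmp/app_prod.sql.gz

# 2) load it into a scratch schema and anonymise PII before publishing
zcat /tmp/app_prod.sql.gz | mysql -h dbs.internal -u ci -p"$CI_PASS" app_scratch
mysql -h dbs.internal -u ci -p"$CI_PASS" app_scratch <<'SQL'
UPDATE users SET email  = CONCAT('user', id, '@example.com'),
                 mobile = CONCAT('+2700000', LPAD(id, 4, '0'));
SQL

# 3) push the anonymised dump to a predictable, internally accessible location
mysqldump -h dbs.internal -u ci -p"$CI_PASS" --single-transaction app_scratch \
  | gzip > /srv/dumps/latest.sql.gz
```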

From there, the database service launches three VMs with MySQL running on them (one shared amongst devs, one for experimental test cases / usage, and one for our automated builds – see step 2), imports the above dump, and then creates a snapshot of the storage drive. I use VirtualBox as I had extensive experience using it programmatically, but if I were to redo the architecture I would most probably do it with libvirt/qemu.
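For illustration, the VirtualBox side could be driven with VBoxManage along these lines – assuming a prepared base VM called mysql-base with MySQL already installed (all names here are made up for the example):

```bash
#!/usr/bin/env bash
# provision-dbs.sh – illustrative sketch of rebuilding the three MySQL VMs every night
set -euo pipefail

for vm in dbs-dev dbs-experimental dbs-ci; do
  # drop yesterday's VM if it exists, then clone the base VM and boot it headless
  VBoxManage controlvm "$vm" poweroff 2>/dev/null || true
  VBoxManage unregistervm "$vm" --delete 2>/dev/null || true
  VBoxManage clonevm mysql-base --name "$vm" --register
  VBoxManage startvm "$vm" --type headless

  # wait for MySQL, then import last night's anonymised dump (host/path are placeholders)
  until mysqladmin -h "$vm.internal" -u root -p"$MYSQL_ROOT_PASS" ping >/dev/null 2>&1; do sleep 2; done
  zcat /srv/dumps/latest.sql.gz | mysql -h "$vm.internal" -u root -p"$MYSQL_ROOT_PASS" app

  # take a snapshot so the VM can be rolled back to this state in seconds
  VBoxManage snapshot "$vm" take nightly-baseline
done
```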

I created a small web interface as well:

database services (dbs)

With database services (dbs), the following goals have been achieved:

  • a developer has access to an anonymised production database that is never older than 24hrs
  • the dev can either download the dump and run it on their local machine, or directly connect to “dbs” (database services) – which is especially fast from within the office
  • due to the usage of snapshots, should anything happen to the database it is possible to restore last night's state in less than a minute (!!) – much faster than any AWS RDS snapshot restore, and it does not involve any config changes (i.e. an in-place restore)
  • Staging & QA still use a shared DB in the cloud, however due to the separation, issues on either side do not interfere with the whole team

dbs has been running for quite a while and it has solved a good number of issues. However, we were still getting the occasional botched staging deploy or failed master build, because we only ran a very optimistic/superficial check on database migrations.

This is because we only run artisan migrate (laravel.com/docs/5.4/migrations) against an empty database in our CI builds (for predictability reasons). Meaning, builds would only fail if there was a PHP or SQL syntax error, not if the migration itself was faulty on production data. The easiest way to demonstrate such a failure is adding a unique index to a column – perfectly fine on an empty database, not so much on production where duplicate values may already exist.
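As a hypothetical illustration (table and column names invented for the example), the exact same statement behaves very differently depending on the data underneath it:

```bash
# adding a unique index – runs cleanly on an empty schema...
mysql -h dbs-ci.internal -u ci -p"$CI_PASS" app <<'SQL'
ALTER TABLE users ADD UNIQUE INDEX users_email_unique (email);
SQL

# ...but against a production snapshot with duplicate values it aborts with something like:
# ERROR 1062 (23000) at line 1: Duplicate entry 'jane@example.com' for key 'users_email_unique'
```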

Step 2: run builds against prod data

The safest way to make sure that your database migrations are sound is to actually run them against production data, as that is ultimately what will happen on a production deploy anyway.

Fortunately we do not need to run every build against the prod snapshot, as we are only interested in builds where anything within the /database/migrations/ folder changes.

I created an additional Jenkins job that runs on every PR; with the help of a little bash + the GitHub API, it checks whether a migration was actually part of the code changes, and only then does the build proceed further.
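The check itself can be kept very small. A hedged sketch of the idea, assuming the PR number and a token are exposed to the job as environment variables (repo name, variable names and script name are placeholders):

```bash
#!/usr/bin/env bash
# check-migrations.sh – sketch: only proceed if the PR actually touches database/migrations/
set -euo pipefail

# list the files changed in the pull request via the GitHub REST API
CHANGED_FILES=$(curl -s -H "Authorization: token $GITHUB_TOKEN" \
  "https://api.github.com/repos/example-org/example-app/pulls/${PR_NUMBER}/files?per_page=100" \
  | jq -r '.[].filename')

if ! grep -q '^database/migrations/' <<< "$CHANGED_FILES"; then
  echo "No migrations in this PR – skipping the prod-snapshot build."
  exit 0
fi

echo "Migrations detected – running the build against the production snapshot."
```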

I am taking advantage of dbs from step 1: thanks to the fast restore capability I can run artisan migrate nearly every minute without the DB losing its original state – which is of course important for repeatable builds.
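Put together, the core of that job boils down to something like the following sketch – the VM name, DB host and credentials are assumptions, and the real dbunit.sh certainly does a bit more:

```bash
#!/usr/bin/env bash
# sketch of the restore + migrate step (not the actual dbunit.sh)
set -euo pipefail

# roll the CI database VM back to last night's snapshot – this takes seconds, not minutes
VBoxManage controlvm dbs-ci poweroff 2>/dev/null || true
VBoxManage snapshot dbs-ci restore nightly-baseline
VBoxManage startvm dbs-ci --type headless

# wait until MySQL answers again
until mysqladmin -h dbs-ci.internal -u ci -p"$CI_PASS" ping >/dev/null 2>&1; do sleep 2; done

# run the migrations against the production snapshot and measure how long they take
START=$(date +%s)
DB_HOST=dbs-ci.internal php artisan migrate --force
echo "Database migrations took $(( $(date +%s) - START ))s against the prod snapshot."
```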

Once done, it reports back the time it took, which is a nifty indicator of whether the DB migration is something heavy where an elevated error rate might be expected during the production deploy:

github build statuses

The console output of the job gives a little more indication of what is happening and why the build got triggered:

dbunit.sh


Setting up a proper DB build pipeline and fully integrating it into our CI achieved the following goals:

  • full confidence in any database migrations being introduced
  • full visibility into the duration of database migrations, as a “pre-warning” of potential problems later on during the production deploy
  • due to the usage of “dbs” (i.e. real restorable snapshots), this can be done cheaply and quickly (3-minute builds) even for larger databases (>10GB)


So, I am curious: what problems did you have to solve for your database workflow / environment, and what solutions did you come up with? 🙂