Uncategorised

Recap: DevOpsDays Cape Town 2018

This year I was finally able to attend the 3rd DevOpsDays in my hometown Cape Town (Woodstock). As the name suggests, DevOpsDays is a conference about all DevO(o)psie things, spread over two days (20-21 Sept 2018) in Cape Town. Since I was a speaker (tricked by Kyle 😜), it was more of a 2.5-day event, which is why I am quite happy it flowed into a long weekend.

DevOpsDays Cape Town

First, let's give the event some love:

Then, just as importantly, a big/huge thanks and shoutout to this year's organisers:

Additionally there were quite a few volunteers who did an amazing job. I unfortunately completely forgot to introduce myself properly – bad me / apologies – but I do hope they are all credited here: https://www.devopsdays.org/events/2018-cape-town/contact/. Eugene was at least quite prominent there – so huge thanks to him.

Both days are available to watch via YouTube – and to my understanding some final editing will still be done so that every talk can be watched separately.

 

DevOpsDays Cape Town 2018 – Day 1 – Full Video:

DevOpsDays Cape Town 2018 – Day 2 – Full Video:

 

My talk: Serverless with Google App Engine (or how I delivered 15M page views for only 10 USD)

If you are interested in my talk, I uploaded my slides to SlideShare (does that make me old-school? What do the new kids on the block use nowadays??).

(100% Elastic Beanstalk Bashing free troll)

 

Some words to the event itself

It was A-M-A-Z-I-N-G! Very well organised, great talks, great people. If you were a speaker, the conference already started the night before (19th of Sept) with the speakers' dinner at The Wild Fig.

It was my first time at the DoubleTree Hotel in Woodstock – entering it felt more like walking into a mall, with some shops, a small restaurant and even a corner shop. Some arrows or banners would have been great here. It was also not really clear to me that the actual conference rooms were on the second floor.

There were two foreign speakers: Marek D from Facebook, on monitoring – which resonated very much with my mentality (avoid noise + only actionable alerts) – and Joe N. from Portland with a very, very nice keynote.

There were some great toys from awesome sponsors, which I shamelessly bagged for my kids (this is also most probably the only picture I have from the event 😅):

Networking

In between the talks, and at the latest at the "Afterparty" at The Woodstock Lounge, there was plenty of opportunity to connect. The afterparty was in a great location with free beer & food, completely reserved for DevOpsDays, and lots of AWSome chats were happening in both the inside and outside areas.

Shoutouts to some of the amazing people I (re-)met:

 

Conclusion

As Adrian put it nicely in the closing session: DevOpsDays feels like a family reunion. The talks are a great reminder of what boat we are all in together. But the chats in the hallway and at the "Afterparty" are where you should really put your money.

If you are new to all of this but would like to get started & involved, I highly recommend joining the Cape Town DevOps Meetup and our beloved ZATech Slack Chat. See you all next year 👋!

 

(please contact me if you did a recap as well so we can link each other!)

Deploying DB migrations with confidence

What role does your database play in your CI/CD?
How long does it take for your devs to get a running database?
How long does it take to recover a dev database after an accidental destruction?
How current are the database snapshots your devs use?
How confident are you with schema updates going to production?

These days DevOps conferences and talks are filled with containerisation, Docker, k8s, auto-scaling, auto-healing, CI/CD, agile, etc. Disappointingly, however, most of them only touch stateless environments, and far too seldom do engineers share their knowledge on running a database in a CI/CD environment & workflow.

In this blog article I will give some insight into how we solved it at my current place – which mainly consists of running a Laravel-based web application on a traditional "LAMP" stack.

I'll be honest, we had a rough start. We used to have a shared AWS RDS instance for our QA & Staging environments, which was then also used by developers: they would connect their local workspace to that remote MySQL instance to view the webapp locally with proper, non-seeded data – which sometimes is simply essential to debug and fix certain types of reported bugs.

So, our setup kinda worked, but was super unreliable. It was enough for a dev to accidentally drop the database, or for a staging deploy to go wrong, to suddenly kill the workflow of the whole team. Restoring took over an hour (a mix of a larger-than-your-usual-wordpress database and budget restrictions on dev instances).

Obviously this was super annoying and had to change. So I went over to my friends on the ZATech Slack channel, but quickly hit a wall – on the contrary, it seems I stepped on some people's toes. I learned my lesson: never mention "on-premise" near a DevOps engineer (it causes hefty allergic reactions)…

Basically the following two statements were made:

  • everything is in the cloud, no on-premise or no local databases
  • DB should be part of the CI/CD

It was difficult for me to agree with the first point: being based in South Africa, there are absolutely no proper cloud providers locally – the next hop is AWS London. And anyone who has ever connected their local webapp to a remote MySQL knows how quickly higher latency (>10ms) can make working locally a pain.

While I do agree that the DB should be part of the CI/CD, there is still a huge benefit (especially in efficiency and speed) in developing locally – and in not having to rely on seeded data.

Disappointed by the lack of solutions, I decided to go my own way against all odds, with the support of our CTOs and a wonderful person in finance who allocated some budget for on-premise hardware (specs for the geeks like me: i7-6700 / 64GB RAM / 4x 256GB SSD @RAID10 / UPS).

Step 1: create a database service

We will use the database service to actually host the databases. I use Jenkins to run a simple downstream job every night that mysqldump's the production database (ignoring some larger tables that are not needed), anonymises the data (emails + mobile numbers) and pushes the dump to a predictable location (accessible internally by devs).
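
For illustration, here is a minimal sketch of what such a nightly job could look like – host names, credentials, schema and table names are all placeholders rather than our actual setup:

(nightly-dump.sh – illustrative sketch)

#!/bin/bash
#
# hypothetical sketch of the nightly Jenkins downstream job described above

set -e

DUMP="/srv/dumps/prod-$(date +%F).sql.gz"

# 1) dump production, skipping the large tables we do not need locally
mysqldump -h prod-db.internal -u readonly -p"${DB_PASS}" \
  --single-transaction \
  --ignore-table=app.audit_log \
  --ignore-table=app.sessions \
  app > /tmp/prod.sql

# 2) anonymise PII (emails + mobile numbers) by loading the dump into a
#    scratch schema and rewriting it before it leaves this host
mysql scratch < /tmp/prod.sql
mysql scratch -e "UPDATE users SET email = CONCAT('user', id, '@example.com'), mobile = NULL"
mysqldump --single-transaction scratch | gzip > "${DUMP}"

# 3) publish to a predictable, internally accessible location
ln -sfn "${DUMP}" /srv/dumps/latest.sql.gz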

From there, the database service launches three VMs (one shared amongst devs, one for experimental test cases / usage, one for our automated builds – see step 2) that have MySQL running on them, imports the above dump and then creates a snapshot of the storage drive. I use VirtualBox as I had extensive experience using it programmatically, but if I were to redo the architecture I would most probably go with libvirt/qemu.
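
The VirtualBox side boils down to surprisingly few commands. A rough sketch, with VM names and paths made up purely for illustration:

(dbs-refresh.sh – illustrative sketch)

#!/bin/bash
#
# hypothetical sketch: refresh one of the three dbs VMs from last night's dump

set -e

VM="dbs-shared"   # or dbs-experimental / dbs-build

# import the anonymised dump into the MySQL instance running inside the VM
gunzip -c /srv/dumps/latest.sql.gz | mysql -h "${VM}" -u root app

# snapshot the VM so its state can be rolled back in seconds
VBoxManage snapshot "${VM}" take "nightly-$(date +%F)" --live

# restoring later is just a power-off, a restore and a start:
#   VBoxManage controlvm "${VM}" poweroff
#   VBoxManage snapshot "${VM}" restorecurrent
#   VBoxManage startvm "${VM}" --type headless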

I created a small web interface as well:

database services (dbs)

With database services (dbs) the following goals have been achieved:

  • a developer has access to an anonymised production database that is never older than 24hrs
  • the dev can either download the dump and run it on their local machine, or connect directly to "dbs" (database services) – which is especially fast from within the office
  • due to the usage of snapshots, should anything happen to the database it is possible to restore last night's state in less than a minute (!!) – much faster than any AWS RDS snapshot restore, and without any config changes (i.e. an in-place restore)
  • Staging & QA still use a shared DB in the cloud, but thanks to the separation, issues on either side no longer interfere with the whole team

Dbs has been running for quite a while now and has solved a good number of issues. However, we were still getting the occasional botched staging deploy or failed master build, because we only ran a very optimistic/superficial check on database migrations.

This is because we only run artisan migrate (laravel.com/docs/5.4/migrations) against an empty database in our CI builds (for predictability reasons). That means builds would only fail on a PHP or SQL syntax error, not if the migration itself was faulty on production data. The easiest way to demonstrate such a failure is adding a unique index to a column – perfectly fine on an empty database, not so much on production where duplicate values may already exist.
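
As an illustration (table and column names are made up), the exact same statement behaves very differently on an empty schema versus a copy of production:

$ mysql empty_db -e "ALTER TABLE users ADD UNIQUE INDEX users_email_unique (email)"
# fine - there are no rows that could clash

$ mysql prod_copy -e "ALTER TABLE users ADD UNIQUE INDEX users_email_unique (email)"
# fails with "ERROR 1062: Duplicate entry ... for key 'users_email_unique'"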

Step 2: run builds against prod data

The safest way to make sure that your database migrations are sound is to actually run them against production data, as that is what will ultimately happen on a production deploy anyway.

Fortunately we do not need to run every build against a prod snapshot, as we are only interested in builds where something within the /database/migrations/ folder changes.

I created an additional Jenkins job that runs on every PR and, with the help of a little bash + the GitHub API, checks whether a migration was actually part of the code changes – only then will the build proceed any further.

I am taking advantage of dbs from step 1: thanks to its fast restore capability I can run artisan migrate nearly every minute without the DB losing its original state, which is of course important for repeatable builds.
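
A heavily simplified sketch of what such a job could look like – repo name, token and VM names are placeholders, and the real job additionally reports the result back to GitHub:

#!/bin/bash
#
# simplified, hypothetical sketch of the per-PR database build

set -e

# 1) only continue if the PR actually touches database/migrations/
FILES_URL="https://api.github.com/repos/acme/webapp/pulls/${PR_NUMBER}/files"
if ! curl -s -H "Authorization: token ${GITHUB_TOKEN}" "${FILES_URL}" \
     | grep -q '"filename": "database/migrations/'; then
  echo "no migrations changed - skipping db build"
  exit 0
fi

# 2) roll the build VM back to last night's snapshot (see step 1)
VBoxManage controlvm dbs-build poweroff || true
VBoxManage snapshot dbs-build restorecurrent
VBoxManage startvm dbs-build --type headless
sleep 30   # give MySQL inside the VM a moment to come up

# 3) run the migrations against the anonymised production data and time them
#    (the app's DB_HOST points at dbs-build for this job)
start=$(date +%s)
php artisan migrate --force
echo "migrations took $(( $(date +%s) - start ))s"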

Once done, it reports back the time it took, which is a nifty indicator of whether the DB migration is heavy enough that an elevated error rate might be expected on the later production deploy:

github build statuses

The console output of the job gives a little more indication of what is happening and why the build got triggered:

dbunit.sh

 

Setting up a proper DB build pipeline and fully integrating it into our CI achieved the following goals:

  • full confidence in any database migrations being introduced
  • full visibility on the duration of database migrations, as a "pre-warning" of potential problems later on the production deploy
  • due to the usage of "dbs" (i.e. real, restorable snapshots) this can be done cheaply and fast (3-minute builds) even for larger databases (>10GB)

 

So, curious: what problems did you have to solve for your database workflow / environment, and what solutions did you come up with? 🙂

ELK on AWS ElasticSearch + ElasticBeanstalk + Laravel

NewRelic is a fantastic tool to get great insights into your application and the services surrounding it. It collects a massive amount of data and makes it easily accessible. Almost every metric and dashboard they offer is crucial to any DevOps or Cloud Engineer.

Now that Elastic has acquired Packetbeat – which makes the stack essentially similar in functionality to NewRelic's agent (i.e. you can now collect not only data from log files, but also system metrics and external-service data via network sniffing) – can the ELK stack, as an open-source alternative, replace NewRelic?

tl;dr: almost 🙂

I already did a post back in 2015 when I first got in touch with the ELK stack; this time, however, I will go into a little more detail and offer a full installation guide bringing together the following components:

  • ELK (ElasticSearch, Logstash & Kibana)
  • AWS ElasticSearch Service
  • ElasticBeanstalk (via ebextension)
  • Laravel (exception logs)
Conveniently Amazon Web Services now offers ElasticSearch as a Service, so it is no longer necessary to maintain a self-hosted version on EC2.

1) Create ElasticSearch Domain

The setup is pretty boring, but you might want to do something along the lines of the following screenshots:

Set the name of the ElasticSearch instance.
Set the ElasticSearch cluster dimension/size.
Set the ElasticSearch storage.

In our setup we will not communicate directly with ElasticSearch; instead, instances will send their data via filebeat (formerly known as logstash-forwarder) to a Logstash instance. Hence we only need to whitelist the public and internal IP of the Logstash instance (see step 2).
We end up receiving our ElasticSearch endpoint. Remember: AWS ships with Kibana pre-installed – for your convenience.
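
If you prefer the CLI over clicking through the console, roughly the same can be achieved with something along these lines (domain name, instance sizes and the access-policy file are just example values):

$ aws es create-elasticsearch-domain \
  --domain-name webapplogs \
  --elasticsearch-cluster-config InstanceType=t2.micro.elasticsearch,InstanceCount=1 \
  --ebs-options EBSEnabled=true,VolumeType=gp2,VolumeSize=20 \
  --access-policies file://es-access-policy.json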

2) Create SSL certificate

We will need an SSL certificate to establish a secure and authenticated connection between the agents/instances and Logstash. This might not be needed if you are running everything within the same VPC, though.
The next few steps may seem a little surreal… but trust me, it works. Please set the correct IP of your Logstash instance:

(openssl.cnf)

[ req ]
#default_bits  = 2048
#default_md  = sha256
#default_keyfile  = privkey.pem
distinguished_name = req_distinguished_name
attributes  = req_attributes
req_extensions = v3_req

[ req_distinguished_name ]
countryName   = Country Name (2 letter code)
countryName_min   = 2
countryName_max   = 2
stateOrProvinceName  = State or Province Name (full name)
localityName   = Locality Name (eg, city)
0.organizationName  = Organization Name (eg, company)
organizationalUnitName  = Organizational Unit Name (eg, section)
commonName   = Common Name (eg, fully qualified host name)
commonName_max   = 64
emailAddress   = Email Address
emailAddress_max  = 64

[ req_attributes ]
challengePassword  = A challenge password
challengePassword_min  = 4
challengePassword_max  = 20

[ v3_req ]
subjectAltName=@alt_names
subjectKeyIdentifier = hash
authorityKeyIdentifier = keyid:always,issuer
basicConstraints = CA:true

[alt_names]
IP.1 = XXX.XXX.XXX.XXX

And then do the following steps:

 

$ sudo mkdir -p /etc/pki/tls/certs
$ sudo mkdir /etc/pki/tls/private
$ sudo openssl req -x509 -nodes -days 3650 -newkey rsa:4096 \
  -keyout /etc/pki/tls/private/logstash.key \
  -out /etc/pki/tls/certs/logstash.crt \
  -config /etc/ssl/openssl.cnf \
  -extensions v3_req

$ sudo chown logstash: /etc/pki/tls/private/logstash.key /etc/pki/tls/certs/logstash.crt
$ sudo chmod 600 /etc/pki/tls/private/logstash.key /etc/pki/tls/certs/logstash.crt

 

This whole custom configuration is necessary so the certificate can be correctly verified by both Logstash and the beats. Basically, we are creating a self-signed certificate with the IP of the Logstash instance as SAN (Subject Alternative Name – IP).
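
You can quickly double-check that the SAN actually made it into the certificate:

$ openssl x509 -in /etc/pki/tls/certs/logstash.crt -noout -text | grep -A1 'Subject Alternative Name'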

3) Logstash

Next we will need an EC2 instance that will run Logstash, thus be responsible for receiving logs & metrics from our application servers and passing them through to our ElasticSearch endpoint.
It won't need a lot of resources, so you can start with a t2.medium and work your way up if needed. Additionally we are going to host an nginx reverse-proxy for the Kibana endpoint. This allows us to "bridge" the auth-system of AWS and replace it with our own simple http-auth.
Logstash is a Java application, so you will have to install Java first – if you are on Ubuntu or Debian you can use my java ansible role to do so 🙂
Use something similar to the following as your nginx vhost config:
(nginx-vhost.conf)

 

server {
  listen 80;
  server_name kibana.acme.com;

  proxy_set_header Host $host;
  proxy_set_header X-Real-IP $remote_addr;
  proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
  proxy_set_header X-Forwarded-Proto $scheme;

  auth_basic "/dev/null";
  auth_basic_user_file /etc/nginx/htpasswd.conf;
  proxy_set_header Authorization "";

  location /.kibana-4 {
    proxy_pass https://search-webapplogs-xxx.eu-west-1.es.amazonaws.com;
  }

  location ~* ^/(filebeat|topbeat|packetbeat)- {
    proxy_pass https://search-webapplogs-xxx.eu-west-1.es.amazonaws.com;
  }

  location ~ ^/_(aliases|nodes)$ {
    proxy_pass https://search-webapplogs-xxx.eu-west-1.es.amazonaws.com;
  }

  location ~ ^/.*/_search$ {
    proxy_pass https://search-webapplogs-xxx.eu-west-1.es.amazonaws.com;
  }

  location ~ ^/.*/_mapping$ {
    proxy_pass https://search-webapplogs-xxx.eu-west-1.es.amazonaws.com;
  }

  location / {
    proxy_pass https://search-webapplogs-xxx.eu-west-1.es.amazonaws.com/_plugin/kibana/;
  }
}
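
The auth_basic_user_file referenced above still needs to be created, e.g. with htpasswd (part of apache2-utils on Debian/Ubuntu):

$ sudo htpasswd -c /etc/nginx/htpasswd.conf kibana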

 

Now download and install Logstash:

$ wget https://download.elastic.co/logstash/logstash/packages/debian/logstash_2.1.1-1_all.deb
$ sudo dpkg -i logstash_2.1.1-1_all.deb

The following Logstash config files have to be put under /etc/logstash/conf.d/

$ wget \
  https://raw.githubusercontent.com/elastic/beats/master/topbeat/etc/topbeat.template.json \
  https://raw.githubusercontent.com/elastic/beats/master/packetbeat/etc/packetbeat.template.json \
  https://raw.githubusercontent.com/logstash-plugins/logstash-output-elasticsearch/master/lib/logstash/outputs/elasticsearch/elasticsearch-template.json
$ mv elasticsearch-template.json /etc/logstash/filebeat-template.json
$ sed -i 's/logstash/filebeat/' /etc/logstash/filebeat-template.json

(01-beats-input.conf)

input {
  beats {
    port => 5044
    ssl => true
    ssl_certificate => "/etc/pki/tls/certs/logstash.crt"
    ssl_key => "/etc/pki/tls/private/logstash.key"
  }
}

This will accept connections from beats on port 5044 if the SSL certificate matches.
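
From one of the application servers you can verify that the input is reachable and that the certificate validates (replace the IP with your Logstash host):

$ openssl s_client -connect XXX.XXX.XXX.XXX:5044 -CAfile /etc/pki/tls/certs/logstash.crt </dev/null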

(10-syslog.conf)

filter {
  if [type] == "syslog" {
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:[%{POSINT:syslog_pid}])?: %{GREEDYDATA:syslog_message}" }
      add_field => [ "received_at", "%{@timestamp}" ]
      add_field => [ "received_from", "%{host}" ]
    }
    
    syslog_pri { }
    
    date {
      match => [ "syslog_timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
  }
}

Simple syslog configuration/grok.

(11-apache.conf)

filter {
  if [type] == "apache" {
    grok {
      match => { "message" => "%{IP:clientip} - - [%{HTTPDATE:timestamp}] %{HOSTNAME:domain} "%{WORD:verb} %{URIPATHPARAM:request} HTTP/%{NUMBER:httpversion}" %{NUMBER:response:int} %{NUMBER:bytes:int} "(?:%{URI:referrer}|-)" %{QS:agent}" }
    }

    date {
      match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
    }

    if [clientip] {
      geoip {
        source => "clientip"
        target => "geoip"
        add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
        add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}"  ]
      }
      
      mutate {
        convert => [ "[geoip][coordinates]", "float" ]
      }
    }
  }
}

Apache access-log configuration. Will also try to resolve the clientip to a geolocation.

(12-laravel.conf)

filter {
  if [type] == "laravel" {
    multiline {
      pattern => "^["
      what => "previous"
      negate=> true
    }

    grok {
      match => { "message" => "(?m)[%{TIMESTAMP_ISO8601:timestamp}] %{WORD:env}.%{LOGLEVEL:severity}: %{GREEDYDATA:content}" }
    }

    mutate {
      replace => [ "message", "%{content}" ]
      remove_field => [ "content" ]
    }
  }
}

Multi-line Laravel exception logs parser.

(30-es-output.conf)

output {
  elasticsearch {
    hosts => ["search-webapplogs-xxx.eu-west-1.es.amazonaws.com:80"]
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
    template_overwrite => true
    template => "/etc/logstash/filebeat-template.json"
    template_name => "filebeat"
  }
}

Finally push it to our ElasticSearch endpoint.

Let's give it a try:

$ sudo /etc/init.d/logstash restart
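
If Logstash does not come up cleanly, it is worth validating the config files first (binary path as installed by the 2.x deb package):

$ sudo /opt/logstash/bin/logstash --configtest -f /etc/logstash/conf.d/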

Manually set index templates for topbeat and packetbeat:

$ curl -XPUT 'http://search-webapplogs-xxx.eu-west-1.es.amazonaws.com/_template/topbeat' -d@topbeat.template.json
$ curl -XPUT 'http://search-webapplogs-xxx.eu-west-1.es.amazonaws.com/_template/packetbeat' -d@packetbeat.template.json

4) ElasticBeanstalk ebextension

As with my other ebextensions, I like writing the heavy lifting in pure bash; this also allows me to enable certain ebextensions on a project-by-project basis by setting activator params/envvars.
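
For example, the BEATS activator used below can be toggled per environment via the EB CLI (or the console):

$ eb setenv BEATS=enable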

(12-beats.config)

# beats
#
# Author: Gunter Grodotzki 
# Version: 2016-01-18
#
# install and configure beats
# BEATS: enable
container_commands:
  01-beats:
    command: ".ebextensions/beats.sh"

(beats.sh)

#!/bin/bash
#
# Author: Gunter Grodotzki (gunter@grodotzki.co.za)
# Version: 2016-01-18
#
# install and configure beats

set -e

if [[ "${BEATS}" == "enable" ]]; then

  export HOME="/root"
  export PATH="/sbin:/bin:/usr/sbin:/usr/bin:/opt/aws/bin"

  # lets do everything inside .ebextensions so it will clean itself
  cd .ebextensions

  # set optimized LogFormat
  sed -i '/^\s*LogFormat/d' /etc/httpd/conf/httpd.conf
  sed -i '/^\s*CustomLog/d' /etc/httpd/conf/httpd.conf

  cat <<'EOB' > /etc/httpd/conf.d/10-logstash.conf
SetEnvIf Remote_Addr "::1" dummy
SetEnvIf Remote_Addr "127.0.0.1" dummy
LogFormat "%a - - %t %{Host}i "%r" %>s %B "%{Referer}i" "%{User-Agent}i"" combined
CustomLog "logs/access_log" combined env=!dummy
EOB

  # add bash_history logging
  echo 'PROMPT_COMMAND='"'"'history -a >(tee -a ~/.bash_history | logger -t "$USER[$$]")'"'"'' > /etc/profile.d/logstash.sh

  # add key
  mkdir -p /etc/pki/tls/certs

  cat <<'EOB' > /etc/pki/tls/certs/logstash.crt
ENTER HERE THE CONTENT OF THE SSL CERTIFICATE WE CREATED
EOB

  # install beats
  packages=( filebeat-1.0.1 topbeat-1.0.1 packetbeat-1.0.1 )
  for package in "${packages[@]}"; do
    if ! rpm -qa | grep -qw ${package}; then
      rpm -i ${package}-x86_64.rpm
    fi
  done

  # configure filebeat
  cat <<'EOB' > /etc/filebeat/filebeat.yml
filebeat:
  prospectors:
    -
      paths:
        - "/var/log/secure"
        - "/var/log/messages"
      document_type: syslog
    -
      paths:
        - "/var/log/httpd/access_log"
      document_type: apache
    -
      paths:
        - "/var/app/current/storage/logs/laravel*"
      document_type: laravel
output:
  logstash:
    hosts: ["IP.OF.LOGSTASH:5044"]
    tls:
      certificate_authorities: ["/etc/pki/tls/certs/logstash.crt"]
EOB

  # configure topbeat
  cat <<'EOB' > /etc/topbeat/topbeat.yml
input:
  period: 10
  procs: [".*"]
  stats:
    system: true
    proc: true
    filesystem: true
output:
  logstash:
    hosts: ["IP.OF.LOGSTASH:5044"]
    tls:
      certificate_authorities: ["/etc/pki/tls/certs/logstash.crt"]
EOB

  # configure packetbeat
  cat <<'EOB' > /etc/packetbeat/packetbeat.yml
interfaces:
  device: eth0
  type: af_packet
protocols:
  memcache:
    ports: [11211]
  mysql:
    ports: [3306]
  redis:
    ports: [6379]
output:
  logstash:
    hosts: ["IP.OF.LOGSTASH:5044"]
    tls:
      certificate_authorities: ["/etc/pki/tls/certs/logstash.crt"]
EOB

  # start + enable beats
  /etc/init.d/filebeat restart > /dev/null 2>&1
  /etc/init.d/topbeat restart > /dev/null 2>&1
  /etc/init.d/packetbeat restart > /dev/null 2>&1
  chkconfig filebeat on
  chkconfig topbeat on
  chkconfig packetbeat on

fi

5) Kibana

The first time you visit your Kibana installation in your browser you will have to add the beats inputs (filebeat-*, topbeat-* and packetbeat-*) as seen here:

6) Curation

The way ELK works, data will keep on growing. Mainly because of cost, you might want to throw away older logs.

You can easily do this with curator and a cronjob:

$ sudo apt install python-pip python-dev
$ sudo pip install pyasn1
$ sudo pip install --upgrade ndg-httpsclient
$ sudo pip install elasticsearch-curator

Run at midnight:

$ curator --port 80 --host search-webapplogs-xxx.eu-west-1.es.amazonaws.com delete indices --older-than 35 --time-unit days --timestring '%Y.%m.%d'
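
Wired up as a cronjob this could look like the following (the path to curator depends on where pip installed it):

# /etc/cron.d/curator
0 0 * * * root /usr/local/bin/curator --port 80 --host search-webapplogs-xxx.eu-west-1.es.amazonaws.com delete indices --older-than 35 --time-unit days --timestring '%Y.%m.%d'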

DONE! Phew… wowsers… Creating all those fancy dashboards is out of scope here, though. You can try to bootstrap your Kibana with ready-made configurations: elastic/beats-dashboards.

As of now I wasn't able to get packetbeat working with RDS. And there are still some features missing before it can fully replace NewRelic (though other features are much better – like actually being able to search your logs) – but I am very keen to see what else might come this year.

Update (2016-02-03):

I actually forgot to do some stuff which meant geo_point and some stuff on topbeat/packetbeat were not working 😉

The Rise and Fall of PHPclasses

I started playing around with PHP somewhere between 1999 and 2000, which does not necessarily make me a good PHP developer, but I have seen a lot of PHP history. Part of that history is PHPclasses, which existed way before GitHub, Packagist and all the fanboying in the PHP community.

It was my go-to destination for PHP libraries, as I had an early dislike for PEAR (for no real reason actually) and there was no real alternative (except maybe SourceForge??). I even "contributed" a shitty class (SOCKS5 Client) in 2008 (I did a modernisation on GitHub), but even though PHPclasses was in my bookmarks (and I occasionally stumbled upon it via Google) it was always very annoying to use:

  • forced registration to view source code
  • some greatly talented developers, but also a lot of low-quality "libraries" (usually mixed up with hardcoded HTML)
  • nothing against advertisement, but the placement combined with the "layout" did not make it fun to browse
  • the layout was shitty back then (even at the time it felt like something from the 80s); the recent relaunch still feels like the 80s, just with more graphics
  • no repository
  • mostly "one author = one library" as opposed to the collaboration you often find on GitHub

Today Phil Sturgeon, whom I had the pleasure of meeting at the PHP mini-conference in Cape Town this year (no idea who he was before that, but a funny, outgoing guy who seems to know more about PHP programming than I do), kicked off a wave, saying what many of us developers were not saying out loud:

PHPclasses.org sucks!

 

Do me a favour. Tweet this way with your opinion; A) You use and love @PHPclasses. B) You wish it would fuck the fuck off. C) Other.
— Phil Sturgeon (@philsturgeon) August 4, 2014

The response on Twitter was huge, and you can most probably guess which answer got the most votes.

Funnily enough, Manuel Lemos decided to chip in – turns out he sucks* as well. Very stubborn and failing to acknowledge feedback, he still insists that forcing user registration to view source code is the right thing to do.
Sites like GitHub are the perfect example that you can let guests view source code and only require registration for "ratings" or "subscriptions". And the developers do not get any less "fame".

@philsturgeon That is your opinion and it is OK for you to disagree with the other near 1.3 million registered users of @phpclasses .
— Manuel Lemos (@manuellemos) August 4, 2014

Anyhow, for the same reason that PHP: The Right Way should be preferred over old sites like tizag or tutorialspoint: if you are new to PHP, please do not go to PHPclasses.org, but use Packagist (you do not even need to use Composer) or GitHub to find useful PHP libraries that will help you accelerate development on your next project.

I do hope PHPclasses.org will get a grip and do a massive relaunch, but I doubt someone as stubborn as Manuel can pull it off. The decline of its Alexa ranking should indicate that "the-PHPclasses-way" will not last forever.

*Update: Matthias N. noted in the comments that it sounded mean. I just want to clarify: this is not meant as a personal attack on ML, as I do not know him personally and therefore would have no reason to attack him personally. Saying "he sucks" is my simplified opinion of him, based on how he has been reacting to criticism for a long time. It is a shame, since his website had quite a lot of potential & reach.
Additionally, while I am very "harsh" in my post, I am also not shy to wish him the best with the project and hopefully a "turn" in his current path.

Killing MySQL Slow Queries with Xcache

I currently manage a high-traffic image hoster with 10 million page impressions per day, which has been putting high load on the web frontend server and the DB backend server for some months now. My budget did not allow me to scale horizontally, so I had to optimise the web application by killing a slow MySQL query with the help of Xcache. Due to the website's structure I was not able to use the Smarty caching function, as this would easily generate 2 million files and cause high disk I/O.

Pictures say more than words… so have a look at the screenshot of the MySQL server's load before and after the optimisations (which went live on 6th October) – it's like day and night 😉

Our web server uses Lighttpd 1.5 / SVN + PHP-FPM 5.3.3 (guys… spawn-fcgi is deprecated 😉 ) + Xcache (PHP accelerator and varcache) to deliver static files and dynamic pages, which connect to a separate MySQL 5.0 server.

Unfortunately, with a load of 5-10 (8 CPU cores) and 60-100% CPU usage on each core (!!), our MySQL server was pretty much overloaded 😉.

A big downside of having bottlenecks in your PHP script – usually caused by relying on external resources (like file_get_contents, cURL, massive non-asynchronous DNS lookups, MySQL queries, etc.) – is obviously the much higher execution time. This results in a lot of PHP (or even worse, Apache) processes being spawned or in use. You will easily end up with an overfilled backlog or, in the worst case, your web server will start swapping – either way your website will slow down dramatically and you will lose a lot of visitors.

At first I checked php-fpm.log.slow for scripts with overly long execution times, just to make sure this was not a PHP problem. A lot of scripts were hanging during mysql_query() – so it was pretty clear where to look next.

Next I took a look at the MySQL slow query log and summarised the queries which appeared most often. I was able to filter out the following query (simplified):

SELECT DISTINCT col1, col2 FROM table WHERE col3 = col4 AND id IN (SELECT id FROM table2 WHERE x = $variable) AND id IN (SELECT id FROM table3 WHERE a = $variable) OR col5 = 1

A query with two sub-SELECTs and a DISTINCT did not sound fast to me – especially not at the frequency at which it was requested, which was the key factor, as running it on an otherwise idle MySQL server did not cause any problems.

Before putting the query into Xcache I checked all the conditions and figured out that "… OR col5 = 1" was never true, as no current data had that value. I decided that if a feature/function based on that condition had not been used in years, it would not be needed in the future either, so I removed it.

Now I was finally ready for Xcache. This is just a very simple example of how to cache the result of an individual SQL query like

SELECT * FROM table WHERE name = '$variable'

in your PHP-Script:

$key = "prefix_" . md5($variable);

if (xcache_isset($key))
{
    // cache hit: reuse the rows stored earlier
    $result = xcache_get($key);
}
else
{
    // cache miss: fetch the rows into an array first - a raw mysql result resource cannot be cached
    $result = array();
    $res = mysql_query("SELECT * FROM table WHERE name = '" . mysql_real_escape_string($variable) . "'");
    while ($row = mysql_fetch_assoc($res))
    {
        $result[] = $row;
    }
    xcache_set($key, $result, 60 * 60 * 6);
}

So from now on, each SQL query will only hit MySQL once every 6 hours. Remember: we don't want to fill up our Xcache for no reason… so try to only SELECT the columns you really need. The biggest advantage of Xcache, in contrast to other caching systems (e.g. Smarty cache), is that it has a garbage collector! So you don't need to worry about zombie cache entries. Just try not to run out of memory, i.e. assign enough memory for your needs in php.ini under the xcache section.

And set a reasonable time-to-live: not too short, so enough data gets cached and the load goes down, but not too long either, which could lead to excessive memory usage.
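
The relevant knobs live in the [xcache] section of your php.ini – illustrative values only, tune them to your own traffic:

xcache.var_size = 256M          ; memory available to xcache_set()/xcache_get()
xcache.var_count = 1            ; number of variable-cache segments
xcache.var_gc_interval = 300    ; collect expired entries every 5 minutes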

That's all 🙂

I was able to lower the load and CPU usage of our MySQL server by approx. 850%! How about you – were you able to optimise your website? Show off your awesomeness! I did, by committing with the following comment into SVN 🙂