QCON London 2017
I found this experience really valuable: getting insights into how people have solved big problems, and how they got pwned. What is more, it was good to see that I am personally going in the right direction (or at least, let's say, following the trends).
These are the technologies and trends that caught my attention as the most discussed:
- Luigi/Airflow – Tools for pipelining jobs
- Java 9
- Data Science
- Machine learning for prediction and fraud detection
I chose to follow two main paths, with some extras here and there: Machine Learning/Data Science and Microservices, plus what I will call "miscellaneous". Below is an overview of the most interesting things:
Machine Learning/Data Science
Lots of exciting things were presented here: from beginner topics like how to clean data, to high-level scientific languages such as Julia, used mainly at universities around the world for research but now finding its way into industry.
Here we saw, at a really high level, how Google used deep learning to help mobile Gmail users respond to emails, by composing candidate replies based on pattern recognition. Spoiler alert: they used a top-secret number of emails to train the system, with a top-secret architecture and a top-secret algorithm to anonymize the data 😉
On the more basic/introductory side, we got some advice on how to manage pipelines with Luigi/Airflow, in order to make models ready for production. Two more things surprised me. First, we heard about use cases where the data was not even properly typed: the true type can differ from the schema type. I never thought this would happen in industry. Second, the scientific community is highly reluctant to adopt some common engineering practices, like unit testing. The only way the speaker found to convince them was to tell them that unit tests work like a safety net for upgrading libraries; they love to use the most recent library (and I don't blame them!). Lesson learned: identify the strengths of each practice, and match them to create added value.
The key learnings from here are:
- Treat the data as immutable as possible.
- All the data is important: keep it, transform it, don't overwrite it. Of course, this comes with costs, and we have to learn how to deal with them.
- Build pipelines, write unit tests, and create routines that are easy to take to production. Corollary: do data science as the engineer you are, not the scientist you are not.
- Data science seems to be 80% cleaning the data and 20% complaining about cleaning the data. Corollary: data is always dirty until proven clean (no matter what the provider says, be it client, project leader, or producer).
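A minimal sketch of the "immutable data plus unit tests" advice above: a cleaning step that returns a new record instead of mutating its input, so every intermediate stage can be kept and tested. The function name and sample records are hypothetical illustrations, not from any talk.

```python
def clean_age(record):
    """Return a NEW record with 'age' coerced to int, or None if dirty."""
    cleaned = dict(record)          # copy: the input record is never mutated
    try:
        cleaned["age"] = int(str(record["age"]).strip())
    except (KeyError, ValueError):
        cleaned["age"] = None       # flag dirty data instead of dropping it
    return cleaned

raw = [{"age": " 42 "}, {"age": "n/a"}]
cleaned = [clean_age(r) for r in raw]

# The unit tests that double as a "safety net" for library upgrades:
assert cleaned == [{"age": 42}, {"age": None}]
assert raw == [{"age": " 42 "}, {"age": "n/a"}]  # originals untouched
```

Because the raw records survive unchanged, a bad cleaning rule can always be fixed and re-run: nothing was overwritten.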
Microservices
The most relevant talk here was given by people from the BBC, about how they decomposed their monolith. The main issue was the low speed of some of their dependencies: systems accessed over HTTP that took up to 5 seconds to respond, and never less than 1 second. Add to that the processing of the actual request, plus 25,000 requests per second, and you have a problem.
The breaking point came on a day when, coinciding with the broadcast of the final episode of a cooking TV show, the system went down, and the public was furious. Five stressful hours went by, serving 500s to the majority of requests, without being able to do anything, since the issue was in the backend dependencies on which their product relied.
Long story short, after analyzing the data, they saw that it didn't change much. For example, metadata for episodes: this kind of data is created every time an episode is released, meaning once a day or even once a week. So they built separate systems for caching, separating the reads and writes of data, and the parts with higher load became easier to scale.
Finally, after iteratively decomposing the monolith, they ended up with independently deployable, scalable, and independently failing components, and a reduction of response time from 5 seconds to 10 milliseconds, but also with some drawbacks, like unneeded services and the delay caused by slow HTTPS connections. The technologies they used are Node.js, Redis, ELK, and Amazon Web Services.
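The caching idea described above can be sketched as a read-through cache: slow backend responses (1 to 5 seconds in the BBC's case) are stored so that repeat reads are served in milliseconds. This is my own illustration, not the BBC's code; in production the store would be something like Redis, while here a plain dict stands in for it.

```python
import time

class ReadThroughCache:
    """Serve reads from a local store; only go to the slow backend on a miss."""

    def __init__(self, backend_fetch, ttl_seconds=3600):
        self._fetch = backend_fetch
        self._ttl = ttl_seconds
        self._store = {}            # key -> (value, expiry timestamp)

    def get(self, key):
        hit = self._store.get(key)
        if hit is not None and hit[1] > time.time():
            return hit[0]           # fast path: no backend call at all
        value = self._fetch(key)    # slow path: hit the backend once
        self._store[key] = (value, time.time() + self._ttl)
        return value

calls = []
def slow_backend(key):
    calls.append(key)               # stands in for a multi-second HTTP call
    return {"episode": key, "title": "Final"}

cache = ReadThroughCache(slow_backend)
cache.get("ep1")
cache.get("ep1")
assert calls == ["ep1"]             # backend was hit only once
```

Separating reads (the cache) from writes (whatever populates the backend) is also what lets the read-heavy side scale independently, as the talk described.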
Miscellaneous
During the first session of each day, the keynote covered some interesting topics. On day one the topic was security in the IoT, and several failures were exposed: a hotel in Austria that suffered a ransomware attack locking all the rooms, twice (yes, same hotel, same attack, twice). Or the recent case of some internet-connected teddy bears that exposed thousands of conversations between children and their parents to the internet. Or an attack on automatic pet feeders that made them stop working for a whole day. "'The cat dies' should never be a failure recovery method," said the speaker. Having seen all this, it is clear the industry is failing to provide even basic security to customers using IoT devices. There are three main challenges: security, refresh rates, and developing standards.
The keynote on the second day was called "Engineering You"; it was more of a self-improvement talk. The main points were:
- What to learn and master? Data structures, math, algorithms, communication.
- How to learn it? Get time by automating repetitive stuff, use feedback cycles, experiment and measure, be honest, revisit and redefine.
On the third day, the keynote "Concurrent Past, Distributed Future" reminded us how old these concepts are: they date from the 1960s. The takeaways were:
- Keep communication in mind.
- Take care of imbalanced producer/consumer queues.
- Safety is elusive.
- Use patterns, and design the architecture for failure.
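The point about imbalanced producer/consumer queues can be illustrated with a bounded queue: a fast producer blocks (backpressure) instead of growing memory without limit. This sketch is my own; the queue size and item counts are arbitrary.

```python
import queue
import threading

q = queue.Queue(maxsize=4)        # the bound is the backpressure point
consumed = []

def consumer():
    while True:
        item = q.get()
        if item is None:          # poison pill: shut down cleanly
            break
        consumed.append(item)

t = threading.Thread(target=consumer)
t.start()

for i in range(100):              # producer is far faster than the consumer
    q.put(i)                      # blocks whenever the queue is full
q.put(None)
t.join()

assert consumed == list(range(100))
```

With an unbounded queue the same imbalance silently accumulates items until memory runs out; the bound turns that failure into visible, designed-for blocking.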
We also got some insights into the new features of Java 9:
- Better memory management.
- Improved performance for locking and graphics.
- Integration of Reactive programming – Publisher/subscriber, reactive streams, multiple consumers.
- Factories for collections.
- Private methods for interfaces.
- Removal of deprecated code. Yes, brace yourselves, code deletion is coming!
What didn't I like? The talks from some big companies were a bit disappointing: lacking in detail and full of platitudes and empty phrases ("trust your people", "empower your people", "embrace change and adapt"). I didn't fly to London to hear those yet another time. I understand that some stuff is top secret; don't make a conference talk about it, then 😉
What did I like? To get firsthand experience from people leading the industry, and to get a perspective on machine learning.
Regarding the "conference user experience", it was good: nice coffee breaks, nice events at the end of the day, rooms with enough seats, and nice food. One bad thing: there was no proper place to sit and eat, which made it a little uncomfortable. I would come back if possible, and I will recommend it.