QCon London 2020

March 17, 2020

Architecture

Last week I attended the QCon London 2020 conference (thanks to mimacom for the opportunity!). It was a great experience where I had the chance to learn from others' experience in software engineering. These are the notes I took on the talks that caught my attention the most:

Why Distributed Systems Are Hard (by Denise Yu)

Why do we use distributed systems anyway? Well, they provide a series of advantages, e.g.:
- Having different systems sets clear boundaries between domain concerns, which makes it possible to assign the domains to different teams
- Teams can release independently
- The resilience of the systems can be improved
With monoliths, computing power scales vertically, and there are physical limits on how fast a CPU can run. Distributed systems, in contrast, allow horizontal scaling: adding more nodes, which also improves the availability of the system.
But what are the hard parts? Latency, memory access, and partial failure, plus everything covered by the fallacies of distributed computing.
Another problem is that some part of the system is always at risk of failing, which makes total distributed consensus impossible. To mitigate this, we can limit who is allowed to write to a shared resource at any point in time. If you are using the leader-follower (also known as master-slave) pattern, make sure to use a proper consensus algorithm, e.g. Raft.
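To make the idea of "limiting who can write" more concrete, here is a minimal sketch of fencing tokens, one common way to do it. This is my own illustration and was not shown in the talk; `LockService` and `FencedStore` are hypothetical names, and in a real system the token counter would live in a service backed by a consensus algorithm instead of a single in-memory object.

```python
class LockService:
    """Hands out a strictly increasing token each time the write lease is granted."""

    def __init__(self) -> None:
        self._counter = 0

    def acquire(self) -> int:
        self._counter += 1
        return self._counter


class FencedStore:
    """A shared resource that rejects writes carrying an outdated token."""

    def __init__(self) -> None:
        self.value = None
        self.highest_token = 0

    def write(self, token: int, value: str) -> bool:
        # A token lower than one already seen belongs to a stale leader.
        if token < self.highest_token:
            return False
        self.highest_token = token
        self.value = value
        return True


locks = LockService()
store = FencedStore()

old_leader = locks.acquire()  # first lease
new_leader = locks.acquire()  # granted later, e.g. after the first lease expired

accepted = store.write(new_leader, "written by the new leader")    # True
rejected = store.write(old_leader, "written by the stale leader")  # False
```

The point of the design is that the resource itself enforces the rule: even if an old leader wrongly believes it still holds the lease, its write is turned away.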
One last recommendation was given: optimize for the humans operating the system. From the abstract of the talk: "The opinions and biases and assumptions of the humans that designed the system are baked into the system. So if you remove the humans, you don't remove that humanness. But by zooming out and focusing on human factors and acknowledging that the humans who build the systems, designed the systems and run the systems are part of that. And they are also a point of failure."

If you want to learn more, the references used for the presentation are at https://deniseyu.io/qcon.

Although the talk was nice in general, I got lost at times: I missed a clearer structure in the presentation of problems versus solutions.

Managing for Mental Wellbeing in the Tech Industry (by Michelle O'Sullivan)

What is Mental Wellbeing? It is when you are able to:
- Realise your own achievements and potential
- Cope with the normal stress of life
- Work productively and fruitfully
- Make a contribution to the community
It was claimed that mental ill-health costs a business the equivalent of 2300 GBP per employee annually.
Good work has a key positive effect on mental health.
What makes up Workplace Wellbeing?
- Demand
- Control
- Support
- Relationships
- Role
- Change
Talk to people! Good and bad mental health look different in each individual, so ask people directly whether they are OK.
Learn more: The study "Thriving at Work: a review of mental health and employers" was mentioned in this talk. In their own words, "Thriving at Work sets out what employers can do to better support all employees, including those with mental health problems to remain in and thrive through work."
Other recommendations on what to do next are:
- Assess psychosocial risks
- Make a plan that includes management of workplace risks
- Train your line managers on managing mental wellbeing
- Get a senior leader champion - storytelling

I found the last one, "Get a senior leader champion - storytelling", particularly important: once someone perceived as senior in their role tells how they have suffered from mental illness and how they managed it, other people in the organization become more receptive and open to talking about their own experience, creating a snowball effect that can change the work culture of a company.

How to Debug Your Team (by Lisa van Gelder)

What do you do when your software is not performing well? You debug it! You can do the same with your team.
As per the book Drive: The Surprising Truth About What Motivates Us, high-performing teams have:
- Mastery
- Autonomy
- Purpose
- Safety
Mastery
- Stop assigning stories to the specialized members of the team, and instead let the next available team member take the story with the highest priority. Of course, this will cause a slowdown at the beginning, but the benefit is that it levels up your team in the long run and avoids bottlenecks caused by the unavailability of a given specialized team member.
- Pair programming.
- Have a skill matrix to provide a clear path of progression and identify which skills are missing. Soft skills should be considered too.
Safety
- Promote a no-blame culture.
Learnings
- There cannot be autonomy without mastery and purpose.

Speeding Up ML Development with MLFlow (by Hien Luu)

I found this talk particularly interesting because, in my experience with Machine Learning projects, it quickly becomes clear that a tool to handle the lifecycle of the models is required. For example, you need to track which model was trained with which data set, or run experiments on the training parameters and track the results for later comparison. The same is true for finding out which data has more influence on the output.

It was not mentioned in the talk, but some of the alternatives to MLflow are PipelineAI and Polyaxon. There is also a nice discussion of the differences against other tools from the point of view of an MLflow contributor.

It was claimed that in Machine Learning (ML) projects, 5% of the code is actual ML code whilst 95% is infrastructure.
The best performance improvements come from a quick iterative loop of model evaluation and feature engineering.
MLflow is an open source platform for the machine learning lifecycle. It provides:
- Tracking - To record and query experiments, with references to code, data, configuration, and results.
- Projects - Packaging format to enable reproducible runs on any platform.
- Models - A format that simplifies model deployment (e.g. via a REST API, or to different cloud vendors) and records details such as which framework was used to create the model.
- Model Registry - To manage the lifecycle of a model.
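To illustrate what the Tracking component is for, here is a tiny sketch of the idea using only the Python standard library. It is deliberately not MLflow's API: `log_run`, `best_run`, and the JSON-lines file are hypothetical names I made up to show which information a tracker keeps per run (parameters, metrics, and a reference to the data).

```python
import json
import time
import uuid
from pathlib import Path


def log_run(store: Path, params: dict, metrics: dict, data_ref: str) -> str:
    """Append one experiment run to a JSON-lines file and return its id."""
    run = {
        "run_id": uuid.uuid4().hex,
        "timestamp": time.time(),
        "data_ref": data_ref,  # which data set the model was trained on
        "params": params,      # training parameters of this experiment
        "metrics": metrics,    # results, kept for later comparison
    }
    with store.open("a", encoding="utf-8") as f:
        f.write(json.dumps(run) + "\n")
    return run["run_id"]


def best_run(store: Path, metric: str) -> dict:
    """Return the recorded run with the highest value for `metric`."""
    with store.open(encoding="utf-8") as f:
        runs = [json.loads(line) for line in f if line.strip()]
    return max(runs, key=lambda r: r["metrics"][metric])
```

A tool like MLflow adds a lot on top of this (a UI, artifact storage, links to the code version), but the core benefit is the same: every experiment leaves a record that can be queried later.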

Keep Calm and Secure Your CI/CD Pipeline (by Sonya Moisset)

Cybersecurity is the set of techniques for protecting computers, networks, programs, and data from unauthorised access or attacks aimed at exploitation.
An example of a security problem was seen during the presidential campaign in the US on 20 - - on a donation website, a JavaScript file was being loaded directly from a GitHub repository, making it vulnerable to a simple pull request.
Another example shows how engineers can be social-engineered: an open source npm module was injected with malicious code in 2018.
How to get started with cybersecurity? Take a look at OWASP, "a nonprofit foundation that works to improve security of software", which suggests, for example, a list of proactive controls:
- Define security requirements
- Leverage security frameworks and libraries
- Secure database access
- Encode and escape data
- Validate all inputs
- Implement digital identity
- Enforce access controls
- Protect data everywhere
- Implement security logging and monitoring
- Handle all errors and exceptions
Some suggestions to prevent security holes:
- Remove unused dependencies and components
- Continuously inventory the version of components and dependencies
- Only obtain components and dependencies from official sources over secure links
- Monitor for components and dependencies that are unmaintained or do not create security patches for older versions
Tools to integrate into the GitHub CI/CD pipeline to improve security:
- Codecov, Codacy, CodeFactor, DeepScan - For static analysis and test coverage
- LGTM, GuardRails, Snyk - Code analysis platforms to identify vulnerabilities
- Pull request size - to measure the size of the pull requests
- Datree - Enforce coding standards and security policies
- DepShield, Dependabot - Detect vulnerabilities in dependencies
- Rollbar - Error tracking
- Lighthouse (Google developers web tools), GuardRails, webhint - Audit or scan your website for vulnerabilities
Use Content Security Policy (CSP) to mitigate certain types of attacks, like XSS or data injection.
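As a rough illustration of what a CSP looks like, the sketch below assembles the value of a `Content-Security-Policy` response header. The helper `build_csp` and the host `https://cdn.example.com` are my own placeholders, not something from the talk; the directive names themselves (`default-src`, `script-src`, `object-src`) are standard CSP directives.

```python
def build_csp(directives: dict[str, list[str]]) -> str:
    """Join CSP directives into a single header value."""
    return "; ".join(
        f"{name} {' '.join(sources)}" for name, sources in directives.items()
    )


policy = build_csp({
    "default-src": ["'self'"],                             # same origin by default
    "script-src": ["'self'", "https://cdn.example.com"],   # placeholder CDN host
    "object-src": ["'none'"],                              # no plugins at all
})

# The web server or framework would then send it as a response header.
headers = {"Content-Security-Policy": policy}
```

With a policy like this, the browser refuses to run scripts from any origin that is not explicitly listed, which takes away a lot of room for injected code.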
Use Subresource Integrity (SRI) to enable browsers to verify that the resources they fetch are delivered without unexpected manipulation.
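Subresource Integrity works by pinning a hash of the expected file in the HTML tag. The sketch below shows how such an `integrity` value can be computed; `sri_hash` and `script_tag` are hypothetical helper names of mine, while the format (algorithm name, a dash, and the base64-encoded digest) follows the SRI specification.

```python
import base64
import hashlib


def sri_hash(content: bytes, algorithm: str = "sha384") -> str:
    """Return an integrity value of the form '<algorithm>-<base64 digest>'."""
    if algorithm not in ("sha256", "sha384", "sha512"):
        raise ValueError("SRI supports sha256, sha384 and sha512")
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def script_tag(url: str, content: bytes) -> str:
    """Build a <script> tag that pins the expected content of `url`."""
    return (
        f'<script src="{url}" integrity="{sri_hash(content)}" '
        'crossorigin="anonymous"></script>'
    )
```

If the file served from the URL ever changes, for instance through a malicious pull request like the one in the donation website example, the hash no longer matches and the browser refuses to execute it.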
Overview
- Open source is an attack vector
- Leverage the existing tools available on GitHub
- Do not push keys to GitHub (or any other version control system)
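For the last point, a simple check before committing already helps. The following is a minimal sketch of such a check, written by me and not taken from the talk: `find_secrets` is a hypothetical helper, and the two patterns (PEM private key headers and strings shaped like AWS access key IDs) are just examples. Dedicated secret scanners cover far more cases.

```python
import re

# Example patterns only; real scanners ship with many more rules.
SECRET_PATTERNS = {
    "private key block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "AWS access key id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def find_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line number, pattern label) for every suspicious line."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, label))
    return hits
```

A function like this could be called from a pre-commit hook on the staged files, failing the commit whenever it returns a non-empty list.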

Other nice thoughts shared or provoked during talks

As IT professionals, we are part of sociotechnical systems.
In Machine Learning projects, most of the value comes from the parts that are not automated.
In Machine Learning projects, it is important to ask: are the purpose and the implementation of the project ethical?
Legacy code is the price of success: it means the project survived long enough to have it.
Legacy code teaches us about leadership: it describes how old leaders solved the problems they had.
Designing software architecture is like placing bets on the aspects we want to optimize for; principles and patterns are added to improve the outcome.
Software Engineering is the art of making compromises.
How the backend is structured also shapes the user experience.

Conclusions

In this blog post, I shared some of the notes I took while attending the talks that I found the most interesting at QCon London 2020. It was a good "conference experience": the organizers set the right tone and made sure everyone felt welcome by establishing a clear code of conduct. Measures were also taken to reduce the risks related to the current coronavirus situation.