People have written entire books about better code, and I certainly still have plenty to learn myself. Nonetheless, I do have some advice for improving your code. This is aimed both at writing code for take-home tasks and at writing code more generally.
Writing Code from the Inside Out
Over time, I’ve come to think of code as something I write from the inside out, rather than something I write from the top down. The core of my code is “what’s the stupidest way I can solve this problem”. You should start there (and commit your work there). Once you’ve done that, you can make things nicer. You can clean up what you wrote and add on some nice bells and whistles. You break things up into smaller functions and maybe multiple files. You take the variable you hard coded and pass it as a parameter. You write some tests.
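To make that concrete, here is a minimal sketch of the two passes. The data, the threshold of 100 and the function names are all made up for illustration; they aren't taken from any real take-home task. The first pass hard codes everything and simply gets an answer. The second pass keeps the same logic but moves it into a function, turns the hard-coded value into a parameter, and adds a small test.

```python
# First pass: the "stupidest thing that works". Everything is hard coded.
transactions = [12.5, 250.0, 3.2, 980.0, 41.0]  # made-up example data
large = []
for amount in transactions:
    if amount > 100:
        large.append(amount)
print(large)


# Second pass: same logic, now in a function with the threshold as a parameter.
def find_large_transactions(amounts, threshold=100.0):
    """Return the amounts strictly greater than threshold, in input order."""
    return [amount for amount in amounts if amount > threshold]


# A first test, written only once the core already works.
def test_find_large_transactions():
    assert find_large_transactions([12.5, 250.0, 3.2], threshold=100.0) == [250.0]
    assert find_large_transactions([], threshold=100.0) == []
```

The point isn't that the second version is the "right" design. It's that it grew out of something that already worked, so there was always a working answer to fall back on.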
It’s true that, with experience, your starting point may get a bit less clunky. However, I would caution against chasing this perfect starting point! Trying to get things nice on the first pass is asking for trouble. I’ve seen quite a lot of take-home tasks where people start out really carefully investigating and cleaning the data, with many comments and tests, and then just run out of time to solve the main problem. I would describe this, in contrast, as writing from the top down. This is a bad habit not just for take-home tasks, but more generally. When you write code professionally, you have many things that need doing and only so much time to do all of them. If you write the perfect version of one task, you may just not finish the other five things you were supposed to do that week. As our Head of Analytics Lucy Griffin says, perfection is the enemy of excellence.
Writing Clean Code
It can sound a bit trivial, but clean code is really, really important. Fundamentally, code is harder to read than it is to write. Code that’s easy to read can last a long time. Code that’s hard to read will die in obscurity. Clean code means that when someone else (including yourself three months from now, who basically counts as a different person) comes to look at your code, they’re not totally lost. If you write code that’s truly a one-off, you can write the messiest, weirdest code you like. But most of the time, when you’re writing professionally, this isn’t the case.
If you’re not familiar with writing clean code, have a look at something like Refactoring Guru.
Using Version Control
Even if you’re working solo, git is your friend! When you first start coding, especially for academia, you tend to write bits of code that run once and then disappear into the aether. When you’re writing code that runs more than once, or that really needs to be right, you want to keep track of what you did over time. This is generally called “version control” and tends to be done with a tool called git. It is particularly useful if multiple people are working on a project, especially when they’re tackling different tasks at the same time. If you’ve never had a look at git before, there’s no time like the present! This is a great resource for getting started.
Working in an IDE
On a related note, have a look at an integrated development environment (an IDE) like IntelliJ or VSCode. These can not only help you work with git but can do many other really useful things for you. Getting started with your first IDE can be a total pain but I promise it’s worth the heartache. Often people find their first encounter so miserable that they just throw up their hands and go back to their Jupyter Notebooks. Notebooks certainly have their place but don’t despair! The beginning really is the horrible bit, and persistence pays off.
With all of that said, I’ll finish with a few suggestions for ways to make your code look nicer without loads of extra effort.
- Your code should run end to end without any errors or warnings. The first thing I do when I get a take-home task is run it and see what happens. I can sometimes forgive a warning, but your code should never throw an error. Also, if you’ve set the code to just ignore warnings, I’ll comment that out and see what I get because that’s cheating. Just because you don’t know about a problem doesn’t mean it doesn’t exist.
- Your IDE will have a shortcut to autoformat your code, and also to optimise imports. Do this! It only takes a second and it makes everything look much more professional.
- Give your variables good names. As one of the principles of clean code, your variable names should describe to someone else what they’re looking at. Abbreviations are where hope goes to die.
- Make sure you read the directions! This may seem obvious, but I get a lot of take-home tasks that seem to have gone down a bit of a rabbit hole and forgot the specific things we asked for. Just take a step back and check in with the original instructions from time to time, to make sure you’re actually solving the problem at hand!
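As a rough illustration of the points about variable names and warnings, here is a small sketch. The functions, names and scores are hypothetical, invented purely for this example. It shows the same calculation written with abbreviations and with descriptive names, and then runs the clearer version with warnings promoted to errors instead of silenced, using Python's standard `warnings` module.

```python
import warnings


# Hard to read: what are d, t and r supposed to be?
def proc(d, t):
    r = [x for x in d if x >= t]
    return len(r) / len(d)


# The same calculation with descriptive names, plus an explicit check
# for empty input (all names here are illustrative).
def fraction_at_or_above(scores, score_threshold):
    """Return the fraction of scores that are >= score_threshold."""
    if not scores:
        raise ValueError("scores must not be empty")
    scores_at_or_above = [score for score in scores if score >= score_threshold]
    return len(scores_at_or_above) / len(scores)


# Rather than ignoring warnings, promote them to errors while checking
# your work, so a problem you didn't know about fails loudly.
with warnings.catch_warnings():
    warnings.simplefilter("error")
    fraction = fraction_at_or_above([0.1, 0.8, 0.95, 0.4], score_threshold=0.5)

print(fraction)
```

Both functions give the same answer on this input, but only one of them tells a reviewer what it is doing without making them trace through the body.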
Getting your foot in the door of any industry can be a frustrating experience, and sometimes I think it can feel like there are secret rules that nobody will tell you. I can’t claim to be the arbiter of such rules, and I’m sure opinions will differ from one person to another. However, I’ve tried to capture some of the points that would have been helpful to me when starting out my own career in data science, and that hopefully will be helpful to you.
I would finally like to offer the observation that hiring is an extremely imperfect process. When I first started looking for jobs, I found rejection gutting. Having been on the other side of things for a while now, I wouldn’t give rejection a second thought. We work really hard at Featurespace to make things as fair as we can: we have rubrics and guides and retrospectives and debates, and we care a lot about getting it right. Though I’m clearly biased, I think we hire great people. But I’m also confident we reject great people, because so much of hiring is subjective. Beyond formatting your CV or using git, by far the most useful skills in job hunting are resilience and persistence. Maybe I’ll get to see some of you in the application pool for future data scientists at Featurespace!
Read Part 1: “Starting Your Data Science Career – Background Knowledge”