Every company has a big demand for data analysis. We love data. We love analysis based on fact, and that means data. But that’s not enough.
Where the developer and analyst roles overlap, the method of doing analysis goes like this:
- Find the data, make sure it is clean and consistent to the nth degree; no approximations.
- Discover the important questions
- Answer the important questions
- Think about what would be the perfect answer to those questions
- Provide the “so what?” analysis
Recently I have been working with a few large datasets where we hoped that a few GROUP BYs and some graphs would let the data speak for itself.
However, I have generally found that this is not the case.
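To make “a few GROUP BYs” concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The `orders` table, its columns, the rows and the `totals_by_region` helper are all invented for illustration; they don’t come from any real dataset.

```python
import sqlite3

# Hypothetical data: (region, amount) rows standing in for a real dataset.
ROWS = [
    ("north", 120.0),
    ("north", 80.0),
    ("south", 200.0),
    ("south", 50.0),
    ("east", 75.0),
]

def totals_by_region(rows):
    """Return (region, order_count, total_amount) tuples, largest total first."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        cur = conn.execute(
            "SELECT region, COUNT(*), SUM(amount) "
            "FROM orders GROUP BY region ORDER BY SUM(amount) DESC"
        )
        return cur.fetchall()
    finally:
        conn.close()

if __name__ == "__main__":
    for region, n, total in totals_by_region(ROWS):
        print(f"{region:<6} {n:>3} {total:>8.1f}")
```

The query answers exactly what was asked (totals per region) and nothing more. Whether that answer matters to the business is still up to you.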
You can ask specific questions and get specific answers, but dashboards don’t just appear out of thin air. And, in the manner of “Zen and the Art of Motorcycle Maintenance”, there are far more questions than you have time to answer. Not all of those questions are interesting, and not all of them will produce a result that you can use to change your business.
So how do we get to the “so what?” answers?
I think an essential part has to be a great tool that lets you try out ideas quickly. That means the tool must:
- Be quick to use, with low friction on getting and manipulating data
- Include plenty of “batteries” that make use of the work of others
- Have a REPL or something like it that allows and encourages you to “play” with data, rather than static processes like ETL
- Have plotting and graphics.
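As a rough sketch of what “playing” with data can look like, the snippet below uses only Python’s standard library: summarise a column, bucket it, and draw a crude text histogram. The sample values and the `text_histogram` helper are hypothetical, and a real session would more likely reach for a proper plotting library.

```python
from collections import Counter
import statistics

# Hypothetical sample: response times in milliseconds.
SAMPLE = [12, 15, 11, 48, 13, 14, 52, 16, 12, 95]

def text_histogram(values, bucket_size):
    """Bucket values and return one 'start-end | ####' line per non-empty bucket."""
    counts = Counter((v // bucket_size) * bucket_size for v in values)
    lines = []
    for start in sorted(counts):
        end = start + bucket_size - 1
        lines.append(f"{start:>3}-{end:<3} | {'#' * counts[start]}")
    return lines

if __name__ == "__main__":
    print("mean  :", statistics.mean(SAMPLE))
    print("median:", statistics.median(SAMPLE))
    print("\n".join(text_histogram(SAMPLE, 20)))
```

With this made-up sample the mean sits well above the median, and the histogram shows why: most values are small, with a few stragglers far out. That is the sort of thing you want to notice in seconds, not after building a pipeline.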
So I looked around, saw this, and really liked it.
It’s a way of turning Python code (notebooks) into interactive documents with living code.
So you put your Python notebook into Git, feed the Git address to this thing, and bingo. It’s the cloud, innit. And the notebooks look great, very much in the manner of Magic Ink. It also looks very familiar from my days using Mathematica (great days they were too).
Just remember, you aren’t making software here! This is the new whiteboard: you throw up some ideas, ram the data in and see what happens. Just because it’s made of code and attached to Git, it doesn’t mean that this is software you want to keep.