I’ve said many a time that doing data analysis in R is not programming, and yet I have the habits of a lifetime to undo.
This is not about encapsulation and abstraction; it’s about getting a result. It’s less about unlearning programming and more about learning the idioms of data analysis, and how to use data analysis tools like RStudio.
I’m working on a fairly involved piece of R code that I’ve been tinkering with and adding to for a few days, and it’s getting a little bit big. I mean, it’s only a few hundred lines, but already (as with PowerShell), without the structure of modules, classes and namespaces, it’s getting messy.
So big, in fact, that I’ve started to separate sections with big comment banners.
When you have a comment banner, you have a problem.
Global variables are not evil
That global variables are evil is such a piece of folklore amongst developers, like “GOTO considered harmful”, that it’s hard to give up.
My program follows the classic data analysis model from R for Data Science: import, tidy, then a cycle of transform, visualise and model, and finally communicate.
I’ve just realised what I should be doing: I should have a few R script files, each of which does a different bit, but they can communicate via global variables!
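To make that concrete, here is a minimal sketch of two scripts talking through the global environment. The file names (`01_import.R`, `02_transform.R`) and variable names are made up for illustration, and the built-in `mtcars` dataset stands in for real data. The scripts are written to a temporary directory only so the example runs on its own; in a real project they would just be files you open in RStudio.

```r
# Hypothetical two-script layout; all file and variable names are illustrative.
script_dir <- tempfile("scripts")
dir.create(script_dir)

# "01_import.R": puts the raw data into a global variable.
writeLines(
  "raw_cars <- mtcars  # stand-in for reading a real data file",
  file.path(script_dir, "01_import.R")
)

# "02_transform.R": picks up the global left behind by the import step.
writeLines(
  "cars_by_cyl <- aggregate(mpg ~ cyl, data = raw_cars, FUN = mean)",
  file.path(script_dir, "02_transform.R")
)

# source() evaluates in the global environment by default (local = FALSE),
# so the second script can see what the first one created.
source(file.path(script_dir, "01_import.R"))
source(file.path(script_dir, "02_transform.R"))

print(cars_by_cyl)
```

Each script can be re-run on its own while you tinker, as long as the variables it reads are already sitting in the environment.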
No! Wait, come back!
RStudio is not an IDE!
RStudio isn’t for programming. It’s a data science whiteboard, for playing with ideas. You keep the data in the environment while you are working, like a clipboard. The variables aren’t even global to the script! They live outside of the script, and they even survive when you shut RStudio and start it again…
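The survival trick is the saved workspace. As far as I understand it, RStudio can write the environment to an `.RData` file when you quit and load it again on startup, depending on your workspace settings. Here is a rough sketch of the same round trip done by hand, assuming a temporary file instead of the project’s `.RData` and a made-up variable called `survey_scores`:

```r
# Illustrative variable; stands in for whatever is in your environment.
survey_scores <- data.frame(id = 1:3, score = c(0.2, 0.5, 0.9))

# Save every object in the global environment to a file.
# Assumption: a temp file is used here rather than the project's .RData.
workspace_file <- tempfile(fileext = ".RData")
save.image(file = workspace_file)

# Simulate closing the session by removing the variable.
rm(survey_scores)

# Loading the file brings the objects back into the global environment.
load(workspace_file, envir = globalenv())
print(survey_scores)
```

Run at the top level of a session, this prints the same data frame that was removed a moment earlier.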
Use the global. I think that RStudio projects are perfect for embracing the global variable.