After half a dozen years in data entry and analysis I can tell you that I love Excel. I use it every day, for both my job and my private life; to help me understand MFP-Copier data pricing and specifications, and mapping out my finances to create a budget. Excel has shown me I can turn almost anything into data that can be analyzed and played with in a Pivot Table. Until recently, you couldn’t tell me I should use anything else.
Over the summer I took an R programming class that took me out of my Excel comfort zone and showed me just how differently data analysis can be when applied in different programs. I used R to create all kinds of statistical models including linear regressions, histograms, cluster analyses, and prediction methods to analyze data in ways that I had never done in Excel. I got a (rough) intro into source code writing and programming language and realized how much more quickly R can perform functions than the complex Excel formulas I had been playing with for years. Data analysis had never looked so shiny and new.
Image Credit: pinterest.com
The Debate Between Excel and R for Data Analysis
I decided to cruise some blogs on the debate between Excel and R for data analysis. Based on the popularity of both programs, it seemed worthwhile to investigate other data analysts’ perspectives and why they may use R over Excel. During my searches I realized that there are several other programs that are arguably better for data analysis (Python, Matlab, SAS, SQL, etc.), but for the sake of simplicity, this blog will only dive into Excel and R. In most instances, I found much broader support for R over Excel with common themes for its preferability listed in the 6 points below:
Image Credit: umass.edu
R can handle very large datasets
Excel is limited in that there are only so many rows and columns per spreadsheet. So when you run out of rows/columns, you’re forced to move to a new tab or a new file. While it’s debatable that needing that many rows or columns of data is unlikely in most circumstances, there are cases where datasets grow over time and eventually the excel spreadsheet will not be able to contain all of that data.
Bottom line: The Excel spreadsheet is finite and this limits the datasets you can use.
R can automate and calculate much faster than Excel
Point 1 brings us to Point 2: I can’t tell you the number of times I’ve had a gigantic file crash because it contains up to 20 tabs chock-full of data, including a Pivot Table, a tab that contains over 6 years’ worth of pricing for 3,000+ products, and countless formulas throughout. Naturally, the file crashes due to the fact that Excel can handle a certain amount of data, but can barely function properly when you use it to capacity. This creates a serious problem when you start losing data because the file seems unable to save when you add any more data to it.
Bottom line: R is able to not only handle huge datasets but can still run efficiently while doing so.
R source code is reproducible
Research any number of R advocate blogs and you’ll find this point is a big one. R source codes can be used repeatedly and with very different datasets in ways that Excel formulas and VBA source codes cannot. There are statistical source codes available that can be applied to any dataset with only a few changes to code and reference data that can then be reapplied several times over very easily. While VBA can run virtually anything R can, it can be much more time consuming, and also limited similarly to Excel. R also has an advantage in that it shows the data and analysis separately, while Excel shows them together (data within formulas).This allows the user to view the data more clearly to correct any errors or see the progression of the data.
Bottom line: Reproducibility of R source code is much more advanced and easy to use than Excel or VBA.
Image Credit: trendct.orgCommunity libraries worth of R source code are available to all
R has been growing in usage and popularity over the past several years and with that, the number of users adding new functions to the available packages and libraries has also increased. This allows any R users access to not only basic statistical functions, but to an increasing number of complex new functions that may be applicable to their data. This creates a community of R users who are extending their knowledge easily to other R users who may require a similar solution to their data.
Bottom line: R promotes sharing of functions to expand libraries with new and different reproducible statistical functions.
R provides more complex and advanced data visualization
Excel can produce several types of basic graphs once you chop up and select the exact data you want to analyze. R is designed to much more easily produce graphs without all the pre-graph work, as well as provide more types of graphs than you’d ever know what to do with. Take a look here (http://shinyapps.stat.ubc.ca/r-graph-catalog/ ) to see the types of graphs R can create. Of course, Excel is perfectly sufficient when it comes to showing simple, straightforward data analysis, but R can take very complicated data and turn it into much easier to understand visual representation.
Bottom line: R can provide advanced data visualization for more complex datasets.
Image Credit: leondangio.pbworks.comR is free, Excel is not
I don’t think I have to dive too deeply into this point; R can be downloaded by anyone anywhere on any platform (even more platforms than Excel).
Bottom line: Everyone loves stuff that’s free.
Now, please don’t think that just because I wrote this blog means I’m going to delete my Excel and never look back. There are plenty of reasons to use Excel over R just as there are reasons to use any program over any another program. Excel is still a powerful tool for smaller datasets, basic data entry, simpler functions and formulas, and viewing raw data. It would likely benefit all data analysts to have a broad understanding of multiple types of programs that can be used to organize and analyze their own specific type of data. Each dataset is unique and analysis depends entirely on the user and what they’re looking to uncover. Don’t be afraid to branch out of your Excel comfort zone; there’s plenty of interesting programs to explore to improve your data analysis.