Setup
This guide was written in R 3.2.3.
R & RStudio
Packages
Next, to install R packages, cd into your workspace and enter the following command in your bash terminal:
R
This starts an interactive R session. From here, you can install any packages you need. For this tutorial, enter the following into your R session, with the name of the package you want to install between the quotes, just to see how it works:
install.packages("")
What is R?
R is a powerful language used primarily for data analysis and statistical computing. R has what we call packages, which can be used for almost any data science task. Packages like dplyr, tidyr, readr, data.table, SparkR, and ggplot2 have made data manipulation, visualization, and computation much easier and faster.
Why use R?
- It's open source
- Roughly 7,800 packages available for computation tasks (at the time of writing)
- High performance computing experience
Comments
In the context of computer science, comments are used for providing details throughout your code. They're particularly useful when you're working on something complex and want to remember why or what you did, as well as for when other people need to read your code and don't have you to explain it to them.
In R, we denote comments with the # symbol, as follows:
# This is a comment!
Data in R
At the core of R is the data we use, and its different forms. In this section, we'll review the different data types R supports and when to use each. But first, we'll begin with variables.
Variables
Imagine you had no memory to store information you need on a regular basis. That would be miserable, right? You'd have to relearn everything each time you needed it. In R, as in any other programming language, variables are the form of 'memory' we use to reference information (or data).
A variable is composed of two parts: its name and its value. The name is how you reference whatever piece or collection of data you need, and the value is what that name refers to. Why do we want to do this? Because without variables, we don't have a way of referencing and reusing data. A value can be many things, including the value of another variable, but in most cases it is one of the data types described below.
In R, there are two common ways of assigning values: = and <-. Typically, though, we use <-, as in my_val <- 4.
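As a minimal sketch of the two assignment operators described above (other_val and copy_val are names made up for illustration), both forms bind the same value to a name, and a variable's value can be taken from another variable:

```r
# Both operators bind a value to a name.
my_val <- 4      # the conventional assignment operator
other_val = 4    # also valid, but less common in R code

# A value can come from another variable.
copy_val <- my_val

print(identical(my_val, other_val))  # [1] TRUE
print(copy_val)                      # [1] 4
```

Sticking to <- throughout a script keeps assignments visually distinct from the = used for naming function arguments.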
Data Types and Operators
Every programming language needs a way to store data and a way to work with it. R, like other languages, breaks data into types and provides different ways to interact with them.
Everything you see or create in R is an object: a vector, a matrix, a data frame, even a variable. R has five basic classes of objects:
- Character
- Numeric (Real Numbers)
- Integer (Whole Numbers)
- Complex
- Logical (TRUE/FALSE)
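To see these classes in practice, here is a short sketch that asks R for the class of one value of each kind using the built-in class() function. The variable names and sample values are arbitrary illustrations:

```r
# One example value per basic class; class() reports the class as a string.
chr_class <- class("Byte")   # "character"
num_class <- class(5.43)     # "numeric"
int_class <- class(2L)       # "integer" (the L suffix marks an integer literal)
cpx_class <- class(2+3i)     # "complex"
lgl_class <- class(TRUE)     # "logical"

print(c(chr_class, num_class, int_class, cpx_class, lgl_class))
```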
Objects of these classes can have attributes, such as the following:
- names
- dimension names
- dimensions
- class
- length
Attributes of an object can be accessed using the attributes() function. We will get into what functions are later.
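As a small illustration, assuming a hypothetical named vector called scores and a matrix called grid, attributes() returns whatever attributes have been set on an object, while length and class are usually read with their own functions, length() and class():

```r
# A named numeric vector carries a "names" attribute.
scores <- c(alice = 90, bob = 85)
print(attributes(scores))   # a list with one element, $names

# A matrix carries a "dim" attribute (rows, columns).
grid <- matrix(1:6, nrow = 2)
print(attributes(grid))     # a list with one element, $dim

# Length and class have dedicated accessor functions.
print(length(scores))       # [1] 2
print(class(scores))        # [1] "numeric"
```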
Challenge
Let's try a challenge together:
Assign the values 1, "Byte", and 5.43 to three variables called var1, var2, and var3.
var1 <- 1
var2 = "Byte"
var3 <- 5.43
print(var1)
print(var2)
print(var3)
[1] 1
[1] "Byte"
[1] 5.43
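As an optional follow-up to the challenge, this sketch checks which class each variable ended up with. Note that a bare 1 is stored as numeric, not integer. The variable var1_int is a hypothetical addition showing how the L suffix requests an integer instead:

```r
# Recreate the challenge variables so this snippet runs on its own.
var1 <- 1
var2 <- "Byte"
var3 <- 5.43

print(class(var1))      # [1] "numeric"
print(class(var2))      # [1] "character"
print(class(var3))      # [1] "numeric"

# Hypothetical extra variable: the L suffix makes the value an integer.
var1_int <- 1L
print(class(var1_int))  # [1] "integer"
```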
Stay tuned for the next post on R data collections!