Overview
This tutorial will introduce you to the fundamentals of R programming, focusing on core data structures and basic operations.
1. Getting Started
What is R?
- Open-source programming language for statistical computing and graphics
- Used widely in data analysis, machine learning, and research
- Extensible through packages (CRAN has over 18,000 packages)
Setting Up
# Check R version
version$version.string
# Install a package (example)
install.packages("ggplot2")
# Load a package
library(ggplot2)
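Calling install.packages() on every run re-downloads the package. A common pattern is to check with requireNamespace() first and install only when the package is missing. The sketch below assumes that is the behaviour you want. The helper name ensure_package is hypothetical and is not part of base R.

```r
# Hypothetical helper: install a package only if it cannot be loaded
ensure_package <- function(pkg) {
  # requireNamespace() quietly returns FALSE when the package is not installed
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Attach the package so its functions can be called by name
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# "stats" ships with R, so this call never triggers a download
ensure_package("stats")
```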
2. Basic Data Structures
Vectors
The most fundamental data structure in R: an ordered sequence of elements that all share the same type.
# Create numeric vectors
x <- c(1, 3, 5, 7, 9) # c() = combine function
print(x)
y <- 2:6 # Colon operator for sequences
print(y)
# Create character vectors
names <- c("Alice", "Bob", "Charlie")
print(names)
# Create logical vectors
logical_vec <- c(TRUE, FALSE, TRUE, TRUE)
print(logical_vec)
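A vector can hold only one type, so c() silently converts mixed inputs to the most flexible type present (logical, then integer, then double, then character). Here is a small illustration:

```r
# Mixing types forces coercion to a common type
mixed <- c(1, "two", TRUE)
print(mixed)        # "1" "two" "TRUE"
class(mixed)        # "character"

# Logicals become 1/0 when combined with numbers
num_logical <- c(1.5, TRUE, FALSE)
print(num_logical)  # 1.5 1.0 0.0

# typeof() reports the underlying storage type
typeof(2:6)         # "integer"
typeof(c(1, 3, 5))  # "double"
```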
# Vector operations
x + 2 # Add 2 to each element
x * y # Element-wise multiplication
sum(x) # Sum of elements
mean(x) # Mean of elements
length(x) # Number of elements
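Element-wise operations such as x * y work cleanly because x and y have the same length. When lengths differ, R recycles the shorter vector. If the longer length is not a multiple of the shorter one, R still returns a result but issues a warning:

```r
x <- c(1, 3, 5, 7, 9)
# A length-1 value is recycled across every element (this is how x + 2 works)
x + 2                          # 3 5 7 9 11
# A length-2 vector is repeated as c(10, 20, 10, 20)
c(1, 2, 3, 4) + c(10, 20)      # 11 22 13 24
# Lengths 5 and 2 do not divide evenly, so R warns; here the warning is caught
tryCatch(x + c(10, 20), warning = function(w) conditionMessage(w))
```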
# Indexing (R uses 1-based indexing)
x[3] # Third element
x[c(1, 4)] # First and fourth elements
x[x > 5] # Elements greater than 5
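A few more indexing forms are worth knowing. Negative indices drop elements, which() returns positions instead of values, and indexing past the end yields NA rather than an error:

```r
x <- c(1, 3, 5, 7, 9)
x[-1]          # Drop the first element: 3 5 7 9
x[-c(1, 2)]    # Drop the first two: 5 7 9
x > 5          # The logical vector behind x[x > 5]: FALSE FALSE FALSE TRUE TRUE
which(x > 5)   # Positions where the condition holds: 4 5
x[10]          # Out of range: NA
```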
Matrices
Two-dimensional arrays with elements of the same type.
# Create a matrix
mat <- matrix(1:12, nrow = 3, ncol = 4)
print(mat)
mat <- matrix(1:12, nrow = 3, byrow = TRUE) # Fill by rows
print(mat)
# Matrix operations
dim(mat) # Dimensions (rows, columns)
nrow(mat) # Number of rows
ncol(mat) # Number of columns
# Indexing matrices
mat[2, 3] # Element at row 2, column 3
mat[, 2] # Entire second column
mat[1:2, ] # First two rows
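Note that mat[, 2] returns a plain vector, not a one-column matrix, because R drops dimensions of length 1 by default. Passing drop = FALSE keeps the matrix shape, which matters when later code expects a matrix. Rows can also be chosen with a logical condition on a column:

```r
mat <- matrix(1:12, nrow = 3, byrow = TRUE)
col_vec <- mat[, 2]                 # Dimensions dropped: a length-3 vector
col_mat <- mat[, 2, drop = FALSE]   # Still a 3 x 1 matrix
is.matrix(col_vec)   # FALSE
dim(col_mat)         # 3 1
# Logical row selection: rows whose first column exceeds 1
mat[mat[, 1] > 1, ]
```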
# Matrix arithmetic
mat * 2 # Multiply all elements by 2
t(mat) # Transpose matrix
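The * operator works element by element. True matrix multiplication uses %*%, which requires the inner dimensions to match. A 3 x 4 matrix times its 4 x 3 transpose gives a 3 x 3 result:

```r
mat <- matrix(1:12, nrow = 3, byrow = TRUE)
elementwise <- mat * mat      # Squares each entry; still 3 x 4
product <- mat %*% t(mat)     # (3 x 4) %*% (4 x 3) gives 3 x 3
dim(elementwise)   # 3 4
dim(product)       # 3 3
product[1, 1]      # Row 1 dotted with itself: 1*1 + 2*2 + 3*3 + 4*4 = 30
```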
Factors
Specialized vectors for categorical data (as discussed in “R in Action” Chapter 2).
# Create a factor
gender <- c("Male", "Female", "Male", "Male", "Female")
gender_factor <- factor(gender)
print(gender_factor)
# Examine the factor
gender_factor
levels(gender_factor) # Categories
nlevels(gender_factor) # Number of categories
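By default, factor() sorts levels alphabetically, so "Female" becomes level 1. Internally, a factor stores integer codes that point into its levels, and table() counts the observations in each level. The name gender_custom below is only illustrative:

```r
gender <- c("Male", "Female", "Male", "Male", "Female")
gender_factor <- factor(gender)
levels(gender_factor)      # "Female" "Male" (alphabetical)
as.integer(gender_factor)  # 2 1 2 2 1: the stored codes
table(gender_factor)       # Female: 2, Male: 3

# Supplying levels explicitly changes the level order (and therefore the codes)
gender_custom <- factor(gender, levels = c("Male", "Female"))
as.integer(gender_custom)  # 1 2 1 1 2
```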
# Ordered factors
satisfaction <- c("Low", "Medium", "High", "Medium", "Low")
# These categories are ranked: "Low" < "Medium" < "High".
# The levels argument sets that order.
satisfaction_factor <- factor(satisfaction,
                              levels = c("Low", "Medium", "High"),
                              ordered = TRUE)
# Check ordering
satisfaction_factor[1] < satisfaction_factor[3] # Should be TRUE
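Because the levels of an ordered factor carry a ranking, comparisons and summary functions follow that ranking rather than alphabetical order. A comparison can even take a level written as a plain string:

```r
satisfaction <- c("Low", "Medium", "High", "Medium", "Low")
satisfaction_factor <- factor(satisfaction,
                              levels = c("Low", "Medium", "High"),
                              ordered = TRUE)
# Compare every element against a level given as a string
satisfaction_factor >= "Medium"   # FALSE TRUE TRUE TRUE FALSE
# max() and sort() use the level order
max(satisfaction_factor)          # High
sort(satisfaction_factor)         # Low Low Medium Medium High
```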
Data Frames
Tabular data structure, and the one most commonly used in data analysis: each column is a vector with its own type, and all columns have the same length.
# Create a data frame
id <- 1:5
name <- c("Alice", "Bob", "Charlie", "Diana", "Eve")
age <- c(25, 30, 35, 40, 45)
salary <- c(50000, 60000, 70000, 80000, 90000)
# From the help page (?data.frame): data.frame() creates data frames, tightly coupled
# collections of variables that share many properties of matrices and of lists.
# They are the fundamental data structure used by most of R's modeling software.
# stringsAsFactors = FALSE keeps name as character (the default since R 4.0.0)
employees <- data.frame(id, name, age, salary, stringsAsFactors = FALSE)
# Examine the data frame
head(employees) # First few rows
str(employees) # Structure of the data frame
summary(employees) # Summary statistics
# Accessing elements
employees$age # Age column
employees[, 3] # Third column
employees[2, ] # Second row
employees[employees$age > 30, ] # Rows where age > 30
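Two other operations come up constantly with data frames: adding a derived column and sorting rows. The following is a minimal base-R sketch; the column name salary_k is only illustrative:

```r
employees <- data.frame(
  id = 1:5,
  name = c("Alice", "Bob", "Charlie", "Diana", "Eve"),
  age = c(25, 30, 35, 40, 45),
  salary = c(50000, 60000, 70000, 80000, 90000),
  stringsAsFactors = FALSE
)
# Add a derived column: salary in thousands
employees$salary_k <- employees$salary / 1000
# Sort rows by age, oldest first (order() returns row positions)
by_age_desc <- employees[order(employees$age, decreasing = TRUE), ]
by_age_desc$name   # "Eve" "Diana" "Charlie" "Bob" "Alice"
# Select rows and columns at once: name and salary of those over 30
employees[employees$age > 30, c("name", "salary")]
```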
3. Basic Operations & Functions
Built-in Functions
# Math functions
sqrt(25)
log(10)
exp(1)
max(x)
min(x)
sd(x) # Standard deviation
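sd() uses the sample formula, which divides by n - 1. Working through it by hand for x = c(1, 3, 5, 7, 9) gives a mean of 5 and squared deviations that sum to 40. Then 40 / (5 - 1) = 10, so the standard deviation is sqrt(10), about 3.162. Note also that log() is the natural logarithm; other bases need log10() or the base argument:

```r
x <- c(1, 3, 5, 7, 9)
# Sample standard deviation, step by step
deviations <- x - mean(x)                     # -4 -2 0 2 4
manual_sd <- sqrt(sum(deviations^2) / (length(x) - 1))
all.equal(manual_sd, sd(x))                   # TRUE

# log() defaults to base e
log(100, base = 10)   # 2
log10(100)            # 2
log(exp(1))           # 1
```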
# Statistical functions
set.seed(123) # For reproducibility
random_numbers <- rnorm(100) # 100 draws from the standard normal distribution (mean 0, sd 1)
mean(random_numbers)
median(random_numbers)
quantile(random_numbers)
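set.seed() makes the pseudo-random sequence repeatable: resetting the same seed reproduces the same draws. rnorm() also accepts mean and sd arguments, and quantile() accepts any probabilities through probs:

```r
set.seed(123)
first_draw <- rnorm(5)
set.seed(123)
second_draw <- rnorm(5)
identical(first_draw, second_draw)   # TRUE: same seed, same numbers

# 1000 draws from a normal distribution with mean 50 and standard deviation 10
set.seed(123)
scores <- rnorm(1000, mean = 50, sd = 10)
mean(scores)                             # Close to 50, but not exactly 50
quantile(scores, probs = c(0.05, 0.95))  # 5th and 95th percentiles
```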
Control Structures
# If-else statements
x <- 10
if (x > 5) {
  print("x is greater than 5")
} else {
  print("x is less than or equal to 5")
}
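An if statement tests a single TRUE/FALSE value. To make the same decision for every element of a vector, use the vectorized ifelse(). For more than two branches on a single value, chain else if:

```r
values <- c(3, 8, 5, 12)
# ifelse(test, yes, no) is evaluated element by element
labels <- ifelse(values > 5, "greater than 5", "5 or less")
print(labels)

# Chained if / else if / else for a single value
x <- 5
if (x > 5) {
  category <- "high"
} else if (x == 5) {
  category <- "exactly 5"
} else {
  category <- "low"
}
print(category)   # "exactly 5"
```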
# For loops
for (i in 1:5) {
  print(i^2)
}
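Calling print() inside a loop only displays values. To keep the results, preallocate a vector and fill it by position. seq_along() is safer than 1:length(v) because it gives an empty sequence when v has length zero, whereas 1:0 evaluates to c(1, 0):

```r
inputs <- c(2, 4, 6)
squares <- numeric(length(inputs))   # Preallocate a numeric vector of zeros
for (i in seq_along(inputs)) {
  squares[i] <- inputs[i]^2
}
print(squares)   # 4 16 36

# With an empty input, seq_along() yields no indices, so the loop body never runs
empty <- numeric(0)
seq_along(empty)   # integer(0)
```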
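# Apply functions: sapply() calls the function once per element and simplifies the result
# (more concise than a for loop, though it still loops internally)
sapply(1:5, function(x) x^2)
# Truly vectorized equivalent, usually the most idiomatic and fastest choice
(1:5)^2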
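The anonymous function(x) x^2 can also be given a name and reused. vapply() works like sapply() but requires you to declare the type and length of each result, so unexpected output raises an error instead of silently changing shape. The name square is a hypothetical example:

```r
# A named, reusable function (square is an illustrative name)
square <- function(n) {
  n^2   # The last evaluated expression is the return value
}
square(4)   # 16

# numeric(1) declares that each call must return exactly one number
vapply(1:5, square, numeric(1))   # 1 4 9 16 25

# Because ^ is vectorized, the function also accepts a whole vector
square(1:5)                       # 1 4 9 16 25
```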
4. Practice Exercise
Let’s create and analyze a dataset about students:
# 1. Create the following vectors:
student_id <- 1:10
names <- c("Anna", "Ben", "Claire", "Dan", "Eva",
           "Frank", "Grace", "Henry", "Ivy", "Jack")
gender <- factor(c("F", "M", "F", "M", "F", "M", "F", "M", "F", "M"))
test_scores <- c(85, 76, 92, 68, 90, 72, 88, 79, 95, 81)
# 2. Combine them into a data frame called 'students'
students <- data.frame(student_id, names, gender, test_scores)
# 3. Find the average test score
mean_score <- mean(students$test_scores)
print(paste("Average score:", mean_score))
# 4. Find how many students scored above 80
high_performers <- students[students$test_scores > 80, ]
n_high_performers <- nrow(high_performers)
print(paste("Number of high performers:", n_high_performers))
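Step 4 also works without building an intermediate data frame. A comparison returns a logical vector, and sum() counts TRUE as 1 and FALSE as 0; mean() of the same vector gives the proportion. With the scores above, six of the ten students exceed 80:

```r
test_scores <- c(85, 76, 92, 68, 90, 72, 88, 79, 95, 81)
above_80 <- test_scores > 80
sum(above_80)    # 6: number of TRUE values
mean(above_80)   # 0.6: proportion of students above 80
```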
# 5. Calculate average score by gender
female_scores <- students$test_scores[students$gender == "F"]
male_scores <- students$test_scores[students$gender == "M"]
print(paste("Female average:", mean(female_scores)))
print(paste("Male average:", mean(male_scores)))
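Step 5 splits the scores by hand, which becomes tedious with more than two groups. tapply() applies a function to each group of a vector in one call. aggregate() does the same through a formula and returns a data frame. With these data, the female average is 90 and the male average is 75.2:

```r
students <- data.frame(
  names = c("Anna", "Ben", "Claire", "Dan", "Eva",
            "Frank", "Grace", "Henry", "Ivy", "Jack"),
  gender = factor(c("F", "M", "F", "M", "F", "M", "F", "M", "F", "M")),
  test_scores = c(85, 76, 92, 68, 90, 72, 88, 79, 95, 81)
)
# tapply(values, groups, function): one mean per gender level
by_gender <- tapply(students$test_scores, students$gender, mean)
print(by_gender)   # F: 90, M: 75.2

# aggregate() with a formula: test_scores grouped by gender
aggregate(test_scores ~ gender, data = students, FUN = mean)
```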
5. Resources & Next Steps
- “R in Action” by Robert Kabacoff (Chapters 1-3 for further study)
- R Documentation: ?function_name or help(function_name)
- CRAN: https://cran.r-project.org/
- Online tutorials: RStudio Cheatsheets, DataCamp, Coursera