R(17), Data Frames
A data frame is a table, or two-dimensional array-like structure, in which each column contains the values of one variable and each row contains one set of values, one from each column.
characteristics
- column names should not be empty.
- row names should be unique.
- the data stored in a data frame can be of numeric, factor, or character type.
- each column should contain the same number of data items.
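The last two rules on row names and column lengths can be checked directly. This is only a small sketch: the data frame df and its columns a and b are made-up names for illustration, and tryCatch() is used here just to capture the error message instead of stopping the script.

```r
# Columns of different lengths (3 values vs 2 values) cannot form a data frame.
length_error <- tryCatch(
  data.frame(a = 1:3, b = 1:2),
  error = function(e) conditionMessage(e)
)
print(length_error)

# Row names must be unique, so assigning duplicates is rejected.
df <- data.frame(a = 1:3, b = c("x", "y", "z"), stringsAsFactors = FALSE)
rowname_error <- tryCatch(
  {
    row.names(df) <- c("r1", "r1", "r2")
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(rowname_error)
```

In both cases R raises an error, so the variables hold the error text rather than a data frame.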
# Create the data frame.
emp.data <- data.frame(
  emp_id = 1:5,
  emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
  salary = c(623.3, 515.2, 611.0, 729.0, 843.25),
  start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11", "2015-03-27")),
  stringsAsFactors = FALSE
)
# Print the data frame
print(emp.data)
print the structure
We can use the str() function to print the structure of a data frame.
str(emp.data)
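Besides str(), a few other base R functions report parts of the same information one at a time. The sketch below rebuilds the example data frame so it can run on its own; which of these functions is most convenient is a matter of taste.

```r
emp.data <- data.frame(
  emp_id = 1:5,
  emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
  salary = c(623.3, 515.2, 611.0, 729.0, 843.25),
  start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11", "2015-03-27")),
  stringsAsFactors = FALSE
)

# Number of rows, then number of columns.
print(dim(emp.data))

# Column names.
print(names(emp.data))

# Class of each column, returned as a named character vector.
col_classes <- sapply(emp.data, class)
print(col_classes)
```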
get the summary
We can use the summary() function to get summary statistics for each column and print them.
print(summary(emp.data))
extract data
Extract specific columns from a data frame by using the column names.
result <- data.frame(emp.data$emp_name,emp.data$salary)
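One thing to watch with the approach above is that data.frame() builds the new column names from the expressions passed to it. As an alternative sketch, selecting by name with square brackets keeps the original column names. The variable names by_name and by_dollar are only for illustration.

```r
emp.data <- data.frame(
  emp_id = 1:5,
  emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
  salary = c(623.3, 515.2, 611.0, 729.0, 843.25),
  start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11", "2015-03-27")),
  stringsAsFactors = FALSE
)

# Select two columns by name; the original column names are kept.
by_name <- emp.data[, c("emp_name", "salary")]
print(names(by_name))

# Same values, but the column names are derived from the $ expressions.
by_dollar <- data.frame(emp.data$emp_name, emp.data$salary)
print(names(by_dollar))
```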
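Extract specific rows and columns by using their indexes. For example, to take the first two rows with all columns:
result <- emp.data[1:2,]
or, to take the 3rd and 5th rows with the 2nd and 4th columns:
result <- emp.data[c(3,5),c(2,4)]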
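The row index does not have to be a list of positions. As a rough sketch, a logical condition can be used in the row position to keep only the rows where it is TRUE. The threshold of 600 and the name high_paid are arbitrary choices for this illustration.

```r
emp.data <- data.frame(
  emp_id = 1:5,
  emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
  salary = c(623.3, 515.2, 611.0, 729.0, 843.25),
  start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11", "2015-03-27")),
  stringsAsFactors = FALSE
)

# Keep rows whose salary is above 600, and only the name and salary columns.
high_paid <- emp.data[emp.data$salary > 600, c("emp_name", "salary")]
print(high_paid)
```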
expand
We can add a column to a data frame by assigning a vector to a new column name.
# Add a column named "dept"
emp.data$dept <- c("IT","Operations","IT","HR","Finance")
v <- emp.data
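The new column can also be computed from existing columns, and its length has to fit the number of rows. The sketch below shows both points; the column name above_mean is made up for this example, and tryCatch() is only there to capture the error message.

```r
emp.data <- data.frame(
  emp_id = 1:5,
  emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
  salary = c(623.3, 515.2, 611.0, 729.0, 843.25),
  start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11", "2015-03-27")),
  stringsAsFactors = FALSE
)

# Add a logical column derived from the salary column.
emp.data$above_mean <- emp.data$salary > mean(emp.data$salary)
print(emp.data)

# A vector of 2 values does not fit 5 rows, so the assignment fails.
dept_error <- tryCatch(
  {
    emp.data$dept <- c("IT", "HR")
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(dept_error)
```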
To add more rows to an existing data frame permanently, we need to bring in the new rows in the same structure as the existing data frame and use the rbind() function to combine them.
# Create the second data frame
emp.newdata <- data.frame(
  emp_id = 6:8,
  emp_name = c("Rasmi", "Pranab", "Tusar"),
  salary = c(578.0, 722.5, 632.8),
  start_date = as.Date(c("2013-05-21", "2013-07-30", "2014-06-17")),
  dept = c("IT", "Operations", "Finance"),
  stringsAsFactors = FALSE
)
# Bind the two data frames.
emp.finaldata <- rbind(emp.data,emp.newdata)
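To see why the new rows need the same structure, the sketch below binds the two data frames and then tries again with a frame that is missing the dept column. It rebuilds both data frames so it runs on its own; the name incomplete is only for illustration.

```r
emp.data <- data.frame(
  emp_id = 1:5,
  emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
  salary = c(623.3, 515.2, 611.0, 729.0, 843.25),
  start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11", "2015-03-27")),
  dept = c("IT", "Operations", "IT", "HR", "Finance"),
  stringsAsFactors = FALSE
)
emp.newdata <- data.frame(
  emp_id = 6:8,
  emp_name = c("Rasmi", "Pranab", "Tusar"),
  salary = c(578.0, 722.5, 632.8),
  start_date = as.Date(c("2013-05-21", "2013-07-30", "2014-06-17")),
  dept = c("IT", "Operations", "Finance"),
  stringsAsFactors = FALSE
)

# Same columns in both frames, so the rows are stacked.
emp.finaldata <- rbind(emp.data, emp.newdata)
print(dim(emp.finaldata))

# Without the dept column the structures differ and rbind() fails.
incomplete <- emp.newdata[, c("emp_id", "emp_name", "salary", "start_date")]
bind_error <- tryCatch(
  rbind(emp.data, incomplete),
  error = function(e) conditionMessage(e)
)
print(bind_error)
```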