R(17), Data Frames

Published: by Creative Commons Licence

A data frame is a table, or two-dimensional array-like structure, in which each column contains the values of one variable and each row contains one set of values, one from each column.

characteristics

  1. The column names should be non-empty.
  2. The row names should be unique.
  3. The data stored in a data frame can be of numeric, factor, or character type.
  4. Each column should contain the same number of data items.

# Create the data frame.
emp.data <- data.frame(
   emp_id = 1:5,
   emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
   salary = c(623.3,515.2,611.0,729.0,843.25),
   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11", "2015-03-27")),
   stringsAsFactors = FALSE
)
# Print the data frame
print(emp.data) 
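
The characteristics listed above can be checked directly on the example. The following is a minimal sketch, not a definitive test; the variable names col_names, rows_unique, col_classes and col_lengths are introduced here for illustration only.

```r
# Recreate the example data frame so this snippet runs on its own.
emp.data <- data.frame(
   emp_id = 1:5,
   emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
   salary = c(623.3,515.2,611.0,729.0,843.25),
   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11", "2015-03-27")),
   stringsAsFactors = FALSE
)

# 1. Column names: check that none of them is an empty string.
col_names <- names(emp.data)
print(all(nzchar(col_names)))

# 2. Row names: anyDuplicated() returns 0 when every row name is unique.
rows_unique <- anyDuplicated(rownames(emp.data)) == 0
print(rows_unique)

# 3. Class of each column.
col_classes <- sapply(emp.data, class)
print(col_classes)

# 4. Every column has the same number of entries, equal to nrow(emp.data).
col_lengths <- sapply(emp.data, length)
print(col_lengths)
```

Note that the start_date column is reported as class Date, so the types a column can hold are not limited to the three named in the list.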

print the structure

We can use the str() function to print the structure of a data frame.

str(emp.data)

get the summary

The summary() function returns summary statistics for each column, which we can then print.

print(summary(emp.data))
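
To see what summary() computes for a numeric column, it can also be applied to a single vector. This is a small sketch that reuses the salary values from the example; the names salary and salary_summary are introduced here for illustration.

```r
# The salary values from the example data frame.
salary <- c(623.3, 515.2, 611.0, 729.0, 843.25)

# For a numeric vector, summary() returns the minimum, the quartiles,
# the median, the mean and the maximum as a named object.
salary_summary <- summary(salary)
print(names(salary_summary))

# Individual statistics can be picked out by name.
print(salary_summary[["Median"]])
print(salary_summary[["Max."]])
```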

extract data

Extract specific columns from a data frame by using the column names.

result <- data.frame(emp.data$emp_name,emp.data$salary)
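
An alternative worth knowing is to select the columns by name with the `[` operator, which keeps the original column names. The sketch below compares the two approaches; by_name and by_dollar are illustrative names, not part of the original example.

```r
# Recreate the example data frame so this snippet runs on its own.
emp.data <- data.frame(
   emp_id = 1:5,
   emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
   salary = c(623.3,515.2,611.0,729.0,843.25),
   stringsAsFactors = FALSE
)

# Select columns by name; the result keeps the names emp_name and salary.
by_name <- emp.data[, c("emp_name", "salary")]
print(names(by_name))

# Building a new data frame from emp.data$... derives the column names
# from the expressions, so they carry an "emp.data." prefix.
by_dollar <- data.frame(emp.data$emp_name, emp.data$salary)
print(names(by_dollar))
```

Both results hold the same values; only the column names differ.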

Extract specific rows and columns by using indices. For example, to extract the first two rows with all columns:

result <- emp.data[1:2,]

Or, to extract the 3rd and 5th rows with the 2nd and 4th columns:

result <- emp.data[c(3,5),c(2,4)]
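
Besides numeric positions, the row index can be a logical condition and the column index can be a vector of names. The following is a sketch under the assumption that we want employees with a salary above 600; the threshold and the names high_paid and high_paid2 are made up for illustration.

```r
# Recreate the example data frame so this snippet runs on its own.
emp.data <- data.frame(
   emp_id = 1:5,
   emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
   salary = c(623.3,515.2,611.0,729.0,843.25),
   stringsAsFactors = FALSE
)

# Rows where the condition is TRUE, restricted to two named columns.
high_paid <- emp.data[emp.data$salary > 600, c("emp_name", "salary")]
print(high_paid)

# subset() expresses the same selection without repeating emp.data$.
high_paid2 <- subset(emp.data, salary > 600, select = c(emp_name, salary))
print(high_paid2)
```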

expand

We can add a column to a data frame by assigning a vector to a new column name.

# Add a column named "dept"
emp.data$dept <- c("IT","Operations","IT","HR","Finance")
v <- emp.data
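
The new column has to fit the number of rows already in the data frame. The sketch below adds a derived column and shows that a vector of the wrong length is rejected; the column names salary_rounded and grade, and the variable outcome, are hypothetical and used only for illustration.

```r
# Recreate the example data frame so this snippet runs on its own.
emp.data <- data.frame(
   emp_id = 1:5,
   emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
   salary = c(623.3,515.2,611.0,729.0,843.25),
   stringsAsFactors = FALSE
)

# A new column can be computed from an existing one.
emp.data$salary_rounded <- round(emp.data$salary)
print(emp.data)

# Assigning 2 values to a data frame with 5 rows raises an error,
# which tryCatch() turns into a plain string here.
outcome <- tryCatch({
   emp.data$grade <- c("A", "B")
   "no error"
}, error = function(e) "error")
print(outcome)
```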

To add more rows to an existing data frame, we need to supply the new rows in a data frame with the same structure as the existing one and combine the two with the rbind() function.

# Create the second data frame
emp.newdata <- data.frame(
   emp_id = 6:8,
   emp_name = c("Rasmi","Pranab","Tusar"),
   salary = c(578.0,722.5,632.8),
   start_date = as.Date(c("2013-05-21","2013-07-30","2014-06-17")),
   dept = c("IT","Operations","Finance"),
   stringsAsFactors = FALSE
)

# Bind the two data frames.
emp.finaldata <- rbind(emp.data,emp.newdata)
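
The requirement that both data frames share the same structure can be seen in a short, self-contained sketch. It uses reduced versions of the two data frames (three columns only), and the names incomplete and outcome are introduced here for illustration.

```r
# Reduced versions of the two data frames, with identical columns.
emp.data <- data.frame(
   emp_id = 1:5,
   emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
   dept = c("IT","Operations","IT","HR","Finance"),
   stringsAsFactors = FALSE
)
emp.newdata <- data.frame(
   emp_id = 6:8,
   emp_name = c("Rasmi","Pranab","Tusar"),
   dept = c("IT","Operations","Finance"),
   stringsAsFactors = FALSE
)

# With matching columns, rbind() stacks the rows: 5 + 3 = 8 rows.
emp.finaldata <- rbind(emp.data, emp.newdata)
print(dim(emp.finaldata))

# A data frame that lacks the dept column cannot be bound to emp.data;
# tryCatch() turns the resulting error into a plain string here.
incomplete <- emp.newdata[, c("emp_id", "emp_name")]
outcome <- tryCatch({
   rbind(emp.data, incomplete)
   "no error"
}, error = function(e) "error")
print(outcome)
```

Note that rbind() returns a new data frame and leaves emp.data unchanged, so the result has to be assigned to a name to be kept.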