R(19), Data Interface - CSV

Published: by Creative Commons Licence

We will learn how to read data from CSV data, and how to write data into CSV file.

We should make sure that the CSV file exists in the current working directory and read files from there. Of course, we can also set our own directory and read files from there.

get the working directory

getwd(): check the current working directory setwd(): set current working directory

about CSV file

A CSV file is a text file in which the values in the columns are separated by commas(,), like:

id,name,salary,start_date,dept
1,Rick,623.3,2012-01-01,IT
2,Dan,515.2,2013-09-23,Operations
3,Michelle,611,2014-11-15,IT
4,Ryan,729,2014-05-11,HR
 ,Gary,843.25,2015-03-27,Finance
6,Nina,578,2013-05-21,IT
7,Simon,632.8,2013-07-30,Operations
8,Guru,722.5,2014-06-17,Finance

Read

we use read.csv() function to read available CSV files in the current working directory.

like:

data <- read.csv("input.csv")

By default, the type of data in the code block is data frame.

Analysis

  • is.data.frame(): to see whether the type of data is data frame.
  • ncol(): get the number of columns
  • nrow(): get the number of rowse
  • max(): get the maximum of elements in a vector

we use $ to connect the data and the name of column.

sal <- max(data$salary)

We can get rows that meet certain filter conditions, using subset().

# get the detail of the person who has max salar
retval <- subset(data, salary == max(salary))
info <- subset(data, salary > 600 & dept == "IT")

Write

R can be used to create existing data frames in the form of CSV files. To create CSV data, we use function write.csv(), and the file will be create in the current working directory.

# Create a data frame.
data <- read.csv("input.csv")
retval <- subset(data, as.Date(start_date) > as.Date("2014-01-01"))

# Write filtered data into a new file.
write.csv(retval,"output.csv")
newdata <- read.csv("output.csv")

# print the newdata
print(newdata)

result:

  X      id   name      salary   start_date    dept
1 3      3    Michelle  611.00   2014-11-15    IT
2 4      4    Ryan      729.00   2014-05-11    HR
3 5     NA    Gary      843.25   2015-03-27    Finance
4 8      8    Guru      722.50   2014-06-17    Finance

the column X comes from the origin data, we can delete it with additional parameters when writing.

write.csv(retval,"output.csv", row.names = FALSE)

then the result will be:

      id    name      salary   start_date    dept
1      3    Michelle  611.00   2014-11-15    IT
2      4    Ryan      729.00   2014-05-11    HR
3     NA    Gary      843.25   2015-03-27    Finance
4      8    Guru      722.50   2014-06-17    Finance