R(19), Data Interface - CSV
We will learn how to read data from CSV data, and how to write data into CSV file.
We should make sure that the CSV file exists in the current working directory and read files from there. Of course, we can also set our own directory and read files from there.
get the working directory
getwd()
: check the current working directory
setwd()
: set current working directory
about CSV file
A CSV file is a text file in which the values in the columns are separated by commas(,
), like:
id,name,salary,start_date,dept
1,Rick,623.3,2012-01-01,IT
2,Dan,515.2,2013-09-23,Operations
3,Michelle,611,2014-11-15,IT
4,Ryan,729,2014-05-11,HR
,Gary,843.25,2015-03-27,Finance
6,Nina,578,2013-05-21,IT
7,Simon,632.8,2013-07-30,Operations
8,Guru,722.5,2014-06-17,Finance
Read
we use read.csv()
function to read available CSV files in the current working directory.
like:
data <- read.csv("input.csv")
By default, the type of data
in the code block is data frame.
Analysis
is.data.frame()
: to see whether the type of data is data frame.ncol()
: get the number of columnsnrow()
: get the number of rowsemax()
: get the maximum of elements in a vector
we use $
to connect the data and the name of column.
sal <- max(data$salary)
We can get rows that meet certain filter conditions, using subset()
.
# get the detail of the person who has max salar
retval <- subset(data, salary == max(salary))
info <- subset(data, salary > 600 & dept == "IT")
Write
R can be used to create existing data frames in the form of CSV files. To create CSV data, we use function write.csv()
, and the file will be create in the current working directory.
# Create a data frame.
data <- read.csv("input.csv")
retval <- subset(data, as.Date(start_date) > as.Date("2014-01-01"))
# Write filtered data into a new file.
write.csv(retval,"output.csv")
newdata <- read.csv("output.csv")
# print the newdata
print(newdata)
result:
X id name salary start_date dept
1 3 3 Michelle 611.00 2014-11-15 IT
2 4 4 Ryan 729.00 2014-05-11 HR
3 5 NA Gary 843.25 2015-03-27 Finance
4 8 8 Guru 722.50 2014-06-17 Finance
the column X
comes from the origin data, we can delete it with additional parameters when writing.
write.csv(retval,"output.csv", row.names = FALSE)
then the result will be:
id name salary start_date dept
1 3 Michelle 611.00 2014-11-15 IT
2 4 Ryan 729.00 2014-05-11 HR
3 NA Gary 843.25 2015-03-27 Finance
4 8 Guru 722.50 2014-06-17 Finance