R(21), Data Interface - Others

Published: by Creative Commons Licence

Binary File

writeBin(object, con)
readBin(con, what, n)
  • con: a connection object that read or writes to a binary file
  • object: a binary file that will be written
  • what: the mode of bytes
  • n: the number of bytes read from the binary
writeBin(colnames(new.mtcars), write.filename)
column.names <- readBin(read.filename, character(),  n = 3)

XML File

If we wanna load XML files, we need to load packages named "XML" and "methods".

library("XML")
library("methods")

read XML file by using xmlParse(), and the file will be stored in R as a list.

xmlParse(): give the input file name to the function.
xmlRoot(): extract the root node from the xml file.
xmlSize(): get the number of nodes in the root.

We can use the index to get the detail of the node.

In order to process data effectively in large files, we read the data in XML file as data frame, and then the data frame is processed for data analysis.

We can use function xmlToDataFrame() to convert the input XML file to a data frame.

xmldataframe <- xmlToDataFrame("input.xml")

Because the data frame can be used as data frame, then we can use functions related to data frames to read and write XML files.

JSON File

We need to install packages named "rjson" in console.

install.packages("rjson")

We use fromJSON() to read JSON file in R, and it will be stored in R as a list.

result <- fromJSON(file = "input.json")

We still convert the JSON file to a data frame, using as.data.frame().

json_data_frame <- as.data.frame(result)

Web

If we wanna handle URL and Links, we need use some packages.

install.packages("RCurl")
install.packages("XML")
install.packages("stringr")
install.packages("plyr")

Examples:

# Read the URL.
url <- "http://www.geos.ed.ac.uk/~weather/jcmb_ws/"

# Gather the html links present in the webpage.
links <- getHTMLLinks(url)

# Identify only the links which point to the JCMB 2015 files. 
filenames <- links[str_detect(links, "JCMB_2015")]

# Store the file names as a list.
filenames_list <- as.list(filenames)

# Create a function to download the files by passing the URL and filename list.
downloadcsv <- function (mainurl,filename) {
   filedetails <- str_c(mainurl,filename)
   download.file(filedetails,filename)
}

# Now apply the l_ply function and save the files into the current R working directory.
l_ply(filenames,downloadcsv,mainurl = "http://www.geos.ed.ac.uk/~weather/jcmb_ws/")

Database

R makes it easy to connect to many relational databases, and then get records from them as data frames. Once the data is available in the R environment, then it becomes a normal R language dataset and can be manipulated or analyzed using all the powerful packages and functions.

Here, we take the MySQL as the reference database to connect to R.

A package named "RMySQL" is needed.

install packages("RMySQL")

Connect R and MySQL.

# Create a connection Object to MySQL database.
# We will connect to the sample database named "sakila" that comes with MySql installation.
mysqlconnection = dbConnect(MySQL(), user = 'root', password = '', dbname = 'sakila', host = 'localhost')

# List the tables available in this database.
dbListTables(mysqlconnection)

Query Tables: The query is executed in MySQL and the result set is returned using fetch() function in R. Finally, it is stored as a data frame in R.

# Query the "actor" tables to get all the rows.
result = dbSendQuery(mysqlconnection, "select * from actor")

# Store the result in a R data frame object. n = 5 is used to fetch first 5 rows.
data.frame = fetch(result, n = 5)

# Then print the data.frame to show the result
print(data.frame)

use select statements to implement filter query.

result = dbSendQuery(mysqlconnection, "select * from actor where last_name = 'TORN'")

# Fetch all the records(with n = -1) and store it as a data frame.
data.frame = fetch(result, n = -1)

use dbSendQuery() to update the rows in MySQL.

dbSendQuery(mysqlconnection, "update mtcars set disp = 168.5 where hp = 110")

insert data into a table

dbSendQuery(mysqlconnection, "insert into mtcars(row_names, mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb) values('New Mazda RX4 Wag', 21, 6, 168.5, 110, 3.9, 2.875, 17.02, 0, 1, 4, 4)")

create tables: we can use dbWriteTable() to create a table. If there is already a table, then the origin table will be covered.

# Create the connection object to the database where we want to create the table.
mysqlconnection = dbConnect(MySQL(), user = 'root', password = '', dbname = 'sakila', host = 'localhost')

# Use the R data frame "mtcars" to create the table in MySql.
# All the rows of mtcars are taken inot MySql.
dbWriteTable(mysqlconnection, "mtcars", mtcars[, ], overwrite = TRUE)

delete the table in MySQL database: we pass the drop table statement to dpsendquery()

dbSendQuery(mysqlconnection, 'drop table if exists mtcars')