R(30), Chi-square Distribution

Published: by Creative Commons Licence

Chi-square test is a statistical method to determine whether there is significant correlation between two categorical variables.

function:

chisq.test(x, y = NULL, correct = TRUE,
           p = rep(1/length(x), length(x)), rescale.p = FALSE,
           simulate.p.value = FALSE, B = 2000)
  • x: a numeric vector or matrix. x and y can also both be factors.
  • y: a numeric vector; ignored if x is a matrix. If x is a factor, y should be a factor of the same length.
  • correct: a logical indicating whether to apply continuity correction when computing the test statistic for 2 by 2 tables: one half is subtracted from all |O - E| differences; however, the correction will not be bigger than the differences themselves. No correction is done if simulate.p.value = TRUE.
  • p: a vector of probabilities of the same length of x. An error is given if any entry of p is negative.
  • rescale.p: a logical scalar; if TRUE then p is rescaled (if necessary) to sum to 1. If rescale.p is FALSE, and p does not sum to 1, an error is given.
  • simulate.p.value: a logical indicating whether to compute p-values by Monte Carlo simulation.
  • B: an integer specifying the number of replicates used in the Monte Carlo test.
# Load the library.
library("MASS")

# Create a data frame from the main data set.
# car.data <- data.frame(Cars93$AirBags, Cars93$Type)

# Create a table with the needed variables.
car.data = table(Cars93$AirBags, Cars93$Type) 
print(car.data)

# Perform the Chi-Square test.
print(chisq.test(car.data))

table() uses the cross-classifying factors to build a contingency table of the counts at each combination of factor levels.