How to preserve number-as-character fields with write.csv (base R)

- June 12, 2018

I can't believe how much time I spent on this.

The question in short:

I have variables stored as number-like strings: "00", "01", "10". After exporting them to a CSV file and reading the file back, they become 0, 1, 10. How can I keep them as characters? The top Google result for this problem has the same title:
Preserving number-as-character fields with write.csv (base R)

But the answers there are confusing, and the real answer only shows up in the comments under the replies. (It is stated as a premise in the question, which is easy to miss if, like me, you are just searching for an answer.)

The quick answer is:

Specify the colClasses argument of read.csv (or read.table) in R.

It is not about the stringsAsFactors option. As the R documentation for read.table puts it, stringsAsFactors "is overridden by as.is and colClasses, both of which allow finer control."
It is not about adding quotes either.
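To check that quoting is not the culprit, here is a small sketch on made-up data (the data frame and the temporary file from tempfile() are hypothetical stand-ins for the real dataset and path). write.csv already wraps the character values in quotes, yet read.csv still turns "00" into 0, even with stringsAsFactors = FALSE:

```r
# Hypothetical toy data with leading zeros (not the original dataset)
df <- data.frame(ID = c("100010550000310", "100010550000410"),
                 String_1 = c("11", "00"),
                 stringsAsFactors = FALSE)
f <- tempfile(fileext = ".csv")
write.csv(df, f, row.names = FALSE)

# The file itself keeps the zeros, wrapped in double quotes
lines <- readLines(f)
print(lines)

# Reading back still converts "00" to 0, quotes or not
back <- read.csv(f, stringsAsFactors = FALSE)
str(back)
```

The quotes only tell the reader where a field starts and ends; after stripping them, read.csv still guesses each column's type, which is why the fix belongs in colClasses.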

The right approach is not on the write.table side but on the read.table side.
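A minimal round trip, again on hypothetical toy data and a temporary file: write with write.csv as usual, then declare the column classes when reading back.

```r
# Hypothetical toy data (not the original dataset)
df <- data.frame(ID = c("100010550000310", "100010550000410"),
                 String_1 = c("11", "00"),
                 String_2 = c("01", "10"),
                 stringsAsFactors = FALSE)
f <- tempfile(fileext = ".csv")
write.csv(df, f, row.names = FALSE)  # nothing special on the write side

# Fix on the read side: a single "character" is recycled to every column
back <- read.csv(f, colClasses = "character")
str(back)
all.equal(df, back)
```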

# df is the sample dataset; all the variables are factors / characters
names(df) <- c("ID", "String_1", "String_2")
# Write it out normally, with column names; append = FALSE creates (or overwrites) the file
write.table(df, "D:/NYU Marron/Data3_R/AnExample.csv",
            col.names = TRUE, row.names = FALSE, append = FALSE, sep = ",")
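Since the title mentions write.csv: with these settings, the write.table call above should produce the same file as write.csv(df, file, row.names = FALSE). A small check on hypothetical data, writing both versions to temporary files and comparing their lines:

```r
# Hypothetical toy data (not the original dataset)
df <- data.frame(ID = c("100010550000310", "100010550000410"),
                 String_1 = c("11", "00"),
                 String_2 = c("01", "10"),
                 stringsAsFactors = FALSE)
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")

# Same settings as the write.table call above, but to a temporary file
write.table(df, f1, col.names = TRUE, row.names = FALSE, append = FALSE, sep = ",")
write.csv(df, f2, row.names = FALSE)

# Compare the raw text of the two files, leading zeros included
identical(readLines(f1), readLines(f2))
```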

Structure of the dataset:

## 'data.frame':    168 obs. of  3 variables:
##  $ ID      : chr  "100010550000310" "100010550000410" "100010560000110" ...
##  $ String_1: chr  "11" "00" "11" "11" ...
##  $ String_2: chr  "11" "00" "11" "11" ...

What I don't want:
If the file is read back directly, read.table guesses each column's type, so "00" becomes 0, "01" becomes 1, and the long IDs become numbers.

test <- read.table("D:/NYU Marron/Data3_R/AnExample.csv",
                   sep = ",", header = TRUE)
str(test)

## 'data.frame':    168 obs. of  3 variables:
##  $ ID      : num  1e+14 1e+14 1e+14 1e+14 1e+14 ...
##  $ String_1: int  11 0 11 11 11 10 0 11 11 0 ...
##  $ String_2: int  11 0 11 11 11 10 0 11 11 0 ...

What I wish to have:
If all the columns should be characters, it is easy: pass colClasses = "character", which is recycled to every column.

test <- read.table("D:/NYU Marron/Data3_R/AnExample.csv",
                   sep = ",", header = TRUE,
                   colClasses = "character")
str(test)

## 'data.frame':    168 obs. of  3 variables:
##  $ ID      : chr  "100010550000310" "100010550000410" "100010560000110" "100010560000210" ...
##  $ String_1: chr  "11" "00" "11" "11" ...
##  $ String_2: chr  "11" "00" "11" "11" ...

If not all the variables are characters, specify the class of each column by name. For example, let String_1 be numeric:

test2 <- read.table("D:/NYU Marron/Data3_R/AnExample2.csv",
                    sep = ",", header = TRUE,
                    colClasses = c(ID = "character",
                                   String_1 = "numeric",
                                   String_2 = "character"))
str(test2)

## 'data.frame':    168 obs. of  3 variables:
##  $ ID      : chr  "100010550000310" "100010550000410" "100010560000110" "100010560000210" ...
##  $ String_1: num  11 0 11 11 11 10 0 11 11 0 ...
##  $ String_2: chr  "11" "00" "11" "11" ...

Another way to write it, with an unnamed vector:
An unnamed colClasses vector is matched to the columns by position (and recycled if it is shorter), so give one class per imported column, in order.

test <- read.table("D:/NYU Marron/Data3_R/AnExample2.csv",
                   sep = ",", header = TRUE,
                   colClasses = c("character", "numeric", "character"))
str(test)

## 'data.frame':    168 obs. of  3 variables:
##  $ ID      : chr  "100010550000310" "100010550000410" "100010560000110" "100010560000210" ...
##  $ String_1: num  11 0 11 11 11 10 0 11 11 0 ...
##  $ String_2: chr  "11" "00" "11" "11" ...
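When only a few columns need protecting, the read.table documentation also allows a named colClasses that lists just those columns; columns left out are treated as NA, i.e. the usual type guessing. A sketch on hypothetical toy data, assuming a reasonably recent R version that supports named colClasses this way:

```r
# Hypothetical toy data (not the original dataset)
df <- data.frame(ID = c("100010550000310", "100010550000410"),
                 String_1 = c("11", "00"),
                 String_2 = c("01", "10"),
                 stringsAsFactors = FALSE)
f <- tempfile(fileext = ".csv")
write.csv(df, f, row.names = FALSE)

# Only ID and String_2 are pinned to character; String_1 is guessed (integer)
back <- read.csv(f, colClasses = c(ID = "character", String_2 = "character"))
str(back)
```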

There is a more technical discussion of the colClasses option in this post: Specifying colClasses in the read.csv. But the key points are already illustrated above.


Statistics and Data Analysis