How to preserve number-as-character fields with write.csv (base R)

I cannot believe how much time I spent on this.

The question in short:

I have variables that look like numbers but are stored as characters: "00", "01", "10". After exporting them to a csv file and reading the file back, they become 0, 1, 10. How can I keep these values as characters? The top Google result for this problem has the same title:
Preserving number-as-character fields with write.csv (base R)

But the answers there are confusing, and the real answer only appears in the comments on the replies. (It was also stated as a premise in the question, but that is very easy to miss for someone like me who is just searching for an answer.)

The quick answer is:

Specify the colClasses option of the read.csv function in R.
  • It is not about the stringsAsFactors option. In fact, as the R documentation puts it, stringsAsFactors "is overridden by as.is and colClasses, both of which allow finer control."
  • It is not about the "quote" option either.
The fix belongs not on the write.table side but on the read.table side.
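To make that concrete, here is a minimal, self-contained sketch of the round trip. The data frame `df_demo` and the temporary file are made up for illustration and are not the dataset used in this post. It suggests that write.csv leaves the leading zeros (and the quotes) intact on disk, and that the conversion only happens when the file is read back with the defaults.

```r
# Hypothetical toy data; not the dataset from this post.
df_demo <- data.frame(
  ID       = c("001", "002", "010"),
  String_1 = c("00", "01", "10"),
  stringsAsFactors = FALSE
)

# A temporary file stands in for a real output path.
csv_path <- tempfile(fileext = ".csv")
write.csv(df_demo, csv_path, row.names = FALSE)

# Inspect the raw file: character fields are written quoted, zeros included.
raw_lines <- readLines(csv_path)
print(raw_lines)

# Default read: the fields are type-converted even though they were quoted.
auto <- read.csv(csv_path)

# Read with colClasses: every column stays a character string.
kept <- read.csv(csv_path, colClasses = "character")

str(auto)
str(kept)
```

Comparing `str(auto)` with `str(kept)` shows where the leading zeros are lost.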
# df is the sample dataset; all the variables are factors / characters
names(df) <- c("ID", "String_1", "String_2")
# Write it out normally, with column names. append = FALSE overwrites any existing file.
write.table(df, "D:/NYU Marron/Data3_R/AnExample.csv",
          col.names = TRUE, row.names = FALSE, append = FALSE, sep = ",")
  • Dataset:
## 'data.frame':    168 obs. of  3 variables:
##  $ ID      : chr  "100010550000310" "100010550000410" "100010560000110" ...
##  $ String_1: chr  "11" "00" "11" "11" ...
##  $ String_2: chr  "11" "00" "11" "11" ...
  • What I don't want:
  • If the file is read back with the defaults, "00" becomes 0 and "01" becomes 1, and the 15-digit ID is converted to a number as well.
test <- read.table("D:/NYU Marron/Data3_R/AnExample.csv",
                   sep = ",", header = TRUE)
str(test)
## 'data.frame':    168 obs. of  3 variables:
##  $ ID      : num  1e+14 1e+14 1e+14 1e+14 1e+14 ...
##  $ String_1: int  11 0 11 11 11 10 0 11 11 0 ...
##  $ String_2: int  11 0 11 11 11 10 0 11 11 0 ...
  • What I wish to have:
  • If all of the columns are characters, it is easy. Use colClasses to declare that.
test <- read.table("D:/NYU Marron/Data3_R/AnExample.csv",
                   sep = ",", header = TRUE,
                   colClasses = "character")
str(test)
## 'data.frame':    168 obs. of  3 variables:
##  $ ID      : chr  "100010550000310" "100010550000410" "100010560000110" "100010560000210" ...
##  $ String_1: chr  "11" "00" "11" "11" ...
##  $ String_2: chr  "11" "00" "11" "11" ...
  • If not all the variables are characters, we can specify the class of each column by name. Let String_1 be numeric:
test2 <- read.table("D:/NYU Marron/Data3_R/AnExample2.csv",
                   sep = ",", header = TRUE,
                   colClasses = c(ID = "character",
                                  String_1 = "numeric",
                                  String_2 = "character"))
str(test2)
## 'data.frame':    168 obs. of  3 variables:
##  $ ID      : chr  "100010550000310" "100010550000410" "100010560000110" "100010560000210" ...
##  $ String_1: num  11 0 11 11 11 10 0 11 11 0 ...
##  $ String_2: chr  "11" "00" "11" "11" ...
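A named colClasses vector does not have to mention every column. According to the read.table documentation, names are matched and unspecified columns are taken to be NA, which means they are type-converted as usual. A small sketch under that reading, using a made-up three-column file written to a temporary path:

```r
# Hypothetical file contents; the column names mirror the example above.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("ID,String_1,String_2",
             "0001,11,00",
             "0002,00,01"), csv_path)

# Only ID and String_2 are pinned to character.
# String_1 is not named, so R guesses its type (integer here).
partial <- read.table(csv_path, sep = ",", header = TRUE,
                      colClasses = c(ID = "character",
                                     String_2 = "character"))
str(partial)
```

This is handy for wide files where only a few code-like columns need protecting.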
  • Another way to write it:
  • Note that an unnamed colClasses vector is matched to the columns by position, so it should list one class per imported column, in order. (A shorter unnamed vector is recycled.)
test <- read.table("D:/NYU Marron/Data3_R/AnExample2.csv",
                   sep = ",", header = TRUE,
                   colClasses = c("character","numeric","character"))
str(test)
## 'data.frame':    168 obs. of  3 variables:
##  $ ID      : chr  "100010550000310" "100010550000410" "100010560000110" "100010560000210" ...
##  $ String_1: num  11 0 11 11 11 10 0 11 11 0 ...
##  $ String_2: chr  "11" "00" "11" "11" ...
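One way to confirm that nothing was lost is to compare the re-imported data frame with the original. The sketch below does this for a made-up data frame (`original` and `restored` are illustrative names, not objects from this post), using a positional colClasses vector as above.

```r
# Hypothetical data: two code-like character columns and one numeric column.
original <- data.frame(
  ID       = c("0001", "0002"),
  String_1 = c(11, 0),
  String_2 = c("00", "01"),
  stringsAsFactors = FALSE
)

csv_path <- tempfile(fileext = ".csv")
write.table(original, csv_path, sep = ",",
            col.names = TRUE, row.names = FALSE)

# One class per column, in file order.
restored <- read.table(csv_path, sep = ",", header = TRUE,
                       colClasses = c("character", "numeric", "character"))

# Compare column classes and values between the two data frames.
same_classes <- identical(sapply(original, class), sapply(restored, class))
same_values  <- isTRUE(all.equal(original, restored))
print(c(same_classes = same_classes, same_values = same_values))
```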
There is more technical discussion of the colClasses option in this post: Specifying colClasses in the read.csv. The key points, though, have already been illustrated above.

