How to preserve number-as-character fields with write.csv (base R)
I cannot believe I have spent quite some time on it.
The question in short:
I have variables made of digits, such as 00, 01, and 10. After exporting them to a CSV file and reading it back in, they show up as 0, 1, and 10. How can I keep these values as characters? The discussion at the top of the Google results has the same title: Preserving number-as-character fields with write.csv (base R)
But the answers there are confusing, and the real answer only appears in the comments under the replies. (It was also stated as a premise in the question, but that is very hard to notice if someone like me is just searching for an answer.)
The quick answer is:
Specify the colClasses argument of the read.csv function in R.
- It is not about the stringsAsFactors option. As the R documentation notes, stringsAsFactors "is overridden by as.is and colClasses, both of which allow finer control."
- It is not about adding "quote".
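The two points above can be checked with a small round trip. This is a minimal sketch, not the original dataset: the data frame df_demo, its column name code, and the temporary file are made up for illustration.

```r
# Hypothetical three-row data frame with zero-padded codes stored as text.
df_demo <- data.frame(code = c("00", "01", "10"), stringsAsFactors = FALSE)

csv_path <- tempfile(fileext = ".csv")
write.csv(df_demo, csv_path, row.names = FALSE)  # quote = TRUE is the default

# The values are quoted in the file itself.
readLines(csv_path)

# A plain read still converts the column to integer,
# with or without stringsAsFactors = FALSE.
plain  <- read.csv(csv_path)
no_fac <- read.csv(csv_path, stringsAsFactors = FALSE)
str(plain)
str(no_fac)

# Declaring the class on the read side keeps the text intact.
kept <- read.csv(csv_path, colClasses = "character")
str(kept)
```

In this sketch only the last call returns the column as character, which is why neither quoting nor stringsAsFactors is the lever to pull.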
The right approach is not on the write.table side but on the read.table side.
# df is the sample dataset; all the variables are factors / characters
names(df) <- c("ID", "String_1", "String_2")
# Write it out normally, with column names. append = FALSE creates a new file.
write.table(df, "D:/NYU Marron/Data3_R/AnExample.csv",
            col.names = TRUE, row.names = FALSE, append = FALSE, sep = ",")
- Dataset:
## 'data.frame': 168 obs. of 3 variables:
## $ ID : chr "100010550000310" "100010550000410" "100010560000110" ...
## $ String_1: chr "11" "00" "11" "11" ...
## $ String_2: chr "11" "00" "11" "11" ...
- What I don't want:
- If the file is read back directly, “00” becomes 0 and “01” becomes 1, for example.
test <- read.table("D:/NYU Marron/Data3_R/AnExample.csv",
                   sep = ",", header = TRUE)
str(test)
## 'data.frame': 168 obs. of 3 variables:
## $ ID : num 1e+14 1e+14 1e+14 1e+14 1e+14 ...
## $ String_1: int 11 0 11 11 11 10 0 11 11 0 ...
## $ String_2: int 11 0 11 11 11 10 0 11 11 0 ...
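The conversion shown in this output follows from how read.table handles columns with no declared class: according to its documentation, they are read as character and then passed through type.convert. A minimal sketch of that step in isolation, using a made-up vector named codes:

```r
# Hypothetical zero-padded values, stored as text.
codes <- c("00", "01", "10")

# This mirrors the automatic conversion applied to undeclared columns.
converted <- type.convert(codes, as.is = TRUE)
class(converted)         # "integer"
converted                # 0 1 10

# Converting back to text does not restore the leading zeros.
as.character(converted)  # "0" "1" "10"
```

Once the zeros are gone they cannot be recovered from the numbers alone, so the column class has to be fixed at read time.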
- What I wish to have:
- If all of them are characters, then it is easy. Use colClasses to declare it.
test <- read.table("D:/NYU Marron/Data3_R/AnExample.csv",
                   sep = ",", header = TRUE,
                   colClasses = "character")
str(test)
## 'data.frame': 168 obs. of 3 variables:
## $ ID : chr "100010550000310" "100010550000410" "100010560000110" "100010560000210" ...
## $ String_1: chr "11" "00" "11" "11" ...
## $ String_2: chr "11" "00" "11" "11" ...
- If not all the variables are characters, then we can specify the class of each column. Let String_1 be numeric:
test2 <- read.table("D:/NYU Marron/Data3_R/AnExample2.csv",
                    sep = ",", header = TRUE,
                    colClasses = c(ID = "character",
                                   String_1 = "numeric",
                                   String_2 = "character"))
str(test2)
## 'data.frame': 168 obs. of 3 variables:
## $ ID : chr "100010550000310" "100010550000410" "100010560000110" "100010560000210" ...
## $ String_1: num 11 0 11 11 11 10 0 11 11 0 ...
## $ String_2: chr "11" "00" "11" "11" ...
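- Another way to write it:
- Note that an unnamed colClasses vector is matched to the imported columns by position, so it should list one class per column, in order. (A shorter unnamed vector is recycled, which is why the single "character" above applied to every column.)
test <- read.table("D:/NYU Marron/Data3_R/AnExample2.csv",
                   sep = ",", header = TRUE,
                   colClasses = c("character", "numeric", "character"))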
str(test)
## 'data.frame': 168 obs. of 3 variables:
## $ ID : chr "100010550000310" "100010550000410" "100010560000110" "100010560000210" ...
## $ String_1: num 11 0 11 11 11 10 0 11 11 0 ...
## $ String_2: chr "11" "00" "11" "11" ...
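A named colClasses vector does not have to cover every column: the read.table documentation says names are matched and unspecified columns are treated as NA, meaning their class is inferred as usual. A minimal sketch under that reading, with a made-up data frame (demo, with columns ID and value) written to a temporary file:

```r
# Hypothetical file: one zero-padded text column, one ordinary numeric column.
demo <- data.frame(ID = c("001", "002"), value = c(1.5, 2.5),
                   stringsAsFactors = FALSE)
csv_path <- tempfile(fileext = ".csv")
write.csv(demo, csv_path, row.names = FALSE)

# Only ID is declared; value is left for read.csv to infer.
partial <- read.csv(csv_path, colClasses = c(ID = "character"))
str(partial)
```

This can be convenient for wide files where only a few identifier columns need protecting.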
There is more technical discussion of the colClasses option in this post: Specifying colClasses in the read.csv. But the key points have already been illustrated above.