Channel: What can R do about a messy data format? - Stack Overflow

Answer by Zheyuan Li for What can R do about a messy data format?

```r
## border lines start with "+", so comment.char = "+" makes scan() skip them;
## sep = "" splits the remaining lines on whitespace
md_table <- scan(text = "
+------------+------+------+----------+--------------------------+
|    Date    | Emp1 | Case | Priority | PriorityCountinLast7days |
+------------+------+------+----------+--------------------------+
| 2018-06-01 | A    | A1   |        0 |                        0 |
| 2018-06-03 | A    | A2   |        0 |                        1 |
| 2018-06-03 | A    | A3   |        0 |                        2 |
| 2018-06-03 | A    | A4   |        1 |                        1 |
| 2018-06-03 | A    | A5   |        2 |                        1 |
| 2018-06-04 | A    | A6   |        0 |                        3 |
| 2018-06-01 | B    | B1   |        0 |                        1 |
| 2018-06-02 | B    | B2   |        0 |                        2 |
| 2018-06-03 | B    | B3   |        0 |                        3 |
+------------+------+------+----------+--------------------------+",
what = "", sep = "", comment.char = "+", quiet = TRUE)

## the header row has 5 columns: drop the "|" tokens and fill row by row
mat <- matrix(md_table[md_table != "|"], ncol = 5, byrow = TRUE)
#      [,1]         [,2]   [,3]   [,4]       [,5]
# [1,] "Date"       "Emp1" "Case" "Priority" "PriorityCountinLast7days"
# [2,] "2018-06-01" "A"    "A1"   "0"        "0"
# [3,] "2018-06-03" "A"    "A2"   "0"        "1"
# [4,] "2018-06-03" "A"    "A3"   "0"        "2"
# [5,] "2018-06-03" "A"    "A4"   "1"        "1"
# [6,] "2018-06-03" "A"    "A5"   "2"        "1"
# [7,] "2018-06-04" "A"    "A6"   "0"        "3"
# [8,] "2018-06-01" "B"    "B1"   "0"        "1"
# [9,] "2018-06-02" "B"    "B2"   "0"        "2"
#[10,] "2018-06-03" "B"    "B3"   "0"        "3"
```
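The `scan()` step splits on whitespace, so it relies on every cell being exactly one non-empty token; a cell such as `New York`, or an empty cell, would shift the fixed `ncol = 5` reshape. Below is a minimal sketch of a variant that splits each row on the literal `|` instead and takes the column count from the header row. `parse_ascii_table()` is a hypothetical helper, not a base R function, and it assumes every row with cells starts with `|` and every border line starts with `+`.

```r
## Hypothetical helper (not from the answer above): split rows on "|"
## rather than on whitespace, so cells may contain spaces or be empty
parse_ascii_table <- function(txt) {
  ## one element per line; trimws() also strips any trailing "\r"
  rows <- trimws(strsplit(txt, "\n", fixed = TRUE)[[1]])
  ## keep only rows holding cells; "+---+" border lines are dropped
  rows <- rows[startsWith(rows, "|")]
  ## the first piece is the "" before the leading "|";
  ## strsplit() already drops the one after the trailing "|"
  cells <- lapply(strsplit(rows, "|", fixed = TRUE), function(x) trimws(x[-1]))
  ## every row must have as many cells as the header row
  n_col <- length(cells[[1]])
  stopifnot(all(lengths(cells) == n_col))
  do.call(rbind, cells)
}

txt <- "
+----------+------+
| City     | Code |
+----------+------+
| New York | NY   |
| Boston   |      |
+----------+------+"

mat2 <- parse_ascii_table(txt)
mat2
```

Applied to the question's table, it should return the same 10-by-5 character matrix as `mat`, with row 1 holding the column names.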

```r
## a data frame with all character columns
dat <- setNames(data.frame(mat[-1, ], stringsAsFactors = FALSE), mat[1, ])
#        Date Emp1 Case Priority PriorityCountinLast7days
#1 2018-06-01    A   A1        0                        0
#2 2018-06-03    A   A2        0                        1
#3 2018-06-03    A   A3        0                        2
#4 2018-06-03    A   A4        1                        1
#5 2018-06-03    A   A5        2                        1
#6 2018-06-04    A   A6        0                        3
#7 2018-06-01    B   B1        0                        1
#8 2018-06-02    B   B2        0                        2
#9 2018-06-03    B   B3        0                        3
```
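Building the data frame can also be collapsed into a single `read.table()` call by treating `|` as the field separator. This is a swapped-in alternative, not the approach of the answer above. Because every line starts and ends with `|`, it produces an empty first and last column, which have to be dropped. A sketch on a few rows of the question's table:

```r
txt <- "
+------------+------+----------+
|    Date    | Emp1 | Priority |
+------------+------+----------+
| 2018-06-01 | A    |        0 |
| 2018-06-03 | A    |        1 |
| 2018-06-02 | B    |        0 |
+------------+------+----------+"

## sep = "|" splits on the bars; comment.char = "+" blanks the border
## lines, which are then skipped; strip.white trims the cell padding
raw <- read.table(text = txt, sep = "|", header = TRUE, comment.char = "+",
                  strip.white = TRUE, stringsAsFactors = FALSE)

## the leading and trailing "|" give an empty first and last column
dat2 <- raw[, -c(1, ncol(raw))]
str(dat2)
```

Unlike the matrix route, `read.table()` also runs type conversion, so `Priority` comes back as integer while `Date` stays character.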

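```r
## optionally, let type.convert() pick a type for every column;
## as.is = TRUE keeps text columns as character (recent R versions warn if it is omitted)
dat[] <- lapply(dat, type.convert, as.is = TRUE)
```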
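`type.convert()` leaves `Date` as character, because a string like `"2018-06-01"` is not numeric. A minimal sketch of the remaining step follows. It uses the `data.frame` method of `type.convert()` (available since R 3.5.0), which converts every column at once, and then `as.Date()`. The three rows are taken from the question's table only so that the snippet runs on its own.

```r
## stand-in for `dat` above: a data frame whose columns are all character
dat <- data.frame(
  Date     = c("2018-06-01", "2018-06-03", "2018-06-02"),
  Emp1     = c("A", "A", "B"),
  Priority = c("0", "1", "0"),
  stringsAsFactors = FALSE
)

## the data.frame method applies type.convert() column by column
dat <- type.convert(dat, as.is = TRUE)

## Date is still character at this point; parse it explicitly
dat$Date <- as.Date(dat$Date)
str(dat)
```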
