The short answer to the question is yes, R code can solve that mess and no, it doesn't take that much trouble.
The first step after copying & pasting the table into an R session is to read it in with read.table, setting the header, sep, comment.char and strip.white arguments.
Credit for reminding me of the comment.char and strip.white arguments goes to @nicola and his comment.
dat <- read.table(text = "
+------------+------+------+----------+--------------------------+
| Date | Emp1 | Case | Priority | PriorityCountinLast7days |
+------------+------+------+----------+--------------------------+
| 2018-06-01 | A | A1 | 0 | 0 |
| 2018-06-03 | A | A2 | 0 | 1 |
| 2018-06-03 | A | A3 | 0 | 2 |
| 2018-06-03 | A | A4 | 1 | 1 |
| 2018-06-03 | A | A5 | 2 | 1 |
| 2018-06-04 | A | A6 | 0 | 3 |
| 2018-06-01 | B | B1 | 0 | 1 |
| 2018-06-02 | B | B2 | 0 | 2 |
| 2018-06-03 | B | B3 | 0 | 3 |
+------------+------+------+----------+--------------------------+
", header = TRUE, sep = "|", comment.char = "+", strip.white = TRUE)
But as you can see there are some issues with the result.
dat
   X       Date Emp1 Case Priority PriorityCountinLast7days X.1
1 NA 2018-06-01    A   A1        0                        0  NA
2 NA 2018-06-03    A   A2        0                        1  NA
3 NA 2018-06-03    A   A3        0                        2  NA
4 NA 2018-06-03    A   A4        1                        1  NA
5 NA 2018-06-03    A   A5        2                        1  NA
6 NA 2018-06-04    A   A6        0                        3  NA
7 NA 2018-06-01    B   B1        0                        1  NA
8 NA 2018-06-02    B   B2        0                        2  NA
9 NA 2018-06-03    B   B3        0                        3  NA
Because each data row starts and ends with a separator, R believes those separators mark two extra, empty columns (X and X.1), which is not what the OP of the original question meant.
So the second step is to keep only the real columns. I will do this by subsetting the columns by their numbers, which is easily done since the extra columns are usually the first and the last ones.
dat <- dat[-c(1, ncol(dat))]
dat
        Date Emp1 Case Priority PriorityCountinLast7days
1 2018-06-01    A   A1        0                        0
2 2018-06-03    A   A2        0                        1
3 2018-06-03    A   A3        0                        2
4 2018-06-03    A   A4        1                        1
5 2018-06-03    A   A5        2                        1
6 2018-06-04    A   A6        0                        3
7 2018-06-01    B   B1        0                        1
8 2018-06-02    B   B2        0                        2
9 2018-06-03    B   B3        0                        3
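If you would rather not rely on column positions, one possible alternative is to drop every column that is entirely NA. This is only a sketch on a toy data frame I made up to mimic the read.table result; note that it would also drop a genuine column that happens to contain nothing but missing values.

```r
# Toy data frame (made up) mimicking read.table() output for a bordered table:
# the border separators produce all-NA columns X and X.1
dat <- data.frame(
  X    = NA,
  Date = c("2018-06-01", "2018-06-03"),
  Emp1 = c("A", "A"),
  X.1  = NA
)

# Flag the columns in which every value is missing
all_na <- vapply(dat, function(col) all(is.na(col)), logical(1))

# Keep only the columns with at least one non-missing value
clean <- dat[!all_na]
names(clean)
```

Subsetting by position, as above, is shorter; this variant just makes the intent explicit.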
That wasn't too hard, much better.
In this case there is still one problem left: column Date must be coerced to class Date.
dat$Date <- as.Date(dat$Date)
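The call above works without further arguments because the dates are written as yyyy-mm-dd, which as.Date parses by default. If a table uses another layout, the format argument is needed. A small sketch, where the day/month/year string is my own made-up example and not part of the question's data:

```r
# ISO 8601 dates are parsed by as.Date() without a format
iso <- as.Date("2018-06-03")

# Other layouts need an explicit format (this string is a made-up example)
dmy <- as.Date("03/06/2018", format = "%d/%m/%Y")

# Both represent the same day
iso == dmy
```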
And the result is satisfactory.
str(dat)
'data.frame':	9 obs. of  5 variables:
 $ Date                    : Date, format: "2018-06-01" "2018-06-03" ...
 $ Emp1                    : Factor w/ 2 levels "A","B": 1 1 1 1 1 1 2 2 2
 $ Case                    : Factor w/ 9 levels "A1","A2","A3",..: 1 2 3 4 5 6 7 8 9
 $ Priority                : int  0 0 0 1 2 0 0 0 0
 $ PriorityCountinLast7days: int  0 1 2 1 1 3 1 2 3
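Note that I have not set the more or less standard argument stringsAsFactors = FALSE, which is why Emp1 and Case are factors above. If character columns are needed, this should be done when running read.table (in R 4.0.0 and later it is already the default).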
The whole process took only 3 lines of base R code.
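If you have several tables like this one, the three steps could be wrapped in a small function. The name read_bordered_table and its date_col argument are hypothetical, my own choice for this sketch, and the function assumes the table has the same border layout as the one in the question.

```r
# read_bordered_table() is a hypothetical helper, not a base R function.
# It assumes '+' border lines and '|' at both ends of every row.
read_bordered_table <- function(text, date_col = "Date") {
  # Step 1: read the table, treating the '+' border lines as comments
  dat <- read.table(text = text, header = TRUE, sep = "|",
                    comment.char = "+", strip.white = TRUE,
                    stringsAsFactors = FALSE)
  # Step 2: drop the empty first and last columns
  dat <- dat[-c(1, ncol(dat))]
  # Step 3: coerce the date column to class Date
  dat[[date_col]] <- as.Date(dat[[date_col]])
  dat
}

# Shortened version of the question's table
txt <- "+------------+------+
| Date | Emp1 |
+------------+------+
| 2018-06-01 | A |
| 2018-06-03 | B |
+------------+------+"

res <- read_bordered_table(txt)
str(res)
```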
Finally, here is the end result in dput format, as it should have been posted in the first place.
dat <-
structure(list(Date = structure(c(17683, 17685, 17685, 17685, 17685,
17686, 17683, 17684, 17685), class = "Date"), Emp1 = c("A", "A",
"A", "A", "A", "A", "B", "B", "B"), Case = c("A1", "A2", "A3",
"A4", "A5", "A6", "B1", "B2", "B3"), Priority = c(0, 0, 0, 1,
2, 0, 0, 0, 0), PriorityCountinLast7days = c(0, 1, 2, 1, 1, 3,
1, 2, 3)), row.names = c(NA, -9L), class = "data.frame")