Testing Netlify hosting with R Markdown
This report was generated on 2020-12-30, as a demo of textclean
from https://github.com/trinker/textclean#check-text
This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.
A quick example with constructed text
library(textclean)  # provides check_text()
x <- c("i like", "<p>i want. </p>. thet them ther .", "I am ! that|", "", NA,
"&quot;they&quot; they,were there", ".", " ", "?", "3;", "I like goud eggs!",
"bi\xdfchen Z\xfcrcher", "i 4like...", "\\tgreat", "She said \"yes\"")
Encoding(x) <- "latin1"
x <- as.factor(x)
check_text(x)
##
## =============
## NON CHARACTER
## =============
##
## The text variable is not a character column (likely `factor`):
##
##
## *Suggestion: Consider using `as.character` or `stringsAsFactors = FALSE` when reading in
## Also, consider rerunning `check_text` after fixing
##
##
## =====
## DIGIT
## =====
##
## The following observations contain digits/numbers:
##
## 10, 13
##
## This issue affected the following text:
##
## 10: 3;
## 13: i 4like...
##
## *Suggestion: Consider using `replace_number`
##
##
## ========
## EMOTICON
## ========
##
## The following observations contain emoticons:
##
## 6
##
## This issue affected the following text:
##
## 6: &quot;they&quot; they,were there
##
## *Suggestion: Consider using `replace_emoticons`
##
##
## =====
## EMPTY
## =====
##
## The following observations contain empty text cells (all white space):
##
## 1
##
## This issue affected the following text:
##
## 1: i like
##
## *Suggestion: Consider running `drop_empty_row`
##
##
## =======
## ESCAPED
## =======
##
## The following observations contain escaped back spaced characters:
##
## 14
##
## This issue affected the following text:
##
## 14: \tgreat
##
## *Suggestion: Consider using `replace_white`
##
##
## ====
## HTML
## ====
##
## The following observations contain HTML markup:
##
## 2, 6
##
## This issue affected the following text:
##
## 2: <p>i want. </p>. thet them ther .
## 6: &quot;they&quot; they,were there
##
## *Suggestion: Consider running `replace_html`
##
##
## ==========
## INCOMPLETE
## ==========
##
## The following observations contain incomplete sentences (e.g., uses ending punctuation like '...'):
##
## 13
##
## This issue affected the following text:
##
## 13: i 4like...
##
## *Suggestion: Consider using `replace_incomplete`
##
##
## =============
## MISSING VALUE
## =============
##
## The following observations contain missing values:
##
## 5
##
## *Suggestion: Consider running `drop_NA`
##
##
## ========
## NO ALPHA
## ========
##
## The following observations contain elements with no alphabetic (a-z) letters:
##
## 4, 7, 8, 9, 10
##
## This issue affected the following text:
##
## 4:
## 7: .
## 8:
## 9: ?
## 10: 3;
##
## *Suggestion: Consider cleaning the raw text or running `filter_row`
##
##
## ==========
## NO ENDMARK
## ==========
##
## The following observations contain elements with missing ending punctuation:
##
## 1, 3, 4, 6, 8, 10, 12, 14, 15
##
## This issue affected the following text:
##
## 1: i like
## 3: I am ! that|
## 4:
## 6: &quot;they&quot; they,were there
## 8:
## 10: 3;
## 12: bißchen Zürcher
## 14: \tgreat
## 15: She said "yes"
##
## *Suggestion: Consider cleaning the raw text or running `add_missing_endmark`
##
##
## ====================
## NO SPACE AFTER COMMA
## ====================
##
## The following observations contain commas with no space afterwards:
##
## 6
##
## This issue affected the following text:
##
## 6: &quot;they&quot; they,were there
##
## *Suggestion: Consider running `add_comma_space`
##
##
## =========
## NON ASCII
## =========
##
## The following observations contain non-ASCII text:
##
## 12
##
## This issue affected the following text:
##
## 12: bißchen Zürcher
##
## *Suggestion: Consider running `replace_non_ascii`
##
##
## ==================
## NON SPLIT SENTENCE
## ==================
##
## The following observations contain unsplit sentences (more than one sentence per element):
##
## 2, 3
##
## This issue affected the following text:
##
## 2: <p>i want. </p>. thet them ther .
## 3: I am ! that|
##
## *Suggestion: Consider running `textshape::split_sentence`
And if all is well, the user should be greeted by a cow:
y <- c("A valid sentence.", "yet another!")
check_text(y)
##
## -------------
## No problems found!
## This text is virtuosic!
## ----------------
##   \   ^__^
##    \  (oo)\ ________
##       (__)\         )\ /\
##            ||------w|
##            ||      ||
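To make a few of the checks reported above more concrete, here is a minimal base R sketch. It is only an approximation for intuition, not textclean's actual rules: the sample vector, the `checks` list, and the regular expressions are all assumptions made for illustration.

```r
# Hypothetical sample text, stored as a factor to mirror the example above.
x <- factor(c("i like", "", NA, "3;", "they,were there", "A valid sentence."))

# Convert to character first, as the NON CHARACTER suggestion recommends.
x_chr <- as.character(x)

# Base R approximations of a few checks (assumed patterns, not textclean's own).
checks <- list(
  missing        = is.na(x_chr),
  empty          = !is.na(x_chr) & grepl("^\\s*$", x_chr),
  digit          = !is.na(x_chr) & grepl("[0-9]", x_chr),
  comma_no_space = !is.na(x_chr) & grepl(",\\S", x_chr)
)

# Report the positions flagged by each check.
flagged <- lapply(checks, which)
print(flagged)
```

Converting the factor to character before checking keeps the reported positions aligned with the original order of the text.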
Row Filtering
It is useful to drop empty or unwanted rows (for example, the researcher's dialogue in a transcript). The drop_empty_row and drop_row functions do just this. First I'll demo the removal of empty rows.
## create a data set with empty rows
(dat <- rbind.data.frame(DATA[, c(1, 4)], matrix(rep(" ", 4),
ncol =2, dimnames=list(12:13, colnames(DATA)[c(1, 4)]))))
## person state
## 1 sam Computer is fun. Not too fun.
## 2 greg No it's not, it's dumb.
## 3 teacher What should we do?
## 4 sam You liar, it stinks!
## 5 greg I am telling the truth!
## 6 sally How can we be certain?
## 7 greg There is no way.
## 8 sam I distrust you.
## 9 sally What are you talking about?
## 10 researcher Shall we move on? Good then.
## 11 greg I'm hungry. Let's eat. You already?
## 12
## 13
drop_empty_row(dat)
## person state
## 1 sam Computer is fun. Not too fun.
## 2 greg No it's not, it's dumb.
## 3 teacher What should we do?
## 4 sam You liar, it stinks!
## 5 greg I am telling the truth!
## 6 sally How can we be certain?
## 7 greg There is no way.
## 8 sam I distrust you.
## 9 sally What are you talking about?
## 10 researcher Shall we move on? Good then.
## 11 greg I'm hungry. Let's eat. You already?
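For intuition, the empty-row removal shown above can be approximated in base R. This is a sketch under stated assumptions rather than the way drop_empty_row is implemented: the small data frame and the `is_empty_cell` helper are hypothetical, and a row is treated as empty only when every cell is NA or whitespace.

```r
# Hypothetical data frame with two all-whitespace rows.
dat <- data.frame(
  person = c("sam", "greg", " ", ""),
  state  = c("Computer is fun.", "No it's not.", " ", " "),
  stringsAsFactors = FALSE
)

# Hypothetical helper: a cell is empty when it is NA or only whitespace.
is_empty_cell <- function(v) is.na(v) | grepl("^\\s*$", v)

# A row is empty when every one of its cells is empty.
empty_row <- Reduce(`&`, lapply(dat, is_empty_cell))

# Keep only the rows that have some content.
cleaned <- dat[!empty_row, , drop = FALSE]
print(cleaned)
```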
Next we drop rows based on their content. The drop_row function takes a data set, a column (by name or numeric position), and regex terms to search for. The terms argument takes one or more regexes, which allows partial matching. Matching is case sensitive by default, but this can be changed via the ignore.case argument. The keep_row function is the inverse: it keeps only the matching rows.
drop_row(dataframe = DATA, column = "person", terms = c("sam", "greg"))
## person sex adult state code
## 1 teacher m 1 What should we do? K3
## 2 sally f 0 How can we be certain? K6
## 3 sally f 0 What are you talking about? K9
## 4 researcher f 1 Shall we move on? Good then. K10
drop_row(DATA, 1, c("sam", "greg"))
## person sex adult state code
## 1 teacher m 1 What should we do? K3
## 2 sally f 0 How can we be certain? K6
## 3 sally f 0 What are you talking about? K9
## 4 researcher f 1 Shall we move on? Good then. K10
keep_row(DATA, 1, c("sam", "greg"))
## person sex adult state code
## 1 sam m 0 Computer is fun. Not too fun. K1
## 2 greg m 0 No it's not, it's dumb. K2
## 3 sam m 0 You liar, it stinks! K4
## 4 greg m 0 I am telling the truth! K5
## 5 greg m 0 There is no way. K7
## 6 sam m 0 I distrust you. K8
## 7 greg m 0 I'm hungry. Let's eat. You already? K11
drop_row(DATA, "state", c("Comp"))
## person sex adult state code
## 1 greg m 0 No it's not, it's dumb. K2
## 2 teacher m 1 What should we do? K3
## 3 sam m 0 You liar, it stinks! K4
## 4 greg m 0 I am telling the truth! K5
## 5 sally f 0 How can we be certain? K6
## 6 greg m 0 There is no way. K7
## 7 sam m 0 I distrust you. K8
## 8 sally f 0 What are you talking about? K9
## 9 researcher f 1 Shall we move on? Good then. K10
## 10 greg m 0 I'm hungry. Let's eat. You already? K11
drop_row(DATA, "state", c("I "))
## person sex adult state code
## 1 sam m 0 Computer is fun. Not too fun. K1
## 2 greg m 0 No it's not, it's dumb. K2
## 3 teacher m 1 What should we do? K3
## 4 sam m 0 You liar, it stinks! K4
## 5 sally f 0 How can we be certain? K6
## 6 greg m 0 There is no way. K7
## 7 sally f 0 What are you talking about? K9
## 8 researcher f 1 Shall we move on? Good then. K10
## 9 greg m 0 I'm hungry. Let's eat. You already? K11
drop_row(DATA, "state", c("you"), ignore.case = TRUE)
## person sex adult state code
## 1 sam m 0 Computer is fun. Not too fun. K1
## 2 greg m 0 No it's not, it's dumb. K2
## 3 teacher m 1 What should we do? K3
## 4 greg m 0 I am telling the truth! K5
## 5 sally f 0 How can we be certain? K6
## 6 greg m 0 There is no way. K7
## 7 researcher f 1 Shall we move on? Good then. K10
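The partial, optionally case-insensitive matching demonstrated above can be approximated in base R with grepl. The sketch below is an assumption-laden illustration, not textclean's implementation: the `transcript` data frame and the `match_rows` helper are hypothetical names introduced here.

```r
# Hypothetical transcript data (a small stand-in for the DATA set used above).
transcript <- data.frame(
  person = c("sam", "greg", "teacher", "sally"),
  state  = c("You liar, it stinks!", "I am telling the truth!",
             "What should we do?", "What are you talking about?"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE where `column` matches any of the regex `terms`.
# `column` may be a name or a numeric position, since `[[` accepts both.
match_rows <- function(df, column, terms, ignore.case = FALSE) {
  pattern <- paste(terms, collapse = "|")
  grepl(pattern, df[[column]], ignore.case = ignore.case)
}

# Drop rows whose person matches "sam" or "greg" (partial matching).
dropped <- transcript[!match_rows(transcript, "person", c("sam", "greg")), ]

# Keep only those rows instead.
kept <- transcript[match_rows(transcript, 1, c("sam", "greg")), ]

# Case-insensitive drop: removes both "You liar..." and "...are you...".
no_you <- transcript[!match_rows(transcript, "state", "you", ignore.case = TRUE), ]

print(dropped$person)
print(kept$person)
print(no_you$person)
```

Collapsing the terms with `|` turns several regexes into a single alternation, so one grepl call covers all of them.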