Testing Netlify hosting with R Markdown

This report was generated on 2020-12-30, as a demo of textclean from https://github.com/trinker/textclean#check-text

This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.

Quick example with constructed text

library(textclean)  # provides check_text() and the cleaning helpers used below

x <- c("i like", "<p>i want. </p>. thet them ther .", "I am ! that|", "", NA, 
    "&quot;they&quot; they,were there", ".", "   ", "?", "3;", "I like goud eggs!", 
    "bi\xdfchen Z\xfcrcher", "i 4like...", "\\tgreat",  "She said \"yes\"")
Encoding(x) <- "latin1"
x <- as.factor(x)
check_text(x)
## 
## =============
## NON CHARACTER
## =============
## 
## The text variable is not a character column (likely `factor`):
## 
## 
## *Suggestion: Consider using `as.character` or `stringsAsFactors = FALSE` when reading in
##              Also, consider rerunning `check_text` after fixing
## 
## 
## =====
## DIGIT
## =====
## 
## The following observations contain digits/numbers:
## 
## 10, 13
## 
## This issue affected the following text:
## 
## 10: 3;
## 13: i 4like...
## 
## *Suggestion: Consider using `replace_number`
## 
## 
## ========
## EMOTICON
## ========
## 
## The following observations contain emoticons:
## 
## 6
## 
## This issue affected the following text:
## 
## 6: &quot;they&quot; they,were there
## 
## *Suggestion: Consider using `replace_emoticons`
## 
## 
## =====
## EMPTY
## =====
## 
## The following observations contain empty text cells (all white space):
## 
## 1
## 
## This issue affected the following text:
## 
## 1: i like
## 
## *Suggestion: Consider running `drop_empty_row`
## 
## 
## =======
## ESCAPED
## =======
## 
## The following observations contain escaped back spaced characters:
## 
## 14
## 
## This issue affected the following text:
## 
## 14: \tgreat
## 
## *Suggestion: Consider using `replace_white`
## 
## 
## ====
## HTML
## ====
## 
## The following observations contain HTML markup:
## 
## 2, 6
## 
## This issue affected the following text:
## 
## 2: <p>i want. </p>. thet them ther .
## 6: &quot;they&quot; they,were there
## 
## *Suggestion: Consider running `replace_html`
## 
## 
## ==========
## INCOMPLETE
## ==========
## 
## The following observations contain incomplete sentences (e.g., uses ending punctuation like '...'):
## 
## 13
## 
## This issue affected the following text:
## 
## 13: i 4like...
## 
## *Suggestion: Consider using `replace_incomplete`
## 
## 
## =============
## MISSING VALUE
## =============
## 
## The following observations contain missing values:
## 
## 5
## 
## *Suggestion: Consider running `drop_NA`
## 
## 
## ========
## NO ALPHA
## ========
## 
## The following observations contain elements with no alphabetic (a-z) letters:
## 
## 4, 7, 8, 9, 10
## 
## This issue affected the following text:
## 
## 4: 
## 7: .
## 8:    
## 9: ?
## 10: 3;
## 
## *Suggestion: Consider cleaning the raw text or running `filter_row`
## 
## 
## ==========
## NO ENDMARK
## ==========
## 
## The following observations contain elements with missing ending punctuation:
## 
## 1, 3, 4, 6, 8, 10, 12, 14, 15
## 
## This issue affected the following text:
## 
## 1: i like
## 3: I am ! that|
## 4: 
## 6: &quot;they&quot; they,were there
## 8:    
## 10: 3;
## 12: bißchen Zürcher
## 14: \tgreat
## 15: She said "yes"
## 
## *Suggestion: Consider cleaning the raw text or running `add_missing_endmark`
## 
## 
## ====================
## NO SPACE AFTER COMMA
## ====================
## 
## The following observations contain commas with no space afterwards:
## 
## 6
## 
## This issue affected the following text:
## 
## 6: &quot;they&quot; they,were there
## 
## *Suggestion: Consider running `add_comma_space`
## 
## 
## =========
## NON ASCII
## =========
## 
## The following observations contain non-ASCII text:
## 
## 12
## 
## This issue affected the following text:
## 
## 12: bißchen Zürcher
## 
## *Suggestion: Consider running `replace_non_ascii`
## 
## 
## ==================
## NON SPLIT SENTENCE
## ==================
## 
## The following observations contain unsplit sentences (more than one sentence per element):
## 
## 2, 3
## 
## This issue affected the following text:
## 
## 2: <p>i want. </p>. thet them ther .
## 3: I am ! that|
## 
## *Suggestion: Consider running `textshape::split_sentence`

And if all is well the user should be greeted by a cow:

y <- c("A valid sentence.", "yet another!")
check_text(y)
## 
##  ------------- 
## No problems found!
## This text is virtuosic! 
##  ---------------- 
##   \   ^__^ 
##    \  (oo)\ ________ 
##       (__)\         )\ /\ 
##            ||------w|
##            ||      ||

Row Filtering

It is useful to drop/remove empty rows or unwanted rows (for example the researcher dialogue from a transcript). The drop_empty_row and drop_row functions do just this. First I’ll demo the removal of empty rows.

## create a data set with empty rows (DATA is an example data set shipped with textclean)
(dat <- rbind.data.frame(DATA[, c(1, 4)], matrix(rep(" ", 4), 
    ncol =2, dimnames=list(12:13, colnames(DATA)[c(1, 4)]))))
##        person                                 state
## 1         sam         Computer is fun. Not too fun.
## 2        greg               No it's not, it's dumb.
## 3     teacher                    What should we do?
## 4         sam                  You liar, it stinks!
## 5        greg               I am telling the truth!
## 6       sally                How can we be certain?
## 7        greg                      There is no way.
## 8         sam                       I distrust you.
## 9       sally           What are you talking about?
## 10 researcher         Shall we move on?  Good then.
## 11       greg I'm hungry.  Let's eat.  You already?
## 12                                                 
## 13
drop_empty_row(dat)
##        person                                 state
## 1         sam         Computer is fun. Not too fun.
## 2        greg               No it's not, it's dumb.
## 3     teacher                    What should we do?
## 4         sam                  You liar, it stinks!
## 5        greg               I am telling the truth!
## 6       sally                How can we be certain?
## 7        greg                      There is no way.
## 8         sam                       I distrust you.
## 9       sally           What are you talking about?
## 10 researcher         Shall we move on?  Good then.
## 11       greg I'm hungry.  Let's eat.  You already?
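For intuition, the idea of an "empty" row can be sketched in base R: a row is dropped when every cell is NA or contains only white space. This is an illustrative approximation, not textclean's actual implementation; `is_blank`, `drop_blank_rows_sketch`, and the small data frame are hypothetical names made up for the example.

```r
# Hypothetical data: the second row holds only white space
dat_small <- data.frame(
    person = c("sam", " ", "greg"),
    state  = c("Computer is fun.", "  ", "No it's not, it's dumb."),
    stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE where a cell is NA or only white space
is_blank <- function(x) is.na(x) | !nzchar(trimws(as.character(x)))

drop_blank_rows_sketch <- function(dataframe) {
    # A row counts as empty only if every one of its columns is blank
    all_blank <- Reduce(`&`, lapply(dataframe, is_blank))
    dataframe[!all_blank, , drop = FALSE]
}

drop_blank_rows_sketch(dat_small)
```

Checking all columns at once means a row with a blank person but a non-blank state is kept, which matches the idea of removing only fully empty rows.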

Next we drop unwanted rows. The drop_row function takes a data set, a column (named or numeric position), and regex terms to search for. The terms argument takes one or more regexes, which allows partial matching. Matching is case sensitive by default; this can be changed via the ignore.case argument. The companion keep_row function does the opposite and keeps only the matching rows.

drop_row(dataframe = DATA, column = "person", terms = c("sam", "greg"))
##       person sex adult                         state code
## 1    teacher   m     1            What should we do?   K3
## 2      sally   f     0        How can we be certain?   K6
## 3      sally   f     0   What are you talking about?   K9
## 4 researcher   f     1 Shall we move on?  Good then.  K10
drop_row(DATA, 1, c("sam", "greg"))
##       person sex adult                         state code
## 1    teacher   m     1            What should we do?   K3
## 2      sally   f     0        How can we be certain?   K6
## 3      sally   f     0   What are you talking about?   K9
## 4 researcher   f     1 Shall we move on?  Good then.  K10
keep_row(DATA, 1, c("sam", "greg"))
##   person sex adult                                 state code
## 1    sam   m     0         Computer is fun. Not too fun.   K1
## 2   greg   m     0               No it's not, it's dumb.   K2
## 3    sam   m     0                  You liar, it stinks!   K4
## 4   greg   m     0               I am telling the truth!   K5
## 5   greg   m     0                      There is no way.   K7
## 6    sam   m     0                       I distrust you.   K8
## 7   greg   m     0 I'm hungry.  Let's eat.  You already?  K11
drop_row(DATA, "state", c("Comp"))
##        person sex adult                                 state code
## 1        greg   m     0               No it's not, it's dumb.   K2
## 2     teacher   m     1                    What should we do?   K3
## 3         sam   m     0                  You liar, it stinks!   K4
## 4        greg   m     0               I am telling the truth!   K5
## 5       sally   f     0                How can we be certain?   K6
## 6        greg   m     0                      There is no way.   K7
## 7         sam   m     0                       I distrust you.   K8
## 8       sally   f     0           What are you talking about?   K9
## 9  researcher   f     1         Shall we move on?  Good then.  K10
## 10       greg   m     0 I'm hungry.  Let's eat.  You already?  K11
drop_row(DATA, "state", c("I "))
##       person sex adult                                 state code
## 1        sam   m     0         Computer is fun. Not too fun.   K1
## 2       greg   m     0               No it's not, it's dumb.   K2
## 3    teacher   m     1                    What should we do?   K3
## 4        sam   m     0                  You liar, it stinks!   K4
## 5      sally   f     0                How can we be certain?   K6
## 6       greg   m     0                      There is no way.   K7
## 7      sally   f     0           What are you talking about?   K9
## 8 researcher   f     1         Shall we move on?  Good then.  K10
## 9       greg   m     0 I'm hungry.  Let's eat.  You already?  K11
drop_row(DATA, "state", c("you"), ignore.case = TRUE)
##       person sex adult                         state code
## 1        sam   m     0 Computer is fun. Not too fun.   K1
## 2       greg   m     0       No it's not, it's dumb.   K2
## 3    teacher   m     1            What should we do?   K3
## 4       greg   m     0       I am telling the truth!   K5
## 5      sally   f     0        How can we be certain?   K6
## 6       greg   m     0              There is no way.   K7
## 7 researcher   f     1 Shall we move on?  Good then.  K10
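The matching rules described above (several regex terms, partial matches, optional case insensitivity) can be approximated in base R with grepl. The sketch below is an assumption about the behaviour, not textclean's source code; `drop_row_sketch` and `people` are hypothetical names, and the terms are joined with `|` so that a row is dropped when any term matches.

```r
# Hypothetical data, loosely modelled on the DATA example above
people <- data.frame(
    person = c("sam", "greg", "teacher", "sally"),
    state  = c("Computer is fun.", "No it's not.",
               "What should we do?", "How can we be certain?"),
    stringsAsFactors = FALSE
)

drop_row_sketch <- function(dataframe, column, terms, ignore.case = FALSE) {
    # Combine all terms into one alternation so any match flags the row
    pattern <- paste(terms, collapse = "|")
    # `column` may be a name or a numeric position, as with [[ ]]
    hits <- grepl(pattern, as.character(dataframe[[column]]),
                  ignore.case = ignore.case)
    out <- dataframe[!hits, , drop = FALSE]
    rownames(out) <- NULL  # renumber rows, as in the output shown above
    out
}

drop_row_sketch(people, "person", c("sam", "greg"))
drop_row_sketch(people, "state", "comp", ignore.case = TRUE)
```

Because the terms are regexes, a term such as "I " matches anywhere in the cell; anchoring it (for example "^I ") would restrict the match to the start of the text.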
Ian T. Adams, Ph.D.
Assistant Professor, Department of Criminology & Criminal Justice

My research interests include human capital in criminal justice, policing, and criminal justice policy.
