regex - Data frame column vector manipulation


I have a data frame mydf:

            content term
1 search term: abc|   NA
2   search term-xyz   NA
3  search term-pqr|   NA

I wrote this regex:

search term[:]?.?([a-zA-Z]+)

to capture the terms abc, xyz, and pqr.

How can I extract these terms into the term column? I tried str_match and gsub, but I am not getting the correct results.

We can try sub:

sub(".*(\\s+|-)", "", df1$content)
#[1] "abc" "xyz" "pqr"

or

library(stringr)
str_extract(df1$content, "\\w+$")
#[1] "abc" "xyz" "pqr"
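The two calls above give clean results when each string ends with the term itself. As a quick check, here is a minimal base R sketch of what the sub pattern does on the question's exact strings. The data frame construction and the name res_plain are my own assumptions, not part of the original post. A trailing | stays attached to the term, which is the case the update handles.

```r
# Hypothetical reconstruction of the question's data; df1 is the name used in the answer
df1 <- data.frame(
  content = c("search term: abc|", "search term-xyz", "search term-pqr|"),
  term = NA,
  stringsAsFactors = FALSE
)

# Greedy .* runs up to the last whitespace or hyphen; that prefix is removed,
# but nothing after the term is touched
res_plain <- sub(".*(\\s+|-)", "", df1$content)
print(res_plain)
#[1] "abc|" "xyz"  "pqr|"
```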

Update

If a | is found at the end of the string:

gsub(".*(\\s+|-)|[^a-z]+$", "", df1$content)
#[1] "abc" "xyz" "pqr"

or

str_extract(df1$content, "\\w+(?=(|[|])$)")
#[1] "abc" "xyz" "pqr"
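Since the question asks for the terms to end up in the term column, the result still has to be assigned back. The following is a minimal base R sketch under the assumption that df1 looks like the question's data. The data frame construction and the name term_capture are hypothetical. It also shows one possible way to reuse the question's own pattern with a capture group and a back-reference, offered as an alternative rather than as the original answer's method.

```r
# Hypothetical reconstruction of the question's data
df1 <- data.frame(
  content = c("search term: abc|", "search term-xyz", "search term-pqr|"),
  term = NA,
  stringsAsFactors = FALSE
)

# Remove everything up to the last whitespace/hyphen, plus any trailing
# non-lowercase-letter characters, and store the result in the term column
df1$term <- gsub(".*(\\s+|-)|[^a-z]+$", "", df1$content)

# Alternative: the question's pattern, extended with .* so the whole string
# is matched and replaced by the first capture group
term_capture <- sub("search term[:]?.?([a-zA-Z]+).*", "\\1", df1$content)

print(df1)
print(term_capture)
```

Note that [^a-z]+$ only treats lowercase letters as part of the term. If terms can contain uppercase letters or digits, the character class would need to be widened accordingly.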
