Sunday, January 23, 2011

Stata: Regular expressions

A regular expression lets you do a moderately fancy search (and replace, if you want). So say you wanted to replace "Dennis" with "Awesome" in a string variable, but only where it falls at the very end of the value. You could try:
-replace PBFnamevar = regexr(PBFnamevar,"Dennis$","Awesome")-
You could also match any character, or just capitals, or just digits. There are lots of possibilities:
http://www.stata.com/support/faqs/data/regex.html
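
To make the anchored replace and a couple of character classes concrete, here is a minimal sketch on toy data. The three example values and the new variable names (startscap, hasdigit) are made up for illustration; only PBFnamevar comes from the command above.

```stata
* Toy data: the values below are invented for illustration
clear
input str20 PBFnamevar
"Dennis"
"Hi Dennis"
"Dennis Smith"
end

* "$" anchors the match to the end of the string,
* so "Dennis Smith" is left unchanged
replace PBFnamevar = regexr(PBFnamevar, "Dennis$", "Awesome")

* Character classes: flag values that start with a capital letter
generate byte startscap = regexm(PBFnamevar, "^[A-Z]")

* ...or that contain at least one digit
generate byte hasdigit = regexm(PBFnamevar, "[0-9]")

list
```

Note that -regexr()- only replaces the first match it finds in each string, which is all you need when the pattern is anchored like this.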

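You can also use it on locals. Note that -regexr()- takes three arguments (the string, the pattern, and the replacement), so to strip "age" out of "agecat" you replace it with an empty string:
-local strata = regexr("agecat","age","")-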

Or -if- commands:
if regexm("`strata'","age") {
}
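
Here is a slightly fuller sketch of the -if- pattern, with something inside the braces. The value of the local and the messages are assumptions for illustration. It also uses -regexs()-, which returns the piece of the string matched by a parenthesized subexpression in the most recent -regexm()- call.

```stata
* Hypothetical value for the local, for illustration only
local strata "agecat"

* regexm() returns 1 on a match and 0 otherwise;
* the parentheses capture whatever follows "age"
if regexm("`strata'", "^age(.*)") {
    display "matched; the rest of the name is: " regexs(1)
}
else {
    display "no match"
}
```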

On a related note (although not actually regular expressions), say that you've got a string variable that consists of a bunch of what should be separate variables, only lumped all into one, separated by a semicolon (e.g. a row might look like "1;15.2;89;hi;21"). Try -split-:
-split textvar, gen(newtextvars) parse(";")-
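
A small sketch of what that looks like end to end, using the example row from above plus one invented row. The stub name "part" is just a placeholder. -split- creates string variables, so the pieces that should be numeric still need -destring-.

```stata
* Toy data: the second row is invented for illustration
clear
input str20 textvar
"1;15.2;89;hi;21"
"2;7.5;90;bye;34"
end

* Creates part1 through part5, all as strings
split textvar, generate(part) parse(";")

* Convert the all-numeric pieces; part4 holds text, so leave it alone
destring part1 part2 part3 part5, replace

list
```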

I should note that Stata's regular expressions are wimpy compared to what other languages support. R supports Perl-compatible regular expressions (set perl = TRUE in functions like gsub()), which can do so many things it's scary.
