Address Standardization
You can standardize a single address, or addresses in a field in a view:
| Standardize() | Converts an address string into a normalized form for address matching, according to a file of transformation rules |
| StandardizeView() | Standardizes address strings in a view and writes the results to a table |
Parsing Addresses with Regular Expressions
You can also use regular expressions to parse street addresses into components. The RegEx class extracts components from an input string and puts them into an options array. For example, the following script creates a RegEx object, sets a delimiter, defines fields for a number and for a set of words, and defines a rule to parse an address:
// Create a RegEx GISDK object.
rx = CreateObject("RegEx")
// Specify the word delimiter
rx.Delimiters(" ")
// Define a number as a sequence of one or more digits
rx.Field("number","[0-9]+")
// Define words as a sequence of one or more letters or spaces
rx.Field("words","[a-z ]+")
// Create a match rule for a number followed by one or more words.
// The number will be stored in the output option called STDNUMBER,
// and the words in the option called STDNAME.
rx.Rule("$number:(STDNUMBER) $words:(STDNAME)")
You can apply the rule to one address:
parsed = rx.Match("139 main street")
ShowArray(parsed)
The options array contains:
parsed.STDNUMBER = "139"
parsed.STDNAME = "MAIN STREET"
Note that the value of the STDNAME option is converted to uppercase. The Match method also accepts an array of address strings for batch processing.
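For readers more familiar with conventional regular expression libraries, the rule above can be sketched with named groups in Python. This is only a rough analogue, not the GISDK implementation: the function name `parse_address` is hypothetical, and lower-casing the input before matching is an assumption made so that the `[a-z ]+` pattern accepts mixed-case input.

```python
import re

# Rough analogue of rx.Rule("$number:(STDNUMBER) $words:(STDNAME)"):
# each named group plays the role of an output option.
ADDRESS_RULE = re.compile(r"(?P<STDNUMBER>[0-9]+) (?P<STDNAME>[a-z ]+)")


def parse_address(address):
    """Return a dict of parsed components, or None if the rule does not match."""
    # Assumption: matching is done on a lower-cased copy of the input.
    match = ADDRESS_RULE.fullmatch(address.lower())
    if match is None:
        return None
    # Upper-case the values to mirror the output shown for rx.Match.
    return {name: value.upper() for name, value in match.groupdict().items()}


print(parse_address("139 main street"))
```

An address without a leading number, such as "main street", does not satisfy this single rule, so the sketch returns None for it.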
Delimiters(delimiters)
Sets the characters in the string delimiters as the delimiters.
| Arguments: | delimiters – a string with the delimiter characters |
| Example: | rx.Delimiters(" \n") sets a space or a newline character as delimiters |
ReplaceChar(list_of_characters)
ReplaceWith(replacement_characters)
Replace all of the characters listed in the string list_of_characters with the characters listed in the string replacement_characters before matching the input expression.
| Arguments: | list_of_characters – a string with the characters to replace |
| | replacement_characters – a string with the replacement characters |
| Example: | rx.ReplaceChar("àáâãåäæéèêëíîïóöôõœùûúüçñ-") lists the characters to replace |
| | rx.ReplaceWith("aaaaaaeeeeeiiiooooeuuuucn ") gives the non-accented replacements for those characters |
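The pairing of the two strings is positional: the first character of the list is replaced by the first replacement character, and so on. A minimal Python sketch of the same idea uses a translation table. It is an analogue for illustration only; the shortened character lists and the name `normalize` are hypothetical.

```python
# Hypothetical shortened lists: the character at position i in
# CHARS_TO_REPLACE is replaced by the character at position i in REPLACEMENTS.
CHARS_TO_REPLACE = "àáâéèêîïôöùûüçñ-"
REPLACEMENTS = "aaaeeeiioouuucn "

# Both strings must have the same length for a one-to-one mapping.
assert len(CHARS_TO_REPLACE) == len(REPLACEMENTS)
TABLE = str.maketrans(CHARS_TO_REPLACE, REPLACEMENTS)


def normalize(text):
    """Replace accented characters and hyphens before any pattern matching."""
    return text.translate(TABLE)


print(normalize("12 rue saint-honoré"))
```

Because the mapping is positional, a stray or missing character in either string shifts every replacement after it, so it is worth checking that the two strings have the same length.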
Field(name, pattern[, replacement])
Defines the field name based on the regular expression pattern.
| Arguments: | name – a string with the field name |
| | pattern – a string with the regular expression for the field |
| | replacement – optional, a string with the replacement for name |
| Examples: | rx.Field("number","[0-9]+") defines the field number as one or more digits |
| | rx.Field("words","[a-z ]+") defines the field words as one or more letters or spaces |
Rule(pattern)
Creates a matching rule using the specified regular expression pattern.
| Arguments: | pattern – a string with the regular expression for the rule |
| Returned value: | An options array with the defined options |
| Example: | rx.Rule("$number:(STDNUMBER) $words:(STDNAME)") defines a rule that puts the number into the STDNUMBER option and the words into the STDNAME option |
GetRules(recompile)
Gets the delimiters, fields, and rules for the object.
| Arguments: | recompile – a Boolean value, True to recompile the rules |
| Returned value: | An array with the delimiters, fields, and rules |
| Example: | rx.GetRules() returns the current delimiters, fields, and rules for the rx object |
Match(strings[, option])
Parses strings based on the rules.
| Arguments: | strings – one string or an array of strings to parse |
| | option – optional, a string with the name of a rule option; when given, only the string stored in that option is returned |
| Returned value: | An options array with the defined options, or a string with the result of the rule |
| Examples: | rx.Match("139 main street") returns "139" in the STDNUMBER option and "MAIN STREET" in the STDNAME option |
| | rx = CreateObject("RegEx","{[0-9]+}:(number) {[a-z ]+}:(name)") followed by result = rx.Match("144 Mason Terrace","name") returns "MASON TERRACE" |
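The two calling styles of Match (a full set of options, or a single named option) can be sketched in Python as follows. This is an illustrative analogue only: the function `match`, the lower-casing of the input, and the way batch results and non-matching strings are represented (a list, with None for no match) are assumptions, since the reference above does not specify them.

```python
import re

# Analogue of the rule "{[0-9]+}:(number) {[a-z ]+}:(name)".
RULE = re.compile(r"(?P<number>[0-9]+) (?P<name>[a-z ]+)")


def match(strings, option=None):
    """Parse one string or a list of strings; optionally return one option only."""
    single = isinstance(strings, str)
    items = [strings] if single else list(strings)
    results = []
    for text in items:
        # Assumption: matching ignores case and the output is upper-cased.
        found = RULE.fullmatch(text.lower())
        parsed = None
        if found is not None:
            parsed = {key: value.upper() for key, value in found.groupdict().items()}
        if option is not None and parsed is not None:
            parsed = parsed.get(option)
        results.append(parsed)
    return results[0] if single else results


print(match("144 Mason Terrace", "name"))
print(match(["139 main street", "no number here"]))
```

Passing a single string returns a single result, while passing a list returns one result per input string in the same order.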
MatchView(view_set, input_fields, id_field, output_bin_file)
Parses the address field(s) of the records in a view set based on the rules, and writes the results to a BIN file.
| Arguments: | view_set – a string with the view and set |
| | input_fields – an array of strings with the address field name(s) |
| | id_field – a string with the ID field name |
| | output_bin_file – a string with the output BIN file name |
| Returned value: | A string with the name of the view created from the output BIN file |
| Example: | rx.MatchView(GetView()+"|",{ "ADDRESS" },"ID","standardized.bin") uses all the records in the current view, parses the ADDRESS field into STDNUMBER and STDNAME, uses the ID field as the ID, and saves the result into the table standardized.bin |
Recompile()
Runs the GetRules method with recompile = True.
| Returned value: | An array with the delimiters, fields, and rules |
| Example: | rx.Recompile() recompiles and returns the current delimiters, fields, and rules for the rx object |
Here is a complete example:
rx = CreateObject("RegEx")
rx.Delimiters(". ;:?[]*=()#,%!~\"+{}")
rx.ReplaceChar("àáâãåäæéèêëíîïóöôõœùûúüçñ-")
rx.ReplaceWith("aaaaaaeeeeeiiiooooeuuuucn ")
rx.Field("ave/","ave|avenue")
rx.Field("rd/","rd|road")
rx.Field("st/","st|street")
rx.Field("number","[0-9]+")
rx.Field("words","[a-z ]+")
rx.Field("sttype","$ave|$rd|$st")
rx.Rule("$number:(STDNUMBER) $words:(STDNAME) $sttype:(STDNAME)")
rx.Rule("$number:(STDNUMBER) $sttype:(STDNAME) $words:(STDNAME)")
rx.Rule("$sttype:(STDNAME) $words:(STDNAME)")
rx.Rule("$words:(STDNAME) $sttype:(STDNAME)")
ShowArray(rx.GetRules())
sentences = {"àáâãåäæéèêëíî avenue","123 àáâãåäæéèêëíî avenue","132 avenue of the americas","123 a avenue"}
result = rx.Match(sentences)
ShowArray({sentences,result})