Address Standardization

You can standardize a single address, or all of the addresses in a field of a view, with these functions:

Standardize() – Converts an address string into a normalized form for address matching, according to a file of transformation rules
StandardizeView() – Standardizes the address strings in a view and writes the results to a table
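
The format of the transformation-rules file is not described here, so the following Python sketch only illustrates the general idea of table-driven standardization. The RULES table and the standardize() and standardize_table() helpers are hypothetical; they are not the TransCAD rules-file format or API.

```python
# Hypothetical transformation table (NOT the TransCAD rules-file format):
# each whole token on the left is replaced by the token on the right.
RULES = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "NORTH": "N"}


def standardize(address, rules=RULES):
    """Uppercase an address, collapse whitespace, and map each token through the rules."""
    tokens = address.upper().split()
    return " ".join(rules.get(token, token) for token in tokens)


def standardize_table(rows, field):
    """Return copies of the rows (dicts) with one address field standardized."""
    return [{**row, field: standardize(row[field])} for row in rows]


print(standardize("139  Main   Street"))  # 139 MAIN ST
print(standardize_table([{"ID": 1, "ADDRESS": "12 north road"}], "ADDRESS"))
```

The single-string function plays the role of Standardize(), and the per-row loop plays the role of StandardizeView().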

Parsing Addresses with Regular Expressions

You can also use regular expressions to parse street addresses into components. The RegEx class lets you extract components from an input string and stores them in an options array. For example, the following script creates a RegEx object, sets a delimiter, defines fields for a number and for a set of words, and defines a rule to parse an address:

// Create a GISDK RegEx object.
rx = CreateObject("RegEx")
// Use a space as the word delimiter.
rx.Delimiters(" ")
// Define a number as a sequence of one or more digits.
rx.Field("number","[0-9]+")
// Define words as a sequence of one or more letters (a-z) or spaces.
rx.Field("words","[a-z ]+")
// Create a match rule for a number followed by one or more words.
// The number is stored in the output option called STDNUMBER,
// and the words in the option called STDNAME.
rx.Rule("$number:(STDNUMBER) $words:(STDNAME)")

You can apply the rule to one address:

parsed = rx.Match("139 main street")
ShowArray(parsed)

The options array contains:

parsed.STDNUMBER = "139"
parsed.STDNAME = "MAIN STREET"

Note that the value of the STDNAME option is returned in uppercase. The Match method also accepts an array of address strings for batch processing.
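
To make this behavior concrete, here is a rough Python analogue of the script above, using Python's re module rather than GISDK. It assumes that matching ignores case (the Match examples below parse "144 Mason Terrace" with a lowercase [a-z ] pattern), that option values are uppercased, and that an unparsable string yields None; parse_one() and match() are hypothetical helpers.

```python
import re

# The rule "$number:(STDNUMBER) $words:(STDNAME)" written as one regular
# expression with a named group per output option; the space is the delimiter.
RULE = re.compile(r"^(?P<STDNUMBER>[0-9]+) (?P<STDNAME>[a-z ]+)$")


def parse_one(text):
    """Return a dict of uppercased option values, or None if the rule fails."""
    m = RULE.match(text.lower())
    if m is None:
        return None
    return {option: value.upper() for option, value in m.groupdict().items()}


def match(inputs):
    """Accept one string or a list of strings, like the Match method."""
    if isinstance(inputs, str):
        return parse_one(inputs)
    return [parse_one(text) for text in inputs]


print(match("139 main street"))  # {'STDNUMBER': '139', 'STDNAME': 'MAIN STREET'}
print(match(["139 main street", "7 Oak Avenue", "no number"]))
```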

Delimiters(delimiters)

Sets each character in the string delimiters as a word delimiter.

Arguments: delimiters – a string containing the delimiter characters
Example: rx.Delimiters(" \n") sets the space and newline characters as delimiters
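
A delimiter set behaves much like a regular-expression character class that splits the input into words. The tokenize() function below is a hypothetical Python illustration of that idea, not part of GISDK.

```python
import re


def tokenize(text, delimiters):
    """Split text on runs of any character in `delimiters`, dropping empty tokens."""
    # re.escape makes characters such as '[' or '*' literal inside the class.
    pattern = "[" + re.escape(delimiters) + "]+"
    return [token for token in re.split(pattern, text) if token]


print(tokenize("139 main\nstreet", " \n"))  # ['139', 'main', 'street']
print(tokenize("12, Oak St. (rear)", ". ;:?[]*=()#,%!~\"+{}"))
```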

ReplaceChar(list_of_characters)

ReplaceWith(replacement_characters)

Replaces each character in the string list_of_characters with the character at the same position in the string replacement_characters before the input is matched.

Arguments: list_of_characters – a string with the characters to replace
replacement_characters – a string with the replacement characters
Example: rx.ReplaceChar("àáâãåäæéèêëíîïóöôõœùûúüçñ-") lists the characters to replace
rx.ReplaceWith("aaaaaaeeeeeiiiooooeuuuucn ") gives the unaccented replacements for those characters, in the same order (the hyphen becomes a space)
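
Python's str.maketrans performs the same kind of position-by-position character replacement, which also makes it easy to check that the two lists have the same length. This is an illustration in Python, not the GISDK implementation.

```python
# Same character lists as the ReplaceChar/ReplaceWith example.
SOURCE = "àáâãåäæéèêëíîïóöôõœùûúüçñ-"
TARGET = "aaaaaaeeeeeiiiooooeuuuucn "

# str.maketrans raises ValueError if the two strings differ in length,
# and maps SOURCE[i] to TARGET[i] for every position i.
TABLE = str.maketrans(SOURCE, TARGET)

print("rue de l'église".translate(TABLE))  # rue de l'eglise
print("saint-andré".translate(TABLE))      # saint andre
```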

Field(name, pattern[, replacement])

Defines a field called name that matches the regular expression pattern.

Arguments: name – a string with the field name
pattern – a string with the regular expression for the field
replacement – optional, a string with the replacement for name
Examples: rx.Field("number","[0-9]+") defines the field number as one or more digits
rx.Field("words","[a-z ]+") defines the field words as one or more letters (a-z) or spaces
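
The complete example at the end of this section defines one field in terms of others (rx.Field("sttype","$ave|$rd|$st")). Assuming that $name inside a field pattern is replaced by the pattern of the named field, the expansion can be sketched in Python as follows. The field() helper and the plain field names (without the trailing "/" used in that example) are hypothetical.

```python
import re

fields = {}


def field(name, pattern):
    """Store a field pattern, expanding $name references to earlier fields."""
    def expand(m):
        # Wrap the referenced pattern in a non-capturing group so that its
        # alternatives stay together inside the larger pattern.
        return "(?:" + fields[m.group(1)] + ")"

    fields[name] = re.sub(r"\$(\w+)", expand, pattern)


field("ave", "ave|avenue")
field("st", "st|street")
field("sttype", "$ave|$st")
print(fields["sttype"])  # (?:ave|avenue)|(?:st|street)
```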

Rule(pattern)

Creates a matching rule using the specified regular expression pattern.

Arguments: pattern – a string with the regular expression for the rule
Returned value: An options array with the defined options
Example: rx.Rule("$number:(STDNUMBER) $words:(STDNAME)") defines a rule that puts the number into the STDNUMBER option and the words into the STDNAME option
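
One way to picture a rule is to translate each $field:(OPTION) token into a named regular-expression group. The Python sketch below does that for the two fields defined earlier. It assumes that tokens are separated by single spaces and that values written to the same option are joined with a space, as the complete example's rules suggest by sending both $words and $sttype to STDNAME. compile_rule() and apply_rule() are hypothetical helpers, not GISDK methods.

```python
import re

FIELDS = {"number": "[0-9]+", "words": "[a-z ]+"}


def compile_rule(rule):
    """Turn '$field:(OPTION)' tokens into named groups joined by spaces."""
    parts = []
    for i, token in enumerate(rule.split()):
        m = re.fullmatch(r"\$(\w+):\((\w+)\)", token)
        if m is None:
            raise ValueError(f"unsupported token: {token}")
        field_name, option = m.groups()
        # Group names must be unique, so suffix each with the token index.
        parts.append(f"(?P<{option}_{i}>{FIELDS[field_name]})")
    return re.compile("^" + " ".join(parts) + "$")


def apply_rule(pattern, text):
    """Return uppercased option values, joining values that share an option."""
    m = pattern.match(text.lower())
    if m is None:
        return None
    options = {}
    for group, value in m.groupdict().items():
        option = group.rsplit("_", 1)[0]  # strip the index suffix
        options[option] = f"{options[option]} {value}" if option in options else value
    return {k: v.upper() for k, v in options.items()}


rule = compile_rule("$number:(STDNUMBER) $words:(STDNAME)")
print(apply_rule(rule, "139 main street"))
```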

GetRules([recompile])

Gets the delimiters, fields, and rules for the object.

Arguments: recompile – optional, a Boolean value; True to recompile the rules
Returned value: An array with the delimiters, fields, and rules
Example: rx.GetRules() returns the current delimiters, fields, and rules for the rx object

Match(strings[, option])

Parses strings based on the rules.

Arguments: strings – one string or an array of strings to parse
option – optional, the name of one option; if given, Match returns only the value of that option as a string
Returned value: An options array with the defined options, or a string with the value of the requested option
Examples: rx.Match("139 main street") returns "139" in the STDNUMBER option and "MAIN STREET" in the STDNAME option
rx = CreateObject("RegEx","{[0-9]+}:(number) {[a-z ]+}:(name)") passes the rule directly to CreateObject; result = rx.Match("144 Mason Terrace","name") then returns "MASON TERRACE"
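
The option argument selects a single value from the options array. A Python analogue of the Mason Terrace example, assuming case-insensitive matching and uppercased results, might look like this; match() is a hypothetical helper.

```python
import re

# Stand-in for the rule "{[0-9]+}:(number) {[a-z ]+}:(name)".
PATTERN = re.compile(r"^(?P<number>[0-9]+) (?P<name>[a-z ]+)$")


def match(text, option=None):
    """Return all options as a dict, or only the requested option's string."""
    m = PATTERN.match(text.lower())
    if m is None:
        return None
    options = {k: v.upper() for k, v in m.groupdict().items()}
    return options if option is None else options[option]


print(match("144 Mason Terrace", "name"))  # MASON TERRACE
```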

MatchView(view_set, input_fields, id_field, output_bin_file)

Parses the address fields of the records in a view set according to the rules and writes the results to a BIN file.

Arguments: view_set – a string with the view and set
input_fields – an array of strings with the address field name(s)
id_field – a string with the ID field name
output_bin_file – a string with the output BIN file name
Returned value: A string with the name of the view created from the output BIN file
Example: rx.MatchView(GetView()+"|",{ "ADDRESS" },"ID","standardized.bin") uses all the records in the current view, parses the ADDRESS field into STDNUMBER and STDNAME, uses the ID field as the ID, and saves the result into the table standardized.bin
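
MatchView applies the same parsing to every record of a view set. The Python sketch below mimics that loop for a list of dict records and writes an ID column plus the option columns. It writes CSV to a text stream only as a stand-in for the BIN output; match_view() and the joining of multiple input fields with a space are assumptions.

```python
import csv
import io
import re

RULE = re.compile(r"^(?P<STDNUMBER>[0-9]+) (?P<STDNAME>[a-z ]+)$")


def match_view(records, input_fields, id_field, out):
    """Write id_field, STDNUMBER, and STDNAME for each record as CSV rows."""
    writer = csv.DictWriter(out, fieldnames=[id_field, "STDNUMBER", "STDNAME"])
    writer.writeheader()
    for record in records:
        # Combine the input fields into one address string (assumption).
        text = " ".join(str(record[f]) for f in input_fields).lower()
        m = RULE.match(text)
        values = {k: v.upper() for k, v in m.groupdict().items()} if m else {}
        # Missing options are written as empty cells (DictWriter's default).
        writer.writerow({id_field: record[id_field], **values})


buffer = io.StringIO()
match_view([{"ID": 1, "ADDRESS": "139 main street"}], ["ADDRESS"], "ID", buffer)
print(buffer.getvalue())
```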

Recompile()

Runs the GetRules method with recompile = True.

Returned value: An array with the delimiters, fields, and rules
Example: rx.Recompile() recompiles and returns the current delimiters, fields, and rules for the rx object

Here is a complete example:

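// Create the RegEx object and set the word delimiters.
rx = CreateObject("RegEx")
rx.Delimiters(". ;:?[]*=()#,%!~\"+{}")
// Replace accented characters and hyphens before matching.
rx.ReplaceChar("àáâãåäæéèêëíîïóöôõœùûúüçñ-")
rx.ReplaceWith("aaaaaaeeeeeiiiooooeuuuucn ")
// Street types, each matching an abbreviation or the full word.
rx.Field("ave/","ave|avenue")
rx.Field("rd/","rd|road")
rx.Field("st/","st|street")
// House numbers and street-name words.
rx.Field("number","[0-9]+")
rx.Field("words","[a-z ]+")
// Any street type: $ave, $rd, and $st refer to the street-type fields above.
rx.Field("sttype","$ave|$rd|$st")
// Rules for addresses with and without a house number, with the street
// type after or before the street name. The words and the street type
// both go into the STDNAME option.
rx.Rule("$number:(STDNUMBER) $words:(STDNAME) $sttype:(STDNAME)")
rx.Rule("$number:(STDNUMBER) $sttype:(STDNAME) $words:(STDNAME)")
rx.Rule("$sttype:(STDNAME) $words:(STDNAME)")
rx.Rule("$words:(STDNAME) $sttype:(STDNAME)")
// Display the delimiters, fields, and rules.
ShowArray(rx.GetRules())
// Parse several sample addresses at once and show the inputs with the results.
sentences = {"à áâãåäæéèêëíî avenue","123 à áâãåäæéèêëíî avenue","132 avenue of the americas","123 a avenue"}
result = rx.Match(sentences)
ShowArray({sentences,result})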
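
For comparison, here is a Python approximation of the complete example. It assumes that rules are tried in the order they were defined and the first one that matches the whole string wins, that accented characters are replaced before matching, that single spaces are the only delimiters, and that values sent to the same option are joined with a space. None of this is documented above, so treat it as a sketch of the idea rather than of the RegEx class.

```python
import re

# Character replacement, as in ReplaceChar/ReplaceWith.
TABLE = str.maketrans("àáâãåäæéèêëíîïóöôõœùûúüçñ-", "aaaaaaeeeeeiiiooooeuuuucn ")

NUMBER = r"[0-9]+"
WORDS = r"[a-z ]+"
STTYPE = r"ave|avenue|rd|road|st|street"  # the expanded sttype field

# Each rule lists (field pattern, option name) pairs in order.
RULES = [
    [(NUMBER, "STDNUMBER"), (WORDS, "STDNAME"), (STTYPE, "STDNAME")],
    [(NUMBER, "STDNUMBER"), (STTYPE, "STDNAME"), (WORDS, "STDNAME")],
    [(STTYPE, "STDNAME"), (WORDS, "STDNAME")],
    [(WORDS, "STDNAME"), (STTYPE, "STDNAME")],
]


def compile_rule(rule):
    """Build one anchored regex with a numbered group per field."""
    body = " ".join(f"(?P<g{i}>{pattern})" for i, (pattern, _) in enumerate(rule))
    return re.compile("^" + body + "$")


COMPILED = [(compile_rule(rule), [option for _, option in rule]) for rule in RULES]


def match(text):
    """Return the options of the first rule that matches, or None."""
    text = text.lower().translate(TABLE)
    for pattern, options in COMPILED:
        m = pattern.match(text)
        if m is None:
            continue
        result = {}
        for i, option in enumerate(options):
            value = m.group(f"g{i}")
            result[option] = f"{result[option]} {value}" if option in result else value
        return {k: v.upper() for k, v in result.items()}
    return None


sentences = ["à áâãåäæéèêëíî avenue", "123 à áâãåäæéèêëíî avenue",
             "132 avenue of the americas", "123 a avenue"]
for sentence in sentences:
    print(sentence, "->", match(sentence))
```

Because the street-type alternation is anchored by the surrounding spaces and the end of the string, "132 avenue of the americas" fails the first rule and is parsed by the second.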