Maptitude GISDK Help

Address Standardization

You can standardize a single address, or addresses in a field in a view:

 

GISDK Function      Summary
Standardize()       Converts an address string into a normalized form for address matching, according to a file of transformation rules
StandardizeView()   Standardizes address strings in a view and writes the results to a table

 

Parsing Addresses with Regular Expressions

You can also use regular expressions to parse street addresses into components. The RegEx class lets you extract components from an input string and put them into an options array. For example, the following script creates a RegEx object, sets a delimiter, defines one field for a number and one for a set of words, and defines a rule to parse an address:

 

// Create a RegEx GISDK object.

rx = CreateObject("RegEx")

 

// Specify the word delimiter

rx.Delimiters(" ")

 

// Define a number as a sequence of one or more digits

rx.Field("number","[0-9]+")

// Define words as a sequence of one or more letters or spaces

rx.Field("words","[a-z ]+")

 

// Create a match rule for a number followed by one or more words.

// The number will be stored in the output option called STDNUMBER,

// and the words in the option called STDNAME.

rx.Rule("$number:(STDNUMBER) $words:(STDNAME)")

 

You can apply the rule to one address:

 

parsed = rx.Match("139 main street")

ShowArray(parsed)

 

The options array contains:

 

parsed.STDNUMBER = "139"

parsed.STDNAME = "MAIN STREET"

 

Note that the value of the STDNAME option is converted to uppercase. You can also pass an array of address strings to the Match method.
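
As a minimal sketch of the array form, the following script rebuilds the same rule and passes an array of address strings to Match. The sample addresses are made up for illustration, and because the layout of the value returned for array input is not described here, the script simply displays it with ShowArray rather than assuming a structure:

```gisdk
// Same delimiter, fields, and rule as above
rx = CreateObject("RegEx")
rx.Delimiters(" ")
rx.Field("number","[0-9]+")
rx.Field("words","[a-z ]+")
rx.Rule("$number:(STDNUMBER) $words:(STDNAME)")

// Hypothetical sample addresses
addresses = {"139 main street", "144 mason terrace"}

// Pass the whole array to Match and inspect the inputs next to the results
parsed = rx.Match(addresses)
ShowArray({addresses, parsed})
```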

 

If you want to extract only one component from an address, you can create a simple "one-rule" regular expression like this:

 

rx = CreateObject("RegEx","{[0-9]+}:(number) {[a-z ]+}:(name)")

 

Then you can match it and get just the name:

 

result = rx.Match("144 Mason Terrace","name")

 

The result is the string "MASON TERRACE".

 

You can use the MatchView method to apply the rule to a batch of addresses:

 

// Open Customer.dbf in the Tutorial folder before using the MatchView method.

// The output is a table with three columns: ID, STDNUMBER and STDNAME.

table = rx.MatchView(GetView()+"|",{"ADDRESS"},"ID","standardized.bin")

CreateEditor(table, table+"|",,{{"Row Background", "True"}})

SetEditorOptionEx(,{{"Row Background Color", ColorRGB(59000, 59000, 59000)}})

RedrawEditor()

 

Fields and rules are applied in the order in which they are declared. The first rule that matches successfully is the one that returns the output options array or table.
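
The following sketch illustrates how declaration order might be used to provide a fallback rule. The second rule and the input strings are illustrative assumptions, not taken from a shipped rule set. Based on the ordering described above, an address that starts with a number should be handled by the first rule, while an address with no number should fall through to the second; the script displays both results so you can check this yourself:

```gisdk
rx = CreateObject("RegEx")
rx.Delimiters(" ")
rx.Field("number","[0-9]+")
rx.Field("words","[a-z ]+")

// Declared first, so tried first: a number followed by words
rx.Rule("$number:(STDNUMBER) $words:(STDNAME)")

// Declared second: assumed fallback for addresses with no house number
rx.Rule("$words:(STDNAME)")

// Hypothetical inputs, one for each rule
ShowArray(rx.Match("139 main street"))
ShowArray(rx.Match("main street"))
```

Declaring the more specific rule first keeps the more general rule from being chosen for inputs that both rules could match.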

 

The regular expression syntax is as follows:

 

Item    Description
{}      Group
<       Beginning of line
>       End of line
|       Alternative; note that "abc|def" represents "abc" or "def" while "ab{c|d}ef" represents "abcef" or "abdef"
*       Zero or more of previous match
+       One or more of previous match
?       Zero or one of previous match
.       Any single char
[ ]     Charset; e.g., [0-9] is all digits and [a-z] is all letters
[~]     Not charset
\n      Newline
\       Escape
$       Field starter
:       Field assignment
()      Field name; e.g., {[0-9]+}:(num) will assign any number to num

 

The RegEx object has the following methods; the examples assume the object rx has been created:

 

Delimiters(delimiters)

Description: Sets the characters in the string delimiters as the delimiters.
Arguments: delimiters – a string containing the delimiter characters
Example: rx.Delimiters(" \n") sets the space and newline characters as delimiters

 

ReplaceChar(list_of_characters)

ReplaceWith(replacement_characters)

Description: Replaces all of the characters listed in the string list_of_characters with the characters listed in the string replacement_characters before matching the input expression.
Arguments: list_of_characters – a string with the characters to replace
replacement_characters – a string with the replacement characters
Example: rx.ReplaceChar("àáâãåäæéèêëíîïóöôõœùûúüçñ-") lists the characters to replace
rx.ReplaceWith("aaaaaaeeeeeiiiooooeuuuucn ") gives the non-accented replacements for those characters
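
As a small sketch of how the two methods work together, the script below replaces a few accented characters and the hyphen before a rule is applied. It assumes that the two strings pair up by position (the first character of the first string is replaced with the first character of the second string, and so on), which is what the example above suggests. The input address is hypothetical:

```gisdk
rx = CreateObject("RegEx")
rx.Delimiters(" ")

// Assumed positional pairing: é -> e, è -> e, ç -> c, hyphen -> space
rx.ReplaceChar("éèç-")
rx.ReplaceWith("eec ")

rx.Field("number","[0-9]+")
rx.Field("words","[a-z ]+")
rx.Rule("$number:(STDNUMBER) $words:(STDNAME)")

// Hypothetical input containing an accented letter and a hyphen
ShowArray(rx.Match("12 rue saint-honoré"))
```

Without the replacements, the accented letter and the hyphen would fall outside the [a-z ] character set used by the words field.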

 

Field(name, pattern[, replacement])

Description: Defines the field name based on the regular expression pattern.
Arguments: name – a string with the field name
pattern – a string with the regular expression for the field
replacement – optional, a string with the replacement for name
Examples: rx.Field("number","[0-9]+") defines the field number as one or more digits
rx.Field("words","[a-z ]+") defines the field words as a sequence of one or more letters or spaces

Rule(pattern)

Description: Defines a rule based on the regular expression pattern.
Arguments: pattern – a string with the regular expression for the rule
Returned value: An options array with the defined options
Example: rx.Rule("$number:(STDNUMBER) $words:(STDNAME)") defines a rule that puts the number into the STDNUMBER option and the words into the STDNAME option

 

GetRules(recompile)

Description: Gets the delimiters, fields, and rules for the object.
Arguments: recompile – a Boolean value, True to recompile the rules
Returned value: An array with the delimiters, fields, and rules
Example: rx.GetRules() returns the current delimiters, fields, and rules for the rx object

 

Match(strings[, option])

Description: Parses strings based on the rules.
Arguments: strings – one string or an array of strings to parse
option – optional, a string with the name of a rule option; if given, only the value of that option is returned, as a string
Returned value: An options array with the defined options, or a string with the result of the rule
Examples: rx.Match("139 main street") returns "139" in the STDNUMBER option and "MAIN STREET" in the STDNAME option
rx = CreateObject("RegEx","{[0-9]+}:(number) {[a-z ]+}:(name)") and then result = rx.Match("144 Mason Terrace","name") returns "MASON TERRACE"

 

MatchView(view_set, input_fields, id_field, output_bin_file)

Description: Parses the input fields of the records in view_set based on the rules and writes the results to a table.
Arguments: view_set – a string with the view and set
input_fields – an array of strings with the address field name(s)
id_field – a string with the ID field name
output_bin_file – a string with the output BIN file name
Returned value: A string with the name of the view created from the output BIN file
Example: rx.MatchView(GetView()+"|",{"ADDRESS"},"ID","standardized.bin") uses all the records in the current view, parses the ADDRESS field into STDNUMBER and STDNAME, uses the ID field as the ID, and saves the result into the table standardized.bin

 

Recompile()

Description: Runs the GetRules method with recompile = True.
Returned value: An array with the delimiters, fields, and rules
Example: rx.Recompile() recompiles and returns the current delimiters, fields, and rules for the rx object

 

Here is a complete example:

 

rx = CreateObject("RegEx")

rx.Delimiters(". ;:?[]*=()#,%!~\"+{}")

rx.ReplaceChar("àáâãåäæéèêëíîïóöôõœùûúüçñ-")

rx.ReplaceWith("aaaaaaeeeeeiiiooooeuuuucn ")

rx.Field("ave/","ave|avenue")

rx.Field("rd/","rd|road")

rx.Field("st/","st|street")

rx.Field("number","[0-9]+")

rx.Field("words","[a-z ]+")

rx.Field("sttype","$ave|$rd|$st")

rx.Rule("$number:(STDNUMBER) $words:(STDNAME) $sttype:(STDNAME)")

rx.Rule("$number:(STDNUMBER) $sttype:(STDNAME) $words:(STDNAME)")

rx.Rule("$sttype:(STDNAME) $words:(STDNAME)")

rx.Rule("$words:(STDNAME) $sttype:(STDNAME)")

ShowArray(rx.GetRules())

sentences = {"àáâãåäæéèêëíî avenue","123 àáâãåäæéèêëíî avenue","132 avenue of the americas","123 a avenue"}

result = rx.Match(sentences)

ShowArray({sentences,result})

 

 

©2025 Caliper Corporation www.caliper.com