DataFrame Class

Summary

Allows tables and other data to be loaded into memory and manipulated. Designed to mimic components of the R packages `dplyr` and `tidyr`.

Constructor

DataFrame( [string or options_array table, array descriptions, array groups] )

Argument	Contents
table	Optional. If a string, the filename of the table to be opened (only .csv and .bin are currently supported). If an options array, each option name is the name of a field in the table, and each option value is an array of values. See the example below. If null, an empty data frame is created.
descriptions	Optional. Provides descriptions for each column in the data frame. Names must match column names in the table argument. The descriptions will only be visible if written to a bin file.
groups	Optional. Lists the grouping fields. See the group_by() method.

Example

tbl.a = {1, 3, 5}
tbl.b = {"a", "b", "c"}
desc.a = "This is a column of numbers."
desc.b = "This is a column of letters."
df = CreateObject("DataFrame" , tbl, desc)
df.view()

Methods

arrange( array fields )

Sorts a table based on a list of fields.

Argument	Contents
fields	A list of fields to sort by

Example

tbl.a = {1, 5,5, 3}
tbl.b = {"a", "c","b", "d"}
desc.a = "This is a column of numbers."
desc.b = "This is a column of letters."
df = CreateObject("DataFrame" , tbl, desc)
df.arrange({"a" ,"b"})
df.view()
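
The multi-field ordering can be illustrated outside GISDK with a minimal plain-Python sketch. This is not the class's implementation. It assumes the table is a dict of equal-length column lists (mirroring the options array above) and that ties on the first field are broken by the next field in ascending order. The helper name `arrange_columns` is hypothetical.

```python
def arrange_columns(table, fields):
    """Return a copy of a column-oriented table sorted by the given fields."""
    n_rows = len(next(iter(table.values())))
    # Sort row indices by a tuple of the sort-field values (ascending).
    order = sorted(range(n_rows), key=lambda i: tuple(table[f][i] for f in fields))
    # Reorder every column with the same row order.
    return {name: [values[i] for i in order] for name, values in table.items()}


tbl = {"a": [1, 5, 5, 3], "b": ["a", "c", "b", "d"]}
result = arrange_columns(tbl, ["a", "b"])
print(result)  # {'a': [1, 3, 5, 5], 'b': ['a', 'd', 'b', 'c']}
```

The two rows with a = 5 end up ordered by b, which is the effect of listing a second sort field.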

bin_field( options array )

Creates a field of categories based on a continuous numeric field.

Option	Type	Contents
in_field	string	Name of the continuous field to be "binned"
bins	int or array	If int, the number of bins to create. The range of the in_field will be divided up evenly. If array of integers, each array element is the start of a bin. The end of the last bin is assumed to be the max value in the field, e.g. {0, 1} is: 0 <= x < 1 and 1 <= x < [max number].
labels	array	Optional. The names of the bins. If bins is a list, the array length must be 1 less than the length of bins. If bins is a number, the array length must be the same as bins. If null, bins will be labelled 1-n.

Example

tbl.a = {1, 2,3, 4}
tbl.b = {"a", "c","b", "d"}
desc.a = "This is a column of numbers."
desc.b = "This is a column of letters."
df = CreateObject("DataFrame" , tbl, desc)
df.bin_field({"in_field": "a", "bins":2, "labels":{"kkk","kkk2"}})
df.view()
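
The rule for an array of bin starts can be sketched in plain Python. This is an illustration of the rule as described, not the GISDK implementation. It assumes the starts are sorted in ascending order, that bins are numbered from 1, that the maximum value falls in the last bin, and that values below the first start get no bin. The helper name `bin_numbers` is hypothetical.

```python
from bisect import bisect_right


def bin_numbers(values, starts):
    """Return the 1-based bin number for each value, given sorted bin starts."""
    result = []
    for x in values:
        # Count of starts <= x; this is the 1-based index of the bin holding x.
        idx = bisect_right(starts, x)
        # Values below the first start fall in no bin (assumption).
        result.append(idx if idx > 0 else None)
    return result


binned = bin_numbers([0, 0.5, 1, 4], [0, 1])
print(binned)  # [1, 1, 2, 2]
```

With starts {0, 1}, values in 0 <= x < 1 land in bin 1 and values from 1 upward land in bin 2.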

bind_rows(DataFrame df )

Appends the rows of one data frame to another. Both data frames should have the same columns.

Argument	Contents
df	The data frame to be appended. It must have the same number of columns as the current dataframe

Example

tbl1.a = {1, 2,3, 4}
tbl1.b = {"a", "b","v", "d"}
df = CreateObject("DataFrame" , tbl1, desc)
tbl2.a = {4, 5,6, 7}
tbl2.b = {"m", "n","o", "p"}
df2 = CreateObject("DataFrame" , tbl2, desc)
df.bind_rows(df2)
df.view()
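
As a rough plain-Python sketch of what appending rows means for column-oriented data (not the GISDK implementation): each column of the second table is concatenated onto the matching column of the first. The check that both tables have the same column names, and the helper name `bind_rows_sketch`, are assumptions made for illustration.

```python
def bind_rows_sketch(top, bottom):
    """Append the rows of `bottom` to `top`; both are dicts of column lists."""
    if set(top) != set(bottom):
        raise ValueError("both tables must have the same columns")
    # Concatenate column by column, keeping the column order of `top`.
    return {name: top[name] + bottom[name] for name in top}


tbl1 = {"a": [1, 2, 3, 4], "b": ["a", "b", "v", "d"]}
tbl2 = {"a": [4, 5, 6, 7], "b": ["m", "n", "o", "p"]}
combined = bind_rows_sketch(tbl1, tbl2)
print(combined["a"])  # [1, 2, 3, 4, 4, 5, 6, 7]
```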

check()

Checks that the data frame is valid.

Returns

The data frame if the check is successful; otherwise, an error with a descriptive message.

colnames(array options)

Either returns a vector of all column names or sets all column names. Use rename() to change individual column names.

Option	Type	Contents
new_names	array or vector	Optional. A list of the new column names, one for each existing column in the data frame.
start	string	Optional. The name of the first column to be returned, defaults to the first column
labels	string	Optional. The name of the last column to be returned, defaults to the last column

Returns

An array of column names

Example

tbl1.ID = {1, 2,3, 4}
tbl1.SHOP_TYPE = {"Bakery", "Restaurant","Beauty Salon", "Sandy"}
tbl1.OwnerlastName = {"Smith", "Jones","Christie", "Good"}
df = CreateObject("DataFrame" , tbl1, desc)
df.colnames( { "new_names": {"ID", "category", "owner" }})
ShowArray(df.colnames()) // Show all the column names
ShowArray(df.colnames({"start": "category"})) // Show column names, starting with category

coltypes()

Gets the column types

Returns

An array of column types. Possible types returned are: short, long, double, string.

colwidths(array col_names)

Gets the column widths

Argument	Contents
col_names	Optional.

Returns

An array of column widths

copy()

Creates a complete copy of the data frame.

Example

// if you use new_df = old_df you simply get two variable names that point to the same object
// Instead, use:
new_df = old_df.copy()
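
The difference between a second name for the same object and an independent copy can be seen in a small plain-Python analogy. This is not GISDK code. The dict below merely stands in for a data frame, and `copy.deepcopy` stands in for the copy() method.

```python
import copy

old_df = {"a": [1, 2, 3]}          # stand-in for a data frame
alias = old_df                     # second name for the same object
new_df = copy.deepcopy(old_df)     # independent copy of the data

old_df["a"].append(4)              # change the original after copying

print(alias["a"])   # [1, 2, 3, 4]  (the alias sees the change)
print(new_df["a"])  # [1, 2, 3]     (the copy does not)
```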

filter(string query)

Applies a query to a table object.

Argument	Contents
query	A valid query string (e.g. "ID = 5", "Name = 'Sam'", "id > 10 and size > 100"). The "Select * where" clause as used in GISDK queries is optional.

Example

tbl1.a = {1, 2,3, 4}
tbl1.b = {"a", "b","c", "d"}
df = CreateObject("DataFrame" , tbl1, desc)
tbl2.a = {4, 5,6, 7}
tbl2.b = {"m", "n","o", "p"}
df2 = CreateObject("DataFrame" , tbl2, desc)
df.filter( "a > 2")
df.view()

gather(array cols, string key, string value, string or numeric fill)

Transforms data from wide to long format. Places the names of multiple columns into a single "key" column and places the values of those multiple columns into a single "value" column. Reverse of spread().

Argument	Contents
cols	A list of the fields to gather
key	The name of the new column that will hold the names of the gathered columns
value	The name of the new column that will hold the values of the gathered columns
fill	The string/number to fill into empty data cells of new columns

Example

tbl1.ID = {1, 2,3, 4}
tbl1.SHOPTYPE = {"Bakery", "Restaurant","Beauty Salon", "Grocery" }
tbl1.OwnerlastName = {"John", "Rita", "Paul", "Mary"}
df = CreateObject("DataFrame" , tbl1)
df.gather( {"ID", "Shoptype", "OwnerlastName"}, "key", "value" )
df.view()
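
The wide-to-long reshaping can be sketched in plain Python to show where each value ends up. This is an illustration of the idea, not the GISDK implementation. It assumes that columns not listed in cols are repeated for every gathered column and that output rows are grouped by gathered column. The column names `jan`, `feb`, `month`, and `sales` and the helper name `gather_sketch` are hypothetical.

```python
def gather_sketch(table, cols, key="key", value="value"):
    """Stack the named columns into key/value pairs, repeating the other columns."""
    n_rows = len(next(iter(table.values())))
    kept = [name for name in table if name not in cols]
    out = {name: [] for name in kept + [key, value]}
    for col in cols:
        for i in range(n_rows):
            for name in kept:
                out[name].append(table[name][i])   # repeat identifying columns
            out[key].append(col)                   # original column name
            out[value].append(table[col][i])       # original cell value
    return out


wide = {"ID": [1, 2], "jan": [10, 20], "feb": [30, 40]}
long_tbl = gather_sketch(wide, ["jan", "feb"], "month", "sales")
print(long_tbl)
# {'ID': [1, 2, 1, 2], 'month': ['jan', 'jan', 'feb', 'feb'], 'sales': [10, 20, 30, 40]}
```

Two rows and two gathered columns produce four rows, one per original cell.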

read( string filename, [ array fields, options_array expr_vars] )

Reads a DataFrame from a file, either a CSV or an FFB (*.bin), based on the file extension.

Argument	Contents
filename	Full path of file. The file type is inferred from the extension
fields	Optional. Array of column names to read. If null, all columns are read
expr_vars	Optional. Name/value pairs such that "{Name}" found in the file will be replaced with Value
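
Two behaviours described above, choosing a reader from the file extension and replacing "{Name}" placeholders with values, can be sketched in plain Python. This is a hedged illustration, not the GISDK implementation. Treating the extension as case-insensitive is an assumption, and the helper names `infer_reader` and `substitute` are hypothetical.

```python
import os


def infer_reader(filename):
    """Pick the reader method name from the file extension."""
    ext = os.path.splitext(filename)[1].lower()  # case-insensitive (assumption)
    readers = {".csv": "read_csv", ".bin": "read_bin"}
    if ext not in readers:
        raise ValueError("unsupported file type: " + ext)
    return readers[ext]


def substitute(text, expr_vars):
    """Replace each "{Name}" in text with the matching value."""
    for name, value in expr_vars.items():
        text = text.replace("{" + name + "}", str(value))
    return text


print(infer_reader("C:/data/shops.csv"))           # read_csv
print(substitute("{Year}_pop", {"Year": 2020}))    # 2020_pop
```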

read_bin( string filename, [ array fields, options_array expr_vars] )

Reads a DataFrame from an FFB (*.bin) file.

Argument	Contents
filename	Full path of file. The file type is inferred from the extension
fields	Optional. Array of column names to read. If null, all columns are read
expr_vars	Optional. Name/value pairs such that "{Name}" found in the file will be replaced with Value

read_csv( string filename, [ array fields, options_array expr_vars] )

Reads a DataFrame from a CSV file.

Argument	Contents
filename	Full path of file. The file type is inferred from the extension
fields	Optional. Array of column names to read. If null, all columns are read
expr_vars	Optional. Name/value pairs such that "{Name}" found in the file will be replaced with Value

read_view( string filename, [ string set, array fields, options_array expr_vars, string null_to_zero] )

Converts a view into a data frame. Useful if you want to specify a selection set or already have a view open.

Argument	Contents
filename	Full path of file. The file type is inferred from the extension
set	Optional. A set name
fields	Optional. Array of column names to read. If null, all columns are read
expr_vars	Optional. Name/value pairs such that "{Name}" found in the file will be replaced with Value
null_to_zero	Optional. Whether to convert null values to zero. Either "true" or "false". Defaults to false
include_descriptions	Optional. Whether to include field descriptions. Not applicable for all table types. Either "true" or "false". Defaults to false

See Also:

Alphabetical List of GISDK Classes