Gather model data from a formula — gather_model

Gather the preprocessed data from a formula, given a model and a sub-model. The output corresponds to the data structure used by the engine gather_compute; see estimate.

gather_model_data(
  formula,
  model = c("DyNAM", "REM"),
  sub_model = c("choice", "choice_coordination", "rate"),
  data = NULL,
  control_preprocessing = set_preprocessing_opt(),
  progress = getOption("progress")
)

Arguments

formula

a formula object that defines the dependent network on the left-hand side (see make_dependent_events()) and, on the right-hand side, the effects and the variables on which those effects are expected to operate (see vignette("goldfishEffects")).

model

a character string defining the model type. Current options include "DyNAM", "DyNAMi", and "REM":

DyNAM: Dynamic Network Actor Models (Stadtfeld, Hollway and Block, 2017 and Stadtfeld and Block, 2017)
DyNAMi: Dynamic Network Actor Models for interactions (Hoffman et al., 2020)
REM: Relational Event Model (Butts, 2008)

sub_model

A character string specifying the sub-model to be estimated. It can be "rate" to model the waiting times between events, "choice" to model the choice of the receiver, or "choice_coordination" to model coordination ties. See details.

choice: a multinomial receiver choice model for estimate_dynam() (Stadtfeld and Block, 2017), or a multinomial group choice model for estimate_dynami() (Hoffman et al., 2020).
choice_coordination: a multinomial-multinomial model for coordination ties for estimate_dynam() (Stadtfeld, Hollway and Block, 2017).
rate: an individual activity rate model for estimate_dynam() (Stadtfeld and Block, 2017). For estimate_dynami(), two rate models, one for individuals joining groups and one for individuals leaving groups, are estimated jointly (Hoffman et al., 2020).

data

a data.goldfish object created with make_data(). It is an environment that contains the nodesets, networks, attributes and dependent events objects. Defaults to NULL.

control_preprocessing

An object of class "preprocessing_options.goldfish", usually the result of a call to set_preprocessing_opt(). This object contains parameters that control the data preprocessing. See set_preprocessing_opt() for details on the available parameters.

progress

logical indicating whether a minimal progress report of the preprocessing and estimation processes should be printed to the console.

Value

a list object including:

stat_all_events: a matrix. The number of rows can be up to the number of events times the number of actors (the number of events times the squared number of actors for the REM). Right-censored events are included when the model has an intercept. The number of columns is the number of effects in the model. Each row contains the effect statistics at the time of the event for one actor in the choice set or the sender set.
n_candidates: a numeric vector with the number of rows associated with each event. Its length corresponds to the number of events plus the right-censored events, if any.
selected: a numeric vector with the position of the selected actor (choice model), sender actor (rate model), or active dyad (choice-coordination model, REM model). Indexing starts at 1 for each event.
sender, receiver: a character vector with the label of the sender/receiver actor. For right-censored events, the receiver values are not meaningful.
has_intercept: a logical value indicating if the model has an intercept.
namesEffects: a character vector with a short name for each effect. It includes the name of the object used to calculate the effect and its modifiers, e.g., the type of effect or whether it is weighted.
effectDescription: a character matrix with the description of the effects. It includes the name of the object used to calculate the effects and additional information of the effect, e.g., the type of effect, weighted effect, transformation function, window length.

If the model has an intercept and either sub_model is "rate" or model is "REM", the following additional elements are included:

timespan: a numeric vector with the time span between events, including right-censored events.
isDependent: a logical vector indicating if the event is dependent or right-censored.
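To make the relationship between n_candidates, selected and the rows of stat_all_events concrete, here is a minimal base R sketch. The helper to_long() and the mock object are hypothetical: they rely only on the elements documented above and assume that each event's candidate rows are stored contiguously in stat_all_events. The mock mimics a choice model without right-censored events.

```r
# Hypothetical helper (not part of goldfish): expand a gather_model_data()-like
# list into a long data frame with one row per candidate and a 0/1 outcome.
# Assumes the rows of each event are contiguous in stat_all_events.
to_long <- function(gathered) {
  n_cand <- gathered$n_candidates
  # Event identifier repeated once per candidate row
  event <- rep(seq_along(n_cand), times = n_cand)
  # Position of each row within its event, starting at 1
  position <- sequence(n_cand)
  # selected is indexed from 1 within each event
  chosen <- as.integer(position == gathered$selected[event])
  stats <- as.data.frame(gathered$stat_all_events)
  names(stats) <- gathered$namesEffects
  data.frame(event = event, position = position, chosen = chosen, stats)
}

# Mock object: 3 candidates for the first event, 2 for the second
mock <- list(
  stat_all_events = matrix(c(1, 0, 2, 1, 0,
                             0, 1, 1, 0, 1), ncol = 2),
  n_candidates = c(3, 2),
  selected = c(2, 1),
  namesEffects = c("inertia", "trans")
)
long <- to_long(mock)
print(long)
```

Each event contributes exactly one row with chosen equal to 1, which is the shape expected by conditional logit routines.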

Details

Its output differs from that of estimate_dynam(), estimate_rem() and estimate_dynami() with preprocessing_only = TRUE in its memory requirements. gather_model_data() produces a list whose first element is a matrix with up to the number of events times the number of actors rows and one column per effect. For medium to large datasets with thousands of events and thousands of actors, the RAM requirements become large, and errors can occur due to lack of memory. The advantage of this data structure is that it can be adapted to estimate the models (or extensions of them) with standard packages for generalized linear models (or any other model) that take tabular data as input.
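Both points above can be sketched in a few lines of R. The first part is back-of-the-envelope arithmetic, assuming stat_all_events is a dense double-precision matrix (8 bytes per cell) at the documented upper bound of rows. The second part assumes that the choice sub-model is a multinomial logit over the choice set, which can be fitted as a conditional logit stratified by event with survival::clogit(). The helper stat_matrix_bytes() and the simulated data are hypothetical; with real output, the long data frame would be built from stat_all_events, n_candidates and selected.

```r
library(survival)

# Hypothetical helper: upper-bound size in bytes of a dense double matrix with
# events x actors rows (events x actors^2 for the REM) and one column per effect.
stat_matrix_bytes <- function(n_events, n_actors, n_effects, rem = FALSE) {
  n_rows <- if (rem) n_events * n_actors^2 else n_events * n_actors
  n_rows * n_effects * 8
}
stat_matrix_bytes(5000, 2000, 10) / 1024^3  # roughly 0.75 GiB

# Simulated long-format data: 200 events, 5 candidates each, two statistics
set.seed(1)
n_events <- 200
n_cand <- 5
long <- data.frame(
  event = rep(seq_len(n_events), each = n_cand),
  inertia = rbinom(n_events * n_cand, 1, 0.3),
  trans = rpois(n_events * n_cand, 1)
)

# Draw one choice per event from a multinomial logit with known weights
utility <- 1.0 * long$inertia + 0.5 * long$trans
long$chosen <- 0L
for (e in seq_len(n_events)) {
  rows <- which(long$event == e)
  p <- exp(utility[rows]) / sum(exp(utility[rows]))
  long$chosen[rows[sample.int(n_cand, 1, prob = p)]] <- 1L
}

# Conditional logit: the likelihood is conditioned on each event's choice set
fit <- clogit(chosen ~ inertia + trans + strata(event), data = long)
coef(fit)
```

The estimates should lie close to the simulated weights (1.0 and 0.5), up to sampling error.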

Examples

data("Fisheries_Treaties_6070")
states <- make_nodes(states)
states <- link_events(states, sovchanges, attribute = "present")
states <- link_events(states, regchanges, attribute = "regime")
states <- link_events(states, gdpchanges, attribute = "gdp")

bilatnet <- make_network(bilatnet, nodes = states, directed = FALSE)
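bilatnet <- link_events(bilatnet, bilatchanges, nodes = states)

# contignet is used by tie() in the formula, so it must also be a goldfish network
contignet <- make_network(contignet, nodes = states, directed = FALSE)
contignet <- link_events(contignet, contigchanges, nodes = states)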

createBilat <- make_dependent_events(
  events = bilatchanges[bilatchanges$increment == 1, ],
  nodes = states, default_network = bilatnet
)

fisheriesData <- make_data(createBilat)

gatheredData <- gather_model_data(
  createBilat ~ inertia(bilatnet) + trans(bilatnet) + tie(contignet),
  model = "DyNAM", sub_model = "choice_coordination",
  data = fisheriesData
)