Gather the preprocessed data from a formula and a model, where the output corresponds to the data structure used by the engine gather_compute; see estimate().
a formula object that defines at the left-hand side the dependent network (see defineDependentEvents()) and at the right-hand side the effects and the variables for which the effects are expected to occur (see vignette("goldfishEffects")).
a character string defining the model type. Current options include "DyNAM", "DyNAMi" or "REM":
"DyNAM": Dynamic Network Actor Models (Stadtfeld, Hollway and Block, 2017 and Stadtfeld and Block, 2017)
"DyNAMi": Dynamic Network Actor Models for interactions (Hoffman et al., 2020)
"REM": Relational Event Model (Butts, 2008)
a character string defining the submodel type. Current options include "choice", "rate" or "choice_coordination":
"choice": a multinomial receiver choice model for model = "DyNAM" (Stadtfeld and Block, 2017), or the general relational event model for model = "REM" (Butts, 2008). A multinomial group choice model for model = "DyNAMi" (Hoffman et al., 2020)
"choice_coordination": a multinomial-multinomial model for coordination ties for model = "DyNAM" (Stadtfeld, Hollway and Block, 2017)
"rate": an individual activity rates model for model = "DyNAM" (Stadtfeld and Block, 2017). Two rate models, one for individuals joining groups and one for individuals leaving groups, jointly estimated for model = "DyNAMi" (Hoffman et al., 2020)
a list containing additional parameters for preprocessing. It may contain:
a numerical value or a date-time character with the same time-zone formatting as the times in event that indicates the starting time to be considered during estimation. Note: it is only used during preprocessing.
a numerical value or a date-time character with the same time-zone formatting as the times in event that indicates the end time to be considered during estimation. Note: it is only used during preprocessing.
a list containing, for each dependent event, the list of available nodes for the choice model. This list should have the same length as the dependent events list (ONLY for choice models).
logical indicating whether to print minimal progress output to the console during the preprocessing and estimation processes.
an environment where formula objects and their linked objects are available.
a list object including:
a matrix. The number of rows can be up to the number of events times the number of actors (times the squared number of actors for the REM). Right-censored events are included when the model has an intercept. The number of columns is the number of effects in the model. Every row holds the effect statistics at the time of the event for one actor in the choice set or the sender set.
a numeric vector with the number of rows related to each event. Its length corresponds to the number of events plus right-censored events, if any.
a numeric vector with the position of the selected actor (choice model), sender actor (rate model), or active dyad (choice-coordination model, REM model). Indexing starts at 1 for each event.
a character vector with the label of the sender/receiver actor. For right-censored events the receiver value is not meaningful.
a logical value indicating if the model has an intercept.
a character vector with a short name of the effect. It includes the name of the object used to calculate the effects and modifiers of the effect, e.g., the type of effect, weighted effect.
a character matrix with the description of the effects. It includes the name of the object used to calculate the effects and additional information of the effect, e.g., the type of effect, weighted effect, transformation function, window length.
If the model has an intercept and the subModel is rate or model is REM, additional elements are included:
a numeric vector with the time span between events, including right-censored events.
a logical vector indicating if the event is dependent or right-censored.
It differs from the estimate() output when the argument preprocessingOnly is set to TRUE in its memory space requirements.
GatherPreprocessing() produces a list where the first element
is a matrix that can have up to the number of events times
the number of actors rows and the number of effects columns.
For medium to large datasets with thousands of events and
thousands of actors, the RAM requirements are large and
can therefore produce errors due to a lack of memory.
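The size of that matrix can be estimated before running the function. The following is a minimal sketch in base R, assuming the statistics are stored as doubles (8 bytes each); the event, actor and effect counts are made-up numbers for illustration, not values from any dataset.

```r
# Hypothetical problem size (illustrative numbers only)
nEvents  <- 5000
nActors  <- 2000
nEffects <- 5

# Upper bound on rows: events x actors (actors squared for the REM)
maxRowsChoice <- nEvents * nActors
maxRowsRem    <- nEvents * nActors^2

# A numeric matrix stores one 8-byte double per cell
bytesChoice <- maxRowsChoice * nEffects * 8
bytesRem    <- maxRowsRem * nEffects * 8

# Convert to GiB for readability
gibChoice <- bytesChoice / 1024^3  # about 0.37 GiB
gibRem    <- bytesRem / 1024^3     # about 745 GiB
```

Under these assumptions the choice or rate case fits comfortably in memory, while the REM upper bound does not, which is why the worst-case bound is worth checking first.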
The advantage of the data structure is that it can be adapted
to estimate the models (or extensions of them) using standard packages
for generalized linear models (or any other model)
that use tabular data as input.
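As a rough illustration of that adaptation, the sketch below reshapes a toy stand-in for the returned list into a long-format data frame with one row per candidate. The element names `stats`, `nCandidates` and `selected` are assumptions made for this example, as are the toy values; check `names()` on the actual output before adapting it.

```r
# Toy stand-in for the returned list; element names are hypothetical.
toy <- list(
  stats = matrix(
    c(0, 1,
      1, 0,
      1, 1,
      0, 0,
      1, 0),
    ncol = 2, byrow = TRUE,
    dimnames = list(NULL, c("inertia", "trans"))
  ),
  nCandidates = c(3, 2), # rows belonging to event 1 and to event 2
  selected = c(2, 1)     # 1-based position of the chosen row per event
)

# Event identifier for every row of the statistics matrix
event <- rep(seq_along(toy$nCandidates), times = toy$nCandidates)

# Position of every row inside its own event block
position <- sequence(toy$nCandidates)

# 1 for the selected alternative, 0 for the other candidates
chosen <- as.integer(position == toy$selected[event])

# One row per candidate: event id, response, effect statistics
longData <- data.frame(event = event, chosen = chosen, toy$stats)
```

A table in this shape could then be passed to a conditional logit routine, for example `survival::clogit(chosen ~ inertia + trans + strata(event), data = longData)`, although whether that reproduces a given goldfish model depends on the model and submodel used.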
data("Fisheries_Treaties_6070")
states <- defineNodes(states)
states <- linkEvents(states, sovchanges, attribute = "present")
states <- linkEvents(states, regchanges, attribute = "regime")
states <- linkEvents(states, gdpchanges, attribute = "gdp")
bilatnet <- defineNetwork(bilatnet, nodes = states, directed = FALSE)
bilatnet <- linkEvents(bilatnet, bilatchanges, nodes = states)
createBilat <- defineDependentEvents(
  events = bilatchanges[bilatchanges$increment == 1, ],
  nodes = states, defaultNetwork = bilatnet
)
contignet <- defineNetwork(contignet, nodes = states, directed = FALSE)
contignet <- linkEvents(contignet, contigchanges, nodes = states)
gatheredData <- GatherPreprocessing(
  createBilat ~ inertia(bilatnet) + trans(bilatnet) + tie(contignet)
)