This report is a supplement to my PhD thesis on fare accessibility in the transport associations of the Hamburg Region (HVV) and Berlin/Brandenburg (VBB). The full text of the German thesis can be downloaded via DOI:10.15480/882.13161. An English summary of my thesis can be found in the Fare Accessibility Dashboard via DOI:10.15480/882.13164. The report describes results at the stop level. Reports for the other two levels (municipality and 500m grid) can be downloaded via DOI:10.15480/882.13162.
The report presents the estimates of four regression models that were applied to a VBB dataset. The independent variables are listed in the first column of the summary table. The dependent variable is fare accessibility (fare_acc). The aim is to explain the variability of fare_acc based on the independent variables.
After giving an overview of the input variables, this report describes the models and reports diagnostic tests that describe the data quality and support the choice of model: Moran's I to test for spatial autocorrelation, Likelihood Ratio tests to compare goodness of fit, a Breusch-Pagan test for heteroskedasticity, and a Spatial Hausman test for model suitability. Furthermore, 10+1 hypotheses are tested based on the regression estimates.
Please note that this report is a technical supplement to my PhD thesis. For a more abstract interpretation of the results and a comparison of the different spatial resolutions, please refer to the results chapter of my thesis.
The methodology was inspired by Mark Burkey, an economics professor at North Carolina A&T State University. Burkey has produced a number of videos that are very informative for students and researchers of spatial econometrics; they are referenced at several points in this report.
##### PREPARE PLOT INPUT #####
# requires ggplot2, plotly and dplyr (assumed to be loaded in the setup chunk)
# Reorder the levels of the 'group' variable to match the desired plotting order
bplot_import$group <- factor(bplot_import$group,
                             levels = c("fare_acc", "pt_index", "purch_power", "rent",
                                        "travel_time", "pop_dens", "dist_centre",
                                        "cars_cap", "dwelling_area"))
bplot_fill_colors <- c("fare_acc" = "#178072", "pt_index" = "#62988D",
                       "travel_time" = "grey80", "dwelling_area" = "grey80",
                       "cars_cap" = "grey80", "dist_centre" = "grey80",
                       "pop_dens" = "grey80", "purch_power" = "#a4aa4d", "rent" = "#933890")
bplot_fill_alpha <- .5
bplot_outline_size <- c(1, .5, .5, .5, .5, .5, .5, .5, .5)
bplot_median_size <- .2
# define text modules
bplot_xlab <- "Variable"
bplot_ylab <- "Standard deviation"
bplot_dec_sep <- "."
bplot_big_sep <- ","
##### WRITE A FUNCTION FOR THE PLOT #####
create_violin_boxplot <- function(pinput, bplot_fill_alpha, bplot_fill_colors, group_name, bplot_ylab, bplot_median_size) {
  # Check if the specified group_name exists in the dataset
  if (!(group_name %in% unique(pinput$group))) {
    stop("The specified group_name is not present in the dataset.")
  }
  # Calculate sample size for the specified group
  group_sample_size <- pinput %>%
    filter(group == group_name) %>%
    nrow() # Number of rows for the specified group
  # Create the ggplot (note: bplot_median_size is currently not used in the plot body)
  p <- ggplot(pinput, aes(y = sd, x = group, fill = group, color = group)) +
    geom_hline(yintercept = 0, linewidth = .4, linetype = "dotted", color = "grey50") + # Add horizontal line at y = 0
    # Violin plot
    geom_violin(aes(linewidth = ifelse(group == "fare_acc", "thick", "thin")), # Conditional mapping for outline size
                scale = "width", alpha = bplot_fill_alpha) +
    scale_fill_manual(values = bplot_fill_colors, guide = "none") + # "none" replaces the deprecated guide = FALSE
    scale_color_manual(values = bplot_fill_colors, guide = "none") +
    scale_linewidth_manual(values = c(thick = 1, thin = 0.5), guide = "none") + # Disable legend for thick/thin
    # Boxplot
    geom_boxplot(width = 0.1, outlier.size = .1, color = 'black', fill = "black", alpha = 0) +
    # Update x-axis label to include sample size of the specified group
    scale_x_discrete(name = paste0("Variable<br>(n = ", formatC(group_sample_size, big.mark = ","), " ", lvl_en_p, ")")) +
    scale_y_continuous(name = bplot_ylab) +
    # theme options
    theme_light() +
    theme(panel.grid = element_blank(),
          axis.text.x = element_text(angle = 45, hjust = 1), # Adjust angle and justification of x-axis labels
          axis.ticks.x = element_blank(),
          axis.text = element_text(color = 'black'),
          panel.background = element_rect(fill = "white")) # Set the background color of the plot area to white
  # Convert to an interactive plotly plot
  p_interactive <- ggplotly(p)
  # Disable tooltips for all traces
  p_interactive <- p_interactive %>%
    style(hoverinfo = "none", text = NULL)
  # Return the interactive plotly plot
  return(p_interactive)
}
##### CALCULATE PLOTS FOR DIFFERENT SAMPLES #####
# create a plot that includes outliers
bplot_outliers <- create_violin_boxplot(pinput = bplot_import, bplot_fill_alpha = bplot_fill_alpha,
                                        bplot_fill_colors = bplot_fill_colors, bplot_ylab = bplot_ylab,
                                        bplot_median_size = bplot_median_size, group_name = "pt_index")
# create a plot that is stripped of outliers
bplot <- create_violin_boxplot(pinput = subset(bplot_import, outlier == 0), bplot_fill_alpha = bplot_fill_alpha,
                               bplot_fill_colors = bplot_fill_colors, bplot_ylab = bplot_ylab,
                               bplot_median_size = bplot_median_size, group_name = "pt_index")
# create a plot that is stripped of outliers - URBAN ONLY
bplot_u <- create_violin_boxplot(pinput = subset(bplot_import, outlier_u == 0 & raumt_kl == 1), bplot_fill_alpha = bplot_fill_alpha,
                                 bplot_fill_colors = bplot_fill_colors, bplot_ylab = bplot_ylab,
                                 bplot_median_size = bplot_median_size, group_name = "pt_index")
# create a plot that is stripped of outliers - RURAL ONLY
bplot_r <- create_violin_boxplot(pinput = subset(bplot_import, outlier_r == 0 & raumt_kl == 0), bplot_fill_alpha = bplot_fill_alpha,
                                 bplot_fill_colors = bplot_fill_colors, bplot_ylab = bplot_ylab,
                                 bplot_median_size = bplot_median_size, group_name = "pt_index")
# plots will be called inline below!
As the plot below shows, the input variables contain many outliers.
The plot below excludes outliers beyond ±1.5 IQR for any of the input variables.
The table below shows aggregates of the variables that were used for the (auto-)regressive models. The sample is stripped of outliers (see boxplot above) and contains urban as well as rural stops.
# Create a kableExtra table and set styles for all columns
agg_table_styled <- kable(
agg_table_import,
format = "html",
escape = FALSE
) %>%
kable_styling(full_width = FALSE) %>% # Set to TRUE if you want full-width
column_spec(1, bold = TRUE, extra_css = "text-align: right;") # Make first column bold and align right
# Print or use the modified HTML table
print(agg_table_styled)
Variable | Unit | Mean | Median | SD | Min | Max | Source | Year |
---|---|---|---|---|---|---|---|---|
Fare accessibility on a €2.30 budget | 10 Standard deviations | 0.00 | -0.01 | 0.07 | -0.20 | 0.20 | Own calculation, see method section | 12/2018 |
Average per capita purchasing power | €10,000/year | 2.04 | 1.97 | 0.37 | 1.22 | 3.09 | infas360 | 12/2018 |
Average rent | €/sqm/month, no utilities | 6.63 | 6.25 | 1.29 | 4.13 | 11.00 | infas360 | 12/2018 |
Private cars per capita | Count, approximated from street block | 0.51 | 0.51 | 0.12 | 0.13 | 0.88 | infas360 | 2018/19 |
Distance to the nearest urban centre | km, radial | 3.64 | 2.62 | 2.97 | 0.27 | 12.35 | Regional plans | 12/2018 |
Public transport index | [no unit] | 4.59 | 4.52 | 2.02 | 0.00 | 11.12 | Own calculation, see method section | 2018/19 |
Travel time to the next destination | 10 minutes, weighted across 15 categories similar to fare accessibility, see method section | 2.08 | 1.86 | 1.01 | 0.32 | 5.10 | HVV API (“GEOFOX”) | 12/2018 |
Population density | 100 persons/ha | 0.35 | 0.22 | 0.32 | 0.02 | 1.41 | infas360 | 12/2018 |
Average dwelling area per capita | 10sqm | 5.71 | 5.80 | 1.03 | 2.50 | 9.00 | infas360 | 12/2018 |
The plot below shows corrected data for urban areas (as per the definition in the method section of my PhD thesis).
The plot below shows corrected data for rural areas.
This report presents four models at the stop level: a Spatial Durbin Error Model (SDEM), a Spatial Error Model (SEM), a Spatially Lagged X model (SLX), and Ordinary Least Squares (OLS).
SDEM and SLX include spatially lagged independent variables (the lag.X terms), while SEM and OLS do not; their lag.X rows in the table are therefore empty. Please note: for this table, fare accessibility was multiplied by 10 in order to obtain readable coefficients. I give an example in the Interpretation section below the table.
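As a minimal sketch, the four models can be fitted with the spatialreg package roughly as follows (assuming a formula object fml plus the stops data and the weights object nb used elsewhere in this report; the actual models are fitted in the upstream script 1_make_models.R, whose settings may differ):
# OLS: no spatial terms
ols <- lm(fml, data = stops)
# SLX: adds spatially lagged independent variables (the lag.X terms)
slx <- spatialreg::lmSLX(fml, data = stops, listw = nb)
# SEM: spatially autocorrelated error term, no lag.X terms
sem <- spatialreg::errorsarlm(fml, data = stops, listw = nb)
# SDEM: spatially autocorrelated errors plus lag.X terms
sdem <- spatialreg::errorsarlm(fml, data = stops, listw = nb, etype = "emixed")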
# #the parameter "results ='asis'" is crucial. Otherwise, the output would be raw html code
##### Calculate Pseudo-R^2^ for SDEM and SEM to list them in the table
# Pseudo-R^2 = 1 - SSE / TSS, where TSS = var(y) * (n - 1)
# Note that the formula points to the dynamic r variable, i.e., oevx/t2/whatever I assign as the dependent variable in the file 1_make_models.R!
#SDEM
psr2_sdem <- 1 - (sdem$SSE / (var(stops[[regr]]) * (length(stops[[regr]]) - 1)))
psr2_sdem <- round(psr2_sdem, 2)
##### SEM
psr2_sem <- 1 - (sem$SSE / (var(stops[[regr]]) * (length(stops[[regr]]) - 1)))
psr2_sem <- round(psr2_sem, 2)
# Generate the htmlreg table with custom rows for Pseudo-R^2^
model_table <- texreg::htmlreg(
list(sdem, sem, slx, ols),
doctype = FALSE,
custom.model.names = model_names,
custom.gof.rows = list('Pseudo-R^2^' = c(psr2_sdem, psr2_sem, '', '') # this one inserts two values and two empty strings into the table
),
reorder.gof = c(2:10, 1, 11:19), # reorder the R^2^ rows, i.e., insert row number 1 in-between row 10 (last R^2^ of the regular models) and row 11.
caption = paste0("Multivariate regression estimates of the four models<br>
Dependent variable: ", dv_en, " (", dv, ")<br>
The sample is stripped of outliers [(see boxplot above)](#bplot)."),
custom.note = cap_sig)
# Create a kableExtra table and set styles for all columns
model_table_styled <- kable(
model_table,
format = "html",
escape = FALSE
) %>%
kable_styling(full_width = FALSE) %>% # Set to TRUE if you want full-width
column_spec(1, extra_css = "text-align: right;") # Align all columns to the right
# Print or use the modified HTML table
print(model_table_styled)
(Table: multivariate regression estimates of the four models; rendered at knit time, see caption in the code above.)
The statistical effects of each model are interpreted below using purchasing power as an example. The other independent variables can be interpreted in the same way. The following levels of significance are used:
Symbol | p |
---|---|
* | < 0.05 |
** | < 0.01 |
*** | < 0.001 |
# #download and launch plugin to insert a table easily (available when using the "Visual" mode on the upper left)
# install.packages("devtools")
# devtools::install_github("lbusett/insert_table")
# Extract the p-value and beta coefficient for the variable of interest
# helper: round betas with |value| < 1 to 2 decimal places (e.g., 0.06), otherwise to the nearest integer
round_beta <- function(x) {
  if (abs(x) < 1) round(x, 2) else round(x, 0)
}
#SDEM
sdem_p_kk <- coef(summary(sdem))["purch_power", "Pr(>|z|)"]
sdem_p_kk_lag <- coef(summary(sdem))["lag.purch_power", "Pr(>|z|)"]
# extract and round beta
sdem_kk <- round(sdem[["coefficients"]][["purch_power"]], 2)
sdem_kk_r <- round_beta(sdem_kk)
# extract and round beta (lag)
sdem_kk_lag <- round(sdem[["coefficients"]][["lag.purch_power"]], 2)
sdem_kk_lag_r <- round_beta(sdem_kk_lag)
#SEM
sem_p_kk <- coef(summary(sem))["purch_power", "Pr(>|z|)"]
# extract and round beta
sem_kk <- round(sem[["coefficients"]][["purch_power"]], 2)
sem_kk_r <- round_beta(sem_kk)
#SLX
slx_p_kk <- coef(summary(slx))["purch_power", "Pr(>|t|)"]
# extract and round beta
slx_kk <- round(slx[["coefficients"]][["purch_power"]], 2)
slx_kk_r <- round_beta(slx_kk)
#OLS
ols_summ <- summary(ols)
ols_p_kk <- round(ols_summ$coefficients["purch_power", "Pr(>|t|)"], 2)
# extract and round beta
ols_kk <- round(ols[["coefficients"]][["purch_power"]], 2)
ols_kk_r <- round_beta(ols_kk)
# note that I can weave these p values into the report text:
# "(p = `r sdem_p_kk`)"
As the summary table shows, purchasing power (kk) exerts an effect of 1.5 on fare accessibility (fare_acc). This means that for every €1,000 increase in purchasing power at a given stop, we should expect an increase of approximately 0.2 fare accessibility units (*fare accessibility is scaled by a factor of 10 for readability of the coefficients). The regression estimate is significant at the *** level.
The SDEM also estimates spillover effects: purchasing power (kk) exerts an indirect effect of 2.29 on fare accessibility (fare_acc) at neighbouring stops. This means that for every €1,000 increase in purchasing power at a given stop, we should expect an increase of approximately 0.2 fare accessibility units at any neighbouring stop.
As the summary table shows, purchasing power (kk) exerts an effect of 1.28 on fare accessibility (fare_acc). This means that for every €1,000 increase in purchasing power at a given stop, we should expect an increase of approximately 0.1 fare accessibility units. The regression estimate is significant at the *** level.
As the summary table shows, purchasing power (kk) exerts an effect of 1.9 on fare accessibility (fare_acc). This means that for every €1,000 increase in purchasing power at a given stop, we should expect an increase of approximately 0.2 fare accessibility units. The regression estimate is significant at the ** level.
As the summary table shows, purchasing power (kk) exerts an effect of 3.81 on fare accessibility (fare_acc). This means that for every €1,000 increase in purchasing power at a given stop, we should expect an increase of approximately 0.4 fare accessibility units. The regression estimate is significant at the *** level.
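For transparency, a short sketch of the arithmetic behind these readings, using the first coefficient as an example (the units follow the input variable table above):
# purchasing power is measured in units of €10,000/year (see the input variable table),
# and fare accessibility in the summary table is scaled by a factor of 10
beta  <- 1.5   # coefficient of purch_power from the summary table
delta <- 0.1   # a €1,000 increase equals 0.1 units of purch_power
beta * delta   # = 0.15, i.e. roughly 0.2 units of the scaled fare accessibility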
The table below shows the same models with normalized input variables. Each variable is standardized to a mean of 0 and a standard deviation of 1 (z-transformation; suffix _zt). This makes the beta coefficients comparable with each other even though the variables have different units. The interpretation is analogous to the example above, but the unit is standard deviations.
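As a minimal sketch, assuming the _zt columns are derived from the raw columns with a plain z-transformation (the actual transformation happens upstream):
# z-transform: subtract the mean, divide by the standard deviation
stops$purch_power_zt <- as.numeric(scale(stops$purch_power))
stops$fare_acc_zt    <- as.numeric(scale(stops$fare_acc))
# a standardized beta can be converted back to original units via
# beta_raw = beta_zt * sd(dependent variable) / sd(independent variable)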
# #the parameter "results ='asis'" is crucial. Otherwise, the output would be raw html code
##### Calculate Pseudo-R^2^ for SDEM and SEM to list them in the table
# Note that the formula points to the dynamic r variable, i.e., oevx/t2/whatever I assign as the dependent variable in the file 1_make_models.R!
regr <- "fare_acc_zt"
dv <- regr
dv_en <- t2_en
#SDEM
psr2_sdem <- 1 - (sdem_zt$SSE / (var(stops[[regr]]) * (length(stops[[regr]]) - 1)))
psr2_sdem <- round(psr2_sdem,2)
##### SEM
psr2_sem <- 1 - (sem_zt$SSE / (var(stops[[regr]]) * (length(stops[[regr]]) - 1)))
psr2_sem <- round(psr2_sem,2)
# Generate the htmlreg table with custom rows for Pseudo-R^2^
model_table <- texreg::htmlreg(
list(sdem_zt, sem_zt, slx_zt, ols_zt),
doctype = FALSE,
custom.model.names = model_names,
custom.gof.rows = list('Pseudo-R^2^' = c(psr2_sdem, psr2_sem, '', '') # this one inserts two values and two empty strings into the table
),
reorder.gof = c(2:10, 1, 11:19), # reorder the R^2^ rows, i.e., insert row number 1 in-between row 10 (last R^2^ of the regular models) and row 11.
caption = paste0("Multivariate regression estimates of the four models, normalized values<br>
Dependent variable: ", dv_en, " (", dv, ")<br>
The sample is stripped of outliers [(see boxplot above)](#bplot)."),
custom.note = cap_sig)
# Create a kableExtra table and set styles for all columns
model_table_styled <- kable(
model_table,
format = "html",
escape = FALSE
) %>%
kable_styling(full_width = FALSE) %>% # Set to TRUE if you want full-width
column_spec(1, extra_css = "text-align: right;") # Align all columns to the right
# Print or use the modified HTML table
print(model_table_styled)
(Table: multivariate regression estimates with normalized input variables; rendered at knit time.)
The Moran’s I test indicates whether it is useful to apply spatial models. It tests against the following null hypothesis: there is no global spatial autocorrelation in the residuals. If p ≤ .05, we reject the null hypothesis, i.e., we observe global spatial autocorrelation: contiguous features tend to have similar values.
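For reference, the global Moran’s I on the OLS residuals is defined as

$$I = \frac{n}{S_0} \cdot \frac{\sum_i \sum_j w_{ij}\,(e_i - \bar{e})(e_j - \bar{e})}{\sum_i (e_i - \bar{e})^2}, \qquad S_0 = \sum_i \sum_j w_{ij},$$

where $e_i$ are the residuals and $w_{ij}$ the spatial weights.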
moran_regr <- lm.morantest(ols, nb, zero.policy=T)
if(moran_regr$p.value <= .05) {
moran_regr$autoc <- T
} else {
moran_regr$autoc <- F
}
#transpose the Moran values into a dataframe
moran_out_num <- t(broom::tidy(moran_regr))
# assign column name (that's important to coerce/cast the data type later)
colnames(moran_out_num) <- c("value")
# Define Replacement patterns to specify the reported row names (estimate1-3 need to get proper labels)
replacement_patterns <- c(
"estimate1" = "Observed I",
"estimate2" = "Expectation",
"estimate3" = "Variance",
"statistic" = "Statistic",
"p.value" = "p Value",
"method" = "Method",
"alternative" = "Alternative")
# Apply replacements for row names
row.names(moran_out_num) <- replacement_patterns[row.names(moran_out_num)]
# Define the rows that contain strings (i.e., non-numeric values)
chr <- c("Method", "Alternative")
#write character rows into new matrix
moran_out_chr <- as.data.frame(moran_out_num[(rownames(moran_out_num) %in% chr), , drop = FALSE])
#keep only numeric rows in numeric matrix
moran_out_num <- as.data.frame(moran_out_num[!(rownames(moran_out_num) %in% chr), , drop = FALSE])
# coerce/cast data type to numeric, also round to 3 digits
moran_out_num$value <- round(as.numeric(moran_out_num$value),3)
# call the moran results:
# meta information in character rows
DT::datatable(
moran_out_chr,
options = list(
caption = "Moran's I parameters",
autoWidth = FALSE,
searching = FALSE,
bPaginate = FALSE,
bLengthChange = FALSE,
bInfo = FALSE,
columnDefs = list(
list(targets = 1, # target only the 2nd column (the count seems to start from 0)
className = "dt-right") # Set text alignment to right
)
)
)
# call the moran results:
# estimates in numeric rows
DT::datatable(
moran_out_num,
options = list(caption = "Moran's I estimates",
autoWidth = F,
searching = F, #disable search field
bPaginate = F, #hide pagination control
bLengthChange = F, #hide dropdown menu for list length
bInfo = F #hide information how many entries are shown
)) %>%
formatRound(1,digits = 3)
The p value is ≤ .05. Thus, we observe spatial autocorrelation within the OLS residuals and should use spatial models.
# the eval clause in the header defines that the chunk will only be executed when the geometry type is POLYGON
# load spatial data (important for Moran's I and R^2)
polygons <- readRDS(paste0(ur_s, "_ausw", vers, ".rds"))
con <- readRDS(paste0(ur_s, "_ausw", vers, "_con.rds"))
nb <- readRDS(paste0(ur_s, "_ausw", vers, "_nb.rds"))
#urban areas
polygons_u <- readRDS(paste0(ur_s, "_ausw", vers, "_u.rds"))
con_u <- readRDS(paste0(ur_s, "_ausw", vers, "_con_u.rds"))
nb_u <- readRDS(paste0(ur_s, "_ausw", vers, "_nb_u.rds"))
#rural areas
polygons_r <- readRDS(paste0(ur_s, "_ausw", vers, "_r.rds"))
con_r <- readRDS(paste0(ur_s, "_ausw", vers, "_con_r.rds"))
nb_r <- readRDS(paste0(ur_s, "_ausw", vers, "_nb_r.rds"))
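# NOTE: the nb objects loaded above are spatial weights (listw) created in an upstream script.
# As a sketch, such a row-standardized queen-contiguity weights object could be built with spdep
# (the actual criterion and parameters used upstream may differ):
# nb_sketch <- spdep::poly2nb(polygons, queen = TRUE)                      # contiguity neighbours
# nb        <- spdep::nb2listw(nb_sketch, style = "W", zero.policy = TRUE) # row-standardized listw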
# prepare data:
#there's one polygon that has no fare accessibility value (02000000-96011, HH-Eissendorf). Assign the average value to it to enable lagged values in the next step
polygons$fare_acc[is.na(polygons$fare_acc)] <- mean(polygons$fare_acc, na.rm = TRUE)
# calculate lagged values
polygons$fare_acc.lag <- lag.listw(nb, na.omit(polygons$fare_acc))
col1 <- "#178072"
p_var <- "fare_acc"
p_var_en <- "fare accessibility up to €2.30"
p_xlab <- paste0(p_var_en, " (", p_var, ")")
p_ylab <- paste0("lagged ", p_var_en, " (", p_var, ".lag)")
# Moran scatterplot: fare_acc vs. its spatial lag, with hover labels for ggplotly
p <- ggplot(polygons, aes(x = fare_acc, y = fare_acc.lag)) +
  geom_point(color = "black", alpha = .5,
             aes(text = sprintf("%s<br><b>%s</b><br>fare_acc: \t\t%.2f\nfare_acc.lag: \t%.2fe-6",
                                ags, name, fare_acc, fare_acc.lag))) +
  geom_smooth(method = "lm", se = FALSE, color = col1) +
  labs(x = p_xlab, y = p_ylab)
ggplotly(p, tooltip = "text")
# # spdep's out-of-the-box plot would be this one:
# spdep::moran.plot(polygons$kk, nb, zero.policy=attr(nb, "zero.policy"), spChk=NULL, labels=F,
# xlab=NULL, ylab=NULL, quiet=NULL, plot=TRUE, return_df=F)
# the eval clause in the header defines that the chunk will only be executed when the geometry type is POINT
# prepare data:
# calculate lagged values
stops$fare_acc.lag <- lag.listw(nb, na.omit(stops$fare_acc))
# if on raster level: rename the first column to h_id_varchar and add an empty name column
if (lvl %in% c("z500", "z250", "z100")) {
names(stops)[1] <- 'h_id_varchar'
names(stops_u)[1] <- 'h_id_varchar'
names(stops_r)[1] <- 'h_id_varchar'
stops$h_name <- ""
stops_u$h_name <- ""
stops_r$h_name <- ""
# if we're analysing vbb stops that have gtfs as a PK: rename PK column
} else if (ur_s == "vbb" & lvl == "hst"){
names(stops)[1] <- 'h_id_varchar'
names(stops_u)[1] <- 'h_id_varchar'
names(stops_r)[1] <- 'h_id_varchar'
}
col1 <- "#178072"
p_var <- "fare_acc"
p_var_en <- "fare accessibility up to €2.30"
p_xlab <- paste0(p_var_en, " (", p_var, ")")
p_ylab <- paste0("lagged ", p_var_en, " (", p_var, ".lag)")
# Moran scatterplot: fare_acc vs. its spatial lag, with hover labels for ggplotly
p <- ggplot(stops, aes(x = fare_acc, y = fare_acc.lag)) +
  geom_point(color = "black", alpha = .5,
             aes(text = sprintf("%s<br><b>%s</b><br>fare_acc: \t\t%.2f\nfare_acc.lag: \t%.2fe-6",
                                h_id_varchar, h_name, fare_acc, fare_acc.lag))) +
  geom_smooth(method = "lm", se = FALSE, color = col1) +
  labs(x = p_xlab, y = p_ylab)
ggplotly(p, tooltip = "text")
# # spdep's out-of-the-box plot would be this one:
# spdep::moran.plot(stops$kk, nb, zero.policy=attr(nb, "zero.policy"), spChk=NULL, labels=F,
# xlab=NULL, ylab=NULL, quiet=NULL, plot=TRUE, return_df=F)
# define text chunks that explain LISA (or LISA's absence), which will be called depending on the variable lisa_draw that defines if a plot shall be drawn, depending on the spatial level.
lisa_T <- paste0("The LISA test (local indicators of spatial association) measures
autocorrelation at the local level, i.e. for each area compared to
its immediate neighbours. Again, it is the level of statistical significance that is relevant. LISA can be
mapped to explore spatial patterns, as the following plot shows for the distribution of ", dv_en, ".
The plot shows four different LISA patterns by comparing each area with its neighbours. Note that LISA shows significant patterns of clustering. However, it does not allow any interpretation of the values themselves -- or, as Luc Anselin puts it, ['[LISA] only suggests interesting locations ... it does not prove anything!'](https://youtu.be/xMzK1iNgDFE?si=OrMu_kx-VYCbZC05&t=1305)
###### The plot below shows data for ", formatC(nobs(ols), format = "d", big.mark = ","), " ", lvl_en_p, " (no outliers).")
lisa_F <- paste0("A LISA plot for the ", lvl, " level would require considerable computing capacity. As it is based on a nearest-neighbour analysis, it would deliver a picture quite similar to those of the other spatial levels. Therefore, this report does not include a LISA plot. Please refer to the reports for the ", lvl_oth, " levels, which can be downloaded via [DOI:10.15480/882.13162](https://doi.org/10.15480/882.13162).")
The LISA test (local indicators of spatial association) measures autocorrelation at the local level, i.e. for each area compared to its immediate neighbours. Again, it is the level of statistical significance that is relevant. LISA can be mapped to explore spatial patterns, as the following plot shows for the distribution of fare accessibility. The plot shows four different LISA patterns by comparing each area with its neighbours. Note that LISA shows significant patterns of clustering. However, it does not allow any interpretation of the values themselves – or, as Luc Anselin puts it, ‘[LISA] only suggests interesting locations … it does not prove anything!’
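For reference, the local Moran statistic compares each feature’s deviation from the mean with the weighted deviations of its neighbours:

$$I_i = \frac{x_i - \bar{x}}{m_2} \sum_j w_{ij}\,(x_j - \bar{x}), \qquad m_2 = \frac{1}{n} \sum_i (x_i - \bar{x})^2$$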
# the eval clause in the header defines that the chunk will only be executed when the geometry type is POLYGON
# Compute LISA plot
# Inspired by https://stackoverflow.com/a/70387888
# Calculate a separate nb matrix for local LISA test
# Note that I assign a 30m buffer to heal the boundary effect between metropolis and outskirts
if (!exists("nb_loc")) {
# If it doesn't exist, create nb_loc
nb_loc <- rgeoda::queen_weights(st_buffer(polygons, 30))
}
# Calculate local Moran's I
moran_loc <- rgeoda::local_moran(nb_loc, st_drop_geometry(polygons["fare_acc"]))
moran_loc_lbls <- lisa_labels(moran_loc)
moran_loc_colors <- setNames(lisa_colors(moran_loc), moran_loc_lbls)
# Define my own colors instead of the standard ones (i.e., replace them in the respective array)
moran_loc_colors <- dplyr::recode(
moran_loc_colors,
"#eeeeee" = "#eeeeee", # not significant
"#FF0000" = "#0f4f4c", # high-high
"#f4ada8" = "#178072", # high-low
"#0000FF" = "#2DC6D6", # low-low
"#a7adf9" = "#C1EEF3", # low-high
"#464646" = "#464646", # undefined
"#999999" = "#999999" # isolated
)
# Define clusters
moran_clustered <- polygons %>%
st_drop_geometry() %>%
select(ags) %>%
mutate(cluster_num = lisa_clusters(moran_loc) + 1, # Add 1 because clusters are zero-indexed
cluster = factor(moran_loc_lbls[cluster_num], levels = moran_loc_lbls)) %>%
right_join(polygons, by = "ags") %>%
st_as_sf()
# # Map the clusters as a static ggplot
# ggplot(moran_clustered, aes(fill = cluster)) +
# geom_sf(color = "white", size = 0) +
# scale_fill_manual(name = "New Legend Title", values = moran_loc_colors, na.value = "grey90") +
# theme_minimal() +
# theme(panel.grid = element_blank(),
# axis.text = element_blank(),
# axis.title = element_blank(),
# panel.border = element_blank(),
# panel.background = element_blank())
# the eval clause in the header defines that the chunk will only be executed when the geometry type is POLYGON
# Define color palette
pal <- colorFactor(moran_loc_colors, domain = moran_clustered$cluster)
# Define LISA label for each location, based on numeric cluster field
moran_clustered <- mutate(moran_clustered, cluster_label = case_when(
cluster_num == 1 ~ "No significant clustering",
cluster_num == 2 ~ "<strong>High fare_acc</strong> in this area, <strong>high fare_acc</strong> in the neighbouring areas",
cluster_num == 3 ~ "<strong>Low fare_acc</strong> in this area, <strong>low fare_acc</strong> in the neighbouring areas",
cluster_num == 4 ~ "<strong>Low fare_acc</strong> in this area, <strong>high fare_acc</strong> in the neighbouring areas",
cluster_num == 5 ~ "<strong>High fare_acc</strong> in this area, <strong>low fare_acc</strong> in the neighbouring areas",
cluster_num == 7 ~ "<strong>Isolated:</strong> This area has no neighbours",
TRUE ~ as.character(cluster_num)
))
moran_clustered$label <- paste0(
"<strong>", moran_clustered$ags, "<br/>",
moran_clustered$name, "</strong><br/>",
moran_clustered$cluster_label
) %>% lapply(htmltools::HTML)
# Render map
leaflet(moran_clustered) %>%
addProviderTiles('CartoDB.Positron', group = "Basemap") %>%
addPolygons(stroke = TRUE, smoothFactor = 0,
fillColor = ~pal(cluster),
color = "grey", # Thin grey outline for better visibility
weight = 0.5, # Outline thickness
fillOpacity = 0.7, # Transparency of the polygons
label = ~label,
highlight = highlightOptions(
color = "white",
bringToFront = TRUE
),
layerId = ~ags,
group = "LISA Clusters"
) %>%
addLegend(position = 'bottomright', pal = pal,
values = moran_clustered$cluster, title = 'LISA Clusters') %>%
addFullscreenControl() %>%
addLayersControl(
baseGroups = c("Basemap", "No Basemap"),
overlayGroups = c("LISA Clusters"),
options = layersControlOptions(collapsed = FALSE)
) %>%
hideGroup("No Basemap") # Start with the basemap visible
# the eval clause in the header defines that the chunk will only be executed when the geometry type is POINT and the level is not raster 500m. The raster LISA plot would require too much computing capacity for only little gain in result.
# Compute LISA plot
# set CRS
#stops <- st_transform(stops, 4326)
st_crs(stops) <- 3857
# for testing: Create a subset of ~100 stops
#stops <- subset(stops, km_zentrum<=50)
#st_crs(stops) # check CRS
#st_crs(moran_clustered) # check CRS
# Create the matrix of local weights for the stops
# (created unconditionally here, since an nb_loc object may already exist from the polygon-level chunk)
nb_loc <- rgeoda::queen_weights(stops)
# # Drop geometry to pass only numeric values to local_moran
stops_nogeom <- st_drop_geometry(stops)
# Calculate local Moran's I (fare_acc is the variable I want to investigate)
moran_loc <- rgeoda::local_moran(nb_loc, stops_nogeom["fare_acc"])
moran_loc_lbls <- lisa_labels(moran_loc)
moran_loc_colors <- setNames(lisa_colors(moran_loc), moran_loc_lbls)
# Define my own colors instead of the standard ones (i.e., replace them in the respective array)
moran_loc_colors <- dplyr::recode(
moran_loc_colors,
"#eeeeee" = "#eeeeee", # not significant
"#FF0000" = "#0f4f4c", # high-high
"#f4ada8" = "#178072", # high-low
"#0000FF" = "#2DC6D6", # low-low
"#a7adf9" = "#C1EEF3", # low-high
"#464646" = "#464646", # undefined
"#999999" = "#999999" # isolated
)
# Define clusters
moran_clustered <- stops %>%
mutate(cluster_num = lisa_clusters(moran_loc) + 1, # add 1 bc clusters are zero-indexed
cluster = factor(moran_loc_lbls[cluster_num], levels = moran_loc_lbls),
h_id_varchar = as.character(h_id_varchar)) # Ensure h_id_varchar is character type
# the eval clause in the header defines that the chunk will only be executed when the geometry type is POINT
# define color palette
pal <- colorFactor(moran_loc_colors, domain = moran_clustered$cluster)
# define LISA label for each location, based on numeric cluster field
# note that ggplotly only allows the following HTML tags: <br>, <sub>, <sup>, <b>, <i>, <em>
# (https://stackoverflow.com/a/57812154/15097106)
moran_clustered <- mutate(moran_clustered, cluster_label = case_when(
cluster_num == 1 ~ "No significant clustering",
cluster_num == 2 ~ "<b>High fare_acc</b> at this stop,<br><b>high fare_acc</b> at the neighbouring stops",
cluster_num == 3 ~ "<b>Low fare_acc</b> at this stop,<br><b>low fare_acc</b> at the neighbouring stops",
cluster_num == 4 ~ "<b>Low fare_acc</b> at this stop,<br><b>high fare_acc</b> at the neighbouring stops",
cluster_num == 5 ~ "<b>High fare_acc</b> at this stop,<br><b>low fare_acc</b> at the neighbouring stops",
cluster_num == 7 ~ "<b>Isolated:</b> This stop has no neighbours",
TRUE ~ as.character(cluster_num) # Default label for other cases
))
# note that ggplotly only allows the following HTML tags: <br>, <sub>, <sup>, <b>, <i>, <em>
# (https://stackoverflow.com/a/57812154/15097106)
# Construct labels
moran_clustered$label <- paste0(
"<b>",
moran_clustered$h_name, "</b><br>",
moran_clustered$h_id_varchar, "<br>",
moran_clustered$cluster_label
)
#ensure the correct EPSG
moran_clustered <- st_transform(moran_clustered, crs = st_crs(3857))
moran_clustered <- st_set_crs(moran_clustered, 3857)
# draw a ggplot with labels
# there were serious issues with the CRS which cost me 2 days of work.
# In the end, I decided not to reproject the layers but create a new joined layer of the stops geometry and selected attributes of the moran_clustered sf, and draw an interactive ggplot instead of a leaflet map.
stops_joined <- stops %>%
left_join(moran_clustered %>% st_drop_geometry() %>% select(h_id_varchar, cluster, label),
by = "h_id_varchar")
# Ensure the 'label' column is a character vector
stops_joined$label <- as.character(stops_joined$label)
# Debug: Check for non-finite coordinates
# non_finite_coords <- sum(!is.finite(st_coordinates(stops_joined)))
# print(paste("Number of non-finite coordinates:", non_finite_coords))
# Create the ggplot object
p <- ggplot(data = stops_joined) +
geom_sf(aes(color = cluster, text = label), size = 1) +
scale_color_manual(values = moran_loc_colors) +
theme_void() + # Remove all axes and grids
theme(
plot.background = element_rect(fill = "grey10"), # Set background to grey10
panel.background = element_rect(fill = "grey0"), # Set panel background to grey0
plot.title = element_text(color = "white", size = 14, face = "bold"), # Title font to white
legend.title = element_text(color = "white"), # Legend title font to white
legend.text = element_text(color = "white") # Legend text font to white
) +
labs(
title = "LISA plot for fare accessibility",
color = "Cluster"
)
#p
# not functional:
# Convert to an interactive plotly plot
#
#p_interactive <- ggplotly(p, tooltip = "text")
p_interactive <- ggplotly(p)
# Ensure fullscreen button and mode bar are enabled
p_interactive <- p_interactive %>%
config(
displayModeBar = TRUE, # Ensure mode bar is displayed
displaylogo = FALSE, # Optionally, hide Plotly logo
modeBarButtonsToAdd = c('fullscreen') # Add fullscreen button to mode bar
)
# Render the interactive plot
p_interactive
It may be advisable to simplify the statistical analysis by using a more basic model. The Likelihood Ratio (LR) test indicates whether the SDEM can be restricted to a simpler spatial model or even to a non-spatial model, as explained in Burkey’s 2nd video (minute 21). The LR statistic is $2(\ln L_{\text{full}} - \ln L_{\text{restricted}})$ and follows a χ² distribution with degrees of freedom equal to the number of restricted parameters.
Firstly, we execute an LR test against the null hypothesis: we should restrict the model to SEM. If p > .05, we can’t reject H0 and should restrict the model to SEM.
restr2sem <- spatialreg::LR.Sarlm(sdem, sem )
if(restr2sem$p.value >= .05) {
restr2sem$restrict <- T
} else {
restr2sem$restrict <- F
}
# restr2sem
#transpose the LR values into a dataframe
lr_out_num <- t(broom::tidy(restr2sem))
# assign column name (that's important to coerce/cast the data type later)
colnames(lr_out_num) <- c("value")
# Define Replacement patterns to specify the reported row names (estimate1-3 need to get proper labels)
replacement_patterns <- c(
"estimate1" = "Log likelihood of SDEM",
"estimate2" = "Log likelihood of SEM",
"statistic" = "Likelihood ratio",
"p.value" = "p Value",
"parameter" = "Parameter",
"method" = "Method")
# Apply replacements for row names
row.names(lr_out_num) <- replacement_patterns[row.names(lr_out_num)]
# Define the rows that contain strings (i.e., non-numeric values)
chr <- c("Method")
#write character rows into new matrix
lr_out_chr <- as.data.frame(lr_out_num[(rownames(lr_out_num) %in% chr), , drop = FALSE])
#keep only numeric rows in numeric matrix
lr_out_num <- as.data.frame(lr_out_num[!(rownames(lr_out_num) %in% chr), , drop = FALSE])
# coerce/cast data type to numeric, also round to 3 digits
lr_out_num$value <- round(as.numeric(lr_out_num$value),3)
# call the results:
# meta information in character rows
DT::datatable(
lr_out_chr,
options = list(
caption = "Likelihood Ratio parameters for SDEM → SEM",
autoWidth = FALSE,
searching = FALSE,
bPaginate = FALSE,
bLengthChange = FALSE,
bInfo = FALSE,
columnDefs = list(
list(targets = 1, # target only the 2nd column (the count seems to start from 0)
className = "dt-right") # Set text alignment to right
)
)
)
# call the LR results:
# estimates in numeric rows
DT::datatable(
lr_out_num,
options = list(caption = "Likelihood Ratio estimates for SDEM → SEM",
autoWidth = F,
searching = F, #disable search field
bPaginate = F, #hide pagination control
bLengthChange = F, #hide dropdown menu for list length
bInfo = F #hide information how many entries are shown
)) %>%
formatRound(1,digits = 3)
The p value is ≤ .05. Thus, it is not advised to restrict the model to SEM.
Secondly, we execute an LR test against the null hypothesis: we should restrict the model to SLX (similar to the H0 above). If p > .05, we can’t reject H0 and should restrict the model to SLX. If p ≤ .05, we do not need to restrict the SDEM model (in this case, we don’t need to restrict to OLS either).
restr2slx <- spatialreg::LR.Sarlm(sdem, slx )
if(restr2slx$p.value >= .05) {
restr2slx$restrict <- T
} else {
restr2slx$restrict <- F
}
# restr2slx
#transpose the LR values into a dataframe
lr_out_num <- t(broom::tidy(restr2slx))
# assign column name (that's important to coerce/cast the data type later)
colnames(lr_out_num) <- c("value")
# Define Replacement patterns to specify the reported row names (estimate1-3 need to get proper labels)
replacement_patterns <- c(
"estimate1" = "Log likelihood of SDEM",
"estimate2" = "Log likelihood of SLX",
"statistic" = "Likelihood ratio",
"p.value" = "p Value",
"parameter" = "Parameter",
"method" = "Method")
# Apply replacements for row names
row.names(lr_out_num) <- replacement_patterns[row.names(lr_out_num)]
# Define the rows that contain strings (i.e., non-numeric values)
chr <- c("Method")
#write character rows into new matrix
lr_out_chr <- as.data.frame(lr_out_num[(rownames(lr_out_num) %in% chr), , drop = FALSE])
#keep only numeric rows in numeric matrix
lr_out_num <- as.data.frame(lr_out_num[!(rownames(lr_out_num) %in% chr), , drop = FALSE])
# coerce/cast data type to numeric, also round to 3 digits
lr_out_num$value <- round(as.numeric(lr_out_num$value),3)
# call the results:
# meta information in character rows
DT::datatable(
lr_out_chr,
options = list(
caption = "Likelihood Ratio parameters for SDEM → SLX",
autoWidth = FALSE,
searching = FALSE,
bPaginate = FALSE,
bLengthChange = FALSE,
bInfo = FALSE,
columnDefs = list(
list(targets = 1, # target only the 2nd column (the count seems to start from 0)
className = "dt-right") # Set text alignment to right
)
)
)
# call the LR results:
# estimates in numeric rows
DT::datatable(
lr_out_num,
options = list(caption = "Likelihood Ratio estimates for SDEM → SLX",
autoWidth = F,
searching = F, #disable search field
bPaginate = F, #hide pagination control
bLengthChange = F, #hide dropdown menu for list length
bInfo = F #hide information how many entries are shown
)) %>%
formatRound(1,digits = 3)
The p value is ≤ .05. Thus, it is not advised to restrict the model to SLX.
Thirdly, we execute an LR test against the null hypothesis: we should restrict the model to OLS (similar to the H0 above). If p > .05, we can’t reject H0 and should restrict the model to OLS. If p ≤ .05, we do not need to restrict the SDEM model.
restr2ols <- spatialreg::LR.Sarlm(sdem, ols )
if(restr2ols$p.value >= .05) {
restr2ols$restrict <- T
} else {
restr2ols$restrict <- F
}
# restr2ols
#transpose the LR values into a dataframe
lr_out_num <- t(broom::tidy(restr2ols))
# assign column name (that's important to coerce/cast the data type later)
colnames(lr_out_num) <- c("value")
# Define Replacement patterns to specify the reported row names (estimate1-3 need to get proper labels)
replacement_patterns <- c(
"estimate1" = "Log likelihood of SDEM",
"estimate2" = "Log likelihood of OLS",
"statistic" = "Likelihood ratio",
"p.value" = "p Value",
"parameter" = "Parameter",
"method" = "Method")
# Apply replacements for row names
row.names(lr_out_num) <- replacement_patterns[row.names(lr_out_num)]
# Define the rows that contain strings (i.e., non-numeric values)
chr <- c("Method")
#write character rows into new matrix
lr_out_chr <- as.data.frame(lr_out_num[(rownames(lr_out_num) %in% chr), , drop = FALSE])
#keep only numeric rows in numeric matrix
lr_out_num <- as.data.frame(lr_out_num[!(rownames(lr_out_num) %in% chr), , drop = FALSE])
# coerce/cast data type to numeric, also round to 3 digits
lr_out_num$value <- round(as.numeric(lr_out_num$value),3)
# call the results:
# meta information in character rows
DT::datatable(
lr_out_chr,
options = list(
caption = "Likelihood Ratio parameters for SDEM → OLS",
autoWidth = FALSE,
searching = FALSE,
bPaginate = FALSE,
bLengthChange = FALSE,
bInfo = FALSE,
columnDefs = list(
list(targets = 1, # target only the 2nd column (the count seems to start from 0)
className = "dt-right") # Set text alignment to right
)
)
)
# call the LR results:
# estimates in numeric rows
DT::datatable(
lr_out_num,
options = list(caption = "Likelihood Ratio estimates for SDEM → OLS",
autoWidth = F,
searching = F, #disable search field
bPaginate = F, #hide pagination control
bLengthChange = F, #hide dropdown menu for list length
bInfo = F #hide information how many entries are shown
)) %>%
formatRound(1,digits = 3)
The p value is ≤ .05. Thus, it is not advised to restrict the model to OLS.
The studentized Breusch-Pagan test measures heteroskedasticity, i.e. whether the variance of the residuals depends on the independent variables. It tests against the null hypothesis: there is no heteroskedasticity. If p ≤ .05, we can reject H0 and have to assume heteroskedasticity. This slightly affects the standard errors and p values in the regression, but not the coefficients. The effect on the p values could lead to incorrect hypothesis testing in the models above. If p > .05, we can assume that our regression model is not affected by heteroskedasticity.
bp_sdem <- spatialreg::bptest.Sarlm(sdem, studentize = TRUE)
if(bp_sdem$p.value < .05) {
bp_sdem$het <- T
} else {
bp_sdem$het <- F
}
# bp_sdem
#transpose the BP values into a dataframe
bp_out_num <- t(broom::tidy(bp_sdem))
# assign column name (that's important to coerce/cast the data type later)
colnames(bp_out_num) <- c("value")
# Define Replacement patterns to specify the reported row names (estimate1-3 need to get proper labels)
replacement_patterns <- c(
"statistic" = "Breusch-Pagan Statistic",
"p.value" = "p Value",
"parameter" = "Degrees of Freedom",
"method" = "Method")
# Apply replacements for row names
row.names(bp_out_num) <- replacement_patterns[row.names(bp_out_num)]
# Define the rows that contain strings (i.e., non-numeric values)
chr <- c("Method")
#write character rows into new matrix
bp_out_chr <- as.data.frame(bp_out_num[(rownames(bp_out_num) %in% chr), , drop = FALSE])
#keep only numeric rows in numeric matrix
bp_out_num <- as.data.frame(bp_out_num[!(rownames(bp_out_num) %in% chr), , drop = FALSE])
# coerce/cast data type to numeric, also round to 3 digits
bp_out_num$value <- round(as.numeric(bp_out_num$value),3)
# call the results:
# estimates in numeric rows
DT::datatable(
bp_out_num,
options = list(caption = "Breusch-Pagan statistics for SDEM",
autoWidth = F,
searching = F, #disable search field
bPaginate = F, #hide pagination control
bLengthChange = F, #hide dropdown menu for list length
bInfo = F #hide information how many entries are shown
)) %>%
formatRound(1,digits = 3)
The p value is ≤ .05. Thus, we have to assume heteroskedasticity in the model.
The Spatial Hausman test compares the SEM to the OLS parameters. It tests against the null hypothesis: there is no difference between the SEM and OLS parameters. If p ≤ .05, we can reject H0 and have to assume that neither of the two models is a good fit to the data and/or that the models are not the optimal choice to capture spatial autocorrelation (Burkey 1, minute 33ff).
hausman_regr <- spatialreg::Hausman.test(sem)
if(hausman_regr$p.value <= .05) {
hausman_regr$significant <- T
} else {
hausman_regr$significant <- F
}
#transpose the Hausman values into a dataframe
hausman_out_num <- t(broom::tidy(hausman_regr))
# assign column name (that's important to coerce/cast the data type later)
colnames(hausman_out_num) <- c("value")
# Define Replacement patterns to specify the reported row names (estimate1-3 need to get proper labels)
replacement_patterns <- c(
"statistic" = "Spatial Hausman Statistic",
"p.value" = "p Value",
"parameter" = "Degrees of Freedom",
"method" = "Method")
# Apply replacements for row names
row.names(hausman_out_num) <- replacement_patterns[row.names(hausman_out_num)]
# Define the rows that contain strings (i.e., non-numeric values)
chr <- c("Method")
#write character rows into new matrix
hausman_out_chr <- as.data.frame(hausman_out_num[(rownames(hausman_out_num) %in% chr), , drop = FALSE])
#keep only numeric rows in numeric matrix
hausman_out_num <- as.data.frame(hausman_out_num[!(rownames(hausman_out_num) %in% chr), , drop = FALSE])
# coerce/cast data type to numeric, also round to 3 digits
hausman_out_num$value <- round(as.numeric(hausman_out_num$value),3)
# call the results:
# estimates in numeric rows
DT::datatable(
hausman_out_num,
options = list(caption = "Spatial Hausman statistics for SEM vs. OLS",
autoWidth = F,
searching = F, #disable search field
bPaginate = F, #hide pagination control
bLengthChange = F, #hide dropdown menu for list length
bInfo = F #hide information how many entries are shown
)) %>%
formatRound(1,digits = 3)
The p value is ≤ .05. This suggests that neither OLS nor SEM is appropriate to capture the effects in our spatial data, as the model parameters differ substantially.
In my doctoral thesis, I put forward 13 hypotheses concerning fare accessibility, purchasing power and residential density (see the literature section of my dissertation). Eleven of these hypotheses are tested at the stop level in this report, i.e. U1-U5, R1-R5, and G1. The remaining hypotheses G2 and G3 compare the goodness of fit of the models across the spatial levels, which is not possible at this level of analysis; they are therefore discussed in the results section of my thesis. All bivariate hypotheses are tested on normalized input variables (see explanation above).
#define the variables for this Pseudo-R^2 and for the caption
regr <- "pt_index_zt"
dv <- regr
dv_en <- oevx_en
iv <- "purch_power_zt"
iv_en <- kk_en
iv_unit_en <- kk_unit_en
hyp <- 1
hyp_sdem <- paste0("u", hyp, "_sdem_zt")
hyp_sem <- paste0("u", hyp, "_sem_zt")
hyp_slx <- paste0("u", hyp, "_slx_zt")
hyp_ols <- paste0("u", hyp, "_ols_zt")
##### Calculate Pseudo-R^2 for SDEM and SEM to list them in the table
# Note that the formula points to the dynamic r variable, i.e., oevx/t2/whatever I assign as the dependent variable in the file 1_make_models.R!
#Furthermore, the get(...) calls dynamically point to the variables I have defined above, depending on my dissertation hypotheses
#SDEM
psr2_sdem <- 1 - (get(hyp_sdem)$SSE / (var(stops_u[[regr]]) * (length(stops_u[[regr]]) - 1)))
psr2_sdem <- round(psr2_sdem,2)
##### SEM
psr2_sem <- 1 - (get(hyp_sem)$SSE / (var(stops_u[[regr]]) * (length(stops_u[[regr]]) - 1)))
psr2_sem <- round(psr2_sem,2)
# extract p value for hyp testing
#hyp_p <- coef(summary(get(hyp_sdem)))[iv, "Pr(>|z|)"]
hyp_p <- coef(summary(get(hyp_sdem)))[which(rownames(coef(summary(get(hyp_sdem)))) == iv), "Pr(>|z|)"]
# Generate the htmlreg table with custom rows for Pseudo-R^2^
model_table <- texreg::htmlreg(
list(get(hyp_sdem), get(hyp_sem), get(hyp_slx), get(hyp_ols)),
doctype = FALSE,
custom.model.names = model_names,
custom.gof.rows = list('Pseudo-R^2^' = c(psr2_sdem, psr2_sem, '', '')), # this one inserts two values and two empty strings into the table
reorder.gof = c(2:10, 1, 11:19), # reorder the R^2^ rows, i.e., insert row number 1 in-between row 10 (last R^2^ of the regular models) and row 11.
caption = paste0("Bivariate regression estimates of the four models<br>
Independent variable: ", iv_en, " (", iv, ")",
"<br>
Dependent variable: ", dv_en, " (", dv, ")<br>
The sample is stripped of outliers [(see boxplot above)](#bplot_u)."),
custom.note = cap_sig)
# Create a kableExtra table and set styles for all columns
model_table_styled <- kable(
model_table,
format = "html",
escape = FALSE
) %>%
kable_styling(full_width = FALSE) %>% # Set to TRUE if you want full-width
column_spec(1, extra_css = "text-align: right;") # Align all columns to the right
# Print or use the modified HTML table
print(model_table_styled)
(Table: bivariate regression estimates of the four models; rendered at knit time.)
Hypothesis U1 states that low purchasing power predicts a high level of public transport service. As the beta coefficient of the bivariate regression shows, purchasing power (purch_power_zt) exerts an effect of -0.1 on the public transport index (pt_index_zt). This means that for an increase of 1 standard deviation of purch_power_zt at a given stop, we should expect a decrease of 0.1 standard deviations of pt_index_zt. As the variables are normalized, we must not interpret them in their original units.
The bivariate regression estimator is negative. The regression estimate is significant at the *** level.
Hence, hypothesis U1 is supported.
#define the variables for this Pseudo-R^2 and for the caption
regr <- "purch_power_zt"
dv <- regr
dv_en <- kk_en # English label for purch_power
iv <- "pop_dens_zt"
iv_en <- pop_ha_en
iv_unit_en <- pop_ha_unit_en
hyp <- 2
hyp_sdem <- paste0("u", hyp, "_sdem_zt")
hyp_sem <- paste0("u", hyp, "_sem_zt")
hyp_slx <- paste0("u", hyp, "_slx_zt")
hyp_ols <- paste0("u", hyp, "_ols_zt")
##### Calculate Pseudo-R^2 for SDEM and SEM to list them in the table
# Note that the formula points to the dynamic r variable, i.e., oevx/t2/whatever I assign as the dependent variable in the file 1_make_models.R!
#Furthermore, the get(...) calls dynamically point to the variables I have defined above, depending on my dissertation hypotheses
#SDEM
psr2_sdem <- 1 - (get(hyp_sdem)$SSE / (var(stops_u[[regr]]) * (length(stops_u[[regr]]) - 1)))
psr2_sdem <- round(psr2_sdem,2)
##### SEM
psr2_sem <- 1 - (get(hyp_sem)$SSE / (var(stops_u[[regr]]) * (length(stops_u[[regr]]) - 1)))
psr2_sem <- round(psr2_sem,2)
# extract p value for hyp testing
#hyp_p <- coef(summary(get(hyp_sdem)))[iv, "Pr(>|z|)"]
hyp_p <- coef(summary(get(hyp_sdem)))[which(rownames(coef(summary(get(hyp_sdem)))) == iv), "Pr(>|z|)"]
# Generate the htmlreg table with custom rows for Pseudo-R^2^
model_table <- texreg::htmlreg(
list(get(hyp_sdem), get(hyp_sem), get(hyp_slx), get(hyp_ols)),
doctype = FALSE,
custom.model.names = model_names,
custom.gof.rows = list('Pseudo-R^2^' = c(psr2_sdem, psr2_sem, '', '')), # this one inserts two values and two empty strings into the table
reorder.gof = c(2:10, 1, 11:19), # reorder the R^2^ rows, i.e., insert row number 1 in-between row 10 (last R^2^ of the regular models) and row 11.
caption = paste0("Bivariate regression estimates of the four models<br>
Independent variable: ", iv_en, " (", iv, ")",
"<br>
Dependent variable: ", dv_en, " (", dv, ")<br>
The sample is stripped of outliers [(see boxplot above)](#bplot_u)."),
custom.note = cap_sig)
# Create a kableExtra table and set styles for all columns
model_table_styled <- kable(
model_table,
format = "html",
escape = FALSE
) %>%
kable_styling(full_width = FALSE) %>% # Set to TRUE if you want full-width
column_spec(1, extra_css = "text-align: right;") # Align all columns to the right
# Print or use the modified HTML table
print(model_table_styled)
(Table: bivariate regression estimates of the four models; rendered at knit time.)
Hypothesis U2 states that high residential density predicts low purchasing power. As the beta coefficient of the bivariate regression shows, residential density (pop_dens_zt) exerts an effect of -0.33 on purchasing power (purch_power_zt). This means that for an increase of 1 standard deviation of pop_dens_zt at a given stop, we should expect a decrease of 0.33 standard deviations of purch_power_zt. As the variables are normalized, we must not interpret them in their original units.
The bivariate regression estimator is negative. The regression estimate is significant at the *** level.
Hence, hypothesis U2 is supported.
#define the variables for this Pseudo-R^2 and for the caption
regr <- "fare_acc_zt"
dv <- regr
dv_en <- t2_en # English label for fare_acc
iv <- "purch_power_zt"
iv_en <- kk_en
iv_unit_en <- kk_unit_en
hyp <- 3
hyp_sdem <- paste0("u", hyp, "_sdem_zt")
hyp_sem <- paste0("u", hyp, "_sem_zt")
hyp_slx <- paste0("u", hyp, "_slx_zt")
hyp_ols <- paste0("u", hyp, "_ols_zt")
##### Calculate Pseudo-R^2 for SDEM and SEM to list them in the table
# Note that the formula points to the dynamic r variable, i.e., oevx/t2/whatever I assign as the dependent variable in the file 1_make_models.R!
#Furthermore, the get(...) calls dynamically point to the variables I have defined above, depending on my dissertation hypotheses
#SDEM
psr2_sdem <- 1 - (get(hyp_sdem)$SSE / (var(stops_u[[regr]]) * (length(stops_u[[regr]]) - 1)))
psr2_sdem <- round(psr2_sdem,2)
##### SEM
psr2_sem <- 1 - (get(hyp_sem)$SSE / (var(stops_u[[regr]]) * (length(stops_u[[regr]]) - 1)))
psr2_sem <- round(psr2_sem,2)
# extract p value for hyp testing
#hyp_p <- coef(summary(get(hyp_sdem)))[iv, "Pr(>|z|)"]
hyp_p <- coef(summary(get(hyp_sdem)))[which(rownames(coef(summary(get(hyp_sdem)))) == iv), "Pr(>|z|)"]
# Generate the htmlreg table with custom rows for Pseudo-R^2^
model_table <- texreg::htmlreg(
list(get(hyp_sdem), get(hyp_sem), get(hyp_slx), get(hyp_ols)),
doctype = FALSE,
custom.model.names = model_names,
custom.gof.rows = list('Pseudo-R^2^' = c(psr2_sdem, psr2_sem, '', '')), # this one inserts two values and two empty strings into the table
reorder.gof = c(2:10, 1, 11:19), # reorder the R^2^ rows, i.e., insert row number 1 in-between row 10 (last R^2^ of the regular models) and row 11.
caption = paste0("Bivariate regression estimates of the four models<br>
Independent variable: ", iv_en, " (", iv, ")",
"<br>
Dependent variable: ", dv_en, " (", dv, ")<br>
The sample is stripped of outliers [(see boxplot above)](#bplot_u)."),
custom.note = cap_sig)
# Create a kableExtra table and set styles for all columns
model_table_styled <- kable(
model_table,
format = "html",
escape = FALSE
) %>%
kable_styling(full_width = FALSE) %>% # Set to TRUE if you want full-width
column_spec(1, extra_css = "text-align: right;") # Right-align the first column (variable names)
# Print or use the modified HTML table
print(model_table_styled)
[Table: Bivariate regression estimates of the four models (SDEM, SEM, SLX, OLS). Independent variable: purch_power_zt; dependent variable: fare_acc_zt.]
Hypothesis U3 states that low purchasing power predicts high fare accessibility. As the beta coefficient of the bivariate regression shows, purchasing power (purch_power_zt) exerts an effect of -0.1 on fare accessibility (fare_acc_zt). This means that for an increase of 1 standard deviation of purch_power_zt at a given stop, we should expect a decrease of 0.1 standard deviations of fare_acc_zt. As the variables are normalized, we must not interpret them in their original units.
The bivariate regression estimator is negative. The regression estimate is significant (*).
Hence, hypothesis U3 is verified.
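Because all inputs are z-transformed, a standardized beta can be rescaled into original units when needed; a minimal sketch, assuming stops_u also carries the unstandardized columns fare_acc and purch_power (an assumption, as only the _zt columns appear in this report):
# beta_raw = beta_std * sd(y) / sd(x); here for U3's standardized beta of -0.1
beta_raw <- -0.1 * sd(stops_u$fare_acc) / sd(stops_u$purch_power)
# beta_raw is the expected change of fare_acc (in its original unit)
# per one original unit of purch_power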
#define the variables for this Pseudo-R^2 and for the caption
regr <- "fare_acc_zt"
dv <- regr
dv_en <- t2_en
iv <- "dist_centre_zt"
iv_en <- km_zentrum_en
iv_unit_en <- km_zentrum_unit_en
hyp <- 4
hyp_sdem <- paste0("u", hyp, "_sdem_zt")
hyp_sem <- paste0("u", hyp, "_sem_zt")
hyp_slx <- paste0("u", hyp, "_slx_zt")
hyp_ols <- paste0("u", hyp, "_ols_zt")
##### Calculate Pseudo-R^2 for SDEM and SEM to list them in the table
# Note that the formula points to the dynamic regr variable, i.e., oevx/t2/whatever I assign as the dependent variable in the file 1_make_models.R!
#Furthermore, the get(...) calls dynamically point to the variables I have defined above, depending on my dissertation hypotheses
#SDEM
psr2_sdem <- 1 - (get(hyp_sdem)$SSE / (var(stops_u[[regr]]) * (length(stops_u[[regr]]) - 1)))
psr2_sdem <- round(psr2_sdem,2)
##### SEM
psr2_sem <- 1 - (get(hyp_sem)$SSE / (var(stops_u[[regr]]) * (length(stops_u[[regr]]) - 1)))
psr2_sem <- round(psr2_sem,2)
# extract p value for hyp testing
#hyp_p <- coef(summary(get(hyp_sdem)))[iv, "Pr(>|z|)"]
hyp_p <- coef(summary(get(hyp_sdem)))[which(rownames(coef(summary(get(hyp_sdem)))) == iv), "Pr(>|z|)"]
# Generate the htmlreg table with custom rows for Pseudo-R^2^
model_table <- texreg::htmlreg(
list(get(hyp_sdem), get(hyp_sem), get(hyp_slx), get(hyp_ols)),
doctype = FALSE,
custom.model.names = model_names,
custom.gof.rows = list('Pseudo-R^2^' = c(psr2_sdem, psr2_sem, '', '')), # this one inserts two values and two empty strings into the table
reorder.gof = c(2:10, 1, 11:19), # reorder the R^2^ rows, i.e., insert row number 1 in-between row 10 (last R^2^ of the regular models) and row 11.
caption = paste0("Bivariate regression estimates of the four models<br>
Independent variable: ", iv_en, " (", iv, ")",
"<br>
Dependent variable: ", dv_en, " (", dv, ")<br>
The sample is stripped of outliers [(see boxplot above)](#bplot_u)."),
custom.note = cap_sig)
# Create a kableExtra table and set styles for all columns
model_table_styled <- kable(
model_table,
format = "html",
escape = FALSE
) %>%
kable_styling(full_width = FALSE) %>% # Set to TRUE if you want full-width
column_spec(1, extra_css = "text-align: right;") # Right-align the first column (variable names)
# Print or use the modified HTML table
print(model_table_styled)
[Table: Bivariate regression estimates of the four models (SDEM, SEM, SLX, OLS). Independent variable: dist_centre_zt; dependent variable: fare_acc_zt.]
Hypothesis U4 states that a short distance to the next centre (i.e., high spatial centrality) predicts high fare accessibility. As the beta coefficient of the bivariate regression shows, the distance to the next middle-level centre (dist_centre_zt) exerts an effect of -0.24 on fare accessibility (fare_acc_zt). This means that for an increase of 1 standard deviation of dist_centre_zt at a given stop, we should expect a decrease of 0.24 standard deviations of fare_acc_zt. As the variables are normalized, we must not interpret them in their original units.
The bivariate regression estimator is negative. The regression estimate is, however, not significant.
Hence, hypothesis U4 is falsified and needs to be dismissed.
#define the variables for this Pseudo-R^2 and for the caption
regr <- "fare_acc_zt"
dv <- regr
dv_en <- t2_en
iv <- "cars_cap_zt"
iv_en <- pkw_en
iv_unit_en <- pkw_unit_en
hyp <- 5
hyp_sdem <- paste0("u", hyp, "_sdem_zt")
hyp_sem <- paste0("u", hyp, "_sem_zt")
hyp_slx <- paste0("u", hyp, "_slx_zt")
hyp_ols <- paste0("u", hyp, "_ols_zt")
##### Calculate Pseudo-R^2 for SDEM and SEM to list them in the table
# Note that the formula points to the dynamic regr variable, i.e., oevx/t2/whatever I assign as the dependent variable in the file 1_make_models.R!
#Furthermore, the get(...) calls dynamically point to the variables I have defined above, depending on my dissertation hypotheses
#SDEM
psr2_sdem <- 1 - (get(hyp_sdem)$SSE / (var(stops_u[[regr]]) * (length(stops_u[[regr]]) - 1)))
psr2_sdem <- round(psr2_sdem,2)
##### SEM
psr2_sem <- 1 - (get(hyp_sem)$SSE / (var(stops_u[[regr]]) * (length(stops_u[[regr]]) - 1)))
psr2_sem <- round(psr2_sem,2)
# extract p value for hyp testing
#hyp_p <- coef(summary(get(hyp_sdem)))[iv, "Pr(>|z|)"]
hyp_p <- coef(summary(get(hyp_sdem)))[which(rownames(coef(summary(get(hyp_sdem)))) == iv), "Pr(>|z|)"]
# Generate the htmlreg table with custom rows for Pseudo-R^2^
model_table <- texreg::htmlreg(
list(get(hyp_sdem), get(hyp_sem), get(hyp_slx), get(hyp_ols)),
doctype = FALSE,
custom.model.names = model_names,
custom.gof.rows = list('Pseudo-R^2^' = c(psr2_sdem, psr2_sem, '', '')), # this one inserts two values and two empty strings into the table
reorder.gof = c(2:10, 1, 11:19), # reorder the R^2^ rows, i.e., insert row number 1 in-between row 10 (last R^2^ of the regular models) and row 11.
caption = paste0("Bivariate regression estimates of the four models<br>
Independent variable: ", iv_en, " (", iv, ")",
"<br>
Dependent variable: ", dv_en, " (", dv, ")<br>
The sample is stripped of outliers [(see boxplot above)](#bplot_u)."),
custom.note = cap_sig)
# Create a kableExtra table and set styles for all columns
model_table_styled <- kable(
model_table,
format = "html",
escape = FALSE
) %>%
kable_styling(full_width = FALSE) %>% # Set to TRUE if you want full-width
column_spec(1, extra_css = "text-align: right;") # Right-align the first column (variable names)
# Print or use the modified HTML table
print(model_table_styled)
[Table: Bivariate regression estimates of the four models (SDEM, SEM, SLX, OLS). Independent variable: cars_cap_zt; dependent variable: fare_acc_zt.]
Hypothesis U5 states that high car availability predicts low fare accessibility. As the beta coefficient of the bivariate regression shows, car availability (cars_cap_zt) exerts an effect of -0.24 on fare accessibility (fare_acc_zt). This means that for an increase of 1 standard deviation of cars_cap_zt at a given stop, we should expect a decrease of 0.24 standard deviations of fare_acc_zt. As the variables are normalized, we must not interpret them in their original units.
The bivariate regression estimator is negative. The regression estimate is significant (***).
Hence, hypothesis U5 is verified.
#define the variables for this Pseudo-R^2 and for the caption
regr <- "pt_index_zt"
dv <- regr
dv_en <- oevx_en
iv <- "purch_power_zt"
iv_en <- kk_en
iv_unit_en <- kk_unit_en
hyp <- 1
hyp_sdem <- paste0("r", hyp, "_sdem_zt")
hyp_sem <- paste0("r", hyp, "_sem_zt")
hyp_slx <- paste0("r", hyp, "_slx_zt")
hyp_ols <- paste0("r", hyp, "_ols_zt")
##### Calculate Pseudo-R^2 for SDEM and SEM to list them in the table
# Note that the formula points to the dynamic regr variable, i.e., oevx/t2/whatever I assign as the dependent variable in the file 1_make_models.R!
#Furthermore, the get(...) calls dynamically point to the variables I have defined above, depending on my dissertation hypotheses
#SDEM
psr2_sdem <- 1 - (get(hyp_sdem)$SSE / (var(stops_u[[regr]]) * (length(stops_u[[regr]]) - 1)))
psr2_sdem <- round(psr2_sdem,2)
##### SEM
psr2_sem <- 1 - (get(hyp_sem)$SSE / (var(stops_u[[regr]]) * (length(stops_u[[regr]]) - 1)))
psr2_sem <- round(psr2_sem,2)
# extract p value for hyp testing
#hyp_p <- coef(summary(get(hyp_sdem)))[iv, "Pr(>|z|)"]
hyp_p <- coef(summary(get(hyp_sdem)))[which(rownames(coef(summary(get(hyp_sdem)))) == iv), "Pr(>|z|)"]
# Generate the htmlreg table with custom rows for Pseudo-R^2^
model_table <- texreg::htmlreg(
list(get(hyp_sdem), get(hyp_sem), get(hyp_slx), get(hyp_ols)),
doctype = FALSE,
custom.model.names = model_names,
custom.gof.rows = list('Pseudo-R^2^' = c(psr2_sdem, psr2_sem, '', '')), # this one inserts two values and two empty strings into the table
reorder.gof = c(2:10, 1, 11:19), # reorder the R^2^ rows, i.e., insert row number 1 in-between row 10 (last R^2^ of the regular models) and row 11.
caption = paste0("Bivariate regression estimates of the four models<br>
Independent variable: ", iv_en, " (", iv, ")",
"<br>
Dependent variable: ", dv_en, " (", dv, ")<br>
The sample is stripped of outliers [(see boxplot above)](#bplot_r)."),
custom.note = cap_sig)
# Create a kableExtra table and set styles for all columns
model_table_styled <- kable(
model_table,
format = "html",
escape = FALSE
) %>%
kable_styling(full_width = FALSE) %>% # Set to TRUE if you want full-width
column_spec(1, extra_css = "text-align: right;") # Right-align the first column (variable names)
# Print or use the modified HTML table
print(model_table_styled)
[Table: Bivariate regression estimates of the four models (SDEM, SEM, SLX, OLS). Independent variable: purch_power_zt; dependent variable: pt_index_zt.]
Hypothesis R1 states that low purchasing power predicts low public transportation service. As the beta coefficient of the bivariate regression shows, purchasing power (purch_power_zt) exerts practically no effect (a rounded beta of 0) on the public transport index (pt_index_zt): an increase of 1 standard deviation of purch_power_zt at a given stop is associated with essentially no change in pt_index_zt. As the variables are normalized, we must not interpret them in their original units.
The bivariate regression estimator is positive but near zero. The regression estimate is, however, not significant.
Hence, hypothesis R1 is falsified and needs to be dismissed.
#define the variables for this Pseudo-R^2 and for the caption
regr <- "purch_power_zt"
dv <- regr
dv_en <- kk_en
iv <- "pop_dens_zt"
iv_en <- pop_ha_en
iv_unit_en <- pop_ha_unit_en
hyp <- 2
hyp_sdem <- paste0("r", hyp, "_sdem_zt")
hyp_sem <- paste0("r", hyp, "_sem_zt")
hyp_slx <- paste0("r", hyp, "_slx_zt")
hyp_ols <- paste0("r", hyp, "_ols_zt")
##### Calculate Pseudo-R^2 for SDEM and SEM to list them in the table
# Note that the formula points to the dynamic regr variable, i.e., oevx/t2/whatever I assign as the dependent variable in the file 1_make_models.R!
#Furthermore, the get(...) calls dynamically point to the variables I have defined above, depending on my dissertation hypotheses
#SDEM
psr2_sdem <- 1 - (get(hyp_sdem)$SSE / (var(stops_u[[regr]]) * (length(stops_u[[regr]]) - 1)))
psr2_sdem <- round(psr2_sdem,2)
##### SEM
psr2_sem <- 1 - (get(hyp_sem)$SSE / (var(stops_u[[regr]]) * (length(stops_u[[regr]]) - 1)))
psr2_sem <- round(psr2_sem,2)
# extract p value for hyp testing
#hyp_p <- coef(summary(get(hyp_sdem)))[iv, "Pr(>|z|)"]
hyp_p <- coef(summary(get(hyp_sdem)))[which(rownames(coef(summary(get(hyp_sdem)))) == iv), "Pr(>|z|)"]
# Generate the htmlreg table with custom rows for Pseudo-R^2^
model_table <- texreg::htmlreg(
list(get(hyp_sdem), get(hyp_sem), get(hyp_slx), get(hyp_ols)),
doctype = FALSE,
custom.model.names = model_names,
custom.gof.rows = list('Pseudo-R^2^' = c(psr2_sdem, psr2_sem, '', '')), # this one inserts two values and two empty strings into the table
reorder.gof = c(2:10, 1, 11:19), # reorder the R^2^ rows, i.e., insert row number 1 in-between row 10 (last R^2^ of the regular models) and row 11.
caption = paste0("Bivariate regression estimates of the four models<br>
Independent variable: ", iv_en, " (", iv, ")",
"<br>
Dependent variable: ", dv_en, " (", dv, ")<br>
The sample is stripped of outliers [(see boxplot above)](#bplot_r)."),
custom.note = cap_sig)
# Create a kableExtra table and set styles for all columns
model_table_styled <- kable(
model_table,
format = "html",
escape = FALSE
) %>%
kable_styling(full_width = FALSE) %>% # Set to TRUE if you want full-width
column_spec(1, extra_css = "text-align: right;") # Right-align the first column (variable names)
# Print or use the modified HTML table
print(model_table_styled)
[Table: Bivariate regression estimates of the four models (SDEM, SEM, SLX, OLS). Independent variable: pop_dens_zt; dependent variable: purch_power_zt.]
Hypothesis R2 states that high residential density predicts high purchasing power. As the beta coefficient of the bivariate regression shows, residential density (pop_dens_zt) exerts an effect of -0.27 on purchasing power (purch_power_zt). This means that for an increase of 1 standard deviation of pop_dens_zt at a given stop, we should expect a decrease of 0.27 standard deviations of purch_power_zt. As the variables are normalized, we must not interpret them in their original units.
The bivariate regression estimator is negative. The regression estimate is significant (***).
Hence, hypothesis R2 is falsified and needs to be dismissed.
#define the variables for this Pseudo-R^2 and for the caption
regr <- "fare_acc_zt"
dv <- regr
dv_en <- t2_en
iv <- "purch_power_zt"
iv_en <- kk_en
iv_unit_en <- kk_unit_en
hyp <- 3
hyp_sdem <- paste0("r", hyp, "_sdem_zt")
hyp_sem <- paste0("r", hyp, "_sem_zt")
hyp_slx <- paste0("r", hyp, "_slx_zt")
hyp_ols <- paste0("r", hyp, "_ols_zt")
##### Calculate Pseudo-R^2 for SDEM and SEM to list them in the table
# Note that the formula points to the dynamic regr variable, i.e., oevx/t2/whatever I assign as the dependent variable in the file 1_make_models.R!
#Furthermore, the get(...) calls dynamically point to the variables I have defined above, depending on my dissertation hypotheses
#SDEM
psr2_sdem <- 1 - (get(hyp_sdem)$SSE / (var(stops_u[[regr]]) * (length(stops_u[[regr]]) - 1)))
psr2_sdem <- round(psr2_sdem,2)
##### SEM
psr2_sem <- 1 - (get(hyp_sem)$SSE / (var(stops_u[[regr]]) * (length(stops_u[[regr]]) - 1)))
psr2_sem <- round(psr2_sem,2)
# extract p value for hyp testing
#hyp_p <- coef(summary(get(hyp_sdem)))[iv, "Pr(>|z|)"]
hyp_p <- coef(summary(get(hyp_sdem)))[which(rownames(coef(summary(get(hyp_sdem)))) == iv), "Pr(>|z|)"]
# Generate the htmlreg table with custom rows for Pseudo-R^2^
model_table <- texreg::htmlreg(
list(get(hyp_sdem), get(hyp_sem), get(hyp_slx), get(hyp_ols)),
doctype = FALSE,
custom.model.names = model_names,
custom.gof.rows = list('Pseudo-R^2^' = c(psr2_sdem, psr2_sem, '', '')), # this one inserts two values and two empty strings into the table
reorder.gof = c(2:10, 1, 11:19), # reorder the R^2^ rows, i.e., insert row number 1 in-between row 10 (last R^2^ of the regular models) and row 11.
caption = paste0("Bivariate regression estimates of the four models<br>
Independent variable: ", iv_en, " (", iv, ")",
"<br>
Dependent variable: ", dv_en, " (", dv, ")<br>
The sample is stripped of outliers [(see boxplot above)](#bplot_r)."),
custom.note = cap_sig)
# Create a kableExtra table and set styles for all columns
model_table_styled <- kable(
model_table,
format = "html",
escape = FALSE
) %>%
kable_styling(full_width = FALSE) %>% # Set to TRUE if you want full-width
column_spec(1, extra_css = "text-align: right;") # Right-align the first column (variable names)
# Print or use the modified HTML table
print(model_table_styled)
[Table: Bivariate regression estimates of the four models (SDEM, SEM, SLX, OLS). Independent variable: purch_power_zt; dependent variable: fare_acc_zt.]
Hypothesis R3 states that low purchasing power predicts low fare accessibility. As the beta coefficient of the bivariate regression shows, purchasing power (purch_power_zt) exerts an effect of 0.15 on fare accessibility (fare_acc_zt). This means that for an increase of 1 standard deviation of purch_power_zt at a given stop, we should expect an increase of 0.15 standard deviations of fare_acc_zt. As the variables are normalized, we must not interpret them in their original units.
The bivariate regression estimator is positive. The regression estimate is significant (***).
Hence, hypothesis R3 is verified.
#define the variables for this Pseudo-R^2 and for the caption
regr <- "fare_acc_zt"
dv <- regr
dv_en <- t2_en
iv <- "dist_centre_zt"
iv_en <- km_zentrum_en
iv_unit_en <- km_zentrum_unit_en
hyp <- 4
hyp_sdem <- paste0("r", hyp, "_sdem_zt")
hyp_sem <- paste0("r", hyp, "_sem_zt")
hyp_slx <- paste0("r", hyp, "_slx_zt")
hyp_ols <- paste0("r", hyp, "_ols_zt")
##### Calculate Pseudo-R^2 for SDEM and SEM to list them in the table
# Note that the formula points to the dynamic regr variable, i.e., oevx/t2/whatever I assign as the dependent variable in the file 1_make_models.R!
#Furthermore, the get(...) calls dynamically point to the variables I have defined above, depending on my dissertation hypotheses
#SDEM
psr2_sdem <- 1 - (get(hyp_sdem)$SSE / (var(stops_u[[regr]]) * (length(stops_u[[regr]]) - 1)))
psr2_sdem <- round(psr2_sdem,2)
##### SEM
psr2_sem <- 1 - (get(hyp_sem)$SSE / (var(stops_u[[regr]]) * (length(stops_u[[regr]]) - 1)))
psr2_sem <- round(psr2_sem,2)
# extract p value for hyp testing
#hyp_p <- coef(summary(get(hyp_sdem)))[iv, "Pr(>|z|)"]
hyp_p <- coef(summary(get(hyp_sdem)))[which(rownames(coef(summary(get(hyp_sdem)))) == iv), "Pr(>|z|)"]
# Generate the htmlreg table with custom rows for Pseudo-R^2^
model_table <- texreg::htmlreg(
list(get(hyp_sdem), get(hyp_sem), get(hyp_slx), get(hyp_ols)),
doctype = FALSE,
custom.model.names = model_names,
custom.gof.rows = list('Pseudo-R^2^' = c(psr2_sdem, psr2_sem, '', '')), # this one inserts two values and two empty strings into the table
reorder.gof = c(2:10, 1, 11:19), # reorder the R^2^ rows, i.e., insert row number 1 in-between row 10 (last R^2^ of the regular models) and row 11.
caption = paste0("Bivariate regression estimates of the four models<br>
Independent variable: ", iv_en, " (", iv, ")",
"<br>
Dependent variable: ", dv_en, " (", dv, ")<br>
The sample is stripped of outliers [(see boxplot above)](#bplot_r)."),
custom.note = cap_sig)
# Create a kableExtra table and set styles for all columns
model_table_styled <- kable(
model_table,
format = "html",
escape = FALSE
) %>%
kable_styling(full_width = FALSE) %>% # Set to TRUE if you want full-width
column_spec(1, extra_css = "text-align: right;") # Right-align the first column (variable names)
# Print or use the modified HTML table
print(model_table_styled)
[Table: Bivariate regression estimates of the four models (SDEM, SEM, SLX, OLS). Independent variable: dist_centre_zt; dependent variable: fare_acc_zt.]
Hypothesis R4 states that a short distance to the next centre (i.e., high spatial centrality) predicts high fare accessibility. As the beta coefficient of the bivariate regression shows, the distance to the next middle-level centre (dist_centre_zt) exerts an effect of -0.02 on fare accessibility (fare_acc_zt). This means that for an increase of 1 standard deviation of dist_centre_zt at a given stop, we should expect a decrease of 0.02 standard deviations of fare_acc_zt. As the variables are normalized, we must not interpret them in their original units.
The bivariate regression estimator is negative. The regression estimate is, however, not significant.
Hence, hypothesis R4 is falsified and needs to be dismissed.
#define the variables for this Pseudo-R^2 and for the caption
regr <- "fare_acc_zt"
dv <- regr
dv_en <- t2_en
iv <- "cars_cap_zt"
iv_en <- pkw_en
iv_unit_en <- pkw_unit_en
hyp <- 5
hyp_sdem <- paste0("r", hyp, "_sdem_zt")
hyp_sem <- paste0("r", hyp, "_sem_zt")
hyp_slx <- paste0("r", hyp, "_slx_zt")
hyp_ols <- paste0("r", hyp, "_ols_zt")
##### Calculate Pseudo-R^2 for SDEM and SEM to list them in the table
# Note that the formula points to the dynamic regr variable, i.e., oevx/t2/whatever I assign as the dependent variable in the file 1_make_models.R!
#Furthermore, the get(...) calls dynamically point to the variables I have defined above, depending on my dissertation hypotheses
#SDEM
psr2_sdem <- 1 - (get(hyp_sdem)$SSE / (var(stops_u[[regr]]) * (length(stops_u[[regr]]) - 1)))
psr2_sdem <- round(psr2_sdem,2)
##### SEM
psr2_sem <- 1 - (get(hyp_sem)$SSE / (var(stops_u[[regr]]) * (length(stops_u[[regr]]) - 1)))
psr2_sem <- round(psr2_sem,2)
# extract p value for hyp testing
#hyp_p <- coef(summary(get(hyp_sdem)))[iv, "Pr(>|z|)"]
hyp_p <- coef(summary(get(hyp_sdem)))[which(rownames(coef(summary(get(hyp_sdem)))) == iv), "Pr(>|z|)"]
# Generate the htmlreg table with custom rows for Pseudo-R^2^
model_table <- texreg::htmlreg(
list(get(hyp_sdem), get(hyp_sem), get(hyp_slx), get(hyp_ols)),
doctype = FALSE,
custom.model.names = model_names,
custom.gof.rows = list('Pseudo-R^2^' = c(psr2_sdem, psr2_sem, '', '')), # this one inserts two values and two empty strings into the table
reorder.gof = c(2:10, 1, 11:19), # reorder the R^2^ rows, i.e., insert row number 1 in-between row 10 (last R^2^ of the regular models) and row 11.
caption = paste0("Bivariate regression estimates of the four models<br>
Independent variable: ", iv_en, " (", iv, ")",
"<br>
Dependent variable: ", dv_en, " (", dv, ")<br>
The sample is stripped of outliers [(see boxplot above)](#bplot_r)."),
custom.note = cap_sig)
# Create a kableExtra table and set styles for all columns
model_table_styled <- kable(
model_table,
format = "html",
escape = FALSE
) %>%
kable_styling(full_width = FALSE) %>% # Set to TRUE if you want full-width
column_spec(1, extra_css = "text-align: right;") # Right-align the first column (variable names)
# Print or use the modified HTML table
print(model_table_styled)
[Table: Bivariate regression estimates of the four models (SDEM, SEM, SLX, OLS). Independent variable: cars_cap_zt; dependent variable: fare_acc_zt.]
Hypothesis R5 states that high car availability predicts low fare accessibility. As the beta coefficient of the bivariate regression shows, car availability (cars_cap_zt) exerts an effect of 0.1 on fare accessibility (fare_acc_zt). This means that for an increase of 1 standard deviation of cars_cap_zt at a given stop, we should expect an increase of 0.1 standard deviations of fare_acc_zt. As the variables are normalized, we must not interpret them in their original units.
The bivariate regression estimator is positive. The regression estimate is significant (***).
Hence, hypothesis R5 is falsified and needs to be dismissed.
# define text chunks that explain G1 significance depending on the p values in the normalized model
#define model
hyp_sdem <- "sdem_zt"
# extract p value for hyp testing
g1_p_pop_dens <- coef(summary(get(hyp_sdem)))[which(rownames(coef(summary(get(hyp_sdem)))) == "pop_dens_zt"), "Pr(>|z|)"]
g1_p_travel_time <- coef(summary(get(hyp_sdem)))[which(rownames(coef(summary(get(hyp_sdem)))) == "travel_time_zt"), "Pr(>|z|)"]
g1_p_rent <- coef(summary(get(hyp_sdem)))[which(rownames(coef(summary(get(hyp_sdem)))) == "rent_zt"), "Pr(>|z|)"]
g1_p_purch_power <- coef(summary(get(hyp_sdem)))[which(rownames(coef(summary(get(hyp_sdem)))) == "purch_power_zt"), "Pr(>|z|)"]
g1_p_cars_cap <- coef(summary(get(hyp_sdem)))[which(rownames(coef(summary(get(hyp_sdem)))) == "cars_cap_zt"), "Pr(>|z|)"]
g1_p_dist_centre <- coef(summary(get(hyp_sdem)))[which(rownames(coef(summary(get(hyp_sdem)))) == "dist_centre_zt"), "Pr(>|z|)"]
g1_p_pt_index <- coef(summary(get(hyp_sdem)))[which(rownames(coef(summary(get(hyp_sdem)))) == "pt_index_zt"), "Pr(>|z|)"]
g1_p_dwelling_area <- coef(summary(get(hyp_sdem)))[which(rownames(coef(summary(get(hyp_sdem)))) == "dwelling_area_zt"), "Pr(>|z|)"]
# write beta estimators into new variables
g1_b_pop_dens <- round(get("sdem_zt")[["coefficients"]][["pop_dens_zt"]],2)
g1_b_travel_time <- round(get("sdem_zt")[["coefficients"]][["travel_time_zt"]],2)
g1_b_rent <- round(get("sdem_zt")[["coefficients"]][["rent_zt"]],2)
g1_b_purch_power <- round(get("sdem_zt")[["coefficients"]][["purch_power_zt"]],2)
g1_b_cars_cap <- round(get("sdem_zt")[["coefficients"]][["cars_cap_zt"]],2)
g1_b_dist_centre <- round(get("sdem_zt")[["coefficients"]][["dist_centre_zt"]],2)
g1_b_pt_index <- round(get("sdem_zt")[["coefficients"]][["pt_index_zt"]],2)
g1_b_dwelling_area <- round(get("sdem_zt")[["coefficients"]][["dwelling_area_zt"]],2)
# derive text chunk that can be inserted inline
if (g1_p_pop_dens < sig3b ){g1_pop_dens_sig <- sig3} else if (g1_p_pop_dens < sig2b ){g1_pop_dens_sig <- sig2} else if (g1_p_pop_dens < sig1b ){g1_pop_dens_sig <- sig1} else {g1_pop_dens_sig <- sig0}
if (g1_p_travel_time < sig3b ){g1_travel_time_sig <- sig3} else if (g1_p_travel_time < sig2b ){g1_travel_time_sig <- sig2} else if (g1_p_travel_time < sig1b ){g1_travel_time_sig <- sig1} else {g1_travel_time_sig <- sig0}
if (g1_p_rent < sig3b ){g1_rent_sig <- sig3} else if (g1_p_rent < sig2b ){g1_rent_sig <- sig2} else if (g1_p_rent < sig1b ){g1_rent_sig <- sig1} else {g1_rent_sig <- sig0}
if (g1_p_purch_power < sig3b ){g1_purch_power_sig <- sig3} else if (g1_p_purch_power < sig2b ){g1_purch_power_sig <- sig2} else if (g1_p_purch_power < sig1b ){g1_purch_power_sig <- sig1} else {g1_purch_power_sig <- sig0}
if (g1_p_cars_cap < sig3b ){g1_cars_cap_sig <- sig3} else if (g1_p_cars_cap < sig2b ){g1_cars_cap_sig <- sig2} else if (g1_p_cars_cap < sig1b ){g1_cars_cap_sig <- sig1} else {g1_cars_cap_sig <- sig0}
if (g1_p_dist_centre < sig3b ){g1_dist_centre_sig <- sig3} else if (g1_p_dist_centre < sig2b ){g1_dist_centre_sig <- sig2} else if (g1_p_dist_centre < sig1b ){g1_dist_centre_sig <- sig1} else {g1_dist_centre_sig <- sig0}
if (g1_p_pt_index < sig3b ){g1_pt_index_sig <- sig3} else if (g1_p_pt_index < sig2b ){g1_pt_index_sig <- sig2} else if (g1_p_pt_index < sig1b ){g1_pt_index_sig <- sig1} else {g1_pt_index_sig <- sig0}
if (g1_p_dwelling_area < sig3b ){g1_dwelling_area_sig <- sig3} else if (g1_p_dwelling_area < sig2b ){g1_dwelling_area_sig <- sig2} else if (g1_p_dwelling_area < sig1b ){g1_dwelling_area_sig <- sig1} else {g1_dwelling_area_sig <- sig0}
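# The eight chains above can be collapsed into one helper; a sketch that
# assumes the thresholds sig3b < sig2b < sig1b and the text chunks
# sig0..sig3 as defined earlier in this report:
sig_label <- function(p) {
  if (p < sig3b) sig3 else if (p < sig2b) sig2 else if (p < sig1b) sig1 else sig0
}
# example: g1_pop_dens_sig <- sig_label(g1_p_pop_dens)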
# find out the highest absolute beta
# (the case_when clauses exclude non-significant betas)
g1_b_max <- max(c(
abs(case_when(
g1_p_pop_dens < sig1b ~ g1_b_pop_dens,
TRUE ~ NA_real_ # Ensure NA_real_ is used for numeric NAs
)),
abs(case_when(
g1_p_travel_time < sig1b ~ g1_b_travel_time,
TRUE ~ NA_real_
)),
abs(case_when(
g1_p_rent < sig1b ~ g1_b_rent,
TRUE ~ NA_real_
)),
abs(case_when(
g1_p_purch_power < sig1b ~ g1_b_purch_power,
TRUE ~ NA_real_
)),
abs(case_when(
g1_p_cars_cap < sig1b ~ g1_b_cars_cap,
TRUE ~ NA_real_
)),
abs(case_when(
g1_p_dist_centre < sig1b ~ g1_b_dist_centre,
TRUE ~ NA_real_
)),
abs(case_when(
g1_p_pt_index < sig1b ~ g1_b_pt_index,
TRUE ~ NA_real_
)),
abs(case_when(
g1_p_dwelling_area < sig1b ~ g1_b_dwelling_area,
TRUE ~ NA_real_
))
), na.rm = TRUE) # Ensure NA values are removed when calculating the max
# determine if density has the highest absolute significant beta, i.e. if we can confirm hyp G1
# (abs() is needed because g1_b_max was computed over absolute values)
if (abs(g1_b_pop_dens) == g1_b_max) {
  g1_decision <- TRUE
} else {
  g1_decision <- FALSE
}
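The maximum/decision logic above can be written more compactly with named vectors, which also makes explicit which predictor carries the largest significant effect; a sketch assuming the g1_b_* and g1_p_* objects from the chunk above:
# collect betas and p values in named vectors (same variable order in both)
g1_b <- c(pop_dens = g1_b_pop_dens, travel_time = g1_b_travel_time,
          rent = g1_b_rent, purch_power = g1_b_purch_power,
          cars_cap = g1_b_cars_cap, dist_centre = g1_b_dist_centre,
          pt_index = g1_b_pt_index, dwelling_area = g1_b_dwelling_area)
g1_p <- c(pop_dens = g1_p_pop_dens, travel_time = g1_p_travel_time,
          rent = g1_p_rent, purch_power = g1_p_purch_power,
          cars_cap = g1_p_cars_cap, dist_centre = g1_p_dist_centre,
          pt_index = g1_p_pt_index, dwelling_area = g1_p_dwelling_area)
sig_abs <- abs(g1_b[g1_p < sig1b])        # absolute betas of the significant predictors
g1_decision <- names(which.max(sig_abs)) == "pop_dens"  # TRUE only if density has the largest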
Hypothesis G1 refers to the multivariate model that is estimated for normalized variables. G1 states that the absolute beta coefficient of density is greater than any other beta. G1 can be tested by comparing the coefficients in the summary table for normalized variables. These are:
- Density: 0.09, significant (***)
- Purchasing power: 0.09, significant (***)
- Rent: 0.06, significant (**)
- Car availability: 0.08, significant (***)
- Distance to centre: -0.04, not significant
- PT index: 0.02, not significant
- Travel time: -0.11, significant (***)
- Dwelling area: -0.04, significant (**)
Note that we’re comparing absolute effects; positive and negative beta values are treated equally.
The comparison shows that population density does not have the highest absolute significant impact; travel time (|-0.11|) does. Thus, hypothesis G1 has to be dismissed.
Hypotheses G2 and G3 compare the goodness-of-fit of the models across the spatial levels, which is not possible at this level of analysis. Therefore, G2 and G3 are discussed in the results chapter of my thesis.