This is an annex to my dissertation about public transport accessibility for low-income earners.
For an English account of the context, have a look at this JTranGeo paper:
DOI: 10.1016/j.jtrangeo.2025.104348
For a German account of the context, see my thesis fulltext (especially the method section 3):
DOI: 10.15480/882.13161
An English summary of my thesis can be found in the Fare Accessibility Dashboard: DOI:10.15480/882.13164
This README is a simple guide to the basic terms and the structure of the three data sets that I built for my spatial regressions. Unfortunately, I can’t provide all the variables because of the terms and conditions of some of the data sets. In any case, these data will hopefully be informative to anyone studying public transport accessibility and affordability.
You are welcome to re-use, adapt and share the data according to the Attribution-ShareAlike 4.0 International license.
Christoph Aberle
Hamburg University of Technology
Institute for Transport Planning and Logistics
ORCID: 0000-0003-0982-4869
christoph.aberle@tuhh.de
The TUHH email will be offline soon. If you use the dataset, or have a question, please drop me a line at christoph [at] fluegelrad [dot] net. I’m always curious to see what others get out of my data!
| Term | Explanation | Link |
|---|---|---|
| PT Authority | “Verkehrsverbund” that organises public transport, mainly in terms of planning and ticketing, in this data set either HVV or VBB | Wikipedia |
| HVV | PT Authority for Hamburg and surroundings | Wikipedia |
| VBB | PT Authority for Berlin and Brandenburg | Wikipedia |
| Municipality | Lowest level of official territorial division (“Gemeinde”) | Wikipedia |
| Statistisches Gebiet (Hamburg) | One of 941 areas that are used by the official social monitoring (“RISE / Rahmenprogramm Integrierte Stadtteilentwicklung”) | Paper that provides a summary |
| Planungsraum (Berlin) | One of 447 areas that are used by urban planners and others | Paper that works with these areas (e.g. Fig. 4 & 7) |
| AGS / Amtlicher Gemeindeschlüssel | Official Municipality Key | Wikipedia |
| EPSG | Spatial Reference System Identifier for my geodata | Wikipedia |
I provide two geopackage files:
- fare_accessibility_hvv.gpkg – SRID = EPSG 25832
- fare_accessibility_vbb.gpkg – SRID = EPSG 25833

Each of these files contains three layers that represent different levels:

- stop
- municipality
- grid

All data refer to the state 12/2018 or to the 2018/19 timetable.
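Since a GeoPackage is an SQLite database, the layers and their SRIDs can be checked without any GIS software. The following is a minimal sketch using only Python's standard library. It reads the `gpkg_contents` table that the GeoPackage standard prescribes. The function name `list_layers` is made up for this example, and the layer names are assumed to match the three levels listed above.

```python
import sqlite3


def list_layers(conn):
    """Return (layer name, SRID) pairs for all feature layers of a GeoPackage.

    `conn` is an open sqlite3 connection to the .gpkg file.
    """
    rows = conn.execute(
        "SELECT table_name, srs_id FROM gpkg_contents "
        "WHERE data_type = 'features' ORDER BY table_name"
    ).fetchall()
    return [(name, srs_id) for name, srs_id in rows]
```

For example, `list_layers(sqlite3.connect("fare_accessibility_hvv.gpkg"))` should list the three layers with SRID 25832, provided the file is in the working directory.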
This level contains data for public transport stops in the HVV and VBB service areas.
For aggregates and violin plots of the input variables, see the model reports (HVV_Stop.html and VBB_Stop.html).
| Column | Data Type | Explanation | Link / Source |
|---|---|---|---|
| id | INT | stop id, primary key | |
| name | VARCHAR | stop name | |
| ur_int | INT | case area: 1 = Hamburg / 2 = Berlin / 3 = HVV outside Hamburg / 4 = VBB outside Berlin | |
| ags | VARCHAR | Official Municipality Key | Wikipedia |
| bestmode | INT | the ‘best’ means of transport that departs here, as per capacity: 1 = local and regional railways (SPNV, U-Bahn) / 2 = buses and trams / 3 = ferries | |
| cells | BIGINT | count of populated raster cells within 800m radius | |
| outlier | INT | 1 = stop is an outlier of the whole data set (beyond ± 1.5 IQR) | Wikipedia, section 3.3 of my thesis (in German) |
| outlier_u | INT | as above, for the urban subset | as above |
| outlier_r | INT | as above, for the rural subset | as above |
| rs17 | INT | spatial type of the surrounding municipality according to the federal RegioStaR typology | RegioStaR handbook |
| rs_class | INT | spatial class: 1 = urban / 0 = rural | based on the RegioStaR typology, within Berlin and Hamburg based on local typologies, see section 3.1 of my thesis (in German) |
| rs_hh | VARCHAR | spatial type of the Statistisches Gebiet: c = city (central business district) / i = inner town / z = in-between zone (“Zwischenzone”) / s = fringe (“Stadtrand”) / g = industrial (“Gewerbe”) / l = rural (“Ländliches Hamburg”; not to be confused with the value of rs_class!). This attribute is only available for Hamburg. For Berlin, the urban/rural information is simply coded within the rs_class attribute. | based on Gesa Matthes’ 2010 typology, updated to the state of 12/2018, see Annex A3 of my thesis (in German) |
| km_centre | NUMERIC | distance to the next centre in km, calculation based on official documents, see section 3.2 of my thesis (in German) | |
| ptx | NUMERIC | public transport service index (no unit), aggregated from grid level (median of populated cells within 800m radius) | see Aberle et al. 2025 section 3.2.2, inspired by Delbosc & Currie (2011) |
| ptx_cap | NUMERIC | as above, but the median of per-capita ptx of all grid cells within 800m radius (i.e. for each cell, ptx was divided by number of residents) | |
| t1 | NUMERIC | fare accessibility on a €1.70 budget (where applicable), ln’ed and normalised and weighted | see Aberle & Gertz 2025 |
| t2 | NUMERIC | fare accessibility on a €2.30 budget, ln’ed and normalised and weighted | as above |
| t2sum | NUMERIC | fare accessibility on a €2.30 budget, absolute and weighted for Lorenz curves (I summed up all weighted and non-ln’ed destinations across 15 categories e.g. 0.19 · grocery store count + 0.15 · doctors count …) | the weights can be found in Aberle & Gertz 2025, table 3 |
| ttime | INT | travel time to the next destination (minutes; weighted average across 15 categories), see section 3.2 of my thesis (in German) | |
| standardized variables | |||
| km_centre_zt | NUMERIC | as above, normalised to MEAN=0 and SD=1 | Wikipedia |
| ptx_zt | NUMERIC | -”- | -”- |
| ptx_cap_zt | NUMERIC | -”- | -”- |
| t1_zt | NUMERIC | -”- | -”- |
| t2_zt | NUMERIC | -”- | -”- |
| ttime_zt | NUMERIC | -”- | -”- |
| geom | GEOMETRY (POINT) | note that HVV and VBB have different SRIDs | |
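The `outlier` flags and the `_zt` columns follow two standard recipes: a ± 1.5 IQR rule and a rescaling to mean 0 and SD 1. The sketch below illustrates both with Python's standard library only. It is not the code used for the thesis. Assumptions: quartiles are computed with the `inclusive` method of `statistics.quantiles`, the standard deviation is the population SD, and the helper names `iqr_outlier_flags` and `z_standardise` are hypothetical. See section 3.3 of the thesis for the procedure actually used.

```python
import statistics


def iqr_outlier_flags(values, k=1.5):
    """Return 1 for values outside [Q1 - k*IQR, Q3 + k*IQR], else 0."""
    # Assumption: 'inclusive' quartiles; other quartile definitions shift the fences slightly.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [1 if (v < lower or v > upper) else 0 for v in values]


def z_standardise(values):
    """Rescale values so that they have mean 0 and standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # assumption: population SD, not sample SD
    if sd == 0:
        raise ValueError("cannot standardise a constant column")
    return [(v - mean) / sd for v in values]
```

Whether the sample or the population SD is used hardly matters for columns with thousands of rows, but it explains small deviations if you recompute the `_zt` values yourself.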
This level contains data for municipalities in the HVV and VBB service areas.
For the cities of Hamburg and Berlin, I’ve complemented the dataset with geometries for the statistical areas (Hamburg: Statistische Gebiete / Berlin: Planungsräume, see Basic Terms above).
For aggregates and violin plots of the input variables, see the model reports (HVV_Municipality.html and VBB_Municipality.html).
| Column | Data Type | Explanation | Link / Source |
|---|---|---|---|
| agsx | VARCHAR | Municipality Key (“Amtlicher Gemeindeschlüssel”). Within Hamburg and Berlin: Followed by a ‘-’ and the id of the Statistical Area, primary key | Wikipedia |
| name | VARCHAR | name | |
| ur_int | INT | case area: 1 = Hamburg / 2 = Berlin / 3 = HVV outside Hamburg / 4 = VBB outside Berlin | cells |
| outlier | INT | 1 = municipality (or statistical area) is an outlier of the whole data set (beyond ± 1.5 IQR) | Wikipedia, section 3.3 of my thesis (in German) |
| outlier_u | INT | as above, for the urban subset | as above |
| outlier_r | INT | as above, for the rural subset | as above |
| rs17 | INT | spatial type of the municipality according to the federal RegioStaR typology | RegioStaR handbook |
| rs_class | INT | spatial class: 1 = urban / 0 = rural | based on the RegioStaR typology, within Berlin and Hamburg based on local typologies, see section 3.1 of my thesis (in German) |
| rs_hh | VARCHAR | spatial type of the Statistisches Gebiet: c = city (central business district) / i = inner town / z = in-between zone (“Zwischenzone”) / s = fringe (“Stadtrand”) / g = industrial (“Gewerbe”) / l = rural (“Ländliches Hamburg”; not to be confused with the value of rs_class!). This attribute is only available for Hamburg. For Berlin, the urban/rural information is simply coded within the rs_class attribute. | based on Gesa Matthes’ 2010 typology, updated to the state of 12/2018, see Annex A3 of my thesis (in German) |
| km_centre | NUMERIC | distance to the next centre in km (median of populated 100m grid cells within the municipality) | |
| ptx | NUMERIC | public transport service index (no unit; median of populated 100m grid cells within the municipality) | see Aberle et al. 2025 section 3.2.2, inspired by Delbosc & Currie (2011) |
| ptx_cap | NUMERIC | as above, but the median of per-capita ptx of all populated 100m grid cells within the municipality (i.e. for each cell, ptx was divided by number of residents) | |
| t1 | NUMERIC | fare accessibility on a €1.70 budget (where applicable; median of populated 100m grid cells within the municipality) | see Aberle & Gertz 2025 |
| t2 | NUMERIC | fare accessibility on a €2.30 budget (median of populated 100m grid cells within the municipality) | as above |
| t2sum | NUMERIC | fare accessibility on a €2.30 budget, absolute and weighted for Lorenz curves (I summed up all weighted and non-ln’ed destinations across 15 categories e.g. 0.19 · grocery store count + 0.15 · doctors count …) | the weights can be found in Aberle & Gertz 2025, table 3 |
| ttime | INT | travel time to the next destination (minutes; weighted average across 15 categories; median of populated 100m grid cells within the municipality), see section 3.2 of my thesis (in German) | |
| standardized variables | |||
| km_centre_zt | NUMERIC | as above, normalised to MEAN=0 and SD=1 | Wikipedia |
| ptx_zt | NUMERIC | -”- | -”- |
| ptx_cap_zt | NUMERIC | -”- | -”- |
| t1_zt | NUMERIC | -”- | -”- |
| t2_zt | NUMERIC | -”- | -”- |
| ttime_zt | NUMERIC | -”- | -”- |
| geom | GEOMETRY (MULTIPOLYGON) | note that HVV and VBB have different SRIDs | |
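Most municipality values are medians over the populated 100m grid cells. The difference between `ptx` and `ptx_cap` is easy to miss: for `ptx_cap`, the division by residents happens per cell, before the median is taken. A minimal sketch of that order of operations, assuming each cell is given as a hypothetical `(ptx, residents)` pair; the function name `aggregate_ptx` is not part of the data set:

```python
import statistics


def aggregate_ptx(cells):
    """Aggregate 100m cells, given as (ptx, residents) pairs, to one area.

    Returns (ptx, ptx_cap), or (None, None) if no cell is populated.
    """
    populated = [(ptx, residents) for ptx, residents in cells if residents > 0]
    if not populated:
        return None, None
    # ptx: median of the raw index over the populated cells
    ptx_median = statistics.median(ptx for ptx, _ in populated)
    # ptx_cap: divide per cell first, then take the median
    ptx_cap_median = statistics.median(ptx / residents for ptx, residents in populated)
    return ptx_median, ptx_cap_median
```

Dividing the aggregated `ptx` by the total population afterwards would generally give a different number, so the two columns cannot be converted into each other.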
This level contains data for populated grid cells in the HVV and VBB service areas.
For aggregates and violin plots of the input variables, see the model reports (HVV_Grid.html and VBB_Grid.html).
| Column | Data Type | Explanation | Link / Source |
|---|---|---|---|
| gitter_id_500m | VARCHAR | INSPIRE 500 grid id, primary key | EU INSPIRE directive |
| ur_int | INT | case area: 1 = Hamburg / 2 = Berlin / 3 = HVV outside Hamburg / 4 = VBB outside Berlin | |
| outlier | INT | 1 = grid cell is an outlier of the whole data set (beyond ± 1.5 IQR) | Wikipedia, section 3.3 of my thesis (in German) |
| outlier_u | INT | as above, for the urban subset | as above |
| outlier_r | INT | as above, for the rural subset | as above |
| rs17 | INT | spatial type of the municipality according to the federal RegioStaR typology (mode of populated 100m grid cells within the 500m grid cell, i.e. the value that appeared most often) | RegioStaR handbook |
| rs_class | INT | spatial class: 1 = urban / 0 = rural | based on the RegioStaR typology, within Berlin and Hamburg based on local typologies, see section 3.1 of my thesis (in German) |
| rs_hh | VARCHAR | spatial type of the Statistisches Gebiet: c = city (central business district) / i = inner town / z = in-between zone (“Zwischenzone”) / s = fringe (“Stadtrand”) / g = industrial (“Gewerbe”) / l = rural (“Ländliches Hamburg”; not to be confused with the rural value of rs_class!). This attribute is only available for Hamburg in the HVV data set. For Berlin, the urban/rural information is simply coded within the rs_class attribute. | based on Gesa Matthes’ 2010 typology, updated to the state of 12/2018, for details see table in Annex A3 of my thesis (in German) |
| km_centre | NUMERIC | distance to the next centre in km (median of populated 100m grid cells within the 500m grid cell) | |
| ptx | NUMERIC | public transport service index (no unit; median of populated 100m grid cells within the 500m grid cell) | see Aberle et al. 2025 section 3.2.2, inspired by Delbosc & Currie (2011) |
| ptx_cap | NUMERIC | as above, but the median of per-capita ptx of all populated 100m grid cells within the 500m grid cell (i.e. for each cell, ptx was divided by number of residents) | |
| t1 | NUMERIC | fare accessibility on a €1.70 budget (where applicable; median of populated 100m grid cells within the 500m grid cell) | see Aberle & Gertz 2025 |
| t2 | NUMERIC | fare accessibility on a €2.30 budget (median of populated 100m grid cells within the 500m grid cell) | as above |
| t2sum | NUMERIC | fare accessibility on a €2.30 budget, absolute and weighted for Lorenz curves (I summed up all weighted and non-ln’ed destinations across 15 categories e.g. 0.19 · grocery store count + 0.15 · doctors count …) | the weights can be found in Aberle & Gertz 2025, table 3 |
| ttime | INT | travel time to the next destination (minutes; weighted average across 15 categories; median of populated 100m grid cells within the 500m grid cell), see section 3.2 of my thesis (in German) | |
| standardized variables | |||
| km_centre_zt | NUMERIC | as above, normalised to MEAN=0 and SD=1 | Wikipedia |
| ptx_zt | NUMERIC | -”- | -”- |
| ptx_cap_zt | NUMERIC | -”- | -”- |
| t1_zt | NUMERIC | -”- | -”- |
| t2_zt | NUMERIC | -”- | -”- |
| ttime_zt | NUMERIC | -”- | -”- |
| geom | GEOMETRY (POLYGON) | note that HVV and VBB have different SRIDs | |
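The `t2sum` column is a weighted sum of destination counts. The sketch below shows the arithmetic with the two example weights quoted in the table (0.19 for grocery stores, 0.15 for doctors). The remaining 13 weights are listed in Aberle & Gertz 2025, table 3, and are deliberately left out here. The category keys, the counts and the function name `weighted_destination_sum` are invented for illustration.

```python
def weighted_destination_sum(counts, weights):
    """Sum destination counts, each multiplied by its category weight."""
    missing = set(counts) - set(weights)
    if missing:
        raise KeyError(f"no weight for: {sorted(missing)}")
    return sum(weights[category] * n for category, n in counts.items())


# Only the two weights quoted in this README; the full set covers 15 categories.
example_weights = {"grocery_store": 0.19, "doctor": 0.15}
# Hypothetical counts of destinations reachable on the 2.30 euro budget.
example_counts = {"grocery_store": 10, "doctor": 4}
partial_t2sum = weighted_destination_sum(example_counts, example_weights)
```

Because the counts are neither log-transformed nor normalised, `t2sum` keeps its absolute scale, which is what a Lorenz curve needs.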
If you use the data, or have a question, please drop me a line at christoph [at] fluegelrad [dot] net.
Have fun with the data, enjoy your ride. Money alone doesn’t make you happy either. But somehow it’s still better to cry in a taxi than on an HVV bus.
Made with love at TUHH