Difference between revisions of "Nominatim for Traffic Accident Database"
(→Exporting Accident Data to CVS) |
(→Exporting Accident Data to CVS) |
||
Line 509: | Line 509: | ||
I exported 3 databases: | I exported 3 databases: | ||
− | * Accidentes - this DB had about 536k entries for all of Chile | + | * Accidentes - this DB had about 536k entries for all of Chile (csv ~ 208 MB) |
− | * Personas | + | * Personas (csv ~ 147 MB) |
− | * Vehiculos | + | * Vehiculos (csv ~ 123 MB) |
and | and | ||
− | * Categoria Ingreso (in CLP per Comuna, year 200x) | + | * Categoria Ingreso (in CLP per Comuna, year 200x) (csv ~ 9KB) |
Revision as of 13:21, 9 March 2015
>> return to Cedeus IDE
Objective
Develop a script for automated geo-referencing of Chiles transit accident database.
Contents
- 1 Tools
- 2 Installing Nominatim
- 2.1 Setup of Nominatim VM
- 2.2 Setup of Nominatim itself
- 2.2.1 Install needed libraries
- 2.2.2 Edit PostreSQL settings
- 2.2.3 First Nominatim Installation
- 2.2.4 Creating postgres accounts
- 2.2.5 Nominatim module reading permissions
- 2.2.6 Import and index OSM data
- 2.2.7 Add special phrases
- 2.2.8 Set up the website
- 2.2.9 Load Chile Dataset
- 2.2.10 Updates of the Database
- 3 Using Nominatim
- 4 Exporting Accident Data to CVS
Tools
- Nominatim
- MS-Access / LibreOffice : data comes in an MS Access DB from Carabineros de Chile
- iPython + Notebook
Installing Nominatim
- Nominatim installation manual: https://wiki.openstreetmap.org/wiki/Nominatim/Installation
Setup of Nominatim VM
Setup of VM based on Basic Ubuntu1404 VM
- copying basicubunutu1404.vdi file and renaming to nominatim.vdi
- creating the nominatim VM on CedeusDB (ip.18):
-
VBoxManage createvm --name nominatim --ostype Ubuntu_64 --register
-
VBoxManage modifyvm nominatim --memory 4096
-
VBoxManage modifyvm nominatim --cpus 4
-
VBoxManage modifyvm nominatim --nic1 nat
-
VBoxManage storagectl nominatim --name "SATA Controller" --add sata --controller IntelAhci
- assign the (old) disk image a new uuid
-
VBoxManage internalcommands sethduuid nominatim.vdi
-
- attach the (old) disk image:
-
VBoxManage storageattach "nominatim" --storagectl "SATA Controller" --port 0 --device 0 --type hdd --medium nominatim.vdi
-
-
VBoxManage storagectl nominatim --name "IDE Controller" --add ide --controller PIIX4
- set the nat rules (ports):
-
VBoxManage modifyvm nominatim --natpf1 "ssh,tcp,,20022,,22"
-
VBoxManage modifyvm nominatim --natpf1 "apache,tcp,,20080,,80"
-
VBoxManage modifyvm nominatim --natpf1 "pgsql,tcp,,20432,,5432"
-
-
- optional - set VRDE port:
-
VBoxManage modifyvm nominatim --vrdeport 7761
(Note: 3389 is the default port anyway)
-
- check all settings with
VBoxManage showvminfo nominatim
- start the VM
-
VBoxHeadless -s nominatim --vrde on &
- the VM will listen on port 7761 - connect with GTKvncViewer to check
- check if ssh connection works
-
- change the computers name in /etc/hosts and /etc/hostname
- restarting Apache gives the error message: "apache2: Could not reliably determine the server's fully qualified domain name, using 127.0.1.1 ..."
- => solve this by adding ServerName nominatim as the last line in /etc/apache2/apache2.conf file
Resize the Nominatim VM
- shut the VM down to resize the HDD if necessary
- check the HDD size:
VBoxManage showhdinfo nominatim.vdi
returns:
UUID: e148d758-00ac-4b60-a067-d5c97316069d Parent UUID: base</br> State: created Type: normal (base) Location: /home/ssteinig/nominatim.vdi Storage format: VDI Format variant: dynamic default Capacity: 100000 MBytes Size on disk: 6224 MBytes In use by VMs: nominatim (UUID: be795392-3786-4f3b-9714-a9445f91855d)
- so I resized the disk to 200GB as, according to Nominatim infos the disk size for a full planet is 700GB
-
VBoxManage modifyhd nominatim.vdi --resize 200000
-
- to make this resizing effective I actually have to change the partitions:
- create a zip-copy of the vdi file,
- download & start the VM with Ubuntu Desktop(!) 14.04 iso
- choose the "Try Ubuntu" mode
- use the program "gparted" to adjust/extend the partition's size
- to be able to enlarge "extended" partition, on needs to click on the key symbol and chose "deactivate", as described for instance here: http://www.howtogeek.com/114503/how-to-resize-your-ubuntu-partitions/
- remove the "iso" and restart the Vm; check size if the disks is really 200GB now
- => Well it turned out that I used LVM (Logical Volume Manager) so the new space is not taken into account. Some more stuff needs to be adjusted!
- after a bit of reading here: http://www.rootusers.com/how-to-increase-the-size-of-a-linux-lvm-by-expanding-the-virtual-machine-disk/ and here: http://www.thegeekstuff.com/2010/08/how-to-create-lvm/ and checking the status with commands such as pvscan, pvdisplay and lvdisplay; I figured that I simply need to extend the size of the logical volume (see also http://serverfault.com/questions/501895/used-vgextend-lvextend-to-add-addtional-8gb-space-but-it-is-not-reflected-in-df ) This is done with lvextend - so I used:
-
sudo lvextend -l +100%FREE /dev/basicubuntu1404-vg/root
-
sudo resize2fs /dev/mapper/basicubuntu1404--vg-root
- => this extended the disk to 187GB (see with
sudo lvdisplay
). The second command is needed to have this really available (see withdf -h
orsudo fdisk -l
)
Setup of Nominatim itself
Install needed libraries
=> see https://wiki.openstreetmap.org/wiki/Nominatim/Installation#Software
sudo apt-get install build-essential libxml2-dev libgeos-dev libpq-dev libbz2-dev libtool automake libproj-dev sudo apt-get install gcc proj-bin libgeos-c1 git osmosis libgeos++-dev sudo apt-get install php5 php-pear php5-pgsql php5-json php-db instead of: >> sudo apt-get install postgresql postgis postgresql-contrib postgresql-9.1-postgis postgresql-server-dev-9.1 use: >> sudo apt-get install postgresql postgis postgresql-contrib postgresql-9.3-postgis-2.1 postgresql-server-dev-9.3 sudo apt-get install libprotobuf-c0-dev protobuf-c-compiler
and perhaps
sudo apt-get install git autoconf-archive
Edit PostreSQL settings
perhaps edit Postgres settings (before data import) sudo nano /etc/postgresql/9.3/main/postgresql.conf
=> see https://wiki.openstreetmap.org/wiki/Nominatim/Installation#Software
My machine has only 4GB, so I changed:
- shared_buffers = 128MB => 512MB
- work_mem = 50MB (previously uncommented, and set to 1MB)
- maintenance_work_mem = 2GB (previously commented, and set to 16MB) //=> This value should be reduced again afterwards to avoid swapping when autovacuum runs.
- effective_cache_size = 2GB (previously commented, and set to 128MB)
- synchronous_commit = off (previously commented, and set to on)
- checkpoint_segments = 100 (previously commented, and set to 3)
- checkpoint_timeout = 10min (previously commented, and set to 5min)
- checkpoint_completion_target = 0.9 (previously commented, and set to 0.5)
and for initial import only:
- fsync = off (previously commented, and set to 'on')
- full_page_writes = off (previously commented, and set to 'on')
- => switch both values back to 'on' after import, to avoid database corruption.
Note, I did a restart of the postgresql service after to have the settings applied.
First Nominatim Installation
Get the source code
I used the last stable release 2.3.1:
wget http://www.nominatim.org/release/Nominatim-2.3.1.tar.bz2 tar xvf Nominatim-2.3.1.tar.bz2
and rename the folder
mv Nominatim-2.3.1 Nominatim
Compiling
now compile the software
cd Nominatim ./configure make
Note, the warning about missing lua libraries can be ignored. Nominatim does not make use of osm2pgsql's lua extension.
Configuration
create the configuration file local.php in the sub folder settings with the following content:
<?php // Paths @define('CONST_Postgresql_Version', '9.3'); @define('CONST_Postgis_Version', '2.1'); // Website settings //@define('CONST_Website_BaseURL', 'http://mysite/nominatim/'); @define('CONST_Website_BaseURL', 'http://146.155.17.18:20080/nominatim/');
for further params see the file settings/settings.php.
Download (optional) data
Wikipedia rankings (downloading took me 20+ mins @PUC)
Wikipedia can be used as an optional auxiliary data source to help indicate the importance of osm features. Nominatim will work without this information but it will improve the quality of the results if this is installed. This data is available as a binary download.
wget --output-document=data/wikipedia_article.sql.bin http://www.nominatim.org/data/wikipedia_article.sql.bin wget --output-document=data/wikipedia_redirect.sql.bin http://www.nominatim.org/data/wikipedia_redirect.sql.bin
(=> requires to be in the Nominatim folder). wikipedia_article.sql.bin had a size of 1.2GB and redirect 240MB. Combined the 2 files are around 1.5GB and add around 30GB to the install size of nominatim. They also increase the install time by an hour or so.
Creating postgres accounts
Creating the importer account
The import needs to be done with a postgres superuser with the same name as the account doing the import.
sudo -u postgres createuser -s <your user name>
You should ensure that this user can log in to the database without requiring a password (e.g. using ident authentication). This is the default on most distributions. See trust authentication for more information. (sstein: it looks like it is not mean ident but trust identification... see below)
Some more PostgreSQL Access configs
Note: To have access from outside, I also changed in /etc/postgresql/9.3/main/postgresql.conf':
listen_adresses = '*'
And in /etc/postgresql/9.3/main/pg_hba.conf I did set
local all all peer
to
local all all ident
as well as:
host all all 127.0.0.1/32 md5
to
host all all 0.0.0.0 0.0.0.0 md5
(perhaps I also need to modify /etc/postgresql/9.3/main/pg_ident.conf at some point, if I want to map a system user name to a certain db user name)
Do a sudo service postgresql restart
Then I logged in and changed my ssteinig user password this way for the standard db postgres:
sudo -u ssteinig psql postgres alter user ssteinig with password 'myNewPassWord';
Now I tested if I can connect to the DB:
- connection from command line to DB postgres:
psql -U ssteinig -d postgres -h localhost
- => This worked, but it asked me for a password. I thought this should not happen after setting local to ident?
- However, when I changed from ident/md5 to trust, I could login without password:
# "local" is for Unix domain socket connections only local all all trust # IPv4 local connections: host all all 0.0.0.0 0.0.0.0 trust # IPv6 local connections: host all all ::1/128 trust
- got this from here: http://suite.opengeo.org/4.1/dataadmin/pgGettingStarted/firstconnect.html
- connect from outside (cedeusgis1) with pgAdmin (ssteinig, db: postgres, 20432)
- => This worked as well.
Create website user
Create the website user www-data as a PostgreSQL database role
createuser -SDR www-data
For the installation process, you must have this user. If you want to run the website under another user, see comment in section Set up the website. You must not run the import as user www-data or root.
Nominatim module reading permissions
Some Nominatim Postgres functions are implemented in the nominatim.so C module that was compiled in one of the earlier steps. In order for these functions to be successfully created, PostgreSQL server process has to be able to read the module file. Make sure that directory and file permissions allow the file to be read. For example, if you downloaded and compiled Nominatim in your home directory, you will need to issue the following commands:
chmod +x ~/Nominatim chmod +x ~/Nominatim/module
Note, the instructions say chmod +x ~/src/Nominatim
...but this must be for the installation from git source?
Import and index OSM data
First download a Planet File or a planet extract, for example from Geofabrik. Using a file in PBF format is recommended. I downloaded the Luxemburg Dataset for testing and then Chile. I used a cache of 1.2GB
Now start the import:
./utils/setup.php --osm-file /home/ssteinig/luxembourg-2015_2mar.osm.pbf --all --osm2pgsql-cache 1200 2>&1 | tee setup.log
=> This will create a new DB "nominatim" (with PostGIS extension enabled automatically). Importing the wikipedia articles was most time consuming. Together approx. 1 hour.
ToDo afterwards : reverse data import settings for PostgreSQL (see above).
Add special phrases
Add country codes and country names to the search index:
./utils/specialphrases.php --countries > specialphrases_countries.sql psql -d nominatim -f specialphrases_countries.sql
If you want to be able to search for special amenities like pubs in Dublin, you need to import special phrases from this wiki like this:
./utils/specialphrases.php --wiki-import > specialphrases.sql psql -d nominatim -f specialphrases.sql
This may be repeated from time to time when there are changes in the wiki. There is no need to repeat it after each update.
If you do not need phrases for all languages, edit utils/specialphrases.php and delete unneeded languages at the beginning of the file.
Set up the website
The following instructions will make Nominatim available at http://localhost/nominatim - or - http://146.155.17.18:22022/nominatim
Create the directory for the website and make sure it is writable by you and readable by apache:
sudo mkdir -m 755 /var/www/html/nominatim sudo chown <your username> /var/www/html/nominatim
Populate the website directory with the necessary symlinks:
./utils/setup.php --create-website /var/www/html/nominatim
You will need to make sure settings/local.php is configured with correct values for CONST_Website_BaseURL. e.g.
@define('CONST_Website_BaseURL', 'http://146.155.17.18.20080/nominatim'); //or: @define('CONST_Website_BaseURL', 'http://localhost/nominatim');
=> did this already above.
Setting up Apache
Make sure your Apache configuration (i.e. /etc/apache2/sites-enabled/000-default in a standard Ubuntu/Debian installation) contains the following settings for the directory:
<Directory "/var/www/nominatim/"> Options FollowSymLinks MultiViews AddType text/html .php </Directory>
So my 0000-default.conf file looks now like this:
<VirtualHost *:80 146.155.17.18:20022> # The ServerName directive sets the request scheme, hostname and port that # the server uses to identify itself. This is used when creating # redirection URLs. In the context of virtual hosts, the ServerName # specifies what hostname must appear in the request's Host: header to # match this virtual host. For the default virtual host (this file) this # value is not decisive as it is used as a last resort host regardless. # However, you must set it for any further virtual host explicitly. #ServerName www.example.com ServerAdmin webmaster@localhost DocumentRoot /var/www/html # Available loglevels: trace8, ..., trace1, debug, info, notice, warn, # error, crit, alert, emerg. # It is also possible to configure the loglevel for particular # modules, e.g. #LogLevel info ssl:warn ErrorLog ${APACHE_LOG_DIR}/error.log CustomLog ${APACHE_LOG_DIR}/access.log combined <Directory "/var/www/html/nominatim/"> Options FollowSymLinks MultiViews AddType text/html .php </Directory> # For most configuration files from conf-available/, which are # enabled or disabled at a global level, it is possible to # include a line for only one particular virtual host. For example the # following line enables the CGI configuration for this host only # after it has been globally disabled with "a2disconf". #Include conf-available/serve-cgi-bin.conf </VirtualHost> # vim: syntax=apache ts=4 sw=4 sts=4 sr noet
After making changes in the apache config restart apache.
sudo apache2ctl graceful
Note: The name of the website user is hard-coded into Nominatim. In most Linux distributions, apache will run as www-data, so this will work without any further modifications. If your web server runs under a different name (e.g. in Fedora and CentOS apache runs as user apache), simply alter the name of the www-data user in postgresql after the import has finished, e.g. psql -d nominatim -c 'ALTER USER "www-data" RENAME TO "apache".
Test Webpage
Now go to: http://146.155.17.18:20080/nominatim/ to see if there is an OSM map shown. Yey!
I tested the Luxemburg dataset with writing "Pfaffenthal" into the text field. Nominatim found two places with that name: (i) city district, and (ii) suburb.
Load Chile Dataset
The setup.php command does not work a second time and stops with "ERROR: database already exists (pgsql://@/nominatim)". So one can not load 2 country datasets this way, as described here: http://github.com/twain47/Nominatim/issues/224
An option to load a second country is using the update function (but this is very slow... maybe importing all of South America is even faster when deleting the db and using setup again). So, I download from http://download.geofabrik.de/south-america.html as xml/osm.bz2 file, since "update" requires xml.
add Chile via update:
./utils/update.php --import-file /home/ssteinig/chile-2014_5mar.osm.bz2 --osm2pgsql-cache 1200 2>&1 | tee setupchile.log
then run:
./utils/update.php --index
Perhaps use option --index-instances for indexing, for running with multiple threads. Doing the update with these two commands is really really slow: 15:25-20:15 (4.7h) + 3.5h for indexing - for a 120 MB osm.bz2 file.
Updates of the Database
Haven't done this so far, but see: http://wiki.openstreetmap.org/wiki/Nominatim/Installation#Updates
Using Nominatim
Geocoding
Geocoding can return none, one, or many results. In the examples below it is returned only one result. Example 3 shows output for 2 results.
Example 1
- "135 pilkington avenue, birmingham"
- result in xml format (can be also html|xml|json|jsonv2) : html will return the map-web page
- with polygon geometry : polygon_geojson=1
- with address details : Include a breakdown of the address into elements : addressdetails=1
http://nominatim.openstreetmap.org/search?q=135+pilkington+avenue,+birmingham&format=xml&polygon_geojson=1&addressdetails=1
=> For more search options, see http://wiki.openstreetmap.org/wiki/Nominatim
For instance, search can be restricted to a country with e.g. countrycodes=cl or countrycodes=de
Answer in XML
<searchresults timestamp="Thu, 05 Mar 15 19:50:26 +0000" attribution="Data © OpenStreetMap contributors, ODbL 1.0. http://www.openstreetmap.org/copyright" querystring="135 pilkington avenue, birmingham" polygon="true" exclude_place_ids="73723099" more_url="http://nominatim.openstreetmap.org/search?format=xml&exclude_place_ids=73723099&accept-language=en-US,en;q=0.5&polygon=1&addressdetails=1&q=135+pilkington+avenue%2C+birmingham"> <place place_id="73723099" osm_type="way" osm_id="90394480" place_rank="30" boundingbox="52.5487473,52.5488481,-1.8165129,-1.8163463" geojson="{"type":"Polygon","coordinates":[[[-1.8165129,52.5487566],[-1.8164912,52.548824], [-1.8164684,52.5488213],[-1.8164598,52.5488481], [-1.8163463,52.5488346],[-1.8163716,52.5487561], [-1.8164289,52.5487629],[-1.8164339,52.5487473], [-1.8165129,52.5487566]]]}" lat="52.5487921" lon="-1.8164307339635" display_name="135, Pilkington Avenue, Castle Vale, Maney, Birmingham, West Midlands, England, B72 1LH, United Kingdom" class="building" type="yes" importance="0.411"> <house_number>135</house_number> <road>Pilkington Avenue</road> <suburb>Castle Vale</suburb> <hamlet>Maney</hamlet> <city>Birmingham</city> <state_district>West Midlands</state_district> <state>England</state> <postcode>B72 1LH</postcode> <country>United Kingdom</country> <country_code>gb</country_code> </place> </searchresults>
Answer in JSON
[{"place_id":"73723099", "licence":"Data \u00a9 OpenStreetMap contributors, ODbL 1.0. http:\/\/www.openstreetmap.org\/copyright", "osm_type":"way", "osm_id":"90394480", "boundingbox":["52.5487473","52.5488481","-1.8165129","-1.8163463"], "lat":"52.5487921","lon":"-1.8164307339635", "display_name":"135, Pilkington Avenue, Castle Vale, Maney, Birmingham, West Midlands, England, B72 1LH, United Kingdom", "class":"building", "type":"yes", "importance":0.411, "address":{"house_number":"135", "road":"Pilkington Avenue", "suburb":"Castle Vale", "hamlet":"Maney", "city":"Birmingham", "state_district":"West Midlands", "state":"England", "postcode":"B72 1LH", "country":"United Kingdom", "country_code":"gb"}, "geojson":{"type":"Polygon","coordinates":[[ [-1.8165129,52.5487566], [-1.8164912,52.548824], [-1.8164684,52.5488213], [-1.8164598,52.5488481], [-1.8163463,52.5488346], [-1.8163716,52.5487561], [-1.8164289,52.5487629], [-1.8164339,52.5487473], [-1.8165129,52.5487566]]] } }]
Example 2 for Chile
- "Av. Holanda 2607, Providencia"
- result in json format: format=json
- restriction to Addresses in Chile: countrycodes=cl
with OSM Nominatim
http://nominatim.openstreetmap.org/search?q=av.+holanda+2607,+providencia&format=json&countrycodes=cl&polygon_geojson=1&addressdetails=1
with Own Nominatim
http://146.155.17.18:20080/nominatim//search?q=av.+holanda+2607,+providencia&format=json&countrycodes=cl&polygon_geojson=1&addressdetails=1
Answer in JSON:
[{"place_id":"142577360", "licence":"Data \u00a9 OpenStreetMap contributors, ODbL 1.0. http:\/\/www.openstreetmap.org\/copyright", "osm_type":"node", "osm_id":"1245479266", "boundingbox":["-33.442225304494","-33.442125304494","-70.597784810112","-70.597684810112"], "lat":"-33.4421753044944", "lon":"-70.5977348101124", "display_name":"2607, Avenida Holanda, Providencia, Provincia de Santiago, XIII Regi\u00f3n Metropolitana de Santiago, 7500000, Chile", "class":"place", "type":"house", "importance":0.511, "address":{"house_number":"2607", "road":"Avenida Holanda", "suburb":"Providencia", "city":"Providencia", "county":"Provincia de Santiago", "state":"XIII Regi\u00f3n Metropolitana de Santiago", "postcode":"7500000", "country":"Chile", "country_code":"cl"} }]
Example 3 - Luxemburg with 2 Results
Request
http://146.155.17.18:20080/nominatim/search?q=Rue+de+Neudorf+27,+Luxembourg&format=json
Answer with 2 results
The 2 results have a distinct place-id and a different location, so they are indeed different (see the maps).
Note that the first result is in first place, due to its higher importance value.
[ {"place_id":"212907", "licence":"Data \u00a9 OpenStreetMap contributors, ODbL 1.0. http:\/\/www.openstreetmap.org\/copyright", "osm_type":"way", "osm_id":"315314115", "boundingbox":["49.6162725","49.6163748","6.1485573","6.1487066"], "lat":"49.6163233","lon":"6.14863161551865", "display_name":"27, Rue de Neudorf, Neudorf, Neudorf-Weimershof, Luxembourg, Canton Luxembourg, District Luxembourg, 1257, Luxemburg", "class":"place", "type":"house", "importance":0.531}, {"place_id":"102284", "licence":"Data \u00a9 OpenStreetMap contributors, ODbL 1.0. http:\/\/www.openstreetmap.org\/copyright", "osm_type":"node", "osm_id":"360916144", "boundingbox":["49.5341927","49.5342927","5.9915497","5.9916497"], "lat":"49.5342427","lon":"5.9915997", "display_name":"27, Rue de Neudorf, Mondercange, Canton Esch-sur-Alzette, District Luxembourg, 3937, Luxemburg", "class":"place", "type":"house", "importance":0.511} ]
Reverse Geocoding
TODO ... see https://wiki.openstreetmap.org/wiki/Nominatim#Reverse_Geocoding_.2F_Address_lookup
Exporting Accident Data to CVS
The transit accident data come in an MS Access Database.
Fortunately I had MS Access installed. In case not, there seems also to be a way to get it loaded into LibreOffice/OpenOffice. This is described here: http://askubuntu.com/questions/187389/is-it-possible-to-open-an-access-2010-database-file-without-using-wine-or-virtua/519571#519571
A guide on how to export from Access to CVS is here: http://support.spatialkey.com/export-data-from-database-to-csv-file/ In short the process is as follows:
- select the table and chose "External Data" tab
- chose export to "Text File" which will start a wizard
- put a name, but change ending from "txt" to "csv"! (ending may need to be necessary to able to change the encoding to UTF-8 later), then click "ok" ... which brings us to a otpions panel.
- set options:
- delimiter ";"
- first line with column names
- no colons to indicate text/string fields
- click on "Advanced..." button in the lower left and chose UTF-8 encoding for the export
I exported 3 databases:
- Accidentes - this DB had about 536k entries for all of Chile (csv ~ 208 MB)
- Personas (csv ~ 147 MB)
- Vehiculos (csv ~ 123 MB)
and
- Categoria Ingreso (in CLP per Comuna, year 200x) (csv ~ 9KB)