Commit graph

33 commits

Author SHA1 Message Date
Arne Schlüter
adb0252130 Fix bug so one potential place following another directly is also matched 2015-01-17 17:12:10 +01:00
Arne Schlüter
73dee66d16 Look at title and text separately 2015-01-17 16:27:53 +01:00
Arne Schlüter
b85c84139b Start with improving the place results 2015-01-17 15:57:51 +01:00
Arne Schlüter
8e6032be3a Start messing around with part of speech tagging 2015-01-17 15:15:51 +01:00
Arne Schlüter
c4711310ce Merge branch 'master' of github.com:aesthaddicts/OpenData 2015-01-17 14:42:22 +01:00
Joshua Widmann
4ddd3aa376 swapped '…' with '...' and removed sqlite3 import. 2015-01-17 13:58:38 +01:00
Arne Schlüter
06814ecae8 Start writing some analyzaton code 2015-01-17 11:45:52 +01:00
Arne Schlüter
a9a096396b Install nltk for named entity extraction of places later on 2015-01-17 11:43:59 +01:00
Arne Schlüter
00f0643fff Add redis and rq as a dependency for incident analyzation 2015-01-17 11:42:38 +01:00
Arne Schlüter
7c4cf8f9f0 Clean up code, clarify and remove an unnecessary try-except-block 2014-12-11 00:29:08 +01:00
Arne Schlüter
98d1e21a90 Remove field 'addtional_place' because it can't be reliably parsed 2014-12-11 00:07:48 +01:00
Arne Schlüter
c1ac5e5ed4 Use peewee as model and rewrite the code 2014-12-10 23:56:27 +01:00
Arne Schlüter
0e095dbb63 Don't forget to close the connection 2014-12-08 20:40:13 +01:00
Arne Schlüter
36df116ed0 Crawl all pages and insert them into the database 2014-12-08 18:26:25 +01:00
Arne Schlüter
5306e6dab4 Merge branch 'dev-arne' (Conflicts: scraper/scraper.py) 2014-12-08 16:20:08 +01:00
Arne Schlüter
55a599a47b Write insertion logic for articles 2014-12-08 16:19:18 +01:00
Joshua Widmann
c872c8e39d renamed article attributes according to database fields 2014-12-08 15:59:33 +01:00
Arne Schlüter
e018aead0e Add database setup code 2014-12-08 15:40:31 +01:00
Joshua Widmann
9ae38d3279 scraping all articles 2014-12-08 15:38:58 +01:00
Arne Schlüter
58f8e65669 Simplify code by removing :nth-of-type and accessing the list directly 2014-12-08 14:46:26 +01:00
Arne Schlüter
66b68b269d Normalize \r\n to \n and use date objects instead of the original date string 2014-12-08 14:40:05 +01:00
Arne Schlüter
7fe08227bb Add more info to the README 2014-11-17 18:32:05 +01:00
Arne Schlüter
0744167bde Implement basic article scraping 2014-11-17 18:25:46 +01:00
Arne Schlüter
2b83417f8e Add python-typic files and folders to .gitignore 2014-11-17 18:25:28 +01:00
Arne Schlüter
5fc4b7b095 Separate into different files 2014-11-17 14:37:41 +01:00
Arne Schlüter
c44c5a35b8 Add map overview of berlin with toner-lite map tiles 2014-11-17 14:36:03 +01:00
Arne Schlüter
b7f571b20f Set up basic HTML structure with necessary style rules 2014-11-17 14:26:33 +01:00
Arne Schlüter
c12c8b6dbe Add .editorconfig for code formatting 2014-11-11 07:43:19 +01:00
Arne Schlüter
fc54df1ca8 Update project description and readme 2014-11-10 15:32:37 +01:00
Arne Schlüter
d09c324c8d Remove data of bundeswahlleiter.de 2014-11-10 15:29:32 +01:00
Arne Schlüter
8323a82305 Add leaflet as our map service 2014-11-10 15:28:56 +01:00
Arne Schlüter
0f4b9d8805 Convert csv to UTF-8 2014-11-10 14:45:28 +01:00
Arne Schlüter
dbc50d8e58 Initial commit 2014-11-10 14:41:21 +01:00