March 2008
Objective
This tutorial outlines the basics of how to create KML from Comma Separated Value (CSV) data using Python. CSV data is one of the most ubiquitous file formats in use today. Most spreadsheets and databases can both read and write CSV files. Its simple format can be edited in a text editor. Many programming languages, such as Python, have special libraries to read and write CSV files. Therefore it makes a great medium for exchanging large quantities of data.
While
the code samples in this tutorial are in Python, they can be
adapted to most other
programming languages. This tutorial uses code from Geocoding
Addresses for Use in KML to turn an
address into longitude/latitude coordinates. It also uses the
new <ExtendedData>
element of KML
2.2, and takes advantage of the balloon templating outlined in Adding
Custom Data. As such, the KML produced is not currently
supported in Google Maps or other KML-consuming applications, but the
code can be adapted to
produce Maps-compatible KML.
Sample Data
For this tutorial, use the google-addresses.csv file as a sample CSV file. This file has all the addresses, phone numbers, and fax numbers of the various US Google offices. Here's the text of the file:
Office,Address1,Address2,Address3,City,State,Zip,Phone,Fax Headquarters,1600 Amphitheatre Parkway,,,Mountain View,CA,94043,650-253-0000,650-253-0001 New York Sales & Engineering Office,76 Ninth Avenue,,,New York,NY,10011,212-565-0000,212-565-0001 Ann Arbor Sales Office,201 South Division Street,,,Ann Arbor,MI,48104,734-332-6500,734-332-6501 Atlanta Sales & Engineering Office,10 10th Street NE,,,Atlanta,GA,30309,404-487-9000,404-487-9001 Boulder Sales & Engineering Office,2590 Pearl St.,,,Boulder,CO,80302,303-245-0086,303-535-5592 Cambridge Sales & Engineering Office,5 Cambridge Center,,,Cambridge,MA,02142,617-682-3635,617-249-0199 Chicago Sales & Engineering Office,20 West Kinzie St.,,,Chicago,IL,60610,312-840-4100,312-840-4101 Coppell Sales Office,701 Canyon Drive,,,Coppell,TX,75019,214-451-4000,214-451-4001 Detroit Sales Office,114 Willits Street,,,Birmingham,MI,48009,248-351-6220,248-351-6227 Irvine Sales & Engineering Office,19540 Jamboree Road,,,Irvine,CA,92612,949-794-1600,949-794-1601 Pittsburgh Engineering Office,4720 Forbes Avenue,,,Pittsburgh,PA,15213,, Santa Monica Sales & Engineering Office,604 Arizona Avenue,,,Santa Monica,CA,90401,310-460-4000,310-309-6840 Seattle Engineering Office,720 4th Avenue,,,Kirkland,WA,98033,425-739-5600,425-739-5601 Seattle Sales Office,501 N. 34th Street,,,Seattle,WA,98103,206-876-1500,206-876-1501 Washington D.C. Public Policy Office,1001 Pennsylvania Avenue NW,,,Washington,DC,20004,202-742-6520,
Notice how each line is a series of text strings separated by commas.
Each comma delimits a field; each line has the same number of commas.
The first line contains the names of the fields in order. For instance,
the
first block of text in each row is the "Office" field, the second
"Address1", etc. Python can turn that into a collection of dicts
called a DictReader
which allows you to
step through each row. This code sample relies on you knowing
beforehand the structure of your data, but you could add in
some basic handlers to pass the field structure dynamically.
Parsing the CSV File
Python's xml.dom.minidom
module provides
great tools for creating XML
documents, and since KML is XML, you'll use it pretty heavily in this
tutorial. You create an element with createElement
or
createElementNS
, and append to another element with appendChild
.
These are the steps for parsing the CSV file and creating a
KML file.
- Import geocoding_for_kml.py into your module.
- Create a
DictReader
for the CSV files. TheDictReader
is a collection ofdicts
, one for each row. - Create the document using Python's
xml.dom.minidom.Document()
. - Create the root
<kml>
element usingcreateElementNS.
- Append it to the document
.
- Create a
<Document>
element usingcreateElement
. - Append it to the
<kml>
element usingappendChild
. - For each row, create a
<Placemark>
element, and append it to the<Document>
element. - For each column in each row, create an
<ExtendedData>
element and append it to the<Placemark>
element you created in step 8. - Create a
<Data>
element, and append it to the<ExtendedData>
element. Give the<Data>
element an attribute of name, and assign it the value of the column name usingsetAttribute
. - Create a
<value>
element and append it to the<Data>
element. Create a text node, and assign it the value of the column usingcreateTextNode
. Append the text node to the<value>
element. - Create a
<Point>
element and append it to the<Placemark>
element. Create a<coordinates>
element and append it to the<Point>
element. - Extract the address from the row so that it is one single
string in
this
format: Address1,Address2,City,State,Zip. So the first row
would be
1600 Amphitheater Parkway,,Mountain View,CA,94043
. It's OK if there are commas next to each other. Note, to do this requires prior knowledge of the structure of the CSV file and which columns constitute the address. - Geocode the address using the geocoding_for_kml.py code explained in Geocoding Addresses for Use in KML. This returns a string which is the longitude and latitude of the location.
- Create a text node and assign it the
value of
the coordinates in step 14, then append it to the
<coordinates>
element. - Write the KML document to a file.
- If you pass a list of column names as arguments to the
script, the script will add elements in that order. If we did not care
about the order of the elements, we could user
dict.keys()
to produce alist
. However,dict.keys()
does not preserve the original order from the document. To use this argument, pass in the field names list as a comma separated list, like this:python csvtokml.py Office,Address1,Address2,Address3,City,State,Zip,Phone,Fax
Sample Python Code
Sample code for creating a KML file from a CSV file using Python 2.2 is shown below. You can also download it here.
import geocoding_for_kml
import csv
import xml.dom.minidom
import sys
def extractAddress(row):
# This extracts an address from a row and returns it as a string. This requires knowing
# ahead of time what the columns are that hold the address information.
return '%s,%s,%s,%s,%s' % (row['Address1'], row['Address2'], row['City'], row['State'], row['Zip'])
def createPlacemark(kmlDoc, row, order):
# This creates aelement for a row of data.
# A row is a dict.
placemarkElement = kmlDoc.createElement('Placemark')
extElement = kmlDoc.createElement('ExtendedData')
placemarkElement.appendChild(extElement)
# Loop through the columns and create a element for every field that has a value.
for key in order:
if row[key]:
dataElement = kmlDoc.createElement('Data')
dataElement.setAttribute('name', key)
valueElement = kmlDoc.createElement('value')
dataElement.appendChild(valueElement)
valueText = kmlDoc.createTextNode(row[key])
valueElement.appendChild(valueText)
extElement.appendChild(dataElement)
pointElement = kmlDoc.createElement('Point')
placemarkElement.appendChild(pointElement)
coordinates = geocoding_for_kml.geocode(extractAddress(row))
coorElement = kmlDoc.createElement('coordinates')
coorElement.appendChild(kmlDoc.createTextNode(coordinates))
pointElement.appendChild(coorElement)
return placemarkElement
def createKML(csvReader, fileName, order):
# This constructs the KML document from the CSV file.
kmlDoc = xml.dom.minidom.Document()
kmlElement = kmlDoc.createElementNS('http://earth.google.com/kml/2.2', 'kml')
kmlElement.setAttribute('xmlns','http://earth.google.com/kml/2.2')
kmlElement = kmlDoc.appendChild(kmlElement)
documentElement = kmlDoc.createElement('Document')
documentElement = kmlElement.appendChild(documentElement)
# Skip the header line.
csvReader.next()
for row in csvReader:
placemarkElement = createPlacemark(kmlDoc, row, order)
documentElement.appendChild(placemarkElement)
kmlFile = open(fileName, 'w')
kmlFile.write(kmlDoc.toprettyxml(' ', newl = '\n', encoding = 'utf-8'))
def main():
# This reader opens up 'google-addresses.csv', which should be replaced with your own.
# It creates a KML file called 'google.kml'.
# If an argument was passed to the script, it splits the argument on a comma
# and uses the resulting list to specify an order for when columns get added.
# Otherwise, it defaults to the order used in the sample.
if len(sys.argv) >1: order = sys.argv[1].split(',')
else: order = ['Office','Address1','Address2','Address3','City','State','Zip','Phone','Fax']
csvreader = csv.DictReader(open('google-addresses.csv'),order)
kml = createKML(csvreader, 'google-addresses.kml', order)
if __name__ == '__main__':
main()
Sample KML Created
A sample of the KML that this script creates is shown below.
Notice how some<value>
elements have only whitespace in them. That's because the field didn't
have any data in it. You can also download the full sample here.
<?xml version="1.0" encoding="utf-8"?>
<kml xmlns="http://earth.google.com/kml/2.2">
<Document>
<Placemark>
<ExtendedData>
<Data name="Office">
<value>
Headquarters
</value>
</Data>
<Data name="Address1">
<value>
1600 Amphitheater Parkway
</value>
</Data>
<Data name="City">
<value>
Mountain View
</value>
</Data>
<Data name="State">
<value>
CA
</value>
</Data>
<Data name="Zip">
<value>
94043
</value>
</Data>
<Data name="Phone">
<value>
650-253-0000
</value>
</Data>
<Data name="Fax">
<value>
650-253-0001
</value>
</Data>
</ExtendedData>
<Point>
<coordinates>
-122.081783,37.423111
</coordinates>
</Point>
</Placemark>
...
Screen shot
Below is a screenshot of what that KML file looks like in Google Earth.
Because each<Placemark>
element has no <BalloonStyle><text>
and no <description>
element,
the balloon defaults to a table style, drawing on the <Data>
elements.
Geocoding Consideration
This was mentioned in "Geocoding Addresses for Use in KML,"
but it bears repeating. Your geocoding requests will be
subject to the geocoder's maximum
query rate and 15,000 queries per day based on your IP. Additionally, a
status code of 620
will be returned by the geocoder if you query it faster than it can
handle. (A full list of status codes is
available
here.)
To ensure you don't send queries too rapidly to the geocoder, you can
specify a delay between each geocode request. You can increase this
delay each time you receive a 620
status, and use a while
loop to ensure you've
successfully geocoded an address before iterating to the next one. This
means that if your CSV file is very large, you may have to either
modify the geocoding code, or keep track of how fast you are creating
Placemarks and slow that down if you are going too fast.
Conclusion
Now you can use Python to create a KML file from a CSV file. Using the
code supplied, the KML file will only work in Google Earth. You
can modify it to work in both Maps and Earth using
<description>
instead of <ExtendedData>
.
It is also easy to convert this code sample to any
other programming languages that provide XML support.
Now that you've finished converting all your CSV files to KML, you might want to check out other KML articles, such as Using PHP and MySQL to create KML and the Google Developer Guide article on ExtendedData, Adding Custom Data.