Converting CSV files to KML

Mano Marks, Google Geo APIs Team
March 2008

Objective

This tutorial outlines the basics of how to create KML from Comma Separated Value (CSV) data using Python. CSV data is one of the most ubiquitous file formats in use today. Most spreadsheets and databases can both read and write CSV files. Its simple format can be edited in a text editor. Many programming languages, such as Python, have special libraries to read and write CSV files. Therefore it makes a great medium for exchanging large quantities of data.

While the code samples in this tutorial are in Python, they can be adapted to most other programming languages. This tutorial uses code from Geocoding Addresses for Use in KML to turn an address into longitude/latitude coordinates. It also uses the new <ExtendedData> element of KML 2.2, and takes advantage of the balloon templating outlined in Adding Custom Data. As such, the KML produced is not currently supported in Google Maps or other KML-consuming applications, but the code can be adapted to produce Maps-compatible KML.

Sample Data

For this tutorial, use the google-addresses.csv file as a sample CSV file. This file has all the addresses, phone numbers, and fax numbers of the various US Google offices. Here's the text of the file:

Office,Address1,Address2,Address3,City,State,Zip,Phone,Fax
Headquarters,1600 Amphitheatre Parkway,,,Mountain View,CA,94043,650-253-0000,650-253-0001
New York Sales & Engineering Office,76 Ninth Avenue,,,New York,NY,10011,212-565-0000,212-565-0001
Ann Arbor Sales Office,201 South Division Street,,,Ann Arbor,MI,48104,734-332-6500,734-332-6501
Atlanta Sales & Engineering Office,10 10th Street NE,,,Atlanta,GA,30309,404-487-9000,404-487-9001
Boulder Sales & Engineering Office,2590 Pearl St.,,,Boulder,CO,80302,303-245-0086,303-535-5592
Cambridge Sales & Engineering Office,5 Cambridge Center,,,Cambridge,MA,02142,617-682-3635,617-249-0199
Chicago Sales & Engineering Office,20 West Kinzie St.,,,Chicago,IL,60610,312-840-4100,312-840-4101
Coppell Sales Office,701 Canyon Drive,,,Coppell,TX,75019,214-451-4000,214-451-4001
Detroit Sales Office,114 Willits Street,,,Birmingham,MI,48009,248-351-6220,248-351-6227
Irvine Sales & Engineering Office,19540 Jamboree Road,,,Irvine,CA,92612,949-794-1600,949-794-1601
Pittsburgh Engineering Office,4720 Forbes Avenue,,,Pittsburgh,PA,15213,,
Santa Monica Sales & Engineering Office,604 Arizona Avenue,,,Santa Monica,CA,90401,310-460-4000,310-309-6840
Seattle Engineering Office,720 4th Avenue,,,Kirkland,WA,98033,425-739-5600,425-739-5601
Seattle Sales Office,501 N. 34th Street,,,Seattle,WA,98103,206-876-1500,206-876-1501
Washington D.C. Public Policy Office,1001 Pennsylvania Avenue NW,,,Washington,DC,20004,202-742-6520,

Notice how each line is a series of text strings separated by commas. Each comma delimits a field; each line has the same number of commas. The first line contains the names of the fields in order. For instance, the first block of text in each row is the "Office" field, the second "Address1", etc. Python can turn that into a collection of dicts called a DictReader which allows you to step through each row. This code sample relies on you knowing beforehand the structure of your data, but you could add in some basic handlers to pass the field structure dynamically.

Parsing the CSV File

Python's xml.dom.minidom module provides great tools for creating XML documents, and since KML is XML, you'll use it pretty heavily in this tutorial. You create an element with createElement or createElementNS, and append to another element with appendChild. These are the steps for parsing the CSV file and creating a KML file.

  1. Import geocoding_for_kml.py into your module.
  2. Create a DictReader for the CSV files. The DictReader is a collection of dicts, one for each row.
  3. Create the document using Python's xml.dom.minidom.Document().
  4. Create the root <kml> element using createElementNS.
  5. Append it to the document.
  6. Create a <Document> element using createElement.
  7. Append it to the <kml> element using appendChild.
  8. For each row, create a <Placemark> element, and append it to the <Document> element.
  9. For each column in each row, create an <ExtendedData> element and append it to the <Placemark> element you created in step 8.
  10. Create a <Data> element, and append it to the <ExtendedData> element. Give the <Data> element an attribute of name, and assign it the value of the column name using setAttribute.
  11. Create a <value> element and append it to the <Data> element. Create a text node, and assign it the value of the column using createTextNode. Append the text node to the <value> element.
  12. Create a <Point> element and append it to the <Placemark> element. Create a <coordinates> element and append it to the <Point> element.
  13. Extract the address from the row so that it is one single string in this format: Address1,Address2,City,State,Zip. So the first row would be 1600 Amphitheater Parkway,,Mountain View,CA,94043. It's OK if there are commas next to each other. Note, to do this requires prior knowledge of the structure of the CSV file and which columns constitute the address.
  14. Geocode the address using the geocoding_for_kml.py code explained in Geocoding Addresses for Use in KML. This returns a string which is the longitude and latitude of the location.
  15. Create a text node and assign it the value of the coordinates in step 14, then append it to the <coordinates> element.
  16. Write the KML document to a file.
  17. If you pass a list of column names as arguments to the script, the script will add elements in that order. If we did not care about the order of the elements, we could user dict.keys() to produce a list. However, dict.keys() does not preserve the original order from the document. To use this argument, pass in the field names list as a comma separated list, like this:
    python csvtokml.py Office,Address1,Address2,Address3,City,State,Zip,Phone,Fax

Sample Python Code

Sample code for creating a KML file from a CSV file using Python 2.2 is shown below. You can also download it here.


import geocoding_for_kml
import csv
import xml
.dom.minidom
import sys


def extractAddress
(row):
  # This extracts an address from a row and returns it as a string. This requires knowing
  # ahead of time what the columns are that hold the address information.
  return '%s,%s,%s,%s,%s' % (row['Address1'], row['Address2'], row['City'], row['State'], row['Zip'])

def createPlacemark(kmlDoc, row, order):
  # This creates a  element for a row of data.
  # A row is a dict.
  placemarkElement = kmlDoc.createElement('Placemark')
  extElement = kmlDoc.createElement('ExtendedData')
  placemarkElement.appendChild(extElement)
  
  # Loop through the columns and create a  element for every field that has a value.
  for key in order:
    if row[key]:
      dataElement = kmlDoc.createElement('Data')
      dataElement.setAttribute('name', key)
      valueElement = kmlDoc.createElement('value')
      dataElement.appendChild(valueElement)
      valueText = kmlDoc.createTextNode(row[key])
      valueElement.appendChild(valueText)
      extElement.appendChild(dataElement)
  
  pointElement = kmlDoc.createElement('Point')
  placemarkElement.appendChild(pointElement)
  coordinates = geocoding_for_kml.geocode(extractAddress(row))
  coorElement = kmlDoc.createElement('coordinates')
  coorElement.appendChild(kmlDoc.createTextNode(coordinates))
  pointElement.appendChild(coorElement)
  return placemarkElement

def createKML(csvReader, fileName, order):
  # This constructs the KML document from the CSV file.
  kmlDoc = xml.dom.minidom.Document()
  
  kmlElement = kmlDoc.createElementNS('http://earth.google.com/kml/2.2', 'kml')
  kmlElement.setAttribute('xmlns','http://earth.google.com/kml/2.2')
  kmlElement = kmlDoc.appendChild(kmlElement)
  documentElement = kmlDoc.createElement('Document')
  documentElement = kmlElement.appendChild(documentElement)

  # Skip the header line.
  csvReader.next()
  
  for row in csvReader:
    placemarkElement = createPlacemark(kmlDoc, row, order)
    documentElement.appendChild(placemarkElement)
  kmlFile = open(fileName, 'w')
  kmlFile.write(kmlDoc.toprettyxml('  ', newl = '\n', encoding = 'utf-8'))

def main():
  # This reader opens up 'google-addresses.csv', which should be replaced with your own.
  # It creates a KML file called 'google.kml'.
  
  # If an argument was passed to the script, it splits the argument on a comma
  # and uses the resulting list to specify an order for when columns get added.
  # Otherwise, it defaults to the order used in the sample.
  
  if len(sys.argv) >1: order = sys.argv[1].split(',')
  else: order = ['Office','Address1','Address2','Address3','City','State','Zip','Phone','Fax']
  csvreader = csv.DictReader(open('google-addresses.csv'),order)
  kml = createKML(csvreader, 'google-addresses.kml', order)
if __name__ == '__main__':
  main()

Sample KML Created

A sample of the KML that this script creates is shown below. Notice how some<value> elements have only whitespace in them. That's because the field didn't have any data in it. You can also download the full sample here.

<?xml version="1.0" encoding="utf-8"?>
<kml xmlns="http://earth.google.com/kml/2.2">
  <Document>
    <Placemark>
      <ExtendedData>
        <Data name="Office">
          <value>
            Headquarters
          </value>
        </Data>
        <Data name="Address1">
          <value>
            1600 Amphitheater Parkway
          </value>
        </Data>
        <Data name="City">
          <value>
            Mountain View
          </value>
        </Data>
        <Data name="State">
          <value>
            CA
          </value>
        </Data>
        <Data name="Zip">
          <value>
            94043
          </value>
        </Data>
        <Data name="Phone">
          <value>
            650-253-0000
          </value>
        </Data>
        <Data name="Fax">
          <value>
            650-253-0001
          </value>
        </Data>
      </ExtendedData>
      <Point>
        <coordinates>
          -122.081783,37.423111
        </coordinates>
      </Point>
    </Placemark>
    ...

Screen shot

Below is a screenshot of what that KML file looks like in Google Earth. Because each<Placemark> element has no <BalloonStyle><text> and no <description> element, the balloon defaults to a table style, drawing on the <Data> elements.

Screenshot of KML created by this script

Geocoding Consideration

This was mentioned in "Geocoding Addresses for Use in KML," but it bears repeating. Your geocoding requests will be subject to the geocoder's maximum query rate and 15,000 queries per day based on your IP. Additionally, a status code of 620 will be returned by the geocoder if you query it faster than it can handle. (A full list of status codes is available here.) To ensure you don't send queries too rapidly to the geocoder, you can specify a delay between each geocode request. You can increase this delay each time you receive a 620 status, and use a while loop to ensure you've successfully geocoded an address before iterating to the next one. This means that if your CSV file is very large, you may have to either modify the geocoding code, or keep track of how fast you are creating Placemarks and slow that down if you are going too fast.

Conclusion

Now you can use Python to create a KML file from a CSV file. Using the code supplied, the KML file will only work in Google Earth. You can modify it to work in both Maps and Earth using <description> instead of <ExtendedData>. It is also easy to convert this code sample to any other programming languages that provide XML support.

Now that you've finished converting all your CSV files to KML, you might want to check out other KML articles, such as Using PHP and MySQL to create KML and the Google Developer Guide article on ExtendedData, Adding Custom Data.

Back to top