60

I downloaded a Google spreadsheet as a file-like object in Python.

How can I use the workbook with openpyxl without having to save it to disk first?

I know that xlrd can do this with:

book = xlrd.open_workbook(file_contents=downloaded_spreadsheet.read())

where "downloaded_spreadsheet" is my downloaded xlsx file as a file-like object.

Instead of xlrd, I want to use openpyxl because, from what I've read, it has better xlsx support.

This is what I'm using so far:

#!/usr/bin/python

import openpyxl
import xlrd  # which to use..?
import re, urllib, urllib2


class Spreadsheet(object):
    def __init__(self, key):
        super(Spreadsheet, self).__init__()
        self.key = key


class Client(object):
    def __init__(self, email, password):
        super(Client, self).__init__()
        self.email = email
        self.password = password

    def _get_auth_token(self, email, password, source, service):
        url = "https://www.google.com/accounts/ClientLogin"
        params = {
            "Email": email, "Passwd": password,
            "service": service,
            "accountType": "HOSTED_OR_GOOGLE",
            "source": source
        }
        req = urllib2.Request(url, urllib.urlencode(params))
        return re.findall(r"Auth=(.*)", urllib2.urlopen(req).read())[0]

    def get_auth_token(self):
        source = type(self).__name__
        return self._get_auth_token(self.email, self.password, source, service="wise")

    def download(self, spreadsheet, gid=0, format="xls"):
        url_format = "https://spreadsheets.google.com/feeds/download/spreadsheets/Export?key=%s&exportFormat=%s&gid=%i"
        headers = {
            "Authorization": "GoogleLogin auth=" + self.get_auth_token(),
            "GData-Version": "3.0"
        }
        req = urllib2.Request(url_format % (spreadsheet.key, format, gid), headers=headers)
        return urllib2.urlopen(req)


if __name__ == "__main__":
    email = "....."  # (your email here)
    password = '.....'
    spreadsheet_id = "......"  # (spreadsheet id here)

    # Create client and spreadsheet objects
    gs = Client(email, password)
    ss = Spreadsheet(spreadsheet_id)

    # Request a file-like object containing the spreadsheet's contents
    downloaded_spreadsheet = gs.download(ss)

    # book = xlrd.open_workbook(file_contents=downloaded_spreadsheet.read(), formatting_info=True)
    # It works... alas, xlrd doesn't support the xlsx functionality that I want,
    # i.e. being able to read the cell color data.

I hope someone can help, because I've been struggling for months to get the color data of a given cell in a Google spreadsheet. (I know the Google API doesn't support it.)

3 Answers

110

In the docs for load_workbook it says:

#:param filename: the path to open or a file-like object 

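...so it could do this all along: it accepts either a path or a file-like object. I only had to copy the file-like object returned by urlopen into an in-memory byte stream, because the response can't seek and openpyxl needs to seek within the .xlsx (ZIP) archive:

from io import BytesIO
from openpyxl import load_workbook

# input_excel is the file-like object returned by urlopen (downloaded_spreadsheet in the question)
wb = load_workbook(filename=BytesIO(input_excel.read()))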

and I can read every piece of data in my Google spreadsheet.
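Since the question's real goal is cell color data, here is a minimal sketch of the same in-memory approach, assuming Python 3 and a recent openpyxl. It builds a small .xlsx in memory to stand in for the download, loads it back through BytesIO (nothing touches the disk), and reads the color of each solid fill. `build_sample_xlsx_bytes` and `cell_fill_colors` are hypothetical helpers. `fgColor.rgb` only holds an aRGB string when the color type is "rgb"; theme or indexed colors, which an exported spreadsheet may use, would need separate handling.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook
from openpyxl.styles import PatternFill


def build_sample_xlsx_bytes():
    """Create a small .xlsx entirely in memory (hypothetical stand-in for a download)."""
    wb = Workbook()
    ws = wb.active
    ws["A1"] = "red cell"
    ws["A1"].fill = PatternFill(fill_type="solid", start_color="FFFF0000", end_color="FFFF0000")
    ws["A2"] = "no fill"
    buf = BytesIO()
    wb.save(buf)  # openpyxl can also save to a file-like object
    return buf.getvalue()


def cell_fill_colors(xlsx_bytes):
    """Map cell coordinates to the aRGB fill color of solid, RGB-colored fills."""
    wb = load_workbook(filename=BytesIO(xlsx_bytes))  # loaded from memory, not disk
    ws = wb.active
    colors = {}
    for row in ws.iter_rows():
        for cell in row:
            fill = cell.fill
            # Skip unfilled cells and theme/indexed colors, whose .rgb is not a plain string
            if fill.fill_type == "solid" and fill.fgColor.type == "rgb":
                colors[cell.coordinate] = fill.fgColor.rgb
    return colors


print(cell_fill_colors(build_sample_xlsx_bytes()))
```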


1 Comment

+1 - Made a similar mistake. I read only the first half and thought it could only read files. Then I went back, read it completely, and saw that it accepts file-like objects as well.
21

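I was looking to load a file from a URL, and here is what I came up with:

util (saved as openpyxl_extended.py, the module name imported below):

from io import BytesIO
import urllib.request

from openpyxl import load_workbook


def load_workbook_from_url(url):
    # Read the whole response into memory; the HTTP response itself is not seekable
    data = urllib.request.urlopen(url).read()
    return load_workbook(filename=BytesIO(data))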

usage:

import openpyxl_extended

book = openpyxl_extended.load_workbook_from_url('https://storage.googleapis.com/pnbx-cdn/pen-campaign/campaigner-template-fr.xlsx')
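When the data is already an open stream (such as the question's downloaded_spreadsheet), the helper above could be split so the loading step doesn't depend on fetching a URL. A minimal sketch assuming Python 3 and openpyxl; `load_workbook_from_stream` is a hypothetical name, and the workbook is generated locally so the example runs offline.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook


def load_workbook_from_stream(stream):
    """Load a workbook from any readable binary stream (hypothetical helper).

    The stream may be non-seekable (e.g. an HTTP response), so its bytes are
    buffered into a seekable BytesIO before openpyxl opens the archive.
    """
    return load_workbook(filename=BytesIO(stream.read()))


# Offline usage: fabricate workbook bytes instead of downloading them.
source = Workbook()
source.active["A1"] = 42
buf = BytesIO()
source.save(buf)

wb = load_workbook_from_stream(BytesIO(buf.getvalue()))
print(wb.active["A1"].value)
```

With a real download, the same function would be called on the response object, e.g. `load_workbook_from_stream(urllib.request.urlopen(url))`.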

1 Comment

Great answer, clear and reusable.
-17

Actually, this is enough:

import openpyxl

file = open('path/to/file.xlsx', 'rb')
wb = openpyxl.load_workbook(filename=file)

and it will work; there's no need for BytesIO.

2 Comments

It's not being read from the file system as the question indicates. It's a stream.
This would read from a file saved in the disk, not from memory.
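The comments' objection can be made concrete with the standard library alone. A file opened with open(..., 'rb') is seekable, which is why passing it directly works; a network response is not, and openpyxl reads the .xlsx through Python's zipfile module, which has to seek to the end of the archive first. A minimal sketch: `NonSeekableStream` and `try_open` are hypothetical names, and the archive is a dummy ZIP rather than a real workbook.

```python
import io
import zipfile


def make_zip_bytes():
    """Build a tiny ZIP archive in memory (an .xlsx file is a ZIP archive)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("[Content_Types].xml", "<Types/>")  # dummy member, not a real workbook part
    return buf.getvalue()


class NonSeekableStream(io.RawIOBase):
    """Hypothetical stand-in for a network response: readable but not seekable."""

    def __init__(self, data):
        self._inner = io.BytesIO(data)

    def readable(self):
        return True

    def readinto(self, b):
        chunk = self._inner.read(len(b))
        b[:len(chunk)] = chunk
        return len(chunk)


def try_open(fileobj):
    """Return the archive's member names, or None if zipfile cannot open it."""
    try:
        with zipfile.ZipFile(fileobj) as zf:
            return zf.namelist()
    except (zipfile.BadZipFile, OSError):
        return None


data = make_zip_bytes()
print(try_open(NonSeekableStream(data)))  # None: zipfile cannot seek on the stream
print(try_open(io.BytesIO(NonSeekableStream(data).read())))  # ['[Content_Types].xml']
```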
