pyexcel Documentation

Release 0.6.0

Onni Software Ltd.

Aug 08, 2018

Contents

1 Introduction 3

2 Installation 5

3 Usage 7

4 Tutorial 9

4.1 One liners . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

4.2 Stream APIs for big ﬁle : A set of two liners . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

4.3 Pyexcel-io Plugin guide . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

4.4 For web developer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

4.5 Pyexcel data renderers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

4.6 Sheet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

4.7 Book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

4.8 Working with databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

5 Cook book 41

5.1 Recipes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

5.2 Loading from other sources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

6 Real world cases 49

6.1 Questions and Answers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

6.2 How to inject csv data to database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

7 API documentation 53

7.1 API Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

7.2 Internal API reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113

8 Developer’s guide 121

8.1 Developer’s guide . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

8.2 How to log pyexcel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123

8.3 Packaging with PyInstaller . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124

8.4 How to write a plugin for pyexcel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124

9 Change log 127

9.1 Migrate away from 0.4.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127

9.2 Migrate from 0.2.x to 0.3.0+ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127

9.3 Migrate from 0.2.1 to 0.2.2+ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

9.4 Migrate from 0.1.x to 0.2.x . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130

9.5 Change log . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131

10 Indices and tables 141

Bibliography 143

pyexcel Documentation, Release 0.6.0

Author

3. Wang

Source code http://github.com/pyexcel/pyexcel.git

Issues http://github.com/pyexcel/pyexcel/issues

License New BSD License

Development 0.6.0

Released 0.5.7

Generated Aug 08, 2018

Note: The documentation of pyexcel v0.6.0 is under review and rewrite. If you have missed some information, please

read v0.5.3

Contents 1

pyexcel Documentation, Release 0.6.0

2 Contents

CHAPTER 1

Introduction

pyexcel provides single application programming interface(API) to read, write and manipulate data in different excel

ﬁle formats, in different storage media(disk, memory, database) and in different python data structures. Its loosely

coupled architecture makes it extremely extensible.

The idea originated from the common usability problem: when an excel ﬁle driven web application is delivered for

non-developer users (ie: team assistant, human resource administrator etc). The fact is that not everyone knows (or

cares) about the differences between various excel formats: csv, xls, xlsx are all the same to them. Instead of training

those users about ﬁle formats, this library helps web developers to handle most of the excel ﬁle formats by providing

a common programming interface. To add a speciﬁc excel ﬁle format type to you application, all you need is to install

an extra pyexcel plugin. Hence no code changes to your application and no issues with excel ﬁle formats any more.

Looking at the community, this library and its associated ones try to become a small and easy to install alternative to

Pandas.

pyexcel Documentation, Release 0.6.0

4 Chapter 1. Introduction

CHAPTER 2

Installation

You can install pyexcel via pip:

$ pip install pyexcel

or clone it and install it:

$ git clone https://github.com/pyexcel/pyexcel.git

$ cd pyexcel

$ python setup.py install

For individual excel ﬁle formats, please install them as you wish:

Table 1: A list of ﬁle formats supported by external plugins

Package name Supported ﬁle formats Dependencies Python versions

pyexcel-io csv, csvz [#f1]_, tsv, tsvz [#f2]_ 2.6, 2.7, 3.3, 3.4, 3.5, 3.6 pypy

pyexcel-xls xls, xlsx(read only), xlsm(read only) xlrd, xlwt same as above

pyexcel-xlsx xlsx openpyxl same as above

pyexcel-ods3 ods pyexcel-ezodf, lxml 2.6, 2.7, 3.3, 3.4 3.5, 3.6

pyexcel-ods ods odfpy same as above

Table 2: Dedicated ﬁle reader and writers

Package name Supported ﬁle formats Dependencies Python versions

pyexcel-xlsxw xlsx(write only) XlsxWriter Python 2 and 3

pyexcel-xlsxr xlsx(read only) lxml same as above

pyexcel-odsr read only for ods, fods lxml same as above

pyexcel-htmlr html(read only) lxml,html5lib same as above

pyexcel Documentation, Release 0.6.0

Table 3: Other data renderers

Package

name

Supported ﬁle formats Depen-

dencies

Python versions

pyexcel-text write only:rst, mediawiki, html, latex, grid, pipe, orgtbl, plain

simple read only: ndjson r/w: json

tabulate 2.6, 2.7, 3.3, 3.4

3.5, 3.6, pypy

pyexcel-

handsontable

handsontable in html hand-

sontable

same as above

pyexcel-

pygal

svg chart pygal 2.7, 3.3, 3.4, 3.5

3.6, pypy

pyexcel-

sortable

sortable table in html csvtotable same as above

pyexcel-gantt gantt chart in html frappe-

gantt

except pypy, same

as above

In order to manage the list of plugins installed, you need to use pip to add or remove a plugin. When you use virtualenv,

you can have different plugins per virtual environment. In the situation where you have multiple plugins that does

the same thing in your environment, you need to tell pyexcel which plugin to use per function call. For example,

pyexcel-ods and pyexcel-odsr, and you want to get_array to use pyexcel-odsr. You need to append get_array(. . . ,

library=’pyexcel-odsr’).

For compatibility tables of pyexcel-io plugins, please click here

Table 4: Plugin compatibility table

pyexcel pyexcel-io pyexcel-text pyexcel-handsontable pyexcel-pygal pyexcel-gantt

0.6.0+ 0.4.0+ 0.2.6+ 0.0.1 0.0.1 0.0.1

0.5.0+ 0.4.0+ 0.2.6+ 0.0.1 0.0.1 0.0.1

0.4.0+ 0.3.0+ 0.2.5

Table 5: a list of support ﬁle formats

ﬁle format deﬁnition

csv comma separated values

tsv tab separated values

csvz a zip ﬁle that contains one or many csv ﬁles

tsvz a zip ﬁle that contains one or many tsv ﬁles

xls a spreadsheet ﬁle format created by MS-Excel 97-2003 [#f1]_

xlsx MS-Excel Extensions to the Ofﬁce Open XML SpreadsheetML File Format. [#f2]_

xlsm an MS-Excel Macro-Enabled Workbook ﬁle

ods open document spreadsheet

fods ﬂat open document spreadsheet

json java script object notation

html html table of the data structure

simple simple presentation

rst rStructured Text presentation of the data

mediawiki media wiki table

6 Chapter 2. Installation

CHAPTER 3

Usage

Suppose you want to process the following excel data :

Here are the example usages:

>>> import pyexcel as pe

>>> records = pe.iget_records(file_name="your_file.xls")

>>> for record in records:

... print("%s is aged at %d" % (record['Name'], record['Age']))

Adam is aged at 28

Beatrice is aged at 29

Ceri is aged at 30

Dean is aged at 26

>>> pe.free_resources()

pyexcel Documentation, Release 0.6.0

8 Chapter 3. Usage

CHAPTER 4

Tutorial

4.1 One liners

This section shows you how to get data from your excel ﬁles and how to export data to excel ﬁles in one line

4.1.1 One liner to get data from the excel ﬁles

Get a list of dictionaries

Suppose you want to process the following coffee data (data source coffee chart on the center for science in

the public interest):

Let’s get a list of dictionary out from the xls ﬁle:

>>> records = p.get_records(file_name="your_file.xls")

And let’s check what do we have:

>>> for record in records:

... print("%s of %s has %s mg" % (

... record['Serving Size'],

... record['Coffees'],

... record['Caffeine (mg)']))

venti(20 oz) of Starbucks Coffee Blonde Roast has 475 mg

large(20 oz.) of Dunkin' Donuts Coffee with Turbo Shot has 398 mg

grande(16 oz.) of Starbucks Coffee Pike Place Roast has 310 mg

regular(16 oz.) of Panera Coffee Light Roast has 300 mg

Get two dimensional array

Instead, what if you have to use pyexcel.get_array() to do the same:

pyexcel Documentation, Release 0.6.0

>>> for row in p.get_array(file_name="your_file.xls", start_row=1):

... print("%s of %s has %s mg" % (

... row[1],

... row[0],

... row[2]))

venti(20 oz) of Starbucks Coffee Blonde Roast has 475 mg

large(20 oz.) of Dunkin' Donuts Coffee with Turbo Shot has 398 mg

grande(16 oz.) of Starbucks Coffee Pike Place Roast has 310 mg

regular(16 oz.) of Panera Coffee Light Roast has 300 mg

where start_row skips the header row.

Get a dictionary

You can get a dictionary too:

Now let’s get a dictionary out from the spreadsheet:

>>> my_dict = p.get_dict(file_name="your_file.xls", name_columns_by_row=0)

And check what do we have:

>>> from pyexcel._compact import OrderedDict

>>> isinstance(my_dict, OrderedDict)

True

>>> for key, values in my_dict.items():

... print(key + " : " + ','.join([str(item) for item in values]))

Coffees : Starbucks Coffee Blonde Roast,Dunkin' Donuts Coffee with Turbo Shot,

˓→Starbucks Coffee Pike Place Roast,Panera Coffee Light Roast

Serving Size : venti(20 oz),large(20 oz.),grande(16 oz.),regular(16 oz.)

Caffeine (mg) : 475,398,310,300

Please note that my_dict is an OrderedDict.

Get a dictionary of two dimensional array

Suppose you have a multiple sheet book as the following:

Here is the code to obtain those sheets as a single dictionary:

>>> book_dict = p.get_book_dict(file_name="book.xls")

And check::

>>> isinstance(book_dict, OrderedDict)

True

>>> import json

>>> for key, item in book_dict.items():

... print(json.dumps({key: item}))

{"Sheet 1": [[1, 2, 3], [4, 5, 6], [7, 8, 9]]}

{"Sheet 2": [["X", "Y", "Z"], [1, 2, 3], [4, 5, 6]]}

{"Sheet 3": [["O", "P", "Q"], [3, 2, 1], [4, 3, 2]]}

10 Chapter 4. Tutorial

pyexcel Documentation, Release 0.6.0

4.1.2 Data export in one line

Export an array

Suppose you have the following array:

>>> data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

And here is the code to save it as an excel ﬁle

>>> p.save_as(array=data, dest_file_name="example.xls")

Let’s verify it:

>>> p.get_sheet(file_name="example.xls")

pyexcel_sheet1:

+---+---+---+

| 1 | 2 | 3 |

+---+---+---+

| 4 | 5 | 6 |

+---+---+---+

| 7 | 8 | 9 |

+---+---+---+

And here is the code to save it as a csv ﬁle

>>> p.save_as(array=data,

... dest_file_name="example.csv",

... dest_delimiter=':')

Let’s verify it:

>>> with open("example.csv") as f:

... for line in f.readlines():

... print(line.rstrip())

...

1:2:3

4:5:6

7:8:9

Export a list of dictionaries

>>> records = [

... {"year": 1903, "country": "Germany", "speed": "206.7km/h"},

... {"year": 1964, "country": "Japan", "speed": "210km/h"},

... {"year": 2008, "country": "China", "speed": "350km/h"}

... ]

>>> p.save_as(records=records, dest_file_name='high_speed_rail.xls')

Export a dictionary of single key value pair

>>> henley_on_thames_facts = {

... "area": "5.58 square meters",

(continues on next page)

4.1. One liners 11

pyexcel Documentation, Release 0.6.0

(continued from previous page)

... "population": "11,619",

... "civial parish": "Henley-on-Thames",

... "latitude": "51.536",

... "longitude": "-0.898"

... }

>>> p.save_as(adict=henley_on_thames_facts, dest_file_name='henley.xlsx')

Export a dictionary of single dimensonal array

>>> ccs_insights = {

... "year": ["2017", "2018", "2019", "2020", "2021"],

... "smart phones": [1.53, 1.64, 1.74, 1.82, 1.90],

... "feature phones": [0.46, 0.38, 0.30, 0.23, 0.17]

... }

>>> p.save_as(adict=ccs_insights, dest_file_name='ccs.csv')

Export a dictionary of two dimensional array as a book

Suppose you want to save the below dictionary to an excel ﬁle

>>> a_dictionary_of_two_dimensional_arrays = {

... 'Sheet 1':

... [

... [1.0, 2.0, 3.0],

... [4.0, 5.0, 6.0],

... [7.0, 8.0, 9.0]

... ],

... 'Sheet 2':

... [

... ['X', 'Y', 'Z'],

... [1.0, 2.0, 3.0],

... [4.0, 5.0, 6.0]

... ],

... 'Sheet 3':

... [

... ['O', 'P', 'Q'],

... [3.0, 2.0, 1.0],

... [4.0, 3.0, 2.0]

... ]

... }

Here is the code:

>>> p.save_book_as(

... bookdict=a_dictionary_of_two_dimensional_arrays,

... dest_file_name="book.xls"

... )

If you want to preserve the order of sheets in your dictionary, you have to pass on an ordered dictionary to the function

itself. For example:

>>> data = OrderedDict()

>>> data.update({"Sheet 2": a_dictionary_of_two_dimensional_arrays['Sheet 2']})

(continues on next page)

12 Chapter 4. Tutorial

pyexcel Documentation, Release 0.6.0

(continued from previous page)

>>> data.update({"Sheet 1": a_dictionary_of_two_dimensional_arrays['Sheet 1']})

>>> data.update({"Sheet 3": a_dictionary_of_two_dimensional_arrays['Sheet 3']})

>>> p.save_book_as(bookdict=data, dest_file_name="book.xls")

Let’s verify its order:

>>> book_dict = p.get_book_dict(file_name="book.xls")

>>> for key, item in book_dict.items():

... print(json.dumps({key: item}))

{"Sheet 2": [["X", "Y", "Z"], [1, 2, 3], [4, 5, 6]]}

{"Sheet 1": [[1, 2, 3], [4, 5, 6], [7, 8, 9]]}

{"Sheet 3": [["O", "P", "Q"], [3, 2, 1], [4, 3, 2]]}

Please notice that “Sheet 2” is the ﬁrst item in the book_dict, meaning the order of sheets are preserved.

4.1.3 File format transcoding on one line

Note: Please note that the following ﬁle transcoding could be with zero line. Please install pyexcel-cli and you will

do the transcode in one command. No need to open your editor, save the problem, then python run.

The following code does a simple ﬁle format transcoding from xls to csv:

>>> p.save_as(file_name="birth.xls", dest_file_name="birth.csv")

Again it is really simple. Let’s verify what we have gotten:

>>> sheet = p.get_sheet(file_name="birth.csv")

>>> sheet

birth.csv:

+-------+--------+----------+

| name | weight | birth |

+-------+--------+----------+

| Adam | 3.4 | 03/02/15 |

+-------+--------+----------+

| Smith | 4.2 | 12/11/14 |

+-------+--------+----------+

Note: Please note that csv(comma separate value) ﬁle is pure text ﬁle. Formula, charts, images and formatting in xls

ﬁle will disappear no matter which transcoding tool you use. Hence, pyexcel is a quick alternative for this transcoding

job.

Let use previous example and save it as xlsx instead

>>> p.save_as(file_name="birth.xls",

... dest_file_name="birth.xlsx") # change the file extension

Again let’s verify what we have gotten:

>>> sheet = p.get_sheet(file_name="birth.xlsx")

>>> sheet

pyexcel_sheet1:

(continues on next page)

4.1. One liners 13

pyexcel Documentation, Release 0.6.0

(continued from previous page)

+-------+--------+----------+

| name | weight | birth |

+-------+--------+----------+

| Adam | 3.4 | 03/02/15 |

+-------+--------+----------+

| Smith | 4.2 | 12/11/14 |

+-------+--------+----------+

4.1.4 Excel book merge and split operation in one line

Merge all excel ﬁles in directory into a book where each ﬁle become a sheet

The following code will merge every excel ﬁles into one ﬁle, say “output.xls”:

from pyexcel.cookbook import merge_all_to_a_book

import glob

merge_all_to_a_book(glob.glob("your_csv_directory\

.csv"), "output.xls")

You can mix and match with other excel formats: xls, xlsm and ods. For example, if you are sure you have only xls,

xlsm, xlsx, ods and csv ﬁles in your_excel_ﬁle_directory, you can do the following:

from pyexcel.cookbook import merge_all_to_a_book

import glob

merge_all_to_a_book(glob.glob("your_excel_file_directory\

"), "output.xls")

Split a book into single sheet ﬁles

Suppose you have many sheets in a work book and you would like to separate each into a single sheet excel ﬁle. You

can easily do this:

>>> from pyexcel.cookbook import split_a_book

>>> split_a_book("megabook.xls", "output.xls")

>>> import glob

>>> outputfiles = glob.glob("

_output.xls")

>>> for file in sorted(outputfiles):

... print(file)

...

Sheet 1_output.xls

Sheet 2_output.xls

Sheet 3_output.xls

for the output ﬁle, you can specify any of the supported formats

Extract just one sheet from a book

Suppose you just want to extract one sheet from many sheets that exists in a work book and you would like to separate

it into a single sheet excel ﬁle. You can easily do this:

14 Chapter 4. Tutorial

pyexcel Documentation, Release 0.6.0

>>> from pyexcel.cookbook import extract_a_sheet_from_a_book

>>> extract_a_sheet_from_a_book("megabook.xls", "Sheet 1", "output.xls")

>>> if os.path.exists("Sheet 1_output.xls"):

... print("Sheet 1_output.xls exists")

...

Sheet 1_output.xls exists

for the output ﬁle, you can specify any of the supported formats

4.2 Stream APIs for big ﬁle : A set of two liners

This section shows you how to get data from your BIG excel ﬁles and how to export data to excel ﬁles in two lines at

most.

4.2.1 Two liners for get data from big excel ﬁles

Get a list of dictionaries

Suppose you want to process the following coffee data:

Let’s get a list of dictionary out from the xls ﬁle:

>>> records = p.iget_records(file_name="your_file.xls")

And let’s check what do we have:

>>> for record in records:

... print("%s of %s has %s mg" % (

... record['Serving Size'],

... record['Coffees'],

... record['Caffeine (mg)']))

venti(20 oz) of Starbucks Coffee Blonde Roast has 475 mg

large(20 oz.) of Dunkin' Donuts Coffee with Turbo Shot has 398 mg

grande(16 oz.) of Starbucks Coffee Pike Place Roast has 310 mg

regular(16 oz.) of Panera Coffee Light Roast has 300 mg

Please do not forgot the second line:

>>> p.free_resources()

Get two dimensional array

Instead, what if you have to use pyexcel.get_array() to do the same:

>>> for row in p.iget_array(file_name="your_file.xls", start_row=1):

... print("%s of %s has %s mg" % (

... row[1],

... row[0],

... row[2]))

venti(20 oz) of Starbucks Coffee Blonde Roast has 475 mg

large(20 oz.) of Dunkin' Donuts Coffee with Turbo Shot has 398 mg

grande(16 oz.) of Starbucks Coffee Pike Place Roast has 310 mg

regular(16 oz.) of Panera Coffee Light Roast has 300 mg

4.2. Stream APIs for big ﬁle : A set of two liners 15

pyexcel Documentation, Release 0.6.0

Again, do not forgot the second line:

>>> p.free_resources()

where start_row skips the header row.

4.2.2 Data export in one liners

Export an array

Suppose you have the following array:

>>> data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

And here is the code to save it as an excel ﬁle

>>> p.isave_as(array=data, dest_file_name="example.xls")

But the following line is not required because the data source are not ﬁle sources:

>>> # p.free_resources()

Let’s verify it:

>>> p.get_sheet(file_name="example.xls")

pyexcel_sheet1:

+---+---+---+

| 1 | 2 | 3 |

+---+---+---+

| 4 | 5 | 6 |

+---+---+---+

| 7 | 8 | 9 |

+---+---+---+

And here is the code to save it as a csv ﬁle

>>> p.isave_as(array=data,

... dest_file_name="example.csv",

... dest_delimiter=':')

Let’s verify it:

>>> with open("example.csv") as f:

... for line in f.readlines():

... print(line.rstrip())

...

1:2:3

4:5:6

7:8:9

Export a list of dictionaries

16 Chapter 4. Tutorial

pyexcel Documentation, Release 0.6.0

>>> records = [

... {"year": 1903, "country": "Germany", "speed": "206.7km/h"},

... {"year": 1964, "country": "Japan", "speed": "210km/h"},

... {"year": 2008, "country": "China", "speed": "350km/h"}

... ]

>>> p.isave_as(records=records, dest_file_name='high_speed_rail.xls')

Export a dictionary of single key value pair

>>> henley_on_thames_facts = {

... "area": "5.58 square meters",

... "population": "11,619",

... "civial parish": "Henley-on-Thames",

... "latitude": "51.536",

... "longitude": "-0.898"

... }

>>> p.isave_as(adict=henley_on_thames_facts, dest_file_name='henley.xlsx')

Export a dictionary of single dimensonal array

>>> ccs_insights = {

... "year": ["2017", "2018", "2019", "2020", "2021"],

... "smart phones": [1.53, 1.64, 1.74, 1.82, 1.90],

... "feature phones": [0.46, 0.38, 0.30, 0.23, 0.17]

... }

>>> p.isave_as(adict=ccs_insights, dest_file_name='ccs.csv')

>>> p.free_resources()

Export a dictionary of two dimensional array as a book

Suppose you want to save the below dictionary to an excel ﬁle

>>> a_dictionary_of_two_dimensional_arrays = {

... 'Sheet 1':

... [

... [1.0, 2.0, 3.0],

... [4.0, 5.0, 6.0],

... [7.0, 8.0, 9.0]

... ],

... 'Sheet 2':

... [

... ['X', 'Y', 'Z'],

... [1.0, 2.0, 3.0],

... [4.0, 5.0, 6.0]

... ],

... 'Sheet 3':

... [

... ['O', 'P', 'Q'],

... [3.0, 2.0, 1.0],

... [4.0, 3.0, 2.0]

... ]

... }

4.2. Stream APIs for big ﬁle : A set of two liners 17

pyexcel Documentation, Release 0.6.0

Here is the code:

>>> p.isave_book_as(

... bookdict=a_dictionary_of_two_dimensional_arrays,

... dest_file_name="book.xls"

... )

If you want to preserve the order of sheets in your dictionary, you have to pass on an ordered dictionary to the function

itself. For example:

>>> from pyexcel._compact import OrderedDict

>>> data = OrderedDict()

>>> data.update({"Sheet 2": a_dictionary_of_two_dimensional_arrays['Sheet 2']})

>>> data.update({"Sheet 1": a_dictionary_of_two_dimensional_arrays['Sheet 1']})

>>> data.update({"Sheet 3": a_dictionary_of_two_dimensional_arrays['Sheet 3']})

>>> p.isave_book_as(bookdict=data, dest_file_name="book.xls")

>>> p.free_resources()

Let’s verify its order:

>>> import json

>>> book_dict = p.get_book_dict(file_name="book.xls")

>>> for key, item in book_dict.items():

... print(json.dumps({key: item}))

{"Sheet 2": [["X", "Y", "Z"], [1, 2, 3], [4, 5, 6]]}

{"Sheet 1": [[1, 2, 3], [4, 5, 6], [7, 8, 9]]}

{"Sheet 3": [["O", "P", "Q"], [3, 2, 1], [4, 3, 2]]}

Please notice that “Sheet 2” is the ﬁrst item in the book_dict, meaning the order of sheets are preserved.

4.2.3 File format transcoding on one line

Note: Please note that the following ﬁle transcoding could be with zero line. Please install pyexcel-cli and you will

do the transcode in one command. No need to open your editor, save the problem, then python run.

The following code does a simple ﬁle format transcoding from xls to csv:

>>> import pyexcel

>>> p.save_as(file_name="birth.xls", dest_file_name="birth.csv")

Again it is really simple. Let’s verify what we have gotten:

>>> sheet = p.get_sheet(file_name="birth.csv")

>>> sheet

birth.csv:

+-------+--------+----------+

| name | weight | birth |

+-------+--------+----------+

| Adam | 3.4 | 03/02/15 |

+-------+--------+----------+

| Smith | 4.2 | 12/11/14 |

+-------+--------+----------+

Note: Please note that csv(comma separate value) ﬁle is pure text ﬁle. Formula, charts, images and formatting in xls

18 Chapter 4. Tutorial

pyexcel Documentation, Release 0.6.0

ﬁle will disappear no matter which transcoding tool you use. Hence, pyexcel is a quick alternative for this transcoding

job.

Let use previous example and save it as xlsx instead

>>> import pyexcel

>>> p.isave_as(file_name="birth.xls",

... dest_file_name="birth.xlsx") # change the file extension

Again let’s verify what we have gotten:

>>> sheet = p.get_sheet(file_name="birth.xlsx")

>>> sheet

pyexcel_sheet1:

+-------+--------+----------+

| name | weight | birth |

+-------+--------+----------+

| Adam | 3.4 | 03/02/15 |

+-------+--------+----------+

| Smith | 4.2 | 12/11/14 |

+-------+--------+----------+

4.3 Pyexcel-io Plugin guide

There has been a lot of plugins for reading and writing a ﬁle types. Here is a guide for you to choose them.

Table 1: A list of ﬁle formats supported by external plugins

Package name Supported ﬁle formats Dependencies Python versions

pyexcel-io csv, csvz

, tsv, tsvz

2.6, 2.7, 3.3, 3.4, 3.5, 3.6 pypy

pyexcel-xls xls, xlsx(read only), xlsm(read only) xlrd, xlwt same as above

pyexcel-xlsx xlsx openpyxl same as above

pyexcel-ods3 ods pyexcel-ezodf, lxml 2.6, 2.7, 3.3, 3.4 3.5, 3.6

pyexcel-ods ods odfpy same as above

Table 2: Dedicated ﬁle reader and writers

Package name Supported ﬁle formats Dependencies Python versions

pyexcel-xlsxw xlsx(write only) XlsxWriter Python 2 and 3

pyexcel-xlsxr xlsx(read only) lxml same as above

pyexcel-odsr read only for ods, fods lxml same as above

pyexcel-htmlr html(read only) lxml,html5lib same as above

In order to manage the list of plugins installed, you need to use pip to add or remove a plugin. When you use virtualenv,

you can have different plugins per virtual environment. In the situation where you have multiple plugins that does

the same thing in your environment, you need to tell pyexcel which plugin to use per function call. For example,

pyexcel-ods and pyexcel-odsr, and you want to get_array to use pyexcel-odsr. You need to append get_array(. . . ,

library=’pyexcel-odsr’).

zipped csv ﬁle

zipped tsv ﬁle

4.3. Pyexcel-io Plugin guide 19

pyexcel Documentation, Release 0.6.0

4.3.1 Read and write with performance

Partial reading

csv, tsv by pyexcel-io, ods by pyexcel-odsr, html by pyexcel-htmlr are implemented in partial read mode. If you

only need ﬁrst half of the ﬁle, the second half of the data will not be read into the memory if and only if you use

igetters(iget_records, iget_array) and isaveer(isave_as and isave_book_as).

Read on demand

xls by pyexcel-xls promised to read sheet on demand. It means if you need only one sheet from a multi-sheet book,

the rest of the sheets in the book will not be read.

Streaming write

csv, tsv by ‘pyexce-io‘_ can do streaming write.

Write with constant memory

xlsx by pyexcel-xlsxw can write big data with constant memory consumption.

4.4 For web developer

The following libraries are written to facilitate the daily import and export of excel data.

framework plugin/middleware/extension

Flask Flask-Excel

Django django-excel

Pyramid pyramid-excel

And you may make your own by using pyexcel-webio

4.4.1 Read any supported excel and respond its content in json

You can ﬁnd a real world example in examples/memoryﬁle/ directory: pyexcel_server.py. Here is the example snippet

1 def upload():

2 if request.method == 'POST' and 'excel' in request.files:

3 # handle file upload

4 filename = request.files['excel'].filename

5 extension = filename.split(".")[-1]

6 # Obtain the file extension and content

7 # pass a tuple instead of a file name

8 content = request.files['excel'].read()

9 if sys.version_info[0] > 2:

10 # in order to support python 3

11 # have to decode bytes to str

12 content = content.decode('utf-8')

13 sheet = pe.get_sheet(file_type=extension, file_content=content)

(continues on next page)

20 Chapter 4. Tutorial

pyexcel Documentation, Release 0.6.0

(continued from previous page)

14 # then use it as usual

15 sheet.name_columns_by_row(0)

16 # respond with a json

17 return jsonify({"result": sheet.dict})

18 return render_template('upload.html')

request.ﬁles[‘excel’] in line 4 holds the ﬁle object. line 5 ﬁnds out the ﬁle extension. line 13 obtains a sheet instance.

line 15 uses the ﬁrst row as data header. line 17 sends the json representation of the excel ﬁle back to client browser.

4.4.2 Write to memory and respond to download

1 data = [

2 [...],

3 ...

4 ]

6 @app.route('/download')

7 def download():

8 sheet = pe.Sheet(data)

9 output = make_response(sheet.csv)

10 output.headers["Content-Disposition"] = "attachment; filename=export.csv"

11 output.headers["Content-type"] = "text/csv"

12 return output

make_response is a Flask utility to make a memory content as http response.

Note: You can ﬁnd the corresponding source code at examples/memoryﬁle

4.5 Pyexcel data renderers

There exist a few data renderers for pyexcel data. This chapter will walk you through them.

4.5.1 View pyexcel data in ndjson and other formats

With pyexcel-text, you can get pyexcel data in newline delimited json, normal json and other formats.

4.5.2 View the pyexcel data in a browser

You can use pyexcel-handsontable to render your data.

4.5.3 Include excel data in your python documentation

sphinxcontrib-excel help you present your excel data in various formats inside your sphinx documentation.

4.5. Pyexcel data renderers 21

pyexcel Documentation, Release 0.6.0

4.5.4 Draw charts from your excel data

pyexcel-pygal helps you with all charting options and give you charts in svg format.

pyexcel-echarts draws 2D, 3D, geo charts from pyexcel data and has awesome animations too, but it is under develop-

ment.

pyexcel-matplotlib helps you with scentiﬁc charts and is under developmement.

4.5.5 Gantt chart visualization for your excel data

‘pyexcel-gantt‘_ is a specialist renderer for gantt chart.

4.6 Sheet

4.6.1 Random access

To randomly access a cell of Sheet instance, two syntax are available:

sheet[row, column]

or:

sheet['A1']

The former syntax is handy when you know the row and column numbers. The latter syntax is introduced to help you

convert the excel column header such as “AX” to integer numbers.

Suppose you have the following data, you can get value 5 by reader[2, 2].

Here is the example code showing how you can randomly access a cell:

>>> sheet = pyexcel.get_sheet(file_name="example.xls")

>>> sheet.content

+---------+---+---+---+

| Example | X | Y | Z |

+---------+---+---+---+

| a | 1 | 2 | 3 |

+---------+---+---+---+

| b | 4 | 5 | 6 |

+---------+---+---+---+

| c | 7 | 8 | 9 |

+---------+---+---+---+

>>> print(sheet[2, 2])

>>> print(sheet["C3"])

>>> sheet[3, 3] = 10

>>> print(sheet[3, 3])

Note: In order to set a value to a cell, please use sheet[row_index, column_index] = new_value

Random access to rows and columns

22 Chapter 4. Tutorial

pyexcel Documentation, Release 0.6.0

Continue with previous excel ﬁle, you can access row and column separately:

>>> sheet.row[1]

['a', 1, 2, 3]

>>> sheet.column[2]

['Y', 2, 5, 8]

Use custom names instead of index Alternatively, it is possible to use the ﬁrst row to refer to each columns:

>>> sheet.name_columns_by_row(0)

>>> print(sheet[1, "Y"])

>>> sheet[1, "Y"] = 100

>>> print(sheet[1, "Y"])

100

You have noticed the row index has been changed. It is because ﬁrst row is taken as the column names, hence all rows

after the ﬁrst row are shifted. Now accessing the columns are changed too:

>>> sheet.column['Y']

[2, 100, 8]

Hence access the same cell, this statement also works:

>>> sheet.column['Y'][1]

100

Further more, it is possible to use ﬁrst column to refer to each rows:

>>> sheet.name_rows_by_column(0)

To access the same cell, we can use this line:

>>> sheet.row["b"][1]

100

For the same reason, the row index has been reduced by 1. Since we have named columns and rows, it is possible to

access the same cell like this:

>>> print(sheet["b", "Y"])

100

>>> sheet["b", "Y"] = 200

>>> print(sheet["b", "Y"])

200

Play with data

Suppose you have the following data in any of the supported excel formats again:

>>> sheet = pyexcel.get_sheet(file_name="example_series.xls",

... name_columns_by_row=0)

You can get headers:

>>> print(list(sheet.colnames))

['Column 1', 'Column 2', 'Column 3']

You can use a utility function to get all in a dictionary:

4.6. Sheet 23

pyexcel Documentation, Release 0.6.0

>>> sheet.to_dict()

OrderedDict([('Column 1', [1, 4, 7]), ('Column 2', [2, 5, 8]), ('Column 3', [3, 6,

˓→9])])

Maybe you want to get only the data without the column headers. You can call rows() instead:

>>> list(sheet.rows())

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

You can get data from the bottom to the top one by calling rrows():

>>> list(sheet.rrows())

[[7, 8, 9], [4, 5, 6], [1, 2, 3]]

You might want the data arranged vertically. You can call columns():

>>> list(sheet.columns())

[[1, 4, 7], [2, 5, 8], [3, 6, 9]]

You can get columns in reverse sequence as well by calling rcolumns():

>>> list(sheet.rcolumns())

[[3, 6, 9], [2, 5, 8], [1, 4, 7]]

Do you want to ﬂatten the data? You can get the content in one dimensional array. If you are interested in playing with

one dimensional enumeration, you can check out these functions enumerate(), reverse(), vertical(), and

rvertical():

>>> list(sheet.enumerate())

[1, 2, 3, 4, 5, 6, 7, 8, 9]

>>> list(sheet.reverse())

[9, 8, 7, 6, 5, 4, 3, 2, 1]

>>> list(sheet.vertical())

[1, 4, 7, 2, 5, 8, 3, 6, 9]

>>> list(sheet.rvertical())

[9, 6, 3, 8, 5, 2, 7, 4, 1]

attributes

Attributes:

>>> import pyexcel

>>> content = "1,2,3\n3,4,5"

>>> sheet = pyexcel.get_sheet(file_type="csv", file_content=content)

>>> sheet.tsv

'1\t2\t3\r\n3\t4\t5\r\n'

>>> print(sheet.simple)

csv:

- - -

1 2 3

3 4 5

- - -

What’s more, you could as well set value to an attribute, for example::

>>> import pyexcel

>>> content = "1,2,3\n3,4,5"

(continues on next page)

24 Chapter 4. Tutorial

pyexcel Documentation, Release 0.6.0

(continued from previous page)

>>> sheet = pyexcel.Sheet()

>>> sheet.csv = content

>>> sheet.array

[[1, 2, 3], [3, 4, 5]]

You can get the direct access to underneath stream object. In some situation, it is desired:

>>> stream = sheet.stream.tsv

The returned stream object has tsv formatted content for reading.

What you could further do is to set a memory stream of any supported ﬁle format to a sheet. For example:

>>> another_sheet = pyexcel.Sheet()

>>> another_sheet.xls = sheet.xls

>>> another_sheet.content

+---+---+---+

| 1 | 2 | 3 |

+---+---+---+

| 3 | 4 | 5 |

+---+---+---+

Yet, it is possible assign a absolute url to an online excel ﬁle to an instance of pyexcel.Sheet.

custom attributes

You can pass on source speciﬁc parameters to getter and setter functions.

>>> content = "1-2-3\n3-4-5"

>>> sheet = pyexcel.Sheet()

>>> sheet.set_csv(content, delimiter="-")

>>> sheet.csv

'1,2,3\r\n3,4,5\r\n'

>>> sheet.get_csv(delimiter="|")

'1|2|3\r\n3|4|5\r\n'

4.6.2 Data manipulation

The data in a sheet is represented by Sheet which maintains the data as a list of lists. You can regard Sheet as a

two dimensional array with additional iterators. Random access to individual column and row is exposed by Column

and Row

Column manipulation

Suppose have one data ﬁle as the following:

>>> sheet = pyexcel.get_sheet(file_name="example.xls", name_columns_by_row=0)

>>> sheet

pyexcel sheet:

+----------+----------+----------+

| Column 1 | Column 2 | Column 3 |

+==========+==========+==========+

| 1 | 4 | 7 |

+----------+----------+----------+

(continues on next page)

4.6. Sheet 25

pyexcel Documentation, Release 0.6.0

(continued from previous page)

| 2 | 5 | 8 |

+----------+----------+----------+

| 3 | 6 | 9 |

+----------+----------+----------+

And you want to update Column 2 with these data: [11, 12, 13]

>>> sheet.column["Column 2"] = [11, 12, 13]

>>> sheet.column[1]

[11, 12, 13]

>>> sheet

pyexcel sheet:

+----------+----------+----------+

| Column 1 | Column 2 | Column 3 |

+==========+==========+==========+

| 1 | 11 | 7 |

+----------+----------+----------+

| 2 | 12 | 8 |

+----------+----------+----------+

| 3 | 13 | 9 |

+----------+----------+----------+

Remove one column of a data ﬁle

If you want to remove Column 2, you can just call:

>>> del sheet.column["Column 2"]

>>> sheet.column["Column 3"]

[7, 8, 9]

The sheet content will become:

>>> sheet

pyexcel sheet:

+----------+----------+

| Column 1 | Column 3 |

+==========+==========+

| 1 | 7 |

+----------+----------+

| 2 | 8 |

+----------+----------+

| 3 | 9 |

+----------+----------+

Append more columns to a data ﬁle

Continue from previous example. Suppose you want add two more columns to the data ﬁle

Column 4 Column 5

10 13

11 14

12 15

26 Chapter 4. Tutorial

pyexcel Documentation, Release 0.6.0

Here is the example code to append two extra columns:

>>> extra_data = [

... ["Column 4", "Column 5"],

... [10, 13],

... [11, 14],

... [12, 15]

... ]

>>> sheet2 = pyexcel.Sheet(extra_data)

>>> sheet.column += sheet2

>>> sheet.column["Column 4"]

[10, 11, 12]

>>> sheet.column["Column 5"]

[13, 14, 15]

Here is what you will get:

>>> sheet

pyexcel sheet:

+----------+----------+----------+----------+

+==========+==========+==========+==========+

| 1 | 7 | 10 | 13 |

+----------+----------+----------+----------+

| 2 | 8 | 11 | 14 |

+----------+----------+----------+----------+

| 3 | 9 | 12 | 15 |

+----------+----------+----------+----------+

Cherry pick some columns to be removed

Suppose you have the following data:

>>> data = [

... ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'],

... [1,2,3,4,5,6,7,9],

... ]

>>> sheet = pyexcel.Sheet(data, name_columns_by_row=0)

>>> sheet

pyexcel sheet:

+---+---+---+---+---+---+---+---+

| a | b | c | d | e | f | g | h |

+===+===+===+===+===+===+===+===+

| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 9 |

+---+---+---+---+---+---+---+---+

And you want to remove columns named as: ‘a’, ‘c, ‘e’, ‘h’. This is how you do it:

>>> del sheet.column['a', 'c', 'e', 'h']

>>> sheet

pyexcel sheet:

+---+---+---+---+

| b | d | f | g |

+===+===+===+===+

| 2 | 4 | 6 | 7 |

+---+---+---+---+

4.6. Sheet 27

pyexcel Documentation, Release 0.6.0

What if the headers are in a different row

Suppose you have the following data:

>>> sheet

pyexcel sheet:

+----------+----------+----------+

| 1 | 2 | 3 |

+----------+----------+----------+

| Column 1 | Column 2 | Column 3 |

+----------+----------+----------+

| 4 | 5 | 6 |

+----------+----------+----------+

The way to name your columns is to use index 1:

>>> sheet.name_columns_by_row(1)

Here is what you get:

>>> sheet

pyexcel sheet:

+----------+----------+----------+

| Column 1 | Column 2 | Column 3 |

+==========+==========+==========+

| 1 | 2 | 3 |

+----------+----------+----------+

| 4 | 5 | 6 |

+----------+----------+----------+

Row manipulation

Suppose you have the following data:

>>> sheet

pyexcel sheet:

+---+---+---+-------+

| a | b | c | Row 1 |

+---+---+---+-------+

| e | f | g | Row 2 |

+---+---+---+-------+

| 1 | 2 | 3 | Row 3 |

+---+---+---+-------+

You can name your rows by column index at 3:

>>> sheet.name_rows_by_column(3)

>>> sheet

pyexcel sheet:

+-------+---+---+---+

| Row 1 | a | b | c |

+-------+---+---+---+

| Row 2 | e | f | g |

+-------+---+---+---+

| Row 3 | 1 | 2 | 3 |

+-------+---+---+---+

28 Chapter 4. Tutorial

pyexcel Documentation, Release 0.6.0

Then you can access rows by its name:

>>> sheet.row["Row 1"]

['a', 'b', 'c']

4.6.3 Formatting

Previous section has assumed the data is in the format that you want. In reality, you have to manipulate the data types

a bit to suit your needs. Hence, formatters comes into the scene. use format() to apply formatter immediately.

Note: int, ﬂoat and datetime values are automatically detected in csv ﬁles since pyexcel version 0.2.2

Convert a column of numbers to strings

Suppose you have the following data:

>>> import pyexcel

>>> data = [

... ["userid","name"],

... [10120,"Adam"],

... [10121,"Bella"],

... [10122,"Cedar"]

... ]

>>> sheet = pyexcel.Sheet(data)

>>> sheet.name_columns_by_row(0)

>>> sheet.column["userid"]

[10120, 10121, 10122]

As you can see, userid column is of int type. Next, let’s convert the column to string format:

>>> sheet.column.format("userid", str)

>>> sheet.column["userid"]

['10120', '10121', '10122']

Cleanse the cells in a spread sheet

Sometimes, the data in a spreadsheet may have unwanted strings in all or some cells. Let’s take an example. Suppose

we have a spread sheet that contains all strings but it as random spaces before and after the text values. Some ﬁeld had

weird characters, such as “  ”:

>>> data = [

... [" Version", " Comments", " Author  "],

... [" v0.0.1 ", " Release versions","  Eda"],

... ["  v0.0.2 ", "Useful updates    ", "  Freud"]

... ]

>>> sheet = pyexcel.Sheet(data)

>>> sheet.content

+-----------------+------------------------------+----------------------+

| Version | Comments | Author   |

+-----------------+------------------------------+----------------------+

| v0.0.1 | Release versions |  Eda |

(continues on next page)

4.6. Sheet 29

pyexcel Documentation, Release 0.6.0

(continued from previous page)

+-----------------+------------------------------+----------------------+

|   v0.0.2 | Useful updates     |  Freud |

+-----------------+------------------------------+----------------------+

Now try to create a custom cleanse function:

.. code-block:: python

>>> def cleanse_func(v):

... v = v.replace(" ", "")

... v = v.rstrip().strip()

... return v

...

Then let’s create a SheetFormatter and apply it:

.. code-block:: python

>>> sheet.map(cleanse_func)

So in the end, you get this:

>>> sheet.content

+---------+------------------+--------+

| Version | Comments | Author |

+---------+------------------+--------+

| v0.0.1 | Release versions | Eda |

+---------+------------------+--------+

| v0.0.2 | Useful updates | Freud |

+---------+------------------+--------+

4.6.4 Data ﬁltering

use filter() function to apply a ﬁlter immediately. The content is modiﬁed.

Suppose you have the following data in any of the supported excel formats:

Column 1 Column 2 Column 3

1 4 7

2 5 8

3 6 9

>>> import pyexcel

>>> sheet = pyexcel.get_sheet(file_name="example_series.xls", name_columns_by_row=0)

>>> sheet.content

+----------+----------+----------+

| Column 1 | Column 2 | Column 3 |

+==========+==========+==========+

| 1 | 2 | 3 |

+----------+----------+----------+

| 4 | 5 | 6 |

(continues on next page)

30 Chapter 4. Tutorial

pyexcel Documentation, Release 0.6.0

(continued from previous page)

+----------+----------+----------+

| 7 | 8 | 9 |

+----------+----------+----------+

Filter out some data

You may want to ﬁlter odd rows and print them in an array of dictionaries:

>>> sheet.filter(row_indices=[0, 2])

>>> sheet.content

+----------+----------+----------+

| Column 1 | Column 2 | Column 3 |

+==========+==========+==========+

| 4 | 5 | 6 |

+----------+----------+----------+

Let’s try to further ﬁlter out even columns:

>>> sheet.filter(column_indices=[1])

>>> sheet.content

+----------+----------+

| Column 1 | Column 3 |

+==========+==========+

| 4 | 6 |

+----------+----------+

Save the data

Let’s save the previous ﬁltered data:

>>> sheet.save_as("example_series_filter.xls")

When you open example_series_ﬁlter.xls, you will ﬁnd these data

Column 1 Column 3

2 8

How to ﬁlter out empty rows in my sheet?

Suppose you have the following data in a sheet and you want to remove those rows with blanks:

>>> import pyexcel as pe

>>> sheet = pe.Sheet([[1,2,3],['','',''],['','',''],[1,2,3]])

You can use pyexcel.filters.RowValueFilter, which examines each row, return True if the row should be

ﬁltered out. So, let’s deﬁne a ﬁlter function:

>>> def filter_row(row_index, row):

... result = [element for element in row if element != '']

... return len(result)==0

And then apply the ﬁlter on the sheet:

4.6. Sheet 31

pyexcel Documentation, Release 0.6.0

>>> del sheet.row[filter_row]

>>> sheet

pyexcel sheet:

+---+---+---+

| 1 | 2 | 3 |

+---+---+---+

| 1 | 2 | 3 |

+---+---+---+

4.7 Book

You access each cell via this syntax:

book[sheet_index][row, column]

or:

book["sheet_name"][row, column]

Suppose you have the following sheets:

And you can randomly access a cell in a sheet:

>>> book = pyexcel.get_book(file_name="example.xls")

>>> print(book["Sheet 1"][0,0])

>>> print(book[0][0,0]) # the same cell

Tip: With pyexcel, you can regard single sheet reader as an two dimensional array and multi-sheet excel book reader

as a ordered dictionary of two dimensional arrays.

Write multiple sheet excel ﬁle

Suppose you have previous data as a dictionary and you want to save it as multiple sheet excel ﬁle:

>>> content = {

... 'Sheet 1':

... [

... [1.0, 2.0, 3.0],

... [4.0, 5.0, 6.0],

... [7.0, 8.0, 9.0]

... ],

... 'Sheet 2':

... [

... ['X', 'Y', 'Z'],

... [1.0, 2.0, 3.0],

... [4.0, 5.0, 6.0]

... ],

... 'Sheet 3':

... [

... ['O', 'P', 'Q'],

... [3.0, 2.0, 1.0],

(continues on next page)

32 Chapter 4. Tutorial

pyexcel Documentation, Release 0.6.0

(continued from previous page)

... [4.0, 3.0, 2.0]

... ]

... }

>>> book = pyexcel.get_book(bookdict=content)

>>> book.save_as("output.xls")

You shall get a xls ﬁle

Read multiple sheet excel ﬁle

Let’s read the previous ﬁle back:

>>> book = pyexcel.get_book(file_name="output.xls")

>>> sheets = book.to_dict()

>>> for name in sheets.keys():

... print(name)

Sheet 1

Sheet 2

Sheet 3

4.7.1 Get content

>>> book_dict = {

... 'Sheet 2':

... [

... ['X', 'Y', 'Z'],

... [1.0, 2.0, 3.0],

... [4.0, 5.0, 6.0]

... ],

... 'Sheet 3':

... [

... ['O', 'P', 'Q'],

... [3.0, 2.0, 1.0],

... [4.0, 3.0, 2.0]

... ],

... 'Sheet 1':

... [

... [1.0, 2.0, 3.0],

... [4.0, 5.0, 6.0],

... [7.0, 8.0, 9.0]

... ]

... }

>>> book = pyexcel.get_book(bookdict=book_dict)

>>> book

Sheet 1:

+-----+-----+-----+

| 1.0 | 2.0 | 3.0 |

+-----+-----+-----+

| 4.0 | 5.0 | 6.0 |

+-----+-----+-----+

| 7.0 | 8.0 | 9.0 |

+-----+-----+-----+

Sheet 2:

+-----+-----+-----+

| X | Y | Z |

(continues on next page)

4.7. Book 33

pyexcel Documentation, Release 0.6.0

(continued from previous page)

+-----+-----+-----+

| 1.0 | 2.0 | 3.0 |

+-----+-----+-----+

| 4.0 | 5.0 | 6.0 |

+-----+-----+-----+

Sheet 3:

+-----+-----+-----+

| O | P | Q |

+-----+-----+-----+

| 3.0 | 2.0 | 1.0 |

+-----+-----+-----+

| 4.0 | 3.0 | 2.0 |

+-----+-----+-----+

>>> print(book.rst)

Sheet 1:

= = =

1 2 3

4 5 6

7 8 9

= = =

Sheet 2:

=== === ===

X Y Z

1.0 2.0 3.0

4.0 5.0 6.0

=== === ===

Sheet 3:

=== === ===

O P Q

3.0 2.0 1.0

4.0 3.0 2.0

=== === ===

You can get the direct access to underneath stream object. In some situation, it is desired.

>>> stream = book.stream.plain

The returned stream object has the content formatted in plain format for further reading.

4.7.2 Set content

Surely, you could set content to an instance of pyexcel.Book.

>>> other_book = pyexcel.Book()

>>> other_book.bookdict = book_dict

>>> print(other_book.plain)

Sheet 1:

1 2 3

4 5 6

7 8 9

Sheet 2:

X Y Z

1.0 2.0 3.0

4.0 5.0 6.0

Sheet 3:

(continues on next page)

34 Chapter 4. Tutorial

pyexcel Documentation, Release 0.6.0

(continued from previous page)

O P Q

3.0 2.0 1.0

4.0 3.0 2.0

You can set via ‘xls’ attribute too.

>>> another_book = pyexcel.Book()

>>> another_book.xls = other_book.xls

>>> print(another_book.mediawiki)

Sheet 1:

{| class="wikitable" style="text-align: left;"

|+

| align="right"| 1 || align="right"| 2 || align="right"| 3

| align="right"| 4 || align="right"| 5 || align="right"| 6

| align="right"| 7 || align="right"| 8 || align="right"| 9

Sheet 2:

{| class="wikitable" style="text-align: left;"

|+

| X || Y || Z

| 1 || 2 || 3

| 4 || 5 || 6

Sheet 3:

{| class="wikitable" style="text-align: left;"

|+

| O || P || Q

| 3 || 2 || 1

| 4 || 3 || 2

Access to individual sheets

You can access individual sheet of a book via attribute:

>>> book = pyexcel.get_book(file_name="book.xls")

>>> book.sheet3

sheet3:

+---+---+---+

| 1 | 2 | 3 |

+---+---+---+

| 4 | 5 | 6 |

+---+---+---+

| 7 | 8 | 9 |

+---+---+---+

or via array notations:

4.7. Book 35

pyexcel Documentation, Release 0.6.0

>>> book["sheet 1"] # there is a space in the sheet name

sheet 1:

+---+---+---+

| 1 | 2 | 3 |

+---+---+---+

| 1 | 2 | 3 |

+---+---+---+

| 4 | 5 | 6 |

+---+---+---+

Merge excel books

Suppose you have two excel books and each had three sheets. You can merge them and get a new book:

You also can merge individual sheets:

>>> book1 = pyexcel.get_book(file_name="book1.xls")

>>> book2 = pyexcel.get_book(file_name="book2.xlsx")

>>> merged_book = book1 + book2

>>> merged_book = book1["Sheet 1"] + book2["Sheet 2"]

>>> merged_book = book1["Sheet 1"] + book2

>>> merged_book = book1 + book2["Sheet 2"]

Manipulate individual sheets

4.7.3 merge sheets into a single sheet

Suppose you want to merge many csv ﬁles row by row into a new sheet.

>>> import glob

>>> merged = pyexcel.Sheet()

>>> for file in glob.glob("

.csv"):

... merged.row += pyexcel.get_sheet(file_name=file)

>>> merged.save_as("merged.csv")

How do I read a book, process it and save to a new book

Yes, you can do that. The code looks like this:

import pyexcel

book = pyexcel.get_book(file_name="yourfile.xls")

for sheet in book:

# do you processing with sheet

# do filtering?

pass

book.save_as("output.xls")

What would happen if I save a multi sheet book into “csv” ﬁle

Well, you will get one csv ﬁle per each sheet. Suppose you have these code:

36 Chapter 4. Tutorial

pyexcel Documentation, Release 0.6.0

>>> content = {

... 'Sheet 1':

... [

... [1.0, 2.0, 3.0],

... [4.0, 5.0, 6.0],

... [7.0, 8.0, 9.0]

... ],

... 'Sheet 2':

... [

... ['X', 'Y', 'Z'],

... [1.0, 2.0, 3.0],

... [4.0, 5.0, 6.0]

... ],

... 'Sheet 3':

... [

... ['O', 'P', 'Q'],

... [3.0, 2.0, 1.0],

... [4.0, 3.0, 2.0]

... ]

... }

>>> book = pyexcel.Book(content)

>>> book.save_as("myfile.csv")

You will end up with three csv ﬁles:

>>> import glob

>>> outputfiles = glob.glob("myfile_

.csv")

>>> for file in sorted(outputfiles):

... print(file)

...

myfile__Sheet 1__0.csv

myfile__Sheet 2__1.csv

myfile__Sheet 3__2.csv

and their content is the value of the dictionary at the corresponding key

Alternatively, you could use save_book_as() function

>>> pyexcel.save_book_as(bookdict=content, dest_file_name="myfile.csv")

After I have saved my multiple sheet book in csv format, how do I get them back

First of all, you can read them back individual as csv ﬁle using meth:~pyexcel.get_sheet method. Secondly, the pyexcel

can do the magic to load all of them back into a book. You will just need to provide the common name before the

separator “__”:

>>> book2 = pyexcel.get_book(file_name="myfile.csv")

>>> book2

Sheet 1:

+-----+-----+-----+

| 1.0 | 2.0 | 3.0 |

+-----+-----+-----+

| 4.0 | 5.0 | 6.0 |

+-----+-----+-----+

| 7.0 | 8.0 | 9.0 |

+-----+-----+-----+

(continues on next page)

4.7. Book 37

pyexcel Documentation, Release 0.6.0

(continued from previous page)

Sheet 2:

+-----+-----+-----+

| X | Y | Z |

+-----+-----+-----+

| 1.0 | 2.0 | 3.0 |

+-----+-----+-----+

| 4.0 | 5.0 | 6.0 |

+-----+-----+-----+

Sheet 3:

+-----+-----+-----+

| O | P | Q |

+-----+-----+-----+

| 3.0 | 2.0 | 1.0 |

+-----+-----+-----+

| 4.0 | 3.0 | 2.0 |

+-----+-----+-----+

4.8 Working with databases

4.8.1 How to import an excel sheet to a database using SQLAlchemy

Note: You can ﬁnd the complete code of this example in examples folder on github

Before going ahead, let’s import the needed components and initialize sql engine and table base:

>>> import os

>>> import pyexcel as p

>>> from sqlalchemy import create_engine

>>> from sqlalchemy.ext.declarative import declarative_base

>>> from sqlalchemy import Column , Integer, String, Float, Date

>>> from sqlalchemy.orm import sessionmaker

>>> engine = create_engine("sqlite:///birth.db")

>>> Base = declarative_base()

>>> Session = sessionmaker(bind=engine)

Let’s suppose we have the following database model:

>>> class BirthRegister(Base):

... __tablename__='birth'

... id=Column(Integer, primary_key=True)

... name=Column(String)

... weight=Column(Float)

... birth=Column(Date)

Let’s create the table:

>>> Base.metadata.create_all(engine)

Now here is a sample excel ﬁle to be saved to the table:

Here is the code to import it:

38 Chapter 4. Tutorial

pyexcel Documentation, Release 0.6.0

>>> session = Session() # obtain a sql session

>>> p.save_as(file_name="birth.xls", name_columns_by_row=0, dest_session=session,

˓→dest_table=BirthRegister)

Done it. It is that simple. Let’s verify what has been imported to make sure.

>>> sheet = p.get_sheet(session=session, table=BirthRegister)

>>> sheet

birth:

+------------+----+-------+--------+

+------------+----+-------+--------+

| 2015-02-03 | 1 | Adam | 3.4 |

+------------+----+-------+--------+

| 2014-11-12 | 2 | Smith | 4.2 |

+------------+----+-------+--------+

4.8. Working with databases 39

pyexcel Documentation, Release 0.6.0

40 Chapter 4. Tutorial

CHAPTER 5

Cook book

5.1 Recipes

Warning: The pyexcel DOES NOT consider Fonts, Styles and Charts at all. In the resulting excel ﬁles, fonts,

styles and charts will not be transferred.

These recipes give a one-stop utility functions for known use cases. Similar functionality can be achieved using other

application interfaces.

5.1.1 Update one column of a data ﬁle

Suppose you have one data ﬁle as the following:

example.xls

Column 1 Column 2 Column 3

1 4 7

2 5 8

3 6 9

And you want to update Column 2 with these data: [11, 12, 13]

Here is the code:

>>> from pyexcel.cookbook import update_columns

>>> custom_column = {"Column 2":[11, 12, 13]}

>>> update_columns("example.xls", custom_column, "output.xls")

Your output.xls will have these data:

pyexcel Documentation, Release 0.6.0

Column 1 Column 2 Column 3

1 11 7

2 12 8

3 13 9

5.1.2 Update one row of a data ﬁle

Suppose you have the same data ﬁle:

example.xls

Row 1 1 2 3

Row 2 4 5 6

Row 3 7 8 9

And you want to update the second row with these data: [7, 4, 1]

Here is the code:

>>> from pyexcel.cookbook import update_rows

>>> custom_row = {"Row 1":[11, 12, 13]}

>>> update_rows("example.xls", custom_row, "output.xls")

Your output.xls will have these data:

Column 1 Column 2 Column 3

7 4 1

2 5 8

3 6 9

5.1.3 Merge two ﬁles into one

Suppose you want to merge the following two data ﬁles:

example.csv

Column 1 Column 2 Column 3

1 4 7

2 5 8

3 6 9

example.xls

Column 4 Column 5

10 12

11 13

The following code will merge the tow into one ﬁle, say “output.xls”:

>>> from pyexcel.cookbook import merge_two_files

>>> merge_two_files("example.csv", "example.xls", "output.xls")

42 Chapter 5. Cook book

pyexcel Documentation, Release 0.6.0

The output.xls would have the following data:

Column 1 Column 2 Column 3 Column 4 Column 5

1 4 7 10 12

2 5 8 11 13

3 6 9

5.1.4 Select candidate columns of two ﬁles and form a new one

Suppose you have these two ﬁles:

example.ods

Column 1 Column 2 Column 3 Column 4 Column 5

1 4 7 10 13

2 5 8 11 14

3 6 9 12 15

example.xls

Column 6 Column 7 Column 8 Column 9 Column 10

16 17 18 19 20

>>> data = [

... ["Column 1", "Column 2", "Column 3", "Column 4", "Column 5"],

... [1, 4, 7, 10, 13],

... [2, 5, 8, 11, 14],

... [3, 6, 9, 12, 15]

... ]

>>> s = pyexcel.Sheet(data)

>>> s.save_as("example.csv")

>>> data = [

... ["Column 6", "Column 7", "Column 8", "Column 9", "Column 10"],

... [16, 17, 18, 19, 20]

... ]

>>> s = pyexcel.Sheet(data)

>>> s.save_as("example.xls")

And you want to ﬁlter out column 2 and 4 from example.ods, ﬁlter out column 6 and 7 and merge them:

Column 1 Column 3 Column 5 Column 8 Column 9 Column 10

1 7 13 18 19 20

2 8 14

3 9 15

The following code will do the job:

>>> from pyexcel.cookbook import merge_two_readers

>>> sheet1 = pyexcel.get_sheet(file_name="example.csv", name_columns_by_row=0)

>>> sheet2 = pyexcel.get_sheet(file_name="example.xls", name_columns_by_row=0)

>>> del sheet1.column[1, 3, 5]

>>> del sheet2.column[0, 1]

>>> merge_two_readers(sheet1, sheet2, "output.xls")

5.1. Recipes 43

pyexcel Documentation, Release 0.6.0

5.1.5 Merge two ﬁles into a book where each ﬁle become a sheet

Suppose you want to merge the following two data ﬁles:

example.csv

Column 1 Column 2 Column 3

1 4 7

2 5 8

3 6 9

example.xls

Column 4 Column 5

10 12

11 13

>>> data = [

... ["Column 1", "Column 2", "Column 3"],

... [1, 2, 3],

... [4, 5, 6],

... [7, 8, 9]

... ]

>>> s = pyexcel.Sheet(data)

>>> s.save_as("example.csv")

>>> data = [

... ["Column 4", "Column 5"],

... [10, 12],

... [11, 13]

... ]

>>> s = pyexcel.Sheet(data)

>>> s.save_as("example.xls")

The following code will merge the tow into one ﬁle, say “output.xls”:

>>> from pyexcel.cookbook import merge_all_to_a_book

>>> merge_all_to_a_book(["example.csv", "example.xls"], "output.xls")

The output.xls would have the following data:

example.csv as sheet name and inside the sheet, you have:

Column 1 Column 2 Column 3

1 4 7

2 5 8

3 6 9

example.ods as sheet name and inside the sheet, you have:

Column 4 Column 5

10 12

11 13

44 Chapter 5. Cook book

pyexcel Documentation, Release 0.6.0

5.2 Loading from other sources

5.2.1 Get back into pyexcel

list

>>> import pyexcel as p

>>> two_dimensional_list = [

... [1, 2, 3, 4],

... [5, 6, 7, 8],

... [9, 10, 11, 12],

... ]

>>> sheet = p.get_sheet(array=two_dimensional_list)

>>> sheet

pyexcel_sheet1:

+---+----+----+----+

| 1 | 2 | 3 | 4 |

+---+----+----+----+

| 5 | 6 | 7 | 8 |

+---+----+----+----+

| 9 | 10 | 11 | 12 |

+---+----+----+----+

dict

>>> a_dictionary_of_key_value_pair = {

... "IE": 0.2,

... "Firefox": 0.3

... }

>>> sheet = p.get_sheet(adict=a_dictionary_of_key_value_pair)

>>> sheet

pyexcel_sheet1:

+---------+-----+

| Firefox | IE |

+---------+-----+

| 0.3 | 0.2 |

+---------+-----+

>>> a_dictionary_of_one_dimensional_arrays = {

... "Column 1": [1, 2, 3, 4],

... "Column 2": [5, 6, 7, 8],

... "Column 3": [9, 10, 11, 12],

... }

>>> sheet = p.get_sheet(adict=a_dictionary_of_one_dimensional_arrays)

>>> sheet

pyexcel_sheet1:

+----------+----------+----------+

| Column 1 | Column 2 | Column 3 |

+----------+----------+----------+

| 1 | 5 | 9 |

+----------+----------+----------+

| 2 | 6 | 10 |

+----------+----------+----------+

| 3 | 7 | 11 |

(continues on next page)

5.2. Loading from other sources 45

pyexcel Documentation, Release 0.6.0

(continued from previous page)

+----------+----------+----------+

| 4 | 8 | 12 |

+----------+----------+----------+

records

>>> a_list_of_dictionaries = [

... {

... "Name": 'Adam',

... "Age": 28

... },

... {

... "Name": 'Beatrice',

... "Age": 29

... },

... {

... "Name": 'Ceri',

... "Age": 30

... },

... {

... "Name": 'Dean',

... "Age": 26

... }

... ]

>>> sheet = p.get_sheet(records=a_list_of_dictionaries)

>>> sheet

pyexcel_sheet1:

+-----+----------+

| Age | Name |

+-----+----------+

| 28 | Adam |

+-----+----------+

| 29 | Beatrice |

+-----+----------+

| 30 | Ceri |

+-----+----------+

| 26 | Dean |

+-----+----------+

book dict

>>> a_dictionary_of_two_dimensional_arrays = {

... 'Sheet 1':

... [

... [1.0, 2.0, 3.0],

... [4.0, 5.0, 6.0],

... [7.0, 8.0, 9.0]

... ],

... 'Sheet 2':

... [

... ['X', 'Y', 'Z'],

... [1.0, 2.0, 3.0],

... [4.0, 5.0, 6.0]

(continues on next page)

46 Chapter 5. Cook book

pyexcel Documentation, Release 0.6.0

(continued from previous page)

... ],

... 'Sheet 3':

... [

... ['O', 'P', 'Q'],

... [3.0, 2.0, 1.0],

... [4.0, 3.0, 2.0]

... ]

... }

>>> book = p.get_book(bookdict=a_dictionary_of_two_dimensional_arrays)

>>> book

Sheet 1:

+-----+-----+-----+

| 1.0 | 2.0 | 3.0 |

+-----+-----+-----+

| 4.0 | 5.0 | 6.0 |

+-----+-----+-----+

| 7.0 | 8.0 | 9.0 |

+-----+-----+-----+

Sheet 2:

+-----+-----+-----+

| X | Y | Z |

+-----+-----+-----+

| 1.0 | 2.0 | 3.0 |

+-----+-----+-----+

| 4.0 | 5.0 | 6.0 |

+-----+-----+-----+

Sheet 3:

+-----+-----+-----+

| O | P | Q |

+-----+-----+-----+

| 3.0 | 2.0 | 1.0 |

+-----+-----+-----+

| 4.0 | 3.0 | 2.0 |

+-----+-----+-----+

How to load a sheet from a url

Suppose you have excel ﬁle somewhere hosted:

>>> sheet = pe.get_sheet(url='http://yourdomain.com/test.csv')

>>> sheet

csv:

+---+---+---+

| 1 | 2 | 3 |

+---+---+---+

For sheet

Get content

>>> another_sheet = p.Sheet()

>>> another_sheet.url = "https://github.com/pyexcel/pyexcel/raw/master/examples/

˓→basics/multiple-sheets-example.xls"

(continues on next page)

5.2. Loading from other sources 47

pyexcel Documentation, Release 0.6.0

(continued from previous page)

>>> another_sheet.content

+---+---+---+

| 1 | 2 | 3 |

+---+---+---+

| 4 | 5 | 6 |

+---+---+---+

| 7 | 8 | 9 |

+---+---+---+

For book

How about setting content via a url?

>>> another_book = p.Book()

>>> another_book.url = "https://github.com/pyexcel/pyexcel/raw/master/examples/basics/

˓→multiple-sheets-example.xls"

>>> another_book

Sheet 1:

+---+---+---+

| 1 | 2 | 3 |

+---+---+---+

| 4 | 5 | 6 |

+---+---+---+

| 7 | 8 | 9 |

+---+---+---+

Sheet 2:

+---+---+---+

| X | Y | Z |

+---+---+---+

| 1 | 2 | 3 |

+---+---+---+

| 4 | 5 | 6 |

+---+---+---+

Sheet 3:

+---+---+---+

| O | P | Q |

+---+---+---+

| 3 | 2 | 1 |

+---+---+---+

| 4 | 3 | 2 |

+---+---+---+

48 Chapter 5. Cook book

CHAPTER 6

Real world cases

6.1 Questions and Answers

1. Python ﬂask writing to a csv ﬁle and reading it

2. PyQt: Import .xls ﬁle and populate QTableWidget?

3. How do I write data to csv ﬁle in columns and rows from a list in python?

4. How to write dictionary values to a csv ﬁle using Python

5. Python convert csv to xlsx

6. How to read data from excel and set data type

7. Remove or keep speciﬁc columns in csv ﬁle

8. How can I put a CSV ﬁle in an array?

6.2 How to inject csv data to database

Here is real case in the stack-overﬂow. Due to the author’s ignorance, the user would like to have the code in matlab

than Python. Hence, I am sharing my pyexcel solution here.

6.2.1 Problem deﬁnition

Here is my CSV ﬁle:

PDB_Id 123442 234335 234336 3549867

a001 6 0 0 8

b001 4 2 0 0

c003 0 0 0 5

I want to put this data in a MYSQL table in the form:

pyexcel Documentation, Release 0.6.0

PROTEIN_ID PROTEIN_KEY VALUE_OF_KEY

a001 123442 6

a001 234335 0

a001 234336 0

a001 3549867 8

b001 123442 4

b001 234335 2

b001 234336 0

c003 123442 0

c003 234335 0

c003 234336 0

c003 3549867 5

I have created table with the following code:

sql = """CREATE TABLE ALLPROTEINS (

Protein_ID CHAR(20),

PROTEIN_KEY INT ,

VALUE_OF_KEY INT

)"""

I need the code for insert.

6.2.2 Pyexcel solution

If you could insert an id ﬁeld to act as the primary key, it can be mapped using sqlalchemy’s ORM:

$ sqlite3 /tmp/stack2.db

sqlite> CREATE TABLE ALLPROTEINS (

...> ID INT,

...> Protein_ID CHAR(20),

...> PROTEIN_KEY INT,

...> VALUE_OF_KEY INT

...> );

Here is the data mapping script vis sqlalchemy:

>>> # mapping your database via sqlalchemy

>>> from sqlalchemy import create_engine

>>> from sqlalchemy.ext.declarative import declarative_base

>>> from sqlalchemy import Column, Integer, String

>>> from sqlalchemy.orm import sessionmaker

>>> # checkout http://docs.sqlalchemy.org/en/latest/dialects/index.html

>>> # for a different database server

>>> engine = create_engine("sqlite:////tmp/stack2.db")

>>> Base = declarative_base()

>>> class Proteins(Base):

... __tablename__ = 'ALLPROTEINS'

... id = Column(Integer, primary_key=True, autoincrement=True) # <-- appended

˓→field

... protein_id = Column(String(20))

... protein_key = Column(Integer)

... value_of_key = Column(Integer)

>>> Session = sessionmaker(bind=engine)

>>>

50 Chapter 6. Real world cases

pyexcel Documentation, Release 0.6.0

Here is the short script to get data inserted into the database:

>>> import pyexcel as p

>>> from itertools import product

>>> # data insertion code starts here

>>> sheet = p.get_sheet(file_name="csv-to-mysql-in-matlab-code.csv", delimiter='\t')

>>> sheet.name_columns_by_row(0)

>>> sheet.name_rows_by_column(0)

>>> print(sheet)

csv-to-mysql-in-matlab-code.csv:

+------+--------+--------+--------+---------+

| | 123442 | 234335 | 234336 | 3549867 |

+======+========+========+========+=========+

| a001 | 6 | 0 | 0 | 8 |

+------+--------+--------+--------+---------+

| b001 | 4 | 2 | 0 | 0 |

+------+--------+--------+--------+---------+

| c003 | 0 | 0 | 0 | 5 |

+------+--------+--------+--------+---------+

>>> results = []

>>> for protein_id, protein_key in product(sheet.rownames, sheet.colnames):

... results.append([protein_id, protein_key, sheet[str(protein_id), protein_key]])

>>>

>>> sheet2 = p.get_sheet(array=results)

>>> sheet2.colnames = ['protein_id', 'protein_key', 'value_of_key']

>>> print(sheet2)

pyexcel_sheet1:

+------------+-------------+--------------+

| protein_id | protein_key | value_of_key |

+============+=============+==============+

| a001 | 123442 | 6 |

+------------+-------------+--------------+

| a001 | 234335 | 0 |

+------------+-------------+--------------+

| a001 | 234336 | 0 |

+------------+-------------+--------------+

| a001 | 3549867 | 8 |

+------------+-------------+--------------+

| b001 | 123442 | 4 |

+------------+-------------+--------------+

| b001 | 234335 | 2 |

+------------+-------------+--------------+

| b001 | 234336 | 0 |

+------------+-------------+--------------+

| b001 | 3549867 | 0 |

+------------+-------------+--------------+

| c003 | 123442 | 0 |

+------------+-------------+--------------+

| c003 | 234335 | 0 |

+------------+-------------+--------------+

| c003 | 234336 | 0 |

+------------+-------------+--------------+

| c003 | 3549867 | 5 |

+------------+-------------+--------------+

>>> sheet2.save_to_database(session=Session(), table=Proteins)

Here is the data inserted:

6.2. How to inject csv data to database 51

pyexcel Documentation, Release 0.6.0

$ sqlite3 /tmp/stack2.db

sqlite> select

from allproteins

...> ;

|a001|123442|6

|a001|234335|0

|a001|234336|0

|a001|3549867|8

|b001|123442|4

|b001|234335|2

|b001|234336|0

|c003|123442|0

|c003|234335|0

|c003|234336|0

|c003|3549867|5

52 Chapter 6. Real world cases

CHAPTER 7

API documentation

7.1 API Reference

This is intended for users of pyexcel.

7.1.1 Signature functions

Obtaining data from excel ﬁle

It is believed that once a Python developer could easily operate on list, dictionary and various mixture of both. This

library provides four module level functions to help you obtain excel data in those formats. Please refer to “A list of

module level functions”, the ﬁrst three functions operates on any one sheet from an excel book and the fourth one

returns all data in all sheets in an excel book.

get_array(**keywords) Obtain an array from an excel source

get_dict([name_columns_by_row]) Obtain a dictionary from an excel source

get_records([name_columns_by_row]) Obtain a list of records from an excel source

get_book_dict(**keywords) Obtain a dictionary of two dimensional arrays

pyexcel.get_array

pyexcel.get_array(**keywords)

Obtain an array from an excel source

It accepts the same parameters as get_sheet() but return an array instead.

Not all parameters are needed. Here is a table

pyexcel Documentation, Release 0.6.0

source parameters

loading from ﬁle ﬁle_name, sheet_name, keywords

loading from string ﬁle_content, ﬁle_type, sheet_name, keywords

loading from stream ﬁle_stream, ﬁle_type, sheet_name, keywords

loading from sql session, table

loading from sql in django model

loading from query sets any query sets(sqlalchemy or django)

loading from dictionary adict, with_keys

loading from records records

loading from array array

loading from an url url

Parameters

ﬁle_name : a ﬁle with supported ﬁle extension

ﬁle_content : the ﬁle content

ﬁle_stream : the ﬁle stream

ﬁle_type : the ﬁle type in ﬁle_content or ﬁle_stream

session : database session

table : database table

model: a django model

adict: a dictionary of one dimensional arrays

url : a download http url for your excel ﬁle

with_keys : load with previous dictionary’s keys, default is True

records : a list of dictionaries that have the same keys

array : a two dimensional array, a list of lists

sheet_name : sheet name. if sheet_name is not given, the default sheet at index 0 is loaded

start_row [int] defaults to 0. It allows you to skip rows at the begginning

row_limit: int defaults to -1, meaning till the end of the whole sheet. It allows you to skip the tailing rows.

start_column [int] defaults to 0. It allows you to skip columns on your left hand side

column_limit: int defaults to -1, meaning till the end of the columns. It allows you to skip the tailing columns.

skip_row_func: It allows you to write your own row skipping functions.

The protocol is to return pyexcel_io.constants.SKIP_DATA if skipping data, pyex-

cel_io.constants.TAKE_DATA to read data, pyexcel_io.constants.STOP_ITERATION to exit the

reading procedure

skip_column_func: It allows you to write your own column skipping functions.

The protocol is to return pyexcel_io.constants.SKIP_DATA if skipping data, pyex-

cel_io.constants.TAKE_DATA to read data, pyexcel_io.constants.STOP_ITERATION to exit the

reading procedure

skip_empty_rows: bool Defaults to False. Toggle it to True if the rest of empty rows are useless, but it does

affect the number of rows.

row_renderer: You could choose to write a custom row renderer when the data is being read.

54 Chapter 7. API documentation

pyexcel Documentation, Release 0.6.0

auto_detect_ﬂoat : defaults to True

auto_detect_int : defaults to True

auto_detect_datetime : defaults to True

ignore_inﬁnity : defaults to True

library : choose a speciﬁc pyexcel-io plugin for reading

source_library : choose a speciﬁc data source plugin for reading

parser_library : choose a pyexcel parser plugin for reading

skip_hidden_sheets: default is True. Please toggle it to read hidden sheets

Parameters related to csv ﬁle format

for csv, fmtparams are accepted

delimiter : ﬁeld separator

lineterminator : line terminator

encoding: csv speciﬁc. Specify the ﬁle encoding the csv ﬁle. For example: encoding=’latin1’. Especially,

encoding=’utf-8-sig’ would add utf 8 bom header if used in renderer, or would parse a csv with utf brom

header used in parser.

escapechar : A one-character string used by the writer to escape the delimiter if quoting is set to

QUOTE_NONE and the quotechar if doublequote is False.

quotechar : A one-character string used to quote ﬁelds containing special characters, such as the delimiter or

quotechar, or which contain new-line characters. It defaults to ‘”’

quoting : Controls when quotes should be generated by the writer and recognised by the reader. It can take on

any of the QUOTE_* constants (see section Module Contents) and defaults to QUOTE_MINIMAL.

skipinitialspace : When True, whitespace immediately following the delimiter is ignored. The default is False.

Parameters related to xls ﬁle format: Please note the following parameters apply to pyexcel-xls. more details

can be found in xlrd.open_workbook()

logﬁle: An open ﬁle to which messages and diagnostics are written.

verbosity: Increases the volume of trace material written to the logﬁle.

use_mmap: Whether to use the mmap module is determined heuristically. Use this arg to override the result.

Current heuristic: mmap is used if it exists.

encoding_override: Used to overcome missing or bad codepage information in older-version ﬁles.

formatting_info: The default is False, which saves memory.

When True, formatting information will be read from the spreadsheet ﬁle. This provides all cells, including

empty and blank cells. Formatting information is available for each cell.

ragged_rows: The default of False means all rows are padded out with empty cells so that all rows have the

same size as found in ncols.

True means that there are no empty cells at the ends of rows. This can result in substantial memory savings

if rows are of widely varying sizes. See also the row_len() method.

7.1. API Reference 55

pyexcel Documentation, Release 0.6.0

pyexcel.get_dict

pyexcel.get_dict(name_columns_by_row=0, **keywords)

Obtain a dictionary from an excel source

It accepts the same parameters as get_sheet() but return a dictionary instead.

Speciﬁcally: name_columns_by_row : specify a row to be a dictionary key. It is default to 0 or ﬁrst row.

If you would use a column index 0 instead, you should do:

get_dict(name_columns_by_row=-1, name_rows_by_column=0)

Not all parameters are needed. Here is a table

source parameters

loading from ﬁle ﬁle_name, sheet_name, keywords

loading from string ﬁle_content, ﬁle_type, sheet_name, keywords

loading from stream ﬁle_stream, ﬁle_type, sheet_name, keywords

loading from sql session, table

loading from sql in django model

loading from query sets any query sets(sqlalchemy or django)

loading from dictionary adict, with_keys

loading from records records

loading from array array

loading from an url url

Parameters

ﬁle_name : a ﬁle with supported ﬁle extension

ﬁle_content : the ﬁle content

ﬁle_stream : the ﬁle stream

ﬁle_type : the ﬁle type in ﬁle_content or ﬁle_stream

session : database session

table : database table

model: a django model

adict: a dictionary of one dimensional arrays

url : a download http url for your excel ﬁle

with_keys : load with previous dictionary’s keys, default is True

records : a list of dictionaries that have the same keys

array : a two dimensional array, a list of lists

sheet_name : sheet name. if sheet_name is not given, the default sheet at index 0 is loaded

start_row [int] defaults to 0. It allows you to skip rows at the begginning

row_limit: int defaults to -1, meaning till the end of the whole sheet. It allows you to skip the tailing rows.

start_column [int] defaults to 0. It allows you to skip columns on your left hand side

column_limit: int defaults to -1, meaning till the end of the columns. It allows you to skip the tailing columns.

56 Chapter 7. API documentation

pyexcel Documentation, Release 0.6.0

skip_row_func: It allows you to write your own row skipping functions.

The protocol is to return pyexcel_io.constants.SKIP_DATA if skipping data, pyex-

cel_io.constants.TAKE_DATA to read data, pyexcel_io.constants.STOP_ITERATION to exit the

reading procedure

skip_column_func: It allows you to write your own column skipping functions.

The protocol is to return pyexcel_io.constants.SKIP_DATA if skipping data, pyex-

cel_io.constants.TAKE_DATA to read data, pyexcel_io.constants.STOP_ITERATION to exit the

reading procedure

skip_empty_rows: bool Defaults to False. Toggle it to True if the rest of empty rows are useless, but it does

affect the number of rows.

row_renderer: You could choose to write a custom row renderer when the data is being read.

auto_detect_ﬂoat : defaults to True

auto_detect_int : defaults to True

auto_detect_datetime : defaults to True

ignore_inﬁnity : defaults to True

library : choose a speciﬁc pyexcel-io plugin for reading

source_library : choose a speciﬁc data source plugin for reading

parser_library : choose a pyexcel parser plugin for reading

skip_hidden_sheets: default is True. Please toggle it to read hidden sheets

Parameters related to csv ﬁle format

for csv, fmtparams are accepted

delimiter : ﬁeld separator

lineterminator : line terminator

encoding: csv speciﬁc. Specify the ﬁle encoding the csv ﬁle. For example: encoding=’latin1’. Especially,

encoding=’utf-8-sig’ would add utf 8 bom header if used in renderer, or would parse a csv with utf brom

header used in parser.

escapechar : A one-character string used by the writer to escape the delimiter if quoting is set to

QUOTE_NONE and the quotechar if doublequote is False.

quotechar : A one-character string used to quote ﬁelds containing special characters, such as the delimiter or

quotechar, or which contain new-line characters. It defaults to ‘”’

quoting : Controls when quotes should be generated by the writer and recognised by the reader. It can take on

any of the QUOTE_* constants (see section Module Contents) and defaults to QUOTE_MINIMAL.

skipinitialspace : When True, whitespace immediately following the delimiter is ignored. The default is False.

Parameters related to xls ﬁle format: Please note the following parameters apply to pyexcel-xls. more details

can be found in xlrd.open_workbook()

logﬁle: An open ﬁle to which messages and diagnostics are written.

verbosity: Increases the volume of trace material written to the logﬁle.

use_mmap: Whether to use the mmap module is determined heuristically. Use this arg to override the result.

Current heuristic: mmap is used if it exists.

encoding_override: Used to overcome missing or bad codepage information in older-version ﬁles.

7.1. API Reference 57

pyexcel Documentation, Release 0.6.0

formatting_info: The default is False, which saves memory.

When True, formatting information will be read from the spreadsheet ﬁle. This provides all cells, including

empty and blank cells. Formatting information is available for each cell.

ragged_rows: The default of False means all rows are padded out with empty cells so that all rows have the

same size as found in ncols.

True means that there are no empty cells at the ends of rows. This can result in substantial memory savings

if rows are of widely varying sizes. See also the row_len() method.

pyexcel.get_records

pyexcel.get_records(name_columns_by_row=0, **keywords)

Obtain a list of records from an excel source

It accepts the same parameters as get_sheet() but return a list of dictionary(records) instead.

Speciﬁcally: name_columns_by_row : specify a row to be a dictionary key. It is default to 0 or ﬁrst row.

If you would use a column index 0 instead, you should do:

get_records(name_columns_by_row=-1, name_rows_by_column=0)

Not all parameters are needed. Here is a table

source parameters

loading from ﬁle ﬁle_name, sheet_name, keywords

loading from string ﬁle_content, ﬁle_type, sheet_name, keywords

loading from stream ﬁle_stream, ﬁle_type, sheet_name, keywords

loading from sql session, table

loading from sql in django model

loading from query sets any query sets(sqlalchemy or django)

loading from dictionary adict, with_keys

loading from records records

loading from array array

loading from an url url

Parameters

ﬁle_name : a ﬁle with supported ﬁle extension

ﬁle_content : the ﬁle content

ﬁle_stream : the ﬁle stream

ﬁle_type : the ﬁle type in ﬁle_content or ﬁle_stream

session : database session

table : database table

model: a django model

adict: a dictionary of one dimensional arrays

url : a download http url for your excel ﬁle

with_keys : load with previous dictionary’s keys, default is True

records : a list of dictionaries that have the same keys

58 Chapter 7. API documentation