The Comprehensive Guide To Google Sheets With Python

James Phoenix
James Phoenix

Are you looking to level up your google sheets game with automation? In the guide below we’ll be exploring how to use google sheets with python 🐍



Learning Outcomes

  • To learn how to authenticate with Google Sheets within a .py file or jupyter notebook file.
  • To learn how to read from a Google Sheet.
  • To learn how to write (upload) to a Google Sheet.
  • To learn how to manipulate data within Google Sheets.
  • To learn how to store data to a local file from a Google Sheet.
  • To learn how to share a Google Sheet with other gmail users.

Module Imports

In order to run the module imports below you will need to have installed google’s recommended python client:

A quick way to do that would be to run the following command and to restart your Jupyter Notebook server:


pip install --upgrade google-api-python-client

Additionally I would encourage you to download the gcloud software development kit (SDK), which will come in useful for deploying and access your Google Cloud Platform resources:

You can find more information on how to do that from:

  • https://cloud.google.com/python/setup
  • https://cloud.google.com/sdk/docs
pip install --upgrade google-api-python-client
import pandas as pd
import json
import csv
from google.oauth2 import service_account

Download Pygsheets

We will be using a python package called pygsheets to easily interact with a new google sheet that we will create.

If you’re using anaconda you can navigate to your terminal and install the latest version of this package with:


conda install pygsheets

Alternatively if you’re using pip or pip3 to manage your packages you can type either:


pip install pygsheets
pip3 install pygsheets
!pip install pygsheets

As you can see from above I’ve previously installed pygsheets, after you’ve installed the python package if you’re working from a jupyter notebook, then please restart the Kernel:

import pygsheets

Authenticating To A Google Cloud Project With A .JSON Service Account Key

Using a previously obtained local .json file you can authenticate to your google service account.

with open('service_account.json') as source:
    info = json.load(source)
credentials = service_account.Credentials.from_service_account_info(info)

Create A New Google Sheet + Obtain The Unique ID

Firstly we are going to create a new google sheet, then we’ll obtain the id of that specific google sheet which can be found within the URL:


Google Sheet Wizardry With Python Commands

Authenticating With Google Sheets With Pyghseets

You can successfully authenticate to google sheets with a .json key like so:

client = pygsheets.authorize(service_account_file='service_account.json')

How To Connect To a Specific Google sheet

Now that we’ve authenticated pygsheets with our google cloud project let’s connect to a specific google sheet.

In order to do this we will need to grab the URL link and share the spreadsheet.

spreadsheet_url = 
"https://docs.google.com/spreadsheets/d/1OBR8HEE2rjaWqZPnb8mHnp3YpxxELcy5Zqve3gtX22Q/edit?usp=sharing"
  • Additionally we will extract the spreadsheet ID which can be located in the URL above between /d/ and /edit?
  • This id will be unique for you so please make sure to code along!
sheet_data = client.sheet.get('1OBR8HEE2rjaWqZPnb8mHnp3YpxxELcy5Zqve3gtX22Q')

There are three ways to open the spreadsheet:

  • By the spreadsheet title.
  • By the unique spreadsheet ID.
  • By the exact URL of the ghseet.

🔥 Let’s showcase all three and you can take your pick 🔥

# sheet = client.open('Sheet1') ## You will need to activate the Google Drive API as well as the spreadsheet API for this one to work
sheet = client.open_by_key('1OBR8HEE2rjaWqZPnb8mHnp3YpxxELcy5Zqve3gtX22Q')
sheet = client.open_by_url('https://docs.google.com/spreadsheets/d/1OBR8HEE2rjaWqZPnb8mHnp3YpxxELcy5Zqve3gtX22Q/edit?usp=sharing')
print(sheet)

How To Select A Specific Google Worksheet

Let’s select the automatically created worksheet called Sheet1:

wks = sheet.worksheet_by_title('Sheet1')
print(wks)

Accessing Rows & Columns Meta-data

After uploading a dataframe into a google sheet, you can see how many columns and rows you have with:


wks.cols # To view the number of columns
wks.rows # To view the number of rows
print(f"There are {wks.cols} columns in the gsheet!")
print(f"There are {wks.rows} rows in the gsheet!")
# There are 19 columns in the gsheet!
# There are 1001 rows in the gsheet!

How To Upload Data To A Google Sheet From A Pandas DataFrame

I am going to use the keyword data that I originally downloaded from Ahrefs, however if you have another csv that’s absolutely fine. I’d encourage you to simply read any csv file and to create a pandas dataframe:

df = pd.read_csv('../1_Keywords/data/digital-marketing-keyword-ideas.csv', 
                 delimiter='t', 
                 encoding='UTF-16')

df.drop(columns=['#'], inplace=True)

To upload a dataframe to this worksheet we can use the following syntax:


set_dataframe(df, start, copy_index=False, copy_head=True, extend=False, fit=False, escape_formulae=False, **kwargs)
wks.set_dataframe(df, start=(1,1))

It’s worth remembering that when you run the above command you will overwrite any cell values.

However if you prefer to progressively append new rows to the bottom of the worksheet, please use the below syntax:

wks.set_dataframe(df, start=(1,1), extend=True)

Pro Tip: If you don’t have enough rows within your existing worksheet, then uploading a pandas dataframe might fail! In order to account for this you can check the existing number of rows within the worksheet vs the number of rows within the dataframe and add extra rows if required. We will do the same checking for the columns too!


# If the number of rows within the worksheet is less than the dataframe:
if wks.rows < df.shape[0]:
    number_of_rows_to_add = df.shape[0] - wks.rows + 1 
    # Adding the required number of rows
    wks.add_rows(number_of_rows_to_add)
# If the number of cols within the worksheet is less than the dataframe:
elif wks.cols < df.shape[1]:
    number_of_cols_to_add = df.shape[1] - wks.cols + 1 
    wks.add_cols(number_of_cols_to_add)
else:
    pass

How To Download A Pandas Dataframe From A Google Sheets Specific Worksheet

We can also get data directly from any google sheets worksheet in the form of a pandas dataframe:


get_as_df(has_header=True, index_column=None, start=None, end=None, numerize=True, empty_value='', value_render=, **kwargs)[source]
dataframe_two = wks.get_as_df()
print(f"This new dataframe has {dataframe_two.shape[0]} rows and {dataframe_two.shape[1]} columns")
# This new dataframe has 1000 rows and 11 columns
dataframe_two.head(6)

How To Get All Of The Google Sheet Values In A Python Format

We can also get all of the values in a list of lists:

all_values = wks.get_all_values()

In my case, notice how this has picked up all of the empty spaces that are located on the right hand side of the worksheet:

print(all_values[0])

['Keyword', 'Country', 'Difficulty', 'Volume', 'CPC', 'Clicks', 'CPS', 'Return Rate', 'Parent Keyword', 'Last Update', 'SERP Features', '', '', '', '', '', '', '', '']

A way to quickly remove the empty strings per list would be to do a nested list comprehension:

cleaned_values = [[item for item in unique_list if item ]for unique_list in all_values]

How To Get Cell Ranges In Google Sheets With Python

You can also extract specific ranges (columns and rows) similar to your excel functions:

cell_range = wks.range('A1:F10', 
          returnas='matrix')
print(cell_range)
[['Keyword', 'Country', 'Difficulty', 'Volume', 'CPC', 'Clicks'], ['coast', 'gb', '42', '70000', '2.5', '64715'], ['hubspot', 'gb', '67', '63000', '5', '59708'], ['digital marketing', 'gb', '74', '19000', '7', '11033'], ['digital', 'gb', '89', '16000', '2.5', '5912'], ['content meaning', 'gb', '45', '4400', '17', '622'], ['digital media', 'gb', '24', '3600', '3', '1671'], ['digital marketing agency', 'gb', '57', '3400', '18', '3925'], ['digital uk', 'gb', '24', '3100', '1.2', '2402'], ['what is digital marketing', 'gb', '74', '3100', '2.5', '2119']]

How To Get A Single Row For Extracting The Column Headers

We can get a single row with:


wks.get_row(row, returnas='matrix', include_tailing_empty=True, **kwargs)
headers = wks.get_row(1, include_tailing_empty=False)
print(headers)
['Keyword', 'Country', 'Difficulty', 'Volume', 'CPC', 'Clicks', 'CPS', 'Return Rate', 'Parent Keyword', 'Last Update', 'SERP Features']

☝️☝️☝️ Also notice how we’ve changed the include_tailing_empty to False which automatically removed any empty strings in that row ☝️☝️☝️


How To Extract A Single Column

Sometimes you might want to select a specific column from your worksheet:

first_column = wks.get_col(1)
first_column_data = first_column[1:] # We are doing a python slice here to avoid 
# extrecting the column names from the first row (keyword)

How To Sort By A Column

You can easily sort the google sheet by a specific column with:


wks.sort_range(start, end, basecolumnindex=0, sortorder='ASCENDING')
# Sorts the data in rows based on the given column index.
wks.sort_range(start='A2', end='L1001',basecolumnindex=4, sortorder='DESCENDING' )

Its worth knowing that np.nans (not a number) are treated as large numbers within google sheets. Therefore let’s clear the sheet, remove all of the nans before uploading the dataframe, then we will perform this operation again.


How To Clear A Google Sheet With Python

You can clear all of the existing rows and columns within the sheet by referencing the same workspace variable (wks) and the .clear() syntax:

Unleash Your Potential with AI-Powered Prompt Engineering!

Dive into our comprehensive Udemy course and learn to craft compelling, AI-optimized prompts. Boost your skills and open up new possibilities with ChatGPT and Prompt Engineering.

Embark on Your AI Journey Now!

wks.clear()

Now I’d recommend removing any nans before uploading your dataframe! So let’s remove any np.nans within the Volume Column:

df.dropna(subset=['Volume'], inplace=True)
wks.set_dataframe(df, start=(1,1))

Let’s try sorting again by the Volume column with pygsheets:

wks.sort_range(start='A2', end='L1001', basecolumnindex=4, sortorder='DESCENDING')

How To Loop Over Every Row

You can iterate over row in your google sheet with a for loop:

for row in wks:
    print(row)

Updating Specific Parts Of Your Google Sheet

Updating parts of your Google sheet gives you the flexibility to do a raw data dump into google sheets.

Afterwards and if other situations or API’s change, you can specifically update any column, row or a range within your google sheet.


How To Replace Specific Values Within Your Google Sheet

Replacing values within your google sheets is incredibly easy and can be done via:


wks.replace(pattern, replacement=None, **kwargs)

Let’s for example try replacing any np.nans (not a number) in the Clicks and CPS column:

wks.replace("NaN", replacement="0")

Although this has indeed replaced the “NaN” values, the operation has occurred across all columns and rows within the worksheet. This might not be exactly what you’re looking for.

For example maybe you’re simply looking to replace cell values within 2 columns?


Introducing Find And Update Values

wks.clear()
wks.set_dataframe(df, start=(1,1))

We will perform the following operations:

  • Find all of the cells between column 3 and 6 (exclusive so only two columns will be selected).
  • Then match against NaN (not a number) cells.
cells = wks.find("NaN", searchByRegex=False, matchCase=False, 
     matchEntireCell=False, includeFormulas=False, 
    cols=(3,6), rows=None, forceFetch=True)
print(cells[0:5])
[, , , , ]

How To Update A Range of Values

If you’d like to update a range of values use the following syntax, for this example we’ll assign the selected nan values to “Other”:

wks.update_values(crange=None, values=None, cell_list=None, extend=False, majordim='ROWS', parse=None)
for cell in cells:
    cell.value = "Other"
wks.update_values(cell_list=cells)

As you can see, we’ve only updated the CPC and Clicks column “NaN” values with “Other” and the other column values remain the same!

In the example above we’ve specifically referenced a cell_list, however you could also use the two methods below:


# Update a single cell.
wks.update_value('A1', "Numbers on Stuff")

# Update the worksheet with the numpy array values. Beginning at cell 'A2'.
wks.update_values('A2', my_numpy_array.to_list())

Column Data Manipulation

How To Add Multiple Columns

To add columns to your existing worksheet you can use this syntax:

wks.add_cols(5)

How To Get The Column Names With The Column Index Positions

Firstly we will extract all of the headers on row 1, then we will wrap this variable with enumerate and convert it into a python dictionary.

This will allow us to create a lookup table for index positions!


Notice below how we have also changed:


include_tailing_empty=True

The reason why is that we just want to make sure that if we had empty column values between tables, we would also pick up any empty string values between columns.


headers = wks.get_row(1, include_tailing_empty=True)
enumerated_headers = list(enumerate(headers))
print(f"{enumerated_headers}")
[(0, 'Keyword'), (1, 'Country'), (2, 'Difficulty'), (3, 'Volume'), (4, 'CPC'), (5, 'Clicks'), (6, 'CPS'), (7, 'Return Rate'), (8, 'Parent Keyword'), (9, 'Last Update'), (10, 'SERP Features'), (11, ''), (12, ''), (13, ''), (14, ''), (15, ''), (16, ''), (17, ''), (18, '')]

Now let’s remove any of the enumerated headers that have an empty string with a list comprehension:

enumerated_headers = [tuple_object for tuple_object in enumerated_headers if tuple_object[1]]
print(f"These are the cleaned, enumerated headers: n n {enumerated_headers}")
These are the cleaned, enumerated headers: 

 [(0, 'Keyword'), (1, 'Country'), (2, 'Difficulty'), (3, 'Volume'), (4, 'CPC'), (5, 'Clicks'), (6, 'CPS'), (7, 'Return Rate'), (8, 'Parent Keyword'), (9, 'Last Update'), (10, 'SERP Features')]

We can then convert this list of tuples into a python dictionary to create a look up table for columns versus their index positions:

lookup_table = dict(enumerated_headers)
lookup_table_reversed = {value: key for key, value in lookup_table.items()}
print(f"From this you can see how we've created a lookup table via a python dictionary where the key is the column name and the value is the index position: nn {lookup_table_reversed}")
From this you can see how we've created a lookup table via a python dictionary where the key is the column name and the value is the index position: 

 {'Keyword': 0, 'Country': 1, 'Difficulty': 2, 'Volume': 3, 'CPC': 4, 'Clicks': 5, 'CPS': 6, 'Return Rate': 7, 'Parent Keyword': 8, 'Last Update': 9, 'SERP Features': 10}

How To Delete Columns

You will need to pass two parameters when deleting columns via pyghseets:


wks.delete_cols(index, number=1)

index – Index of first column to delete.
number – Number of columns to delete.
print(f"Currently we have {wks.cols} columns in the google sheet.")
# Currently we have 19 columns in the google sheet.
max(lookup_table.keys()) + 1 
# 11

As we can see the highest index position that we have as a true column from row 1 is 10. Therefore let’s delete all of the columns from index position 12 and greater:


Pro tip: Now the reason why it is 12 and not 11 is because pygsheets starts its index positioning for columns and rows at 1, whilst python starts its indexing at position 0. Also pygsheets is inclusive with the starting index (1 means from 1 onwards, rather than column 2 onwards).

wks.delete_cols(index=12,number=wks.cols - 10)

You can now easily delete multiple columns by creating your own column index with enumerate!


How To Insert Columns

The following syntax allows you to insert multiple columns:


wks.insert_cols(col, number=1, values=None, inherit=False)

Parameters:
col – Index of the col at which the values will be inserted.
number – Number of columns to be inserted.
values – Content to be inserted into new columns.
inherit – New cells will inherit properties from the column to the left (True) or to the right (False).

How To Update A Single Column

In order to update a single column, use this syntax:


wks.update_col(index, values, row_offset=0)

We can use the previous lookup table that we created to search for a specific column:

data = wks.update_col(index = lookup_table_reversed['CPS'] + 1
               ,values = wks.rows * ["NA"], 
               row_offset=1)

Protip: If you need to create a list of a specific size in python use the following code:


this_list_will_be_10_5s = [5] * 10
print([5] * 10)

# [5, 5, 5, 5, 5, 5, 5, 5, 5, 5]

How To Adjust Column Width

Do you need to adjust specific columns to be a certain width? No problem!


wks.adjust_column_width(start, end=None, pixel_size=None)

Also its worth remembering that whenever you use wks.clear(), all of the formatting will be erased from the worksheet.


We can just provide pygsheets with a start index and end index for columns with a specific pixel size like so:

wks.adjust_column_width(start=1, end=3, pixel_size=300)

Row Data Manipulation

How To Select A Single Row

To select a single row you can just use:


wks.get_row(row, returnas='matrix', include_tailing_empty=True, **kwargs)

include_tailing_empty – whether to include empty trailing cells/values after last non-zero value
row – index of row
kwargs – all parameters of pygsheets.Worksheet.get_values()
returnas – (‘matrix’, ‘cell’, ‘range’) return as cell objects or just 2d array or range object
wks.get_row(1)
['Keyword',
 'Country',
 'Difficulty',
 'Volume',
 'CPC',
 'Clicks',
 'CPS',
 'Return Rate',
 'Parent Keyword',
 'Last Update',
 'SERP Features']

How To Insert Multiple Rows

Selecting multiple rows is very easy and can be done like so:


wks.insert_rows(row, number=1, values=None, inherit=False)

Let’s find the last row to add a row, and we can insert a new row at index position 3.

values = wks.cols * ["Test"]
print(f"The last row is {wks.rows}")
print(f"These are the values which will be appended to the new row: n n {values}")
The last row is 1001
These are the values which will be appended to the new row: 

 ['Test', 'Test', 'Test', 'Test', 'Test', 'Test', 'Test', 'Test', 'Test', 'Test', 'Test', 'Test', 'Test', 'Test', 'Test', 'Test', 'Test', 'Test', 'Test']
wks.insert_rows(3, values=values )

Pro Tip: When you insert new rows, they must be below the index number of the final row. If you need to create more rows, I’d recommend using this method instead:


wks.add_rows(500)

How To Delete Rows

To delete rows, utilise this syntax:


wks.delete_rows(index, number=1)

:param index:   Index of first row to delete
:param number:  Number of rows to delete
wks.delete_rows(5, number=7)

Formatting Google Sheets In Python

Frozen Rows + Columns

You can easily set the number of Frozen Rows + Columns with the two commands:

wks.frozen_rows = 1
wks.frozen_cols = 2

How To Bold Cells

Let’s bold all of the cells in the first row:

from pygsheets.datarange import DataRange
model_cell = wks.cell('A1')
model_cell.set_text_format('bold', True)
DataRange('A1','K1', worksheet=wks).apply_format(model_cell)

The first image below shows the BEFORE, and the second image shows the AFTER after of applying the DataRange based upon the model cell!

Before:

After:


Batching Your API Calls

If you are going to make a lot of requests to the Google Sheets API, its recommended to batch your API requests. This reduces the chances that your API requests will be unsuccesful due to rate limits. The way that pygsheets implements API batching is as follows:


wks.unlink()
for i in range(10):
    wks.update_value((1, i), i) # wont call api
wks.link() # will do all the updates

How To Share A Google Sheet With An Email Address In Python

You can easily share any connected spreadsheet with:


.share(email_or_domain, role='reader', type='user', **kwargs)

Additionally you can include an email message to your user!

sheet.share('[email protected]', role='commenter', 
             type='user', emailMessage='Here is the spreadsheet we talked about!')

How To Export A Google Sheet To A .CSV

Also its possible to easily export your google sheet to a .csv file with:


wks.export(file_format=, filename=None, path='')

However we’ll need to enable the Google Drive API for this to work, so quickly go to your Google Cloud Project and activate this API for the following command to work:

wks.export(filename='this_is_a_csv_file')

How To Convert A Google Sheet Into JSON

Sometimes when communicating with another developer, they might want to receive your data in JSON format.

You can easily turn your google sheets data into a JSON API and feed it directly into another REST API by chaining these two commands:


wks.get_as_df().to_json()

This will do two things:

  • Firstly we will obtain the entire worksheet as a pandas dataframe.
  • Secondly we’ll convert the pandas dataframe into a JSON object.
wks.get_as_df().to_json()

How To Use Google’s Python Native API For Google Sheets

The above commands solely work within Pygsheets, however its important that you learn how to connect with Google’s Native Python API for google sheets.


How To Connect To The API

from googleapiclient.discovery import build

with open('service_account.json') as source:
    info = json.load(source)

credentials = service_account.Credentials.from_service_account_info(info)

Sending Your First Request

In this tutorial, we are going to be specifically look at the batchClear, batchGet, batchUpdate as these three end points are incredibly useful.


service = discovery.build('sheets', 'v4', credentials=credentials)

# The ID of the spreadsheet to update.
spreadsheet_id = 'my-spreadsheet-id'  # TODO: Update placeholder value.

batch_clear_values_request_body = {
    # The ranges to clear, in A1 notation.
    'ranges': [],  # TODO: Update placeholder value.

    # TODO: Add desired entries to the request body.
}

request = service.spreadsheets().values().batchClear(spreadsheetId=spreadsheet_id, body=batch_clear_values_request_body)
response = request.execute()

# TODO: Change code below to process the `response` dict:
print(response)

Let’s Understand This Step By Step:

  1. You will need to build a service that authenticates with the native google sheets api via a correct credentials object.
  2. Secondly you’ll need the relevant spreadsheet id.
  3. Thirdly you’ll need to create a batchClear request object.
  4. Finally we will attach the service object with the batchClear request object.

Clearing A Column / Range With batchClear

with open('service_account.json') as source:
    info = json.load(source)

credentials = service_account.Credentials.from_service_account_info(info)
google_ghseets_api= build('sheets', 'v4', credentials=credentials, cache_discovery=False)

spreadsheet_id = '1OBR8HEE2rjaWqZPnb8mHnp3YpxxELcy5Zqve3gtX22Q'

batch_clear_values_request_body = {
    # The ranges to clear, in A1 notation.
    'ranges': ['A1:A1000'],  # TODO: Update placeholder value..
}

request = google_ghseets_api.spreadsheets().values().batchClear(spreadsheetId=spreadsheet_id, 
                                                                body=batch_clear_values_request_body)
response = request.execute()

print(response)
{'spreadsheetId': '1OBR8HEE2rjaWqZPnb8mHnp3YpxxELcy5Zqve3gtX22Q', 'clearedRanges': ['Sheet1!A1:A989']}

How To Add Data Validation Boxes With batchUpdate

Adding data validation boxes is relatively easy, the code below will add several data validation boxes to the second column.

request = {
      "requests": [
        {
          "setDataValidation": {
            "range": {
              "sheetId": 0,
              "startRowIndex": 1,
              "startColumnIndex": 1,
               "endColumnIndex":2
            },
            "rule": {
              "condition": {
                "type": "ONE_OF_LIST",
          "values": [
            {
              "userEnteredValue": "EUROPE_LR"
            },
            {
              "userEnteredValue": "EUROPE_HR"
            },
            {
              "userEnteredValue": "WORLD_HR"
            },
            {
              "userEnteredValue": "WORLD_LR"
            }
            ]
           }
          }
        }
      }
   ]
 }

request = google_ghseets_api.spreadsheets().batchUpdate(spreadsheetId=spreadsheet_id, 
                                                                body=request)
response = request.execute()

How To Retrieve A Range of Cells

# The A1:D1 notation of the values to retrieve.
ranges = ['A1:D1000']  # TODO: Update placeholder value.
request = google_ghseets_api.spreadsheets().values().batchGet(spreadsheetId=spreadsheet_id, ranges=ranges)
response = request.execute()
spreadsheet_data = response['valueRanges'][0]['values']
print(spreadsheet_data[0:3])
# [['', 'Country', 'Difficulty', 'Volume'], ['', 'gb', '42', '70000'], ['', 'EUROPE_LR', '67', '63000']]

Hopefully this guide has provided you with a comprehensive overview of how best to use Google Sheets with Python!
For more advanced techniques I’d recommend viewing the following documentation:

  • Pygsheets
  • Google’s Native RESTful Spreadsheet API

TaggedPython For SEO


More Stories

Cover Image for Soft Skills for Programmers: Why They Matter and How to Develop Them

Soft Skills for Programmers: Why They Matter and How to Develop Them

Overview You need a variety of soft skills in addition to technical skills to succeed in the technology sector. Soft skills are used by software professionals to collaborate with their peers effectively and profitably. Finding out more about soft skills and how they are used in the workplace will help you get ready for the job if you are interested in a…

James Phoenix
James Phoenix
Cover Image for What Are Webhooks? And How Do They Relate to Data Engineering?

What Are Webhooks? And How Do They Relate to Data Engineering?

Webhooks are a simple and powerful method for receiving real-time notifications when certain events occur. They enable a whole host of automated and interconnected applications. Broadly speaking, your apps can communicate via two main ways: polling and webhooks.  Polling is like going to a shop and asking for pizza – you have to ask whenever you want it. Webhooks are almost the…

James Phoenix
James Phoenix