Save CSV Spreadsheets to SQLite Database using Python Pandas Dataframes


Use a For-Loop to Backup a Folder of Spreadsheets

In this quick tutorial, I want to show you how to save or backup your spreadsheets into a SQLite database using python. SQLite is a native database that comes with python and will be saved on your computer.

The goal here is to be able to use SQLite to backup a folder of spreadsheets by importing them into a database with each spreadsheet existing as a table in the database.

Step 1 will be importing the libraries we are going to need. The first import is sqlite3 that helps working with the database. Next, the pandas module is for reading the spreadsheets, turning them into dataframes and inserting them into the SQLite database. Then final two modules or libraries are to help us loop through the spreadsheets and insert them into the database.

Imports:

import sqlite3
import pandas as pd
import glob
import os

We will next specify the folders we want to save the database in and where our spreadsheets currently are.

Folders:

db_save_to_folder = 'C:/MySaveFolder'
Spreadsheets_Folder = 'C:/MySpreadsheetsFolder'

Then we will need to create a database connection. Then we will name the database and specify its folder location using the format method. The database will be created if it does not exist already (which it does not).

Database creation:

#create database connection
connection = sqlite3.connect('{}/BackUps.db'.format(db_save_to_folder))

Here is the bulk of our code. We are going to run a for-loop that will take each spreadsheet in the folder, read the data, turn the data into a pandas dataframe and then insert that data into the database as a unique database table.

Our Main Loop:

for f in glob.glob("{}/*.csv".format(Spreadsheets_Folder)):
      base = os.path.basename(f)
      basename = os.path.splitext(base)[0]

      #read csv file
      spreadsheet_data = pd.read_csv(f)

      #create dataframe from csv data
      csv_df = pd.DataFrame(data=spreadsheet_data)
      csv_df.to_sql(basename, connection, if_exists='replace', index = False)

print('Loop is done!')

The f in the above code represents the current spreadsheet in the loop. The *.csv in the first line tells the for-loop to process any and all csv files in that folder.

After that, we use the os module to basically break off just the name of the current csv file in use. We use that name later as the name for the unique database table. For example, AAPL.csv will later be a database table named AAPL with all its data. MSFT.csv will do the same, and so on.

Next, we read the current csv using the pandas read_csv method and then turn the data into a pandas dataframe called csv_df.

Finally, we will take the dataframe (csv_df) and insert it into the SQLite database using to_sql method. We will specify the name of the sheet (the os module variable ‘basename’), the database (‘connection’), what to do if the table exists (we will replace or overwrite any data with that name) and whether we want to have an index (False).

The final steps to insert the database is to run the commit to database code below. And then we can close the database connection to finish it off.

Our Final Steps, Commit and Close:

#commit changes
connection.commit()

#close connection
connection.close()

From here, we have taken all of our csv spreadsheets in our folder and inserted them into our new database (called BackUps). Each title of the spreadsheets is now the title of the tables of our database.

There are many reasons you may want to use the SQLite database including better organization, efficiency and speed.

Bonus:

A handy tool to use I have found to help you easily visualize what’s in your database is the free and open-source tool called DB Browser for SQLite. Between your python code and this tool, you should be able to tinker and experiment with SQLite databases.


Article by Zachary Storella – See more programming posts on our Python Page