Install packages
As we are going to work with an excel using pandas and mysql database, first we need to install the required packages.- pip install pandas
- pip install openpyxl
- pip install sqlalchemy
- pip install pymysql
複製代碼
openpyxl is required to work with .xlsx files. sqlalchemy and pymysql is required for connecting with mysql. Be ready with a sample .xlsx file
I have created a sample employee file 'employee.xlsx' having employee's first name, last name, age and profession.
Make a connection to your mysql
You should have your MySQL database ready. create_engine() lets you create an engine instance, basically a starting point of sqlalchemy application. You have to pass here a URL string that indicates database dialect and connection parameters. Replace username, password, host, port, database with your corresponding values. - from sqlalchemy import create_engineengine = create_engine("mysql+pymysql://" + "username" + ":" + "password" + "@" + "host" + ":" + "port" + "/" + "database" + "?" + "charset=utf8mb4")
複製代碼
After creating the instance, now you should establish a connection to the database using connect().
Now that we have our connection established, it's time to work a little with Pandas. Reading an excel file using Pandas
First create an excel file obj using pandas ExcelFile() and pass your filename inside. - import pandas as pdexcel_file = pd.ExcelFile('employee.xlsx')
複製代碼
After creating the excel file obj, now it's time to parse the excel sheet using parse() function. Pass the sheet name that you want to process.- excel_dataframe = excel_file.parse(sheet_name="employee_data")
複製代碼
And yes! The data has been copied to a dataframe.
Let's move on to the last step where we load this into our mysql table. Load the data into mysql
To load our excel_dataframe to mysql, we use to_sql() with table name and sql connection arguments. Here I have given table name as employee_data. The parameter if_exists checks if the table already exists in the database. If yes, in this case it will append the data. - excel_dataframe.to_sql("employee_data", conn, if_exists="append")
複製代碼
Don't worry if you don't have the table already. A table gets created with the name that you pass into to_sql(). Now go and query for the table results. You will see the data inserted.
method2 script
- import xlrd
- import MySQLdb
- # Open the workbook and define the worksheet
- book = xlrd.open_workbook("pytest.xls")
- sheet = book.sheet_by_name("source")
- # Establish a MySQL connection
- database = MySQLdb.connect (host="localhost", user = "root", passwd = "", db = "mysqlPython")
- # Get the cursor, which is used to traverse the database, line by line
- cursor = database.cursor()
- # Create the INSERT INTO sql query
- query = """INSERT INTO orders (product, customer_type, rep, date, actual, expected, open_opportunities, closed_opportunities, city, state, zip, population, region) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)"""
- # Create a For loop to iterate through each row in the XLS file, starting at row 2 to skip the headers
- for r in range(1, sheet.nrows):
- product = sheet.cell(r,).value
- customer = sheet.cell(r,1).value
- rep = sheet.cell(r,2).value
- date = sheet.cell(r,3).value
- actual = sheet.cell(r,4).value
- expected = sheet.cell(r,5).value
- open = sheet.cell(r,6).value
- closed = sheet.cell(r,7).value
- city = sheet.cell(r,8).value
- state = sheet.cell(r,9).value
- zip = sheet.cell(r,10).value
- pop = sheet.cell(r,11).value
- region = sheet.cell(r,12).value
- # Assign values from each row
- values = (product, customer, rep, date, actual, expected, open, closed, city, state, zip, pop, region)
- # Execute sql Query
- cursor.execute(query, values)
- # Close the cursor
- cursor.close()
- # Commit the transaction
- database.commit()
- # Close the database connection
- database.close()
- # Print results
- print ""
- print "All Done! Bye, for now."
- print ""
- columns = str(sheet.ncols)
- rows = str(sheet.nrows)
- print "I just imported " %2B columns %2B " columns and " %2B rows %2B " rows to MySQL!"
複製代碼 method3
- import pymysql
- import pandas
- file = r'D:\Python\finreport\tablebasicinfo.xlsx' ## 文件
- df = pandas.read_excel(file) ## 读文件
- print(df)
- infodata = pandas.DataFrame(df).values
- db = pymysql.connect(host='localhost',
- user='root',
- password='***',
- port=3307,
- database='reportdb')
- cursor = db.cursor()
- for i in range(0, len(infodata)): #
- data_each = []
- data_each = infodata[i]
- print(data_each)
- sql = "insert into table_basic_info values {}".format(tuple(data_each))
- try:
- cursor.execute(sql)
- db.commit()
- print(i)
- except:
- db.rollback()
- db.close # 关闭数据库连接
複製代碼
|