在现实世界中的公司业务环境中,大多数数据可能不会存储在文本或Excel文件中。基于SQL的关系数据库(例如Oracle,SQL Server,PostgreSQL和MySQL)得到了广泛使用,许多替代数据库已变得非常流行。
数据库的选择通常取决于应用程序的性能,数据完整性和可伸缩性需求。
在此示例中,我们将如何创建sqlite3数据库。sqllite默认与python安装一起安装,并且不需要任何进一步的安装。如果不确定,请尝试以下操作。我们还将进口熊猫。
从SQL将数据加载到DataFrame相当简单,pandas具有一些简化过程的功能。
import sqlite3 import pandas as pd print(f"Output \n {sqlite3.version}")
输出结果
2.6.0
输出结果
# connection object conn = sqlite3.connect("example.db") # customers data customers = pd.DataFrame({ "customerID" : ["a1", "b1", "c1", "d1"] , "firstName" : ["Person1", "Person2", "Person3", "Person4"] , "state" : ["VIC", "NSW", "QLD", "WA"] }) print(f"Output \n *** Customers info -\n {customers}")
输出结果
*** Customers info - customerID firstName state 0 a1 Person1 VIC 1 b1 Person2 NSW 2 c1 Person3 QLD 3 d1 Person4 WA
# orders data orders = pd.DataFrame({ "customerID" : ["a1", "a1", "a1", "d1", "c1", "c1"] , "productName" : ["road bike", "mountain bike", "helmet", "gloves", "road bike", "glasses"] }) print(f"Output \n *** orders info -\n {orders}")
输出结果
*** orders info - customerID productName 0 a1 road bike 1 a1 mountain bike 2 a1 helmet 3 d1 gloves 4 c1 road bike 5 c1 glasses
# write to the db customers.to_sql("customers", con=conn, if_exists="replace", index=False) orders.to_sql("orders", conn, if_exists="replace", index=False)
输出结果
# frame an sql to fetch the data. q = """ select orders.customerID, customers.firstName, count(*) as productQuantity from orders left join customers on orders.customerID = customers.customerID group by customers.firstName; """
输出结果
# run the sql. pd.read_sql_query(q, con=conn)
7.全部放在一起。
import sqlite3 import pandas as pd print(f"Output \n {sqlite3.version}") # connection object conn = sqlite3.connect("example.db") # customers data customers = pd.DataFrame({ "customerID" : ["a1", "b1", "c1", "d1"] , "firstName" : ["Person1", "Person2", "Person3", "Person4"] , "state" : ["VIC", "NSW", "QLD", "WA"] }) print(f"*** Customers info -\n {customers}") # orders data orders = pd.DataFrame({ "customerID" : ["a1", "a1", "a1", "d1", "c1", "c1"] , "productName" : ["road bike", "mountain bike", "helmet", "gloves", "road bike", "glasses"] }) print(f"*** orders info -\n {orders}") # write to the db customers.to_sql("customers", con=conn, if_exists="replace", index=False) orders.to_sql("orders", conn, if_exists="replace", index=False) # frame an sql to fetch the data. q = """ select orders.customerID, customers.firstName, count(*) as productQuantity from orders left join customers on orders.customerID = customers.customerID group by customers.firstName; """ # run the sql. pd.read_sql_query(q, con=conn)
输出结果
2.6.0 *** Customers info - customerID firstName state 0 a1 Person1 VIC 1 b1 Person2 NSW 2 c1 Person3 QLD 3 d1 Person4 WA *** orders info - customerID productName 0 a1 road bike 1 a1 mountain bike 2 a1 helmet 3 d1 gloves 4 c1 road bike 5 c1 glasses customerID firstName productQuantity ____________________________________ 0 a1 Person1 3 1 c1 Person3 2 2 d1 Person4 1