Fixing Errors When Importing csv Tables into pgadmin4¶
Create a table in pgadmin4¶
At first, create a table in pgadmin4 that matches columns of the table in my local directory that I want to import to. Right-click on the table in pgadmin4, then click 'Import/Export data' to copy data into pgadmin4. If two tables not matching columns on names, order, format that will cause errors.
The way that reduces errors is to copy names of columns via python function 'dataset.columns' as follow:
import pandas as pd
df = pd.read_csv("/Users/nnthieu/Downloads/synthea_sample_data_csv_latest/supplies.csv")
df.shape
print(df.columns)
print(df.info())
Index(['DATE', 'PATIENT', 'ENCOUNTER', 'CODE', 'DESCRIPTION', 'QUANTITY'], dtype='object') <class 'pandas.core.frame.DataFrame'> RangeIndex: 1376 entries, 0 to 1375 Data columns (total 6 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 DATE 1376 non-null object 1 PATIENT 1376 non-null object 2 ENCOUNTER 1376 non-null object 3 CODE 1376 non-null int64 4 DESCRIPTION 1376 non-null object 5 QUANTITY 1376 non-null int64 dtypes: int64(2), object(4) memory usage: 64.6+ KB None
'DATE', 'PATIENT', 'ENCOUNTER', 'CODE', 'DESCRIPTION', 'QUANTITY'
Copy this results I will have the names of columns of the table 'supplies' exactly in the names, order.
Also see the info above to know the data type of columns. Then I create a table using SQL codes as follow:
CREATE TABLE supplies (
DATE DATE
,PATIENT VARCHAR(100)
,ENCOUNTER VARCHAR(100)
,CODE VARCHAR(100)
,DESCRIPTION VARCHAR(100)
,QUANTITY INT );
These codes will create a table in pgadmin4 with columns and their data type as above.
Then right-click on the name of table in pgadmin4 to import table from local directory into.
Potential errors¶
Errors if not matching names, oder of columns¶
'DATE', 'PATIENT', 'ENCOUNTER', 'CODE', 'DESCRIPTION', 'QUANTITY'
This way makes the names, order correct.
Errors if not matching format of data types.¶
Errors may happen for DATE, INT, VARCHAR and long enough for VARCHAR.
3 types of DATE format:
DATE for '2024-05-15'
TIMESTAMP for '2024-05-15 05:20:34'
TIMESTAMPTZ for '2024-05-15T05:20:34Z'
Check the .csv file before in order to set data type for the table correctly. If the data type not matching exactly that will cause errors.
PATIENT VARCHAR(100) is long enough
DESCRIPTION VARCHAR(100) may not long enough, so set
DESCRIPTION VARCHAR(500) is long enough
QUANTIY VARCHAR(100) will cause error, it should be
QUANTITY INT
If errors happen, check the 'error message' to know what cause errors, at what columns. Then fix the errors exactly.
If in .csv file, the DATE column has a value of '20240415' that will cause error, so I have to correct this value before importing the table into pgadmin4 in order to avoid errors.
.