This data set is based on the Connecticut state libraries. There are a total of 29 columns that was updated on 6/5/24. It has collected individual library serves and the financial data of each library. A brief overview of the data set can show the count of who is entering the library, renewals, registered borrowers, town tax appropriation and other funds set for the library. It can be found through this url https://data.ct.gov/Education/Public-Libraries/kf75-36tt/about_data
In our bar graph we visited the top 20 libraries that were visited in 2023. The Fairland county had majority of their libraries take hold. The highest being 457,667 visits at the Greenwich library. It’s interesting to note that the New Haven county only had 1 library that made it in the top 20. Our second graph is looking at one of the counties that did not make the top 20 specifically Tolland county. Using a scatterplot, we want to see if there is a relationship between the area and that of the borrowers. So if there is a population size of x amount and borrowers are close to that amount it would show that the CT library would want to provide more resources to the Tolland county. What we can see is that there is a shift when the population gets to be around 15,000, we do see a large increase of registered borrowers. This could mean that there needs to be a minimum of around 15k in the area to have the max amount of people registered.
We next look at the use of computers over the fiscal year. This is important for libraries as computers are becoming more accessible and the population has the ability to use their own. As expected there is a large dip overe the years but the height of the dip is at 2021. What’s interesting to note is that in 2022 we can see that there is a spike in computer usages again. Now looking at the wage and salaries expenditures over the fiscal years of 2015-2021 we see that 2020 was the most expensive year most likely due to COVID-19 and the fact that there was hazard pay, salaries had to shift due to budget, etc.
We will last look at the attendance program of the counties.The top 2 counties are Fairland and Hartford. Windham and Tolland have a low attendance record through out it’s years.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")
import seaborn as sns
from matplotlib.ticker import FuncFormatter
import wget
url= 'https://data.ct.gov/api/views/kf75-36tt/rows.csv?accessType=DOWNLOAD'
path= "C:/Users/jenni/OneDrive/Documents/ds736data/"
filename= "public_lib.csv"
# using pandas to read csv and saying how many rows we want to see and if we want to skip rows or not
# include what columns i want
df=pd.read_csv(path + filename, skiprows=0, usecols= ['Fiscal Year', 'Library', 'County','Total Library Visits', 'Population of Service Area',
'Library Visits Per Capita Served', 'Total Registered Borrowers', 'Library Materials Expenditures', 'Wages & Salaries Expenditures', 'Operating Expenditures', 'Total Programs (Synchronous + Prerecorded)',
'Total Program Attendance & Views', 'Use of Public Internet Computers',
'Total Program Attendance & Views Per Capita Served'])
# lets drop na
df.dropna(inplace=True) # want to maintain this new df
# og shape is 5105 rows
# filter by year
df2023=df[df['Fiscal Year']== 2023]
top20=df2023.nlargest(20, 'Total Library Visits')
plt.figure(figsize=(30,18)) #wide x height
lib=sns.barplot(x='Library', y= 'Total Library Visits', data=top20, hue= 'County', palette='Pastel1');
plt.xticks(rotation=90);
#annotate each bar w a value
for value in lib.patches:
plt.text(value.get_x() +value.get_width()/2, value.get_height(),
f'{value.get_height():,}', ha='center', va='bottom', fontsize=12)
plt.xlabel('Library', fontsize=20)
plt.ylabel('Total Library Visits', fontsize=15)
plt.gca().yaxis.set_major_formatter(FuncFormatter(lambda x, _: f'{x:,.0f}'))
plt.title('Top 20 Libraries that were Visited in 2023' , fontsize=20 )
plt.show()
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")
import seaborn as sns
from matplotlib.ticker import FuncFormatter
scat=df[df['County']=='Tolland'][['Population of Service Area', 'Total Registered Borrowers']]
scat['tot count']=round(scat['Total Registered Borrowers']/10, 0 )
# lets make a scatterplot
plt.figure(figsize=(20,8))
plt.scatter(scat['Population of Service Area'],scat['Total Registered Borrowers'], marker='o', c= scat['tot count'], cmap='plasma', s=scat['tot count'], edgecolors='grey')
plt.title('The Relationship between Population of Service Area and Total Registered Borrowers\n in the Tolland county area ')
plt.xlabel('Population of Service Area')
plt.ylabel('Total Registered Borrowers')
# gives legend
cbar=plt.colorbar()
cbar.set_label('# of registered borrowers', rotation= 270, color= 'black', labelpad=30)
cbartick= [*range(100, int(scat['tot count'].max()), 100)]
cbar.set_ticks(cbartick)
myticklab=[*range(1000, int(scat['Total Registered Borrowers'].max()), 1000)];
myticklab=['{:,}'.format(each) for each in myticklab];
cbar.set_ticklabels(myticklab)
# formatting commas to labels
plt.gca().yaxis.set_major_formatter(FuncFormatter(lambda x, _: f'{x:,.0f}'))
plt.gca().xaxis.set_major_formatter(FuncFormatter(lambda x, _: f'{x:,.0f}'))
plt.show()
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")
import seaborn as sns
from matplotlib.ticker import FuncFormatter
compdf= df.groupby(['Fiscal Year', 'County'])[ 'Use of Public Internet Computers'].sum().reset_index()
fig=plt.figure(figsize=(18,10))
ax=fig.add_subplot(1,1,1)
# make variable for each county
my_colors = {
('Fairfield',): 'blue',
('Hartford',): 'pink',
('Litchfield',): 'grey',
('Middlesex',): 'orange',
('New Haven',): 'purple',
('New London',): 'green',
('Tolland',): 'red',
('Windham',): 'black'
}
my_colors={key[0]: color for key, color in my_colors.items()} # my groupby made the counties a tuple . this will make it a string
for key, grp in compdf.groupby(['County']):
key=key[0] # my groupby made the counties a tuple . this will make it a string
grp.plot(ax=ax, kind='line', x='Fiscal Year', y='Use of Public Internet Computers', color=my_colors.get(key,'yellow'), label=key, marker='o')
#labels
plt.title(' Computer Usage over the fiscal years', fontsize=20)
plt.ylabel('Usage count (per million)', fontsize=14)
plt.xlabel('Year', fontsize=14)
plt.xticks(range(2005, 2024), fontsize=10);
fig.show()
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")
import seaborn as sns
from matplotlib.ticker import FuncFormatter
exdf = df[df['Fiscal Year'].isin([2015,2016,2017,2018,2019,2020, 2021])][['County','Fiscal Year','Wages & Salaries Expenditures']]
exdf=exdf.groupby(['Fiscal Year', 'County']).sum().reset_index()
#lets pivot
exdf=exdf.pivot(index='Fiscal Year', columns='County', values='Wages & Salaries Expenditures')
#now make ranks
exdf=exdf.rank(0,ascending=False, method='min')
#transpose it
exdf=exdf.T
# PLOT
fig=plt.figure(figsize=(18,10))
ax=fig.add_subplot(1,1,1)
exdf.plot(kind='line', ax=ax,marker='o', markeredgewidth=1, linewidth=5, markersize=20, markerfacecolor='white')
ax.invert_yaxis()
#tells you row and column amt
num_rows=exdf.shape[0]
num_col=exdf.shape[1]
# labels
plt.ylabel('Expenditures ranking', fontsize=15)
plt.xlabel('County', fontsize=15)
plt.yticks(range(1, num_col+1, 1), fontsize=14);
plt.title('Ranks of the Wages & Salaries Expenditures for the Counties from 2015-2020', fontsize= 20, pad=12)
#fixing legend
handles, labels =ax.get_legend_handles_labels()
handles=[handles[6], handles[5],handles[4], handles[3],handles[2], handles[1],handles[0] ]
labels=[ labels[6], labels[5], labels[4], labels[3], labels[2], labels[1], labels[0]]
ax.legend(handles, labels, bbox_to_anchor=(1.01,1.01), fontsize=14 ,
labelspacing=1,markerscale =0.4, borderpad=.5, handletextpad=0.8)
fig.show();
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")
import seaborn as sns
from matplotlib.ticker import FuncFormatter
attend_df= df.groupby(['Fiscal Year', 'County'])['Total Program Attendance & Views'].sum().reset_index(name='sum of attendance')
attend_df['count of 100k']=round(attend_df['sum of attendance']/1000,0)
attend_df=attend_df.pivot(index='Fiscal Year', columns='County', values='count of 100k')
#drop nan
attend_df=attend_df.dropna()
attend_df=attend_df.T
fig=plt.figure(figsize=(25,15))
ax=fig.add_subplot(1,1,1)
comma_fmt=FuncFormatter(lambda x, p: f"{int(x )}K")
ax=sns.heatmap(attend_df, linewidth=0.5,
annot=True,
cmap='plasma',
fmt=',.0f',
square=False,
annot_kws={'size':13}, # annotation for font
cbar_kws={'format':comma_fmt, 'orientation':'vertical'}); # colorbar
#titles
plt.title('Heatmap of Lowest and Highest program Attendance in the Counties( 1996-2023) \n This is in bases of 100K')
plt.xlabel('Fiscal Year', fontsize=14, labelpad=14)
plt.ylabel('County', fontsize=14, labelpad=14)
plt.yticks(rotation=0);
fig.show();
Based on our graphs, looking back at our bar graph where Fairland had the most visits, it was also the one with the most library usages. It could mean that they have a large population in general which therefore requires more supplies and effort.