The Polling and Open Data initiative at UW (POD) sent its first poll to students pursuing a degree in informatics and received 42 replies. I analyzed the responses on the different tracks under the iSchool, plans after college, and interest in grad school.
I used matplotlib for exploratory plots to look for trends, while the visuals to be published at POD were made with Flourish.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
poll = pd.read_excel("poll1.xlsx")
poll.head(8)
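# Row 6 of the raw sheet holds the real column names; responses start at row 7.
# Copy the slice so the column assignments below modify an independent frame
# instead of a view of `poll` (avoids SettingWithCopyWarning).
table = poll.iloc[7:].copy()
table.columns = poll.iloc[6]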
table.loc[:, "total"] = np.ones(table.shape[0])
table.loc[:, "researchInt"] = 0
table.loc[table.research == "Yes", "researchInt"] = 1
table.head()
# Adding separate 0/1 indicator variables for each option under "plans"
# na=False treats a blank "plans" answer as "not selected" instead of failing in astype(int)
table.loc[:, "workInTech"] = table.plans.str.contains("Work in information technology", na=False).astype(int)
table.loc[:, "nonprofit"] = table.plans.str.contains("Work in a nonprofit organization", na=False).astype(int)
table.loc[:, "travel"] = table.plans.str.contains("Travel abroad", na=False).astype(int)
table.loc[:, "educate"] = table.plans.str.contains("Work in education", na=False).astype(int)
table.loc[:, "unrelated"] = table.plans.str.contains("Work in a field unrelated to informatics", na=False).astype(int)
table.loc[:, "masters"] = table.plans.str.contains("Pursue a Master", na=False).astype(int)
table.loc[:, "phd"] = table.plans.str.contains("Pursue a PhD", na=False).astype(int)
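The indicator columns above all follow one pattern, so they can also be built in a loop. The sketch below is a hedged illustration on a made-up `toy` frame, not the poll data; the comma-separated answer format and the `options` mapping are assumptions for the example.

```python
import pandas as pd

# Toy stand-in for `table`; these answers are invented for illustration.
toy = pd.DataFrame({
    "plans": [
        "Work in information technology, Travel abroad",
        "Pursue a Master's degree",
        None,  # a respondent who skipped the question
    ]
})

# Hypothetical mapping from new column name to the phrase to look for.
options = {
    "workInTech": "Work in information technology",
    "travel": "Travel abroad",
    "masters": "Pursue a Master",
}

for col, phrase in options.items():
    # regex=False matches the phrase literally; na=False counts blanks as 0.
    toy[col] = toy["plans"].str.contains(phrase, regex=False, na=False).astype(int)

print(toy[list(options)])
```

Keeping the phrases in one dictionary makes it harder to mistype a column name and easier to add an option later.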
table.masters.sum()
table.phd.sum()
# Nobody selected PhD without also selecting Master's (the filter below returns no rows),
# so "masters" can be used as the measure of interest in grad school
table.loc[(table.phd == 1) & (table.masters == 0)]
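If a later poll did include PhD-only answers, a combined indicator would avoid relying on that empty result. A minimal sketch on made-up rows; the `gradInterest` name is hypothetical and does not appear in the analysis above.

```python
import pandas as pd

# Toy stand-in for the two indicator columns; values are invented.
toy = pd.DataFrame({
    "masters": [1, 1, 0, 1],
    "phd":     [1, 0, 0, 0],
})

# Rows that chose a PhD without a Master's; empty means "masters" covers everyone.
phd_only = toy[(toy["phd"] == 1) & (toy["masters"] == 0)]
print("PhD-only respondents:", len(phd_only))

# Combined 0/1 indicator: interested in either degree.
toy["gradInterest"] = ((toy["masters"] == 1) | (toy["phd"] == 1)).astype(int)
print(toy)
```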
mastersInterest = table.groupby("GradYear").masters.sum()
_ = plt.bar(mastersInterest.index.astype(str), mastersInterest)
table.groupby("GradYear").researchInt.sum()
table.loc[table.track == "Custom"].groupby("GradYear").total.sum()
As expected, in these counts upperclassmen report more research experience and more interest in grad school than underclassmen.
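Raw counts depend on how many students from each year replied, so a per-year share can be a useful cross-check. The sketch below uses a made-up `toy` frame with the same column names as `table`; the numbers are not poll results.

```python
import pandas as pd

# Toy stand-in for `table`; these rows are invented for illustration.
toy = pd.DataFrame({
    "GradYear":    [2021, 2021, 2022, 2022, 2022, 2023],
    "masters":     [1, 0, 1, 0, 0, 0],
    "researchInt": [1, 1, 0, 1, 0, 0],
})

# The mean of a 0/1 indicator is the share of respondents in that group.
rates = toy.groupby("GradYear")[["masters", "researchInt"]].mean()

# Keep the group size alongside so very small groups are easy to spot.
rates["respondents"] = toy.groupby("GradYear").size()
print(rates)
```

With only 42 replies in the real poll, some years may have just a handful of respondents, so the shares are best read together with the group sizes.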
researchers = table.loc[table.researchInt == 1]
researchAreas = researchers.groupby(researchers.researchField).total.sum()
pieLabels = researchAreas.index
_ = plt.pie(researchAreas, labels=pieLabels)
print(researchers.shape[0], "total students with research experience")
customTracked = researchers.loc[researchers.track == "Custom"]
researchAreasC = customTracked.groupby(customTracked.researchField).total.sum()
pieLabelsC = researchAreasC.index
_ = plt.pie(researchAreasC, labels=pieLabelsC)
researchAreasC
# Select the numeric columns before summing so text columns are never aggregated
researchVGrad = table.groupby("researchInt")[["total", "masters"]].sum()
researchVGrad
Interest in pursuing grad school and research experience appear to be correlated, without necessarily implying causation one way or the other.
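One way to put a number on that association is a 2x2 cross-tabulation plus the Pearson correlation of the two 0/1 indicators (the phi coefficient). This is a hedged sketch on invented rows, not the poll data.

```python
import pandas as pd

# Toy stand-in for `table`; these rows are invented for illustration.
toy = pd.DataFrame({
    "researchInt": [1, 1, 1, 0, 0, 0, 0, 1],
    "masters":     [1, 1, 0, 0, 0, 0, 1, 1],
})

# Counts of each (research experience, grad school interest) combination.
counts = pd.crosstab(toy["researchInt"], toy["masters"])

# Row-normalized: share interested in grad school within each research group.
shares = pd.crosstab(toy["researchInt"], toy["masters"], normalize="index")

# Pearson correlation of two binary variables equals the phi coefficient.
phi = toy["researchInt"].corr(toy["masters"])

print(counts)
print(shares)
print("phi:", phi)
```

A value near 0 would suggest little association and a value near 1 a strong one; with a small sample the estimate is noisy, and it says nothing about which way any causation runs.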
planSums = table[["workInTech", "nonprofit", "travel", "educate", "unrelated", "masters", "phd", "researchInt"]].sum()
planSums
trackSums = table.groupby("track").total.sum()
trackSums
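# numeric_only=True keeps the indicator columns and skips text answers,
# which recent pandas versions no longer drop silently
summaryTable = table.groupby("track").sum(numeric_only=True).T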
summaryTable["sums"] = planSums
summaryTable
tracks = summaryTable.T
tracks["sums"] = trackSums
tracks
table.to_csv("table.csv")
summaryTable.to_csv("trackPlansSummaryTable.csv")
tracks.to_csv("tracksSummary.csv")