Analyzing data from POD's first survey of the iSchool¶

The Polling and Open Data initiative at UW (POD) sent its first poll to students pursuing a degree in Informatics and received 42 replies. I analyzed the responses on degree tracks within the iSchool, plans after college, and interest in grad school.

I used matplotlib for quick exploratory plots to spot trends; the visuals to be published by POD were made with Flourish.

Cleaning the data¶

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

poll = pd.read_excel("poll1.xlsx")
poll.head(8)

# Rows 0-5 hold the question text, row 6 holds the short column names,
# and the responses start at row 7. Copy the slice so that adding columns
# modifies an independent DataFrame and does not raise SettingWithCopyWarning.
table = poll[7:].copy()
table.columns = poll.iloc[6]
table["total"] = np.ones(table.shape[0])  # constant 1.0 per respondent, used for counting
table["researchInt"] = (table.research == "Yes").astype(int)  # 1 = has research experience
table.head()
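
Assigning new columns to a slice of a DataFrame is the usual trigger for pandas' SettingWithCopyWarning. A minimal sketch of one common way around it, taking an explicit `.copy()` of the slice, is below. The tiny `raw` frame and the name `responses` are hypothetical stand-ins for the spreadsheet, not the poll data.

```python
import pandas as pd

# Hypothetical miniature of the raw spreadsheet: row 0 carries the short
# column name and the remaining rows carry the responses.
raw = pd.DataFrame({"a": ["research", "Yes", "No", "Yes"]})

# .copy() makes `responses` independent of `raw`, so adding columns to it
# is an unambiguous operation on its own data.
responses = raw[1:].copy()
responses.columns = raw.iloc[0]
responses["researchInt"] = (responses["research"] == "Yes").astype(int)

print(responses["researchInt"].tolist())
print(list(raw.columns))  # the original frame keeps its own column labels
```

The copy costs a little memory, which is negligible for a poll of this size.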

# Adding separate indicator (0/1) variables for each option under "plans"
table.loc[:, "workInTech"] = table.plans.str.contains("Work in information technology").astype(int)
table.loc[:, "nonprofit"] = table.plans.str.contains("Work in a nonprofit organization").astype(int)
table.loc[:, "travel"] = table.plans.str.contains("Travel abroad").astype(int)
table.loc[:, "educate"] = table.plans.str.contains("Work in education").astype(int)
table.loc[:, "unrelated"] = table.plans.str.contains("Work in a field unrelated to informatics").astype(int)
table.loc[:, "masters"] = table.plans.str.contains("Pursue a Master").astype(int)
table.loc[:, "phd"] = table.plans.str.contains("Pursue a PhD").astype(int)

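
The "plans" question allows several answers per respondent, so each option becomes its own 0/1 column. The sketch below shows the same substring-matching idea on hypothetical answers. The ";" separator, the `options` mapping, and the sample strings are assumptions for illustration; the cell above relies only on substring matching.

```python
import pandas as pd

# Hypothetical multi-select answers (not the poll data).
plans = pd.Series([
    "Work in information technology;Travel abroad",
    "Pursue a Master's degree",
    "Work in information technology;Pursue a Master's degree",
])

# Map each new column name to the answer text it should match.
options = {
    "workInTech": "Work in information technology",
    "travel": "Travel abroad",
    "masters": "Pursue a Master",
}

# regex=False matches the option text literally, so characters such as "("
# or "." in an answer cannot be misread as regular-expression syntax.
flags = pd.DataFrame({
    name: plans.str.contains(text, regex=False).astype(int)
    for name, text in options.items()
})

print(flags.sum().to_dict())
```

Keeping the option texts in one dictionary means a new answer choice needs only one new entry.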
Graduation Year vs Grad School Interest¶

table.masters.sum()

19

table.phd.sum()

3

# Nobody selected PhD without also selecting Master's, so "masters" can stand in for interest in grad school
table.loc[(table.phd == 1) & (table.masters == 0)]

mastersInterest = table.groupby("GradYear").masters.sum()
_ = plt.bar(mastersInterest.index.astype(str), mastersInterest)

table.groupby("GradYear").researchInt.sum()

GradYear
2021    13
2022     2
2023     0
2024     0
Name: researchInt, dtype: int64

table.loc[table.track == "Custom"].groupby("GradYear").total.sum()

GradYear
2021    11.0
2022     1.0
2023     3.0
2024     2.0
Name: total, dtype: float64

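As expected, upperclassmen have more research experience (13 of the 15 students with research experience graduate in 2021) and show more interest in grad school than underclassmen. These are raw counts, so part of the gap may simply reflect how many respondents each class year contributed.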
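
Because the class years may not have answered in equal numbers, comparing shares rather than counts can be a useful cross-check. The sketch below uses a hypothetical `toy` frame (not the poll data) to show how a per-year count and share could be computed side by side.

```python
import pandas as pd

# Hypothetical responses, chosen so the two years differ in size.
toy = pd.DataFrame({
    "GradYear": [2021, 2021, 2021, 2021, 2022, 2022],
    "researchInt": [1, 1, 0, 0, 1, 0],
})

# sum = students with research experience, count = respondents in that year,
# mean = share of that year's respondents with research experience.
byYear = toy.groupby("GradYear").researchInt.agg(["sum", "count", "mean"])
byYear.columns = ["withResearch", "respondents", "share"]

print(byYear)
```

In this toy example 2021 has twice as many researchers as 2022 in raw counts, yet the share is identical in both years.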

Research Areas¶

researchers = table.loc[table.researchInt == 1]

researchAreas = researchers.groupby(researchers.researchField).total.sum()
pieLabels = researchAreas.index
_ = plt.pie(researchAreas, labels=pieLabels)

print(researchers.shape[0], "total students with research experience")

15 total students with research experience

Custom-Track Students¶

customTracked = researchers.loc[researchers.track == "Custom"]
researchAreasC = customTracked.groupby(customTracked.researchField).total.sum()
pieLabelsC = researchAreasC.index
_ = plt.pie(researchAreasC, labels=pieLabelsC)

researchAreasC

researchField
Human-Computer Interaction    5.0
Information Architecture      1.0
Name: total, dtype: float64

Research vs Grad School Plans¶

# Select the two columns before summing so non-numeric columns are never aggregated
researchVGrad = table.groupby("researchInt")[["total", "masters"]].sum()
researchVGrad

Interest in pursuing grad school and research experience appear to be correlated, though this does not necessarily imply causation in either direction.
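
One way to look at this kind of association is a contingency table with row-wise shares. The sketch below uses `pd.crosstab` on a hypothetical `toy` frame; the numbers are invented for illustration and are not the poll results.

```python
import pandas as pd

# Hypothetical respondents: 3 with research experience, 5 without.
toy = pd.DataFrame({
    "researchInt": [1, 1, 1, 0, 0, 0, 0, 0],
    "masters":     [1, 1, 0, 1, 0, 0, 0, 0],
})

# Raw counts for each combination of research experience and Master's interest.
counts = pd.crosstab(toy.researchInt, toy.masters)

# normalize="index" divides each row by its total, giving the share of each
# research group that is (or is not) interested in a Master's degree.
shares = pd.crosstab(toy.researchInt, toy.masters, normalize="index")

print(counts)
print(shares.round(2))
```

With only 42 real responses, shares like these are descriptive at best and should be read with caution.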

Plans after the iSchool¶

planSums = table[["workInTech", "nonprofit", "travel", "educate", "unrelated", "masters", "phd", "researchInt"]].sum()
planSums

6
workInTech     39
nonprofit       8
travel         16
educate         8
unrelated       7
masters        19
phd             3
researchInt    15
dtype: int64

trackSums = table.groupby("track").total.sum()
trackSums

track
Biomedical & Health Informatics             1.0
Custom                                     17.0
Data Science                                8.0
Human-Computer Interaction                  7.0
Information Assurance and Cybersecurity     6.0
Undecided                                   3.0
Name: total, dtype: float64

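# Sum only the numeric indicator columns per track; selecting them explicitly
# avoids relying on pandas silently dropping non-numeric columns.
indicatorCols = ["total", "researchInt", "workInTech", "nonprofit", "travel", "educate", "unrelated", "masters", "phd"]
summaryTable = table.groupby("track")[indicatorCols].sum().T
summaryTable["sums"] = planSums
summaryTable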

tracks = summaryTable.T
tracks["sums"] = trackSums
tracks
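
The summary tables get their totals from separately computed Series, which leaves NaN wherever the labels do not line up. As an alternative sketch, a totals row can be derived from the grouped table itself. The `toy` frame and the `summary` name are hypothetical.

```python
import pandas as pd

# Hypothetical per-respondent indicators (not the poll data).
toy = pd.DataFrame({
    "track": ["Custom", "Custom", "Data Science"],
    "total": [1, 1, 1],
    "masters": [1, 0, 1],
})

# Per-track sums of the indicator columns.
summary = toy.groupby("track")[["total", "masters"]].sum()

# Append a "sums" row computed from the table itself, so every cell is filled.
summary.loc["sums"] = summary.sum()

print(summary)
```

Computing the totals from the same table keeps the row and column labels aligned by construction.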

table.to_csv("table.csv")
summaryTable.to_csv("trackPlansSummaryTable.csv")
tracks.to_csv("tracksSummary.csv")

Output of poll.head(8) (raw spreadsheet rows):

	Unnamed: 0	Unnamed: 1	Unnamed: 2	Instruction Text	Unnamed: 4	Unnamed: 5	Unnamed: 6	Unnamed: 7	Unnamed: 8	Unnamed: 9	...	Unnamed: 18	Unnamed: 19	Unnamed: 20	On a scale from 1 to 5, 1 being not important and 5	Unnamed: 22	Unnamed: 23	Unnamed: 24	Unnamed: 25	Unnamed: 26	Unnamed: 27
0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	being most important, rate the following facto...	NaN	NaN	NaN	NaN	NaN	NaN
1	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	on their impact on your answer to the previous...	NaN	NaN	NaN	NaN	NaN	NaN
2	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3	NaN	NaN	NaN	Question	What is your graduation year?	What gender do you identify with?	Are you a transfer student?	Which Informatics degree option are you pursuing?	Are you double majoring? (If yes, please specify)	Are you minoring? (If yes, please specify)	...	If you answered “work in information technol...	If you answered “pursue a Master's degree”...	Favorite Informatics class? (format: INFO XXX)	Professor/quality of teaching	Amount of work	Interesting peers	Relevance to job and career opportunities	Interest in course content	Class length and schedule	Check this box if you'd like to receive a one-...
4	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	question #13, which would you rather work at?	"pursue a PhD" to question #13, what is your t...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	with pictures of Rachel Kinkley's dog as our t...
5	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	graduate degree?	NaN	NaN	NaN	NaN	NaN	NaN	NaN	for completing this poll!
6	Participant ID	Start Date	Finish Date	NaN	GradYear	Gender	transfer	track	doubleMajor	minor	...	privacy	gradSchool	favClass	Professor/quality of teaching	Amount of work	Interesting peers	Relevance to job and career opportunities	Interest in course content	Class length and schedule	NaN
7	20219549	2020-11-21 21:07:00	2020-11-21 21:10:00	NaN	2022	Female	No, I started at UW	Data Science	No	No	...	No preference	Not applicable	INFO 340	3	3	2	4	5	3	NaN

Output of table.head() (cleaned responses):

6	Participant ID	Start Date	Finish Date	NaN	GradYear	Gender	transfer	track	doubleMajor	minor	...	favClass	Professor/quality of teaching	Amount of work	Interesting peers	Relevance to job and career opportunities	Interest in course content	Class length and schedule	NaN	total	researchInt
7	20219549	2020-11-21 21:07:00	2020-11-21 21:10:00	NaN	2022	Female	No, I started at UW	Data Science	No	No	...	INFO 340	3	3	2	4	5	3	NaN	1.0	0
8	20218830	2020-11-21 01:12:00	2020-11-21 01:15:00	NaN	2023	Male	No, I started at UW	Data Science	No	Undecided	...	INFO 201	3	1	2	4	5	1	NaN	1.0	0
9	20218718	2020-11-20 22:14:00	2020-11-20 22:20:00	NaN	2021	Male	No, I started at UW	Custom	No	No	...	INFO 441	3	3	4	5	4	3	999	1.0	0
10	20218435	2020-11-20 18:12:00	2020-11-20 18:14:00	NaN	2021	Male	No, I started at UW	Custom	No	No	...	INFO 441	3	2	4	5	5	2	NaN	1.0	0
11	20217086	2020-11-20 10:44:00	2020-11-20 11:21:00	NaN	2021	Male	Yes, I am a transfer student (transferred from...	Information Assurance and Cybersecurity	Yes: International Studies	No	...	INFO450	5	4	2	4	4	4	NaN	1.0	1

Output of summaryTable:

track	Biomedical & Health Informatics	Custom	Data Science	Human-Computer Interaction	Information Assurance and Cybersecurity	Undecided	sums
6
total	1.0	17.0	8.0	7.0	6.0	3.0	NaN
researchInt	0.0	6.0	2.0	3.0	3.0	1.0	15.0
workInTech	1.0	17.0	7.0	6.0	6.0	2.0	39.0
nonprofit	1.0	4.0	2.0	0.0	1.0	0.0	8.0
travel	1.0	7.0	3.0	1.0	2.0	2.0	16.0
educate	0.0	4.0	1.0	2.0	0.0	1.0	8.0
unrelated	1.0	2.0	1.0	2.0	1.0	0.0	7.0
masters	1.0	6.0	4.0	2.0	4.0	2.0	19.0
phd	0.0	1.0	1.0	0.0	1.0	0.0	3.0

Output of tracks:

6	total	researchInt	workInTech	nonprofit	travel	educate	unrelated	masters	phd	sums
track
Biomedical & Health Informatics	1.0	0.0	1.0	1.0	1.0	0.0	1.0	1.0	0.0	1.0
Custom	17.0	6.0	17.0	4.0	7.0	4.0	2.0	6.0	1.0	17.0
Data Science	8.0	2.0	7.0	2.0	3.0	1.0	1.0	4.0	1.0	8.0
Human-Computer Interaction	7.0	3.0	6.0	0.0	1.0	2.0	2.0	2.0	0.0	7.0
Information Assurance and Cybersecurity	6.0	3.0	6.0	1.0	2.0	0.0	1.0	4.0	1.0	6.0
Undecided	3.0	1.0	2.0	0.0	2.0	1.0	0.0	2.0	0.0	3.0
sums	NaN	15.0	39.0	8.0	16.0	8.0	7.0	19.0	3.0	NaN