STAT 19000: Project 6 — Spring 2022
Motivation: We will pause in our series of pandas and numpy projects to learn one of the most important parts of writing programs — functions! Functions allow us to reuse snippets of code effectively. Functions are a great way to reduce the repetition of code and also keep the code organized and readable.
Context: We are focusing on learning about writing functions in Python.
Scope: python, functions, pandas, matplotlib
The following questions will use the following dataset(s):
Question 1
When submitting your .ipynb file for this project, if the .ipynb file doesn’t render in Gradescope, please export the notebook as a PDF and submit that as well; you will be helping the graders a lot!
In project 5, you read in two separate but related datasets and used pandas to combine them. In this project, we’ve provided you with an already-combined dataset called combined.csv. Read combined.csv into a data frame called dat.
Your friend shared the following code with you.
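import plotly.express as px

def plot_stations(df, *ids):
    df = df.groupby("station_id").head(1).loc[df['station_id'].isin(ids), ('station_id', 'latitude', 'longitude')]
    # Any arguments that followed scope="usa" are missing from this listing.
    fig = px.scatter_geo(df, lat="latitude", lon="longitude", scope="usa")
    fig.update_layout(geo = dict(projection_scale=7, center=dict(lat=df['latitude'].iloc[0], lon=df['longitude'].iloc[0])))
    # Assumed reconstruction: only the trailing "jpg") of this call survived.
    fig.show(renderer="jpg")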
In order for your plotly maps to show up properly in Gradescope, you must use the |
Please do the following:
Give a 1-2 sentence explanation of what this function does.
Use the function to plot 2 or more stations, but call it in two different ways that produce the same result: use tuple unpacking in one call and not in the other.
I would highly recommend taking the time to read through the entire article here. It is a very detailed article going through all the things you can do with functions in Python. The section on tuple packing and tuple unpacking may be particularly useful to you!
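As a minimal sketch of tuple packing and unpacking, separate from the weather data: the function describe_ids below is hypothetical and exists only for illustration. The *ids parameter packs any number of positional arguments into a tuple, and a * at the call site unpacks an existing tuple into separate arguments.

```python
def describe_ids(label, *ids):
    """Pack every positional argument after `label` into the tuple `ids`."""
    return f"{label}: {len(ids)} id(s) -> {ids}"

# Call 1: pass each ID as its own positional argument.
direct = describe_ids("stations", 3, 7, 11)

# Call 2: keep the IDs in a tuple and unpack it with * at the call site.
wanted = (3, 7, 11)
unpacked = describe_ids("stations", *wanted)

# Without the *, the whole tuple arrives as a single argument.
not_unpacked = describe_ids("stations", wanted)

print(direct)
print(direct == unpacked)
print(not_unpacked)
```

The first two calls give the same result. The third does not, because ids then holds one element (the tuple itself) rather than three.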
The documentation for the |
Code used to solve this problem.
Output from running the code.
Question 2
In project 5, question 5, you wrote a function called plot_stations that, given a data frame, would plot the locations of the stations on a map.
Modify your plot_stations function so that it has an argument called weighted that defaults to False. If weighted is True, then the stations will be plotted with a size proportional to the number of observations at each station.
plot_stations(df) # plots all stations same size
plot_stations(df, weighted=False) # plots all stations same size
plot_stations(df, weighted=True) # plots all stations with size proportional to the number of observations for the station
You can find the documentation on the |
This section will review default parameters.
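As a small illustration of default parameters, here is a sketch on plain Python lists rather than on the station data. The function scale_sizes and its base argument are hypothetical names chosen for this example, not part of the project.

```python
def scale_sizes(counts, weighted=False, base=5):
    """Return one marker size per count.

    weighted=False (the default): every marker gets the same size, `base`.
    weighted=True: sizes are proportional to each count, with the largest
    count mapped to `base`.
    """
    if not weighted:
        return [base for _ in counts]
    largest = max(counts)
    return [base * c / largest for c in counts]

counts = [10, 20, 40]
same = scale_sizes(counts)                      # default is used
explicit = scale_sizes(counts, weighted=False)  # same as the default
proportional = scale_sizes(counts, weighted=True)

print(same, explicit, proportional)
```

Omitting weighted and passing weighted=False behave identically, which mirrors the first two plot_stations calls listed above.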
Code used to solve this problem.
Output from running the code.
Question 3
There are many columns in our data frame with numeric data. Some examples are: temperature_high, temperature_low, barometric_pressure, wind_speed_high, etc. Wouldn’t it be (kind of) cool to have an option in our plot_stations function that would weight the size of the points on the map based on those values instead of the number of observations?
Modify the function so that it has another argument called weight_by that defaults to None. If weight_by is None (and weighted is True), the points on the plot should be sized by number of observations (like in question 2). Otherwise, weight_by can accept a string with the name of the column to base the point sizes on. For example, plot_stations(dat, weighted=True, weight_by="temperature_high") would create a plot where the size of each point is based on the median value of temperature_high for that station.
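One way to think about the None default is as a sentinel that selects between two summaries. The sketch below uses a tiny made-up data frame (toy) and a hypothetical helper named station_weights; it computes only the per-station weights and leaves the plotting out.

```python
import pandas as pd

# Made-up data for illustration only; not the WHIN dataset.
toy = pd.DataFrame({
    "station_id": [1, 1, 1, 2, 2],
    "temperature_high": [70.0, 80.0, 90.0, 60.0, 64.0],
})

def station_weights(df, weight_by=None):
    """Return one weight per station.

    weight_by=None: the number of observations for each station.
    weight_by="<column>": the median of that column for each station.
    """
    grouped = df.groupby("station_id")
    if weight_by is None:
        return grouped.size()
    return grouped[weight_by].median()

by_count = station_weights(toy)
by_temp = station_weights(toy, weight_by="temperature_high")

print(by_count)
print(by_temp)
```

Testing with `is None` rather than truthiness keeps the check explicit, so only a missing weight_by falls back to counting observations.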
Please note, if weighted is |
Of course, not all of the columns in our dataset are appropriate to weight by. Please demonstrate that your function works by running the following calls to plot_stations:
plot_stations(dat, weighted=True, weight_by="temperature_high")
plot_stations(dat, weighted=True, weight_by="temperature_low")
plot_stations(dat, weighted=True, weight_by="wind_speed_high")
plot_stations(dat, weighted=False, weight_by="barometric_pressure")
plot_stations(dat, weighted=True, weight_by=None)
The wind_speed_high plot will have the most pronounced differences in point size, though the differences are still rather small.
Code used to solve this problem.
Output from running the code.
Question 4
You’ve learned a lot about plotting maps in plotly, the groupby method (most likely), and hopefully functions as well!
Check out all of the datasets in the /depot/datamine/data/flights/subset directory. Write a function that creates any new plot using some or all of the data in the subset directory. The plots could be maps, other plots, anything you want! The goal should be to make the function useful for exploring flight data in the provided format. Take advantage of tuple packing and unpacking, default arguments, etc. You could even have a function inside another function (a helper function). Do your best to challenge yourself and have fun. Any solid effort will receive full credit.
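To show how these pieces can fit together, here is a sketch that combines tuple packing, a keyword argument with a default, and a helper function defined inside another function. The name summarize_delays and the sample values are invented for illustration; nothing here reads the flights data or assumes its column names.

```python
def summarize_delays(*delays, unit="min"):
    """Summarize any number of delay values; None marks a missing value."""

    def _clean(values):
        # Helper visible only inside summarize_delays: drop missing entries.
        return [v for v in values if v is not None]

    values = _clean(delays)
    if not values:
        return f"no delays recorded ({unit})"
    mean = sum(values) / len(values)
    return f"mean delay: {mean:.1f} {unit} over {len(values)} flights"

# Packing: the positional arguments are collected into `delays`.
packed = summarize_delays(5, None, 10, 15)

# Unpacking: an existing tuple is spread into positional arguments,
# and the default for `unit` is overridden by keyword.
sample = (5, None, 10, 15)
unpacked = summarize_delays(*sample, unit="minutes")

print(packed)
print(unpacked)
```

Because unit comes after *delays, it can only be passed by keyword, which keeps it from being mistaken for one more delay value.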
Code used to solve this problem.
Output from running the code.
Question 5 (optional, 0 pts)
Write a function that accepts the WHIN weather dataset (as a data frame), and an argument n. This function should plot the largest n distances between stations on a map. See here for examples of plotting lines on a map.
If you are feeling very adventurous, there is a data structure called a kdtree that you can use to very efficiently find the n closest or furthest points, however, this is probably not necessary as there are not that many distances to calculate for this dataset.
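For a small number of stations, a brute-force approach is one reasonable option: compute every pairwise distance and keep the n largest. The sketch below assumes great-circle (haversine) distance with an Earth radius of 6371 km, uses made-up coordinates, and leaves out the plotting step. The names haversine_km and largest_distances are hypothetical.

```python
import heapq
import math
from itertools import combinations

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def largest_distances(stations, n):
    """Return the n largest (distance_km, id_a, id_b) tuples, largest first."""
    pairs = combinations(stations.items(), 2)
    scored = ((haversine_km(pa, pb), ida, idb) for (ida, pa), (idb, pb) in pairs)
    return heapq.nlargest(n, scored)

# Made-up station coordinates for illustration only.
stations = {
    "A": (40.0, -87.0),
    "B": (40.0, -86.0),
    "C": (41.0, -87.0),
    "D": (41.5, -85.5),
}
top2 = largest_distances(stations, 2)

for dist, a, b in top2:
    print(f"{a}-{b}: {dist:.1f} km")
```

With k stations there are k(k-1)/2 pairs, so this stays cheap for a few hundred stations; a spatial index only starts to pay off at much larger sizes.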
Code used to solve this problem.
Output from running the code.
Please make sure to double check that your submission is complete and contains all of your code and output before submitting. If you are on a spotty internet connection, it is recommended to download your submission after submitting it to make sure that what you think you submitted is what you actually submitted. In addition, please review our submission guidelines before submitting your project.