STAT 39000: Project 5 — Fall 2021
Testing Python: part II
Motivation: Code, especially newly written code, is refactored, updated, and improved frequently. It is for these reasons that testing code is imperative. Testing code is a good way to ensure that code is working as intended. When a change is made to code, you can run a suite a tests, and feel confident (or at least more confident) that the changes you made are not introducing new bugs. While methods of programming like TDD (test-driven development) are popular in some circles, and unpopular in others, what is agreed upon is that writing good tests is a useful skill and a good habit to have.
Context: This is the second in a series of two projects that explore writing unit tests, and doc tests. In The Data Mine, we will focus on using pytest
, and mypy
, while writing code to manipulate and work with data.
Scope: Python, testing, pytest, mypy
Dataset(s)
The following questions will use the following dataset(s):
-
/depot/datamine/data/apple/health/2021/*
Questions
At this end of this project you will have only 3 files to submit:
Make sure that the output from running the cells is displayed and saved in your notebook before submitting. |
Question 1
First, setup your workspace in Brown. Create a new folder called project05
in your $HOME
directory. Make sure your apple_watch_parser
package from the previous project is in your $HOME/project05
directory. In addition, create your new notebook, firstname-lastname-project05.ipynb
in your $HOME/project05
directory. Great!
An updated
Note that this updated package is not required by any means to complete this project. |
Now, in $HOME/project05/apple_watch_parser/test_watch_data.py
, modify the test_time_difference
function to use parametrizing to test the time difference between 100 sets of times.
The following is an example of how to generate a list of 100 datetimes 1 day and 1 hour apart.
|
An example of how to convert a datetime to a string in the same format our
|
See the very first example here for how to parametrize a test for a function accepting 2 arguments instead of 1. |
The
|
You do not need to manually calculate the expected result for each combination of datetime’s that you will pass to the |
Run the pytest
tests from a bash cell in your notebook.
%%bash
cd $HOME/project05
python -m pytest
Relevant topics: parametrizing tests
-
Code used to solve this problem.
-
Output from running the code.
Question 2
Read and understand the filter_elements
method in the WatchData
class. This is an example of a function that accepts another function as an argument. A term for functions that accept other functions as arguments are higher-order functions. Think about the filter_elements
method, and list at least 2 reasons why this may be a good idea.
The docstring contains an explanation of the filter_elements
method. In addition, the module provides a function called example_filter
that can be used with the filter_elements
method.
Use the example_filter
function to filter the WatchData
class, and print the first 5 results.
Remember to import and use the package, make sure that the notebook is in the same directory as the
|
When passing a function as an argument to another function, you should not include the opening and closing parentheses in the argument. For example, the following is not correct.
Why? Because the |
-
Code used to solve this problem.
-
Output from running the code.
Question 3
Write your own *_filter
function in a Python code cell in your notebook (like example_filter
) that can be used with the filter_elements
method. Be sure to include a Google style docstring (no doctests are needed).
Does it work as intended? Print the first 5 results when using your filter.
-
Code used to solve this problem.
-
Output from running the code.
Question 4
In the previous project, we did not test out the filter_elements
method in our WatchData
class. Testing this method is complicated for two main reasons.
-
The method accepts any function following a set of rules (described in our docstring) as an argument. This (a
*_filter
function) may not be something that is available after immediately importing theWatchData
class — normally there wouldn’t be anexample_filter
function in the module for you to use, as this would be a function that a user of the library would create for their own purposes. -
In order to be able to test the
filter_elements
method, we would need a dataset that is similarly structured as the intended dataset (Apple Watch exports), that we know the expected output for, so we can test.
pytest
supports writing fixtures that can be used to solve these problems.
To address problem (1):
-
Remove the
example_filter
function from thewatch_data.py
module, and instead, modify thetest_watch_data.py
file and add theexample_filter
function to thetest_watch_data.py
module as apytest
fixture. Read this and 2 or 3 following sections. In addition, see this stackoverflow answer to better understand how to create a fixture that is a function that can accept arguments.
You may need to import lxml and other libraries in your
|
Why do we need to do something like the stackoverflow post describes? The reason is, by default, |
In the example in the stackoverflow post, the As a side note, sometimes helper functions (functions defined and used inside of another function) are called helper functions, and it is good practice to name them starting with an underscore — just like the |
You can start by cutting the |
To address problem (2):
-
Create a new
test_data
directory in yourapple_watch_parser
package. So,$HOME/project05/apple_watch_parser/test_data
should now exist. Add/depot/datamine/data/apple/health/2021/sample.xml
to this directory, and rename it toexport.xml
. So,$HOME/project05/apple_watch_parser/test_data/export.xml
should now exist.sample.xml
is a small sample of the the watch data that we can use for out tests. It is small enough to be portable, yet is similar enough to the intended types of datasets that it will be a good way to test ourWatchData
class and its methods. Since we renamed it toexport.xml
, it will work with ourWatchData
class. -
Create a
test_filter_elements
function in yourtest_watch_data.py
module. Use this library (already installed), to handle properly copying thetest_data/export.xml
file to a temporary directory for the test. Examples 2 and 3 here will be particularly helpful.You may be wondering why we would want to use this library for our test rather than just hard-coding the path to our test files in our test function(s). The reason is the following. What if one of your functions had a side-effect that modified your test data? Then, any other tests you run using the same data would be tainted and potentially fail! Bad news. This package allows for a systematic way to first copy our test data to a temporary location, and then run our test using the data in that temporary location.
In addition, if you have many test function that work on the same dataset, you can do something like the following to re-use the code over and over again.
export_xml_decorator = pytest.mark.datafiles(...) @export_xml_decorator def test_1(datafiles): pass @export_xml_decorator def test_2(datafiles): pass
Each of the tests,
test_1
andtest_2
, will work on the same example dataset, but will have a fresh copy of the dataset each time. Very cool!The decorator,
@pytest.mar.datafiles()
is expecting a path to the test data,export.xml
. To get the absolute path to the test data,$HOME/project05/apple_watch_parser/test_data/export.xml
, you can use thepathlib
library.test_watch_data.pyimport watch_data # since watch_data.py is in the same directory as test_watch_data.py, we can import it directly from pathlib import Path # To get the path of the watch_data Python module this_module_path = Path(watch_data.__file__).resolve().parent print(this_module_path) # $HOME/project05/apple_watch_parser # To get the test_data folders absolute path, we could then do print(this_module_path / 'test_data') # $HOME/project05/apple_watch_parser/test_data # To get the test_data/export.xml absolute path, we could then do ...? # HINT: The answer to this question is _exactly_ what should be passed to the `@pytest.mark.datafiles()` decorator. @pytest.mark.datafiles(answer_here) def test_filter_elements(datafiles, example_filter_fixture): # replace example_filter_fixture with the name of your fixture function pass
Okay, great! Your test_watch_data.py
module should now have 2 additional functions, "symbolically" something like this:
# from https://stackoverflow.com/questions/44677426/can-i-pass-arguments-to-pytest-fixtures
@pytest.fixture
def my_fixture():
def _method(a, b):
return a*b
return _method
@pytest.mark.datafiles(answer_here)
def test_filter_elements(datafiles, my_fixture):
pass
Fill in the test_filter_elements
function with at least 1 assert
statements that tests the filter_elements
function. It could be as simple as comparing the length of the output when using the example_filter
function as our filter. test_data/example.xml
should return 2 elements using our example_filter
function as the filter.
As a reminder, to run
|
If you get an error that says pytest.mark.datafiles isn’t defined, or something similar, do not worry, this can be ignored. Alternatively, if you add a file called pytest.ini
[pytest] markers = datafiles: mark a test as a datafiles. |
-
Code used to solve this problem.
-
Output from running the code.
Question 5
Create an additional method in the WatchData
class in the watch_data.py
module that does something interesting or useful with the data. Be sure to include a Google style docstring (no doctests are needed). In addition, write 1 or more pytest
tests for your new method that uses fixtures. Make sure your test passes (you can run your pytest
tests from a bash
cell in your notebook).
If you are up for a bigger challenge, design your new method to be similar to filter_elements
in that a user can write their own functions or classes that can be passed to it (as arguments) in order to accomplish something useful that they may want to be customized.
We will count the use of the |
Make sure to run the pytest
tests from a bash cell in your notebook.
%%bash
cd $HOME/project05
python -m pytest
-
Code used to solve this problem.
-
Output from running the code.
Please make sure to double check that your submission is complete, and contains all of your code and output before submitting. If you are on a spotty internet connection, it is recommended to download your submission after submitting it to make sure what you think you submitted, was what you actually submitted. |