Authors:
(1) Joseph Latessa, Department of Computer Science, Wayne State University, Detroit, MI, USA (jlatessa@wayne.edu);
(2) Aadi Huria, Senior, Salem High School, Canton, MI, USA (huria.aadi@gmail.com);
(3) Deepak Raju, Senior, Salem High School, Canton, MI, USA (Deepak.Raju294@outlook.com).
4 PROJECT IMPLEMENTATION
To meet the project’s goal of creating an experience that emphasizes learning while also providing students an opportunity to write and deploy meaningful tests, we chose three tests to implement: an HTML validator, a link checker, and a series of unit tests to ensure the correctness of the JETSCAPE [1] XML reader, which is a function of the JETSCAPE framework that parses input parameters. Since the first two tests don’t involve unit tests, an introduction to unit testing and Python’s unittest framework can be postponed until after the first two tasks are completed.
4.1 HTML Validator
The JETSCAPE [1] and GOMC [2] websites are both static sites intended to convey information about the JETSCAPE and GOMC scientific computing applications. The websites are written with HTML, CSS, and JavaScript, and the sites are hosted and deployed using GitHub Pages. The sites can be updated through pull requests to the repositories’ respective main branches, and members of the collaboration can propose pull requests. Since the sites are collaboratively maintained, an automated test that ensures all HTML code conforms to the HTML5 standard is valuable: invalid HTML often results in small, easy-to-miss cosmetic inconsistencies.
The students explored the GitHub Actions Marketplace and together we read through the documentation of a Marketplace Action [10] that looked relevant to our task. The Marketplace Action referenced and extended an open-source HTML5 Validator [11] built with Python and available through pip. We installed the validator locally to test it on our repository code, and purposefully added HTML tag typos to force the tests to fail.
After thoroughly testing the validator locally, we then collaboratively wrote the YAML file to automate running the test on push operations and pull requests. Having only previously seen the example YAML file to check whether a website is reachable, we referred to that example file as a starting point, removed what wasn’t relevant for this workflow and added the step to call the validator Marketplace Action [10].
The students then pushed their committed changes to their respective forks and viewed the running tests in the Actions tab on GitHub. An issue arose regarding how to format the path to the repository checked out on GitHub’s runner. Since the path to the repository on the runner is different from the path to the repository on one’s local machine, the students gained experience reading the test logs and examining why the test that had passed locally was failing on GitHub’s runners.
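The path difference described above can be illustrated with a minimal Python sketch. It assumes the standard GITHUB_WORKSPACE environment variable that GitHub Actions sets on its runners; the function names and the fallback to the current working directory are hypothetical choices made for this illustration, not part of the project's workflow.

```python
import os
from pathlib import Path


def repository_root() -> Path:
    """Return the directory that holds the checked-out repository.

    GitHub Actions sets GITHUB_WORKSPACE on its runners. On a local
    machine the variable is normally absent, so this sketch falls back
    to the current working directory (an assumption, not a rule).
    """
    workspace = os.environ.get("GITHUB_WORKSPACE")
    return Path(workspace) if workspace else Path.cwd()


def html_files(root: Path) -> list[Path]:
    """List every .html file below root, sorted for stable log output."""
    return sorted(root.rglob("*.html"))
```

Resolving the root this way lets the same script run unchanged in both environments, which is one way to avoid a test that passes locally but fails on the runner because of a hard-coded path.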
Although the HTML validator was a simple test to automate, we consider it to have been a good starting point. The Marketplace Action [10] does most of the work and successfully validates the repository’s code without requiring any additional scripting, so the students could focus on testing the implementation and writing the YAML file to handle the automation. And while Git operations remained a new concept, the students continued to develop fluency with each subsequent sequence of commits and pushes.
4.2 Link Checker
The next test built upon the skills acquired while implementing the HTML validator and required some additional scripting and troubleshooting.
A link checker is a useful website debugging tool that identifies broken links. We discussed the usefulness of setting the link checker to run on a schedule in addition to on pushes and pull requests. While the link checker will catch broken links caused by a developer mistyping a URL in the code, it is also reasonable to expect that a presently working link could at some point stop working. Many website links don’t just point to internally hosted pages and content but point to external conference pages and journal articles. Links to old conference sites can change or simply go offline. To regularly check the validity of every link would be a tedious task to perform manually. With respect to the JETSCAPE [1] and GOMC [2] sites, no one was checking links regularly and once implemented, our tool identified several broken links on each site.
The students began by researching GitHub Marketplace Actions and other available open-source tools as they did for the HTML validator. We found a Python tool, LinkChecker [12], that supported a command line interface. We installed the tool locally and tested it. The tool takes a URL as a command line argument, recursively visits pages, and checks internal links reachable from the initial URL. A flag can be set to instruct the tool to also check external links.
In our testing, we discovered that when passing our websites’ root domain, https://jetscape.org or https://gomc-wsu.org respectively, the links on the index.html page were checked, but no other site page was visited. The students investigated and realized that the navigation bars were inserted with JavaScript from text on a separate nav.html page, and the tool itself does not render JavaScript. To solve the issue, we passed the URL of the nav.html page instead of the domain’s root address. The students examined the test logs and determined that every .html page was now being visited, but many of the external conference and journal links specifically on the JETSCAPE website were not being checked.
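The behavior described above can be reproduced with a small crawler that, like any tool that does not render JavaScript, sees only the links written in the static markup. The following is a minimal sketch and not the LinkChecker [12] implementation: the site contents, the example.org URLs, and the loadNav call are invented for illustration, and pages are served from a dictionary rather than fetched over the network.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags found in static HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url: str, fetch) -> list[str]:
    """Breadth-first visit of pages on the same host as start_url.

    fetch is any callable that maps a URL to its HTML text.
    Returns the visited URLs in the order they were reached.
    """
    host = urlparse(start_url).netloc
    visited: list[str] = []
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if url in visited:
            continue
        visited.append(url)
        collector = LinkCollector()
        collector.feed(fetch(url))
        for href in collector.links:
            target = urljoin(url, href)
            # External links are not followed in this sketch.
            if urlparse(target).netloc == host:
                queue.append(target)
    return visited


# Hypothetical three-page site: the index page has no static links
# because its navigation bar is inserted by JavaScript at load time.
SITE = {
    "https://example.org/index.html":
        '<div id="nav"></div><script>loadNav("nav.html");</script>',
    "https://example.org/nav.html":
        '<a href="index.html">Home</a><a href="team.html">Team</a>',
    "https://example.org/team.html":
        '<a href="https://other.example/conf">Conference</a>',
}
```

With this data, `crawl("https://example.org/index.html", SITE.__getitem__)` stops after the index page, while starting from `nav.html` reaches all three pages, mirroring the workaround of passing the nav.html address to the tool.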
JETSCAPE’s site design uses several JSON files where the conference and publication data are stored. JavaScript parses the JSON data, which includes the conference and publication links, and writes them to the relevant tables on page load. Therefore, the tool passes over the most important links we would want to check. To solve the problem, we decided to supplement the open-source tool, which successfully checks the links specifically written in the HTML files, with our own link checking script designed to identify and check links found in the JSON files.
A Python script skeleton was provided to the students with function stubs to be implemented. Providing a code skeleton was familiar and consistent with the methodology used to introduce version control with the calculator application. Code from the URL reachability script and YAML file that introduced GitHub Actions was also applicable and could be referenced when implementing the link checker. The students were guided to produce a Python script that accepted a directory path as input and parsed the text of every JSON file found in that path to identify and check URLs. We successfully tested this script locally, passing the path to our repository’s data folder where the relevant JSON files were stored.
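A script of the kind described above might be organized as in the following sketch. It is not the skeleton given to the students: the function names, the URL regular expression, the timeout, and the User-Agent string are all assumptions made for illustration, and only the Python standard library is used.

```python
import json
import re
import urllib.request
from pathlib import Path

# Simple pattern for http(s) URLs; an assumption, not a full URL grammar.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def urls_in_value(value) -> set[str]:
    """Recursively collect URLs from every string in parsed JSON data."""
    if isinstance(value, str):
        return set(URL_PATTERN.findall(value))
    if isinstance(value, dict):
        value = list(value.values())
    if isinstance(value, list):
        found: set[str] = set()
        for item in value:
            found |= urls_in_value(item)
        return found
    return set()  # numbers, booleans, and null carry no URLs


def urls_in_directory(directory: str) -> set[str]:
    """Collect URLs from every .json file directly inside directory."""
    found: set[str] = set()
    for json_file in sorted(Path(directory).glob("*.json")):
        with open(json_file, encoding="utf-8") as handle:
            found |= urls_in_value(json.load(handle))
    return found


def is_reachable(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL answers with a non-error HTTP status."""
    try:
        request = urllib.request.Request(
            url, headers={"User-Agent": "link-check-sketch"}
        )
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError):
        # URLError, HTTPError, and timeouts derive from OSError;
        # ValueError covers strings that are not valid URLs.
        return False


def broken_links(directory: str) -> list[str]:
    """Return the sorted URLs from the JSON files that could not be reached."""
    return sorted(
        url for url in urls_in_directory(directory) if not is_reachable(url)
    )
```

In an automated workflow, a wrapper could call `broken_links` on the data folder and exit with a non-zero status when the returned list is not empty, so that the job is reported as failed.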
We then wrote our YAML file to automate a test that used the original open-source link checker to test the links written explicitly in the HTML files as well as our own Python script to check the links identified in the JSON files. We then tested the automation by including purposefully broken links. During this testing, we realized that the original open-source link checker was checking the deployed site rather than the code we had just changed and were about to deploy. This was not what we intended. We wanted to check the code we were pushing to ensure that proposed changes didn’t introduce new problems. To solve this issue, we amended our YAML file to launch a local server on the GitHub runners. Instead of passing the deployed site address to the link checker, we passed the address of the site launched on the GitHub Actions runner’s local server. This solution required a discussion and demonstration of how to test websites using a local server and how to run programs as background processes at the command line. Completing these tasks, the students now had multiple experiences practicing Git operations, working at their local command prompts, and writing commands as YAML file job steps to facilitate automation with GitHub Actions.
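The idea of serving the checked-out code from a background process and pointing the checker at that local address can be sketched in Python as follows. In the workflow described above the server is started from a YAML job step at the command line; this sketch only illustrates the same pattern, and the helper names, the 10-second startup limit, and the use of Python's built-in http.server module are assumptions.

```python
import socket
import subprocess
import sys
import time
import urllib.request
from contextlib import contextmanager


def free_port() -> int:
    """Ask the operating system for a currently unused TCP port."""
    with socket.socket() as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


@contextmanager
def local_site(directory: str):
    """Serve directory from a background process and yield its base URL."""
    port = free_port()
    server = subprocess.Popen(
        [sys.executable, "-m", "http.server", str(port),
         "--bind", "127.0.0.1", "--directory", directory],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    base_url = f"http://127.0.0.1:{port}/"
    try:
        # Poll until the server answers, so checks do not start too early.
        deadline = time.monotonic() + 10
        while True:
            try:
                urllib.request.urlopen(base_url, timeout=1).close()
                break
            except OSError:
                if time.monotonic() > deadline or server.poll() is not None:
                    raise RuntimeError("local server did not start")
                time.sleep(0.1)
        yield base_url
    finally:
        # Always stop the background process, even if a check fails.
        server.terminate()
        server.wait()
```

Inside a `with local_site(path) as base_url:` block, the base URL refers to the code that was just checked out rather than the deployed site, so it can be handed to whichever link checker is in use.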
4.3 Introducing Unit Testing
The third automated test required an introduction to the concept of unit testing. For this introduction, we returned to the calculator application used to introduce version control. Building upon that code, we explored Python’s unittest framework and created a test class with functions to test the calculator’s arithmetic operations. It was noted how several test cases could be executed with one simple call to a Python script. We then wrote a YAML file to automate running those tests on pushes and pull requests to the calculator repository.
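A test class of this kind might look like the following minimal sketch. The calculator functions shown here are hypothetical stand-ins, since the original calculator code is not reproduced in this section; only the unittest calls are standard.

```python
import unittest


# Hypothetical calculator functions, standing in for the application code.
def add(a, b):
    return a + b


def subtract(a, b):
    return a - b


def multiply(a, b):
    return a * b


def divide(a, b):
    if b == 0:
        raise ValueError("cannot divide by zero")
    return a / b


class TestCalculator(unittest.TestCase):
    """One test method per arithmetic operation, plus an error case."""

    def test_add(self):
        self.assertEqual(add(2, 3), 5)

    def test_subtract(self):
        self.assertEqual(subtract(5, 3), 2)

    def test_multiply(self):
        self.assertEqual(multiply(4, 3), 12)

    def test_divide(self):
        self.assertAlmostEqual(divide(7, 2), 3.5)

    def test_divide_by_zero(self):
        with self.assertRaises(ValueError):
            divide(1, 0)
```

If this code were saved as, say, test_calculator.py, running `python -m unittest test_calculator` would execute all five test cases with a single command, which is the property the automation relies on.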
4.4 Unit Testing the XML Reader
Our final automated test applies to the JETSCAPE [1] code repository instead of the website repositories. The JETSCAPE application provides an XML reader to receive a user’s input parameters. JETSCAPE maintains a main.xml file in which default parameters are set. A user can provide a separate user.xml file to override some or all of the main.xml tags. To prevent typos that could arise in the user.xml files, the JETSCAPE application will exit with an error message if the user.xml file contains a tag that is not also included in the main.xml file.
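The rule that a user file may only contain tags that also appear in the main file can be illustrated with a short Python sketch. This is not JETSCAPE's XML reader: the tag names are invented, and comparing tags by their full path from the root is an assumption about how such a check could be made.

```python
import xml.etree.ElementTree as ET


def tag_paths(element, prefix: str = "") -> set[str]:
    """Return the set of slash-separated tag paths below element."""
    path = f"{prefix}/{element.tag}"
    paths = {path}
    for child in element:
        paths |= tag_paths(child, path)
    return paths


def unknown_tags(main_xml: str, user_xml: str) -> list[str]:
    """List tag paths used in user_xml that main_xml does not define."""
    main_paths = tag_paths(ET.fromstring(main_xml))
    user_paths = tag_paths(ET.fromstring(user_xml))
    return sorted(user_paths - main_paths)


# Hypothetical file contents, invented for illustration.
MAIN_XML = (
    "<settings><events>100</events>"
    "<output><file>out.dat</file></output></settings>"
)
USER_OK = "<settings><events>5</events></settings>"
USER_TYPO = "<settings><evnts>5</evnts></settings>"
```

Here `unknown_tags(MAIN_XML, USER_OK)` is empty, while the misspelled tag in `USER_TYPO` is reported as `/settings/evnts`, the kind of mistake the reader is meant to reject.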
Many example user.xml files are included in the repository to demonstrate different use cases, and as new features are added, developers can include new example XML files. The automated test described here will use Python’s unittest framework to call JETSCAPE’s XML reader and test every example user.xml file included in the repository.
The requirements of this test exhibit comparable functionality to the link checker. While the link checker needed to identify every JSON file in a directory, here we look for every user.xml file in a subtree of directories. This offers students an opportunity to revisit and review logic and syntax seen for the first time with the link checker test.
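Searching a subtree for example files and testing each one could be sketched as follows. The "*.xml" file pattern is an assumption, and `read_parameters` is a hypothetical stand-in that only checks whether a file is well-formed XML; the test described above would call JETSCAPE's XML reader at that point instead.

```python
import unittest
import xml.etree.ElementTree as ET
from pathlib import Path


def find_xml_files(root: str) -> list[Path]:
    """Return every .xml file in root and all of its subdirectories."""
    return sorted(Path(root).rglob("*.xml"))


def read_parameters(xml_path: Path) -> None:
    """Hypothetical stand-in for the XML reader.

    Raises xml.etree.ElementTree.ParseError if the file is malformed.
    """
    ET.parse(xml_path)


def make_test_case(root: str):
    """Build a TestCase class that checks every XML file below root."""

    class TestExampleFiles(unittest.TestCase):
        def test_every_example_file(self):
            files = find_xml_files(root)
            self.assertTrue(files, "no XML files found")
            for xml_path in files:
                # subTest reports each failing file separately
                # instead of stopping at the first failure.
                with self.subTest(file=str(xml_path)):
                    read_parameters(xml_path)

    return TestExampleFiles
```

Using `subTest` means a newly added example file is covered automatically and a failure names the specific file, without writing one test method per file.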
This paper is available on arXiv under the CC BY-NC-ND 4.0 Deed license.