There’s a tool for that

2019-03-29

This talk

Assumes some basic Python (mainly) and a bit of R

No: solving problems with code
Yes: easing the problems of coding

Links

My expertise wordcloud (according to Stack-Overflow)

… happy to answer R queries through Yammer / email

Made with github::dgrtwo/stackr and wordcloud packages

What tools / libraries help you …

… package stuff

[py] cookiecutter / setuptools; [R] devtools / usethis

… document stuff

[py] sphinx / restructuredText; [R] roxygen2

… test stuff

[py] pytest / hypothesis; [R] testthat / hedgehog

… style stuff

[py] black / pylint / flake8; [R] styler / lintr

… interact with the user

[py] click; [R] optparse

Rosalind

Initial Solution for `Longest Increasing Subsequence` from rosalind.info

Rosalind

Bioinformatics algorithmic challenges (akin to Project Euler / Codewars / Codesignal)

Longest Increasing Subsequence

Pulled out my solution code (written ~ 6 years ago)
I know that my code is bad. That’s why I chose it.

Input: unique positive integers
Partition entries into levels (based on values of preceding entries)
Identify parent(s) of each entry
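
The steps above describe the levels / parents approach used in the code that follows. As a point of comparison, here is a minimal sketch of the textbook quadratic dynamic-programming solution, which also records a parent for each entry and walks back from the end of the longest chain. It is not the code used in this talk, and the name `lis_dp` is hypothetical.

```python
def lis_dp(seq):
    """Return one longest strictly increasing subsequence of seq.

    Textbook O(n^2) dynamic programme; an illustrative sketch, not the
    implementation shown in this talk.
    """
    if not seq:
        return []
    # best[k]: length of the longest increasing subsequence ending at seq[k]
    best = [1] * len(seq)
    # parent[k]: index of the preceding entry in that subsequence (or None)
    parent = [None] * len(seq)
    for k, value in enumerate(seq):
        for j in range(k):
            if seq[j] < value and best[j] + 1 > best[k]:
                best[k] = best[j] + 1
                parent[k] = j
    # Walk back through the parents from the end of the longest chain
    k = max(range(len(seq)), key=best.__getitem__)
    result = []
    while k is not None:
        result.append(seq[k])
        k = parent[k]
    return result[::-1]


print(lis_dp([1, 3, 4, 2, 5, 7, 4]))  # [1, 3, 4, 5, 7]
```

Unlike the code below, this version indexes parents by position rather than by value, so it does not depend on the entries being unique.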

Initial Code

cat rosa/initial.py

stream = open('data_LGIS.txt', 'r').read().splitlines()

maxnum = int(stream[0])
perm   = [int(x) for x in stream[1].split()]

def lgis(maxnum, perm):
  #
  levels = []
  graph  = {}
  for i in perm:
    if len(levels) == 0:
      levels.append([i])
      graph[i] = 0
    else:
      # append to the highest level where i is greater than some
      # value in the next lowest level
      for lev in reversed(range(len(levels) + 1)):
        if i in graph.keys():
          break
        if lev == 0:
          levels[0].append(i)
          graph[i] = 0
          break
        lower_lev = levels[lev-1]
        lt = [x for x in lower_lev if x < i]
        if len(lt) > 0:
          if len(levels) == lev:
            levels.append([i])
          else:
            levels[lev].append(i)
            # drop entries in this level if they are greater than i
            lev_gt = [x for x in levels[lev] if x > i]
            levels[lev] = list(set(levels[lev]).difference(set(lev_gt)))
          graph[i] = lt[0]
  # returnable data
  res = []
  i = levels[-1][0]
  while(i != 0):
    res.append(i)
    i = graph[i]
  res = list(reversed(res))
  return res

print(" ".join([str(x) for x in lgis(maxnum, perm)]))
print(" ".join(
  list(reversed([str(x) for x in lgis(maxnum, list(reversed(perm)))]))))

Automate packaging: cookiecutter

Python packages look like this

tree minimal_py_package

## minimal_py_package
## ├── LICENSE
## ├── minimal_py_package
## │   └── __init__.py
## ├── README.rst
## ├── setup.py
## ├── tests
## │   └── test_sample.py
## └── tox.ini
## 
## 2 directories, 6 files

Stuff works better when you structure it better

`cookiecutter`

Templates that help structure a programming project

cookiecutter docs

Examples:

Define a minimal python package

To use a cookiecutter:

cookiecutter git@github.com:<some_repo>.git

Then answer some config questions:

cookiecutter git@github.com:kragniz/cookiecutter-pypackage-minimal.git

You've downloaded /home/ah327h/.cookiecutters/cookiecutter-pypackage-minimal before. Is it okay to delete and re-download it? [yes]:
author_name [Louis Taylor]: Russell Hyde
author_email [louis@kragniz.eu]: me AT somewhere.uk
package_name [cookiecutter_pypackage_minimal]: rosa_initial
package_version [0.1.0]:
...

Resulting directory:

I copied my lgis.py code into the resulting directory

## rosa_initial
## ├── LICENSE
## ├── README.rst
## ├── rosa_initial
## │   ├── __init__.py
## │   └── lgis.py
## ├── setup.py
## ├── tests
## │   └── test_sample.py
## └── tox.ini
## 
## 2 directories, 7 files

Automate (trivial bits of) code-review: pylint

`pylint`

Link

finds non-idiomatic and inconsistently styled bits of code
highly configurable (alternative: flake8)

pylint --long-help | tail -n20 | head -n 8

##     There are 5 kind of message types :
##     * (C) convention, for programming standard violation
##     * (R) refactor, for bad code smell
##     * (W) warning, for python specific problems
##     * (E) error, for probable bugs in the code
##     * (F) fatal, if an error occurred which prevented pylint from doing
##     further processing.

Booooring lints in the initial code

(cd rosa_initial/ && pylint rosa_initial --exit-zero)

## ************* Module rosa_initial.lgis
## rosa_initial/lgis.py:4:7: C0326: Exactly one space required before assignment
## perm   = [int(x) for x in stream[1].split()]
##        ^ (bad-whitespace)
## rosa_initial/lgis.py:8:0: W0311: Bad indentation. Found 2 spaces, expected 4 (bad-indentation)
## rosa_initial/lgis.py:9:0: W0311: Bad indentation. Found 2 spaces, expected 4 (bad-indentation)
## rosa_initial/lgis.py:9:9: C0326: Exactly one space required before assignment
##   graph  = {}
##          ^ (bad-whitespace)
## rosa_initial/lgis.py:10:0: W0311: Bad indentation. Found 2 spaces, expected 4 (bad-indentation)
## rosa_initial/lgis.py:11:0: W0311: Bad indentation. Found 4 spaces, expected 8 (bad-indentation)
## rosa_initial/lgis.py:12:0: W0311: Bad indentation. Found 6 spaces, expected 12 (bad-indentation)
## rosa_initial/lgis.py:13:0: W0311: Bad indentation. Found 6 spaces, expected 12 (bad-indentation)
## rosa_initial/lgis.py:14:0: W0311: Bad indentation. Found 4 spaces, expected 8 (bad-indentation)
## rosa_initial/lgis.py:17:0: W0311: Bad indentation. Found 6 spaces, expected 12 (bad-indentation)
## rosa_initial/lgis.py:18:0: W0311: Bad indentation. Found 8 spaces, expected 16 (bad-indentation)
## rosa_initial/lgis.py:19:0: W0311: Bad indentation. Found 10 spaces, expected 20 (bad-indentation)
## rosa_initial/lgis.py:20:0: W0311: Bad indentation. Found 8 spaces, expected 16 (bad-indentation)
## rosa_initial/lgis.py:21:0: W0311: Bad indentation. Found 10 spaces, expected 20 (bad-indentation)
## rosa_initial/lgis.py:22:0: W0311: Bad indentation. Found 10 spaces, expected 20 (bad-indentation)
## rosa_initial/lgis.py:23:0: W0311: Bad indentation. Found 10 spaces, expected 20 (bad-indentation)
## rosa_initial/lgis.py:24:0: W0311: Bad indentation. Found 8 spaces, expected 16 (bad-indentation)
## rosa_initial/lgis.py:25:0: W0311: Bad indentation. Found 8 spaces, expected 16 (bad-indentation)
## rosa_initial/lgis.py:26:0: W0311: Bad indentation. Found 8 spaces, expected 16 (bad-indentation)
## rosa_initial/lgis.py:27:0: W0311: Bad indentation. Found 10 spaces, expected 20 (bad-indentation)
## rosa_initial/lgis.py:28:0: W0311: Bad indentation. Found 12 spaces, expected 24 (bad-indentation)
## rosa_initial/lgis.py:29:0: W0311: Bad indentation. Found 10 spaces, expected 20 (bad-indentation)
## rosa_initial/lgis.py:30:0: W0311: Bad indentation. Found 12 spaces, expected 24 (bad-indentation)
## rosa_initial/lgis.py:32:0: W0311: Bad indentation. Found 12 spaces, expected 24 (bad-indentation)
## rosa_initial/lgis.py:33:0: W0311: Bad indentation. Found 12 spaces, expected 24 (bad-indentation)
## rosa_initial/lgis.py:34:0: W0311: Bad indentation. Found 10 spaces, expected 20 (bad-indentation)
## rosa_initial/lgis.py:36:0: W0311: Bad indentation. Found 2 spaces, expected 4 (bad-indentation)
## rosa_initial/lgis.py:37:0: W0311: Bad indentation. Found 2 spaces, expected 4 (bad-indentation)
## rosa_initial/lgis.py:38:0: W0311: Bad indentation. Found 2 spaces, expected 4 (bad-indentation)
## rosa_initial/lgis.py:38:0: C0325: Unnecessary parens after 'while' keyword (superfluous-parens)
## rosa_initial/lgis.py:39:0: W0311: Bad indentation. Found 4 spaces, expected 8 (bad-indentation)
## rosa_initial/lgis.py:40:0: W0311: Bad indentation. Found 4 spaces, expected 8 (bad-indentation)
## rosa_initial/lgis.py:41:0: W0311: Bad indentation. Found 2 spaces, expected 4 (bad-indentation)
## rosa_initial/lgis.py:42:0: W0311: Bad indentation. Found 2 spaces, expected 4 (bad-indentation)
## rosa_initial/lgis.py:46:0: C0330: Wrong hanging indentation (add 2 spaces).
##   list(reversed([str(x) for x in lgis(maxnum, list(reversed(perm)))]))))  ^ | (bad-continuation)
## rosa_initial/lgis.py:46:0: C0304: Final newline missing (missing-final-newline)
## rosa_initial/lgis.py:1:0: C0111: Missing module docstring (missing-docstring)
## rosa_initial/lgis.py:1:0: C0103: Constant name "stream" doesn't conform to UPPER_CASE naming style (invalid-name)
## rosa_initial/lgis.py:3:0: C0103: Constant name "maxnum" doesn't conform to UPPER_CASE naming style (invalid-name)
## rosa_initial/lgis.py:4:0: C0103: Constant name "perm" doesn't conform to UPPER_CASE naming style (invalid-name)
## rosa_initial/lgis.py:6:9: W0621: Redefining name 'maxnum' from outer scope (line 3) (redefined-outer-name)
## rosa_initial/lgis.py:6:17: W0621: Redefining name 'perm' from outer scope (line 4) (redefined-outer-name)
## rosa_initial/lgis.py:6:0: C0111: Missing function docstring (missing-docstring)
## rosa_initial/lgis.py:11:7: C1801: Do not use `len(SEQUENCE)` to determine if a sequence is empty (len-as-condition)
## rosa_initial/lgis.py:25:8: C0103: Variable name "lt" doesn't conform to snake_case naming style (invalid-name)
## rosa_initial/lgis.py:26:11: C1801: Do not use `len(SEQUENCE)` to determine if a sequence is empty (len-as-condition)
## rosa_initial/lgis.py:6:9: W0613: Unused argument 'maxnum' (unused-argument)
## 
## --------------------------------------------------------------------
## Your code has been rated at -2.37/10 (previous run: -2.37/10, +0.00)

Automate styling: black

`black`

Link

formats files in-place
uncompromising and unconfigurable

# - Made another package `rosa_cleaned` (not shown)
#
# - Ensure the original code is present in the `rosa_cleaned` package
# to begin with
cp rosa_initial/rosa_initial/lgis.py \
    rosa_cleaned/rosa_cleaned/lgis.py

black rosa_cleaned/*/lgis.py

## reformatted rosa_cleaned/rosa_cleaned/lgis.py
## All done! ✨ 🍰 ✨
## 1 file reformatted.

Example `black` differences

diff rosa_{initial,cleaned}/*/lgis.py | grep -e ">\|<" | head -n6

## < stream = open('data_LGIS.txt', 'r').read().splitlines()
## > stream = open("data_LGIS.txt", "r").read().splitlines()
## < perm   = [int(x) for x in stream[1].split()]
## > perm = [int(x) for x in stream[1].split()]
## > 
## <   #

These are trivial differences
Value comes when applied to large projects

The styled code

cat rosa_cleaned/*/lgis.py

stream = open("data_LGIS.txt", "r").read().splitlines()

maxnum = int(stream[0])
perm = [int(x) for x in stream[1].split()]


def lgis(maxnum, perm):
    #
    levels = []
    graph = {}
    for i in perm:
        if len(levels) == 0:
            levels.append([i])
            graph[i] = 0
        else:
            # append to the highest level where i is greater than some
            # value in the next lowest level
            for lev in reversed(range(len(levels) + 1)):
                if i in graph.keys():
                    break
                if lev == 0:
                    levels[0].append(i)
                    graph[i] = 0
                    break
                lower_lev = levels[lev - 1]
                lt = [x for x in lower_lev if x < i]
                if len(lt) > 0:
                    if len(levels) == lev:
                        levels.append([i])
                    else:
                        levels[lev].append(i)
                        # drop entries in this level if they are greater than i
                        lev_gt = [x for x in levels[lev] if x > i]
                        levels[lev] = list(set(levels[lev]).difference(set(lev_gt)))
                    graph[i] = lt[0]
    # returnable data
    res = []
    i = levels[-1][0]
    while i != 0:
        res.append(i)
        i = graph[i]
    res = list(reversed(res))
    return res


print(" ".join([str(x) for x in lgis(maxnum, perm)]))
print(" ".join(list(reversed([str(x) for x in lgis(maxnum, list(reversed(perm)))]))))

After `black` the lints are non-trivial

(cd rosa_cleaned && pylint */lgis.py --exit-zero)

## ************* Module rosa_cleaned.lgis
## rosa_cleaned/lgis.py:1:0: C0111: Missing module docstring (missing-docstring)
## rosa_cleaned/lgis.py:1:0: C0103: Constant name "stream" doesn't conform to UPPER_CASE naming style (invalid-name)
## rosa_cleaned/lgis.py:3:0: C0103: Constant name "maxnum" doesn't conform to UPPER_CASE naming style (invalid-name)
## rosa_cleaned/lgis.py:4:0: C0103: Constant name "perm" doesn't conform to UPPER_CASE naming style (invalid-name)
## rosa_cleaned/lgis.py:7:9: W0621: Redefining name 'maxnum' from outer scope (line 3) (redefined-outer-name)
## rosa_cleaned/lgis.py:7:17: W0621: Redefining name 'perm' from outer scope (line 4) (redefined-outer-name)
## rosa_cleaned/lgis.py:7:0: C0111: Missing function docstring (missing-docstring)
## rosa_cleaned/lgis.py:12:11: C1801: Do not use `len(SEQUENCE)` to determine if a sequence is empty (len-as-condition)
## rosa_cleaned/lgis.py:26:16: C0103: Variable name "lt" doesn't conform to snake_case naming style (invalid-name)
## rosa_cleaned/lgis.py:27:19: C1801: Do not use `len(SEQUENCE)` to determine if a sequence is empty (len-as-condition)
## rosa_cleaned/lgis.py:7:9: W0613: Unused argument 'maxnum' (unused-argument)
## 
## ------------------------------------------------------------------
## Your code has been rated at 6.86/10 (previous run: 6.86/10, +0.00)

Manually fixed the remaining lints:

Changes:

Added module / function description strings
Varnames:
- Global constant vars \(\rightarrow\) upper-case
- Short varnames fixed lt \(\rightarrow\) parents
Removed an unused argument
Changed empty-list tests:
- if len(my_list) > 0: ... \(\rightarrow\) if my_list: ...
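
A small illustration of the empty-list change, relying on the fact that empty Python sequences are falsy. The function name `describe` is made up for this example.

```python
def describe(my_list):
    """Report whether a list has any entries, using its truthiness."""
    # An empty list is falsy, so no explicit len() comparison is needed
    if my_list:
        return "non-empty"
    return "empty"


print(describe([]))      # empty
print(describe([3, 1]))  # non-empty
```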

Styled & Lint-free code:

cat rosa/nolints.py

"""Script and functions to find the `longest increasing subsequence` in
a sequence (as defined at rosalind.info/problems/lgis).

Searches the working directory for a file called "data_LGIS.txt". The
data file should contain two lines - the first containing the length of
a sequence of integers and the second containing the sequence of
integers.

The integer sequence should consist of positive integers only
(repetitions are allowed).

Prints out both the longest increasing sequence, and the longest
decreasing sequence to stdout.
"""

STREAM = open("data_LGIS.txt", "r").read().splitlines()
MAXNUM = int(STREAM[0])
PERM = [int(x) for x in STREAM[1].split()]


def lgis(perm):
    """Obtain the longest increasing subsequence from a list of
    positive integers

    Args:
        perm (list(int)): A list of integers within which the longest
        increasing subsequence is to be found.

    Returns: A list of integers, a subsequence of the input list of
        integers.
    """
    #
    levels = []
    graph = {}
    for i in perm:
        if not levels:
            levels.append([i])
            graph[i] = 0
        else:
            # append to the highest level where i is greater than some
            # value in the next lowest level
            for lev in reversed(range(len(levels) + 1)):
                if i in graph.keys():
                    break
                if lev == 0:
                    levels[0].append(i)
                    graph[i] = 0
                    break
                lower_lev = levels[lev - 1]
                parents = [x for x in lower_lev if x < i]
                if parents:
                    if len(levels) == lev:
                        levels.append([i])
                    else:
                        levels[lev].append(i)
                        # drop entries in this level if they are greater than i
                        lev_gt = [x for x in levels[lev] if x > i]
                        levels[lev] = list(set(levels[lev]).difference(set(lev_gt)))
                    graph[i] = parents[0]
    # returnable data
    res = []
    i = levels[-1][0]
    while i != 0:
        res.append(i)
        i = graph[i]
    res = list(reversed(res))
    return res


print(" ".join([str(x) for x in lgis(PERM)]))
print(" ".join(list(reversed([str(x) for x in lgis(list(reversed(PERM)))]))))

Passing lints != Good code

extensibility / maintainability
- Untestable / unimportable
- Pulls data from a hard-coded filepath
correctness
- edge-case failures (see later)
idioms
- Use with when opening files
- Poor loops: for ... in range(...), i in my_dict.keys()
general
- Poor docs / Poor names / Nested-conditionals / Long-function
comments?
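
The idiom points above could look like the following sketch: a `with` block closes the file even if parsing fails, and a dictionary supports membership tests and iteration directly, without `.keys()`. The helper name `read_permutation` is hypothetical; the file layout (length on the first line, integers on the second) is the one described in the module docstring.

```python
import os
import tempfile


def read_permutation(path):
    """Read the integer sequence from the second line of a data file."""
    # The context manager closes the file even if an exception is raised
    with open(path, "r") as handle:
        lines = handle.read().splitlines()
    return [int(x) for x in lines[1].split()]


# Demonstrate on a throwaway file rather than a hard-coded path
with tempfile.TemporaryDirectory() as tmp_dir:
    data_path = os.path.join(tmp_dir, "example.txt")
    with open(data_path, "w") as handle:
        handle.write("5\n5 1 4 2 3\n")
    perm = read_permutation(data_path)

# Membership tests work on the dict itself; no .keys() call is needed
graph = {value: 0 for value in perm}
print(4 in graph)

# enumerate() gives positions without a range(len(...)) loop
for index, value in enumerate(perm):
    print(index, value)
```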

Python Book of Antipatterns

Passing lints != Importable code

# The lint-free code is in ./rosa/nolints.py
try:
    import rosa.nolints
except FileNotFoundError as e:
    print(e)

## [Errno 2] No such file or directory: 'data_LGIS.txt'

# Hardcoded file in global-env at the offending line:
grep "data_LGIS.txt" rosa/nolints.py

## Searches the working directory for a file called "data_LGIS.txt". The
## STREAM = open("data_LGIS.txt", "r").read().splitlines()

Automate running your tests: pytest

Make the file importable / testable

Made another python package: rosa_testable

lgis.py now contains a guard-block around the global-env code

cat rosa_testable/*/lgis.py | grep -A7 "__main__"

if __name__ == "__main__":
    # This block only runs when this file is called as a script ...
    STREAM = open("data_LGIS.txt", "r").read().splitlines()
    MAXNUM = int(STREAM[0])
    PERM = [int(x) for x in STREAM[1].split()]
    print(" ".join([str(x) for x in lgis(PERM)]))
    print(" ".join(list(reversed([str(x) for x in lgis(list(reversed(PERM)))]))))

Make the function importable / testable

Also, we ensured the function was exported by the rosa_testable package:

cat rosa_testable/rosa_testable/__init__.py

"""rosa_testable - -"""

# user can now `from rosa_testable import lgis`
# .. rather than `from rosa_testable.lgis import lgis`
# .. they needn't know which file (lgis.py) the function is stored in
from rosa_testable.lgis import lgis

__version__ = '0.1.0'
__author__ = 'Russell Hyde <me AT somewhere.uk>'
__all__ = []

Check importability

pip install -e rosa_testable # [bash]

Obtaining file:///home/ah327h/workshops/bfx_201903/rosa_testable
Installing collected packages: rosa-testable
  Found existing installation: rosa-testable 0.1.0
    Uninstalling rosa-testable-0.1.0:
      Successfully uninstalled rosa-testable-0.1.0
  Running setup.py develop for rosa-testable
Successfully installed rosa-testable

import rosa_testable as rt # [python] successful import
print(rt.lgis([1,4,3,6,7,2,5]))

## [1, 3, 6, 7]

Preliminary tests

cat rosa_testable/tests/test_lgis.py

from rosa_testable import lgis

def test_it_isnt_greedy():
    assert lgis([1, 4, 2, 3, 5]) != [1, 4, 5]

def test_increasing_sequence():
    assert lgis([1, 3, 4, 2, 5, 7, 4]) == [1, 3, 4, 5, 7]

pytest rosa_testable --quiet # run all tests in rosa_testable/tests

..                                                                       [100%]
2 passed in 0.01 seconds

Automate test-case generation: hypothesis

`hypothesis`

Link

Randomly create your test data

from hypothesis import given, strategies as st

@given(
    my_ints=st.lists(
        st.integers(min_value=1),
        max_size=1)
    )
def test_trivial_sequences(my_ints):
    assert my_ints == rt.lgis(my_ints)

This generates

lots of separate integer-lists (st.lists(st.integers, ....))
each list contains at-most one element (... max_size=1)
each integer is \(\geq1\) (st.integers(min_value=1))

But …

try:
    test_trivial_sequences()
except Exception as e:
    print(e)

## Falsifying example: test_trivial_sequences(my_ints=[])
## list index out of range

Randomised tests identified an edge-case failure for lgis: empty-list

Any data that fails should be converted into a unit test

Will this test pass for all input?

@given(
    my_ints=st.lists(
        st.integers(min_value=1),
        min_size=2,
        max_size=5
    )
)
def test_with_sorted_input(my_ints):
    assert sorted(my_ints) == rt.lgis(sorted(my_ints))

… No!

try:
    test_with_sorted_input()
except Exception as e:
    print(e)

## Falsifying example: test_with_sorted_input(my_ints=[1, 1])

This was a human error

Moral: You still have to think about the tests and the test-data

Fix: ensure there are no repeated numbers in the input

@given(
    my_ints=st.lists(
        st.integers(min_value=1),
        min_size=2,
        max_size=5,
        unique=True
    )
)
def test_with_sorted_unique_input(my_ints):
    assert sorted(my_ints) == rt.lgis(sorted(my_ints))

test_with_sorted_unique_input() # the test now passes

Use-cases for randomised testing

Identify edge-case failures that you hadn’t considered
Compare a fancy algorithm to a brute-force solution (for small input)
Check the speed of an algorithm
Provides an alternative, property-based viewpoint, e.g. here:
- the output should be increasing
- the output should be a subsequence of the input
- If we dovetail two sequences, the output should be no shorter than that for either sequence
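
The property-based viewpoint and the brute-force comparison can be sketched with the standard library alone; each check could then be called from inside a hypothesis `@given` test. All function names here are hypothetical, and the exhaustive search is only practical for short inputs.

```python
from itertools import combinations


def is_increasing(seq):
    """True if every entry is strictly greater than the one before it."""
    return all(a < b for a, b in zip(seq, seq[1:]))


def is_subsequence(candidate, seq):
    """True if candidate can be obtained by deleting entries from seq."""
    remaining = iter(seq)
    # `x in remaining` consumes the iterator up to the first match,
    # so the order of the entries is respected
    return all(x in remaining for x in candidate)


def brute_force_lis_length(seq):
    """Length of the longest increasing subsequence, by exhaustive search."""
    for size in range(len(seq), 0, -1):
        if any(is_increasing(c) for c in combinations(seq, size)):
            return size
    return 0


def check_lis_properties(result, seq):
    """Check a claimed answer: increasing, a subsequence, and of optimal length."""
    return (
        is_increasing(result)
        and is_subsequence(result, seq)
        and len(result) == brute_force_lis_length(seq)
    )


print(check_lis_properties([1, 3, 4, 5, 7], [1, 3, 4, 2, 5, 7, 4]))  # True
print(check_lis_properties([1, 4, 5], [1, 4, 2, 3, 5]))              # False
```

The second call fails only on the length check: `[1, 4, 5]` is an increasing subsequence, but a longer one exists.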

Automate documentation generation: sphinx / click

Documentation

The tools above improve your efficiency
Good docs improve the efficiency of the user and your collaborators (including future you)
Many levels:
- Code comments & Exception / error handling
- [click / argparse]
  - --help / usage strings
- [sphinx / ReadTheDocs]
  - API / program reference
  - Tutorials / Cheatsheets / Cookbooks etc

Docstrings \(\rightarrow\) Sphinx \(\rightarrow\) Documentation

Google-formatted docstrings used here. Alternative formats.

grep -A10 "def lgis" rosa_testable/*/lgis.py

def lgis(perm):
    """Obtain the longest increasing subsequence from a list of
    positive integers

    Args:
        perm (list(int)): A list of integers within which the longest
        increasing subsequence is to be found.

    Returns: A list of integers, a subsequence of the input list of
        integers.
    """

Sphinx setup :0(

The workflow for generating docs with sphinx looks like:

# 1) - Set up a package structure
# 2) - Add some source code, with docstrings associated with the
# modules / classes / functions

# 3) Initialise the `docs` file structure
cd my_pkg
mkdir docs && cd docs
sphinx-quickstart # answer lots of questions
sphinx-apidoc -o source ../my_pkg
make html

# 4) Modify source/conf.py, swear a lot, question your sanity and try
# again

# 5) Ask Russ "What happened to all the automated tools you were going
# to talk about?"

sphinx-quickstart options

Sam Nicholls on sphinx setup

Try a full-featured python-package cookie-cutter

cookiecutter git@github.com:audreyr/cookiecutter-pypackage.git

This has a Makefile for: linting, testing, installation and documentation.

full_name [Audrey Roy Greenfeld]: Russell Hyde
email [audreyr@example.com]: russ AT somewhere.net
github_username [audreyr]: russHyde
project_name [Python Boilerplate]: detailed_py_package
project_slug [detailed_py_package]:
project_short_description [Python Boilerplate contains all the boilerplate you need to create a Python package.]: -
pypi_username [russHyde]: -
version [0.1.0]:
use_pytest [n]: y
...

Detailed package structure

tree detailed_py_package

detailed_py_package
├── CONTRIBUTING.rst
├── detailed_py_package
│   ├── cli.py
│   ├── detailed_py_package.py
│   └── __init__.py
├── docs
│   ├── conf.py
│   ├── contributing.rst
│   ├── history.rst
│   ├── index.rst
│   ├── installation.rst
│   ├── make.bat
│   ├── Makefile
│   ├── readme.rst
│   └── usage.rst
├── HISTORY.rst
├── LICENSE
├── Makefile
├── MANIFEST.in
├── README.rst
├── requirements_dev.txt
├── setup.cfg
├── setup.py
├── tests
│   └── test_detailed_py_package.py
└── tox.ini

3 directories, 23 files

Made a `sphinx`-ready package

# use `make install` in normal use
pip install -e rosa_with_docs

## Obtaining file:///home/ah327h/workshops/bfx_201903/rosa_with_docs
## Requirement already satisfied: Click>=6.0 in /home/ah327h/tools/miniconda3/envs/bfx_201903/lib/python3.6/site-packages (from rosa-with-docs==0.1.0) (7.0)
## Installing collected packages: rosa-with-docs
##   Found existing installation: rosa-with-docs 0.1.0
##     Uninstalling rosa-with-docs-0.1.0:
##       Successfully uninstalled rosa-with-docs-0.1.0
##   Running setup.py develop for rosa-with-docs
## Successfully installed rosa-with-docs

Make the docs

# Not run while building the presentation
(cd rosa_with_docs && make docs)

(Not on rpubs) Link to documentation

[example output]
rm -f docs/rosa_with_docs.rst
rm -f docs/modules.rst
sphinx-apidoc -o docs/ rosa_with_docs
Creating file docs/rosa_with_docs.rst.
Creating file docs/modules.rst.
make -C docs clean
make[1]: Entering directory '/home/ah327h/workshops/bfx_201903/rosa_with_docs/docs'
Removing everything under '_build'...
make[1]: Leaving directory '/home/ah327h/workshops/bfx_201903/rosa_with_docs/docs'
make -C docs html
make[1]: Entering directory '/home/ah327h/workshops/bfx_201903/rosa_with_docs/docs'
Running Sphinx v1.8.4
making output directory...
building [mo]: targets for 0 po files that are out of date
building [html]: targets for 9 source files that are out of date
updating environment: 9 added, 0 changed, 0 removed
reading sources... [100%] usage
/home/ah327h/workshops/bfx_201903/rosa_with_docs/docs/index.rst:2: WARNING: Title underline too short.

Welcome to rosa_with_docs's documentation!
======================================
looking for now-outdated files... none found
pickling environment... done
checking consistency... done
preparing documents... done
writing output... [100%] usage
generating indices... genindex py-modindex
highlighting module code... [100%] rosa_with_docs.lgis
writing additional pages... search
copying static files... WARNING: html_static_path entry '/home/ah327h/workshops/bfx_201903/rosa_with_docs/docs/_static' does not exist
done
copying extra files... done
dumping search index in English (code: en) ... done
dumping object inventory... done
build succeeded, 2 warnings.

The HTML pages are in _build/html.
make[1]: Leaving directory '/home/ah327h/workshops/bfx_201903/rosa_with_docs/docs'
python -c "$BROWSER_PYSCRIPT" docs/_build/html/index.html

Easy `--help` strings / arg-parsing: `click`

Another use of docstrings

Command line programs:

Standard solution is argparse
click
- is less verbose
- decorators define commands / arguments
- docstrings define help messages

# Wanted
<program_name> <command_name> [arguments]
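
For comparison, the `<program_name> <command_name> [arguments]` layout could be built with the standard-library `argparse` roughly as follows. This is a sketch: `build_parser` is a hypothetical name, and the subcommand only parses its argument without doing any work.

```python
import argparse


def build_parser():
    """Build a parser with a single `lgis` subcommand."""
    parser = argparse.ArgumentParser(
        prog="rosa_cli", description="Console script for rosa_cli."
    )
    subparsers = parser.add_subparsers(dest="command")
    lgis_parser = subparsers.add_parser(
        "lgis", help="Longest-monotonic subsequences"
    )
    # Positional argument: path to the input data file
    lgis_parser.add_argument("filepath", metavar="<filepath>")
    return parser


args = build_parser().parse_args(["lgis", "temp.txt"])
print(args.command, args.filepath)  # lgis temp.txt
```

With argparse the help text is passed as explicit `help=` and `description=` strings, whereas click takes it from the docstrings of the decorated functions.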

Implementation with `click`

cat rosa_cli/rosa_cli/cli.py # I have renamed the `lgis` function

# -*- coding: utf-8 -*-

"""Console script for rosa_cli."""
import sys
import click

from . import longest_increasing_subsequence, longest_decreasing_subsequence


@click.group()
def main():
    """Console script for rosa_cli."""
    return 0


@main.command(
    "lgis", short_help="Longest-monotonic subsequences"
)
@click.argument(
    "filepath", metavar="<filepath>"
)
def lgis(filepath):
    """This command prints the longest increasing and the longest decreasing
    subsequence from a sequence of positive integers.
    """
    stream = open(filepath, "r").read().splitlines()
    perm = [int(x) for x in stream[1].split()]
    print(" ".join([str(x) for x in longest_increasing_subsequence(perm)]))
    print(" ".join([str(x) for x in longest_decreasing_subsequence(perm)]))


if __name__ == "__main__":
    sys.exit(main())  # pragma: no cover

Main program help-string

rosa_cli --help

Usage: rosa_cli [OPTIONS] COMMAND [ARGS]...

  Console script for rosa_cli.

Options:
  --help  Show this message and exit.

Commands:
  lgis  Longest-monotonic subsequences

Subcommand help-string

rosa_cli lgis --help

Usage: rosa_cli lgis [OPTIONS] <filepath>

  This command prints the longest increasing and the longest decreasing
  subsequence from a sequence of positive integers.

Options:
  --help  Show this message and exit.

Run the tool

echo 6 > temp.txt
echo "1 5 3 2 6 5" >> temp.txt

rosa_cli lgis temp.txt

1 2 6
3 2

Comparison to R tools

Job	Python	R
IDE	PyCharm	Rstudio
Styling	black	styler
Linting	pylint / flake8	lintr
Testing	pytest / hypothesis	testthat / hedgehog
Packaging	cookiecutter / setuptools	devtools / usethis
Docs	sphinx / restructuredText	roxygen2 / rmarkdown
CLI	click / argparse	optparse
