2019-03-29

This talk

My expertise wordcloud (according to Stack Overflow)

… happy to answer R queries through Yammer / email

  • Made with github::dgrtwo/stackr and wordcloud packages

What tools / libraries help you …

package stuff

  • [py] cookiecutter / setuptools; [R] devtools / usethis

document stuff

  • [py] sphinx / reStructuredText; [R] roxygen2

test stuff

  • [py] pytest / hypothesis; [R] testthat / hedgehog

style stuff

  • [py] black / pylint / flake8; [R] styler / lintr

interact with the user

  • [py] click; [R] optparse

Rosalind

Initial Solution for Longest Increasing Subsequence from rosalind.info

Rosalind

  • Bioinformatics algorithmic challenges (akin to Project Euler / Codewars / Codesignal)

Longest Increasing Subsequence

  • Pulled out my solution code (written ~ 6 years ago)

  • I know that my code is bad. That’s why I chose it.

  • Input: unique positive integers

  • Partition entries into levels (based on values of preceding entries)

  • Identify parent(s) of each entry
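
The level / parent idea is close in spirit to the textbook patience-sorting approach. The sketch below shows that standard technique, not the solution code discussed in this talk; the function name `longest_increasing_subsequence` and the index-based `parent` list are illustrative choices, and the input is assumed to contain unique integers.

```python
from bisect import bisect_left


def longest_increasing_subsequence(values):
    """Return one longest strictly increasing subsequence of `values`."""
    tail_values = []  # tail_values[k]: smallest tail of an increasing run of length k + 1
    tail_index = []   # position in `values` of that tail
    parent = [None] * len(values)  # predecessor of each entry in its best run

    for pos, value in enumerate(values):
        # The "level" of an entry is the length of the best run it can extend
        level = bisect_left(tail_values, value)
        if level > 0:
            parent[pos] = tail_index[level - 1]
        if level == len(tail_values):
            tail_values.append(value)
            tail_index.append(pos)
        else:
            tail_values[level] = value
            tail_index[level] = pos

    # Walk the parent links back from the tail of the longest run
    result = []
    pos = tail_index[-1] if tail_index else None
    while pos is not None:
        result.append(values[pos])
        pos = parent[pos]
    return result[::-1]


print(longest_increasing_subsequence([1, 4, 3, 6, 7, 2, 5]))
```

Parents are stored by position rather than by value, so the walk back does not depend on a sentinel such as 0.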

Initial Code

cat rosa/initial.py
stream = open('data_LGIS.txt', 'r').read().splitlines()

maxnum = int(stream[0])
perm   = [int(x) for x in stream[1].split()]

def lgis(maxnum, perm):
  #
  levels = []
  graph  = {}
  for i in perm:
    if len(levels) == 0:
      levels.append([i])
      graph[i] = 0
    else:
      # append to the highest level where i is greater than some
      # value in the next lowest level
      for lev in reversed(range(len(levels) + 1)):
        if i in graph.keys():
          break
        if lev == 0:
          levels[0].append(i)
          graph[i] = 0
          break
        lower_lev = levels[lev-1]
        lt = [x for x in lower_lev if x < i]
        if len(lt) > 0:
          if len(levels) == lev:
            levels.append([i])
          else:
            levels[lev].append(i)
            # drop entries in this level if they are greater than i
            lev_gt = [x for x in levels[lev] if x > i]
            levels[lev] = list(set(levels[lev]).difference(set(lev_gt)))
          graph[i] = lt[0]
  # returnable data
  res = []
  i = levels[-1][0]
  while(i != 0):
    res.append(i)
    i = graph[i]
  res = list(reversed(res))
  return res

print(" ".join([str(x) for x in lgis(maxnum, perm)]))
print(" ".join(
  list(reversed([str(x) for x in lgis(maxnum, list(reversed(perm)))]))))

Automate packaging: cookiecutter

Python packages look like this

tree minimal_py_package
## minimal_py_package
## ├── LICENSE
## ├── minimal_py_package
## │   └── __init__.py
## ├── README.rst
## ├── setup.py
## ├── tests
## │   └── test_sample.py
## └── tox.ini
## 
## 2 directories, 6 files
  • Stuff works better when you structure it better

cookiecutter

Define a minimal python package

To use a cookiecutter:

cookiecutter git@github.com:<some_repo>.git

Then answer some config questions:

cookiecutter git@github.com:kragniz/cookiecutter-pypackage-minimal.git

You've downloaded /home/ah327h/.cookiecutters/cookiecutter-pypackage-minimal before. Is it okay to delete and re-download it? [yes]:
author_name [Louis Taylor]: Russell Hyde
author_email [louis@kragniz.eu]: me AT somewhere.uk
package_name [cookiecutter_pypackage_minimal]: rosa_initial
package_version [0.1.0]:
...

Resulting directory:

I copied my lgis.py code into the resulting directory

## rosa_initial
## ├── LICENSE
## ├── README.rst
## ├── rosa_initial
## │   ├── __init__.py
## │   └── lgis.py
## ├── setup.py
## ├── tests
## │   └── test_sample.py
## └── tox.ini
## 
## 2 directories, 7 files

Automate (trivial bits of) code-review: pylint

pylint

  • finds non-idiomatic and inconsistently styled bits of code

  • highly configurable (alternative: flake8)

pylint --long-help | tail -n20 | head -n 8
##     There are 5 kind of message types :
##     * (C) convention, for programming standard violation
##     * (R) refactor, for bad code smell
##     * (W) warning, for python specific problems
##     * (E) error, for probable bugs in the code
##     * (F) fatal, if an error occurred which prevented pylint from doing
##     further processing.

Booooring lints in the initial code

(cd rosa_initial/ && pylint rosa_initial --exit-zero)
## ************* Module rosa_initial.lgis
## rosa_initial/lgis.py:4:7: C0326: Exactly one space required before assignment
## perm   = [int(x) for x in stream[1].split()]
##        ^ (bad-whitespace)
## rosa_initial/lgis.py:8:0: W0311: Bad indentation. Found 2 spaces, expected 4 (bad-indentation)
## rosa_initial/lgis.py:9:0: W0311: Bad indentation. Found 2 spaces, expected 4 (bad-indentation)
## rosa_initial/lgis.py:9:9: C0326: Exactly one space required before assignment
##   graph  = {}
##          ^ (bad-whitespace)
## rosa_initial/lgis.py:10:0: W0311: Bad indentation. Found 2 spaces, expected 4 (bad-indentation)
## rosa_initial/lgis.py:11:0: W0311: Bad indentation. Found 4 spaces, expected 8 (bad-indentation)
## rosa_initial/lgis.py:12:0: W0311: Bad indentation. Found 6 spaces, expected 12 (bad-indentation)
## rosa_initial/lgis.py:13:0: W0311: Bad indentation. Found 6 spaces, expected 12 (bad-indentation)
## rosa_initial/lgis.py:14:0: W0311: Bad indentation. Found 4 spaces, expected 8 (bad-indentation)
## rosa_initial/lgis.py:17:0: W0311: Bad indentation. Found 6 spaces, expected 12 (bad-indentation)
## rosa_initial/lgis.py:18:0: W0311: Bad indentation. Found 8 spaces, expected 16 (bad-indentation)
## rosa_initial/lgis.py:19:0: W0311: Bad indentation. Found 10 spaces, expected 20 (bad-indentation)
## rosa_initial/lgis.py:20:0: W0311: Bad indentation. Found 8 spaces, expected 16 (bad-indentation)
## rosa_initial/lgis.py:21:0: W0311: Bad indentation. Found 10 spaces, expected 20 (bad-indentation)
## rosa_initial/lgis.py:22:0: W0311: Bad indentation. Found 10 spaces, expected 20 (bad-indentation)
## rosa_initial/lgis.py:23:0: W0311: Bad indentation. Found 10 spaces, expected 20 (bad-indentation)
## rosa_initial/lgis.py:24:0: W0311: Bad indentation. Found 8 spaces, expected 16 (bad-indentation)
## rosa_initial/lgis.py:25:0: W0311: Bad indentation. Found 8 spaces, expected 16 (bad-indentation)
## rosa_initial/lgis.py:26:0: W0311: Bad indentation. Found 8 spaces, expected 16 (bad-indentation)
## rosa_initial/lgis.py:27:0: W0311: Bad indentation. Found 10 spaces, expected 20 (bad-indentation)
## rosa_initial/lgis.py:28:0: W0311: Bad indentation. Found 12 spaces, expected 24 (bad-indentation)
## rosa_initial/lgis.py:29:0: W0311: Bad indentation. Found 10 spaces, expected 20 (bad-indentation)
## rosa_initial/lgis.py:30:0: W0311: Bad indentation. Found 12 spaces, expected 24 (bad-indentation)
## rosa_initial/lgis.py:32:0: W0311: Bad indentation. Found 12 spaces, expected 24 (bad-indentation)
## rosa_initial/lgis.py:33:0: W0311: Bad indentation. Found 12 spaces, expected 24 (bad-indentation)
## rosa_initial/lgis.py:34:0: W0311: Bad indentation. Found 10 spaces, expected 20 (bad-indentation)
## rosa_initial/lgis.py:36:0: W0311: Bad indentation. Found 2 spaces, expected 4 (bad-indentation)
## rosa_initial/lgis.py:37:0: W0311: Bad indentation. Found 2 spaces, expected 4 (bad-indentation)
## rosa_initial/lgis.py:38:0: W0311: Bad indentation. Found 2 spaces, expected 4 (bad-indentation)
## rosa_initial/lgis.py:38:0: C0325: Unnecessary parens after 'while' keyword (superfluous-parens)
## rosa_initial/lgis.py:39:0: W0311: Bad indentation. Found 4 spaces, expected 8 (bad-indentation)
## rosa_initial/lgis.py:40:0: W0311: Bad indentation. Found 4 spaces, expected 8 (bad-indentation)
## rosa_initial/lgis.py:41:0: W0311: Bad indentation. Found 2 spaces, expected 4 (bad-indentation)
## rosa_initial/lgis.py:42:0: W0311: Bad indentation. Found 2 spaces, expected 4 (bad-indentation)
## rosa_initial/lgis.py:46:0: C0330: Wrong hanging indentation (add 2 spaces).
##   list(reversed([str(x) for x in lgis(maxnum, list(reversed(perm)))]))))  ^ | (bad-continuation)
## rosa_initial/lgis.py:46:0: C0304: Final newline missing (missing-final-newline)
## rosa_initial/lgis.py:1:0: C0111: Missing module docstring (missing-docstring)
## rosa_initial/lgis.py:1:0: C0103: Constant name "stream" doesn't conform to UPPER_CASE naming style (invalid-name)
## rosa_initial/lgis.py:3:0: C0103: Constant name "maxnum" doesn't conform to UPPER_CASE naming style (invalid-name)
## rosa_initial/lgis.py:4:0: C0103: Constant name "perm" doesn't conform to UPPER_CASE naming style (invalid-name)
## rosa_initial/lgis.py:6:9: W0621: Redefining name 'maxnum' from outer scope (line 3) (redefined-outer-name)
## rosa_initial/lgis.py:6:17: W0621: Redefining name 'perm' from outer scope (line 4) (redefined-outer-name)
## rosa_initial/lgis.py:6:0: C0111: Missing function docstring (missing-docstring)
## rosa_initial/lgis.py:11:7: C1801: Do not use `len(SEQUENCE)` to determine if a sequence is empty (len-as-condition)
## rosa_initial/lgis.py:25:8: C0103: Variable name "lt" doesn't conform to snake_case naming style (invalid-name)
## rosa_initial/lgis.py:26:11: C1801: Do not use `len(SEQUENCE)` to determine if a sequence is empty (len-as-condition)
## rosa_initial/lgis.py:6:9: W0613: Unused argument 'maxnum' (unused-argument)
## 
## --------------------------------------------------------------------
## Your code has been rated at -2.37/10 (previous run: -2.37/10, +0.00)

Automate Styling : black

black

  • formats files in-place

  • uncompromising and unconfigurable

# - Made another package `rosa_cleaned` (not shown)
#
# - Ensure the original code is present in the `rosa_cleaned` package
# to begin with
cp rosa_initial/rosa_initial/lgis.py \
    rosa_cleaned/rosa_cleaned/lgis.py
black rosa_cleaned/*/lgis.py
## reformatted rosa_cleaned/rosa_cleaned/lgis.py
## All done! ✨ 🍰 ✨
## 1 file reformatted.

Example black differences

diff rosa_{initial,cleaned}/*/lgis.py | grep -e ">\|<" | head -n6
## < stream = open('data_LGIS.txt', 'r').read().splitlines()
## > stream = open("data_LGIS.txt", "r").read().splitlines()
## < perm   = [int(x) for x in stream[1].split()]
## > perm = [int(x) for x in stream[1].split()]
## > 
## <   #
  • These are trivial differences

  • Value comes when applied to large projects

The styled code

cat rosa_cleaned/*/lgis.py
stream = open("data_LGIS.txt", "r").read().splitlines()

maxnum = int(stream[0])
perm = [int(x) for x in stream[1].split()]


def lgis(maxnum, perm):
    #
    levels = []
    graph = {}
    for i in perm:
        if len(levels) == 0:
            levels.append([i])
            graph[i] = 0
        else:
            # append to the highest level where i is greater than some
            # value in the next lowest level
            for lev in reversed(range(len(levels) + 1)):
                if i in graph.keys():
                    break
                if lev == 0:
                    levels[0].append(i)
                    graph[i] = 0
                    break
                lower_lev = levels[lev - 1]
                lt = [x for x in lower_lev if x < i]
                if len(lt) > 0:
                    if len(levels) == lev:
                        levels.append([i])
                    else:
                        levels[lev].append(i)
                        # drop entries in this level if they are greater than i
                        lev_gt = [x for x in levels[lev] if x > i]
                        levels[lev] = list(set(levels[lev]).difference(set(lev_gt)))
                    graph[i] = lt[0]
    # returnable data
    res = []
    i = levels[-1][0]
    while i != 0:
        res.append(i)
        i = graph[i]
    res = list(reversed(res))
    return res


print(" ".join([str(x) for x in lgis(maxnum, perm)]))
print(" ".join(list(reversed([str(x) for x in lgis(maxnum, list(reversed(perm)))]))))

After black the lints are non-trivial

(cd rosa_cleaned && pylint */lgis.py --exit-zero)
## ************* Module rosa_cleaned.lgis
## rosa_cleaned/lgis.py:1:0: C0111: Missing module docstring (missing-docstring)
## rosa_cleaned/lgis.py:1:0: C0103: Constant name "stream" doesn't conform to UPPER_CASE naming style (invalid-name)
## rosa_cleaned/lgis.py:3:0: C0103: Constant name "maxnum" doesn't conform to UPPER_CASE naming style (invalid-name)
## rosa_cleaned/lgis.py:4:0: C0103: Constant name "perm" doesn't conform to UPPER_CASE naming style (invalid-name)
## rosa_cleaned/lgis.py:7:9: W0621: Redefining name 'maxnum' from outer scope (line 3) (redefined-outer-name)
## rosa_cleaned/lgis.py:7:17: W0621: Redefining name 'perm' from outer scope (line 4) (redefined-outer-name)
## rosa_cleaned/lgis.py:7:0: C0111: Missing function docstring (missing-docstring)
## rosa_cleaned/lgis.py:12:11: C1801: Do not use `len(SEQUENCE)` to determine if a sequence is empty (len-as-condition)
## rosa_cleaned/lgis.py:26:16: C0103: Variable name "lt" doesn't conform to snake_case naming style (invalid-name)
## rosa_cleaned/lgis.py:27:19: C1801: Do not use `len(SEQUENCE)` to determine if a sequence is empty (len-as-condition)
## rosa_cleaned/lgis.py:7:9: W0613: Unused argument 'maxnum' (unused-argument)
## 
## ------------------------------------------------------------------
## Your code has been rated at 6.86/10 (previous run: 6.86/10, +0.00)

Manually fixed the remaining lints:

Changes:

  • Added module / function description strings

  • Varnames:

    • Global constant vars \(\rightarrow\) upper-case
    • Short varnames fixed lt \(\rightarrow\) parents
  • Removed an unused argument

  • Changed empty-list tests:

    • if len(my_list) > 0: ... \(\rightarrow\) if my_list: ...
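
The empty-list change relies on Python treating empty containers as false. A minimal sketch of the equivalence (both helper names are made up for illustration):

```python
def describe_with_len(my_list):
    # Form flagged by pylint's len-as-condition check
    if len(my_list) > 0:
        return "non-empty"
    return "empty"


def describe_with_truthiness(my_list):
    # Idiomatic form: an empty list is falsy, any non-empty list is truthy
    if my_list:
        return "non-empty"
    return "empty"


# [0] is non-empty even though its only element is falsy
for candidate in ([], [0], [1, 2, 3]):
    assert describe_with_len(candidate) == describe_with_truthiness(candidate)
```

One caveat: `None` is also falsy, so `if my_list:` cannot tell `None` apart from an empty list.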

Styled & Lint-free code:

cat rosa/nolints.py
"""Script and functions to find the `longest increasing subsequence` in
a sequence (as defined at rosalind.info/problems/lgis).

Searches the working directory for a file called "data_LGIS.txt". The
data file should contain two lines - the first containing the length of
a sequence of integers and the second containing the sequence of
integers.

The integer sequence should consist of positive integers only
(repetitions are allowed).

Prints out both the longest increasing sequence, and the longest
decreasing sequence to stdout.
"""

STREAM = open("data_LGIS.txt", "r").read().splitlines()
MAXNUM = int(STREAM[0])
PERM = [int(x) for x in STREAM[1].split()]


def lgis(perm):
    """Obtain the longest increasing subsequence from a list of
    positive integers

    Args:
        perm (list(int)): A list of integers within which the longest
        increasing subsequence is to be found.

    Returns: A list of integers, a subsequence of the input list of
        integers.
    """
    #
    levels = []
    graph = {}
    for i in perm:
        if not levels:
            levels.append([i])
            graph[i] = 0
        else:
            # append to the highest level where i is greater than some
            # value in the next lowest level
            for lev in reversed(range(len(levels) + 1)):
                if i in graph.keys():
                    break
                if lev == 0:
                    levels[0].append(i)
                    graph[i] = 0
                    break
                lower_lev = levels[lev - 1]
                parents = [x for x in lower_lev if x < i]
                if parents:
                    if len(levels) == lev:
                        levels.append([i])
                    else:
                        levels[lev].append(i)
                        # drop entries in this level if they are greater than i
                        lev_gt = [x for x in levels[lev] if x > i]
                        levels[lev] = list(set(levels[lev]).difference(set(lev_gt)))
                    graph[i] = parents[0]
    # returnable data
    res = []
    i = levels[-1][0]
    while i != 0:
        res.append(i)
        i = graph[i]
    res = list(reversed(res))
    return res


print(" ".join([str(x) for x in lgis(PERM)]))
print(" ".join(list(reversed([str(x) for x in lgis(list(reversed(PERM)))]))))

Passing lints != Good code

  • extensibility / maintainability
    • Untestable / unimportable
    • Pulls data from a hard-coded filepath
  • correctness
    • edge-case failures (see later)
  • idioms
    • Use with when opening files
    • Poor loops: for ... in range(...), i in my_dict.keys()
  • general
    • Poor docs / Poor names / Nested-conditionals / Long-function
  • comments?
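
The idiom points can be illustrated with a small sketch. The two-line file layout (length, then the integers) follows the script's docstring, but the function `read_permutation` and the toy `graph` / `levels` values are assumptions made for illustration, not part of the talk's code.

```python
def read_permutation(filepath):
    """Read the integer sequence from a two-line Rosalind-style file."""
    # `with` closes the file even if parsing raises an exception
    with open(filepath, "r") as handle:
        lines = handle.read().splitlines()
    return [int(x) for x in lines[1].split()]


graph = {3: 1, 1: 0}

# Membership tests work on the dict directly; `.keys()` is redundant
assert (3 in graph) == (3 in graph.keys())

# Iterate over the items themselves rather than over `range(len(...))`
levels = [[1], [3, 4], [6]]
for depth, level in enumerate(levels):
    print(depth, level)
```

Taking the file path as an argument also removes the hard-coded "data_LGIS.txt" dependency.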

Python Book of Antipatterns

Passing lints != Importable code

# The lint-free code is in ./rosa/nolints.py
try:
    import rosa.nolints
except FileNotFoundError as e:
    print(e)
## [Errno 2] No such file or directory: 'data_LGIS.txt'
# Hardcoded file in global-env at the offending line:
grep "data_LGIS.txt" rosa/nolints.py
## Searches the working directory for a file called "data_LGIS.txt". The
## STREAM = open("data_LGIS.txt", "r").read().splitlines()

Automate running your tests: pytest

Make the file importable / testable

Made another python package: rosa_testable

  • lgis.py now contains a guard-block around the global-env code
cat rosa_testable/*/lgis.py | grep -A7 "__main__"
if __name__ == "__main__":
    # This block only runs when this file is called as a script ...
    STREAM = open("data_LGIS.txt", "r").read().splitlines()
    MAXNUM = int(STREAM[0])
    PERM = [int(x) for x in STREAM[1].split()]
    print(" ".join([str(x) for x in lgis(PERM)]))
    print(" ".join(list(reversed([str(x) for x in lgis(list(reversed(PERM)))]))))

Make the function importable / testable

Also, we ensured the function was exported by the rosa_testable package:

cat rosa_testable/rosa_testable/__init__.py
"""rosa_testable - -"""

# user can now `from rosa_testable import lgis`
# .. rather than `from rosa_testable.lgis import lgis`
# .. they needn't know which file (lgis.py) the function is stored in
from rosa_testable.lgis import lgis

__version__ = '0.1.0'
__author__ = 'Russell Hyde <me AT somewhere.uk>'
__all__ = []

Check importability

pip install -e rosa_testable # [bash]
Obtaining file:///home/ah327h/workshops/bfx_201903/rosa_testable
Installing collected packages: rosa-testable
  Found existing installation: rosa-testable 0.1.0
    Uninstalling rosa-testable-0.1.0:
      Successfully uninstalled rosa-testable-0.1.0
  Running setup.py develop for rosa-testable
Successfully installed rosa-testable
import rosa_testable as rt # [python] successful import
print(rt.lgis([1,4,3,6,7,2,5]))
## [1, 3, 6, 7]

Preliminary tests

cat rosa_testable/tests/test_lgis.py
from rosa_testable import lgis

def test_it_isnt_greedy():
    assert lgis([1, 4, 2, 3, 5]) != [1, 4, 5]

def test_increasing_sequence():
    assert lgis([1, 3, 4, 2, 5, 7, 4]) == [1, 3, 4, 5, 7]
pytest rosa_testable --quiet # run all tests in rosa_testable/tests
..                                                                       [100%]
2 passed in 0.01 seconds

Automate test-case generation: hypothesis

hypothesis

  • Randomly create your test data
from hypothesis import given, strategies as st
@given(
    my_ints=st.lists(
        st.integers(min_value=1),
        max_size=1)
    )
def test_trivial_sequences(my_ints):
    assert my_ints == rt.lgis(my_ints)

This generates

  • lots of separate integer-lists (st.lists(st.integers, ....))

  • each list contains at most one element (... max_size=1)

  • each integer is \(\geq1\) (st.integers(min_value=1))

But …

try:
    test_trivial_sequences()
except Exception as e:
    print(e)
## Falsifying example: test_trivial_sequences(my_ints=[])
## list index out of range

Randomised tests identified an edge-case failure for lgis: empty-list

Any data that fails should be converted into a unit test
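
As a sketch of that step, the falsifying example `my_ints=[]` can be pinned as an ordinary pytest-style test. To keep the block self-contained it tests a stand-in, `lgis_with_guard`, a hypothetical quadratic-time reimplementation that returns an empty list for empty input. It is not the talk's `lgis`, and whether empty input should return `[]` or raise an error is a design decision assumed here.

```python
def lgis_with_guard(perm):
    """Hypothetical stand-in: longest increasing subsequence in O(n^2)."""
    if not perm:
        return []
    # best[k] holds a longest increasing subsequence ending at perm[k]
    best = []
    for value in perm:
        candidates = [seq for seq in best if seq[-1] < value]
        longest_prefix = max(candidates, key=len, default=[])
        best.append(longest_prefix + [value])
    return max(best, key=len)


def test_empty_input_returns_empty_list():
    # Regression test for the falsifying example reported by hypothesis
    assert lgis_with_guard([]) == []


def test_single_entry_is_returned_unchanged():
    assert lgis_with_guard([7]) == [7]
```

Once the failing input lives in the test suite, it is checked on every run rather than only when the random search happens to rediscover it.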

Will this test pass for all input?

@given(
    my_ints=st.lists(
        st.integers(min_value=1),
        min_size=2,
        max_size=5
    )
)
def test_with_sorted_input(my_ints):
    assert sorted(my_ints) == rt.lgis(sorted(my_ints))

… No!

try:
    test_with_sorted_input()
except Exception as e:
    print(e)
## Falsifying example: test_with_sorted_input(my_ints=[1, 1])

This was a human error: the generated test data contained a repeated number

Moral: You still have to think about the tests and the test-data

Fix: ensure there are no repeated numbers in the input

@given(
    my_ints=st.lists(
        st.integers(min_value=1),
        min_size=2,
        max_size=5,
        unique=True
    )
)
def test_with_sorted_unique_input(my_ints):
    assert sorted(my_ints) == rt.lgis(sorted(my_ints))
test_with_sorted_unique_input() # the test now passes

Use-cases for randomised testing

  • Identify edge-case failures that you hadn’t considered

  • Compare a fancy algorithm to a brute-force solution (for small input)

  • Check the speed of an algorithm

  • Provide an alternative, property-based viewpoint, e.g. here:

    • the output should be increasing

    • the output should be a subsequence of the input

    • If we dovetail two sequences, the output should be no shorter than that for either sequence
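
The first two properties, and the brute-force comparison, can be written as plain helper functions that a randomised test then calls. This is a sketch using only the standard library; all four function names are assumptions, and the brute-force search is only practical for short inputs.

```python
from itertools import combinations


def is_increasing(seq):
    """True if every entry is strictly greater than the one before it."""
    return all(a < b for a, b in zip(seq, seq[1:]))


def is_subsequence(candidate, full):
    """True if `candidate` can be obtained from `full` by deleting entries."""
    remaining = iter(full)
    # `item in remaining` consumes the iterator, which enforces the ordering
    return all(item in remaining for item in candidate)


def brute_force_lis_length(perm):
    """Length of the longest increasing subsequence, by exhaustive search."""
    for size in range(len(perm), 0, -1):
        if any(is_increasing(c) for c in combinations(perm, size)):
            return size
    return 0


def check_properties(perm, result):
    """Assert the properties for one input / output pair."""
    assert is_increasing(result)
    assert is_subsequence(result, perm)
    assert len(result) == brute_force_lis_length(perm)


# Input and output taken from the importability check earlier in the talk
check_properties([1, 4, 3, 6, 7, 2, 5], [1, 3, 6, 7])
```

Comparing lengths rather than exact sequences matters because several subsequences can tie for longest.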

Automate documentation generation: sphinx / click

Documentation

  • The tools above improve your efficiency

  • Good docs improve the efficiency of the user and your collaborators (incl. future you)

  • Many levels:

    • Code comments & Exception / error handling

    • [click / argparse]

      • --help / usage strings
    • [sphinx / ReadTheDocs]

      • API / program reference
      • Tutorials / Cheatsheets / Cookbooks etc

Docstrings \(\rightarrow\) Sphinx \(\rightarrow\) Documentation

Google-formatted docstrings used here. Alternative formats.
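
As a rough guide to that format, a Google-style docstring is commonly laid out with indented `Args:`, `Returns:` and `Raises:` sections, each entry giving a type and a description. Sphinx reads this style through its `sphinx.ext.napoleon` extension. The function `first_entry` below is a made-up example, not part of any package in this talk.

```python
def first_entry(perm):
    """Return the first entry of a sequence of integers.

    Args:
        perm (list[int]): The sequence to inspect; must not be empty.

    Returns:
        int: The first entry of ``perm``.

    Raises:
        ValueError: If ``perm`` is empty.
    """
    if not perm:
        raise ValueError("perm must contain at least one integer")
    return perm[0]


# The docstring is available at runtime, which is what the doc tools read
print(first_entry.__doc__.splitlines()[0])
```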

grep -A10 "def lgis" rosa_testable/*/lgis.py
def lgis(perm):
    """Obtain the longest increasing subsequence from a list of
    positive integers

    Args:
        perm (list(int)): A list of integers within which the longest
        increasing subsequence is to be found.

    Returns: A list of integers, a subsequence of the input list of
        integers.
    """

Sphinx setup :0(

The workflow for generating docs with sphinx looks like:

# 1) - Set up a package structure
# 2) - Add some source code, with docstrings associated with the
# modules / classes / functions

# 3) Initialise the `docs` file structure
cd my_pkg
mkdir docs && cd docs
sphinx-quickstart # answer lots of questions
sphinx-apidoc -o source ../my_pkg
make html

# 4) Modify source/conf.py, swear a lot, question your sanity and try
# again

# 5) Ask Russ "What happened to all the automated tools you were going
# to talk about?"

sphinx-quickstart options

Sam Nicholls on sphinx setup

Try a full-featured python-package cookie-cutter

Detailed package structure

tree detailed_py_package
detailed_py_package
├── CONTRIBUTING.rst
├── detailed_py_package
│   ├── cli.py
│   ├── detailed_py_package.py
│   └── __init__.py
├── docs
│   ├── conf.py
│   ├── contributing.rst
│   ├── history.rst
│   ├── index.rst
│   ├── installation.rst
│   ├── make.bat
│   ├── Makefile
│   ├── readme.rst
│   └── usage.rst
├── HISTORY.rst
├── LICENSE
├── Makefile
├── MANIFEST.in
├── README.rst
├── requirements_dev.txt
├── setup.cfg
├── setup.py
├── tests
│   └── test_detailed_py_package.py
└── tox.ini

3 directories, 23 files

Made a sphinx-ready package

# use `make install` in normal use
pip install -e rosa_with_docs
## Obtaining file:///home/ah327h/workshops/bfx_201903/rosa_with_docs
## Requirement already satisfied: Click>=6.0 in /home/ah327h/tools/miniconda3/envs/bfx_201903/lib/python3.6/site-packages (from rosa-with-docs==0.1.0) (7.0)
## Installing collected packages: rosa-with-docs
##   Found existing installation: rosa-with-docs 0.1.0
##     Uninstalling rosa-with-docs-0.1.0:
##       Successfully uninstalled rosa-with-docs-0.1.0
##   Running setup.py develop for rosa-with-docs
## Successfully installed rosa-with-docs

Make the docs

# Not run while building presentation
(cd rosa_with_docs && make docs)

(Not on rpubs) Link to documentation

[example output]
rm -f docs/rosa_with_docs.rst
rm -f docs/modules.rst
sphinx-apidoc -o docs/ rosa_with_docs
Creating file docs/rosa_with_docs.rst.
Creating file docs/modules.rst.
make -C docs clean
make[1]: Entering directory '/home/ah327h/workshops/bfx_201903/rosa_with_docs/docs'
Removing everything under '_build'...
make[1]: Leaving directory '/home/ah327h/workshops/bfx_201903/rosa_with_docs/docs'
make -C docs html
make[1]: Entering directory '/home/ah327h/workshops/bfx_201903/rosa_with_docs/docs'
Running Sphinx v1.8.4
making output directory...
building [mo]: targets for 0 po files that are out of date
building [html]: targets for 9 source files that are out of date
updating environment: 9 added, 0 changed, 0 removed
reading sources... [100%] usage
/home/ah327h/workshops/bfx_201903/rosa_with_docs/docs/index.rst:2: WARNING: Title underline too short.

Welcome to rosa_with_docs's documentation!
======================================
looking for now-outdated files... none found
pickling environment... done
checking consistency... done
preparing documents... done
writing output... [100%] usage
generating indices... genindex py-modindex
highlighting module code... [100%] rosa_with_docs.lgis
writing additional pages... search
copying static files... WARNING: html_static_path entry '/home/ah327h/workshops/bfx_201903/rosa_with_docs/docs/_static' does not exist
done
copying extra files... done
dumping search index in English (code: en) ... done
dumping object inventory... done
build succeeded, 2 warnings.

The HTML pages are in _build/html.
make[1]: Leaving directory '/home/ah327h/workshops/bfx_201903/rosa_with_docs/docs'
python -c "$BROWSER_PYSCRIPT" docs/_build/html/index.html

Easy --help strings / arg-parsing: click

Another use of docstrings

Command line programs:

  • Standard solution is argparse

  • click

    • is less verbose
    • decorators define commands / arguments
    • docstrings define help messages
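
For comparison, here is a sketch of a program / sub-command / argument layout written with the standard-library argparse. The program name `rosa_cli`, the sub-command `lgis` and the `filepath` argument mirror the click implementation shown in this talk; the parser itself and the `build_parser` name are illustrative assumptions.

```python
import argparse


def build_parser():
    """Build a parser with one sub-command that takes a file path."""
    parser = argparse.ArgumentParser(
        prog="rosa_cli", description="Console script sketch."
    )
    subparsers = parser.add_subparsers(dest="command")
    lgis_parser = subparsers.add_parser(
        "lgis", help="Longest-monotonic subsequences"
    )
    # For positionals, `metavar` only changes the help text; the parsed
    # value is still stored under the name "filepath"
    lgis_parser.add_argument(
        "filepath", metavar="<filepath>", help="two-line input file"
    )
    return parser


# Parse an explicit argument list instead of sys.argv
args = build_parser().parse_args(["lgis", "temp.txt"])
print(args.command, args.filepath)
```

With argparse the help strings are passed as keyword arguments, whereas click takes them from the decorated function's docstring.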
# Wanted
<program_name> <command_name> [arguments]

Implementation with click

cat rosa_cli/rosa_cli/cli.py # I have renamed the `lgis` function
# -*- coding: utf-8 -*-

"""Console script for rosa_cli."""
import sys
import click

from . import longest_increasing_subsequence, longest_decreasing_subsequence


@click.group()
def main():
    """Console script for rosa_cli."""
    return 0


@main.command(
    "lgis", short_help="Longest-monotonic subsequences"
)
@click.argument(
    "filepath", metavar="<filepath>"
)
def lgis(filepath):
    """This command prints the longest increasing and the longest decreasing
    subsequence from a sequence of positive integers.
    """
    stream = open(filepath, "r").read().splitlines()
    perm = [int(x) for x in stream[1].split()]
    print(" ".join([str(x) for x in longest_increasing_subsequence(perm)]))
    print(" ".join([str(x) for x in longest_decreasing_subsequence(perm)]))


if __name__ == "__main__":
    sys.exit(main())  # pragma: no cover

Main program help-string

rosa_cli --help
Usage: rosa_cli [OPTIONS] COMMAND [ARGS]...

  Console script for rosa_cli.

Options:
  --help  Show this message and exit.

Commands:
  lgis  Longest-monotonic subsequences

Subcommand help-string

rosa_cli lgis --help
Usage: rosa_cli lgis [OPTIONS] <filepath>

  This command prints the longest increasing and the longest decreasing
  subsequence from a sequence of positive integers.

Options:
  --help  Show this message and exit.

Run the tool

echo 6 > temp.txt
echo "1 5 3 2 6 5" >> temp.txt

rosa_cli lgis temp.txt
1 2 6
3 2

Comparison to R tools

Job       | Python                     | R
IDE       | PyCharm                    | RStudio
Styling   | black                      | styler
Linting   | pylint / flake8            | lintr
Testing   | pytest / hypothesis        | testthat / hedgehog
Packaging | cookiecutter / setuptools  | devtools / usethis
Docs      | sphinx / reStructuredText  | roxygen2 / rmarkdown
CLI       | click / argparse           | optparse