Python for ecologists: Glossary

Key Points

Short Introduction to Programming in Python
Starting With Data
Indexing, Slicing and Subsetting DataFrames in Python
Data Types and Formats
Combining DataFrames with pandas
Data workflows and automation
Making Plots With ggplot
Data Ingest & Visualization - Matplotlib & Pandas
Accessing SQLite Databases Using Python & Pandas

Glossary

0-based indexing: is a way of assigning indices to elements in a sequential, ordered data structure starting from 0, i.e. where the first element of the sequence has index 0.
CSV (file): is an acronym which stands for Comma-Separated Values file. CSV files store tabular data, either numbers, strings, or a combination of the two, in plain text, with columns separated by commas and rows separated by line breaks.
database: is an organized collection of data.
dataframe: is a two-dimensional labeled data structure with columns of (potentially) different types.
data structure: is a particular way of organizing data in memory.
data type: is a particular kind of item that can be assigned to a variable, defined by the values it can take, the programming language in use and the operations that can be performed on it.
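dictionary: is a Python data structure designed to contain key-value pairs, where keys must be hashable (for example integers, floats, strings or tuples) and values can be any object. Since Python 3.7 a dictionary preserves the order in which keys were inserted, but its elements are accessed by their key rather than by position, and they can be modified.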
docstring: is an optional documentation string to describe what a Python function does.
faceting: is the act of plotting relationships between set variables in multiple subsets of the data with the results appearing as different panels in the same figure.
float: is a Python data type designed to store positive and negative decimal numbers by means of a floating point representation.
function: is a group of related statements that perform a specific task.
integer: is a Python data type designed to store positive and negative integer numbers.
interactive mode: is an online mode of operation in which the user writes commands directly on the command line one by one and executes them immediately by pressing a key on the keyboard, usually Enter.
join key: is a variable or an array representing the column names over which pandas.DataFrame.join() merges together columns of different data sets.
library: is a set of functions and methods grouped together to perform a specific kind of task.
list: is a Python data structure designed to contain sequences of integers, floats, strings and any combination of the previous. The sequence is ordered and indexed by integers, starting from 0. Elements of a list can be accessed by their index and can be modified.
loop: is a sequence of instructions that is continually repeated until a condition is satisfied.
NaN: is an acronym for Not-a-Number and indicates either that a value is missing or that a calculation cannot produce a meaningful result.
None: is an object that represents no value.
scripting mode: is an offline mode of operation in which the user writes the commands to be executed in a text file (with a .py extension for Python) which is then compiled or interpreted to run the program. Note that Python interprets scripts at run time: the source is first compiled to bytecode, and the compiled form of imported modules is cached to speed up later runs.
sequential (data structure): is an ordered group of objects stored in memory which can be accessed by specifying their index, i.e. their position, in the structure.
SQL: or Structured Query Language, is a domain-specific language for managing data stored in a relational database management system (RDBMS).
SQLite: is a self-contained, public domain SQL database engine.
string: is a Python data type designed to store sequences of characters.
tuple: is a Python data structure designed to contain sequences of integers, floats, strings and any combination of the previous. The sequence is ordered and indexed by integers, starting from 0. Elements of a tuple can be accessed by their index but cannot be modified.
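
Several of the terms above (0-based indexing, list, tuple, dictionary, loop, docstring, None and NaN) can be seen together in a short sketch. It uses only the Python standard library. The species codes, the coordinates and the function name mean_weight are hypothetical and chosen purely for illustration; they are not taken from the lesson data.

```python
import math


def mean_weight(weights):
    """Return the mean of the non-missing values in `weights`.

    This text is the docstring; help(mean_weight) displays it.
    `mean_weight` is a hypothetical helper written for this example.
    """
    total = 0.0
    count = 0
    for w in weights:                      # loop: runs once per element
        if w is None or math.isnan(w):     # skip None and NaN entries
            continue
        total += w
        count += 1
    # NaN signals that no meaningful result could be computed
    return total / count if count else math.nan


# list: ordered, indexed from 0, and modifiable (made-up species codes)
species = ["DM", "PF", "NL"]
first = species[0]          # 0-based indexing: the first element has index 0
species[2] = "OT"           # elements of a list can be replaced

# tuple: ordered and indexed from 0, but not modifiable
site = (35.2, -109.1)       # made-up coordinates
tuple_is_immutable = False
try:
    site[0] = 0.0           # assigning to a tuple element raises TypeError
except TypeError:
    tuple_is_immutable = True

# dictionary: values are looked up and modified by key
counts = {"DM": 12, "PF": 3}
counts["PF"] = 4            # modify an existing entry
counts["NL"] = 1            # add a new key-value pair

# NaN is a float value that is not equal to anything, including itself;
# None is a separate object that represents no value
nan = float("nan")
nan_equals_itself = (nan == nan)

avg = mean_weight([40.0, None, nan, 44.0])
print(first, species, counts, tuple_is_immutable, nan_equals_itself, avg)
```

Because NaN never compares equal to itself, the sketch tests for it with math.isnan() rather than with ==, and tests for None with the is operator.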