Files and folders control

Digital ecosystem to organize and keep track of your projects

Marco Dalla Vecchia

IGBMC

9/12/22

Project management

  1. Use knowledge and strategy to achieve a goal within certain constraints (alone or with a team)
  2. Manage the components of a project → not only people but also parts and resources

Files and Folders management

In the modern world most projects have an extremely heavy digital component, often combined with huge amounts of data!

Many components to take into account

  • How many projects at the same time?
  • Are they separated or do they interact?
  • What is the structure of a project?
  • Where to place files? How to make them interact?
  • How to deliver/transfer the project?
  • How to communicate about a project?
  • How to reproduce a project?

Why is it important?

  1. How easily can you find a file/folder older than one year?
  2. How easily can you reconnect all the parts of a project? (e.g. find the data, analysis, manuscript, etc.)
  3. How easily can you reproduce the project?


We need a digital organization!

Common digital organizations are sub-optimal

Let’s consider data, metadata and analysis acquired and processed during a PhD.

Typical situation

%%{init: {'theme': 'dark'} }%%
graph TB
subgraph Research
a1[Research Question] --> b1[Design Experiment]
b1 --"Run Experiment"--> c1[Get Data]
c1 --Data Analysis--> d1[Obtain Insights]
d1 --> a1
end

subgraph Experiment
a2[Biological model] ----- exp[Experiment]
b2[Sample Prep] ---- exp
c2[Experimental Method] --- exp
exp --> Data
end

By experiment type


What if an experiment has multiple types? (Same sample for qPCR and WB?)

By date


What if an experiment was done over multiple days? (Multiple samplings, or start and end dates far apart)

By topic


What if the data or experiment belongs to several topics?

A better organization: by project!

What is an experiment?

  • We do experiments every day
  • It doesn’t have to be in a lab
  • Every attempt, test or analysis is an experiment!
  • If useful, separate biological from computational experiments

When you try something new, don’t work on top of a previous test: start a new experiment.

Advantages of a project-driven structure:

  • Self-contained
  • Easily shared or transferred
  • Can be reproduced
  • Better documentation
  • Easier to find
  • Driven by overarching goal / research question

Leverage this structure!

  • Every part is self-standing and self-documented
  • Don’t place parts elsewhere → this is THE project
  • Don’t create copies of the data or output
  • Don’t bend the structure
    • If you see it’s not for you, stop and change it
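A self-contained, project-driven layout like the one described above can be created in a few lines. The following is a minimal sketch, not a structure prescribed by these slides: the `experiments` folder, the `data`, `code`, `output` and `docs` subfolders, and the `create_experiment` helper are all illustrative assumptions.

```python
import tempfile
from pathlib import Path

# Hypothetical subfolders for each experiment; adapt them to your own needs.
EXPERIMENT_SUBFOLDERS = ("data", "code", "output", "docs")


def create_experiment(project_root, experiment_id):
    """Create a self-contained experiment folder inside a project."""
    experiment_dir = Path(project_root) / "experiments" / experiment_id
    for name in EXPERIMENT_SUBFOLDERS:
        # exist_ok=True makes the call safe to repeat on an existing folder.
        (experiment_dir / name).mkdir(parents=True, exist_ok=True)
    # A README stub encourages documenting the experiment as you go.
    readme = experiment_dir / "README.md"
    if not readme.exists():
        readme.write_text(f"# {experiment_id}\n\nGoal:\n\nNotes:\n", encoding="utf-8")
    return experiment_dir


if __name__ == "__main__":
    # Try it in a throwaway directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        created = create_experiment(tmp, "exp001")
        for path in sorted(created.iterdir()):
            print(path.name)
```

Because existing folders and README files are left untouched, the helper can be re-run on an existing experiment without overwriting documentation.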

Important points:

  • Each project and experiment should have a unique ID:
    • Typically: data + code
  • Make documentation along the way!
    • Sample prep
    • Acquisition
    • Analysis

Important points (continued):

  • Experiments are linked together and directly to presentations/reports/publications
  • Leave an output for most tests (even if you won’t use it)
  • Be descriptive in the names and leave documentation!
  • Don’t use special characters in names (only inside documentation)
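The naming advice above can be checked automatically. This is a minimal sketch under stated assumptions: "special characters" is taken to mean anything other than ASCII letters, digits, underscores, hyphens and dots, and the ID scheme (a date followed by a short label) is a hypothetical convention chosen for illustration.

```python
import re
from datetime import date

# Assumption: a "safe" name uses only ASCII letters, digits, '.', '_' and '-'.
SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def is_safe_name(name):
    """Return True if the whole name consists of safe characters only."""
    return SAFE_NAME.fullmatch(name) is not None


def make_experiment_id(day, label):
    """Build a hypothetical ID of the form YYYYMMDD_label."""
    # Replace every run of non-alphanumeric characters with one underscore.
    clean = re.sub(r"[^A-Za-z0-9]+", "_", label).strip("_").lower()
    return f"{day:%Y%m%d}_{clean}"


if __name__ == "__main__":
    experiment_id = make_experiment_id(date(2024, 3, 15), "qPCR & WB (test 1)")
    print(experiment_id)
    print(is_safe_name(experiment_id))
    print(is_safe_name("results final (v2).xlsx"))
```

Putting the date first in `YYYYMMDD` form means that sorting folders by name also sorts them chronologically.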

Digital tools of the trade

Although a basic organization of folders and files can do the job, there are tools out there to help you!

Jupyter notebooks

Creation of digital computational documents.

Incredible combination of:

  • Code
  • Documentation
  • Support for over 100 different programming languages (R, Python, Java, …)

Allows for a reproducible, self-documented analysis.

LaTeX

  • High-quality typesetting system for the creation of professional documents → PhD thesis, publications, conference presentations / posters.

  • Among the best options for creating PDFs, in terms of both customization and quality.

Markdown

Simple and easy-to-use markup language to create quick, well-structured and good-looking documents.

  • Basic formatting
  • Super simple syntax
  • Simple text
  • Fully integrated with many tools (Jupyter, GitHub, websites, formulas, diagrams …)

Pandoc

A universal document converter.

  • Can convert almost every text format to every other text format!
  • Create PDF from markdown
  • Create HTML from LaTeX
  • Create an HTML page from a Jupyter notebook!
  • And much more!
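The conversions listed above all follow the basic command-line form `pandoc INPUT -o OUTPUT`, where Pandoc infers the formats from the file extensions. The sketch below wraps that call; the helper names and the file names are hypothetical. It assumes Pandoc is installed, and PDF output also needs a PDF engine such as a LaTeX distribution.

```python
import shutil
import subprocess


def pandoc_command(source, target):
    """Build a Pandoc command; formats are inferred from the file extensions."""
    return ["pandoc", str(source), "-o", str(target)]


def convert(source, target):
    """Run Pandoc if it is installed; return True when the conversion ran."""
    if shutil.which("pandoc") is None:
        return False  # Pandoc is not on the PATH, so nothing is converted.
    subprocess.run(pandoc_command(source, target), check=True)
    return True


if __name__ == "__main__":
    # Hypothetical file names, shown only to illustrate the command shape.
    print(" ".join(pandoc_command("notes.md", "notes.pdf")))
    print(" ".join(pandoc_command("analysis.ipynb", "analysis.html")))
```

Keeping the command builder separate from the runner makes it easy to inspect what will be executed before any file is touched.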

Quarto

An open-source scientific and technical publishing system built on Pandoc!

  • Create PDF / Word / HTML documents
  • Make presentations
  • Articles
  • Entire blogs / websites!

All from simple markdown source!

File manipulations and analysis

Requires scripting or heavy documentation

  • ImageJ macros
  • Jupyter notebooks
  • R + RMarkdown

Communication and outreach

Presentations, reports, publications, etc.

  • LaTeX
  • Markdown
  • HTML

Quarto can do it all in one for you!

A common theme

Use text-based programs and formats instead of self-contained software with proprietary formats (e.g. a plain-text editor such as Notepad rather than Microsoft Office).

Allows for:

  • Fetching external files (no embedding)
  • On-the-spot re-compilation / update
  • Easy to read and access (no proprietary formats)
  • Lighter files
  • Easy Git integration (see next)!

DEMO

Digital ecosystem for digital projects

  1. It all starts with a good project structure
  2. Professional documentation and publications
  3. Simple, lightweight files and software
  4. Create documentation while you work
  5. Keep all that matters in the same place

Version control

What is version control?

  • A robust paradigm to structure and organize your projects
  • A system that lets you track changes and merge files effectively
  • A hub to connect project developers
  • A platform to share your projects locally (a folder on your computer) and remotely

Version control is mostly used in the context of software development, but it can be used in the development of any well-structured digital project!

Why version control?

Let’s say you want to write a document with a team (maybe an assignment or a publication)




Has this ever happened to you?

You start the project and give it a version number

You send it to Bob and Ana to work on their parts

In the meantime you continue working on your part!

So many versions of the same file!

Version control allows you to:

  • Define and design the structure of a project: its parts and how they interact
  • Avoid storing countless versions of the same file(s)
  • Document changes as the files are built and go back to previous versions
  • Collaborate effectively with other people when making projects
  • Back up your work if you use remote repositories

Each part (or feature) is colored differently.

Each user gets access to their own version

Each user works only on what was previously agreed!

Ana’s work will add her feature to the original file

Bob’s work will modify the feature of the original file

Version control tools

Version control systems

  • Git
  • Mercurial
  • Apache Subversion (SVN)

Managers of remote repositories

  • GitHub
  • GitLab
  • Bitbucket
  • SourceForge

Git works locally

Use it from the Git terminal/bash

Software alternatives with an integrated graphical user interface (GUI)

Version control main concepts

Repository

  • A folder that is version controlled
  • Has a ‘.git’ folder inside
  • It can be either local or remote

Commits

  • A ‘checkpoint’ of your work
  • Commit frequently and descriptively
  • You can move back and forth between commits

Branches

  • A separate ‘line of work’
  • Allow you to test new features
    • Discard easily if bad
    • Merge with main work if good

Basic workflow



$ git add .
$ git commit -m "commit message"
$ git push origin main

%%{init: {'theme': 'dark'} }%%
graph TB
A(Work on files)--> B(Add files to keep track);
B-->C(Commit checkpoints of work);
C-->A;
C-->D(Push to remote);
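The add and commit loop above, together with the branch concept, can be tried out safely in a throwaway repository. The sketch below drives the Git command line from Python. It assumes Git is installed; the file name, branch name, commit messages and demo identity are hypothetical. The push step is left out because the temporary repository has no remote.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Hypothetical settings passed per command, so no global Git configuration is needed.
SETTINGS = [
    "-c", "user.name=Demo User",
    "-c", "user.email=demo@example.com",
    "-c", "commit.gpgsign=false",
]


def git(repo, *args):
    """Run one Git command inside `repo` and return its standard output."""
    result = subprocess.run(
        ["git", *SETTINGS, *args],
        cwd=repo, check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def demo_workflow(repo):
    """Work on files, add, commit, then test an idea on a separate branch."""
    repo = Path(repo)
    git(repo, "init")
    (repo / "notes.md").write_text("First draft\n", encoding="utf-8")
    git(repo, "add", ".")
    git(repo, "commit", "-m", "Add first draft of notes")
    git(repo, "checkout", "-b", "new-idea")  # start a separate line of work
    (repo / "notes.md").write_text("First draft\nNew idea\n", encoding="utf-8")
    git(repo, "add", ".")
    git(repo, "commit", "-m", "Try a new idea")
    # Newest commit first, one line per commit.
    return git(repo, "log", "--oneline", "--no-decorate").splitlines()


if __name__ == "__main__":
    if shutil.which("git") is None:
        print("Git is not installed; skipping the demo.")
    else:
        with tempfile.TemporaryDirectory() as tmp:
            for line in demo_workflow(tmp):
                print(line)
```

If the idea on the `new-idea` branch turns out badly, the branch can simply be deleted; if it works, it can be merged into the main line of work.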

Git use case examples

  • Manuscript submission
  • Complex figure making
  • New features in pipeline
  • Variable inputs for analysis

Thank you

Scan the QR code to get the slides from GitLab!