File and Folder Organization

Consistent, clear file naming helps everyone on the project find the data they need. Here are some best practices to aim for.

Folder Organization

  1. Git Repo First: For a project, the top folder (or root directory) should be a GitHub repo to ensure version control and backups.

  2. Limited Folders in Root Directory: Within the repository, aim to have folders for the main categories of files that may be stored in the project (e.g. data, code, communication, presentations, writing, figures). You should have fewer than 10 of these folders.

  3. Nested Folders: Within each of your main folders, create additional sub-folders with descriptive names. For example, within the ‘data’ folder you may have a folder for each unique type of data. Then within each of those folders you may have ‘raw-data’, ‘processed-data’, ‘metadata’.

  4. Avoid spaces or special characters: Use “-” or “_” between words instead of spaces in folder names.

  5. Descriptive Names: Folder names should contain information that leads to easy retrieval and identification. Assume that you’ll forget what’s in the folder immediately after you name it. Try to use a name that will be descriptive to other people as well as yourself. But don’t overdo it: avoid extra-long folder names.

  6. Separate by Date: If you have files that you save periodically, you may consider creating folders by year, or by year and month.

  7. Set the Folder Order: If you’d like your folders to show up in a specific order, you can use number prefixes (01-, 02-) to force the order instead of allowing them to be organized alphabetically.
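
The folder guidelines above can be sketched as a short script that creates a starting layout. This is only an illustration: the `LAYOUT` dictionary, the `survey` data type, and the `create_layout` helper are hypothetical names chosen for the example, and the number prefixes are one possible ordering.

```python
from pathlib import Path

# Hypothetical layout: top-level categories from the list above, with
# example sub-folders under the data folder. Adjust names to your project.
LAYOUT = {
    "01-data": ["survey/raw-data", "survey/processed-data", "survey/metadata"],
    "02-code": [],
    "03-figures": [],
    "04-writing": [],
    "05-presentations": [],
    "06-communication": [],
}


def create_layout(root: Path, layout: dict = LAYOUT) -> list:
    """Create the folders under root and return the listed paths, sorted."""
    created = []
    for top, subfolders in layout.items():
        top_path = root / top
        top_path.mkdir(parents=True, exist_ok=True)
        created.append(top_path)
        for sub in subfolders:
            sub_path = top_path / sub
            # parents=True also creates the intermediate data-type folder.
            sub_path.mkdir(parents=True, exist_ok=True)
            created.append(sub_path)
    return sorted(created)
```

Calling `create_layout(Path("my-project"))` would build the folders inside a hypothetical `my-project` directory. Note that Git does not track empty folders, so a folder only appears on GitHub once it contains at least one file.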

File Organization

  1. Avoid special characters ( .  / : * ? “ < > | [ ] & $): These characters are frequently used for specific tasks in an electronic environment and, therefore, should not be included in file names.

  2. Avoid blank spaces: these can cause issues with some programs. Use “-” or “_” between words instead of spaces.

  3. Be concise:

    • Avoid using function words (e.g. a, and, of, the, to)

    • Use keywords to describe the file contents

    • Standard abbreviations can be used to reduce file name length

  4. Descriptive File Names: Ensure the file name is descriptive enough to know what it is without the file structure. Files are frequently copied to other folders, downloaded, and emailed. It is important to ensure that the file name, independent of the folder where the original file lives, is sufficiently descriptive.

  5. File Versioning: If your file lives on GitHub (which it should) or Office Online (for shared Office files), do not include dates or other methods of keeping track of when a file was created, who last edited the file, or what version a file is. This is what Git/GitHub is for.

  6. Don’t Duplicate Files: Avoid making copies of the same file within a folder, as this can make it confusing to determine which is the most current. Instead, make changes to the original file and commit those changes frequently, knowing you can revert to an older version if needed.
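
As a rough sketch of the naming rules above, the helper below turns a free-form name into one without spaces, special characters, or function words. The function name `clean_file_name`, the lowercase output, and the default “-” separator are assumptions made for this example rather than rules from this guide.

```python
import re

# Function words to drop, taken from the examples in the list above.
FUNCTION_WORDS = {"a", "and", "of", "the", "to"}


def clean_file_name(name: str, separator: str = "-") -> str:
    """Return a name with no spaces, special characters, or function words."""
    # Split off the extension at the last dot so it survives the cleanup.
    stem, dot, extension = name.rpartition(".")
    if not dot:
        stem, extension = name, ""
    # Treat anything that is not a letter or digit as a word boundary.
    words = re.split(r"[^A-Za-z0-9]+", stem)
    # Lowercasing is an assumption of this sketch, not a rule from the guide.
    kept = [w.lower() for w in words if w and w.lower() not in FUNCTION_WORDS]
    cleaned = separator.join(kept)
    return f"{cleaned}.{extension.lower()}" if extension else cleaned
```

For example, `clean_file_name("Summary of the Field Data (final).xlsx")` returns `summary-field-data-final.xlsx`. The sketch only tidies the characters; choosing descriptive keywords is still up to you.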

Git/GitHub Best Practices

When to push/pull

  • You should pull right before you’re about to make edits to a file. This ensures you will be working off the most up to date version.

  • You should also pull right before you commit. This again ensures you are committing your changes to the most up to date version.

  • You should push directly after you commit your changes. When you commit, you are sending your changes into the version control system on your local machine. Pushing then sends those changes up to GitHub for collaborators to be able to view and access.
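
For anyone working from the command line rather than GitHub Desktop, the order described above could be scripted roughly as follows. The helper names `sync_steps` and `run_steps` are hypothetical, and staging everything with `git add --all` is an assumption of this sketch; you may prefer to stage specific files.

```python
import subprocess


def sync_steps(message: str) -> list:
    """Return the Git commands for one edit cycle, in the order described above."""
    return [
        ["git", "pull"],                   # get the latest version before committing
        ["git", "add", "--all"],           # stage every change in the working tree
        ["git", "commit", "-m", message],  # record the changes on your local machine
        ["git", "push"],                   # send the commit up to GitHub
    ]


def run_steps(steps: list, repo_dir: str = ".") -> None:
    """Run each command inside repo_dir, stopping at the first failure."""
    for command in steps:
        subprocess.run(command, cwd=repo_dir, check=True)
```

Running `run_steps(sync_steps("Update figure labels"))` from inside a repository would execute the four commands in turn. Git may refuse the pull if incoming changes touch files you have edited but not yet committed, in which case the script stops there so you can sort it out by hand.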

Committing Changes

The first six minutes of this video give a great explanation of what a commit message is, and what not to do.

Basically:

  • Commit Often: This helps you follow what you did and revert more easily if needed. A commit should only contain changes from one issue or one file.

  • Describe what you did in the commit message: Have a descriptive header for the commit but then add details of why you made changes or what you changed. This is super helpful when looking back to see what and why something changed.

  • Once you commit, make sure to also push: Commit saves the changes on your local computer, but if you’re working on a project with others, they won’t see those changes unless you push.
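
One way to picture the header-plus-details structure described above is a small helper that assembles the message text. This is a sketch only: `build_commit_message` and `MAX_HEADER_LENGTH` are hypothetical names, and the 72-character cap is an assumed convention for illustration, not something Git enforces.

```python
MAX_HEADER_LENGTH = 72  # assumed limit for illustration; Git does not enforce one


def build_commit_message(header: str, details: list) -> str:
    """Join a one-line header and detail lines into a single commit message."""
    header = header.strip()
    if not header:
        raise ValueError("header must not be empty")
    if len(header) > MAX_HEADER_LENGTH:
        raise ValueError(f"header is longer than {MAX_HEADER_LENGTH} characters")
    # Turn each non-empty detail into its own bullet line.
    body = "\n".join(f"- {line.strip()}" for line in details if line.strip())
    # A blank line separates the header from the details.
    return f"{header}\n\n{body}" if body else header
```

For example, `build_commit_message("Fix plot labels", ["Axis units were wrong"])` produces the header, a blank line, and then `- Axis units were wrong`. The details are where the “why” of the change belongs.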

Special Files

.gitignore

This is a special file, often created when you start tracking a project with Git. Files and folders that match the entries in this file will be ignored by Git and not tracked, so be careful what goes in here. A great starting point is to go to gitignore.io and generate a .gitignore file for your operating system (e.g. Windows, macOS) and Microsoft Office, which will ignore files that your computer generates automatically and that the project does not need.

  • If you have a folder with lots of large datasets or large files, you may also want to add it to this file. GitHub has a limit of 100 MB per file, so if you commit larger files, you won’t be able to push your repository to GitHub. If you ignore files this way, ensure they are backed up elsewhere, like Box [see the frequent issues tutorial].

  • Adding files or folders to this file can be done easily from GitHub Desktop. When looking at the changed files on the left side, simply right-click on the file and select “ignore file” or “ignore folder”.

  • Another great thing to add to .gitignore is the line
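
For the large-file bullet in this list, a small script can help find candidates to ignore before you commit. The sketch below is an illustration under stated assumptions: `find_large_files` and `SIZE_LIMIT_BYTES` are hypothetical names, the 100 MB limit is counted here as 100 * 1024 * 1024 bytes, and file names containing characters that are special in .gitignore patterns are not escaped.

```python
from pathlib import Path

# The per-file limit noted above, counted here as 100 * 1024 * 1024 bytes
# (an assumption about how the limit is measured).
SIZE_LIMIT_BYTES = 100 * 1024 * 1024


def find_large_files(root: Path, limit_bytes: int = SIZE_LIMIT_BYTES) -> list:
    """Return .gitignore-style entries for files larger than limit_bytes."""
    entries = []
    for path in sorted(root.rglob("*")):
        relative = path.relative_to(root)
        # Skip Git's own internal folder.
        if ".git" in relative.parts:
            continue
        if path.is_file() and path.stat().st_size > limit_bytes:
            # A leading slash anchors the entry to the repository root.
            entries.append("/" + relative.as_posix())
    return entries
```

Running `find_large_files(Path("."))` from the top of a repository returns entries you could paste into .gitignore after checking that each file is backed up elsewhere.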

README.md

This is the landing page for your repository and will be the first thing that people see when they open the repository on GitHub. At a minimum, your README file should contain:

  • A summary of the project: so others know what the project is about

  • Contact name and email: so people know who to reach out to if they have questions

  • This is also a great place to store links to important folders or files that aren’t stored on GitHub.

    • Most of your files should be on GitHub to take advantage of version control and backup, so these should only be files where you need Office Online (Excel sheets, Word documents) so that multiple people can work on them together.

Sources and Further Reading