This repository contains data, R scripts, and Linux command-line code for the bioinformatics analyses of amplicon sequencing data from the fungal and bacterial microbiomes in nectar samples of Sticky Monkeyflower (Diplacus aurantiacus).
The repository comprises four folders:
01_Data includes two CSV files and four XLSX files:
- "sampling_sheet_regional_survey_2015_final_corrected.csv": metadata for the DNA samples, documenting the site ID, plant ID, and flower ID of each sample, as well as the corresponding concentrations of fungal and bacterial colony-forming units (CFUs) in each sample.
- "2015_survey_siteinfo_location_envi.csv": documents the environmental data and coordinates for each flower.
- "Wu_Metagenome.environmental.1.0_B_N_SUB13559541.xlsx": documents the biosample information for each flower, for sample names starting with B through N.
- "Wu_Metagenome.environmental.1.0_O_S_SUB13567828.xlsx": documents the biosample information for each flower, for sample names starting with O through S.
- "SRA_metadata_site_started_with_B_N.xlsx": documents the sequencing information for each fastq file, as related to the biosample info in "Wu_Metagenome.environmental.1.0_B_N_SUB13559541.xlsx".
- "SRA_metadata_site_started_with_O_S.xlsx": documents the sequencing information for each fastq file, as related to the biosample info in "Wu_Metagenome.environmental.1.0_O_S_SUB13567828.xlsx".
02_Rscripts includes the R scripts used in the bioinformatics analyses:
- "make_Map_20230413.Rmd": R code that makes the Figure 1 map, showing the distributions and locations of samples.
- "Bioinformatics_ITS1_DADA2_CONSTAXtaxa_20230202.Rmd": R code that implements the DADA2 pipeline on fungal ITS1 sequences.
- "Bioinformatics_16S_DADA2_SILVAtaxa_20230207.Rmd": R code that implements the DADA2 pipeline on bacterial 16S sequences.
- "make_phyloseq_objects_&_run_CLAM_test_20220203.Rmd": R code that generates separate phyloseq objects for the ITS1 and 16S sequences; the code also uses the plating data (densities of bacterial and fungal colony-forming units, CFUs) to classify each nectar sample as (1) a bacteria-dominated flower, (2) a fungi-dominated flower, (3) a co-dominated flower, or (4) a flower with too few microbes to be assigned to any of the other three groups.
- "diversity_analyses_fungi_threshold=clam_20230413.Rmd": R code that analyzes the alpha and beta diversity of fungal sequences (ITS1); the analyses include pairwise two-sample permutation tests for alpha diversity, permutational multivariate ANOVA for species composition, differential abundance analyses, etc.
- "diversity_analyses_bacteria_threshold=clam_20230413.Rmd": R code that analyzes the alpha and beta diversity of bacterial sequences (16S); the analyses are similar to those in "diversity_analyses_fungi_threshold=clam_20230413.Rmd".
- "ASVlevel_Co_occurence_network_NetCoMi_pearson_sparcc_r0.1_clam_20220429.rmd": R code that conducts co-occurrence network analyses for fungal and bacterial sequences using the Pearson correlation and SparCC (Sparse Correlations for Compositional data) network methods.
- "ASVlevel_co_occurence_network_NetCoMi_spieceasi_r0.1_clam_20220504.rmd": R code that conducts co-occurrence network analyses for fungal and bacterial sequences using the SPIEC-EASI (Sparse InversE Covariance estimation for Ecological Association and Statistical Inference) network method.
03_Output includes key output files from the bioinformatics pipeline:
- "ITS1.unpooled.ASVs.fa": FASTA file that documents the representative sequence for each fungal ITS1 ASV (amplicon sequence variant).
- "16S.unpooled.ASVs.fa": FASTA file that documents the representative sequence for each bacterial 16S ASV (amplicon sequence variant).
- "Appendix1_ITS1_fungi_ASV.count.ordered.csv": CSV file that documents the total read count, relative abundance across all samples, and species taxonomy of each fungal ITS1 ASV.
- "Appendix2_16S_bacteria_ASV.count.ordered.csv": CSV file that documents the total read count, relative abundance across all samples, and species taxonomy of each bacterial 16S ASV.
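The FASTA files above can be inspected in R without extra packages. The following is a minimal sketch, not code from this repository: `read_fasta` is a hypothetical helper, the demo records and their `ASV_1`/`ASV_2` headers are made up for illustration, and the relative path assumes the working directory is the repository root.

```r
# Hypothetical helper (not part of the repository): read a FASTA file into a
# named character vector using base R only.
read_fasta <- function(path) {
  lines <- trimws(readLines(path, warn = FALSE))
  lines <- lines[nzchar(lines)]            # drop blank lines
  is_header <- startsWith(lines, ">")
  if (length(lines) == 0L || !is_header[1]) {
    stop("File does not look like FASTA: ", path)
  }
  ids <- sub("^>", "", lines[is_header])
  # Each sequence line belongs to the most recent header above it.
  record <- factor(cumsum(is_header)[!is_header], levels = seq_along(ids))
  chunks <- split(lines[!is_header], record)
  # Join multi-line sequences; a header with no sequence yields "".
  seqs <- vapply(chunks, paste, character(1), collapse = "")
  stats::setNames(unname(seqs), ids)
}

# Demo on a small made-up file, so the sketch runs anywhere.
demo_path <- tempfile(fileext = ".fa")
writeLines(c(">ASV_1", "ACGTAC", "GGTA", ">ASV_2", "TTGACA"), demo_path)
asvs <- read_fasta(demo_path)
summary_table <- data.frame(
  asv = names(asvs),
  length_bp = nchar(asvs),
  row.names = NULL
)
print(summary_table)

# Assumed location of the real file, relative to the repository root.
its_path <- file.path("03_Output", "ITS1.unpooled.ASVs.fa")
if (file.exists(its_path)) {
  its_asvs <- read_fasta(its_path)
  cat("ITS1 ASVs:", length(its_asvs), "\n")
}
```

For larger analyses, a dedicated reader such as `Biostrings::readDNAStringSet()` from Bioconductor is likely a better choice; the base-R version here only avoids the extra dependency.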
04_Docs includes a file that documents other bioinformatics steps not conducted in R:
- "Wu_Monkeyflower_bioinformatics_steps.docx": a DOCX file that records the bioinformatics steps, from demultiplexing to species assignment, run in a Linux environment.