

TP on Galaxy - SarTools

This practical session (TP) was created in the context of the R Bilille training program in 2026.

Open usegalaxy.fr and log in.

1 Exercise 1

Differential analysis

  • create a new history

  • upload data

    • Target.txt, choose tsv while importing
    • Lobel2016Count.zip

The file Lobel2016Count.zip is composed of counts from the paper Lobel L, Herskovits AA (2016) Systems Level Analyses Reveal Multiple Regulatory Activities of CodY Controlling Metabolism, Motility and Virulence in Listeria monocytogenes. PLoS Genet 12(2): e1005870. doi:10.1371/journal.pgen.1005870.

The file Target.txt contains the description of the samples to be analysed with SarTools: 11 replicates for 2 conditions (6 WT vs 5 codY).

  • Visualize the target.txt file and check that

    • it has 11 lines
    • it is in tsv or tabular format
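The same checks can also be scripted outside Galaxy. The following is a minimal Python sketch, assuming the file has one header row followed by one tab-separated line per sample (so 11 sample rows for Target.txt, header not counted). The helper name `check_tsv` is hypothetical and is not part of SarTools or Galaxy.

```python
import csv
import io

def check_tsv(text, expected_rows):
    """Return True if `text` is tab-separated, every row has as many fields
    as the header, and there are `expected_rows` rows after the header."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]
    if not rows:
        return False  # empty file
    header, data = rows[0], rows[1:]
    if len(header) < 2:
        return False  # a single field usually means the separator is not a tab
    if any(len(row) != len(header) for row in data):
        return False  # ragged rows: mixed separators or missing fields
    return len(data) == expected_rows

# Usage (assumes Target.txt is in the working directory):
# with open("Target.txt", encoding="utf-8") as fh:
#     print(check_tsv(fh.read(), expected_rows=11))
```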

1.1 Launch SarTools

Open the SARTools DESeq2 tool from the Tools panel.

  • fill in the Design with target.txt
  • fill in the Zip file with Lobel2016Count.zip
  • in ‘Factor of interest’, indicate strain, which corresponds to the third column of the target file and contains the conditions to compare
  • in ‘Reference biological condition’, indicate WT, which is the reference condition
  • leave the other parameters and run the analysis


1.2 Read the report

  • open the report

  • look at the graphs on raw data

  • look at the SERE, PCA and hierarchical clustering (HC) graphs: what do you observe?

  • relaunch the analysis, replacing:

    • ‘Factor of interest’ -> medium
    • ‘Reference biological condition’ -> BHI
  • look at the PCA: it confirms that medium separates the samples along the first axis.

1.3 Launch SarTools with batch effect

Since ‘medium’ has a strong effect on the data, we have to take it into account when analysing strain. To do so, we relaunch the analysis with medium included as a blocking factor.
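To see why a blocking factor helps, here is a toy illustration in Python with NumPy. It is a sketch under simplifying assumptions: ordinary least squares on made-up log-expression values for a single gene, whereas SarTools relies on DESeq2's negative binomial model. The sample layout and values below are hypothetical. Adding medium as an additive term (a design of the form ~ medium + strain) removes the medium effect from the residuals, so the strain effect is estimated with much less noise.

```python
import numpy as np

# Hypothetical log2 expression of one gene in 8 samples (toy numbers,
# not the Lobel 2016 data).
medium = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # 0 = BHI (reference), 1 = other medium
strain = np.array([0, 0, 1, 1, 0, 0, 1, 1])  # 0 = WT (reference), 1 = codY
noise = np.array([0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.05, -0.05])
y = 5.0 + 3.0 * medium + 1.0 * strain + noise  # strong medium effect, weaker strain effect

def fit(design_cols, y):
    """Least-squares fit with an intercept; returns coefficients and residual sum of squares."""
    X = np.column_stack([np.ones_like(y)] + design_cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ coef) ** 2))
    return coef, rss

coef_naive, rss_naive = fit([strain], y)          # ~ strain
coef_block, rss_block = fit([medium, strain], y)  # ~ medium + strain

print("strain effect without blocking:", coef_naive[1], "RSS:", rss_naive)
print("strain effect with blocking:   ", coef_block[2], "RSS:", rss_block)
```

In this balanced toy design the strain estimate is the same in both fits; what changes is the residual noise, which drives the statistical power. With an unbalanced design, leaving out the blocking factor could also bias the estimate.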

In a new SarTools analysis:

  • fill in strain and WT again in ‘Factor of interest’ and ‘Reference biological condition’

  • click on ‘advanced parameters’

    • add a blocking factor -> Yes
    • blocking factor value -> medium
    • leave the other parameters
    • launch the tool

  • open the report:

    • the PCA is still the same, as expected

    • in the Analysis section:

      • the histogram of raw p-values looks OK
      • there are more differentially expressed genes when the batch effect is taken into account
      • independent filtering is performed
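Independent filtering discards genes with very low mean counts before the multiple-testing correction, which reduces the number of tests and can increase the number of genes called significant. The Python sketch below is a simplified illustration of the idea: it applies the Benjamini-Hochberg correction to the genes above several mean-count cut-offs and keeps the cut-off giving the most significant genes. DESeq2's actual procedure chooses the threshold somewhat differently, so treat this as an approximation; the function names are hypothetical.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by increasing p-value
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def independent_filter(base_means, pvals, alpha=0.05,
                       quantiles=(0.0, 0.1, 0.2, 0.3, 0.4, 0.5)):
    """Simplified independent filtering: try mean-count cut-offs at several
    quantiles and return (cutoff, n_significant) for the best one."""
    ranked = sorted(base_means)
    best = None
    for q in quantiles:
        cutoff = ranked[int(q * (len(ranked) - 1))]
        kept = [i for i, bm in enumerate(base_means) if bm >= cutoff]
        adjusted = bh_adjust([pvals[i] for i in kept])
        n_sig = sum(a < alpha for a in adjusted)
        if best is None or n_sig > best[1]:
            best = (cutoff, n_sig)
    return best
```

For example, with mean counts `[1, 2, 100, 200]` and p-values `[0.9, 0.8, 0.02, 0.03]`, no gene is significant at 0.05 without filtering, whereas dropping the lowest-count gene makes the two well-expressed genes significant.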

1.4 Conclusion

SarTools creates several files:

  • the report
  • the tables in an exportable format
  • the figures in an exportable format
  • the Rlog, useful for checking the run, especially when there are errors
  • the R objects, which allow you to import the whole analysis into R

In addition:

  • The SarTools analysis can also be run directly in R, which offers more flexibility
  • If the condition has several levels, SarTools performs every pairwise comparison
  • If errors occur, don’t hesitate to consult the Rlog file

2 Exercise 2

Differential and enrichment analyses

  • create a new history

  • upload data

    • metadata.txt, choose tsv while importing
    • Exo2.zip

This data comes from a previously published article, Hardiville et al. (2020).

In short: we want to compare 3 conditions with 3 replicates each: wild-type cells (without mutation), T114A (a mutant with a Threonine-to-Alanine substitution at position 114) and S158A (a mutant with a Serine-to-Alanine substitution at position 158) of the TATA-Box Binding Protein.

The FastQ files were analysed with the nf-core RNA-seq pipeline (nf-core/rnaseq).

We provide the count table generated by the pipeline.

T114A and S158A have been renamed Mutation A and Mutation B for easier comprehension, and a random subset of genes was selected to reduce the size of the data to import into Galaxy.

  • Visualize the metadata.txt file and check that

    • it has 9 lines
    • it is in tsv or tabular format

2.1 Differential analysis

  • launch the analysis with condition and WT as ‘Factor of interest’ and ‘Reference biological condition’

The final job may appear in red (with errors), but the report is generated anyway.

  • open the report

    • look at the graphs generated
    • 3 comparisons are performed, as there are 3 levels in the condition variable
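With k levels, comparing every pair gives k(k-1)/2 comparisons, hence 3 for 3 levels. The Python sketch below enumerates them. The level names and the ‘TestvsReference’ naming pattern are assumptions modelled on the file name MutationBvsMutationA.up.txt used in section 2.2.1; the order in which SarTools actually runs the comparisons may differ.

```python
from itertools import combinations

def pairwise_comparisons(levels, reference):
    """Name every pairwise comparison 'TestvsRef' (hypothetical naming),
    putting the reference level first so it is the 'Ref' side of its pairs."""
    ordered = [reference] + [lvl for lvl in levels if lvl != reference]
    return [f"{test}vs{ref}" for ref, test in combinations(ordered, 2)]

comparisons = pairwise_comparisons(["WT", "MutationA", "MutationB"], reference="WT")
print(comparisons)  # ['MutationAvsWT', 'MutationBvsWT', 'MutationBvsMutationA']
```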

2.2 Enrichment analysis

We will perform the enrichment analysis on the MSigDB website.
No account is needed; you only have to provide your email address.

In this analysis, we will identify pathways that are overrepresented among the genes overexpressed in the comparison between Mutation A and Mutation B.

2.2.1 Data preprocessing

The files generated by SarTools cannot be used directly in Galaxy. We have to download them and re-import them into Galaxy.

As we are interested in overexpressed genes, we will retrieve the file with the ‘up’ suffix from the results of the comparison between Mutation A and Mutation B.

  • click on the ‘View’ icon of the ‘SarTools tables’

  • right-click on ‘MutationBvsMutationA.up.txt’ and save the file on your computer (‘Save Link As…’ in the browser menu)

  • import this file into Galaxy:

    • use the ‘upload’ tool
    • specify the tabular format
  • extract the first column, which contains the gene IDs:

    • use the ‘Cut’ tool
    • in ‘Cut columns’, indicate c1
    • leave the other parameters and launch the tool

This gives the SYMBOL IDs of the overexpressed genes.
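Outside Galaxy, the Cut step with c1 amounts to keeping the first tab-separated field of each line. A minimal Python sketch; the helper name, column names and gene symbols below are hypothetical, and dropping the header line is an assumption about what you want to paste into MSigDB (only gene IDs).

```python
def first_column(text, skip_header=True):
    """Keep the first tab-separated field of each non-empty line,
    like Galaxy's Cut tool with 'c1'; optionally drop the header line."""
    lines = [line for line in text.splitlines() if line.strip()]
    if skip_header:
        lines = lines[1:]
    return [line.split("\t", 1)[0] for line in lines]

# Toy content standing in for an 'up' table (hypothetical values):
up_table = "Id\tbaseMean\tlog2FoldChange\nGENE1\t120.5\t2.1\nGENE2\t80.0\t1.4\n"
print("\n".join(first_column(up_table)))
```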

2.2.2 Enrichment analysis

On the website:

  • copy the list of genes from Galaxy
  • paste it into the MSigDB website, in the left panel
  • select the pathways of interest (HALLMARK and Gene Ontology for example)
  • click on ‘compute overlaps’

In the results you can find :

  • the conversion report: the tool works with Entrez IDs (ENTREZID) and converts the SYMBOL IDs you provided

  • the list of pathways that are overrepresented in the list of differentially expressed genes

  • the overlap matrix between the overexpressed genes and the pathways

You can download the results in TSV format.
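Overrepresentation of this kind is commonly assessed with a hypergeometric test: given N genes in the universe, K of them in a pathway, and a list of n genes, how likely is an overlap of at least k by chance? The Python sketch below computes that upper-tail probability with exact integer binomials. It illustrates the test and is not a reimplementation of MSigDB's computation; the function name is hypothetical.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for a hypergeometric X: probability that a random list of n genes,
    drawn from N genes of which K belong to a pathway, contains at least k of them."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)
```

For example, `hypergeom_sf(5, 10, 5, 5)` gives 1/252, the probability of drawing all 5 pathway genes when picking 5 genes out of 10.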

2.2.3 Conclusion

You can perform the enrichment analysis on the MSigDB website.

This analysis can also be performed directly in R, with more flexibility.