Compare means from baredSC results

baredSC outputs can be used to evaluate the fold-change of expression between 2 conditions.

Inputs

We use the same inputs as in previous input descriptions.

Run baredSC on each subpopulation

Let’s focus on cells of group 0.0 and group 3.0 (they correspond to each of the 2 gaussians found previously). To increase the speed we will use less bins (--nx 25):

$ for group in 0 3; do
    for nnorm in 1 2; do
      baredSC_1d \
        --input example/nih3t3_generated_2d_2.txt \
        --metadata1ColName 0.5_0_0_0.5_group \
        --metadata1Values ${group}.0 \
        --geneColName 0.5_0_0_0.5_x \
        --output example/first_example_1d_group${group}_${nnorm}gauss_25_neff200 \
        --nnorm ${nnorm} --nx 25 --minNeff 200 \
        --figure example/first_example_1d_group${group}_${nnorm}gauss_25_neff200.png \
        --title "first gene ${nnorm} gauss group${group} 25 bins neff200" \
        --logevidence example/first_example_1d_group${group}_${nnorm}gauss_25_neff200_logevid.txt
    done
    combineMultipleModels_1d \
      --input example/nih3t3_generated_2d_2.txt \
      --metadata1ColName 0.5_0_0_0.5_group \
      --metadata1Values ${group}.0 \
      --geneColName 0.5_0_0_0.5_x --nx 25 \
      --outputs example/first_example_1d_group${group}_1gauss_25_neff200 \
      example/first_example_1d_group${group}_2gauss_25_neff200 \
      --figure example/first_example_1d_group${group}_1-2gauss_25.png \
      --title "first gene group$group 25 bins 1 and 2 gauss"
  done

In one case, 100 000 samples were not sufficient.

We check the QC. We can now compare the results:

../../_images/first_example_1d_group0_1-2gauss_25.png ../../_images/first_example_1d_group3_1-2gauss_25.png

We can see that the tool fits relatively nicely the gaussians which were in inputs.

If we use another metadata which is just 300 and 500 random cells:

$ for group in 1 2; do
    for nnorm in 1 2 3; do
      baredSC_1d \
        --input example/nih3t3_generated_2d_2.txt \
        --metadata1ColName group \
        --metadata1Values group${group} \
        --geneColName 0.5_0_0_0.5_x \
        --output example/first_example_1d_cellgroup${group}_${nnorm}gauss_25_neff200 \
        --nnorm ${nnorm} --nx 25 --minNeff 200 \
        --figure example/first_example_1d_cellgroup${group}_${nnorm}gauss_25_neff200.png \
        --title "first gene ${nnorm} gauss cellgroup${group}" \
        --logevidence example/first_example_1d_cellgroup${group}_${nnorm}gauss_25_neff200_logevid.txt
    done
    combineMultipleModels_1d \
      --input example/nih3t3_generated_2d_2.txt \
      --metadata1ColName group \
      --metadata1Values group${group} \
      --geneColName 0.5_0_0_0.5_x --nx 25 \
      --outputs example/first_example_1d_cellgroup${group}_1gauss_25_neff200 \
      example/first_example_1d_cellgroup${group}_2gauss_25_neff200 \
      example/first_example_1d_cellgroup${group}_3gauss_25_neff200 \
      --figure example/first_example_1d_cellgroup${group}_1-3gauss_25.png \
      --title "first gene cellgroup$group 25 bins 1, 2 and 3 gauss"
  done

We see that the results are different in both groups but in both cases the confidence interval is quite large:

../../_images/first_example_1d_cellgroup1_1-3gauss_25.png ../../_images/first_example_1d_cellgroup2_1-3gauss_25.png

One of the output is *_means.txt.gz. Each line correspond to the value of the mean expression evaluated at each sample of the MCMC. It can be used to estimate:

  • A confidence interval on the mean value (in the used axis log(1 + 10^4 * expression))

  • A confidence interval on the fold change (delogged).

In [1]: import numpy as np
   ...: def delog(x):
   ...:     return(1e-4 * (np.exp(x) - 1))
   ...: my_quantiles = [0.5 - 0.6827 / 2, 0.5, 0.5 + 0.6827 / 2]
   ...: # Get data
   ...: means1 = np.genfromtxt('../example/first_example_1d_group0_1-2gauss_25_means.txt.gz')
   ...: print(f'Values of mean in group0: {means1}')
   ...: means2 = np.genfromtxt('../example/first_example_1d_group3_1-2gauss_25_means.txt.gz')
   ...: print(f'Values of mean in group3: {means2}')
   ...: # Shuffle means2
   ...: np.random.shuffle(means2)
   ...: print(f'Mean log expression in group0: {np.quantile(means1, my_quantiles)}')
   ...: print(f'Mean log expression in group3: {np.quantile(means2, my_quantiles)}')
   ...: fc = [delog(x1) / delog(x2) for x1, x2 in zip(means1, means2)]
   ...: print(f'Estimation of fold-change: {np.quantile(fc,  my_quantiles)}')
   ...: 
Values of mean in group0: [0.35776993 0.35776993 0.35776993 ... 0.34672964 0.34672964 0.38877004]
Values of mean in group3: [1.00746002 1.02085388 1.02085388 ... 1.00369734 1.01363153 1.02256951]
Mean log expression in group0: [0.33838793 0.34953488 0.36060358]
Mean log expression in group3: [0.99479228 1.00628287 1.01751424]
Estimation of fold-change: [0.23110761 0.24112418 0.25130651]

We can use matplotlib to display the results graphically:

In [2]: import matplotlib.pyplot as plt
   ...: x1, x2 = np.random.normal(1, 0.1, len(means1)), np.random.normal(3, 0.1, len(means2))
   ...: fig, axs = plt.subplots(1, 2)
   ...: axs[0].scatter(x1, means1, s=1, alpha=0.01)
   ...: axs[0].scatter(x2, means2, s=1, alpha=0.01)
   ...: axs[0].set_ylim(0, )
   ...: axs[0].set_ylabel('log(1 + expression * 10\'000)')
   ...: axs[0].set_xticks([1, 3])
   ...: axs[0].set_xticklabels(['group0', 'group3'], fontsize='x-large')
   ...: axs[1].scatter(x1[:len(fc)], fc, s=1, alpha=0.01)
   ...: axs[1].set_xticks([1])
   ...: axs[1].set_xticklabels(['fold-change'], fontsize='x-large')
   ...: axs[1].set_ylim(0, )
   ...: axs[1].set_ylabel('Fold-change (not log transformed)')
   ...: fig.tight_layout()
   ...: 
../../_images/fc_group.png

On the first example with group0 and group3 where each is a different gaussian, we see a mean log expression around the expected 0.375 value in group 0, 1 in group 3. We see that the fold-change is around 25% (this is the real fold-change, not log transformed).

Now if we have a look to the subsamples of cells (300 and 500 cells) and perform the same analysis:

In [3]: # Get data
   ...: means1 = np.genfromtxt('../example/first_example_1d_cellgroup1_1-3gauss_25_means.txt.gz')
   ...: print(f'Values of mean in cellgroup1: {means1}')
   ...: means2 = np.genfromtxt('../example/first_example_1d_cellgroup2_1-3gauss_25_means.txt.gz')
   ...: print(f'Values of mean in cellgroup2: {means2}')
   ...: # Shuffle means2
   ...: np.random.shuffle(means2)
   ...: print(f'Mean log expression in cellgroup1: {np.quantile(means1, my_quantiles)}')
   ...: print(f'Mean log expression in cellgroup2: {np.quantile(means2, my_quantiles)}')
   ...: fc = [delog(x1) / delog(x2) for x1, x2 in zip(means1, means2)]
   ...: print(f'Estimation of fold-change: {np.quantile(fc,  my_quantiles)}')
   ...: 
Values of mean in cellgroup1: [0.66035127 0.66035127 0.62547552 ... 0.62783495 0.62783495 0.62783495]
Values of mean in cellgroup2: [0.68203872 0.69214061 0.71198453 ... 0.70698152 0.70698152 0.70698152]
Mean log expression in cellgroup1: [0.6161945  0.64620379 0.67630369]
Mean log expression in cellgroup2: [0.66763529 0.69110072 0.71426981]
Estimation of fold-change: [0.8423754  0.91214824 0.9865969 ]
In [4]: x1, x2 = np.random.normal(1, 0.1, len(means1)), np.random.normal(3, 0.1, len(means2))
   ...: fig, axs = plt.subplots(1, 2)
   ...: axs[0].scatter(x1, means1, s=1, alpha=0.01)
   ...: axs[0].scatter(x2, means2, s=1, alpha=0.01)
   ...: axs[0].set_ylim(0, )
   ...: axs[0].set_ylabel('log(1 + expression * 10\'000)')
   ...: axs[0].set_xticks([1, 3])
   ...: axs[0].set_xticklabels(['cellgroup1', 'cellgroup2'], fontsize='x-large')
   ...: axs[1].scatter(x1[:len(fc)], fc, s=1, alpha=0.01)
   ...: axs[1].set_xticks([1])
   ...: axs[1].set_xticklabels(['fold-change'], fontsize='x-large')
   ...: axs[1].set_ylim(0, )
   ...: axs[1].set_ylabel('Fold-change (not log transformed)')
   ...: fig.tight_layout()
   ...: 
../../_images/fc_cellgroup.png

Here, we can see that the fold-change is slightly below 1.