Specifying paths to FCS data

To get cracking, we need some flow cytometry data (i.e., .fcs files). A few FCS data files have been bundled with this python package. To use these files, enter the following in your ipython terminal:

In [1]: import FlowCytometryTools

In [2]: from FlowCytometryTools import test_data_dir, test_data_file

In [3]: datadir = test_data_dir

In [4]: datafile = test_data_file

datadir will point to a directory containing flow cytometry data while datafile will point to a specific FCS file.

To analyze your own data, simply set the ‘datafile’ variable to the path of your .fcs file:

>>> datafile=r'C:\data\my_very_awesome_data.fcs'

Windows Users Windows uses the backslash character (‘’) for paths. However, the backslash character is a special character in python that is used for formatting strings. In order to specify paths correctly, you must precede the path with the character ‘r’.

Good:

>>> datafile=r'C:\data\my_very_awesome_data.fcs'

Bad:

>>> datafile='C:\data\my_very_awesome_data.fcs'

Working with individual FCS files

Below is a brief description, for more information see FCMeasurement.

Loading the file

Once datadir and datafile have been defined, we can proceed to load the data:

In [5]: from FlowCytometryTools import FCMeasurement

In [6]: sample = FCMeasurement(ID='Test Sample', datafile=datafile)

Channel Information

Want to know which channels were measured? No problem.

In [7]: print sample.channel_names
(u'HDR-T', u'FSC-A', u'FSC-H', u'FSC-W', u'SSC-A', u'SSC-H', u'SSC-W', u'V2-A', u'V2-H', u'V2-W', u'Y2-A', u'Y2-H', u'Y2-W', u'B1-A', u'B1-H', u'B1-W')
In [8]: print sample.channels
                 $PnN  $PnB $PnG                  $PnE   $PnS    $PnR
Channel Number                                                       
1               HDR-T    32    1  [0.000000, 0.000000]  HDR-T  262144
2               FSC-A    32    1  [0.000000, 0.000000]  FSC-A  262144
3               FSC-H    32    1  [0.000000, 0.000000]  FSC-H  262144
4               FSC-W    32    1  [0.000000, 0.000000]  FSC-W  262144
5               SSC-A    32    1  [0.000000, 0.000000]  SSC-A  262144
6               SSC-H    32    1  [0.000000, 0.000000]  SSC-H  262144
7               SSC-W    32    1  [0.000000, 0.000000]  SSC-W  262144
8                V2-A    32    1  [0.000000, 0.000000]   V2-A  262144
9                V2-H    32    1  [0.000000, 0.000000]   V2-H  262144
10               V2-W    32    1  [0.000000, 0.000000]   V2-W  262144
11               Y2-A    32    1  [0.000000, 0.000000]   Y2-A  262144
12               Y2-H    32    1  [0.000000, 0.000000]   Y2-H  262144
13               Y2-W    32    1  [0.000000, 0.000000]   Y2-W  262144
14               B1-A    32    1  [0.000000, 0.000000]   B1-A  262144
15               B1-H    32    1  [0.000000, 0.000000]   B1-H  262144
16               B1-W    32    1  [0.000000, 0.000000]   B1-W  262144

Full Meta Data

In addition to the raw data, the FCS files contain “meta” information about the measurements, which may be useful for data analysis. You can access this information using the meta property, which will return to you a python dictionary. The meaning of the fields in meta data is explained in the FCS format specifications (google for the specification).

In [9]: print type(sample.meta)
<type 'dict'>

In [10]: print sample.meta.keys()
[u'$ETIM', u'$ENDDATA', u'$BYTEORD', u'$SRC', '_channels_', u'$SYS', u'$PAR', u'$ENDANALYSIS', u'$CYT', '__header__', u'$BTIM', u'$OP', u'$BEGINANALYSIS', u'$BEGINDATA', u'$ENDSTEXT', u'$CYTSN', u'$DATE', u'$MODE', u'$FIL', u'$BEGINSTEXT', u'$NEXTDATA', u'$CELLS', u'$TOT', '_channel_names_', u'$DATATYPE']

In [11]: print sample.meta['$SRC']
A3

Accessing Raw Data

The data is accessible as a pandas.DataFrame.

In [12]: print type(sample.data)
<class 'pandas.core.frame.DataFrame'>

In [13]: print sample.data[['Y2-A', 'FSC-A']][:10]
          Y2-A        FSC-A
0   109.946274   459.962982
1  5554.108398  -267.174652
2    81.835281  -201.582336
3   -54.120304   291.259888
4   -10.542595  -397.168579
5    34.791252  -213.151566
6  6132.993164   -25.696339
7    33.034771    61.271519
8  6688.348633  1462.823608
9    14.344463   490.025482

If you do not know how to work with DataFrames, you can get the underlying numpy array using the values property.

In [14]: print type(sample.data.values)
<type 'numpy.ndarray'>

In [15]: print sample.data[['Y2-A', 'FSC-A']][:10].values
[[  109.9463   459.963 ]
 [ 5554.1084  -267.1747]
 [   81.8353  -201.5823]
 [  -54.1203   291.2599]
 [  -10.5426  -397.1686]
 [   34.7913  -213.1516]
 [ 6132.9932   -25.6963]
 [   33.0348    61.2715]
 [ 6688.3486  1462.8236]
 [   14.3445   490.0255]]

Of course, working directly with a pandas DataFrame is very convenient. For example,

In [16]: data = sample.data

In [17]: print data['Y2-A'].describe()
count    10000.000000
mean      2540.119141
std       3330.651123
min       -144.069305
25%         10.865871
50%         82.575161
75%       5133.238525
max      68489.953125
Name: Y2-A, dtype: float64

Want to know how many events are in the data?

In [18]: print data.shape[0]
10000

Want to calculate the median fluorescence level measured on the Y2-A channel?

In [19]: print data['Y2-A'].median()
82.5751647949

Transformations

The presence of both very dim and very bright cell populations in flow cytometry data can make it difficult to simultaneously visualize both populations on the same plot. To address this problem, flow cytometry analysis programs typically apply a transformation to the data to make it easy to visualize and interpret properly.

Rather than having this transformation applied automatically and without your knowledge, this package provides a few of the common transformations (e.g., hlog, tlog), but ultimately leaves it up to you to decide how to manipulate your data.

As an example, let’s apply the ‘hlog’ transformation to the Y2-A, B1-A and V2-A channels, and save the transformed sample in tsample.

In [20]: tsample = sample.transform('hlog', channels=['Y2-A', 'B1-A', 'V2-A'], b=500.0)

In the hlog transformation, the parameter b controls the location where the transformation shifts from linear to log. The optimal value for this parameter depends on the range of your data. For smaller ranges, try smaller values of b. So if your population doesn’t show up well, just adjust b.

For more information see FCMeasurement.transform(), FlowCytometryTools.core.transforms.hlog().

Throughout the rest of the tutorial we’ll always be working with hlog-transformed data.

Note

For more details see the API documentation section.

You can read more about transformations here:
  • Bagwell. Cytometry Part A, 2005.
  • Parks, Roederer, and Moore. Cytometry Part A, 2006.
  • Trotter, Joseph. In Current Protocols in Cytometry. John Wiley & Sons, Inc., 2001.

Compensation and custom transformations

If you want to compensate your data or perform a custom transformation there’s an example in the gallery.

HINT: It’s very easy!

Plotting

Let’s import matplotlib’s plotting functions.

In [21]: from pylab import *

For more information see FCMeasurement.plot().

1d histograms

In [22]: figure();

In [23]: tsample.plot('Y2-A', bins=100);
_images/1d_plot.png
In [24]: figure();

In [25]: grid(True)

In [26]: tsample.plot('Y2-A', color='green', alpha=0.7, bins=100);
_images/1d_plot_prettier.png

2d histograms

If you provide the plot function with the names of two channels, you’ll get a 2d histogram.

In [27]: figure();

In [28]: tsample.plot(['B1-A', 'Y2-A']);
_images/2d_plot_A.png

As with 1d histograms, the plot function accepts all the same arguments as does the matplotlib histogram function.

In [29]: figure();

In [30]: tsample.plot(['B1-A', 'Y2-A'], cmap=cm.Oranges, colorbar=False);
_images/2d_plot_B.png

2d scatter plots

To create a scatter plot, provide the argument kind='scatter to the plot command.

This makes the plot function behave like the matplotlib scatter function (and accept all the same arguments).

In [31]: figure();

In [32]: tsample.plot(['B1-A', 'Y2-A'], kind='scatter', color='red', s=1, alpha=0.3);
_images/2d_scatter_plot_A.png

Gating

See the API for a list of available gates.

Let’s import a few of those gates.

In [33]: from FlowCytometryTools import ThresholdGate, PolyGate

Creating Gates

Programmatically

To create a gate you need to provide the following information:

  • set(s) of coordinates
  • name(s) of channel(s)
  • region

Let’s create a ThresholdGate that passes events with a Y2-A (RFP) value above the 1000.0.

In [34]: y2_gate = ThresholdGate(1000.0, 'Y2-A', region='above')

While we’re at it, here’s another gate for events with a B1-A value (CFP) above 2000.0.

In [35]: b1_gate = ThresholdGate(2000.0, 'B1-A', region='above')

Using the GUI

You can launch the GUI for creating the gates, by calling the view_interactively() method of an FCMeasurement instance.

>>> tsample.view_interactively()

Warning

wxpython must be installed in order to use the GUI. (It may already be installed, so just try it out.)

Please note that there’s no way to apply transformations through the GUI, so view samples through the view_interactively method. (Don’t load new FCS files directly from the GUI, if you need those files transformed.)

Plotting Gates

In [36]: figure();

In [37]: tsample.plot('Y2-A', gates=[y2_gate], bins=100);

In [38]: title('Gate Plotted');
_images/gate_A_plot.png

Applying Gates

In [39]: gated_sample = tsample.gate(y2_gate)

In [40]: print gated_sample.get_data().shape[0]
4433

The gated_sample is also an instance of FCMeasurement, so it supports the same routines as does the ungated sample. For example, we can plot it:

In [41]: figure();

In [42]: gated_sample.plot('Y2-A', color='y', bins=100);

In [43]: title('Gated Sample');
_images/gated_sample_only_plot.png

Let’s compare the gated and ungated side by side.

In [44]: figure();

In [45]: subplots_adjust(hspace=0.4)

In [46]: ax1 = subplot(211);

In [47]: tsample.plot('Y2-A', color='gray', bins=100, gates=[y2_gate]);

In [48]: title('Original Sample');

In [49]: ax2 = subplot(212, sharey=ax1, sharex=ax1);

In [50]: gated_sample.plot('Y2-A', color='y', bins=100, gates=[y2_gate]);

In [51]: title('Gated Sample');
_images/gate_D_plot.png

Combining gates

Gates can be inverted and combined together. See FlowCytometryTools.core.gates.CompositeGate.

Working with plates

For high-throughput analysis of flow cytometery data, loading single FCS files is silly. Don’t be silly!

Instead, we provided you with the awesome FCPlate class (also called FCOrderedCollection).

Loading Data

The code below loads two FCS samples and shows two different methods of placing them on a plate container.

In [52]: from FlowCytometryTools import FCPlate

In [53]: sample1 = FCMeasurement('B1', datafile=datafile);

In [54]: sample2 = FCMeasurement('D2', datafile=datafile);

# Use the samples ID as their position on a plate:
In [55]: plate1 = FCPlate('demo plate', [sample1, sample2], 'name', shape=(4, 3))

In [56]: print plate1
ID:
demo plate

Data:
    1   2 3
A          
B  B1      
C          
D      D2  

Alternatively, one can specify the sample positions independent of the samples IDs using a dictionary:

In [57]: plate2 = FCPlate('demo plate', {'C2' : sample1, 'C3' : sample2}, 'name', shape=(4, 3))

In [58]: print plate2
ID:
demo plate

Data:
  1   2   3
A          
B          
C    B1  D2
D          

In [59]: print plate2['C2']
<FCMeasurement 'B1'>

Having overwritten the default mapping based on the samples ID, we now have to access sample1 using the key C2. So the command plate['C2'] should return sample1 (whose ID should still remain B1).

In [60]: plate3 = FCPlate('demo plate', {3 : sample1, 4 : sample2}, 'row_first_enumerator', shape=(4, 3))

In [61]: print plate3
ID:
demo plate

Data:
    1 2 3
A        
B        
C  B1    
D  D2    

In [62]: print plate3[3]
<FCMeasurement 'B1'>

Note

Below we show a much faster way of loading data into a plate format. This method should make life simpler for most users by removing a bunch of boilerplate code. So keep on reading!

Loading Data (better way)

What we’d like to do is to construct a flow cytometry plate object by loading all the ‘*.fcs’ files in a directory.

To do that, FCPlate needs to know where on the plate each file belongs; i.e., we need to tell it that the file ‘My_awesome_data_Well_C9.fcs’ corresponds to well ‘C9’, which corresponds to the third row and the ninth column on the plate.

The process by which the filename is mapped to a position on the plate is the following:

  1. The filename is fed into a parser which extracts a key from it. For example, given ‘My_awesome_data_Well_C9.fcs’, the parser returns ‘C9’
  2. The key is fed into a position mapper which returns a matrix position. For example, given ‘C9’ the position_mapper should return (2, 8). The reason it’s (2, 8) rather than (3, 9) is because counting starts from 0 rather than 1. (So ‘A’ corresponds to 0.)

Here’s a summary table that’ll help you decide what to do (depending on your file name format):

File Name Format parser Comments
‘[whatever]_Well_C9_[blah].fcs’ ‘name’  
‘[blah]_someothertagC9_[blah].fcs’ ‘name’ But to use must modify ID_kwargs (see API) for details
‘[whatever].025.fcs’ ‘number’ 001 -> A1, 002 -> B1, ... etc. i.e., advances by row first
Some other naming format dict dict with keys = filenames, values = positions
Some other naming format function Provide a function that accepts a filename and returns a position (e.g., ‘C9’)

For the analysis below, we’ll be using the ‘name’ parser since our files match that convention. OK. Let’s try it out:

In [63]: from FlowCytometryTools import FCPlate

In [64]: plate = FCPlate.from_dir(ID='Demo Plate', path=datadir, parser='name')

In [65]: plate = plate.transform('hlog', channels=['Y2-A', 'B1-A'])

Let’s see what data got loaded.

In [66]: print plate
ID:
Demo Plate

Data:
  1  2   3   4  5   6   7  8  9  10 11 12
A        A3  A4     A6  A7               
B        B3  B4                          
C                       C7               
D                                        
E                                        
F                                        
G                                        
H                                        

And just to confirm, let’s look at the naming of our files to see that parser='name' should indeed work.

In [67]: import os

In [68]: print os.path.basename(plate['A3'].datafile)
RFP_Well_A3.fcs

This plate contains many empty rows and columns. The method FCPlate.dropna() allow us to drop empty rows and columns for convenience:

In [69]: plate = plate.dropna()

In [70]: print plate
ID:
Demo Plate

Data:
    3   4   6   7
A  A3  A4  A6  A7
B  B3  B4        
C              C7

Plotting

Please import matplotlib’s plotting functions if you haven’t yet...

In [71]: from pylab import *

For more information see FCPlate.plot().

In [72]: figure();

In [73]: plate.plot('Y2-A', bins=100);
_images/plate_plot_A.png
In [74]: figure();

In [75]: plate.plot(('B1-A', 'Y2-A'), bins=100, wspace=0.2, hspace=0.2);
_images/plate_plot_B.png

Accessing Single Wells

FCPlate supports indexing to make it easier to work with single wells.

In [76]: figure();

In [77]: print plate['A3']
<FCMeasurement 'A3'>

In [78]: plate['A3'].plot('Y2-A', bins=100);
_images/plate_indexing_A.png

Examples

Counting using the counts method

Use the counts method to count total number of events.

In [79]: total_counts = plate.counts()

In [80]: print total_counts
       3      4      6      7
A  10000  10000  10000  10000
B  10000  10000    NaN    NaN
C    NaN    NaN    NaN  10000

Let’s count the number of events that pass a certain gate:

In [81]: y2_counts = plate.gate(y2_gate).counts()

In [82]: print y2_counts
      3   4    6   7
A  4433  50  420   6
B  5496   7  NaN NaN
C   NaN NaN  NaN   0

What about the events that land outside the y2_gate?

In [83]: outside_of_y2_counts = plate.gate(~y2_gate).counts()

In [84]: print outside_of_y2_counts
      3     4     6      7
A  5567  9950  9580   9994
B  4504  9993   NaN    NaN
C   NaN   NaN   NaN  10000

Counting on our own

To learn how to do more complex calculations, let’s start by writing our own counting function. At the end of this, we better get the same result as was produced using the counts method above.

Here’s the function:

In [85]: def count_events(well):
   ....:     """ Counts the number of events inside of a well. """
   ....:     data = well.get_data()
   ....:     count = data.shape[0]
   ....:     return count
   ....: 

Let’s check that it works on a single well.

In [86]: print count_events(plate['A3'])
10000

Now, to use it in a calculation we need to feed it to the apply method. Like this:

In [87]: print plate['A3'].apply(count_events)
10000

The apply method is a functional, which is just a fancy way of saying that it accepts functions as inputs. If you’ve got a function that can operate on a well (like the function count_events), you can feed it into the apply method. Check the API for the apply method for more details.

Anyway, the apply method is particularly useful because it works on plates.

In [88]: total_counts_using_our_function = plate.apply(count_events)

In [89]: print type(total_counts_using_our_function)
<class 'pandas.core.frame.DataFrame'>

In [90]: print total_counts_using_our_function
       3      4      6      7
A  10000  10000  10000  10000
B  10000  10000    NaN    NaN
C    NaN    NaN    NaN  10000

Holy Rabbit!

First, for wells without data, a nan was produced. Second, the output that was returned is a DataFrame.

Also, as you can see the total counts we computed (total_counts_using_our_function) agrees with the counts computed using the counts methods in the previous example.

Now, let’s apply a gate and count again.

In [91]: print plate.gate(y2_gate).apply(count_events)
      3   4    6   7
A  4433  50  420   6
B  5496   7  NaN NaN
C   NaN NaN  NaN   0

Calculating median fluorescence

Ready for some more baking?

Let’s calculate the median Y2-A fluorescence.

In [92]: def calculate_median_rfp(well):
   ....:     """ Calculates the median on the RFP channel. """
   ....:     data = well.get_data()
   ....:     return data['Y2-A'].median()
   ....: 

The median rfp fluorescence for all events in well ‘A3’.

In [93]: print calculate_median_rfp(plate['A3'])
303.091125488

The median rfp fluorescence for all events in the plate is:

In [94]: print plate.apply(calculate_median_rfp)
             3           4           6           7
A   303.091125  105.875381  136.326202  104.436157
B  4701.629395  126.765717         NaN         NaN
C          NaN         NaN         NaN   97.277603

The median rfp fluorescence for all events that pass the y2_gate is:

In [95]: print plate.gate(y2_gate).apply(calculate_median_rfp)
             3            4            6            7
A  6573.452148  6240.961426  9346.646484  9294.400391
B  6544.949219  6413.048340          NaN          NaN
C          NaN          NaN          NaN          NaN