In support of "Gamma-crystallins of the chicken lens: remnants of an ancient vertebrate gene family in birds"

authors: Chen, Sagar, Len, Peterson, Fan, Mishra, McMurtry, Wilmarth, David, and Wistow
December 2015

David_120114 Chicken Lens Proteome
prepared by Phil Wilmarth, David Lab, OHSU, 11/26/2015
Descriptions of worksheet tabs and contents:
Protein Intensities
Protein Counts
Peptide Counts
Soluble Peptides
Insoluble Peptides
Ensemble Matches
Crystallins



Settings, Parameters, and Stats

Sample:
Project code
DAVID_chicken_soluble
DAVID_chicken_insoluble
Species
Gallus gallus
Gallus gallus
Cell Type
eye lens fiber cells
eye lens fiber cells
Mass Spec:
Instrument
Thermo Orbitrap Fusion
Thermo Orbitrap Fusion
LC System
Thermo UltiMate 3000
Thermo UltiMate 3000
LC Duration (min)
120
120
Capillary Temperature (degree C)
275
275
Ion Source
Thermo Easy-Spray
Thermo Easy-Spray
Source potential
2.4 kV
2.4 kV
Lock mass
yes (user defined mass)
yes (user defined mass)
Full m/z range (Th)
400 - 1600
400 - 1600
Full Resolution
120,000
120,000
Full AGC Target
400,000
400,000
Full Maximum Inject Time (millisec)
50
50
Number of dependent MS2 scans
Top N, 3 sec cycle
Top N, 3 sec cycle
MS2 collision type
HCD
HCD
MS2 AGC
10,000
10,000
MS2 Maximum Inject Time (millisec)
35
35
Isolation Window (Th)
2
2
Minimum Ion intensity
50,000
50,000
Normalized Collision Energy
30
30
MS2 m/z range (Th)
120 - 2000
120 - 2000
Rejected Charge States
1+ and greater than 7+
1+ and greater than 7+
MIPS filter
on
on
Exclusion Repeat Count
1
1
Exclusion mass low (PPM)
10
10
Exclusion mass high (PPM)
10
10
Exclusion List size
500?
500?
Exclusion Duration (second)
30
30
FASTA protein database:
Database source
UniProt (proteome_UP000000539)
UniProt (proteome_UP000000539)
Download date
12-Dec-14
12-Dec-14
Number of protein sequences
17,623
17,623
Include common contaminants
yes (179 sequences)
yes (179 sequences)
Decoy sequence type
reversed
reversed
Database preparation software
Peak list creation:
MSConvert version
3.0.6618
3.0.6618
Minimum ion count
15
15
Minimum intensity (absolute)
100
100
MSn level
2
2
Number of MS2 scans
64,121
101,530
Search engine:
Comet version
2014.02 rev. 2
2014.02 rev. 2
Database
uniprot-proteome_UP000000539_chick_both.fasta
uniprot-proteome_UP000000539_chick_both.fasta
Parent ion mass tolerance
1.25
1.25
Parent ion mass type
monoisotopic
monoisotopic
Search enzyme
trypsin strict (KR not if P)
trypsin strict (KR not if P)
Fully tryptic termini
yes
yes
Maximum missed cleavages
2
2
Variable modifications
M+15.9949
M+15.9949
Maximum number of variable mods per peptide
3
3
Fragment ion tolerance
1.0005
1.0005
Fragment ion bin offset
0.4
0.4
Fragment ion mass type
monoisotopic
monoisotopic
Ion series used in scoring
y, b, NL
y, b, NL
Maximum parent ion charge state
4
4
Maximum fragment ion charge state
3
3
Theoretical digest mass range
600 - 5000
600 - 5000
Static modifications
C+57.0215
C+57.0215
Peptide to protein mapping:
Software
PAW pipeline (in-house)
PAW pipeline (in-house)
Number scans passing score thresholds
29,913
47,308
Number decoys passing score thresholds
369
560
PSM FDR (scan-based)
1.20%
1.20%
Parsimony analysis
basic subset removal
basic subset removal
Minimum peptides per protein
2
2
Minimum unique peptides per protein
0
0
Number of identified proteins (non-redundant, without grouping)
1078 (39 contaminants)
1759 (54 contaminants)
Number of decoy proteins (non-redundant, without grouping)
6
12
Preliminary protein FDR
0.60%
0.50%
Protein grouping method
extended parsimony (in-house software)
extended parsimony (in-house software)
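
The PSM FDR values listed above are consistent with dividing the number of decoy matches by the number of scans passing the score thresholds. The following is a minimal sketch of that arithmetic, assuming a simple decoy/passing ratio; the exact formula used by the PAW pipeline is not stated here, and the function name decoy_fdr is hypothetical.

```python
def decoy_fdr(n_passing, n_decoy):
    """Return decoy matches as a percentage of all passing matches.

    Assumption: FDR is estimated as a simple ratio of the two counts.
    """
    if n_passing <= 0:
        raise ValueError("n_passing must be positive")
    return 100.0 * n_decoy / n_passing


# Counts taken from the "Peptide to protein mapping" settings above.
soluble_fdr = decoy_fdr(29913, 369)
insoluble_fdr = decoy_fdr(47308, 560)
print(f"soluble: {soluble_fdr:.2f}%  insoluble: {insoluble_fdr:.2f}%")
```

Under this assumption both fractions come out at about 1.2%, in line with the reported PSM FDR.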


Table Descriptions


Protein Intensities Table

Header Description
ProtGroup arbitrary protein group number (groups have more than one member for redundant proteins or homologous families)
Counter column used to count visible rows in results files using Excel
Accession protein database accession string
Identical accessions of all redundant proteins (indistinguishable peptide sets)
Similar accessions of all proteins having sufficiently similar peptide sets (homologous families)
OtherLoci list of all protein accessions having any shared peptides in common with respective protein
Link hyperlink Excel function to UniProt protein database
Filter text column for flagging contaminants, decoys, etc.
Sort column for controlling row order
Coverage protein sequence coverage in %
SeqLength number of amino acids in protein database entry
MW calculated molecular weight (average mass in Da) for protein database entry
Description protein database description string
Gene UniProt gene name
CountsTot grand total of all intensities (shared and unique) across all N samples
UniqueTot grand total of all unique intensities across all N samples
UniqFrac ratio of UniqueTot to CountsTot (small fractions indicate many shared peptides)
DAVID_chicken_insol_20141201 total peptide intensities for water-insoluble fraction
DAVID_chicken_sol_20141202 total peptide intensities for water-soluble fraction
DAVID_chicken_insol_20141201 total unique peptide intensities for water-insoluble fraction
DAVID_chicken_sol_20141202 total unique peptide intensities for water-soluble fraction
DAVID_chicken_insol_20141201 total corrected peptide intensities for water-insoluble fraction (shared peptide intensities fractionally split based on relative unique intensities)
DAVID_chicken_sol_20141202 total corrected peptide intensities for water-soluble fraction (shared peptide intensities fractionally split based on relative unique intensities)
AccessionClean used to lookup Ensembl information
query_acc full UniProt accession key (accession, identifier, and database source)
query_desc full UniProt protein description string
hit_acc Ensembl database accession
hit_desc Ensembl database description
blast_scores BLAST alignment scores between UniProt and Ensembl best matches
status BLAST alignment status (OK, partial matches, or no match)
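
The corrected intensity columns described above split shared peptide intensities according to relative unique intensities. The following is a minimal sketch of one way to do that, assuming a shared intensity is divided among the proteins that share it in proportion to each protein's unique intensity. The function name, the accessions, the numbers, and the equal-split fallback for proteins with no unique signal are all assumptions made for illustration, not the in-house implementation.

```python
def split_shared(shared_intensity, unique_by_protein):
    """Split one shared intensity among proteins by relative unique intensity."""
    n = len(unique_by_protein)
    if n == 0:
        return {}
    total_unique = sum(unique_by_protein.values())
    if total_unique == 0:
        # Assumption: with no unique evidence at all, split equally.
        return {acc: shared_intensity / n for acc in unique_by_protein}
    return {
        acc: shared_intensity * unique / total_unique
        for acc, unique in unique_by_protein.items()
    }


# Made-up unique intensities for two proteins that share one peptide.
unique = {"PROT_A": 3.0e6, "PROT_B": 1.0e6}
shares = split_shared(2.0e6, unique)

# Corrected total = unique intensity + apportioned part of the shared intensity.
corrected = {acc: unique[acc] + shares[acc] for acc in unique}
print(corrected)
```

Splitting this way conserves the total signal: the apportioned parts always sum to the original shared intensity.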


Protein Counts Table-- protein summary after homology grouping

Header Description
ProtGroup arbitrary protein group number (groups have more than one member for redundant proteins or homologous families)
Counter column used to count visible rows in results files using Excel
Accession protein database accession string
Identical accessions of all redundant proteins (indistinguishable peptide sets)
Similar accessions of all proteins having sufficiently similar peptide sets (homologous families)
OtherLoci list of all protein accessions having any shared peptides in common with respective protein
Link not used
Filter text column for flagging contaminants, decoys, etc.
Coverage protein sequence coverage in %
SeqLength number of amino acids in protein database entry
MW calculated molecular weight (average mass in Da) for protein database entry
Description protein database description string
Gene UniProt gene name
CountsTot grand total of all counts (shared and unique) across all N samples
UniqueTot grand total of all unique counts across all N samples
UniqFrac ratio of UniqueTot to CountsTot (small fractions indicate many shared peptides)
DAVID_chicken_insol_20141201 total peptide counts for water-insoluble fraction
DAVID_chicken_sol_20141202 total peptide counts for water-soluble fraction
DAVID_chicken_insol_20141201 total unique peptide counts for water-insoluble fraction
DAVID_chicken_sol_20141202 total unique peptide counts for water-soluble fraction
DAVID_chicken_insol_20141201 total corrected peptide counts for water-insoluble fraction (shared peptide counts fractionally split based on relative unique counts)
DAVID_chicken_sol_20141202 total corrected peptide counts for water-soluble fraction (shared peptide counts fractionally split based on relative unique counts)
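
The UniqFrac column is defined above as the ratio of UniqueTot to CountsTot. A short sketch of that ratio follows; the function name and the zero-total convention are assumptions, and the counts in the example are made up.

```python
def uniq_frac(unique_tot, counts_tot):
    """Return UniqueTot / CountsTot for one protein row."""
    if counts_tot == 0:
        return 0.0  # assumption: report 0 when a row has no counts at all
    return unique_tot / counts_tot


# Made-up row: 12 unique counts out of 48 total counts.
frac = uniq_frac(12, 48)
print(frac)
```

A small value flags a protein whose evidence comes mostly from peptides shared with other proteins.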



Peptide Counts Table-- peptide summary (including counts per sample)

Header Description
ProtGroup arbitrary protein group number (groups have more than one member for redundant proteins or homologous families). Cross references with protein reports.
Accession protein database accession string
Sequence peptide sequence (M* denotes oxidized Met)
Begin beginning amino acid residue number
End ending amino acid residue number
Unique indicates whether peptide is matched to a single protein or not
NTT number of peptide termini consistent with tryptic cleavage
Z peptide charge state
TotCount total number of MS2 spectra mapped to peptide in the respective charge state
DAVID_chicken_insol_20141201 total count in the insoluble fraction
DAVID_chicken_sol_20141202 total count in the soluble fraction
OtherLoci list of all protein accessions having any shared peptides in common with respective protein
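
The NTT column counts peptide termini consistent with tryptic cleavage. The following is a minimal sketch of how such a count can be derived from the Begin and End residue numbers, assuming the strict trypsin rule from the search settings (cleave after K or R, but not before P) and assuming that protein N- and C-termini count as consistent termini. The function name and the example sequence are hypothetical.

```python
def count_tryptic_termini(protein, begin, end):
    """Count peptide termini (0, 1, or 2) consistent with trypsin cleavage.

    begin and end are 1-based, inclusive residue numbers in the protein.
    """
    if not (1 <= begin <= end <= len(protein)):
        raise ValueError("begin/end fall outside the protein sequence")
    ntt = 0
    # N-terminal side: protein start, or K/R just before the peptide
    # and the peptide does not start with P.
    if begin == 1 or (protein[begin - 2] in "KR" and protein[begin - 1] != "P"):
        ntt += 1
    # C-terminal side: protein end, or the peptide ends in K/R
    # and the next residue is not P.
    if end == len(protein) or (protein[end - 1] in "KR" and protein[end] != "P"):
        ntt += 1
    return ntt


protein = "MKTAYIAKQRPLSDEK"  # made-up sequence, 16 residues
print(count_tryptic_termini(protein, 3, 8))    # TAYIAK
print(count_tryptic_termini(protein, 4, 8))    # AYIAK
print(count_tryptic_termini(protein, 11, 16))  # PLSDEK
```

With "Fully tryptic termini" set to yes in the search, reported peptides would be expected to have NTT = 2 under these assumptions.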



Soluble Peptide Details-- more detailed information on the peptide identifications

Header Description
ProtGroup arbitrary protein group number (groups have more than one member for redundant proteins or homologous families). Cross references with protein reports.
Accession protein database accession string
Sequence peptide sequence (M* denotes oxidized Met)
Unique indicates whether peptide is matched to a single protein or not
TotCount total number of MS2 spectra mapped to peptide in the respective charge state
NTT number of peptide termini consistent with tryptic cleavage
XCorr SEQUEST score for the top-scoring match to this peptide sequence
DeltaCN delta XCorr score for the top-scoring match to this peptide sequence
SpRank preliminary scoring rank for the top scoring match to this peptide
NewDisc discriminant function score for the top scoring match to this peptide
Z peptide charge state
Delta_Mass difference between experimental and calculated masses for the top scoring match to this peptide
Exp_Mass experimentally measured mass for the top scoring match to this peptide
Calc_Mass calculated mass for this peptide sequence
DTA_filename the equivalent DTA name for the top scoring match to this peptide
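
Calc_Mass and Delta_Mass can be illustrated with a small calculation. The following is a sketch only: it computes a neutral monoisotopic peptide mass from standard residue masses, applies the static C+57.0215 and variable M+15.9949 modifications from the search settings, and assumes the Sequence column holds a bare sequence with M* marking oxidized Met. Whether the worksheet reports neutral or protonated (MH+) masses is not stated here, and the function names and example sequence are hypothetical.

```python
# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # one water added for the free peptide termini
STATIC_C = 57.0215  # static modification on every Cys (search settings)
OX_M = 15.9949      # variable Met oxidation, written as M* in Sequence


def peptide_mono_mass(sequence):
    """Return the neutral monoisotopic mass of a peptide such as 'ACM*K'."""
    mass = WATER
    for i, ch in enumerate(sequence):
        if ch == "*":
            if i == 0 or sequence[i - 1] != "M":
                raise ValueError("'*' must directly follow M")
            mass += OX_M
            continue
        mass += MONO[ch]  # KeyError signals an unsupported residue code
        if ch == "C":
            mass += STATIC_C
    return mass


def delta_mass(exp_mass, calc_mass):
    """Return the experimental minus calculated mass difference."""
    return exp_mass - calc_mass


calc = peptide_mono_mass("ACM*K")        # made-up peptide
print(round(calc, 4))
print(delta_mass(calc + 0.0012, calc))   # made-up measurement offset
```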



Insoluble Peptide Details-- more detailed information on the peptide identifications

Header Description
ProtGroup arbitrary protein group number (groups have more than one member for redundant proteins or homologous families). Cross references with protein reports.
Accession protein database accession string
Sequence peptide sequence (M* denotes oxidized Met)
Unique indicates whether peptide is matched to a single protein or not
TotCount total number of MS2 spectra mapped to peptide in the respective charge state
NTT number of peptide termini consistent with tryptic cleavage
XCorr SEQUEST score for the top-scoring match to this peptide sequence
DeltaCN delta XCorr score for the top-scoring match to this peptide sequence
SpRank preliminary scoring rank for the top scoring match to this peptide
NewDisc discriminant function score for the top scoring match to this peptide
Z peptide charge state
Delta_Mass difference between experimental and calculated masses for the top scoring match to this peptide
Exp_Mass experimentally measured mass for the top scoring match to this peptide
Calc_Mass calculated mass for this peptide sequence
DTA_filename the equivalent DTA name for the top scoring match to this peptide



Ensemble Matches

Header Description
Ident UniProt identifier (lookup key)
query_number arbitrary query number during BLAST run
query_acc UniProt accession
query_desc UniProt description string
hit_acc Ensembl accession for top BLAST match
hit_desc Ensembl description string for top BLAST match
blast_scores a composite string with various BLAST match scores
status text column designating quality of BLAST match based on set of matches
Acc UniProt stable accession key
Ident UniProt more informative identifier key



Crystallins

Header Description
Accession protein database accession string
Coverage protein sequence coverage in %
Length number of amino acids in protein database entry
UniProt Description UniProt protein database description string
Ensembl Description Ensembl protein database description string
Gene UniProt gene name
Water-Insoluble (intensities) water-insoluble lens fraction (raw intensities)
Water-Soluble (intensities) water-soluble lens fraction (raw intensities)
Water-Insoluble (relative abundances) water-insoluble lens fraction (relative crystallin abundances using raw intensities)
Water-Soluble (relative abundances) water-soluble lens fraction (relative crystallin abundances using raw intensities)
Water-Insoluble (length normalized) water-insoluble lens fraction (raw intensities normalized by sequence length)
Water-Soluble (length normalized) water-soluble lens fraction (raw intensities normalized by sequence length)
Water-Insoluble (relative abundances) water-insoluble lens fraction (relative crystallin abundances using raw intensities normalized by sequence length)
Water-Soluble (relative abundances) water-soluble lens fraction (relative crystallin abundances using raw intensities normalized by sequence length)
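
The length-normalized and relative-abundance columns above can be sketched as follows. This assumes that length normalization divides a raw intensity by the number of amino acids, and that relative abundance is each crystallin's percentage of the summed crystallin signal in one fraction. Whether the worksheet uses percentages or fractions is not stated here; the names and values below are made up.

```python
def length_normalize(intensity, length):
    """Return intensity divided by sequence length (signal per residue)."""
    if length <= 0:
        raise ValueError("length must be positive")
    return intensity / length


def relative_abundances(values):
    """Return each entry as a percentage of the column total."""
    total = sum(values.values())
    if total <= 0:
        raise ValueError("column total must be positive")
    return {name: 100.0 * v / total for name, v in values.items()}


# Made-up (raw intensity, sequence length) pairs for one lens fraction.
crystallins = {"crystallin_A": (8.0e9, 200), "crystallin_B": (2.0e9, 100)}

raw = {name: inten for name, (inten, _) in crystallins.items()}
normalized = {
    name: length_normalize(inten, length)
    for name, (inten, length) in crystallins.items()
}

rel_raw = relative_abundances(raw)          # from raw intensities
rel_norm = relative_abundances(normalized)  # from length-normalized intensities
print(rel_raw)
print(rel_norm)
```

Length normalization lowers the apparent share of longer proteins, which tend to yield more peptides and therefore more summed intensity.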