==============================================
If the values are a combination of the form "3 characters - 3 characters",
change group by unicode to the following:
group by left(unicode,3)
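The prefix grouping above can be tried without a MySQL server. This is a minimal sketch that assumes an in-memory SQLite database, a hypothetical table named chars, and made-up sample values. SQLite has no LEFT() function, so substr(unicode, 1, 3) stands in for left(unicode, 3).

```python
import sqlite3

# Hypothetical sample values in the "3 characters - 3 characters" form.
sample = ["AC0-001", "AC0-002", "B1F-010", "AC0-003", "B1F-011"]

conn = sqlite3.connect(":memory:")
# The table name "chars" is an assumption; only the column name comes from the note.
conn.execute("CREATE TABLE chars (unicode TEXT)")
conn.executemany("INSERT INTO chars (unicode) VALUES (?)", [(v,) for v in sample])

# substr(unicode, 1, 3) plays the role of MySQL's left(unicode, 3).
rows = conn.execute(
    "SELECT substr(unicode, 1, 3) AS prefix, COUNT(*) "
    "FROM chars GROUP BY substr(unicode, 1, 3) ORDER BY prefix"
).fetchall()

for prefix, count in rows:
    print(prefix, count)
```

Each output row is one three-character prefix with the number of values that share it.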
Posted by gwlee
Source: vicjung
The 'GRANT' statement makes it easy to add a user and set privileges. (MySQL Manual 4.3.5)
mysql> grant all privileges on dbuser.* to dbuser@localhost
identified by 'password' with grant option;
mysql> grant all privileges on `dbuser_%`.* to dbuser@localhost
identified by 'password' with grant option;
This adds a 'dbuser' user account that has all privileges on every database whose name starts with 'dbuser_'. Creating the account this way lets you skip setting up per-database privileges for the new user. -- 이현진
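In a GRANT database name, both % and _ act as wildcards, so the pattern dbuser_% can match more names than intended; escaping the underscore as \_ makes it literal. The sketch below emulates that matching in Python to preview which database names a pattern would cover. The function names are hypothetical, and it ignores case-sensitivity rules, so treat it as an approximation of the server's behavior.

```python
import re

def grant_pattern_to_regex(pattern):
    """Translate a GRANT-style database name pattern into a compiled regex."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # A backslash makes the next character (e.g. _ or %) literal.
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "%":
            parts.append(".*")   # any run of characters, including none
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts), re.DOTALL)

def matches(pattern, db_name):
    """Return True if db_name is covered by the pattern."""
    return grant_pattern_to_regex(pattern).fullmatch(db_name) is not None

# Hypothetical database names.
for name in ["dbuser_blog", "dbuserXblog", "dbuser"]:
    print(name, matches("dbuser_%", name), matches("dbuser\\_%", name))
```

With the unescaped pattern, a name such as dbuserXblog is also covered, which is why the escaped form is usually the safer choice.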
ALTER TABLE tablename ENGINE = MyISAM;
Source: http://ddpk.blogspot.com/2008/06/protein-databases.html
PROTEIN DATABASES
Protein databases are more specialized than primary sequence databases.
They contain information derived from the primary sequence databases.
Some contain protein translations of the nucleic acid sequences.
Some contain sets of patterns and motifs derived from sequence homologs.
GenBank - the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences.
PIR - Protein Information Resource - a comprehensive, non-redundant, expertly annotated, fully classified and extensively cross-referenced protein sequence database.
SWISS-PROT & TrEMBL - SWISS-PROT is a curated protein sequence database. TrEMBL is a computer-annotated supplement of SWISS-PROT that contains all the translations of EMBL nucleotide sequence entries not yet integrated in SWISS-PROT.
TIGR - a collection of curated databases containing DNA and protein sequence, gene expression, cellular role, protein family, and taxonomic data for microbes, plants and humans.
MOTIF, PATTERN & PROFILE DATABASES
ALIGN - a compendium of sequence alignments: it is a companion resource to PRINTS.
BLOCKS - multiply aligned ungapped segments corresponding to the most highly conserved regions of proteins.
DOMO - a database of homologous protein domain families.
HOMSTRAD - a curated database of structure-based alignments for homologous protein families.
InterPro - Integrated Resource of Protein Domains and Functional Sites - InterPro is an integrated documentation resource for protein families, domains and sites, developed initially as a means of rationalising the complementary efforts of the PROSITE, PRINTS, Pfam and ProDom database projects. Each combined InterPro entry includes functional descriptions and literature references, and links are made back to the relevant member database(s), allowing users to see at a glance whether a particular family or domain has associated patterns, profiles, fingerprints, etc. Merged and individual entries (i.e., those that have no counterpart in the companion resources) are assigned unique accession numbers. Each InterPro entry lists all the matches against SWISS-PROT and TrEMBL (more than 1,000,000 hits in total). InterPro aims to reduce duplication of effort in the labour-intensive, rate-limiting process of annotation, and will facilitate communication between the disparate resources. By uniting these databases, we capitalise on their individual strengths, producing a single entity that is far greater than the sum of its parts.
PFam - a database of multiple alignments of protein domains or conserved protein regions. The alignments represent some evolutionarily conserved structure which has implications for the protein's function. Profile hidden Markov models (profile HMMs) built from the Pfam alignments can be very useful for automatically recognizing that a new protein belongs to an existing protein family, even if the homology is weak.
PRINTS - Protein Fingerprint Database - a compendium of protein fingerprints. A fingerprint is a group of conserved motifs used to characterise a protein family.
PRINTS-S - relational cousin of the PRINTS Database
ProDom - an automatic compilation of homologous domains. ProDom families were generated automatically using PSI-BLAST with a profile built from the seed alignments of Pfam-A 4.3 families.
ProSite - a database of protein families and domains consisting of biologically significant sites, patterns and profiles.
Protein Profiles - online cross-references to the Oxford University Press Protein Profiles project.
ProtoMap - site offers an exhaustive classification of all the proteins in the SWISS-PROT and TrEMBL databases into groups of related proteins. The resulting classification splits the protein space into well-defined groups of proteins, most of which are closely correlated with natural biological families and superfamilies. The hierarchical organization may help to detect finer subfamilies that make up known families of proteins as well as interesting relations between protein families.
SBASE - a library of protein domain sequences that contains 237,937 annotated structural, functional, ligand-binding and topogenic segments of proteins, cross-referenced to all major sequence databases and sequence pattern collections.
SYSTERS - the SYSTERS cluster set contains sequences from SWISS-PROT, TrEMBL, PIR, Wormpep, and MIPS Yeast protein translations, which are sorted into disjoint clusters. Fragmental sequences form single-sequence clusters, while the remaining sequences are contained in clusters of non-redundant sequences.
PROTEIN STRUCTURE DATABASES
CATH Protein Structure Classification - a hierarchical domain classification of protein structures in the Brookhaven protein databank.
FSSP - Fold Classification based on Structure-Structure Alignment of Proteins - based on exhaustive all-against-all 3D structure comparison of protein structures currently in the Protein Data Bank (PDB).
Library of Protein Family Cores - structural alignments of protein families and computed average core structures for each family. Useful for building models, threading, and exploratory analysis.
ModBase - a database of three-dimensional protein models calculated by comparative modeling.
PRESAGE - a database of proteins, each of which has a collection of annotations reflecting current experimental status, structural assignments, models, and suggestions.
RCSB Protein Data Bank - single international repository for the processing and distribution of 3-D macromolecular structure data primarily determined experimentally.
Protein Loop Classification - Conformational clusters and consensus sequences for protein loops derived by computational analysis of their structures.
SCOP - Structural Classification of Proteins - a detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known.
Sloop Database - Sloop Database of Super Secondary Fragments - a classification of protein loops.
3Dee - Database of Protein Domain Definitions - contains structural domain definitions for all protein chains in the Protein Databank (PDB) that have 20 or more residues and are not theoretical models.
GENOMES
DEAMBULUM - contains the GENOMES: Viruses, Archaea, Bacteria, Fungi, Plants, Animals, and Man.
FlyBase - a comprehensive database for information on the genetics and molecular biology of Drosophila. It includes data from the Drosophila Genome Projects and data curated from the literature.
GeneCards - database of human genes, their products and their involvement in diseases.
GenDis - Human Genetic Disease Database
Genome Database - Regions of the human genome, including genes, clones, amplimers (PCR markers), breakpoints, cytogenetic markers, fragile sites, ESTs, syndromic regions, contigs and repeats. Maps of the human genome, including cytogenetic maps, linkage maps, radiation hybrid maps, content contig maps, and integrated maps. These maps can be displayed graphically via the Web. Variations within the human genome, including mutations and polymorphisms, plus allele frequency data.
KEGG: Kyoto Encyclopedia of Genes and Genomes - information on pathways that consist of interacting molecules or genes, with links from the gene catalogs produced by genome sequencing projects.
PROTEOME - The BioKnowledge Library of Public Human PSD, Caenorhabditis elegans (WormPD), Saccharomyces cerevisiae (YPD) and S. pombe (PombePD).
Saccharomyces Genome Database - a scientific database of the molecular biology and genetics of the yeast Saccharomyces cerevisiae.
Whitehead Institute for Genomic Research - information on the Neurospora crassa Genome Database, Human SNP Database, Human Physical Mapping Project, Mouse Genetic and Physical Mapping Project, Rat Genetic Mapping Project, Mouse RH Mapping Project, Genome Center ftp Archive (Data)
WORMBASE - a repository of mapping, sequencing and phenotypic information about the C. elegans nematode
TRANSCRIPTIONAL REGULATION DATABASES & ALGORITHMS
COMPEL - Database on composite regulatory elements affecting gene transcription in eukaryotes.
EPD - Eukaryotic Promoter Database - an annotated non-redundant collection of eukaryotic POL II promoters, for which the transcription start site has been determined experimentally.
RegulonDB - A database on transcriptional regulation in Escherichia coli.
TRANSFAC - The Transcription Factor Database
TRDD - Transcription Regulatory Region Database
FastM and ModelInspector - A program for the generation of models for regulatory regions in DNA sequences.
FunSiteP - Recognition and classification of eukaryotic promoters.
PatSearch - Search for potential transcription factor binding sites.
Promoter Inspector - Prediction of promoter regions in mammalian genomic sequences.
Mat Inspector - Search for potential transcription factor binding sites.
RSATools - Regulatory Sequence Analysis Tools
S_Comp - search for NFATp/AP-1 Comp. Elements
TRADAT - TRAnscription Databases and Analysis Tools
2Zip - Computational Approaches to Identify Leucine Zippers
OTHER
BIND - full descriptions of interactions, molecular complexes and pathways.
BioMagRes Bank - NMR-derived protein structures.
Cytomer - A relational database of physiological systems, organs and cell types.
ENZYME - Enzyme Nomenclature Database
Enzyme Structures Database - contains the known enzyme structures that have been deposited in the Brookhaven Protein Data Bank (the PDB).
Gene Ontology Consortium - attempts to produce a dynamic controlled vocabulary that can be applied to all eukaryotes.
Human Transcript Database - a curated source for information related to RNA molecules that have been sequenced.
LIGAND - Database for enzymes, compounds, and reactions.
Metabolic Pathways of Biochemistry - graphically represents all major metabolic pathways, primarily those important to human biochemistry.
NDB - Nucleic Acid Database Project - assembles and distributes structural information about nucleic acids.
PMD - Protein Mutant Database - covers natural as well as artificial mutants, including random and site-directed ones, for all proteins except members of the globin and immunoglobulin families.
REBase - Restriction Enzyme Database - contains detailed information about restriction enzymes, methylases, the microorganisms from which they have been isolated, recognition sequences, cleavage sites, methylation specificity, the commercial availability of the enzymes, and references.
Radar - Rapid Automatic Detection and Alignment of Repeats in protein sequences.
rRNA Database - all about ribosomal RNA.
S/MARtDB - information about scaffold/matrix attached regions.
TargetDB - database of peptides targeting proteins to cellular locations.
Transpath - Signal Transduction Browser - an information system on gene-regulatory pathways. Focuses on pathways involved in the regulation of transcription factors in different species, mainly human, mouse and rat. Elements of the relevant signal transduction pathways like hormones, receptors, enzymes and transcription factors are stored together with information about their interaction and references in an object-oriented database.
TOOLS
CLUSTALW - Multiple sequence alignment tool
ProteinProspector - Proteomics tools for mining sequence databases in conjunction with Mass Spectrometry experiments.
ReBASE Information Tool - ReBASE query tool.
SeqHound - database sequence fetch program.
SignalP - predicts the presence and location of signal peptide cleavage sites in amino acid sequences from different organisms: Gram-positive prokaryotes, Gram-negative prokaryotes, and eukaryotes.
SIMILARITY, HOMOLOGY SEARCH
These algorithms are designed for the comparison of a protein sequence against sequence databases to detect similar or homologous proteins. Conserved regions usually have similar amino acid sequence and/or structural similarities. Perform at least three separate searches using different algorithms. If default settings do not detect any similar proteins, try varying the PAM matrix values. Lower matrix values are best for identifying short regions of sequence with very high similarity. Higher PAM matrices are able to detect longer, weaker matches. Simultaneously, adjust the gap penalty value around the default value.
BLAST - The BLAST programs have been designed for speed, with a minimal sacrifice of sensitivity to distant sequence relationships. The BLAST search algorithm is designed to find close matches rapidly. It is faster than the S-W algorithm.
BLITZ - performs a sensitive and extremely fast comparison of a protein sequence against the SWISS-PROT protein sequence database using the Smith-Waterman algorithm. The Smith-Waterman algorithm is able to detect short matching regions such as binding sites in the middle of long sequences.
Bic-sw - Smith & Waterman algorithm implementation for protein database searches
FASTA - detects patches of regional similarity rather than the best alignment between the query sequence and the database sequences. Very fast, but complete sensitivity is sacrificed.
GeneMatcher - The Smith-Waterman (S-W) search algorithm used by the FDF server is about 5% more sensitive towards divergent matches than the BLAST algorithm. This significantly increases the chances of finding distant homologs of your query sequence in the databases. FDF software incorporates a frameshift-tolerant search algorithm. This feature is particularly useful when searching for potential coding sequences in low-quality DNA sequences, such as those found in EST databases.
MPsearch - MPSRCH is a biological sequence comparison tool that implements the true Smith and Waterman algorithm. This algorithm exhaustively compares every letter in a query sequence with every letter in the database.
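The "every letter against every letter" comparison that several entries above attribute to Smith-Waterman can be sketched in a few lines. This is a minimal illustration, not the code of any listed server: it assumes a simple match/mismatch score and a linear gap penalty (the values are arbitrary) in place of a PAM matrix, and it returns only the best local alignment score.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor of 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

# Hypothetical sequences that share the short region "ACGT".
print(smith_waterman_score("XXACGTXX", "YYACGTYY"))
```

The nested loop visits every pair of letters, which is why this approach is more sensitive but slower than heuristic searches such as BLAST or FASTA. The floor of 0 is what lets a short shared region score well inside otherwise unrelated sequences.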
Paralign and SWMMX - searches a number of sequence databases for sequences similar to your amino acid query sequence using two very sensitive algorithms. You can choose between the well-known Smith-Waterman optimal local alignment algorithm or a new algorithm called ParAlign, which is much faster but still almost as sensitive.
Pfam - HMM Search - Unlike standard pairwise alignment methods (e.g. BLAST, FASTA), Pfam HMMs deal sensibly with multidomain proteins.
SAS - Sequences Annotated by Structure - will perform a FASTA search of the given sequence against the proteins of known structure in the PDB and return a multiple alignment of all hits, each annotated by structural features.
Scanps 2.3 - Fast implementation of the true Smith & Waterman algorithm for protein database searches.
MOTIF, PATTERN & PROFILE SEARCH
There are a limited number of families into which most proteins are grouped. Proteins within a given family generally have a shared function. Conserved regions are usually important for function or for maintaining a specific 3D structure. Conserved regions usually have similar amino acid sequence and/or structural similarities. Domains are distinct functional regions of a protein, often linked together by a flexible region. Motifs are recurring substructures found in many proteins. Proteins of 500 or more amino acids most likely contain discrete functional domains. Regions of low complexity often separate domains. Long stretches of repeated residues, particularly proline, glutamine, serine, or threonine, often indicate linker sequences. Approximately 2000-3000, out of a predicted 10,000-20,000, different protein families have been characterized. Roughly half of the proteins encoded in a new genome can be placed in a known family based on their amino acid sequence.
CDD - A Conserved Domain Database and Search Service
eMatrix - fast and accurate sequence analysis using minimal-risk scoring matrices.
eMotif Scan - sequence database search using eMatrix regular expressions.
eMotif Search - protein classification search.
InterProScan - queries a protein sequence against InterPro.
Kangaroo - Kangaroo is a pattern search program. Given a sequence pattern, the program will find all the records that contain that pattern.
MEME - Multiple EM for Motif Elicitation - MEME is a tool for discovering motifs in a group of related DNA or protein sequences. Takes as input a group of DNA or protein sequences (the training set) and outputs as many motifs as requested. MEME uses statistical modeling techniques to automatically choose the best width, number of occurrences, and description for each motif.
MOTIF - finds sequence motifs in a query sequence, and also provides functional and genomic information on the found motifs using DBGET and LinkDB as the hyperlinked annotations. Results are presented graphically and, where available, 3D structures of the found motifs can be examined with the RasMol program when the hits are found in the PROSITE database. Also, given a profile generated from a multiple sequence alignment, or retrieved from a motif library such as PROSITE or Pfam, you can align a protein sequence with the profile.
Network Protein Sequence Analysis - this multi-algorithm server offers two pattern and signature searches: PATTINPROT: scan a protein sequence or a protein database for one or several pattern(s), and PROSCAN: scan a sequence for sites/signatures against the PROSITE database.
PFam HMM Search - Analyzes a protein query sequence to find Pfam domain matches.
PPSearch - Protein motifs searches
PredictProtein - this multi-algorithm server searches the PROSITE Database to detect functional motifs and PRODOM to detect protein domains.
ProDom BLAST - BLAST homology search against all domain sequences in ProDom.
ProfileScan Server - compares a protein or nucleic acid sequence against a profile library (PROSITE or Pfam).
ProtoMap - classifies a new protein sequence.
Pscan - uses information derived from the PRINTS database to detect functional fingerprints in protein.
P-val FingerPRINTScan - find the closest matching PRINTS fingerprint/s to a query sequence.
ScanProsite - Scans a protein sequence for the occurrence of patterns stored in the PROSITE database.
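Scanning a sequence for a PROSITE-style pattern amounts to translating the pattern into a regular expression. The sketch below does not reproduce ScanProsite itself. It assumes the commonly documented pattern syntax: elements separated by "-", x for any residue, [..] for allowed residues, {..} for forbidden residues, (n) or (n,m) for repeats, and < or > for the sequence ends. The function names and the test sequence are hypothetical.

```python
import re

def prosite_to_regex(pattern):
    """Convert a PROSITE-style pattern string into a Python regex string."""
    pattern = pattern.rstrip(".")  # patterns are conventionally ended with a period
    out = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):   # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):     # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (3) or (2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."                          # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"      # any residue except these
        repeat = ""
        if lo is not None:
            repeat = "{" + lo + ("," + hi if hi else "") + "}"
        out.append(prefix + core + repeat + suffix)
    return "".join(out)

def scan(pattern, sequence):
    """Return (start, matched_text) for non-overlapping pattern hits."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]

# N-{P}-[ST]-{P} is the classic N-glycosylation site pattern.
print(prosite_to_regex("N-{P}-[ST]-{P}."))
print(scan("N-{P}-[ST]-{P}.", "AANGTAA"))
```

Note that re.finditer reports non-overlapping hits only, so a production scanner would need extra handling for overlapping sites.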
SMART - Simple Modular Architecture Research Tool
SPRINT - Search the PRINTS-S Database.
3motif - searches by eMOTIF, PDB Structure or BLOCKS accession number.
SECONDARY SEARCH
Folding and coiling due to H-bond formation determine secondary structure. H-bonds form between carboxyl and amino groups of nonadjacent amino acids. A single polypeptide can have both helical and sheet regions. Non-helix and sheet regions can form bends, loops or turns.
BTPRED - The Beta-Turn Prediction Server - temporarily down
CPHModels - predicts protein structure using comparative (homology) modelling.
COILS - compares a sequence to a database of known parallel two-stranded coiled-coils and derives a similarity score. By comparing this score to the distribution of scores in globular and coiled-coil proteins, the program then calculates the probability that the sequence will adopt a coiled-coil conformation.
Garnier Peptide Structure Tool - an implementation of the original Garnier Osguthorpe Robson algorithm (GOR I) for predicting protein secondary structure. Secondary structure prediction is notoriously difficult to do accurately. The GOR I algorithm is one of the first semi-successful methods.
HTH - gives a practical estimation of the probability that the sequence is a helix-turn-helix motif.
Jpred2 - takes either a protein sequence or a multiple alignment of protein sequences, and predicts secondary structure. It works by combining a number of modern, high quality prediction methods to form a consensus.
META PredictProtein - this multi-algorithm server utilizes eight different algorithms for predicting secondary structure.
MultiCoil - program predicts the location of coiled-coil regions in amino acid sequences and classifies the predictions as dimeric or trimeric. The method is based on the PairCoil algorithm.
PairCoil - predicts the location of coiled-coil regions in amino acid sequences by use of Pairwise Residue Correlations.
PredictProtein - this multi-algorithm server utilizes two algorithms to predict secondary structure.
PREDATOR - an accurate algorithm for secondary structure prediction based on recognition of potentially hydrogen-bonded residues in the amino acid sequence.
PSA Protein Structure Prediction Server - determines the probable placement of secondary structural elements along a query sequence.
Structure Prediction Server - this multi-algorithm server uses the PHD algorithm to predict secondary structure.
SSpro - Protein secondary structure prediction based on Bidirectional Recurrent Neural Networks (BRNNs).
Tandem Repeats Finder - a program to locate and display tandem repeats (two or more adjacent, approximate copies of a pattern of nucleotides) in DNA sequences.
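To make the idea of a tandem repeat concrete, here is a deliberately simplified sketch. Unlike the tool described above, it finds only exact adjacent copies, not approximate ones. The function name, its parameters, and the sample sequence are hypothetical.

```python
def find_tandem_repeats(seq, min_unit=2, max_unit=6, min_copies=2):
    """Return (start, unit, copies) for exact tandem repeats found in seq."""
    hits = []
    i = 0
    while i < len(seq):
        best = None
        for size in range(min_unit, max_unit + 1):
            unit = seq[i:i + size]
            if len(unit) < size:
                break  # not enough sequence left for a unit of this size
            copies = 1
            # Count how many times the unit repeats back to back.
            while seq[i + copies * size:i + (copies + 1) * size] == unit:
                copies += 1
            span = copies * size
            if copies >= min_copies and (best is None or span > best[2] * len(best[1])):
                best = (i, unit, copies)
        if best:
            hits.append(best)
            i += len(best[1]) * best[2]  # skip past the reported repeat
        else:
            i += 1
    return hits

# Hypothetical DNA fragment containing three copies of "AC".
print(find_tandem_repeats("GGACACACTT"))
```

At each start position the sketch keeps the repeat that covers the longest stretch, then continues scanning after it.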
Tmpred - Prediction of Transmembrane Regions and Orientation - makes a prediction of membrane-spanning regions and their orientation. The algorithm is based on the statistical analysis of TMbase, a database of naturally occurring transmembrane proteins. The prediction is made using a combination of several weight-matrices for scoring.
TMHMM - predicts transmembrane helices and the predicted location of the intervening loop regions.
TERTIARY STRUCTURE
Tertiary structure results from folding of the secondary structural elements. Tertiary structure is stabilized by bonds formed between amino acid R groups (H-bonds, ionic interactions, covalent bonds, hydrophobic interactions).
Dali - compares the coordinates of a query protein structure against those in the Protein Data Bank. The output consists of a multiple alignment of structural neighbours.
SWEET - a program for constructing 3D models of saccharides from their sequences using standard nomenclature.
3D-pssm - A Fast, Web-based Method for Protein Fold Recognition using 1D and 3D Sequence Profiles coupled with Secondary Structure and Solvation Potential Information.
TraDES - a New Way to Customize and Explore Protein Conformational Space.
PROTEIN CHEMISTRY
CUTTER: A tool to generate and analyze proteolytic fragments.
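As an illustration of generating proteolytic fragments, the sketch below performs a simple in-silico digest. The entry above does not specify cleavage rules, so the sketch assumes the usual trypsin rule (cut after K or R unless the next residue is P). The function name and sample peptide are hypothetical.

```python
def tryptic_digest(sequence):
    """Split a protein sequence after K or R, except when followed by P."""
    fragments = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])  # C-terminal remainder
    return fragments

# Hypothetical peptide: the R at position 3 is followed by P, so it is not cut.
print(tryptic_digest("AKRPGKM"))
```

Other enzymes could be modelled by swapping the residue test in the loop.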
FindMod Tool - predicts potential protein post-translational modifications (PTM) and finds potential single amino acid substitutions in peptides.
GlycoMod Tool - predicts the possible oligosaccharide structures that occur on proteins from their experimentally determined masses.
PEPSTATS: Protein Statistics - outputs a report of simple protein sequence information including: molecular weight, number of residues, average residue weight, charge, isoelectric point; for each type of amino acid: number, molar percent, DayhoffStat; for each physico-chemical class of amino acid: number, molar percent.
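Part of such a report is easy to reproduce by hand. The sketch below covers only the number of residues and the per-amino-acid number and molar percent. It is not PEPSTATS itself and leaves out molecular weight, charge, isoelectric point, and DayhoffStat. The function name and sample sequence are hypothetical.

```python
from collections import Counter

def composition_report(sequence):
    """Return residue count plus per-amino-acid number and molar percent."""
    sequence = sequence.upper()
    total = len(sequence)
    counts = Counter(sequence)
    return {
        "residues": total,
        "composition": {
            aa: {"number": n, "molar_percent": 100.0 * n / total}
            for aa, n in sorted(counts.items())
        },
    }

report = composition_report("AAGK")  # hypothetical four-residue peptide
print(report["residues"])
for aa, stats in report["composition"].items():
    print(aa, stats["number"], stats["molar_percent"])
```

Molar percent here is simply each residue's count divided by the sequence length, times 100.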
PredAcc - Protein side chains relative solvent accessibility prediction.
ProtParam Tool - allows the computation of various physical and chemical parameters for a given protein stored in SWISS-PROT or TrEMBL or for a user-entered sequence.
YinOYang 1.2 Prediction Server - produces neural network predictions for O-β-GlcNAc attachment sites in eukaryotic protein sequences.