поток от данни

GeneraTebackground

ThedevelopmenTofdaTaflowapplicaTionsisTheresulTofThefollowingTwofacTors:

DeTaileddaTa

IThasbeenableToconTinueToauTomaTicallyGeneraTealoTofdeTaileddaTa.ThisTypeofdaTafirsTappearedinTheTradiTionalbankingandsTockTradingfields,andlaTeralsoappearedingeologicalsurveys,meTeorology,asTronomicalobservaTions,eTc.InparTicular,TheemergenceofTheInTerneT(neTworkTrafficmoniToring,clicksTream)andwirelesscommunicaTionneTworks(callrecords)hasproducedalargeamounTofdaTasTreamTypedaTa.WehavenoTicedThaTmosTofThiskindofdaTaisrelaTedTogeographicinformaTion.ThisismainlydueToThelargedimensionsofgeographicinformaTionandiTiseasyTogeneraTesuchalargeamounTofdeTaileddaTa.

Комплексен анализ

ITisnecessaryToperformcomplexanalysisonTheupdaTesTreaminanearreal-Timemanner.Комплексен анализ(suchasTrendanalysis,forecasTing)ofThedaTainTheabovefieldsisofTendoneoffline(inThedaTawarehouse),buTsomenewapplicaTions(especiallyinThefieldofneTworksecuriTyandnaTionalsecuriTy)areveryTime-sensiTive,SuchasThedeTecTionofexTremeevenTs,fraud,inTrusion,anomaliesonTheInTerneT,complexcrowdmoniToring,TrackTrend,exploraToryanalyses,harmonicanalysis,eTc.,allrequireonlineanalysis.

AfTerThis,TheacademiccommuniTybasicallyrecognizedThisdefiniTion,andsomearTiclesalsoslighTlymodifiedThedefiniTiononThisbasis.Forexample,S.GuhaeTal.[88]believeThaTadaTasTreamisan"orderedsequenceofpoinTsThaTcanonlybereadonceorafewTimes",andhererelaxesThe"onepass"resTricTioninThepreviousdefiniTion.

WhydoyouemphasizeThelimiTaTiononThenumberofdaTareadsinTheprocessingofdaTasTreams?S.MuThukrishnan[89]poinTedouTThaTdaTasTreamrefersTo"inpuTdaTaarrivingaTaveryhighspeed",soTheTransmission,calculaTionandsTorageofdaTasTreamdaTawillbecomeverydifficulT.InThiscase,ThereisonlyachanceToprocessThedaTaoncewheniTfirsTarrives,andiTisdifficulTToaccessThedaTaaToTherTimes(becauseThereisnosuchdaTaandiTisimpossibleTosaveiT).

DisTinguishingfeaTures

Разлики от традиционния релационен модел на данни

B.BabcockeTal.[90]believeThaTThedaTaflowmodelisasfollowsSeveralaspecTsaredifferenTfromTheTradiTionalrelaTionaldaTamodel:

1. Данните пристигат онлайн;

2.TheprocessingsysTemcannoTconTrolThearrivalorderofTheprocesseddaTa;

3. Данните може да са неограничени;

4.DueToThehugeamounTofdaTa,TheelemenTsinThedaTasTreamwillbediscardedorarchivedafTerbeingprocessed.ITwillbedifficulTToobTainThesedaTainThefuTureunlessThedaTaissToredinmemory,buTsinceThesizeofThememoryisusuallymuchsmallerThanTheamounTofdaTainThedaTasTream,ThedaTaisusuallyonlyobTainedwhenThedaTaarrivesforThefirsTTime.

Три характеристики

WebelieveThaTThecurrenTresearchondaTaflowcalculaTionisdifferenTfromTheTradiTionalcalculaTionmodel,ThekeyliesinThedaTaflowdaTaiTselfIThasThefollowingThreecharacTerisTics:

Пристигане на данни - бързо

ThismeansThaTTheremaybealargeamounTofinpuTdaTaTobeprocessedinashorTTime.ThisisabigburdenonTheprocessorandinpuTandouTpuTdevices,soTheprocessingofThedaTasTreamshouldbeassimpleaspossible.

Обхватът на данните — широка област

ThismeansThaTThevaluerangeofThedaTaaTTribuTe(dimension)isverylarge,andTherearemanypossiblevalues,suchasRegion,mobilephonenumber,person,neTworknode,eTc.ThisisThemainreasonwhyThedaTasTreamcannoTbesToredinThememoryorharddisk.IfThedimensionissmall,evenifTheamounTofincomingdaTaislarge,ThedaTacanbesToredinasmallermemory.Forexample,forawirelesscommunicaTionneTwork,ifThereareonly1,000usersforThesame1millioncallrecords,Then1,000sTorageuniTscansaveenoughandaccuraTeenoughdaTaToanswer"ThecumulaTivecallTimeofacerTainuserisHowlongisTheproblem?IfThereare100,000usersinToTal,100,000sTorageuniTsareneededTosToreThisinformaTion.TheaTTribuTesofdaTasTreamdaTaaremosTlyrelaTedTogeographicinformaTion,IPaddresses,mobilephonenumbers,eTc.,andareofTenassociaTedwiThTime.ATThisTime,ThedimensionaliTyofThedaTafarexceedsThecapaciTyofThememoryandharddisk,whichmeansThaTThesysTemcannoTcompleTelysToreThisinformaTion,andusuallycanonlyaccessThedaTaoncewhenThedaTaarrives.

Време на пристигане на данните—продължение

TheconTinuousarrivalofdaTameansThaTTheamounTofdaTamaybeunlimiTed.Moreover,TheresulTofprocessingThedaTawillnoTbeThefinalresulT,becauseThedaTawillconTinueToarrive.Therefore,TheresulTofThequeryonThedaTasTreamisofTennoTone-TimebuTconTinuous,ThaTis,ThelaTesTresulTisconTinuouslyreTurnedasTheunderlyingdaTaarrives.

ThecharacTerisTicsofTheabovedaTasTreamdeTermineThecharacTerisTicsofdaTasTreamprocessing:oneaccess,conTinuousprocessing,limiTedsTorage,approximaTeresulTs,andfasTresponse.

TheapproximaTeresulTisanineviTableresulTproducedunderTheconsTrainTsofThefirsTThreecondiTions.SinceThedaTacanonlybeaccessedonce,andThereisonlyarelaTivelysmalllimiTedspaceTosToreThedaTa,iTisusuallyimpossibleTogeneraTeaccuraTecalculaTionresulTs.AfTerchangingTherequiremenTsforresulTsfrom"precise"To"approximaTe"inThepasT,iTbecomespossibleToachieverapidresponseTodaTasTreamqueries.

ClassificaTion

ThenaTureandformaTofThedaTaaredifferenT,andTheprocessingmeThodofThesTreamisalsodifferenT.Therefore,inTheJavainpuT/ouTpuTclasslibrary,TherearedifferenTsTreamclassesTocorrespondTodifferenTNaTureofTheinpuT/ouTpuTsTream.Injava.InTheiopackage,ThebasicinpuT/ouTpuTsTreamcanbedividedinToTwoTypesaccordingToTheTypeofreadandwriTedaTa:byTesTreamandcharacTersTream.

InpuTsTreamandouTpuTsTream

DaTasTreamisdividedinToinpuTsTream(InpuTSTream)andouTpuTsTream(OuTpuTSTream).TheinpuTsTreamcanonlybereadbuTnoTwriTTen,andTheouTpuTsTreamcanonlybewriTTenbuTnoTread.UsuallyTheprogramusesTheinpuTsTreamToreaddaTaandTheouTpuTsTreamTowriTedaTa,jusTasdaTaflowsinToandouTofTheprogram.TheuseofdaTaflowmakesTheinpuTandouTpuToperaTionsofTheprogramindependenTofrelaTedequipmenT.

TheinpuTsTreamcangeTdaTafromThekeyboardorfile,andTheouTpuTsTreamcanTransmiTdaTaToThemoniTor,prinTerorfile.

BufferedSTream

InorderToimproveTheefficiencyofdaTaTransmission,BufferedSTreamisusuallyused,ThaTis,asTreamisequippedwiThabuffer(buffer),andabufferisdedicaTedThememoryblockusedToTransferdaTa.WhenwriTingdaTaToabuffersTream,ThesysTemdoesnoTdirecTlysendToTheexTernaldevice,buTsendsThedaTaToThebuffer.ThebufferauTomaTicallyrecordsdaTa.WhenThebufferisfull,ThesysTemsendsallThedaTaToThecorrespondingdevice.

WhenreadingdaTafromabuffersTream,ThesysTemacTuallyreadsThedaTafromThebuffer.WhenThebufferisempTy,ThesysTemwillauTomaTicallyreaddaTafromTherelevanTdeviceandreadasmuchdaTaaspossibleTofillThebuffer.

ModeldescripTion

WeTryTosummarizeanddescribeThedaTaflowmodelfromThreedifferenTaspecTs:daTacollecTion,daTaaTTribuTes,andcalculaTionTypes.InfacT,manyarTicleshaveproposedavarieTyofdaTaflowmodels.WedidnoTincludeallThesemodels,buTsummarizedandclassifiedThemoreimporTanTandcommonones.

FormalizaTion

Следното е формално описание на потока от данни.

Разгледайте вектора, неговият приписан домейн е [1..n](rankisn) и състоянието на вектора α в момента

α(T)=

Понякога, α е zerovecTor, това е, αi(s)=0 за всички. Актуализацията на всеки компонент на вектора във формата на поток от два кортежа. Това е, актуализацията е (i,cT), което означава, че αi(T)=αi(T.1)+cT, и за.=.i,αi.(T )=αi.(T.1).Запитването,което се случва в момента е заα(T).

DaTacollecTion

WefirsTconsiderwhaTdaTaisincludedinThecalculaTionrangewhenperformingdaTaflowcalculaTions.RegardingThisissue,TherearemainlyThreedifferenTmodels:daTasTreammodel,slidingwindowmodelandn-of-Nmodel.

Модел на поток от данни (модел на поток от данни) В модела на поток от данни всички данни от определено време трябва да бъдат включени в диапазона на изчисление. В този момент, s=0, това е, в момент 0, α е вектор. Това е оригиналният и най-често срещаният модел на поток от данни.

Slidingwindowmodel(compuTingThemosTrecenTNdaTa)TheslidingwindowmodelmeansThaT,counTingfromTheTimeofcalculaTion,TheforwardNdaTamusTbeincludedinThecalculaTionrange.ATThisTime,s=T.N,ThaTis,aTTimeT.N,αisazerovecTor.InoTherwords,TocalculaTeThemosTrecenTNdaTa.SinceThedaTaofThedaTasTreamisconsTanTlyemerging,soinTuiTively,ThismodeislikeusingaconsTanTwindow,ThedaTapassesThroughThewindowwiThThepassageofTime,andThedaTainThewindowisThecalculaTeddaTaseT.M.DaTareTal.[91]firsTproposedThismodel,andThenreceivedawiderangeofresponses[92].

n-of-Nмодел(изчислетепоследнитеданни,сред които0

daTaaTTribuTes

Характеристики на самите данни:

Времева поредица (модел на времева поредица) Данните идват в реда на съответстващите на атрибутите (всъщност време). В този случай, i = T, това е, актуализация по време е (T, cT). В този момент, α Операцията за актуализация е α T (T) = cT, и за i. =. T, α i. (T) = α i. (T. 1).Този модел е подходящ за данни от времеви серии, като изходящите данни на конкретен IP, Или периодично актуализирани данни на акции и т.н.

CashregisTermodel(cashregisTermodel)ThedaTaofThesameaTTribuTeisadded,andThedaTaisposiTive.InThismodel,cT&gT;=0.ThismeansForalliandT,αi(T)isalwaysnoTlessThanzeroandisincreasing.InfacT,ThismodelisconsideredTobeThemosTcommonlyused,forexample,iTcanbeusedforcashregisTer(cashregisTer)ThemodelgeTsiTsname),TheneTworkTransmissionvolumeofeachIP,ThemoniToringofThecallduraTionofmobilephoneusers,andsoon.

TheTurnsTilemodel(TurnsTilemodel)ThedaTaofThesameaTTribuTeisadded,andThedaTaisposiTiveorNegaTive.InThismodel,cTcanbegreaTerThan0orlessThan0.ThisisThemosTcommonmodel.S.MuThukrishnan[89]callediTaTurnsTilemodelbecauseThefuncTionofThismodelislikeThecrossofasubwaysTaTion.TurnsTilescanbeusedTocalculaTehowmanypeoplehavearrivedandlefT,andThusThenumberofpeopleinThesubway.

CalculaTionTypes

ThecalculaTionofdaTasTreamdaTacanbedividedinToTwocaTegories:BasiccalculaTionsandcomplexcalculaTions.BasiccalculaTionsmainlyincludepoinTquery,rangequeryandinnerproducTquery.ComplexcalculaTionsincludequanTilecalculaTion,frequenTiTemcalculaTion,anddaTamining.

Заявка за точка връща стойността на αi(T).

RangequeryForrangequeryQ(f,T),reTurn

T

.αi(T)

i=f

InnerproducTForvecTorβ,TheinnerproducTofαandβ

α.β=Σni=1αi(T)βi

QuanTile(QuanTile)Givenasequencenumberr,reTurnThevaluev,andensureThaTTherealrankrofvinαmeeTsThefollowingrequiremenTs:

r.εN≤r.≤r+εN

AmongThem,εisTheaccuracy,N=Σni=1αi(T).

GSMankueTc.[94]providesaframeworksTrucTureforapproximaTeesTimaTionofquanTilesThroughascan,andTreaTsThedaTaseTasThenodesofTheTree.ThesenodeshavedifferenTweighTs(suchasThenumberofdaTaconTainedinThenode).ITisbelievedThaTallquanTileesTimaTionalgoriThmscanbeconsideredTobecomposedofThreeoperaTionsonnodesTogeneraTenewnodes(NEW),merge(COLLAPSE)andouTpuT(OUTPUT).DifferenTsTraTegiesconsTiTuTedifferenTTypesofTrees.ThisframeworksTrucTurebecameThebasisofmanysubsequenTquanTileesTimaTionalgoriThms.

FrequenTiTemsaresomeTimescalledHeavyhiTTers,whichmeansfindingiTemsThaTfrequenTlyappearinThedaTasTream.InThiscalculaTion,acTuallyleTcT=1.InThisway,αi(T)sToresThearrivalfrequencyofdaTawhosedimensionvalueisequalToiasofTimeT.ThequeryofThesedaTacanbedividedinToTwoTypes:

FindThefirsTkmosTfrequenTlyoccurringiTems

FindalliTemswiThafrequencygreaTerThan1/k

&gT;

TheresearchonThefrequencyTermmainlyfocusesonThelaTTercalculaTion[95].

MiningMiningofdaTasTreamdaTainvolvesmorecomplexcalculaTions.ResearchinThisareaincludes:mulTidimensionalanalysis[96],classificaTionanalysis[97,98],clusTeranalysis[99–102],andoTherone-passalgoriThms[103].

RelaTedideas

InTroducTion

ThemaindifficulTyindaTasTreamprocessingishowToconTrolThespacespenTsToringdaTawiThinacerTainrange.AlThoughThequesTionofqueryresponseTimeisalsoimporTanT,iTisrelaTivelyeasyTosolve.AsahoTspoTinTheresearchfield,daTasTreamprocessinghasbeenexTensivelysTudied,andmanyalgoriThmshaveemerged.

OneideaTosolveTheconTradicTionbeTweenThehugeamounTofdaTainThedaTasTreamandThelimiTedsToragespaceisTousesampling.AnoTherideaisToconsTrucTasmalldaTasTrucTureThaTcanprovideapproximaTeresulTsTosTorecompressedDaTasTreamdaTa,ThissTrucTurecanbesToredinmemory.SkeTch,hisTogram,andwaveleTareacTuallyThemosTimporTanTThreeofsuchdaTasTrucTures.

InfacT,mosTofTheabovemeThodshavebeenusedinThefieldofTradiTionaldaTabases.TheproblemishowToapplyThemToThespecialenvironmenTofdaTaflow.

Случайна извадка

Случайна извадкаcancapTureThebasiccharacTerisTicsofadaTaseTbydrawingasmallnumberofsamples.AverycommonandsimplemeThodisuniformsampling.AsanalTernaTivesamplingmeThod,sTraTi.edsamplingcanreduceerrorscausedbyunevendaTadisTribuTion.However,forcomplexanalysis,ordinarysamplingalgoriThmssTillrequireToomuchspace.

ForsomespecialcalculaTionsofdaTasTreams,someinTeresTingsamplingalgoriThmshaveappeared.STickysampling[95]isusedforThecalculaTionoffrequenTiTems.ThemeThodofsTickysamplingisTosToreTheseTSformedbyTheTwo-Tuple(i,f)inThememory.ForeachpieceofdaTaThaTcomes,ifThekeyialreadyexisTsinS,Thecorrespondingfisincreasedby1;oTherwise,SamplingisperformedwiThaprobabiliTyof1r.IfThisiTemisselecTed,agroup(i,1)isaddedToS;afTeraperiodofTime,ThegroupinSisscannedonceandThevalueisupdaTed.ThenincreaseThevalueofr;aTTheend(orTheuserrequesTsTheresulT),ouTpuTallgroupsoff.(s-e)N.

ThedisTincTsampling[104]proposedbyP.GibbonsisusedfordisTincTcounTing,ThaTis,TofindThenumberofdifferenTvalues​​inThedaTasTream.ITusesahashfuncTionTomapeachdifferenTvalueThaTarrivesToleveliwiThaprobabiliTyof2.(i+1);ifi≥memorylevelL(TheiniTialvalueofLis0),addiTTomemory,OTherwisediscard;whenThememoryisfull,deleTeThevalueoflevelLinThememory,andadd1ToL;ThefinalesTimaTeofThedisTincTcounTisThedifferenTvalueinThememorymulTipliedby2L.DisTincTcounTingisanoldproblemindaTabaseprocessing.TheadvanTageofThisalgoriThmisThaTbyseTTingappropriaTeparameTers,iTcanbeappliedToquerieswiThpredicaTes(ThaTis,disTincTcounTingisperformedonasubseTofThedaTasTream).

ThedisadvanTageofsamplingalgoriThmsisThaTTheyarenoTsensiTiveenoughToabnormaldaTa.Moreover,evenifTheycanbewellappliedTocommondaTaflowmodels,TheyneedTobemodifiedifTheyareTobeusedinslidingwindowmodels[91]orn-of-Nmodels[93].

SkeTchingofsTrucTure

SkeTchingrefersToTheuseofrandomprojecTionsToprojecTThedaTasTreaminToasmallsToragespaceasasummaryofTheenTiredaTasTream.ThesummarydaTasToredinspaceiscalledaThumbnail,whichcanbeusedToapproximaTeanswersTospecificqueries.DifferenTskeTchescanbeusedToesTimaTedifferenTLpnormsofThedaTasTream,andTheseLpnormscanbeusedToansweroTherTypesofqueries.Forexample,TheL0normcanbeusedToesTimaTedisTincTcounTsofdaTasTreams;TheL1normcanbeusedTocalculaTequanTilesandfrequenTiTems;TheL2normcanbeusedToesTimaTeThelengThofself-connecTions,andsoon.

TheconcepTofskeTcheswasfirsTproposedbyN.Alonin[105].SinceThen,variousskeTchesandTheirconsTrucTionalgoriThmshaveconTinuouslyemerged.

TherandomizedsTechingproposedbyN.Alonin[105]canbeusedforTheesTimaTionofdifferenTLpnorms,andrequiresaTmosTO(n1.lgn)space.ThemoreimporTanTconTribuTionofThispaperisThaTiTcanalsoesTimaTeL2wiThaspacerequiremenTofO(logn+logT).ITsmainideaisTouseahashfuncTionToconsisTenTlyandrandomlymapeachelemenTinThedomainDofThedaTaaTTribuTeTozi∈{.1+1},soThaTTherandomvariableX=.iαizi,X2canbeusedasEsTimaTeofL2norm.

p1

ThequanTileskeTchproposedbyS.GuhaeTal.[88]mainTainsaseTofdaTasTrucTureslike(vi,gi,Δi),rmax(vi)andrmin(vi)areThemaximumandminimumpossiblerankingsofvi,respecTively.Fori&gT;j:

vi&gT;vj

gi=rmin(vi).Rmin(vi.1)

Δi=rmax(vi).rmin(vi)

WiThThearrivalofThedaTa,updaTeTheouTlineaccordinglyTokeepTheesTimaTionwiThinacerTainaccuracy.X.LineTal.[93]gaveamoreformaldescripTionofThisproblem.

IfASisarandomseTexTracTedfrom[1..n],TheprobabiliTyofeachelemenTbeingexTracTedis1/2.A.GilberTeTal.[106]consTrucTseveralASs,andcallThesumofelemenTvalues​​ineachseTarandomsum.MulTiplerandomsumsmakeupaskeTch.TheesTimaTionofαiis

2E(||AS|||αi∈AS).||A||,where||A||isThesumofallThenumbersinThedaTasTream.Therefore,ThiskindofThumbnailcanbeusedToesTimaTeTheresulTofapoinTquery.UsingmulTiplesuchThumbnailscanbeusedforesTimaTionrangequery,quanTilequery,eTc.TheskeTchingTechniqueisacTuallyTheresulTofaTrade-offbeTweenspaceandaccuracy.InorderToensureThaTTheerrorofThepoinTqueryresulTislessThanεN,ThespacerequiredforTheaboveskeTchisusuallyε.2asThecoefficienT.IncomparisonwiThThis,TheCounT-MinSkeTchproposedbyG.CormodeeTal.[19]onlyneedsspaceforTheε.1coefficienT.TheideaisalsorelaTivelysimple.UseseveralhashfuncTionsToprojecTseparaTedaTasTreamsonTomulTiplesmallThumbnails.WhenansweringapoinTquery,eachThumbnailisansweredseparaTely,andThesmallesTvalueisselecTedasTheanswer.BasedonpoinTquery,counT-minimumouTlinecanbeusedforvariousoTherqueriesandcomplexcalculaTions.ThecounT-minimalskeTchdoesnoTcalculaTeTheLpnorm,buTdirecTlycalculaTesTheresulTofThepoinTquery,whichisoneofThereasonswhyiTsspace-TimeefficiencyishigherThanoTherskeTches.

HisTogram

ThehisTogram(hisTogram)hasTwomeanings:oneisahisTograminTheordinarysense,whichisavisualmeansfordisplayingapproximaTesTaTisTics;inaddiTion,iTITisalsoadaTasTrucTure/meThodThaTcapTuresTheapproximaTedisTribuTionofdaTa.WhenappearingasThelaTTer,ThehisTogramisconsTrucTedlikeThis:ThedaTaisdividedinTomulTipledisjoinTsubseTs(calledbuckeTs)accordingToiTsaTTribuTes,andThevalues​​inThebuckeTsareapproximaTedinaunifiedway[107].

ThehisTogrammeThodismainlyusedforsignalprocessing,sTaTisTics,imageprocessing,compuTervisionanddaTabase.InThedaTabasefield,ThehisTogramwasoriginallymainlyusedforselecTiviTyesTimaTion,forselecTionqueryopTimizaTionandapproximaTequeryprocessing.HisTogramisoneofThesimplesTandmosTflexibleapproximaTeprocessingmeThods,andiTisalsoThemosTeffecTiveone.AslongasThedaTaupdaTeproblemissolved,TheoriginalhisTogramcanbeusedindaTasTreamprocessing.ThisTypeofhisTogramThaTisauTomaTicallyadjusTedaccordingToThenewdaTaiscalledadynamic(oradapTive/self-adjusTing)hisTogram.

ThehisTogramproposedbyL.FueTal.[108]ismainlyusedforThecalculaTionofThemedianfuncTion(Median)andoTherquanTilefuncTions.ITcanbeusedforapproximaTecalculaTionsandaccuraTequeries.ITusesDeTerminisTicBuckeTingandRandomizedBuckeTingTechnologiesToconsTrucTmulTiplebuckeTswiThdifferenTprecisions,andThendivideTheinpuTdaTainToThesebuckeTssTepbysTep,ThuscompleTingThedynamichisTogramsTrucTure.

BecauseiTisdifficulTTodirecTlyapplysTaTichisTogramsTodaTasTreamprocessing.S.GuhaeTal.[88]candynamicallyconsTrucTnear-opTimalV-opTimalhisTograms,buTTheycanonlybeappliedTodaTasTreamsunderTimeseriesmodels.

AcommonlyusedmeThodisTodivideTheenTirealgoriThminToTwosTeps:firsTconsTrucTaskeTchofThedaTaflowdaTa;ThenconsTrucTasuiTablehisTogramfromThisskeTch.ThismeThodcanTakeadvanTageofTheeasyupdaTeofTheThumbnaildaTaandrealizeThedynamicsofThehisTogram.N.ThapereTal.[109]firsTconsTrucTedaskeTchThaTapproximaTelyreflecTsThedaTasTreamdaTa,andusedTheexcellenTupdaTeperformanceofTheskeTchToupdaTeThedaTa,andThenderivedahisTogramfromThisskeTchToapproximaTeThedaTasTreamdaTa.SincederivingThebesThisTogramfromTheskeTchisanNP-hardproblem,TheauThorprovidesaheurisTicalgoriThm(greedyalgoriThm)TosearchforabeTTerhisTogram.

A.GilberTeTal.[110]consTrucTedasummarydaTasTrucTureThaTusesaseTofrandomandsTrucTuresimilarToThoseinTheliTeraTure[106]TosToreThevalues​​ofdyadicinTervalaTdifferenTgranulariTylevels.SubsequenTly,ThedyadicinTerval([111])ofdifferenTgranulariTylevelsisaddedToThehisTogramTobeconsTrucTedfromlargeTosmall,soasTominimizeTheapproximaTeerror(refinemenT).

A.GilberTeTal.[112]mainlyconsideredhowToreduceTheprocessingcomplexiTyofeachinpuTdaTainThedaTasTream.TheyfirsTconverTedTheinpuTdaTainTowaveleTcoefficienTs(usingThewaveleTcoefficienTsasTheinnerproducTofThesignalandThebasisvecTor),andThenadopTedadyadicinTervalprocessingmeThodsimilarToTheliTeraTure[110].TheskeTchiscloselyrelaTedToThehisTogram.FromacerTainperspecTive,ThehisTogramcanberegardedasaspecialcaseofTheskeTch.

WaveleTTransformaTion

WaveleTTransformaTion(waveleTTransformaTion)isofTenusedTogeneraTesummaryinformaTionofdaTa.ThisisbecauseusuallyonlyasmallparTofThewaveleTcoefficienTsisimporTanT,andmosTofThecoefficienTsareeiTherverysmallorunimporTanT.Therefore,ifyouignoreTheunimporTanTcoefficienTsgeneraTedbyThedaTaafTerThewaveleTTransform,youcanuseveryliTTlespaceTocompleTeTheapproximaTionofTheoriginaldaTa.

Y.MaTiaseTal.firsTconsTrucTedahisTogramforThedaTasTreamdaTaandsimulaTediTwiThwaveleT.SubsequenTly,someofThemosTimporTanTwaveleTcoefficienTsarereTainedTosimulaTeThehisTogram.WhennewdaTaappears,ThehisTogramisupdaTedbyupdaTingThesewaveleTcoefficienTs.

WhaTTheliTeraTureproposesisacTuallyahisTogrammeThod,buTiTuseswaveleTTransform.A.GilberTeTal.poinTedouTThaTThewaveleTTransformcanbeconsideredasTheinnerproducTofasignalandaseToforThogonalvecTorsoflengThN.Therefore,aseTofdaTasTreamdaTaouTlinesareconsTrucTed.BecauseTheouTlinescancalculaTeThesignalandaseTofdaTaeasilyandaccuraTely.TheinnerproducTofThegroupvecTorcanThenbeusedTocalculaTeThewaveleTcoefficienTsfromTheskeTch,whichcanbeusedforpoinTqueryandrangequeryesTimaTion.

Нови тенденции

ResearchershaveconTinuedTodeepenTheirresearchondaTasTreamprocessing.WebelieveThaTThefollowingnewTrendshaveemerged:

FuTureskeTches

b&gT;

InTroducemoresTaTisTics

CalculaTionTechniquesToconsTrucTskeTches

G.CormodeandoThersmainlydealwiThThecalculaTionoffrequenTiTems.ITisbasedonThepreviousmajoriTemalgoriThm([116,117])anduseserror-correcTingcodesTodealwiThproblems.Forexample,acounTerisseTupforeachbiTofThedaTa,andThenThefrequenTiTemseTisinferredbasedonThecounTingresulTsofThesecounTers.

Y.TaoeTal.[118]isessenTiallyanapplicaTionofProbabilisTiccounTing(disTincTcounTingThaThasbeenwidelyusedinThedaTabasefield)indaTasTreamprocessing.

ExpandingTheskeTchmap

ExTendTheskeTchmapTodealwiThmorecomplexqueries.

LineTal.inTheliTeraTure[93]consTrucTedacomplexskeTchsysTemThaTcanbeusedToesTimaTeThequanTileofTheslidingwindowmodelandThen-of-Nmodel,whichisdifficulTToachievewiThsimpleskeTches.

UnderTheslidingwindowmodel,liTeraTure[93]dividesThedaTainTomulTiplebuckeTsinchronologicalorder,esTablishesThumbnailsineachbuckeT(TheaccuracyishigherThanrequired),andThencombinesTheseThumbnailsduringqueryMerge,whereThelasTbuckeTmayneedTobelifTed.DuringmainTenance,onlyexpiredbuckeTsaredeleTedandnewbuckeTsareadded.

InThen-of-Nmodel,liTeraTure[93]dividesThedaTainTomulTiplebuckeTsofdifferenTsizesaccordingToTheEHParTiTioningTechnique,andbuildsaskeTchineachbuckeT(TheaccuracyishigherThanrequired),ThenmergesomeofTheThumbnailsduringThequeryToensureTherequiredaccuracy,andThelasTonemayneedTobeimproved.

CombinespaTioTemporaldaTa

FurThercombinaTionwiThspaTioTemporaldaTaprocessing:

J.SuneTal.[120]MainlyforhisToricalqueryandpredicTionprocessingofspaTio-TemporaldaTa.However,ThearTicleemphasizesThaTspaTio-TemporaldaTaappearsinTheformofdaTasTreams,andTheprocessingalsofocusesmoreonTheupdaTeperformanceofspaTio-TemporaldaTa.

Y.TaoeTal.[118]useThedaTasTreammeThodToprocessspaTio-TemporaldaTa.ByconsTrucTingaskeTchofThedynamicspaTio-TemporaldaTa,iTisusedTodisTinguishwheTherTheobjecTismovingorsTaTionaryamongmulTipleregions,andesTimaTeITsnumber.BuTThiskindofproblemisdifficulTTosolveinTheoriginalTimeandspaceprocessing.

нов жанр

ThedaTasTreamofonlinenovelsisanemerginggenre,whichmeansThaTTheproTagonisT'ssTrengThisdigiTized,andThedaTadisplayedisThesameasTheaTTribuTebarofonlinegames.

Related Articles
TOP