Data stream

Background

The development of data stream applications is the result of the following two factors:

Detailed data

It has become possible to generate large amounts of detailed data continuously and automatically. This type of data first appeared in traditional banking and stock trading, and later in geological surveys, meteorology, astronomical observation, and other fields. In particular, the emergence of the Internet (network traffic monitoring, clickstreams) and wireless communication networks (call records) has produced a large amount of stream-type data. Most of this data is related to geographic information, mainly because geographic information has a very large number of dimensions, which makes it easy to generate such large volumes of detailed data.

Complex analysis

Complex analysis must be performed on the update stream in near real time. Complex analysis (such as trend analysis and forecasting) of data in the above fields is often done offline (in a data warehouse), but some newer applications, especially in network security and national security, are very time-sensitive. Tasks such as detecting extreme events, fraud, intrusions, and anomalies on the Internet, complex crowd monitoring, trend tracking, exploratory analysis, and harmonic analysis all require online analysis.

Afterwards, the academic community largely accepted this definition, and some articles modified it slightly. For example, S. Guha et al. [88] regard a data stream as an "ordered sequence of points that can only be read once or a few times", which relaxes the "one pass" restriction of the earlier definition.

Why emphasize the limit on the number of times data can be read in data stream processing? S. Muthukrishnan [89] pointed out that a data stream refers to "input data arriving at a very high speed", so transmitting, computing on, and storing the data becomes very difficult. In this case there is only one chance to process the data, when it first arrives, and it is difficult to access the data at any other time (because it cannot be saved).

Distinguishing features

Differences from the traditional relational data model

B. Babcock et al. [90] argue that the data stream model differs from the traditional relational data model in the following respects:

1. The data arrives online;

2. The processing system cannot control the order in which the data arrives;

3. The data may be unbounded;

4. Because of the huge volume of data, elements of the data stream are discarded or archived after being processed. It is difficult to retrieve them later unless they are kept in memory, but since memory is usually much smaller than the volume of the stream, the data can usually be accessed only when it first arrives.

Three characteristics

We believe that what distinguishes current research on data stream computation from the traditional computation model is that data stream data itself has the following three characteristics:

Data arrival: fast

This means that a large amount of input data may have to be processed in a short time. This places a heavy burden on the processor and on the input/output devices, so the processing of a data stream should be as simple as possible.

Range of data: wide

This means that the value range of a data attribute (dimension) is very large, with many possible values, such as regions, mobile phone numbers, people, and network nodes. This is the main reason why a data stream cannot be stored in memory or on disk. If the domain is small, the data can be kept in a small amount of memory even when the volume of incoming data is large. For example, in a wireless communication network, if the same 1 million call records involve only 1,000 users, then 1,000 storage units can hold enough sufficiently accurate data to answer the question "What is the cumulative call time of a given user?" If there are 100,000 users in total, 100,000 storage units are needed to store this information. The attributes of data stream data are mostly related to geographic information, IP addresses, mobile phone numbers, and so on, and are often associated with time. In that case the dimensionality of the data far exceeds the capacity of memory and disk, which means that the system cannot store the information completely and can usually access the data only once, when it arrives.

Time of data arrival: continuous

The continuous arrival of data means that the amount of data may be unbounded. Moreover, the result of processing the data is never final, because data keeps arriving. The result of a query on a data stream is therefore often not one-time but continuous: the latest result is returned continuously as the underlying data arrives.

These characteristics of data streams determine the characteristics of data stream processing: a single access, continuous processing, limited storage, approximate results, and fast response.

Approximate results are an inevitable consequence of the first three constraints. Since the data can be accessed only once and only a relatively small, limited space is available to store it, it is usually impossible to produce exact results. Once the requirement on results is relaxed from "exact" to "approximate", it becomes possible to respond quickly to data stream queries.

Classification

Data differs in nature and format, so streams are processed in different ways. The Java input/output class library therefore provides different stream classes corresponding to input/output streams of different natures. In the java.io package, the basic input/output streams fall into two types according to the type of data read and written: byte streams and character streams.

Input stream and output stream

Data streams are divided into input streams (InputStream) and output streams (OutputStream). An input stream can only be read from, not written to, and an output stream can only be written to, not read from. A program usually uses an input stream to read data and an output stream to write data, as if data were flowing into and out of the program. Using data streams makes the program's input and output operations independent of the devices involved.

An input stream can get data from the keyboard or a file, and an output stream can send data to the monitor, a printer, or a file.

Buffered stream

To improve the efficiency of data transmission, a buffered stream is usually used, that is, a stream equipped with a buffer, where a buffer is a block of memory dedicated to transferring data. When data is written to a buffered stream, the system does not send it directly to the external device but to the buffer, which accumulates the data. When the buffer is full, the system sends all of its data to the corresponding device.

When data is read from a buffered stream, the system actually reads it from the buffer. When the buffer is empty, the system automatically reads data from the relevant device, reading as much as possible to fill the buffer.
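The write-side behaviour described above can be observed with a small sketch. It uses Python's io module rather than java.io, purely as an illustration; CountingDevice is a hypothetical stand-in for an external device, and the 16-byte buffer size is an arbitrary choice.

```python
import io


class CountingDevice(io.RawIOBase):
    """Hypothetical external device that counts how often it is written to."""

    def __init__(self):
        super().__init__()
        self.calls = 0
        self.received = bytearray()

    def writable(self):
        return True

    def write(self, data):
        # Called by the buffered stream only when it hands data to the device.
        self.calls += 1
        self.received += bytes(data)
        return len(data)


device = CountingDevice()
stream = io.BufferedWriter(device, buffer_size=16)

stream.write(b"ab")                    # small write: stays in the buffer
calls_after_first_write = device.calls

for _ in range(9):                     # nine more two-byte writes
    stream.write(b"ab")
stream.flush()                         # push whatever is still buffered
```

Ten small writes reach the device in far fewer, larger transfers, which is the efficiency gain the buffer is meant to provide.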

Model description

We try to summarize and describe the data stream model from three aspects: data collection, data attributes, and calculation types. Many articles have in fact proposed a variety of data stream models. We do not cover all of them, but summarize and classify the more important and common ones.

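Formalization

The following is a formal description of a data stream.

Consider a vector $\alpha$ whose attribute domain is $[1..n]$ (its dimension is $n$). The state of $\alpha$ at time $t$ is

$$\alpha(t) = \langle \alpha_1(t), \ldots, \alpha_i(t), \ldots, \alpha_n(t) \rangle$$

At time $s$, $\alpha$ is the zero vector, that is, $\alpha_i(s) = 0$ for all $i$. Updates to the components arrive as a stream of two-tuples: the $t$-th update is $(i, c_t)$, which means that $\alpha_i(t) = \alpha_i(t-1) + c_t$, and for $i' \neq i$, $\alpha_{i'}(t) = \alpha_{i'}(t-1)$. A query issued at time $t$ is a query on $\alpha(t)$.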
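As a minimal illustration of this update rule, the sketch below applies a stream of (i, c) updates to a zero vector. The function name and the sample updates are made up for the example, and the vector is stored explicitly, which a real stream algorithm usually cannot afford.

```python
def apply_updates(n, updates):
    """Return alpha(t) after applying a stream of (i, c) updates.

    The vector starts as the zero vector; indices are 1-based as in the text.
    """
    alpha = [0] * (n + 1)          # slot 0 is unused
    for i, c in updates:
        alpha[i] += c              # alpha_i(t) = alpha_i(t-1) + c_t
    return alpha[1:]


state = apply_updates(4, [(1, 5), (3, 2), (1, -1), (4, 7)])
```

Only the component named in each update changes; all other components keep their previous value.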

Data collection

We first consider which data falls within the scope of a data stream computation. There are three main models: the data stream model, the sliding window model, and the n-of-N model.

Data stream model. In the data stream model, all data from a given starting time onward is included in the computation. Here $s = 0$, that is, $\alpha$ is the zero vector at time 0. This is the original and most common data stream model.

Sliding window model (computing the most recent N data items). The sliding window model means that, counting back from the time of the computation, the most recent N data items are included in the scope. Here $s = t - N$, that is, $\alpha$ is the zero vector at time $t - N$. Because stream data keeps arriving, this model intuitively behaves like a window of fixed size: data passes through the window as time goes by, and the data inside the window is the data set being computed on. M. Datar et al. [91] first proposed this model, and it has since received widespread attention [92].
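The window semantics can be sketched with a running sum over the most recent N items. This is only an exact baseline that stores the whole window; the class name is hypothetical, and stream algorithms for this model aim to avoid exactly this storage cost when N is large.

```python
from collections import deque


class SlidingWindowSum:
    """Exact sum over the most recent `size` items (stores the full window)."""

    def __init__(self, size):
        self.window = deque(maxlen=size)
        self.total = 0

    def add(self, value):
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]   # oldest item leaves the window
        self.window.append(value)          # deque drops the oldest item itself
        self.total += value
        return self.total


window = SlidingWindowSum(3)
results = [window.add(v) for v in [1, 2, 3, 4, 5]]
```

After the fourth item arrives, the first one no longer contributes to the result.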

n-of-N model (computing the most recent n data items, where 0 < n ≤ N)

Data attributes

Characteristics of the data itself:

Time series model. The data arrives in order of its attribute (which is in fact time). In this case $i = t$, that is, the update at time $t$ is $(t, c_t)$. The update operation on $\alpha$ is then $\alpha_t(t) = c_t$, and for $i' \neq t$, $\alpha_{i'}(t) = \alpha_{i'}(t-1)$. This model suits time series data, such as the outgoing traffic of a particular IP address or periodically updated stock data.

Cash register model. Data for the same attribute is added together, and the data is non-negative, that is, $c_t \geq 0$. This means that for all $i$ and $t$, $\alpha_i(t)$ is never less than zero and never decreases. This model is considered the most commonly used; for example, it applies to cash registers (from which the model gets its name), the network traffic volume of each IP address, and monitoring the call duration of mobile phone users.

Turnstile model. Data for the same attribute is added together, and the data may be positive or negative, that is, $c_t$ can be greater than or less than 0. This is the most general model. S. Muthukrishnan [89] called it the turnstile model because it works like the turnstiles at a subway station, which can be used to count how many people have entered and left, and thus how many people are in the subway.
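The difference between the cash register and turnstile models comes down to the sign of the updates. The sketch below, with made-up keys and values, classifies an update stream by that criterion and accumulates per-key totals in the same way for both.

```python
def classify(updates):
    """Name the most restrictive model that an update stream satisfies."""
    if all(c >= 0 for _, c in updates):
        return "cash register"
    return "turnstile"


def totals(updates):
    """Accumulate the updates per key, as both models do."""
    acc = {}
    for key, c in updates:
        acc[key] = acc.get(key, 0) + c
    return acc


# Call durations only ever grow: cash register model.
calls = [("alice", 120), ("bob", 45), ("alice", 30)]
# People enter (+1) and leave (-1) a station: turnstile model.
station = [("platform", +1), ("platform", +1), ("platform", -1)]
```

Here totals(calls) gives each user's cumulative call time, while totals(station) gives the number of people currently inside.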

Calculation types

Computations on data stream data can be divided into two categories: basic computations and complex computations. Basic computations mainly include point queries, range queries, and inner product queries. Complex computations include quantile computation, frequent item computation, and data mining.

Point query. Returns the value of $\alpha_i(t)$.

Range query. For a range query $Q(f, t)$, return

$$\sum_{i=f}^{t} \alpha_i(t)$$

Inner product. For a vector $\beta$, the inner product of $\alpha$ and $\beta$ is

$$\alpha \cdot \beta = \sum_{i=1}^{n} \alpha_i(t)\,\beta_i$$
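The three basic queries can be written directly against an explicitly stored vector. This is an exact reference sketch with hypothetical function names and sample values; stream algorithms answer the same queries approximately from a much smaller summary.

```python
def point_query(alpha, i):
    """Value of component i (1-based)."""
    return alpha[i - 1]


def range_query(alpha, lo, hi):
    """Sum of components lo..hi inclusive (1-based)."""
    return sum(alpha[lo - 1:hi])


def inner_product(alpha, beta):
    """Sum over i of alpha_i * beta_i."""
    return sum(a * b for a, b in zip(alpha, beta))


alpha = [4, 0, 2, 7]
p = point_query(alpha, 3)
r = range_query(alpha, 2, 4)
d = inner_product(alpha, [1, 1, 0, 2])
```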

Quantile. Given a rank $r$, return a value $v$ whose true rank $r'$ in $\alpha$ satisfies

$$r - \varepsilon N \leq r' \leq r + \varepsilon N$$

where $\varepsilon$ is the accuracy and $N = \sum_{i=1}^{n} \alpha_i(t)$.
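The accuracy condition can be checked against fully sorted data, as in the sketch below. The function name is hypothetical, and treating duplicates as occupying a range of ranks is an assumption made for the example.

```python
import bisect


def is_eps_approx_quantile(sorted_data, r, v, eps):
    """Check whether v is an acceptable answer for rank r with accuracy eps.

    With duplicates, v occupies a range of ranks [lo, hi]; the answer is
    accepted if that range overlaps [r - eps*N, r + eps*N].
    """
    n = len(sorted_data)
    lo = bisect.bisect_left(sorted_data, v) + 1   # smallest rank of v
    hi = bisect.bisect_right(sorted_data, v)      # largest rank of v
    if hi < lo:                                   # v does not occur at all
        return False
    return lo <= r + eps * n and hi >= r - eps * n


data = list(range(1, 101))   # the value k has rank k
```

With N = 100 and accuracy 0.05, any value whose rank lies within 5 positions of the requested rank is an acceptable answer.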

G. S. Manku et al. [94] provide a framework for approximate quantile estimation in a single pass, treating the data set as the nodes of a tree. These nodes carry different weights (such as the number of data items contained in the node). They argue that all quantile estimation algorithms can be viewed as compositions of three operations on nodes: creating new nodes (NEW), merging (COLLAPSE), and output (OUTPUT). Different strategies yield different kinds of trees. This framework became the basis of many subsequent quantile estimation algorithms.

Frequent items. Frequent items are sometimes called heavy hitters; the task is to find items that appear frequently in the data stream. In this computation we effectively set $c_t = 1$, so that $\alpha_i(t)$ stores the number of arrivals, up to time $t$, of data whose dimension value equals $i$. Queries on this data fall into two types:

Find the k most frequently occurring items

Find all items with a frequency greater than 1/k

Research on frequent items has focused mainly on the latter computation [95].
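The two query types can be stated precisely with an exact baseline that counts every item. The function names are hypothetical, and reading "frequency greater than 1/k" as "more than a 1/k fraction of the stream" is an assumption; a streaming algorithm would replace the full counter with a small summary.

```python
from collections import Counter


def top_k(stream, k):
    """The k most frequent items, most frequent first."""
    return [item for item, _ in Counter(stream).most_common(k)]


def heavy_hitters(stream, k):
    """All items that make up more than a 1/k fraction of the stream."""
    counts = Counter(stream)
    threshold = len(stream) / k
    return {item for item, count in counts.items() if count > threshold}


stream = list("aaaabbbccd")      # a: 4, b: 3, c: 2, d: 1
first_two = top_k(stream, 2)
frequent = heavy_hitters(stream, 4)
```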

Data mining. Mining data stream data involves more complex computations. Research in this area includes multidimensional analysis [96], classification analysis [97, 98], cluster analysis [99–102], and other one-pass algorithms [103].

Related ideas

Introduction

The main difficulty in data stream processing is keeping the space used to store data within a fixed bound. Query response time also matters, but it is relatively easy to address. As a research hotspot, data stream processing has been studied extensively, and many algorithms have emerged.

One way to resolve the conflict between the huge volume of a data stream and limited storage space is sampling. Another is to construct a small data structure, small enough to fit in memory, that stores a compressed form of the stream and can provide approximate results. Sketches, histograms, and wavelets are the three most important such data structures.

In fact, most of the above methods have already been used in traditional databases. The problem is how to apply them in the special environment of data streams.

Random sampling

Random sampling captures the basic characteristics of a data set by drawing a small number of samples. A very common and simple method is uniform sampling. As an alternative, stratified sampling can reduce the errors caused by unevenly distributed data. For complex analysis, however, ordinary sampling algorithms still require too much space.

For some specific data stream computations, interesting sampling algorithms have appeared. Sticky sampling [95] is used to compute frequent items. It keeps a set $S$ of two-tuples $(i, f)$ in memory. For each arriving data item, if the key $i$ already exists in $S$, the corresponding $f$ is increased by 1; otherwise the item is sampled with probability $1/r$, and if it is selected, the tuple $(i, 1)$ is added to $S$. After a period of time, the tuples in $S$ are scanned once and their $f$ values updated, and then $r$ is increased. At the end (or when the user requests the result), all tuples with $f \geq (s - \varepsilon)N$ are output, where $s$ is the frequency threshold, $\varepsilon$ the error bound, and $N$ the number of items seen so far.
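A minimal sketch of sticky sampling follows. The text does not spell out when r grows or how the stored counts are adjusted, so the doubling schedule (windows of 2t, 2t, 4t, ... items with t = ceil(ln(1/(s·δ))/ε)) and the coin-toss adjustment below follow the commonly described form of the algorithm and should be read as assumptions; the class name, the δ parameter, and the seed are likewise illustrative.

```python
import math
import random


class StickySampler:
    def __init__(self, support, epsilon, delta, seed=0):
        self.support = support
        self.epsilon = epsilon
        self.t = math.ceil(math.log(1.0 / (support * delta)) / epsilon)
        self.rate = 1                  # new keys are sampled with probability 1/rate
        self.boundary = 2 * self.t     # stream length at which the rate next doubles
        self.n = 0                     # items seen so far
        self.counts = {}               # the set S of (key, f) pairs
        self.rng = random.Random(seed)

    def add(self, key):
        if self.n == self.boundary:
            self._double_rate()
        self.n += 1
        if key in self.counts:
            self.counts[key] += 1
        elif self.rng.random() < 1.0 / self.rate:
            self.counts[key] = 1

    def _double_rate(self):
        self.rate *= 2
        self.boundary += self.rate * self.t
        for key in list(self.counts):
            # Decrement once per failed coin toss, stopping at the first success.
            while self.counts[key] > 0 and self.rng.random() < 0.5:
                self.counts[key] -= 1
            if self.counts[key] == 0:
                del self.counts[key]

    def frequent(self):
        threshold = (self.support - self.epsilon) * self.n
        return {k: f for k, f in self.counts.items() if f >= threshold}


sampler = StickySampler(support=0.1, epsilon=0.01, delta=0.1)
for item in ["a"] * 300 + ["b"] * 150 + list(range(50)):
    sampler.add(item)
hitters = sampler.frequent()
```

The 500-item stream above is shorter than the first window, so every key is still sampled with probability 1 and the reported counts are exact; on longer streams the stored counts can only underestimate the true frequencies.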

The distinct sampling [104] proposed by P. Gibbons is used for distinct counting, that is, finding the number of different values in the data stream. It uses a hash function to map each distinct arriving value to level $i$ with probability $2^{-(i+1)}$. If $i \geq L$, where $L$ is the current memory level (initially 0), the value is added to memory; otherwise it is discarded. When memory is full, the values at level $L$ are deleted from memory and $L$ is increased by 1. The final estimate of the distinct count is the number of distinct values in memory multiplied by $2^L$. Distinct counting is an old problem in database processing. The advantage of this algorithm is that, with appropriately set parameters, it can be applied to queries with predicates (that is, distinct counting over a subset of the data stream).
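The level-based procedure can be sketched as follows. Deriving the level from the trailing zero bits of a SHA-256 digest is an assumption made for the example (any hash with the stated level distribution would do), and the class name and capacity are hypothetical.

```python
import hashlib


def level(value):
    """Level i with probability 2^-(i+1): trailing zero bits of a hash."""
    digest = hashlib.sha256(repr(value).encode()).digest()
    h = int.from_bytes(digest[:8], "big")
    lvl = 0
    while h & 1 == 0 and lvl < 63:
        h >>= 1
        lvl += 1
    return lvl


class DistinctSampler:
    def __init__(self, capacity):
        self.capacity = capacity
        self.L = 0             # current memory level
        self.sample = {}       # value -> its level

    def add(self, value):
        lvl = level(value)
        if lvl < self.L or value in self.sample:
            return             # below the current level, or already stored
        self.sample[value] = lvl
        while len(self.sample) > self.capacity:
            # Memory is full: drop the values at level L, then raise L.
            self.sample = {v: l for v, l in self.sample.items() if l > self.L}
            self.L += 1

    def estimate(self):
        return len(self.sample) * 2 ** self.L


sampler = DistinctSampler(capacity=100)
for v in range(50):
    for _ in range(3):         # repeated values are counted once
        sampler.add(v)
```

While memory never fills up, L stays at 0 and the estimate equals the exact distinct count; after that each stored value stands for roughly 2^L distinct values.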

The disadvantage of sampling algorithms is that they are not sensitive enough to outliers. Moreover, even if they work well under the ordinary data stream model, they need to be modified before they can be used under the sliding window model [91] or the n-of-N model [93].

Sketching

Sketching uses random projections to project the data stream into a small storage space as a summary of the entire stream. The stored summary is called a sketch and can be used to give approximate answers to specific queries. Different sketches can be used to estimate different $L_p$ norms of the data stream, and these norms can in turn be used to answer other types of queries. For example, the $L_0$ norm can be used to estimate the distinct count of a data stream; the $L_1$ norm can be used to compute quantiles and frequent items; and the $L_2$ norm can be used to estimate self-join sizes.

The concept of sketches was first proposed by N. Alon in [105]. Since then, various sketches and algorithms for constructing them have continued to emerge.

The randomized sketching proposed by N. Alon in [105] can be used to estimate different $L_p$ norms and requires at most $O(n^{1-1/p} \lg n)$ space. A more important contribution of that paper is that $L_2$ can be estimated with a space requirement of only $O(\log n + \log t)$. The main idea is to use a hash function to map each element $i$ of the attribute domain $D$ consistently and randomly to $z_i \in \{-1, +1\}$ and to maintain the random variable $X = \sum_i \alpha_i z_i$; $X^2$ can then be used as an estimate of the $L_2$ norm (more precisely, of its square, $\sum_i \alpha_i^2$).
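The random-sign idea can be sketched in a few lines. Two simplifications are assumed here: the signs come from a SHA-256 digest rather than from the limited-independence hash family used in the analysis, and the independent copies are combined by a plain average rather than a median of means. The class and function names are hypothetical.

```python
import hashlib


def sign(item, seed):
    """Map an item consistently to -1 or +1 for a given seed."""
    digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
    return 1 if digest[0] & 1 else -1


class RandomSignSketch:
    def __init__(self, copies=64):
        self.counters = [0] * copies      # one random variable X per copy

    def update(self, item, c=1):
        # X = sum_i alpha_i * z_i, maintained incrementally per update (i, c).
        for seed in range(len(self.counters)):
            self.counters[seed] += c * sign(item, seed)

    def estimate_f2(self):
        """Average of X^2 over the copies: an estimate of sum_i alpha_i^2."""
        return sum(x * x for x in self.counters) / len(self.counters)


sketch = RandomSignSketch()
sketch.update("x", 3)
sketch.update("x", 2)
single_item_estimate = sketch.estimate_f2()
```

With a single distinct item every copy holds plus or minus its count, so the estimate is exact; with many items the cross terms cancel only in expectation, which is why several copies are kept.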

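The quantile sketch proposed by S. Guha et al. [88] maintains a set of tuples of the form $(v_i, g_i, \Delta_i)$, where $r_{\max}(v_i)$ and $r_{\min}(v_i)$ are the maximum and minimum possible ranks of $v_i$, respectively. For $i > j$:

$$v_i > v_j$$

$$g_i = r_{\min}(v_i) - r_{\min}(v_{i-1})$$

$$\Delta_i = r_{\max}(v_i) - r_{\min}(v_i)$$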

As data arrives, the sketch is updated accordingly to keep the estimate within the required accuracy. X. Lin et al. [93] gave a more formal description of this problem.

Let $AS$ be a random subset of $[1..n]$ in which each element is included with probability 1/2. A. Gilbert et al. [106] construct several such sets and call the sum of the element values in each set a random sum. Multiple random sums make up a sketch. The estimate of $\alpha_i$ is

$$2\,E(\|AS\| \mid \alpha_i \in AS) - \|A\|$$

where $\|AS\|$ is the random sum of the set $AS$ and $\|A\|$ is the sum of all the values in the data stream. This kind of sketch can therefore be used to estimate the result of a point query, and several such sketches together can be used to estimate range queries, quantile queries, and so on.

Sketching is in effect a trade-off between space and accuracy. To guarantee that the error of a point query result is less than $\varepsilon N$, the space required by the sketch above usually has $\varepsilon^{-2}$ as a factor. By comparison, the Count-Min sketch proposed by G. Cormode et al. [19] needs space with only an $\varepsilon^{-1}$ factor. Its idea is also relatively simple: several hash functions are used to project the data stream onto several small sketches; when answering a point query, each sketch answers separately and the smallest value is chosen as the answer. Building on point queries, the Count-Min sketch can be used for various other queries and complex computations. The Count-Min sketch does not compute an $L_p$ norm but computes the result of a point query directly, which is one reason its space and time efficiency are higher than those of other sketches.
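The Count-Min idea of several hashed counter rows answered by their minimum can be sketched as below. The width and depth are arbitrary illustrative values, the row hashes are derived from SHA-256 as a stand-in for the pairwise-independent hash functions assumed by the analysis, and updates are assumed to be non-negative (cash register model).

```python
import hashlib


class CountMinSketch:
    def __init__(self, width, depth):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _bucket(self, item, row):
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def update(self, item, c=1):
        # Every row counts the item in the bucket its own hash selects.
        for row in range(self.depth):
            self.rows[row][self._bucket(item, row)] += c

    def query(self, item):
        # Collisions can only inflate a counter, so the minimum is the
        # tightest available upper bound on the true count.
        return min(self.rows[row][self._bucket(item, row)]
                   for row in range(self.depth))


cms = CountMinSketch(width=64, depth=4)
for item, count in [("a", 5), ("b", 3)]:
    cms.update(item, count)
```

A point query never underestimates the true count, and it overestimates only when the item collides with others in every row.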

Histogram

The term histogram has two meanings. One is the histogram in the ordinary sense, a visual means of displaying approximate statistics. The other is a data structure or method that captures the approximate distribution of data. In the latter sense, a histogram is constructed as follows: the data is divided by attribute into multiple disjoint subsets (called buckets), and the values within each bucket are approximated in a uniform way [107].

Histogram methods are mainly used in signal processing, statistics, image processing, computer vision, and databases. In the database field, histograms were originally used mainly for selectivity estimation, in the optimization of selection queries and in approximate query processing. The histogram is one of the simplest and most flexible approximate processing methods, and also one of the most effective. As long as the problem of data updates is solved, conventional histograms can be used in data stream processing. A histogram that adjusts itself automatically as new data arrives is called a dynamic (or adaptive, or self-tuning) histogram.

The histogram proposed by L. Fu et al. [108] is mainly used to compute the median and other quantile functions. It can be used for both approximate computation and exact queries. It uses deterministic bucketing and randomized bucketing techniques to construct multiple buckets of different precision and then distributes the input data into these buckets step by step, thereby building the dynamic histogram.

It is difficult to apply static histograms directly to data stream processing. S. Guha et al. [88] dynamically construct near-optimal V-optimal histograms, but these can be applied only to data streams under the time series model.

A common approach splits the algorithm into two steps: first construct a sketch of the data stream, then construct a suitable histogram from that sketch. This approach exploits the fact that a sketch is easy to update, which makes the histogram dynamic. N. Thaper et al. [109] first construct a sketch that approximately reflects the data stream, relying on the sketch's excellent update performance to absorb new data, and then derive a histogram from the sketch to approximate the data stream. Since deriving the best histogram from a sketch is an NP-hard problem, the authors provide a heuristic (greedy) algorithm to search for a good histogram.

A. Gilbert et al. [110] construct a summary data structure that uses a set of random sum structures similar to those in [106] to store the values of dyadic intervals at different levels of granularity. The dyadic intervals ([111]) at the different granularity levels are then added to the histogram under construction from largest to smallest, so as to minimize the approximation error (refinement).

A. Gilbert et al. [112] mainly consider how to reduce the cost of processing each input item in the data stream. They first convert the input data into wavelet coefficients (treating each wavelet coefficient as the inner product of the signal and a basis vector) and then adopt a dyadic interval method similar to that of [110]. Sketches are closely related to histograms; from a certain perspective, a histogram can be regarded as a special case of a sketch.

Wavelet transform

The wavelet transform is often used to generate summary information about data. This is because usually only a small fraction of the wavelet coefficients are significant, while most coefficients are very small or unimportant. Therefore, by discarding the insignificant coefficients produced by the wavelet transform, the original data can be approximated using very little space.
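The effect of keeping only the significant coefficients can be seen with the Haar wavelet, the simplest case. The sketch below is an offline illustration on a short, made-up signal whose length is a power of two; it is not one of the streaming constructions from the literature discussed here, and the function names are hypothetical.

```python
import math


def haar(signal):
    """Orthonormal Haar transform; len(signal) must be a power of two."""
    coeffs = list(signal)
    n = len(coeffs)
    while n > 1:
        half = n // 2
        avg = [(coeffs[2 * i] + coeffs[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        diff = [(coeffs[2 * i] - coeffs[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        coeffs[:n] = avg + diff       # averages first, details after them
        n = half
    return coeffs


def inverse_haar(coeffs):
    """Undo haar(), rebuilding the signal level by level."""
    out = list(coeffs)
    n = 1
    while n < len(out):
        merged = []
        for a, d in zip(out[:n], out[n:2 * n]):
            merged += [(a + d) / math.sqrt(2), (a - d) / math.sqrt(2)]
        out[:2 * n] = merged
        n *= 2
    return out


def keep_largest(coeffs, k):
    """Zero out all but the k coefficients of largest magnitude."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    keep = set(order[:k])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]


signal = [4, 4, 4, 4, 8, 8, 8, 8]
coeffs = haar(signal)
approx = inverse_haar(keep_largest(coeffs, 2))
```

This piecewise-constant signal has only two non-zero Haar coefficients, so two stored numbers are enough to reconstruct all eight values.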

Y. Matias et al. first construct a histogram of the data stream and approximate it with wavelets. Only the most important wavelet coefficients are then retained to approximate the histogram. When new data arrives, the histogram is updated by updating these wavelet coefficients.

What that work proposes is in fact a histogram method, although it uses the wavelet transform. A. Gilbert et al. pointed out that the wavelet transform can be regarded as the inner products of a signal with a set of orthogonal vectors of length N. They therefore construct a set of sketches of the data stream. Because a sketch makes it easy to compute, with reasonable accuracy, the inner product of the signal with a set of vectors, the wavelet coefficients can be computed from the sketches and then used to estimate point queries and range queries.

New trends

Researchers have continued to deepen their work on data stream processing. We believe that the following new trends have emerged:

Future sketches

Introducing more statistical and computational techniques to construct sketches:

G. Cormode et al. mainly deal with the computation of frequent items. Their method builds on earlier majority item algorithms ([116, 117]) and uses error-correcting codes. For example, a counter is set up for each bit of the data, and the set of frequent items is then inferred from the values of these counters.

The work of Y. Tao et al. [118] is essentially an application of probabilistic counting (a distinct counting method widely used in the database field) to data stream processing.

Extending sketches

Extending sketches to handle more complex queries:

X. Lin et al. [93] construct a complex sketch system that can estimate quantiles under the sliding window model and the n-of-N model, which is difficult to achieve with simple sketches.

Under the sliding window model, [93] divides the data into multiple buckets in chronological order, builds a sketch in each bucket (with higher accuracy than required), and merges these sketches at query time; the last bucket may need to be lifted. During maintenance, only expired buckets are deleted and new buckets added.

Under the n-of-N model, [93] divides the data into multiple buckets of different sizes using the EH partitioning technique, builds a sketch in each bucket (with higher accuracy than required), and merges some of the sketches at query time to guarantee the required accuracy; the last one may need to be lifted.

Combining spatio-temporal data

Further combination with spatio-temporal data processing:

J. Sun et al. [120] mainly address historical queries and predictive processing of spatio-temporal data. However, the article emphasizes that spatio-temporal data arrives in the form of data streams, and its processing focuses more on the update performance of spatio-temporal data.

Y. Tao et al. [118] use data stream methods to process spatio-temporal data. They construct sketches of dynamic spatio-temporal data in order to distinguish whether objects are moving among multiple regions or stationary, and to estimate their number. Problems of this kind are difficult to solve with traditional spatio-temporal processing.

Novel genre

In online fiction, "data stream" also names an emerging genre in which the protagonist's strength is expressed in numbers, displayed in the same way as the attribute panel of an online game.
