BIG DATABASE

Yaya Bamba
Hello,

Is it possible with R to work on a big database without loading it? If so,
how can I do it?

Thanks.

--
Yaya BAMBA

Elève Ingénieur Statisticien Economiste (ISE)

Ecole Nationale Supérieure de Statistique et d'Economie Appliquée (ENSEA),
Abidjan (Côte d'Ivoire)

Tél: +225 87 89 76 89

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Re: BIG DATABASE

Andres Diaz
Hello Yaya,

Many years ago I worked with a MySQL database connected to R through the
RMySQL package. The data were stored in MySQL, and I connected to it and
used the data from R.

You should have a look at:

https://cran.r-project.org/web/packages/RMySQL/index.html
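
As a rough illustration of this approach, here is a minimal sketch using the DBI interface. It assumes an in-memory SQLite database (via the RSQLite package) as a stand-in so that it runs without a server; the table `survey` and its columns are hypothetical. With a real MySQL server, the connection line would instead be something like `dbConnect(RMySQL::MySQL(), dbname = ..., host = ..., user = ..., password = ...)`.

```r
library(DBI)

# Assumption: an in-memory SQLite database stands in for the real server.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical table; in practice the data already live in the database.
dbWriteTable(con, "survey", data.frame(
  id     = 1:6,
  region = c("north", "south", "north", "east", "south", "north"),
  income = c(120, 95, 130, 80, 110, 105),
  notes  = "free text that is not needed in R",
  stringsAsFactors = FALSE
))

# Only the requested columns and rows are transferred to R.
north <- dbGetQuery(
  con,
  "SELECT id, income FROM survey WHERE region = 'north'"
)

# Aggregates can be computed by the database, returning a single small row.
avg <- dbGetQuery(con, "SELECT AVG(income) AS mean_income FROM survey")

dbDisconnect(con)
```

The point is that the SQL query selects the variables and rows of interest, so the full table never has to fit in R's memory.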

Cheers,

Andres


Re: BIG DATABASE

Yaya Bamba
Thanks to all of you. I will try with the package  RMySQL and see.

2018-05-24 11:33 GMT+00:00 Andres Diaz Loaiza <[hidden email]>:

> Hello Yaya,
>
> Many years ago I worked with a MySQL database connected to R through the
> RMySQL package. [...]

Re: BIG DATABASE

Yaya Bamba
Elisa Rose, I just want to use some variables from a huge database without
loading it, because my computer doesn't have much memory.

2018-05-24 11:45 GMT+00:00 Yaya Bamba <[hidden email]>:

> Thanks to all of you. I will try with the package  RMySQL and see.

Re: BIG DATABASE

Roger Bivand
On Thu, 24 May 2018, Yaya Bamba wrote:

> Thanks to all of you. I will try with the package  RMySQL and see.

Maybe look more generally through the packages depending on and importing
from DBI (https://cran.r-project.org/package=DBI) to see what is available
- there are many more than RMySQL.

Also use the Official Statistics and High-Performance Computing (HPC) Task Views:

https://cran.r-project.org/view=OfficialStatistics

https://cran.r-project.org/view=HighPerformanceComputing

to see how typical workflows (not necessarily DB-based) can be handled.
The HPC Task View has a section on large memory and out-of-memory approaches. If
your data are spatial in raster format, the raster package provides some
out-of-memory functionality. In sf, spatial vector data may be read from
databases too.
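
As a hedged sketch of the sf route: `st_read()` accepts a `query` argument, so only the selected columns and features are read. The example below assumes the GeoPackage file shipped with sf (an SQLite-based file, used here as a stand-in for a database server); the column names `NAME`, `BIR74` and `geom` belong to that example file and would differ for other data.

```r
library(sf)

# Assumption: the example GeoPackage shipped with sf stands in for a database.
gpkg <- system.file("gpkg/nc.gpkg", package = "sf")

# Look up the layer name rather than hard-coding it.
layer <- st_layers(gpkg)$name[1]

# Select two attribute columns plus the geometry, and filter rows in SQL.
sql <- sprintf(
  'SELECT NAME, BIR74, geom FROM "%s" WHERE BIR74 > 20000',
  layer
)

# Only the features matching the query are read into R.
big <- st_read(gpkg, query = sql, quiet = TRUE)
```

For a PostGIS database, the `dsn` would be a DBI connection or a connection string instead of a file path; that part is not shown here.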

Roger

--
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: [hidden email]
http://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

Re: BIG DATABASE

Tom Philippi
What Roger said (as always).

Note that if you use the tidyverse and magrittr, dplyr and other tidyverse
tools work well with databases via DBI.  sqldf also works with multiple SQL
database backends if you're an old dog like me and don't use the tidyverse much.
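
A minimal sketch of the dplyr route, assuming the dbplyr and RSQLite packages are installed and using an in-memory SQLite database as a stand-in; the table `obs` and its columns are hypothetical. The pipeline is translated to SQL and run in the database, and nothing is pulled into R until `collect()`.

```r
library(dplyr)

# Assumption: an in-memory SQLite database stands in for the real backend.
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical observation table.
DBI::dbWriteTable(con, "obs", data.frame(
  species = c("robin", "wren", "robin", "heron", "wren"),
  n_birds = c(3, 1, 5, 2, 4),
  stringsAsFactors = FALSE
))

# tbl() creates a lazy reference to the table; no rows are fetched yet.
obs <- tbl(con, "obs")

# The grouping and summing are done by the database;
# collect() brings back only the small summary table.
totals <- obs %>%
  group_by(species) %>%
  summarise(total = sum(n_birds, na.rm = TRUE)) %>%
  arrange(species) %>%
  collect()

DBI::dbDisconnect(con)
```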

Also, since this is r-sig-*GEO*, note that PostgreSQL has PostGIS for
spatial data, which does far more than the automatic tiling of large
rasters in package raster.  I'm seeing wonderful performance working with a
340M-observation, >100GB dataset of bird observations in R via PostGIS,
even with "only" 32GB RAM and constrained to running Windows 7, not Linux/Unix.

One alternative is that if your database is running on massive hardware
(tons of memory, many cores, etc.), it is possible to run R within both
PostgreSQL and now MS SQL Server.  The first is free, the second is an
additional-cost add-on, and both usually come at the cost of painful
negotiations with database administrators for permission to run your ad hoc
R code on their SQL server.  If you have the hardware, you can even run R
with Hadoop, although I've never done that with spatial data.

Tom


On Thu, May 24, 2018 at 5:04 AM, Roger Bivand <[hidden email]> wrote:

> Maybe look more generally through the packages depending on and importing
> from DBI (https://cran.r-project.org/package=DBI) to see what is
> available - there are many more than RMySQL.
> [...]

Re: BIG DATABASE

Javier Moreira
Can I use this answer to ask about exactly what was mentioned:
R and PostGIS, mostly for raster files?
Can you point me to books, online courses, tutorials, GitHub pages, anything, to
better understand this?
I have been struggling to find info.

Thanks!

On Fri, 25 May 2018, 1:35, Tom Philippi <[hidden email]> wrote:

> [...]
> Also, since this is r-sig-*GEO*, note that postgreSQL has postGIS for
> spatial data, which does far more than the automatic tiling of large
> rasters in package raster.
> [...]

Re: BIG DATABASE

Roger Bivand
On Fri, 25 May 2018, Javier Moreira wrote:

> Can I use this answer to ask about exactly what was mentioned:
> R and PostGIS, mostly for raster files?
> Can you point me to books, online courses, tutorials, GitHub pages, anything, to
> better understand this?
> I have been struggling to find info.

For rpostgis, see:

https://journal.r-project.org/archive/2018/RJ-2018-025/index.html

and the supplementary material linked there to replicate the results in
the online article (should be in the 2018-1 issue).

Roger


Re: BIG DATABASE

Javier Moreira
Thanks so much!

On Fri, 25 May 2018, 9:51, Roger Bivand <[hidden email]> wrote:

> For rpostgis, see:
>
> https://journal.r-project.org/archive/2018/RJ-2018-025/index.html
> [...]